Mẫn Phạm

04/10/2023, 9:21 AM
Hi everyone, I am very new to lakeFS and I have a very basic question: lakeFS is "Git for data", so how do I do the equivalent of a "git pull" for data in lakeFS with the Python SDK? Thanks

Ariel Shaqed (Scolnicov)

04/10/2023, 9:31 AM
Hi @Mẫn Phạm, and welcome to lakeFS! That's a really good question. lakeFS has something of a split personality: we are "Git for data", but we are also an object store like S3. So most of the time you will not want to download an entire copy of the repository. And in fact a distributed datalake is hugely expensive and slow. So typically you will access your objects on lakeFS through the S3 gateway or through the lakeFS API. You can always use the lakectl command -- which is really just a thin user-friendly wrapper for the API -- in order to download one or more objects. You'd run "lakectl fs download" for this -- and probably use the
flag. Note that this does not give a clone of the data lake -- just a copy of the latest objects in the lake. I hope this is what you're looking to do. If not -- please let me know and I'm sure we'll figure out a useful alternative.
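
For the Python side of the question, here is a minimal sketch of the S3 gateway route mentioned above, assuming boto3 is installed. Through the gateway the repository name acts as the bucket and every key starts with the branch name. The helper names (local_path_for, download_branch, make_client) are hypothetical, and the endpoint URL and credentials are placeholders to replace with your own installation's values. Like "lakectl fs download", this copies the current objects on a branch; it does not clone the repository or its history.

```python
import os


def local_path_for(key, branch, dest_dir):
    """Map a gateway key such as 'main/data/a.csv' to a path under dest_dir."""
    relative = key[len(branch) + 1:]  # drop the leading '<branch>/'
    return os.path.join(dest_dir, *relative.split("/"))


def download_branch(s3, repo, branch, dest_dir, prefix=""):
    """Copy every object under <branch>/<prefix> into dest_dir.

    `s3` is a boto3 S3 client pointed at the lakeFS S3 gateway.
    Returns the list of local file paths that were written.
    """
    downloaded = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=repo, Prefix=f"{branch}/{prefix}"):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):
                continue  # skip directory-marker objects, nothing to write
            target = local_path_for(key, branch, dest_dir)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            s3.download_file(repo, key, target)
            downloaded.append(target)
    return downloaded


def make_client(endpoint_url, access_key, secret_key):
    """Build a boto3 client that talks to lakeFS instead of AWS S3."""
    import boto3  # third-party: pip install boto3

    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,          # placeholder: your lakeFS server URL
        aws_access_key_id=access_key,       # placeholder: lakeFS access key ID
        aws_secret_access_key=secret_key,   # placeholder: lakeFS secret key
    )
```

Usage would look roughly like download_branch(make_client(url, key_id, secret), "my-repo", "main", "./local-copy"), where "my-repo" and "main" are example names. Passing the client in as an argument keeps the download logic easy to try out against a stub before pointing it at a real server.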
