Conor Simmons (01/31/2023, 6:38 PM):

Iddo Avneri (01/31/2023, 6:41 PM):

Conor Simmons (01/31/2023, 6:46 PM):

Iddo Avneri (01/31/2023, 6:58 PM):

Conor Simmons (01/31/2023, 7:00 PM):
> I'm assuming that whatever writes to the object store can't work directly against lakeFS after importing the data?
I'm not sure what you mean by this.

Iddo Avneri (01/31/2023, 7:02 PM):

Conor Simmons (01/31/2023, 7:03 PM):
> work directly against lakeFS
Does this mean work with lakeFS? Or "against" as in some other tool?

Iddo Avneri (01/31/2023, 8:33 PM):

Conor Simmons (02/01/2023, 12:38 AM):
client.objects.upload_object, which is an upload, not an import?
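A note on the distinction being asked about: an upload copies bytes into the repository's storage namespace, while an import is zero-copy and only writes metadata pointing at objects already in the store. A minimal sketch of both, assuming a lakectl of that era (which shipped the zero-copy command as lakectl ingest); the repository my-repo, branch main, bucket my-bucket, and file names are hypothetical:

    # Upload: copies the local file into the repository's storage namespace
    lakectl fs upload lakefs://my-repo/main/images/img-0001.png -s ./img-0001.png

    # Import (zero-copy): registers objects already in the object store,
    # writing only metadata into lakeFS
    lakectl ingest --from s3://my-bucket/images/ --to lakefs://my-repo/main/images/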
Iddo Avneri (02/01/2023, 12:46 AM):

Conor Simmons (02/01/2023, 12:47 AM):

Iddo Avneri (02/01/2023, 12:48 AM):

Conor Simmons (02/01/2023, 12:49 AM):

Iddo Avneri (02/01/2023, 12:50 AM):

Conor Simmons (02/01/2023, 12:51 AM):

Iddo Avneri (02/01/2023, 12:52 AM):

Conor Simmons (02/01/2023, 12:52 AM):

Iddo Avneri (02/01/2023, 12:52 AM):

Conor Simmons (02/01/2023, 12:54 AM):

Iddo Avneri (02/01/2023, 12:55 AM):

Conor Simmons (02/01/2023, 12:56 AM):
> whatever accesses the data will do so via lakeFS, and not directly to the object store
Got it, I am doing that already. But with upload.
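Accessing data "via lakeFS" can mean lakectl, the SDK, or the S3-compatible gateway lakeFS exposes; a sketch, with the endpoint and all names hypothetical (the gateway treats the repository as the bucket and ref/path as the key, using lakeFS credentials):

    # Read an object through lakeFS rather than from the backing bucket
    lakectl fs cat lakefs://my-repo/main/images/img-0001.png > img-0001.png

    # Or point any S3 client at the lakeFS S3 gateway
    aws s3 ls s3://my-repo/main/images/ --endpoint-url https://lakefs.example.com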
Iddo Avneri (02/01/2023, 12:56 AM):

Conor Simmons (02/01/2023, 12:58 AM):
> and then modify the files via lakeFS
What do you mean by modify, in terms of CLI commands or Python SDK usage?

Iddo Avneri (02/01/2023, 1:00 AM):

Conor Simmons (02/01/2023, 1:00 AM):
> lakectl fs upload to add or modify, and lakectl fs rm to remove
> However, when working with big data, it is unreasonable to copy terabytes (or more) of files locally to experiment, develop, test or transform the data.
I understand your point here; it's not an ideal scenario. But what if we need to modify every file in our dataset? Then 3 months later, we need to reproduce the training job that was done on a previous dataset commit for some important reason. We need to be able to fall back to the old dataset version to get the desired reproducibility, no? But we still want to keep the newest version for future experiments. I may be misunderstanding the use case of lakeFS.
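The modify cycle Iddo describes, with a commit so the exact dataset state can be recalled later; a sketch with hypothetical repository, branch, and paths:

    # Add or overwrite a file on the branch
    lakectl fs upload lakefs://my-repo/main/images/img-0001.png -s ./img-0001.png

    # Remove a file that should no longer be in the dataset
    lakectl fs rm lakefs://my-repo/main/images/img-0002.png

    # Commit, pinning this version of the dataset
    lakectl commit lakefs://my-repo/main -m "update training set"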
Iddo Avneri (02/01/2023, 1:06 AM):

Conor Simmons (02/01/2023, 1:09 AM):
> You can easily access a historical commit in lakeFS and get the full data set as it was at the time of that commit.
Got it. I've done it with upload, but if there's a more efficient way I'd love to try it.
> What creates the files initially on the object store?
So, the files are being created locally. Previously, before importing, I've synced them to S3 with b2 sync (Backblaze).
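Getting a historical dataset back answers the reproducibility question above: any lakeFS ref, including a commit ID, can stand in for the branch name. A sketch, where the commit ID c2f5a1b and all other names are hypothetical:

    # List the dataset exactly as it was at that commit
    lakectl fs ls lakefs://my-repo/c2f5a1b/images/

    # Fetch a file from that commit for a reproducibility run
    lakectl fs download lakefs://my-repo/c2f5a1b/images/img-0001.png ./img-0001.png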
Iddo Avneri (02/01/2023, 1:10 AM):

Conor Simmons (02/01/2023, 1:16 AM):
1. rclone sync to the S3 store. Say there's 100,000 images.
2. Import to lakeFS.
3. When adding, deleting, or modifying files in the dataset, use lakectl fs upload or lakectl fs rm.
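Put together, Conor's three steps might look like the following sketch, assuming a hypothetical rclone remote s3remote, bucket my-bucket, and repository my-repo, and a lakectl of that era with the zero-copy ingest command:

    # 1. Sync locally created images to the object store
    rclone sync ./images s3remote:my-bucket/images

    # 2. Zero-copy import: register those objects in lakeFS, then commit
    lakectl ingest --from s3://my-bucket/images/ --to lakefs://my-repo/main/images/
    lakectl commit lakefs://my-repo/main -m "import initial 100,000 images"

    # 3. From here on, make changes through lakeFS
    lakectl fs upload lakefs://my-repo/main/images/new.png -s ./new.png
    lakectl fs rm lakefs://my-repo/main/images/old.png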
Iddo Avneri (02/01/2023, 1:54 AM):

Conor Simmons (02/01/2023, 1:55 AM):
b2 sync)
Iddo Avneri (02/01/2023, 1:56 AM):

Conor Simmons (02/01/2023, 1:58 AM):

Iddo Avneri (02/01/2023, 1:59 AM):

Conor Simmons (02/01/2023, 2:01 AM):

Iddo Avneri (02/01/2023, 2:02 AM):

Conor Simmons (02/01/2023, 2:03 AM):

Iddo Avneri (02/01/2023, 2:04 AM):

Conor Simmons (02/01/2023, 2:04 AM):

Iddo Avneri (02/01/2023, 2:08 AM):

Conor Simmons (02/01/2023, 2:09 AM):
> Don't I need to do the same with import?
e.g. using rclone
Iddo Avneri (02/01/2023, 2:10 AM):

Conor Simmons (02/01/2023, 2:11 AM):
> an object store unrelated to lakeFS.
> the object store in which the lakeFS repository sits.
What if this is the same object store?
Iddo Avneri (02/01/2023, 2:12 AM):

Conor Simmons (02/01/2023, 2:18 AM):

Iddo Avneri (02/01/2023, 2:21 AM):

Conor Simmons (02/01/2023, 2:21 AM):

Elad Lachmi (02/01/2023, 6:09 PM):

Ariel Shaqed (Scolnicov) (02/01/2023, 7:01 PM):
lakectl fs upload --direct for copying from your local disk to lakeFS. That passes data directly to S3, and only performs metadata operations on lakeFS.
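A sketch of the direct upload Ariel mentions; the --direct flag comes from his message, while the repository and paths are hypothetical:

    # Data goes straight to the backing S3 storage;
    # lakeFS records only the metadata
    lakectl fs upload --direct lakefs://my-repo/main/images/img-0001.png -s ./img-0001.png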
Conor Simmons (02/01/2023, 7:07 PM):

Ariel Shaqed (Scolnicov) (02/01/2023, 7:10 PM):