himabindu tummala

09/23/2022, 2:07 PM
I have a question regarding data written through lakeFS, is it written in an encoded format on objectstore?

Barak Amar

09/23/2022, 4:11 PM
Hi @himabindu tummala, the user data is not encoded in any format.
If you post questions on #help you will probably get faster answers.

Ariel Shaqed (Scolnicov)

09/23/2022, 4:51 PM
Hi @himabindu tummala, welcome to our lake! All objects written to lakeFS exist on the backing object store, unchanged. But you're right, their names will be entirely different, and generally look random. So I guess we're not sure whether you're asking about data (which is passed unchanged to the object store) or metadata (which is managed by lakeFS)?

himabindu tummala

09/27/2022, 8:08 AM
@Ariel Shaqed (Scolnicov) yes, when I look in the backing object storage, I don't see the data files as I would the original data. The option is to export data from lakeFS to an object store (without lakeFS), is that accurate?
Also, in this documentation https://docs.lakefs.io/understand/model.html, under the merge section, how do you detect "Files changed on both sides in same way"?

Barak Amar

09/27/2022, 9:46 AM
Here is a fresh repository with the current lakeFS README.md file committed:
$ aws s3 ls --recursive s3://barak-bucket1/demo/
2022-09-27 12:39:01       1023 demo/_lakefs/cf7ad1bfb1bdcae3efa24b75ab7eb69ead5757f97188253bddda49a40f27520c
2022-09-27 12:39:01       1066 demo/_lakefs/f25cbe5cc99ca42da67241354a343928acd604cdcfa42ba33884940ef7189757
2022-09-27 12:38:50       5109 demo/d1f3309312b446c28b50b811a93855b5
2022-09-27 12:38:19         70 demo/dummy
There are 4 files. dummy was created with the repository to verify that we can read from and write to the underlying object store. d1f3309312b446c28b50b811a93855b5 is our README.md. The underlying storage includes the uploaded data, but under unique names that are referenced in our metadata files (found under _lakefs).
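To make the layout above concrete, here is a minimal Python sketch that splits the keys from that listing into lakeFS metadata objects (under _lakefs) and everything else. It is based only on the listing shown here; the function name `classify_keys` and the `repo_prefix` parameter are illustrative assumptions, not part of lakeFS.

```python
def classify_keys(keys, repo_prefix="demo/"):
    """Split object-store keys into (metadata, other).

    Assumption: metadata objects live under '<repo_prefix>_lakefs/',
    as in the listing above. Everything else (uploaded data, the
    'dummy' check file) lands in 'other'.
    """
    metadata, other = [], []
    for key in keys:
        # Strip the repository's storage prefix, if present.
        relative = key[len(repo_prefix):] if key.startswith(repo_prefix) else key
        if relative.startswith("_lakefs/"):
            metadata.append(key)
        else:
            other.append(key)
    return metadata, other


# Keys copied from the listing above.
listing = [
    "demo/_lakefs/cf7ad1bfb1bdcae3efa24b75ab7eb69ead5757f97188253bddda49a40f27520c",
    "demo/_lakefs/f25cbe5cc99ca42da67241354a343928acd604cdcfa42ba33884940ef7189757",
    "demo/d1f3309312b446c28b50b811a93855b5",
    "demo/dummy",
]
metadata, other = classify_keys(listing)
```

Note that the key alone does not tell you that d1f3309312b446c28b50b811a93855b5 is README.md; that mapping is held in the metadata.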
About "Files changed on both sides in same way": we compare the metadata of the source and destination entries. Each entry's metadata includes the size, the content checksum, and so on. Both sides may differ from the base entry, but if both made the same change, there is no conflict and we can pick either one.
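As a rough illustration of that rule, here is a generic three-way merge decision for a single entry, sketched in Python. It is not the lakeFS implementation: the `Entry` class, its fields, and `merge_entry` are hypothetical names, and `None` stands for an entry that does not exist on that side.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical entry metadata; real metadata holds more fields."""
    size: int
    checksum: str


def merge_entry(
    base: Optional[Entry],
    source: Optional[Entry],
    dest: Optional[Entry],
) -> Tuple[Optional[Entry], bool]:
    """Return (merged_entry, conflict) for one path.

    Entries are compared by metadata only (size and checksum here).
    """
    if source == dest:
        # Unchanged on both sides, or changed on both sides in the
        # same way: no conflict, pick either one.
        return source, False
    if source == base:
        # Only the destination changed.
        return dest, False
    if dest == base:
        # Only the source changed.
        return source, False
    # Both sides changed, and in different ways.
    return None, True


base = Entry(size=5109, checksum="aaa")
same_change = Entry(size=6000, checksum="bbb")
other_change = Entry(size=6100, checksum="ccc")

no_conflict = merge_entry(base, same_change, same_change)
conflict = merge_entry(base, same_change, other_change)
```

Here `no_conflict` is `(same_change, False)`, because both sides carry identical metadata, while `conflict` is `(None, True)`, because the two sides diverged from the base in different ways.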

himabindu tummala

09/27/2022, 4:35 PM
Thank you Amar, your response answers my question.

Ariel Shaqed (Scolnicov)

09/27/2022, 4:38 PM
And about export... Yup, it's an escape hatch from lakeFS.
After you export, you can read without going through lakeFS for anything. Great for moving your team to lakeFS without forcing your team's users to transfer immediately. (They'll still end up transferring, I reckon, once they see the value from being able to see old versions, the log, reproducibility, etc. But they can do it on their own schedule.)

himabindu tummala

09/28/2022, 8:16 AM
Thanks Ariel.