Anna Schooneveld

03/01/2023, 2:56 PM
I have a question about lakeFS import. If data is imported/ingested into lakeFS from, say, an S3 bucket, and the data changes in that S3 bucket, does it also change in lakeFS? Or does it only change if we re-upload it?

Barak Amar

03/01/2023, 3:03 PM
Hi Anna, the import process references the existing data in place. lakeFS creates a commit record that points to your data files, so if those files are deleted or changed, it will affect lakeFS. That's not something you'd want to do: lakeFS stores the size of each object as part of the metadata, and a modified object usually has a different size.
If you'd like to keep a copy of the data and let lakeFS manage it, use upload. The data will be copied in this case.
👍 1
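
To make the import-versus-upload distinction above concrete, here is a toy Python sketch. It is not the lakeFS API: `Entry`, `import_object`, `upload_object`, and `read` are hypothetical names, and plain dicts stand in for the S3 bucket and for lakeFS-managed storage. It only illustrates why a pointer with a recorded size goes stale when the source object is modified, while an uploaded copy does not.

```python
# Toy model only: every name here is hypothetical and is NOT the lakeFS API.
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    address: str  # where the bytes actually live
    size: int     # object size recorded in metadata at commit time


def import_object(bucket: dict, key: str) -> Entry:
    """Import: record a pointer to the existing object, copy nothing."""
    return Entry(address=key, size=len(bucket[key]))


def upload_object(bucket: dict, key: str, managed: dict) -> Entry:
    """Upload: copy the bytes into storage that the repository manages."""
    managed_key = f"managed/{key}"
    managed[managed_key] = bucket[key]
    return Entry(address=managed_key, size=len(managed[managed_key]))


def read(entry: Entry, store: dict):
    """Return the current bytes behind an entry, or None if they are gone."""
    return store.get(entry.address)


source = {"data/a.csv": b"id,value\n1,10\n"}  # stand-in for the S3 bucket
managed = {}                                   # stand-in for managed storage

imported = import_object(source, "data/a.csv")
uploaded = upload_object(source, "data/a.csv", managed)

# The source object is overwritten after the commit was created.
source["data/a.csv"] = b"id,value\n1,10\n2,20\n"

imported_bytes = read(imported, source)
uploaded_bytes = read(uploaded, managed)

# The imported entry now points at bytes whose size no longer matches
# the recorded metadata; the uploaded copy is untouched.
size_mismatch = len(imported_bytes) != imported.size
print("imported entry stale:", size_mismatch)
print("uploaded copy intact:", len(uploaded_bytes) == uploaded.size)
```

In this model the imported entry keeps its originally recorded size while the bytes behind it change, which is the kind of inconsistency the message above warns about.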

Anna Schooneveld

03/01/2023, 3:22 PM
Thanks!!! when you say it will affect lakefs, do you mean that it will create an uncommitted change in the repo? or does the underlying data change while on the surface the repo stays the same?

Oz Katz

03/01/2023, 3:38 PM
Hey @Anna Schooneveld 🙂 lakeFS can't make any guarantees about data not managed by it. This means that if lakeFS ingested a path from S3 and that path later changes (is overwritten or deleted), lakeFS will either return the new version or a 410 Gone for that object. If you want to keep the lakeFS state up to date with the actual state of the source S3 bucket, a common approach is to run the import periodically. This way you'll also get a nice commit with a diff showing what changed between imports
:dancing_lakefs: 1
be sure to import the same source from the same branch to the same path for this to happen though 🙂
👍 1
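
As a rough illustration of what a diff between two periodic imports conveys, the sketch below compares two listings of the same source path. It is a toy model, not how lakeFS computes diffs, which the thread does not describe: `diff_listings` is a hypothetical helper, and each listing is assumed to map an object key to a `(size, etag)` tuple.

```python
# Toy model only: diff_listings is a hypothetical helper, not a lakeFS call.
def diff_listings(previous: dict, current: dict) -> dict:
    """Classify keys as added, removed, or changed between two listings."""
    added = sorted(k for k in current if k not in previous)
    removed = sorted(k for k in previous if k not in current)
    changed = sorted(
        k for k in current
        if k in previous and current[k] != previous[k]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Listing captured at the first import: key -> (size, etag), values made up.
first_import = {
    "data/a.csv": (14, "etag-1"),
    "data/b.csv": (20, "etag-2"),
}

# Listing captured at the next import of the same source path.
second_import = {
    "data/a.csv": (19, "etag-3"),  # overwritten in the bucket
    "data/c.csv": (8, "etag-4"),   # new object
}

result = diff_listings(first_import, second_import)
print(result)
```

Comparing like with like is what makes the result meaningful, which is why the same source and the same destination path matter when re-importing.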

Anna Schooneveld

03/01/2023, 4:04 PM
thanks a lot!
:lakefs: 2