# help
**Mohamed Azghari**
Hello there, I've been exploring lakeFS for a few days and I like many of its features. I'm using it with MinIO as the S3 backend. When I commit a new file, let's call it "data.csv", lakeFS stores it as an object in MinIO with a size of 100 MiB. But when I update the CSV file, adding 30 MiB of data for example, and commit again, it creates another object of 130 MiB and keeps the old one in MinIO. I'm wondering if lakeFS has a mechanism to store only the delta in the second object and read from both objects at once (incremental restore)? Or is there any other mechanism to optimize storage in the backend? Thank you in advance.
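For reference, a minimal sketch of roughly what I'm doing (assuming lakeFS's S3 gateway with boto3; the endpoint, credentials, repo, and branch names here are placeholders):

```python
import boto3

# lakeFS exposes an S3-compatible gateway, so boto3 can point at it.
# Endpoint, credentials, repo ("my-repo") and branch ("main") are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:8000",       # lakeFS server, not MinIO directly
    aws_access_key_id="AKIAIOSFODNN7EXAMPLE",   # lakeFS access key
    aws_secret_access_key="wJalrXUtnFEMI...",   # lakeFS secret key
)

# First version: upload ~100 MiB data.csv to the "main" branch,
# then commit, e.g. with: lakectl commit lakefs://my-repo/main -m "add data.csv"
s3.upload_file("data.csv", "my-repo", "main/data.csv")

# Later: append ~30 MiB locally, re-upload, commit again.
# After this commit, MinIO holds a second, full 130 MiB object.
s3.upload_file("data.csv", "my-repo", "main/data.csv")
# lakectl commit lakefs://my-repo/main -m "update data.csv"
```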
**e**
lakeFS has no such capability. The lakeFS versioning engine implements a copy-on-write mechanism: it creates a new, complete object for each new version. You can run GC (garbage collection) to delete versions you no longer need. If you are looking to manage deltas, you can do that using the Delta Lake or Iceberg table formats, and you can still use lakeFS on top of them to version a repository with many datasets.
You can check this blog post for more details:
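If it helps, here is a minimal sketch of the Delta Lake approach (assuming the delta-rs `deltalake` Python package writing through lakeFS's S3 gateway; the endpoint, credentials, repo, branch, and table path are placeholders, not a definitive setup):

```python
import pandas as pd
from deltalake import write_deltalake

# Storage options routing delta-rs through the lakeFS S3 gateway
# (placeholder endpoint and credentials).
storage_options = {
    "AWS_ENDPOINT_URL": "http://localhost:8000",
    "AWS_ACCESS_KEY_ID": "AKIAIOSFODNN7EXAMPLE",
    "AWS_SECRET_ACCESS_KEY": "wJalrXUtnFEMI...",
    "AWS_S3_ALLOW_UNSAFE_RENAME": "true",  # needed by delta-rs without a locking provider
}

# In the S3 gateway, the bucket is the repo and the first path segment is the branch.
table_uri = "s3://my-repo/main/events"

# Initial load: Delta writes Parquet data files plus a transaction log.
df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
write_deltalake(table_uri, df, storage_options=storage_options)

# Later update: append mode only adds new data files next to the old ones,
# so the backend stores the delta rather than a second full copy.
more = pd.DataFrame({"id": [4, 5], "value": ["d", "e"]})
write_deltalake(table_uri, more, mode="append", storage_options=storage_options)
```

Appends add new Parquet files instead of rewriting the whole object, and lakeFS branches and commits still apply to the table as a whole.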
**Mohamed Azghari**
Thank you for your guidance! Appreciate it!
**Iddo Avneri**
@Mohamed Azghari - This blog explains the underlying file representation in lakeFS.
**Mohamed Azghari**
Thank you @Iddo Avneri!!