Omar Talbi

05/02/2023, 7:45 PM
Hello, in the "Version your ML Training Data for Easy Reproducibility" webinar, there is a slide at the end, and I am wondering how you achieve up to 80% storage cost reduction using lakeFS. I'm looking for some examples. Thank you.

Iddo Avneri

05/02/2023, 9:52 PM
Hi Omar! The storage savings can come from two places (how much you save depends on how you use your storage without lakeFS):
1. Branches are zero-copy clones, i.e. every developer / data scientist / data engineer can have an entire production environment at scale without copying any data. (Think about the amount of data duplication you have today.)
2. By nature, in delta lakes many of the files are static and only a smaller subset of files changes on a regular basis. By using lakeFS you de-duplicate your storage over time, so you don't need to take snapshots of the same files over and over again.
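To make the two points concrete, here is a rough back-of-the-envelope sketch. The dataset size, number of copies, and change rate are made-up assumptions for illustration, not figures from the webinar or measurements from lakeFS; the function names are hypothetical as well. The same arithmetic covers both cases: "copies" can mean per-person environments (point 1) or periodic snapshots (point 2).

```python
def full_copy_storage(base_tb: float, n_copies: int) -> float:
    """Storage when every environment or snapshot is a full physical copy."""
    return base_tb * n_copies


def shared_storage(base_tb: float, n_copies: int, changed_fraction: float) -> float:
    """Storage when copies share unchanged files and only changed files are stored again.

    Assumes (hypothetically) that each copy beyond the first rewrites
    `changed_fraction` of the base data and shares everything else.
    """
    return base_tb + base_tb * changed_fraction * (n_copies - 1)


def savings(before_tb: float, after_tb: float) -> float:
    """Fraction of storage saved, between 0 and 1."""
    return 1 - after_tb / before_tb


if __name__ == "__main__":
    # Assumed scenario: 10 TB of production data, 5 copies in total
    # (production plus 4 working environments), each changing 5% of the files.
    before = full_copy_storage(10, 5)
    after = shared_storage(10, 5, 0.05)
    print(f"full copies: {before} TB, shared: {after} TB, saved: {savings(before, after):.0%}")
```

With these assumed inputs the saving lands in the same ballpark as the slide; your actual number depends on how many copies you keep today and how much of the data really changes between them.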