# dev
p
There is a community member in the awesome MLOps Community Slack asking for more details on the dataset we used to test lakeFS with the `lakectl abuse` command documented here. Can someone help me with answers to his questions (see screenshot)?
o
1. We used metadata from a real lakeFS user, representing real-life data, so there was quite a wide variety of object sizes. Keep in mind that most lakeFS operations happen at the metadata level, so object size generally makes very little difference: the metadata is usually very small and doesn't depend on the object's size.
2. See #1. We've tested with more datasets, some artificial and some not, and the numbers tend to be relatively consistent.
3. S3 typically has a high time-to-first-byte (in the range of tens of ms). There are a few interesting numbers here
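For reference, a minimal sketch of the kind of load test being discussed. The repository URI and the flag values are hypothetical examples; `--amount` and `--parallelism` are assumed from the documented `lakectl abuse` subcommands, so check `lakectl abuse --help` on your version before running it.

```
# Hedged sketch: fire random read requests at a lakeFS ref to measure
# metadata-level latency. The repo/branch URI and numbers below are
# placeholders, not the dataset used in the published benchmarks.
lakectl abuse random-read lakefs://example-repo/main \
  --amount 10000 \
  --parallelism 64
```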
p
thanks!