Joe M
05/02/2024, 9:06 PM
Oz Katz
1. A commit API call is made to the server, with a repository, branch ID, message and optional metadata.
2. Server takes note of the current commit ID that branch is currently pointing to.
3. Server creates a new staging token, sealing the current one for the branch, to ensure new writes are excluded. From here on out, we have a set of changes to commit that is guaranteed to be immutable.
4. All sealed staging tokens are then serialized to the object store, making up a tree of RocksDB-compatible SSTables.
5. A commit record is written, pointing to the root of that tree on the object store
6. Once that's done (and this is the part that ensures atomicity), the branch pointer is modified to point to the commit we created. This is an atomic compare-and-swap operation: the new commit takes effect only if the current commit ID is still the one observed in step 2.
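To make the compare-and-swap in step 6 concrete, here is a minimal sketch. It assumes an in-memory dictionary standing in for the backing KV store; `BranchPointers` and `compare_and_swap` are hypothetical names for illustration, not the lakeFS API.

```python
import threading


class BranchPointers:
    """Hypothetical in-memory stand-in for the KV store holding branch pointers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._heads = {}  # branch name -> commit ID

    def get(self, branch):
        with self._lock:
            return self._heads.get(branch)

    def compare_and_swap(self, branch, expected, new):
        """Point `branch` at `new` only if it still points at `expected`."""
        with self._lock:
            if self._heads.get(branch) != expected:
                return False  # predicate failed: someone moved the branch
            self._heads[branch] = new
            return True


pointers = BranchPointers()
pointers.compare_and_swap("main", None, "c1")  # first commit on the branch
observed = pointers.get("main")                # step 2: note the current commit
pointers.compare_and_swap("main", "c1", "c2")  # a concurrent commit lands first
ours = pointers.compare_and_swap("main", observed, "c3")  # step 6 with a stale ID
```

Here `ours` is `False` and the branch still points at the concurrent writer's commit, so a stale commit can never silently overwrite a newer one.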
Failing at any point prior to step 6 means we may have created orphan metadata objects on the object store, but reading and writing from the branch always starts by dereferencing the commit the branch currently points to, so the failure has no other side effects.
Failing at step 6 could happen for 2 reasons:
1. Generic error writing to the lakeFS backing KV store, in which case the server would retry the KV write operation or give up. If it gives up, the commit operation fails and the branch still points to the existing commit.
2. Compare-and-swap predicate failure: this means someone "beat us to it". Another commit/merge successfully finished before ours did, in which case we restart the flow at step 2. This ensures atomicity and also that parent-child relationships are properly maintained. Just like in Git, each commit points to its parents.
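The two failure modes at step 6 can be sketched as a retry loop. This is an illustration under stated assumptions, not the actual server code: `Store`, `KVError`, `PredicateFailed`, `max_kv_retries` and the `before_swap` hook (used only to simulate a concurrent writer) are all hypothetical, and steps 3 and 4 (sealing staging tokens, writing the tree) are left out.

```python
import itertools


class KVError(Exception):
    """Hypothetical generic failure writing to the backing KV store."""


class PredicateFailed(Exception):
    """Hypothetical signal that the branch moved since step 2."""


class Store:
    """Hypothetical in-memory stand-in for the KV store and commit records."""

    def __init__(self):
        self.heads = {}    # branch name -> commit ID
        self.commits = {}  # commit ID -> {"parent": ..., "message": ...}
        self._ids = itertools.count(1)

    def write_commit(self, parent, message):
        commit_id = f"c{next(self._ids)}"
        self.commits[commit_id] = {"parent": parent, "message": message}
        return commit_id

    def set_if(self, branch, expected, new):
        if self.heads.get(branch) != expected:
            raise PredicateFailed(branch)
        self.heads[branch] = new


def commit(store, branch, message, max_kv_retries=3, before_swap=None):
    while True:
        parent = store.heads.get(branch)                 # step 2
        commit_id = store.write_commit(parent, message)  # step 5
        if before_swap:
            before_swap()  # illustration only: lets a rival commit sneak in
        for _ in range(max_kv_retries):
            try:
                store.set_if(branch, parent, commit_id)  # step 6
                return commit_id
            except KVError:
                continue  # reason 1: retry the KV write
            except PredicateFailed:
                break     # reason 2: restart the flow at step 2
        else:
            # Retries exhausted: the branch still points to the existing commit.
            raise KVError("giving up on branch pointer update")


store = Store()
first = commit(store, "main", "initial")

raced = []

def rival():
    # Simulate another writer finishing between our step 2 and step 6, once.
    if not raced:
        raced.append(commit(store, "main", "rival"))

second = commit(store, "main", "ours", before_swap=rival)
```

After the restart, `second` has the rival's commit as its parent, and the commit record written during the failed first attempt is left behind as an orphan that nothing points to.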
As the saying goes, "If I had more time, I would have written a shorter Slack message". Sorry if this is a little contrived, but I'm happy to elaborate or answer any follow-ups!
Joe M
05/03/2024, 3:11 PM
Oz Katz