Manoj Babu
04/12/2023, 5:52 PM
AHA moments yesterday going through the KV design and over 3k lines of Graveler code. I very much enjoyed reading the impl.
Just a thought experiment.
When it comes to versioning Iceberg tables, taking a typical single-branch committer flow, these are the events that happen:
1. For the sake of simplicity, let's start with the default branch: Iceberg keeps the HEAD ref, which is `main`. Any writes to the table I'm talking about will be only on this branch for now.
2. Also, let's assume we somehow magically/mentally have a lakeFS repo ready with the default branch `main`. A staging token has been generated and there are 0 sealed tokens.
3. When a write gets committed successfully to an Iceberg table, we are left with a `snapshot-id`. So let's commit the same to lakeFS. I'm not so sure what the key could be (maybe the HDFS path of the Iceberg table, which doesn't change, or the snapshot metadata file), but we have the staging token partition, and the value will be the `snapshot-id`. Still 0 sealed tokens. We prepare for commit and add it to the branch.
4. So far so good. The branch always points to the correct `snapshot-id`, as we are versioning Iceberg commits.
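To make steps 1-4 a bit more concrete, here is a toy in-memory sketch of what I have in mind. It is not the lakeFS or Graveler API: `ToyRepo`, `stage`, `commit` and the key `warehouse/db/events` are all made-up names, and the plain `staging` dict only stands in for the staging token.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Commit:
    id: str
    parent: Optional[str]
    entries: Dict[str, int]  # table key -> Iceberg snapshot-id


@dataclass
class Branch:
    commit_id: Optional[str] = None  # HEAD commit of the branch
    staging: Dict[str, int] = field(default_factory=dict)  # stand-in for the staging token


class ToyRepo:
    """Hypothetical model: one key per Iceberg table, value is the snapshot-id."""

    def __init__(self) -> None:
        self.commits: Dict[str, Commit] = {}
        self.branches: Dict[str, Branch] = {"main": Branch()}

    def stage(self, branch: str, key: str, snapshot_id: int) -> None:
        # Record the latest snapshot-id for the table as an uncommitted change.
        self.branches[branch].staging[key] = snapshot_id

    def commit(self, branch: str) -> str:
        b = self.branches[branch]
        parent = self.commits.get(b.commit_id)
        # New commit = parent's entries overlaid with the staged changes.
        entries = dict(parent.entries) if parent else {}
        entries.update(b.staging)
        payload = repr((b.commit_id, sorted(entries.items()))).encode()
        commit_id = hashlib.sha256(payload).hexdigest()[:12]
        self.commits[commit_id] = Commit(commit_id, b.commit_id, entries)
        b.commit_id = commit_id  # the branch now points at the new commit
        b.staging = {}           # staging area is empty again
        return commit_id


repo = ToyRepo()
repo.stage("main", "warehouse/db/events", 1001)  # first Iceberg write
c1 = repo.commit("main")
repo.stage("main", "warehouse/db/events", 1002)  # second Iceberg write
c2 = repo.commit("main")
```

After the two writes, `main` points at `c2`, whose parent is `c1`, so every Iceberg commit has a matching commit on the lakeFS side.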
Let's bring Iceberg housekeeping tasks into the picture.
1. Rewrite data files (or simply compaction) creates another commit, changing the Iceberg metadata and adding a snapshot. For lakeFS, this is just another commit. We should be good.
2. Delete orphans doesn't create any commit; it just removes dangling data and metadata files. We should be good here too.
3. Expire snapshots deletes the snapshots which are old enough or don't qualify under Iceberg's snapshot retention rules. This might be a problem, as some lakeFS commits will point to snapshots which are no longer available. Cherry-pick or revert operations on such commits will overwrite the commit tree, placing those commits on top.
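The problem in item 3 can be shown with a tiny sketch. Assuming we can get (a) a mapping from lakeFS commit to the snapshot-id it recorded and (b) the set of snapshot-ids still present in the table after expiry, the dangling commits are just the difference. The function name and the sample IDs are made up for illustration.

```python
from typing import Dict, List, Set


def dangling_commits(commit_snapshots: Dict[str, int],
                     live_snapshots: Set[int]) -> List[str]:
    """Return commit IDs whose recorded snapshot no longer exists in the table."""
    return [commit_id
            for commit_id, snapshot_id in commit_snapshots.items()
            if snapshot_id not in live_snapshots]


# Hypothetical commit log: lakeFS commit ID -> Iceberg snapshot-id.
commit_snapshots = {"c1": 1001, "c2": 1002, "c3": 1003, "c4": 1004}

# Suppose expire snapshots removed 1001 and 1002.
live_after_expiry = {1003, 1004}

stale = dangling_commits(commit_snapshots, live_after_expiry)
```

Here `c1` and `c2` still exist in the lakeFS log but can no longer be read, cherry-picked or reverted to in any meaningful way.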
For the reader flow, let's just assume that any Iceberg table reader client integrated with lakeFS reads the `snapshot-id` from the lakeFS repo/branch HEAD ref (`branch.CommitID`) and then reads the actual Iceberg table as of that `snapshot-id` version.
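A rough sketch of that lookup, with plain dicts as hypothetical stand-ins for the lakeFS refs and commit contents (none of these names are real lakeFS calls):

```python
from typing import Dict

# Hypothetical stand-ins: branch -> HEAD commit ID, and commit ID -> stored values.
branches: Dict[str, str] = {"main": "c3"}
commit_values: Dict[str, Dict[str, int]] = {
    "c2": {"warehouse/db/events": 1002},
    "c3": {"warehouse/db/events": 1003},
}


def snapshot_for(ref: str, table_key: str) -> int:
    """Resolve a ref to the Iceberg snapshot-id recorded for the table."""
    # A branch name resolves to its HEAD commit; anything else is
    # treated as a commit ID already.
    commit_id = branches.get(ref, ref)
    return commit_values[commit_id][table_key]


head_snapshot = snapshot_for("main", "warehouse/db/events")
older_snapshot = snapshot_for("c2", "warehouse/db/events")
```

The reader would then hand the resolved ID to the engine's time-travel read, for example the `snapshot-id` read option when reading Iceberg from Spark.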
Is there a way in lakeFS to tombstone the commits which point to expired snapshots and let GC handle them?
That way, whenever a lakeFS client looks at the commit log, they will see only commits/tags pointing to valid snapshots.
A typical scenario can be like this.
Iceberg snapshot ancestry on branch `main`, before expire snapshots:
Tn -> Tn-1 -> Tn-2 -> Ca ... Cb -> T2 -> Ce ... Cd -> T1 -> HEAD
T = tag and C = commit.
Corresponding lakeFS commit tree (similar to the Iceberg commit tree):
Tn -> Tn-1 -> Tn-2 -> Ca ... Cb -> T2 -> Ce ... Cd -> T1 -> HEAD
After expire snapshots:
Iceberg snapshot ancestry:
Tn -> Tn-1 -> Tn-2 -> T2 -> Ce ... Cd -> T1 -> Ex -> HEAD
Ex = commit introduced by the expire-snapshots action.
lakeFS refs ???
Should be similar to the Iceberg snapshot ancestry tree.
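A minimal sketch of how the lakeFS side could be derived, assuming a purely linear history read left to right as oldest to newest (my reading of the diagram) and treating the commit names as plain labels. `prune_log` is a made-up helper, not something lakeFS provides.

```python
from typing import Dict, List, Optional, Set


def prune_log(log: List[str], expired: Set[str]) -> Dict[str, Optional[str]]:
    """Drop expired commits from a linear log (oldest first) and relink parents.

    Returns a child -> parent mapping over the surviving commits.
    """
    parents: Dict[str, Optional[str]] = {}
    last_kept: Optional[str] = None
    for commit_id in log:
        if commit_id in expired:
            continue  # skip it; its child will be linked to the last survivor
        parents[commit_id] = last_kept
        last_kept = commit_id
    return parents


# The "before" log from the example, with Ca ... Cb shortened to two commits.
log = ["Tn", "Tn-1", "Tn-2", "Ca", "Cb", "T2", "Ce", "Cd", "T1"]
pruned = prune_log(log, expired={"Ca", "Cb"})
```

With `Ca` and `Cb` gone, `T2` is relinked directly onto `Tn-2`, which matches the Iceberg ancestry after expiry. One caveat: if commit IDs are derived from their parents' IDs (as in Git), relinking like this would change the ID of every commit after the cut, so a real implementation would probably need a different mechanism such as a tombstone marker.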
We need to pick the lakeFS commits that reference the expired snapshots, expire those commits (in this case the whole `Ca ... Cb` commit tree), and change the commit ancestry by linking the parents appropriately.
Ariel Shaqed (Scolnicov)
04/13/2023, 7:32 AM
Manoj Babu
04/13/2023, 11:33 AM
Ariel Shaqed (Scolnicov)
04/13/2023, 11:40 AM
Manoj Babu
04/13/2023, 12:15 PM
Ariel Shaqed (Scolnicov)
04/13/2023, 1:02 PM
Manoj Babu
04/13/2023, 2:24 PM
Ariel Shaqed (Scolnicov)
04/13/2023, 2:30 PM