
Manoj Babu

04/12/2023, 5:52 PM
Hi folks, as mentioned here https://lakefs.slack.com/archives/C04QQ6GD3ML/p1678185767372759, approach 4 looks neater and simpler. Although I'm not so sure about the extension design at the moment, I'm trying to figure out the expected behaviour of versioning Iceberg table interactions/changes on the lakeFS graveler KV. First of all, kudos on the effort put in by the lakeFS team; I had some good AHA moments yesterday going through the KV design and over 3k lines of graveler code. I very much enjoyed reading the implementation.

Just a thought experiment. When it comes to versioning Iceberg tables, taking a typical single-branch committer flow, we have these events happening:
1. For the sake of simplicity, let's start with the default branch. Iceberg keeps the HEAD ref, which is `main`. Any writes to the table I'm talking about will only be on this branch for now.
2. Also, let's assume we somehow magically have a lakeFS repo ready with the default branch `main`: a staging token got generated, and there are 0 sealed tokens.
3. When a write gets committed successfully to an Iceberg table, we are left with a `snapshot-id`. So let's commit the same to lakeFS. I'm not so sure what the key could be (maybe the HDFS path of the Iceberg table, which doesn't change, or the snapshot metadata file), but we have the staging token partition, and the value will be the `snapshot-id`. Still 0 sealed tokens. We prepare for commit and add it to the branch.
4. So far so good. The branch always points to the correct `snapshot-id`, as we are versioning Iceberg commits.

Let's bring Iceberg housekeeping tasks into the picture:
1. Rewrite data files (simply, compaction) creates another commit, changing the Iceberg metadata and adding a snapshot. For lakeFS, this is just another commit. We should be good.
2. Delete orphans doesn't create any commit; it just removes dangling data and metadata files. We should be good here too.
3. Expire snapshots deletes the snapshots which are old enough, or which don't qualify under Iceberg's snapshot retention rules. This might be a problem, as some lakeFS commits point to snapshots which are no longer available. Cherry-pick or revert operations on such commits will overwrite the commit tree, placing those commits on top.

For the reader flow, let's just assume that any Iceberg table reader client integrated with lakeFS reads the `snapshot-id` from the lakeFS repo/branch HEAD ref (`branch.CommitID`) and then reads the actual Iceberg table as of that `snapshot-id` version.

Is there a way in lakeFS to tombstone the commits which point to expired snapshots and let GC handle them? Then whenever a lakeFS client looks at the commit log, they would see only commits/tags pointing to valid snapshots. A typical scenario looks like this.

Iceberg snapshot ancestry on branch `main` before expire-snapshots (T -> tag, C -> commit):
Tn -> Tn-1 -> Tn-2 -> Ca ... Cb -> T2 -> Ce ... Cd -> T1 -> HEAD
The corresponding lakeFS commit tree is similar to the Iceberg commit tree:
Tn -> Tn-1 -> Tn-2 -> Ca ... Cb -> T2 -> Ce ... Cd -> T1 -> HEAD
After expire-snapshots, the Iceberg snapshot ancestry becomes (Ex -> commit introduced by the expire-snapshots action):
Tn -> Tn-1 -> Tn-2 -> T2 -> Ce ... Cd -> T1 -> Ex -> HEAD
The lakeFS refs should be similar to the Iceberg snapshot ancestry tree. We need to pick the commits on lakeFS referencing the snapshots which got expired, expire those commits (in this case the whole `Ca ... Cb` commit range), and change the commit ancestry by linking the parents appropriately.
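The relinking step described above can be sketched in a few lines of Python. This is a toy in-memory model, not the lakeFS graveler API: `Commit`, `relink_expired`, and the snapshot-id values are all hypothetical names used purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Commit:
    id: str                       # lakeFS commit id (toy value)
    snapshot_id: str              # Iceberg snapshot-id recorded in the commit
    parent: Optional["Commit"] = None

def relink_expired(head: Optional[Commit], expired: set) -> Optional[Commit]:
    """Rebuild the ancestry chain, dropping commits whose Iceberg snapshot
    was expired and re-linking each survivor to its nearest surviving
    ancestor. Returns the new HEAD (or None if nothing survives)."""
    survivors = []
    c = head
    while c is not None:
        if c.snapshot_id not in expired:
            survivors.append(c)            # collected HEAD-to-root
        c = c.parent
    rebuilt = None
    for commit in reversed(survivors):     # root first, so parents exist
        rebuilt = Commit(commit.id, commit.snapshot_id, rebuilt)
    return rebuilt
```

In the scenario above, expiring the snapshots behind `Ca ... Cb` would drop that whole range and link `T2`'s commit directly to `Tn-2`'s.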

Ariel Shaqed (Scolnicov)

04/13/2023, 7:32 AM
Hi @Manoj Babu, Thanks! These are all rather similar to things we are currently considering.

Usually I find it easier to think in terms of lakeFS semantics and talk about "committed" and "uncommitted" objects, and maybe "branches" and "merges", rather than in terms of lakeFS implementation and talk about "staging tokens". That is not to say that lakeFS semantics are perfect and nothing may be added. On the contrary: if Iceberg integration requires additional semantics from lakeFS, I would like to define them clearly without using the word "Iceberg", possibly making them usable for other and future projects. Looking at it this way, I am not sure that your proposals will require any additional semantics; I think we can discuss them entirely in lakeFS terms.

That said, the principal challenge in such integrations is always deciding what the right thing to do is, before implementing it. I think the prime question for our users should be: what is the relationship between Iceberg versions and lakeFS commits? lakeFS should give cross-collection -- and in this case cross-Iceberg -- consistency. As a user in such a world: where do I go to find history? What does history even mean? For instance:
• If every Iceberg version is a lakeFS commit, then how do I get cross-collection consistency? For instance, I write Iceberg and then update README.md to document my changes. Now I have 2 commits in lakeFS, but only the second one, holding both Iceberg and README.md, is consistent.
• If a lakeFS commit can hold multiple consecutive Iceberg versions, then I can get cross-collection consistency. But then, if Iceberg GC'ed away some old versions in the last version stored in a commit, how do I see detailed history?
As devs, we can probably solve each of these issues by using merges, etc. The important first question is: what do our users want?
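The consistency window in the first bullet can be made concrete with a toy model (nothing here is the real lakeFS API; commit ids and paths are made up). Each commit records only the paths it wrote, and a point in history counts as cross-collection consistent only once every required path has been written at or before it.

```python
# Toy model: a "commit" is just the set of paths it touched.
commits = [
    {"id": "c1", "paths": {"tables/events/metadata.json"}},  # Iceberg write
    {"id": "c2", "paths": {"README.md"}},                    # doc update
]

def consistency_points(commits, required):
    """For each commit, report whether all required paths have been
    written at or before it, i.e. whether the repo is consistent there."""
    seen, points = set(), []
    for c in commits:
        seen |= c["paths"]
        points.append((c["id"], required <= seen))
    return points
```

With one lakeFS commit per Iceberg version, only `c2` is consistent; a reader landing on `c1` sees the table change without its documentation.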

Manoj Babu

04/13/2023, 11:33 AM
What if we borrow the concept of submodules from Git? Iceberg versions could be grouped into a submodule if we wanted a 1:1 relationship between Iceberg versions and lakeFS commits. I'm still not clear what I'm trying to solve here yet; just thought of sharing the idea of submodules for lakeFS repos.

Ariel Shaqed (Scolnicov)

04/13/2023, 11:40 AM
Sure! I think we've been trying to delay adding submodules for as long as possible. I don't think anyone likes them. Also, submodules end up requiring a great deal of maintenance in order to provide cross-collection consistency. So while they add some modularity to Git, using them can hurt atomicity of updates. That matters less when writing source code -- which is a low-volume operation -- and more in data processing. This is not to say I will never agree to add submodules, only that I would prefer to see a solid use case that they resolve. It's intriguing to think that open table formats may provide this use case! Now I need to think... 🤔

Manoj Babu

04/13/2023, 12:15 PM
XD I don't like them either; I had to deal with them in the past for my vim plugins 🤯, hence I remember them more than other things Git offers. What you said is right: it could indeed be a frustrating experience for end users to deal with.
I agree with the point you raised about history. History needs to be immutable, so changing the commit ancestry and re-linking parent ids comes down to what a user needs at the end of the day. Also, in Iceberg the snapshot ancestry is not corrected when some versions are GCed: the parent of a snapshot might still point to a non-existing snapshot. (I'm mixing the terms snapshot and version, but they both mean the same thing; version doesn't mean Iceberg spec version.) As an Iceberg user, I want a clean tree of lakeFS commits which point to valid snapshots. Maybe we can put some invalid mark on the commits pointing to expired Iceberg versions and provide a filter option on the commit log?
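The mark-and-filter idea could look something like this sketch. It is purely illustrative (no real lakeFS commit-log API is used): commits are annotated as valid or invalid against the table's current set of live snapshot ids, and readers filter the log instead of rewriting history.

```python
# Toy commit log: each entry records the Iceberg snapshot-id it pinned.
log = [
    {"id": "c1", "snapshot_id": "s1"},
    {"id": "c2", "snapshot_id": "s2"},   # s2 later expired
    {"id": "c3", "snapshot_id": "s3"},
]

def annotate_commit_log(log, valid_snapshots):
    """Mark rather than delete: history stays immutable, and readers
    can hide commits whose snapshot is gone."""
    return [dict(c, valid=c["snapshot_id"] in valid_snapshots) for c in log]
```

A commit-log view would then show only entries marked valid, while cherry-pick/revert could refuse (or warn on) invalid ones.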

Ariel Shaqed (Scolnicov)

04/13/2023, 1:02 PM
Thanks!

Manoj Babu

04/13/2023, 2:24 PM
My bad, I just discovered that there is no concept of checkout in lakeFS. So this brings me back to the question of what I want to do with the history, apart from cherry-pick/revert, which can be managed with merge commits. This still needs knowledge of which lakeFS commit points to a valid Iceberg version. My initial thought was to use history to facilitate version-specific queries by checking out a specific lakeFS commit. Maybe thinking more about the extension or integration, rather than the relationships between Iceberg versions and lakeFS commits, can bring me more clarity on these aspects. Thanks @Ariel Shaqed (Scolnicov) for helping me think this through.

Ariel Shaqed (Scolnicov)

04/13/2023, 2:30 PM
Thanks for bringing it up! I can't say I fully understand the desired semantics; we're taking steps towards a good solution, of course. The nice thing is that the S3 gateway into lakeFS lets you use Iceberg in S3 mode to write. Merge generally just works in the cases that we do understand. Deciding what to do with merge when both sides have changed is challenging; I'm not even sure what the correct answer is. So we've come a long way, and still have an interesting road ahead.
:lakefs: 2
:gratitude-thank-you: 2