# help
u
Hi, I have a question about lakeFS and DB's DeltaTables. Here's the scenario I'm thinking about:
1. I have a DeltaTable with, say, k versions on my main branch.
2. I then create a dev branch and merge data into the DeltaTable (creating version k+1).
3. I merge the dev branch into the main branch.
So I'm expecting the DeltaTable on the main branch to reflect version k+1. Could someone help me understand how data is written and copied in steps 2 and 3? Is it the entire DeltaTable, or just the differences between versions k and k+1? (I realize this example may not be entirely clear, so I'm happy to expand on it if that's helpful.)
u
Hi Ryan, current support for Delta is summarized under Delta Lake. You'll note that merging is currently quite primitive: it works only when just one side of the merge has changed. This is a simple lakeFS merge, so no data objects are copied, just lakeFS metadata. Our team are working to perform more content-aware merges. I'll make sure to publish our progress on Slack.
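To make the "no data objects are copied" point concrete, here is a minimal sketch in Python. It is a toy model, not the lakeFS API: a branch is assumed to be a plain mapping from path to the address of an immutable object, and `create_branch`, `merge_one_sided`, the paths, and the `obj-*` addresses are all hypothetical names made up for illustration. The write on dev is modelled as a simple append; a real Delta `MERGE` may also rewrite existing data files.

```python
# Toy model only: a branch maps each path to the address of an immutable
# object in the underlying store. This is NOT the lakeFS API.

def create_branch(source):
    """Branching copies pointers, never the objects they point to."""
    return dict(source)


def merge_one_sided(base, source, dest):
    """Merge `source` into `dest` when only `source` changed since `base`."""
    if dest != base:
        raise ValueError("destination changed too; this sketch is one-sided only")
    # The destination simply adopts the source's pointers.
    return dict(source)


k = 3  # hypothetical current table version on main
main = {
    "table/part-0000.parquet": "obj-a",
    f"table/_delta_log/{k:020d}.json": "obj-b",
}
base = dict(main)          # state of main when dev was created
dev = create_branch(main)

# Writing version k+1 on dev adds one data file and one log entry
# (assumed append; existing files are left untouched).
dev["table/part-0001.parquet"] = "obj-c"
dev[f"table/_delta_log/{k + 1:020d}.json"] = "obj-d"

main = merge_one_sided(base, dev, main)

added_paths = sorted(set(main) - set(base))
new_objects = set(main.values()) - set(dev.values())
print("paths added to main:", added_paths)
print("objects created by the merge itself:", new_objects)
```

In this model only the files for version k+1 are written (step 2), and the merge (step 3) writes no objects at all: main ends up pointing at the objects dev already created.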
u
Thanks for the info! What's the behavior if an invalid merge of DeltaTables is attempted? Does the merge fail, or could it result in a corrupted DeltaTable?
u
"Our team are working to perform more content-aware merges. I'll make sure to publish our progress on Slack."
@Ariel Shaqed (Scolnicov) would it be possible to describe the work planned on this in more detail? And what's the timeline?
u
You'll get a merge conflict, so the merge will fail. Intuitively, right now lakeFS doesn't know how to "make up" a new file, so it cannot corrupt your data. When we improve merging, we hope to be able to avoid the conflict. But our first priority is, of course, always not to corrupt data :-)
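The failure mode described above can be sketched with a toy three-way merge in Python. This is an illustration under stated assumptions, not lakeFS's implementation: branches are again modelled as path-to-object mappings, conflicts are assumed to be detected per path, and `three_way_merge`, `MergeConflict`, and the `obj-*` addresses are hypothetical. The one grounded detail is that Delta names its log entries by zero-padded version number, so two branches that each commit version k+1 write different content to the same log path.

```python
# Toy model only, NOT the lakeFS implementation: per-path three-way merge
# that refuses to guess when both sides changed the same path.

class MergeConflict(Exception):
    """Raised with the list of conflicting paths; nothing is written."""


def three_way_merge(base, ours, theirs):
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (or neither changed)
            winner = o
        elif o == b:      # only theirs changed
            winner = t
        elif t == b:      # only ours changed
            winner = o
        else:             # both changed, differently: no file is "made up"
            conflicts.append(path)
            continue
        if winner is not None:  # None means the path was deleted
            merged[path] = winner
    if conflicts:
        raise MergeConflict(conflicts)
    return merged


k = 3  # hypothetical table version at the branch point
base = {f"table/_delta_log/{k:020d}.json": "obj-b"}
log_next = f"table/_delta_log/{k + 1:020d}.json"

# Both branches commit their own version k+1 of the table.
main = {**base, "table/part-main.parquet": "obj-m", log_next: "obj-m-log"}
dev = {**base, "table/part-dev.parquet": "obj-d", log_next: "obj-d-log"}
main_before = dict(main)

try:
    main = three_way_merge(base, main, dev)
    conflicting = []
except MergeConflict as exc:
    conflicting = exc.args[0]

print("conflicting paths:", conflicting)
print("main unchanged:", main == main_before)
```

Because the merge raises before anything is assigned, main is left exactly as it was, which mirrors the "fail rather than corrupt" behaviour described in the message above.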
u
There are multiple metadata types that our users would like to merge. These include Delta, Hive metastore, Glue, DBT, and more. I'm sorry, but I hope you can appreciate that I have no timetable to share right now. I will say that mentioning your favourite format helps us find potential users. Would you be willing to jump on a call with us? Maybe we can find a partial solution that will work for now.
u
OK, that's super helpful. I think we'll be able to get quite a bit of mileage out of the one-way merges that are supported. I envision that most of the time we'll be creating a branch, adding data to the Delta tables, then merging back to the main branch. That said, having the flexibility to merge from multiple branches would be even better.