I think you and I might be reading the same situation in opposite ways. Allow me to outline the way I read the situation, with a few technical comments. I do understand that your way is different, perhaps as a result of thinking of lakeFS as a component at different levels in the stack.
• "_Mixing up dimensions_": I don't see that at all. In my mind, every commit holds multiple tables for the different dimensions. So if you have a large table of clicks and a small table for the user dimension, I would expect you to have two Iceberg tables, or two Parquet "files", or even one Iceberg table for the clicks and one small JSON object for the users. How much you denormalize is up to you. Here lakeFS is unopinionated!
• "_Cross collection consistency_" is key: to my mind it means that every commit holds a consistent view of clicks and users. You can use this to enforce application-level consistency. For instance, you can require that all commits, or all commits on certain branches, guarantee that every user in the clicks table is found in the users table. lakeFS has more of an opinion here: it encourages you to define consistency on your long-lived branches.
• There is no object duplication: if you don't change a partition of your clicks table, it is not duplicated across versions. Here lakeFS is opinionated: the way to create a dev environment is to branch out. The way to manage a process of multiple consecutive changes is to branch out, commit after each change, and merge back.
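To make the consistency rule from the second bullet concrete, here is a minimal Python sketch of the check itself. It assumes clicks and users are already loaded as lists of dicts with a hypothetical `user_id` field; the function name is mine and is not part of lakeFS. A check like this could run before anything is merged into a long-lived branch.

```python
def missing_users(clicks, users):
    """Return the sorted user IDs referenced by clicks but absent from users."""
    known = {u["user_id"] for u in users}
    return sorted({c["user_id"] for c in clicks} - known)


# Made-up sample rows, only to illustrate the check.
clicks = [{"user_id": "u1", "url": "/a"}, {"user_id": "u2", "url": "/b"}]
users = [{"user_id": "u1", "name": "Ada"}]
print(missing_users(clicks, users))  # ['u2']
```

An empty result means the pair of tables satisfies the rule.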
In the example, each ingest branch is updated separately. A process reconciles them and eventually merges a consistent view to the trunk.
As a smaller example: say you want to make a consistent change across clicks and users. For example, you need to ingest new clicks that can reference new users, and any new users must be generated or fetched from an external system. I would suggest:
• Branch out of production to a work branch and work there.
    ◦ Ingest new clicks, commit.
    ◦ Find all users in the clicks table that are not in the users table.
    ◦ Fetch the missing users.
    ◦ Write the new users table, commit.
    ◦ Merge back to production.
• Look at the history of production: it is always consistent, because all users in clicks appear in users. It has 2 commits: the first is consistent before the new clicks and users, the second is consistent after them.
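The steps above can be sketched with a toy in-memory model in Python. This is not the lakeFS API: `ToyRepo`, `fetch_user` and the table layout are all hypothetical, and the merge is simplified to a single new commit on the destination. The sketch commits the new users table before merging. It only illustrates why production never sees the intermediate, inconsistent state.

```python
import copy


class ToyRepo:
    """In-memory stand-in for a versioned repository. Not the lakeFS API."""

    def __init__(self, tables):
        # history[branch] is a list of (message, snapshot) commits.
        self.history = {"production": [("initial", copy.deepcopy(tables))]}
        # working[branch] holds that branch's uncommitted state.
        self.working = {"production": copy.deepcopy(tables)}

    def branch(self, name, source):
        head = self.history[source][-1][1]
        self.history[name] = list(self.history[source])
        self.working[name] = copy.deepcopy(head)

    def commit(self, name, message):
        self.history[name].append((message, copy.deepcopy(self.working[name])))

    def merge(self, source, dest, message):
        # Toy merge: the destination gains one commit holding the source head.
        head = copy.deepcopy(self.history[source][-1][1])
        self.history[dest].append((message, head))
        self.working[dest] = copy.deepcopy(head)


def is_consistent(tables):
    """True when every user ID in clicks also appears in users."""
    return {uid for uid, _ in tables["clicks"]} <= set(tables["users"])


def fetch_user(user_id):
    # Hypothetical stand-in for the external system that supplies user records.
    return {"name": f"user-{user_id}"}


repo = ToyRepo({"clicks": [("u1", "/a")], "users": {"u1": {"name": "Ada"}}})
repo.branch("work", "production")
work = repo.working["work"]

work["clicks"].append(("u2", "/b"))          # ingest new clicks
repo.commit("work", "ingest new clicks")     # inconsistent, but only on work

missing = {uid for uid, _ in work["clicks"]} - set(work["users"])
for uid in sorted(missing):                  # fetch and write missing users
    work["users"][uid] = fetch_user(uid)
repo.commit("work", "add missing users")

repo.merge("work", "production", "new clicks and users")

for message, snapshot in repo.history["production"]:
    print(message, is_consistent(snapshot))
```

In this toy model the work branch holds one commit that breaks the rule, while both commits on production satisfy it.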
@Oz Katz may be able to give more examples.