# help
u
I have a question about branching: Say we want to use lakeFS on our production system. Everyday we load new data in batch jobs (or streaming). What is the best pattern to make this possible using lakeFS? every batch load is a commit on main branch, or a feature branch per pipeline or ...? Basically I just want to know how a 'normal' production scenario would look like using lakeFS.
u
Hello @werner and thank you for that question. I'm looking into the best approach for your usage and will get back to you shortly.
u
Hello again @werner. The best approach pretty much depends on your use case, but in general we recommend taking advantage of lakeFS and its branching abilities. While working with the main branch and committing into it directly is possible in some scenarios, using designated branches for introducing new data allows you to test and verify the data before it becomes visible to its consumers. This aligns with the Git approach we embrace. You can read more about usage patterns and see some examples at https://docs.lakefs.io/usecases/
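A minimal sketch of that branch-then-merge flow, assuming the lakectl CLI is installed and configured; the repository name, branch name, dates, and file paths below are hypothetical:

```python
# Sketch only: repository, branch, and file names are made up for illustration.
import subprocess

def lakectl(*args: str) -> None:
    """Run a lakectl command and raise if it exits with a non-zero status."""
    subprocess.run(["lakectl", *args], check=True)

REPO = "lakefs://example-repo"
BRANCH = f"{REPO}/load-2024-01-01"

# 1. Create an isolated branch for this batch, sourced from main.
lakectl("branch", "create", BRANCH, "--source", f"{REPO}/main")

# 2. Upload the new data onto the branch.
lakectl("fs", "upload", f"{BRANCH}/events/part-0001.parquet",
        "--source", "/tmp/part-0001.parquet")

# 3. Commit the batch so it becomes a referenceable snapshot.
lakectl("commit", BRANCH, "-m", "load batch 2024-01-01")

# 4. Validate the data on the branch here (row counts, schema checks, ...).

# 5. Merge only after validation passes; consumers reading main then see
#    the whole batch appear at once.
lakectl("merge", BRANCH, f"{REPO}/main")
```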
u
thanks, I was thinking about the scenario where we do not want to experiment but assume the data is correct (production mode). But that would be committing on main, if I understand correctly.
u
Creating a branch for every update seems overkill (we do 1000s of updates a day)
u
You can still use a dedicated branch for all updates and merge it to main. That way you gain the benefit of new data being introduced to main as an atomic action.
u
And if you are certain that your data is correct and no experimenting or testing is required, it is possible to commit it to the main branch directly.
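A sketch of those two options, again shelling out to lakectl with hypothetical names: a long-lived "ingest" branch that collects many commits and is merged once, versus committing straight to main.

```python
# Sketch only: repository, branch, and file names are made up for illustration.
import subprocess

def lakectl(*args: str) -> None:
    subprocess.run(["lakectl", *args], check=True)

REPO = "lakefs://example-repo"

# Option A: a single dedicated ingestion branch shared by all batch jobs
# (assumes an "ingest" branch was created once from main).
# Each job commits to it; one merge later exposes everything to main atomically.
for batch in ("batch-0001", "batch-0002"):
    lakectl("fs", "upload", f"{REPO}/ingest/{batch}.parquet",
            "--source", f"/tmp/{batch}.parquet")
    lakectl("commit", f"{REPO}/ingest", "-m", f"load {batch}")
lakectl("merge", f"{REPO}/ingest", f"{REPO}/main")

# Option B: when the data is trusted and needs no verification step,
# upload and commit directly on main.
lakectl("fs", "upload", f"{REPO}/main/batch-0003.parquet",
        "--source", "/tmp/batch-0003.parquet")
lakectl("commit", f"{REPO}/main", "-m", "load batch-0003")
```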
u
I think the 1st example at https://docs.lakefs.io/usecases/production.html is pretty close to the scenario you are describing, right?
u
indeed
u
but a feature branch per daily run is also interesting
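A branch-per-daily-run could look something like the sketch below, assuming the same lakectl setup; the date-stamped naming is just one possible convention.

```python
# Sketch only: repository and branch naming are illustrative.
import subprocess
from datetime import date

def lakectl(*args: str) -> None:
    subprocess.run(["lakectl", *args], check=True)

REPO = "lakefs://example-repo"
DAILY = f"{REPO}/daily-{date.today().isoformat()}"  # e.g. .../daily-2024-01-01

# One short-lived branch per daily run, created from main.
lakectl("branch", "create", DAILY, "--source", f"{REPO}/main")

# ... the day's batch jobs upload and commit to DAILY here ...

# Merge the whole day's work into main in one atomic step, then optionally
# delete the branch (lakectl branch delete) to keep the branch list tidy.
lakectl("merge", DAILY, f"{REPO}/main")
```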
u
It is interesting. I would love to hear which approach worked best for you