# help
u
I have a question about branching: Say we want to use lakeFS on our production system. Everyday we load new data in batch jobs (or streaming). What is the best pattern to make this possible using lakeFS? every batch load is a commit on main branch, or a feature branch per pipeline or ...? Basically I just want to know how a 'normal' production scenario would look like using lakeFS.
u
Hello @werner and thank you for that question. I'm looking into the best approach for your usage and will get back to you shortly.
u
Hello again @werner. The best approach pretty much depends on your use case, but in general we recommend taking advantage of lakeFS and its branching abilities. While working with the main branch and committing into it directly is possible in some scenarios, using designated branches for introducing new data allows you to test and verify the data before it becomes visible to its consumers. This aligns with the Git approach we embrace. You can read more about usage patterns and see some examples at https://docs.lakefs.io/usecases/
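A minimal sketch of that branch-then-merge flow, assuming the lakectl CLI is installed and configured; the repository name, branch name, dates, and file paths below are hypothetical:

```python
# Sketch only: repository, branch, and file names are made up for illustration.
import subprocess

def lakectl(*args: str) -> None:
    """Run a lakectl command and raise if it exits with a non-zero status."""
    subprocess.run(["lakectl", *args], check=True)

REPO = "lakefs://example-repo"
BRANCH = f"{REPO}/load-2024-01-01"

# 1. Create an isolated branch for this batch, sourced from main.
lakectl("branch", "create", BRANCH, "--source", f"{REPO}/main")

# 2. Upload the new data onto the branch.
lakectl("fs", "upload", f"{BRANCH}/events/part-0001.parquet",
        "--source", "/tmp/part-0001.parquet")

# 3. Commit the batch so it becomes a referenceable snapshot.
lakectl("commit", BRANCH, "-m", "load batch 2024-01-01")

# 4. Validate the data on the branch here (row counts, schema checks, ...).

# 5. Merge only after validation passes; consumers reading main then see
#    the whole batch appear at once.
lakectl("merge", BRANCH, f"{REPO}/main")
```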
u
thanks, I was thinking about the scenario where we do not want to experiment but assume the data is correct (production mode). But that would be committing on main, if I understand correctly.
u
Creating a branch for every update seems overkill (we do 1000s of updates a day)
u
You can still use a dedicated branch for all updates and merge it to main. That way you gain the benefit of new data being introduced to main as an atomic action.
u
And if you are certain that your data is correct and no experimenting or testing is required, it is possible to commit it to the main branch directly.
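A sketch of those two options, again shelling out to lakectl with hypothetical names: a long-lived "ingest" branch that collects many commits and is merged once, versus committing straight to main.

```python
# Sketch only: repository, branch, and file names are made up for illustration.
import subprocess

def lakectl(*args: str) -> None:
    subprocess.run(["lakectl", *args], check=True)

REPO = "lakefs://example-repo"

# Option A: a single dedicated ingestion branch shared by all batch jobs
# (assumes an "ingest" branch was created once from main).
# Each job commits to it; one merge later exposes everything to main atomically.
for batch in ("batch-0001", "batch-0002"):
    lakectl("fs", "upload", f"{REPO}/ingest/{batch}.parquet",
            "--source", f"/tmp/{batch}.parquet")
    lakectl("commit", f"{REPO}/ingest", "-m", f"load {batch}")
lakectl("merge", f"{REPO}/ingest", f"{REPO}/main")

# Option B: when the data is trusted and needs no verification step,
# upload and commit directly on main.
lakectl("fs", "upload", f"{REPO}/main/batch-0003.parquet",
        "--source", "/tmp/batch-0003.parquet")
lakectl("commit", f"{REPO}/main", "-m", "load batch-0003")
```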
u
I think the 1st example at https://docs.lakefs.io/usecases/production.html is pretty close to the scenario you are describing, right?
u
indeed
u
but a feature branch per daily run is also interesting
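A branch-per-daily-run could look something like the sketch below, assuming the same lakectl setup; the date-stamped naming is just one possible convention.

```python
# Sketch only: repository and branch naming are illustrative.
import subprocess
from datetime import date

def lakectl(*args: str) -> None:
    subprocess.run(["lakectl", *args], check=True)

REPO = "lakefs://example-repo"
DAILY = f"{REPO}/daily-{date.today().isoformat()}"  # e.g. .../daily-2024-01-01

# One short-lived branch per daily run, created from main.
lakectl("branch", "create", DAILY, "--source", f"{REPO}/main")

# ... the day's batch jobs upload and commit to DAILY here ...

# Merge the whole day's work into main in one atomic step, then optionally
# delete the branch (lakectl branch delete) to keep the branch list tidy.
lakectl("merge", DAILY, f"{REPO}/main")
```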
u
It is interesting. I would love to hear which approach worked best for you