# help
r
I've set up lakeFS on my dev environment and manually uploaded a file to the main branch. I see it is in an "uncommitted" state, which leads me to this question: how do you manage commits in production? Let's say I have a "website" repo that contains code to send events to Segment, and I set up Segment to send the event data I want to keep to lakeFS.
a) At which point do I make the first commit? Won't it commit only the data that has been sent up to that point?
b) Then I suppose I have to make a new commit whenever:
• I change something in Segment that may affect my data
• There is a code change in the website repo that may affect my data
Am I right?
c) If so, I suppose it would be easier to automatically trigger a commit in lakeFS whenever there is a new commit in the website repo, via a webhook. What do you think? Do you manage it this way, or some other way?
After re-reading my message, I see that maybe my thinking is skewed by git:
• In git, first I change something in the code, then I make a commit
• In lakeFS, first I make a commit, then I change something in the code
Which means, to answer my first question:
a) I would make the first commit once I make the first change in the code, but not during the initial setup phase.
Is this right?
b
Hi @Romain, I'll try to address all the questions above:
a) When to commit? lakeFS is a tool; you commit your data depending on what you want to achieve. A commit gives you an immutable reference you can always go back to. Specifically in your example, you can commit before any deployment to get a point-in-time snapshot of your data that you can revert to in case something goes wrong. But you can also commit periodically, or after specific events in your system. I'm less familiar with Segment, but with a daily commit I get a stable reference I can use to query data that will not change, or use as a branch point if you'd like to experiment with your application.
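To make the "commit before a deployment" idea concrete, here is a minimal sketch that creates a commit on main through the lakeFS REST API. The endpoint URL, repository name ("website-events"), and credentials are placeholders for illustration; check the lakeFS API reference for your version before relying on the exact path or payload.

```python
from typing import Optional

import requests

# Assumed values for illustration only.
LAKEFS_ENDPOINT = "http://localhost:8000/api/v1"
ACCESS_KEY = "AKIA..."           # lakeFS access key ID
SECRET_KEY = "..."               # lakeFS secret access key
REPO = "website-events"          # hypothetical repository name
BRANCH = "main"


def commit_branch(message: str, metadata: Optional[dict] = None) -> str:
    """Create a commit on BRANCH and return the new commit ID."""
    resp = requests.post(
        f"{LAKEFS_ENDPOINT}/repositories/{REPO}/branches/{BRANCH}/commits",
        auth=(ACCESS_KEY, SECRET_KEY),  # lakeFS API uses basic auth with key/secret
        json={"message": message, "metadata": metadata or {}},
    )
    resp.raise_for_status()
    return resp.json()["id"]


if __name__ == "__main__":
    # Take a point-in-time snapshot of the ingested events before deploying.
    commit_id = commit_branch("Snapshot before website deployment")
    print(f"Committed {BRANCH} as {commit_id}; revert to this if the deploy breaks ingestion.")
```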
b) true
c) The lakeFS CLI (lakectl) and the API enable you to integrate easily into your CI/CD. You can read more about it at docs.lakefs.io, or find specific examples on blog.lakefs.io (specific category: https://lakefs.io/category/integrations/).
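As a sketch of the webhook/CI idea from question (c): a small script the website repo's CI could run after each git commit, committing the lakeFS branch and recording the triggering git SHA in the commit metadata. The environment variable names and repository name are assumptions (GITHUB_SHA is what GitHub Actions provides; adapt for other CI systems); the commit call itself uses the same lakeFS API endpoint as the sketch above.

```python
import os

import requests

# Assumed configuration, typically injected as CI secrets/variables.
LAKEFS_ENDPOINT = os.environ["LAKEFS_ENDPOINT"]         # e.g. https://lakefs.example.com/api/v1
ACCESS_KEY = os.environ["LAKEFS_ACCESS_KEY_ID"]
SECRET_KEY = os.environ["LAKEFS_SECRET_ACCESS_KEY"]
REPO = os.environ.get("LAKEFS_REPO", "website-events")  # hypothetical repository name
BRANCH = os.environ.get("LAKEFS_BRANCH", "main")
GIT_SHA = os.environ.get("GITHUB_SHA", "unknown")       # set by GitHub Actions

resp = requests.post(
    f"{LAKEFS_ENDPOINT}/repositories/{REPO}/branches/{BRANCH}/commits",
    auth=(ACCESS_KEY, SECRET_KEY),
    json={
        "message": f"Data snapshot for website commit {GIT_SHA[:12]}",
        # Metadata ties this data commit back to the code change that may affect it.
        "metadata": {"website_git_sha": GIT_SHA, "triggered_by": "ci"},
    },
)
resp.raise_for_status()
print("Created lakeFS commit:", resp.json()["id"])
```

With the git SHA stored in the commit metadata, you can later look up which data snapshot corresponds to any given version of the website code.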
r
@Barak Amar ok, it's much clearer now. Thanks for taking the time to answer my questions. I will definitely go ahead and implement lakeFS.
b
We're glad to help. We're here if you need more information or specific help.