Is there any guidance available for how utilizing ...
# help
a
Is there any guidance available on how
lakectl local
is intended to work with PR code review workflows? I've read through a few of the materials that are available [1], and while basic pushing and pulling of data is clear, I'm still trying to understand how to use lakectl in the context of a team with multiple people making changes and tracking a reference state for models, computed features/outputs, etc. on the code repo's
main
branch over time. If you run
lakectl local init lakefs://repo/feature-branch/example ./example
, the
.lakefs_ref.yaml
file points to the
feature-branch
branch. If I have a similar git branch called
feature-branch
with a PR for it, once that PR is reviewed and ready to merge, is there a recommended set of steps to follow to keep the code repo, the lakefs_ref, and lakefs repo itself in sync? I've seen that
lakectl local checkout
will modify the
at_head
commit reference, but the
src
field still points to the original branch. As context, we are evaluating migrating off of DVC, where there is no concept of separate branching, and we are trying to figure out what workflow to migrate to. The main property I'm looking to maintain is that the state of our data directories on
main
is an exact match for what the code on
main
would produce if re-run, so our team members can use that as a baseline for evaluating changes/experiments. [1] https://lakefs.io/blog/scalable-data-version-control-getting-the-best-of-both-worlds-with-lakefs/ and https://lakefs.io/blog/scalable-ml-data-version-control-and-reproducibility/ and https://docs.lakefs.io/howto/local-checkouts.html#lakectl-local-sync-lakefs-data-with-a-local-directory
t
Hi @Aaron Taylor, the flow you are implementing is a good way to go! IIUC, you are trying to do the following once your Git PR is approved:
1. Merge the lakeFS feature-branch into main.
2. Run lakectl local checkout on your local directory so that it is synced with lakeFS main.
3. git add and push to track the main data.
4. Push the PR.
Did I get it right? As for the src field that's still pointing to the feature branch: you found a bug, thank you! On checkout, src should update to the checked-out ref. I'm opening an issue for it and will share it here.
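The four steps above can be sketched as a small Python helper that only builds the command lines, so they can be reviewed before anything runs. This is a rough sketch rather than an official workflow: the helper name `sync_commands`, the repository name `repo`, and the directory `./example` are hypothetical, and it assumes that `lakectl local checkout --ref` accepts a branch name and that `.lakefs_ref.yaml` is the file tracked in Git.

```python
from typing import List


def sync_commands(repo: str, feature: str, local_dir: str, main: str = "main") -> List[List[str]]:
    """Build (but do not run) the commands for the post-approval sync."""
    feature_uri = f"lakefs://{repo}/{feature}"
    main_uri = f"lakefs://{repo}/{main}"
    return [
        # 1. Merge the lakeFS feature branch into lakeFS main.
        ["lakectl", "merge", feature_uri, main_uri],
        # 2. Sync the local directory with lakeFS main
        #    (assumption: --ref takes a ref name such as "main").
        ["lakectl", "local", "checkout", local_dir, "--ref", main],
        # 3. Track the updated ref file in Git
        #    (assumption: .lakefs_ref.yaml is committed to the code repo).
        ["git", "add", f"{local_dir}/.lakefs_ref.yaml"],
        ["git", "commit", "-m", f"Sync {local_dir} with lakeFS {main}"],
        # 4. Push so the PR carries the updated data reference.
        ["git", "push"],
    ]
```

Each entry could then be passed to `subprocess.run(cmd, check=True)` so the sequence stops at the first failing step. Note that this does nothing about two people merging at the same time.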
a
Got it, that makes sense as a set of steps. I think I was primarily confused by the
src:
field not changing. While that bug is still open, is it legitimate for us to manually edit that source line to match the new ref value? If done manually, those steps do leave some gaps open for human error and a "race condition" between people trying to merge at the same time, I think? Are there any automations on the code source control CI side, or the LakeFS actions/hooks side that might help enforce workflows in this area?
t
While that bug is still open, is it legitimate for us to manually edit that source line to match the new ref value?
You can, but I believe it will be fixed in the coming weeks 🙂
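If the manual edit is done often, it could be scripted to reduce typos. The following is only a sketch: `retarget_src` is a hypothetical helper, and it assumes the ref file holds a single unquoted line of the form `src: lakefs://<repo>/<ref>/<path>`, which may not match every lakectl version.

```python
import re


def retarget_src(ref_yaml: str, new_ref: str) -> str:
    """Return ref_yaml with the ref segment of the `src:` URI replaced by new_ref."""
    # Group 1 captures "src: lakefs://<repo>/"; the next path segment is the ref.
    pattern = re.compile(r"^(src:\s*lakefs://[^/\s]+/)[^/\s]+", re.MULTILINE)
    updated, count = pattern.subn(lambda m: m.group(1) + new_ref, ref_yaml)
    if count != 1:
        raise ValueError("expected exactly one 'src: lakefs://<repo>/<ref>/...' line")
    return updated
```

For example, `retarget_src(open("example/.lakefs_ref.yaml").read(), "main")` would return the file text with the branch segment swapped; writing it back to disk is left to the caller.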
Are there any automations on the code source control CI side, or the LakeFS actions/hooks side that might help enforce workflows in this area?
Currently, there is no built-in solution for it. A potential solution could be to implement a pre-merge lakeFS hook that receives the
at_head
value and verifies that the HEAD of
main
still matches it. Would this solve your use case, or is there more to it that I missed?
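The comparison such a hook would make might look like the sketch below. It is not a lakeFS API: `read_at_head` and `main_unchanged` are hypothetical names, the ref file is parsed with a simple line scan rather than a YAML library, and how the hook obtains the file contents and the current head of main is left open.

```python
from typing import Optional


def read_at_head(ref_yaml: str) -> Optional[str]:
    """Extract the at_head value from the text of a .lakefs_ref.yaml file."""
    for line in ref_yaml.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "at_head":
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("\"'") or None
    return None


def main_unchanged(ref_yaml: str, main_head_commit: str) -> bool:
    """True only if the recorded at_head is still the head commit of main."""
    at_head = read_at_head(ref_yaml)
    return at_head is not None and at_head == main_head_commit
```

A pre-merge check could reject the merge when `main_unchanged(...)` returns `False`, which would catch the case where someone else merged into main first.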
a
got it, I'll look more into hooks and how we could use those. How would we connect the LakeFS hook to the
at_head
value in our git repo? Would that be custom code which retrieves it?
o
Hi @Aaron Taylor, yes, this would be custom code that integrates lakeFS hooks with GitHub Actions. We have an example of such an integration in Databricks here, with a recorded demo as well.
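One possible shape for the lookup part of that custom code, for instance in a CI step, is sketched below. It assumes the lakeFS REST endpoint `GET /api/v1/repositories/{repository}/branches/{branch}` with basic authentication using an access key pair, and a JSON response that carries a `commit_id` field; the function names and the server URL are hypothetical.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def branch_url(endpoint: str, repo: str, branch: str) -> str:
    """Build the URL for looking up a branch on a lakeFS server."""
    base = endpoint.rstrip("/")
    return f"{base}/api/v1/repositories/{quote(repo, safe='')}/branches/{quote(branch, safe='')}"


def parse_commit_id(body: bytes) -> str:
    """Pull the head commit ID out of the branch lookup response."""
    return json.loads(body)["commit_id"]


def fetch_branch_head(endpoint: str, repo: str, branch: str,
                      access_key: str, secret_key: str) -> str:
    """Return the current head commit of a branch (performs a network call)."""
    token = base64.b64encode(f"{access_key}:{secret_key}".encode()).decode()
    request = urllib.request.Request(
        branch_url(endpoint, repo, branch),
        headers={"Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_commit_id(response.read())
```

The CI job could compare the returned commit ID with the `at_head` value stored in the Git repo's `.lakefs_ref.yaml` and fail the check when they differ.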