Aaron Taylor
09/07/2024, 10:51 PM
[…] `lakectl local` is intended to work with PR code review workflows? I've read through a few of the available materials [1], and while basic pushing and pulling of data is clear, I'm still trying to understand how to use lakectl in the context of a team with multiple people making changes and tracking a reference state for models, computed features/outputs, etc. on the code repo's `main` branch over time.
If you run `lakectl local init lakefs://repo/feature-branch/example ./example`, the `.lakefs_ref.yaml` file points to the `feature-branch` branch. If I have a similar git branch called `feature-branch` with a PR for it, once that PR is reviewed and ready to merge, is there a recommended set of steps to follow to keep the code repo, the lakefs_ref, and lakefs repo itself in sync? I've seen that `lakectl local checkout` will modify the `at_head` commit reference, but the `src` field still points to the original branch.
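To make the two fields concrete, here is a minimal Python sketch that reads `src` and `at_head` from the text of a `.lakefs_ref.yaml` file and splits `src` into repository, ref, and path. It assumes the file consists of flat `key: value` lines; only the `src` and `at_head` names come from the message above, while the helper names and the `0123abcd` commit ID are hypothetical placeholders.

```python
from urllib.parse import urlparse


def read_ref_file(text: str) -> dict:
    """Parse flat `key: value` lines (not a general YAML parser)."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so the lakefs:// URI stays intact.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip('"')
    return fields


def split_src(src: str) -> tuple[str, str, str]:
    """Split lakefs://<repo>/<ref>/<path> into (repo, ref, path)."""
    parsed = urlparse(src)
    ref, _, path = parsed.path.lstrip("/").partition("/")
    return parsed.netloc, ref, path


# Assumed file content; the commit ID is a placeholder.
sample = """\
src: lakefs://repo/feature-branch/example
at_head: 0123abcd
"""

fields = read_ref_file(sample)
repo, ref, path = split_src(fields["src"])
print(f"repo={repo} ref={ref} path={path} at_head={fields['at_head']}")
```

A check like this could flag the situation described here, where `at_head` has moved on but `src` still names the original branch.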
As context, we are evaluating migrating off of dvc, where there is no concept of separate branching, and are trying to figure out what workflow to migrate to. The main property I'm looking to maintain is that the state of our data directories on `main` is an exact match for what the code on `main` would produce if re-run, so our team members can use that as a baseline for evaluating changes/experiments.
[1] https://lakefs.io/blog/scalable-data-version-control-getting-the-best-of-both-worlds-with-lakefs/ and https://lakefs.io/blog/scalable-ml-data-version-control-and-reproducibility/ and https://docs.lakefs.io/howto/local-checkouts.html#lakectl-local-sync-lakefs-data-with-a-local-directory
Tal Sofer
09/08/2024, 6:44 AM
[…] `feature-branch` into `main`
2. `lakectl local checkout` of your local directory so that it is synced with lakeFS `main`
3. git add and push to track `main` data
4. Push the PR
Did I get it right?
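The steps above can be sketched as an ordered list of commands. This Python sketch only builds and prints them; it does not run anything. It assumes that the first step is a lakeFS merge of the feature branch into `main`, that `.lakefs_ref.yaml` sits inside the local directory, and that `lakectl local checkout` accepts a `--ref` flag (worth confirming with `lakectl local checkout --help`). The function name, commit message, and repository name are hypothetical.

```python
import shlex


def sync_commands(repo, feature_branch, local_dir, main_branch="main"):
    """Return the commands for the steps, in order, without running them."""
    feature_uri = f"lakefs://{repo}/{feature_branch}"
    main_uri = f"lakefs://{repo}/{main_branch}"
    ref_file = f"{local_dir}/.lakefs_ref.yaml"  # assumed location
    return [
        # 1. Merge the lakeFS feature branch into main (assumed step).
        ["lakectl", "merge", feature_uri, main_uri],
        # 2. Sync the local directory with lakeFS main.
        ["lakectl", "local", "checkout", local_dir, "--ref", main_branch],
        # 3. Track the updated ref file in git.
        ["git", "add", ref_file],
        ["git", "commit", "-m", f"Track lakeFS {main_branch} for {local_dir}"],
        # 4. Push so the PR carries the updated ref.
        ["git", "push"],
    ]


commands = sync_commands("repo", "feature-branch", "./example")
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping the commands as data makes it easy to review the sequence, or to hand each one to `subprocess.run(cmd, check=True)` once the flags are confirmed.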
As for the `src` field that's still pointing to the feature branch: you found a bug, thank you! On checkout, the `src` should update to the checked-out ref. I'm opening an issue for it and will share it here.
Tal Sofer
09/08/2024, 6:55 AM
Aaron Taylor
09/09/2024, 8:17 PM
[…] `src:` field not changing. While that bug is still open, is it legitimate for us to manually edit that source line to match the new ref value?
If done manually, those steps do leave some gaps open for human error and a "race condition" between people trying to merge at the same time, I think?
Are there any automations on the code source control CI side, or the lakeFS actions/hooks side, that might help enforce workflows in this area?
Tal Sofer
09/10/2024, 3:17 PM
> While that bug is still open, is it legitimate for us to manually edit that source line to match the new ref value?
You can, but I believe it will be fixed in the coming weeks 🙂
> Are there any automations on the code source control CI side, or the LakeFS actions/hooks side that might help enforce workflows in this area?
Currently, there is no built-in solution for it. A potential solution could be to implement a pre-merge lakeFS hook that receives the `at_head` value and verifies that it is still the HEAD of `main`.
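As a rough illustration of the verification such a hook would perform, the Python sketch below compares a recorded `at_head` with the current HEAD of `main`. It is not a lakeFS hook implementation: how the hook receives `at_head` is left open, all function names are hypothetical, and `fetch_branch_head` assumes the lakeFS REST endpoint `GET /api/v1/repositories/{repository}/branches/{branch}` with basic auth and a `commit_id` field in the response.

```python
import base64
import json
import urllib.request
from typing import Callable


def fetch_branch_head(endpoint, repo, branch, access_key, secret_key) -> str:
    """Ask the lakeFS API for the commit ID at the head of a branch."""
    url = f"{endpoint}/api/v1/repositories/{repo}/branches/{branch}"
    token = base64.b64encode(f"{access_key}:{secret_key}".encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["commit_id"]


def head_still_matches(at_head: str, current_head: str) -> bool:
    """True only if a non-empty at_head equals the branch's current HEAD."""
    return bool(at_head) and at_head == current_head


def verify_before_merge(at_head: str, get_head: Callable[[], str]) -> None:
    """Raise if main moved since at_head was recorded (someone merged first)."""
    current = get_head()
    if not head_still_matches(at_head, current):
        raise RuntimeError(
            f"main moved: recorded at_head={at_head!r}, current HEAD={current!r}"
        )


# Demo with a stub in place of the network call; commit IDs are placeholders.
verify_before_merge("0123abcd", lambda: "0123abcd")  # passes silently
try:
    verify_before_merge("0123abcd", lambda: "89efcdab")
except RuntimeError as err:
    print(err)
```

Passing the HEAD lookup in as a callable keeps the comparison testable without a lakeFS server; in a real check it would be something like `lambda: fetch_branch_head(endpoint, repo, "main", key, secret)`.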
Would this solve your use case, or is there more to it that I missed?
Aaron Taylor
09/11/2024, 5:45 PM
[…] `at_head` value in our git repo? Would that be custom code which retrieves it?
Offir Cohen
09/12/2024, 10:15 AM