# help
u
Hi, I have a question about whether I'm using LakeFS correctly. Say that I have a `main` branch with tagged commits. I create an `experiment` branch from one of those tags. There are more recent commits on `main`. I want to essentially cherry-pick one of those recent commits from `main` onto my `experiment` branch. What is the process for that? I'm most interested in avoiding copying the physical data, but I may have a use case where I want to use the same commit ID if possible. From what I found, I think I can copy references to the data:
1. Use the API to list files (`ls`) at the LakeFS path on the commit I want to copy.
2. Use the API to `stat` each file to get the physical address and other metadata.
3. Use the API to stage each file onto my `experiment` branch, using the physical address and other metadata retrieved from the `stat` operation.
4. Commit those staged changes.

Does that sound right, or is there a more preferred approach to this problem?
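The four steps above can be sketched as follows. This is a minimal sketch under stated assumptions: `RefClient` and its methods (`list_objects`, `stat_object`, `stage_object`, `commit`) are hypothetical names, not the lakeFS SDK's actual API, and `InMemoryClient` is a toy stand-in for a server. Map each method onto whichever lakeFS client you use. Note two consequences of this approach: it copies the state of every file under the prefix at the source commit (not only that commit's changes), and the result is a new commit on `experiment` with its own commit ID.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class ObjectStat:
    """Metadata needed to stage an existing object by reference (hypothetical shape)."""
    path: str
    physical_address: str
    checksum: str
    size_bytes: int


class RefClient(Protocol):
    """Hypothetical client interface; map each method onto your lakeFS API client."""
    def list_objects(self, repo: str, ref: str, prefix: str) -> List[str]: ...
    def stat_object(self, repo: str, ref: str, path: str) -> ObjectStat: ...
    def stage_object(self, repo: str, branch: str, stat: ObjectStat) -> None: ...
    def commit(self, repo: str, branch: str, message: str) -> str: ...


def copy_by_reference(client: RefClient, repo: str, src_ref: str,
                      dst_branch: str, prefix: str) -> str:
    """Stage every object under `prefix` at `src_ref` onto `dst_branch`,
    reusing the existing physical addresses, then commit. No data is copied."""
    paths = client.list_objects(repo, src_ref, prefix)   # step 1: ls
    for path in paths:
        stat = client.stat_object(repo, src_ref, path)   # step 2: stat
        client.stage_object(repo, dst_branch, stat)      # step 3: stage by reference
    return client.commit(repo, dst_branch,               # step 4: commit
                         f"Copy {len(paths)} object(s) under {prefix!r} from {src_ref}")


class InMemoryClient:
    """Toy stand-in for a lakeFS server: each ref maps path -> ObjectStat."""

    def __init__(self, refs: Dict[str, Dict[str, ObjectStat]]):
        self.refs = refs
        self.staged: Dict[str, Dict[str, ObjectStat]] = {}
        self.commits = 0

    def list_objects(self, repo: str, ref: str, prefix: str) -> List[str]:
        return sorted(p for p in self.refs[ref] if p.startswith(prefix))

    def stat_object(self, repo: str, ref: str, path: str) -> ObjectStat:
        return self.refs[ref][path]

    def stage_object(self, repo: str, branch: str, stat: ObjectStat) -> None:
        self.staged.setdefault(branch, {})[stat.path] = stat

    def commit(self, repo: str, branch: str, message: str) -> str:
        # Move staged entries into the branch tree and hand back a new ID.
        self.refs[branch].update(self.staged.pop(branch, {}))
        self.commits += 1
        return f"new-commit-{self.commits}"


# Usage: copy data/ from a commit on main onto experiment without moving bytes.
stat_a = ObjectStat("data/a.parquet", "s3://storage/objects/123", "abc", 10)
client = InMemoryClient({"main-commit": {"data/a.parquet": stat_a}, "experiment": {}})
commit_id = copy_by_reference(client, "my-repo", "main-commit", "experiment", "data/")
```

The staged entry on `experiment` points at the same `physical_address` as the source, which is what keeps this a metadata-only operation.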
u
Hi @Clinton Monk, lakeFS doesn't support cherry-pick, but it would be great to capture this use case in an issue. The approach you described should work: reference the data by its metadata. Note that this change will affect the staging area of the `experiment` branch.

Would it help to branch from `main` again and merge the changes from the `experiment` branch?
1. Create `experiment2` based on `main`.
2. Commit the changes on `experiment`.
3. Merge the changes from `experiment` into `experiment2`.

If the cherry-pick is for a specific commit (for example, new files that were added to `main` from a specific branch), use that source reference and merge it into the `experiment` branch. (Note that changes on the `experiment` branch should be committed first.) I think understanding the use case will help more in providing a valid solution, and could even form a good feature request.
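To see why the merge route also avoids copying data, here is a toy model of the suggested steps. It is an illustration, not lakeFS's actual merge implementation: each commit is modeled as a map from path to physical address, and a three-way merge applies the source's changes (relative to the common ancestor) onto the destination. The branch names follow the thread; the paths and addresses are made up.

```python
from typing import Dict

Tree = Dict[str, str]  # toy model of a commit: path -> physical address


def three_way_merge(base: Tree, source: Tree, dest: Tree) -> Tree:
    """Apply the changes `source` made relative to `base` onto `dest`.

    Only references (addresses) are moved; no object data is read or written.
    Raises ValueError when a path was changed differently on both sides.
    """
    merged = dict(dest)
    for path in sorted(set(base) | set(source)):
        before, after = base.get(path), source.get(path)
        if before == after:
            continue                       # source did not touch this path
        if dest.get(path) not in (before, after):
            raise ValueError(f"conflict on {path}")
        if after is None:
            merged.pop(path, None)         # deleted on source
        else:
            merged[path] = after           # added or modified on source
    return merged


# The thread's suggested steps, with made-up paths and addresses.
tag_v1 = {"data/a.csv": "addr-a"}                        # tag experiment started from
main = {**tag_v1, "data/new.csv": "addr-new"}            # newer commit on main
experiment = {**tag_v1, "data/x.csv": "addr-x"}          # step 2: committed work
experiment2 = dict(main)                                 # step 1: branch from main
experiment2 = three_way_merge(tag_v1, experiment, experiment2)  # step 3: merge
```

A merge brings in every change between the common ancestor and the source ref, which is why merging a ref into `experiment` is not quite the same as picking a single commit.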
u
Thanks! That's a good suggestion on branch merging. I'll think about it some more to see if that would work for us. 👍 on creating an issue to document the use case.
u
Thanks, looking forward to your feature requests 🙂
u
@Clinton Monk there is one option that can simplify the above steps: it makes a copy without copying the data. The S3 gateway supports the copy-object operation, so when you copy within the same repository it just creates an entry that references the same data. Note that copy-object updates the timestamp to the time of the operation: you get the same data, without a copy, but with a new timestamp. So from a diff point of view this is a change, not the same object.
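A minimal sketch of the copy-object route with boto3, assuming the lakeFS S3 gateway's addressing where the bucket is the repository name and each key is prefixed with a ref (`<ref>/<path>`). The repository, branch, path, endpoint, and credential values are hypothetical, and passing a tag or commit ID as `src_ref` assumes your gateway version accepts those as a copy source. The boto3 wiring is left in comments so the helpers work with any object exposing `copy_object`.

```python
from typing import Any, Dict


def copy_args(repo: str, src_ref: str, dst_branch: str, path: str) -> Dict[str, Any]:
    """Build boto3 `copy_object` arguments for the lakeFS S3 gateway, where the
    bucket is the repository and each key is prefixed with a ref: '<ref>/<path>'."""
    return {
        "Bucket": repo,
        "Key": f"{dst_branch}/{path}",
        "CopySource": {"Bucket": repo, "Key": f"{src_ref}/{path}"},
    }


def copy_within_repo(s3_client: Any, repo: str, src_ref: str,
                     dst_branch: str, path: str) -> Dict[str, Any]:
    """Copy one object between refs of the same repository via the S3 gateway.
    Per the thread, lakeFS records a new entry pointing at the same data."""
    return s3_client.copy_object(**copy_args(repo, src_ref, dst_branch, path))


# Example wiring (hypothetical endpoint and credentials):
# import boto3
# s3 = boto3.client("s3", endpoint_url="https://lakefs.example.com",
#                   aws_access_key_id="<lakeFS key>",
#                   aws_secret_access_key="<lakeFS secret>")
# copy_within_repo(s3, "my-repo", "main", "experiment", "data/a.parquet")
```

Because each copied object gets a fresh timestamp, a diff between the branches will still report these paths as changed even though the underlying data is identical.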
u
Should lakeFS support cherry-pick (and potentially other apply and cherry operations)? It really sounds like we've found a use case for it. The current porcelain doesn't let you do this efficiently. You could use the staging API to stage everything from a diff, though, so it could be implemented as a pure porcelain operation (at the cost of round-tripping the diff). Open an issue?
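The porcelain approach could look roughly like this: diff the picked commit against its parent, stage each added or changed path on the target branch by reference, delete each removed path, then commit. Everything here is a sketch: the `StagingClient` methods are hypothetical names rather than the lakeFS SDK, the caller supplies the parent commit ID, `InMemoryRepo` is a toy store, and there is no conflict detection, so whatever the target branch holds at those paths is overwritten.

```python
from typing import Dict, List, Protocol, Tuple

Change = Tuple[str, str]  # (kind, path); kind is "added", "changed" or "removed"


class StagingClient(Protocol):
    """Hypothetical interface; the method names are illustrative, not lakeFS's SDK."""
    def diff(self, repo: str, left: str, right: str) -> List[Change]: ...
    def physical_address(self, repo: str, ref: str, path: str) -> str: ...
    def stage(self, repo: str, branch: str, path: str, address: str) -> None: ...
    def delete(self, repo: str, branch: str, path: str) -> None: ...
    def commit(self, repo: str, branch: str, message: str) -> str: ...


def cherry_pick(client: StagingClient, repo: str, commit: str,
                parent: str, dst_branch: str) -> str:
    """Replay the changes `commit` made relative to `parent` onto `dst_branch`
    by reference, then commit. No conflict detection: target paths are overwritten."""
    for kind, path in client.diff(repo, parent, commit):   # round-trip the diff
        if kind == "removed":
            client.delete(repo, dst_branch, path)
        else:  # "added" or "changed": point at the same physical data
            address = client.physical_address(repo, commit, path)
            client.stage(repo, dst_branch, path, address)
    return client.commit(repo, dst_branch, f"Cherry-pick {commit}")


class InMemoryRepo:
    """Toy store: every ref (branch or commit ID) maps path -> physical address."""

    def __init__(self, refs: Dict[str, Dict[str, str]]):
        self.refs = refs
        self.commits = 0

    def diff(self, repo: str, left: str, right: str) -> List[Change]:
        a, b = self.refs[left], self.refs[right]
        out = [("removed", p) for p in a if p not in b]
        out += [("added", p) for p in b if p not in a]
        out += [("changed", p) for p in b if p in a and a[p] != b[p]]
        return sorted(out, key=lambda change: change[1])

    def physical_address(self, repo: str, ref: str, path: str) -> str:
        return self.refs[ref][path]

    def stage(self, repo: str, branch: str, path: str, address: str) -> None:
        self.refs[branch][path] = address  # toy: no separate staging area

    def delete(self, repo: str, branch: str, path: str) -> None:
        self.refs[branch].pop(path, None)

    def commit(self, repo: str, branch: str, message: str) -> str:
        self.commits += 1
        commit_id = f"new-commit-{self.commits}"
        self.refs[commit_id] = dict(self.refs[branch])  # snapshot the branch
        return commit_id


# Pick commit-B (whose parent is commit-A on main) onto experiment.
repo = InMemoryRepo({
    "commit-A": {"a.csv": "addr-a", "old.csv": "addr-old"},
    "commit-B": {"a.csv": "addr-a2", "b.csv": "addr-b"},
    "experiment": {"a.csv": "addr-a", "old.csv": "addr-old", "x.csv": "addr-x"},
})
new_id = cherry_pick(repo, "my-repo", "commit-B", "commit-A", "experiment")
```

Unlike the list-and-stat approach, diffing against the parent replays only that commit's changes, including deletions, while `x.csv` on `experiment` is left alone.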
u
u
Neat, thanks!