Hi I have a question on if I m using LakeFS correctly Say th lakeFS #help

user

04/01/2022, 1:13 PM

Hi, I have a question about whether I'm using lakeFS correctly. Say that I have a `main` branch with tagged commits. I create an `experiment` branch from one of those tags. There are more recent commits on `main`. I want to essentially cherry-pick one of those recent commits from `main` onto my `experiment` branch. What is the process for that? I'm most interested in avoiding copying the physical data, but I may have a use case where I want to use the same commit ID if possible. From what I found, I think I can copy references to the data:

1. Use the API to list files (`ls`) at the lakeFS path on the commit I want to copy.
2. Use the API to `stat` each file to get the physical address and other metadata.
3. Use the API to stage each file onto my `experiment` branch, using the physical address and other metadata retrieved from the stat operation.
4. Commit those staged changes.

Does that sound right, or is there a more preferred approach to this problem?
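The four steps in this message (list, stat, stage, commit) could be sketched roughly as follows. This is only an illustration of the flow, not the lakeFS SDK: `InMemoryLake`, `ObjectStat`, and their method and field names are hypothetical stand-ins so the sketch runs offline, and a real client would issue the corresponding API calls instead. Note that it copies references for every object under a prefix at the source ref, which is what the steps describe, rather than only the changes introduced by one commit.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ObjectStat:
    """Metadata a stat call might return (field names are illustrative)."""
    path: str
    physical_address: str
    checksum: str
    size_bytes: int


@dataclass
class InMemoryLake:
    """Hypothetical stand-in for a lakeFS client; keeps everything in dicts."""
    committed: Dict[str, Dict[str, ObjectStat]] = field(default_factory=dict)
    staged: Dict[str, Dict[str, ObjectStat]] = field(default_factory=dict)

    def list_objects(self, ref: str, prefix: str = "") -> List[str]:
        return sorted(p for p in self.committed.get(ref, {}) if p.startswith(prefix))

    def stat_object(self, ref: str, path: str) -> ObjectStat:
        return self.committed[ref][path]

    def stage_object(self, branch: str, stat: ObjectStat) -> None:
        # Only the reference (physical address + metadata) is recorded.
        self.staged.setdefault(branch, {})[stat.path] = stat

    def commit(self, branch: str, message: str) -> int:
        changes = self.staged.pop(branch, {})
        self.committed.setdefault(branch, {}).update(changes)
        return len(changes)


def copy_by_reference(lake, source_ref: str, target_branch: str, prefix: str = "") -> int:
    """Stage references to objects under `prefix` at `source_ref`, then commit."""
    for path in lake.list_objects(source_ref, prefix):   # step 1: ls
        stat = lake.stat_object(source_ref, path)        # step 2: stat
        lake.stage_object(target_branch, stat)           # step 3: stage
    message = f"Copy {prefix or 'all objects'} from {source_ref}"
    return lake.commit(target_branch, message)           # step 4: commit
```

Because only the physical address and metadata are staged, no object data is read or rewritten in this flow.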

user

04/01/2022, 1:33 PM

Hi @Clinton Monk, lakeFS doesn't support cherry-pick, but it would be great to capture this use case in an issue. The approach you described should work: reference the data by its metadata. This change will affect the staging area of the `experiment` branch. Would branching from `main` again and merging the changes from the `experiment` branch help?

1. Create `experiment2` based on `main`.
2. Commit the changes on `experiment`.
3. Merge the changes from `experiment` into `experiment2`.

If the cherry-pick is for a specific commit, such as new files that were added to `main` from a specific branch, use the source reference and merge it into the `experiment` branch (note that changes in the `experiment` branch should be committed). I think that understanding the use case will help us provide a valid solution and even form a good feature request.
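The branch-and-merge alternative from this message could be written down as a short sequence of `lakectl` invocations. The sketch below only builds the argument lists and prints them; the repository name `example-repo` and the commit message are assumptions for illustration, and nothing is executed against a lakeFS server.

```python
from typing import List


def merge_plan(repo: str, base: str = "main", work: str = "experiment",
               fresh: str = "experiment2") -> List[List[str]]:
    """Return lakectl argument lists for: branch from base, commit work, merge."""
    def uri(ref: str) -> str:
        return f"lakefs://{repo}/{ref}"

    return [
        # 1. Create the new branch from the current head of the base branch.
        ["lakectl", "branch", "create", uri(fresh), "--source", uri(base)],
        # 2. Commit whatever is still uncommitted on the working branch.
        ["lakectl", "commit", uri(work), "-m", f"Commit pending changes on {work}"],
        # 3. Merge the working branch into the new branch.
        ["lakectl", "merge", uri(work), uri(fresh)],
    ]


# "example-repo" is a hypothetical repository name.
for argv in merge_plan("example-repo"):
    print(" ".join(argv))
```

Each list could be handed to `subprocess.run(argv, check=True)` on a machine where `lakectl` is installed and configured.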

user

04/01/2022, 1:48 PM

Thanks! That's a good suggestion on branch merging. I'll think about it some more to see if that would work for us. 👍 on creating an issue to document the use case.

user

04/01/2022, 1:50 PM

Thanks, looking forward to your feature requests 🙂

user

04/01/2022, 2:08 PM

@Clinton Monk there is one option that can simplify the above steps: it makes a copy without actually copying the data. Through the S3 gateway, we support the copy-object operation. When you copy within the same repository, it just creates an entry that references the same data. Note that copy-object updates the timestamp to the time of the operation. You get the same data, without a copy, but with a new timestamp. So from a diff point of view this is a change and not the same object.
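A minimal sketch of the copy-object approach, assuming the usual S3 gateway addressing where the bucket is the repository and the key is `<ref>/<path>`. The function takes any S3-style client that exposes `copy_object` with boto3's keyword arguments; `RecordingS3` is a hypothetical fake used here so the example runs without a server, and the repository and path names are made up.

```python
class RecordingS3:
    """Hypothetical fake client that records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def copy_object(self, **kwargs):
        self.calls.append(kwargs)
        return {}


def copy_within_repo(s3, repo: str, source_ref: str, target_branch: str, path: str):
    """Copy one object between refs of the same repository via the S3 gateway."""
    return s3.copy_object(
        Bucket=repo,                          # repository acts as the bucket
        Key=f"{target_branch}/{path}",        # destination: <branch>/<path>
        CopySource={"Bucket": repo, "Key": f"{source_ref}/{path}"},
    )


# With a real deployment the client would come from something like
# boto3.client("s3", endpoint_url=...) pointed at the lakeFS S3 gateway.
s3 = RecordingS3()
copy_within_repo(s3, "example-repo", "main", "experiment", "data/a.parquet")
print(s3.calls[-1])
```

The copied entry lands as an uncommitted change on the target branch, so a commit is still needed afterwards, and as noted above the object will show up in a diff because of its new timestamp.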

user

04/01/2022, 3:13 PM

Should lakeFS support cherry-pick (as well as potentially other apply and cherry operations)? It really sounds like we found a use case for doing this. The current porcelain doesn't let you do this efficiently. You could use the staging API to stage everything from a diff, though. So it could be implemented as a pure porcelain operation (at the cost of round-tripping the diff). Open an issue?
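To make the "stage everything from a diff" idea concrete, here is a rough sketch of the planning half only. It assumes the diff between a commit and its parent is available as `(change_type, path)` pairs with the types `added`, `changed`, and `removed`; that shape and the function name are assumptions for illustration, not the lakeFS API.

```python
from typing import Dict, Iterable, List, Tuple


def plan_cherry_pick(diff: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Split diff entries into paths to stage by reference and paths to delete."""
    plan: Dict[str, List[str]] = {"stage": [], "delete": []}
    for change_type, path in diff:
        if change_type in ("added", "changed"):
            # Would be stat'ed at the picked commit and staged on the target branch.
            plan["stage"].append(path)
        elif change_type == "removed":
            # Would be deleted on the target branch.
            plan["delete"].append(path)
        else:
            raise ValueError(f"unsupported change type: {change_type!r}")
    return plan


example_diff = [
    ("added", "data/new.parquet"),
    ("changed", "data/a.parquet"),
    ("removed", "data/old.parquet"),
]
print(plan_cherry_pick(example_diff))
```

The round trip mentioned above is visible here: the client has to fetch the whole diff before it can stage or delete anything on the target branch.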

user

04/03/2022, 3:28 PM

FYI, created the issue: https://github.com/treeverse/lakeFS/issues/3162

user

04/03/2022, 5:03 PM

Neat, thanks!
