# help
Dan McGreal:
Hi all. We're using lakeFS as part of a reinforcement learning application. One thing we'd like to improve is optimising our processing so that we're not re-processing the same data through the same code versions. For example, when a new branch is created with some changed/new data, we have an action that generates data but shouldn't re-process everything that has already been processed before, unless the processing code has changed. I wonder if there are any features of lakeFS, integrations, or patterns/tools in the lakeFS ecosystem we can use to build this, as we'd like to keep the code/maintenance burden on ourselves as small as possible.
Hey Dan, welcome to lakeFS! If I understand you correctly, you want to link your code and data versions: the code should only run on a branch if the data in that branch wasn't created by the same code version. The best way to link a code version and a data version is with `lakectl local`. The command stores the `git_commit_id` in lakeFS, so I guess you'd run the code if (pseudo):

```
lakefs_branch.commit_metadata.git_commit_id != git_branch.commit_id
```

Or, to be more precise, if the `diff` between `lakefs_branch.commit_metadata.git_commit_id` and `git_branch.commit_id` on the specific code path is 0 lines (assuming a git commit can contain many changes for different, irrelevant actions). Does that make sense?
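That decision can be sketched as a small helper. A minimal sketch, assuming you've already read the stored `git_commit_id` from the lakeFS branch's commit metadata and collected the changed paths from `git diff --name-only <stored>..<current>` — the function and parameter names here are made up for illustration:

```python
def should_reprocess(stored_git_commit, current_git_commit,
                     changed_paths, code_prefix):
    """Decide whether a lakeFS branch's data needs re-processing.

    stored_git_commit:  git_commit_id found in the lakeFS commit metadata
                        (None if the branch was never processed).
    current_git_commit: output of `git rev-parse HEAD`.
    changed_paths:      output of `git diff --name-only stored..current`.
    code_prefix:        path of the processing code inside the git repo.
    """
    if stored_git_commit is None:
        return True   # no record of a previous run: process everything
    if stored_git_commit == current_git_commit:
        return False  # exact same code version: skip
    # Commits differ, but only re-run if the processing code itself changed.
    return any(path.startswith(code_prefix) for path in changed_paths)
```

If `should_reprocess` returns False, your action can exit early and leave the branch's generated data untouched.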
btw, if working locally with the data isn't required, you can just add the `git_commit_id` to lakeFS when you commit data there, and the rest of the flow remains the same
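Attaching the commit id as metadata can be done with `lakectl commit --meta`. A small sketch that just builds the command to run (the repo/branch names and message are placeholders):

```python
def build_lakectl_commit(repo, branch, message, git_commit_id):
    # Attach the current git commit id as commit metadata in lakeFS,
    # so later runs can compare it against the checked-out code version.
    return [
        "lakectl", "commit", f"lakefs://{repo}/{branch}",
        "-m", message,
        "--meta", f"git_commit_id={git_commit_id}",
    ]

# git_commit_id would typically come from something like:
#   subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
cmd = build_lakectl_commit("my-repo", "main", "ingest new episodes", "abc123")
```

You'd pass the resulting list to `subprocess.run` (or just run the equivalent `lakectl` one-liner in your CI script).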
@Dan McGreal, you might also be interested in lakeFS actions, specifically the pre/post-create-branch hooks
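Such a hook is declared in a YAML file under `_lakefs_actions/` in the repository. A rough sketch of a `post-create-branch` action that calls out to a processing service — the action name, branch pattern, and webhook URL are placeholders, so check the lakeFS hooks docs for the exact schema:

```yaml
name: trigger-processing-on-new-branch
on:
  post-create-branch:
    branches: ["experiment-*"]
hooks:
  - id: start_processing
    type: webhook
    properties:
      url: https://processing.example.com/run
```

The webhook receiver could then apply the "has this code version already processed this data?" check before doing any work.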