# help
Hello all, I'm here to ask a more conceptual question. I do a lot of AWS batch processing where we have objects in S3, and for each object to be processed we launch a job on AWS Batch to process it. We want to be able to track, for each output object, the input object and the process used. My understanding of lakeFS is a bit limited, but from what I can tell I might do something like this:

1. Create a branch off of main to hold new outputs
2. Each job container pulls an object from S3 via the AWS CLI
3. Process the data
4. Place the output in the bucket
5. Commit the data
6. Merge the branch

My issue with this flow is that the metadata on the commit can only be set after all the jobs have run. Is there a way to have each job add its own commit message? Would I have to spawn a branch for each separate job (is that even feasible?)
Hey @tgosselin, welcome 🙂 Your flow sounds good. By running each job on a separate branch, you can commit only the changes made by that specific job (and attach the relevant metadata to that commit). After the commit, you can merge back into the main branch.
Sounds like a great use case, and it's certainly feasible.
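The per-job flow described above can be sketched with the `lakectl` CLI. This is a minimal sketch, not a tested pipeline: the repo name `my-repo`, the `proc-<job id>` branch naming, the input bucket, file names, and the `process_one` step are all hypothetical. The `run` helper only echoes each command so the sketch runs without a lakeFS server; swap it for real execution inside an actual Batch job container.

```shell
#!/bin/sh
# Sketch: one lakeFS branch per AWS Batch job, with per-job commit metadata.
# Hypothetical names: my-repo, inputs-bucket, input-42.csv, process_one.
set -eu

JOB_ID="job-42"                 # e.g. derived from the AWS Batch job ID
BRANCH="proc-${JOB_ID}"         # one branch per job
REPO="lakefs://my-repo"

# Echo each command instead of executing it; replace the body with `"$@"`
# to actually run the commands against a live lakeFS installation.
run() { echo "+ $*"; }

# 1. Branch off main for this job's output
run lakectl branch create "${REPO}/${BRANCH}" --source "${REPO}/main"

# 2. Pull the input object from S3 via the AWS CLI
run aws s3 cp "s3://inputs-bucket/input-42.csv" ./input.csv

# 3. Process the data (placeholder for the real job logic)
run ./process_one ./input.csv ./output.csv

# 4. Place the output on the job's branch
run lakectl fs upload "${REPO}/${BRANCH}/outputs/output-42.csv" --source ./output.csv

# 5. Commit with this job's own message and metadata:
#    the input object and the process used
run lakectl commit "${REPO}/${BRANCH}" \
    -m "processed input-42.csv" \
    --meta input=s3://inputs-bucket/input-42.csv \
    --meta process=process_one

# 6. Merge the job branch back into main
run lakectl merge "${REPO}/${BRANCH}" "${REPO}/main"
```

Because each job commits on its own branch, the `--meta` key/value pairs (and the commit message) are scoped to that single job's changes, which is exactly the per-output lineage asked about; the merge into main then collects all job results.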