When tagging a commit, I see that the tag is also ...
# help
u
When tagging a commit, I see that the tag is also applied to the parents of that commit ID. Is there any way to tag only the files in that commit and not files modified in parent commits?
u
Hi Harris, I'll take a look at it and will let you know soon
u
Hi Harris, lakeFS commits are like Git commits: they work like "pointers" to a state of the repository. Therefore, a commit points to all the files that were in the repository at the time it was committed.
u
It means that the tag points to all the files in the state they were saved in by the commit you tagged. Can you explain what you are trying to achieve? Are you trying to tag only the files that were added/changed in your commit?
u
I was thinking about using tagging to denote the different files used in certain ML models/workflows. Ideally, a model would use all files related to a specific tag. For example, model-x would use all the files related to 'tag-x', and 'tag-x' would ideally cover a specific subset of files in the repo. It does not necessarily have to be tagging, just some functionality that can achieve this. Hope to get your thoughts on this!
u
You could perhaps use AWS S3 tags to achieve this. You can use the lakeFS hooks capability to add tags to the files changed in your commit.
u
Ah interesting, I would have to look into lakeFS hooks more. But as an overview I could use hooks to tag files in S3 that I changed in my commits?
u
Just to emphasize @Idan Novogroder’s answer, lakeFS tags are similar to tags in Git, and they are basically a marker to a specific commit, including all its objects. This is not to be confused with, like Idan mentioned, AWS S3 Object Tagging. So in short, lakeFS tags are not used to tag specific objects.
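To make the distinction concrete, here is a toy sketch in Python. It is only an illustration: the `commits` and `tags` dictionaries, the commit IDs, and the file paths are hypothetical stand-ins and are not how lakeFS stores anything. It shows that resolving a tag yields every object in the commit's snapshot, while the objects introduced by that commit only appear when comparing against its parent.

```python
# Toy model only: plain dicts standing in for commits and tags.
# Commit IDs, paths, and versions are made up for illustration.
commits = {
    "c1": {"parent": None, "files": {"datasets/a.csv": "v1"}},
    "c2": {"parent": "c1", "files": {"datasets/a.csv": "v1", "datasets/b.csv": "v1"}},
}
tags = {"tag-x": "c2"}  # a tag is just a name for one commit


def files_at(tag):
    """Every path visible at the tagged commit (the full snapshot)."""
    return sorted(commits[tags[tag]]["files"])


def changed_in(tag):
    """Only the paths added or modified by the tagged commit itself."""
    commit = commits[tags[tag]]
    parent_files = commits[commit["parent"]]["files"] if commit["parent"] else {}
    return sorted(
        path
        for path, version in commit["files"].items()
        if parent_files.get(path) != version
    )


if __name__ == "__main__":
    print(files_at("tag-x"))    # the whole snapshot, including files from c1
    print(changed_in("tag-x"))  # only what c2 introduced
```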
u
Ah I see thank you for the info!
u
P.S. One thing to think about: if all you care about is the modifications made in the commit that is tagged, this is achievable using the lakeFS API by diffing the tagged commit with the previous one (i.e. showing what was changed in that commit). Here's an example using `lakectl`:
$ lakectl diff lakefs://my-datalake/my-tag~1 lakefs://my-datalake/my-tag
Left ref: lakefs://my-datalake/my-tag~1
Right ref: lakefs://my-datalake/my-tag
+ added datasets/reports/measurements0001.csv
+ added datasets/reports/measurements0002.csv
+ added datasets/reports/measurements0003.csv
Of course, you can also do this using the Python or Java SDKs. Do note the `~1` notation, which is used to get the previous commit for that tag (this works similarly to Git's notation, which is nicely explained here). @Harris Vijayagopal Let me know if this is helpful!
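For anyone who wants to script this from Python, here is a minimal sketch under stated assumptions. It does not use the lakeFS SDK itself: it only builds the `tag~1` ref expression, builds what I believe is the REST diff path (`/repositories/{repository}/refs/{leftRef}/diff/{rightRef}`), and picks paths out of a response shaped like `{"results": [{"type": ..., "path": ...}]}`. The helper names, the endpoint host, and the sample response are all hypothetical, and the URL and response shape should be checked against the API reference for your lakeFS version.

```python
from urllib.parse import quote


def previous_ref(ref, steps=1):
    """Build the ref expression for the commit `steps` parents before `ref`."""
    return f"{ref}~{steps}"


def diff_url(endpoint, repository, left_ref, right_ref):
    """Build the diff URL. The path layout is an assumption; verify it."""
    return (
        f"{endpoint.rstrip('/')}/repositories/{quote(repository, safe='')}"
        f"/refs/{quote(left_ref, safe='')}/diff/{quote(right_ref, safe='')}"
    )


def changed_paths(diff_response, kinds=("added", "changed")):
    """Return the paths whose diff type is in `kinds` (assumed response shape)."""
    return [
        entry["path"]
        for entry in diff_response.get("results", [])
        if entry.get("type") in kinds
    ]


if __name__ == "__main__":
    tag = "my-tag"
    # Hypothetical endpoint; replace with your own lakeFS server.
    url = diff_url(
        "https://lakefs.example.com/api/v1", "my-datalake", previous_ref(tag), tag
    )
    print(url)

    # Hand-written sample mirroring the lakectl output above, not real server output.
    sample = {
        "results": [
            {"type": "added", "path": "datasets/reports/measurements0001.csv"},
            {"type": "added", "path": "datasets/reports/measurements0002.csv"},
            {"type": "added", "path": "datasets/reports/measurements0003.csv"},
        ]
    }
    for path in changed_paths(sample):
        print(path)
```

The resulting list of paths could then be fed to a model's data loader, which gets close to the "files related to tag-x" idea without tagging individual objects.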
u
Yes, this was very helpful thanks