Hey all, is it true that objects that exist in a t...
# help
j
Hey all, is it true that objects that exist in a tag but are no longer in the HEAD of any branch are not garbage collected? In other words, if I run garbage collection is it guaranteed that I can still access the data for all tags?
Just wanted to confirm because this blog post makes it seem so:
This ensures that data versions persist as long as they are tied to an existing branch or tag.
But I don’t see tags mentioned anywhere in the docs https://docs.lakefs.io/howto/garbage-collection/gc.html.
g
Hi @Jacob, sorry, but that isn’t true. Currently, the GC process treats tags the same way as commits: it won’t protect tags, and there isn’t any way to configure it to do so either. Can you please provide more information on your use case and requirements?
Sorry about the blog post, we will fix it!
j
I’d like to be able to tag specific commits for retention, in addition to the HEAD of the branches
g
If you are interested, you can open a GitHub issue and we will look into it. If you prefer, I can open it for you as well.
j
If you don’t mind opening one, that would be awesome. I think this is a feature that would be beneficial to many users
g
I opened this. Feel free to add comments so it better fits your case.
i
@Guy Hardonag when are files under a commit lost? When no branch references them anymore?
g
Yes. When they aren’t referenced by any branch anymore, or when they are referenced only by branches that have passed their retention period (if one is configured).
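To make the rule above concrete, here is a minimal Python sketch of it. It is a simplified model of what is described in this thread, not lakeFS code: `BranchRef`, `last_referenced`, and `is_collectable` are hypothetical names, and the real GC works on commits and may differ in detail. Tags do not appear in the model at all, which mirrors the current behavior where tags give no protection.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class BranchRef:
    """A branch that still references an object (hypothetical model, not a lakeFS type)."""
    name: str
    last_referenced: datetime       # assumed: when the branch last held the object
    retention: Optional[timedelta]  # None means no retention period is configured


def is_collectable(refs: List[BranchRef], now: datetime) -> bool:
    """Return True if the object is eligible for GC under the rule from the thread."""
    if not refs:
        # Not referenced by any branch anymore.
        return True
    for ref in refs:
        if ref.retention is None:
            # No retention period configured: this branch keeps the object.
            return False
        if now - ref.last_referenced <= ref.retention:
            # Still inside the retention period: this branch keeps the object.
            return False
    # Referenced only by branches that passed their retention period.
    return True
```

For example, an object referenced only by a branch with a 7-day retention period that last held it 30 days ago would be collectable, while the same object referenced by a branch with no retention period would not.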
i
Got it
Then it makes sense to also have tags act in a similar way as the commits on the HEAD
a
Then it makes sense to also have tags act in a similar way as the commits on the HEAD
I think it's trickier than that and will depend on the intent of the user of the tag. Some tags should be kept until deleted. For instance, you might want to keep the tag `version_to_reproduce_bug` if you run GC to save space. But if you run GC for compliance then probably not. Meanwhile `release_weekly_20230314` is a tag that you will probably want lakeFS to GC, if you're releasing every week. It's actually surprisingly hard to spec out. The major difference between tags and branch heads is that branch heads are essentially mutable while tags are essentially immutable. So it is easier to assume intent behind branches during a GC. I suggest adding to the issue @Guy Hardonag opened.
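One way to picture the "intent per tag" idea is a small rule table that maps tag name patterns to a retention period. The Python sketch below is purely illustrative: lakeFS has no such feature according to this thread, and `TAG_RULES`, `tag_protects`, the patterns, and the 28-day value are all made up for the example. A GC run for compliance would presumably need a different table than one run to save space.

```python
from datetime import datetime, timedelta
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# Hypothetical policy: the first matching pattern wins.
# A retention of None means "keep until the tag is deleted".
TAG_RULES: List[Tuple[str, Optional[timedelta]]] = [
    ("version_to_reproduce_*", None),
    ("release_weekly_*", timedelta(days=28)),
]


def tag_protects(
    tag_name: str,
    tagged_at: datetime,
    now: datetime,
    rules: List[Tuple[str, Optional[timedelta]]] = TAG_RULES,
) -> bool:
    """Return True if this tag should still protect its commit from GC."""
    for pattern, retention in rules:
        if fnmatchcase(tag_name, pattern):
            # Keep forever, or keep while the tag is inside its retention period.
            return retention is None or now - tagged_at <= retention
    # Tags that match no rule give no protection, as with today's behavior.
    return False
```

With these example rules, `version_to_reproduce_bug` protects its commit indefinitely, while a weekly release tag stops protecting its commit 28 days after it was created.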