Hello team, Is there a way for me to know the root...
# help
j
Hello team, Is there a way for me to know the root commit of a branch ? The use case would be for me to be able to know the age of a branch based on the default branch (main). I would take creation_date of the first commit
Maybe also, in terms of lakefs constructs, this question makes no sense :)
i
What do you mean by a "root commit of a branch"? If for example, I have this tree:
A---B---C----D-----R---N---X  branch "main"
          \              /
            \           /
             ---L---M--          branch "branch-a"
What would you consider the root commit of branch-a: A, C, or maybe L? If the answer is A, it's simple: since A is the first commit in the repository, its age should match the repository's age, so you can get the creation_date of the repository using the getRepository API.
j
For me, for branch-a it would be L. And if L does not exist, then Void.
i
So L means you are looking for the first commit made on the new branch after creating it? Because when branch-a was created, it initially pointed to commit C.
j
Indeed, the root itself would be C. But the closest to what I need for my use case (which is essentially to determine the age of a branch) is L.
Ideally a branch would have a creation_date of course
If I can get from C to L, that is also good enough (e.g., being able to query the ordered list of commits of branch-a starting from C, inclusive)
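One way to picture "L, or Void if L does not exist" is as the oldest commit reachable from branch-a but not from main. Below is a minimal sketch over a toy in-memory graph that mirrors the diagram above before branch-a is merged back. The graph data, timestamps, and helper names (`ancestors`, `first_unique_commit`, `is_older_than`) are hypothetical; with a real repository the commit records would have to come from the lakeFS commit log instead.

```python
from datetime import datetime, timedelta, timezone

# Toy commit graph: commit id -> (parent ids, creation time as Unix seconds).
# The shape loosely follows commit records with parents and a creation date,
# but all values here are made up for illustration.
COMMITS = {
    "A": ([], 1_700_000_000),
    "B": (["A"], 1_700_000_100),
    "C": (["B"], 1_700_000_200),
    "D": (["C"], 1_700_000_300),
    "L": (["C"], 1_700_000_400),
    "M": (["L"], 1_700_000_500),
}


def ancestors(commits, head):
    """Return every commit reachable from head, including head itself."""
    seen, stack = set(), [head]
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(commits[cid][0])
    return seen


def first_unique_commit(commits, branch_head, base_head):
    """Oldest commit reachable from branch_head but not from base_head, or None."""
    unique = ancestors(commits, branch_head) - ancestors(commits, base_head)
    if not unique:
        return None
    return min(unique, key=lambda cid: commits[cid][1])


def is_older_than(commits, commit_id, now, max_age):
    """True if the commit was created more than max_age before now."""
    created = datetime.fromtimestamp(commits[commit_id][1], tz=timezone.utc)
    return now - created > max_age
```

With branch-a at M and main at D, `first_unique_commit(COMMITS, "M", "D")` picks L; with branch-a still at C it returns `None`. Note that once branch-a is merged into main (as at R in the diagram), L becomes reachable from main too, so this set difference only works for unmerged branches.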
i
Neither C nor L will be helpful for you, as you can create commit C on Monday, then create branch-a on Tuesday, and finally push a new commit L to branch-a on Wednesday. One feature that might be helpful for you is Auditing: you could use it to check when you created your branch.
j
L would still be helpful in that case, as what really matters is the first piece of extra data added to the branch (compared to the source commit). In that case it is L. The branch creation_date is just a "good enough proxy". I didn't mention it yet but we are trying to ensure that branches older than one month that contain PII are automatically deleted.
But I understand that auditing would be OK.
i
I'm not sure your process is valid: deleting a branch doesn't mean the data is deleted. The commits that contain the PII might still exist. Have you considered using lakeFS Garbage Collection?
j
Yes, we are aware of this, but we understand that before the GC can do its job, the branch (or the related objects) must at least be marked as deleted?
"In the above example, objects will be retained for 14 days after deletion by default"
i
That is true. GC deletes three kinds of objects:
1. Uncommitted objects
2. Committed objects that were already deleted or replaced
3. Objects that are part of dangling commits (commits that aren't directly linked to by any child commit, branch, tag, or other reference)
So if your branch's HEAD doesn't point to the objects with the PII (and no other branch's HEAD does either), the GC will remove them according to the retention policy you defined.
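To make the "dangling commits" case concrete, here is a toy reachability model in Python. It is only an illustration of the definition given above, not how lakeFS GC is implemented; the graph, the ref names, and the helpers `reachable` and `dangling_commits` are all hypothetical.

```python
def reachable(parents, heads):
    """All commits reachable from any of the given heads via parent links."""
    seen, stack = set(), list(heads)
    while stack:
        cid = stack.pop()
        if cid not in seen:
            seen.add(cid)
            stack.extend(parents[cid])
    return seen


def dangling_commits(parents, refs):
    """Commits that no branch or tag in refs can reach."""
    return set(parents) - reachable(parents, refs.values())


# Hypothetical graph: main is at D, branch-a is at M and has not been merged.
PARENTS = {"A": [], "B": ["A"], "C": ["B"], "D": ["C"], "L": ["C"], "M": ["L"]}

# While branch-a exists, every commit is reachable from some ref.
with_branch = dangling_commits(PARENTS, {"main": "D", "branch-a": "M"})

# After deleting branch-a, L and M are no longer reachable from any ref.
without_branch = dangling_commits(PARENTS, {"main": "D"})
```

In this model, L and M only become dangling once branch-a is removed, which is why the branch has to go before GC can consider the objects that exist only in those commits.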
j
Do we agree that the problem remains of determining whether the age of the branch exceeds a certain threshold, and that this is impossible without the audit log?
i
Not necessarily. Dangling commits are deleted by GC immediately, regardless of the branch's age.
j
But if the branch is not deleted, the commits will not be dangling?
i
And the age of the branch is not what's important here; the age of the object is! 🙂
Or, to be more accurate: how long it has been since the object was deleted or replaced.
j
But how can I determine which objects have been touched relative to the main branch?
Then, as you say, I would be able to determine e.g. the min date of the objects (LastModifiedAt)
At least I understand that at some point I need to know C (in order, maybe, to do a diff and get the set of distinct paths within the changes)
The documentation says:
"Under the hood, branches are simply a pointer to a commit"
So I believe technically we could know C, and then easily know L. Edit: I understand the pointer is to the HEAD of the branch.
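The idea of diffing against C to get the set of distinct paths touched on the branch can be sketched like this. It is a hand-rolled comparison over two hypothetical `{path: checksum}` listings (one for commit C, one for the branch HEAD); lakeFS has its own diff between two refs, which would normally replace this, and the listing data and the `changed_paths` helper are assumptions for illustration only.

```python
def changed_paths(base_listing, branch_listing):
    """Classify paths that differ between two {path: checksum} listings."""
    changes = {}
    for path, checksum in branch_listing.items():
        if path not in base_listing:
            changes[path] = "added"
        elif base_listing[path] != checksum:
            changes[path] = "changed"
    for path in base_listing:
        if path not in branch_listing:
            changes[path] = "removed"
    return changes


# Hypothetical listings at commit C (the branch point) and at branch-a's HEAD.
AT_C = {"users/part-0.parquet": "aaa", "events/part-0.parquet": "bbb"}
AT_HEAD = {
    "users/part-0.parquet": "ccc",     # rewritten on the branch
    "users/pii-export.parquet": "ddd", # added on the branch
}

touched = changed_paths(AT_C, AT_HEAD)
```

The keys of `touched` are the distinct paths that differ from C, which is the set one would then inspect for modification dates or PII.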
i
I'm not 100% sure what you are trying to achieve. If you want an object with PII to be deleted, you should make sure there's no branch that points to a commit that contains this object. This has nothing to do with the branch's age.