HT
10/04/2023, 7:17 AM
folder1, folder2, folder3, each containing files. They were added and removed in different commits.
We now want to delete all underlying file content of folder1 across all the commit history and branches. Is it possible?
I guess we will need to go through each commit. For each commit, look at all the underlying blobs behind the files in folder1 and delete them from the underlying storage.
In some ways, it's very similar to what the garbage collector does.
Any suggestions?
Elad Lachmi
10/04/2023, 7:37 AM
folder1 in the underlying object store and remove any reference to it from all of the commits across all of the branches?
HT
10/04/2023, 7:39 AM
folder1 will be "empty" while other files are still available
Elad Lachmi
10/04/2023, 7:41 AM
HT
10/04/2023, 7:43 AM
Elad Lachmi
10/04/2023, 7:51 AM
HT
10/04/2023, 7:54 AM
c and a path p in a repo, how do I know which file it corresponds to in the underlying storage?
When I look in data in the underlying storage, it's all cryptic paths to me...
Elad Lachmi
10/04/2023, 8:14 AM
HT
10/04/2023, 8:17 AM
Elad Lachmi
10/04/2023, 8:18 AM
HT
10/04/2023, 8:21 AM
paths
• Translate them to underlying paths
Gather all of those paths across all commit history.
Delete all those paths in the underlying storage.
The tricky part is translating a lakeFS path to its underlying path. Basically understanding this
Elad Lachmi
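The steps listed above (gather the underlying paths across the commit history, then delete them) can be sketched roughly as follows. This is a minimal sketch, not something from the thread and not a lakeFS-provided tool: `list_physical_addresses` and `delete_object` are hypothetical callables that the caller supplies, for example thin wrappers around the lakeFS API and the object store client. The `dry_run` default is a design choice of this sketch, so that nothing is deleted until the list of targets has been reviewed.

```python
from typing import Callable, Iterable, List, Set


def collect_addresses(
    refs: Iterable[str],
    list_physical_addresses: Callable[[str, str], Iterable[str]],
    prefix: str,
) -> Set[str]:
    """Union of the underlying addresses of every object under `prefix`, across all refs.

    `list_physical_addresses(ref, prefix)` is a hypothetical helper supplied by the caller.
    """
    addresses: Set[str] = set()
    for ref in refs:
        # The same underlying object usually appears in many commits; the set deduplicates.
        addresses.update(list_physical_addresses(ref, prefix))
    return addresses


def purge(
    addresses: Iterable[str],
    delete_object: Callable[[str], None],
    dry_run: bool = True,
) -> List[str]:
    """Return the sorted deletion targets; delete them only when dry_run is False.

    `delete_object(address)` is a hypothetical helper supplied by the caller.
    """
    targets = sorted(addresses)
    if not dry_run:
        for address in targets:
            delete_object(address)
    return targets
```

Calling `purge(collect_addresses(refs, lister, "folder1/"), deleter)` first with the default `dry_run=True` returns the addresses that would be removed without touching the storage.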
10/04/2023, 8:39 AM
/repositories/{repository}/refs/{ref}/objects/ls endpoint (and ref can be branch/commit/etc.), one of the properties of the result objects is physical_address, which is the location in the object store
HT
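As an illustration of the endpoint mentioned above, here is a minimal sketch that pages through /repositories/{repository}/refs/{ref}/objects/ls and yields each object's path together with its physical_address. Several details are assumptions rather than facts from the thread: that the response carries a `results` list plus a `pagination` object with `has_more` and `next_offset`, that `prefix` and `after` are accepted as query parameters, that the server accepts HTTP Basic auth with an access key and secret, and that `base_url` already includes the API root. The helper names are hypothetical.

```python
import base64
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, Tuple

# A page fetcher takes (repository, ref, prefix, after) and returns the decoded JSON page.
PageFetcher = Callable[[str, str, str, str], Dict]


def iter_physical_addresses(
    fetch_page: PageFetcher, repository: str, ref: str, prefix: str
) -> Iterator[Tuple[str, str]]:
    """Yield (path, physical_address) for every object under `prefix` at `ref`."""
    after = ""
    while True:
        page = fetch_page(repository, ref, prefix, after)
        for obj in page["results"]:
            # Skip entries that are not plain objects (assumed `path_type` field).
            if obj.get("path_type", "object") == "object":
                yield obj["path"], obj["physical_address"]
        pagination = page.get("pagination", {})
        if not pagination.get("has_more"):
            break
        after = pagination["next_offset"]


def make_http_fetcher(base_url: str, access_key: str, secret_key: str) -> PageFetcher:
    """Build a fetcher that calls the listing endpoint over HTTP (assumed Basic auth)."""
    token = base64.b64encode(f"{access_key}:{secret_key}".encode()).decode()

    def fetch_page(repository: str, ref: str, prefix: str, after: str) -> Dict:
        query = urllib.parse.urlencode({"prefix": prefix, "after": after})
        url = (
            f"{base_url}/repositories/{urllib.parse.quote(repository, safe='')}"
            f"/refs/{urllib.parse.quote(ref, safe='')}/objects/ls?{query}"
        )
        request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return fetch_page
```

Keeping the pagination loop separate from the HTTP call means the loop can be exercised with a stub fetcher before pointing it at a real server.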
10/04/2023, 8:46 AM
Elad Lachmi
10/04/2023, 8:46 AM
HT
10/04/2023, 8:47 AM
Elad Lachmi
10/04/2023, 8:47 AM
HT
10/04/2023, 8:56 AM
Elad Lachmi
10/04/2023, 8:59 AM
HT
10/04/2023, 10:51 PM
/repositories/{repository}/refs/{ref}/objects/ls: what is the equivalent in the Python SDK?
Elad Lachmi
10/05/2023, 3:06 AM
HT
10/05/2023, 3:21 AM
Elad Lachmi
10/05/2023, 3:21 AM
HT
10/06/2023, 2:23 AM
physical_address) of specific paths.
Then, when I try to download those files via the S3 API from a commit that references those paths, I get [Errno 121] Internal Server Error
I was hoping for an error similar to what would happen if the files were deleted by the garbage collector (as mentioned here): 410 Gone
It looks like the GC does something to inform the lakeFS server that those files are gone, so that it doesn't try to look for them.
My question then becomes: when I delete the underlying file, what more do I need to do to let the lakeFS server know and handle this correctly?
(self-hosted lakeFS server)
@Elad Lachmi
Ariel Shaqed (Scolnicov)
10/06/2023, 6:57 AM
HT
10/06/2023, 7:00 AM
Elad Lachmi
10/06/2023, 7:11 AM
HT
10/06/2023, 7:11 AM
Elad Lachmi
10/06/2023, 7:18 AM