I am getting this error from time to time `failed ...
# help
a
I am getting this error from time to time:
`failed to create repository: found lakeFS objects in the storage namespace key(_truncated_name/dummy): storage namespace already in use`
I am not sure about the cause. The repository is empty and I am calling the API with `exist_ok=True`.
o
Hey @Andrij David! lakeFS repositories cannot share a storage namespace; each one must be unique. This means you either already have a different repository configured to use the same storage path, or had one previously and that storage location hasn’t been cleared out.
`exist_ok` does something different: if a repository with the same ID already exists, it won’t error.
a
A repository that used the same storage namespace may have existed but was deleted from LakeFS. Is it possible that when the repository was deleted, the storage remained?
o
indeed - deleting a repository will not remove its underlying storage.
a
Does LakeFS have a garbage collection approach to handle this, or do we need to delete the storage manually?
o
I believe you’ve run into this: https://github.com/treeverse/lakeFS/issues/5566
lakeFS’ garbage collection will currently not handle deleted repositories, so at the moment it is up to the user to delete the storage namespace
a
We were not aware of this behavior. We have deleted around 800 repositories so far. Is there any way to identify the orphaned storage namespaces so we can purge them?
o
Oh, sorry about that. All storage namespaces should have a `_lakefs` directory directly in them, so perhaps listing your bucket or storage account for paths that contain this directory could be helpful. see this
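A minimal sketch of that listing idea, assuming you already have a flat list of object keys from your bucket or storage account (how you obtain it depends on your storage provider and is not shown). The function name `namespace_roots` and the sample keys are hypothetical; the only thing taken from the thread is that each storage namespace has a `_lakefs` directory directly inside it.

```python
from typing import Iterable, Set

# Assumption from the thread: every storage namespace contains a `_lakefs`
# directory at its top level.
MARKER = "/_lakefs/"


def namespace_roots(keys: Iterable[str]) -> Set[str]:
    """Return the path prefixes that directly contain a `_lakefs` directory."""
    roots = set()
    for key in keys:
        # Add a leading slash so a `_lakefs` directory at the bucket root
        # is also matched.
        normalized = "/" + key.lstrip("/")
        idx = normalized.find(MARKER)
        if idx >= 0:
            roots.add(normalized[:idx].strip("/"))
    return roots


# Hypothetical keys, as they might come back from a bucket listing.
sample_keys = [
    "repos/a/_lakefs/dummy",
    "repos/a/data/object1",
    "repos/b/_lakefs/some/file",
    "unrelated/file.txt",
]
candidates = namespace_roots(sample_keys)
```

Each returned prefix is only a candidate storage namespace; whether it still belongs to a live repository has to be checked separately.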
a
Thank you for the link, but it only explains the structure of a LakeFS repository. For your information, we have about 10,000 repositories, and 800 were removed. How can we differentiate the storage namespaces that are mapped to non-existent LakeFS repositories so we can remove them? I know this is a tricky situation, and I am curious if you have encountered this before
o
I see. I guess listing all repositories, saving a list of the storage namespaces currently in use, followed by listing the storage and subtracting the paths currently in use is the way to go. Not sure if there’s an easier way
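The subtraction step could look roughly like this. It is a sketch under assumptions: `in_use` is the list of storage namespaces collected from your current repositories (for example through the lakeFS API or SDK), `found` is the list of namespace paths discovered in storage, and both use the same URI form. The helper names and the sample values are hypothetical.

```python
from typing import Iterable, Set


def normalize(namespace: str) -> str:
    """Strip whitespace and trailing slashes so equal paths compare equal."""
    return namespace.strip().rstrip("/")


def orphaned_namespaces(in_use: Iterable[str], found: Iterable[str]) -> Set[str]:
    """Namespaces present in storage that no current repository points to."""
    active = {normalize(ns) for ns in in_use}
    discovered = {normalize(ns) for ns in found}
    return discovered - active


# Hypothetical inputs: namespaces of live repositories vs. what is in storage.
in_use = ["s3://bucket/repos/a/", "s3://bucket/repos/c"]
found = ["s3://bucket/repos/a", "s3://bucket/repos/b", "s3://bucket/repos/c/"]
orphans = orphaned_namespaces(in_use, found)
```

Since the result feeds a deletion, it is probably worth reviewing the list (or a sample of it) by hand before purging anything.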
a
Ok. Thank you. I will look into it.