# help
a
Hello. Thank you for a great product. I'm currently building a POC for migrating from DVC to lakeFS, and I've run into an issue with GC.
How to reproduce:
1. Check lakeFS storage: it's 48GB.
2. Set the GC config:
{
  "default_retention_days": 0,
  "branches": [
    {
      "branch_id": "main",
      "retention_days": 0
    },
    {
      "branch_id": "dev",
      "retention_days": 0
    }
  ]
}
3. Create a branch.
4. Copy some files to the branch.
5. Revert the uncommitted changes.
6. Run GC:
spark-submit --class io.treeverse.gc.GarbageCollection \
  --packages org.apache.hadoop:hadoop-aws:3.3.4 \
  -c spark.hadoop.lakefs.api.url=http://10.17.66.46:8000/api/v1 \
  -c spark.hadoop.lakefs.api.access_key=... \
  -c spark.hadoop.lakefs.api.secret_key=... \
  -c spark.hadoop.fs.s3a.access.key=... \
  -c spark.hadoop.fs.s3a.secret.key=... \
  http://treeverse-clients-us-east.s3-website-us-east-1.amazonaws.com/lakefs-spark-client/0.14.0/lakefs-spark-client-assembly-0.14.0.jar \
  abtrain ap-northeast-1
7. Observe in the log:
GarbageCollection$: Report summary={"run_id":"gf4pgd4ouo3s73fcko5g","success":true,"first_slice":"","start_time":"2024-07-03T15:14:15.548847Z","cutoff_time":"2024-07-03T09:14:15.546Z","num_deleted_objects":0}
8. Check lakeFS storage: it's still 48GB.

Expected: num_deleted_objects > 0 and a reduction in lakeFS storage size.

The same happens when I commit to a branch and then delete the branch. Is my understanding correct that, with my config, uncommitted changes, deleted files, and files in deleted branches should all be collected? Can you help me with this issue?
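The summary line in step 7 is a JSON object after a `Report summary=` prefix. A minimal Python sketch for pulling it out of a Spark driver log and checking `num_deleted_objects` between runs might look like this. `parse_report_summary` and `parse_utc` are hypothetical helpers, and the sketch assumes the summary stays on a single log line. It also prints the gap between `start_time` and `cutoff_time`, which can be compared against the configured retention.

```python
import json
import re
from datetime import datetime

# The summary line reported for the run above.
LOG_LINE = (
    'GarbageCollection$: Report summary={"run_id":"gf4pgd4ouo3s73fcko5g",'
    '"success":true,"first_slice":"","start_time":"2024-07-03T15:14:15.548847Z",'
    '"cutoff_time":"2024-07-03T09:14:15.546Z","num_deleted_objects":0}'
)


def parse_report_summary(line: str) -> dict:
    """Return the JSON object that follows 'Report summary=' in a GC log line."""
    match = re.search(r"Report summary=(\{.*\})", line)
    if match is None:
        raise ValueError("no 'Report summary=' found in line")
    return json.loads(match.group(1))


def parse_utc(ts: str) -> datetime:
    # datetime.fromisoformat() only accepts a trailing 'Z' from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


summary = parse_report_summary(LOG_LINE)
gap = parse_utc(summary["start_time"]) - parse_utc(summary["cutoff_time"])
print("success:", summary["success"])
print("deleted objects:", summary["num_deleted_objects"])
print("start_time - cutoff_time:", gap)
```

Running the same parser over the log of each experiment (retention 0, then 1) makes it easy to compare how many objects each run actually removed.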
i
Hi @Aleksei Grachev, this is a shot in the dark, but I don't typically see 0 used there. Do these files exist in the head of any other branch?
See these notes.
Another question: have you tried this with 1 instead of 0? It's a bit of a long shot; I'm just not sure I've ever seen 0 there 🙂
(in retention days)
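To experiment with the retention values being discussed, a minimal Python sketch of how the GC rules from the question resolve per branch may help. It assumes the behavior described in the lakeFS garbage-collection docs: a branch-specific `retention_days` overrides `default_retention_days`, and an object reachable from several branches is kept until the longest of those retention periods has passed. `retention_days`, `effective_retention`, and the `feature-x` branch are hypothetical names; this is a simplified model of the rules, not the Spark job's implementation.

```python
import json

# The GC rules from the question, with retention raised from 0 to 1 day.
GC_RULES = json.loads("""
{
  "default_retention_days": 1,
  "branches": [
    {"branch_id": "main", "retention_days": 1},
    {"branch_id": "dev", "retention_days": 1}
  ]
}
""")


def retention_days(rules: dict, branch_id: str) -> int:
    """Hypothetical helper: the branch-specific rule if one exists, else the repository default."""
    for rule in rules.get("branches", []):
        if rule["branch_id"] == branch_id:
            return rule["retention_days"]
    return rules["default_retention_days"]


def effective_retention(rules: dict, branch_ids: list) -> int:
    """Hypothetical helper: an object reachable from several branches keeps the longest retention."""
    return max(retention_days(rules, b) for b in branch_ids)


print(retention_days(GC_RULES, "dev"))        # uses the "dev" branch rule
print(retention_days(GC_RULES, "feature-x"))  # no rule, falls back to the default
print(effective_retention(
    {"default_retention_days": 7,
     "branches": [{"branch_id": "main", "retention_days": 21}]},
    ["main", "dev"],
))
```

The sketch only resolves the numbers in the config; it does not model how the GC job computes its cutoff or treats uncommitted objects.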
a
> Do these files exist in any other head of a branch?
No, I've deleted all the branches.
> Another question I have is if you tried this with 1 instead of 0. It’s a little bit a long shot - I’m just not sure that I ever saw 0 there 🙂
That helped. Some files have been cleared, but not all.
If you know how to run GC so that it cleans up EVERYTHING that can be cleaned, that would be very helpful!
i
To make sure I understand correctly: there are currently no files and no commits older than 1 day in any branch or in the head of any branch, and those files are still not being deleted?
a
Yes, correct
n
@Aleksei Grachev If I'm not mistaken, setting retention days to 0 means that data will never be deleted...