Aleksei Grachev
07/03/2024, 3:30 PM
{
  "default_retention_days": 0,
  "branches": [
    {
      "branch_id": "main",
      "retention_days": 0
    },
    {
      "branch_id": "dev",
      "retention_days": 0
    }
  ]
}
2. Create branch
3. Copy some files to branch
4. Revert uncommitted changes
5. Run gc:
spark-submit --class io.treeverse.gc.GarbageCollection \
--packages org.apache.hadoop:hadoop-aws:3.3.4 \
-c spark.hadoop.lakefs.api.url=http://10.17.66.46:8000/api/v1 \
-c spark.hadoop.lakefs.api.access_key=... \
-c spark.hadoop.lakefs.api.secret_key=... \
-c spark.hadoop.fs.s3a.access.key=... \
-c spark.hadoop.fs.s3a.secret.key=... \
http://treeverse-clients-us-east.s3-website-us-east-1.amazonaws.com/lakefs-spark-client/0.14.0/lakefs-spark-client-assembly-0.14.0.jar \
abtrain ap-northeast-1
6. Observe in log:
arbageCollection$: Report summary={"run_id":"gf4pgd4ouo3s73fcko5g","success":true,"first_slice":"","start_time":"2024-07-03T15:14:15.548847Z","cutoff_time":"2024-07-03T09:14:15.546Z","num_deleted_objects":0}
7. Check lakeFS storage: it's 48 GB
Expected: num_deleted_objects > 0 and a reduction in lakeFS storage size.
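As a quick check on the report from step 6, the sketch below parses the summary and prints the gap between start_time and cutoff_time. It assumes the two timestamps read 15:14:15 and 09:14:15 once their colons are restored; report_gap is a hypothetical helper written for illustration, not part of the lakeFS Spark client.

```python
import json
from datetime import datetime, timedelta

# Report summary copied from the GC log in step 6.
# Assumption: the timestamps are written with their colons restored.
SUMMARY = (
    '{"run_id":"gf4pgd4ouo3s73fcko5g","success":true,"first_slice":"",'
    '"start_time":"2024-07-03T15:14:15.548847Z",'
    '"cutoff_time":"2024-07-03T09:14:15.546Z",'
    '"num_deleted_objects":0}'
)

# Timestamp layout used in the report (UTC, fractional seconds, trailing Z).
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def report_gap(summary_json: str) -> timedelta:
    """Return start_time minus cutoff_time for a GC report summary (hypothetical helper)."""
    report = json.loads(summary_json)
    start = datetime.strptime(report["start_time"], TIME_FORMAT)
    cutoff = datetime.strptime(report["cutoff_time"], TIME_FORMAT)
    return start - cutoff


gap = report_gap(SUMMARY)
print(gap)  # prints 6:00:00.002847
```

For this report the cutoff is about six hours before the start of the run, even though retention_days is 0 in the rules above.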
The same happens when I commit to the branch and then delete the branch.
Is my understanding correct that, with my config, uncommitted changes, deleted files, and files in deleted branches should all be collected?
Can you help me with this issue?
Iddo Avneri
07/03/2024, 4:42 PM
Iddo Avneri
07/03/2024, 4:43 PM
Iddo Avneri
07/03/2024, 5:05 PM
Iddo Avneri
07/03/2024, 5:06 PM
Aleksei Grachev
07/03/2024, 5:11 PM
Aleksei Grachev
07/03/2024, 5:11 PM
Aleksei Grachev
07/03/2024, 5:18 PM
Iddo Avneri
07/03/2024, 5:20 PM
Aleksei Grachev
07/03/2024, 5:21 PM
Niro
07/03/2024, 7:54 PM