# dev
a
@Barak Amar let's continue our lakeFSFS recursive deletes discussion here rather than on DMs.
b
Do you want a count of the objects we delete, or a trace of each delete, as part of the delete-objects API?
a
Some info about what objects were deleted by a deleteObjects call.
b
checking
will build an image with additional logging for the above
a
sg.
I'm trying to understand the flow of an overwriting Spark write that uses FileOutputCommitter.
Specifically: how does it delete the old data?
BTW, CSV overwrite takes about as long as Parquet overwrite, so it's probably nothing Parquet-specific (Parquet is sometimes a bit different, and even wraps OutputCommitters).
b
2022-09-22 13:05:48	
{"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":5,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:47Z"}
2022-09-22 13:05:44	
{"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:44Z"}
2022-09-22 13:05:41	
{"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:41Z"}
2022-09-22 13:05:38	
{"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:38Z"}
2022-09-22 13:05:36	
{"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:36Z"}
The above shows the calls to DeleteBatch made while writing the data through the S3 gateway.
This is from my env, which uses lakeFSFS 0.1.6.
I'm using
mode(SaveMode.Overwrite)
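For tallying these, here is a minimal sketch that counts the DeleteBatch calls and sums their `keys_len` values. It assumes the log export alternates a timestamp line with one JSON object per line, as in the paste above; the function name `tally_delete_batches` and the trimmed sample entries are only for illustration, not anything from lakeFS itself.

```python
import json


def tally_delete_batches(lines):
    """Return (number of DeleteBatch calls, total keys) from log lines."""
    calls = 0
    keys = 0
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip the timestamp-only lines between entries
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or otherwise unparsable entries
        if entry.get("msg") == "DeleteBatch":
            calls += 1
            keys += int(entry.get("keys_len", 0))
    return calls, keys


# Entries trimmed from the paste above to the fields the tally needs.
sample = [
    "2022-09-22 13:05:48",
    '{"keys_len":5,"msg":"DeleteBatch","time":"2022-09-22T10:05:47Z"}',
    "2022-09-22 13:05:44",
    '{"keys_len":1000,"msg":"DeleteBatch","time":"2022-09-22T10:05:44Z"}',
    "2022-09-22 13:05:41",
    '{"keys_len":1000,"msg":"DeleteBatch","time":"2022-09-22T10:05:41Z"}',
    "2022-09-22 13:05:38",
    '{"keys_len":1000,"msg":"DeleteBatch","time":"2022-09-22T10:05:38Z"}',
    "2022-09-22 13:05:36",
    '{"keys_len":1000,"msg":"DeleteBatch","time":"2022-09-22T10:05:36Z"}',
]

calls, keys = tally_delete_batches(sample)
print(f"{calls} DeleteBatch calls, {keys} keys")  # 5 DeleteBatch calls, 4005 keys
```

For the five entries pasted here that comes to 5 calls covering 4005 keys in total.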
a
Neat! Now I'll look for non-bulk deletes near those times. Same org?
b
yes