#dev
Ariel Shaqed (Scolnicov)

09/22/2022, 9:15 AM
@Barak Amar let's continue our lakeFSFS recursive deletes discussion here rather than on DMs.
Barak Amar

09/22/2022, 9:18 AM
Do you want a count of the objects we delete, or a trace of each delete, as part of the delete-objects API call?
Ariel Shaqed (Scolnicov)

09/22/2022, 9:19 AM
Some info about what objects were deleted by a deleteObjects call.
Barak Amar

09/22/2022, 9:19 AM
checking
9:35 AM
will build an image with additional logging for the above
Ariel Shaqed (Scolnicov)

09/22/2022, 9:59 AM
sg.
10:00 AM
I'm trying to understand the flow of an overwriting Spark write that uses FileOutputCommitter.
10:00 AM
Specifically: how does it delete the old data?
10:01 AM
BTW, CSV overwrite takes about as long as Parquet overwrite, so it's probably nothing Parquet-specific (Parquet is sometimes a bit different, and even wraps OutputCommitters).
Barak Amar

09/22/2022, 10:36 AM
2022-09-22 13:05:48	
{"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":5,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:47Z"}
2022-09-22 13:05:44	
{"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:44Z"}
2022-09-22 13:05:41	
{"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:41Z"}
2022-09-22 13:05:38	
{"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:38Z"}
2022-09-22 13:05:36	
{"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:36Z"}
10:36 AM
The above shows the calls to DeleteBatch made as part of writing the data through the S3 gateway.
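For anyone who wants to total these up, here is a rough sketch that tallies the DeleteBatch log lines above. The `LOG_LINES` list holds the five entries from this thread trimmed to the fields the script reads, and `summarize_delete_batches` is a hypothetical helper name, not anything in lakeFS.

```python
import json

# The five DeleteBatch debug entries quoted in this thread, trimmed to the
# fields used below (the full entries carry more fields).
LOG_LINES = [
    '{"keys_len":5,"msg":"DeleteBatch","time":"2022-09-22T10:05:47Z"}',
    '{"keys_len":1000,"msg":"DeleteBatch","time":"2022-09-22T10:05:44Z"}',
    '{"keys_len":1000,"msg":"DeleteBatch","time":"2022-09-22T10:05:41Z"}',
    '{"keys_len":1000,"msg":"DeleteBatch","time":"2022-09-22T10:05:38Z"}',
    '{"keys_len":1000,"msg":"DeleteBatch","time":"2022-09-22T10:05:36Z"}',
]


def summarize_delete_batches(lines):
    """Return (number of DeleteBatch calls, total keys across those calls)."""
    calls = 0
    keys = 0
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip bare timestamp lines and anything that is not JSON
        entry = json.loads(line)
        if entry.get("msg") != "DeleteBatch":
            continue  # ignore unrelated log messages
        calls += 1
        keys += entry.get("keys_len", 0)
    return calls, keys


calls, keys = summarize_delete_batches(LOG_LINES)
print(f"{calls} DeleteBatch calls, {keys} keys")
# prints "5 DeleteBatch calls, 4005 keys"
```

So this run shows four full batches of 1000 keys plus a final batch of 5, roughly one batch every 2 to 3 seconds.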
10:37 AM
This is from my env, which uses lakeFSFS 0.1.6.
10:45 AM
I'm using
mode(SaveMode.Overwrite)
Ariel Shaqed (Scolnicov)

09/22/2022, 10:49 AM
Neat! Now I'll look for non-bulk deletes near those times. Same org?
Barak Amar

09/22/2022, 10:50 AM
yes