#dev

Ariel Shaqed (Scolnicov)

09/22/2022, 9:15 AM
@Barak Amar let's continue our lakeFSFS recursive deletes discussion here rather than on DMs.

Barak Amar

09/22/2022, 9:18 AM
Count the number of objects we delete, or trace each delete, as part of the API delete-objects call?

Ariel Shaqed (Scolnicov)

09/22/2022, 9:19 AM
Some info about what objects were deleted by a deleteObjects call.

Barak Amar

09/22/2022, 9:19 AM
Checking.
I'll build an image with additional logging for the above.

Ariel Shaqed (Scolnicov)

09/22/2022, 9:59 AM
sg.
I'm trying to understand the flow of an overwriting Spark write that uses FileOutputCommitter.
Specifically: how does it delete the old data?
BTW, a CSV overwrite takes about as long as a Parquet overwrite, so this is probably nothing Parquet-specific (Parquet writes are sometimes a bit different and can even wrap OutputCommitters).

Barak Amar

09/22/2022, 10:36 AM
2022-09-22 13:05:48	
{"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":5,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:47Z"}
2022-09-22 13:05:44	
{"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:44Z"}
2022-09-22 13:05:41	
{"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:41Z"}
2022-09-22 13:05:38	
{"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:38Z"}
2022-09-22 13:05:36	
{"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:36Z"}
The above are the DeleteBatch calls made while writing the data through the S3 gateway: 5 calls, deleting 1000 + 1000 + 1000 + 1000 + 5 = 4005 keys in total.
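A minimal sketch of how such per-call counts could be totalled from the debug log, assuming each entry is the JSON object shown above (only `msg`, `repository`, `branch_id` and `keys_len` are read). The helper `total_deleted_keys` is hypothetical and is not part of lakeFS.

```python
import json
from collections import Counter


def total_deleted_keys(lines):
    """Sum keys_len over DeleteBatch debug entries, per (repository, branch_id).

    Hypothetical helper. Returns two Counters keyed by (repository, branch_id):
    total keys deleted, and number of DeleteBatch calls.
    """
    deleted, calls = Counter(), Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # timestamp or other non-JSON line
        entry = json.loads(line)
        if entry.get("msg") != "DeleteBatch":
            continue  # count only graveler DeleteBatch entries
        branch = (entry["repository"], entry["branch_id"])
        deleted[branch] += entry["keys_len"]
        calls[branch] += 1
    return deleted, calls


# The log lines above, trimmed to the fields this sketch reads (newest first).
LOG_LINES = [
    "2022-09-22 13:05:48",
    '{"branch_id":"test-overwrite2","keys_len":5,"msg":"DeleteBatch","repository":"test-1"}',
    "2022-09-22 13:05:44",
    '{"branch_id":"test-overwrite2","keys_len":1000,"msg":"DeleteBatch","repository":"test-1"}',
    "2022-09-22 13:05:41",
    '{"branch_id":"test-overwrite2","keys_len":1000,"msg":"DeleteBatch","repository":"test-1"}',
    "2022-09-22 13:05:38",
    '{"branch_id":"test-overwrite2","keys_len":1000,"msg":"DeleteBatch","repository":"test-1"}',
    "2022-09-22 13:05:36",
    '{"branch_id":"test-overwrite2","keys_len":1000,"msg":"DeleteBatch","repository":"test-1"}',
]

deleted, calls = total_deleted_keys(LOG_LINES)
for (repo, branch), n in deleted.items():
    # prints "test-1/test-overwrite2: 5 DeleteBatch calls, 4005 keys deleted"
    print(f"{repo}/{branch}: {calls[(repo, branch)]} DeleteBatch calls, {n} keys deleted")
```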
This is from my environment, which uses lakeFSFS 0.1.6.
I'm using
mode(SaveMode.Overwrite)
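The 1000, 1000, 1000, 1000, 5 pattern looks like a recursive delete paged at 1000 keys per request, which is the maximum number of keys S3 DeleteObjects accepts in one call. This is an assumption about the client, not something confirmed in this thread. A minimal sketch of that batching, with a hypothetical helper `delete_batches` and hypothetical key names:

```python
from typing import Iterator, List, Sequence

# Assumption: the client pages a recursive delete at the S3 DeleteObjects
# limit of 1000 keys per request.
MAX_KEYS_PER_DELETE = 1000


def delete_batches(keys: Sequence[str], batch_size: int = MAX_KEYS_PER_DELETE) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most batch_size keys, in listing order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(keys), batch_size):
        yield list(keys[start:start + batch_size])


# 4005 hypothetical keys: the total of keys_len in the log above.
keys = [f"hypothetical/part-{i:05d}.csv" for i in range(4005)]
sizes = [len(batch) for batch in delete_batches(keys)]
print(sizes)  # [1000, 1000, 1000, 1000, 5]
```

The log is listed newest first, so the 5-key call at 13:05:48 was the last one, which is consistent with deleting full pages first and a short final page at the end.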

Ariel Shaqed (Scolnicov)

09/22/2022, 10:49 AM
Neat! Now I'll look for non-bulk deletes near those times. Same org?

Barak Amar

09/22/2022, 10:50 AM
yes