Thread
#dev
    Ariel Shaqed (Scolnicov)

    6 days ago
    @Barak Amar let's continue our lakeFSFS recursive deletes discussion here rather than in DMs.
    Barak Amar

    6 days ago
    Do you want a count of the objects we delete, or a trace of each delete, as part of the delete-objects API?
    Ariel Shaqed (Scolnicov)

    6 days ago
    Some info about what objects were deleted by a deleteObjects call.
    Barak Amar

    6 days ago
    checking
    will build an image with additional logging for the above
    Ariel Shaqed (Scolnicov)

    6 days ago
    sg.
    I'm trying to understand the flow of an overwriting Spark write that uses FileOutputCommitter.
    Specifically: how does it delete the old data?
    BTW, CSV overwrite takes about as long as Parquet overwrite, so it's probably nothing Parquet-specific (Parquet is sometimes a bit different, and even wraps OutputCommitters).
    Barak Amar

    6 days ago
    2022-09-22 13:05:48
    {"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":5,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:47Z"}
    2022-09-22 13:05:44	
    {"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:44Z"}
    2022-09-22 13:05:41	
    {"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:41Z"}
    2022-09-22 13:05:38	
    {"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:38Z"}
    2022-09-22 13:05:36	
    {"branch_id":"test-overwrite2","file":"build/pkg/graveler/graveler.go:1477","func":"pkg/graveler.(*KVGraveler).DeleteBatch","keys_len":1000,"level":"debug","msg":"DeleteBatch","repository":"test-1","service_name":"graveler_graveler","time":"2022-09-22T10:05:36Z"}
    The above shows the calls to DeleteBatch made as part of writing the data through the S3 gateway.
    This is from my env, which uses lakeFSFS 0.1.6.
    I'm using
    mode(SaveMode.Overwrite)
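For a rough total of how many keys those DeleteBatch calls covered, the keys_len fields can be summed. A minimal Python sketch, assuming each log record is a standalone JSON object carrying the msg and keys_len fields seen above; the helper name total_deleted_keys is made up for illustration, and the sample records are abbreviated copies of the five entries (only the fields used here):

```python
import json


def total_deleted_keys(log_lines):
    """Return (number of DeleteBatch calls, sum of their keys_len)."""
    calls = 0
    keys = 0
    for line in log_lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip timestamp-only lines from the log viewer
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a complete JSON record
        if record.get("msg") == "DeleteBatch":
            calls += 1
            keys += int(record.get("keys_len", 0))
    return calls, keys


# Abbreviated copies of the five log entries, newest first.
sample = [
    "2022-09-22 13:05:48",
    '{"msg":"DeleteBatch","keys_len":5,"time":"2022-09-22T10:05:47Z"}',
    '{"msg":"DeleteBatch","keys_len":1000,"time":"2022-09-22T10:05:44Z"}',
    '{"msg":"DeleteBatch","keys_len":1000,"time":"2022-09-22T10:05:41Z"}',
    '{"msg":"DeleteBatch","keys_len":1000,"time":"2022-09-22T10:05:38Z"}',
    '{"msg":"DeleteBatch","keys_len":1000,"time":"2022-09-22T10:05:36Z"}',
]

calls, keys = total_deleted_keys(sample)
print(calls, keys)  # 5 4005
```

For the entries above that comes to 5 calls and 4005 keys: four batches of 1000 and one of 5.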
    Ariel Shaqed (Scolnicov)

    6 days ago
    Neat! Now I'll look for non-bulk deletes near those times. Same org?
    Barak Amar

    6 days ago
    yes