Hi, does lakefs S3 interface provide some sort of ...
# help
Hi, does lakefs S3 interface provide some sort of checksum for each object?
Hi @HT, Yes, lakeFS provides ETags, which are for most purposes checksums of the object. ETags are part of the S3 API.
You can read a lot more about ETags in the AWS S3 documentation.
So lakeFS follows that specification, which, from what I understand, means the presence and content of the ETag depend on how the file was uploaded? For example, a big file that triggers a multipart upload will not have the checksum in the ETag?
Yes. Unfortunately, multipart pretty much means the ETag is no longer a checksum of the whole object.
I wonder how rclone avoids re-uploading the same file when using the `--checksum` flag...
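To make the multipart point concrete, here is a minimal sketch of how the two kinds of ETag are commonly observed to differ on AWS S3. It assumes an unencrypted object, a single-part ETag equal to the hex MD5 of the body, and a multipart ETag equal to the MD5 of the concatenated per-part MD5 digests followed by `-<part count>`. Whether lakeFS produces exactly the same values is not confirmed in this thread, and the function names are made up for illustration.

```python
import hashlib


def single_part_etag(data: bytes) -> str:
    # Assumption: a plain (non-multipart, unencrypted) upload gets the hex MD5 of its body.
    return hashlib.md5(data).hexdigest()


def multipart_etag(data: bytes, part_size: int) -> str:
    # Assumption: MD5 over the concatenated binary MD5 digests of each part,
    # followed by "-<number of parts>". This depends on the part size, so it
    # cannot be compared to a simple MD5 of the whole file.
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"


def looks_multipart(etag: str) -> bool:
    # ETags usually come back wrapped in double quotes; a "-N" suffix hints at multipart.
    return "-" in etag.strip('"')
```

The same bytes uploaded with a different part size would give a different multipart ETag under these assumptions, which is why it is not usable as a plain content checksum.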
I figured it out a while ago. Basically, they put another header on the object, and then S3 (both AWS and lakeFS) gives it back on a HEAD request. It still means rclone has to scan a huge file before copying it.
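A rough sketch of reading such a header back from a HEAD response. It assumes the extra header is user metadata named `md5chksum` holding a base64-encoded MD5, which is how rclone's S3 backend is understood to store it; treat the key name and encoding as assumptions and check them against your own objects. The input is a dict shaped like a boto3 `head_object` response, where user metadata appears under `"Metadata"`.

```python
import base64
from typing import Optional

# Assumption: rclone sends this as the X-Amz-Meta-Md5chksum header,
# so it shows up in user metadata under the key "md5chksum".
RCLONE_MD5_KEY = "md5chksum"


def stored_md5_hex(head_response: dict) -> Optional[str]:
    """Return the stored MD5 as hex, or None if the object carries no such metadata."""
    metadata = {k.lower(): v for k, v in head_response.get("Metadata", {}).items()}
    encoded = metadata.get(RCLONE_MD5_KEY)
    if encoded is None:
        return None
    return base64.b64decode(encoded).hex()


# Hypothetical usage with boto3 (the repository, branch, and path are placeholders):
#   response = s3_client.head_object(Bucket="my-repo", Key="main/path/to/file")
#   print(stored_md5_hex(response))
```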
Oh, so lakeFS will store that header, and I can retrieve it later on using something like fsspec?
Re: scanning huge files, I believe not much can be done... Either you use a checksum or the modification time, which can be quite unreliable depending on the use case...
Yup. The fact that it's understandable doesn't make it fun.
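On the cost of scanning: the whole file still has to be read once, but it does not have to fit in memory. A minimal sketch of a chunked MD5 over a local file, with the helper name and the 8 MiB chunk size chosen arbitrarily for illustration:

```python
import hashlib


def file_md5_hex(path: str, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```

The result can then be compared with whatever checksum the remote side exposes, assuming both sides used the same algorithm and encoding.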
But: sometimes the ETag is good enough!
Thanks for the help 😊