# help
v
Any information about LakeFS processing with S3 would be useful. Sharing the problem investigation and discussion for context: the problem is in the rename (on the S3 filesystem this is a copy followed by a delete) after switching to the Hadoop "S3A Committer", specifically the magic committer. My current opinion is that the delete may happen before the copy because of S3 eventual consistency when S3Guard is not configured. That said, S3Guard setup should no longer be required, since S3 now provides strong consistency (e.g. this article). Meanwhile, the Hadoop docs say the following about the magic committer:
However, it has extra requirements of the filesystem
1. The object store must be consistent.
2. The S3A client must be configured to recognize interactions with the magic directories and treat them as a special case.
So my main questions about LakeFS relate to these requirements:
1. (Optional) Is the LakeFS object store consistent?
2. Is there any configuration related to the magic committer when using LakeFS with the S3A client?
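For reference, the second Hadoop requirement quoted above maps to a handful of S3A and Spark properties. The sketch below only collects the property names documented for Hadoop's S3A committers and Spark's cloud integration module; the helper names `magic_committer_conf` and `as_spark_submit_args` are made up for illustration, and nothing in this thread confirms that the magic committer works against lakeFS.

```python
def magic_committer_conf():
    """Return Spark properties that select the S3A magic committer.

    Assumption: Spark is built with the spark-hadoop-cloud module, which
    provides the two org.apache.spark.internal.io.cloud classes below.
    """
    return {
        # Pick the magic committer among the S3A committers.
        "spark.hadoop.fs.s3a.committer.name": "magic",
        # Let the S3A client treat paths under __magic specially.
        "spark.hadoop.fs.s3a.committer.magic.enabled": "true",
        # Route s3a:// output through the S3A committer factory.
        "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a":
            "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
        # Spark-side bindings to the Hadoop path output committers.
        "spark.sql.sources.commitProtocolClass":
            "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
        "spark.sql.parquet.output.committer.class":
            "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    }


def as_spark_submit_args(conf):
    """Flatten a property dict into repeated `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    print(" ".join(as_spark_submit_args(magic_committer_conf())))
```

The same keys can be set on a `SparkSession` builder instead of the command line; which form fits depends on how the job is launched.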
i
Hi Vasyl, lakeFS is not an object store; it's a "versioning layer" on top of the object storage. Commits in lakeFS are atomic and consistent, but there's no consistency promise for uploading objects before committing. About configuring the magic committer to work with lakeFS: how do you plan to make Hadoop work with lakeFS? Do you plan on using the lakeFS S3 gateway? https://docs.aws.amazon.com/filegateway/latest/files3/create-gateway-file.html
v
Hello @Idan Novogroder, thank you for the reply. If I understood your question correctly: LakeFS is installed on our own K8s server, and the AWS Spark cluster accesses it through an endpoint with an access/secret key.
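A setup like the one described (Spark reaching lakeFS through an endpoint with an access/secret key) is typically expressed with the standard S3A connection properties. This is a minimal sketch under that assumption: the endpoint URL, credentials, repository and branch names are placeholders, and the helper names `lakefs_s3a_conf` and `lakefs_path` are hypothetical.

```python
def lakefs_s3a_conf(endpoint, access_key, secret_key):
    """Return S3A properties that point Spark at an S3-compatible endpoint.

    Assumption: lakeFS is reached through its S3-compatible gateway, so the
    usual S3A endpoint and credential keys apply.
    """
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style requests keep the repository name in the URL path
        # instead of requiring a per-repository DNS name.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def lakefs_path(repository, branch, path):
    """Build an s3a:// URI where the bucket is the repository and the first
    path component is the branch (the addressing lakeFS documents for S3A)."""
    return f"s3a://{repository}/{branch}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Placeholder values only; replace with the real endpoint and keys.
    conf = lakefs_s3a_conf("https://lakefs.example.com", "ACCESS_KEY", "SECRET_KEY")
    for key, value in conf.items():
        print(f"{key}={value}")
    print(lakefs_path("example-repo", "main", "tables/events"))
```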
a
@Idan Novogroder, afaik we provide a read-after-write and list-after-write consistency guarantee.
i
@Ariel Shaqed (Scolnicov) Yes, that's true. I meant that writing objects is done asynchronously when using lakectl fs upload, for example.
a
Exactly. But this is exactly like S3: during object upload you may or may not see the object. After the upload returns successfully you will read the new object. And at no point will you see a partial object.
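The read-after-write behaviour described here can be probed with a small check: write an object, read it back once the write returns, and compare the bytes. The sketch below assumes a client exposing boto3-style `put_object` and `get_object` methods; `read_after_write_ok` and the `InMemoryS3` stand-in are hypothetical helpers so the example runs without a network. A single passing probe does not prove a consistency guarantee, it only fails to contradict one.

```python
import io


def read_after_write_ok(client, bucket, key, payload):
    """Write `payload`, read it back right after the write returns, and
    report whether the bytes read match the bytes written."""
    client.put_object(Bucket=bucket, Key=key, Body=payload)
    body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return body == payload


class InMemoryS3:
    """Tiny stand-in with the two boto3-style methods used above, so the
    check can be exercised locally (not a real S3 or lakeFS client)."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body):
        self._objects[(Bucket, Key)] = bytes(Body)
        return {}

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


if __name__ == "__main__":
    ok = read_after_write_ok(InMemoryS3(), "example-repo", "main/probe.txt", b"hello")
    print("read-after-write ok:", ok)
```

To run the same probe against a real endpoint, one option is to pass a client created with `boto3.client("s3", endpoint_url=...)` in place of `InMemoryS3()`, assuming the endpoint accepts S3 API calls.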