# help
Ion:
So S3 requires a locking client for concurrent writers to write to the same Delta table, but I'm wondering whether that's also required for the S3 emulation in lakeFS if the actual storage backend is Azure ADLS. Are you able to shed some light on this? :)
Oz Katz:
Hey @Ion! 👋 Welcome 🙂 Sure - let me try and explain. The short answer is yes: you'd still need a locking client for concurrent writers, even if the backing object store is ADLS. With non-S3 object stores (for example, ADLS), Delta's log store implementation would use a conditional header such as `If-None-Match` (see conditional requests) to make sure new log entries don't accidentally overwrite another writer's log entries. Since the S3 gateway in lakeFS implements the S3 protocol, and S3 doesn't support conditional writes, even if we were to add the required headers, no S3 client would know how to use them, regardless of the underlying storage used by lakeFS. This is yet another reason why we prefer native clients when possible.
Let me know if that makes sense - happy to elaborate!
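(For illustration - a minimal sketch of the conditional-write mechanism described above, using the Rust `object_store` crate that delta-rs builds on. The account/container/key values are made up, and the API shape assumes object_store ~v0.11; `PutMode::Create` is what maps to an `If-None-Match: *` header on stores that support it.)

```rust
// Sketch: committing a Delta log entry with a "create only" conditional put.
// Assumes the object_store crate (~v0.11); all names/credentials are placeholders.
use object_store::azure::MicrosoftAzureBuilder;
use object_store::{path::Path, Error, ObjectStore, PutMode, PutOptions};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Hypothetical ADLS account/container, purely for illustration.
    let store = MicrosoftAzureBuilder::new()
        .with_account("myaccount")
        .with_container_name("mycontainer")
        .with_access_key("...")
        .build()?;

    let entry = Path::from("table/_delta_log/00000000000000000001.json");
    let commit = br#"{"commitInfo":{}}"#.to_vec(); // stand-in for a real log entry

    // PutMode::Create is translated into a conditional request (If-None-Match: *)
    // on backends that support it, such as ADLS - so this fails rather than
    // clobbering a log entry another writer committed first.
    let opts = PutOptions { mode: PutMode::Create, ..Default::default() };
    match store.put_opts(&entry, commit.into(), opts).await {
        Ok(_) => println!("log entry committed"),
        Err(Error::AlreadyExists { .. }) => {
            // Lost the race: re-read the log and retry with the next version.
            println!("version already exists - retry with the next log entry");
        }
        Err(e) => return Err(e.into()),
    }
    Ok(())
}
```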
Ion:
That makes sense! Thanks for explaining :) Unfortunate to hear, though, because that would make it difficult for us to start using this.
Regarding a native integration, do you reckon your team could work on a lakeFS client crate that implements the `ObjectStore` trait methods?
Or we could make it a joint effort here, because once such a thing is available I can integrate it directly into delta-rs and make it a first-class citizen of the library.
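(To make the idea concrete - a rough skeleton of what such a crate could look like. This is not an existing crate: `LakeFSObjectStore` and its fields are made up, and the trait signatures are recalled from object_store ~v0.11, so they should be checked against whichever version delta-rs pins.)

```rust
// Hypothetical skeleton of a lakeFS-native ObjectStore implementation.
// Method signatures assume object_store ~v0.11, which uses async_trait.
use async_trait::async_trait;
use futures::stream::BoxStream;
use object_store::{
    path::Path, GetOptions, GetResult, ListResult, MultipartUpload, ObjectMeta,
    ObjectStore, PutMultipartOpts, PutOptions, PutPayload, PutResult, Result,
};

/// Made-up store that would speak the lakeFS API directly, addressing objects
/// as repository/branch/path so every read and write is versioned by lakeFS.
#[derive(Debug)]
struct LakeFSObjectStore {
    endpoint: String,   // e.g. "https://lakefs.example.com/api/v1" (made up)
    repository: String,
    branch: String,
}

impl std::fmt::Display for LakeFSObjectStore {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "lakefs://{}/{}", self.repository, self.branch)
    }
}

#[async_trait]
impl ObjectStore for LakeFSObjectStore {
    async fn put_opts(&self, location: &Path, payload: PutPayload, opts: PutOptions) -> Result<PutResult> {
        // Would call the lakeFS upload-object API; PutMode::Create would map to
        // lakeFS "fail if exists" semantics, giving Delta an atomic commit.
        todo!()
    }
    async fn put_multipart_opts(&self, location: &Path, opts: PutMultipartOpts) -> Result<Box<dyn MultipartUpload>> {
        todo!()
    }
    async fn get_opts(&self, location: &Path, options: GetOptions) -> Result<GetResult> {
        todo!()
    }
    async fn delete(&self, location: &Path) -> Result<()> {
        todo!()
    }
    fn list(&self, prefix: Option<&Path>) -> BoxStream<'_, Result<ObjectMeta>> {
        todo!()
    }
    async fn list_with_delimiter(&self, prefix: Option<&Path>) -> Result<ListResult> {
        todo!()
    }
    async fn copy(&self, from: &Path, to: &Path) -> Result<()> {
        todo!()
    }
    async fn copy_if_not_exists(&self, from: &Path, to: &Path) -> Result<()> {
        todo!()
    }
}
```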
Oz Katz:
Sure! There's some Rust experience on the team, but not a huge amount - collaborating on this sounds like a great idea!
Ion:
Btw @Oz Katz, one question that still popped up about the Hadoop file system integration: the docs suggest that presigned mode does write operations directly to the storage, and mention that this is supported on Azure. Can I assume that in this mode, since it writes directly to ADLS, I don't need to bother with a locking client? Did I understand this correctly?
Oz Katz:
Well, not quite 🙂 There's currently no LogStore implementation for the Hadoop file system. What we typically recommend is using branching and merging to ensure safe writes: if every write is a sequence of creating an isolated branch, writing to the table, then committing and merging back into the original branch, any attempt to overwrite an existing log entry will (rightfully) result in a merge conflict. So yes, it works without a locking client, but at the moment it requires you to branch/write/merge yourself (see the sketch below). A LogStore implementation could do that for you transparently in the future.
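(A rough sketch of that branch/write/merge loop against the lakeFS REST API. The endpoint paths follow the lakeFS OpenAPI spec as recalled here, so verify them against your lakeFS version; authentication and the actual table write are elided, and `reqwest` with its `json` feature is assumed.)

```rust
// Sketch: isolated-branch write pattern for safe Delta commits via lakeFS.
use reqwest::Client;
use serde_json::json;

async fn isolated_write(client: &Client, base: &str, repo: &str) -> Result<(), reqwest::Error> {
    // 1. Create a throwaway branch per job run (the process id is just a
    //    stand-in for a real run id or UUID).
    let branch = format!("job-{}", std::process::id());
    client
        .post(format!("{base}/repositories/{repo}/branches"))
        .json(&json!({ "name": branch, "source": "main" }))
        .send().await?.error_for_status()?;

    // 2. Write the Delta table to the isolated branch here (elided) -
    //    e.g. through the S3 gateway at s3://{repo}/{branch}/path/to/table.

    // 3. Commit the changes on the branch.
    client
        .post(format!("{base}/repositories/{repo}/branches/{branch}/commits"))
        .json(&json!({ "message": "delta commit" }))
        .send().await?.error_for_status()?;

    // 4. Merge back into main. If another writer already merged a conflicting
    //    Delta log entry, this fails with a conflict instead of silently
    //    overwriting it - that is the safety property.
    client
        .post(format!("{base}/repositories/{repo}/refs/{branch}/merge/main"))
        .json(&json!({}))
        .send().await?.error_for_status()?;
    Ok(())
}
```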
Ion:
Ah I see, that seems doable - each job execution just needs its own random branch.
Any reason why it doesn't use the ADLS capability to do file locks here?
Oz Katz:
lakeFS provides indirection between a logical path and the actual stored object: `0001.json` on branch A needs to point to a different object than `0001.json` on branch B (for isolation). lakeFS makes sure the underlying objects are always unique, so even if ADLS supports a conditional put on an object, that condition won't help - whether a logical path already exists is knowledge that lakeFS has but ADLS doesn't.
Hope that's clear enough?
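(A toy model of that indirection - not lakeFS internals, just an illustration of why a conditional put on the physical object can't enforce uniqueness of the logical path.)

```rust
// Toy illustration: lakeFS-style mapping from logical path to physical object.
use std::collections::HashMap;

fn main() {
    // Logical view, keyed by (branch, path) - this mapping lives in lakeFS.
    let mut refs: HashMap<(&str, &str), String> = HashMap::new();

    // Two writers race on the same logical path, on different branches.
    // Each write lands on a fresh, unique physical key in ADLS:
    refs.insert(("branch-a", "_delta_log/0001.json"), format!("data/{}", next_id()));
    refs.insert(("branch-b", "_delta_log/0001.json"), format!("data/{}", next_id()));

    // An ADLS If-None-Match condition would apply to the physical keys, which
    // never collide - only lakeFS knows that both entries mean "0001.json".
    for ((branch, path), physical) in &refs {
        println!("{branch}/{path} -> {physical}");
    }
}

// Stand-in for a unique id generator, to keep the sketch dependency-free.
fn next_id() -> u64 {
    use std::sync::atomic::{AtomicU64, Ordering};
    static N: AtomicU64 = AtomicU64::new(0);
    N.fetch_add(1, Ordering::Relaxed)
}
```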
Ion:
That helps! Thanks @Oz Katz