Question about the Spark OutputCommitter proposal
# dev
o
Question about the Spark OutputCommitter proposal 🙂 I acknowledge that this might be a dumb question - but does it really have to depend on the lakeFS HadoopFilesystem ("lakeFSFS")? In theory, Spark could write directly to the object store, and have the OutputCommitter stage and commit all written files. This could be a terrible idea, but perhaps has a smaller surface area...?
cc @Ariel Shaqed (Scolnicov)
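(For concreteness, a minimal sketch of what this message proposes: a standalone Hadoop `OutputCommitter` whose tasks write straight to the object store, with the committer itself registering everything with lakeFS at job commit. `DirectLakeFSCommitter` is hypothetical, not an existing API; the comments describe what each hook *would* have to do.)

```scala
import org.apache.hadoop.mapreduce.{JobContext, OutputCommitter, TaskAttemptContext}

// Hypothetical committer: tasks write directly to the object store and the
// committer only tracks what was written. None of this is an existing API.
class DirectLakeFSCommitter(repo: String, branch: String) extends OutputCommitter {
  override def setupJob(ctx: JobContext): Unit = {
    // e.g. create a temporary branch so job commit can be atomic
  }
  override def setupTask(ctx: TaskAttemptContext): Unit = ()
  override def needsTaskCommit(ctx: TaskAttemptContext): Boolean = true
  override def commitTask(ctx: TaskAttemptContext): Unit = {
    // record the physical addresses this task wrote, e.g. in a manifest
  }
  override def abortTask(ctx: TaskAttemptContext): Unit = {
    // delete this task's staged objects (retries land here too)
  }
  override def commitJob(ctx: JobContext): Unit = {
    // link every staged object into the branch and commit it. This is the
    // hard part: the files were written by other tasks on other shards,
    // so something still has to be able to list them all.
  }
}
```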
a
Nice one! I think I tried something similar but couldn't think it through to success. Thing is, you're also allowed to read from the output committer temporaries, so we'd need a filesystem just for that. And to (job) commit, we'd need a filesystem just to be able to list the files: they can be written by other shards, so local knowledge is not enough. There are also retries and cleanups to take into account. I reckon it could be done, but not with Hadoop OutputCommitter semantics. And you'd still need to go through a separate branch to get atomicity.
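(For reference on that last point: job commit in the stock `FileOutputCommitter` boils down, roughly, to listing the committed task directories and renaming them into place, and both operations go through a Hadoop `FileSystem`, which is why committer semantics alone don't remove the filesystem dependency. Paths below are illustrative, and `lakefs://` stands in for whatever scheme a lakeFS-backed FileSystem would register; the snippet is meant for pasting into a REPL, not as a complete program.)

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val outputPath = new Path("lakefs://repo/branch/output")
val pending = new Path(outputPath, "_temporary/0")

val fs = FileSystem.get(pending.toUri, conf)
for (task <- fs.listStatus(pending)) { // tasks from *any* shard show up here
  fs.rename(task.getPath, outputPath)  // both calls require a FileSystem
}
```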
So... We could e.g. use S3A connected to the S3 gateway, and tell it to write to a temporary branch, and that would work correctly. But performance would suffer (and configuration would get hairy).
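(A sketch of that S3A-through-the-gateway setup, spark-shell style. The endpoint, credentials, repository (`example-repo`), and temporary branch (`tmp-branch`) are placeholders. S3A sees the gateway as "S3", so the repository plays the bucket and the branch is the top of the key; every request takes the extra hop through the gateway, which is the performance concern.)

```scala
import org.apache.spark.sql.SparkSession

// Point S3A at the lakeFS S3 gateway; all values below are placeholders.
val spark = SparkSession.builder()
  .appName("write-via-lakefs-gateway")
  .config("spark.hadoop.fs.s3a.endpoint", "https://lakefs.example.com")
  .config("spark.hadoop.fs.s3a.access.key", "<lakeFS-access-key-id>")
  .config("spark.hadoop.fs.s3a.secret.key", "<lakeFS-secret-access-key>")
  .config("spark.hadoop.fs.s3a.path.style.access", "true")
  .getOrCreate()

// Writes land on the temporary branch; committing/merging the branch is
// what would then give the atomic "job commit".
spark.range(100).write.parquet("s3a://example-repo/tmp-branch/output/")
```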
And finally, the first users want a committer for scale. I reckon they'll want lakeFSFS anyway to read at scale. I will ask :-)
o
These are all very valid reasons not to decouple the two 🙂 Thanks for the elaborate answer!