Ariel Shaqed (Scolnicov)

04/27/2022, 7:00 AM
Good morning Daniel, Lior. We are reviewing this top-level design for the proposed LakeFSOutputCommitter. If you can read it, I'd love to hear your comments (before or after the merge, of course). @Tal Sofer and I would also like to meet with you next week to go over our next steps. When would be convenient for you?
Hi @Daniel Satubi, @Lior Resisi -- just noticed that I forgot to tag you 🤦🏼 . Sorry.

Lior Resisi

04/28/2022, 3:35 PM
Looks great, thanks! We’ll go over it next week and ping you with a date. Thanks!

Ariel Shaqed (Scolnicov)

04/28/2022, 3:42 PM
Have a great weekend, and see you soon.

Daniel Satubi

05/18/2022, 4:20 PM
Hi, looks great! Sorry for the late reply. So currently the recommended route for our case is using the staging committer. For both cases (future LakeFSCommitter & Hadoop staging committer) we would need to use Hadoop >=3.1, correct?

Ariel Shaqed (Scolnicov)

05/19/2022, 7:14 AM
Thanks! Sorry for the long reply. I did not understand whether we are referring to the "staging" or the "magic" committer. In short: I do not recommend using the staging committer if you can avoid it, and on a personal level I can no longer recommend using Hadoop 2. I am not sure about the staging committer: I'm looking at these docs, and it seems to require an additional HDFS setup. If you do not have HDFS for temporary storage, then my reading of the docs is that it will not work for you. But even if you did, your Redis locking should remove the need to use it, according to the "Which Committer to Use?" section. The staging committer also adds a full data copy (from HDFS to the end filesystem), and I do not think new systems should use it. Regarding the magic committer: AFAIU it requires support in the filesystem to work. So while it will work with S3A and the lakeFS S3 gateway, I do not think that it will work out of the box with lakeFSFS. I am unable to find much documentation for using Hadoop 2 with these committers, and I do not know when either became recommended for use. But my impression from reading the Hadoop 2 code for hadoop-common and hadoop-aws is that it is very complex and has many rough edges. So personally, I will not recommend any new Hadoop 2 installations.
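For reference, here is a rough sketch of the Spark settings that would select the magic committer when writing through S3A to the lakeFS S3 gateway. It is only an illustration under assumptions: the helper name `magic_committer_conf` and the endpoint `https://lakefs.example.com` are made up, the two `org.apache.spark.internal.io.cloud` classes assume Spark was built with its hadoop-cloud module, and none of this has been checked against your setup.

```python
def magic_committer_conf(gateway_endpoint):
    """Return Spark settings that select the S3A "magic" committer.

    gateway_endpoint is a placeholder for the lakeFS S3 gateway URL.
    """
    return {
        # Point S3A at the gateway instead of AWS S3 (hypothetical endpoint).
        "spark.hadoop.fs.s3a.endpoint": gateway_endpoint,
        "spark.hadoop.fs.s3a.path.style.access": "true",
        # Choose the magic committer rather than a staging variant.
        "spark.hadoop.fs.s3a.committer.name": "magic",
        "spark.hadoop.fs.s3a.committer.magic.enabled": "true",
        # Route s3a:// output through the S3A committer factory.
        "spark.hadoop.mapreduce.outputcommitter.factory.scheme.s3a":
            "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory",
        # Spark-side bindings (assumes the spark-hadoop-cloud module is present).
        "spark.sql.sources.commitProtocolClass":
            "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol",
        "spark.sql.parquet.output.committer.class":
            "org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter",
    }


conf = magic_committer_conf("https://lakefs.example.com")

# Print the settings in the form accepted by spark-submit.
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

These settings only apply to `s3a://` paths, which matches the point above that the magic committer needs filesystem support and so should not be expected to work with lakeFSFS as-is.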