Tal Sofer

10/20/2022, 1:09 PM
lakeFS Spark client v0.5.1 released! 🐞 Bug fixes
• Make GC backup and restore support an expired addresses list that includes objects not in the underlying object store (#4367)
• Don't package with hadoop-aws. This removes many dependency failures and simplifies configuration. But it also means that for plain Spark distributions, such as the one you get when downloading from the Apache Spark homepage, you will need to add
--packages org.apache.hadoop:hadoop-aws:2.7.7
or
--packages org.apache.hadoop:hadoop-aws:3.2.1
or similar, to add in this package. (#4399)
👍 3
:jumping-lakefs: 3

Oz Katz

10/20/2022, 2:32 PM
So I could use it with Hadoop 3 on Spark??

Ariel Shaqed (Scolnicov)

10/20/2022, 3:34 PM
This already worked, but you needed to be very lucky with the exact Hadoop versions on your cluster. Now it should work as long as you provide a hadoop-aws that is compatible with your Spark.
• For EMR and Databricks, and probably many other packaged solutions, you already get a working hadoop-aws and don't need to do anything.
• With a clean Spark you will have to provide your own hadoop-aws. Just pick the version matching whatever hadoop-common package is included in your Spark (or that you provided yourself).
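A minimal sketch of the "pick the matching version" rule, in Python. The helper name `hadoop_aws_coordinate` is hypothetical and not part of lakeFS or Spark. It assumes you already know your Hadoop version as a plain x.y.z string and only builds the Maven coordinate to pass to --packages.

```python
import re


def hadoop_aws_coordinate(hadoop_version: str) -> str:
    """Build the hadoop-aws Maven coordinate for a given Hadoop version.

    Hypothetical helper for illustration: hadoop-aws is released with the
    same version number as hadoop-common, so the version is reused as-is.
    """
    version = hadoop_version.strip()
    # Accept only plain x.y.z versions; vendor builds may use other schemes
    # and should be checked by hand.
    if not re.fullmatch(r"\d+\.\d+\.\d+", version):
        raise ValueError(f"unexpected Hadoop version: {hadoop_version!r}")
    return f"org.apache.hadoop:hadoop-aws:{version}"


if __name__ == "__main__":
    # Example: a Spark distribution bundled with Hadoop 3.2.1.
    print("--packages", hadoop_aws_coordinate("3.2.1"))
```

One way to find the version on a plain Spark download is to look at the hadoop-common jar name under $SPARK_HOME/jars. From a running PySpark session, `spark.sparkContext._jvm.org.apache.hadoop.util.VersionInfo.getVersion()` should also report it, though `_jvm` is a private attribute and may change.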
👏 1

Oz Katz

10/20/2022, 3:48 PM