# help
z
Hi, I'm trying to make lakeFS work in Databricks but am encountering the error below:
Copy code
Py4JJavaError: An error occurred while calling o407.parquet.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class io.lakefs.LakeFSFileSystem not found
I used the Hadoop FS settings as mentioned in the docs:
Copy code
spark.hadoop.fs.lakefs.impl io.lakefs.LakeFSFileSystem
Please 🙏, any idea how to deal with that?
e
Hi Zdenek, can you share your configuration?
z
Copy code
sc._jsc.hadoopConfiguration().set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
sc._jsc.hadoopConfiguration().set("fs.lakefs.endpoint", "<http://XX.XXX.XXX.XXX:8000>")

sc._jsc.hadoopConfiguration().set("fs.lakefs.access.key", "XXXXXXXXXXXXXXXXXXXX")
sc._jsc.hadoopConfiguration().set("fs.lakefs.secret.key", "XXXXXXXXXXXXXXXXXXXXXXXXX")


sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "XXXXXXXXXXXXXXXXXXXXXXXXX")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "XXXXXXXXXXXXXXXXXXXXXXXX")
Hi Eden, here it is. I got the same result when setting it in the cluster configuration directly.
a
Hi Zdenek, did you install the lakeFS file system library on the cluster? This is a jar you download from the Maven repository.
👍 1
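For anyone hitting the same ClassNotFoundException: the jar Adi mentions is the lakeFS Hadoop filesystem assembly. A sketch of what to install on the cluster (the coordinate is an assumption; check Maven Central for the exact artifact and current version):
Copy code
io.lakefs:hadoop-lakefs-assembly:<version>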
z
Hi Adi, yes, I did.
but I had to upload the jar file and then install it; it wasn't possible via the installer in the UI 🤷‍♂️
a
does your endpoint look like this:
Copy code
spark.hadoop.fs.lakefs.endpoint=<https://lakefs.example.com/api/v1>
^ could it be that you forgot the /api/v1?
This is a very good tutorial; it configures the cluster from the cluster UI, yet it might still be very useful to you. I would also try removing the port (8000) from the endpoint, as I don't recall it being necessary (I might be wrong here).
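In the notebook-style configuration shared above, the fix would look roughly like this (a sketch; the host, port, and scheme are placeholders for wherever your lakeFS server actually listens):
Copy code
# The endpoint must point at the lakeFS API root, i.e. include the /api/v1 suffix.
sc._jsc.hadoopConfiguration().set("fs.lakefs.endpoint", "http://XX.XXX.XXX.XXX:8000/api/v1")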
z
The cluster should work, because lakefs_client works fine. I fixed the endpoint as you suggested and am now getting a different kind of error:
Copy code
Py4JJavaError: An error occurred while calling o408.parquet.
: java.lang.RuntimeException: unsupported URI scheme https, lakeFS FileSystem currently supports translating s3 => s3a only
a
Yay, progress 💪. What does the configuration look like now?
😁 1
z
Copy code
<http://spark.hadoop.fs.azure.account.key.lakefstest.dfs.core.windows.net|spark.hadoop.fs.azure.account.key.lakefstest.dfs.core.windows.net> {{secrets/lakefs/lakefs-storage-sk}}
spark.hadoop.fs.lakefs.secret.key XXXXX
spark.hadoop.fs.lakefs.access.key XXXXX
spark.hadoop.fs.lakefs.endpoint <http://10.162.160.196:8000/api/v1>
spark.hadoop.fs.lakefs.impl io.lakefs.LakeFSFileSystem
spark.databricks.delta.preview.enabled true
This is from the cluster configuration. The account key works for the client, and so do the key and secret 🤷‍♂️
a
Copy code
http -> https
also, please remove the credentials 🙂 (I mean from Slack)
z
The credentials are fake, just the format is the same 🙂. But I'll remove them for sure.
a
Oh, so no worries!
I wonder if the https will fix it.
z
No, it doesn't 😢. And with http I get:
Copy code
{
  "message": "invalid API endpoint"
}
e
What command are you running when getting the error?
z
Copy code
spark.read.parquet("<lakefs://dbxdata/main/gendata.parquet>")
e
try adding these to the configuration:
Copy code
spark.hadoop.fs.s3a.access.key
spark.hadoop.fs.s3a.secret.key
i.e. the access key and secret key for your bucket
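In the same notebook-style configuration, that would be roughly this (a sketch; the values are placeholders for the credentials of the underlying storage bucket):
Copy code
# Credentials for the underlying object store; the lakeFS filesystem uses them
# when it translates lakefs:// paths into direct s3a:// reads and writes.
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "XXXXXXXXXXXXXXXXXXXX")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "XXXXXXXXXXXXXXXXXXXXXXXXX")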
o
@Zdenek Hruby hi! Is this lakeFS installation using Azure Blob as the underlying object store? Currently the lakeFS HadoopFileSystem integration is only supported for AWS S3-based installations. If you are on Azure, you can use the S3-gateway-based integration (yes, on Azure!)
btw, Azure support for the native HadoopFileSystem is on the roadmap
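A rough sketch of the S3-gateway-based integration Oz describes, reusing the host and repository from this thread as placeholders (the exact properties, especially path-style access, are assumptions to verify against the lakeFS docs):
Copy code
# Point s3a at the lakeFS server itself (its S3 gateway), not at /api/v1,
# and authenticate with the lakeFS access/secret key pair.
sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "http://10.162.160.196:8000")
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "XXXXXXXXXXXXXXXXXXXX")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "XXXXXXXXXXXXXXXXXXXXXXXXX")
sc._jsc.hadoopConfiguration().set("fs.s3a.path.style.access", "true")

# Paths are then addressed as s3a://<repository>/<branch>/<object path>.
df = spark.read.parquet("s3a://dbxdata/main/gendata.parquet")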
z
Hi Oz. Thanks for the explanation 👍
🙏 1