
Zdenek Hruby

01/16/2023, 4:15 PM
Hi, I'm trying to get lakeFS working in Databricks but I'm hitting the error below:
Py4JJavaError: An error occurred while calling o407.parquet.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class io.lakefs.LakeFSFileSystem not found
I used the Hadoop FS setting as mentioned in the docs:
spark.hadoop.fs.lakefs.impl io.lakefs.LakeFSFileSystem
Please 🙏, any idea how to deal with that?

Eden Ohana

01/16/2023, 4:25 PM
Hi Zdenek, can you share your configuration?

Zdenek Hruby

01/16/2023, 4:27 PM
sc._jsc.hadoopConfiguration().set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
sc._jsc.hadoopConfiguration().set("fs.lakefs.endpoint", "http://XX.XXX.XXX.XXX:8000")

sc._jsc.hadoopConfiguration().set("fs.lakefs.access.key", "XXXXXXXXXXXXXXXXXXXX")
sc._jsc.hadoopConfiguration().set("fs.lakefs.secret.key", "XXXXXXXXXXXXXXXXXXXXXXXXX")


sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "XXXXXXXXXXXXXXXXXXXXXXXXX")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "XXXXXXXXXXXXXXXXXXXXXXXX")
Hi Eden, here it is. I got the same result when setting it directly in the cluster configuration.

Adi Polak

01/16/2023, 4:34 PM
Hi Zdenek, did you install the lakeFS file system library on the cluster? It is a JAR you download from the Maven repository.
👍 1

Zdenek Hruby

01/16/2023, 5:34 PM
Hi Adi, yes, I did.
But I had to upload the JAR file manually and then install it. It was not possible through the library installer in the UI 🤷‍♂️

Adi Polak

01/16/2023, 6:19 PM
does your endpoint look like this:
spark.hadoop.fs.lakefs.endpoint=https://lakefs.example.com/api/v1
^ could it be that you forgot the
/api/v1
this is a very good tutorial. It configures the cluster from the cluster UI, yet it might be very useful to you. I would try removing the port (8000) from the endpoint, as I don't recall it being necessary (I might be wrong here)
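A minimal sketch of the endpoint check discussed above, assuming the only requirement is that the value ends in `/api/v1`. The helper name `normalize_lakefs_endpoint` is hypothetical and not part of any lakeFS library; it leaves the host and port untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_lakefs_endpoint(url: str, api_path: str = "/api/v1") -> str:
    """Return `url` with `api_path` appended if it is missing.

    Hypothetical helper: it only tidies the string that would be placed in
    fs.lakefs.endpoint and does not contact the server.
    """
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {url!r}")
    path = parts.path.rstrip("/")  # drop any trailing slash first
    if not path.endswith(api_path):
        path += api_path
    # Query string and fragment are discarded on purpose.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


print(normalize_lakefs_endpoint("http://10.162.160.196:8000"))
```

The result could then be passed to `sc._jsc.hadoopConfiguration().set("fs.lakefs.endpoint", ...)` in a notebook.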

Zdenek Hruby

01/16/2023, 6:41 PM
The cluster should be fine, because lakefs_client works. I fixed the endpoint as you suggested and am now getting a different kind of error:
Py4JJavaError: An error occurred while calling o408.parquet.
: java.lang.RuntimeException: unsupported URI scheme https, lakeFS FileSystem currently supports translating s3 => s3a only

Adi Polak

01/16/2023, 6:55 PM
yay! progress 💪 . what does the configuration look like now?
😁 1

Zdenek Hruby

01/16/2023, 7:05 PM
spark.hadoop.fs.azure.account.key.lakefstest.dfs.core.windows.net {{secrets/lakefs/lakefs-storage-sk}}
spark.hadoop.fs.lakefs.secret.key XXXXX
spark.hadoop.fs.lakefs.access.key XXXXX
spark.hadoop.fs.lakefs.endpoint http://10.162.160.196:8000/api/v1
spark.hadoop.fs.lakefs.impl io.lakefs.LakeFSFileSystem
spark.databricks.delta.preview.enabled true
This is from the cluster configuration. The account key works for the client, and so do the access key and secret 🤷‍♂️

Adi Polak

01/16/2023, 7:07 PM
http -> https
also, please remove the credentials 🙂 (I mean from the Slack thread)

Zdenek Hruby

01/16/2023, 7:11 PM
The credentials are fake, only the format is the same 🙂. But I'll remove them to be safe.
a

Adi Polak

01/16/2023, 7:11 PM
oh so no worries!
i wonder if the https will fix it

Zdenek Hruby

01/16/2023, 7:41 PM
no, it doesn't 😢. And the http get:
{
  "message": "invalid API endpoint"
}

Eden Ohana

01/16/2023, 7:46 PM
What command are you running when getting the error?

Zdenek Hruby

01/16/2023, 7:47 PM
spark.read.parquet("lakefs://dbxdata/main/gendata.parquet")
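For reference, the path in that command follows the `lakefs://<repository>/<ref>/<path>` layout. The sketch below splits such a URI into its parts, which can help confirm that the repository and branch names are the intended ones. `LakeFSPath` and `parse_lakefs_uri` are hypothetical names used for illustration, not lakeFS APIs.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class LakeFSPath(NamedTuple):
    repo: str  # repository name, e.g. "dbxdata"
    ref: str   # branch, tag, or commit, e.g. "main"
    path: str  # object path inside the ref


def parse_lakefs_uri(uri: str) -> LakeFSPath:
    """Split lakefs://<repo>/<ref>/<path> into its three components."""
    parts = urlsplit(uri)
    if parts.scheme != "lakefs" or not parts.netloc:
        raise ValueError(f"not a lakefs:// URI: {uri!r}")
    ref, _, path = parts.path.lstrip("/").partition("/")
    if not ref:
        raise ValueError(f"missing ref in {uri!r}")
    return LakeFSPath(parts.netloc, ref, path)


print(parse_lakefs_uri("lakefs://dbxdata/main/gendata.parquet"))
```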

Eden Ohana

01/16/2023, 7:55 PM
try to add to the configuration
spark.hadoop.fs.s3a.access.key
spark.hadoop.fs.s3a.secret.key
(the access key and secret key for your bucket)

Zdenek Hruby

01/16/2023, 8:14 PM

Oz Katz

01/16/2023, 8:35 PM
@Zdenek Hruby hi! Is this lakeFS installation using Azure Blob as the underlying object store? Currently the lakeFS HadoopFilesystem integration is only supported for AWS S3-based installations. If you are on Azure, you can use the S3 gateway-based integration (yes, on Azure!)
btw, Azure support for the native HadoopFilesystem is on the roadmap
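A rough sketch of what the S3 gateway route might look like, under these assumptions: Spark's S3A connector is pointed at the lakeFS server itself (without the `/api/v1` suffix), the S3A access key and secret key are the lakeFS credentials rather than cloud storage keys, and path-style access is enabled. The helper names are hypothetical; the property names are standard Hadoop S3A settings. Check the lakeFS Spark documentation before relying on it.

```python
def s3_gateway_spark_conf(lakefs_url: str, access_key: str, secret_key: str) -> dict:
    """Build Spark properties that route s3a:// traffic to a lakeFS S3 gateway.

    Assumption: `lakefs_url` is the lakeFS server address, e.g.
    "https://lakefs.example.com", with no /api/v1 suffix.
    """
    return {
        "spark.hadoop.fs.s3a.endpoint": lakefs_url.rstrip("/"),
        "spark.hadoop.fs.s3a.access.key": access_key,  # lakeFS access key
        "spark.hadoop.fs.s3a.secret.key": secret_key,  # lakeFS secret key
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


def s3a_path(repo: str, ref: str, path: str) -> str:
    """The repository takes the place of the bucket; the ref is the first path segment."""
    return f"s3a://{repo}/{ref}/{path.lstrip('/')}"


conf = s3_gateway_spark_conf("https://lakefs.example.com/", "XXXXX", "XXXXX")
for key, value in conf.items():
    print(key, value)
print(s3a_path("dbxdata", "main", "gendata.parquet"))
```

With these properties set on the cluster, the read from the thread would become `spark.read.parquet("s3a://dbxdata/main/gendata.parquet")`, and no `fs.lakefs.*` settings or lakeFS JAR would be involved.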

Zdenek Hruby

01/16/2023, 8:44 PM
Hi Oz. Thanks for the explanation 👍
🙏 1