# help
z
Hi, I'm trying to make lakeFS work in Databricks but am encountering the error below:
Copy code
Py4JJavaError: An error occurred while calling o407.parquet.
: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class io.lakefs.LakeFSFileSystem not found
I used the Hadoop FS settings as mentioned in the docs:
Copy code
spark.hadoop.fs.lakefs.impl io.lakefs.LakeFSFileSystem
Please 🙏, any idea how to deal with that?
e
Hi Zdenek, can you share your configuration?
z
Copy code
sc._jsc.hadoopConfiguration().set("fs.lakefs.impl", "io.lakefs.LakeFSFileSystem")
sc._jsc.hadoopConfiguration().set("fs.lakefs.endpoint", "<http://XX.XXX.XXX.XXX:8000>")

sc._jsc.hadoopConfiguration().set("fs.lakefs.access.key", "XXXXXXXXXXXXXXXXXXXX")
sc._jsc.hadoopConfiguration().set("fs.lakefs.secret.key", "XXXXXXXXXXXXXXXXXXXXXXXXX")


sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "XXXXXXXXXXXXXXXXXXXXXXXXX")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "XXXXXXXXXXXXXXXXXXXXXXXX")
Hi Eden, here it is. I got the same result when setting it in the cluster configuration directly.
a
Hi Zdenek, did you install the lakeFS file system library on the cluster? This is a jar you download from the Maven repository.
👍 1
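For anyone hitting the same ClassNotFoundException: the jar Adi mentions is the lakeFS Hadoop filesystem assembly. A sketch of what to install on the cluster (the coordinate is an assumption; check Maven Central for the exact artifact and current version):
Copy code
io.lakefs:hadoop-lakefs-assembly:<version>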
z
Hi Adi, yes, I did.
but I had to upload the jar file and then install it; it wasn't possible via the installer in the UI 🤷‍♂️
a
does your endpoint look like this:
Copy code
spark.hadoop.fs.lakefs.endpoint=<https://lakefs.example.com/api/v1>
^ could it be that you forgot the /api/v1?
This is a very good tutorial; it configures the cluster from the cluster UI, yet it might still be very useful to you. I would also try removing the port (8000) from the endpoint, as I don't recall it being necessary (I might be wrong here).
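In the notebook-style configuration shared above, the fix would look roughly like this (a sketch; the host, port, and scheme are placeholders for wherever your lakeFS server actually listens):
Copy code
# The endpoint must point at the lakeFS API root, i.e. include the /api/v1 suffix.
sc._jsc.hadoopConfiguration().set("fs.lakefs.endpoint", "http://XX.XXX.XXX.XXX:8000/api/v1")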
z
The cluster should work, because lakefs_client works fine. I fixed the endpoint as you suggested and am now getting a different kind of error:
Copy code
Py4JJavaError: An error occurred while calling o408.parquet.
: java.lang.RuntimeException: unsupported URI scheme https, lakeFS FileSystem currently supports translating s3 => s3a only
a
Yay, progress 💪. What does the configuration look like now?
😁 1
z
Copy code
<http://spark.hadoop.fs.azure.account.key.lakefstest.dfs.core.windows.net|spark.hadoop.fs.azure.account.key.lakefstest.dfs.core.windows.net> {{secrets/lakefs/lakefs-storage-sk}}
spark.hadoop.fs.lakefs.secret.key XXXXX
spark.hadoop.fs.lakefs.access.key XXXXX
spark.hadoop.fs.lakefs.endpoint <http://10.162.160.196:8000/api/v1>
spark.hadoop.fs.lakefs.impl io.lakefs.LakeFSFileSystem
spark.databricks.delta.preview.enabled true
This is from the cluster configuration. The account key works for the client, and so do the key and secret 🤷‍♂️
a
Copy code
http -> https
also, please remove the credentials 🙂 (I mean from Slack)
z
The credentials are fake, just the format is the same 🙂. But I'll remove them for sure.
a
Oh, so no worries!
I wonder if the https will fix it.
z
No, it doesn't 😢. And with http I get:
Copy code
{
  "message": "invalid API endpoint"
}
e
What command are you running when getting the error?
z
Copy code
spark.read.parquet("<lakefs://dbxdata/main/gendata.parquet>")
e
try adding these to the configuration:
Copy code
spark.hadoop.fs.s3a.access.key
spark.hadoop.fs.s3a.secret.key
i.e. the access key and secret key for your bucket
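In the same notebook-style configuration, that would be roughly this (a sketch; the values are placeholders for the credentials of the underlying storage bucket):
Copy code
# Credentials for the underlying object store; the lakeFS filesystem uses them
# when it translates lakefs:// paths into direct s3a:// reads and writes.
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "XXXXXXXXXXXXXXXXXXXX")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "XXXXXXXXXXXXXXXXXXXXXXXXX")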
o
@Zdenek Hruby hi! Is this lakeFS installation using Azure Blob as the underlying object store? Currently the lakeFS HadoopFileSystem integration is only supported for AWS S3-based installations. If you are on Azure, you can use the S3-gateway-based integration (yes, on Azure!)
btw, Azure support for the native HadoopFileSystem is on the roadmap
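A rough sketch of the S3-gateway-based integration Oz describes, reusing the host and repository from this thread as placeholders (the exact properties, especially path-style access, are assumptions to verify against the lakeFS docs):
Copy code
# Point s3a at the lakeFS server itself (its S3 gateway), not at /api/v1,
# and authenticate with the lakeFS access/secret key pair.
sc._jsc.hadoopConfiguration().set("fs.s3a.endpoint", "http://10.162.160.196:8000")
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", "XXXXXXXXXXXXXXXXXXXX")
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", "XXXXXXXXXXXXXXXXXXXXXXXXX")
sc._jsc.hadoopConfiguration().set("fs.s3a.path.style.access", "true")

# Paths are then addressed as s3a://<repository>/<branch>/<object path>.
df = spark.read.parquet("s3a://dbxdata/main/gendata.parquet")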
z
Hi Oz. Thanks for the explanation 👍
🙏 1