Quentin Nambot
12/30/2022, 3:23 PM
spark.hadoop.fs.s3a.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.endpoint s3.eu-west-1.amazonaws.com
spark.hadoop.fs.lakefs.secret.key ...
spark.hadoop.fs.lakefs.access.key ...
spark.hadoop.fs.lakefs.endpoint http://...:8000/api/v1
spark.hadoop.fs.lakefs.impl io.lakefs.LakeFSFileSystem
and io.lakefs:hadoop-lakefs-assembly:0.1.9 installed on the cluster
And when I try to write any data with Spark, I get the following error:
java.io.IOException: get object metadata using underlying wrapped s3 client
In the stacktrace I also see:
Caused by: java.lang.NoSuchMethodException: com.databricks.sql.acl.fs.CredentialScopeFileSystem.getWrappedFs()
I am wondering if Databricks changed something
(Note that using spark locally or python client on databricks I can upload objects so it seems really related to spark on databricks)
Tal Sofer
12/30/2022, 3:31 PM
Quentin Nambot
12/30/2022, 3:33 PM
import random
df = spark.createDataFrame(data=[(f"uid{i}", random.randint(1, 100)) for i in range(10)], schema=["user", "age"])
df.write.parquet("lakefs://datamarts/main/users/")
Tal Sofer
12/30/2022, 3:33 PM
Quentin Nambot
12/30/2022, 3:34 PM
s3a gateway fails too
Quentin Nambot
12/30/2022, 3:34 PM
Tal Sofer
12/30/2022, 3:35 PM
Tal Sofer
12/30/2022, 4:45 PM
Tal Sofer
01/02/2023, 9:17 AM
Quentin Nambot
01/02/2023, 9:36 AM
Quentin Nambot
01/02/2023, 9:53 AM
Quentin Nambot
01/02/2023, 9:53 AM
Tal Sofer
01/02/2023, 10:03 AM
Quentin Nambot
01/02/2023, 10:22 AM
fs.s3a.endpoint, and I still have the same issue (I can read, but I cannot write)
Tal Sofer
01/02/2023, 10:22 AM
Quentin Nambot
01/02/2023, 10:24 AM
s3:* on my Databricks cluster to be sure that it is not related to IAM rights..)
Quentin Nambot
01/02/2023, 10:53 AM
Tal Sofer
01/02/2023, 12:35 PM
Quentin Nambot
01/02/2023, 12:36 PM
Ariel Shaqed (Scolnicov)
01/02/2023, 3:32 PM
spark.hadoop.fs.s3a.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem in your configuration?
(This is the end of my longer comment on the issue.)
Quentin Nambot
01/02/2023, 3:34 PM
spark.hadoop.fs.s3a.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.endpoint s3.eu-west-1.amazonaws.com
spark.hadoop.fs.lakefs.secret.key ...
spark.hadoop.fs.lakefs.access.key ...
spark.hadoop.fs.lakefs.endpoint http://...:8000/api/v1
spark.hadoop.fs.lakefs.impl io.lakefs.LakeFSFileSystem
And my only lib installed: io.lakefs:hadoop-lakefs-assembly:0.1.9
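A quick way to double-check a configuration like the one above is to parse the `key value` lines and confirm that the four `fs.lakefs` keys quoted in this thread are present. This is only a minimal sketch: the helper names `parse_spark_conf` and `missing_lakefs_keys` are hypothetical, and it assumes the cluster Spark config is plain text with one space-separated pair per line.

```python
# Keys copied from the configuration quoted in this thread.
REQUIRED_KEYS = (
    "spark.hadoop.fs.lakefs.impl",
    "spark.hadoop.fs.lakefs.endpoint",
    "spark.hadoop.fs.lakefs.access.key",
    "spark.hadoop.fs.lakefs.secret.key",
)


def parse_spark_conf(text):
    """Parse 'key value' lines into a dict (hypothetical helper)."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first space only, so values may contain spaces.
        key, _, value = line.partition(" ")
        conf[key] = value.strip()
    return conf


def missing_lakefs_keys(conf):
    """Return the lakeFS keys that are absent or empty (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if not conf.get(key)]


# Deliberately incomplete example: the two credential keys are left out.
example = """
spark.hadoop.fs.s3a.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.lakefs.impl io.lakefs.LakeFSFileSystem
spark.hadoop.fs.lakefs.endpoint http://...:8000/api/v1
"""
conf = parse_spark_conf(example)
print(missing_lakefs_keys(conf))
```

A check like this only rules out a missing key; it says nothing about whether the values themselves are valid for the cluster.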
Ariel Shaqed (Scolnicov)
01/02/2023, 3:38 PM
com.databricks.sql.acl.fs.CredentialScopeFileSystem drops out. Did you run any Spark SQL code as part of your notebook or job?
Context for this strangeness: lakeFSFS needs to get the AWS S3 client used by the S3A filesystem in order to call getObjectMetadata on the S3 object that S3A generates. So it tries a variety of methods in io.lakefs.MetadataClient.getObjectMetadata. The second one is to call a nonpublic method, S3AFileSystem.getAmazonS3Client... but I am beginning to suspect that somehow in your case there's actually a different filesystem there!
Ariel Shaqed (Scolnicov)
01/02/2023, 3:44 PM
Ariel Shaqed (Scolnicov)
01/02/2023, 3:46 PM
Quentin Nambot
01/02/2023, 3:47 PM
# Databricks notebook source
# To test if I can connect to lakeFS (it works)
df = spark.read.parquet("lakefs://issue-us/main/users")
df.show()
# COMMAND ----------
import pyspark.sql.functions as f
df.withColumn("gender", f.lit("u")).write.mode("overwrite").parquet("lakefs://issue-us/main/users")
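The failure mode discussed earlier in the thread (a lookup that tries several accessors on the underlying filesystem and hits a NoSuchMethodException when the object is a different class than expected) can be illustrated in plain Python. This is not the lakeFS implementation: the classes below are hypothetical stand-ins, the accessor names are borrowed from the thread purely as labels, and their order is an assumption.

```python
class NoSuchMethodError(AttributeError):
    """Stand-in for Java's NoSuchMethodException in this sketch."""


def get_wrapped_client(fs, accessor_names):
    """Try each accessor by name, in order, and return the first result."""
    tried = []
    for name in accessor_names:
        accessor = getattr(fs, name, None)
        if callable(accessor):
            return accessor()
        tried.append(f"{type(fs).__name__}.{name}()")
    raise NoSuchMethodError("none of the accessors exist: " + ", ".join(tried))


class FakeS3AFileSystem:
    """Hypothetical filesystem that exposes the accessor the lookup expects."""

    def getAmazonS3Client(self):
        return "s3-client"


class FakeWrapperFileSystem:
    """Hypothetical wrapper class that exposes none of the accessors."""


# Names taken from the thread as labels; the ordering is an assumption.
ACCESSORS = ["getWrappedFs", "getAmazonS3Client"]

print(get_wrapped_client(FakeS3AFileSystem(), ACCESSORS))
try:
    get_wrapped_client(FakeWrapperFileSystem(), ACCESSORS)
except NoSuchMethodError as err:
    print("lookup failed:", err)
```

The point of the sketch is that a name-based lookup works only as long as the runtime class is the one the caller expects; if the platform substitutes a different filesystem class, every accessor can be missing even though reads through the normal API still succeed.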
Ariel Shaqed (Scolnicov)
01/02/2023, 3:48 PM
Quentin Nambot
01/02/2023, 3:48 PM
Quentin Nambot
01/02/2023, 3:48 PM
Ariel Shaqed (Scolnicov)
01/02/2023, 3:49 PM
Tal Sofer
01/02/2023, 3:57 PM
Quentin Nambot
01/02/2023, 3:59 PM
Ariel Shaqed (Scolnicov)
01/03/2023, 7:57 AM
Quentin Nambot
01/03/2023, 8:52 AM
Single User cluster have different libs underneath to use Unity Catalog.. it doesn't surprise me, it's not the first time that I have some magical issues like this with Databricks
Do not hesitate to ask me to test different things!
Ariel Shaqed (Scolnicov)
01/04/2023, 12:40 PM