# help

Quentin Nambot

12/30/2022, 3:23 PM
Hi, does anyone have issues with Databricks since today? I cannot write dataframes to lakeFS using Databricks today, but yesterday it worked fine and I did not change anything. I am using the latest runtime (12.0) and this Spark config:
```
spark.hadoop.fs.s3a.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.endpoint s3.eu-west-1.amazonaws.com
spark.hadoop.fs.lakefs.secret.key ...
spark.hadoop.fs.lakefs.access.key ...
spark.hadoop.fs.lakefs.endpoint http://...:8000/api/v1
spark.hadoop.fs.lakefs.impl io.lakefs.LakeFSFileSystem
```
with `io.lakefs:hadoop-lakefs-assembly:0.1.9` installed on the cluster. When I try to write any data with Spark I get the following error:
```
java.io.IOException: get object metadata using underlying wrapped s3 client
```
In the stack trace I also see:
```
Caused by: java.lang.NoSuchMethodException: com.databricks.sql.acl.fs.CredentialScopeFileSystem.getWrappedFs()
```
I am wondering if Databricks changed something 🤔 (Note that using Spark locally, or the Python client on Databricks, I can upload objects, so it seems really related to Spark on Databricks.)

Tal Sofer

12/30/2022, 3:31 PM
Hi @Quentin Nambot! Let me try to reproduce and get back to you 🙂 It sounds like Databricks potentially changed something. In the meantime, do you mind pasting the command you are running?

Quentin Nambot

12/30/2022, 3:33 PM
I tried a lot of different things, but basically writing a simple dataframe as parquet/csv fails, like:
```python
import random

df = spark.createDataFrame(
    data=[(f"uid{i}", random.randint(1, 100)) for i in range(10)],
    schema=["user", "age"],
)
df.write.parquet("lakefs://datamarts/main/users/")
```

Tal Sofer

12/30/2022, 3:33 PM
Thanks 🙂

Quentin Nambot

12/30/2022, 3:34 PM
I am trying to see if using the `s3a` gateway fails too.
And it seems to work with the S3 gateway 🤔
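For context, writing through the lakeFS S3 gateway (rather than through the lakeFS Hadoop filesystem) means pointing `fs.s3a.endpoint` at the lakeFS server itself and using `s3a://` URIs with the same repository/branch/path layout. A minimal sketch of that configuration, with placeholder endpoint and credentials (the real values depend on the deployment):
```
spark.hadoop.fs.s3a.endpoint http://<lakefs-server>:8000
spark.hadoop.fs.s3a.access.key <lakefs-access-key>
spark.hadoop.fs.s3a.secret.key <lakefs-secret-key>
spark.hadoop.fs.s3a.path.style.access true
```
and then writing with `df.write.parquet("s3a://datamarts/main/users/")`.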

Tal Sofer

12/30/2022, 3:35 PM
Will try to run it myself and let you know what I find! 🙂
Hi @Quentin Nambot! I managed to use the lakeFS filesystem to write with DBR 12.0 without running into any error. The only difference in my setup is that I'm using the default fs.s3a.endpoint because my data is in us-east-1, and I see that you are using s3.eu-west-1.amazonaws.com. I will open an issue for it and try to reproduce in this setup 🙂 Do you mind sharing the full stack trace?
Opened this issue to track the problem: https://github.com/treeverse/lakeFS/issues/4923

Quentin Nambot

01/02/2023, 9:36 AM
Thank you! I will try with a US bucket, and I'll send you the full stack trace very soon.
Here is the full stack trace:

Tal Sofer

01/02/2023, 10:03 AM
Thank you!

Quentin Nambot

01/02/2023, 10:22 AM
I tried using a US bucket, without overriding `fs.s3a.endpoint`, and I still have the same issue 🤔 (I can read, but I cannot write.)

Tal Sofer

01/02/2023, 10:22 AM
Thanks for the update! We are looking into it.

Quentin Nambot

01/02/2023, 10:24 AM
(Note that I attached an `s3:*` policy to my Databricks cluster to be sure that it is not related to IAM rights.)
😮 I found something interesting: with a No Isolation Shared cluster it works, but it doesn't work with a Single User cluster (same Spark config, same notebook).

Tal Sofer

01/02/2023, 12:35 PM
Thanks for sharing your findings, that helps us move forward 🙂 On your end, are you OK with using the No Isolation Shared cluster access mode for now?

Quentin Nambot

01/02/2023, 12:36 PM
Yes, totally 👌

Ariel Shaqed (Scolnicov)

01/02/2023, 3:32 PM
@Quentin Nambot, just to make sure: do you also have
```
spark.hadoop.fs.s3a.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
```
in your configuration? (This is the end of my longer comment on the issue.)

Quentin Nambot

01/02/2023, 3:34 PM
Yes. (Sorry, I forgot to send the message on the issue.) My complete Spark configuration is:
```
spark.hadoop.fs.s3a.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3a.endpoint s3.eu-west-1.amazonaws.com
spark.hadoop.fs.lakefs.secret.key ...
spark.hadoop.fs.lakefs.access.key ...
spark.hadoop.fs.lakefs.endpoint http://...:8000/api/v1
spark.hadoop.fs.lakefs.impl io.lakefs.LakeFSFileSystem
```
And my only installed library is `io.lakefs:hadoop-lakefs-assembly:0.1.9`.

Ariel Shaqed (Scolnicov)

01/02/2023, 3:38 PM
Strange. I'll try to figure out where that `com.databricks.sql.acl.fs.CredentialScopeFileSystem` drops out from. Did you run any Spark-SQL code as part of your notebook or job? Context for this strangeness: lakeFSFS needs to get the AWS S3 client used by the S3A filesystem in order to be able to call getObjectMetadata on the S3 object that S3A generates. So it tries a variety of methods in `io.lakefs.MetadataClient.getObjectMetadata`. The second of these is to call a nonpublic method, `S3AFileSystem.getAmazonS3Client`... but I am beginning to suspect that in your case there's actually a different filesystem there!
I think I know how to steal the AWS S3 client even through this one, but until I manage to reproduce the failure... it will be hard to test the fix.
Ah! How do you authenticate to S3? That's different!

Quentin Nambot

01/02/2023, 3:47 PM
No, nothing; my entire notebook is:

Ariel Shaqed (Scolnicov)

01/02/2023, 3:48 PM
That is strange: lakeFSFS writes directly to S3 through the S3AFileSystem, and that needs to authenticate to S3, AFAICT.

Quentin Nambot

01/02/2023, 3:48 PM
🤔 A Single User cluster can access Databricks Unity Catalog, so it is possible that the library is different there.
So it is possible that there is a different filesystem.

Ariel Shaqed (Scolnicov)

01/02/2023, 3:49 PM
Oooh, @Tal Sofer maybe we don't support Databricks Unity? Sounds like the "un" in "fun". 😕

Tal Sofer

01/02/2023, 3:57 PM
Interesting! We don't yet support Databricks Unity. But @Quentin Nambot, maybe you are using an instance profile to authenticate to S3?

Quentin Nambot

01/02/2023, 3:59 PM
Ok, so that's why! Yes, I am using an instance profile on my cluster.

Ariel Shaqed (Scolnicov)

01/03/2023, 7:57 AM
Still not sure which of the two it is. A quick search for the filesystem in the error yields this SO answer (your bug report is the only other hit!), which also says "Unity Catalog". Right now I have a plan, but I would like to reproduce first. Sorry, it will take me a while because of some unrelated work. If I cannot reproduce by Wednesday I might have to trouble you with testing solutions. I would rather not do that because it may take multiple attempts: the solution involves Java reflection, so type-checking for any silly bugs I make will only happen at runtime.

Quentin Nambot

01/03/2023, 8:52 AM
It seems like Single User clusters have different libs underneath to use Unity Catalog. It doesn't surprise me; it's not the first time I have had magical issues like this with Databricks.

Ariel Shaqed (Scolnicov)

01/04/2023, 12:40 PM
Thanks for your offer! Right now we suspect it is related to using Unity Catalog, possibly in connection with using a Single User cluster. I've updated the issue accordingly with short-term and medium-term proposals to fix it. We hope to have access to Unity Catalog soon, which will allow us to continue the work. I prefer to do it directly because of the number of dependencies involved. A principal issue is that all the relevant code uses reflection to access JVM code for which we do not have a defined interface, and that uses classes that are not available at compile time. So there is no type-checking, and trivial errors are likely. Having our own cluster will shorten the loop, save time, and be less annoying for everyone.
๐Ÿ‘ 1
27 Views