Clinton Monk — 03/30/2022, 9:06 PM
Running a command against a lakefs:// URI fails with:
Caused by: java.lang.NoSuchMethodException: com.databricks.s3a.S3AFileSystem.getWrappedFs()
The command fails when running in Databricks Runtimes 7.3 LTS and 9.1 LTS. However, the command succeeds in Databricks Runtime 6.4. Reading a lakefs:// URI works in all three of those Databricks Runtimes.
Is this an issue anyone else has experienced? Does anyone have guidance on how I could get this to work for Databricks Runtime 9.1 LTS (Spark 3.1)?

Clinton Monk — 03/30/2022, 9:08 PM
s3://treeverse-clients-us-east/hadoop/hadoop-lakefs-assembly-0.1.6.jar
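For context, hadoop-lakefs-assembly is the lakeFS Hadoop filesystem client. A typical way to wire it into a Spark cluster is configuration along these lines (a sketch, not taken from this thread: the key names follow the lakeFS docs for the 0.1.x client, and the endpoint and credential values are placeholders):

```
# Sketch: Spark configuration for the lakeFS Hadoop filesystem client.
# Key names per the lakeFS 0.1.x docs; endpoint and credentials are placeholders.
spark.hadoop.fs.lakefs.impl        io.lakefs.LakeFSFileSystem
spark.hadoop.fs.lakefs.access.key  <lakefs-access-key-id>
spark.hadoop.fs.lakefs.secret.key  <lakefs-secret-access-key>
spark.hadoop.fs.lakefs.endpoint    https://<lakefs-server>/api/v1
```

With this in place, lakefs:// paths are routed to io.lakefs.LakeFSFileSystem, which in turn calls into the underlying S3 filesystem class — which is where the reflection on the Databricks S3A class below comes in.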

Edmondo Porcu — 03/30/2022, 9:19 PM
Class.forName("com.databricks.s3a.S3AFileSystem").getClassLoader()

Edmondo Porcu — 03/30/2022, 9:20 PM
val s3AFileSystemClass = Class.forName("com.databricks.s3a.S3AFileSystem")
s3AFileSystemClass.getProtectionDomain().getCodeSource().getLocation()

Edmondo Porcu — 03/30/2022, 9:20 PM
res1: java.net.URL = file:/databricks/jars/s3--s3-spark_3.2_2.12_deploy.jar

Edmondo Porcu — 03/30/2022, 9:23 PM
Caused by: java.lang.NoSuchMethodException: com.databricks.s3a.S3AFileSystem.getWrappedFs()
	at java.lang.Class.getDeclaredMethod(Class.java:2130)
	at io.lakefs.MetadataClient.getObjectMetadata(MetadataClient.java:72)
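That stack trace shows io.lakefs.MetadataClient reflectively looking up a getWrappedFs() method via Class.getDeclaredMethod, which throws NoSuchMethodException when the class no longer declares that method. A minimal sketch of this failure mode, using java.util.ArrayList as a stand-in class (the Databricks class isn't available outside a Databricks Runtime):

```scala
// Sketch: Class.getDeclaredMethod throws NoSuchMethodException when the
// requested method is not declared on the class -- the same failure path as
// MetadataClient reflecting getWrappedFs() on com.databricks.s3a.S3AFileSystem.
// java.util.ArrayList is a stand-in class; "getWrappedFs" is the missing method.
val cls = Class.forName("java.util.ArrayList")

val hasWrappedFs =
  try {
    cls.getDeclaredMethod("getWrappedFs") // not declared here -> throws
    true
  } catch {
    case _: NoSuchMethodException => false
  }

println(hasWrappedFs) // false -- mirroring the failure on DBR 7.3 LTS / 9.1 LTS
```

So the error means the class that handles the scheme on DBR 7.3/9.1 simply no longer has the method the lakeFS client expects, not that the lakeFS jar is missing.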

Clinton Monk — 03/30/2022, 9:26 PM
From the Databricks Runtime 7.x migration guide: "org.apache.hadoop.fs.s3native.NativeS3FileSystem and org.apache.hadoop.fs.s3.S3FileSystem are no longer supported for accessing S3. We strongly encourage you to use com.databricks.s3a.S3AFileSystem, which is the default for s3a://, s3://, and s3n:// file system schemes in Databricks Runtime. If you need assistance with migration to com.databricks.s3a.S3AFileSystem, contact Databricks support or your Databricks representative."
https://docs.databricks.com/release-notes/runtime/7.x-migration.html

Clinton Monk — 03/31/2022, 12:50 PM
I tried using org.apache.hadoop.fs.s3.S3FileSystem rather than the LakeFS-incompatible com.databricks.s3a.S3AFileSystem. I had to install hadoop-aws to add it back. However, I ran into a class incompatibility issue: the Hadoop S3FileSystem uses the public jets3t library, but Databricks has their own jets3t library installed in the cluster, and classes from one didn't seem to be compatible with the other (or perhaps I had a library version mismatch somewhere). I didn't go much further, but it feels like the solution is probably to update MetadataClient.getObjectMetadata() to be compatible with the new com.databricks.s3a.S3AFileSystem. I'll leave that decision up to you all though!

Yoni Augarten — 04/01/2022, 8:56 AM
1. In my tests, shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem was used as the file system for s3a. This is consistent across different Databricks Runtime versions. Only when I manually set the file system to com.databricks.s3a.S3AFileSystem am I able to reproduce the problem.
2. We will update the code to support the Databricks proprietary file system.
3. I will contact Databricks support to understand why we are getting different file systems.

Clinton Monk — 04/01/2022, 1:53 PM
fs.s3a.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
Thanks @Yoni Augarten and @Barak Amar! 🙌 🍾
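The setting in that last message would typically be applied as cluster-level Spark configuration; a sketch, assuming the Databricks spark.hadoop. prefix is used to pass the key through to the Hadoop configuration:

```
# Sketch: Databricks cluster Spark config pinning the s3a:// scheme to the
# shaded S3A implementation instead of com.databricks.s3a.S3AFileSystem.
spark.hadoop.fs.s3a.impl shaded.databricks.org.apache.hadoop.fs.s3a.S3AFileSystem
```

This forces every s3a:// access (including the underlying storage access made by the lakeFS client) through the shaded class that, per the tests above, does not exhibit the getWrappedFs() problem.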