# help
r
Another one, if anyone is able to help: we're trying to use export and hitting:
```scala
exporter.exportAllFromBranch("master")
```
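For context, a minimal sketch of how that `exporter` is typically constructed with the lakeFS Spark metadata client. The endpoint, credentials, repository name, and destination URI below are placeholders, and the exact constructor arguments are an assumption based on the client's export docs rather than anything shown in this thread:

```scala
import io.treeverse.clients.{ApiClient, Exporter}

// Placeholder values -- assumptions for illustration, not taken from this thread.
val apiClient = new ApiClient(
  "https://<LAKEFS_ENDPOINT>/api/v1",
  "<LAKEFS_ACCESS_KEY_ID>",
  "<LAKEFS_SECRET_ACCESS_KEY>")

val exporter = new Exporter(spark, apiClient, "example-repo", "s3a://example-bucket/export/")

// The call that triggers the stack trace below.
exporter.exportAllFromBranch("master")
```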
```
IllegalAccessError: tried to access field org.rocksdb.RocksObject.nativeHandle_ from class org.rocksdb.SstFileReader
	at org.rocksdb.SstFileReader.<init>(SstFileReader.java:14)
	at io.treeverse.clients.SSTableReader.<init>(SSTableReader.scala:90)
	at io.treeverse.clients.SSTableReader$.forMetaRange(SSTableReader.scala:72)
	at io.treeverse.clients.LakeFSInputFormat.getSplits(LakeFSInputFormat.scala:110)
	at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:137)
	at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:283)
	at scala.Option.getOrElse(Option.scala:189)
```
b
Which Spark version are you using?
Also, any environment information? Are you using a cloud solution, and if so, which one?
r
Databricks, Spark 3.0.1
b
Richard, there is a bug we are working on to resolve this issue on Databricks: https://github.com/treeverse/lakeFS/issues/1847
This is something we can probably solve in the following week or two - will verify tomorrow morning with the team.
👍 1
a
I'm working on this bug. It's unfortunately non-trivial: Databricks comes preloaded with a rocksdbjni version from 2 years ago that does not contain the functionality needed. And it's a JNI library, so it's not possible to shade it in the assembly. @Barak Amar is right and I'm all on it. Lots more details on the issue (or just wait).
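If it helps to confirm this on a given cluster, here is a small diagnostic sketch (Scala, for a notebook or spark-shell). It only reports which jar supplied each `org.rocksdb` class, so you can see whether the preloaded rocksdbjni is winning over the one bundled with the metadata client; it does not fix anything:

```scala
// Diagnostic sketch: report the jar that actually supplied each org.rocksdb class
// on the driver. An old preloaded jar winning here lines up with the
// IllegalAccessError in the stack trace above.
def loadedFrom(className: String): String =
  try {
    val src = Class.forName(className).getProtectionDomain.getCodeSource
    if (src != null) src.getLocation.toString else "unknown source"
  } catch {
    case _: ClassNotFoundException => "not on the classpath"
  }

println(s"org.rocksdb.RocksObject   -> ${loadedFrom("org.rocksdb.RocksObject")}")
println(s"org.rocksdb.SstFileReader -> ${loadedFrom("org.rocksdb.SstFileReader")}")
```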
a
We can reach out to our contacts at Databricks to see if there is a specific reason they need to use an older version of rocksdbjni, if you like?
b
Thanks @Anders Cassidy, there is more information in https://github.com/treeverse/lakeFS/issues/1847. Even if they switch to a newer version, another issue can come up when one of us wants to change the version. Not having control over the specific version can raise a number of challenges in the future.
👍 1
a
Thanks, @Anders Cassidy! Please do. While, as @Barak Amar explains, our analysis is that it will not resolve the support issue for customers of the metadata client on Databricks, we would still get some benefits, including:
1. Support for you, and a possible route for others.
2. Partial support for all customers: it seems sufficient in most cases for conflicting versions of rocksdbjni to have some API compatibility in order to work, so a Spark cluster with a more modern version might end up working merely by our code creating and using an SstFileReader. The issue with 6.2.x releases is that they don't expose any API to read a single SSTable.
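To make point 2 concrete, here is a minimal sketch of the `SstFileReader` API that exists in recent rocksdbjni releases, the capability missing from the 6.2.x line. The .sst path is a placeholder; this illustrates the upstream rocksdbjni API, not lakeFS code:

```scala
import org.rocksdb.{Options, ReadOptions, RocksDB, SstFileReader}

RocksDB.loadLibrary()

// Read a single SSTable directly -- the part 6.2.x releases don't expose.
val options = new Options()
val readOptions = new ReadOptions()
val reader = new SstFileReader(options) // the constructor seen in the stack trace above
try {
  reader.open("/tmp/example.sst") // placeholder path
  val it = reader.newIterator(readOptions)
  it.seekToFirst()
  while (it.isValid) {
    println(s"key: ${it.key().length} bytes, value: ${it.value().length} bytes")
    it.next()
  }
  it.close()
} finally {
  reader.close()
  readOptions.close()
  options.close()
}
```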
a
I've sent the request. I also asked them to take a look at the issue in case they are aware of some alternate method for overriding JNI libs inside a Databricks container. (My own thought was to create a custom Databricks Docker image with the relevant jar replaced, but it's a bit cumbersome to be running a custom container on every cluster.)
b
Thank you very much @Anders Cassidy
a
Cool, @Anders Cassidy! I was not aware of this. I agree we would need to consider the ops burden this can generate. I'm also not sure how I could do something like this and retain access to Databricks features.
a
I believe the images need to be built on top of the base Databricks runtime image, so all features should be available. A pseudo-Dockerfile would just be something like:
```dockerfile
FROM databricksruntime/standard:latest
ADD https://repo1.maven.org/maven2/org/rocksdb/rocksdbjni/6.20.3/rocksdbjni-6.20.3.jar <location to overwrite outdated jar>
```
but again, this is just a workaround... I wouldn't really want to run custom containers.
b
Will have to look into it more, but from https://docs.databricks.com/clusters/custom-containers.html and the https://github.com/databricks/containers/blob/master/ubuntu/standard/Dockerfile it looks like the custom image is used as a base for the actual image that will be used at runtime.
```
1. VMs are acquired from the cloud provider.
2. The custom Docker image is downloaded from your repo.
3. Databricks creates a Docker container from the image.
4. Databricks Runtime code is copied into the Docker container.
5. The init scripts are executed. See Init script execution order.
```
So, if the Databricks runtime is copied over the image, it will be tricky. Another option is to override it using an init script, but we would need to identify the location and whether the init script has permission to write to that location. Still just thoughts at this point.
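On the "identify the location" part, a rough sketch of how one could look for the preloaded rocksdbjni jar from the driver. This only scans the driver classpath, may miss jars loaded some other way, and is just a starting point for whatever path an init script would need to overwrite:

```scala
// Sketch: list classpath entries that look like rocksdbjni, to find the file an
// init script (or custom image) would have to replace. Purely informational.
System.getProperty("java.class.path")
  .split(java.io.File.pathSeparator)
  .filter(_.toLowerCase.contains("rocksdbjni"))
  .foreach(println)
```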
a
I should read right to the bottom of the docs 🙂