# help
r
Another one, if anyone is able to help: we're trying to use export and hitting:
```scala
exporter.exportAllFromBranch("master")
```
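For context, a minimal sketch of how that `exporter` is typically constructed with the lakeFS Spark metadata client. The endpoint, credentials, repository name, and destination URI below are placeholders, and the exact constructor arguments are an assumption based on the client's export docs rather than anything shown in this thread:

```scala
import io.treeverse.clients.{ApiClient, Exporter}

// Placeholder values -- assumptions for illustration, not taken from this thread.
val apiClient = new ApiClient(
  "https://<LAKEFS_ENDPOINT>/api/v1",
  "<LAKEFS_ACCESS_KEY_ID>",
  "<LAKEFS_SECRET_ACCESS_KEY>")

val exporter = new Exporter(spark, apiClient, "example-repo", "s3a://example-bucket/export/")

// The call that triggers the stack trace below.
exporter.exportAllFromBranch("master")
```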
```
IllegalAccessError: tried to access field org.rocksdb.RocksObject.nativeHandle_ from class org.rocksdb.SstFileReader
	at org.rocksdb.SstFileReader.<init>(SstFileReader.java:14)
	at io.treeverse.clients.SSTableReader.<init>(SSTableReader.scala:90)
	at io.treeverse.clients.SSTableReader$.forMetaRange(SSTableReader.scala:72)
	at io.treeverse.clients.LakeFSInputFormat.getSplits(LakeFSInputFormat.scala:110)
	at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:137)
	at org.apache.spark.rdd.RDD.$anonfun$partitions$2(RDD.scala:283)
	at scala.Option.getOrElse(Option.scala:189)
```
b
Which Spark version are you using?
Also, any environment information? Are you using a cloud solution, and if so, which one?
r
Databricks, Spark 3.0.1
b
Richard, there is a bug we are working on to resolve this issue on Databricks: https://github.com/treeverse/lakeFS/issues/1847
This is something we can probably solve in the following week or two - will verify tomorrow morning with the team.
👍 1
a
I'm working on this bug. It's unfortunately non-trivial: Databricks comes preloaded with a rocksdbjni version from 2 years ago that does not contain the functionality needed. And it's a JNI library, so it's not possible to shade it in the assembly. @Barak Amar is right and I'm all on it. Lots more details on the issue (or just wait).
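If it helps to confirm this on a given cluster, here is a small diagnostic sketch (Scala, for a notebook or spark-shell). It only reports which jar supplied each `org.rocksdb` class, so you can see whether the preloaded rocksdbjni is winning over the one bundled with the metadata client; it does not fix anything:

```scala
// Diagnostic sketch: report the jar that actually supplied each org.rocksdb class
// on the driver. An old preloaded jar winning here lines up with the
// IllegalAccessError in the stack trace above.
def loadedFrom(className: String): String =
  try {
    val src = Class.forName(className).getProtectionDomain.getCodeSource
    if (src != null) src.getLocation.toString else "unknown source"
  } catch {
    case _: ClassNotFoundException => "not on the classpath"
  }

println(s"org.rocksdb.RocksObject   -> ${loadedFrom("org.rocksdb.RocksObject")}")
println(s"org.rocksdb.SstFileReader -> ${loadedFrom("org.rocksdb.SstFileReader")}")
```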
a
We can reach out to our contacts at Databricks to see if there is a specific reason they need to use an older version of rocksdbjni, if you like?
b
Thanks @Anders Cassidy, there is more information in https://github.com/treeverse/lakeFS/issues/1847. Even if they switch to a newer version, another issue can come up when one of us wants to change the version. Not having control over the specific version can raise a number of challenges in the future.
👍 1
a
Thanks, @Anders Cassidy! Please do. While, as @Barak Amar explains, our analysis is that it will not resolve the support issue for customers of the metadata client on Databricks, we would still get some benefits, including:
1. Support for you, and a possible route for others.
2. Partial support for all customers: it seems sufficient in most cases for conflicting versions of rocksdbjni to have some API compatibility in order to work, so a Spark cluster with a more modern version might end up working merely by our code creating and using an SstFileReader. The issue with 6.2.x releases is that they don't expose any API to read a single SSTable.
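To make point 2 concrete, here is a minimal sketch of the `SstFileReader` API that exists in recent rocksdbjni releases, the capability missing from the 6.2.x line. The .sst path is a placeholder; this illustrates the upstream rocksdbjni API, not lakeFS code:

```scala
import org.rocksdb.{Options, ReadOptions, RocksDB, SstFileReader}

RocksDB.loadLibrary()

// Read a single SSTable directly -- the part 6.2.x releases don't expose.
val options = new Options()
val readOptions = new ReadOptions()
val reader = new SstFileReader(options) // the constructor seen in the stack trace above
try {
  reader.open("/tmp/example.sst") // placeholder path
  val it = reader.newIterator(readOptions)
  it.seekToFirst()
  while (it.isValid) {
    println(s"key: ${it.key().length} bytes, value: ${it.value().length} bytes")
    it.next()
  }
  it.close()
} finally {
  reader.close()
  readOptions.close()
  options.close()
}
```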
a
I've sent the request. I also asked them to take a look at the issue in case they are aware of some alternate method for overriding JNI libs inside a Databricks container. (My own thought was to create a custom Databricks Docker image with the relevant jar replaced, but it's a bit cumbersome to be running a custom container on every cluster.)
b
Thank you very much @Anders Cassidy
a
Cool, @Anders Cassidy! I was not aware of this. I agree we would need to consider the ops burden this can generate. I'm also not sure how I could do something like this and retain access to Databricks features.
a
I believe the images need to be built on top of the base Databricks runtime image, so all features should be available. A pseudo-Dockerfile would just be something like:
```dockerfile
FROM databricksruntime/standard:latest
ADD https://repo1.maven.org/maven2/org/rocksdb/rocksdbjni/6.20.3/rocksdbjni-6.20.3.jar <location to overwrite outdated jar>
```
but again, this is just a workaround... I wouldn't really want to run custom containers.
b
Will have to look into it more, but from https://docs.databricks.com/clusters/custom-containers.html and the https://github.com/databricks/containers/blob/master/ubuntu/standard/Dockerfile it looks like the custom image is used as a base for the actual image that will be used at runtime.
```
1. VMs are acquired from the cloud provider.
2. The custom Docker image is downloaded from your repo.
3. Databricks creates a Docker container from the image.
4. Databricks Runtime code is copied into the Docker container.
5. The init scripts are executed. See Init script execution order.
```
So, if the Databricks runtime is copied over the image, it will be tricky. Another option is to override it using an init script, but we would need to identify the location and whether the init script has permission to write to that location. Still just thoughts at this point.
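On the "identify the location" part, a rough sketch of how one could look for the preloaded rocksdbjni jar from the driver. This only scans the driver classpath, may miss jars loaded some other way, and is just a starting point for whatever path an init script would need to overwrite:

```scala
// Sketch: list classpath entries that look like rocksdbjni, to find the file an
// init script (or custom image) would have to replace. Purely informational.
System.getProperty("java.class.path")
  .split(java.io.File.pathSeparator)
  .filter(_.toLowerCase.contains("rocksdbjni"))
  .foreach(println)
```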
a
I should read right to the bottom of the docs 🙂