# help
u
Hi everyone, is there a way we can get the BigDL tool set up with lakeFS?
u
Hi Jude 🙂, Do you mean something like running DLlib (Spark) with lakeFS?
u
Yes, exactly
u
Cool, did you get a chance to read about our Spark integration?
u
I have a Jupyter notebook configured to communicate with lakeFS, where I can run Spark and other machine learning libraries. But I'm finding it difficult to set up DLlib to also run in that environment (the Jupyter notebook).
u
Yes, I did
u
Can you elaborate on what makes it difficult to set up DLlib to run in your Jupyter notebook?
u
I followed the guidelines on how to install DLlib and discovered I can install it with pip. The drawback is that I would have to unset SPARK_HOME in my .bashrc file, and if I do that I can no longer use Spark to communicate with lakeFS. So this way of setting up DLlib with pip obviously can't work for my case.
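For reference, this is roughly how I check which Spark my notebook actually picks up (just a sketch of my environment; paths and versions will differ):
```python
import os

# Show which Spark installation the notebook is pointed at via SPARK_HOME
print("SPARK_HOME:", os.environ.get("SPARK_HOME"))

# Show which pyspark the Python environment resolves to and its version;
# a mismatch between this and the SPARK_HOME install is what I'm worried about
import pyspark
print("pyspark:", pyspark.__version__, "from", pyspark.__file__)
```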
u
Do you mind sharing the command you use in your notebook to run the DLlib Spark job?
u
I used "pip install bigdl-spark3" (bigdl built on spark3). Then after it successfully installed I tried to verify the installation using the below command "from bigdl.orca import init_orca_context sc = init_orca_context() " After I run I get module not found error
u
Meanwhile, JAVA_HOME is already set in my environment, since as I understand it that is also required to run DLlib successfully.
u
So according to the official docs of DLlib, you must first start your program with something like:
```scala
val conf = Engine.createSparkConf()
  .setAppName("Train Lenet on MNIST")
  .set("spark.task.maxFailures", "1")
val sc = new SparkContext(conf)
Engine.init
```
Do you mind trying the following?
```scala
val conf = Engine.createSparkConf()
  .setAppName("Train Lenet on MNIST")
  .set("spark.task.maxFailures", "1")
val sc = new SparkContext(conf)

// Point the S3A filesystem at your lakeFS installation
sc.hadoopConfiguration.set("fs.s3a.endpoint", "<your lakeFS server endpoint>")
sc.hadoopConfiguration.set("fs.s3a.access.key", "<your lakeFS access key>")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "<your lakeFS secret key>")
sc.hadoopConfiguration.set("fs.s3a.path.style.access", "true")

Engine.init
```
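And since you're working from a Python notebook, the equivalent in PySpark would look roughly like this (the endpoint and keys are placeholders, and the exact BigDL/Orca init call may differ by version, so treat it as a sketch):
```python
from bigdl.orca import init_orca_context

# Start BigDL/Orca on a local Spark and grab the SparkContext it creates
sc = init_orca_context(cluster_mode="local")

# Point the S3A filesystem at lakeFS, mirroring the Scala snippet above
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.endpoint", "<your lakeFS server endpoint>")
hadoop_conf.set("fs.s3a.access.key", "<your lakeFS access key>")
hadoop_conf.set("fs.s3a.secret.key", "<your lakeFS secret key>")
hadoop_conf.set("fs.s3a.path.style.access", "true")
```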
u
Okay, I will look into this. But I'm more concerned about whether BigDL actually works after installation. Can you check out the Python guide doc to see what I mean? I'll try this and give you feedback.
u
The change I'm suggesting will only direct Spark to the right location (lakeFS's endpoint). Unfortunately, I'm no BigDL expert, nor am I familiar enough with the library to help you verify that it works after installation... Do you have a DLlib example app that runs successfully with S3 as its backing object store?
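For reference, once the S3A settings above are in place, reading from lakeFS looks roughly like reading from S3, with the repository and branch as the first path components (the repo, branch, and path names here are just made-up examples):
```python
# lakeFS exposes an S3-compatible endpoint, so paths take the form
# s3a://<repository>/<branch>/<path>
rdd = sc.textFile("s3a://example-repo/main/datasets/sample.csv")
print(rdd.take(5))
```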
u
Actually, I don't have any at the moment; I'm still trying to figure that out. I think the issue here is that DLlib runs its own separate Spark and Jupyter notebook once it is installed, and I already have those running in my environment.
u
So it sounds like you've got some discovery work to do 🧐 I think that it would be useful for you to reach out to the BigDL community to get help initializing your environment. When you have a simple app that works with S3 or other object stores, please reach out, and we would be glad to help you integrate your app with lakeFS 🙂
u
Thank you very much for the tip. I will surely do that.