# help
u
I'm looking over some of the integrations for Databricks and Delta Lake. The documentation references some hardcoded S3 variables, like so:
spark.hadoop.fs.s3a.bucket.<repo-name>.access.key AKIAIOSFODNN7EXAMPLE
spark.hadoop.fs.s3a.bucket.<repo-name>.secret.key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
spark.hadoop.fs.s3a.bucket.<repo-name>.endpoint https://lakefs.example.com
spark.hadoop.fs.s3a.path.style.access true
If I'm using Azure Databricks, would it work to just change these to the Azure-specific references? And then the same question for the Delta Lake integration.
u
Hi @Yusuf Khan, what do you mean by Azure-specific references? Referencing non-S3 underlying storage?
u
Yeah, so using Azure Blob Storage instead of S3.
u
The reference in the above example points to lakeFS itself. The lakeFS service implements the S3 protocol (the S3 gateway), and here you map the s3a scheme to communicate with lakeFS, not the underlying bucket. You can set up lakeFS to use Azure Blob Storage as the underlying bucket.
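For illustration, a minimal sketch (not from the thread; the repo name, credentials, and endpoint are placeholders): the Spark-side s3a settings look the same whether lakeFS is backed by S3 or by Azure Blob Storage, because the endpoint is the lakeFS S3 gateway. The Azure part lives only in the lakeFS server's blockstore configuration, not in Spark.

```python
# Sketch: the same per-bucket s3a settings from the docs, set from a notebook
# instead of the cluster Spark config. "example-repo" and the credentials are
# placeholders; "spark" is the session Databricks provides in a notebook.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()

hadoop_conf.set("fs.s3a.bucket.example-repo.access.key", "AKIAIOSFODNN7EXAMPLE")
hadoop_conf.set("fs.s3a.bucket.example-repo.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
hadoop_conf.set("fs.s3a.bucket.example-repo.endpoint", "https://lakefs.example.com")
hadoop_conf.set("fs.s3a.path.style.access", "true")

# Nothing here changes for Azure Databricks: the underlying Azure Blob storage
# is configured on the lakeFS server side, not in the Spark configuration.
```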
u
But you still need to reference lakeFS using s3a / S3 addresses.
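The same applies to the Delta Lake integration: you read and write Delta tables through s3a paths of the form s3a://&lt;repo&gt;/&lt;branch&gt;/&lt;path&gt;, and lakeFS resolves them against whatever backing store it was configured with. A hedged sketch (repo, branch, and table path are made up):

```python
# Sketch only: "example-repo", "main", and the table path are placeholders.
# The s3a URI addresses lakeFS (repo/branch/path), not the storage account.
df = spark.range(10)

# Write a Delta table to a branch in the lakeFS repository.
df.write.format("delta").mode("overwrite").save("s3a://example-repo/main/tables/events")

# Read it back the same way.
events = spark.read.format("delta").load("s3a://example-repo/main/tables/events")
events.show()
```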
u
Can you explain the use case more?
u
Ah, I see, okay. Sorry, I thought maybe the example was specific to using Databricks on AWS. But if I want to use Databricks on Azure, I just set it up the same way? Am I understanding that correctly? I'm going to try it out tomorrow but just wanted to check.
u
The configuration is the same even if you run your cluster in-house. From that aspect there should not be a difference, but if you run into any issue, please share and we will be happy to help.