# help
u
I'm looking over some of the integrations for Databricks and Delta Lake. The documentation references some hardcoded S3 variables, like so:
spark.hadoop.fs.s3a.bucket.<repo-name>.access.key AKIAIOSFODNN7EXAMPLE
spark.hadoop.fs.s3a.bucket.<repo-name>.secret.key wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
spark.hadoop.fs.s3a.bucket.<repo-name>.endpoint https://lakefs.example.com
spark.hadoop.fs.s3a.path.style.access true
If I'm using Azure Databricks, would it work to just change these to the Azure-specific references? And then the same question for the Delta Lake integration.
u
Hi @Yusuf Khan, what do you mean by Azure-specific references? Referencing non-S3 underlying storage?
u
Yeah, so using Azure Blob Storage instead of S3.
u
The reference in the above example points to lakeFS itself. The lakeFS service implements the S3 protocol (the S3 gateway), and here you map the s3a scheme to communicate with lakeFS, not the underlying bucket. You can set up lakeFS to use Azure Blob Storage as the underlying bucket.
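For illustration, a minimal sketch (not from the thread; the repo name, credentials, and endpoint are placeholders): the Spark-side s3a settings look the same whether lakeFS is backed by S3 or by Azure Blob Storage, because the endpoint is the lakeFS S3 gateway. The Azure part lives only in the lakeFS server's blockstore configuration, not in Spark.

```python
# Sketch: the same per-bucket s3a settings from the docs, set from a notebook
# instead of the cluster Spark config. "example-repo" and the credentials are
# placeholders; "spark" is the session Databricks provides in a notebook.
hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()

hadoop_conf.set("fs.s3a.bucket.example-repo.access.key", "AKIAIOSFODNN7EXAMPLE")
hadoop_conf.set("fs.s3a.bucket.example-repo.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
hadoop_conf.set("fs.s3a.bucket.example-repo.endpoint", "https://lakefs.example.com")
hadoop_conf.set("fs.s3a.path.style.access", "true")

# Nothing here changes for Azure Databricks: the underlying Azure Blob storage
# is configured on the lakeFS server side, not in the Spark configuration.
```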
u
But you still need to reference lakeFS using s3a / S3 addresses.
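The same applies to the Delta Lake integration: you read and write Delta tables through s3a paths of the form s3a://&lt;repo&gt;/&lt;branch&gt;/&lt;path&gt;, and lakeFS resolves them against whatever backing store it was configured with. A hedged sketch (repo, branch, and table path are made up):

```python
# Sketch only: "example-repo", "main", and the table path are placeholders.
# The s3a URI addresses lakeFS (repo/branch/path), not the storage account.
df = spark.range(10)

# Write a Delta table to a branch in the lakeFS repository.
df.write.format("delta").mode("overwrite").save("s3a://example-repo/main/tables/events")

# Read it back the same way.
events = spark.read.format("delta").load("s3a://example-repo/main/tables/events")
events.show()
```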
u
Can you explain the use case more?
u
Ah, I see, okay. Sorry, I thought maybe the example was specific to using Databricks on AWS. But if I want to use Databricks on Azure, I just set it up the same way? Am I understanding that correctly? I'm going to try it out tomorrow but just wanted to check.
u
The configuration is the same even if you run your cluster in-house. From that aspect there should not be a difference, but if you run into any issue, please share and we will be happy to help.