# help
p
Hello! I’m running into an issue with a lakeFS playground environment on Databricks. I have a dataset in a Spark DataFrame and I’m trying to write it into the playground bucket. Code is below:
spark.conf.set("fs.s3a.bucket.my-repo.access.key", "xxxx")
spark.conf.set("fs.s3a.bucket.my-repo.secret.key", "xxxxxx")
spark.conf.set("fs.s3a.bucket.my-repo.endpoint", "<https://exact-barnacle.lakefs-demo.io>")
spark.conf.set("fs.s3a.path.style.access", "true")

# read a sample dataset and append it to the repo's main branch over s3a
data_path = "/databricks-datasets/amazon/test4K/"
data = spark.read.parquet(data_path)

lakefs_repo = 'my-repo'
lakefs_branch = 'main'
tablename = 'amazon_reviews'
# no explicit format is given, so on this runtime the write defaults to Delta
data.write.mode('append').save(f"s3a://{lakefs_repo}/{lakefs_branch}/{tablename}/")
and here’s the error I get:
com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: The security token included in the request is invalid. (Service: AWSSecurityTokenService; Status Code: 403; Error Code: InvalidClientTokenId; Request ID: c9582a88-f367-4a89-bcac-c91579031a14)
is this expected behavior of the playground env, or is it user error? Thank you in advance!
y
Hey @Paul Singman, can you please share the Databricks runtime version you're using?
p
yup, it’s:
10.2 (includes Apache Spark 3.2.0, Scala 2.12)
y
Thanks, let me try to reproduce this
@Paul Singman, in order for lakeFS to work with Delta tables, you need to add the following configurations to your cluster (replacing <repo-name> with my-repo):
spark.hadoop.fs.s3a.bucket.<repo-name>.aws.credentials.provider shaded.databricks.org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
spark.hadoop.fs.s3a.bucket.<repo-name>.session.token lakefs
Also see the relevant docs.
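Putting those together with the per-bucket settings from your notebook, the cluster Spark config would look roughly like this (the spark.hadoop. prefix is needed because these go into the cluster config rather than spark.conf.set, and the session token value is literally the string lakefs):
spark.hadoop.fs.s3a.bucket.my-repo.access.key xxxx
spark.hadoop.fs.s3a.bucket.my-repo.secret.key xxxxxx
spark.hadoop.fs.s3a.bucket.my-repo.endpoint https://exact-barnacle.lakefs-demo.io
spark.hadoop.fs.s3a.path.style.access true
spark.hadoop.fs.s3a.bucket.my-repo.aws.credentials.provider shaded.databricks.org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
spark.hadoop.fs.s3a.bucket.my-repo.session.token lakefs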
p
ah thank you, makes sense. Lemme try it out
looks like the data was written to the repo, but an error does get raised
com.databricks.s3commit.S3CommitFailedException: java.io.IOException: Bucket my-repo does not exist
y
Taking a look
p
the fix could be disabling multi-cluster writes
y
That was my thought as well
p
hmm, it actually did not fix it
y
For me, disabling multi-cluster writes worked:
spark.databricks.delta.multiClusterWrites.enabled false
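That line goes in the cluster's Spark config. From a notebook the equivalent would roughly be:
spark.conf.set("spark.databricks.delta.multiClusterWrites.enabled", "false")
though I haven't checked whether a session-level setting gets picked up by the commit service.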
p
ok, i’ll try setting it on the cluster instead of in the notebook
it worked! ty
y
You're welcome 🙂