# help
p
Hello! I’m running into an issue with a lakeFS playground environment on Databricks. I have a dataset in a Spark DataFrame and I’m trying to write it into the playground bucket. Code is below:
spark.conf.set("fs.s3a.bucket.my-repo.access.key", "xxxx")
spark.conf.set("fs.s3a.bucket.my-repo.secret.key", "xxxxxx")
spark.conf.set("fs.s3a.bucket.my-repo.endpoint", "<https://exact-barnacle.lakefs-demo.io>")
spark.conf.set("fs.s3a.path.style.access", "true")

# read a sample dataset and append it to the repo's main branch over s3a
data_path = "/databricks-datasets/amazon/test4K/"
data = spark.read.parquet(data_path)

lakefs_repo = 'my-repo'
lakefs_branch = 'main'
tablename = 'amazon_reviews'
# no explicit format is given, so on this runtime the write defaults to Delta
data.write.mode('append').save(f"s3a://{lakefs_repo}/{lakefs_branch}/{tablename}/")
and here’s the error I get:
com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: The security token included in the request is invalid. (Service: AWSSecurityTokenService; Status Code: 403; Error Code: InvalidClientTokenId; Request ID: c9582a88-f367-4a89-bcac-c91579031a14)
is this expected behavior of the playground env, or is it user error? Thank you in advance!
y
Hey @Paul Singman, can you please share the Databricks runtime version you're using?
p
yup, it’s:
10.2 (includes Apache Spark 3.2.0, Scala 2.12)
y
Thanks, let me try to reproduce this
@Paul Singman, in order for lakeFS to work with Delta tables, you need to add the following configurations to your cluster (replacing <repo-name> with my-repo):
spark.hadoop.fs.s3a.bucket.<repo-name>.aws.credentials.provider shaded.databricks.org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
spark.hadoop.fs.s3a.bucket.<repo-name>.session.token lakefs
Also see the relevant docs.
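Putting those together with the per-bucket settings from your notebook, the cluster Spark config would look roughly like this (the spark.hadoop. prefix is needed because these go into the cluster config rather than spark.conf.set, and the session token value is literally the string lakefs):
spark.hadoop.fs.s3a.bucket.my-repo.access.key xxxx
spark.hadoop.fs.s3a.bucket.my-repo.secret.key xxxxxx
spark.hadoop.fs.s3a.bucket.my-repo.endpoint https://exact-barnacle.lakefs-demo.io
spark.hadoop.fs.s3a.path.style.access true
spark.hadoop.fs.s3a.bucket.my-repo.aws.credentials.provider shaded.databricks.org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider
spark.hadoop.fs.s3a.bucket.my-repo.session.token lakefs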
p
ah thank you, makes sense. Lemme try it out
looks like the data was written to the repo, but an error does get raised
com.databricks.s3commit.S3CommitFailedException: java.io.IOException: Bucket my-repo does not exist
y
Taking a look
p
the fix could be disabling multi-cluster writes
y
That was my thought as well
p
hmm, it actually did not fix it
y
For me, disabling multi-cluster writes worked:
spark.databricks.delta.multiClusterWrites.enabled false
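That line goes in the cluster's Spark config. From a notebook the equivalent would roughly be:
spark.conf.set("spark.databricks.delta.multiClusterWrites.enabled", "false")
though I haven't checked whether a session-level setting gets picked up by the commit service.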
p
ok, i’ll try setting it on the cluster instead of in the notebook
it worked! ty
y
You're welcome 🙂