user
03/09/2022, 9:18 AM
user
03/09/2022, 9:30 AM
An error occurred while calling o94.parquet. s3a://<DATA BUCKET>/data/<REPO>-<BRANCH>:a3de1af8-f7f1-4fb6-a784-72ac9c20cbb1/-nvHaNa15P3gE3S6jXr4G: getFileStatus on s3a://<DATA BUCKET>/data/<REPO>-<BRANCH>:a3fe1af8-f7f1-4fb6-aa84-72ac9c20cbb1/-nvGaNa15P4gE3S6jXr4G: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: JQJNKMESD9T3XNGE; S3 Extended Request ID: D/sNM/+LqxuX/Vm7NloIQwwY8H+bac14LaQxXnbTLGNTfPEaUhs5rOnGxgersjI6JW3t72i+KEI=; Proxy: null), S3 Extended Request ID: D/sNM/+LqxuX/Vm7NloIQwwY8H+bac14LaQxXnbTLGNTfPEaUhs5rOnGxgersjI6JW3t72i+KEI=
I believe I have configured everything correctly according to this:
https://docs.lakefs.io/integrations/spark.html#configuration-1
In my case:
fs.s3a.access / secret are set to the credentials of an IAM user that has s3* allowed on the data bucket
fs.s3a.endpoint is set to a region-specific endpoint
fs.lakefs.access / secret are set to what I used for the S3A gateway
fs.lakefs.endpoint is set to the API endpoint on my EC2 instance.
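For reference, the four settings above map roughly onto the Hadoop properties below. This is only a minimal sketch of how they could be collected and applied from PySpark, assuming the property names from the lakeFS Spark integration docs linked above. The helper names (lakefs_hadoop_conf, apply_to_spark) and every key and endpoint value are hypothetical placeholders, not values from this thread.

```python
from typing import Dict


def lakefs_hadoop_conf(
    s3a_access_key: str,
    s3a_secret_key: str,
    s3a_endpoint: str,
    lakefs_access_key: str,
    lakefs_secret_key: str,
    lakefs_endpoint: str,
) -> Dict[str, str]:
    """Collect the Hadoop properties used by the lakefs:// filesystem.

    The S3A entries are used for direct reads/writes against the data bucket;
    the lakeFS entries are used for metadata calls to the lakeFS API.
    """
    return {
        # Credentials and endpoint for direct access to the underlying bucket.
        "fs.s3a.access.key": s3a_access_key,
        "fs.s3a.secret.key": s3a_secret_key,
        "fs.s3a.endpoint": s3a_endpoint,
        # lakeFS Hadoop FileSystem implementation and API credentials.
        "fs.lakefs.impl": "io.lakefs.LakeFSFileSystem",
        "fs.lakefs.access.key": lakefs_access_key,
        "fs.lakefs.secret.key": lakefs_secret_key,
        "fs.lakefs.endpoint": lakefs_endpoint,
    }


def apply_to_spark(spark, conf: Dict[str, str]) -> None:
    """Copy the properties into the Hadoop configuration of a SparkSession."""
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    for key, value in conf.items():
        hadoop_conf.set(key, value)


# Placeholder values only (assumptions); substitute real ones.
conf = lakefs_hadoop_conf(
    s3a_access_key="<AWS ACCESS KEY>",
    s3a_secret_key="<AWS SECRET KEY>",
    s3a_endpoint="s3.<REGION>.amazonaws.com",
    lakefs_access_key="<LAKEFS ACCESS KEY>",
    lakefs_secret_key="<LAKEFS SECRET KEY>",
    lakefs_endpoint="https://<LAKEFS HOST>/api/v1",
)
```

In a Glue job this would be followed by apply_to_spark(spark, conf) before the first lakefs:// read or write.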
What I'm struggling with is understanding where this error is coming from. Is it S3 or lakeFS related?
We don't have AWS support and I don't see any entries in CloudWatch / CloudTrail. I don't see any log entries in my lakeFS log either (it has log level DEBUG).
The Glue job role also has s3* permission to the data bucket (just in case).
Here is the Spark code I use to attempt the write. I'm using the lakefs protocol as described in the previous link.
output = f"lakefs://{repo}/{branch}/data/sample_data_lakefs_write/"
df.write.partitionBy(["yyyy_mm_dd", "hh_mm"]).parquet(output)
I've tested with versions 0.1.4 and 0.1.6 of the assembly jar; same issue.
user
03/09/2022, 9:32 AM
user
03/09/2022, 9:38 AM
user
03/09/2022, 9:42 AM
"s3*" actions (which is an invalid policy AFAICT and will not work) but also an AWS IAM policy allowing "s3:*" actions (which has a colon after "s3", and should therefore work).
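To make the colon point concrete, here is a small local sanity check. It is a simplified sketch, not AWS's actual policy validator: it only assumes that an IAM action is either a bare "*" or has the form service:action with optional wildcards. The function names are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# Simplified shape of an IAM action: "<service>:<action>", where the action
# part may contain the wildcards * and ?. (Assumption: real IAM validation
# is stricter than this pattern.)
_ACTION_RE = re.compile(r"^[A-Za-z0-9-]+:[A-Za-z0-9*?]+$")


def is_well_formed_action(action: str) -> bool:
    """Return True for "*" or for anything shaped like "service:action"."""
    return action == "*" or bool(_ACTION_RE.match(action))


def action_matches(pattern: str, action: str) -> bool:
    """Return True if a well-formed policy pattern covers the given action.

    Matching is case-insensitive; malformed patterns never match.
    """
    if not is_well_formed_action(pattern):
        return False
    return fnmatchcase(action.lower(), pattern.lower())


print(is_well_formed_action("s3*"))              # no colon, so not well formed
print(is_well_formed_action("s3:*"))             # service prefix plus wildcard
print(action_matches("s3:*", "s3:PutObject"))    # covered by the wildcard
```

Under these assumptions "s3*" is rejected outright, while "s3:*" covers every S3 action.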
Just making sure the colon is in there...
user
03/09/2022, 9:44 AM
user
03/09/2022, 9:46 AM
user
03/09/2022, 9:47 AM
user
03/09/2022, 9:48 AM
user
03/10/2022, 12:27 PM
glue_version = "2.0"
# Spark 2.4.3, Python 3.7
number_of_workers = 2
worker_type = "Standard"
user
03/10/2022, 12:31 PM
user
03/14/2022, 6:48 AM