Cristian Caloian (03/17/2022, 8:47 AM):
spark-shell --conf spark.hadoop.fs.s3a.access.key=${LAKECTL_CREDENTIALS_ACCESS_KEY_ID} \
--conf spark.hadoop.fs.s3a.secret.key=${LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY} \
--conf spark.hadoop.fs.s3a.endpoint=${LAKECTL_SERVER_ENDPOINT_URL} \
--conf spark.hadoop.fs.s3a.path.style.access=true
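
For context, the direct read being attempted with this configuration looks roughly like the sketch below. The repo, branch, and file path are placeholders, and Parquet is assumed purely for illustration; here the Scala commands are piped into the same spark-shell invocation:

```shell
# Launch spark-shell against the lakeFS S3 gateway and read a lakeFS path
# directly through the S3A connector. <repo>, <branch>, <path-to-file> are
# placeholders; the file format (Parquet) is an assumption.
spark-shell \
  --conf spark.hadoop.fs.s3a.access.key=${LAKECTL_CREDENTIALS_ACCESS_KEY_ID} \
  --conf spark.hadoop.fs.s3a.secret.key=${LAKECTL_CREDENTIALS_SECRET_ACCESS_KEY} \
  --conf spark.hadoop.fs.s3a.endpoint=${LAKECTL_SERVER_ENDPOINT_URL} \
  --conf spark.hadoop.fs.s3a.path.style.access=true <<'EOF'
val df = spark.read.parquet("s3a://<repo>/<branch>/<path-to-file>")
df.show()
EOF
```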
When I try to read a lakeFS file from the Spark shell I get the following error (replacing the actual repo and branch):
java.nio.file.AccessDeniedException: s3a://<my-repo>/<my-branch>/<path-to-file>: getFileStatus on s3a://<my-repo>/<my-branch>/<path-to-file>: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 4442587FB7D0A2F9; S3 Extended Request ID: null), S3 Extended Request ID: null:403 Forbidden
Do I need another key/secret in addition to the lakeFS key/secret, e.g. an AWS one?
Thank you!

Itai Admi (03/17/2022, 8:54 AM):

Itai Admi (03/17/2022, 8:55 AM):

Cristian Caloian (03/17/2022, 9:10 AM):
I can lakectl fs cat the file locally, for example, and read the local file into a Spark df successfully.

Itai Admi (03/17/2022, 9:13 AM):

Cristian Caloian (03/17/2022, 9:21 AM):
I used lakectl, and for a test I just downloaded the file from lakeFS locally. Just as a sanity check, I read the local copy of the file into a Spark df. What I would like to be able to do is read the data in Spark directly from lakeFS, using a path of the form s3a://<repo>/<branch>/<filepath>. This last step is where I get the 403 Forbidden error above.

Itai Admi (03/17/2022, 9:34 AM):

Itai Admi (03/17/2022, 9:39 AM):

Leonard Aukea (03/17/2022, 12:50 PM):

Leonard Aukea (03/17/2022, 2:35 PM):
SQL query returned no results
{"args":["rdda-concept-team-pilot-1","%",""],"duration":9578071,"file":"build/pkg/db/logged_rows.go:33","func":"pkg/db.(*LoggedRows).logDuration","level":"debug","msg":"rows done","query":"\n\t WITH resolved_policies_view AS (\n SELECT auth_policies.id, auth_policies.created_at, auth_policies.display_name, auth_policies.statement, auth_users.display_name AS user_display_name\n FROM auth_policies INNER JOIN\n auth_user_policies ON (auth_policies.id = auth_user_policies.policy_id) INNER JOIN\n\t\t auth_users ON (auth_users.id = auth_user_policies.user_id)\n UNION\n\t\tSELECT auth_policies.id, auth_policies.created_at, auth_policies.display_name, auth_policies.statement, auth_users.display_name AS user_display_name\n\t\tFROM auth_policies INNER JOIN\n\t\t auth_group_policies ON (auth_policies.id = auth_group_policies.policy_id) INNER JOIN\n\t\t auth_groups ON (auth_groups.id = auth_group_policies.group_id) INNER JOIN\n\t\t auth_user_groups ON (auth_user_groups.group_id = auth_groups.id) INNER JOIN\n\t\t auth_users ON (auth_users.id = auth_user_groups.user_id)\n\t ) SELECT id, created_at, display_name, statement FROM resolved_policies_view WHERE (user_display_name = $1 AND display_name LIKE $2) AND display_name \u003e $3 ORDER BY display_name","time":"2022-03-17T13:53:39Z","type":"start query","user":"rdda-concept-team-pilot-1"}
{"args":["api"],"file":"build/pkg/db/tx.go:87","func":"pkg/db.(*dbTx).Get","level":"trace","msg":"SQL query returned no results","query":"SELECT storage_namespace, creation_date, default_branch FROM graveler_repositories WHERE id = $1","time":"2022-03-17T13:53:39Z","took":331396,"type":"get","user":"rdda-concept-team-pilot-1"}
This is not the case when you use lakectl directly.

Leonard Aukea (03/17/2022, 2:39 PM):

Itai Admi (03/17/2022, 2:51 PM):

Leonard Aukea (03/17/2022, 2:53 PM):

Leonard Aukea (03/17/2022, 2:53 PM):

Itai Admi (03/17/2022, 2:57 PM):

Cristian Caloian (03/17/2022, 3:26 PM):
https://<my-url>. Initially I was setting it to https://<my-url>/api/v1, as we do for lakectl.

Cristian Caloian (03/17/2022, 3:45 PM):

Itai Admi (03/17/2022, 3:53 PM):
03/17/2022, 3:53 PMhttps://<my-url>
. The openAPI endpoint offers a wider set of versioning capabilities like committing, creating branches, etc.. and is accepting traffic under https://<my-url>/api/v1/
.Itai Admi
03/17/2022, 3:54 PMCristian Caloian
03/17/2022, 4:30 PM
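
The endpoint distinction discussed in the thread can be sketched as two client configurations against the same lakeFS server. URLs and paths below are placeholders, and the lakectl config layout follows the usual ~/.lakectl.yaml shape:

```shell
# S3 gateway -- what Spark's s3a connector should target: the base URL,
# WITHOUT the /api/v1 suffix. With path-style access the first path segment
# is parsed as the repository name, so an endpoint ending in /api/v1 would
# make the gateway look up a repository literally named "api" (consistent
# with the graveler_repositories log line above) and answer 403 Forbidden.
spark-shell --conf spark.hadoop.fs.s3a.endpoint=https://<my-url> \
            --conf spark.hadoop.fs.s3a.path.style.access=true

# OpenAPI endpoint -- what lakectl targets, WITH the /api/v1 suffix,
# via server.endpoint_url in ~/.lakectl.yaml:
#   server:
#     endpoint_url: https://<my-url>/api/v1
lakectl fs cat lakefs://<repo>/<branch>/<path-to-file>
```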