mishraprafful

04/28/2023, 1:56 PM
Hi there šŸ‘‹ Does the GC job support service-account-based auth (IRSA IAM) instead of having S3 secrets mounted to it? We are planning to have GC run as a `SparkApplication` in k8s. Thanks
Idan Novogroder

04/28/2023, 2:34 PM
Hi Prafful, lakeFS supports using a credentials file via the `blockstore.s3.credentials_file` and `blockstore.s3.profile` configurations. You can give it a path to a configuration file that looks something like this:
[lakefs]
role_arn = <YOUR_ROLE_ARN>
web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token
role_session_name = <ROLE_SESSION_NAME>
I think GC uses the same configuration, but let me check and get back to you about that by the end of the day.
Sorry, I think I mixed it up a bit. I guess you meant the s3a configuration, not the lakeFS configuration. The GC job should also work with a service account. Please let me know if I got it right and whether it worked for you šŸ™šŸ½
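For the s3a side, a minimal sketch of what that could look like in a SparkApplication manifest, relying on the pod's IRSA service account instead of mounted S3 secrets. The service account name here is illustrative, and the credentials provider class assumes the AWS SDK v1 bundled with hadoop-aws; check the property names against your Hadoop version:

```yaml
# Hypothetical SparkApplication fragment: no static S3 keys in the pods.
spec:
  sparkConf:
    # Use the web identity token projected by EKS/IRSA instead of access keys
    "spark.hadoop.fs.s3a.aws.credentials.provider": "com.amazonaws.auth.WebIdentityTokenCredentialsProvider"
  driver:
    serviceAccount: lakefs-gc   # annotated with eks.amazonaws.com/role-arn
  executor:
    serviceAccount: lakefs-gc
```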
mishraprafful

04/28/2023, 3:05 PM
Thanks @Idan Novogroder for your response, will try this and get back to you. šŸ™‚
šŸ™Œ 1
Hi again @Idan Novogroder, I have some more questions before I get to the point of trying the service account setup (and I would really appreciate your help šŸ™). Follow-up questions on this:
• Do you have an example SparkJob for running GC on k8s? I want to understand whether this is the correct package I should be specifying when running it:
--packages org.apache.hadoop:hadoop-aws:2.7.7
• Is there a specific reason GC supports only one repo name as an argument, instead of having a Spark job that goes through all the repos of a lakeFS installation?
Idan Novogroder

05/03/2023, 2:52 PM
Hi @mishraprafful, let me check both questions and I'll get back to you with an answer by tomorrow.
šŸ‘ 1
Sorry, still working on that one. I'll get back to you on Sunday. Sorry for the delay.
šŸ‘ 1
šŸ™ 1
Hi @mishraprafful
1. Yes, this is the right package according to our docs.
2. There are a couple of reasons, but the main one is that it's safer to run a job that deletes files from your repository with fewer permissions. In other words, it's better to run one job that deletes files from your "dev" repository with a role that has permission to delete files only from your "dev" bucket, and another job that deletes files from your "production" repository with a role that has permission to delete files only from your "production" bucket.
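That per-repo pattern could be sketched roughly like this, with one SparkApplication per repository, each bound to a service account whose IAM role can delete objects only in that repository's bucket. All names below (the app, the service accounts, the region) are illustrative, and the main class should be checked against the lakeFS Spark client docs for your version:

```yaml
# Sketch: GC scoped to the "dev" repository only.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: lakefs-gc-dev
spec:
  type: Scala
  mainClass: io.treeverse.clients.GarbageCollector   # check against the lakeFS docs
  arguments: ["dev", "us-east-1"]                    # repo name + region; GC takes one repo
  driver:
    serviceAccount: lakefs-gc-dev    # role limited to the dev bucket
  executor:
    serviceAccount: lakefs-gc-dev
# A second SparkApplication ("lakefs-gc-production") would use a
# production-scoped service account and role.
```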
mishraprafful

05/08/2023, 9:40 AM
Thanks @Idan Novogroder for the answer.
:lakefs: 1