mishraprafful

04/28/2023, 1:56 PM
Hi there šŸ‘‹ Does the GC job support service-account-based auth (IRSA IAM) instead of having S3 secrets mounted to it? We are planning to have GC run as a `SparkApplication` in k8s. Thanks
Idan Novogroder

04/28/2023, 2:34 PM
Hi Prafful, lakeFS supports using a credentials file via the `blockstore.s3.credentials_file` and `blockstore.s3.profile` configurations. You can give it a path to a configuration file that looks something like this:
[lakefs]
role_arn = <YOUR_ROLE_ARN>
web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token
role_session_name = <ROLE_SESSION_NAME>
I think GC uses the same configuration, but let me check and get back to you about that by the end of the day.
Sorry, I think I mixed it up a bit. I guess you meant the s3a configuration, not the lakeFS configuration. The GC job should also work with a service account. Please let me know if I got it right and whether it worked for you šŸ™šŸ½
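For the s3a side, a minimal sketch of what that could look like in a SparkApplication manifest, relying on the pod's IRSA service account instead of mounted S3 secrets. The service account name here is illustrative, and the credentials provider class assumes the AWS SDK v1 bundled with hadoop-aws; check the property names against your Hadoop version:

```yaml
# Hypothetical SparkApplication fragment: no static S3 keys in the pods.
spec:
  sparkConf:
    # Use the web identity token projected by EKS/IRSA instead of access keys
    "spark.hadoop.fs.s3a.aws.credentials.provider": "com.amazonaws.auth.WebIdentityTokenCredentialsProvider"
  driver:
    serviceAccount: lakefs-gc   # annotated with eks.amazonaws.com/role-arn
  executor:
    serviceAccount: lakefs-gc
```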
mishraprafful

04/28/2023, 3:05 PM
Thanks @Idan Novogroder for your response, will try this and get back to you. šŸ™‚
šŸ™Œ 1
Hi again @Idan Novogroder, I have some more questions before I get to the point of trying the service account setup (and I would really appreciate your help šŸ™). Follow-up questions on this:
• Do you have an example SparkJob for running GC on k8s? I want to understand whether this is the correct package I should be specifying when running it:
--packages org.apache.hadoop:hadoop-aws:2.7.7
• Is there a specific reason GC supports only one repo name as an argument, instead of having a Spark job that goes through all the repos of a lakeFS installation?
Idan Novogroder

05/03/2023, 2:52 PM
Hi @mishraprafful, let me check both questions and I'll get back to you with an answer by tomorrow.
šŸ‘ 1
Sorry, still working on that one. I'll get back to you on Sunday. Sorry for the delay.
šŸ‘ 1
šŸ™ 1
Hi @mishraprafful
1. Yes, this is the right package according to our docs.
2. There are a couple of reasons, but the main one is that it's safer to run a job that deletes files from your repository with fewer permissions. In other words, it's better to run one job that deletes files from your "dev" repository with a role that has permission to delete files only from your "dev" bucket, and another job that deletes files from your "production" repository with a role that has permission to delete files only from your "production" bucket.
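That per-repo pattern could be sketched roughly like this, with one SparkApplication per repository, each bound to a service account whose IAM role can delete objects only in that repository's bucket. All names below (the app, the service accounts, the region) are illustrative, and the main class should be checked against the lakeFS Spark client docs for your version:

```yaml
# Sketch: GC scoped to the "dev" repository only.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: lakefs-gc-dev
spec:
  type: Scala
  mainClass: io.treeverse.clients.GarbageCollector   # check against the lakeFS docs
  arguments: ["dev", "us-east-1"]                    # repo name + region; GC takes one repo
  driver:
    serviceAccount: lakefs-gc-dev    # role limited to the dev bucket
  executor:
    serviceAccount: lakefs-gc-dev
# A second SparkApplication ("lakefs-gc-production") would use a
# production-scoped service account and role.
```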
mishraprafful

05/08/2023, 9:40 AM
Thanks @Idan Novogroder for the answer.
:lakefs: 1