Justin Pottenger

09/02/2023, 5:52 PM
Hello wonderful people. I am really impressed with LakeFS and am working on getting it running in my Kubernetes cluster with MinIO as a backend. Consulting the documentation, I now have a working Helm deployment where I set the MinIO-specific configs (discover_bucket_region: false, force_path_style: true). This setup works as expected when I create a user in the UI. However, my setup needs headless CI/CD, so using the UI as part of the setup process is a no-go. I dug into the forums and found the recommended solution of using “lakefs superuser --user-name admin”. When I run this in the K8s pod it successfully recognizes my MinIO config passed in from the Helm chart, but it appears that this command relies on a metadata provider that is part of the AWS SDK and is not implemented in MinIO. The following error is shown:
INFO  [2023-09-02T17:23:06Z]build/pkg/auth/service.go:189 pkg/auth.NewAuthService initialized Auth service           service=auth_service
WARNING[2023-09-02T17:23:06Z]build/pkg/cloud/aws/metadata.go:64 pkg/cloud/aws.(*MetadataProvider).GetMetadata.func1 Tried to to get AWS account ID for BI     error="InvalidParameterValue: Unsupported action GetCallerIdentity\n\tstatus code: 400, request id: 178124CA4D3D18C0"
Digging into the code, it appears that the AWS package is trying to find the email of the AWS account, and since MinIO doesn’t have one, it fails. Relevant code is here and here. Before I fork LakeFS and try to dig in myself, I thought I would see if anyone here had any ideas on a more elegant (aka supported) solution. TLDR: LakeFS + MinIO + Headless = not supported?

Barak Amar

09/03/2023, 8:47 AM
Hi @Justin Pottenger, I will try to address all of the above concerns.
Depending on the database your lakeFS uses, running
lakefs setup
using the same configuration (it requires access to the database) should work. Setup currently accesses only the database, adding seed data and an admin user. It doesn't communicate with your lakeFS instance or your block storage (MinIO).
As part of metrics collection, lakeFS collects information about the underlying object store on start. In your case, lakefs setup reads the configuration and collects information about the configured S3 blockstore. The code you found uses the AWS STS API to extract the account ID (hashed); the specific line is inside a retry loop. This code doesn't extract email information, and it will fail when using MinIO. When the retry loop completes it will not fail lakeFS: it will collect data, or no data in your case.
The other code you found is part of the authorization service.
• The emailer there is part of a deprecated service, which is nil in lakefs setup, and we do not instruct configuring any emailer as part of lakeFS.
• The initialization of the invite handler is relevant in the case of using an external authenticator.
Therefore, it is not necessary to configure an emailer in lakeFS, and the deprecated service is not applicable in this context.
Back to the non-UI CI/CD: if your setup job fails, there should be another error. It may be related to accessing the database. Also, in case you need to control the key/secret configured with your setup/admin, there is an option of passing
to the setup command. Let me know if this helps with your concerns and gives you a better experience setting up lakeFS.

Justin Pottenger

09/04/2023, 3:56 AM
Got it! Working as expected now. Thank you so much!