gang ye
10/19/2023, 9:41 PM
secrets:
  authEncryptSecretKey: "123"
  # Use the following to fetch PostgreSQL connection string from an existing secret:
  databaseConnectionString: "postgres://***"
lakefsConfig: |
  database:
    type: "postgres"
  blockstore:
    type: "s3"
    s3:
      region: "us-west-2"
      credentials:
        access_key_id: "***"
        secret_access_key: "***"
If I want to switch to a service account to access the S3 bucket, do you have any example Helm config files?
Will it be something like the below?
serviceAccount: <service-account-name>
secrets:
  authEncryptSecretKey: "123"
  # Use the following to fetch PostgreSQL connection string from an existing secret:
  databaseConnectionString: "postgres://***"
lakefsConfig: |
  database:
    type: "postgres"
  blockstore:
    type: "s3"
    s3:
      region: "us-west-2"
Amit Kesarwani
10/19/2023, 9:57 PM
See the blockstore.s3.credentials_file and blockstore.s3.profile configurations.
You can give a path to a configuration file that will look something like this:
[lakefs]
role_arn = <YOUR_ROLE_ARN>
web_identity_token_file = /var/run/secrets/eks.amazonaws.com/serviceaccount/token
role_session_name = <ROLE_SESSION_NAME>
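Putting Amit's suggestion into Helm values, a sketch might look like the following (the mount path /etc/lakefs/aws-credentials and the profile name lakefs are assumptions for illustration; you would still need to mount the file above into the pod, e.g. via the chart's extra volume settings):

```yaml
lakefsConfig: |
  database:
    type: "postgres"
  blockstore:
    type: "s3"
    s3:
      region: "us-west-2"
      # Hypothetical path: mount the shared-config file shown above here
      credentials_file: "/etc/lakefs/aws-credentials"
      # Must match the section header ([lakefs]) in that file
      profile: "lakefs"
```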
gang ye
10/20/2023, 12:08 AM
Amit Kesarwani
10/20/2023, 12:12 AM
gang ye
10/20/2023, 12:14 AM
Itai Admi
10/20/2023, 7:36 AM
secrets:
  authEncryptSecretKey: "123"
  # Use the following to fetch PostgreSQL connection string from an existing secret:
  databaseConnectionString: "postgres://***"
lakefsConfig: |
  database:
    type: "postgres"
  blockstore:
    type: "s3"
    s3:
      region: "us-west-2"
This will default to the AWS SDK credentials lookup, which will find the service account creds.
gang ye
10/20/2023, 4:42 PM
serviceAccount:
  name: <service-account-name>
Does it work like DefaultAWSCredentialsProviderChain?
After setting the service account in the Helm deployment, the pod env variables have the configuration below:
AWS_DEFAULT_REGION=us-west-2
AWS_REGION=us-west-2
AWS_ROLE_ARN=arn:aws:iam::***:role/data-experimentation
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
AWS_STS_REGIONAL_ENDPOINTS=regional
But the lakeFS server cannot create the block adapter as expected.
Itai Admi
10/20/2023, 9:05 PM
Do you have any env variables prefixed with LAKEFS_? Also, can you share the logs from the server?
gang ye
10/20/2023, 9:08 PM
image:
  repository: docker.io/treeverse/lakefs
  pullPolicy: IfNotPresent
# Keys used for existingSecret
secrets:
  authEncryptSecretKey: "123"
lakefsConfig: |
  logging.level: TRACE
  stats.enabled: false
  database:
    type: local
  blockstore:
    type: s3
serviceAccount:
  name: data-experimentation-sa
Env variables:
~ $ printenv | grep lakefs
HOSTNAME=lakefsingress-cc96645cf-46xs7
HOME=/home/lakefs
PWD=/home/lakefs
~ $ printenv | grep AWS
AWS_ROLE_ARN=arn:aws:iam::***:role/data-experimentation
AWS_WEB_IDENTITY_TOKEN_FILE=/var/run/secrets/eks.amazonaws.com/serviceaccount/token
AWS_STS_REGIONAL_ENDPOINTS=regional
AWS_DEFAULT_REGION=us-west-2
AWS_REGION=us-west-2
time="2023-10-20T21:08:50Z" level=info msg="Configuration file" func=github.com/treeverse/lakefs/cmd/lakefs/cmd.initConfig file="/build/cmd/lakefs/cmd/root.go:109" fields.file=/etc/lakefs/config.yaml file="/build/cmd/lakefs/cmd/root.go:109" phase=startup
time="2023-10-20T21:08:50Z" level=info msg="Config loaded" func=cmd/lakefs/cmd.initConfig file="cmd/root.go:151" fields.file=/etc/lakefs/config.yaml file="cmd/root.go:151" phase=startup
time="2023-10-20T21:08:50Z" level=info msg=Config func=cmd/lakefs/cmd.initConfig file="cmd/root.go:159" actions.enabled=true actions.lua.net_http_enabled=false auth.api.endpoint="" auth.api.supports_invites=false auth.api.token=------ auth.cache.enabled=true auth.cache.jitter=3s auth.cache.size=1024 auth.cache.ttl=20s auth.cookie_auth_verification.auth_source="" auth.cookie_auth_verification.default_initial_groups="[]" auth.cookie_auth_verification.external_user_id_claim_name="" auth.cookie_auth_verification.friendly_name_claim_name="" auth.cookie_auth_verification.initial_groups_claim_name="" auth.cookie_auth_verification.validate_id_token_claims="map[]" auth.encrypt.secret_key="******" auth.login_duration=168h0m0s auth.logout_redirect_url=/auth/login auth.oidc.default_initial_groups="[]" auth.oidc.friendly_name_claim_name="" auth.oidc.initial_groups_claim_name="" auth.oidc.validate_id_token_claims="map[]" auth.remote_authenticator.default_user_group=Viewers auth.remote_authenticator.enabled=false auth.remote_authenticator.endpoint="" auth.remote_authenticator.request_timeout=10s auth.ui_config.login_cookie_names="[internal_auth_session]" auth.ui_config.login_failed_message="The credentials don't match." 
auth.ui_config.login_url="" auth.ui_config.logout_url="" auth.ui_config.rbac=simplified blockstore.azure.auth_method="" blockstore.azure.disable_pre_signed=false blockstore.azure.disable_pre_signed_ui=true blockstore.azure.pre_signed_expiry=15m0s blockstore.azure.storage_access_key="" blockstore.azure.storage_account="" blockstore.azure.test_endpoint_url="" blockstore.azure.try_timeout=10m0s blockstore.gs.credentials_file="" blockstore.gs.credentials_json="" blockstore.gs.disable_pre_signed=false blockstore.gs.disable_pre_signed_ui=true blockstore.gs.pre_signed_expiry=15m0s blockstore.gs.s3_endpoint="<https://storage.googleapis.com>" blockstore.local.allowed_external_prefixes="[]" blockstore.local.import_enabled=false blockstore.local.import_hidden=false blockstore.local.path="~/lakefs/data/block" blockstore.s3.client_log_request=false blockstore.s3.client_log_retries=false blockstore.s3.credentials_file="" blockstore.s3.disable_pre_signed=false blockstore.s3.disable_pre_signed_ui=true blockstore.s3.discover_bucket_region=true blockstore.s3.endpoint="" blockstore.s3.force_path_style=false blockstore.s3.max_retries=5 blockstore.s3.pre_signed_expiry=15m0s blockstore.s3.profile="" blockstore.s3.region=us-east-1 blockstore.s3.server_side_encryption="" blockstore.s3.server_side_encryption_kms_key_id="" blockstore.s3.skip_verify_certificate_test_only=false blockstore.s3.web_identity.session_duration=0s blockstore.s3.web_identity.session_expiry_window=5m0s blockstore.type=s3 committed.block_storage_prefix=_lakefs committed.local_cache.dir="~/lakefs/data/cache" committed.local_cache.max_uploaders_per_writer=10 committed.local_cache.metarange_proportion=0.1 committed.local_cache.range_proportion=0.9 committed.local_cache.size_bytes=1073741824 committed.permanent.max_range_size_bytes=20971520 committed.permanent.min_range_size_bytes=0 committed.permanent.range_raggedness_entries=50000 committed.sstable.memory.cache_size_bytes=400000000 database.drop_tables=false 
database.dynamodb.aws_access_key_id=------ database.dynamodb.aws_profile="" database.dynamodb.aws_region="" database.dynamodb.aws_secret_access_key=------ database.dynamodb.endpoint="" database.dynamodb.health_check_interval=0s database.dynamodb.scan_limit=1024 database.dynamodb.table_name=kvstore database.local.enable_logging=false database.local.path="~/lakefs/metadata" database.local.prefetch_size=256 database.local.sync_writes=true database.postgres.connection_max_lifetime=5m0s database.postgres.connection_string=------ database.postgres.max_idle_connections=25 database.postgres.max_open_connections=25 database.postgres.metrics=false database.postgres.scan_page_size=0 database.type=local diff.delta.plugin="" email_subscription.enabled=true fields.file=/etc/lakefs/config.yaml file="cmd/root.go:159" gateways.s3.domain_name="[s3.local.lakefs.io]" gateways.s3.fallback_url="" gateways.s3.region=us-east-1 graveler.background.rate_limit=0 graveler.batch_dbio_transaction_markers=false graveler.commit_cache.expiry=10m0s graveler.commit_cache.jitter=2s graveler.commit_cache.size=50000 graveler.ensure_readable_root_namespace=true graveler.repository_cache.expiry=5s graveler.repository_cache.jitter=2s graveler.repository_cache.size=1000 installation.access_key_id=------ installation.fixed_id="" installation.secret_access_key=------ installation.user_name="" listen_address="0.0.0.0:8000" logging.audit_log_level=DEBUG logging.file_max_size_mb=102400 logging.files_keep=100 logging.format=text logging.level=TRACE logging.output="[-]" logging.trace_request_headers=false phase=startup plugins.default_path="~/.lakefs/plugins" plugins.properties="map[]" security.audit_check_interval=24h0m0s security.audit_check_url="<https://audit.lakefs.io/audit>" security.check_latest_version=true security.check_latest_version_cache=1h0m0s stats.address="<https://stats.lakefs.io>" stats.enabled=false stats.extended=false stats.flush_interval=30s stats.flush_size=100 tls.cert_file="" 
tls.enabled=false tls.key_file="" ugc.prepare_interval=1m0s ugc.prepare_max_file_size=20971520 ui.enabled=true ui.snippets="[]"
time="2023-10-20T21:08:50Z" level=info msg="lakeFS run" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:91" version=0.113.0
time="2023-10-20T21:08:50Z" level=info msg="initialized Auth service" func=pkg/auth.NewAuthService file="build/pkg/auth/service.go:187" service=auth_service
time="2023-10-20T21:08:50Z" level=debug msg="failed to collect account metadata" func=pkg/stats.NewMetadata file="build/pkg/stats/metadata.go:34" error="not found"
Itai Admi
10/20/2023, 9:12 PM
gang ye
10/20/2023, 9:12 PM
Itai Admi
10/20/2023, 9:14 PM
gang ye
10/20/2023, 9:15 PM
Liveness probe failed: Get "http://172.16.95.57:8000/_health": dial tcp 172.16.95.57:8000: connect: connection refused
I guess it's caused by the server not running, so the connection won't work.
Itai Admi
10/20/2023, 9:15 PM
gang ye
10/20/2023, 9:16 PM
Itai Admi
10/20/2023, 9:21 PM
gang ye
10/20/2023, 9:24 PM
logging.level in lakefsConfig only enables the lakeFS server debug log.
Itai Admi
10/20/2023, 9:26 PM
lakefsConfig: |
  logging.level: TRACE
  stats.enabled: false
  database:
    type: local
  blockstore:
    type: s3
    s3:
      client_log_retries: true
      client_log_request: true
Also, please share:
printenv | grep LAKEFS
gang ye
10/20/2023, 9:33 PM
~ $ printenv | grep LAKEFS
LAKEFSINGRESS_PORT=tcp://10.100.146.226:80
LAKEFS_SERVICE_HOST=10.100.145.100
LAKEFSINGRESS_SERVICE_PORT=80
LAKEFSINGRESS_PORT_80_TCP_ADDR=10.100.146.226
LAKEFS_SERVICE_PORT=80
LAKEFS_PORT=tcp://10.100.145.100:80
LAKEFSINGRESS_PORT_80_TCP_PORT=80
LAKEFSINGRESS_PORT_80_TCP_PROTO=tcp
LAKEFS_PORT_80_TCP_ADDR=10.100.145.100
LAKEFS_PORT_80_TCP_PORT=80
LAKEFS_PORT_80_TCP_PROTO=tcp
LAKEFSINGRESS_PORT_80_TCP=tcp://10.100.146.226:80
LAKEFS_PORT_80_TCP=tcp://10.100.145.100:80
LAKEFSINGRESS_SERVICE_PORT_HTTP=80
LAKEFS_AUTH_ENCRYPT_SECRET_KEY=123
LAKEFS_SERVICE_PORT_HTTP=80
LAKEFSINGRESS_SERVICE_HOST=10.100.146.226
time="2023-10-20T21:34:36Z" level=info msg="Configuration file" func=github.com/treeverse/lakefs/cmd/lakefs/cmd.initConfig file="/build/cmd/lakefs/cmd/root.go:109" fields.file=/etc/lakefs/config.yaml file="/build/cmd/lakefs/cmd/root.go:109" phase=startup
time="2023-10-20T21:34:36Z" level=info msg="Config loaded" func=cmd/lakefs/cmd.initConfig file="cmd/root.go:151" fields.file=/etc/lakefs/config.yaml file="cmd/root.go:151" phase=startup
time="2023-10-20T21:34:36Z" level=info msg=Config func=cmd/lakefs/cmd.initConfig file="cmd/root.go:159" actions.enabled=true actions.lua.net_http_enabled=false auth.api.endpoint="" auth.api.supports_invites=false auth.api.token=------ auth.cache.enabled=true auth.cache.jitter=3s auth.cache.size=1024 auth.cache.ttl=20s auth.cookie_auth_verification.auth_source="" auth.cookie_auth_verification.default_initial_groups="[]" auth.cookie_auth_verification.external_user_id_claim_name="" auth.cookie_auth_verification.friendly_name_claim_name="" auth.cookie_auth_verification.initial_groups_claim_name="" auth.cookie_auth_verification.validate_id_token_claims="map[]" auth.encrypt.secret_key="******" auth.login_duration=168h0m0s auth.logout_redirect_url=/auth/login auth.oidc.default_initial_groups="[]" auth.oidc.friendly_name_claim_name="" auth.oidc.initial_groups_claim_name="" auth.oidc.validate_id_token_claims="map[]" auth.remote_authenticator.default_user_group=Viewers auth.remote_authenticator.enabled=false auth.remote_authenticator.endpoint="" auth.remote_authenticator.request_timeout=10s auth.ui_config.login_cookie_names="[internal_auth_session]" auth.ui_config.login_failed_message="The credentials don't match." 
auth.ui_config.login_url="" auth.ui_config.logout_url="" auth.ui_config.rbac=simplified blockstore.azure.auth_method="" blockstore.azure.disable_pre_signed=false blockstore.azure.disable_pre_signed_ui=true blockstore.azure.pre_signed_expiry=15m0s blockstore.azure.storage_access_key="" blockstore.azure.storage_account="" blockstore.azure.test_endpoint_url="" blockstore.azure.try_timeout=10m0s blockstore.gs.credentials_file="" blockstore.gs.credentials_json="" blockstore.gs.disable_pre_signed=false blockstore.gs.disable_pre_signed_ui=true blockstore.gs.pre_signed_expiry=15m0s blockstore.gs.s3_endpoint="<https://storage.googleapis.com>" blockstore.local.allowed_external_prefixes="[]" blockstore.local.import_enabled=false blockstore.local.import_hidden=false blockstore.local.path="~/lakefs/data/block" blockstore.s3.client_log_request=true blockstore.s3.client_log_retries=true blockstore.s3.credentials_file="" blockstore.s3.disable_pre_signed=false blockstore.s3.disable_pre_signed_ui=true blockstore.s3.discover_bucket_region=true blockstore.s3.endpoint="" blockstore.s3.force_path_style=false blockstore.s3.max_retries=5 blockstore.s3.pre_signed_expiry=15m0s blockstore.s3.profile="" blockstore.s3.region=us-east-1 blockstore.s3.server_side_encryption="" blockstore.s3.server_side_encryption_kms_key_id="" blockstore.s3.skip_verify_certificate_test_only=false blockstore.s3.web_identity.session_duration=0s blockstore.s3.web_identity.session_expiry_window=5m0s blockstore.type=s3 committed.block_storage_prefix=_lakefs committed.local_cache.dir="~/lakefs/data/cache" committed.local_cache.max_uploaders_per_writer=10 committed.local_cache.metarange_proportion=0.1 committed.local_cache.range_proportion=0.9 committed.local_cache.size_bytes=1073741824 committed.permanent.max_range_size_bytes=20971520 committed.permanent.min_range_size_bytes=0 committed.permanent.range_raggedness_entries=50000 committed.sstable.memory.cache_size_bytes=400000000 database.drop_tables=false 
database.dynamodb.aws_access_key_id=------ database.dynamodb.aws_profile="" database.dynamodb.aws_region="" database.dynamodb.aws_secret_access_key=------ database.dynamodb.endpoint="" database.dynamodb.health_check_interval=0s database.dynamodb.scan_limit=1024 database.dynamodb.table_name=kvstore database.local.enable_logging=false database.local.path="~/lakefs/metadata" database.local.prefetch_size=256 database.local.sync_writes=true database.postgres.connection_max_lifetime=5m0s database.postgres.connection_string=------ database.postgres.max_idle_connections=25 database.postgres.max_open_connections=25 database.postgres.metrics=false database.postgres.scan_page_size=0 database.type=local diff.delta.plugin="" email_subscription.enabled=true fields.file=/etc/lakefs/config.yaml file="cmd/root.go:159" gateways.s3.domain_name="[<http://s3.local.lakefs.io|s3.local.lakefs.io>]" gateways.s3.fallback_url="" gateways.s3.region=us-east-1 graveler.background.rate_limit=0 graveler.batch_dbio_transaction_markers=false graveler.commit_cache.expiry=10m0s graveler.commit_cache.jitter=2s graveler.commit_cache.size=50000 graveler.ensure_readable_root_namespace=true graveler.repository_cache.expiry=5s graveler.repository_cache.jitter=2s graveler.repository_cache.size=1000 installation.access_key_id=------ installation.fixed_id="" installation.secret_access_key=------ installation.user_name="" listen_address="0.0.0.0:8000" logging.audit_log_level=DEBUG logging.file_max_size_mb=102400 logging.files_keep=100 logging.format=text logging.level=TRACE logging.output="[-]" logging.trace_request_headers=false phase=startup plugins.default_path="~/.lakefs/plugins" plugins.properties="map[]" security.audit_check_interval=24h0m0s security.audit_check_url="<https://audit.lakefs.io/audit>" security.check_latest_version=true security.check_latest_version_cache=1h0m0s stats.address="<https://stats.lakefs.io>" stats.enabled=false stats.extended=false stats.flush_interval=30s stats.flush_size=100 
tls.cert_file="" tls.enabled=false tls.key_file="" ugc.prepare_interval=1m0s ugc.prepare_max_file_size=20971520 ui.enabled=true ui.snippets="[]"
time="2023-10-20T21:34:36Z" level=info msg="lakeFS run" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:91" version=0.113.0
time="2023-10-20T21:34:36Z" level=info msg="initialized Auth service" func=pkg/auth.NewAuthService file="build/pkg/auth/service.go:187" service=auth_service
time="2023-10-20T21:34:36Z" level=debug msg="failed to collect account metadata" func=pkg/stats.NewMetadata file="build/pkg/stats/metadata.go:34" error="not found"
Itai Admi
10/20/2023, 9:43 PM
gang ye
10/20/2023, 9:47 PM
Itai Admi
10/20/2023, 9:51 PM
Can you share the livenessProbe & readinessProbe section in the values.yaml file?
gang ye
10/20/2023, 9:56 PM
Itai Admi
10/20/2023, 9:57 PM
gang ye
10/20/2023, 10:11 PM
extraEnvVars:
# Override K8S defaults for readinessProbe
readinessProbe:
  failureThreshold: 10
  periodSeconds: 5
  successThreshold: 4
  timeoutSeconds: 1
# Override K8S defaults for livenessProbe
livenessProbe:
  failureThreshold: 20
  periodSeconds: 5
  successThreshold: 4
  timeoutSeconds: 1
  initialDelaySeconds: 5
but there is a format issue
Itai Admi
10/20/2023, 10:13 PM
(readinessProbe & livenessProbe)
gang ye
10/20/2023, 10:15 PM
livenessProbe:
  httpGet:
    path: /_health
    port: http
    scheme: HTTP
  timeoutSeconds: 1
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /_health
    port: http
    scheme: HTTP
  timeoutSeconds: 1
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /_health
    port: http
    scheme: HTTP
  timeoutSeconds: 5
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 10
readinessProbe:
  httpGet:
    path: /_health
    port: http
    scheme: HTTP
  timeoutSeconds: 5
  periodSeconds: 10
  successThreshold: 1
  failureThreshold: 10
Liveness probe failed: Get "http://172.16.150.9:8000/_health": dial tcp 172.16.150.9:8000: connect: connection refused
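One note on the extraEnvVars attempt earlier: Kubernetes requires successThreshold to be exactly 1 for liveness probes, so successThreshold: 4 there would be rejected, and probe overrides likely belong at the top level of values.yaml rather than nested under extraEnvVars. A corrected sketch (assuming the lakeFS chart exposes these keys at the top level, which is worth verifying against the chart's values.yaml):

```yaml
# Top-level probe overrides; successThreshold must stay 1 for liveness
livenessProbe:
  initialDelaySeconds: 5
  periodSeconds: 5
  timeoutSeconds: 1
  failureThreshold: 20
readinessProbe:
  periodSeconds: 5
  timeoutSeconds: 1
  failureThreshold: 10
```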
Itai Admi
10/20/2023, 10:20 PM
gang ye
10/20/2023, 10:24 PM
Itai Admi
10/20/2023, 10:24 PM
gang ye
10/20/2023, 10:24 PM
Isan Rivkin
10/21/2023, 12:49 PM
In your values.yaml:
serviceAccount:
  name: my-svc-acc
extraManifests:
  - apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: '{{ .Values.serviceAccount.name }}'
      annotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::ACCOUNT_ID:role/my-pod-role"
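Combining this with the blockstore settings from earlier in the thread, a complete IRSA-style values.yaml sketch could look like this (the service-account name and role ARN are placeholders; with no static credentials configured, lakeFS falls back to the AWS SDK default credential chain, which picks up the injected web-identity token):

```yaml
secrets:
  authEncryptSecretKey: "123"
lakefsConfig: |
  database:
    type: "postgres"
  blockstore:
    type: "s3"
    s3:
      region: "us-west-2"   # no credentials block: default AWS SDK lookup
serviceAccount:
  name: my-svc-acc
extraManifests:
  - apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: '{{ .Values.serviceAccount.name }}'
      annotations:
        eks.amazonaws.com/role-arn: "arn:aws:iam::ACCOUNT_ID:role/my-pod-role"
```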
gang ye
10/22/2023, 6:03 PM
Isan Rivkin
10/22/2023, 9:42 PM
gang ye
10/22/2023, 11:46 PM
Isan Rivkin
10/23/2023, 6:20 AM
k describe deploy <lakefs-deploy>
k describe pod <lakefs-pod> # try to catch pod events before it restarts
k describe replicaset <lakefs-replicaset>
k describe sa <your service account>
k describe svc <lakefs service>
gang ye
10/24/2023, 12:39 AM
kubectl describe deploy lakefstest -n data-experimentation
Name:                   lakefstest
Namespace:              data-experimentation
CreationTimestamp:      Mon, 23 Oct 2023 17:33:35 -0700
Labels:                 app=lakefs
                        app.kubernetes.io/instance=lakefstest
                        app.kubernetes.io/managed-by=Helm
                        app.kubernetes.io/name=lakefs
                        app.kubernetes.io/version=0.113.0
                        helm.sh/chart=lakefs-0.13.3
Annotations:            deployment.kubernetes.io/revision: 1
                        meta.helm.sh/release-name: lakefstest
                        meta.helm.sh/release-namespace: data-experimentation
Selector:               app=lakefs,app.kubernetes.io/instance=lakefstest,app.kubernetes.io/name=lakefs
Replicas:               1 desired | 1 updated | 1 total | 0 available | 1 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:           app=lakefs
                    app.kubernetes.io/instance=lakefstest
                    app.kubernetes.io/name=lakefs
  Annotations:      checksum/config: 5b7e985282116d067aa462b11debaeb3fc43e4fcdd55194645756e057e9bcc89
  Service Account:  data-experimentation-sa
  Containers:
   lakefs:
    Image:      docker.apple.com/aiml-datainfra/lakefs:0.113.0-amd64
    Port:       8000/TCP
    Host Port:  0/TCP
    Args:
      run
      --config
      /etc/lakefs/config.yaml
    Liveness:   http-get http://:http/_health delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:  http-get http://:http/_health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      LAKEFS_AUTH_ENCRYPT_SECRET_KEY:  <set to the key 'auth_encrypt_secret_key' in secret 'lakefstest'>  Optional: false
    Mounts:
      /etc/lakefs from config-volume (rw)
  Volumes:
   config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      lakefstest
    Optional:  false
Conditions:
  Type         Status  Reason
  ----         ------  ------
  Available    False   MinimumReplicasUnavailable
  Progressing  True    ReplicaSetUpdated
OldReplicaSets:  <none>
NewReplicaSet:   lakefstest-5c9c5b454b (1/1 replicas created)
Events:
  Type    Reason             Age  From                   Message
  ----    ------             ---- ----                   -------
  Normal  ScalingReplicaSet  28s  deployment-controller  Scaled up replica set lakefstest-5c9c5b454b to 1
Describe pod
kubectl describe pod $POD_NAME -n data-experimentation
Name:             lakefstest-5c9c5b454b-hd7bj
Namespace:        data-experimentation
Priority:         0
Service Account:  data-experimentation-sa
Node:             ip-172-16-162-137.us-west-2.compute.internal/172.16.162.137
Start Time:       Mon, 23 Oct 2023 17:33:36 -0700
Labels:           app=lakefs
                  app.kubernetes.io/instance=lakefstest
                  app.kubernetes.io/name=lakefs
                  pod-template-hash=5c9c5b454b
Annotations:      checksum/config: 5b7e985282116d067aa462b11debaeb3fc43e4fcdd55194645756e057e9bcc89
                  kubernetes.io/psp: 00-fully-open
Status:           Running
IP:               172.16.134.189
IPs:
  IP:           172.16.134.189
Controlled By:  ReplicaSet/lakefstest-5c9c5b454b
Containers:
  lakefs:
    Container ID:  docker://0395f05274ef10d047b1b779d344f163656bb7f3157aad4ad63d03ba4ab0b7e1
    Image:         docker.apple.com/aiml-datainfra/lakefs:0.113.0-amd64
    Image ID:      docker-pullable://docker.apple.com/aiml-datainfra/lakefs@sha256:2b98fe0283384197441d83d8ac6f25014df0b16f39dd611bb48063b489940255
    Port:          8000/TCP
    Host Port:     0/TCP
    Args:
      run
      --config
      /etc/lakefs/config.yaml
    State:          Running
      Started:      Mon, 23 Oct 2023 17:34:37 -0700
    Last State:     Terminated
      Reason:       Error
      Exit Code:    137
      Started:      Mon, 23 Oct 2023 17:33:36 -0700
      Finished:     Mon, 23 Oct 2023 17:34:36 -0700
    Ready:          False
    Restart Count:  1
    Liveness:       http-get http://:http/_health delay=0s timeout=1s period=10s #success=1 #failure=3
    Readiness:      http-get http://:http/_health delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      LAKEFS_AUTH_ENCRYPT_SECRET_KEY:  <set to the key 'auth_encrypt_secret_key' in secret 'lakefstest'>  Optional: false
      AWS_STS_REGIONAL_ENDPOINTS:      regional
      AWS_DEFAULT_REGION:              us-west-2
      AWS_REGION:                      us-west-2
      AWS_ROLE_ARN:                    arn:aws:iam::xxx:role/aiml-data-experimentation
      AWS_WEB_IDENTITY_TOKEN_FILE:     /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /etc/lakefs from config-volume (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-49t2q (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  aws-iam-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      lakefstest
    Optional:  false
  kube-api-access-49t2q:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  84s                default-scheduler  Successfully assigned data-experimentation/lakefstest-5c9c5b454b-hd7bj to ip-172-16-162-137.us-west-2.compute.internal
  Normal   Killing    54s                kubelet            Container lakefs failed liveness probe, will be restarted
  Normal   Pulled     23s (x2 over 84s)  kubelet            Container image "docker.apple.com/aiml-datainfra/lakefs:0.113.0-amd64" already present on machine
  Normal   Created    23s (x2 over 84s)  kubelet            Created container lakefs
  Normal   Started    23s (x2 over 84s)  kubelet            Started container lakefs
  Warning  Unhealthy  4s (x13 over 83s)  kubelet            Readiness probe failed: Get "http://172.16.134.189:8000/_health": dial tcp 172.16.134.189:8000: connect: connection refused
  Warning  Unhealthy  4s (x5 over 74s)   kubelet            Liveness probe failed: Get "http://172.16.134.189:8000/_health": dial tcp 172.16.134.189:8000: connect: connection refused
kubectl describe svc lakefstest -n data-experimentation
Name:              lakefstest
Namespace:         data-experimentation
Labels:            app=lakefs
                   app.kubernetes.io/instance=lakefstest
                   app.kubernetes.io/managed-by=Helm
                   app.kubernetes.io/name=lakefs
                   app.kubernetes.io/version=0.113.0
                   helm.sh/chart=lakefs-0.13.3
Annotations:       meta.helm.sh/release-name: lakefstest
                   meta.helm.sh/release-namespace: data-experimentation
Selector:          app.kubernetes.io/instance=lakefstest,app.kubernetes.io/name=lakefs,app=lakefs
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.100.178.193
IPs:               10.100.178.193
Port:              http  80/TCP
TargetPort:        http/TCP
Endpoints:
Session Affinity:  None
Events:            <none>
Describe sa:
kubectl describe sa data-experimentation-sa -n data-experimentation
Name:                data-experimentation-sa
Namespace:           data-experimentation
Labels:              <none>
Annotations:         eks.amazonaws.com/role-arn: arn:aws:iam::xxx:role/aiml-data-experimentation
Image pull secrets:  <none>
Mountable secrets:   data-experimentation-sa-token-4m9d5
Tokens:              data-experimentation-sa-token-4m9d5
Events:              <none>
image:
  repository: docker.apple.com/aiml-datainfra/lakefs
  # repository: docker.io/treeverse/lakefs
  tag: 0.113.0-amd64
  pullPolicy: IfNotPresent
# Keys used for existingSecret
secrets:
  authEncryptSecretKey: "123"
lakefsConfig: |
  logging.level: TRACE
  stats.enabled: false
  database:
    type: local
  blockstore:
    type: s3
    s3:
      client_log_retries: true
      client_log_request: true
serviceAccount:
  name: data-experimentation-sa
time="2023-10-21T00:15:18Z" level=info msg="initialize blockstore adapter" func=pkg/block/factory.BuildBlockAdapter file="build/pkg/block/factory/build.go:32" type=s3
time="2023-10-21T00:15:18Z" level=info msg="initialized blockstore adapter" func=pkg/block/factory.buildS3Adapter file="build/pkg/block/factory/build.go:111" type=s3
Isan Rivkin
10/24/2023, 4:40 PM
Use docker.io/treeverse/experimental-lakefs and tag 1.0.0-vvv1 (attached full values.yaml below).
b. All the logs I added contain the field trace_flow=true; please run the new chart and attach the logs. If possible, re-run twice to make sure we don't crash in random places in the code based on the last log message we see.
3. Regarding the issue itself: 🕵️ from the latest logs you sent me, I think that K8S cannot locate runtime dependencies (i.e., the /var/run/secrets/kubernetes.io or service account files are missing).
a. What K8S version is the cluster?
b. Are you using any special plugins for authentication/authz between pods (e.g. Calico)?
c. This might occur when some containers inside the pod attempt to interact with an API without the default access token.
d. I suspect it because the ServiceAccount you attached has values set in Mountable secrets, and that's an old way of accessing SA tokens.
e. If that's the case, it can be fixed by allowing all new mount creations to adhere to the default access level throughout the pod space. Ensure that new pods using custom tokens comply with this access level to prevent continuous startup failures. This can be done by setting .Values.podSecurityContext and .Values.securityContext. Please refer to this SO answer as a nice reference for the values, and maybe check other charts you have in the cluster?
image:
  repository: docker.io/treeverse/experimental-lakefs
  tag: 1.0.0-vvv1
  pullPolicy: IfNotPresent
# Keys used for existingSecret
secrets:
  authEncryptSecretKey: "123"
lakefsConfig: |
  logging.level: DEBUG
  stats.enabled: false
  database:
    type: local
  blockstore:
    type: s3
    s3:
      client_log_retries: true
      client_log_request: true
# 90 seconds grace to start, maybe something will pop up in the logs
livenessProbe:
  initialDelaySeconds: 90
serviceAccount:
  name: data-experimentation-lakefs-sa
# this will create a new service account
extraManifests:
  - apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: data-experimentation-lakefs-sa
      annotations:
        # set the correct ARN
        eks.amazonaws.com/role-arn: "arn:aws:iam::xxx:role/aiml-data-experimentation"
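For point (e), a starting point for the security-context overrides might be the fragment below (a sketch: the UID/GID values are illustrative assumptions, not values from this thread; fsGroup is what makes a projected service-account token volume readable by a non-root container):

```yaml
podSecurityContext:
  runAsUser: 1000    # hypothetical non-root UID
  runAsGroup: 1000
  fsGroup: 1000      # grants group ownership of projected token volumes
securityContext:
  allowPrivilegeEscalation: false
```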
gang ye
10/24/2023, 5:29 PM
kubectl version --short
Flag --short has been deprecated, and will be removed in the future. The --short output will become the default.
Client Version: v1.25.9
Kustomize Version: v4.5.7
Server Version: v1.23.17-eks-2d98532
Isan Rivkin
10/24/2023, 5:50 PM
gang ye
10/24/2023, 5:53 PM
time="2023-10-24T18:04:07Z" level=info msg="starting build of block adapter" func=pkg/block/factory.BuildBlockAdapter file="build/pkg/block/factory/build.go:31" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="blockstore type: s3" func="pkg/logging.(*logrusEntryWrapper).Infof" file="build/pkg/logging/logger.go:272" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="initialize blockstore adapter" func=pkg/block/factory.BuildBlockAdapter file="build/pkg/block/factory/build.go:36" type=s3
time="2023-10-24T18:04:07Z" level=info msg="initialized blockstore adapter" func=pkg/block/factory.buildS3Adapter file="build/pkg/block/factory/build.go:115" type=s3
time="2023-10-24T18:04:07Z" level=info msg="finished block adapter build, starting runtime collector" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:164" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="post SetRuntimeCollector" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:166" trace_flow=true
time="2023-10-24T18:04:07Z" level=trace msg="dummy sender received metadata" func="pkg/stats.(*dummySender).UpdateMetadata" file="build/pkg/stats/sender.go:147" metadata="{InstallationID:d42db8a3-e6ec-4fdd-b5d1-fa06564a2e8b Entries:[{Name:is_docker Value:true} {Name:instrumentation Value:Run} {Name:lakefs_version Value:dev} {Name:lakefs_kv_type Value:local} {Name:golang_version Value:go1.20.6} {Name:os Value:linux} {Name:architecture Value:amd64} {Name:is_k8s Value:true} {Name:installation_id Value:d42db8a3-e6ec-4fdd-b5d1-fa06564a2e8b} {Name:blockstore_type Value:s3}]}" service=stats_collector
time="2023-10-24T18:04:07Z" level=info msg="post CollectMetadata, initiating catalog" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:169" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="head build block adapter trace_flow true" func="pkg/logging.(*logrusEntryWrapper).Infof" file="build/pkg/logging/logger.go:272"
time="2023-10-24T18:04:07Z" level=info msg="starting build of block adapter" func=pkg/block/factory.BuildBlockAdapter file="build/pkg/block/factory/build.go:31" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="blockstore type: s3" func="pkg/logging.(*logrusEntryWrapper).Infof" file="build/pkg/logging/logger.go:272" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="initialize blockstore adapter" func=pkg/block/factory.BuildBlockAdapter file="build/pkg/block/factory/build.go:36" type=s3
time="2023-10-24T18:04:07Z" level=info msg="initialized blockstore adapter" func=pkg/block/factory.buildS3Adapter file="build/pkg/block/factory/build.go:115" type=s3
time="2023-10-24T18:04:07Z" level=info msg="Post Catalog Initialization" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:178" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="Pre scheduler" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:180" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="Pre scheduler: deleteScheduler.StartAsync" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:186" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="Pre new actions service" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:202" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="Post new actions service" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:212" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="pre middlewareAuthenticator" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:217" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="post middlewareAuthenticator" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:221" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="pre NewDefaultAuditChecker" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:231" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="post NewDefaultAuditChecker" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:235" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="pre checkRepos" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:245" trace_flow=true
time="2023-10-24T18:04:07Z" level=debug msg="lakeFS isn't initialized, skipping mismatched adapter checks" func=cmd/lakefs/cmd.checkRepos file="cmd/run.go:387"
time="2023-10-24T18:04:07Z" level=info msg="post checkRepos" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:247" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="pre: updating SetHealthHandlerInfo" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:249" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="post: updating SetHealthHandlerInfo" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:252" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="pre: init api.Serve" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:255" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="initialize OpenAPI server" func=pkg/api.Serve file="build/pkg/api/serve.go:38" service=api_gateway
time="2023-10-24T18:04:07Z" level=info msg="post: init api.Serve" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:275" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="pre SSO auth middlewares" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:285" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="post SSO auth middlewares" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:300" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="initialized S3 Gateway handler" func=pkg/gateway.NewHandler file="build/pkg/gateway/handler.go:124" s3_bare_domain="[s3.local.lakefs.io]" s3_region=us-east-1
time="2023-10-24T18:04:07Z" level=info msg="pre apiAuthenticator" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:314" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="post apiAuthenticator" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:316" trace_flow=true
time="2023-10-24T18:04:07Z" level=info msg="starting HTTP server" func=cmd/lakefs/cmd.glob..func8 file="cmd/run.go:322" listen_address="0.0.0.0:8000"
It looks like the server started successfully. I will verify the S3 access.
Isan Rivkin
10/24/2023, 6:58 PM
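As a side note, before testing actual S3 reads and writes it can help to confirm the pod really carries the IRSA environment that the AWS SDK default credential chain looks for (these are the same `AWS_ROLE_ARN` / `AWS_WEB_IDENTITY_TOKEN_FILE` variables shown in the pod env earlier in the thread). A minimal sketch, run inside the lakeFS pod; the helper name is hypothetical:

```python
import os

# Env vars injected by EKS when a pod uses a service account annotated with
# an IAM role (IRSA); the AWS SDK default credential chain reads these.
REQUIRED_IRSA_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def missing_irsa_vars(env=None):
    """Return the IRSA variables that are absent or empty in the environment."""
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_IRSA_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_irsa_vars()
    if missing:
        print("IRSA not configured; missing:", ", ".join(missing))
    else:
        print("IRSA env present; SDK should assume role", os.environ["AWS_ROLE_ARN"])
```

If both variables are present and the token file exists, the blockstore should pick up the service-account credentials without any static keys in `lakefsConfig`.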