HT
05/31/2023, 3:20 AM$ lakectl --verbose commit <lakefs://hieu-test/main> --message "add file"
Branch: <lakefs://hieu-test/main>
It takes minutes for the command return, while the web interface say that the commit been done.--verbose
flag is not doing anything. How do I get more verbose in order to diagnose the issue ??Guy Hardonag
05/31/2023, 3:32 AMHT
05/31/2023, 3:46 AMtime="2023-05-31T03:35:59Z" level=error msg="Post-commit hook failed" func="pkg/graveler.(*Graveler).Commit" file="build/pkg/graveler/graveler.go:1916" error="updating commit ID: run id 5iutn73meojc73d2gmfg: postgres get: context canceled" host=<http://ti-dev-aca-03-12-lakefs.proudplant-6acee3df.australiaeast.azurecontainerapps.io|ti-dev-aca-03-12-lakefs.proudplant-6acee3df.australiaeast.azurecontainerapps.io> method=POST operation_id=Commit path=/api/v1/repositories/[REDACTED]/branches/main/commits pre_run_id=5iutn73meojc73d2gmfg request_id=16232042-ada5-4d62-a499-1ff37e943d9c run_id=5iutmobmeojc73d2gmg0 service_name=rest_api user=hieut
Guy Hardonag
05/31/2023, 3:47 AMHT
05/31/2023, 3:48 AMtime="2023-05-31T01:22:14Z" level=error msg="could not update metadata" func="pkg/gateway/operations.(*PathOperation).finishUpload" file="build/pkg/gateway/operations/operation_utils.go:51" error="postgres set: timeout: context canceled" host=<http://ti-dev-aca-03-12-lakefs.proudplant-6acee3df.australiaeast.azurecontainerapps.io|ti-dev-aca-03-12-lakefs.proudplant-6acee3df.australiaeast.azurecontainerapps.io> matched_host=false method=PUT operation_id=put_object path=[REDACTED] ref=main repository=XXXX request_id=2332a8c1-7854-49dc-9ee6-60d03c5e64c0 service_name=s3_gateway user=hieut
time="2023-05-31T01:22:14Z" level=error msg="could not write request body to block adapter" func=pkg/gateway/operations.handlePut file="build/pkg/gateway/operations/putobject.go:263" error="context canceled" host=<http://ti-dev-aca-03-12-lakefs.proudplant-6acee3df.australiaeast.azurecontainerapps.io|ti-dev-aca-03-12-lakefs.proudplant-6acee3df.australiaeast.azurecontainerapps.io> matched_host=false method=PUT operation_id=put_object path=XXXXX ref=main repository=XXXX request_id=00aae4e9-acc8-42ac-852d-c7055c5ca4df service_name=s3_gateway user=hieut
Guy Hardonag
05/31/2023, 3:53 AMHT
05/31/2023, 3:54 AMGuy Hardonag
05/31/2023, 3:54 AMHT
05/31/2023, 3:54 AMGuy Hardonag
05/31/2023, 3:55 AMHT
05/31/2023, 3:56 AMGuy Hardonag
05/31/2023, 3:56 AMHT
05/31/2023, 3:56 AMGuy Hardonag
05/31/2023, 3:56 AMHT
05/31/2023, 3:58 AMIt looks like you are loosing connection with postgres once in a whileIt does look like that
Guy Hardonag
05/31/2023, 4:07 AMHT
05/31/2023, 4:08 AMGuy Hardonag
05/31/2023, 4:16 AMHT
05/31/2023, 4:19 AMtime="2023-05-30T09:02:41Z" level=error msg="could not write request body to block adapter" func=pkg/gateway/operations.handlePut file="build/pkg/gateway/operations/putobject.go:263" error="context canceled" host=<http://ti-dev-aca-03-12-lakefs.proudplant-6acee3df.australiaeast.azurecontainerapps.io|ti-dev-aca-03-12-lakefs.proudplant-6acee3df.australiaeast.azurecontainerapps.io> matched_host=false method=PUT operation_id=put_object path=XXXX ref=main repository=XXXX request_id=81b5e6bc-afc8-402b-a108-474521c8fd1c service_name=s3_gateway user=hieut
Is this databse related or Blob storage ?Guy Hardonag
05/31/2023, 4:32 AMNiro
05/31/2023, 5:35 AMHT
05/31/2023, 5:39 AM504 Gateway Timeout
about 4min after the commit command ...Niro
05/31/2023, 8:58 AMHT
05/31/2023, 9:01 AMtime="2023-05-31T08:47:01Z" level=error msg="could not write request body to block adapter" func=pkg/gateway/operations.handlePut file="build/pkg/gateway/operations/putobject.go:263" error="context canceled" host=<http://ti-dev-aca-03-12-lakefs.proudplant-6acee3df.australiaeast.azurecontainerapps.io|ti-dev-aca-03-12-lakefs.proudplant-6acee3df.australiaeast.azurecontainerapps.io> matched_host=false method=PUT operation_id=put_object path=XXXX.xml ref=main repository=capture-20211018-141231879861-clone071 request_id=94047de4-f9c4-4e28-ab3f-cff76101768a service_name=s3_gateway user=hieut
Niro
05/31/2023, 9:02 AM504
error?HT
05/31/2023, 9:04 AMNiro
05/31/2023, 9:06 AMHT
05/31/2023, 9:06 AMI understand you set up hooks?I don't think so .... we just deploy the default setup.
Niro
05/31/2023, 9:09 AMHT
05/31/2023, 9:09 AMNiro
05/31/2023, 9:11 AMHT
05/31/2023, 9:15 AMNiro
05/31/2023, 9:45 AMcurl <lakefs_url>/metrics | grep ^pgxpool
HT
05/31/2023, 9:53 AMpgxpool_acquire_count{db_name="kv"} 2.7428454e+07
pgxpool_acquire_duration_ns{db_name="kv"} 1.227152849835e+12
pgxpool_acquired_conns{db_name="kv"} 1
pgxpool_canceled_acquire_count{db_name="kv"} 30
pgxpool_constructing_conns{db_name="kv"} 0
pgxpool_empty_acquire{db_name="kv"} 158220
pgxpool_idle_conns{db_name="kv"} 24
pgxpool_max_conns{db_name="kv"} 25
pgxpool_total_conns{db_name="kv"} 25
LAKEFS_DATABASE_POSTGRES_MAX_OPEN_CONNECTIONS
200
LAKEFS_DATABASE_POSTGRES_MAX_IDLE_CONNECTIONS
200
LAKEFS_LOGGING_LEVEL
DEBUG
• commit is stalling ....
• /metrics:
pgxpool_acquire_count{db_name="kv"} 1771
pgxpool_acquire_duration_ns{db_name="kv"} 453377
pgxpool_acquired_conns{db_name="kv"} 0
pgxpool_canceled_acquire_count{db_name="kv"} 0
pgxpool_constructing_conns{db_name="kv"} 0
pgxpool_empty_acquire{db_name="kv"} 0
pgxpool_idle_conns{db_name="kv"} 200
pgxpool_max_conns{db_name="kv"} 200
pgxpool_total_conns{db_name="kv"} 200
Iddo Avneri
05/31/2023, 10:30 PMHT
06/01/2023, 12:52 AMIddo Avneri
06/01/2023, 1:20 AMHT
06/01/2023, 2:41 AM2023-06-01 14:17:59.926 : rclone start copying
2023-06-01 14:18:05.489 : lakectl commit
2023-06-01 14:22:05.675 : timeout 504
2023-06-01 14:22:06.228 : check if new repo exists and lakectl repo create new repo
2023-06-01 14:22:06.844 : rclone copying new files
Below is ALL logs on lakefs server between 021804 and 022207Guy Hardonag
06/01/2023, 9:31 AMHT
06/01/2023, 9:32 AM#!/bin/bash
# lakefsServer="<http://ti-dev-aca-03-12-lakefs.proudplant-6acee3df.australiaeast.azurecontainerapps.io|ti-dev-aca-03-12-lakefs.proudplant-6acee3df.australiaeast.azurecontainerapps.io>"
lakefsStorageBackEnd="<http://tidevstorage0312lakefs.blob.core.windows.net/hieu-test|tidevstorage0312lakefs.blob.core.windows.net/hieu-test>"
srcDir=/data/hieu/deleteme/stressTest/clones
nThread=32
log=$(realpath $BASH_SOURCE).log
if [ -n "$2" ]; then
log=$(realpath $2)
fi
function echoLog {
echo "$(date +'%Y-%m-%d %T.%3N') | $@" | tee -a $log
}
function run {
echoLog "> $@"
eval "$@"
res=$?
echo
return $res
}
function createRepo() {
name=$1
lakectl branch list lakefs://$name &> /dev/null
if [ $? -eq 0 ]; then
return 0
fi
run lakectl repo create lakefs://$name https://$lakefsStorageBackEnd/$name
return $?
}
function upload() {
srcPath=$1; shift
repo=$1; shift
run rclone copy $srcPath sandbox:$repo/main/ --transfers=$nThread -P --checksum
return $?
}
function commit() {
repo=$1;
toCommit=$(lakectl diff lakefs://$repo/main | sed '1d' | wc -l)
if [ $toCommit -eq 0 ]; then
echoLog "Nothing to commit. Continuing .."
return 0
fi
run "lakectl commit lakefs://$repo/main --message \"Initial commit\""
return $?
}
function main() {
opt=$1
pushd $srcDir
repos=$(find . -maxdepth 1 -type d -name "*_clone*" | sed 's|./||g' | sort $opt)
for repoOrg in $repos
do
repo=$(echo $repoOrg | tr '[:upper:]' '[:lower:]' | tr '_' '-')
done=".$repo.commit"
if [ -f $done ]; then
echoLog "Skipping $repoOrg."
continue
fi
echoLog "Processing $repo"
createRepo $repo && \
upload $repoOrg $repo && \
commit $repo && \
run touch $done
done
}
main $1 2>> $log
Guy Hardonag
06/01/2023, 10:19 AMHT
06/01/2023, 10:19 AMGuy Hardonag
06/01/2023, 10:19 AMHT
06/01/2023, 10:19 AMGuy Hardonag
06/01/2023, 10:19 AMHT
06/01/2023, 10:20 AMlakectl commit
itself ?Guy Hardonag
06/01/2023, 10:51 AMHT
06/01/2023, 10:55 AMGuy Hardonag
06/01/2023, 10:55 AMHT
06/01/2023, 10:57 AMGuy Hardonag
06/01/2023, 8:30 PMHT
06/02/2023, 2:50 AMGuy Hardonag
06/02/2023, 9:14 AMHT
06/02/2023, 9:47 AM"configuration": {
"secrets": null,
"activeRevisionsMode": "Single",
"ingress": {
"fqdn": "<http://ti-dev-aca-03-12-lakefs.proudplant-6acee3df.australiaeast.azurecontainerapps.io|ti-dev-aca-03-12-lakefs.proudplant-6acee3df.australiaeast.azurecontainerapps.io>",
"external": true,
"targetPort": 8000,
"exposedPort": 0,
"transport": "Auto",
"traffic": [
{
"weight": 100,
"latestRevision": true
}
],
"customDomains": null,
"allowInsecure": false,
"ipSecurityRestrictions": null,
"corsPolicy": null,
"clientCertificateMode": null,
"stickySessions": null
},
"registries": null,
"dapr": null,
"maxInactiveRevisions": null,
"service": null
},
"template": {
"revisionSuffix": "",
"containers": [
{
"image": "treeverse/lakefs:0.100.0",
"name": "lakefs-server",
"env": [
{
"name": "LAKEFS_DATABASE_TYPE",
"value": "postgres"
},
{
"name": "LAKEFS_DATABASE_POSTGRES_CONNECTION_STRING",
"value": "postgres://************@ti-dev-postgres-03-12-lakefs.postgres.database.azure.com:5432/lakefs?sslmode=require"
},
{
"name": "LAKEFS_AUTH_ENCRYPT_SECRET_KEY",
"value": "************************"
},
{
"name": "LAKEFS_BLOCKSTORE_TYPE",
"value": "azure"
},
{
"name": "LAKEFS_BLOCKSTORE_AZURE_STORAGE_ACCOUNT",
"value": "<https://tidevstorage0312lakefs.blob.core.windows.net/>"
},
{
"name": "LAKEFS_DATABASE_POSTGRES_MAX_OPEN_CONNECTIONS",
"value": "200"
},
{
"name": "LAKEFS_DATABASE_POSTGRES_MAX_IDLE_CONNECTIONS",
"value": "200"
},
{
"name": "LAKEFS_LOGGING_LEVEL",
"value": "TRACE"
}
],
"resources": {
"cpu": 1,
"memory": "2Gi",
"ephemeralStorage": "4Gi"
},
"probes": []
}
],
"initContainers": null,
"scale": {
"minReplicas": 1,
"maxReplicas": 4,
"rules": [
{
"name": "cpuscalerule",
"custom": {
"type": "cpu",
"metadata": {
"type": "Utilization",
"value": "60"
}
}
}
]
},
"volumes": null,
"serviceBinds": null
},
{
"sku": {
"name": "Standard_D2s_v3",
"tier": "GeneralPurpose"
},
"systemData": {
"createdAt": "2023-05-19T03:58:00.7694907Z"
},
"properties": {
"authConfig": {
"activeDirectoryAuth": "Disabled",
"passwordAuth": "Enabled"
},
"dataEncryption": {
"type": "SystemManaged"
},
"fullyQualifiedDomainName": "<http://ti-dev-postgres-03-12-lakefs.postgres.database.azure.com|ti-dev-postgres-03-12-lakefs.postgres.database.azure.com>",
"version": "14",
"minorVersion": "7",
"administratorLogin": "******",
"state": "Ready",
"availabilityZone": "1",
"storage": {
"storageSizeGB": 256
},
"backup": {
"backupRetentionDays": 7,
"geoRedundantBackup": "Disabled",
"earliestRestoreDate": "2023-05-27T04:06:03.1274861+00:00"
},
"network": {
"publicNetworkAccess": "Enabled"
},
"highAvailability": {
"mode": "Disabled",
"state": "NotEnabled"
},
"maintenanceWindow": {
"customWindow": "Disabled",
"dayOfWeek": 0,
"startHour": 0,
"startMinute": 0
},
"replicationRole": "Primary",
"replicaCapacity": 5
},
"location": "Australia East",
"id": "********",
"name": "ti-dev-postgres-03-12-lakefs",
"type": "Microsoft.DBforPostgreSQL/flexibleServers"
}
Iddo Avneri
06/02/2023, 10:00 AMHT
06/05/2023, 9:42 AM