# help
i
I ran a large backfill operation in Dagster on Friday and went to check whether everything went through. For some reason, 25% of the runs got a connection refused from lakeFS while creating a branch. I assume we need to bump up max_open_connections/max_idle_connections for PostgreSQL?
n
Hi @Ion, there can be a number of reasons for that. If you suspect max_open_connections, I suggest you verify this in the Postgres logs/metrics
i
@Niro hey, what else might it be besides the PostgreSQL connections? We already have a very buffed up node for lakeFS, and for PostgreSQL we upped the resources a lot
n
It really depends on your setup. I think that as a start, it's best to go into the lakeFS logs and find where the error comes from
i
Unfortunately I lost the logs since all our nodes got killed. I think they did some upgrades to the cluster
n
Do you have metrics for your postgres instance?
i
Nope, we haven't really set that up yet
To log things structurally
n
Is it a managed postgres service?
i
No, it's deployed in k8s manually for the time being. We are waiting on a managed Azure deployment
n
So basically you should be able to access the postgres logs via the cluster
i
I could, but the logs of that pod are not visible anymore since it got killed
n
Maybe you can try with the --previous flag (full disclosure: not a k8s expert in any way)?
In any case, if there are no logs I'd say we're in limbo. Perhaps it's best to wait until this reproduces again
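For reference, a minimal sketch of how fetching the previous container's logs could be scripted. The pod name and namespace are placeholders, not values from this thread. Note that `kubectl logs --previous` only returns output from the container instance immediately before the current one, and nothing once the pod itself has been deleted and recreated.

```python
from __future__ import annotations

import subprocess


def previous_logs_cmd(pod: str, namespace: str = "default", tail: int = 500) -> list[str]:
    """Build a kubectl command that reads logs from the previous container instance."""
    return [
        "kubectl", "logs", pod,
        "--namespace", namespace,
        "--previous",          # logs of the prior (terminated) container, if kept
        f"--tail={tail}",      # only the last N lines
    ]


def fetch_previous_logs(pod: str, namespace: str = "default", tail: int = 500) -> str:
    """Run the command; return stdout on success, otherwise kubectl's error text."""
    result = subprocess.run(
        previous_logs_cmd(pod, namespace, tail),
        capture_output=True, text=True, check=False,
    )
    return result.stdout if result.returncode == 0 else result.stderr
```

Usage would look like `fetch_previous_logs("postgres-0", namespace="lakefs")`, where both names are hypothetical.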
i
I tried but that one didn't show any logs, so I guess it got restarted a couple times in between
Yeah I think I'll just bump up the default from 25 connections to 100
That shouldn't harm I suppose?
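One thing worth checking before raising the limit, sketched as rough arithmetic. This assumes the PostgreSQL server is still on its stock `max_connections` of 100 with 3 connections reserved for superusers, and that each lakeFS replica can open up to its own `max_open_connections`. The replica counts below are illustrative, not values from this thread.

```python
def connection_headroom(
    replicas: int,
    max_open_per_replica: int,
    pg_max_connections: int = 100,  # assumed stock PostgreSQL setting
    pg_reserved: int = 3,           # assumed superuser_reserved_connections
) -> int:
    """Connections left on the server if every replica fills its pool.

    A negative result means the pools can ask for more connections
    than the server is willing to hand out.
    """
    demand = replicas * max_open_per_replica
    available = pg_max_connections - pg_reserved
    return available - demand


if __name__ == "__main__":
    for replicas, pool in [(1, 25), (1, 100), (2, 100)]:
        print(replicas, pool, connection_headroom(replicas, pool))
```

If the headroom comes out negative, the server-side limit would likely need raising along with the pool size.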
n
Definitely won't hurt. The only question is whether this was the real issue. Take a look at the metrics exposed by lakeFS; they can help you determine whether this was indeed a connection-limit problem. Also see our sizing guide to better understand the system requirements as they apply to your use case
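A minimal sketch of inspecting those metrics, assuming lakeFS serves Prometheus-format text at a `/metrics` endpoint on its listen address (check your deployment for the actual URL). The metric names in the usage note are made up for illustration; filter on whatever connection-related names your instance actually reports.

```python
from urllib.request import urlopen


def fetch_metrics(url: str, timeout: float = 5.0) -> str:
    """Download the raw metrics text from the given URL."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def parse_metrics(text: str) -> dict:
    """Parse Prometheus text exposition into {series: value}, skipping comments."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and HELP/TYPE comments
        if "}" in line:
            # Labelled series: everything up to the last "}" is the series name.
            head, _, rest = line.rpartition("}")
            series = head + "}"
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()  # value, then an optional timestamp
        if not fields:
            continue
        try:
            samples[series] = float(fields[0])
        except ValueError:
            continue  # ignore lines this simple parser cannot read
    return samples


def filter_metrics(samples: dict, needle: str) -> dict:
    """Keep only the series whose name contains the given substring."""
    return {name: value for name, value in samples.items() if needle in name}
```

For example, `filter_metrics(parse_metrics(fetch_metrics(url)), "connections")` would list every series with "connections" in its name, such as a hypothetical `example_db_open_connections`. Comparing those values against the configured pool limit during a backfill would show whether the pool is actually saturating.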
i
Yeah we looked at that, currently it's 2000m CPU and 3000Mi memory