lakeFS #help

Jacob

09/29/2023, 4:44 PM

Hey all, I’ve noticed that when I roll out changes to my lakeFS server pods on k8s, some clients receive errors (presumably because they are waiting for a response from a pod that was terminated). Is this a known issue? (From some reading, this seems to be a tricky k8s problem.)

Jacob

09/29/2023, 4:45 PM

Is there anything I can do to roll out changes without breaking connections with clients?

Niro

09/30/2023, 9:22 AM

Hi @Jacob, I am not aware of such an issue. Can you please share your setup as well as how you perform the rolling update and the errors you encounter?

Itai Admi

10/02/2023, 12:45 PM

Hey @Jacob, lakeFS respects the SIGTERM signal that k8s uses to flag that the container will exit soon. For the lakeFS Cloud version, we manage the rollout ourselves and don’t encounter any errors. I find this guide pretty useful for understanding the k8s shutdown process.
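
For readers unfamiliar with what "respecting SIGTERM" means in practice, here is a minimal, generic sketch of the pattern. It is not lakeFS's actual implementation; the names shutting_down, on_sigterm, install_handler and serve are hypothetical and exist only for illustration. The idea is that the handler marks the process as draining, so work already in progress can finish while no new work is accepted.

```python
import signal
import threading

# Set once SIGTERM arrives; the serving loop checks it between requests.
shutting_down = threading.Event()


def on_sigterm(signum, frame):
    """Mark the process as draining instead of exiting immediately."""
    shutting_down.set()


def install_handler():
    """Register the handler; signal.signal must be called from the main thread."""
    signal.signal(signal.SIGTERM, on_sigterm)


def serve(requests, handle):
    """Handle requests in order, and stop accepting new ones once SIGTERM was seen."""
    done = []
    for req in requests:
        if shutting_down.is_set():
            break  # earlier requests already completed; skip the rest
        done.append(handle(req))
    return done
```

A request that is still running when the grace period expires is killed regardless, so this pattern only helps if the grace period is longer than the slowest request.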

Jacob

10/02/2023, 1:11 PM

@Itai Admi Thanks for the pointers! Should we be configuring the probes ourselves? I noticed that the lakeFS Helm chart just uses the default k8s values.

Itai Admi

10/02/2023, 3:12 PM

It depends on your particular use case, for example what the clients are doing when they fail. If I had to guess, it’s probably not the probes. Like you mentioned, it seems that a pod is terminated in the middle of a client operation while a rollout is in progress. That could happen if your clients perform long operations, like transferring big files in a single request. In that case the default k8s grace period is too short for your usage, and you can either:
1. Break the long operations into pieces (multiparts, read directly from the object-store, etc.)
2. Increase the k8s grace period for terminating pods.
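
As a rough sketch of option 2: the grace period is the pod spec field terminationGracePeriodSeconds, which defaults to 30 seconds. The snippet below only builds a patch body for a Deployment. The helper name grace_period_patch and the value 300 are illustrative assumptions, not values from this thread, and whether the lakeFS Helm chart exposes this setting directly is not confirmed here.

```python
import json


def grace_period_patch(seconds):
    """Build a Deployment patch body that sets the pod termination grace period."""
    if seconds <= 0:
        raise ValueError("grace period must be positive")
    return {
        "spec": {
            "template": {
                "spec": {"terminationGracePeriodSeconds": seconds}
            }
        }
    }


# 300 seconds is an illustrative value; pick one longer than your slowest request.
patch_json = json.dumps(grace_period_patch(300))
print(patch_json)
```

The printed JSON could then be passed to kubectl patch deployment with the --patch flag, using your own deployment name. Changing the pod template triggers a new rollout, so the first rollout after the change still terminates the old pods under the old grace period.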
