# help
Jacob
Hey all, I’ve noticed that when I roll out changes to my lakeFS server pods on k8s, some clients receive errors, presumably because they are waiting for a response from a pod that was terminated. Is this a known issue? (This seems to be a tricky k8s problem, based on some reading.)
Is there anything I can do to roll out changes without breaking connections with clients?
n
Hi @Jacob, I am not aware of such an issue. Can you please share your setup as well as how you perform the rolling update and the errors you encounter?
Itai Admi
Hey @Jacob, lakeFS respects the SIGTERM signal that k8s uses to flag that the container will exit soon. For the lakeFS Cloud version, we manage the rollout ourselves and don’t encounter any errors. I find this guide pretty useful for understanding the k8s shutdown process.
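For context, here is a minimal sketch of the pod-spec fields that shutdown sequence interacts with. This is illustrative only and not taken from the lakeFS helm chart; the values and the preStop sleep are assumptions (the sleep also assumes the container image ships a sleep binary):
```yaml
# Illustrative pod-spec fragment (values are placeholders).
# On pod deletion, k8s marks the pod Terminating and starts removing it from
# Service endpoints, runs the preStop hook, sends SIGTERM to the container,
# then sends SIGKILL once terminationGracePeriodSeconds has elapsed.
spec:
  terminationGracePeriodSeconds: 30   # k8s default; budget from termination start to SIGKILL
  containers:
    - name: lakefs
      lifecycle:
        preStop:
          exec:
            # Short sleep so requests routed just before the endpoint removal
            # propagates can still be served by this pod.
            command: ["sleep", "5"]
```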
Jacob
@Itai Admi Thanks for the pointers! Should we be configuring the probes ourselves? I noticed that the lakefs helm chart just uses the default k8s values
Itai Admi
It depends on your particular use case, like what the clients are doing when they fail. If I had to guess, it’s probably not the probes. Like you mentioned, it seems like a pod is terminated in the middle of some client’s operation while a deployment is rolling out. That could happen if your clients perform long operations, like transferring big files in a single request, so the default k8s grace period may be too short for your usage. If that’s the case, you can either:
1. Break the long operations into pieces (multipart uploads, reading directly from the object store, etc.)
2. Increase the k8s grace period for terminating pods (see the sketch below).
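A rough sketch of option 2, assuming you set it on the Deployment’s pod template directly; the number is a placeholder, and the helm chart may or may not expose this as a value, so check how it renders the Deployment:
```yaml
# Illustrative: give in-flight client operations more time to finish after SIGTERM.
# The grace period counts from the start of termination (including any preStop hook)
# until k8s sends SIGKILL; the default is 30 seconds.
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 300   # size this above your longest single request
```
Whatever value you pick, it only helps because lakeFS handles SIGTERM and can finish in-flight work before exiting, as described above.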