Hey @Joe M, thanks for sharing. The stats you posted are an aggregated snapshot from a single point in time, so they may cover several steps of the job execution, or even several job runs. I'd recommend collecting these metrics continuously and charting them in a tool like Grafana, so that you can see how they change over time.
That said, I do see some 499 responses. A 499 means the client cancelled the request before the server responded; with Spark, this usually suggests the request timed out on the client side. In that case, the recommendation is to either configure retries or extend the timeout for whichever operation is returning the 499s.
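To make the retries/timeout suggestion a bit more concrete, here is a rough sketch. It assumes the job reaches storage through Hadoop's S3A connector, which I can't tell from the stats alone; if you use a different client, the property names will differ. The helper name `s3a_retry_timeout_conf` and the values are just illustrative, not tuned recommendations.

```python
def s3a_retry_timeout_conf(timeout_ms=600_000, max_attempts=10, retry_limit=10):
    """Build Spark conf entries that lengthen S3A timeouts and allow more retries.

    Assumption: the job uses the Hadoop S3A connector. The keys below are
    standard S3A properties, prefixed with "spark.hadoop." so that Spark
    forwards them to the Hadoop configuration.
    """
    if timeout_ms <= 0 or max_attempts < 1 or retry_limit < 0:
        raise ValueError("timeout and attempt counts must be positive")
    return {
        # Socket timeout for S3A connections, in milliseconds.
        "spark.hadoop.fs.s3a.connection.timeout": str(timeout_ms),
        # How many times the underlying AWS client retries a failed request.
        "spark.hadoop.fs.s3a.attempts.maximum": str(max_attempts),
        # How many times S3A itself retries operations it considers retryable.
        "spark.hadoop.fs.s3a.retry.limit": str(retry_limit),
    }


conf = s3a_retry_timeout_conf()
# Print the settings in the form accepted by spark-submit.
for key, value in sorted(conf.items()):
    print(f"--conf {key}={value}")
```

You can pass the printed flags to `spark-submit`, or set each pair on the session builder with `.config(key, value)`. I'd start by raising only the setting tied to the operation that shows the 499s, then check whether the count drops.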