This repository was archived by the owner on Feb 16, 2026. It is now read-only.

Feature: draining connections during shutdown #4441

@kaiburjack

Description


Motivation

Currently (as of release 8.0.0), when Varnish receives a SIGTERM (e.g. sent by the kubelet in Kubernetes when Varnish is deployed as a container in a pod, which is our scenario), the following happens:

  1. the listener socket is closed so no new connections will be accepted (this is fine)
  2. Varnish waits for all ongoing requests to finish before finally shutting down (more precisely, it waits for all VCL reference counts to drop to 0)
  3. Varnish shuts down

The problem is step 2: unless clients have some "out-of-band" information that the Varnish server they are calling is shutting down, they will keep sending new requests over existing keep-alive connections. Varnish therefore never reaches a state with no active requests, and it will eventually be killed with SIGKILL by the kubelet once the pod's terminationGracePeriodSeconds is exceeded.

This leads to two failure modes:

  1. currently active requests are aborted
  2. new requests that arrive exactly when Varnish stops get a TCP RST

Both surface as errors in clients, which may or may not retry depending on the idempotency of the request.

Proposal

Add a "drain_timeout" period (as a Varnish parameter, default 0s to preserve current behaviour) which, when set to a value greater than 0s, has the following effect when Varnish enters shutdown (via SIGINT or SIGTERM):

  1. keep current idle connections open
  2. respond with Connection: close to new requests (and to requests received before the drain started whose responses are sent afterwards)
  3. periodically monitor the number of currently active sessions/client connections
  4. once the number of sessions/client connections drops to zero, end the drain period and proceed with normal shutdown (as if drain_timeout were 0s, i.e. wait until all active requests finish and the VCL refcount drops to zero before finally shutting down)

This is in line with what other HTTP proxies such as Envoy do, and with what nginx supports in exactly this form via its core directive keepalive_min_timeout.
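The proposed drain behaviour can be sketched as a polling loop. This is only an illustration of the intended semantics, not Varnish code: drain_timeout is the hypothetical parameter proposed above, and get_active_sessions stands in for however varnishd would count live sessions internally.

```shell
#!/bin/sh
# Sketch of the proposed drain loop; all names and values are illustrative.
DRAIN_TIMEOUT=30                       # hypothetical parameter, in seconds
deadline=$(( $(date +%s) + DRAIN_TIMEOUT ))

# Stand-in for the real session counter inside varnishd.
get_active_sessions() { echo 0; }

while [ "$(date +%s)" -lt "$deadline" ]; do
  if [ "$(get_active_sessions)" -eq 0 ]; then
    break                              # all connections drained: proceed to shutdown
  fi
  sleep 1                              # meanwhile, responses carry 'Connection: close'
done
# ...normal shutdown follows: wait for active requests, then VCL refcount -> 0
```

Whether the loop ends by draining to zero or by hitting the deadline, shutdown then proceeds exactly as it does today.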

How do people currently work around this?

I had a nice discussion with a user on the Varnish Discord channel. They were using a preStop exec hook together with a "mock" backend in VCL whose health is set to "sick" via varnishadm in the preStop exec command. The VCL then checks (in vcl_deliver) whether the mock backend is healthy and, if not, sets resp.http.Connection = "close";.
A loop (also in the preStop exec hook) then periodically polls varnishstat until the number of client connections/sessions drops to zero.
Only then does the preStop exec hook finish and let the kubelet send SIGTERM to Varnish.
Essentially, this is my preStop hook, which I modified to support this:

# Wait 15 seconds for new connections to stop coming in due to kube-proxy
# updating netfilter rules in all clients' nodes. This usually will take at least 5s.
# https://learnkube.com/graceful-shutdown mentions that 15s is a safe duration.
# If clients do not route via kube-proxy/netfilter but enumerate pod IPs via DNS
# then with kube-dns it can take as long as 30s for clients to get an updated pods
# list after the pod enters the 'Terminating' phase.
# But we will assume that there are only kube-proxy/netfilter-based clients that
# connect to the Service IPs.
echo > /proc/1/fd/1 "preStop: Waiting 15s for new connections to stop coming in..."
sleep 15
# Mark the drain_flag backend as sick to trigger draining of existing connections
# with the help of the VCL script which checks backend health on each vcl_deliver
# invocation and adds 'Connection: close' to all responses if unhealthy.
varnishadm -T 127.0.0.1:6082 -S /var/lib/varnish/secrets/secret backend.set_health drain_flag sick
# LOOP_TIMEOUT is expected to be set in the container environment; default to 0 (no deadline).
LOOP_TIMEOUT="${LOOP_TIMEOUT:-0}"
if [ "$LOOP_TIMEOUT" -gt 0 ]; then
  deadline=$(( $(date +%s) + LOOP_TIMEOUT ))
fi
echo > /proc/1/fd/1 "preStop: Waiting at most ${LOOP_TIMEOUT}s for all connections to be drained..."
while :; do
  # Query the number of active client connections with varnishstat
  val=$(varnishstat -n /tmp/varnish_workdir -1 \
        | awk '/MEMPOOL\.sess[0-9]+\.live/ {a+=$2} END {print a+0}')
  if [ "$val" -eq 0 ]; then
    echo > /proc/1/fd/1 "preStop: All connections are gone. Telling Varnish to shut down now."
    break
  elif [ "${LOOP_TIMEOUT:-0}" -gt 0 ] && [ "$(date +%s)" -ge "$deadline" ]; then
    echo > /proc/1/fd/1 "preStop: Deadline reached while there are still connections. Telling Varnish to shut down now anyway."
    break
  fi
  echo > /proc/1/fd/1 "preStop: There are still $val client connections. Continue waiting..."
  sleep 1
done
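The varnishstat parsing in the loop above can be sanity-checked in isolation. The sample lines below mimic the MEMPOOL.sessN.live counters from `varnishstat -1` output (the counter values are made up):

```shell
# Sum the 'live' gauges of all session mempools, exactly as in the preStop loop.
sample='MEMPOOL.sess0.live            3          .  In use
MEMPOOL.sess0.pool           10          .  In Pool
MEMPOOL.sess1.live            2          .  In use'
val=$(printf '%s\n' "$sample" | awk '/MEMPOOL\.sess[0-9]+\.live/ {a+=$2} END {print a+0}')
echo "$val"   # -> 5
```

Only the .live lines match the pattern, so the .pool line is ignored; the END block's `a+0` ensures the loop sees a numeric 0 (rather than an empty string) when no session mempools are reported.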

with a supporting VCL like:

# Dummy backend to use as a drain flag. The preStop hook will set this backend to sick to trigger draining
# via varnishadm and wait for LOOP_TIMEOUT before letting the kubelet send SIGTERM to the container.
backend drain_flag {
  .host = "127.0.0.1";
  .port = "9";
}
...
sub handle_draining {
  # During draining, which is activated via varnishadm setting the 'drain_flag' backend health to sick,
  # we respond with 'Connection: close' to inform clients not to reuse the connection anymore.
  # This will eventually lead to all connections being closed and no new requests being received.
  if (!std.healthy(drain_flag)) {
    set resp.http.Connection = "close";
  }
}
sub vcl_deliver {
  # Check whether we are draining and adjust the Connection header in client responses accordingly.
  call handle_draining;
  ...
}

This works splendidly in a Kubernetes environment and results in no dropped connections, TCP RSTs, or aborted requests when a Varnish pod is terminated (whether due to node drain, HPA scale-down, or a rolling deployment update).

Additional considerations

Of course, a client may simply never send another request over an existing idle keep-alive connection after Varnish enters the drain period. This is why drain_timeout must be tuned to the client-side keep-alive/idle timeout.
We already do exactly this between all our services: the server drains for at least as long as the client keeps a connection idle, so the client will either send a new request over that connection (to which the server responds with Connection: close, making the client close it) or voluntarily close the idle connection itself.
Either way we reach the desired state of no idle connections sitting around.
But this requires that the client-side keep-alive/idle timeout can be controlled.
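This tuning rule reduces to a simple bound: the drain period must cover at least the client-side idle timeout, plus some safety margin. A sketch with made-up numbers:

```shell
# Hypothetical values: clients keep idle connections open for 60s;
# add a margin for in-flight requests and clock skew.
CLIENT_IDLE_TIMEOUT=60
MARGIN=10
DRAIN_TIMEOUT=$(( CLIENT_IDLE_TIMEOUT + MARGIN ))
echo "$DRAIN_TIMEOUT"   # -> 70
```

The resulting value would then be used both as the proposed drain_timeout and as the LOOP_TIMEOUT budget in the preStop workaround above.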

Fortunately, if the "client" in front of Varnish is merely another HTTP proxy such as nginx or Envoy, or a cloud load balancer, these client-side idle timeouts can be configured appropriately in those services.
