http: add jitter support for max_connection_duration #44064

Open
Retr0-XD wants to merge 3 commits into envoyproxy:main from Retr0-XD:feat/http-max-connection-duration-jitter-42410

@Retr0-XD Retr0-XD commented Mar 21, 2026

Description

Fixes #42410.

TCP connections already support max_downstream_connection_duration_jitter_percentage (merged in #40686). This PR adds the equivalent feature for HTTP connections via a new max_connection_duration_jitter_percent field in HttpProtocolOptions.

Problem

When many HTTP/2 connections are established simultaneously (e.g. during a pod restart or rolling deploy), they all reach max_connection_duration at the same time, triggering simultaneous drain → reconnect storms ("thundering herd").

Solution

Add max_connection_duration_jitter_percent to HttpProtocolOptions. When configured, the effective max_connection_duration for each connection is individually extended by a random amount uniformly distributed in [0, jitter_percent/100 * base_duration], spreading connection drains over a window rather than synchronizing them.

Example: max_connection_duration = 10s, max_connection_duration_jitter_percent = 25 → each connection drains at a random time in [10s, 12.5s].
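
The calculation described above can be sketched as a small standalone function. This is an illustration only, not the code in this PR: the name applyJitter, the millisecond granularity, and the use of std::mt19937_64 as the random source are all assumptions made for the example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <random>

// Hypothetical helper (not Envoy code): extends `base` by a random amount
// drawn uniformly from [0, jitter_percent / 100 * base].
std::chrono::milliseconds applyJitter(std::chrono::milliseconds base, double jitter_percent,
                                      std::mt19937_64& rng) {
  assert(jitter_percent >= 0.0 && jitter_percent <= 100.0);
  // Largest possible extension, truncated to whole milliseconds.
  const auto max_extra = static_cast<int64_t>(base.count() * (jitter_percent / 100.0));
  if (max_extra <= 0) {
    return base; // Zero jitter, or a base too small to jitter at this granularity.
  }
  std::uniform_int_distribution<int64_t> dist(0, max_extra); // Inclusive on both ends.
  return base + std::chrono::milliseconds(dist(rng));
}
```

With a 10s base and 25 percent jitter, every returned value falls between 10000ms and 12500ms, which matches the example above.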

Changes

| File | Change |
| --- | --- |
| api/envoy/config/core/v3/protocol.proto | Add max_connection_duration_jitter_percent (field 8) to HttpProtocolOptions |
| source/common/http/conn_manager_config.h | Add maxConnectionDurationJitterPercent() pure virtual method |
| source/extensions/filters/network/http_connection_manager/config.{h,cc} | Parse and store the new proto field |
| source/common/http/conn_manager_impl.cc | Apply jitter when arming the connection duration timer |
| source/server/admin/admin.h | Stub: returns absl::nullopt (no jitter for admin connections) |
| test/mocks/http/mocks.h | Add MOCK_METHOD for new interface method |
| test/common/http/conn_manager_impl_test_base.{h,cc} | Add stub forwarding delegates |
| test/common/http/conn_manager_impl_fuzz_test.cc | Add stub returning absl::nullopt |

Prior art

The TCP proxy implementation in source/common/tcp_proxy/tcp_proxy.cc (method Config::calculateMaxDownstreamConnectionDurationWithJitter()) provides the exact same pattern — this PR follows it faithfully.

AI Disclosure: Used GitHub Copilot for coding assistance.

AI disclosure: GitHub Copilot was used during implementation and test writing. I fully understand all changes made in this PR.

Commit Message: See PR title
Risk Level: Low
Testing: Unit tests added/verified
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A

@repokitteh-read-only

Hi @Retr0-XD, welcome and thank you for your contribution.

We will try to review your Pull Request as quickly as possible.

In the meantime, please take a look at the contribution guidelines if you have not done so already.

🐱

Caused by: #44064 was opened by Retr0-XD.

@repokitteh-read-only

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to (api/envoy/|docs/root/api-docs/).
envoyproxy/api-shepherds assignee is @markdroth
CC @envoyproxy/api-watchers: FYI only for changes made to (api/envoy/|docs/root/api-docs/).

🐱

Caused by: #44064 was opened by Retr0-XD.

@Retr0-XD Retr0-XD had a problem deploying to external-contributors March 21, 2026 09:33 — with GitHub Actions Error
@Retr0-XD Retr0-XD force-pushed the feat/http-max-connection-duration-jitter-42410 branch from afbdc5d to 8ad2fb7 Compare March 22, 2026 06:17
@Retr0-XD Retr0-XD had a problem deploying to external-contributors March 22, 2026 06:17 — with GitHub Actions Error
@Retr0-XD Retr0-XD force-pushed the feat/http-max-connection-duration-jitter-42410 branch from 8ad2fb7 to 28b7046 Compare March 22, 2026 06:20
@Retr0-XD Retr0-XD had a problem deploying to external-contributors March 22, 2026 06:20 — with GitHub Actions Error
@Retr0-XD Retr0-XD force-pushed the feat/http-max-connection-duration-jitter-42410 branch from 28b7046 to 56cfa01 Compare March 24, 2026 18:28
@Retr0-XD Retr0-XD had a problem deploying to external-contributors March 24, 2026 18:28 — with GitHub Actions Error
Member

@wbpcode wbpcode left a comment

Thanks for this contribution. I've added some comments. :)

/wait

Comment thread api/envoy/config/core/v3/protocol.proto Outdated
Comment on lines +296 to +301
/**
 * @return optional jitter percentage to apply to maxConnectionDuration, in range [0, 100].
 * When set, the effective max_connection_duration is extended by a random amount
 * up to (jitter_percent / 100) * max_connection_duration.
 */
virtual absl::optional<double> maxConnectionDurationJitterPercent() const PURE;
Member

I don't think we need this new API. Can we calculate the effective duration directly in maxConnectionDuration()?

Author

Yep, the new API is needed here (it'd return different values on each call if not handled). This approach mirrors how TCP proxy handles it with calculateMaxDownstreamConnectionDurationWithJitter()

Member

But we don't need to copy the TCP proxy implementation, right? If you remove this method, you only need to revise the implementation of maxConnectionDuration() in HttpConnectionManagerConfig. The new maxConnectionDuration() could calculate a different connection duration based on the configured jitter. That is to say, treat the jitter as an internal implementation detail of HttpConnectionManagerConfig rather than an API exposed to the caller.
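
One way to read this suggestion is sketched below. It is a rough illustration under stated assumptions, not Envoy's actual class: SketchConfig is a hypothetical stand-in for the real config, std::optional stands in for absl::optional, and the per-object std::mt19937_64 is an arbitrary choice of random source.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <optional>
#include <random>

// Hypothetical stand-in for a connection manager config (names are illustrative).
class SketchConfig {
public:
  SketchConfig(std::optional<std::chrono::milliseconds> base, double jitter_percent)
      : base_(base), jitter_percent_(jitter_percent) {
    assert(jitter_percent >= 0.0 && jitter_percent <= 100.0);
  }

  // Callers see only the effective duration; the jitter is an internal detail.
  // Each call may return a different value, so a caller would read it once per
  // connection, at the point where that connection's timer is armed.
  std::optional<std::chrono::milliseconds> maxConnectionDuration() const {
    if (!base_.has_value() || jitter_percent_ <= 0.0) {
      return base_; // No base duration means the jitter is ignored.
    }
    const auto max_extra = static_cast<int64_t>(base_->count() * (jitter_percent_ / 100.0));
    if (max_extra <= 0) {
      return base_;
    }
    std::uniform_int_distribution<int64_t> dist(0, max_extra);
    return *base_ + std::chrono::milliseconds(dist(rng_));
  }

private:
  const std::optional<std::chrono::milliseconds> base_;
  const double jitter_percent_;
  // Mutable so the const getter can draw random numbers; not thread-safe as written.
  mutable std::mt19937_64 rng_{std::random_device{}()};
};
```

The trade-off is the one raised earlier in this thread: the getter stops being idempotent, so any caller that reads it more than once per connection would see inconsistent values.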

@wbpcode wbpcode self-assigned this Mar 25, 2026
Retr0-XD added a commit to Retr0-XD/envoy that referenced this pull request Mar 25, 2026
- Add PGV validation for max_connection_duration_jitter_percent (required field)
- Add documentation note that jitter is only for HTTP/1.1 and HTTP/2
- Note that upstream cluster support is not yet implemented

Signed-off-by: Retr0-XD <sakthi.harish@edgeverve.com>
@Retr0-XD Retr0-XD had a problem deploying to external-contributors March 25, 2026 18:50 — with GitHub Actions Error
Author

Retr0-XD commented Mar 25, 2026

added changes to feat branch

Retr0-XD added a commit to Retr0-XD/envoy that referenced this pull request Mar 25, 2026
- Add PGV validation for max_connection_duration_jitter_percent (required field)
- Add documentation note that jitter is only for HTTP/1.1 and HTTP/2
- Note that upstream cluster support is not yet implemented

Signed-off-by: Retr0-XD <sakthi.harish@edgeverve.com>
(cherry picked from commit 21a33ac)
Signed-off-by: Retr0-XD <sakthi.harish@edgeverve.com>
@Retr0-XD Retr0-XD force-pushed the feat/http-max-connection-duration-jitter-42410 branch from 21a33ac to b379b44 Compare March 25, 2026 19:05
@Retr0-XD Retr0-XD had a problem deploying to external-contributors March 25, 2026 19:05 — with GitHub Actions Error
Fixes envoyproxy#42410.

TCP connections already support max_downstream_connection_duration_jitter_percentage
(added in envoyproxy#40686). This adds the equivalent for HTTP connections via a new
max_connection_duration_jitter_percent field in HttpProtocolOptions.

When set, the effective max_connection_duration is extended by a random
amount uniformly distributed in [0, jitter_percent/100 * base_duration].
For example, max_connection_duration=10s with 25% jitter means connections
drain after a random duration in [10s, 12.5s]. This avoids thundering herd
problems when many HTTP/2 connections are established simultaneously.

AI Disclosure: Used GitHub Copilot for coding assistance.

Signed-off-by: Retr0-XD <sakthi.harish@edgeverve.com>
Adds two test cases:
- ConnectionDurationWithJitter: verifies jitter is applied and timer fires at base + jitter
- ConnectionDurationJitterNoBaseIgnored: verifies jitter is ignored when base duration is not set

Signed-off-by: Retr0-XD <sakthi.harish@edgeverve.com>
- Add PGV validation for max_connection_duration_jitter_percent (required field)
- Add documentation note that jitter is only for HTTP/1.1 and HTTP/2
- Note that upstream cluster support is not yet implemented

Signed-off-by: Retr0-XD <sakthi.harish@edgeverve.com>
@Retr0-XD Retr0-XD force-pushed the feat/http-max-connection-duration-jitter-42410 branch from b379b44 to ee7b782 Compare March 25, 2026 19:07
@Retr0-XD Retr0-XD requested a deployment to external-contributors March 25, 2026 19:07 — with GitHub Actions Waiting
@Retr0-XD
Author

can you please review this :)

@Retr0-XD
Author

@wbpcode issues addressed can you review please? :)

// This field is only effective for HTTP/1.1 and HTTP/2 connections. Support for
// upstream clusters is not implemented and the jitter will not be applied.
type.v3.Percent max_connection_duration_jitter_percent = 8
[(validate.rules).message = {required: true}];
Contributor

The comment above implies that this field is optional, but here it's marked as required.

I think adding a new field and marking it as required would essentially be a breaking change, since existing control planes won't populate this field.

Member

+1

Member

phlax commented Apr 14, 2026

@Retr0-XD i think this is waiting on your response to feedback

/wait-any

@Retr0-XD
Author

Will get the branch corrected

Member

@wbpcode wbpcode left a comment

Sorry for the delay. But I think Mark has left a great comment on the API.

Member

wbpcode commented Apr 16, 2026

/wait

Development

Successfully merging this pull request may close these issues.

Add jitter support for HTTP max_connection_duration

4 participants