http: add jitter support for max_connection_duration#44064
http: add jitter support for max_connection_duration#44064Retr0-XD wants to merge 3 commits intoenvoyproxy:mainfrom
Conversation
|
Hi @Retr0-XD, welcome and thank you for your contribution. We will try to review your Pull Request as quickly as possible. In the meantime, please take a look at the contribution guidelines if you have not done so already. |
|
CC @envoyproxy/api-shepherds: Your approval is needed for changes made to |
afbdc5d to
8ad2fb7
Compare
8ad2fb7 to
28b7046
Compare
28b7046 to
56cfa01
Compare
wbpcode
left a comment
There was a problem hiding this comment.
Thanks for this contribution and I added some comments. :)
/wait
| /** | ||
| * @return optional jitter percentage to apply to maxConnectionDuration, in range [0, 100]. | ||
| * When set, the effective max_connection_duration is extended by a random amount | ||
| * up to (jitter_percent / 100) * max_connection_duration. | ||
| */ | ||
| virtual absl::optional<double> maxConnectionDurationJitterPercent() const PURE; |
There was a problem hiding this comment.
I think we may needn't this new API. We can calculate the effective duration at maxConnectionDuration() directly?
There was a problem hiding this comment.
Yep, the new API is needed here (it'd return different values on each call if not handled). This approach mirrors how TCP proxy handles it with calculateMaxDownstreamConnectionDurationWithJitter()
There was a problem hiding this comment.
But we needn't copy the implementation of TCP proxy, right? See, if you remove this method, then you only need to revise implementation of maxConnectionDuration() of the HttpConnectionManagerConfig. The new maxConnectionDuration() could calculate a different connection duration based on the configured jitter. That's say, treat the jitter as internal implementation of the HttpConnectionManagerConfig rather than an exposed API to caller.
- Add PGV validation for max_connection_duration_jitter_percent (required field) - Add documentation note that jitter is only for HTTP/1.1 and HTTP/2 - Note that upstream cluster support is not yet implemented Signed-off-by: Retr0-XD <sakthi.harish@edgeverve.com>
|
added changes to feat branch |
- Add PGV validation for max_connection_duration_jitter_percent (required field) - Add documentation note that jitter is only for HTTP/1.1 and HTTP/2 - Note that upstream cluster support is not yet implemented Signed-off-by: Retr0-XD <sakthi.harish@edgeverve.com> (cherry picked from commit 21a33ac) Signed-off-by: Retr0-XD <sakthi.harish@edgeverve.com>
21a33ac to
b379b44
Compare
Fixes envoyproxy#42410. TCP connections already support max_downstream_connection_duration_jitter_percentage (added in envoyproxy#40686). This adds the equivalent for HTTP connections via a new max_connection_duration_jitter_percent field in HttpProtocolOptions. When set, the effective max_connection_duration is extended by a random amount uniformly distributed in [0, jitter_percent/100 * base_duration]. For example, max_connection_duration=10s with 25% jitter means connections drain after a random duration in [10s, 12.5s]. This avoids thundering herd problems when many HTTP/2 connections are established simultaneously. AI Disclosure: Used GitHub Copilot for coding assistance. Signed-off-by: Retr0-XD <sakthi.harish@edgeverve.com>
Adds two test cases: - ConnectionDurationWithJitter: verifies jitter is applied and timer fires at base + jitter - ConnectionDurationJitterNoBaseIgnored: verifies jitter is ignored when base duration is not set Signed-off-by: Retr0-XD <sakthi.harish@edgeverve.com>
- Add PGV validation for max_connection_duration_jitter_percent (required field) - Add documentation note that jitter is only for HTTP/1.1 and HTTP/2 - Note that upstream cluster support is not yet implemented Signed-off-by: Retr0-XD <sakthi.harish@edgeverve.com>
b379b44 to
ee7b782
Compare
|
can you please review this :) |
|
@wbpcode issues addressed can you review please? :) |
| // This field is only effective for HTTP/1.1 and HTTP/2 connections. Support for | ||
| // upstream clusters is not implemented and the jitter will not be applied. | ||
| type.v3.Percent max_connection_duration_jitter_percent = 8 | ||
| [(validate.rules).message = {required: true}]; |
There was a problem hiding this comment.
The comment above implies that this field is optional, but here it's marked as required.
I think adding a new field and marking it as required would essentially be a breaking change, since existing control planes won't populate this field.
|
@Retr0-XD i think this is waiting on your response to feedback /wait-any |
|
Will get the branch corrected |
wbpcode
left a comment
There was a problem hiding this comment.
Sorry for the delay. But I think Mark have left a great comment to the API.
| /** | ||
| * @return optional jitter percentage to apply to maxConnectionDuration, in range [0, 100]. | ||
| * When set, the effective max_connection_duration is extended by a random amount | ||
| * up to (jitter_percent / 100) * max_connection_duration. | ||
| */ | ||
| virtual absl::optional<double> maxConnectionDurationJitterPercent() const PURE; |
There was a problem hiding this comment.
But we needn't copy the implementation of TCP proxy, right? See, if you remove this method, then you only need to revise implementation of maxConnectionDuration() of the HttpConnectionManagerConfig. The new maxConnectionDuration() could calculate a different connection duration based on the configured jitter. That's say, treat the jitter as internal implementation of the HttpConnectionManagerConfig rather than an exposed API to caller.
| // This field is only effective for HTTP/1.1 and HTTP/2 connections. Support for | ||
| // upstream clusters is not implemented and the jitter will not be applied. | ||
| type.v3.Percent max_connection_duration_jitter_percent = 8 | ||
| [(validate.rules).message = {required: true}]; |
|
/wait |
Description
Fixes #42410.
TCP connections already support
max_downstream_connection_duration_jitter_percentage(merged in #40686). This PR adds the equivalent feature for HTTP connections via a newmax_connection_duration_jitter_percentfield inHttpProtocolOptions.Problem
When many HTTP/2 connections are established simultaneously (e.g. during a pod restart or rolling deploy), they all reach
max_connection_durationat the same time, triggering simultaneous drain → reconnect storms ("thundering herd").Solution
Add
max_connection_duration_jitter_percenttoHttpProtocolOptions. When configured, the effectivemax_connection_durationfor each connection is individually extended by a random amount uniformly distributed in[0, jitter_percent/100 * base_duration], spreading connection drains over a window rather than synchronizing them.Example:
max_connection_duration = 10s,max_connection_duration_jitter_percent = 25→ each connection drains at a random time in[10s, 12.5s].Changes
api/envoy/config/core/v3/protocol.protomax_connection_duration_jitter_percent(field 8) toHttpProtocolOptionssource/common/http/conn_manager_config.hmaxConnectionDurationJitterPercent()pure virtual methodsource/extensions/filters/network/http_connection_manager/config.{h,cc}source/common/http/conn_manager_impl.ccsource/server/admin/admin.habsl::nullopt(no jitter for admin connections)test/mocks/http/mocks.hMOCK_METHODfor new interface methodtest/common/http/conn_manager_impl_test_base.{h,cc}test/common/http/conn_manager_impl_fuzz_test.ccabsl::nulloptPrior art
The TCP proxy implementation in
source/common/tcp_proxy/tcp_proxy.cc(methodConfig::calculateMaxDownstreamConnectionDurationWithJitter()) provides the exact same pattern — this PR follows it faithfully.AI Disclosure: Used GitHub Copilot for coding assistance.
AI disclosure: GitHub Copilot was used during implementation and test writing. I fully understand all changes made in this PR.
Commit Message: See PR title
Risk Level: Low
Testing: Unit tests added/verified
Docs Changes: N/A
Release Notes: N/A
Platform Specific Features: N/A