fix(proto): retry off-path NAT traversal probes and retire stale CIDs by dignifiedquire · Pull Request #524 · n0-computer/noq

dignifiedquire · 2026-03-21T12:11:21Z

Description

Off-path NAT traversal probes were fire-and-forget: sent once per address per round with no retry. This broke simultaneous-open NAT traversal because the first probe is typically dropped (the peer's NAT mapping doesn't exist yet when the probe arrives).

Now probes are retransmitted up to 10 times at initial-RTT PTO intervals, with fresh CIDs reserved for each attempt.

Fixes #410, relates to #376

Changes

ServerState tracks per-probe attempt count via ProbeState
New NatTraversalProbeRetry connection timer fires at initial PTO-base intervals to re-queue probes
Each probe reserves a fresh CID (no cross-path CID reuse)
On new round, stale off-path challenges are cleared
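
The per-probe attempt tracking listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ProbeTable`, `CidSeq`, `next_attempt`, and the fields of `ProbeState` are hypothetical, and the cap of 10 is assumed to count total attempts per address. Each attempt takes a fresh CID from a reserved pool, and an attempt is skipped when the pool is empty.

```rust
use std::collections::{HashMap, VecDeque};
use std::net::SocketAddr;

/// Assumed cap, mirroring the "up to 10 times" in the description.
const MAX_PROBE_ATTEMPTS: u8 = 10;

/// Stand-in for a connection ID; the real type in the crate differs.
type CidSeq = u64;

/// Per-address probe bookkeeping (hypothetical fields).
#[derive(Debug, Default)]
struct ProbeState {
    attempts: u8,
    answered: bool,
}

#[derive(Debug, Default)]
struct ProbeTable {
    probes: HashMap<SocketAddr, ProbeState>,
    /// CIDs reserved for off-path probing, handed out one per attempt.
    reserved_cids: VecDeque<CidSeq>,
}

impl ProbeTable {
    /// Registers an address to probe in the current round.
    fn add_address(&mut self, addr: SocketAddr) {
        self.probes.entry(addr).or_default();
    }

    /// Records one probe attempt to `addr` and returns the fresh CID to use.
    /// Returns `None` (probe skipped) if the address is unknown, already
    /// answered, out of attempts, or if no reserved CID is available.
    fn next_attempt(&mut self, addr: SocketAddr) -> Option<CidSeq> {
        let state = self.probes.get_mut(&addr)?;
        if state.answered || state.attempts >= MAX_PROBE_ATTEMPTS {
            return None;
        }
        // Only count the attempt once a CID has actually been obtained.
        let cid = self.reserved_cids.pop_front()?;
        state.attempts += 1;
        Some(cid)
    }

    /// True while at least one probe may still be retried.
    fn has_pending_retries(&self) -> bool {
        self.probes
            .values()
            .any(|p| !p.answered && p.attempts < MAX_PROBE_ATTEMPTS)
    }
}
```

In this sketch a retry timer would call `next_attempt` for each address on every firing and re-arm itself while `has_pending_retries` returns true.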

References

draft-seemann-quic-nat-traversal-02 §4: Off-path challenges for NAT traversal
picoquic challenge retry in paths.c: picoquic retries challenges up to PICOQUIC_CHALLENGE_REPEAT_MAX (3) attempts

Notes

If no reserved CIDs are available, the probe is skipped (not sent with the active CID)
Retry timer uses PTO-base from initial RTT without max_ack_delay, since PATH_RESPONSE must be sent immediately
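
As a worked example of the retry interval, the sketch below computes a PTO base from an initial RTT alone, assuming the estimator follows RFC 9002 (smoothed RTT starts at the initial RTT, RTT variance at half of it). The function name `initial_pto_base` is hypothetical. No `max_ack_delay` term is added, matching the note above.

```rust
use std::time::Duration;

/// Timer granularity from RFC 9002 (kGranularity): 1 ms.
const GRANULARITY: Duration = Duration::from_millis(1);

/// PTO base for a path without RTT samples, following RFC 9002:
/// smoothed_rtt = initial_rtt and rttvar = initial_rtt / 2.
/// Deliberately excludes max_ack_delay.
fn initial_pto_base(initial_rtt: Duration) -> Duration {
    let smoothed_rtt = initial_rtt;
    let rttvar = initial_rtt / 2;
    smoothed_rtt + (4 * rttvar).max(GRANULARITY)
}
```

With the RFC 9002 default initial RTT of 333 ms this gives 333 ms + 4 × 166.5 ms = 999 ms between retries.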

github-actions · 2026-03-21T12:13:53Z

Documentation for this PR has been generated and is available at: https://n0-computer.github.io/noq/pr/524/docs/noq/

Last updated: 2026-04-08T10:45:45Z

github-actions · 2026-03-21T12:45:55Z

Performance Comparison Report

3132a6b3687256af2ac68b6bb6ed5301fd68583e - artifacts

Raw Benchmarks (localhost)

Scenario	noq	upstream	Delta	CPU (avg/max)
large-single	5342.1 Mbps	7898.0 Mbps	-32.4%	94.5% / 106.0%
medium-concurrent	5413.6 Mbps	7842.3 Mbps	-31.0%	91.5% / 96.8%
medium-single	4146.8 Mbps	4749.5 Mbps	-12.7%	95.7% / 109.0%
small-concurrent	3867.4 Mbps	5327.1 Mbps	-27.4%	96.9% / 109.0%
small-single	3615.0 Mbps	4799.4 Mbps	-24.7%	92.9% / 109.0%

Netsim Benchmarks (network simulation)

Condition	noq	upstream	Delta
ideal	3086.7 Mbps	3953.1 Mbps	-21.9%
lan	782.4 Mbps	810.4 Mbps	-3.5%
lossy	69.8 Mbps	69.8 Mbps	~0%
wan	83.8 Mbps	83.8 Mbps	~0%

Summary

noq is 25.7% slower on average
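
The Delta column in these tables appears to be the relative difference of noq against upstream. The sketch below reproduces that arithmetic for individual rows; the function names are hypothetical. The report does not state how the summary average is computed, so that is not covered, and because the printed throughputs are rounded, a few rows may differ in the last digit.

```rust
/// Relative throughput difference of `noq` against `upstream`, in percent.
fn delta_percent(noq_mbps: f64, upstream_mbps: f64) -> f64 {
    (noq_mbps - upstream_mbps) / upstream_mbps * 100.0
}

/// Formats a delta with an explicit sign and one decimal, e.g. "-32.4%".
fn format_delta(delta: f64) -> String {
    format!("{delta:+.1}%")
}
```

For the large-single row above, (5342.1 - 7898.0) / 7898.0 is about -0.3236, which prints as -32.4%.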

---

689cc33f78e3daa7bdb4f4a4e03f054b6e8be2b1 - artifacts

Raw Benchmarks (localhost)

Scenario	noq	upstream	Delta	CPU (avg/max)
large-single	5453.9 Mbps	7820.3 Mbps	-30.3%	93.6% / 98.1%
medium-concurrent	5332.9 Mbps	7668.2 Mbps	-30.5%	93.9% / 98.5%
medium-single	3740.4 Mbps	4189.2 Mbps	-10.7%	98.3% / 134.0%
small-concurrent	3759.5 Mbps	5143.5 Mbps	-26.9%	98.7% / 138.0%
small-single	3377.7 Mbps	4461.2 Mbps	-24.3%	86.9% / 96.5%

Netsim Benchmarks (network simulation)

Condition	noq	upstream	Delta
ideal	3035.4 Mbps	3925.2 Mbps	-22.7%
lan	782.4 Mbps	800.1 Mbps	-2.2%
lossy	69.8 Mbps	69.9 Mbps	~0%
wan	83.8 Mbps	83.8 Mbps	~0%

Summary

noq is 25.0% slower on average

---

c200a40c22e0c3a3a8369676584d59aabf9278a4 - artifacts

Raw Benchmarks (localhost)

Scenario	noq	upstream	Delta	CPU (avg/max)
large-single	5434.2 Mbps	8012.9 Mbps	-32.2%	91.5% / 96.9%
medium-concurrent	5368.7 Mbps	7864.6 Mbps	-31.7%	88.4% / 95.9%
medium-single	3560.1 Mbps	4544.2 Mbps	-21.7%	99.5% / 188.0%
small-concurrent	3874.1 Mbps	5200.6 Mbps	-25.5%	97.7% / 127.0%
small-single	3340.8 Mbps	4721.8 Mbps	-29.2%	85.6% / 96.1%

Netsim Benchmarks (network simulation)

Condition	noq	upstream	Delta
ideal	3145.4 Mbps	3665.7 Mbps	-14.2%
lan	782.4 Mbps	796.4 Mbps	-1.8%
lossy	69.8 Mbps	55.9 Mbps	+25.0%
wan	83.8 Mbps	83.8 Mbps	~0%

Summary

noq is 26.6% slower on average

---

2d783facd08ff1700b8ff62e17a917613c6faf80 - artifacts

Raw Benchmarks (localhost)

Scenario	noq	upstream	Delta	CPU (avg/max)
large-single	5533.1 Mbps	7890.4 Mbps	-29.9%	97.6% / 98.9%
medium-concurrent	5479.5 Mbps	7794.6 Mbps	-29.7%	97.6% / 100.0%
medium-single	4124.9 Mbps	4676.2 Mbps	-11.8%	95.8% / 98.3%
small-concurrent	4002.0 Mbps	5180.5 Mbps	-22.7%	97.4% / 99.6%
small-single	3614.8 Mbps	4746.3 Mbps	-23.8%	96.0% / 98.5%

Netsim Benchmarks (network simulation)

Condition	noq	upstream	Delta
ideal	2863.6 Mbps	3615.9 Mbps	-20.8%
lan	777.9 Mbps	796.5 Mbps	-2.3%
lossy	69.8 Mbps	55.9 Mbps	+25.0%
wan	83.8 Mbps	83.8 Mbps	~0%

Summary

noq is 23.8% slower on average

---

0ae9b27dd4324c33ca3894be802df6e26080922f - artifacts

No results available

---

e57489f044cf9adabb59fd21e99c32ba6e1366c9 - artifacts

No results available

---

ea0c0c430b08f848cef5ae8c36c94f49504c8462 - artifacts

No results available

---

dea8f18b812551a4f1ed25a37a638d4573694fcd - artifacts

No results available

---

4c4aabdf3322a533b905b730f5cd50064ffc9c6d - artifacts

No results available

---

34eb5667b5b3c528ec4581b2ac74eda68e3b07a9 - artifacts

Raw Benchmarks (localhost)

Scenario	noq	upstream	Delta	CPU (avg/max)
large-single	5501.2 Mbps	8044.0 Mbps	-31.6%	95.8% / 106.0%
medium-concurrent	5426.1 Mbps	7742.1 Mbps	-29.9%	96.3% / 108.0%
medium-single	3964.9 Mbps	4749.3 Mbps	-16.5%	92.2% / 106.0%
small-concurrent	3813.8 Mbps	5431.3 Mbps	-29.8%	95.0% / 109.0%
small-single	3491.5 Mbps	4911.5 Mbps	-28.9%	88.8% / 97.0%

Netsim Benchmarks (network simulation)

Condition	noq	upstream	Delta
ideal	3072.8 Mbps	3911.2 Mbps	-21.4%
lan	782.4 Mbps	810.4 Mbps	-3.5%
lossy	69.8 Mbps	69.8 Mbps	~0%
wan	83.8 Mbps	83.8 Mbps	~0%

Summary

noq is 26.7% slower on average

---

0aa51b888852a0e3088e3e3ed3864c3f51ff4c25 - artifacts

Raw Benchmarks (localhost)

Scenario	noq	upstream	Delta	CPU (avg/max)
large-single	5362.8 Mbps	7984.5 Mbps	-32.8%	97.5% / 163.0%
medium-concurrent	5456.5 Mbps	7625.1 Mbps	-28.4%	95.1% / 109.0%
medium-single	3873.5 Mbps	4189.2 Mbps	-7.5%	90.7% / 98.0%
small-concurrent	3954.2 Mbps	5151.0 Mbps	-23.2%	93.9% / 110.0%
small-single	3633.4 Mbps	4343.6 Mbps	-16.4%	89.1% / 97.5%

Netsim Benchmarks (network simulation)

Condition	noq	upstream	Delta
ideal	3130.8 Mbps	3687.3 Mbps	-15.1%
lan	782.4 Mbps	796.4 Mbps	-1.8%
lossy	69.8 Mbps	55.9 Mbps	+25.0%
wan	83.8 Mbps	83.8 Mbps	~0%

Summary

noq is 22.3% slower on average

---

890355a622787ba2b44ec9b6dcc510fb474d07d6 - artifacts

Raw Benchmarks (localhost)

Scenario	noq	upstream	Delta	CPU (avg/max)
large-single	5664.5 Mbps	7749.1 Mbps	-26.9%	94.2% / 100.0%
medium-concurrent	5420.0 Mbps	7775.9 Mbps	-30.3%	93.3% / 99.3%
medium-single	3740.2 Mbps	4749.5 Mbps	-21.3%	91.5% / 99.1%
small-concurrent	3920.3 Mbps	5385.5 Mbps	-27.2%	95.0% / 124.0%
small-single	3520.1 Mbps	4715.6 Mbps	-25.4%	92.3% / 102.0%

Netsim Benchmarks (network simulation)

Condition	noq	upstream	Delta
ideal	3030.3 Mbps	N/A	N/A
lan	782.4 Mbps	N/A	N/A
lossy	69.8 Mbps	N/A	N/A
wan	83.8 Mbps	N/A	N/A

Summary

noq is 26.7% slower on average

---

65807f55d5fbe67d601089914accdb17b983d9f6 - artifacts

No results available

---

6fcbf2eded950b6343c7764f16ae64e1ed40a225 - artifacts

Raw Benchmarks (localhost)

Scenario	noq	upstream	Delta	CPU (avg/max)
large-single	5656.1 Mbps	7956.0 Mbps	-28.9%	94.2% / 108.0%
medium-concurrent	5353.7 Mbps	7599.4 Mbps	-29.6%	92.4% / 97.3%
medium-single	4027.2 Mbps	4469.6 Mbps	-9.9%	98.6% / 162.0%
small-concurrent	3877.0 Mbps	5156.4 Mbps	-24.8%	98.2% / 163.0%
small-single	3542.3 Mbps	4743.7 Mbps	-25.3%	93.7% / 111.0%

Netsim Benchmarks (network simulation)

Condition	noq	upstream	Delta
ideal	N/A	4036.8 Mbps	N/A
lan	N/A	810.4 Mbps	N/A
lossy	N/A	55.9 Mbps	N/A
wan	N/A	83.8 Mbps	N/A

Summary

noq is 25.0% slower on average

---

58dec50b3a74384b3d19bb32a03d73ed13cf3dfa - artifacts

Raw Benchmarks (localhost)

Scenario	noq	upstream	Delta	CPU (avg/max)
large-single	6027.8 Mbps	8019.3 Mbps	-24.8%	97.3% / 98.8%
medium-concurrent	6145.9 Mbps	7589.2 Mbps	-19.0%	97.2% / 100.0%
medium-single	4124.4 Mbps	4571.8 Mbps	-9.8%	97.1% / 99.5%
small-concurrent	3979.0 Mbps	5257.2 Mbps	-24.3%	96.9% / 99.4%
small-single	3622.9 Mbps	4746.4 Mbps	-23.7%	96.6% / 98.5%

Netsim Benchmarks (network simulation)

Condition	noq	upstream	Delta
ideal	3142.6 Mbps	4022.9 Mbps	-21.9%
lan	782.5 Mbps	810.4 Mbps	-3.4%
lossy	69.8 Mbps	69.8 Mbps	~0%
wan	83.8 Mbps	83.8 Mbps	~0%

Summary

noq is 20.4% slower on average

---

3cbd18a6de6737ee42bd140766378d88856e9266 - artifacts

Raw Benchmarks (localhost)

Scenario	noq	upstream	Delta	CPU (avg/max)
large-single	5389.2 Mbps	7703.7 Mbps	-30.0%	98.1% / 160.0%
medium-concurrent	5349.9 Mbps	7097.3 Mbps	-24.6%	95.9% / 105.0%
medium-single	4253.5 Mbps	4361.7 Mbps	-2.5%	90.4% / 98.5%
small-concurrent	3817.8 Mbps	5108.8 Mbps	-25.3%	95.0% / 109.0%
small-single	3514.3 Mbps	4380.6 Mbps	-19.8%	88.8% / 96.8%

Netsim Benchmarks (network simulation)

Condition	noq	upstream	Delta
ideal	3106.7 Mbps	N/A	N/A
lan	782.4 Mbps	N/A	N/A
lossy	69.9 Mbps	N/A	N/A
wan	83.8 Mbps	N/A	N/A

Summary

noq is 22.1% slower on average

---

fd9a7f5f09eb463690cd263173d467bc378add4a - artifacts

No results available

---

8e6d40577fd48e7b6e95c2761f991c7ae539124c - artifacts

No results available

---

a3375242bf9c7333e077eb7621125a9e14c529c4 - artifacts

Raw Benchmarks (localhost)

Scenario	noq	upstream	Delta	CPU (avg/max)
large-single	5390.7 Mbps	7908.7 Mbps	-31.8%	96.7% / 131.0%
medium-concurrent	5334.0 Mbps	7532.6 Mbps	-29.2%	90.5% / 96.7%
medium-single	3757.0 Mbps	4745.6 Mbps	-20.8%	91.8% / 101.0%
small-concurrent	3865.7 Mbps	5229.2 Mbps	-26.1%	91.7% / 99.3%
small-single	3371.7 Mbps	4817.0 Mbps	-30.0%	86.7% / 96.5%

Netsim Benchmarks (network simulation)

Condition	noq	upstream	Delta
ideal	3198.5 Mbps	4022.8 Mbps	-20.5%
lan	782.4 Mbps	810.4 Mbps	-3.4%
lossy	69.8 Mbps	69.8 Mbps	~0%
wan	83.8 Mbps	83.8 Mbps	~0%

Summary

noq is 26.6% slower on average

divagant-martian

This is a partial review, but I found issues important enough to flag now.

divagant-martian · 2026-03-26T04:00:44Z

@@ -1,6 +1,6 @@
 use bytes::{BufMut, BytesMut};
+use proptest::{prelude::*, prop_assert_ne};


another one to revert

flub · 2026-03-30T13:26:24Z

(removed myself from review because it is still in draft. please request again once ready)

Off-path probes were fire-and-forget: sent once per address per round with no retry. This broke simultaneous-open NAT traversal because the first probe is typically dropped (the peer's NAT mapping doesn't exist yet when the probe arrives).

Changes:
- Retry off-path probes up to 10 times (once per PTO firing)
- Track per-probe CID so retries reuse the same CID (RFC 9000 §9.5 compliant: same CID to same remote address)
- New OffPathProbeRetry connection timer drives retransmission
- On new NAT traversal round, retire CIDs from old round's failed probes to prevent CID exhaustion (#410)

Fixes #410
Relates to #376

divagant-martian

This seems OK on my end; we can work on the other changes later.

flub

No serious objection, only some nits.

flub · 2026-04-08T09:48:54Z

+        if let Ok(server_state) = self.n0_nat_traversal.server_side_mut()
+            && server_state.has_pending_retries()
+        {
+            let pto = self.pto(SpaceKind::Data, path_id);


Using the on-path PTO here is strange. There's no reason at all that this is a relevant duration. It's equivalent to picking a fairly random interval.

Probably better off using the PTO-base of the configured initial RTT?

Changed. Can you please check that the initial PTO I am now using is calculated correctly?

flub · 2026-04-08T09:49:57Z

+        {
+            let pto = self.pto(SpaceKind::Data, path_id);
+            self.timers.set(
+                Timer::Conn(ConnTimer::OffPathProbeRetry),


Could we name this timer NatTraversalProbeRetry? Off-path is going to confuse me at some point, since not all off-path probes are for NAT traversal. But IIUC this timer is more specific than that.

flub · 2026-04-08T10:40:36Z

+            let initial_pto = RttEstimator::new(self.config.initial_rtt).pto_base()
+                + self.ack_frequency.max_ack_delay_for_pto();


You should not do the + max_ack_delay since the probes must be responded to immediately.

Suggested change

-            let initial_pto = RttEstimator::new(self.config.initial_rtt).pto_base()
-                + self.ack_frequency.max_ack_delay_for_pto();
+            let initial_pto = RttEstimator::new(self.config.initial_rtt).pto_base();

n0bot Bot added this to iroh Mar 21, 2026

github-project-automation Bot moved this to 🚑 Needs Triage in iroh Mar 21, 2026

dignifiedquire added a commit that referenced this pull request Mar 22, 2026

merge: resolve conflicts between #522 and #524 timer variants

830139d

divagant-martian self-requested a review March 23, 2026 02:52

divagant-martian moved this from 🚑 Needs Triage to 👀 In review in iroh Mar 23, 2026

dignifiedquire force-pushed the fix/off-path-probe-retry branch from c200a40 to 2d783fa Compare March 23, 2026 11:00

dignifiedquire added a commit that referenced this pull request Mar 23, 2026

merge: resolve #522/#524 timer conflicts

3068d5b

dignifiedquire added this to the noq: iroh v0.98 milestone Mar 23, 2026

dignifiedquire force-pushed the fix/off-path-probe-retry branch 2 times, most recently from ea0c0c4 to dea8f18 Compare March 23, 2026 16:14

flub self-requested a review March 25, 2026 15:07

dignifiedquire force-pushed the fix/off-path-probe-retry branch 2 times, most recently from 0aa51b8 to 890355a Compare March 25, 2026 17:59

divagant-martian requested changes Mar 26, 2026

View reviewed changes

github-project-automation Bot moved this from 👀 In review to 🏗 In progress in iroh Mar 26, 2026

divagant-martian marked this pull request as draft March 26, 2026 18:49

flub removed their request for review March 30, 2026 13:25

dignifiedquire added 9 commits April 6, 2026 20:42

fix(proto): trim redundant comments and remove unused Clone derive

1caeeac

fix(proto): only clean up stale probes when round actually advances

ad7342e

handle_reach_out may silently ignore frames (old round, unsupported IP family) without advancing the round. Compare current_round() before and after to avoid clearing valid ongoing probes.
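
The before/after round comparison described in this commit message can be sketched as below. It is an illustration under stated assumptions: `ReachOut`, the fields of `ServerState`, and `on_reach_out` are hypothetical stand-ins, and only the "old round" case of an ignored frame is modeled.

```rust
use std::net::SocketAddr;

/// Hypothetical frame: a round number plus a candidate address.
struct ReachOut {
    round: u64,
    addr: SocketAddr,
}

/// Hypothetical stand-in for the server-side NAT traversal state.
#[derive(Default)]
struct ServerState {
    round: u64,
    /// Ongoing probes, tagged with the round that created them.
    probes: Vec<(u64, SocketAddr)>,
}

impl ServerState {
    fn current_round(&self) -> u64 {
        self.round
    }

    /// May silently ignore a frame (here: one from an older round).
    fn handle_reach_out(&mut self, frame: &ReachOut) {
        if frame.round < self.round {
            return;
        }
        self.round = frame.round;
        self.probes.push((frame.round, frame.addr));
    }
}

/// Handles a frame and clears stale probes only if the round advanced.
/// Returns true when a cleanup happened.
fn on_reach_out(state: &mut ServerState, frame: &ReachOut) -> bool {
    let before = state.current_round();
    state.handle_reach_out(frame);
    let current = state.current_round();
    if current == before {
        // Frame was ignored or belongs to the same round: keep all probes.
        return false;
    }
    state.probes.retain(|(round, _)| *round == current);
    true
}
```

Comparing the round instead of reacting to every received frame means an ignored frame cannot wipe probes that are still valid.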

fix(proto): mark probe as sent after packet build succeeds

e1fa060

Move mark_as_sent after PacketBuilder completes so attempt count isn't incremented if packet build fails. Uses new borrow-free next_probe_info/mark_probe_sent API to avoid holding a mutable borrow across the packet build.
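
A rough sketch of the ordering this commit describes: read the probe details as a plain copy, build the packet, and only then bump the attempt count. The names `next_probe_info` and `mark_probe_sent` come from the commit message, but their signatures here, along with `Probes`, `ProbeInfo`, `build_packet`, and `send_next_probe`, are assumptions made for illustration.

```rust
use std::net::SocketAddr;

/// Assumed cap on attempts per probe.
const MAX_PROBE_ATTEMPTS: u8 = 10;

/// Copyable snapshot of what packet building needs (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ProbeInfo {
    index: usize,
    remote: SocketAddr,
}

struct Probe {
    remote: SocketAddr,
    attempts: u8,
}

struct Probes {
    queue: Vec<Probe>,
}

impl Probes {
    /// Returns a copy of the next probe's details; no borrow outlives the call.
    fn next_probe_info(&self) -> Option<ProbeInfo> {
        self.queue
            .iter()
            .enumerate()
            .find(|(_, p)| p.attempts < MAX_PROBE_ATTEMPTS)
            .map(|(index, p)| ProbeInfo { index, remote: p.remote })
    }

    /// Counts the attempt; called only after the packet was built.
    fn mark_probe_sent(&mut self, info: ProbeInfo) {
        if let Some(probe) = self.queue.get_mut(info.index) {
            probe.attempts += 1;
        }
    }
}

/// Stand-in for packet building, which can fail for lack of buffer space.
/// 9 bytes is the size of a PATH_CHALLENGE frame (type byte plus 8 data bytes).
fn build_packet(buf_space: usize, _info: &ProbeInfo) -> Result<Vec<u8>, &'static str> {
    if buf_space < 9 {
        return Err("not enough space for the probe");
    }
    Ok(vec![0u8; 9])
}

fn send_next_probe(probes: &mut Probes, buf_space: usize) -> Option<Vec<u8>> {
    let info = probes.next_probe_info()?;
    // If building fails we return early and the attempt is not counted.
    let packet = build_packet(buf_space, &info).ok()?;
    probes.mark_probe_sent(info);
    Some(packet)
}
```

Because `ProbeInfo` is `Copy`, the caller holds no reference into `Probes` while the packet is being built, which is the point of the borrow-free API mentioned above.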

fix(proto): return expired CIDs from queue_retries for future reclamation

c41e4cb

queue_retries now returns CIDs from probes that exceeded max attempts, enabling callers to retire them. Currently unused but plumbed for when CidQueue gains a retire-by-CID API.
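
The shape of a `queue_retries` that hands back expired CIDs might look like the sketch below. Only the function name comes from the commit message; the `Probe` fields, `CidSeq`, and the cap are hypothetical. A later commit in this PR (58dec50) changed CID handling to always take a new CID, so this reflects the intermediate design described here and may not match the merged code.

```rust
/// Stand-in for a connection ID; the real type differs.
type CidSeq = u64;

/// Assumed cap on attempts per probe.
const MAX_PROBE_ATTEMPTS: u8 = 10;

/// Hypothetical probe record that remembers the CID it was sent with.
struct Probe {
    cid: CidSeq,
    attempts: u8,
    queued: bool,
}

/// Re-queues probes that may still be retried and removes the rest,
/// returning the CIDs of the removed probes so a caller could retire them.
fn queue_retries(probes: &mut Vec<Probe>) -> Vec<CidSeq> {
    let mut expired = Vec::new();
    probes.retain_mut(|probe| {
        if probe.attempts >= MAX_PROBE_ATTEMPTS {
            expired.push(probe.cid);
            false
        } else {
            probe.queued = true;
            true
        }
    });
    expired
}
```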

fix(proto): remove dead ServerProbing API, migrate test to new API

11c01fc

fix: cargo fmt

cf18a66

fix: remove unused mut and cargo make format

35a6373

fixup the small stuff

6fcbf2e

dignifiedquire force-pushed the fix/off-path-probe-retry branch from 65807f5 to 6fcbf2e Compare April 6, 2026 18:42

fix: remove broken reuse logic, always get a new cid

58dec50

divagant-martian approved these changes Apr 8, 2026

View reviewed changes

dignifiedquire marked this pull request as ready for review April 8, 2026 09:01

Merge branch 'main' into fix/off-path-probe-retry

3cbd18a

flub approved these changes Apr 8, 2026

View reviewed changes

dignifiedquire added 2 commits April 8, 2026 12:34

apply CR from @flub

fd9a7f5

restore commetns

8e6d405

flub reviewed Apr 8, 2026

View reviewed changes

one more CR

a337524

dignifiedquire enabled auto-merge April 8, 2026 10:44

dignifiedquire disabled auto-merge April 8, 2026 10:44

dignifiedquire enabled auto-merge April 8, 2026 10:46

dignifiedquire added this pull request to the merge queue Apr 8, 2026

Merged via the queue into main with commit 7d60937 Apr 8, 2026
36 checks passed

dignifiedquire deleted the fix/off-path-probe-retry branch April 8, 2026 11:07

github-project-automation Bot moved this from 🏗 In progress to ✅ Done in iroh Apr 8, 2026

divagant-martian mentioned this pull request Apr 8, 2026

wip: client does not open path for nat traversal #572

Draft
