Safely bypass the flow-filter by Fredi-raspall · Pull Request #1342 · githedgehog/dataplane

Fredi-raspall · 2026-03-14T15:29:21Z

Augments pipeline with some PipelineData so that all NFs that are part of the pipeline can have access to it on their creation.
Currently the pipeline data only includes the generation Id. We use this so that NFs can easily know the generation id of the currently applied config, with minimal changes.
Augment flows to have a generation Id. The generation id of a flow is the config generation when the flow was created (or subsequently validated).
Augment NAT NFs so that they create flows with generation ids
Implement bypass of flow-filter so that:
- packets that refer to flows can bypass the filter if the configuration has not changed.
- on config changes, if a packet refers to a flow set up with a previous config, the packet does not bypass the filter.
  If the packet survives, the flow is upgraded and subsequent packets bypass, because the flow conforms to the new config.
  If the packet is not allowed, that means that the flow should not be allowed and it gets invalidated.

These changes not only expedite packet processing but also simplify the flow-filter implementation.
E.g. with these, the extra checks in PR #1341 should no longer be needed.

This fixes #1312

qmonnet

This is a great addition, but I've got a concern about concurrency. Could it be that we bump the config while the packet is being processed, for example after we've agreed to bypass flow-filter or after we've looked at the NAT context, and that we pick the genid for a new config that no longer allows it when creating the sessions? Should we attach the genid to the packet in the flow-filter instead, and pass it along to be sure that it doesn't change while we process the packet?

pipeline/src/pipeline.rs

pipeline/src/static_nf.rs

mgmt/Cargo.toml

nat/src/stateful/mod.rs

flow-filter/src/lib.rs

nat/src/portfw/mod.rs

Fredi-raspall · 2026-03-16T14:47:24Z

This is a great addition, but I've got a concern about concurrency. Could it be that we bump the config while the packet is being processed, for example after we've agreed to bypass flow-filter or after we've looked at the NAT context, and that we pick the genid for a new config that no longer allows it when creating the sessions? Should we attach the genid to the packet in the flow-filter instead, and pass it along to be sure that it doesn't change while we process the packet?

I did a first implementation with that idea, but later thought that this was cleaner. (I can re-add the marking of packets with the genid). In the case you mentioned, yes, we'd allowing a packet that should not longer be allowed. But is that a problem? I mean, if the config was applied 1ns later, tons of packets would be allowed. Sure, NFs may not find the right state to process it, but all are (should otherwise) be prepared for that, right?

qmonnet · 2026-03-16T15:08:43Z

In the case you mentioned, yes, we'd allowing a packet that should not longer be allowed.

I agree, that part is a non-issue.

But is that a problem?

I'm not sure we're talking of the same thing. My issue is the following: let's consider a change of config from gen X to X+1, and a packet valid for gen X but invalid for gen X+1.

Packet goes through flow-filter, config X is applied, there's no flow established: packet is supposed to be port-forwarded, gets sent to port-forwarding stage
Packet goes through port-forwarding stage, we retrieve the rule based on packet flow info and port-forwarding context; flow table entry has not been created yet.
Change of config happens, we're now at gen X+1, for which the packet is invalid.
We retrieve the genid and create new session for the flow (forward and reverse), labelling the flow with gen X+1.
Packet gets translated and goes through.
Follow-up packets match the existing flow table entry, tagged for gen X+1, they bypass the flow-filter and go through.

The issue is not at step 5, it's fine that this packet goes through. The issue is at step 4 (then 6): if we retrieve the genid after we've switched to X+1 but before we create the flow table entry, we might tag the packet for the wrong generation and allow subsequent packets to go through.

Looking again at the code I think you're right that we don't need to pass the genid to the packet, and it's fine if we have a mismatch between the genid found in flow-filter and in NAT; but I believe that inside of the NAT stages we should move the call to self.pipeline_data.genid() to the beginning of the processing function (with a comment advising caution if moving it) to avoid retrieving it after we've been through the NAT context lookup logic?

Fredi-raspall · 2026-03-16T19:06:09Z

In the case you mentioned, yes, we'd allowing a packet that should not longer be allowed.

I agree, that part is a non-issue.

But is that a problem?

I'm not sure we're talking of the same thing. My issue is the following: let's consider a change of config from gen X to X+1, and a packet valid for gen X but invalid for gen X+1.

Packet goes through flow-filter, config X is applied, there's no flow established: packet is supposed to be port-forwarded, gets sent to port-forwarding stage

Packet goes through port-forwarding stage, we retrieve the rule based on packet flow info and port-forwarding context; flow table entry has not been created yet.

Change of config happens, we're now at gen X+1, for which the packet is invalid.

We retrieve the genid and create new session for the flow (forward and reverse), labelling the flow with gen X+1.

Packet gets translated and goes through.

Follow-up packets match the existing flow table entry, tagged for gen X+1, they bypass the flow-filter and go through.

The issue is not at step 5, it's fine that this packet goes through. The issue is at step 4 (then 6): if we retrieve the genid after we've switched to X+1 but before we create the flow table entry, we might tag the packet for the wrong generation and allow subsequent packets to go through.

Looking again at the code I think you're right that we don't need to pass the genid to the packet, and it's fine if we have a mismatch between the genid found in flow-filter and in NAT; but I believe that inside of the NAT stages we should move the call to self.pipeline_data.genid() to the beginning of the processing function (with a comment advising caution if moving it) to avoid retrieving it after we've been through the NAT context lookup logic?

That's the exact example I was thinking about. I have to think of a solution to that.

Fredi-raspall · 2026-03-16T19:49:18Z

In the case you mentioned, yes, we'd allowing a packet that should not longer be allowed.

I agree, that part is a non-issue.

But is that a problem?

I'm not sure we're talking of the same thing. My issue is the following: let's consider a change of config from gen X to X+1, and a packet valid for gen X but invalid for gen X+1.

Packet goes through flow-filter, config X is applied, there's no flow established: packet is supposed to be port-forwarded, gets sent to port-forwarding stage

Packet goes through port-forwarding stage, we retrieve the rule based on packet flow info and port-forwarding context; flow table entry has not been created yet.

Change of config happens, we're now at gen X+1, for which the packet is invalid.

We retrieve the genid and create new session for the flow (forward and reverse), labelling the flow with gen X+1.

Packet gets translated and goes through.

Follow-up packets match the existing flow table entry, tagged for gen X+1, they bypass the flow-filter and go through.

The issue is not at step 5, it's fine that this packet goes through. The issue is at step 4 (then 6): if we retrieve the genid after we've switched to X+1 but before we create the flow table entry, we might tag the packet for the wrong generation and allow subsequent packets to go through.
Looking again at the code I think you're right that we don't need to pass the genid to the packet, and it's fine if we have a mismatch between the genid found in flow-filter and in NAT; but I believe that inside of the NAT stages we should move the call to self.pipeline_data.genid() to the beginning of the processing function (with a comment advising caution if moving it) to avoid retrieving it after we've been through the NAT context lookup logic?

That's the exact example I was thinking about. I have to think of a solution to that.

Ok. I've had the time now to think about this. Yes, this needs to be addressed, good catch.
Marking packets with the generation Id when they entered (or were evaluated by the filter) does not help. The problem is if NAT (for example) learns a generation Id larger than (before) the flow-filter, for a given packet. I think the window where that could happen is tiny (it would need to be for a packet establishing a new flow before a new config was applied but after it was learnt and NAT should learn that config is X+1 while the flow filter didn't). That being said, it has to be addressed.

With that in mind, the problem is that the condition to bypass (flow.genid = current.genid) is insufficient to tell that the flow is okay for the new genid, because by the time the flow was created, we skipped the validation in the case of the race.
So, the NAT module assumes that the packet was validated with the configuration that it sees (X+1), but in reality the packet may have been validated with X. Given that:

this is only an issue when transitioning from allowed -> disallowed
this only happens if a flow is created between X and X+1
we can't go back in time to re-evaluate the packet (forget recirculation)
labeling packets would not help (we could label a packet as "validated by X" but then if NAT saw X+1 what would it do?)

... a simple solution involves augmenting the flow with a new field that says that it was actually validated under the current genid.
Sample workflow:

* config transitions X -> X+1, but, for a packet, flow-filter sees X and NAT sees X+1
* NAT creates flow with X+1, "not-validated"
* flow-filter eventually sees that generation is X+1. When it gets packet, 
it sees flow is X+1 but "not-validated". So, it does NOT bypass the flow-filter.
If config X+1 would allow packet, the flow is marked as validated. Else, it is invalidated as before.

qmonnet · 2026-03-16T20:25:22Z

... a simple solution involves [...]

I need to think more about it, too, but I think your proposal should work

Fredi-raspall · 2026-03-16T20:48:12Z

... a simple solution involves [...]

I need to think more about it, too, but I think your proposal should work

Ok!. There's actually a simpler one that involves the same variables, repurposed.
Instead of letting NAT set the generation Id, let the flow-filter do so:

nat would create flow, without generation id or maybe zero.
when seeing zero (or anything < current genid), flow filter would validate and then set the "valid_for(genid)" annotation. This way a single atomic is needed as before.

qmonnet

There's actually a simpler [solution] [...]

Yes, I think it looks correct. The flows are tagged with the right genid, it means we can only bypass starting from the 3rd packet (1st creates the flow, 2nd gets validated and sets the genid) but it's still better than no bypass or a flawed bypass.

Please make sure to address, or reply to/resolve all comments from the first review.

flow-filter/src/lib.rs

We need an easy way for NFs that belong to a pipeline to access some common data (e.g. the generation Id of the current configuration applied) in an easy way, without needing to propagate or change APIs each time. This commit: - defines a PipelineData struct with an atomic u64 for the generation id. A struct is defined so that other fields can be added later. - extends trait NetWorkFunction so that the stages that require so can have an Arc to the PipelineData when added to the pipeline. - Modifies the method to add stages to the pipeline so that those NFs that implement the new method get access to the shared PipelineData. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

Extend the masquerade, port-forwarding and flow-filter NFs with an Arc<> to pipeline data and implement the trait method set_data(). Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

Pass pipeline data to mgmt and update its generation id when a new config is applied. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

Augment the flow_info object with a generation id. The value represents the generation id of the configuration under which the flow was created. Subsequent configuration changes will update the value to the current generation id if the flow is still allowed under that configuration. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

When creating a flow for masquerading or port forwarding, label it with the current generation id. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

Modify flow filter so that packets are not unnecessarily re-evaluated if they refer to a flow and the flow was created with the current configuration. The logic is: - if packet refers to a flow and the flow's gen id is the same as that of the current configuration, we can bypass the flow filter since the flow should be allowed. - if packet refers to a flow but its gen id is lower than that of the current config, the packet needs to be re-evaluated; that is, we can't bypass the flow-filter. - if a packet referring to a flow is re-evaluated, then: - if the packet is allowed by the flow filter, set the flow gen id to the current (as the flow is allowed with the current config) for subsequent packets to bypass. - if the packet is not anymore allowed, drop it and invalidate the flow(s) it referred to. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

Move the declaration of tracing target "port-frwarding" to the modules' root so that it governs all port-forwarding logs. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

When creating flows, neither masquerade nor port-forwarding labels them, so that the flow gets a default generation id of 0. This way the flow-filter must always validate a flow after it is created thereby avoiding the potential race where a flow is created for generation N while the packet, processed before by the flow filter, gets validated for config N-1. This race stems from the fact that the generation id is stored in an atomic and could change in the interim between the validation and the flow creation. If config N would allow the flow but N+1 wouldn't we could otherwise be in the situation where we would allow a flow that should be denied (until the next reconfiguration) because NAT would create the flow validated by a prior config. Note: with this changes, neither masquerade nor port-forwarding need to access the pipeline data. However, we leave the Arc's for future use since their cost is negligible. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

Add convenience methods to set the genid of a flow and its related flow and to invalidate flows in pairs. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

Build the two flows for a stateful NAT session in pairs. This allows invalidating them both on a configuration change. Right now invalidation may not shorten their lifetime, but it will soon. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

qmonnet

Looks good to me, thank you!

Fredi-raspall requested review from mvachhar and qmonnet March 14, 2026 15:29

Fredi-raspall requested a review from a team as a code owner March 14, 2026 15:29

Fredi-raspall mentioned this pull request Mar 14, 2026

Drop packets with invalid NAT requirements in flow-filter #1341

Draft

Fredi-raspall added the ci:+vlab Enable VLAB tests label Mar 14, 2026

Fredi-raspall closed this Mar 14, 2026

Fredi-raspall reopened this Mar 14, 2026

Fredi-raspall changed the title ~~Safe bypass the flow-filter~~ Safely bypass the flow-filter Mar 14, 2026

qmonnet requested changes Mar 16, 2026

View reviewed changes

Fredi-raspall requested a review from qmonnet March 16, 2026 22:28

Fredi-raspall force-pushed the pr/fredi/bypass_filter branch from 3750381 to f54731e Compare March 16, 2026 22:30

qmonnet requested changes Mar 17, 2026

View reviewed changes

flow-filter/src/lib.rs Outdated Show resolved Hide resolved

flow-filter/src/lib.rs Show resolved Hide resolved

Fredi-raspall added 11 commits March 17, 2026 14:36

feat(nat,flow-filter): impl set_data() for NAT and flow-filter

9d3a203

Extend the masquerade, port-forwarding and flow-filter NFs with an Arc<> to pipeline data and implement the trait method set_data(). Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

feat(dataplane): build pipeline with pipeline data

83ddfef

Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

feat(mgmnt,dataplane): pass pipeline data to mgmt

a2fee25

Pass pipeline data to mgmt and update its generation id when a new config is applied. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

feat(nat): set generation id on flow creation

4d09427

When creating a flow for masquerading or port forwarding, label it with the current generation id. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

feat(flow-filter): move tracing target declaration up

65f7d6a

Move the declaration of tracing target "port-frwarding" to the modules' root so that it governs all port-forwarding logs. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

feat(net,nat,flow-info): add methods to flow-info

48892fd

Add convenience methods to set the genid of a flow and its related flow and to invalidate flows in pairs. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

feat(stateful-nat): build flows in pairs

70828ec

Build the two flows for a stateful NAT session in pairs. This allows invalidating them both on a configuration change. Right now invalidation may not shorten their lifetime, but it will soon. Signed-off-by: Fredi Raspall <fredi@githedgehog.com>

Fredi-raspall force-pushed the pr/fredi/bypass_filter branch from f54731e to 70828ec Compare March 17, 2026 13:38

Fredi-raspall enabled auto-merge March 17, 2026 13:40

Fredi-raspall requested a review from qmonnet March 17, 2026 13:40

qmonnet approved these changes Mar 17, 2026

View reviewed changes

Fredi-raspall added this pull request to the merge queue Mar 17, 2026

Merged via the queue into main with commit 6369705 Mar 17, 2026
21 checks passed

Fredi-raspall deleted the pr/fredi/bypass_filter branch March 17, 2026 15:18

Conversation

Fredi-raspall commented Mar 14, 2026

Uh oh!

qmonnet left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Fredi-raspall commented Mar 16, 2026

Uh oh!

qmonnet commented Mar 16, 2026

Uh oh!

Fredi-raspall commented Mar 16, 2026

Uh oh!

Fredi-raspall commented Mar 16, 2026

Uh oh!

qmonnet commented Mar 16, 2026

Uh oh!

Fredi-raspall commented Mar 16, 2026

Uh oh!

qmonnet left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

qmonnet left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants