Skip to content

feat: regex key support for ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag#3526

Open
etiennemunnich wants to merge 3 commits intoowasp-modsecurity:v3/masterfrom
etiennemunnich:feature/ctl-regex-ruleRemoveTarget-byid-bytag
Open

feat: regex key support for ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag#3526
etiennemunnich wants to merge 3 commits intoowasp-modsecurity:v3/masterfrom
etiennemunnich:feature/ctl-regex-ruleRemoveTarget-byid-bytag

Conversation

@etiennemunnich
Copy link
Copy Markdown

Summary

Add regex pattern matching in the variable-key position of ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag.

This enables exclusion patterns like:

ctl:ruleRemoveTargetById=932125;ARGS:/^json\.\d+\.JobDescription$/
ctl:ruleRemoveTargetByTag=XSS;ARGS:/^json\.\d+\.JobDescription$/

Fixes #3505

Problem

JSON body processing generates argument names with unpredictable numeric indices (json.0.JobDescription, json.1.JobDescription, ...). Without regex key support, operators must either:

  1. Write one exclusion per possible array index (impractical)
  2. Exclude the entire ARGS collection from the rule (too broad, loses protection)
  3. Use path-based exclusion (loses all CRS protection for the endpoint)

This is a common pain point for anyone running CRS with JSON/GraphQL APIs.

Approach

Following your guidance in the issue discussion, the regex is compiled once at config load — never recompiled per request. This directly addresses the concern about the v2 PR #3121 where regex compilation ran on every exclusion check.

How it works

  1. Detection: In init(), the /pattern/ delimiter is detected in the target string (e.g. ARGS:/^json\.\d+\.Field$/)
  2. Compilation: The pattern is compiled via Utils::Regex (PCRE2 by default, PCRE1 with --with-pcre) at config load time
  3. Storage: Compiled regex is stored as shared_ptr<Utils::Regex> — shared across all requests, zero per-request allocation
  4. Matching: During rule evaluation, searchAll() runs against the short variable-key string (typically 10-40 chars)
  5. Backward compat: Literal targets (ARGS:password) continue to work unchanged via the existing == / case-insensitive comparison

Shared design for ById and ByTag

Both actions use a common RuleRemoveTargetSpec struct with matchesFullName() and matchesKeyWithCollection() methods, avoiding code duplication.

Lexer change

The scanner character class REMOVE_RULE_TARGET_VALUE (previously REMOVE_RULE_TARGET_BY_ID_VALUE, used only by ById) is now shared by both ById and ByTag. It includes regex metacharacters (^ $ + ( ) | ? \) but not comma — so chained ctl: actions still split correctly on ,.

Context: ModSecurity v2 and Coraza

Engine ById ByTag ByMsg Regex keys
ModSec v2 (v2.9.12)
ModSec v3 (upstream)
Coraza (post PR #1561)
This PR ✅ (ById + ByTag)

This PR adds regex key support to the two actions that v3 already has (ById, ByTag). The missing ctl:ruleRemoveTargetByMsg is a separate, larger discussion and is intentionally excluded.

Files Changed (11 files)

File Change
headers/modsecurity/rule_remove_target_entry.h New. RuleRemoveTargetSpec, ByIdEntry, ByTagEntry structs
headers/modsecurity/transaction.h ById + ByTag list types updated to use new entry structs
src/actions/ctl/rule_remove_target_by_id.cc Parse /pattern/, compile regex in init()
src/actions/ctl/rule_remove_target_by_id.h Add shared_ptr<Regex> member
src/actions/ctl/rule_remove_target_by_tag.cc Parse /pattern/, compile regex in init()
src/actions/ctl/rule_remove_target_by_tag.h Add shared_ptr<Regex> member
src/parser/seclang-scanner.ll REMOVE_RULE_TARGET_VALUE shared by ById + ByTag
src/rule_with_operator.cc Both match paths use target.matchesFullName() / matchesKeyWithCollection()
test/test-cases/regression/issue-3505.json 5 tests: ById regex, ByTag regex, ByTag literal compat
test/test-cases/regression/issue-3505-crs-ctl-byid-tag-msg.json 2 CRS-style tests with @detectSQLi + JSON body
test/test-suite.in Register new test files

Test Results

7 new tests, all passing:

Test Description
ById regex — JSON array keys ARGS:/^json\.\d+\.JobDescription$/ excludes dynamic JSON args
ById regex — suffix match ARGS:/mixpanel$/ excludes args by suffix pattern
ByTag regex — JSON array keys Same as above, matching rules by tag
ByTag regex — suffix match Same as above, matching rules by tag
ByTag literal — backward compat ARGS:password — proves literal targets still work unchanged
CRS-style ById + @detectSQLi Realistic JSON POST with SQL injection, excluded by regex key
CRS-style ByTag + @detectSQLi Same scenario, excluded by tag + regex key

Full regression suite: 5005 total, 4987 pass, 18 skip, 0 fail, 0 error.

Performance

Benchmark: 25,000 iterations × 5 trials, JSON POST with 20 ARGS keys, 3 detection rules (@detectSQLi, @detectXSS, @rx), 2 regex exclusions (one ById, one ByTag).

Scenario Median Per-request
Baseline (no exclusions) 1,209 ms 0.048 ms
With regex exclusions 1,326 ms 0.053 ms
Overhead +117 ms +0.005 ms/req (+9.7%)

Scaling with more ARGS keys:

Keys/request Baseline With regex Overhead
20 1,209 ms 1,326 ms +9.7%
50 2,620 ms 2,873 ms +9.7%
100 5,199 ms 5,543 ms +6.6%

The overhead scales linearly with ARGS count — no exponential blowup. At 100 keys (an extreme JSON body), the per-request cost is +0.014 ms. The cost is the searchAll() call on short variable-name strings against precompiled PCRE2 patterns.

Key design decisions keeping performance in check:

  • Regex compiled once at config load, stored as shared_ptr
  • searchAll() runs on short strings (variable names, typically 10-40 chars)
  • Literal targets use existing == comparison — no regression for non-regex users

Made with Cursor

- 2 test cases: JSON array keys, mixpanel suffix (SecRuleUpdateTargetById parity)
- Expected to fail until regex support is implemented
- OODA baseline: Test 1 parse error, Test 2 HTTP 403 (exclusion not applied)

Made-with: Cursor
- Compile regex at config load, not per-request
- RuleRemoveTargetByIdEntry struct: literal or shared_ptr<Regex>
- Test 2 (ARGS:/mixpanel$/) passes; Test 1 blocked by parser owasp-modsecurity#2927

Made-with: Cursor
…veTargetByTag

Add regex pattern matching in the variable-key position of
ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag, enabling
exclusions like:

  ctl:ruleRemoveTargetById=932125;ARGS:/^json\.\d+\.JobDescription$/
  ctl:ruleRemoveTargetByTag=XSS;ARGS:/^json\.\d+\.JobDescription$/

JSON body processing generates argument names with dynamic array
indices (json.0.Field, json.1.Field, ...). Without regex keys,
operators cannot scope exclusions to specific keys without listing
every possible index or disabling rules entirely.

Design:
- Regex detected by /pattern/ delimiter in COLLECTION:/pattern/
- Compiled once at config load via Utils::Regex (PCRE2/PCRE1)
- Stored as shared_ptr - zero per-request compilation
- Literal targets continue to work unchanged (no breaking change)
- Shared RuleRemoveTargetSpec struct used by both ById and ByTag
- Lexer REMOVE_RULE_TARGET_VALUE class shared by both actions

Aligns ModSecurity v3 with Coraza (corazawaf/coraza#1561).

Fixes owasp-modsecurity#3505
@etiennemunnich etiennemunnich force-pushed the feature/ctl-regex-ruleRemoveTarget-byid-bytag branch from 370f93b to 637ad9c Compare March 28, 2026 22:44
@etiennemunnich
Copy link
Copy Markdown
Author

Known limitation: {m,n} quantifiers in regex keys

After reviewing the discussion in PR #3121 between @airween, @marcstern, and @dune73, I want to flag one inherent limitation.

The comma inside {m,n} quantifiers breaks parsing. For example:

# This will NOT work — comma inside {2,5} terminates the token:
ctl:ruleRemoveTargetById=1;ARGS:/^field\d{2,5}$/

# Parser sees: ARGS:/^field\d{2   ← token ends at comma

This is the same constraint discussed at length in PR #3121 — the comma is the SecLang action separator, and it cannot be included in the lexer character class without breaking ctl:a=...,ctl:b=... chaining.

Workarounds are straightforward:

Instead of Use
\d{2,5} \d\d+ or \d+
[a-z]{3} [a-z][a-z][a-z] or [a-z]+
.{1,10} .+

The character class does include { and } (updated in latest push), so fixed-count quantifiers like \d{3} work fine — only the comma-containing {m,n} form is affected. This matches the same trade-off the v2 PR makes.

I believe this is acceptable for the target use case (JSON key patterns like ^json\.\d+\.FieldName$ and cookie name patterns like ^sess_[a-f0-9]+$), which don't need {m,n}.

@sonarqubecloud
Copy link
Copy Markdown

@airween
Copy link
Copy Markdown
Member

airween commented Mar 29, 2026

Hi @etiennemunnich,

thanks for this great PR! Really excellent work, congrats!

There are some reports from SonarCloud, please take a look at them, but I think that would be enough to check only headers/modsecurity/rule_remove_target_entry.h (these 3 issues).

Regarding to your knowing {m, n} limitation: I see that pain, probably we should check the parser, maybe we can figure out something to resolve this.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds regex support for the variable-key portion of ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag, enabling scoped exclusions for dynamic JSON keys (e.g., json.0.Field, json.1.Field) without disabling entire collections.

Changes:

  • Introduces RuleRemoveTargetSpec / entry structs and updates transaction storage for rule-remove-target state.
  • Parses /pattern/ targets at config load time and stores a compiled regex for ById/ByTag.
  • Updates rule evaluation paths and adds regression + CRS-style tests for the new behavior.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
tools/rules-check/Makefile.am Adds include path to accommodate new header dependencies.
test/test-suite.in Registers new regression test files.
test/test-cases/regression/issue-3505.json Adds regression tests covering ById/ByTag regex and literal compatibility.
test/test-cases/regression/issue-3505-crs-ctl-byid-tag-msg.json Adds CRS-style JSON tests validating regex exclusions with SQLi detection.
test/benchmark/Makefile.am Adds include path to accommodate new header dependencies.
src/rule_with_operator.cc Switches exclusion checks to new matches*() helpers for ById/ByTag.
src/parser/seclang-scanner.ll Broadens lexer token class for remove-target values to allow regex metacharacters.
src/actions/ctl/rule_remove_target_by_tag.h Stores a precompiled regex pointer for ByTag action.
src/actions/ctl/rule_remove_target_by_tag.cc Detects /pattern/, compiles regex in init(), stores structured entry in transaction.
src/actions/ctl/rule_remove_target_by_id.h Stores a precompiled regex pointer for ById action.
src/actions/ctl/rule_remove_target_by_id.cc Detects /pattern/, compiles regex in init(), stores structured entry in transaction.
headers/modsecurity/transaction.h Updates transaction fields to new entry types and includes new header.
headers/modsecurity/rule_remove_target_entry.h New public header defining target spec + ById/ByTag entry structs and match helpers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +35 to +41
bool matchesKeyWithCollection(const std::string &key,
const std::string &keyWithCollection) const {
if (regex) {
return regex->searchAll(key).size() > 0;
}
return literal == keyWithCollection;
}
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For regex targets, matchesKeyWithCollection() matches against key only and ignores the collection portion (e.g., ARGS:). This can cause a ctl exclusion like ARGS:/.../ to also exclude variables from other collections that share the same key when a rule inspects multiple collections. Preserve collection scoping by also verifying the collection prefix from literal (before :) matches keyWithCollection/getCollection() before applying the regex.

Copilot uses AI. Check for mistakes.
Comment on lines +526 to 532
std::list<RuleRemoveTargetByTagEntry> m_ruleRemoveTargetByTag;

/**
*
*/
std::list< std::pair<int, std::string> > m_ruleRemoveTargetById;
std::list<RuleRemoveTargetByIdEntry> m_ruleRemoveTargetById;

Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing Transaction::m_ruleRemoveTargetByTag / m_ruleRemoveTargetById from std::list<std::pair<...>> to new entry structs is a source/ABI breaking change for any downstream code that accesses these public members. If ABI/API stability matters, consider keeping the existing member types (or providing typedefs / accessors and making the raw storage private) and storing the compiled regex out-of-line to avoid breaking consumers.

Copilot uses AI. Check for mistakes.
Comment on lines +35 to +49
bool matchesKeyWithCollection(const std::string &key,
const std::string &keyWithCollection) const {
if (regex) {
return regex->searchAll(key).size() > 0;
}
return literal == keyWithCollection;
}

bool matchesFullName(const std::string &fullName) const {
if (regex) {
size_t colon = fullName.find(':');
std::string keyPart = (colon != std::string::npos && colon + 1 < fullName.size())
? fullName.substr(colon + 1) : fullName;
return regex->searchAll(keyPart).size() > 0;
}
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

searchAll() allocates match data (PCRE2) and constructs a std::list/std::string per match; here you only need a boolean. Prefer regex->search(key) > 0 (or another non-allocating API) to reduce per-request allocations and better match the PR’s stated performance goals.

Copilot uses AI. Check for mistakes.
$(LUA_LDFLAGS)

benchmark_CPPFLAGS = \
-I$(top_srcdir) \
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding -I$(top_srcdir) here is only necessary because an installed public header now pulls in src/* headers. This broad include path can accidentally change which headers are picked up (source vs generated/build headers). Prefer fixing the header dependency (don’t include src/utils/regex.h from installed headers) so these tool-specific include-path workarounds aren’t needed.

Suggested change
-I$(top_srcdir) \

Copilot uses AI. Check for mistakes.
Comment on lines +54 to +61
std::string pattern = m_target.substr(pattern_start,
pattern_end - pattern_start);
m_regex = std::make_unique<Utils::Regex>(pattern, true);
if (m_regex->hasError()) {
error->assign("Invalid regex in ctl:ruleRemoveTargetByTag: " +
m_target);
return false;
}
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m_regex is a std::shared_ptr<Utils::Regex>, but the code assigns std::make_unique<Utils::Regex>(...), which won’t compile (unique_ptr can’t be assigned to shared_ptr). Use std::make_shared<Utils::Regex>(...) (or construct a shared_ptr explicitly) so the compiled regex is stored correctly.

Copilot uses AI. Check for mistakes.
Comment on lines +45 to +47
size_t colon = fullName.find(':');
std::string keyPart = (colon != std::string::npos && colon + 1 < fullName.size())
? fullName.substr(colon + 1) : fullName;
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matchesFullName() has the same scoping issue for regex targets: it strips everything before : and runs the regex against only the key portion, so a target like ARGS:/.../ can match a different collection’s fullName. Ensure the collection in fullName matches the collection specified in literal before applying the regex.

Suggested change
size_t colon = fullName.find(':');
std::string keyPart = (colon != std::string::npos && colon + 1 < fullName.size())
? fullName.substr(colon + 1) : fullName;
// Extract collection and key part from fullName.
size_t fullColon = fullName.find(':');
std::string fullCollection;
std::string keyPart;
if (fullColon != std::string::npos) {
fullCollection = fullName.substr(0, fullColon);
if (fullColon + 1 < fullName.size()) {
keyPart = fullName.substr(fullColon + 1);
}
}
if (keyPart.empty()) {
keyPart = fullName;
}
// If literal encodes a collection (e.g. "ARGS:/.../"), ensure it matches
// the collection in fullName before applying the regex.
size_t litColon = literal.find(':');
if (litColon != std::string::npos && !fullCollection.empty()) {
std::string litCollection = literal.substr(0, litColon);
auto ciEqualChar = [](char a, char b) {
return std::tolower(static_cast<unsigned char>(a)) ==
std::tolower(static_cast<unsigned char>(b));
};
if (litCollection.size() != fullCollection.size() ||
!std::equal(litCollection.begin(), litCollection.end(),
fullCollection.begin(), ciEqualChar)) {
return false;
}
}

Copilot uses AI. Check for mistakes.
Comment on lines +19 to +23
#include <memory>
#include <string>

#include "src/utils/regex.h"

Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new public header includes src/utils/regex.h, which is an internal header and is not installed with the public headers. This breaks downstream builds that include <modsecurity/transaction.h> from an installed libmodsecurity. Consider forward-declaring modsecurity::Utils::Regex in the public header and moving regex-dependent method definitions into a .cc file inside the library (and ensure rule_remove_target_entry.h is added to the installed header list).

Copilot uses AI. Check for mistakes.
#include "modsecurity/anchored_set_variable_translation_proxy.h"
#include "modsecurity/audit_log.h"
#ifdef __cplusplus
#include "modsecurity/rule_remove_target_entry.h"
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

transaction.h now includes modsecurity/rule_remove_target_entry.h, but that header is not currently listed in pkginclude_HEADERS (src/Makefile.am). That means make install won’t ship it, and external users including <modsecurity/transaction.h> will fail to compile. Add the new header to the installed headers list (and avoid depending on src/* headers from installed headers).

Suggested change
#include "modsecurity/rule_remove_target_entry.h"
// Avoid including non-essential headers here so that transaction.h
// does not depend on headers that may not be installed.

Copilot uses AI. Check for mistakes.
$(LIBXML2_LDFLAGS)

modsec_rules_check_CPPFLAGS = \
-I$(top_srcdir) \
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding -I$(top_srcdir) suggests the build now depends on internal src/* headers being reachable via public includes. This can mask packaging/install issues and may change header resolution order. Prefer keeping installed headers self-contained (no src/* includes) so tools don’t need the entire source tree on the include path.

Suggested change
-I$(top_srcdir) \

Copilot uses AI. Check for mistakes.
Comment on lines +51 to +57
return false;
}
return std::equal(literal.begin(), literal.end(), fullName.begin(),
[](char a, char b) {
return std::tolower(static_cast<unsigned char>(a)) ==
std::tolower(static_cast<unsigned char>(b));
});
Copy link

Copilot AI Mar 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rule_remove_target_entry.h calls std::equal and std::tolower but doesn’t include the standard headers that define them (<algorithm> and <cctype>). Since this is an installed public header, add the required includes explicitly instead of relying on transitive includes.

Copilot uses AI. Check for mistakes.
@airween
Copy link
Copy Markdown
Member

airween commented Mar 29, 2026

@etiennemunnich,

also, unfortunately all Windows tests were failed, for eg this one.

Honestly, I can't see the root cause after a quick look, namely why the compiler does not find the pcre2.h...

Meantime Copilot also made a review, please take a look at them too. Maybe this explains the Windows build error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Wildcard/pattern support in ctl:ruleRemoveTargetById (v3)

3 participants