feat: regex key support for ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag#3526
Conversation
- 2 test cases: JSON array keys, mixpanel suffix (SecRuleUpdateTargetById parity) - Expected to fail until regex support is implemented - OODA baseline: Test 1 parse error, Test 2 HTTP 403 (exclusion not applied) Made-with: Cursor
- Compile regex at config load, not per-request - RuleRemoveTargetByIdEntry struct: literal or shared_ptr<Regex> - Test 2 (ARGS:/mixpanel$/) passes; Test 1 blocked by parser owasp-modsecurity#2927 Made-with: Cursor
…veTargetByTag Add regex pattern matching in the variable-key position of ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag, enabling exclusions like: ctl:ruleRemoveTargetById=932125;ARGS:/^json\.\d+\.JobDescription$/ ctl:ruleRemoveTargetByTag=XSS;ARGS:/^json\.\d+\.JobDescription$/ JSON body processing generates argument names with dynamic array indices (json.0.Field, json.1.Field, ...). Without regex keys, operators cannot scope exclusions to specific keys without listing every possible index or disabling rules entirely. Design: - Regex detected by /pattern/ delimiter in COLLECTION:/pattern/ - Compiled once at config load via Utils::Regex (PCRE2/PCRE1) - Stored as shared_ptr - zero per-request compilation - Literal targets continue to work unchanged (no breaking change) - Shared RuleRemoveTargetSpec struct used by both ById and ByTag - Lexer REMOVE_RULE_TARGET_VALUE class shared by both actions Aligns ModSecurity v3 with Coraza (corazawaf/coraza#1561). Fixes owasp-modsecurity#3505
370f93b to
637ad9c
Compare
Known limitation:
|
| Instead of | Use |
|---|---|
\d{2,5} |
\d\d+ or \d+ |
[a-z]{3} |
[a-z][a-z][a-z] or [a-z]+ |
.{1,10} |
.+ |
The character class does include { and } (updated in latest push), so fixed-count quantifiers like \d{3} work fine — only the comma-containing {m,n} form is affected. This matches the same trade-off the v2 PR makes.
I believe this is acceptable for the target use case (JSON key patterns like ^json\.\d+\.FieldName$ and cookie name patterns like ^sess_[a-f0-9]+$), which don't need {m,n}.
|
|
Hi @etiennemunnich, thanks for this great PR! Really excellent work, congrats! There are some reports from SonarCloud, please take a look at them, but I think that would be enough to check only Regarding to your knowing |
There was a problem hiding this comment.
Pull request overview
Adds regex support for the variable-key portion of ctl:ruleRemoveTargetById and ctl:ruleRemoveTargetByTag, enabling scoped exclusions for dynamic JSON keys (e.g., json.0.Field, json.1.Field) without disabling entire collections.
Changes:
- Introduces
RuleRemoveTargetSpec/ entry structs and updates transaction storage for rule-remove-target state. - Parses
/pattern/targets at config load time and stores a compiled regex for ById/ByTag. - Updates rule evaluation paths and adds regression + CRS-style tests for the new behavior.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| tools/rules-check/Makefile.am | Adds include path to accommodate new header dependencies. |
| test/test-suite.in | Registers new regression test files. |
| test/test-cases/regression/issue-3505.json | Adds regression tests covering ById/ByTag regex and literal compatibility. |
| test/test-cases/regression/issue-3505-crs-ctl-byid-tag-msg.json | Adds CRS-style JSON tests validating regex exclusions with SQLi detection. |
| test/benchmark/Makefile.am | Adds include path to accommodate new header dependencies. |
| src/rule_with_operator.cc | Switches exclusion checks to new matches*() helpers for ById/ByTag. |
| src/parser/seclang-scanner.ll | Broadens lexer token class for remove-target values to allow regex metacharacters. |
| src/actions/ctl/rule_remove_target_by_tag.h | Stores a precompiled regex pointer for ByTag action. |
| src/actions/ctl/rule_remove_target_by_tag.cc | Detects /pattern/, compiles regex in init(), stores structured entry in transaction. |
| src/actions/ctl/rule_remove_target_by_id.h | Stores a precompiled regex pointer for ById action. |
| src/actions/ctl/rule_remove_target_by_id.cc | Detects /pattern/, compiles regex in init(), stores structured entry in transaction. |
| headers/modsecurity/transaction.h | Updates transaction fields to new entry types and includes new header. |
| headers/modsecurity/rule_remove_target_entry.h | New public header defining target spec + ById/ByTag entry structs and match helpers. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| bool matchesKeyWithCollection(const std::string &key, | ||
| const std::string &keyWithCollection) const { | ||
| if (regex) { | ||
| return regex->searchAll(key).size() > 0; | ||
| } | ||
| return literal == keyWithCollection; | ||
| } |
There was a problem hiding this comment.
For regex targets, matchesKeyWithCollection() matches against key only and ignores the collection portion (e.g., ARGS:). This can cause a ctl exclusion like ARGS:/.../ to also exclude variables from other collections that share the same key when a rule inspects multiple collections. Preserve collection scoping by also verifying the collection prefix from literal (before :) matches keyWithCollection/getCollection() before applying the regex.
| std::list<RuleRemoveTargetByTagEntry> m_ruleRemoveTargetByTag; | ||
|
|
||
| /** | ||
| * | ||
| */ | ||
| std::list< std::pair<int, std::string> > m_ruleRemoveTargetById; | ||
| std::list<RuleRemoveTargetByIdEntry> m_ruleRemoveTargetById; | ||
|
|
There was a problem hiding this comment.
Changing Transaction::m_ruleRemoveTargetByTag / m_ruleRemoveTargetById from std::list<std::pair<...>> to new entry structs is a source/ABI breaking change for any downstream code that accesses these public members. If ABI/API stability matters, consider keeping the existing member types (or providing typedefs / accessors and making the raw storage private) and storing the compiled regex out-of-line to avoid breaking consumers.
| bool matchesKeyWithCollection(const std::string &key, | ||
| const std::string &keyWithCollection) const { | ||
| if (regex) { | ||
| return regex->searchAll(key).size() > 0; | ||
| } | ||
| return literal == keyWithCollection; | ||
| } | ||
|
|
||
| bool matchesFullName(const std::string &fullName) const { | ||
| if (regex) { | ||
| size_t colon = fullName.find(':'); | ||
| std::string keyPart = (colon != std::string::npos && colon + 1 < fullName.size()) | ||
| ? fullName.substr(colon + 1) : fullName; | ||
| return regex->searchAll(keyPart).size() > 0; | ||
| } |
There was a problem hiding this comment.
searchAll() allocates match data (PCRE2) and constructs a std::list/std::string per match; here you only need a boolean. Prefer regex->search(key) > 0 (or another non-allocating API) to reduce per-request allocations and better match the PR’s stated performance goals.
| $(LUA_LDFLAGS) | ||
|
|
||
| benchmark_CPPFLAGS = \ | ||
| -I$(top_srcdir) \ |
There was a problem hiding this comment.
Adding -I$(top_srcdir) here is only necessary because an installed public header now pulls in src/* headers. This broad include path can accidentally change which headers are picked up (source vs generated/build headers). Prefer fixing the header dependency (don’t include src/utils/regex.h from installed headers) so these tool-specific include-path workarounds aren’t needed.
| -I$(top_srcdir) \ |
| std::string pattern = m_target.substr(pattern_start, | ||
| pattern_end - pattern_start); | ||
| m_regex = std::make_unique<Utils::Regex>(pattern, true); | ||
| if (m_regex->hasError()) { | ||
| error->assign("Invalid regex in ctl:ruleRemoveTargetByTag: " + | ||
| m_target); | ||
| return false; | ||
| } |
There was a problem hiding this comment.
m_regex is a std::shared_ptr<Utils::Regex>, but the code assigns std::make_unique<Utils::Regex>(...), which won’t compile (unique_ptr can’t be assigned to shared_ptr). Use std::make_shared<Utils::Regex>(...) (or construct a shared_ptr explicitly) so the compiled regex is stored correctly.
| size_t colon = fullName.find(':'); | ||
| std::string keyPart = (colon != std::string::npos && colon + 1 < fullName.size()) | ||
| ? fullName.substr(colon + 1) : fullName; |
There was a problem hiding this comment.
matchesFullName() has the same scoping issue for regex targets: it strips everything before : and runs the regex against only the key portion, so a target like ARGS:/.../ can match a different collection’s fullName. Ensure the collection in fullName matches the collection specified in literal before applying the regex.
| size_t colon = fullName.find(':'); | |
| std::string keyPart = (colon != std::string::npos && colon + 1 < fullName.size()) | |
| ? fullName.substr(colon + 1) : fullName; | |
| // Extract collection and key part from fullName. | |
| size_t fullColon = fullName.find(':'); | |
| std::string fullCollection; | |
| std::string keyPart; | |
| if (fullColon != std::string::npos) { | |
| fullCollection = fullName.substr(0, fullColon); | |
| if (fullColon + 1 < fullName.size()) { | |
| keyPart = fullName.substr(fullColon + 1); | |
| } | |
| } | |
| if (keyPart.empty()) { | |
| keyPart = fullName; | |
| } | |
| // If literal encodes a collection (e.g. "ARGS:/.../"), ensure it matches | |
| // the collection in fullName before applying the regex. | |
| size_t litColon = literal.find(':'); | |
| if (litColon != std::string::npos && !fullCollection.empty()) { | |
| std::string litCollection = literal.substr(0, litColon); | |
| auto ciEqualChar = [](char a, char b) { | |
| return std::tolower(static_cast<unsigned char>(a)) == | |
| std::tolower(static_cast<unsigned char>(b)); | |
| }; | |
| if (litCollection.size() != fullCollection.size() || | |
| !std::equal(litCollection.begin(), litCollection.end(), | |
| fullCollection.begin(), ciEqualChar)) { | |
| return false; | |
| } | |
| } |
| #include <memory> | ||
| #include <string> | ||
|
|
||
| #include "src/utils/regex.h" | ||
|
|
There was a problem hiding this comment.
This new public header includes src/utils/regex.h, which is an internal header and is not installed with the public headers. This breaks downstream builds that include <modsecurity/transaction.h> from an installed libmodsecurity. Consider forward-declaring modsecurity::Utils::Regex in the public header and moving regex-dependent method definitions into a .cc file inside the library (and ensure rule_remove_target_entry.h is added to the installed header list).
| #include "modsecurity/anchored_set_variable_translation_proxy.h" | ||
| #include "modsecurity/audit_log.h" | ||
| #ifdef __cplusplus | ||
| #include "modsecurity/rule_remove_target_entry.h" |
There was a problem hiding this comment.
transaction.h now includes modsecurity/rule_remove_target_entry.h, but that header is not currently listed in pkginclude_HEADERS (src/Makefile.am). That means make install won’t ship it, and external users including <modsecurity/transaction.h> will fail to compile. Add the new header to the installed headers list (and avoid depending on src/* headers from installed headers).
| #include "modsecurity/rule_remove_target_entry.h" | |
| // Avoid including non-essential headers here so that transaction.h | |
| // does not depend on headers that may not be installed. |
| $(LIBXML2_LDFLAGS) | ||
|
|
||
| modsec_rules_check_CPPFLAGS = \ | ||
| -I$(top_srcdir) \ |
There was a problem hiding this comment.
Adding -I$(top_srcdir) suggests the build now depends on internal src/* headers being reachable via public includes. This can mask packaging/install issues and may change header resolution order. Prefer keeping installed headers self-contained (no src/* includes) so tools don’t need the entire source tree on the include path.
| -I$(top_srcdir) \ |
| return false; | ||
| } | ||
| return std::equal(literal.begin(), literal.end(), fullName.begin(), | ||
| [](char a, char b) { | ||
| return std::tolower(static_cast<unsigned char>(a)) == | ||
| std::tolower(static_cast<unsigned char>(b)); | ||
| }); |
There was a problem hiding this comment.
rule_remove_target_entry.h calls std::equal and std::tolower but doesn’t include the standard headers that define them (<algorithm> and <cctype>). Since this is an installed public header, add the required includes explicitly instead of relying on transitive includes.



Summary
Add regex pattern matching in the variable-key position of
ctl:ruleRemoveTargetByIdandctl:ruleRemoveTargetByTag.This enables exclusion patterns like:
ctl:ruleRemoveTargetById=932125;ARGS:/^json\.\d+\.JobDescription$/ ctl:ruleRemoveTargetByTag=XSS;ARGS:/^json\.\d+\.JobDescription$/Fixes #3505
Problem
JSON body processing generates argument names with unpredictable numeric indices (
json.0.JobDescription,json.1.JobDescription, ...). Without regex key support, operators must either:This is a common pain point for anyone running CRS with JSON/GraphQL APIs.
Approach
Following your guidance in the issue discussion, the regex is compiled once at config load — never recompiled per request. This directly addresses the concern about the v2 PR #3121 where regex compilation ran on every exclusion check.
How it works
init(), the/pattern/delimiter is detected in the target string (e.g.ARGS:/^json\.\d+\.Field$/)Utils::Regex(PCRE2 by default, PCRE1 with--with-pcre) at config load timeshared_ptr<Utils::Regex>— shared across all requests, zero per-request allocationsearchAll()runs against the short variable-key string (typically 10-40 chars)ARGS:password) continue to work unchanged via the existing==/ case-insensitive comparisonShared design for ById and ByTag
Both actions use a common
RuleRemoveTargetSpecstruct withmatchesFullName()andmatchesKeyWithCollection()methods, avoiding code duplication.Lexer change
The scanner character class
REMOVE_RULE_TARGET_VALUE(previouslyREMOVE_RULE_TARGET_BY_ID_VALUE, used only by ById) is now shared by both ById and ByTag. It includes regex metacharacters (^ $ + ( ) | ? \) but not comma — so chainedctl:actions still split correctly on,.Context: ModSecurity v2 and Coraza
This PR adds regex key support to the two actions that v3 already has (ById, ByTag). The missing
ctl:ruleRemoveTargetByMsgis a separate, larger discussion and is intentionally excluded.Files Changed (11 files)
headers/modsecurity/rule_remove_target_entry.hRuleRemoveTargetSpec,ByIdEntry,ByTagEntrystructsheaders/modsecurity/transaction.hsrc/actions/ctl/rule_remove_target_by_id.cc/pattern/, compile regex ininit()src/actions/ctl/rule_remove_target_by_id.hshared_ptr<Regex>membersrc/actions/ctl/rule_remove_target_by_tag.cc/pattern/, compile regex ininit()src/actions/ctl/rule_remove_target_by_tag.hshared_ptr<Regex>membersrc/parser/seclang-scanner.llREMOVE_RULE_TARGET_VALUEshared by ById + ByTagsrc/rule_with_operator.cctarget.matchesFullName()/matchesKeyWithCollection()test/test-cases/regression/issue-3505.jsontest/test-cases/regression/issue-3505-crs-ctl-byid-tag-msg.json@detectSQLi+ JSON bodytest/test-suite.inTest Results
7 new tests, all passing:
ARGS:/^json\.\d+\.JobDescription$/excludes dynamic JSON argsARGS:/mixpanel$/excludes args by suffix patternARGS:password— proves literal targets still work unchanged@detectSQLi@detectSQLiFull regression suite: 5005 total, 4987 pass, 18 skip, 0 fail, 0 error.
Performance
Benchmark: 25,000 iterations × 5 trials, JSON POST with 20 ARGS keys, 3 detection rules (
@detectSQLi,@detectXSS,@rx), 2 regex exclusions (one ById, one ByTag).Scaling with more ARGS keys:
The overhead scales linearly with ARGS count — no exponential blowup. At 100 keys (an extreme JSON body), the per-request cost is +0.014 ms. The cost is the
searchAll()call on short variable-name strings against precompiled PCRE2 patterns.Key design decisions keeping performance in check:
shared_ptrsearchAll()runs on short strings (variable names, typically 10-40 chars)==comparison — no regression for non-regex usersMade with Cursor