From 4e793e8cc88c65c9ff64ec4dce9d2e18aa8ade10 Mon Sep 17 00:00:00 2001
From: Dan Moseley
Date: Mon, 16 Mar 2026 21:53:58 -0600
Subject: [PATCH 1/8] Document RegexOptions.AnyNewLine in conceptual docs

Add AnyNewLine mode section to Regular Expression Options article.
Add tips/notes about AnyNewLine in:
- Multiline mode section (as alternative to \r?\$ workaround)
- Anchors doc (\$ and \Z sections)
- Character classes doc (Any character: . section)
- Quick reference (options table)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../anchors-in-regular-expressions.md | 4 ++
 ...haracter-classes-in-regular-expressions.md | 2 +
 .../base-types/regular-expression-options.md | 48 ++++++++++++++++++-
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/docs/standard/base-types/anchors-in-regular-expressions.md b/docs/standard/base-types/anchors-in-regular-expressions.md
index fa11783a74d01..015de8a738653 100644
--- a/docs/standard/base-types/anchors-in-regular-expressions.md
+++ b/docs/standard/base-types/anchors-in-regular-expressions.md
@@ -63,6 +63,8 @@ Anchors, or atomic zero-width assertions, specify a position in the string where
 The `$` anchor specifies that the preceding pattern must occur at the end of the input string, or before `\n` at the end of the input string. If you use `$` with the option, the match can also occur at the end of a line. Note that `$` is satisfied at `\n` but not at `\r\n` (the combination of carriage return and newline characters, or CR/LF). To handle the CR/LF character combination, include `\r?$` in the regular expression pattern. Note that `\r?$` will include any `\r` in the match.
+
+ Starting with .NET 11, you can use to make `$` recognize all common newline sequences instead of only `\n`. Unlike the `\r?$` workaround, `AnyNewLine` treats `\r\n` as an atomic sequence, so `\r` is not included in the match. For more information, see [AnyNewLine mode](regular-expression-options.md#anynewline-mode).
 The following example adds the `$` anchor to the regular expression pattern used in the example in the [Start of String or Line](#start-of-string-or-line-) section. When used with the original input string, which includes five lines of text, the method is unable to find a match, because the end of the first line does not match the `$` pattern. When the original input string is split into a string array, the method succeeds in matching each of the five lines. When the method is called with the `options` parameter set to , no matches are found because the regular expression pattern does not account for the carriage return character `\r`. However, when the regular expression pattern is modified by replacing `$` with `\r?$`, calling the method with the `options` parameter set to again finds five matches.
@@ -83,6 +85,8 @@ Anchors, or atomic zero-width assertions, specify a position in the string where
 The `\Z` anchor specifies that a match must occur at the end of the input string, or before `\n` at the end of the input string. It is identical to the `$` anchor, except that `\Z` ignores the option. Therefore, in a multiline string, it can only be satisfied by the end of the last line, or the last line before `\n`. Note that `\Z` is satisfied at `\n` but is not satisfied at `\r\n` (the CR/LF character combination). To treat CR/LF as if it were `\n`, include `\r?\Z` in the regular expression pattern. Note that this will make the `\r` part of the match.
+
+ Starting with .NET 11, you can use to make `\Z` recognize all common newline sequences instead of only `\n`. Unlike the `\r?\Z` workaround, `AnyNewLine` treats `\r\n` as an atomic sequence, so `\r` is not included in the match. For more information, see [AnyNewLine mode](regular-expression-options.md#anynewline-mode).
 The following example uses the `\Z` anchor in a regular expression that is similar to the example in the [Start of String or Line](#start-of-string-or-line-) section, which extracts information about the years during which some professional baseball teams existed. The subexpression `\r?\Z` in the regular expression `^((\w+(\s?)){2,}),\s(\w+\s\w+),(\s\d{4}(-(\d{4}|present))?,?)+\r?\Z` is satisfied at the end of a string, and also at the end of a string that ends with `\n` or `\r\n`. As a result, each element in the array matches the regular expression pattern.
diff --git a/docs/standard/base-types/character-classes-in-regular-expressions.md b/docs/standard/base-types/character-classes-in-regular-expressions.md
index 985fb630f8fd0..f294d934ac89e 100644
--- a/docs/standard/base-types/character-classes-in-regular-expressions.md
+++ b/docs/standard/base-types/character-classes-in-regular-expressions.md
@@ -162,6 +162,8 @@ The period character (.) matches any character except `\n` (the newline characte
 - If a regular expression pattern is modified by the option, or if the portion of the pattern that contains the `.` character class is modified by the `s` option, `.` matches any character. For more information, see [Regular Expression Options](regular-expression-options.md).
+- Starting with .NET 11, if the option is specified, `.` excludes all common newline characters instead of only `\n`. If both `Singleline` and `AnyNewLine` are specified, `Singleline` takes precedence and `.` matches every character. For more information, see [AnyNewLine mode](regular-expression-options.md#anynewline-mode).
+
 The following example illustrates the different behavior of the `.` character class by default and with the option. The regular expression `^.+` starts at the beginning of the string and matches every character. By default, the match ends at the end of the first line; the regular expression pattern matches the carriage return character, `\r`, but it doesn't match `\n`. Because the option interprets the entire input string as a single line, it matches every character in the input string, including `\n`.
 :::code language="csharp" source="snippets/character-classes-in-regular-expressions/csharp/Program.cs" id="AnyCharacterMultiline":::
diff --git a/docs/standard/base-types/regular-expression-options.md b/docs/standard/base-types/regular-expression-options.md
index 02eca90d7eeee..f38e32d22ccd3 100644
--- a/docs/standard/base-types/regular-expression-options.md
+++ b/docs/standard/base-types/regular-expression-options.md
@@ -30,6 +30,7 @@ By default, the comparison of an input string with any literal characters in a r
 | | Not available | Enable ECMAScript-compliant behavior for the expression. | [ECMAScript matching behavior](#ecmascript-matching-behavior) |
 | | Not available | Ignore cultural differences in language. | [Comparison using the invariant culture](#compare-using-the-invariant-culture) |
 | | Not available | Match using an approach that avoids backtracking and guarantees linear-time processing in the length of the input. (Available in .NET 7 and later versions.)| [Nonbacktracking mode](#nonbacktracking-mode) |
+| | Not available | Make `^`, `$`, `\Z`, and `.` recognize all common newline sequences instead of only `\n`. (Available in .NET 11 and later versions.) | [AnyNewLine mode](#anynewline-mode) |
 ## Specify options
@@ -75,7 +76,7 @@ The following five regular expression options can be set both with the options p
 -
-The following five regular expression options can be set using the `options` parameter but cannot be set inline:
+The following seven regular expression options can be set using the `options` parameter but cannot be set inline:
 -
@@ -87,6 +88,10 @@ The following five regular expression options can be set using the `options` par
 -
+-
+
+-
+
 ## Determine options
 You can determine which options were provided to a object when it was instantiated by retrieving the value of the read-only property.
@@ -150,6 +155,9 @@ By default, `$` will be satisfied only at the end of the input string. If you sp
 In neither case does `$` recognize the carriage return/line feed character combination (`\r\n`). `$` always ignores any carriage return (`\r`). To end your match with either `\r\n` or `\n`, use the subexpression `\r?$` instead of just `$`. Note that this will make the `\r` part of the match.
+> [!TIP]
+> Starting with .NET 11, you can use to make `^`, `$`, `\Z`, and `.` recognize all common newline sequences instead of only `\n`, removing the need for `\r?` workarounds. `AnyNewLine` also treats `\r\n` as an atomic newline sequence, so `\r` is never included in the match. For more information, see the [AnyNewLine mode](#anynewline-mode) section.
+
 The following example extracts bowlers' names and scores and adds them to a collection that sorts them in descending order. The method is called twice. In the first method call, the regular expression is `^(\w+)\s(\d+)$` and no options are set. As the output shows, because the regular expression engine cannot match the input pattern along with the beginning and end of the input string, no matches are found. In the second method call, the regular expression is changed to `^(\w+)\s(\d+)\r?$` and the options are set to . As the output shows, the names and scores are successfully matched, and the scores are displayed in descending order.
 [!code-csharp[Conceptual.Regex.Language.Options#3](../../../samples/snippets/csharp/VS_Snippets_CLR/conceptual.regex.language.options/cs/multiline1.cs#3)]
@@ -405,6 +413,44 @@ The
+## AnyNewLine mode
+
+By default, .NET's regular expression engine treats only `\n` as a newline character. The anchors `^` and `$` (in mode), `\Z`, and the wildcard `.` all use `\n` as the sole line boundary. This means that `$` doesn't match before `\r\n` (Windows-style line endings), and `.` matches `\r`, which leads to common bugs when processing text with mixed or non-Unix line endings.
+
+The option, which was introduced in .NET 11, makes these constructs recognize all common newline sequences: `\r\n` (CR+LF), `\r` (CR), `\n` (LF), `\u0085` (NEL), `\u2028` (LS), and `\u2029` (PS). This is consistent with [Unicode TR18 RL1.6](https://unicode.org/reports/tr18/#RL1.6).
+
+For example, without `AnyNewLine`, matching lines in a string with Windows line endings requires manual workarounds like `\r?$`:
+
+```csharp
+// BUG: .+$ captures trailing \r on Windows line endings
+var match = Regex.Match("foo\r\nbar", @".+$", RegexOptions.Multiline);
+Console.WriteLine(match.Value); // "foo\r" -- not "foo"!
+```
+
+With `AnyNewLine`, the anchors handle all newline types automatically:
+
+```csharp
+var match = Regex.Match("foo\r\nbar", @".+$",
+    RegexOptions.Multiline | RegexOptions.AnyNewLine);
+Console.WriteLine(match.Value); // "foo"
+```
+
+The following table summarizes how `AnyNewLine` affects each construct:
+
+|Construct|Default behavior|With `AnyNewLine`|
+|---------|----------------|-----------------|
+|`.`|Matches any character except `\n`|Matches any character except `\r`, `\n`, `\u0085`, `\u2028`, `\u2029`|
+|`$` (Multiline)|Matches before `\n`|Matches before `\r\n`, `\r`, `\n`, `\u0085`, `\u2028`, `\u2029`|
+|`^` (Multiline)|Matches after `\n`|Matches after `\r\n`, `\r`, `\n`, `\u0085`, `\u2028`, `\u2029`|
+|`$` (default) / `\Z`|Matches before `\n` at end of string|Matches before any newline sequence at end of string|
+
+Key design points:
+
+- **`\r\n` is treated atomically**: `$` matches before the full `\r\n` sequence, never between `\r` and `\n`.
+- **`Singleline` takes precedence**: `.` with both `Singleline` and `AnyNewLine` matches every character (including newlines), consistent with `Singleline`'s existing behavior.
+- **`\A` and `\z` are unaffected**: Absolute start-of-string and end-of-string anchors don't change.
+- **Incompatible options**: `AnyNewLine` cannot be combined with or . Attempting to do so throws an .
+
 ## See also
 - [Regular Expression Language - Quick Reference](regular-expression-language-quick-reference.md)

From 5cdacacfe93543d7979c933caac56a8acb4f1c53 Mon Sep 17 00:00:00 2001
From: Dan Moseley
Date: Mon, 16 Mar 2026 22:41:41 -0600
Subject: [PATCH 2/8] Fix: newline characters -> newline sequences

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../base-types/character-classes-in-regular-expressions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/standard/base-types/character-classes-in-regular-expressions.md b/docs/standard/base-types/character-classes-in-regular-expressions.md
index f294d934ac89e..d25676d67d91a 100644
--- a/docs/standard/base-types/character-classes-in-regular-expressions.md
+++ b/docs/standard/base-types/character-classes-in-regular-expressions.md
@@ -162,7 +162,7 @@ The period character (.) matches any character except `\n` (the newline characte
 - If a regular expression pattern is modified by the option, or if the portion of the pattern that contains the `.` character class is modified by the `s` option, `.` matches any character. For more information, see [Regular Expression Options](regular-expression-options.md).
-- Starting with .NET 11, if the option is specified, `.` excludes all common newline characters instead of only `\n`. If both `Singleline` and `AnyNewLine` are specified, `Singleline` takes precedence and `.` matches every character. For more information, see [AnyNewLine mode](regular-expression-options.md#anynewline-mode).
+- Starting with .NET 11, if the option is specified, `.` excludes all common newline sequences instead of only `\n`. If both `Singleline` and `AnyNewLine` are specified, `Singleline` takes precedence and `.` matches every character. For more information, see [AnyNewLine mode](regular-expression-options.md#anynewline-mode).
 The following example illustrates the different behavior of the `.` character class by default and with the option. The regular expression `^.+` starts at the beginning of the string and matches every character. By default, the match ends at the end of the first line; the regular expression pattern matches the carriage return character, `\r`, but it doesn't match `\n`. Because the option interprets the entire input string as a single line, it matches every character in the input string, including `\n`.

From 75844424306cda07d77a2a33e7a7846133e03a5c Mon Sep 17 00:00:00 2001
From: Dan Moseley
Date: Mon, 16 Mar 2026 22:46:07 -0600
Subject: [PATCH 3/8] Scope . NOTE to default behavior, mention AnyNewLine

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../base-types/character-classes-in-regular-expressions.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/standard/base-types/character-classes-in-regular-expressions.md b/docs/standard/base-types/character-classes-in-regular-expressions.md
index d25676d67d91a..626f39e7ac1af 100644
--- a/docs/standard/base-types/character-classes-in-regular-expressions.md
+++ b/docs/standard/base-types/character-classes-in-regular-expressions.md
@@ -170,7 +170,7 @@ The period character (.) matches any character except `\n` (the newline characte
 :::code language="vb" source="snippets/character-classes-in-regular-expressions/vb/Program.vb" id="AnyCharacterMultiline":::
 > [!NOTE]
-> Because it matches any character except `\n`, the `.` character class also matches `\r` (the carriage return character).
+> By default, because it matches any character except `\n`, the `.` character class also matches `\r` (the carriage return character). With , `.` excludes `\r` and other newline sequences as well.
 - In a positive or negative character group, a period is treated as a literal period character, and not as a character class. For more information, see [Positive Character Group](#PositiveGroup) and [Negative Character Group](#NegativeGroup) earlier in this article.
 The following example provides an illustration by defining a regular expression that includes the period character (`.`) both as a character class and as a member of a positive character group. The regular expression `\b.*[.?!;:](\s|\z)` begins at a word boundary, matches any character until it encounters one of five punctuation marks, including a period, and then matches either a white-space character or the end of the string.

From c0560f7c313cc82987935b74e8169f6dd8c739f8 Mon Sep 17 00:00:00 2001
From: Dan Moseley
Date: Tue, 17 Mar 2026 00:13:31 -0600
Subject: [PATCH 4/8] Add AnyNewLine to NonBacktracking incompatible options list

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 docs/standard/base-types/regular-expression-options.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/standard/base-types/regular-expression-options.md b/docs/standard/base-types/regular-expression-options.md
index f38e32d22ccd3..5652dd7b2445e 100644
--- a/docs/standard/base-types/regular-expression-options.md
+++ b/docs/standard/base-types/regular-expression-options.md
@@ -400,7 +400,7 @@ The following example is identical to the previous example, except that the stat
 By default, .NET's regex engine uses *backtracking* to try to find pattern matches. A backtracking engine is one that tries to match one pattern, and if that fails, goes backs and tries to match an alternate pattern, and so on. A backtracking engine is very fast for typical cases, but slows down as the number of pattern alternations increases, which can lead to *catastrophic backtracking*. The option, which was introduced in .NET 7, doesn't use backtracking and avoids that worst-case scenario. Its goal is to provide consistently good behavior, regardless of the input being searched.
-The option doesn't support everything the other built-in engines support. In particular, the option can't be used in conjunction with or . It also doesn't allow for the following constructs in the pattern:
+The option doesn't support everything the other built-in engines support. In particular, the option can't be used in conjunction with , , or . It also doesn't allow for the following constructs in the pattern:
 - Atomic groups
 - Backreferences

From ba512209bf483f2c8d52a619fd326c28d19daf45 Mon Sep 17 00:00:00 2001
From: Dan Moseley
Date: Thu, 19 Mar 2026 16:01:48 -0600
Subject: [PATCH 5/8] Improve AnyNewLine behavior table formatting

Apply gewarren's suggestion to align table columns for readability.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 .../base-types/regular-expression-options.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/standard/base-types/regular-expression-options.md b/docs/standard/base-types/regular-expression-options.md
index 5652dd7b2445e..705eaf91e1a9e 100644
--- a/docs/standard/base-types/regular-expression-options.md
+++ b/docs/standard/base-types/regular-expression-options.md
@@ -437,12 +437,12 @@ Console.WriteLine(match.Value); // "foo"
 The following table summarizes how `AnyNewLine` affects each construct:
-|Construct|Default behavior|With `AnyNewLine`|
-|---------|----------------|-----------------|
-|`.`|Matches any character except `\n`|Matches any character except `\r`, `\n`, `\u0085`, `\u2028`, `\u2029`|
-|`$` (Multiline)|Matches before `\n`|Matches before `\r\n`, `\r`, `\n`, `\u0085`, `\u2028`, `\u2029`|
-|`^` (Multiline)|Matches after `\n`|Matches after `\r\n`, `\r`, `\n`, `\u0085`, `\u2028`, `\u2029`|
-|`$` (default) / `\Z`|Matches before `\n` at end of string|Matches before any newline sequence at end of string|
+| Construct            | Default behavior                     | With `AnyNewLine`                                                     |
+|----------------------|--------------------------------------|-----------------------------------------------------------------------|
+| `.`                  | Matches any character except `\n`    | Matches any character except `\r`, `\n`, `\u0085`, `\u2028`, `\u2029` |
+| `$` (Multiline)      | Matches before `\n`                  | Matches before `\r\n`, `\r`, `\n`, `\u0085`, `\u2028`, `\u2029`       |
+| `^` (Multiline)      | Matches after `\n`                   | Matches after `\r\n`, `\r`, `\n`, `\u0085`, `\u2028`, `\u2029`        |
+| `$` (default) / `\Z` | Matches before `\n` at end of string | Matches before any newline sequence at end of string                  |
 Key design points:

From ca57054553daf16bd2302a2c3903102c0893961f Mon Sep 17 00:00:00 2001
From: Dan Moseley
Date: Thu, 19 Mar 2026 16:02:56 -0600
Subject: [PATCH 6/8] Clarify that . matching \r applies only without Singleline

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 docs/standard/base-types/regular-expression-options.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/standard/base-types/regular-expression-options.md b/docs/standard/base-types/regular-expression-options.md
index 705eaf91e1a9e..40d8f76153cf4 100644
--- a/docs/standard/base-types/regular-expression-options.md
+++ b/docs/standard/base-types/regular-expression-options.md
@@ -415,7 +415,7 @@ For more information about backtracking, see [Backtracking in regular expression
 ## AnyNewLine mode
-By default, .NET's regular expression engine treats only `\n` as a newline character. The anchors `^` and `$` (in mode), `\Z`, and the wildcard `.` all use `\n` as the sole line boundary. This means that `$` doesn't match before `\r\n` (Windows-style line endings), and `.` matches `\r`, which leads to common bugs when processing text with mixed or non-Unix line endings.
+By default, .NET's regular expression engine treats only `\n` as a newline character. The anchors `^` and `$` (in mode), `\Z`, and the wildcard `.` all use `\n` as the sole line boundary. This means that `$` doesn't match before `\r\n` (Windows-style line endings), and `.` matches `\r` (unless is enabled, in which case `.` matches all characters), which leads to common bugs when processing text with mixed or non-Unix line endings.
 The option, which was introduced in .NET 11, makes these constructs recognize all common newline sequences: `\r\n` (CR+LF), `\r` (CR), `\n` (LF), `\u0085` (NEL), `\u2028` (LS), and `\u2029` (PS). This is consistent with [Unicode TR18 RL1.6](https://unicode.org/reports/tr18/#RL1.6).

From 801a0b4a355cf50fe621a2e732bb48b23153c6ca Mon Sep 17 00:00:00 2001
From: Dan Moseley
Date: Thu, 19 Mar 2026 16:03:17 -0600
Subject: [PATCH 7/8] Add (default) qualifier to . row in AnyNewLine table

Clarifies that the . behavior described applies without Singleline, consistent with how other rows qualify their mode.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 docs/standard/base-types/regular-expression-options.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/standard/base-types/regular-expression-options.md b/docs/standard/base-types/regular-expression-options.md
index 40d8f76153cf4..ef0547706776e 100644
--- a/docs/standard/base-types/regular-expression-options.md
+++ b/docs/standard/base-types/regular-expression-options.md
@@ -439,7 +439,7 @@ The following table summarizes how `AnyNewLine` affects each construct:
 | Construct            | Default behavior                     | With `AnyNewLine`                                                     |
 |----------------------|--------------------------------------|-----------------------------------------------------------------------|
-| `.`                  | Matches any character except `\n`    | Matches any character except `\r`, `\n`, `\u0085`, `\u2028`, `\u2029` |
+| `.` (default)        | Matches any character except `\n`    | Matches any character except `\r`, `\n`, `\u0085`, `\u2028`, `\u2029` |
 | `$` (Multiline)      | Matches before `\n`                  | Matches before `\r\n`, `\r`, `\n`, `\u0085`, `\u2028`, `\u2029`       |
 | `^` (Multiline)      | Matches after `\n`                   | Matches after `\r\n`, `\r`, `\n`, `\u0085`, `\u2028`, `\u2029`        |
 | `$` (default) / `\Z` | Matches before `\n` at end of string | Matches before any newline sequence at end of string                  |

From 0c581d8ca7c2bc08e527ce42d05157f38f766b54 Mon Sep 17 00:00:00 2001
From: Dan Moseley
Date: Thu, 19 Mar 2026 16:08:44 -0600
Subject: [PATCH 8/8] Clarify that . matches \r but not \n by default

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---
 docs/standard/base-types/regular-expression-options.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/standard/base-types/regular-expression-options.md b/docs/standard/base-types/regular-expression-options.md
index ef0547706776e..d928ac7446a1b 100644
--- a/docs/standard/base-types/regular-expression-options.md
+++ b/docs/standard/base-types/regular-expression-options.md
@@ -415,7 +415,7 @@ For more information about backtracking, see [Backtracking in regular expression
 ## AnyNewLine mode
-By default, .NET's regular expression engine treats only `\n` as a newline character. The anchors `^` and `$` (in mode), `\Z`, and the wildcard `.` all use `\n` as the sole line boundary. This means that `$` doesn't match before `\r\n` (Windows-style line endings), and `.` matches `\r` (unless is enabled, in which case `.` matches all characters), which leads to common bugs when processing text with mixed or non-Unix line endings.
+By default, .NET's regular expression engine treats only `\n` as a newline character. The anchors `^` and `$` (in mode), `\Z`, and the wildcard `.` all use `\n` as the sole line boundary. This means that `$` doesn't match before `\r\n` (Windows-style line endings), and `.` matches `\r` but not `\n` (unless is enabled, in which case `.` matches all characters), which leads to common bugs when processing text with mixed or non-Unix line endings.
 The option, which was introduced in .NET 11, makes these constructs recognize all common newline sequences: `\r\n` (CR+LF), `\r` (CR), `\n` (LF), `\u0085` (NEL), `\u2028` (LS), and `\u2029` (PS). This is consistent with [Unicode TR18 RL1.6](https://unicode.org/reports/tr18/#RL1.6).