4 changes: 4 additions & 0 deletions docs/standard/base-types/anchors-in-regular-expressions.md
@@ -63,6 +63,8 @@ Anchors, or atomic zero-width assertions, specify a position in the string where
The `$` anchor specifies that the preceding pattern must occur at the end of the input string, or before `\n` at the end of the input string.

If you use `$` with the <xref:System.Text.RegularExpressions.RegexOptions.Multiline?displayProperty=nameWithType> option, the match can also occur at the end of a line. Note that `$` is satisfied at `\n` but not at `\r\n` (the combination of carriage return and newline characters, or CR/LF). To handle the CR/LF character combination, include `\r?$` in the regular expression pattern. Note that `\r?$` will include any `\r` in the match.

Starting with .NET 11, you can use <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine?displayProperty=nameWithType> to make `$` recognize all common newline sequences instead of only `\n`. Unlike the `\r?$` workaround, `AnyNewLine` treats `\r\n` as an atomic sequence, so `\r` is not included in the match. For more information, see [AnyNewLine mode](regular-expression-options.md#anynewline-mode).
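
As a minimal sketch of the `\r?$` workaround described above, the following snippet uses a hypothetical three-line input with CR/LF line endings. It also shows one possible variant, a lookahead `(?=\r?$)`, which checks for the optional `\r` without consuming it. The input string and variable names are illustrative assumptions, not part of the article's sample.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical sample input with Windows (CR/LF) line endings.
string input = "alpha\r\nbeta\r\ngamma";

// Workaround from the text: \r?$ finds every line, but \r becomes part of the match.
MatchCollection withCr = Regex.Matches(input, @"\w+\r?$", RegexOptions.Multiline);

// Variant: the lookahead tests for the optional \r without adding it to the match.
MatchCollection withoutCr = Regex.Matches(input, @"\w+(?=\r?$)", RegexOptions.Multiline);

Console.WriteLine(withCr[0].Value.Length);    // 6: "alpha" plus the trailing \r
Console.WriteLine(withoutCr[0].Value.Length); // 5: "alpha" only
```

The lookahead form keeps the matched text clean at the cost of a slightly longer pattern.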

The following example adds the `$` anchor to the regular expression pattern used in the example in the [Start of String or Line](#start-of-string-or-line-) section. When used with the original input string, which includes five lines of text, the <xref:System.Text.RegularExpressions.Regex.Matches%28System.String%2CSystem.String%29?displayProperty=nameWithType> method is unable to find a match, because the end of the first line does not match the `$` pattern. When the original input string is split into a string array, the <xref:System.Text.RegularExpressions.Regex.Matches%28System.String%2CSystem.String%29?displayProperty=nameWithType> method succeeds in matching each of the five lines. When the <xref:System.Text.RegularExpressions.Regex.Matches%28System.String%2CSystem.String%2CSystem.Text.RegularExpressions.RegexOptions%29?displayProperty=nameWithType> method is called with the `options` parameter set to <xref:System.Text.RegularExpressions.RegexOptions.Multiline?displayProperty=nameWithType>, no matches are found because the regular expression pattern does not account for the carriage return character `\r`. However, when the regular expression pattern is modified by replacing `$` with `\r?$`, calling the <xref:System.Text.RegularExpressions.Regex.Matches%28System.String%2CSystem.String%2CSystem.Text.RegularExpressions.RegexOptions%29?displayProperty=nameWithType> method with the `options` parameter set to <xref:System.Text.RegularExpressions.RegexOptions.Multiline?displayProperty=nameWithType> again finds five matches.

@@ -83,6 +85,8 @@ Anchors, or atomic zero-width assertions, specify a position in the string where
The `\Z` anchor specifies that a match must occur at the end of the input string, or before `\n` at the end of the input string. It is identical to the `$` anchor, except that `\Z` ignores the <xref:System.Text.RegularExpressions.RegexOptions.Multiline?displayProperty=nameWithType> option. Therefore, in a multiline string, it can only be satisfied by the end of the last line, or the last line before `\n`.

Note that `\Z` is satisfied at `\n` but is not satisfied at `\r\n` (the CR/LF character combination). To treat CR/LF as if it were `\n`, include `\r?\Z` in the regular expression pattern. Note that this will make the `\r` part of the match.

Starting with .NET 11, you can use <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine?displayProperty=nameWithType> to make `\Z` recognize all common newline sequences instead of only `\n`. Unlike the `\r?\Z` workaround, `AnyNewLine` treats `\r\n` as an atomic sequence, so `\r` is not included in the match. For more information, see [AnyNewLine mode](regular-expression-options.md#anynewline-mode).
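
The difference between `\Z` and `\r?\Z` can be sketched as follows. The input strings are hypothetical and chosen only to illustrate the behavior described above.

```csharp
using System;
using System.Text.RegularExpressions;

// \Z is satisfied before a final \n, but not before a final \r\n.
bool lfOnly = Regex.IsMatch("1901-1918\n", @"\d{4}\Z");
bool crLf = Regex.IsMatch("1901-1918\r\n", @"\d{4}\Z");

// The workaround matches, but the \r is now part of the matched text.
Match workaround = Regex.Match("1901-1918\r\n", @"\d{4}\r?\Z");

Console.WriteLine(lfOnly);                       // True
Console.WriteLine(crLf);                         // False
Console.WriteLine(workaround.Value == "1918\r"); // True
```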

The following example uses the `\Z` anchor in a regular expression that is similar to the example in the [Start of String or Line](#start-of-string-or-line-) section, which extracts information about the years during which some professional baseball teams existed. The subexpression `\r?\Z` in the regular expression `^((\w+(\s?)){2,}),\s(\w+\s\w+),(\s\d{4}(-(\d{4}|present))?,?)+\r?\Z` is satisfied at the end of a string, and also at the end of a string that ends with `\n` or `\r\n`. As a result, each element in the array matches the regular expression pattern.

@@ -162,13 +162,15 @@ The period character (.) matches any character except `\n` (the newline characte

- If a regular expression pattern is modified by the <xref:System.Text.RegularExpressions.RegexOptions.Singleline?displayProperty=nameWithType> option, or if the portion of the pattern that contains the `.` character class is modified by the `s` option, `.` matches any character. For more information, see [Regular Expression Options](regular-expression-options.md).

- Starting with .NET 11, if the <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine?displayProperty=nameWithType> option is specified, `.` excludes all common newline sequences instead of only `\n`. If both `Singleline` and `AnyNewLine` are specified, `Singleline` takes precedence and `.` matches every character. For more information, see [AnyNewLine mode](regular-expression-options.md#anynewline-mode).

The following example illustrates the different behavior of the `.` character class by default and with the <xref:System.Text.RegularExpressions.RegexOptions.Singleline?displayProperty=nameWithType> option. The regular expression `^.+` starts at the beginning of the string and matches every character. By default, the match ends at the end of the first line; the regular expression pattern matches the carriage return character, `\r`, but it doesn't match `\n`. Because the <xref:System.Text.RegularExpressions.RegexOptions.Singleline?displayProperty=nameWithType> option interprets the entire input string as a single line, it matches every character in the input string, including `\n`.

:::code language="csharp" source="snippets/character-classes-in-regular-expressions/csharp/Program.cs" id="AnyCharacterMultiline":::
:::code language="vb" source="snippets/character-classes-in-regular-expressions/vb/Program.vb" id="AnyCharacterMultiline":::

> [!NOTE]
> Because it matches any character except `\n`, the `.` character class also matches `\r` (the carriage return character).
> By default, because it matches any character except `\n`, the `.` character class also matches `\r` (the carriage return character). With <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine?displayProperty=nameWithType>, `.` excludes `\r` and other newline sequences as well.
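
As a short sketch of the default behavior noted above, the following snippet compares `.` with and without `Singleline` on a hypothetical CR/LF input. It also shows a negated character class that excludes `\r` explicitly, which is one way to avoid the stray carriage return without relying on newer options.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical two-line input with a CR/LF line ending.
string input = "first line\r\nsecond line";

Match byDefault = Regex.Match(input, "^.+");                            // stops before \n
Match singleline = Regex.Match(input, "^.+", RegexOptions.Singleline);  // matches everything
Match noCr = Regex.Match(input, @"^[^\r\n]+");                          // excludes \r as well

Console.WriteLine(byDefault.Value.Length);   // 11: "first line" plus \r
Console.WriteLine(singleline.Value == input); // True
Console.WriteLine(noCr.Value.Length);        // 10: "first line" only
```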

- In a positive or negative character group, a period is treated as a literal period character, and not as a character class. For more information, see [Positive Character Group](#PositiveGroup) and [Negative Character Group](#NegativeGroup) earlier in this article. The following example provides an illustration by defining a regular expression that includes the period character (`.`) both as a character class and as a member of a positive character group. The regular expression `\b.*[.?!;:](\s|\z)` begins at a word boundary, matches any character until it encounters one of five punctuation marks, including a period, and then matches either a white-space character or the end of the string.

50 changes: 48 additions & 2 deletions docs/standard/base-types/regular-expression-options.md
@@ -30,6 +30,7 @@ By default, the comparison of an input string with any literal characters in a r
| <xref:System.Text.RegularExpressions.RegexOptions.ECMAScript> | Not available | Enable ECMAScript-compliant behavior for the expression. | [ECMAScript matching behavior](#ecmascript-matching-behavior) |
| <xref:System.Text.RegularExpressions.RegexOptions.CultureInvariant> | Not available | Ignore cultural differences in language. | [Comparison using the invariant culture](#compare-using-the-invariant-culture) |
| <xref:System.Text.RegularExpressions.RegexOptions.NonBacktracking> | Not available | Match using an approach that avoids backtracking and guarantees linear-time processing in the length of the input. (Available in .NET 7 and later versions.)| [Nonbacktracking mode](#nonbacktracking-mode) |
| <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine> | Not available | Make `^`, `$`, `\Z`, and `.` recognize all common newline sequences instead of only `\n`. (Available in .NET 11 and later versions.) | [AnyNewLine mode](#anynewline-mode) |

## Specify options

@@ -75,7 +76,7 @@ The following five regular expression options can be set both with the options p

- <xref:System.Text.RegularExpressions.RegexOptions.IgnorePatternWhitespace?displayProperty=nameWithType>

The following five regular expression options can be set using the `options` parameter but cannot be set inline:
The following seven regular expression options can be set using the `options` parameter but cannot be set inline:

- <xref:System.Text.RegularExpressions.RegexOptions.None?displayProperty=nameWithType>

@@ -87,6 +88,10 @@ The following five regular expression options can be set using the `options` par

- <xref:System.Text.RegularExpressions.RegexOptions.ECMAScript?displayProperty=nameWithType>

- <xref:System.Text.RegularExpressions.RegexOptions.NonBacktracking?displayProperty=nameWithType>

- <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine?displayProperty=nameWithType>

## Determine options

You can determine which options were provided to a <xref:System.Text.RegularExpressions.Regex> object when it was instantiated by retrieving the value of the read-only <xref:System.Text.RegularExpressions.Regex.Options%2A?displayProperty=nameWithType> property.
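
A minimal sketch of reading the `Options` property might look like the following. The pattern and the variable names are illustrative assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

var rx = new Regex(@"^\w+$", RegexOptions.IgnoreCase | RegexOptions.Multiline);

// RegexOptions is a flags enumeration, so test individual options with a bitwise AND
// or with Enum.HasFlag.
bool isMultiline = (rx.Options & RegexOptions.Multiline) != 0;
bool isSingleline = rx.Options.HasFlag(RegexOptions.Singleline);

Console.WriteLine(isMultiline);  // True
Console.WriteLine(isSingleline); // False
```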
@@ -150,6 +155,9 @@ By default, `$` will be satisfied only at the end of the input string. If you sp

In neither case does `$` recognize the carriage return/line feed character combination (`\r\n`). `$` always ignores any carriage return (`\r`). To end your match with either `\r\n` or `\n`, use the subexpression `\r?$` instead of just `$`. Note that this will make the `\r` part of the match.

> [!TIP]
> Starting with .NET 11, you can use <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine?displayProperty=nameWithType> to make `^`, `$`, `\Z`, and `.` recognize all common newline sequences instead of only `\n`, removing the need for `\r?` workarounds. `AnyNewLine` also treats `\r\n` as an atomic newline sequence, so `\r` is never included in the match. For more information, see the [AnyNewLine mode](#anynewline-mode) section.
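
When `\r?$` is used, the `\r` ends up in the overall match but not in any capturing group that closes before it. The following sketch illustrates this with a hypothetical input that uses CR/LF line endings. The names and scores are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical input with Windows (CR/LF) line endings.
string input = "Joe 164\r\nSam 208\r\nAllison 211";

var scores = new List<(string Name, int Score)>();
foreach (Match m in Regex.Matches(input, @"^(\w+)\s(\d+)\r?$", RegexOptions.Multiline))
{
    // The groups close before \r?, so the captured values contain no carriage return.
    scores.Add((m.Groups[1].Value, int.Parse(m.Groups[2].Value)));
}

Console.WriteLine(scores.Count);   // 3
Console.WriteLine(scores[0].Name); // Joe
```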

The following example extracts bowlers' names and scores and adds them to a <xref:System.Collections.Generic.SortedList%602> collection that sorts them in descending order. The <xref:System.Text.RegularExpressions.Regex.Matches%2A> method is called twice. In the first method call, the regular expression is `^(\w+)\s(\d+)$` and no options are set. As the output shows, because the regular expression engine cannot match the input pattern along with the beginning and end of the input string, no matches are found. In the second method call, the regular expression is changed to `^(\w+)\s(\d+)\r?$` and the options are set to <xref:System.Text.RegularExpressions.RegexOptions.Multiline?displayProperty=nameWithType>. As the output shows, the names and scores are successfully matched, and the scores are displayed in descending order.

[!code-csharp[Conceptual.Regex.Language.Options#3](../../../samples/snippets/csharp/VS_Snippets_CLR/conceptual.regex.language.options/cs/multiline1.cs#3)]
@@ -392,7 +400,7 @@ The following example is identical to the previous example, except that the stat

By default, .NET's regex engine uses *backtracking* to try to find pattern matches. A backtracking engine is one that tries to match one pattern, and if that fails, goes back and tries to match an alternate pattern, and so on. A backtracking engine is very fast for typical cases, but slows down as the number of pattern alternations increases, which can lead to *catastrophic backtracking*. The <xref:System.Text.RegularExpressions.RegexOptions.NonBacktracking?displayProperty=nameWithType> option, which was introduced in .NET 7, doesn't use backtracking and avoids that worst-case scenario. Its goal is to provide consistently good behavior, regardless of the input being searched.

The <xref:System.Text.RegularExpressions.RegexOptions.NonBacktracking?displayProperty=nameWithType> option doesn't support everything the other built-in engines support. In particular, the option can't be used in conjunction with <xref:System.Text.RegularExpressions.RegexOptions.RightToLeft?displayProperty=nameWithType> or <xref:System.Text.RegularExpressions.RegexOptions.ECMAScript?displayProperty=nameWithType>. It also doesn't allow for the following constructs in the pattern:
The <xref:System.Text.RegularExpressions.RegexOptions.NonBacktracking?displayProperty=nameWithType> option doesn't support everything the other built-in engines support. In particular, the option can't be used in conjunction with <xref:System.Text.RegularExpressions.RegexOptions.RightToLeft?displayProperty=nameWithType>, <xref:System.Text.RegularExpressions.RegexOptions.ECMAScript?displayProperty=nameWithType>, or <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine?displayProperty=nameWithType>. It also doesn't allow for the following constructs in the pattern:

- Atomic groups
- Backreferences
@@ -405,6 +413,44 @@ The <xref:System.Text.RegularExpressions.RegexOptions.NonBacktracking?displayPro

For more information about backtracking, see [Backtracking in regular expressions](backtracking-in-regular-expressions.md).
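
The following sketch, which assumes .NET 7 or later, applies `NonBacktracking` to a pattern with nested quantifiers, a shape that is commonly associated with catastrophic backtracking. It then tries one of the unsupported constructs listed above (a backreference). The input and patterns are hypothetical, and the exception type caught here is based on the behavior of current .NET releases rather than on this article.

```csharp
using System;
using System.Text.RegularExpressions;

// The input can't match because it ends with '!'.
string input = new string('a', 40) + "!";

var linear = new Regex(@"^(a+)+$", RegexOptions.NonBacktracking);
bool matched = linear.IsMatch(input);
Console.WriteLine(matched); // False

// Backreferences are among the unsupported constructs, so construction fails.
bool rejected = false;
try
{
    _ = new Regex(@"(\w)\1", RegexOptions.NonBacktracking);
}
catch (NotSupportedException)
{
    rejected = true;
}
Console.WriteLine(rejected); // True
```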

## AnyNewLine mode

By default, .NET's regular expression engine treats only `\n` as a newline character. The anchors `^` and `$` (in <xref:System.Text.RegularExpressions.RegexOptions.Multiline?displayProperty=nameWithType> mode), `\Z`, and the wildcard `.` all use `\n` as the sole line boundary. As a result, `$` doesn't match before `\r\n` (Windows-style line endings), and `.` matches `\r` but not `\n` unless <xref:System.Text.RegularExpressions.RegexOptions.Singleline?displayProperty=nameWithType> is enabled, in which case `.` matches all characters. Both behaviors are a common source of bugs when processing text with mixed or non-Unix line endings.

The <xref:System.Text.RegularExpressions.RegexOptions.AnyNewLine?displayProperty=nameWithType> option, which was introduced in .NET 11, makes these constructs recognize all common newline sequences: `\r\n` (CR+LF), `\r` (CR), `\n` (LF), `\u0085` (NEL), `\u2028` (LS), and `\u2029` (PS). This is consistent with [Unicode TR18 RL1.6](https://unicode.org/reports/tr18/#RL1.6).
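
To make the list of sequences concrete, the following sketch splits a hypothetical string on the same six newline sequences by using an explicit pattern. It doesn't use `AnyNewLine` and only illustrates which sequences are involved. Placing `\r\n` first in the alternation keeps CR+LF together as one separator.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input that mixes all six newline sequences listed above.
string text = "one\r\ntwo\rthree\nfour\u0085five\u2028six\u2029seven";

// \r\n is tried first so that CR+LF counts as a single separator.
string[] lines = Regex.Split(text, @"\r\n|[\r\n\u0085\u2028\u2029]");

Console.WriteLine(lines.Length); // 7
Console.WriteLine(lines[1]);     // two
```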

For example, without `AnyNewLine`, the pattern `.+$` includes the trailing `\r` when the input uses Windows line endings, because `.` matches `\r` and `$` is satisfied only before `\n`:

```csharp
// BUG: .+$ captures trailing \r on Windows line endings
var match = Regex.Match("foo\r\nbar", @".+$", RegexOptions.Multiline);
Console.WriteLine(match.Value); // "foo\r" -- not "foo"!
```

With `AnyNewLine`, the anchors handle all newline types automatically:

```csharp
var match = Regex.Match("foo\r\nbar", @".+$",
    RegexOptions.Multiline | RegexOptions.AnyNewLine);
Console.WriteLine(match.Value); // "foo"
```

The following table summarizes how `AnyNewLine` affects each construct:

| Construct | Default behavior | With `AnyNewLine` |
|----------------------|--------------------------------------|-----------------------------------------------------------------------|
| `.` (default) | Matches any character except `\n` | Matches any character except `\r`, `\n`, `\u0085`, `\u2028`, `\u2029` |
| `$` (Multiline) | Matches before `\n` | Matches before `\r\n`, `\r`, `\n`, `\u0085`, `\u2028`, `\u2029` |
| `^` (Multiline) | Matches after `\n` | Matches after `\r\n`, `\r`, `\n`, `\u0085`, `\u2028`, `\u2029` |
| `$` (default) / `\Z` | Matches before `\n` at end of string | Matches before any newline sequence at end of string |
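
On runtimes that don't have `AnyNewLine`, the `.` and `$` rows of the table can be roughly approximated with an explicit character class and a lookahead. The following is only a sketch under that assumption. The pattern is hand-written, isn't what the option does internally, and uses hypothetical input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Approximates "." with a negated class and "$" with a lookahead for any
// newline sequence or the end of the string.
const string Newline = @"\r\n|[\r\n\u0085\u2028\u2029]";
string pattern = $@"[^\r\n\u0085\u2028\u2029]+(?={Newline}|\z)";

string input = "foo\r\nbar\rbaz\u2028qux";
string[] lines = Regex.Matches(input, pattern).Select(m => m.Value).ToArray();

Console.WriteLine(string.Join("|", lines)); // foo|bar|baz|qux
```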

Key design points:

- **`\r\n` is treated atomically**: `$` matches before the full `\r\n` sequence, never between `\r` and `\n`.
- **`Singleline` takes precedence**: `.` with both `Singleline` and `AnyNewLine` matches every character (including newlines), consistent with `Singleline`'s existing behavior.
- **`\A` and `\z` are unaffected**: Absolute start-of-string and end-of-string anchors don't change.
- **Incompatible options**: `AnyNewLine` cannot be combined with <xref:System.Text.RegularExpressions.RegexOptions.NonBacktracking?displayProperty=nameWithType> or <xref:System.Text.RegularExpressions.RegexOptions.ECMAScript?displayProperty=nameWithType>. Attempting to do so throws an <xref:System.ArgumentOutOfRangeException>.

## See also

- [Regular Expression Language - Quick Reference](regular-expression-language-quick-reference.md)