From c60e92ce7ce88e97fdf2a397fb2751e01f4bd20a Mon Sep 17 00:00:00 2001
From: Eric Huss
Date: Wed, 25 Feb 2026 09:39:48 -0800
Subject: [PATCH] Remove RESERVED_NUMBER

This removes RESERVED_NUMBER and instead embeds the correctness
requirements for a number in the literal productions using the cut
operator.

I like the idea of inlining these requirements into the grammar so that
the restrictions are clear and nearer to the rules they are related to.

I'm on the fence about keeping the rules describing the restrictions. In
general we don't want to describe the grammar in English due to the
potential ambiguity. If we do this here, why wouldn't we do this
everywhere there is a cut operator? However, I think it's still helpful
to include in this case.

SUFFIX_NO_E is no longer needed because all of the other rules handle
rejecting the E.
---
 src/tokens.md | 105 ++++++++++++++++++++++++--------------------------
 1 file changed, 50 insertions(+), 55 deletions(-)

diff --git a/src/tokens.md b/src/tokens.md
index 0f0964bfce..7fe427a51c 100644
--- a/src/tokens.md
+++ b/src/tokens.md
@@ -113,9 +113,9 @@ A suffix is a sequence of characters following the primary part of a literal (wi
 
 r[lex.token.literal.suffix.syntax]
 ```grammar,lexer
-SUFFIX -> IDENTIFIER_OR_KEYWORD _except `_`_
-
-SUFFIX_NO_E -> ![`e` `E`] SUFFIX
+SUFFIX ->
+      `_` ^ XID_Continue+
+    | XID_Start XID_Continue*
 ```
 
 r[lex.token.literal.suffix.validity]
@@ -443,15 +443,16 @@ r[lex.token.literal.int]
 r[lex.token.literal.int.syntax]
 ```grammar,lexer
 INTEGER_LITERAL ->
-    ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL | DEC_LITERAL ) SUFFIX_NO_E?
+    ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL | DEC_LITERAL )
+    ^ !RESERVED_FLOAT SUFFIX?
 
 DEC_LITERAL -> DEC_DIGIT (DEC_DIGIT|`_`)*
 
-BIN_LITERAL -> `0b` `_`* BIN_DIGIT (BIN_DIGIT|`_`)*
+BIN_LITERAL -> `0b` ^ `_`* BIN_DIGIT (BIN_DIGIT|`_`)* ![`e` `E` `2`-`9`]
 
-OCT_LITERAL -> `0o` `_`* OCT_DIGIT (OCT_DIGIT|`_`)*
+OCT_LITERAL -> `0o` ^ `_`* OCT_DIGIT (OCT_DIGIT|`_`)* ![`e` `E` `8`-`9`]
 
-HEX_LITERAL -> `0x` `_`* HEX_DIGIT (HEX_DIGIT|`_`)*
+HEX_LITERAL -> `0x` ^ `_`* HEX_DIGIT (HEX_DIGIT|`_`)*
 
 BIN_DIGIT -> [`0`-`1`]
 
@@ -460,6 +461,8 @@ OCT_DIGIT -> [`0`-`7`]
 DEC_DIGIT -> [`0`-`9`]
 
 HEX_DIGIT -> [`0`-`9` `a`-`f` `A`-`F`]
+
+RESERVED_FLOAT -> `.` !(`.` | `_` | XID_Start)
 ```
 
 r[lex.token.literal.int.kind]
@@ -477,7 +480,7 @@ r[lex.token.literal.int.kind-oct]
 r[lex.token.literal.int.kind-bin]
 * A _binary literal_ starts with the character sequence `U+0030` `U+0062` (`0b`) and continues as any mixture (with at least one digit) of binary digits and underscores.
 
-r[lex.token.literal.int.restriction]
+r[lex.token.literal.int.suffix]
 Like any literal, an integer literal may be followed (immediately, without any spaces) by a suffix as described above. The suffix may not begin with `e` or `E`, as that would be interpreted as the exponent of a floating-point literal. See [Integer literal expressions] for the effect of these suffixes.
 
 Examples of integer literals which are accepted as literal expressions:
@@ -525,6 +528,37 @@ Examples of integer literals which are not accepted as literal expressions:
 # }
 ```
 
+r[lex.token.literal.int.invalid]
+##### Invalid integer literals
+
+r[lex.token.literal.int.invalid.intro]
+The following integer literal forms are invalid. To avoid ambiguity, they are rejected by the tokenizer as a whole rather than being split into separate tokens.
+
+> [!EXAMPLE]
+> ```rust,compile_fail
+> 0b0102; // this is not `0b010` followed by `2`
+> 0o1279; // this is not `0o127` followed by `9`
+> 0x80.0; // this is not `0x80` followed by `.` and `0`
+> 0b101e; // this is not a suffixed literal, nor `0b101` followed by `e`
+> 0b; // this is not an integer literal, nor `0` followed by `b`
+> 0b_; // this is not an integer literal, nor `0` followed by `b_`
+> 2em; // this is not a suffixed literal, nor `2` followed by `em`
+> 2.0em; // this is not a suffixed literal, nor `2.0` followed by `em`
+> ```
+
+r[lex.token.literal.int.out-of-range]
+It is an error to have an unsuffixed binary or octal literal followed, without intervening whitespace, by a decimal digit outside the range for its radix.
+
+r[lex.token.literal.int.period]
+It is an error to have an unsuffixed binary, octal, or hexadecimal literal followed, without intervening whitespace, by a period character (subject to the same restrictions on what may follow the period as in floating-point literals).
+
+r[lex.token.literal.int.exp]
+It is an error to have an unsuffixed binary or octal literal followed, without intervening whitespace, by the character `e` or `E`.
+
+r[lex.token.literal.int.empty-with-radix]
+It is an error for a radix prefix to not be followed, after any optional leading underscores, by at least one valid digit for its radix.
+
+
 r[lex.token.literal.int.tuple-field]
 #### Tuple index
 
@@ -559,7 +593,7 @@ r[lex.token.literal.float.syntax]
 ```grammar,lexer
 FLOAT_LITERAL ->
       DEC_LITERAL (`.` DEC_LITERAL)? FLOAT_EXPONENT SUFFIX?
-    | DEC_LITERAL `.` DEC_LITERAL SUFFIX_NO_E?
+    | DEC_LITERAL `.` DEC_LITERAL SUFFIX?
     | DEC_LITERAL `.` !(`.` | `_` | XID_Start)
 
 FLOAT_EXPONENT ->
@@ -601,53 +635,15 @@ Examples of floating-point literals which are not accepted as literal expression
 # }
 ```
 
-r[lex.token.literal.reserved]
-#### Reserved forms similar to number literals
+r[lex.token.literal.float.invalid-exponent]
+It is an error for a floating-point literal to have an exponent with no digits.
 
-r[lex.token.literal.reserved.syntax]
-```grammar,lexer
-RESERVED_NUMBER ->
-      BIN_LITERAL [`2`-`9`]
-    | OCT_LITERAL [`8`-`9`]
-    | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) `.` !(`.` | `_` | XID_Start)
-    | ( BIN_LITERAL | OCT_LITERAL ) (`e`|`E`)
-    | `0b` `_`* !BIN_DIGIT
-    | `0o` `_`* !OCT_DIGIT
-    | `0x` `_`* !HEX_DIGIT
-```
 
-r[lex.token.literal.reserved.intro]
-The following lexical forms similar to number literals are _reserved forms_. Due to the possible ambiguity these raise, they are rejected by the tokenizer instead of being interpreted as separate tokens.
-
-r[lex.token.literal.reserved.out-of-range]
-* An unsuffixed binary or octal literal followed, without intervening whitespace, by a decimal digit out of the range for its radix.
-
-r[lex.token.literal.reserved.period]
-* An unsuffixed binary, octal, or hexadecimal literal followed, without intervening whitespace, by a period character (with the same restrictions on what follows the period as for floating-point literals).
-
-r[lex.token.literal.reserved.exp]
-* An unsuffixed binary or octal literal followed, without intervening whitespace, by the character `e` or `E`.
-
-r[lex.token.literal.reserved.empty-with-radix]
-* Input which begins with one of the radix prefixes but is not a valid binary, octal, or hexadecimal literal (because it contains no digits).
-
-r[lex.token.literal.reserved.empty-exp]
-* Input which has the form of a floating-point literal with no digits in the exponent.
-
-Examples of reserved forms:
-
-```rust,compile_fail
-0b0102; // this is not `0b010` followed by `2`
-0o1279; // this is not `0o127` followed by `9`
-0x80.0; // this is not `0x80` followed by `.` and `0`
-0b101e; // this is not a suffixed literal, or `0b101` followed by `e`
-0b; // this is not an integer literal, or `0` followed by `b`
-0b_; // this is not an integer literal, or `0` followed by `b_`
-2e; // this is not a floating-point literal, or `2` followed by `e`
-2.0e; // this is not a floating-point literal, or `2.0` followed by `e`
-2em; // this is not a suffixed literal, or `2` followed by `em`
-2.0em; // this is not a suffixed literal, or `2.0` followed by `em`
-```
+> [!EXAMPLE]
+> ```rust,compile_fail
+> 2e; // this is not a floating-point literal, nor `2` followed by `e`
+> 2.0e; // this is not a floating-point literal, nor `2.0` followed by `e`
+> ```
 
 r[lex.token.life]
 ## Lifetimes and loop labels
@@ -771,7 +767,6 @@ r[lex.token.reserved.syntax]
 ```grammar,lexer
 RESERVED_TOKEN ->
       RESERVED_GUARDED_STRING_LITERAL
-    | RESERVED_NUMBER
     | RESERVED_POUNDS
     | RESERVED_RAW_IDENTIFIER
     | RESERVED_RAW_LIFETIME
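To illustrate how the cut after a radix prefix turns the old reserved forms into hard errors, here is a minimal Rust sketch of a hand-written lexer for the radix-prefixed branches of the new `INTEGER_LITERAL` production. It is a hypothetical illustration, not rustc's lexer and not Reference text. The names `lex_radix_int`, `lex_after_prefix`, `suffix_len`, and `LexError` are invented for this sketch, and `XID_Start`/`XID_Continue` are approximated by ASCII letters and ASCII alphanumerics plus `_` (an assumption; the real classes are Unicode-based).

```rust
/// Errors for the radix-literal rules in this patch.
/// The type and variant names are hypothetical.
#[derive(Debug, PartialEq)]
pub enum LexError {
    EmptyWithRadix, // r[lex.token.literal.int.empty-with-radix]
    OutOfRange,     // r[lex.token.literal.int.out-of-range]
    Exponent,       // r[lex.token.literal.int.exp]
    Period,         // r[lex.token.literal.int.period]
    BadSuffix,      // `_` with no XID_Continue after it in SUFFIX
}

/// Lexes a `0b`/`0o`/`0x` literal at the start of `src`.
/// Returns `None` when there is no radix prefix; otherwise the literal text
/// and its (possibly empty) suffix, or the error the cut commits to.
pub fn lex_radix_int(src: &str) -> Option<Result<(&str, &str), LexError>> {
    let radix = if src.starts_with("0b") {
        2
    } else if src.starts_with("0o") {
        8
    } else if src.starts_with("0x") {
        16
    } else {
        return None; // not a radix literal; other productions apply
    };
    // The `^` after the prefix: from here on a mismatch is an error rather
    // than a backtrack that would lex `0` followed by an identifier.
    Some(lex_after_prefix(src, radix))
}

fn lex_after_prefix(src: &str, radix: u32) -> Result<(&str, &str), LexError> {
    let bytes = src.as_bytes();
    let is_digit = |c: u8| (c as char).is_digit(radix);
    let mut i = 2; // skip the two-byte prefix
    // `_`*
    while i < bytes.len() && bytes[i] == b'_' {
        i += 1;
    }
    // At least one digit valid for the radix.
    if i >= bytes.len() || !is_digit(bytes[i]) {
        return Err(LexError::EmptyWithRadix);
    }
    // (DIGIT | `_`)*
    while i < bytes.len() && (is_digit(bytes[i]) || bytes[i] == b'_') {
        i += 1;
    }
    let next = bytes.get(i).copied();
    let after = bytes.get(i + 1).copied();
    // Binary and octal only: ![`e` `E` <decimal digits outside the radix>]
    if radix != 16 {
        match next {
            Some(b'e') | Some(b'E') => return Err(LexError::Exponent),
            Some(c) if c.is_ascii_digit() => return Err(LexError::OutOfRange),
            _ => {}
        }
    }
    // !RESERVED_FLOAT, where RESERVED_FLOAT -> `.` !(`.` | `_` | XID_Start)
    let period_ok = matches!(after, Some(b'.') | Some(b'_'))
        || after.map_or(false, |c| c.is_ascii_alphabetic());
    if next == Some(b'.') && !period_ok {
        return Err(LexError::Period);
    }
    let (literal, rest) = src.split_at(i);
    let n = suffix_len(rest)?;
    Ok((literal, &rest[..n]))
}

/// Length of a SUFFIX at the start of `rest` (0 if there is none):
/// SUFFIX -> `_` ^ XID_Continue+ | XID_Start XID_Continue*
fn suffix_len(rest: &str) -> Result<usize, LexError> {
    let bytes = rest.as_bytes();
    // Number of XID_Continue bytes from index 1 onward.
    let tail = || {
        bytes[1..]
            .iter()
            .take_while(|&&c| c.is_ascii_alphanumeric() || c == b'_')
            .count()
    };
    match bytes.first().copied() {
        // Cut after `_`: at least one XID_Continue must follow. (Unreachable
        // from `lex_after_prefix`, whose digit loop already consumes `_`.)
        Some(b'_') => match tail() {
            0 => Err(LexError::BadSuffix),
            n => Ok(1 + n),
        },
        Some(c) if c.is_ascii_alphabetic() => Ok(1 + tail()),
        _ => Ok(0),
    }
}
```

For instance, `lex_radix_int("0x80.0")` yields `Some(Err(LexError::Period))`, while `lex_radix_int("0x1..2")` yields `Some(Ok(("0x1", "")))`, leaving `..` for the next token. Decimal literals are deliberately left out: rejecting forms such as `2em` goes through `FLOAT_LITERAL` and its exponent rule, which this patch shows only as context.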