parser/lexer: bump to Unicode 17, use faster unicode-ident #148321

Merged: bors merged 2 commits into rust-lang:main from Marcondiro:master on Dec 29, 2025

@Marcondiro (Contributor) commented Oct 31, 2025

Hello,

Bump the Unicode version used by the lexer/parser to 17.0.0 by updating:

  • unicode-normalization to 0.1.25
  • unicode-properties to 0.1.4
  • unicode-width to 0.2.2

and by replacing unicode-xid with unicode-ident, which is also 6 times faster.
It might be worth running the benchmarks to double-check.
(unicode-ident is already listed in src/tools/tidy/src/deps.rs)

Thanks!

@rustbot (Collaborator) commented Oct 31, 2025

The list of allowed third-party dependencies may have been modified! You must ensure that any new dependencies have compatible licenses before merging.

cc @davidtwco, @wesleywiser

These commits modify the Cargo.lock file. Unintentional changes to Cargo.lock can be introduced when switching branches and rebasing PRs.

If this was unintentional then you should revert the changes before this PR is merged.
Otherwise, you can ignore this comment.

@rustbot added the A-tidy and S-waiting-on-review labels on Oct 31, 2025
@rustbot added the T-bootstrap and T-compiler labels on Oct 31, 2025
@rustbot (Collaborator) commented Oct 31, 2025

r? @Mark-Simulacrum

rustbot has assigned @Mark-Simulacrum.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot (Collaborator) commented Oct 31, 2025

If the Unicode version changes are intentional,
it should also be updated in the reference at
https://github.com/rust-lang/reference/blob/HEAD/src/identifiers.md.

cc @ehuss

@Kobzol (Member) commented Oct 31, 2025

@bors try @rust-timer queue

(Parsing could be affected too).

rust-bors bot added a commit that referenced this pull request Oct 31, 2025
parser/lexer: bump to Unicode 17, use faster unicode-ident
@rustbot added the S-waiting-on-perf label on Oct 31, 2025
@rust-bors (bot, Contributor) commented Oct 31, 2025

☀️ Try build successful (CI)
Build commit: 988451c (988451ce73b832a095adca69acf309ce27a2f54d, parent: 23c7bad921fb7163de37ea680bed317deaa03fda)

@rust-timer (Collaborator) commented:

Finished benchmarking commit (988451c): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 0.2% | [0.1%, 0.3%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results (primary 2.6%, secondary 3.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 2.6% | [2.6%, 2.6%] | 1 |
| Regressions ❌ (secondary) | 3.6% | [3.6%, 3.6%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 2.6% | [2.6%, 2.6%] | 1 |

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 473.971s -> 474.835s (0.18%)
Artifact size: 390.89 MiB -> 390.89 MiB (-0.00%)
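
For readers checking the numbers, the bootstrap percentage appears to be the plain relative change between the two timings. A small sketch of that arithmetic follows; the helper name `percent_change` is made up for illustration and is not part of rust-timer.

```rust
/// Relative change from `before` to `after`, expressed in percent.
/// Hypothetical helper, used here only to illustrate the arithmetic.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // Bootstrap timings quoted in the report, in seconds.
    let change = percent_change(473.971, 474.835);
    // Rounded to two decimals, matching the precision used in the report.
    println!("bootstrap: {change:.2}%");
}
```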

@rustbot removed the S-waiting-on-perf label on Oct 31, 2025
@traviscross added the T-lang, I-lang-nominated, I-lang-easy-decision, I-lang-radar, and P-lang-drag-1 labels on Nov 2, 2025
@clarfonthey (Contributor) commented:

Is there a reason why the reference explicitly specifies the Unicode version in a way that makes it feel like updating that version is a nontrivial change?

i.e., is there a reason why it does not clarify that the Unicode version in the compiler is allowed to be (and should be) bumped whenever Unicode releases a new version, and to simply say something like "it is version N as of Rust 1.M"?

@crlf0710 added the A-Unicode label on Nov 4, 2025
@traviscross removed the T-compiler and T-bootstrap labels on Nov 5, 2025
@joshtriplett (Member) commented Nov 5, 2025

I think it needs some level of review, to make sure that (for instance) the new Unicode version isn't doing anything out of the ordinary, and to make sure that some person in the project experienced with Unicode has taken at least a cursory look at the changes to XID_Start/XID_Continue and any related changes to confusables that overlap with XID_Start/XID_Continue.

I don't think that review is best done in lang. I think we should delegate that to whichever team is making sure of the above. (Is that T-compiler?) So, ideally, I'd love to see a proposal to lang requesting a delegation to take responsibility for the above.

That said, let's go ahead and sign off on this change to unblock it.

@Marcondiro (Contributor, Author) commented Dec 13, 2025

Hello, I added some compile-time checks as suggested by @clarfonthey.
Specifically, compilation of rustc_parse and rustc_lexer now fails if their dependencies use different Unicode versions.
I did not implement the checks across the entire rust repository, only in the crates modified by this PR. The remaining checks could be addressed in another PR.
The resulting error output might not be ideal, but I stuck to stable const Rust. Suggestions on this or other parts of the PR are welcome.
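
A minimal sketch of what such a compile-time check can look like on stable Rust. This is not the code from the PR: the three modules below are hypothetical stand-ins for the real Unicode crates, and the sketch assumes each of them exposes a `UNICODE_VERSION` constant as a `(u8, u8, u8)` tuple.

```rust
// Hypothetical stand-ins for the Unicode-related dependencies. Each one
// exposes the Unicode version its tables were generated from (assumed shape).
mod ident_tables {
    pub const UNICODE_VERSION: (u8, u8, u8) = (17, 0, 0);
}
mod normalization_tables {
    pub const UNICODE_VERSION: (u8, u8, u8) = (17, 0, 0);
}
mod width_tables {
    pub const UNICODE_VERSION: (u8, u8, u8) = (17, 0, 0);
}

/// Tuple equality written out field by field, because `==` on tuples
/// cannot be called in a `const fn` on stable Rust.
const fn same_version(a: (u8, u8, u8), b: (u8, u8, u8)) -> bool {
    a.0 == b.0 && a.1 == b.1 && a.2 == b.2
}

// Evaluated at compile time: if any version differs, the build fails
// and the compiler reports the corresponding panic message.
const _: () = {
    let expected = ident_tables::UNICODE_VERSION;
    assert!(
        same_version(expected, normalization_tables::UNICODE_VERSION),
        "normalization tables use a different Unicode version"
    );
    assert!(
        same_version(expected, width_tables::UNICODE_VERSION),
        "width tables use a different Unicode version"
    );
};

fn main() {
    let (major, minor, patch) = ident_tables::UNICODE_VERSION;
    println!("all tables agree on Unicode {major}.{minor}.{patch}");
}
```

Changing one of the stand-in constants to, say, `(16, 0, 0)` turns the mismatch into a build error rather than a runtime surprise. The message is a fixed string because formatted panic messages are not available in const contexts on stable, which is likely what the remark about the error output refers to.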

r? @Mark-Simulacrum

Marcondiro and others added 2 commits December 27, 2025 11:20
Replace unicode-xid with unicode-ident which is 6 times faster
Add a compile time check in rustc_lexer and rustc_parse ensuring that unicode-related dependencies within the crate use the same unicode version.
These checks are inspired by the examples provided by @clarfonthey.
@rustbot (Collaborator) commented Dec 27, 2025

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@crlf0710 (Member) commented:

I think that bumping the Unicode version, since it is a routine change, should have some sort of playbook so we don't need to have these discussions again.

#101840 could be an outdated (year 2022) example of such a playbook.

@Mark-Simulacrum (Member) commented:

@bors r+ rollup

@bors (Collaborator) commented Dec 28, 2025

📌 Commit f7cb82e has been approved by Mark-Simulacrum

It is now in the queue for this repository.

@bors added the S-waiting-on-bors label and removed the S-waiting-on-review label on Dec 28, 2025
JonathanBrouwer added a commit to JonathanBrouwer/rust that referenced this pull request Dec 28, 2025
parser/lexer: bump to Unicode 17, use faster unicode-ident

bors added a commit that referenced this pull request Dec 28, 2025
…uwer

Rollup of 8 pull requests

Successful merges:

 - #148321 (parser/lexer: bump to Unicode 17, use faster unicode-ident)
 - #149540 (std: sys: fs: uefi: Implement readdir)
 - #149582 (Implement `Duration::div_duration_{floor,ceil}`)
 - #149663 (Optimized implementation for uN::{gather,scatter}_bits)
 - #149667 (Fix ICE by rejecting const blocks in patterns during AST lowering (closes #148138))
 - #149947 (add several older crashtests)
 - #150011 (Add more `unbounded_sh[lr]` examples)
 - #150411 (refactor `destructure_const`)

r? `@ghost`
`@rustbot` modify labels: rollup
@bors merged commit 30618bb into rust-lang:main on Dec 29, 2025
11 checks passed
@rustbot added this to the 1.94.0 milestone on Dec 29, 2025
rust-timer added a commit that referenced this pull request Dec 29, 2025
Rollup merge of #148321 - Marcondiro:master, r=Mark-Simulacrum

parser/lexer: bump to Unicode 17, use faster unicode-ident

@JonathanBrouwer (Contributor) commented:

This PR regressed perf in the rollup
#150469 (comment)

@JonathanBrouwer added the perf-regression label on Dec 29, 2025
jhpratt added a commit to jhpratt/rust that referenced this pull request Jan 4, 2026
…_12_25, r=jdonszelmann

Weekly `cargo update`

~~Pins the current versions of some Unicode-related crates while waiting for rust-lang#148321 to land.~~
pub use unicode_xid::UNICODE_VERSION as UNICODE_XID_VERSION;

// Make sure that the Unicode version of the dependencies is the same.
const _: () = {

Review comment (Contributor):

Yay, #149226 use-case spotted in the wild

wip-sync pushed a commit to NetBSD/pkgsrc-wip that referenced this pull request Mar 7, 2026
Pkgsrc changes:
 * Update version & checksums.
 * Adapt patches to new vendored crates.

This has so far just been verified to build on NetBSD/amd64.

Upstream changes relative to 1.93.1:

Version 1.94.0 (2026-03-05)
==========================

Language
--------
- [Impls and impl items inherit `dead_code` lint level of the
  corresponding traits and trait items]
  (rust-lang/rust#144113)
- [Stabilize additional 29 RISC-V target features including large
  portions of the RVA22U64 / RVA23U64 profiles]
  (rust-lang/rust#145948)
- [Add warn-by-default `unused_visibilities` lint for visibility
  on `const _` declarations]
  (rust-lang/rust#147136)
- [Update to Unicode 17]
  (rust-lang/rust#148321)
- [Avoid incorrect lifetime errors for closures]
  (rust-lang/rust#148329)

Platform Support
----------------
- [Add `riscv64im-unknown-none-elf` as a tier 3 target]
  (rust-lang/rust#148790)

Refer to Rust's [platform support page][platform-support-doc]
for more information on Rust's tiered platform support.

[platform-support-doc]: https://doc.rust-lang.org/rustc/platform-support.html

Libraries
---------
- [Relax `T: Ord` bound for some `BinaryHeap<T>` methods.]
  (rust-lang/rust#149408)

Stabilized APIs
---------------
- [`<[T]>::array_windows`]
  (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.array_windows)
- [`<[T]>::element_offset`]
  (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.element_offset)
- [`LazyCell::get`]
  (https://doc.rust-lang.org/stable/std/cell/struct.LazyCell.html#method.get)
- [`LazyCell::get_mut`]
  (https://doc.rust-lang.org/stable/std/cell/struct.LazyCell.html#method.get_mut)
- [`LazyCell::force_mut`]
  (https://doc.rust-lang.org/stable/std/cell/struct.LazyCell.html#method.force_mut)
- [`LazyLock::get`]
  (https://doc.rust-lang.org/stable/std/sync/struct.LazyLock.html#method.get)
- [`LazyLock::get_mut`]
  (https://doc.rust-lang.org/stable/std/sync/struct.LazyLock.html#method.get_mut)
- [`LazyLock::force_mut`]
  (https://doc.rust-lang.org/stable/std/sync/struct.LazyLock.html#method.force_mut)
- [`impl TryFrom<char> for usize`]
  (https://doc.rust-lang.org/stable/std/convert/trait.TryFrom.html#impl-TryFrom%3Cchar%3E-for-usize)
- [`std::iter::Peekable::next_if_map`]
  (https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html#method.next_if_map)
- [`std::iter::Peekable::next_if_map_mut`]
  (https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html#method.next_if_map_mut)
- [x86 `avx512fp16` intrinsics]
  (rust-lang/rust#127213)
  (excluding those that depend directly on the unstable `f16` type)
- [AArch64 NEON fp16 intrinsics]
  (rust-lang/rust#136306)
  (excluding those that depend directly on the unstable `f16` type)
- [`f32::consts::EULER_GAMMA`]
  (https://doc.rust-lang.org/stable/std/f32/consts/constant.EULER_GAMMA.html)
- [`f64::consts::EULER_GAMMA`]
  (https://doc.rust-lang.org/stable/std/f64/consts/constant.EULER_GAMMA.html)
- [`f32::consts::GOLDEN_RATIO`]
  (https://doc.rust-lang.org/stable/std/f32/consts/constant.GOLDEN_RATIO.html)
- [`f64::consts::GOLDEN_RATIO`]
  (https://doc.rust-lang.org/stable/std/f64/consts/constant.GOLDEN_RATIO.html)

These previously stable APIs are now stable in const contexts:

- [`f32::mul_add`]
  (https://doc.rust-lang.org/stable/std/primitive.f32.html#method.mul_add)
- [`f64::mul_add`]
  (https://doc.rust-lang.org/stable/std/primitive.f64.html#method.mul_add)

Cargo
-----
- Stabilize the config include key. The top-level include config
  key allows loading additional config files, enabling better
  organization, sharing, and management of Cargo configurations
  across projects and environments. [docs]
  (https://doc.rust-lang.org/nightly/cargo/reference/config.html#including-extra-configuration-files)
  [#16284] (rust-lang/cargo#16284)
- Stabilize the pubtime field in registry index. This records when
  a crate version was published and enables time-based dependency
  resolution in the future. Note that crates.io will gradually backfill
  existing packages when a new version is published. Not all crates
  have pubtime yet. [#16369]
  (rust-lang/cargo#16369) [#16372]
  (rust-lang/cargo#16372)
- Cargo now parses [TOML v1.1](https://toml.io/en/v1.1.0) for
  manifests and configuration files. Note that using these features
  in Cargo.toml will raise your development MSRV, but the published
  manifest remains compatible with older parsers. [#16415]
  (rust-lang/cargo#16415)
- [Make `CARGO_BIN_EXE_<crate>` available at runtime ]
  (rust-lang/cargo#16421)

Compatibility Notes
-------------------
- [Forbid freely casting lifetime bounds of `dyn`-types]
  (rust-lang/rust#136776)
- [Make closure capturing have consistent and correct behaviour around patterns]
  (rust-lang/rust#138961)
  Some finer details of how precise closure captures get affected
  by pattern matching have been changed. In some cases, this can
  cause a non-move closure that was previously capturing an entire
  variable by move, to now capture only part of that variable by
  move, and other parts by borrow. This can cause the borrow checker
  to complain where it previously didn't, or cause `Drop` to run
  at a different point in time.
- [Standard library macros are now imported via prelude, not via injected
  `#[macro_use]`] (rust-lang/rust#139493)
  This will raise an error if macros of the same name are glob
  imported.  For example if a crate defines their own `matches`
  macro and then glob imports that, it's now ambiguous whether the
  custom or standard library `matches` is meant and an explicit
  import of the name is required to resolve the ambiguity.  One
  exception is `core::panic` and `std::panic`, if their import is
  ambiguous a new warning ([`ambiguous_panic_imports`]
  (rust-lang/rust#147319)) is raised.
  This may raise a new warning ([`ambiguous_panic_imports`]
  (rust-lang/rust#147319)) on `#![no_std]`
  code glob importing the std crate.  Both `core::panic!` and
  `std::panic!` are then in scope and which is used is ambiguous.
- [Don't strip shebang in expression-context `include!(…)`s]
  (rust-lang/rust#146377)
  This can cause previously working includes to no longer compile
  if they included files which started with a shebang.
- [Ambiguous glob reexports are now also visible cross-crate]
  (rust-lang/rust#147984)
  This unifies behavior between local and cross-crate errors on
  these exports, which may introduce new ambiguity errors.
- [Don't normalize where-clauses before checking well-formedness]
  (rust-lang/rust#148477)
- [Introduce a future compatibility warning on codegen attributes
  on body-free trait methods]
  (rust-lang/rust#148756) These attributes
  currently have no effect in this position.
- [On Windows `std::time::SystemTime::checked_sub_duration` will
  return `None` for times before the Windows epoch (1/1/1601)]
  (rust-lang/rust#148825)
- [Lifetime identifiers such as `'a` are now NFC normalized]
  (rust-lang/rust#149192).
- [Overhaul filename handling for cross-compiler consistency]
  (rust-lang/rust#149709)
  Any paths emitted by compiler now always respect the relative-ness
  of the paths and `--remap-path-prefix` given originally.  One
  side-effect of this change is that paths emitted for local crates
  in Cargo (path dependencies and workspace members) are no longer
  absolute but relative when emitted as part of a diagnostic in a
  downstream crate.

Internal Changes
----------------

These changes do not affect any public interfaces of Rust, but they represent
significant improvements to the performance or internals of rustc and related
tools.

- [Switch to `annotate-snippets` for error emission]
  (rust-lang/rust#150032)
  This should preserve mostly the same outputs in rustc error messages.

Labels

A-tidy, A-Unicode, disposition-merge, finished-final-comment-period, I-lang-radar, perf-regression, relnotes, S-waiting-on-bors, T-bootstrap, T-lang
