parser/lexer: bump to Unicode 17, use faster unicode-ident #148321
bors merged 2 commits into rust-lang:main
|
The list of allowed third-party dependencies may have been modified! You must ensure that any new dependencies have compatible licenses before merging. These commits modify the If this was unintentional then you should revert the changes before this PR is merged. |
|
rustbot has assigned @Mark-Simulacrum. Use |
|
If the Unicode version changes are intentional, cc @ehuss |
|
@bors try @rust-timer queue (Parsing could be affected too). |
parser/lexer: bump to Unicode 17, use faster unicode-ident
|
Finished benchmarking commit (988451c): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never

Instruction count
Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

Max RSS (memory usage)
This benchmark run did not return any relevant results for this metric.

Cycles
Results (primary 2.6%, secondary 3.6%)
A less reliable metric. May be of interest, but not used to determine the overall result above.

Binary size
This benchmark run did not return any relevant results for this metric.

Bootstrap: 473.971s -> 474.835s (0.18%)
|
Is there a reason why the reference explicitly specifies the Unicode version in a way that makes it feel like updating that version is a nontrivial change? i.e., is there a reason why it does not clarify that the Unicode version in the compiler is allowed to be (and should be) bumped whenever Unicode releases a new version, and to simply say something like "it is version N as of Rust 1.M"? |
|
I think it needs some level of review, to make sure that (for instance) the new Unicode version isn't doing anything out of the ordinary, and to make sure that some person in the project experienced with Unicode has taken at least a cursory look at the changes to I don't think that review is best done in lang. I think we should delegate that to whichever team is making sure of the above. (Is that T-compiler?) So, ideally, I'd love to see a proposal to lang requesting a delegation to take responsibility for the above. That said, let's go ahead and sign off on this change to unblock it. |
|
Hello, I added some compile time checks as suggested by @clarfonthey. |
Replace unicode-xid with unicode-ident which is 6 times faster
Add a compile-time check in rustc_lexer and rustc_parse ensuring that unicode-related dependencies within the crate use the same unicode version. These checks are inspired by the examples provided by @clarfonthey.
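For readers unfamiliar with the pattern, here is a minimal sketch of what such a compile-time version check can look like. It is not the code from this PR: the modules `fake_normalization` and `fake_properties` and the helper `same_version` are hypothetical stand-ins, and the `(u8, u8, u8)` shape of the version constants is an assumption made so the example is self-contained.

```rust
// Hypothetical stand-ins for the `UNICODE_VERSION` constants that the real
// Unicode crates export. The tuple type is an assumption for this sketch.
mod fake_normalization {
    pub const UNICODE_VERSION: (u8, u8, u8) = (17, 0, 0);
}

mod fake_properties {
    pub const UNICODE_VERSION: (u8, u8, u8) = (17, 0, 0);
}

/// Compares two version triples field by field.
/// Tuple `==` is not usable in a `const fn`, so each field is compared
/// on its own.
pub const fn same_version(a: (u8, u8, u8), b: (u8, u8, u8)) -> bool {
    a.0 == b.0 && a.1 == b.1 && a.2 == b.2
}

// An anonymous constant is evaluated at compile time, so a failing `assert!`
// here becomes a build error instead of a runtime panic.
const _: () = {
    assert!(
        same_version(
            fake_normalization::UNICODE_VERSION,
            fake_properties::UNICODE_VERSION
        ),
        "Unicode versions of the dependencies differ"
    );
};
```

Changing one of the stand-in constants to, say, `(16, 0, 0)` should make the crate fail to build, which is the point of the check: a partial dependency bump is caught before it can land.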
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
#101840 could be an outdated (year 2022) example of such a playbook. |
|
@bors r+ rollup |
parser/lexer: bump to Unicode 17, use faster unicode-ident

Hello,

Bump the unicode version used by lexer/parser to 17.0.0 by updating:
- `unicode-normalization` to 0.1.25
- `unicode-properties` to 0.1.4
- `unicode-width` to 0.2.2

and by replacing `unicode-xid` with `unicode-ident` which is also 6 times faster.

I think it might be worth to run the benchmarks to double check.

(`unicode-ident` is already in `src/tools/tidy/src/deps.rs`)

Thanks!
…uwer
Rollup of 8 pull requests

Successful merges:
- #148321 (parser/lexer: bump to Unicode 17, use faster unicode-ident)
- #149540 (std: sys: fs: uefi: Implement readdir)
- #149582 (Implement `Duration::div_duration_{floor,ceil}`)
- #149663 (Optimized implementation for uN::{gather,scatter}_bits)
- #149667 (Fix ICE by rejecting const blocks in patterns during AST lowering (closes #148138))
- #149947 (add several older crashtests)
- #150011 (Add more `unbounded_sh[lr]` examples)
- #150411 (refactor `destructure_const`)

r? `@ghost`
`@rustbot` modify labels: rollup
Rollup merge of #148321 - Marcondiro:master, r=Mark-Simulacrum
parser/lexer: bump to Unicode 17, use faster unicode-ident
|
This PR regressed perf in the rollup |
…_12_25, r=jdonszelmann Weekly `cargo update` ~~Pins the current versions of some Unicode-related crates while waiting for rust-lang#148321 to land.~~
pub use unicode_xid::UNICODE_VERSION as UNICODE_XID_VERSION;

// Make sure that the Unicode version of the dependencies is the same.
const _: () = {
Pkgsrc changes:
* Update version & checksums.
* Adapt patches to new vendored crates.

This has so far just been verified to build on NetBSD/amd64.

Upstream changes relative to 1.93.1:

Version 1.94.0 (2026-03-05)
==========================

Language
--------
- [Impls and impl items inherit `dead_code` lint level of the corresponding traits and trait items] (rust-lang/rust#144113)
- [Stabilize additional 29 RISC-V target features including large portions of the RVA22U64 / RVA23U64 profiles] (rust-lang/rust#145948)
- [Add warn-by-default `unused_visibilities` lint for visibility on `const _` declarations] (rust-lang/rust#147136)
- [Update to Unicode 17] (rust-lang/rust#148321)
- [Avoid incorrect lifetime errors for closures] (rust-lang/rust#148329)

Platform Support
----------------
- [Add `riscv64im-unknown-none-elf` as a tier 3 target] (rust-lang/rust#148790)

Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support.

[platform-support-doc]: https://doc.rust-lang.org/rustc/platform-support.html

Libraries
---------
- [Relax `T: Ord` bound for some `BinaryHeap<T>` methods.] (rust-lang/rust#149408)

Stabilized APIs
---------------
- [`<[T]>::array_windows`] (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.array_windows)
- [`<[T]>::element_offset`] (https://doc.rust-lang.org/stable/std/primitive.slice.html#method.element_offset)
- [`LazyCell::get`] (https://doc.rust-lang.org/stable/std/cell/struct.LazyCell.html#method.get)
- [`LazyCell::get_mut`] (https://doc.rust-lang.org/stable/std/cell/struct.LazyCell.html#method.get_mut)
- [`LazyCell::force_mut`] (https://doc.rust-lang.org/stable/std/cell/struct.LazyCell.html#method.force_mut)
- [`LazyLock::get`] (https://doc.rust-lang.org/stable/std/sync/struct.LazyLock.html#method.get)
- [`LazyLock::get_mut`] (https://doc.rust-lang.org/stable/std/sync/struct.LazyLock.html#method.get_mut)
- [`LazyLock::force_mut`] (https://doc.rust-lang.org/stable/std/sync/struct.LazyLock.html#method.force_mut)
- [`impl TryFrom<char> for usize`] (https://doc.rust-lang.org/stable/std/convert/trait.TryFrom.html#impl-TryFrom%3Cchar%3E-for-usize)
- [`std::iter::Peekable::next_if_map`] (https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html#method.next_if_map)
- [`std::iter::Peekable::next_if_map_mut`] (https://doc.rust-lang.org/stable/std/iter/struct.Peekable.html#method.next_if_map_mut)
- [x86 `avx512fp16` intrinsics] (rust-lang/rust#127213) (excluding those that depend directly on the unstable `f16` type)
- [AArch64 NEON fp16 intrinsics] (rust-lang/rust#136306) (excluding those that depend directly on the unstable `f16` type)
- [`f32::consts::EULER_GAMMA`] (https://doc.rust-lang.org/stable/std/f32/consts/constant.EULER_GAMMA.html)
- [`f64::consts::EULER_GAMMA`] (https://doc.rust-lang.org/stable/std/f64/consts/constant.EULER_GAMMA.html)
- [`f32::consts::GOLDEN_RATIO`] (https://doc.rust-lang.org/stable/std/f32/consts/constant.GOLDEN_RATIO.html)
- [`f64::consts::GOLDEN_RATIO`] (https://doc.rust-lang.org/stable/std/f64/consts/constant.GOLDEN_RATIO.html)

These previously stable APIs are now stable in const contexts:
- [`f32::mul_add`] (https://doc.rust-lang.org/stable/std/primitive.f32.html#method.mul_add)
- [`f64::mul_add`] (https://doc.rust-lang.org/stable/std/primitive.f64.html#method.mul_add)

Cargo
-----
- Stabilize the config include key. The top-level include config key allows loading additional config files, enabling better organization, sharing, and management of Cargo configurations across projects and environments. [docs] (https://doc.rust-lang.org/nightly/cargo/reference/config.html#including-extra-configuration-files) [#16284] (rust-lang/cargo#16284)
- Stabilize the pubtime field in registry index. This records when a crate version was published and enables time-based dependency resolution in the future. Note that crates.io will gradually backfill existing packages when a new version is published. Not all crates have pubtime yet. [#16369] (rust-lang/cargo#16369) [#16372] (rust-lang/cargo#16372)
- Cargo now parses [TOML v1.1](https://toml.io/en/v1.1.0) for manifests and configuration files. Note that using these features in Cargo.toml will raise your development MSRV, but the published manifest remains compatible with older parsers. [#16415] (rust-lang/cargo#16415)
- [Make `CARGO_BIN_EXE_<crate>` available at runtime ] (rust-lang/cargo#16421)

Compatibility Notes
-------------------
- [Forbid freely casting lifetime bounds of `dyn`-types] (rust-lang/rust#136776)
- [Make closure capturing have consistent and correct behaviour around patterns] (rust-lang/rust#138961) Some finer details of how precise closure captures get affected by pattern matching have been changed. In some cases, this can cause a non-move closure that was previously capturing an entire variable by move, to now capture only part of that variable by move, and other parts by borrow. This can cause the borrow checker to complain where it previously didn't, or cause `Drop` to run at a different point in time.
- [Standard library macros are now imported via prelude, not via injected ` #[macro_use]`] (rust-lang/rust#139493) This will raise an error if macros of the same name are glob imported. For example if a crate defines their own `matches` macro and then glob imports that, it's now ambiguous whether the custom or standard library `matches` is meant and an explicit import of the name is required to resolve the ambiguity. One exception is `core::panic` and `std::panic`, if their import is ambiguous a new warning ([`ambiguous_panic_imports`] (rust-lang/rust#147319)) is raised. This may raise a new warning ([`ambiguous_panic_imports`] (rust-lang/rust#147319)) on `#![no_std]` code glob importing the std crate. Both `core::panic!` and `std::panic!` are then in scope and which is used is ambiguous.
- [Don't strip shebang in expression-context `include!(…)`s] (rust-lang/rust#146377) This can cause previously working includes to no longer compile if they included files which started with a shebang.
- [Ambiguous glob reexports are now also visible cross-crate] (rust-lang/rust#147984) This unifies behavior between local and cross-crate errors on these exports, which may introduce new ambiguity errors.
- [Don't normalize where-clauses before checking well-formedness] (rust-lang/rust#148477)
- [Introduce a future compatibility warning on codegen attributes on body-free trait methods] (rust-lang/rust#148756) These attributes currently have no effect in this position.
- [On Windows `std::time::SystemTime::checked_sub_duration` will return `None` for times before the Windows epoch (1/1/1601)] (rust-lang/rust#148825)
- [Lifetime identifiers such as `'a` are now NFC normalized] (rust-lang/rust#149192).
- [Overhaul filename handling for cross-compiler consistency] (rust-lang/rust#149709) Any paths emitted by compiler now always respect the relative-ness of the paths and `--remap-path-prefix` given originally. One side-effect of this change is that paths emitted for local crates in Cargo (path dependencies and workspace members) are no longer absolute but relative when emitted as part of a diagnostic in a downstream crate.

Internal Changes
----------------
These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools.

- [Switch to `annotate-snippets` for error emission] (rust-lang/rust#150032) This should preserve mostly the same outputs in rustc error messages.