Automatically enable cross-crate inlining for small functions by saethlin · Pull Request #116505 · rust-lang/rust

saethlin · 2023-10-07T04:37:50Z

This is basically reviving #70550

The #[inline] attribute can have a significant impact on code generation or runtime performance (because it enables inlining between CGUs where it would normally not happen) and also on compile-time performance (because it enables MIR inlining). But it has to be added manually, which is awkward.

This PR factors whether a DefId is cross-crate inlinable into a query, and replaces all uses of CodegenFnAttrs::requests_inline with this new query. The new query incorporates all the other logic that is used to determine whether a Def should be treated as cross-crate-inlinable, and as a last step inspects the function's optimized_mir to determine if it should be treated as cross-crate-inlinable.
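As a rough illustration of the shape of that decision, here is a toy model in plain Rust. It is not the rustc implementation: `InlineAttr`, `FnSummary`, and the free function `cross_crate_inlinable` below are hypothetical stand-ins, and treating `#[inline(never)]` as an immediate "no" is an assumption made for this sketch. The only point it makes is the ordering: explicit attributes are consulted first, and the body inspection is the last step.

```rust
/// Hypothetical stand-in for the inline attribute on a function.
/// (Not the compiler's own type; defined here so the sketch is self-contained.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum InlineAttr {
    None,
    Hint,   // #[inline]
    Always, // #[inline(always)]
    Never,  // #[inline(never)]
}

/// Hypothetical summary of what the decision needs to know about one function.
pub struct FnSummary {
    pub inline_attr: InlineAttr,
    /// Result of inspecting the optimized body, computed elsewhere.
    pub body_looks_inlinable: bool,
}

/// Toy version of the decision: attributes first, body inspection last.
pub fn cross_crate_inlinable(f: &FnSummary) -> bool {
    match f.inline_attr {
        // An explicit request keeps working as before.
        InlineAttr::Hint | InlineAttr::Always => true,
        // Assumption for this sketch: an explicit opt-out wins over inference.
        InlineAttr::Never => false,
        // No attribute: fall back to what the body inspection found.
        InlineAttr::None => f.body_looks_inlinable,
    }
}
```

In this model, a function with no attribute is the only case where the result depends on the body.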

The heuristic implemented here is deliberately conservative; we only infer inlinability for functions whose optimized_mir does not contain any calls or asserts. I plan to study adjusting the cost model later, but for now the compile time implications of this change are so significant that I think this very crude heuristic is well worth landing.
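To make "no calls or asserts" concrete, here is a small sketch of library functions that would plausibly fall on either side of that line. The names (`Meters`, `get`, `first`, `describe`) are made up for illustration, and whether a given function actually qualifies depends on its optimized MIR, which this example does not inspect.

```rust
/// Hypothetical newtype in a library crate.
pub struct Meters(pub f64);

impl Meters {
    /// Likely a candidate: the body is a single field read,
    /// with no calls and no asserts.
    pub fn get(&self) -> f64 {
        self.0
    }
}

/// Likely not a candidate: indexing a slice carries a bounds check,
/// which is represented as an assert in MIR.
pub fn first(xs: &[u32]) -> u32 {
    xs[0]
}

/// Likely not a candidate: `format!` expands to function calls.
pub fn describe(m: &Meters) -> String {
    format!("{} m", m.0)
}
```

A downstream crate calling `Meters::get` is the case this PR targets: previously the author of `Meters` would have had to add `#[inline]` by hand for the call to be inlinable across crates.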

rustbot · 2023-10-07T04:37:56Z

r? @petrochenkov

(rustbot has picked a reviewer for you, use r? to override)

saethlin · 2023-10-07T04:38:04Z

@bors try @rust-timer queue

bors · 2023-10-07T04:39:14Z

⌛ Trying commit 64f45aa with merge f1cbf12...

Automatically enable cross-crate inlining for small functions

This is a work-in-progress. For example I have not thought at all about the cost model and I am sure that the threshold is too high. But I'm curious to know how this looks in perf. It certainly has some unique effects on codegen.

bors · 2023-10-07T05:54:55Z

☀️ Try build successful - checks-actions
Build commit: f1cbf12 (f1cbf125bf7c5bdb502d2c6359397a8d7e4fe068)

rust-timer · 2023-10-07T12:31:03Z

Finished benchmarking commit (f1cbf12): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	28.5%	[0.2%, 610.4%]	102
Regressions ❌ (secondary)	62.8%	[0.3%, 2246.9%]	66
Improvements ✅ (primary)	-3.8%	[-56.5%, -0.2%]	136
Improvements ✅ (secondary)	-7.5%	[-85.5%, -0.3%]	151
All ❌✅ (primary)	10.1%	[-56.5%, 610.4%]	238

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	11.3%	[0.5%, 72.7%]	90
Regressions ❌ (secondary)	16.3%	[0.7%, 100.1%]	86
Improvements ✅ (primary)	-5.7%	[-15.3%, -0.9%]	9
Improvements ✅ (secondary)	-5.0%	[-18.0%, -0.7%]	36
All ❌✅ (primary)	9.8%	[-15.3%, 72.7%]	99

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	32.5%	[0.8%, 565.9%]	98
Regressions ❌ (secondary)	70.9%	[1.4%, 1900.7%]	64
Improvements ✅ (primary)	-14.9%	[-60.4%, -0.9%]	27
Improvements ✅ (secondary)	-12.5%	[-85.1%, -1.9%]	78
All ❌✅ (primary)	22.2%	[-60.4%, 565.9%]	125

Binary size

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	4.5%	[0.2%, 29.4%]	107
Regressions ❌ (secondary)	6.8%	[0.1%, 70.1%]	59
Improvements ✅ (primary)	-6.9%	[-30.5%, -0.1%]	56
Improvements ✅ (secondary)	-14.8%	[-77.0%, -2.5%]	83
All ❌✅ (primary)	0.5%	[-30.5%, 29.4%]	163

Bootstrap: 625.502s -> 742.509s (18.71%)
Artifact size: 270.64 MiB -> 275.71 MiB (1.87%)

saethlin · 2023-10-07T15:26:23Z

This should be less silly than the last one because now we don't do the per-CGU thing for incr builds. And I also halved the default threshold.

@bors try @rust-timer queue

bors · 2023-10-07T15:27:32Z

⌛ Trying commit aac4020 with merge 26bbeca...

Automatically enable cross-crate inlining for small functions

This is a work-in-progress. For example I have not thought at all about the cost model and I am sure that the threshold is too high. But I'm curious to know how this looks in perf. It certainly has some unique effects on codegen.

bors · 2023-10-07T16:42:25Z

☀️ Try build successful - checks-actions
Build commit: 26bbeca (26bbeca8f2e9d38e46c985ab930a46c608dd85f8)

rust-timer · 2023-10-08T01:12:57Z

Finished benchmarking commit (26bbeca): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	21.8%	[0.2%, 346.9%]	153
Regressions ❌ (secondary)	32.7%	[0.2%, 456.2%]	43
Improvements ✅ (primary)	-2.7%	[-40.3%, -0.2%]	86
Improvements ✅ (secondary)	-5.3%	[-26.4%, -0.1%]	157
All ❌✅ (primary)	13.0%	[-40.3%, 346.9%]	239

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	9.3%	[0.5%, 71.9%]	106
Regressions ❌ (secondary)	6.7%	[0.5%, 32.1%]	45
Improvements ✅ (primary)	-14.0%	[-21.1%, -3.1%]	4
Improvements ✅ (secondary)	-3.6%	[-6.5%, -0.4%]	12
All ❌✅ (primary)	8.4%	[-21.1%, 71.9%]	110

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	25.5%	[0.9%, 325.3%]	137
Regressions ❌ (secondary)	34.1%	[1.9%, 350.8%]	42
Improvements ✅ (primary)	-12.1%	[-37.9%, -0.9%]	15
Improvements ✅ (secondary)	-8.1%	[-20.6%, -1.2%]	96
All ❌✅ (primary)	21.8%	[-37.9%, 325.3%]	152

Binary size

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	12.3%	[0.0%, 77.9%]	116
Regressions ❌ (secondary)	6.0%	[0.1%, 52.2%]	56
Improvements ✅ (primary)	-3.9%	[-16.6%, -0.3%]	46
Improvements ✅ (secondary)	-8.9%	[-19.0%, -0.5%]	83
All ❌✅ (primary)	7.7%	[-16.6%, 77.9%]	162

Bootstrap: 622.673s -> 704.606s (13.16%)
Artifact size: 270.68 MiB -> 273.48 MiB (1.04%)

briansmith · 2024-01-12T00:30:52Z

With this change, is there any convenient way for a crate to indicate that it should be compiled with full optimizations but none of its public API should be inlined unless marked #[inline] specifically? Or do we have to denote every single function as #[inline(never)]? And is #[inline(never)] on a function sufficient to ensure it won't ever be inlined into callers?

saethlin · 2024-01-12T01:04:00Z

> And is #[inline(never)] on a function sufficient to ensure it won't ever be inlined into callers?

That attribute is documented as a hint, so no. I do not think we have a language feature that guarantees what you are asking for.

It seems like a plausible thing to add but that's a bit outside my area.
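For reference, the per-function annotation being discussed looks like the sketch below. The function names are hypothetical. As noted above, `#[inline(never)]` is documented as a hint, so this expresses intent rather than a guarantee.

```rust
/// Opt out of inlining for this function (a hint, not a guarantee).
#[inline(never)]
pub fn checksum(bytes: &[u8]) -> u32 {
    // Simple rolling hash, used here only to give the function a body.
    bytes
        .iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(b as u32))
}

/// Explicitly opt in to cross-crate inlining, as before this PR.
#[inline]
pub fn is_empty_payload(bytes: &[u8]) -> bool {
    bytes.is_empty()
}
```

Functions carrying neither attribute are the ones whose treatment this PR changes.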

rustbot assigned petrochenkov Oct 7, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 7, 2023

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 7, 2023

saethlin unassigned petrochenkov Oct 7, 2023

rustbot added the perf-regression Performance regression. label Oct 7, 2023

saethlin force-pushed the infer-inline branch from 64f45aa to aac4020 Compare October 7, 2023 15:25

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 7, 2023

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 8, 2023

Zalathar mentioned this pull request Oct 18, 2023

coverage: Move most per-function coverage info into mir::Body #116046

Merged

pnkfelix mentioned this pull request Oct 18, 2023

exhaustiveness: Rework constructor splitting #116391

Merged

bjorn3 mentioned this pull request Oct 21, 2023

[PSA] Public Service Announcements Thread rust-lang/rustc_codegen_cranelift#1299

Open

AzazKamaz mentioned this pull request Oct 22, 2023

cross-crate inlining breaks compare of functions passed between crates #117047

Closed

bjorn3 mentioned this pull request Oct 25, 2023

Missing symbol in generated dylib #117137

Closed

tmiasko mentioned this pull request Nov 14, 2023

Make small functions implicitly #[inline] #78120

Closed

Noratrieb mentioned this pull request Nov 21, 2023

Rust beta/nightly optimises everything away compiler-explorer/compiler-explorer#5782

Closed

davxy mentioned this pull request Dec 4, 2023

WASM huge performance regression paritytech/polkadot-sdk#2590

Closed

a1phyr mentioned this pull request Dec 20, 2023

Add release notes for 1.75.0 #118729

Merged

RossSmyth mentioned this pull request Dec 22, 2023

Runtime code is generated for functions only used in const eval #119214

Open

osiewicz mentioned this pull request Dec 29, 2023

sample test case is broken with 1.75 pacak/cargo-show-asm#230

Closed

pacak mentioned this pull request Dec 29, 2023

Trying to fix CI failure pacak/cargo-show-asm#231

Merged

This was referenced Jan 9, 2024

"Show LLVM IR" and "Show ASM" are missing some functions rust-lang/rust-playground#1032

Closed

Functions are missing from "--emit=llvm-ir" and "--emit=asm" #119850

Open

Patryk27 mentioned this pull request Feb 8, 2024

"Traitify" the Style struct DioxusLabs/taffy#568

Merged

9 tasks

jbr mentioned this pull request Feb 9, 2024

Enable memchr feature by default? trillium-rs/trillium#552

Closed

wprzytula mentioned this pull request Mar 6, 2024

Sprinkle #[inline] across scylla-cql crate to allow inlining into scylla crate scylladb/scylla-rust-driver#949

Open

pacak mentioned this pull request Mar 14, 2024

Function gets optimized away unless marked #[inline(never)] pub fn #122516

Closed

saethlin mentioned this pull request May 5, 2024

Rustc fails to inline trivial functions #37538

Closed

y21 mentioned this pull request May 16, 2024

Warn about missing #[inline] on trivial public methods rust-lang/rust-clippy#12797

Closed

ppershing mentioned this pull request Jun 11, 2024

Update #[inline] documentation nnethercote/perf-book#84

Closed

lilizoey mentioned this pull request Jun 17, 2024

Add snapped to integer vectors godot-rust/gdext#768

Merged

pacak mentioned this pull request Oct 8, 2024

example of public non generic library function not appearing pacak/cargo-show-asm#320

Closed

scottmcm mentioned this pull request Oct 25, 2024

#[inline(never)] does not work for async functions #129347

Open

FreezyLemon mentioned this pull request Nov 11, 2024

refactor: less #[inline(always)] rust-av/yuvxyb#26

Merged

greeble-dev mentioned this pull request Sep 5, 2025

Change bevy_math to use inline instead of inline(always) bevyengine/bevy#20887

Merged

Shatur mentioned this pull request Oct 11, 2025

General entity set cleanup bevyengine/bevy#21498

Open

svenhuster mentioned this pull request Jan 29, 2026

The problem of non-transitivity of Rust's inline has seemed to be fixed matklad/benchmarks#2

Open
