slice::Iter::fold optimizes poorly for some niche optimized types. #106288
Closed
Labels:
- A-LLVM: Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
- I-slow: Issue: Problems and improvements with respect to performance of generated code.
- T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.
I tried this code:
(Never mind the fact that these could obviously just use `slice::last`.) Godbolt link: https://rust.godbolt.org/z/6fjzo4faW
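A hypothetical sketch of the kind of functions being compared: each returns a pointer to the last element of a slice via `Iterator::fold`, differing only in the accumulator type. Only the names `fold_nonnull` and `fold_ref` come from the report; `fold_ptr`, the `u32` element type, and the exact closures are assumptions for illustration.

```rust
use std::mem::size_of;
use std::ptr::{self, NonNull};

// Hypothetical raw-pointer variant: null stands for "empty slice".
pub fn fold_ptr(s: &[u32]) -> *const u32 {
    s.iter().fold(ptr::null(), |_, x| x as *const u32)
}

// Assumed shape of `fold_nonnull`: `None` is stored in NonNull's null niche.
pub fn fold_nonnull(s: &[u32]) -> Option<NonNull<u32>> {
    s.iter().fold(None, |_, x| Some(NonNull::from(x)))
}

// Assumed shape of `fold_ref`: `Option<&u32>` also uses the null niche.
pub fn fold_ref(s: &[u32]) -> Option<&u32> {
    s.iter().fold(None, |_, x| Some(x))
}

fn main() {
    let v = [1u32, 2, 3];
    let empty: [u32; 0] = [];

    // All three agree with `slice::last` on non-empty and empty input.
    assert_eq!(fold_ptr(&v), v.last().unwrap() as *const u32);
    assert_eq!(fold_nonnull(&v), v.last().map(NonNull::from));
    assert_eq!(fold_ref(&v), v.last());
    assert!(fold_ptr(&empty).is_null());
    assert_eq!(fold_nonnull(&empty), None);
    assert_eq!(fold_ref(&empty), None);

    // The niche: the Option wrappers are exactly pointer-sized.
    assert_eq!(size_of::<Option<NonNull<u32>>>(), size_of::<*const u32>());
    assert_eq!(size_of::<Option<&u32>>(), size_of::<*const u32>());
}
```

All three compute the same value; the report's observation is that only the raw-pointer-style variants have their loop optimized away.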
I expected all of these functions to produce more or less the same assembly, since each of them only needs the last loop iteration peeled for the whole loop body to be optimized away. Indeed, the first two functions optimize just fine:
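The "peel the last iteration" argument can be made concrete. Because the closure ignores its accumulator, every iteration's result is overwritten by the next, so only the final element matters. The sketch below (both function names are hypothetical) compares a generic fold-to-last with a hand-peeled version that just inspects the last element, which is roughly the shape one would hope the optimized code takes:

```rust
// Generic fold-to-last, mirroring the pattern discussed (hypothetical name).
fn last_by_fold<T>(s: &[T]) -> Option<&T> {
    s.iter().fold(None, |_, x| Some(x))
}

// Hand-peeled equivalent: earlier iterations are dead because the
// closure discards the accumulator, so only the final element is read.
fn last_peeled<T>(s: &[T]) -> Option<&T> {
    match s.len() {
        0 => None,
        n => Some(&s[n - 1]),
    }
}

fn main() {
    // Check the two agree for slice lengths 0 through 4.
    for len in 0..5 {
        let v: Vec<u8> = (0..len).collect();
        assert_eq!(last_by_fold(&v), last_peeled(&v));
    }
}
```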
The `fold_{nonnull,ref}` functions, however, don't optimize away the loop:

I'm assuming this somehow has to do with `NonNull` and `&T` having the null niche value, as I don't see any other reason for the difference between `*const T` and `NonNull<T>`. It doesn't seem to happen with all niche-optimized types, though, as functions like these do optimize away the loop:

This is using nightly rustc on godbolt, which currently is: