-
Notifications
You must be signed in to change notification settings - Fork 6
Inline pass scales as O(n^2) #554
Copy link
Copy link
Open
Labels
C-feature-requestCategory: This is a feature request, i.e: not implemented / a PRCategory: This is a feature request, i.e: not implemented / a PRarea: compiler passArea: compiler pass related issues.Area: compiler pass related issues.category: bugCategory: this is a bug or something isn't working as expected.Category: this is a bug or something isn't working as expected.performance: optimizationPerformance: issues and PRs related to runtime performance optimizations.Performance: issues and PRs related to runtime performance optimizations.
Metadata
Metadata
Assignees
Labels
C-feature-requestCategory: This is a feature request, i.e: not implemented / a PRCategory: This is a feature request, i.e: not implemented / a PRarea: compiler passArea: compiler pass related issues.Area: compiler pass related issues.category: bugCategory: this is a bug or something isn't working as expected.Category: this is a bug or something isn't working as expected.performance: optimizationPerformance: issues and PRs related to runtime performance optimizations.Performance: issues and PRs related to runtime performance optimizations.
The rewrite
Walk(Inline(...))scales asO(n^2). Any compiler pass should scale at most linearly in program size.The quadratic scaling comes from
inline.pywhere each block is split in two, and the inline region is inserted in the middle. Here, splitting blocks in two comes with the extra O(n) factor:The
whileloop is over all statements within a block. This seems harmless at first. But for heavily inlined kernels, most of the program will eventually be contained in a single block. Here,ncan approach the total program size.Additionally, there are statements like
inline_region.blocks.popfirst()that also provide O(n) factors (popping the first element of a list).@cduck has observed quadratic scaling in practice, so this bottleneck is relevant in practice.
A temporary fix has been introduced in #552.
However, the inline pass should be refactored more generally to remove any O(n^2) components.