UPSTREAM PR #1366: chore: clean up unused variables in ipndm_v implementation by loci-dev · Pull Request #92 · auroralabs-loci/stable-diffusion.cpp

loci-dev · 2026-03-25T04:21:01Z

Note

Source pull request: leejet/stable-diffusion.cpp#1366

Because the calculations in the reference implementation are considerably more complex, I believe the use of those constant factors here is intentional. This change therefore only removes the unused h_n_2 and h_n_3 variables.
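For readers unfamiliar with this kind of cleanup, here is a minimal sketch of a constant-coefficient multistep update. It is not the code from stable-diffusion.cpp. It assumes the "constant factors" resemble the fixed Adams-Bashforth weights commonly used by iPNDM-style samplers, and the names `multistep_direction` and `step` are hypothetical. The point it illustrates: when the weights are constants, only the current step size is read, so any older step sizes computed alongside it would be dead code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not from stable-diffusion.cpp): blend the current
// derivative with up to three earlier ones using fixed Adams-Bashforth weights.
// history[0] is the most recent earlier derivative.
float multistep_direction(float d_cur, const std::vector<float>& history) {
    assert(history.size() <= 3);
    switch (history.size()) {
        case 0:
            return d_cur;
        case 1:
            return (3.0f * d_cur - history[0]) / 2.0f;
        case 2:
            return (23.0f * d_cur - 16.0f * history[0] + 5.0f * history[1]) / 12.0f;
        default:
            return (55.0f * d_cur - 59.0f * history[0] + 37.0f * history[1] -
                    9.0f * history[2]) / 24.0f;
    }
}

// One sampler step. Only the current step size h_n is needed here.
// A variable-step formula would also read older step sizes (the PR calls
// them h_n_2 and h_n_3); with constant weights, computing them has no effect.
float step(float x, float sigma, float sigma_next, float d_cur,
           const std::vector<float>& history) {
    const float h_n = sigma_next - sigma;
    return x + h_n * multistep_direction(d_cur, history);
}
```

Under these assumptions, deleting the unused step-size locals leaves the result of `step` unchanged, which matches the PR's claim that this is a pure cleanup.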

loci-review · 2026-03-25T05:17:38Z

Overview

Analysis of 49,623 functions shows minimal performance impact from commit e273924 (cleanup of unused variables in iPNDM_V implementation). Modified: 60 functions (0.12%), New: 0, Removed: 0, Unchanged: 49,563 (99.88%).

Binaries analyzed:

build.bin.sd-cli: +0.005% power consumption (491,821.56 nJ → 491,847.65 nJ)
build.bin.sd-server: -0.115% power consumption (528,347.68 nJ → 527,739.00 nJ)
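The percentages above can be reproduced from the reported before and after values. The helper below is only an illustrative sketch; `percent_change` is a hypothetical name and is not part of the Loci tooling.

```cpp
#include <cassert>

// Relative change from `before` to `after`, expressed in percent.
// A positive result means the value increased.
double percent_change(double before, double after) {
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}
```

For example, `percent_change(491821.56, 491847.65)` gives roughly +0.005 for sd-cli, and `percent_change(528347.68, 527739.00)` gives roughly -0.115 for sd-server.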

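The net outcome is slightly positive: the sd-server reduction (-608.68 nJ) outweighs the sd-cli increase (+26.09 nJ). Both shifts come from compiler-driven code generation changes rather than from the sampler logic itself.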

Function Analysis

Major Improvements:

std::vector<std::string>::back (build.bin.sd-cli): Response time -41.96% (-189.85 ns: 452.50 ns → 262.65 ns), Throughput time -73.10% (-189.85 ns). Entry block consolidation eliminated intermediate jumps.
std::_Sp_counted_ptr_inplace<Anima::FinalLayer>::_M_destroy (build.bin.sd-cli): Response time -37.63% (-188.72 ns: 501.48 ns → 312.76 ns), Throughput time -64.25% (-188.72 ns). Optimized stack setup reduces destructor overhead.
ggml_log_internal (build.bin.sd-server): Response time -9.83% (-43.96 ns: 447.22 ns → 403.26 ns), Throughput time -25.22% (-43.95 ns). Eliminated jump block reduces logging overhead.

Minor Regressions:

std::__detail::_Hash_code_base::_M_bucket_index (build.bin.sd-cli): Response time +44.18% (+35.35 ns: 80.01 ns → 115.36 ns), Throughput time +62.50% (+35.35 ns). Added indirection in entry sequence, but absolute impact minimal (35 ns).
std::make_shared<Linear> (build.bin.sd-server): Response time +1.68% (+39.27 ns: 2,343.54 ns → 2,382.81 ns), Throughput time +37.62% (+39.29 ns). Unnecessary control flow detour during layer construction (initialization only, not inference hot path).

Other analyzed functions showed minor improvements in memory management and vector operations, with no changes affecting core inference operations.

Additional Findings

No impact on performance-critical inference operations. The denoising loop, UNet forward passes, VAE operations, and GPU-accelerated tensor operations remain unchanged. All performance differences stem from compiler optimization artifacts triggered by code cleanup, not algorithmic modifications. The source code change (removing unused variables h_n_2, h_n_3 from iPNDM_V sampler) altered compiler code generation in both binaries without affecting functional behavior.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

chore: clean up unused variables in ipndm_v implementation

e273924

loci-dev temporarily deployed to stable-diffusion-cpp-prod March 25, 2026 04:21 — with GitHub Actions Inactive

loci-dev wants to merge 1 commit into main from loci/pr-1366-sd_fix_ipndm_v