
UPSTREAM PR #1366: chore: clean up unused variables in ipndm_v implementation #92

Open
loci-dev wants to merge 1 commit into main from loci/pr-1366-sd_fix_ipndm_v


@loci-dev

Note

Source pull request: leejet/stable-diffusion.cpp#1366

Since the calculations in the reference implementation are considerably more complex, I believe the use of those constant factors is intentional. This PR therefore only cleans up the unused variables h_n_2 and h_n_3.
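The description contrasts constant factors with the step-size-dependent calculations of the reference implementation. A minimal sketch of a constant-coefficient multistep update follows, assuming the constant factors are the standard fixed-step Adams-Bashforth coefficients (as used by iPNDM). The function ab_step and its parameters are hypothetical, and this is not the stable-diffusion.cpp sampler code. Because the coefficients are constants, the widths of earlier steps never enter the update. That is one way variables such as h_n_2 and h_n_3 can end up computed but unused.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not taken from stable-diffusion.cpp:
// one constant-coefficient Adams-Bashforth step
//   x_next = x + h * (c0 * d_cur + c1 * d_{n-1} + c2 * d_{n-2} + c3 * d_{n-3})
// `history` holds previous derivatives, most recent first (history[0] = d_{n-1}).
std::vector<float> ab_step(const std::vector<float>& x,
                           const std::vector<float>& d_cur,
                           const std::vector<std::vector<float>>& history,
                           float h, int max_order = 4) {
    // Fixed-step Adams-Bashforth coefficients for orders 1..4 (row = order - 1).
    // They depend only on the order, never on previous step widths.
    static const float coeffs[4][4] = {
        {1.0f, 0.0f, 0.0f, 0.0f},
        {3.0f / 2.0f, -1.0f / 2.0f, 0.0f, 0.0f},
        {23.0f / 12.0f, -16.0f / 12.0f, 5.0f / 12.0f, 0.0f},
        {55.0f / 24.0f, -59.0f / 24.0f, 37.0f / 24.0f, -9.0f / 24.0f},
    };
    // The usable order is limited by the requested maximum, by the table size,
    // and by how many past derivatives are available; it is at least 1.
    const int order = std::max(
        1, std::min({max_order, 4, static_cast<int>(history.size()) + 1}));
    const float* c = coeffs[order - 1];

    std::vector<float> x_next(x.size());
    for (std::size_t j = 0; j < x.size(); ++j) {
        float slope = c[0] * d_cur[j];
        for (int k = 1; k < order; ++k) {
            slope += c[k] * history[k - 1][j];  // weight the k-th previous derivative
        }
        x_next[j] = x[j] + h * slope;
    }
    return x_next;
}
```

Each row of coefficients sums to 1, so with a constant derivative every order reduces to the Euler step x + h * d. A variable-step variant would instead derive the coefficients at every step from the ratios between the current and previous step widths.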

loci-dev temporarily deployed to stable-diffusion-cpp-prod on March 25, 2026 04:21 with GitHub Actions (Inactive)
@loci-review

loci-review bot commented Mar 25, 2026

Overview

Analysis of 49,623 functions shows minimal performance impact from commit e273924 (cleanup of unused variables in the iPNDM_V implementation). Modified: 60 functions (0.12%); new: 0; removed: 0; unchanged: 49,563 (99.88%).

Binaries analyzed:

  • build.bin.sd-cli: +0.005% energy consumption (491,821.56 nJ → 491,847.65 nJ)
  • build.bin.sd-server: -0.115% energy consumption (528,347.68 nJ → 527,739.00 nJ)
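The percentages in the list above are plain relative changes of the quoted nJ totals. A small C++ check follows, assuming the report computes (after - before) / before × 100. The helpers percent_change and report are hypothetical and not part of the Loci tooling.

```cpp
#include <cstdio>

// Relative change in percent: (after - before) / before * 100.
double percent_change(double before, double after) {
    return (after - before) / before * 100.0;
}

// Prints a one-line summary for a binary from its before/after energy totals (nJ).
void report(const char* name, double before_nj, double after_nj) {
    std::printf("%s: %+.3f%% (%.2f nJ -> %.2f nJ)\n", name,
                percent_change(before_nj, after_nj), before_nj, after_nj);
}
```

Called as report("build.bin.sd-cli", 491821.56, 491847.65) and report("build.bin.sd-server", 528347.68, 527739.00), it prints +0.005% and -0.115% respectively, matching the figures above. The same formula reproduces the function-level response-time changes: for example, percent_change(452.50, 262.65) is about -41.96.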

Net positive outcome: compiler-driven optimizations reduce sd-server energy consumption by about 0.12%, while the sd-cli change (+0.005%) is negligible.

Function Analysis

Major Improvements:

  • std::vector<std::string>::back (build.bin.sd-cli): Response time -41.96% (-189.85 ns: 452.50 ns → 262.65 ns), Throughput time -73.10% (-189.85 ns). Entry block consolidation eliminated intermediate jumps.

  • std::_Sp_counted_ptr_inplace<Anima::FinalLayer>::_M_destroy (build.bin.sd-cli): Response time -37.63% (-188.72 ns: 501.48 ns → 312.76 ns), Throughput time -64.25% (-188.72 ns). Optimized stack setup reduces destructor overhead.

  • ggml_log_internal (build.bin.sd-server): Response time -9.83% (-43.96 ns: 447.22 ns → 403.26 ns), Throughput time -25.22% (-43.95 ns). Eliminated jump block reduces logging overhead.

Minor Regressions:

  • std::__detail::_Hash_code_base::_M_bucket_index (build.bin.sd-cli): Response time +44.18% (+35.35 ns: 80.01 ns → 115.36 ns), Throughput time +62.50% (+35.35 ns). An extra indirection was added to the entry sequence, but the absolute impact is minimal (about 35 ns).

  • std::make_shared<Linear> (build.bin.sd-server): Response time +1.68% (+39.27 ns: 2,343.54 ns → 2,382.81 ns), Throughput time +37.62% (+39.29 ns). An unnecessary control-flow detour was added during layer construction; this runs only at initialization, not on the inference hot path.

Other analyzed functions showed minor improvements in memory management and vector operations, with no changes affecting core inference operations.

Additional Findings

No impact on performance-critical inference operations. The denoising loop, UNet forward passes, VAE operations, and GPU-accelerated tensor operations remain unchanged. All performance differences stem from compiler optimization artifacts triggered by the code cleanup, not from algorithmic modifications. The source change (removing the unused variables h_n_2 and h_n_3 from the iPNDM_V sampler) altered the compiler's output in both binaries without affecting functional behavior.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev
