UPSTREAM PR #1364: feat: add support for the eta parameter to ancestral samplers #93

Open
loci-dev wants to merge 2 commits into main from loci/pr-1364-sd_samplers_eta

@loci-dev

Note

Source pull request: leejet/stable-diffusion.cpp#1364

Applies the eta parameter to the DPM++ (2S) and Euler ancestral implementations, so the amount of injected noise can be adjusted (e.g. Euler A with eta=0 should behave the same as Euler). It reuses the calculation from the RES samplers, so the change is mostly a refactor plus interface/UI adjustments.

#1363 already includes this, but since this is self-contained and useful on its own, I believe it's worth including directly.
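
For readers unfamiliar with how eta scales the injected noise, here is a minimal sketch of the idea. It is not code from this PR: `AncestralStep`, `ancestral_step`, and `euler_ancestral_step` are hypothetical names, and the formula is the one commonly used for ancestral samplers (as in k-diffusion's `get_ancestral_step`), which may differ in detail from the RES calculation the PR reuses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the PR): splits the move from sigma_from to
// sigma_to into a deterministic target (sigma_down) and a fresh-noise
// magnitude (sigma_up). eta scales how much fresh noise is injected.
struct AncestralStep {
    float sigma_down;
    float sigma_up;
};

inline AncestralStep ancestral_step(float sigma_from, float sigma_to, float eta) {
    assert(sigma_from > 0.0f && sigma_to >= 0.0f && sigma_to <= sigma_from);
    if (eta <= 0.0f) {
        // No injected noise: the step goes straight to sigma_to.
        return {sigma_to, 0.0f};
    }
    float from2 = sigma_from * sigma_from;
    float to2   = sigma_to * sigma_to;
    // Clamp so that sigma_up never exceeds sigma_to.
    float sigma_up   = std::min(sigma_to, eta * sigma_to * std::sqrt((from2 - to2) / from2));
    // Chosen so that sigma_down^2 + sigma_up^2 == sigma_to^2.
    float sigma_down = std::sqrt(to2 - sigma_up * sigma_up);
    return {sigma_down, sigma_up};
}

// One Euler ancestral update on a flat latent. `denoised` is the model
// prediction at sigma_from; `noise` holds unit-variance samples supplied
// by the caller.
inline void euler_ancestral_step(std::vector<float>& x,
                                 const std::vector<float>& denoised,
                                 const std::vector<float>& noise,
                                 float sigma_from, float sigma_to, float eta) {
    AncestralStep s = ancestral_step(sigma_from, sigma_to, eta);
    float dt = s.sigma_down - sigma_from;
    for (std::size_t i = 0; i < x.size(); ++i) {
        float d = (x[i] - denoised[i]) / sigma_from;  // Euler derivative
        x[i] += d * dt + noise[i] * s.sigma_up;       // deterministic part + injected noise
    }
}
```

With eta=0 the helper returns `sigma_up = 0` and `sigma_down = sigma_to`, so the update collapses to a plain Euler step, which is the behaviour the description expects. Under this formula, larger eta values shift more of the step into fresh noise until `sigma_up` is clamped at `sigma_to`.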

loci-dev deployed to stable-diffusion-cpp-prod on March 25, 2026 at 04:21 with GitHub Actions (Active)
@loci-review

loci-review bot commented Mar 25, 2026

Overview

Analysis of 49,629 functions across two binaries reveals minimal performance impact. Modified: 68 functions (0.14%), New: 4, Removed: 0, Unchanged: 49,557 (99.86%).

Binaries analyzed:

  • build.bin.sd-cli: +0.023% power consumption
  • build.bin.sd-server: -0.14% power consumption

Function Analysis

Most performance changes occur in C++ STL functions due to compiler code generation differences, not application source modifications:

std::_Rb_tree::end() (sd-cli): Response time +183ns (+228%), throughput time +183ns (+307%). The CFG shows the entry block growing roughly 9x (21ns → 195ns) with an added indirect jump. No source changes; this is a standard library function.

std::__make_move_if_noexcept_iterator (sd-cli): Response time +185ns (+196%), throughput time +185ns (+317%). Entry block split with additional branches. Used in prompt attention parsing, not inference hot path.

GGMLRunner::alloc_params_ctx (sd-server): Response time -171ns (-4.8%), throughput time unchanged. Improvement from optimized downstream functions (ggml_init, ggml_log_internal). One-time initialization function.

ggml_log_internal (sd-server): Response time -44ns (-9.8%), throughput time -44ns (-25.2%). CFG consolidated from 8 to 7 blocks, eliminating intermediate jump.

Other analyzed functions (std::vector::back(), smart pointer operations, iterators) show mixed changes (±38-190ns) in non-critical paths with no source modifications.

Additional Findings

No functions in performance-critical inference paths (UNet/DiT forward passes, attention mechanisms, VAE operations) were affected. All changes are in initialization, logging, or STL utilities. The consistent pattern of CFG reorganization across STL functions suggests compiler version or optimization flag differences between builds rather than code regressions.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev
