UPSTREAM PR #1364: feat: add support for the eta parameter to ancestral samplers#93
Overview
Analysis of 49,629 functions across two binaries reveals minimal performance impact. Modified: 68 functions (0.14%), New: 4, Removed: 0, Unchanged: 49,557 (99.86%). Binaries analyzed:
Function Analysis
Most performance changes occur in C++ STL functions due to compiler code generation differences, not application source modifications:

std::_Rb_tree::end() (sd-cli): Response time +183ns (+228%), throughput time +183ns (+307%). CFG shows entry block increased 9x (21ns → 195ns) with added indirect jump. No source changes; this is a system library function.

std::__make_move_if_noexcept_iterator (sd-cli): Response time +185ns (+196%), throughput time +185ns (+317%). Entry block split with additional branches. Used in prompt attention parsing, not inference hot path.

GGMLRunner::alloc_params_ctx (sd-server): Response time -171ns (-4.8%), throughput time unchanged. Improvement from optimized downstream functions (ggml_init, ggml_log_internal). One-time initialization function.

ggml_log_internal (sd-server): Response time -44ns (-9.8%), throughput time -44ns (-25.2%). CFG consolidated from 8 to 7 blocks, eliminating intermediate jump.

Other analyzed functions (std::vector::back(), smart pointer operations, iterators) show mixed changes (±38-190ns) in non-critical paths with no source modifications.

Additional Findings
No functions in performance-critical inference paths (UNet/DiT forward passes, attention mechanisms, VAE operations) were affected. All changes are in initialization, logging, or STL utilities. The consistent pattern of CFG reorganization across STL functions suggests compiler version or optimization flag differences between builds rather than code regressions.

Full breakdown: Loci Inspector
Note
Source pull request: leejet/stable-diffusion.cpp#1364
Applies the eta parameter to the DPM++(2s) and Euler ancestral implementations, so the amount of injected noise can be adjusted (e.g. Euler A with eta=0 should be the same as Euler). It reuses the calculation from the RES samplers, so it's mostly a refactor and interface/UI adjustments.

#1363 already includes this, but since this is self-contained and useful on its own, I believe it's worth including directly.
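
To illustrate why Euler A with eta=0 should match plain Euler, here is a minimal sketch in Python. It assumes the widely used k-diffusion-style split of a step into a deterministic part (sigma_down) and an injected-noise part (sigma_up). The PR's actual calculation is shared with the RES samplers in the C++ code and may differ in detail. The function names and scalar inputs are hypothetical, chosen for illustration only.

```python
import math


def ancestral_step(sigma_from, sigma_to, eta=1.0):
    """Split a step into (sigma_down, sigma_up).

    Assumed k-diffusion-style formula, not taken from the PR itself.
    eta scales the injected noise; eta=0 disables it entirely.
    """
    if eta == 0.0 or sigma_to == 0.0:
        return sigma_to, 0.0
    variance = sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2
    sigma_up = min(sigma_to, eta * math.sqrt(variance))
    sigma_down = math.sqrt(sigma_to**2 - sigma_up**2)
    return sigma_down, sigma_up


def euler_step(x, denoised, sigma_from, sigma_to):
    """Plain (deterministic) Euler step on a scalar sample."""
    d = (x - denoised) / sigma_from
    return x + d * (sigma_to - sigma_from)


def euler_ancestral_step(x, denoised, sigma_from, sigma_to, noise, eta=1.0):
    """Euler ancestral step: move to sigma_down, then add sigma_up of noise."""
    sigma_down, sigma_up = ancestral_step(sigma_from, sigma_to, eta)
    d = (x - denoised) / sigma_from
    x = x + d * (sigma_down - sigma_from)
    return x + noise * sigma_up
```

With eta=0 the split returns sigma_down equal to sigma_to and sigma_up equal to zero, so the ancestral step reduces to the plain Euler step regardless of the noise sample. Larger eta values move more of the step into injected noise, capped at sigma_to.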