
UPSTREAM PR #1354: feat: add Euler CFG++ and Euler-A CFG++ samplers#85

Open
loci-dev wants to merge 1 commit into main from loci/pr-1354-euler_cfg_pp

Conversation

@loci-dev

Note

Source pull request: leejet/stable-diffusion.cpp#1354

This PR adds support for the Euler CFG++ and Euler Ancestral CFG++ samplers (CFG++).

The logic has been adapted from the CFG++ authors' repository and checked against ComfyUI's implementation; I tried to keep the sampler style as close as possible to the existing ones.
Some changes were needed in src/stable-diffusion.cpp, as this sampler requires the unconditioned output in order to work.
It currently doesn't work with the Spectrum cache.

As with any CFG++ sampler, you must use very low CFG values (for SDXL, often less than 2).

I'd be very grateful if anyone could review this, as it's the first sampler I've implemented that requires this kind of change.
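For context, the core difference from a plain Euler step can be sketched as follows: in CFG++ the denoised estimate still comes from the guided (CFG-combined) prediction, but the noise direction used to step to the next sigma is taken from the *unconditional* prediction, which is why the sampler needs access to the unconditioned output. This is a minimal illustrative sketch (function and variable names are my own, modeled on ComfyUI's `euler_cfg_pp`, which the PR was checked against), not the PR's actual implementation:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One Euler CFG++ step over a flat latent buffer (names illustrative).
// Plain Euler would derive the noise direction from the CFG-combined
// prediction; CFG++ derives it from the unconditional prediction instead.
std::vector<float> euler_cfg_pp_step(const std::vector<float>& x,
                                     const std::vector<float>& denoised,        // CFG-combined x0 estimate
                                     const std::vector<float>& uncond_denoised, // unconditional x0 estimate
                                     float sigma, float sigma_next) {
    std::vector<float> x_next(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        float d = (x[i] - uncond_denoised[i]) / sigma; // noise direction from the uncond output
        x_next[i] = denoised[i] + d * sigma_next;      // re-noise the guided estimate to sigma_next
    }
    return x_next;
}
```

Note that at `sigma_next = 0` the step collapses to the guided denoised estimate, as in the final step of an ordinary Euler schedule.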

@loci-dev temporarily deployed to stable-diffusion-cpp-prod March 18, 2026 04:23 — with GitHub Actions

loci-review bot commented Mar 18, 2026

Overview

Analysis of 49,691 functions (85 modified, 74 new, 70 removed) across two binaries shows minor overall impact with near-zero power consumption changes:

  • build.bin.sd-cli: +0.12% (+604nJ)
  • build.bin.sd-server: -0.12% (-636nJ)

Commit b88df73 adds Euler CFG++ sampler support. Most performance variations stem from compiler optimization differences in STL code rather than application changes.

Function Analysis

Most Impacted Functions:

std::_Rb_tree::begin() (sd-cli, two variants) - Red-black tree iterator for tensor maps

  • Response time: 83.6ns → 265.7ns (+182ns, +218%)
  • Throughput time: 63.0ns → 245.1ns (+182ns, +289%)
  • Cause: Compiler code generation regression with extra intermediate block and entry overhead (25ns → 195ns)
  • Impact: Non-critical - used in model initialization, not inference loops

std::vector::back() (sd-cli) - IMPROVEMENT

  • Response time: 452.5ns → 262.7ns (-190ns, -42%)
  • Throughput time: 259.7ns → 69.9ns (-190ns, -73%)
  • Cause: Entry block consolidation eliminated indirect branch (198ns → 19ns)
  • Impact: Beneficial for frequent vector operations

Sampler name matching lambda (sd-server, main.cpp:912-942)

  • Response time: 12,167ns → 13,138ns (+971ns, +8%)
  • Throughput time: 512.7ns → 577.4ns (+65ns, +13%)
  • Cause: Expanded lookup map from 16 to 20 entries (+112 bytes stack) for new Euler CFG++ samplers
  • Impact: Justified feature overhead in non-critical request parsing path (~1μs before multi-second inference)
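The shape of that lookup can be sketched as follows. The enum values and CLI strings here are assumptions for illustration; the report only states that the map grew from 16 to 20 entries to accommodate the new samplers:

```cpp
#include <map>
#include <string>

// Illustrative name -> sampler lookup, in the spirit of the lambda in
// main.cpp. Enum and string spellings are hypothetical, not the project's.
enum class SamplerKind { EULER, EULER_A, EULER_CFG_PP, EULER_A_CFG_PP, UNKNOWN };

SamplerKind sampler_from_name(const std::string& name) {
    static const std::map<std::string, SamplerKind> table = {
        {"euler",          SamplerKind::EULER},
        {"euler_a",        SamplerKind::EULER_A},
        {"euler_cfg_pp",   SamplerKind::EULER_CFG_PP},   // new entry for this PR
        {"euler_a_cfg_pp", SamplerKind::EULER_A_CFG_PP}, // new entry for this PR
    };
    auto it = table.find(name);
    return it == table.end() ? SamplerKind::UNKNOWN : it->second;
}
```

Each added entry enlarges the table initializer (hence the extra stack bytes the report measures), but the O(log n) lookup cost stays negligible next to inference time.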

GGMLRunner::alloc_params_ctx() (sd-server) - IMPROVEMENT

  • Response time: 3,553ns → 3,393ns (-160ns, -4.5%)
  • Throughput time: 192.7ns (unchanged)
  • Cause: Optimizations in called functions (ggml_init -80ns, memory allocation -43ns, logging -44ns)
  • Impact: Faster model initialization
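The pattern behind `alloc_params_ctx()` is a lazily created, cached context for parameter tensors: allocate the arena once, then return the cached handle on later calls. This is a self-contained mimic of that pattern (the struct and sizing are illustrative, not ggml's actual API):

```cpp
#include <cstdlib>

// Hypothetical stand-in for a ggml-style parameter context.
struct ParamsCtx {
    void*  buffer   = nullptr;
    size_t mem_size = 0;
};

// Allocate the context on first use; subsequent calls hit the cheap cached
// path. (Freeing at shutdown is omitted for brevity.)
ParamsCtx* alloc_params_ctx(ParamsCtx*& cached, size_t mem_size) {
    if (cached != nullptr) {
        return cached; // fast path: already allocated
    }
    cached = new ParamsCtx();
    cached->mem_size = mem_size;
    cached->buffer   = std::malloc(mem_size);
    return cached;
}
```

Under this pattern, the savings the report attributes to faster `ggml_init`, allocation, and logging all land on the first-call path, i.e. model initialization rather than per-step inference.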

ggml_log_internal() (sd-server) - IMPROVEMENT

  • Response time: 447ns → 403ns (-44ns, -10%)
  • Throughput time: 174ns → 130ns (-44ns, -25%)
  • Cause: Entry sequence consolidation (8 → 7 blocks)
  • Impact: Reduced logging overhead throughout GGML operations

Other analyzed functions (regex operations, smart pointer cleanup, hash table operations, error handlers) showed minor changes (<200ns) in non-critical paths with no impact on inference performance.

Additional Findings

GPU/ML Operations: No direct GPU kernel modifications. GGML infrastructure improvements (memory allocation 5-8% faster, logging 10% faster) benefit model initialization. Core inference pipeline unaffected.

STL Pattern: Compiler optimization differences created divergent outcomes - vector/smart pointer operations improved significantly while red-black tree/regex operations regressed. Net impact negligible as regressions occur in non-hot paths (initialization, validation).

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

