
UPSTREAM PR #1354: feat: add Euler CFG++ and Euler-A CFG++ samplers#85

Open
loci-dev wants to merge 1 commit into main from loci/pr-1354-euler_cfg_pp

Conversation

@loci-dev

Note

Source pull request: leejet/stable-diffusion.cpp#1354

This PR adds support for the Euler CFG++ and Euler Ancestral CFG++ samplers (CFG++).

The logic has been adapted from the CFG++ authors' repository and checked against ComfyUI's implementation; I tried to keep the sampler style as close as possible to the existing ones.
Some changes were needed in src/stable-diffusion.cpp, as this sampler requires the unconditioned output in order to work.
It currently doesn't work with the Spectrum cache.

As with any CFG++ sampler, you must use very low CFG values (for SDXL, often less than 2).

I'd be very grateful if anyone could review this, as it's the first sampler I've implemented that requires this kind of change.
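For context, the core difference from a plain Euler step can be sketched as follows: in CFG++ the denoised estimate still comes from the guided (CFG-combined) prediction, but the noise direction used to step to the next sigma is taken from the *unconditional* prediction, which is why the sampler needs access to the unconditioned output. This is a minimal illustrative sketch (function and variable names are my own, modeled on ComfyUI's `euler_cfg_pp`, which the PR was checked against), not the PR's actual implementation:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One Euler CFG++ step over a flat latent buffer (names illustrative).
// Plain Euler would derive the noise direction from the CFG-combined
// prediction; CFG++ derives it from the unconditional prediction instead.
std::vector<float> euler_cfg_pp_step(const std::vector<float>& x,
                                     const std::vector<float>& denoised,        // CFG-combined x0 estimate
                                     const std::vector<float>& uncond_denoised, // unconditional x0 estimate
                                     float sigma, float sigma_next) {
    std::vector<float> x_next(x.size());
    for (size_t i = 0; i < x.size(); ++i) {
        float d = (x[i] - uncond_denoised[i]) / sigma; // noise direction from the uncond output
        x_next[i] = denoised[i] + d * sigma_next;      // re-noise the guided estimate to sigma_next
    }
    return x_next;
}
```

Note that at `sigma_next = 0` the step collapses to the guided denoised estimate, as in the final step of an ordinary Euler schedule.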

@loci-dev temporarily deployed to stable-diffusion-cpp-prod March 18, 2026 04:23 — with GitHub Actions

loci-review bot commented Mar 18, 2026

Overview

Analysis of 49,691 functions (85 modified, 74 new, 70 removed) across two binaries shows minor overall impact with near-zero power consumption changes:

  • build.bin.sd-cli: +0.12% (+604nJ)
  • build.bin.sd-server: -0.12% (-636nJ)

Commit b88df73 adds Euler CFG++ sampler support. Most performance variations stem from compiler optimization differences in STL code rather than application changes.

Function Analysis

Most Impacted Functions:

std::_Rb_tree::begin() (sd-cli, two variants) - Red-black tree iterator for tensor maps

  • Response time: 83.6ns → 265.7ns (+182ns, +218%)
  • Throughput time: 63.0ns → 245.1ns (+182ns, +289%)
  • Cause: Compiler code generation regression with extra intermediate block and entry overhead (25ns → 195ns)
  • Impact: Non-critical - used in model initialization, not inference loops

std::vector::back() (sd-cli) - IMPROVEMENT

  • Response time: 452.5ns → 262.7ns (-190ns, -42%)
  • Throughput time: 259.7ns → 69.9ns (-190ns, -73%)
  • Cause: Entry block consolidation eliminated indirect branch (198ns → 19ns)
  • Impact: Beneficial for frequent vector operations

Sampler name matching lambda (sd-server, main.cpp:912-942)

  • Response time: 12,167ns → 13,138ns (+971ns, +8%)
  • Throughput time: 512.7ns → 577.4ns (+65ns, +13%)
  • Cause: Expanded lookup map from 16 to 20 entries (+112 bytes stack) for new Euler CFG++ samplers
  • Impact: Justified feature overhead in non-critical request parsing path (~1μs before multi-second inference)
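The shape of that lookup can be sketched as follows. The enum values and CLI strings here are assumptions for illustration; the report only states that the map grew from 16 to 20 entries to accommodate the new samplers:

```cpp
#include <map>
#include <string>

// Illustrative name -> sampler lookup, in the spirit of the lambda in
// main.cpp. Enum and string spellings are hypothetical, not the project's.
enum class SamplerKind { EULER, EULER_A, EULER_CFG_PP, EULER_A_CFG_PP, UNKNOWN };

SamplerKind sampler_from_name(const std::string& name) {
    static const std::map<std::string, SamplerKind> table = {
        {"euler",          SamplerKind::EULER},
        {"euler_a",        SamplerKind::EULER_A},
        {"euler_cfg_pp",   SamplerKind::EULER_CFG_PP},   // new entry for this PR
        {"euler_a_cfg_pp", SamplerKind::EULER_A_CFG_PP}, // new entry for this PR
    };
    auto it = table.find(name);
    return it == table.end() ? SamplerKind::UNKNOWN : it->second;
}
```

Each added entry enlarges the table initializer (hence the extra stack bytes the report measures), but the O(log n) lookup cost stays negligible next to inference time.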

GGMLRunner::alloc_params_ctx() (sd-server) - IMPROVEMENT

  • Response time: 3,553ns → 3,393ns (-160ns, -4.5%)
  • Throughput time: 192.7ns (unchanged)
  • Cause: Optimizations in called functions (ggml_init -80ns, memory allocation -43ns, logging -44ns)
  • Impact: Faster model initialization
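The pattern behind `alloc_params_ctx()` is a lazily created, cached context for parameter tensors: allocate the arena once, then return the cached handle on later calls. This is a self-contained mimic of that pattern (the struct and sizing are illustrative, not ggml's actual API):

```cpp
#include <cstdlib>

// Hypothetical stand-in for a ggml-style parameter context.
struct ParamsCtx {
    void*  buffer   = nullptr;
    size_t mem_size = 0;
};

// Allocate the context on first use; subsequent calls hit the cheap cached
// path. (Freeing at shutdown is omitted for brevity.)
ParamsCtx* alloc_params_ctx(ParamsCtx*& cached, size_t mem_size) {
    if (cached != nullptr) {
        return cached; // fast path: already allocated
    }
    cached = new ParamsCtx();
    cached->mem_size = mem_size;
    cached->buffer   = std::malloc(mem_size);
    return cached;
}
```

Under this pattern, the savings the report attributes to faster `ggml_init`, allocation, and logging all land on the first-call path, i.e. model initialization rather than per-step inference.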

ggml_log_internal() (sd-server) - IMPROVEMENT

  • Response time: 447ns → 403ns (-44ns, -10%)
  • Throughput time: 174ns → 130ns (-44ns, -25%)
  • Cause: Entry sequence consolidation (8 → 7 blocks)
  • Impact: Reduced logging overhead throughout GGML operations

Other analyzed functions (regex operations, smart pointer cleanup, hash table operations, error handlers) showed minor changes (<200ns) in non-critical paths with no impact on inference performance.

Additional Findings

GPU/ML Operations: No direct GPU kernel modifications. GGML infrastructure improvements (memory allocation 5-8% faster, logging 10% faster) benefit model initialization. Core inference pipeline unaffected.

STL Pattern: Compiler optimization differences created divergent outcomes - vector/smart pointer operations improved significantly while red-black tree/regex operations regressed. Net impact negligible as regressions occur in non-hot paths (initialization, validation).

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev

