UPSTREAM PR #1359: fix VAE on CPU #89

Open
loci-dev wants to merge 3 commits into main from loci/pr-1359-sd_fix_vae_cpu

@loci-dev

Note

Source pull request: leejet/stable-diffusion.cpp#1359

I've also included fixes for a few minor, loosely related issues I found while debugging this.

Fixes #1353

@loci-dev temporarily deployed to stable-diffusion-cpp-prod on March 21, 2026 04:12 with GitHub Actions (Inactive)
@loci-review

loci-review bot commented Mar 21, 2026

Flame Graph: build.bin.sd-cli::_ZN8nlohmann16json_abi_v3_11_26detail5lexerINS0_10basic_jsonISt3mapSt6vectorNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEblmdSaNS0_14adl_serializerES5_IhSaIhEEEENS1_22iterator_input_adapterIN9__gnu_cxx17__normal_iteratorIPKcSB_EEEEE4scanEv

Target version:

Flame Graph: build.bin.sd-cli::_ZN8nlohmann16json_abi_v3_11_26detail5lexerINS0_10basic_jsonISt3mapSt6vectorNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEblmdSaNS0_14adl_serializerES5_IhSaIhEEEENS1_22iterator_input_adapterIN9__gnu_cxx17__normal_iteratorIPKcSB_EEEEE4scanEv

The target version eliminates the helper functions (skip_bom, skip_whitespace, scan_comment) from the worst-case execution path: the compiler now inlines them, which reduces response time by 20,867 ns.

Additional Findings

The commits address VAE correctness on CPU, backend management, and symbol typos; none of them intentionally changes performance. Core inference operations (tensor computations, convolutions) are unaffected. The standard library regressions are compiler artifacts caused by template instantiation differences, not by application code changes.

🔎 Full breakdown: Loci Inspector
💬 Questions? Tag @loci-dev
