Fix music generation token stopping #2057

Open
dysangel wants to merge 3 commits into LostRuins:concedo from dysangel:fix/music-gen-token-stopping

@dysangel dysangel commented Mar 21, 2026

Problem

I was trying to get the music UI working on macOS. bf16 inference is not hardware accelerated on Metal, so I switched to quantised models.

The 'Plan' button did not work for me: Music Phase 1 planning generation always ran on until it hit the KV cache limit.

Root Cause

After the FSM guides the model through the metadata fields (bpm, caption, duration, keyscale, language, timesignature) and forces TOKEN_THINK_END, it transitions to the CODES state and disables itself. The model should then emit TOKEN_IM_END on its own to stop, but in some cases (especially with quantized models) it does not do so promptly, and generation continues until the KV cache is exhausted.

Solution

Add a safety check that forces TOKEN_IM_END when the generation reaches the token limit. This prevents KV cache exhaustion while still allowing the model to generate normally (including any lyrics after the thinking block).

Changes

  • Modified otherarch/acestep/ace-qwen3.cpp to add a safety check before adding each token
  • Forces TOKEN_IM_END when gen_tokens.size() >= max_new_tokens - 1
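
The check described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the actual code in otherarch/acestep/ace-qwen3.cpp: the helper names `apply_token_limit` and `generate`, the `sample_next` callback, and the numeric value of `TOKEN_IM_END` are all assumptions made for the example. Only the condition `gen_tokens.size() >= max_new_tokens - 1` comes from this PR, written here as `n_generated + 1 >= max_new_tokens` to avoid unsigned underflow.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical token id, for illustration only; the real value of
// TOKEN_IM_END is defined in the project sources.
constexpr int TOKEN_IM_END = 2;

// Returns the token that should actually be appended. If appending one more
// token would fill the budget, the sampled token is replaced by TOKEN_IM_END.
// Equivalent to the PR's `gen_tokens.size() >= max_new_tokens - 1`.
int apply_token_limit(int sampled, std::size_t n_generated,
                      std::size_t max_new_tokens) {
    assert(max_new_tokens > 0);
    if (n_generated + 1 >= max_new_tokens) {
        return TOKEN_IM_END;
    }
    return sampled;
}

// Minimal generation loop. `sample_next` stands in for the model's sampler
// (an assumption of this sketch, not the project's API).
std::vector<int> generate(const std::function<int()>& sample_next,
                          std::size_t max_new_tokens) {
    std::vector<int> gen_tokens;
    while (gen_tokens.size() < max_new_tokens) {
        // Safety check runs before each token is added.
        int tok = apply_token_limit(sample_next(), gen_tokens.size(),
                                    max_new_tokens);
        gen_tokens.push_back(tok);
        if (tok == TOKEN_IM_END) {
            break;  // normal stop, whether sampled or forced
        }
    }
    return gen_tokens;
}
```

With this shape, a sampler that never produces TOKEN_IM_END still yields a sequence that ends in TOKEN_IM_END at exactly `max_new_tokens` tokens, while a model that stops on its own earlier is left untouched.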

Impact

  • Prevents KV cache exhaustion errors
  • Makes music generation more robust across all model types

Co-Authored-By: GLM-5

In Phase 1 lyrics mode, the FSM transitions to CODES state after
TOKEN_THINK_END and disables itself. The quantized Q4_K_M model was
not efficiently generating TOKEN_IM_END to stop the generation,
causing it to continue until hitting the 8192 token limit.

This fix forces TOKEN_IM_END to be generated immediately after
TOKEN_THINK_END in lyrics mode, ensuring clean completion of the
planning phase without excessive token generation.

Testing shows generation now completes in ~500ms instead of 80+
seconds with timeout errors.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@dysangel dysangel changed the title from "Fix music generation token stopping for quantized models" to "Fix music generation token stopping" Mar 21, 2026
dysangel and others added 2 commits March 21, 2026 10:33
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Instead of forcing TOKEN_IM_END immediately after TOKEN_THINK_END,
only force it when we've reached the token limit. This allows the model
to generate lyrics after the thinking block while still preventing KV
cache exhaustion.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@dysangel dysangel marked this pull request as draft March 21, 2026 10:52
@dysangel dysangel marked this pull request as ready for review March 21, 2026 11:01