
Continue extension fails or returns zero tokens when defaultCompletionOptions.maxTokens is set near 32k and suggestion #11963

@Kuroasashinz


Before submitting your bug report

Relevant environment info

- OS: `Windows 11`
- Continue version: `v1.2.22`
- IDE version: `1.113.0`
- Model: `qwen2.5-coder-0.5b-instruct`
- config:
  
  - name: Apply_qwen2.5-coder-0.5b-instruct
    provider: lmstudio
    model: qwen2.5-coder-0.5b-instruct
    defaultCompletionOptions:
      maxTokens: 31744 # tested: 8192, 30720, 31744 work; 32256, 32768 error
    roles:
      - apply
    apiBase: http://localhost:1234/v1/
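
To separate what the extension does from what the backend does, here is a minimal probe sketch, assuming LM Studio is serving its OpenAI-compatible API at the apiBase above; the prompt text, the `RUN_PROBE` flag, and the helper names are illustrative, not part of Continue:

```typescript
// Sketch: probe the backend directly with each tested max_tokens value,
// bypassing the Continue extension entirely. Assumes LM Studio's
// OpenAI-compatible /v1/completions endpoint (apiBase from the config above).

interface CompletionBody {
  model: string;
  prompt: string;
  max_tokens: number;
}

// Pure helper: build the request body, so the serialized integer can be
// inspected independently of any network call.
function buildBody(maxTokens: number): CompletionBody {
  return {
    model: "qwen2.5-coder-0.5b-instruct",
    prompt: "def add(a, b):", // illustrative prompt
    max_tokens: maxTokens,
  };
}

async function probe(maxTokens: number): Promise<void> {
  const res = await fetch("http://localhost:1234/v1/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildBody(maxTokens)),
  });
  console.log(maxTokens, res.status, await res.text());
}

const RUN_PROBE = false; // flip to true with LM Studio running locally
if (RUN_PROBE) {
  for (const n of [8192, 30720, 31744, 32256, 32767, 32768]) {
    void probe(n);
  }
}
```

If the backend accepts 32768 directly but the extension's request fails, the limit is being imposed (or mangled) on the extension side.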

Description


Environment

  • VSCode: 1.113.0
  • Continue extension: v1.2.22
  • OS: Windows 11
  • Model used: qwen2.5-coder-0.5b-instruct (advertised Context Length: 32,768)

Reproduction steps

  1. Turn on Enable Console in Continue → Settings, then check the Continue output channel and Developer Tools (Help → Toggle Developer Tools) for logs.
  2. Add or modify defaultCompletionOptions in config.yaml with a large maxTokens value (examples below).
  3. Trigger a Complete via Continue in VSCode.

Observed behavior

  • When maxTokens is set to 32768 (32767 and 32256 also failed in tests), Continue shows the configured value in Options but the result contains zero tokens:
Type: Complete
Result: Success
Prompt Tokens: 0
Generated Tokens: 0
ThinkingTokens: 0
Total Time: 2.04s
Options
{
  "maxTokens": 32768,
  "model": "qwen2.5-coder-0.5b-instruct",
  "raw": true,
  "reasoning": false
}
  • When maxTokens is set to 30720, 31744, or 8192, generation succeeds and tokens are produced (when working correctly the output should exceed 8,000 tokens; the lower count below followed several apply and cancellation attempts):
Type: Complete
Result: Success
Prompt Tokens: 1393
Generated Tokens: 1225
...
Options
{
  "maxTokens": 30720,
  "model": "qwen2.5-coder-0.5b-instruct",
  "raw": true,
  "reasoning": false
}
  • Without defaultCompletionOptions set, default maxTokens = 4096. For prompts longer than 4096 tokens the generation is truncated/stopped and the extension does not notify the user.
  • With defaultCompletionOptions set to large values, some values succeed (e.g., 30720, 31744) while values near 32768 fail silently (e.g., 32256, 32767, 32768) producing Result: Success but Prompt Tokens: 0 and Generated Tokens: 0.
  • This indicates both: (a) the extension/backend enforces a different effective maximum than the model advertises, and (b) the extension does not surface clear validation errors to the user.

Listener leak observed

  • While reproducing apply, Developer Tools shows warnings like:
[042] potential listener LEAK detected, having 571 listeners already. MOST frequent listener (172): Error
Error
ERR potential listener LEAK detected, dominated: Error
  • This appears each time apply is triggered and the number of listeners grows. Observed consequences include memory growth and silent failures (zero-token results).
  • I can attach the full Developer Tools Console stack trace and Network trace if helpful.

Prompt content incorrect after repeated apply

  • Initially, apply shows correct Prompt content containing both the original code and the suggested code. After repeated tests, or once the listener warnings appear, the Continue Console shows incorrect or empty Prompt content that omits the original and suggested code, or contains only part of it.
  • Reproduction pattern:
    1. Trigger apply several times.
    2. Early runs show correct Prompt content.
    3. Later runs show Prompt missing or replaced with incorrect content and generation returns zero tokens.
  • Likely causes: race condition in prompt assembly, shared mutable state being overwritten, or multiple event handlers modifying/clearing the prompt before send.
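
The snapshot idea in the "Likely causes" bullet can be illustrated with a minimal sketch; all names here are hypothetical, not Continue's actual internals. Reading shared state after an await loses the prompt when a second apply clears it, while a local snapshot taken up front survives:

```typescript
// Sketch of the suspected race: shared mutable prompt state read *after*
// an await can be cleared by another apply invocation. Names hypothetical.

let sharedPrompt = "";               // shared mutable state (buggy pattern)

async function sendBuggy(send: (p: string) => Promise<void>): Promise<void> {
  await Promise.resolve();           // stand-in for async diff computation
  await send(sharedPrompt);          // may read "" if another run cleared it
}

async function sendSafe(send: (p: string) => Promise<void>): Promise<void> {
  const snapshot = sharedPrompt;     // immutable local copy taken up front
  await Promise.resolve();
  await send(snapshot);              // unaffected by later mutation
}

async function demo(): Promise<[string, string]> {
  let buggySent = "?";
  let safeSent = "?";
  sharedPrompt = "original + suggested code";
  const b = sendBuggy(async (p) => { buggySent = p; });
  const s = sendSafe(async (p) => { safeSent = p; });
  sharedPrompt = "";                 // a second apply clears the shared state
  await Promise.all([b, s]);
  return [buggySent, safeSent];
}
```

In this sketch the buggy path sends an empty prompt, matching the observed empty Prompt content and zero-token results.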

Additional observations and test matrix

  • Values that work in my tests: 8192, 30720, 31744.
  • Values that fail or produce zero-token result: 32256, 32767, 32768.
  • Hypothesis: there may be an internal buffer or effective maximum (e.g., 32,768 − 1024 = 31,744), a schema/validation boundary, or integer/edge-case handling near 32k. The listener leak suggests repeated registration per apply invocation causing resource exhaustion or short-circuit.
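
The buffer hypothesis can be checked mechanically. A small sketch; the 1024-token buffer is a guess, only the observed pass/fail values come from the tests above:

```typescript
// Sketch of the buffer hypothesis: effective maximum = advertised context
// length minus a reserved buffer. The buffer size (1024) is a guess that
// happens to fit the observed test matrix exactly.

const CONTEXT_LENGTH = 32768;      // model's advertised context length
const HYPOTHESIZED_BUFFER = 1024;  // assumption, not confirmed

function withinEffectiveMax(maxTokens: number): boolean {
  return maxTokens <= CONTEXT_LENGTH - HYPOTHESIZED_BUFFER; // 31744
}

const observedWorking = [8192, 30720, 31744];
const observedFailing = [32256, 32767, 32768];
```

Every working value is at or below 31744 and every failing value is above it, so the hypothesis is at least consistent with the data, though other boundaries (e.g. a schema maximum) would fit too.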

Suggested debug checks for maintainers

  1. Check contributes.configuration schema or any validation logic for maxTokens — is there a maximum set?
  2. Verify whether the extension or backend reserves a token buffer (e.g., 1024) and whether that effective maximum is documented or enforced at config load time.
  3. Inspect integer handling and serialization for maxTokens to rule out overflow/edge-case bugs.
  4. Investigate repeated event listener registration during apply; check for missing dispose/removeListener or missing context.subscriptions usage.
  5. Add debug logs that print the prompt immediately before sending the request and verify the network payload.
  6. Ensure that when a config value is invalid or rejected, the extension returns a clear error message (in Output/Console and as a VSCode notification) rather than silent success.
  7. Provide the actual effective maximum in Settings UI or validate config.yaml on load with a clear message indicating allowed range.
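
For checks 6 and 7, a sketch of what load-time validation could look like; the function name and limits are illustrative, not Continue's actual schema:

```typescript
// Sketch of config-load validation: reject out-of-range maxTokens with a
// descriptive message instead of silently succeeding. Illustrative only.

interface ValidationResult {
  ok: boolean;
  message?: string;
}

function validateMaxTokens(
  maxTokens: number,
  contextLength: number,
  reservedBuffer: number,
): ValidationResult {
  const effectiveMax = contextLength - reservedBuffer;
  if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
    return { ok: false, message: `maxTokens must be a positive integer, got ${maxTokens}` };
  }
  if (maxTokens > effectiveMax) {
    return {
      ok: false,
      message:
        `maxTokens ${maxTokens} exceeds effective maximum ${effectiveMax} ` +
        `(context ${contextLength} minus reserved buffer ${reservedBuffer})`,
    };
  }
  return { ok: true };
}
```

The returned message could be surfaced both in the Output channel and as a VS Code notification at config load time.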

Logs to attach

  • Continue Output channel logs (snippets above).
  • Developer Tools Console and Network traces captured while reproducing the issue (request payloads, backend responses, and listener leak stack trace).
  • config.yaml snippet used (remove any sensitive keys).

Suggested fix / UX improvements

  • Validate defaultCompletionOptions on load and show a clear VSCode notification if values are out of range.
  • When a request is rejected or short-circuited, log a descriptive error in the extension Output channel and Developer Tools console.
  • Fix repeated listener registration and ensure proper disposal of event handlers.
  • Ensure prompt assembly uses a local immutable snapshot of editor content before any async operations.
  • Document the effective maximum maxTokens for each supported model or any backend buffer that reduces the usable limit from the model's advertised context length.
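
The listener fix can be sketched with Node's EventEmitter standing in for VS Code's event API; names are hypothetical, and in a real extension the dispose function would go into context.subscriptions:

```typescript
// Sketch: register once and dispose, instead of re-registering on every
// apply. EventEmitter stands in for VS Code's event API.
import { EventEmitter } from "node:events";

const bus = new EventEmitter();

// Buggy pattern: a new anonymous listener on every apply, never removed,
// reproducing the listener-count growth seen in the warnings.
function leakyApply(): void {
  bus.on("result", () => {});
}

// Fixed pattern: keep a reference and return a dispose function
// (cf. vscode.Disposable pushed onto context.subscriptions).
function disposableApply(): () => void {
  const handler = (): void => {};
  bus.on("result", handler);
  return () => bus.off("result", handler);
}
```

Calling leakyApply repeatedly grows the listener count without bound, while disposableApply lets each apply clean up after itself.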

Minimal reproduction checklist for maintainers

  • Reproduce with the provided config.yaml snippet and the qwen2.5-coder-0.5b-instruct model.
  • Compare behavior for maxTokens values: 30720, 31744, 32256, 32767, 32768.
  • Capture the exact request payload sent by the extension and the backend response.
  • Capture Developer Tools Console stack trace showing the listener leak.

Labels suggested

  • bug, needs-triage, extension

My English isn't perfect; some content was AI-assisted.

To reproduce

Reproduction steps

  1. Turn on Enable Console in Continue → Settings, then check the Continue output channel and Developer Tools (Help → Toggle Developer Tools) for logs.
  2. Add or modify defaultCompletionOptions in config.yaml with a large maxTokens value.
  3. Trigger a Complete (Apply) via Continue in VSCode.
  4. Repeat with different defaultCompletionOptions values in config.yaml.

Log output

I can't find `/core.log`.
In the initial test the result was cancelled; in subsequent tests the result was Success, except for 32256, 32767, and 32768, which either failed or produced zero tokens (the result still reports Success despite the zero-token output).


    Labels

    area:configuration (Relates to configuration options)
    ide:vscode (Relates specifically to VS Code extension)
    kind:bug (Indicates an unexpected problem or unintended behavior)
    os:windows (Happening specifically on Windows)
