
Continue extension fails or returns zero tokens when defaultCompletionOptions.maxTokens is set near 32k and suggestion #11963

@Kuroasashinz


Before submitting your bug report

Relevant environment info

- OS: `Windows 11`
- Continue version: `v1.2.22`
- IDE version: `1.113.0`
- Model: `qwen2.5-coder-0.5b-instruct`
- config:
  
  - name: Apply_qwen2.5-coder-0.5b-instruct
    provider: lmstudio
    model: qwen2.5-coder-0.5b-instruct
    defaultCompletionOptions:
      maxTokens: 31744 # tested: 8192, 30720, 31744 work; 32256, 32768 error
    roles:
      - apply
    apiBase: http://localhost:1234/v1/
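
To separate what the extension does from what the backend does, here is a minimal probe sketch, assuming LM Studio is serving its OpenAI-compatible API at the apiBase above; the prompt text, the `RUN_PROBE` flag, and the helper names are illustrative, not part of Continue:

```typescript
// Sketch: probe the backend directly with each tested max_tokens value,
// bypassing the Continue extension entirely. Assumes LM Studio's
// OpenAI-compatible /v1/completions endpoint (apiBase from the config above).

interface CompletionBody {
  model: string;
  prompt: string;
  max_tokens: number;
}

// Pure helper: build the request body, so the serialized integer can be
// inspected independently of any network call.
function buildBody(maxTokens: number): CompletionBody {
  return {
    model: "qwen2.5-coder-0.5b-instruct",
    prompt: "def add(a, b):", // illustrative prompt
    max_tokens: maxTokens,
  };
}

async function probe(maxTokens: number): Promise<void> {
  const res = await fetch("http://localhost:1234/v1/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildBody(maxTokens)),
  });
  console.log(maxTokens, res.status, await res.text());
}

const RUN_PROBE = false; // flip to true with LM Studio running locally
if (RUN_PROBE) {
  for (const n of [8192, 30720, 31744, 32256, 32767, 32768]) {
    void probe(n);
  }
}
```

If the backend accepts 32768 directly but the extension's request fails, the limit is being imposed (or mangled) on the extension side.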

Description


Environment

  • VSCode: 1.113.0
  • Continue extension: v1.2.22
  • OS: Windows 11
  • Model used: qwen2.5-coder-0.5b-instruct (advertised Context Length: 32,768)

Reproduction steps

  1. Turn on Enable Console in Continue → Settings, then check the Continue output channel and Developer Tools (Help → Toggle Developer Tools) for logs.
  2. Add or modify defaultCompletionOptions in config.yaml with a large maxTokens value (examples below).
  3. Trigger a Complete via Continue in VSCode.

Observed behavior

  • When maxTokens is set to 32768 (32767 and 32256 also failed in tests), Continue shows the configured value in Options but the result contains zero tokens:
Type: Complete
Result: Success
Prompt Tokens: 0
Generated Tokens: 0
ThinkingTokens: 0
Total Time: 2.04s
Options
{
  "maxTokens": 32768,
  "model": "qwen2.5-coder-0.5b-instruct",
  "raw": true,
  "reasoning": false
}
  • When maxTokens is set to 30720, 31744, or 8192, generation succeeds and tokens are produced (when working correctly the output should exceed 8,000 tokens; the lower count below followed several apply and cancellation attempts):
Type: Complete
Result: Success
Prompt Tokens: 1393
Generated Tokens: 1225
...
Options
{
  "maxTokens": 30720,
  "model": "qwen2.5-coder-0.5b-instruct",
  "raw": true,
  "reasoning": false
}
  • Without defaultCompletionOptions set, default maxTokens = 4096. For prompts longer than 4096 tokens the generation is truncated/stopped and the extension does not notify the user.
  • With defaultCompletionOptions set to large values, some values succeed (e.g., 30720, 31744) while values near 32768 fail silently (e.g., 32256, 32767, 32768) producing Result: Success but Prompt Tokens: 0 and Generated Tokens: 0.
  • This indicates both: (a) the extension/backend enforces a different effective maximum than the model advertises, and (b) the extension does not surface clear validation errors to the user.

Listener leak observed

  • While reproducing apply, Developer Tools shows warnings like:
[042] potential listener LEAK detected, having 571 listeners already. MOST frequent listener (172): Error
Error
ERR potential listener LEAK detected, dominated: Error
  • This appears each time apply is triggered and the number of listeners grows. Observed consequences include memory growth and silent failures (zero-token results).
  • I can attach the full Developer Tools Console stack trace and Network trace if helpful.

Prompt content incorrect after repeated apply

  • Initially, apply shows correct Prompt content containing both the original code and the suggested code. After repeated tests, or once the listener warnings appear, the Continue Console shows incorrect or empty Prompt content that omits the original and suggested code, or contains only part of it.
  • Reproduction pattern:
    1. Trigger apply several times.
    2. Early runs show correct Prompt content.
    3. Later runs show Prompt missing or replaced with incorrect content and generation returns zero tokens.
  • Likely causes: race condition in prompt assembly, shared mutable state being overwritten, or multiple event handlers modifying/clearing the prompt before send.
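
The snapshot idea in the "Likely causes" bullet can be illustrated with a minimal sketch; all names here are hypothetical, not Continue's actual internals. Reading shared state after an await loses the prompt when a second apply clears it, while a local snapshot taken up front survives:

```typescript
// Sketch of the suspected race: shared mutable prompt state read *after*
// an await can be cleared by another apply invocation. Names hypothetical.

let sharedPrompt = "";               // shared mutable state (buggy pattern)

async function sendBuggy(send: (p: string) => Promise<void>): Promise<void> {
  await Promise.resolve();           // stand-in for async diff computation
  await send(sharedPrompt);          // may read "" if another run cleared it
}

async function sendSafe(send: (p: string) => Promise<void>): Promise<void> {
  const snapshot = sharedPrompt;     // immutable local copy taken up front
  await Promise.resolve();
  await send(snapshot);              // unaffected by later mutation
}

async function demo(): Promise<[string, string]> {
  let buggySent = "?";
  let safeSent = "?";
  sharedPrompt = "original + suggested code";
  const b = sendBuggy(async (p) => { buggySent = p; });
  const s = sendSafe(async (p) => { safeSent = p; });
  sharedPrompt = "";                 // a second apply clears the shared state
  await Promise.all([b, s]);
  return [buggySent, safeSent];
}
```

In this sketch the buggy path sends an empty prompt, matching the observed empty Prompt content and zero-token results.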

Additional observations and test matrix

  • Values that work in my tests: 8192, 30720, 31744.
  • Values that fail or produce zero-token result: 32256, 32767, 32768.
  • Hypothesis: there may be an internal buffer or effective maximum (e.g., 32,768 − 1024 = 31,744), a schema/validation boundary, or integer/edge-case handling near 32k. The listener leak suggests repeated registration per apply invocation causing resource exhaustion or short-circuit.
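
The buffer hypothesis can be checked mechanically. A small sketch; the 1024-token buffer is a guess, only the observed pass/fail values come from the tests above:

```typescript
// Sketch of the buffer hypothesis: effective maximum = advertised context
// length minus a reserved buffer. The buffer size (1024) is a guess that
// happens to fit the observed test matrix exactly.

const CONTEXT_LENGTH = 32768;      // model's advertised context length
const HYPOTHESIZED_BUFFER = 1024;  // assumption, not confirmed

function withinEffectiveMax(maxTokens: number): boolean {
  return maxTokens <= CONTEXT_LENGTH - HYPOTHESIZED_BUFFER; // 31744
}

const observedWorking = [8192, 30720, 31744];
const observedFailing = [32256, 32767, 32768];
```

Every working value is at or below 31744 and every failing value is above it, so the hypothesis is at least consistent with the data, though other boundaries (e.g. a schema maximum) would fit too.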

Suggested debug checks for maintainers

  1. Check contributes.configuration schema or any validation logic for maxTokens — is there a maximum set?
  2. Verify whether the extension or backend reserves a token buffer (e.g., 1024) and whether that effective maximum is documented or enforced at config load time.
  3. Inspect integer handling and serialization for maxTokens to rule out overflow/edge-case bugs.
  4. Investigate repeated event listener registration during apply; check for missing dispose/removeListener or missing context.subscriptions usage.
  5. Add debug logs that print the prompt immediately before sending the request and verify the network payload.
  6. Ensure that when a config value is invalid or rejected, the extension returns a clear error message (in Output/Console and as a VSCode notification) rather than silent success.
  7. Provide the actual effective maximum in Settings UI or validate config.yaml on load with a clear message indicating allowed range.
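
For checks 6 and 7, a sketch of what load-time validation could look like; the function name and limits are illustrative, not Continue's actual schema:

```typescript
// Sketch of config-load validation: reject out-of-range maxTokens with a
// descriptive message instead of silently succeeding. Illustrative only.

interface ValidationResult {
  ok: boolean;
  message?: string;
}

function validateMaxTokens(
  maxTokens: number,
  contextLength: number,
  reservedBuffer: number,
): ValidationResult {
  const effectiveMax = contextLength - reservedBuffer;
  if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
    return { ok: false, message: `maxTokens must be a positive integer, got ${maxTokens}` };
  }
  if (maxTokens > effectiveMax) {
    return {
      ok: false,
      message:
        `maxTokens ${maxTokens} exceeds effective maximum ${effectiveMax} ` +
        `(context ${contextLength} minus reserved buffer ${reservedBuffer})`,
    };
  }
  return { ok: true };
}
```

The returned message could be surfaced both in the Output channel and as a VS Code notification at config load time.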

Logs to attach

  • Continue Output channel logs (snippets above).
  • Developer Tools Console and Network traces captured while reproducing the issue (request payloads, backend responses, and listener leak stack trace).
  • config.yaml snippet used (remove any sensitive keys).

Suggested fix / UX improvements

  • Validate defaultCompletionOptions on load and show a clear VSCode notification if values are out of range.
  • When a request is rejected or short-circuited, log a descriptive error in the extension Output channel and Developer Tools console.
  • Fix repeated listener registration and ensure proper disposal of event handlers.
  • Ensure prompt assembly uses a local immutable snapshot of editor content before any async operations.
  • Document the effective maximum maxTokens for each supported model or any backend buffer that reduces the usable limit from the model's advertised context length.
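
The listener fix can be sketched with Node's EventEmitter standing in for VS Code's event API; names are hypothetical, and in a real extension the dispose function would go into context.subscriptions:

```typescript
// Sketch: register once and dispose, instead of re-registering on every
// apply. EventEmitter stands in for VS Code's event API.
import { EventEmitter } from "node:events";

const bus = new EventEmitter();

// Buggy pattern: a new anonymous listener on every apply, never removed,
// reproducing the listener-count growth seen in the warnings.
function leakyApply(): void {
  bus.on("result", () => {});
}

// Fixed pattern: keep a reference and return a dispose function
// (cf. vscode.Disposable pushed onto context.subscriptions).
function disposableApply(): () => void {
  const handler = (): void => {};
  bus.on("result", handler);
  return () => bus.off("result", handler);
}
```

Calling leakyApply repeatedly grows the listener count without bound, while disposableApply lets each apply clean up after itself.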

Minimal reproduction checklist for maintainers

  • Reproduce with the provided config.yaml snippet and the qwen2.5-coder-0.5b-instruct model.
  • Compare behavior for maxTokens values: 30720, 31744, 32256, 32767, 32768.
  • Capture the exact request payload sent by the extension and the backend response.
  • Capture Developer Tools Console stack trace showing the listener leak.

Labels suggested

  • bug, needs-triage, extension

My English isn't perfect; some content was AI-assisted.

To reproduce

Reproduction steps

  1. Turn on Enable Console in Continue → Settings, then check the Continue output channel and Developer Tools (Help → Toggle Developer Tools) for logs.
  2. Add or modify defaultCompletionOptions in config.yaml with a large maxTokens value.
  3. Trigger a Complete (Apply) via Continue in VSCode.
  4. Repeat with different defaultCompletionOptions values in config.yaml.

Log output

I can't find `/core.log`.
In the initial test the result was cancelled; in subsequent tests the result was Success, except for 32256, 32767, and 32768, which either failed or produced zero tokens (the result still reports Success despite the zero-token output).


    Labels

    area:configuration (Relates to configuration options)
    ide:vscode (Relates specifically to VS Code extension)
    kind:bug (Indicates an unexpected problem or unintended behavior)
    os:windows (Happening specifically on Windows)
