[Detail Bug] Prompt truncation uses limits from unused provider, forcing 7K cap on high-context models #40

@detail-app

Description
Summary

  • Context: OpenAiRequestFactory is responsible for building request payloads and managing prompt truncation for OpenAI-compatible providers.
  • Bug: The truncatePromptForCompletion method derives its token limit from both configured models (openaiModel and githubModelsChatModel), applying the most restrictive of the two limits regardless of which model will actually serve the request.
  • Actual vs. expected: The prompt is truncated to 7,000 tokens if either model is from the GPT-5 family, even if the primary provider supports 100,000 tokens. Expected behavior is to truncate based on the specific model used for the request.
  • Impact: Users of high-context models (like gpt-4o) experience severe, unnecessary truncation (losing up to 93,000 tokens of context), because the default configuration for the unused provider includes a GPT-5-family model.
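The core of the bug can be boiled down to how the limit is selected. The sketch below is a simplified stand-in (constants and the substring-based family check are assumptions, not the project's real helpers), contrasting the aggregated check with one scoped to the active model:

```java
// Simplified stand-in for the limit-selection logic; constants and
// isGpt5Family are illustrative assumptions, not the project's real code.
public class TruncationLimitDemo {
    static final int MAX_TOKENS_GPT5_INPUT = 7_000;
    static final int MAX_TOKENS_DEFAULT_INPUT = 100_000;

    static boolean isGpt5Family(String modelId) {
        return modelId.contains("gpt-5");
    }

    // Buggy shape: aggregates the family check across BOTH configured models.
    static int buggyLimit(String openaiModelId, String githubModelId) {
        boolean gpt5Family = isGpt5Family(openaiModelId) || isGpt5Family(githubModelId);
        return gpt5Family ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
    }

    // Expected shape: the limit depends only on the model actually used.
    static int scopedLimit(String modelId) {
        return isGpt5Family(modelId) ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
    }
}
```

With the default configuration, `buggyLimit("gpt-4o", "openai/gpt-5")` returns 7,000 even though `scopedLimit("gpt-4o")` would return 100,000.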

Code with bug

public String truncatePromptForCompletion(String prompt) {
    if (prompt == null || prompt.isEmpty()) {
        return prompt;
    }

    String openaiModelId = normalizedModelId(false);
    String githubModelId = normalizedModelId(true);
    // BUG 🔴 Aggregates family check across BOTH configured models
    boolean gpt5Family = isGpt5Family(openaiModelId) || isGpt5Family(githubModelId);
    boolean reasoningModel = gpt5Family
            || canonicalModelName(openaiModelId).startsWith("o")
            || canonicalModelName(githubModelId).startsWith("o");

    // BUG 🔴 Uses the most restrictive limit if ANY reasoning/gpt5 model is configured
    int tokenLimit = reasoningModel ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
    String truncatedPrompt = chunker.keepLastTokens(prompt, tokenLimit);

    if (truncatedPrompt.length() < prompt.length()) {
        // BUG 🔴 Might show GPT-5 notice even when using a non-GPT-5 model
        String truncationNotice = gpt5Family ? TRUNCATION_NOTICE_GPT5 : TRUNCATION_NOTICE_GENERIC;
        return truncationNotice + truncatedPrompt;
    }

    return prompt;
}

Evidence

  1. Reproduction Test: A test case was created where OPENAI_MODEL was set to gpt-4o (high context) and GITHUB_MODELS_CHAT_MODEL was left as default (openai/gpt-5, low context). Despite using gpt-4o for the completion, the prompt was truncated to 7,000 tokens and prepended with a GPT-5 truncation notice.
  2. Default Values: DEFAULT_GITHUB_MODELS_MODEL is openai/gpt-5. This means that by default, gpt5Family will always be true in truncatePromptForCompletion, forcing a 7,000 token limit on all completion requests across the entire application unless both providers are explicitly reconfigured.
  3. Contrast with Streaming: The prepareStreamingRequest method correctly identifies the tokenLimit based on the provided ApiProvider, ensuring that truncation is correctly scoped to the model actually being used.
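The reproduction in Evidence item 1 can be sketched end to end. This is a hedged, self-contained approximation: the configuration constants mirror the test setup, and the char-per-token `keepLastTokens` is a crude stand-in for the real chunker, not its actual behavior:

```java
// Hedged reproduction sketch of Evidence item 1. The model constants mirror
// the test configuration; keepLastTokens is a crude one-char-per-token
// stand-in for the real chunker.
public class TruncationBugRepro {
    static final int MAX_TOKENS_GPT5_INPUT = 7_000;
    static final int MAX_TOKENS_DEFAULT_INPUT = 100_000;
    static final String TRUNCATION_NOTICE_GPT5 = "[Context truncated due to GPT-5 8K input limit] ";

    // Reproduction setup: gpt-4o as primary, GitHub Models left at its default.
    static final String OPENAI_MODEL = "gpt-4o";
    static final String GITHUB_MODELS_CHAT_MODEL = "openai/gpt-5";

    static boolean isGpt5Family(String modelId) { return modelId.contains("gpt-5"); }

    static String keepLastTokens(String prompt, int limit) {
        return prompt.length() <= limit ? prompt : prompt.substring(prompt.length() - limit);
    }

    // Buggy logic: the limit aggregates across BOTH configured models.
    static String truncatePromptForCompletion(String prompt) {
        boolean gpt5Family = isGpt5Family(OPENAI_MODEL) || isGpt5Family(GITHUB_MODELS_CHAT_MODEL);
        int tokenLimit = gpt5Family ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
        String truncated = keepLastTokens(prompt, tokenLimit);
        return truncated.length() < prompt.length() ? TRUNCATION_NOTICE_GPT5 + truncated : prompt;
    }
}
```

Feeding an 8,000-"token" prompt through this logic reproduces the symptom: the completion targets gpt-4o, yet the unused GitHub Models default drags the limit down to 7,000 and prepends the GPT-5 notice.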

Why has this bug gone undetected?

  1. Subtle Truncation: Truncation often goes unnoticed in chat applications unless the input is exceptionally long or the user is specifically monitoring token usage or context retention.
  2. Confusing Notice: The truncation notice [Context truncated due to GPT-5 8K input limit] might be dismissed by users as a general limitation of the "system" or they might assume the backend is using a different model than they expected.
  3. Mock Environment: The use of hypothetical "gpt-5" models as defaults suggests this part of the codebase might be tested primarily against these models, where the bug's effect (restricting to 7k) matches the model's actual limit.

Recommended fix

Refactor truncatePromptForCompletion to accept the ApiProvider as a parameter, or move the truncation logic into buildCompletionRequest where the provider is already known.

// Recommended fix in buildCompletionRequest 🟢
public ResponseCreateParams buildCompletionRequest(
        String prompt, double temperature, RateLimitService.ApiProvider provider) {
    boolean useGitHubModels = provider == RateLimitService.ApiProvider.GITHUB_MODELS;
    String modelId = normalizedModelId(useGitHubModels);
    
    // Truncate based on the resolved modelId here 🟢
    int tokenLimit = (isGpt5Family(modelId) || canonicalModelName(modelId).startsWith("o")) 
        ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
    String truncatedPrompt = chunker.keepLastTokens(prompt, tokenLimit);
    
    return buildResponseParams(truncatedPrompt, temperature, modelId);
}
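The first option mentioned above (passing the ApiProvider into truncatePromptForCompletion) could look like the sketch below. This is a simplified stand-in, not the real class: the enum mirrors RateLimitService.ApiProvider, modelFor approximates normalizedModelId, and keepLastTokens fakes the chunker with one char per token:

```java
// Sketch of option 1: provider-aware truncation. The enum, modelFor, and
// keepLastTokens are simplified stand-ins for the project's real types.
public class ProviderScopedTruncation {
    enum ApiProvider { OPENAI, GITHUB_MODELS }

    static final int MAX_TOKENS_GPT5_INPUT = 7_000;
    static final int MAX_TOKENS_DEFAULT_INPUT = 100_000;
    static final String TRUNCATION_NOTICE_GPT5 = "[Context truncated due to GPT-5 8K input limit] ";
    static final String TRUNCATION_NOTICE_GENERIC = "[Context truncated] ";

    // Stand-in for normalizedModelId(boolean): resolves only the active provider's model.
    static String modelFor(ApiProvider provider) {
        return provider == ApiProvider.GITHUB_MODELS ? "openai/gpt-5" : "gpt-4o";
    }

    static boolean isGpt5Family(String modelId) { return modelId.contains("gpt-5"); }

    static String keepLastTokens(String prompt, int limit) {
        return prompt.length() <= limit ? prompt : prompt.substring(prompt.length() - limit);
    }

    static String truncatePromptForCompletion(String prompt, ApiProvider provider) {
        if (prompt == null || prompt.isEmpty()) return prompt;
        // Only the model that will serve THIS request determines the limit.
        String modelId = modelFor(provider);
        boolean gpt5Family = isGpt5Family(modelId);
        int tokenLimit = (gpt5Family || modelId.startsWith("o"))
                ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
        String truncated = keepLastTokens(prompt, tokenLimit);
        if (truncated.length() < prompt.length()) {
            return (gpt5Family ? TRUNCATION_NOTICE_GPT5 : TRUNCATION_NOTICE_GENERIC) + truncated;
        }
        return prompt;
    }
}
```

This mirrors the scoping that prepareStreamingRequest already gets right: an OPENAI request against gpt-4o passes through untouched, while the same prompt on GITHUB_MODELS (gpt-5) is truncated with the matching notice.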

Related bugs

  • In buildResponseParams, reasoning effort and maxOutputTokens are only applied if gpt5Family is true. However, "o" models (like o1-preview) are also identified as reasoningModel but do not get these settings applied, even though they support them.
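A hedged sketch of the related fix: gate the reasoning settings on the broader reasoning-model check rather than on gpt5Family alone. The ReasoningSettings record, its field values, and the prefix-based checks are illustrative assumptions, not the project's real types:

```java
// Illustrative sketch for the related bug. ReasoningSettings and its values
// are assumptions; the real code applies effort/maxOutputTokens elsewhere.
public class ReasoningSettingsDemo {
    record ReasoningSettings(String effort, int maxOutputTokens) {}

    static boolean isGpt5Family(String modelId) { return modelId.contains("gpt-5"); }

    static boolean isReasoningModel(String modelId) {
        // Simplified: the real code canonicalizes the id before startsWith("o").
        return isGpt5Family(modelId) || modelId.startsWith("o");
    }

    // Buggy shape: "o" models (e.g. o1-preview) get no reasoning settings.
    static ReasoningSettings buggy(String modelId) {
        return isGpt5Family(modelId) ? new ReasoningSettings("medium", 4_096) : null;
    }

    // Fixed shape: any reasoning model gets them.
    static ReasoningSettings fixed(String modelId) {
        return isReasoningModel(modelId) ? new ReasoningSettings("medium", 4_096) : null;
    }
}
```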
