[Detail Bug] Prompt truncation uses limits from unused provider, forcing 7K cap on high-context models #40

@detail-app

Description
Summary

  • Context: OpenAiRequestFactory is responsible for building request payloads and managing prompt truncation for OpenAI-compatible providers.
  • Bug: The truncatePromptForCompletion method derives its token limit from both configured models (openaiModel and githubModelsChatModel), applying the most restrictive of the two limits regardless of which model will actually serve the request.
  • Actual vs. expected: The prompt is truncated to 7,000 tokens if either model is from the GPT-5 family, even if the primary provider supports 100,000 tokens. Expected behavior is to truncate based on the specific model used for the request.
  • Impact: Users of high-context models (like gpt-4o) experience severe, unnecessary truncation (losing up to 93,000 tokens of context), because the default configuration for the unused provider includes a GPT-5-family model.
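The core of the bug can be boiled down to how the limit is selected. The sketch below is a simplified stand-in (constants and the substring-based family check are assumptions, not the project's real helpers), contrasting the aggregated check with one scoped to the active model:

```java
// Simplified stand-in for the limit-selection logic; constants and
// isGpt5Family are illustrative assumptions, not the project's real code.
public class TruncationLimitDemo {
    static final int MAX_TOKENS_GPT5_INPUT = 7_000;
    static final int MAX_TOKENS_DEFAULT_INPUT = 100_000;

    static boolean isGpt5Family(String modelId) {
        return modelId.contains("gpt-5");
    }

    // Buggy shape: aggregates the family check across BOTH configured models.
    static int buggyLimit(String openaiModelId, String githubModelId) {
        boolean gpt5Family = isGpt5Family(openaiModelId) || isGpt5Family(githubModelId);
        return gpt5Family ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
    }

    // Expected shape: the limit depends only on the model actually used.
    static int scopedLimit(String modelId) {
        return isGpt5Family(modelId) ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
    }
}
```

With the default configuration, `buggyLimit("gpt-4o", "openai/gpt-5")` returns 7,000 even though `scopedLimit("gpt-4o")` would return 100,000.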

Code with bug

public String truncatePromptForCompletion(String prompt) {
    if (prompt == null || prompt.isEmpty()) {
        return prompt;
    }

    String openaiModelId = normalizedModelId(false);
    String githubModelId = normalizedModelId(true);
    // BUG 🔴 Aggregates family check across BOTH configured models
    boolean gpt5Family = isGpt5Family(openaiModelId) || isGpt5Family(githubModelId);
    boolean reasoningModel = gpt5Family
            || canonicalModelName(openaiModelId).startsWith("o")
            || canonicalModelName(githubModelId).startsWith("o");

    // BUG 🔴 Uses the most restrictive limit if ANY reasoning/gpt5 model is configured
    int tokenLimit = reasoningModel ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
    String truncatedPrompt = chunker.keepLastTokens(prompt, tokenLimit);

    if (truncatedPrompt.length() < prompt.length()) {
        // BUG 🔴 Might show GPT-5 notice even when using a non-GPT-5 model
        String truncationNotice = gpt5Family ? TRUNCATION_NOTICE_GPT5 : TRUNCATION_NOTICE_GENERIC;
        return truncationNotice + truncatedPrompt;
    }

    return prompt;
}

Evidence

  1. Reproduction Test: A test case was created where OPENAI_MODEL was set to gpt-4o (high context) and GITHUB_MODELS_CHAT_MODEL was left as default (openai/gpt-5, low context). Despite using gpt-4o for the completion, the prompt was truncated to 7,000 tokens and prepended with a GPT-5 truncation notice.
  2. Default Values: DEFAULT_GITHUB_MODELS_MODEL is openai/gpt-5. This means that by default, gpt5Family will always be true in truncatePromptForCompletion, forcing a 7,000 token limit on all completion requests across the entire application unless both providers are explicitly reconfigured.
  3. Contrast with Streaming: The prepareStreamingRequest method correctly identifies the tokenLimit based on the provided ApiProvider, ensuring that truncation is correctly scoped to the model actually being used.
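The reproduction in Evidence item 1 can be sketched end to end. This is a hedged, self-contained approximation: the configuration constants mirror the test setup, and the char-per-token `keepLastTokens` is a crude stand-in for the real chunker, not its actual behavior:

```java
// Hedged reproduction sketch of Evidence item 1. The model constants mirror
// the test configuration; keepLastTokens is a crude one-char-per-token
// stand-in for the real chunker.
public class TruncationBugRepro {
    static final int MAX_TOKENS_GPT5_INPUT = 7_000;
    static final int MAX_TOKENS_DEFAULT_INPUT = 100_000;
    static final String TRUNCATION_NOTICE_GPT5 = "[Context truncated due to GPT-5 8K input limit] ";

    // Reproduction setup: gpt-4o as primary, GitHub Models left at its default.
    static final String OPENAI_MODEL = "gpt-4o";
    static final String GITHUB_MODELS_CHAT_MODEL = "openai/gpt-5";

    static boolean isGpt5Family(String modelId) { return modelId.contains("gpt-5"); }

    static String keepLastTokens(String prompt, int limit) {
        return prompt.length() <= limit ? prompt : prompt.substring(prompt.length() - limit);
    }

    // Buggy logic: the limit aggregates across BOTH configured models.
    static String truncatePromptForCompletion(String prompt) {
        boolean gpt5Family = isGpt5Family(OPENAI_MODEL) || isGpt5Family(GITHUB_MODELS_CHAT_MODEL);
        int tokenLimit = gpt5Family ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
        String truncated = keepLastTokens(prompt, tokenLimit);
        return truncated.length() < prompt.length() ? TRUNCATION_NOTICE_GPT5 + truncated : prompt;
    }
}
```

Feeding an 8,000-"token" prompt through this logic reproduces the symptom: the completion targets gpt-4o, yet the unused GitHub Models default drags the limit down to 7,000 and prepends the GPT-5 notice.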

Why has this bug gone undetected?

  1. Subtle Truncation: Truncation often goes unnoticed in chat applications unless the input is exceptionally long or the user is specifically monitoring token usage or context retention.
  2. Confusing Notice: The truncation notice [Context truncated due to GPT-5 8K input limit] might be dismissed by users as a general limitation of the "system" or they might assume the backend is using a different model than they expected.
  3. Mock Environment: The use of hypothetical "gpt-5" models as defaults suggests this part of the codebase might be tested primarily against these models, where the bug's effect (restricting to 7k) matches the model's actual limit.

Recommended fix

Refactor truncatePromptForCompletion to accept the ApiProvider as a parameter, or move the truncation logic into buildCompletionRequest where the provider is already known.

// Recommended fix in buildCompletionRequest 🟢
public ResponseCreateParams buildCompletionRequest(
        String prompt, double temperature, RateLimitService.ApiProvider provider) {
    boolean useGitHubModels = provider == RateLimitService.ApiProvider.GITHUB_MODELS;
    String modelId = normalizedModelId(useGitHubModels);
    
    // Truncate based on the resolved modelId here 🟢
    int tokenLimit = (isGpt5Family(modelId) || canonicalModelName(modelId).startsWith("o")) 
        ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
    String truncatedPrompt = chunker.keepLastTokens(prompt, tokenLimit);
    
    return buildResponseParams(truncatedPrompt, temperature, modelId);
}
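The first option mentioned above (passing the ApiProvider into truncatePromptForCompletion) could look like the sketch below. This is a simplified stand-in, not the real class: the enum mirrors RateLimitService.ApiProvider, modelFor approximates normalizedModelId, and keepLastTokens fakes the chunker with one char per token:

```java
// Sketch of option 1: provider-aware truncation. The enum, modelFor, and
// keepLastTokens are simplified stand-ins for the project's real types.
public class ProviderScopedTruncation {
    enum ApiProvider { OPENAI, GITHUB_MODELS }

    static final int MAX_TOKENS_GPT5_INPUT = 7_000;
    static final int MAX_TOKENS_DEFAULT_INPUT = 100_000;
    static final String TRUNCATION_NOTICE_GPT5 = "[Context truncated due to GPT-5 8K input limit] ";
    static final String TRUNCATION_NOTICE_GENERIC = "[Context truncated] ";

    // Stand-in for normalizedModelId(boolean): resolves only the active provider's model.
    static String modelFor(ApiProvider provider) {
        return provider == ApiProvider.GITHUB_MODELS ? "openai/gpt-5" : "gpt-4o";
    }

    static boolean isGpt5Family(String modelId) { return modelId.contains("gpt-5"); }

    static String keepLastTokens(String prompt, int limit) {
        return prompt.length() <= limit ? prompt : prompt.substring(prompt.length() - limit);
    }

    static String truncatePromptForCompletion(String prompt, ApiProvider provider) {
        if (prompt == null || prompt.isEmpty()) return prompt;
        // Only the model that will serve THIS request determines the limit.
        String modelId = modelFor(provider);
        boolean gpt5Family = isGpt5Family(modelId);
        int tokenLimit = (gpt5Family || modelId.startsWith("o"))
                ? MAX_TOKENS_GPT5_INPUT : MAX_TOKENS_DEFAULT_INPUT;
        String truncated = keepLastTokens(prompt, tokenLimit);
        if (truncated.length() < prompt.length()) {
            return (gpt5Family ? TRUNCATION_NOTICE_GPT5 : TRUNCATION_NOTICE_GENERIC) + truncated;
        }
        return prompt;
    }
}
```

This mirrors the scoping that prepareStreamingRequest already gets right: an OPENAI request against gpt-4o passes through untouched, while the same prompt on GITHUB_MODELS (gpt-5) is truncated with the matching notice.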

Related bugs

  • In buildResponseParams, reasoning effort and maxOutputTokens are only applied if gpt5Family is true. However, "o" models (like o1-preview) are also identified as reasoningModel but do not get these settings applied, even though they support them.
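A hedged sketch of the related fix: gate the reasoning settings on the broader reasoning-model check rather than on gpt5Family alone. The ReasoningSettings record, its field values, and the prefix-based checks are illustrative assumptions, not the project's real types:

```java
// Illustrative sketch for the related bug. ReasoningSettings and its values
// are assumptions; the real code applies effort/maxOutputTokens elsewhere.
public class ReasoningSettingsDemo {
    record ReasoningSettings(String effort, int maxOutputTokens) {}

    static boolean isGpt5Family(String modelId) { return modelId.contains("gpt-5"); }

    static boolean isReasoningModel(String modelId) {
        // Simplified: the real code canonicalizes the id before startsWith("o").
        return isGpt5Family(modelId) || modelId.startsWith("o");
    }

    // Buggy shape: "o" models (e.g. o1-preview) get no reasoning settings.
    static ReasoningSettings buggy(String modelId) {
        return isGpt5Family(modelId) ? new ReasoningSettings("medium", 4_096) : null;
    }

    // Fixed shape: any reasoning model gets them.
    static ReasoningSettings fixed(String modelId) {
        return isReasoningModel(modelId) ? new ReasoningSettings("medium", 4_096) : null;
    }
}
```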
