OpenAI compatibility: reasoning parser coverage and robust tool-call handling #411

Open

lesj0610 wants to merge 6 commits into theroyallab:main from lesj0610:feat/oai-reasoning-toolcall-compat

Conversation

lesj0610 commented Feb 11, 2026

Summary

This PR improves OpenAI-compat behavior in tabbyAPI (ExLlamaV3 backend path), with a focus on correctness, maintainability, and parser/tool-call robustness.

Included changes

  • Added a vLLM-style reasoning parser framework and a full parser registry under endpoints/OAI/reasoning/ (see the registry sketch after this list).
  • Added a parser selection config option (reasoning_parser) that falls back to basic when unspecified.
  • Integrated reasoning extraction into both the non-streaming and streaming chat completion flows.
  • Preserved strict separation between reasoning / reasoning_content and content.
  • Hardened tool-call handling (see the normalization sketch below):
    • supports tool_choice values none | auto | required as well as named function choice,
    • normalizes model-emitted tool payload forms (including <tool_call>...</tool_call>),
    • aligns final response behavior with OpenAI-compatible tool_calls handling.
  • Added Jinja tojson compatibility for templates that call tojson(ensure_ascii=False) (filter sketch below).
  • Added OpenWebUI compatibility for thinking toggles by mapping top-level enable_thinking / thinking into chat_template_kwargs (template_vars) during request validation (mapping sketch below).
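
The parser selection and basic fallback can be pictured roughly as follows. This is a minimal sketch: the class names, registry dict, and <think>...</think> delimiters are illustrative assumptions, not the actual layout of endpoints/OAI/reasoning/.

```python
# Illustrative sketch only: ReasoningParser, _REGISTRY, and get_reasoning_parser
# are hypothetical names; the <think> delimiters are assumed for illustration.
from typing import Optional, Tuple


class ReasoningParser:
    """Splits raw model output into (reasoning_content, content)."""

    def extract(self, text: str) -> Tuple[Optional[str], str]:
        raise NotImplementedError


class BasicReasoningParser(ReasoningParser):
    """Default fallback: treat a single <think>...</think> block as reasoning."""

    START, END = "<think>", "</think>"

    def extract(self, text: str) -> Tuple[Optional[str], str]:
        start, end = text.find(self.START), text.find(self.END)
        if start == -1 or end == -1 or end < start:
            return None, text  # no reasoning block: everything is content
        reasoning = text[start + len(self.START):end]
        content = text[end + len(self.END):].lstrip()
        return reasoning, content


_REGISTRY = {"basic": BasicReasoningParser}


def get_reasoning_parser(name: Optional[str]) -> ReasoningParser:
    """Resolve the configured reasoning_parser key, defaulting to basic."""
    return _REGISTRY.get(name or "basic", BasicReasoningParser)()
```

Splitting the output this way is what lets both the streaming and non-streaming flows populate reasoning / reasoning_content without ever leaking it into content.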
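
The tool-call normalization amounts to recognizing the tool_choice modes and converting model-emitted payloads into OpenAI-style tool_calls entries. The helpers below are a sketch, assuming a JSON object wrapped in <tool_call>...</tool_call>; the names are not taken from the PR.

```python
# Sketch only: function names and the exact payload shapes are assumptions.
import json
import re
import uuid
from typing import List, Optional

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def normalize_tool_calls(text: str) -> List[dict]:
    """Turn <tool_call>{...}</tool_call> blocks into OpenAI-style tool_calls."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed payloads instead of failing the response
        calls.append({
            "id": f"call_{uuid.uuid4().hex[:8]}",
            "type": "function",
            "function": {
                "name": payload.get("name", ""),
                "arguments": json.dumps(payload.get("arguments", {})),
            },
        })
    return calls


def apply_tool_choice(tool_choice, tools: Optional[list]):
    """none disables tools, a named function restricts generation to that tool,
    and auto/required expose the full tool list (required forces a call later)."""
    if tool_choice == "none" or not tools:
        return None
    if isinstance(tool_choice, dict):  # {"type": "function", "function": {"name": ...}}
        name = tool_choice["function"]["name"]
        return [t for t in tools if t["function"]["name"] == name]
    return tools
```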
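
The tojson compatibility can be provided by a filter that forwards ensure_ascii to json.dumps, since the stock Jinja tojson filter does not forward that argument. A minimal sketch, assuming templates render in a sandboxed Jinja environment:

```python
# Sketch: a tojson replacement that accepts ensure_ascii=False in templates.
import json

from jinja2.sandbox import ImmutableSandboxedEnvironment


def tojson_compat(value, ensure_ascii: bool = True, **kwargs) -> str:
    return json.dumps(value, ensure_ascii=ensure_ascii, **kwargs)


env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
env.filters["tojson"] = tojson_compat
# Templates can now render e.g. {{ message | tojson(ensure_ascii=False) }}.
```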
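
The OpenWebUI thinking toggle is handled during request validation by folding the top-level flags into the template kwargs. The sketch below uses plain dicts for brevity; the real validation path and the template_vars container differ.

```python
# Sketch: fold top-level thinking toggles into chat_template_kwargs at validation.
def merge_thinking_toggles(request: dict) -> dict:
    """Map top-level enable_thinking / thinking into chat_template_kwargs so the
    chat template only has to look at one variable."""
    kwargs = dict(request.get("chat_template_kwargs") or {})
    if "enable_thinking" in request:
        kwargs.setdefault("enable_thinking", bool(request["enable_thinking"]))
    elif "thinking" in request:  # OpenWebUI variants may send `thinking` instead
        kwargs.setdefault("enable_thinking", bool(request["thinking"]))
    if kwargs:
        request["chat_template_kwargs"] = kwargs
    return request
```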

Validation

  • Compile checks on the modified modules.
  • Reasoning on/off and streaming checks on /v1/chat/completions.
  • Tool-choice path checks (none, required, named function); an example request is sketched after this list.
  • EXAONE template render path verification.
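
A request along these lines exercises both the thinking toggle and a named function choice in one call. It is illustrative only, not the exact validation commands: the host, port, auth header, model name, and tool definition are placeholders.

```python
# Illustrative check against a locally running tabbyAPI instance.
# Host, port, auth header, model name, and the tool schema are placeholders.
import requests

payload = {
    "model": "exaone-4.0",  # placeholder; the loaded model is what actually answers
    "messages": [{"role": "user", "content": "What's the weather in Seoul?"}],
    "enable_thinking": True,  # exercises the thinking-toggle mapping
    "stream": False,
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }],
    "tool_choice": {"type": "function", "function": {"name": "get_weather"}},
}

resp = requests.post(
    "http://127.0.0.1:5000/v1/chat/completions",
    headers={"Authorization": "Bearer <api-key>"},
    json=payload,
    timeout=120,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
print(message.get("reasoning_content"), message.get("tool_calls"))
```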

License and attribution

  • Added third-party notices for vLLM-derived parser code:
    • THIRD_PARTY_LICENSES.md
    • LICENSES/Apache-2.0.txt
  • Added SPDX headers to reasoning parser files where applicable.

Additional update (EXAONE4 parser hardening)

  • Reworked endpoints/OAI/reasoning/exaone4_reasoning_parser.py into a standalone parser (no runtime delegation to DeepSeekR1ReasoningParser / DeepSeekV3ReasoningParser).
  • Implemented EXAONE-specific behavior for enable_thinking on/off in both non-streaming and streaming flows (streaming sketch after this list).
  • Added a coverage test, tests/exaone4_reasoning_parser_test.py, covering prefill/no-start-token streaming, an end token split across chunks, and non-thinking content-only behavior.
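
The streaming cases named above map onto a small amount of per-request state: whether the stream is still inside the reasoning block, plus a buffer that absorbs an end token split across chunks. The sketch below assumes <think> / </think> delimiters and invented names; it is not the parser's actual code.

```python
# Sketch of the streaming cases only; delimiters, class name, and the simplified
# start-token handling are assumptions, not the EXAONE4 parser implementation.
class StreamingThinkSplitter:
    START, END = "<think>", "</think>"

    def __init__(self, thinking_enabled: bool):
        # Prefill / no-start-token case: with thinking on, treat output as
        # reasoning from the first chunk until the end token shows up.
        self.in_reasoning = thinking_enabled
        self.buffer = ""

    def feed(self, delta: str):
        """Return (reasoning_delta, content_delta) for one streamed chunk."""
        if not self.in_reasoning:
            return "", delta  # thinking off, or already past the end token

        self.buffer += delta
        if self.START in self.buffer:  # strip a start token if one is emitted
            self.buffer = self.buffer.replace(self.START, "", 1)

        if self.END in self.buffer:
            # The end token may arrive split across chunks; buffering handles it.
            reasoning, _, content = self.buffer.partition(self.END)
            self.in_reasoning, self.buffer = False, ""
            return reasoning, content

        # Hold back any suffix that could be the start of a split end token.
        for k in range(len(self.END) - 1, 0, -1):
            if self.buffer.endswith(self.END[:k]):
                safe, self.buffer = self.buffer[:-k], self.buffer[-k:]
                return safe, ""
        safe, self.buffer = self.buffer, ""
        return safe, ""
```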

Additional update (vLLM-style tool parser registry)

  • Added parser-key dispatch for hermes, llama/llama3_json/llama4_json, openai, pythonic, qwen3_coder, qwen3_xml, deepseek_v3, deepseek_v31, and deepseek_v32 (registry sketch after this list).
  • Added a native second-pass generation mode for parser families that require non-JSON tool-call syntax.
  • Added parser alias normalization (llama -> llama3_json) and native-generation decision helpers.
  • Added focused tests for parser mapping, parser-key dispatch precedence, and fallback behavior.
  • Head commit: b71647a
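
The dispatch and alias handling can be pictured as a lookup table plus a normalization step. The keys below mirror the list above; the helper names, the placeholder class names, and in particular which families sit in the native-syntax set are illustrative guesses rather than the PR's actual code.

```python
# Illustrative sketch: values stand in for parser classes, and the membership of
# _NATIVE_SYNTAX is a guess used only to show the shape of the decision helper.
_ALIASES = {"llama": "llama3_json"}

_NATIVE_SYNTAX = {"pythonic", "qwen3_xml", "qwen3_coder"}  # assumed membership

_TOOL_PARSERS = {
    "hermes": "HermesToolParser",
    "llama3_json": "Llama3JsonToolParser",
    "llama4_json": "Llama4JsonToolParser",
    "openai": "OpenAIToolParser",
    "pythonic": "PythonicToolParser",
    "qwen3_coder": "Qwen3CoderToolParser",
    "qwen3_xml": "Qwen3XmlToolParser",
    "deepseek_v3": "DeepSeekV3ToolParser",
    "deepseek_v31": "DeepSeekV31ToolParser",
    "deepseek_v32": "DeepSeekV32ToolParser",
}


def normalize_parser_key(key: str) -> str:
    """Map aliases such as llama onto their canonical parser key."""
    key = key.strip().lower()
    return _ALIASES.get(key, key)


def uses_native_generation(key: str) -> bool:
    """Decide whether a parser family needs the native second-pass mode."""
    return normalize_parser_key(key) in _NATIVE_SYNTAX


def resolve_tool_parser(key: str):
    """Dispatch on the normalized parser key; unknown keys fall through to
    whatever default the caller configures (fallback target not shown here)."""
    return _TOOL_PARSERS.get(normalize_parser_key(key))
```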
