Improve reasoning and tool-call parsers#4468

Open
lvhan028 wants to merge 9 commits into InternLM:main from lvhan028:improve-parsers

Conversation

@lvhan028
Collaborator

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and easier to review. If you do not understand some items, don't worry: just open the pull request and ask the maintainers for help.

Motivation

Please describe the motivation for this PR and the goal you want to achieve through it.

Modification

Please briefly describe what modification is made in this PR.

BC-breaking (Optional)

Does the modification introduce changes that break the backward compatibility of downstream repositories?
If so, please describe how it breaks compatibility and how downstream projects should modify their code to remain compatible with this PR.

Use cases (Optional)

If this PR introduces a new feature, please list some use cases here and update the documentation.

Checklist

  1. Pre-commit or other linting tools are used to fix any potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure correctness.
  3. If the modification depends on a newer version of a downstream project, this PR should be tested with all supported versions of that project.
  4. The documentation has been modified accordingly, e.g. docstrings or example tutorials.

Copilot AI review requested due to automatic review settings March 26, 2026 04:02
Contributor

Copilot AI left a comment


Pull request overview

This PR refactors streaming reasoning/tool-call parsing by centralizing “previous/current text + token ids” into a shared, request-attached streaming state and updates the OpenAI API server + parsers to use the new streaming interfaces.

Changes:

  • Introduces StreamingParserState + get_streaming_state(request) and a shared ThinkingReasoningParser base for <think>...</think>-style models.
  • Simplifies streaming parser method signatures (tool + reasoning) to accept deltas only and read previous/current context from request-attached state.
  • Renames/reshuffles parser modules (*_tool_parser.py, qwen_reasoning_parser.py) and updates imports accordingly.
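The core abstraction these bullets describe could look roughly like the following sketch. This is a hypothetical reconstruction, not the PR's actual code: the field names follow the snippets quoted later in this review, but `update()`'s signature and the `request._streaming_parser_state` attribute are assumptions, and the sketch already applies the `default_factory` and `.copy()` fixes the review comments below suggest.

```python
from dataclasses import dataclass, field


@dataclass
class StreamingParserState:
    """Per-request view of the text/token stream, shared by all parsers."""
    previous_text: str = ''
    current_text: str = ''
    previous_token_ids: list[int] = field(default_factory=list)
    current_token_ids: list[int] = field(default_factory=list)

    def update(self, delta_text: str, delta_token_ids: list[int]) -> None:
        """Accumulate a new delta into the current view."""
        self.current_text += delta_text
        self.current_token_ids.extend(delta_token_ids)

    def step(self) -> None:
        """Snapshot current -> previous at the end of each streaming iteration."""
        self.previous_text = self.current_text
        self.previous_token_ids = self.current_token_ids.copy()


def get_streaming_state(request) -> StreamingParserState:
    """Lazily attach one parser state to the request, so a single global
    parser instance can serve concurrent streams safely."""
    state = getattr(request, '_streaming_parser_state', None)
    if state is None:
        state = StreamingParserState()
        request._streaming_parser_state = state
    return state
```

With this shape, streaming parser methods only need the delta; they read previous/current context from `get_streaming_state(request)` instead of taking `previous_text`/`current_text` parameters.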

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

Summary per file:

  • tests/test_lmdeploy/test_qwen3coder_parser.py: Updates the tool parser import path (but streaming call sites still need updating to the new API).
  • tests/test_lmdeploy/test_qwen3_parser.py: Updates reasoning/tool parser import paths (but streaming call sites still need updating to the new API).
  • lmdeploy/serve/openai/tool_parser/tool_parser.py: Updates the abstract streaming tool parser signature + docs to rely on request-attached streaming state.
  • lmdeploy/serve/openai/tool_parser/qwen3coder_tool_parser.py: Adapts Qwen3Coder streaming parsing to read current_text via get_streaming_state.
  • lmdeploy/serve/openai/tool_parser/qwen3_tool_parser.py: Adapts Qwen3 streaming parsing to read current_text via get_streaming_state.
  • lmdeploy/serve/openai/tool_parser/qwen2d5_tool_parser.py: Adapts streaming parsing to shared state (but still uses parser-instance state).
  • lmdeploy/serve/openai/tool_parser/llama3_tool_parser.py: Adapts streaming parsing to shared state (but still uses parser-instance state).
  • lmdeploy/serve/openai/tool_parser/internlm2_tool_parser.py: Adapts streaming parsing to shared state (but still uses parser-instance state).
  • lmdeploy/serve/openai/tool_parser/__init__.py: Updates exports to the new *_tool_parser.py module names.
  • lmdeploy/serve/openai/reasoning_parser/reasoning_parser.py: Adds shared streaming state + the ThinkingReasoningParser base; updates the reasoning parser streaming API.
  • lmdeploy/serve/openai/reasoning_parser/qwen_reasoning_parser.py: New Qwen QwQ / Intern-S1 reasoning parser based on ThinkingReasoningParser.
  • lmdeploy/serve/openai/reasoning_parser/qwen_qwq_reasoning_parser.py: Removes the prior Qwen QwQ reasoning parser implementation.
  • lmdeploy/serve/openai/reasoning_parser/deepseek_r1_reasoning_parser.py: Refactors the DeepSeek-R1 reasoning parser onto ThinkingReasoningParser.
  • lmdeploy/serve/openai/api_server.py: Uses StreamingParserState in the streaming loop and calls the updated parser streaming APIs.
Comments suppressed due to low confidence (3)

lmdeploy/serve/openai/tool_parser/qwen2d5_tool_parser.py:52

  • Qwen2d5ToolParser stores streaming parse state (self.position, self.current_tool_id, self.current_tool_name_sent, etc.) on the parser instance, but the server uses a single global parser instance across requests. This is not safe under concurrent streaming requests and can lead to mixed tool-call parsing. Please move per-request state to the request object (e.g., request._tool_parser_state) or another request-scoped structure.

lmdeploy/serve/openai/tool_parser/llama3_tool_parser.py:83

  • Llama3JsonToolParser relies on instance attributes (self.current_tool_id, self.current_tool_name_sent, self.prev_tool_call_arr, etc.) for streaming state. Since the API server stores a single global parser instance, concurrent streaming requests can race and corrupt this state. Please migrate the streaming state to be request-scoped (e.g., attach a parser-state object to request).

lmdeploy/serve/openai/tool_parser/internlm2_tool_parser.py:56

  • Internlm2ToolParser keeps streaming parse state (self.position, self.current_tool_id, self.current_tool_name_sent, etc.) on the parser instance, but VariableInterface.tool_parser is a single global instance shared across requests. With concurrent streaming requests, this state can interleave and corrupt parsing. Consider moving per-request state onto the request object (e.g., request._tool_parser_state) similar to Qwen3ToolParser, or into the shared StreamingParserState.
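A request-scoped variant of that per-parser state might look like the sketch below. The `ToolParserState` fields mirror the instance attributes the comments name; `get_tool_parser_state` and `request._tool_parser_state` are hypothetical names suggested by the review, not part of the PR.

```python
from dataclasses import dataclass, field


@dataclass
class ToolParserState:
    """Per-request tool-call parsing state, instead of parser-instance attributes."""
    position: int = 0
    current_tool_id: int = -1
    current_tool_name_sent: bool = False
    prev_tool_call_arr: list = field(default_factory=list)


def get_tool_parser_state(request) -> ToolParserState:
    """One state object per request: a single global parser instance can then
    serve concurrent streams without their state interleaving."""
    state = getattr(request, '_tool_parser_state', None)
    if state is None:
        state = ToolParserState()
        request._tool_parser_state = state
    return state
```

Each parser's streaming method would then call `get_tool_parser_state(request)` at the top instead of reading `self.position` and friends.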


"""Base class for reasoning parsers that use <think>...</think> style tags.

Subclasses only need to set `start_token`, `end_token`, and optionally
override `strip_newlines` and `on_missing_start_tag` to customize behavior.

Copilot AI Mar 26, 2026


Docstring mentions overriding on_missing_start_tag, but the actual configurable attribute is on_missing_end_tag. Please fix the docstring to avoid confusing subclass authors.

Suggested change:

- override `strip_newlines` and `on_missing_start_tag` to customize behavior.
+ override `strip_newlines` and `on_missing_end_tag` to customize behavior.

Comment on lines +10 to +14

  from lmdeploy.serve.openai.protocol import (ChatCompletionRequest, ChatCompletionResponse, ChatCompletionResponseChoice,
                                              ChatCompletionResponseStreamChoice, ChatCompletionStreamResponse,
                                              ChatMessage, DeltaMessage, DeltaToolCall, UsageInfo)
- from lmdeploy.serve.openai.reasoning_parser.qwen_qwq_reasoning_parser import QwenQwQReasoningParser
- from lmdeploy.serve.openai.tool_parser.qwen3_parser import Qwen3ToolParser
+ from lmdeploy.serve.openai.reasoning_parser.qwen_reasoning_parser import QwenQwQReasoningParser
+ from lmdeploy.serve.openai.tool_parser.qwen3_tool_parser import Qwen3ToolParser

Copilot AI Mar 26, 2026


This test file still calls extract_tool_calls_streaming(...) / extract_reasoning_content_streaming(...) with the removed previous_text/current_text and previous_token_ids/current_token_ids parameters (see _chat_completion_v1 below). Update the test harness to match the new parser API (and to update/step the shared StreamingParserState via get_streaming_state(request) similarly to api_server.py).

Comment on lines +9 to +13

  @@ -10,7 +10,7 @@
  from lmdeploy.serve.openai.protocol import (ChatCompletionRequest, ChatCompletionResponse, ChatCompletionResponseChoice,
                                              ChatCompletionResponseStreamChoice, ChatCompletionStreamResponse,
                                              ChatMessage, DeltaMessage, DeltaToolCall, UsageInfo)
- from lmdeploy.serve.openai.tool_parser.qwen3coder_parser import Qwen3CoderToolParser
+ from lmdeploy.serve.openai.tool_parser.qwen3coder_tool_parser import Qwen3CoderToolParser

Copilot AI Mar 26, 2026


This test file still calls extract_tool_calls_streaming(...) / extract_reasoning_content_streaming(...) using the old signature with previous_text/current_text and token id args (see _chat_completion_v1 below). Update the test harness to the new streaming parser API and maintain the shared StreamingParserState on the request (as api_server.py now does).

Comment on lines +20 to +23
    previous_text: str = ''
    current_text: str = ''
    previous_token_ids: list[int] = []
    current_token_ids: list[int] = []

Copilot AI Mar 26, 2026


StreamingParserState uses mutable list defaults (previous_token_ids / current_token_ids). In a dataclass these lists are shared across all instances, which can leak token ids between requests. Use field(default_factory=list) for these fields.
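The pitfall and the suggested `field(default_factory=list)` fix can be shown in isolation. One caveat to the comment's wording: CPython's `@dataclass` machinery actually raises `ValueError` for a bare mutable `list` default, so in practice the shared-list leak surfaces through plain class attributes; either way, `default_factory` is the fix that gives each instance its own list. This is a standalone illustration, not the PR's class.

```python
from dataclasses import dataclass, field


class SharedDefaults:
    # Buggy: the class-level list is one object shared by every instance.
    previous_token_ids: list = []


@dataclass
class PerInstanceDefaults:
    # Fixed: default_factory builds a fresh list for each instance.
    previous_token_ids: list = field(default_factory=list)


a, b = SharedDefaults(), SharedDefaults()
a.previous_token_ids.append(1)
# b "sees" a's tokens: both attribute lookups resolve to the same class-level list.
shared_leak = b.previous_token_ids

c, d = PerInstanceDefaults(), PerInstanceDefaults()
c.previous_token_ids.append(1)
# d is unaffected: each instance got its own list from the factory.
independent = d.previous_token_ids
```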

Comment on lines +30 to +34
    def step(self) -> None:
        """Advance: copy current -> previous (call at end of each iteration)."""
        self.previous_text = self.current_text
        self.previous_token_ids = self.current_token_ids

Copilot AI Mar 26, 2026


StreamingParserState.step() assigns previous_token_ids = current_token_ids, which aliases the same list. Because update() mutates current_token_ids in place, previous_token_ids will also change, breaking any logic that compares previous vs current tokens (and can cause incorrect detection of start/end tags). Snapshot via .copy() (or store lengths / use an immutable sequence).
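The aliasing this comment describes, and the `.copy()` snapshot fix, in a minimal standalone class (illustrative only, not the PR's code):

```python
class AliasingState:
    def __init__(self):
        self.previous_token_ids: list = []
        self.current_token_ids: list = []

    def update(self, delta: list) -> None:
        # Mutates current_token_ids in place.
        self.current_token_ids.extend(delta)

    def step_buggy(self) -> None:
        # previous and current now alias the SAME list object, so every
        # later update() silently rewrites "previous" as well.
        self.previous_token_ids = self.current_token_ids

    def step_fixed(self) -> None:
        # Snapshot, so in-place updates no longer rewrite history.
        self.previous_token_ids = self.current_token_ids.copy()


s = AliasingState()
s.update([1])
s.step_buggy()
s.update([2])
buggy_previous = s.previous_token_ids    # history corrupted: now [1, 2]

t = AliasingState()
t.update([1])
t.step_fixed()
t.update([2])
fixed_previous = t.previous_token_ids    # stays [1]
```

Any start/end-tag detection that diffs previous against current tokens sees an empty delta under the buggy version, which is exactly the failure mode the comment warns about.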

Comment on lines 80 to 92
  def extract_reasoning_content(self, model_output: str, request: ChatCompletionRequest, **kwargs) -> tuple[str, str]:
      """Extract reasoning content from a complete model-generated string.

      Used for non-streaming responses where we have the entire model response
      available before sending to the client.

      Args:
-         model_output (str): The model-generated string to extract reasoning content from.
-         request (ChatCompletionRequest): he request object that was used to generate the model_output.
+         model_output: The model-generated string to extract reasoning content from.
+         request: The request object that was used to generate the model_output.

      Returns:
-         reasoning_content (str | None): The reasoning content.
-         final_output (str | None): The content.
+         A tuple of (reasoning_content, final_output). Either may be None.
      """
Copy link

Copilot AI Mar 26, 2026


The return type annotation for extract_reasoning_content is tuple[str, str], but the docstring and implementations return None for either element (e.g., ThinkingReasoningParser.extract_reasoning_content returns (None, model_output) and vice versa). Update the annotation to allow None (e.g., tuple[str | None, str | None]) to match behavior.
