32 commits
db3f184
improve reasoning parser
lvhan028 Mar 10, 2026
1572900
rename file
lvhan028 Mar 10, 2026
b895d53
minor fix
lvhan028 Mar 10, 2026
e8274c3
merge main
lvhan028 Mar 26, 2026
35b404c
refactor
lvhan028 Mar 26, 2026
c516394
update deepseek reasoning parser ut
lvhan028 Mar 26, 2026
dea23e0
Merge branch 'main' into improve-parsers
lvhan028 Mar 26, 2026
390d2ed
Merge branch 'main' into improve-parsers
lvhan028 Mar 27, 2026
d3eb973
agent's first refactor version
lvhan028 Mar 30, 2026
bc0502e
agent's 2nd refactor version
lvhan028 Apr 1, 2026
904490d
agent's 3rd refactor version
lvhan028 Apr 1, 2026
92eb62c
the 4-th version
lvhan028 Apr 1, 2026
754cf55
the 4-th version
lvhan028 Apr 1, 2026
39ca371
fix
lvhan028 Apr 1, 2026
525eb87
type hint
lvhan028 Apr 1, 2026
dd1280b
remove unused code
lvhan028 Apr 1, 2026
d028118
fix
lvhan028 Apr 2, 2026
f82998a
rename file test_qwen3_parser.py
lvhan028 Apr 2, 2026
47b3a68
Merge branch 'main' into improve-parsers
lvhan028 Apr 2, 2026
2f8208f
update qwen3.5 parsers tc
lvhan028 Apr 2, 2026
ee2f752
fix dump tools
lvhan028 Apr 2, 2026
2d6ffee
update exception
lvhan028 Apr 2, 2026
da3e868
reorg
lvhan028 Apr 3, 2026
8678cab
gpt-oss
lvhan028 Apr 4, 2026
c785a91
parser -> parsers
lvhan028 Apr 9, 2026
86631d2
merge main
lvhan028 Apr 15, 2026
4b55122
fix gpt-oss parser
lvhan028 Apr 15, 2026
f299c67
fix
lvhan028 Apr 15, 2026
2e05aa3
fix intern-s1 tool parser
lvhan028 Apr 15, 2026
0a58a51
fix intern-s1 parsers
lvhan028 Apr 15, 2026
cfc65a9
fix qwen3coder parser
lvhan028 Apr 17, 2026
d02562c
take schema into consideration
lvhan028 Apr 17, 2026
87 changes: 37 additions & 50 deletions docs/en/llm/api_server_reasoning.md
@@ -1,12 +1,12 @@
# Reasoning Outputs

For models that support reasoning capabilities, such as [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), LMDeploy supports parsing the reasoning results in the service and separately records the reasoning content using `reasoning_content`.
For models that support reasoning capabilities, such as [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), LMDeploy can parse reasoning outputs on the server side and expose them via `reasoning_content`.

## Examples

### DeepSeek R1

We can start the DeepSeek R1 model's api_server service just like launching other models. The difference is that we need to specify the `--reasoning-parser` parameter.
We can start DeepSeek R1's `api_server` like other models, but we need to specify the `--reasoning-parser` argument.

```
lmdeploy serve api_server deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --reasoning-parser deepseek-r1
@@ -44,62 +44,49 @@ print("content:", content)

## Custom parser

You only need to add a similar parser class in `lmdeploy/serve/openai/reasoning_parser/reasoning_parser.py`.
Built-in reasoning parser names include:

```python
# import the required packages
from typing import Sequence, Union, Tuple, Optional
- `qwen-qwq`
- `qwen3`
- `intern-s1`
- `deepseek-r1`
- `deepseek-v3`
- `gpt-oss`

### Notes

- `deepseek-v3`: starts in reasoning mode only when `enable_thinking=True`.
  When `enable_thinking` is `None` (default), output is usually plain content without a reasoning segment.
- `gpt-oss`: parses OpenAI Harmony channels:
  - `final` -> `content`
  - `analysis` -> `reasoning_content`
  - `commentary` with a `functions.*` recipient -> `tool_calls`
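As a toy illustration of the channel mapping above, the sketch below splits a Harmony-style string into the three response fields. The token syntax used here (`<|channel|>...<|message|>...<|end|>`) is a simplified, hypothetical rendering of the Harmony format for demonstration only; it is not lmdeploy's actual gpt-oss parser.

```python
import re

# Toy splitter: maps Harmony channels to OpenAI response fields.
# Simplified token syntax, for illustration only.
CHANNEL_RE = re.compile(
    r"<\|channel\|>(?P<channel>[^<]+)<\|message\|>(?P<body>.*?)<\|end\|>",
    re.DOTALL,
)

def split_harmony(text: str) -> dict:
    out = {"content": "", "reasoning_content": "", "tool_calls": []}
    for m in CHANNEL_RE.finditer(text):
        channel, body = m.group("channel").strip(), m.group("body")
        if channel == "final":
            out["content"] += body                      # final -> content
        elif channel == "analysis":
            out["reasoning_content"] += body            # analysis -> reasoning_content
        elif channel.startswith("commentary to=functions."):
            # commentary with a functions.* recipient -> tool_calls
            name = channel.split("functions.", 1)[1].split()[0]
            out["tool_calls"].append({"name": name, "arguments": body})
    return out

sample = ("<|channel|>analysis<|message|>think...<|end|>"
          "<|channel|>final<|message|>Hi!<|end|>")
print(split_harmony(sample))
# -> {'content': 'Hi!', 'reasoning_content': 'think...', 'tool_calls': []}
```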

### Add a custom parser

Add a parser class under `lmdeploy/serve/openai/reasoning_parser/` and register it with `ReasoningParserManager`.

```python
from lmdeploy.serve.openai.reasoning_parser import (
    ReasoningParser, ReasoningParserManager)
from lmdeploy.serve.openai.protocol import (ChatCompletionRequest,
                                            DeltaMessage)
    ReasoningParser, ReasoningParserManager
)

# define a reasoning parser and register it to lmdeploy
# the name list in register_module can be used
# in --reasoning-parser.
@ReasoningParserManager.register_module(["example"])
class ExampleParser(ReasoningParser):

    def __init__(self, tokenizer: object):
        super().__init__(tokenizer)

    def extract_reasoning_content_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
    ) -> Union[DeltaMessage, None]:
        """Extract reasoning content from an incomplete, streaming response.

        Has to be an instance method because it requires state: the current
        tokens/diffs, plus the information about what has previously been
        parsed and extracted (see constructor).
        """

    def extract_reasoning_content(
        self, model_output: str, request: ChatCompletionRequest
    ) -> Tuple[Optional[str], Optional[str]]:
        """Extract reasoning content from a complete model-generated string.

        Used for non-streaming responses where the entire model response is
        available before sending to the client.

        Args:
            model_output (str): The model-generated string to extract reasoning content from.
            request (ChatCompletionRequest): The request object that was used to generate the model_output.

        Returns:
            reasoning_content (str | None): The reasoning content.
            final_output (str | None): The content.
        """

    def __init__(self, tokenizer: object, **kwargs):
        super().__init__(tokenizer, **kwargs)

    def get_reasoning_open_tag(self) -> str | None:
        return "<think>"

    def get_reasoning_close_tag(self) -> str | None:
        return "</think>"

    def starts_in_reasoning_mode(self) -> bool:
        return True
```
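To make the tag contract above concrete, here is a hypothetical helper (not part of lmdeploy's API) showing how the open/close tags and the `starts_in_reasoning_mode()` flag could drive the split of a complete, non-streaming response:

```python
# Hypothetical sketch only: demonstrates the open/close-tag contract,
# not lmdeploy's actual extraction logic.
def split_by_tags(text: str,
                  open_tag: str = "<think>",
                  close_tag: str = "</think>",
                  starts_in_reasoning: bool = True):
    work = text
    if starts_in_reasoning and not work.lstrip().startswith(open_tag):
        # Models that begin in reasoning mode may omit the opening tag.
        work = open_tag + work
    start = work.find(open_tag)
    end = work.find(close_tag)
    if start == -1 or end == -1:
        # No complete reasoning segment: everything is plain content.
        return None, text
    reasoning = work[start + len(open_tag):end]
    content = work[end + len(close_tag):].lstrip("\n")
    return reasoning, content

print(split_by_tags("<think>chain of thought</think>\nfinal answer"))
# -> ('chain of thought', 'final answer')
```

The first element would populate `reasoning_content` and the second `content` in the response.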

Similarly, the command to start the service becomes:
Then start the service with:

```
lmdeploy serve api_server $model_path --reasoning-parser example
89 changes: 37 additions & 52 deletions docs/zh_cn/llm/api_server_reasoning.md
@@ -1,14 +1,12 @@
# Reasoning Outputs

For models that support reasoning capabilities, such as [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), LMDeploy supports parsing the reasoning results in the service and recording the reasoning content separately in reasoning_content.
For models that support reasoning capabilities, such as [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), LMDeploy can parse reasoning outputs on the server side and return the reasoning content separately via `reasoning_content`.

## Usage Example

### DeepSeek R1

We can start DeepSeek R1's api_server just like other models; the difference is that we need to specify `--reasoning-parser`, whose argument names the specific parser.
We can start DeepSeek R1's `api_server` like other models, but we need to additionally specify the `--reasoning-parser` argument.

```
lmdeploy serve api_server deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --reasoning-parser deepseek-r1
@@ -46,62 +44,49 @@ print("content:", content)

## Custom parser

You only need to add a similar parser class in `lmdeploy/serve/openai/reasoning_parser/reasoning_parser.py`.
Built-in reasoning parser names include:

```python
# import the required packages
from typing import Sequence, Union, Tuple, Optional
- `qwen-qwq`
- `qwen3`
- `intern-s1`
- `deepseek-r1`
- `deepseek-v3`
- `gpt-oss`

### Notes

- `deepseek-v3`: starts parsing in reasoning mode only when `enable_thinking=True`.
  When `enable_thinking` is `None` (default), there is usually no reasoning segment and the output is plain content.
- `gpt-oss`: parses based on OpenAI Harmony channels:
  - `final` -> `content`
  - `analysis` -> `reasoning_content`
  - `commentary` with a `functions.*` recipient -> `tool_calls`

### Add a custom parser

Add a parser class under the `lmdeploy/serve/openai/reasoning_parser/` directory and register it with `ReasoningParserManager`.

```python
from lmdeploy.serve.openai.reasoning_parser import (
    ReasoningParser, ReasoningParserManager)
from lmdeploy.serve.openai.protocol import (ChatCompletionRequest,
                                            DeltaMessage)
    ReasoningParser, ReasoningParserManager
)

# define a reasoning parser and register it to lmdeploy
# the name list in register_module can be used
# in --reasoning-parser.
@ReasoningParserManager.register_module(["example"])
class ExampleParser(ReasoningParser):

    def __init__(self, tokenizer: object):
        super().__init__(tokenizer)

    def extract_reasoning_content_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
    ) -> Union[DeltaMessage, None]:
        """Extract reasoning content from an incomplete, streaming response.

        Has to be an instance method because it requires state: the current
        tokens/diffs, plus the information about what has previously been
        parsed and extracted (see constructor).
        """

    def extract_reasoning_content(
        self, model_output: str, request: ChatCompletionRequest
    ) -> Tuple[Optional[str], Optional[str]]:
        """Extract reasoning content from a complete model-generated string.

        Used for non-streaming responses where the entire model response is
        available before sending to the client.

        Args:
            model_output (str): The model-generated string to extract reasoning content from.
            request (ChatCompletionRequest): The request object that was used to generate the model_output.

        Returns:
            reasoning_content (str | None): The reasoning content.
            final_output (str | None): The content.
        """

    def __init__(self, tokenizer: object, **kwargs):
        super().__init__(tokenizer, **kwargs)

    def get_reasoning_open_tag(self) -> str | None:
        return "<think>"

    def get_reasoning_close_tag(self) -> str | None:
        return "</think>"

    def starts_in_reasoning_mode(self) -> bool:
        return True
```

Similarly, the command to start the service becomes:
Then start the service with:

```
lmdeploy serve api_server $model_path --reasoning-parser example
8 changes: 5 additions & 3 deletions lmdeploy/cli/utils.py
@@ -462,18 +462,20 @@ def chat_template(parser):
    @staticmethod
    def reasoning_parser(parser):
        """Add reasoning parser to parser."""
        from lmdeploy.serve.openai.reasoning_parser import ReasoningParserManager
        legacy_names = ['qwen-qwq', 'intern-s1', 'deepseek-r1']
        from lmdeploy.serve.parsers.reasoning_parser import ReasoningParserManager
        return parser.add_argument(
            '--reasoning-parser',
            type=str,
            default=None,
            help=f'The registered reasoning parser name from {ReasoningParserManager.module_dict.keys()}. '
            help=f'The registered reasoning parser name: {list(ReasoningParserManager.module_dict.keys())}. '
            f'Legacy names: {legacy_names}. '
            'Defaults to None.')

    @staticmethod
    def tool_call_parser(parser):
        """Add tool call parser to parser."""
        from lmdeploy.serve.openai.tool_parser import ToolParserManager
        from lmdeploy.serve.parsers.tool_parser import ToolParserManager

        return parser.add_argument(
            '--tool-call-parser',
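The help string above enumerates `ReasoningParserManager.module_dict.keys()`. As an assumption-based sketch (not lmdeploy's actual code), the registry pattern suggested by `register_module` and `module_dict` can be outlined as:

```python
# Minimal sketch of a class-decorator registry like the one the
# ReasoningParserManager API suggests. Hypothetical, for illustration only.
class ParserManager:
    module_dict: dict = {}

    @classmethod
    def register_module(cls, names):
        # Returns a class decorator that records the class under each name.
        def _register(parser_cls):
            for name in names:
                cls.module_dict[name] = parser_cls
            return parser_cls
        return _register

    @classmethod
    def get(cls, name):
        # Look up a registered parser class by its CLI name.
        return cls.module_dict[name]

@ParserManager.register_module(["example"])
class ExampleParser:
    pass

print(ParserManager.get("example").__name__)  # -> ExampleParser
```

Under this pattern, the names passed to `register_module` are exactly the values accepted by `--reasoning-parser`.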