87 changes: 37 additions & 50 deletions docs/en/llm/api_server_reasoning.md
@@ -1,12 +1,12 @@
# Reasoning Outputs

For models that support reasoning capabilities, such as [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), LMDeploy supports parsing the reasoning results in the service and separately records the reasoning content using `reasoning_content`.
For models that support reasoning capabilities, such as [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), LMDeploy can parse reasoning outputs on the server side and expose them via `reasoning_content`.

## Examples

### DeepSeek R1

We can start the DeepSeek R1 model's api_server service just like launching other models. The difference is that we need to specify --reasoning-parser\` parameter.
We can start DeepSeek R1's `api_server` like other models, but we need to specify the `--reasoning-parser` argument.

```
lmdeploy serve api_server deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --reasoning-parser deepseek-r1
@@ -44,62 +44,49 @@ print("content:", content)

## Custom parser

You only need to add a similar parser class in `lmdeploy/serve/openai/reasoning_parser/reasoning_parser.py`.
Built-in reasoning parser names include:

```python
# import the required packages
from typing import Sequence, Union, Tuple, Optional
- `qwen-qwq`
- `qwen3`
- `intern-s1`
- `deepseek-r1`
- `deepseek-v3`
- `gpt-oss`

### Notes

- `deepseek-v3`: starts in reasoning mode only when `enable_thinking=True`.
  When `enable_thinking` is `None` (default), output is usually plain content without a reasoning segment.
- `gpt-oss`: parses OpenAI Harmony channels:
  - `final` -> `content`
  - `analysis` -> `reasoning_content`
  - `commentary` with `functions.*` recipient -> `tool_calls`
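
To make the split concrete, here is a minimal, standalone sketch of what a non-streaming reasoning parse does for `<think>`-style models. This is illustrative only, not LMDeploy's implementation; `split_reasoning` is a hypothetical helper:

```python
from typing import Optional, Tuple


def split_reasoning(model_output: str,
                    open_tag: str = "<think>",
                    close_tag: str = "</think>") -> Tuple[Optional[str], Optional[str]]:
    """Split a complete model output into (reasoning_content, content)."""
    if close_tag not in model_output:
        # No reasoning segment: everything is final content.
        return None, model_output
    head, _, tail = model_output.partition(close_tag)
    reasoning = head.replace(open_tag, "", 1).strip()
    return reasoning or None, tail.strip() or None


print(split_reasoning("<think>2 + 2 = 4</think>The answer is 4."))
# ('2 + 2 = 4', 'The answer is 4.')
```

The registered parsers perform the same separation server-side, so clients receive `reasoning_content` and `content` as distinct fields.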

### Add a custom parser

Add a parser class under `lmdeploy/serve/openai/reasoning_parser/` and register it with `ReasoningParserManager`.

```python
from lmdeploy.serve.openai.reasoning_parser import (
    ReasoningParser, ReasoningParserManager)
from lmdeploy.serve.openai.protocol import (ChatCompletionRequest,
                                            DeltaMessage)
    ReasoningParser, ReasoningParserManager
)

# define a reasoning parser and register it to lmdeploy
# the name list in register_module can be used
# in --reasoning-parser.
@ReasoningParserManager.register_module(["example"])
class ExampleParser(ReasoningParser):
    def __init__(self, tokenizer: object):
        super().__init__(tokenizer)

    def extract_reasoning_content_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
    ) -> Union[DeltaMessage, None]:
        """Extract reasoning content from an incomplete response while
        streaming.

        Has to be an instance method because it requires state: the current
        tokens/diffs, as well as information about what has previously been
        parsed and extracted (see constructor).
        """

    def extract_reasoning_content(
        self, model_output: str, request: ChatCompletionRequest
    ) -> Tuple[Optional[str], Optional[str]]:
        """Extract reasoning content from a complete model-generated string.

        Used for non-streaming responses where we have the entire model
        response available before sending to the client.

        Args:
            model_output (str): The model-generated string to extract reasoning content from.
            request (ChatCompletionRequest): The request object that was used to generate the model_output.

        Returns:
            reasoning_content (str | None): The reasoning content.
            final_output (str | None): The content.
        """
    def __init__(self, tokenizer: object, **kwargs):
        super().__init__(tokenizer, **kwargs)

    def get_reasoning_open_tag(self) -> str | None:
        return "<think>"

    def get_reasoning_close_tag(self) -> str | None:
        return "</think>"

    def starts_in_reasoning_mode(self) -> bool:
        return True
```
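
For intuition, the three hooks above are enough to drive streaming in a tag-based base class. Below is a hedged sketch, not LMDeploy's internals, of how deltas could be routed to `reasoning_content` until the close tag is seen, then to `content`; `TagStreamSplitter` is a hypothetical class, and it assumes each tag arrives unsplit within a single chunk:

```python
class TagStreamSplitter:
    """Illustrative only: route streamed text deltas based on reasoning tags."""

    def __init__(self, open_tag: str = "<think>",
                 close_tag: str = "</think>",
                 starts_in_reasoning: bool = True):
        self.open_tag = open_tag
        self.close_tag = close_tag
        self.in_reasoning = starts_in_reasoning

    def feed(self, delta_text: str):
        """Return (reasoning_delta, content_delta) for one streamed chunk."""
        if not self.in_reasoning:
            # Reasoning already closed: everything is final content.
            return "", delta_text
        if self.close_tag in delta_text:
            # Close tag seen: split this chunk across both fields.
            head, _, tail = delta_text.partition(self.close_tag)
            self.in_reasoning = False
            return head.replace(self.open_tag, "", 1), tail
        return delta_text.replace(self.open_tag, "", 1), ""


splitter = TagStreamSplitter()
print(splitter.feed("<think>step one"))   # ('step one', '')
print(splitter.feed(" done</think>The"))  # (' done', 'The')
print(splitter.feed(" answer is 4."))     # ('', ' answer is 4.')
```

A real parser also has to handle tags split across chunks, which is why the streaming hook receives both the accumulated text and the per-chunk delta.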

Similarly, the command to start the service becomes:
Then start the service with:

```
lmdeploy serve api_server $model_path --reasoning-parser example
89 changes: 37 additions & 52 deletions docs/zh_cn/llm/api_server_reasoning.md
@@ -1,14 +1,12 @@
# Reasoning Outputs

For models that support reasoning capabilities, such as [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), LMDeploy supports parsing the reasoning results in the service and separately recording the reasoning content using `reasoning_content`.
For models that support reasoning capabilities, such as [DeepSeek R1](https://huggingface.co/deepseek-ai/DeepSeek-R1), LMDeploy can parse reasoning outputs on the server side and return the reasoning content separately via `reasoning_content`.

## Examples

### DeepSeek R1

We can start DeepSeek R1's `api_server` just like other models; the difference is that we need to specify `--reasoning-parser`.
In the `--reasoning-parser` argument, we need to specify the concrete parser.
We can start DeepSeek R1's `api_server` like other models, but we need to specify the `--reasoning-parser` argument.

```
lmdeploy serve api_server deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --reasoning-parser deepseek-r1
Expand Down Expand Up @@ -46,62 +44,49 @@ print("content:", content)

## Custom parser

You only need to add a similar parser class in `lmdeploy/serve/openai/reasoning_parser/reasoning_parser.py`.
Built-in reasoning parser names include:

```python
# import the required packages
from typing import Sequence, Union, Tuple, Optional
- `qwen-qwq`
- `qwen3`
- `intern-s1`
- `deepseek-r1`
- `deepseek-v3`
- `gpt-oss`

### Notes

- `deepseek-v3`: starts in reasoning mode only when `enable_thinking=True`.
  When `enable_thinking` is `None` (default), there is usually no reasoning segment and the output is plain content.
- `gpt-oss`: parses based on OpenAI Harmony channels:
  - `final` -> `content`
  - `analysis` -> `reasoning_content`
  - `commentary` with recipient `functions.*` -> `tool_calls`

### Add a custom parser

Add a parser class under the `lmdeploy/serve/openai/reasoning_parser/` directory and register it with `ReasoningParserManager`.

```python
from lmdeploy.serve.openai.reasoning_parser import (
    ReasoningParser, ReasoningParserManager)
from lmdeploy.serve.openai.protocol import (ChatCompletionRequest,
                                            DeltaMessage)
    ReasoningParser, ReasoningParserManager
)

# define a reasoning parser and register it to lmdeploy
# the name list in register_module can be used
# in --reasoning-parser.
@ReasoningParserManager.register_module(["example"])
class ExampleParser(ReasoningParser):
    def __init__(self, tokenizer: object):
        super().__init__(tokenizer)

    def extract_reasoning_content_streaming(
        self,
        previous_text: str,
        current_text: str,
        delta_text: str,
        previous_token_ids: Sequence[int],
        current_token_ids: Sequence[int],
        delta_token_ids: Sequence[int],
    ) -> Union[DeltaMessage, None]:
        """Extract reasoning content from an incomplete response while
        streaming.

        Has to be an instance method because it requires state: the current
        tokens/diffs, as well as information about what has previously been
        parsed and extracted (see constructor).
        """

    def extract_reasoning_content(
        self, model_output: str, request: ChatCompletionRequest
    ) -> Tuple[Optional[str], Optional[str]]:
        """Extract reasoning content from a complete model-generated string.

        Used for non-streaming responses where we have the entire model
        response available before sending to the client.

        Args:
            model_output (str): The model-generated string to extract reasoning content from.
            request (ChatCompletionRequest): The request object that was used to generate the model_output.

        Returns:
            reasoning_content (str | None): The reasoning content.
            final_output (str | None): The content.
        """
    def __init__(self, tokenizer: object, **kwargs):
        super().__init__(tokenizer, **kwargs)

    def get_reasoning_open_tag(self) -> str | None:
        return "<think>"

    def get_reasoning_close_tag(self) -> str | None:
        return "</think>"

    def starts_in_reasoning_mode(self) -> bool:
        return True
```

Similarly, the command to start the service becomes:
Then start the service with:

```
lmdeploy serve api_server $model_path --reasoning-parser example
4 changes: 3 additions & 1 deletion lmdeploy/cli/utils.py
@@ -462,12 +462,14 @@ def chat_template(parser):
    @staticmethod
    def reasoning_parser(parser):
        """Add reasoning parser to parser."""
        legacy_names = ['qwen-qwq', 'intern-s1', 'deepseek-r1']
        from lmdeploy.serve.openai.reasoning_parser import ReasoningParserManager
        return parser.add_argument(
            '--reasoning-parser',
            type=str,
            default=None,
            help=f'The registered reasoning parser name from {ReasoningParserManager.module_dict.keys()}. '
            help=f'The registered reasoning parser name: {ReasoningParserManager.module_dict.keys()}. '
            f'Legacy names: {legacy_names}. '
            'Default to None.')

@staticmethod