# Deep Research

MS-Agent's DeepResearch project provides an agent workflow capable of solving complex tasks, generating in-depth multimodal research reports for scientific research and similar domains. Two versions are currently available: a lightweight, efficient one for low-cost research, and a deep, comprehensive one for large-scale research.

## How It Works

### Basic Version

The basic version supports the following core features:

- **Automatic exploration**: automatically explores and analyzes complex questions across different directions
- **Multimodal**: processes multiple data modalities, extracts original figure and table information, and generates illustrated research reports
- **Lightweight and efficient**: runs in a "search-then-execute" mode, finishing in minutes with low token consumption; supports Ray to accelerate document parsing

After receiving the input query and the necessary configuration parameters, the workflow executes as shown in the figure below:

![Deep Research basic version workflow](../../resources/deepresearch.png)

The workflow proceeds in the following steps:

- **Input processing**: receives the user query, the search engine configuration, and other settings, then initializes
- **Search and parsing**: rewrites the user query and searches; extracts core chunks containing figures and tables using a hierarchical key-information extraction strategy; cleans and preserves the multimodal context
- **Report generation**: generates a multimodal research report that preserves key figure and table information; supports exporting in multiple formats and uploading to multiple platforms
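
The three stages above can be sketched as a plain pipeline. The helper names below are illustrative stubs, not MS-Agent's actual API; they only show how the stages hand data to one another:

```python
# A minimal, hypothetical sketch of the "search-then-execute" pipeline described
# above. All function names here are illustrative, not MS-Agent's real API.

def rewrite_query(query: str) -> list[str]:
    """Rewrite the user query into several focused search queries (stubbed)."""
    return [f'{query} survey', f'{query} recent advances']

def search_and_extract(search_queries: list[str]) -> list[dict]:
    """Search each query and extract core chunks, keeping context (stubbed)."""
    return [{'query': q, 'chunks': [f'key finding for: {q}']} for q in search_queries]

def generate_report(results: list[dict]) -> str:
    """Assemble the extracted chunks into a single report (stubbed)."""
    sections = [chunk for result in results for chunk in result['chunks']]
    return '\n'.join(['# Report'] + sections)

if __name__ == '__main__':
    report = generate_report(search_and_extract(rewrite_query('AI agents')))
    print(report.splitlines()[0])  # → # Report
```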

### Extended Version

While retaining the multimodal processing and generation capabilities of the basic version, the extended version supports the following core features:

- **Intent clarification**: clarifies user intent through optional human feedback and supports multiple reply modes (full report / short answer)
- **Deep search**: recursively optimizes the search path to widen information recall and deepen topic coverage; automatically decides whether to dig further based on the user's budget and the research progress
- **Context compression**: supports long-context compression, keeping output stable even across multiple search rounds and large numbers of parsed documents

After receiving the input query, the search budget, and the necessary configuration parameters, the workflow executes as shown in the figure below:

![Deep Research extended version workflow](../../resources/deepresearch_beta.png)

The workflow proceeds in the following steps:

- **Intent clarification**: receives the user question and asks topic-related questions to pin down the research direction; this step is skipped if the user input is already clear enough
- **Query rewriting**: generates search queries and research goals based on the current question, the exploration history (previous research questions and conclusions), and the search engine type
- **Search and parsing**: performs search, parsing, and information extraction, pulling multimodal context with a hierarchical key-information extraction strategy
- **Context compression**: produces an information-dense summary and follow-up research questions from the extracted multimodal context, preserving the contextual relations between figures, tables, and text
- **Recursive search**: repeats the steps above until the target research depth is reached or no follow-up questions remain
- **Report generation**: organizes the search history and generates a multimodal research report or a short reply, depending on the user's needs
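
The budget side of the recursive search can be illustrated with a short sketch. It assumes the breadth-halving rule described in the `breadth` parameter of the beta quickstart further below; the function is illustrative, not part of MS-Agent:

```python
# Illustrative sketch of how the recursive search can bound its search space:
# the breadth is halved at each depth level, as noted in the `breadth`
# docstring of ResearchWorkflowBeta's quickstart.

def count_search_queries(breadth: int, depth: int) -> int:
    """Total number of search queries issued across all depth levels."""
    total = 0
    for _ in range(depth):
        total += breadth
        breadth = max(1, breadth // 2)  # halve breadth to avoid search-space explosion
    return total

# With the quickstart defaults breadth=4, depth=2:
# level 1 issues 4 queries, level 2 issues 2, for 6 in total.
print(count_search_queries(4, 2))  # → 6
```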

## Usage

### Installation

To install the Deep Research project, follow the steps below:

```bash
# Install from source
git clone https://github.com/modelscope/ms-agent.git
cd ms-agent
pip install -r requirements/research.txt

# Install from PyPI (>=v1.1.0)
pip install 'ms-agent[research]'
```

### Getting Started

#### Environment Configuration

The project currently defaults to the free **arXiv search** (no API key required). To use a more general-purpose search engine, switch to **Exa** or **SerpApi**.

- Copy and edit the `.env` file to configure environment variables

```bash
cp .env.example .env

# Edit the `.env` file to include the API key of the search engine you want to use.
# For Exa search (sign up at https://exa.ai; free credits are granted on registration):
EXA_API_KEY=your_exa_api_key
# For SerpApi search (sign up at https://serpapi.com; a free monthly quota is granted):
SERPAPI_API_KEY=your_serpapi_api_key

# Notes for the extended version:
# The extended version (ResearchWorkflowBeta) uses a more stable model (e.g. gemini-2.5-flash)
# during the query-rewriting stage. Configure an OpenAI-compatible endpoint (OPENAI_BASE_URL)
# and API key (OPENAI_API_KEY) that can serve the corresponding model.
# To switch models, change the model name in ResearchWorkflowBeta.generate_search_queries.
OPENAI_API_KEY=your_api_key
OPENAI_BASE_URL=https://your-openai-compatible-endpoint/v1
```
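
As a quick sanity check, the variables above must be visible to the Python process that runs the workflow. A stdlib-only sketch follows; the feature names and grouping are illustrative, not MS-Agent's own validation logic:

```python
# Illustrative check that the environment variables expected by the
# configuration above are set. The grouping below is an assumption for
# demonstration, not part of MS-Agent.
import os

REQUIRED_FOR = {
    'exa': ['EXA_API_KEY'],
    'serpapi': ['SERPAPI_API_KEY'],
    'beta_query_rewriting': ['OPENAI_API_KEY', 'OPENAI_BASE_URL'],
}

def missing_vars(feature: str, env=os.environ) -> list[str]:
    """Return the variables a feature needs but that are unset or empty."""
    return [name for name in REQUIRED_FOR[feature] if not env.get(name)]

print(missing_vars('exa', env={}))  # → ['EXA_API_KEY']
```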

- Configure the search engine in `conf.yaml`

```yaml
SEARCH_ENGINE:
  engine: exa
  exa_api_key: $EXA_API_KEY
```
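
The `$EXA_API_KEY` value is a reference to the environment variable set in `.env`. A minimal sketch of how such a `$VAR` reference can be resolved with the standard library (illustrative only; MS-Agent's actual config loader may differ):

```python
# Illustrative resolution of `$VAR` references in config values using the
# standard library. Unknown variables are left untouched by safe_substitute.
import string

def resolve_env_refs(value: str, env: dict) -> str:
    """Expand `$VAR` references in a config value from the given environment."""
    return string.Template(value).safe_substitute(env)

print(resolve_env_refs('$EXA_API_KEY', {'EXA_API_KEY': 'exa-123'}))  # → exa-123
```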

#### Code Examples

- Quickstart code for the basic version:

```python
from ms_agent.llm.openai import OpenAIChat
from ms_agent.tools.search.search_base import SearchEngine
from ms_agent.tools.search_engine import get_web_search_tool
from ms_agent.workflow.deep_research.principle import MECEPrinciple
from ms_agent.workflow.deep_research.research_workflow import ResearchWorkflow


def run_workflow(user_prompt: str,
                 task_dir: str,
                 chat_client: OpenAIChat,
                 search_engine: SearchEngine,
                 reuse: bool,
                 use_ray: bool = False):
    """
    Run the deep research workflow, which follows a lightweight and efficient pipeline.

    Args:
        user_prompt: The user prompt.
        task_dir: The task directory where the research results will be saved.
        chat_client: The chat client.
        search_engine: The search engine.
        reuse: Whether to reuse the previous research results.
        use_ray: Whether to use Ray for document parsing/extraction.
    """

    research_workflow = ResearchWorkflow(
        client=chat_client,
        principle=MECEPrinciple(),
        search_engine=search_engine,
        workdir=task_dir,
        reuse=reuse,
        use_ray=use_ray,
    )

    research_workflow.run(user_prompt=user_prompt)


if __name__ == '__main__':

    query: str = 'Survey of AI Agents within the recent 3 months, including the latest research papers, open-source projects, and industry applications.'  # noqa
    task_workdir: str = '/path/to/your_task_dir'
    reuse: bool = False

    # Get a chat client for an OpenAI-compatible API.
    # Free API inference calls: every registered ModelScope user receives a daily
    # quota of free API inference calls; see
    # https://modelscope.cn/docs/model-service/API-Inference/intro for details.
    """
    * `api_key` (str): your API key; replace `xxx-xxx` with your actual key, or use a ModelScope API key, see https://modelscope.cn/my/myaccesstoken  # noqa
    * `base_url` (str): the base URL for API requests; use `https://api-inference.modelscope.cn/v1/` for ModelScope API-Inference
    * `model` (str): the model ID for inference; `Qwen/Qwen3-235B-A22B-Instruct-2507` is recommended for document research tasks.
    """
    chat_client = OpenAIChat(
        api_key='xxx-xxx',
        base_url='https://api-inference.modelscope.cn/v1/',
        model='Qwen/Qwen3-235B-A22B-Instruct-2507',
    )

    # Get the web-search engine client.
    # Specify your config file path; the default is `conf.yaml` in the current directory.
    search_engine = get_web_search_tool(config_file='conf.yaml')

    # Enable Ray with `use_ray=True` to speed up document parsing.
    # It uses multiple CPU cores for faster processing, but also increases CPU
    # usage and may cause temporary stutter on your machine.
    run_workflow(
        user_prompt=query,
        task_dir=task_workdir,
        reuse=reuse,
        chat_client=chat_client,
        search_engine=search_engine,
        use_ray=False,
    )
```

- Quickstart code for the extended version (beta):

```python
import asyncio

from ms_agent.llm.openai import OpenAIChat
from ms_agent.tools.search.search_base import SearchEngine
from ms_agent.tools.search_engine import get_web_search_tool
from ms_agent.workflow.deep_research.research_workflow_beta import ResearchWorkflowBeta


def run_deep_workflow(user_prompt: str,
                      task_dir: str,
                      chat_client: OpenAIChat,
                      search_engine: SearchEngine,
                      breadth: int = 4,
                      depth: int = 2,
                      is_report: bool = True,
                      show_progress: bool = True,
                      use_ray: bool = False):
    """
    Run the expandable deep research workflow (beta version).
    This version is more flexible and scalable than the original deep research workflow.

    Args:
        user_prompt: The user prompt.
        task_dir: The task directory where the research results will be saved.
        chat_client: The chat client.
        search_engine: The search engine.
        breadth: The number of search queries to generate per depth level.
            To avoid search-space explosion, the breadth is halved at each depth level.
        depth: The maximum research depth.
        is_report: Whether to generate a report.
        show_progress: Whether to show the progress.
        use_ray: Whether to use Ray for document parsing/extraction.
    """

    research_workflow = ResearchWorkflowBeta(
        client=chat_client,
        search_engine=search_engine,
        workdir=task_dir,
        use_ray=use_ray,
        enable_multimodal=True)

    asyncio.run(
        research_workflow.run(
            user_prompt=user_prompt,
            breadth=breadth,
            depth=depth,
            is_report=is_report,
            show_progress=show_progress))


if __name__ == '__main__':

    query: str = 'Survey of AI Agents within the recent 3 months, including the latest research papers, open-source projects, and industry applications.'  # noqa
    task_workdir: str = '/path/to/your_workdir'  # Specify your task work directory here

    # Get a chat client for an OpenAI-compatible API.
    # Free API inference calls: every registered ModelScope user receives a daily
    # quota of free API inference calls; see
    # https://modelscope.cn/docs/model-service/API-Inference/intro for details.
    """
    * `api_key` (str): your API key; replace `xxx-xxx` with your actual key, or use a ModelScope API key, see https://modelscope.cn/my/myaccesstoken  # noqa
    * `base_url` (str): the base URL for API requests; use `https://api-inference.modelscope.cn/v1/` for ModelScope API-Inference
    * `model` (str): the model ID for inference; `Qwen/Qwen3-235B-A22B-Instruct-2507` is recommended for document research tasks.
    """
    chat_client = OpenAIChat(
        api_key='xxx-xxx',
        base_url='https://api-inference.modelscope.cn/v1/',
        model='Qwen/Qwen3-235B-A22B-Instruct-2507',
        generation_config={'extra_body': {
            'enable_thinking': False
        }})

    # Get the web-search engine client.
    # Specify your config file path; the default is `conf.yaml` in the current directory.
    search_engine = get_web_search_tool(config_file='conf.yaml')

    # Enable Ray with `use_ray=True` to speed up document parsing.
    # It uses multiple CPU cores for faster processing, but also increases CPU
    # usage and may cause temporary stutter on your machine.
    # Tip: combine use_ray=True with show_progress=True for a better experience.
    run_deep_workflow(
        user_prompt=query,
        task_dir=task_workdir,
        chat_client=chat_client,
        search_engine=search_engine,
        show_progress=True,
        use_ray=False,
    )
```