File renamed without changes.
README.md (4 changes: 2 additions & 2 deletions)
@@ -176,7 +176,7 @@ LMDeploy is a toolkit for compressing, deploying, and serving LLM, developed by
<li>Qwen2-VL (2B, 7B, 72B)</li>
<li>Qwen2.5-VL (3B, 7B, 72B)</li>
<li>Qwen3-VL (2B - 235B)</li>
-<li>Qwen3.5</li>
+<li>Qwen3.5 (27B - 397B)</li>
<li>DeepSeek-VL (7B)</li>
<li>DeepSeek-VL2 (3B, 16B, 27B)</li>
<li>InternVL-Chat (v1.1-v1.5)</li>
@@ -228,7 +228,7 @@ The default prebuilt package is compiled on **CUDA 12** since v0.3.0.
For the GeForce RTX 50 series, please install the LMDeploy prebuilt package compiled with **CUDA 12.8**.

```shell
-export LMDEPLOY_VERSION=0.12.1
+export LMDEPLOY_VERSION=0.12.2
export PYTHON_VERSION=310
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
```
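The install command above splices the version and CPython tag into the wheel filename via shell variable expansion. A small sketch of how those two variables compose the release URL (pure string templating mirroring the `${VAR}` expansion; the values are the ones exported above):

```python
# Illustrative only: mirrors the ${VAR} expansion in the shell command above.
LMDEPLOY_VERSION = "0.12.2"
PYTHON_VERSION = "310"  # CPython 3.10 -> wheel tag cp310

url = (
    "https://github.com/InternLM/lmdeploy/releases/download/"
    f"v{LMDEPLOY_VERSION}/lmdeploy-{LMDEPLOY_VERSION}+cu128-"
    f"cp{PYTHON_VERSION}-cp{PYTHON_VERSION}-manylinux2014_x86_64.whl"
)
print(url)
```

Bumping `LMDEPLOY_VERSION` alone is enough to point at a new release, since the tag, wheel name, and version segment all derive from it.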
README_ja.md (2 changes: 1 addition & 1 deletion)
@@ -155,7 +155,7 @@ LMDeploy TurboMindエンジンは卓越した推論能力を持ち、さまざ
<li>Qwen2-VL (2B, 7B, 72B)</li>
<li>Qwen2.5-VL (3B, 7B, 72B)</li>
<li>Qwen3-VL (2B - 235B)</li>
-<li>Qwen3.5</li>
+<li>Qwen3.5 (27B - 397B)</li>
<li>DeepSeek-VL (7B)</li>
<li>DeepSeek-VL2 (3B, 16B, 27B)</li>
<li>InternVL-Chat (v1.1-v1.5)</li>
README_zh-CN.md (4 changes: 2 additions & 2 deletions)
@@ -178,7 +178,7 @@ LMDeploy TurboMind 引擎拥有卓越的推理能力,在各种规模的模型
<li>Qwen2-VL (2B, 7B, 72B)</li>
<li>Qwen2.5-VL (3B, 7B, 72B)</li>
<li>Qwen3-VL (2B - 235B)</li>
-<li>Qwen3.5</li>
+<li>Qwen3.5 (27B - 397B)</li>
<li>DeepSeek-VL (7B)</li>
<li>DeepSeek-VL2 (3B, 16B, 27B)</li>
<li>InternVL-Chat (v1.1-v1.5)</li>
@@ -230,7 +230,7 @@ pip install lmdeploy
若使用 GeForce RTX 50 系列显卡,请安装基于 **CUDA 12.8** 编译的 LMDeploy 预编译包。

```shell
-export LMDEPLOY_VERSION=0.12.1
+export LMDEPLOY_VERSION=0.12.2
export PYTHON_VERSION=310
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu128-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu128
```
docs/en/get_started/installation.md (2 changes: 1 addition & 1 deletion)
@@ -23,7 +23,7 @@ pip install lmdeploy
The default prebuilt package is compiled on **CUDA 12**. If CUDA 11+ (>=11.3) is required, you can install lmdeploy by:

```shell
-export LMDEPLOY_VERSION=0.12.1
+export LMDEPLOY_VERSION=0.12.2
export PYTHON_VERSION=310
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
docs/en/quantization/llm_compressor.md (2 changes: 1 addition & 1 deletion)
@@ -44,7 +44,7 @@ conda create -n lmdeploy python=3.10 -y
conda activate lmdeploy

# Install llm-compressor
-pip install llm-compressor
+pip install llmcompressor

# Clone lmdeploy source code and run the quantization example
git clone https://github.com/InternLM/lmdeploy
docs/zh_cn/get_started/installation.md (2 changes: 1 addition & 1 deletion)
@@ -23,7 +23,7 @@ pip install lmdeploy
默认的预构建包是在 **CUDA 12** 上编译的。如果需要 CUDA 11+ (>=11.3),你可以使用以下命令安装 lmdeploy:

```shell
-export LMDEPLOY_VERSION=0.12.1
+export LMDEPLOY_VERSION=0.12.2
export PYTHON_VERSION=310
pip install https://github.com/InternLM/lmdeploy/releases/download/v${LMDEPLOY_VERSION}/lmdeploy-${LMDEPLOY_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux2014_x86_64.whl --extra-index-url https://download.pytorch.org/whl/cu118
```
docs/zh_cn/quantization/llm_compressor.md (2 changes: 1 addition & 1 deletion)
@@ -42,7 +42,7 @@ conda create -n lmdeploy python=3.10 -y
conda activate lmdeploy

# 安装 llm-compressor
-pip install llm-compressor
+pip install llmcompressor

# 下载 lmdeploy 源码,运行量化用用例
git clone https://github.com/InternLM/lmdeploy
lmdeploy/pytorch/engine/engine_loop.py (3 changes: 1 addition & 2 deletions)
@@ -146,8 +146,7 @@ def _log_resps(outputs: List[InferOutput]):
if logger.level <= logging.DEBUG:
    session_ids = [out.session_id for out in outputs]
    logger.debug(f'Response sessions: {session_ids}')
-elif logger.level <= logging.INFO:
-    logger.info(f'Response: num_outputs={len(outputs)}.')
+logger.debug(f'Response: num_outputs={len(outputs)}.')

def _send_resp(self, out: InferOutput):
    """Send response."""
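The engine_loop.py hunk downgrades the per-batch response summary from `logger.info` to `logger.debug`, so it no longer fires on every batch at the default INFO level. A minimal stdlib sketch of that level gate (the logger name and messages here are illustrative, not lmdeploy's):

```python
import logging

logger = logging.getLogger("engine_demo")
logger.setLevel(logging.INFO)

emitted = []

class ListHandler(logging.Handler):
    """Collects emitted messages so the level gate's effect is visible."""
    def emit(self, record):
        emitted.append(record.getMessage())

logger.addHandler(ListHandler())

# At INFO level, debug() records are filtered out while info() passes,
# so moving the per-batch summary to debug() silences it by default.
logger.debug("Response: num_outputs=4.")  # suppressed at INFO level
logger.info("engine ready")               # emitted

print(emitted)  # ['engine ready']
```

Callers who still want the per-batch summary can opt in by lowering the logger to DEBUG, which is the usual pattern for chatty hot-path logs.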
lmdeploy/serve/core/async_engine.py (2 changes: 1 addition & 1 deletion)
@@ -268,7 +268,7 @@ async def safe_run(self, handle, session, **kwargs):
    metrics_processor.increase_api_routed_requests()
    yield generator
except (Exception, asyncio.CancelledError, GeneratorExit) as e:  # noqa
-    logger.error(f'[safe_run] session {session.session_id} exception caught: {type(e).__name__} {e}')
+    logger.exception(f'[safe_run] session {session.session_id} exception caught: {e}')
    await session.async_abort()
    if self.backend == 'pytorch':
        await handle.async_end(session.session_id)
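The async_engine.py change swaps `logger.error` for `logger.exception` inside the `except` block. `logger.exception` logs at ERROR level with `exc_info=True`, so the full traceback, including the exception type, is appended automatically, which is why the explicit `{type(e).__name__}` was dropped from the message. A small stdlib sketch (the logger name and message are illustrative):

```python
import io
import logging

logger = logging.getLogger("safe_run_demo")
buf = io.StringIO()
logger.addHandler(logging.StreamHandler(buf))

try:
    raise ValueError("boom")
except ValueError as e:
    # Equivalent to logger.error(..., exc_info=True): the record carries the
    # active exception, and the handler appends the formatted traceback.
    logger.exception(f"exception caught: {e}")

out = buf.getvalue()
# `out` contains the message line followed by the traceback,
# ending with "ValueError: boom".
```

Note that `logger.exception` should only be called from inside an exception handler, where `sys.exc_info()` is populated.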
lmdeploy/version.py (2 changes: 1 addition & 1 deletion)
@@ -1,7 +1,7 @@
# Copyright (c) OpenMMLab. All rights reserved.
from typing import Tuple

-__version__ = '0.12.1'
+__version__ = '0.12.2'
short_version = __version__


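version.py bumps `__version__` to '0.12.2'; its `from typing import Tuple` hints that the module also exposes a parsed version tuple. A hedged sketch of such parsing (`parse_version_info` is a hypothetical helper for illustration, not necessarily lmdeploy's actual code):

```python
from typing import Tuple

__version__ = '0.12.2'

def parse_version_info(version_str: str) -> Tuple[int, ...]:
    """Split a dotted release string into ints so versions compare numerically.

    NOTE: hypothetical helper; lmdeploy's real version.py may differ.
    """
    return tuple(int(part) for part in version_str.split('.'))

version_info = parse_version_info(__version__)
print(version_info)  # (0, 12, 2)
```

Comparing tuples of ints avoids the string-comparison trap where '0.12.2' sorts after '0.2.0' lexicographically but not numerically.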