[AsyncEngine Refactor 3/N] Introduce Session and SessionManager#4253
lvhan028 merged 53 commits into InternLM:main
Conversation
Pull request overview
This PR introduces SessionManager and InferInstManager as part of an AsyncEngine refactoring effort (3/N). The purpose is to better organize session and inference instance lifecycle management by extracting these concerns into dedicated manager classes.
Key Changes:
- Introduces SessionManager and InferInstManager with singleton pattern for managing sessions and inference instances
- Refactors AsyncEngine to delegate session and instance management to the new managers
- Creates a new Pipeline class to wrap AsyncEngine and provide a user-facing API
- Deprecates the serve() function in api.py
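
To make the division of responsibilities above concrete, here is a minimal sketch of what a Session/SessionManager pair could look like. The attribute and method names (create, get, async_abort, the aborted event) are assumptions for illustration only, not the PR's actual API.

```python
# Hypothetical sketch only: `create`, `get` and `async_abort` are assumed
# names for illustration and may not match the PR's actual API.
import asyncio
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Session:
    session_id: int
    # Event used to signal that the session has been aborted.
    aborted: asyncio.Event = field(default_factory=asyncio.Event)


class SessionManager:
    """Owns the session lifecycle so the engine no longer tracks it directly."""

    def __init__(self):
        self._sessions: Dict[int, Session] = {}

    def create(self, session_id: int) -> Session:
        if session_id in self._sessions:
            raise ValueError(f'session {session_id} already exists')
        session = Session(session_id)
        self._sessions[session_id] = session
        return session

    def get(self, session_id: int) -> Optional[Session]:
        return self._sessions.get(session_id)

    async def async_abort(self, session: Session):
        # Mark the session as aborted and drop it from the registry.
        session.aborted.set()
        self._sessions.pop(session.session_id, None)
```

The point of the extraction, as described in the overview, is that the engine asks the manager for a Session and operates on that object instead of keeping per-session bookkeeping itself.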
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 33 comments.
Summary per file:
| File | Description |
|---|---|
| lmdeploy/serve/session_manager.py | New SessionManager and Session classes for session lifecycle management |
| lmdeploy/serve/inst_manager.py | New InferInstManager for managing inference instance pool |
| lmdeploy/serve/exceptions.py | New SafeRunException for error handling |
| lmdeploy/serve/utils.py | New singleton decorator utility |
| lmdeploy/serve/async_engine.py | Refactored to use SessionManager and InferInstManager |
| lmdeploy/serve/openai/api_server.py | Updated session ID handling and request validation |
| lmdeploy/serve/openai/serving_*.py | Added session_id validation in check_request functions |
| lmdeploy/pipeline.py | New Pipeline class wrapping AsyncEngine |
| lmdeploy/api.py | Refactored to use Pipeline, deprecated serve() |
| lmdeploy/cli/chat.py | Updated to use new session management APIs |
| lmdeploy/archs.py | Updated autoget_backend_config to return tuple |
| lmdeploy/pytorch/kernels/cuda/fused_moe_ep.py | Memory optimization using tensor views instead of allocations |
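
For context on the singleton utility listed for lmdeploy/serve/utils.py above, a common shape for such a decorator is sketched below. This is a generic, thread-safe recipe and may differ from the PR's implementation in its details.

```python
# Generic thread-safe singleton decorator; the PR's utils.py may differ.
import functools
import threading


def singleton(cls):
    """Return a factory that creates at most one instance of ``cls``."""
    instances = {}
    lock = threading.Lock()

    @functools.wraps(cls)
    def get_instance(*args, **kwargs):
        if cls not in instances:
            with lock:
                # Double-checked locking: re-test inside the lock.
                if cls not in instances:
                    instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance


# Usage: every call to ExampleManager() then yields the same object.
@singleton
class ExampleManager:
    def __init__(self):
        self.pool = []
```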
lmdeploy/serve/session_manager.py
Outdated
    # Try to get the inference instance if it was already retrieved before cancellation
    try:
        await get_task
    except asyncio.CancelledError:
'except' clause does nothing but pass and there is no explanatory comment.
Suggested change:

    except asyncio.CancelledError:
        # The get_task was cancelled as part of abort handling; no further action is required.
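
Taken out of context, the pattern being reviewed looks roughly like the sketch below; `get_task` stands in for the task that fetches the inference instance, and the snippet only shows why swallowing CancelledError here is intentional.

```python
# Simplified sketch: the task fetching the inference instance may already have
# been cancelled by the time the abort path awaits it, so CancelledError is
# expected here and safe to swallow once it has been documented.
import asyncio


async def drain_get_task(get_task: 'asyncio.Task'):
    try:
        await get_task
    except asyncio.CancelledError:
        # The get_task was cancelled as part of abort handling;
        # no further action is required.
        pass
```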
lmdeploy/serve/core/async_engine.py
Outdated
    await generator.async_cancel(session_id)
    logger.info(f'session {session_id} stopped')
    # else it's not running at all
    await self.session_mgr.async_abort(session_id)
self.session_mgr.async_abort takes a Session object as input, but a session_id is passed here.
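
In other words, the caller presumably needs to resolve the Session object before aborting. A sketch of the implied call shape, where `get` is an assumed lookup method mirroring the SessionManager sketch earlier on this page:

```python
# Sketch of the fix implied by the comment: resolve the Session object from
# its id before passing it to async_abort. `get` is an assumed lookup method.
async def stop_session(session_mgr, session_id: int):
    session = session_mgr.get(session_id)
    if session is not None:
        await session_mgr.async_abort(session)
```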
lmdeploy/pipeline.py
Outdated
        adapter_name=adapter_name,
        stream_response=stream_response,
        **kwargs)
    return self.async_engine._infer(requests, multiplex=True)
The function async_engine._infer should be moved to Pipeline
I'd like to postpone moving async_engine._infer to Pipeline for the following reasons:
- It is tightly coupled with _EventLoopThread, which is also defined within AsyncEngine.
- If we also move _EventLoopThread to Pipeline, any component of LMDeploy or third-party code that creates an AsyncEngine instance would need to explicitly call start_loop. This would introduce a breaking change (BC).
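
For readers unfamiliar with the coupling being described, the general "event loop in a background thread" pattern looks roughly like the sketch below. This is a generic illustration, not the actual _EventLoopThread code; start_loop here only mirrors the method name mentioned above.

```python
# Generic "event loop in a background thread" illustration; not the actual
# _EventLoopThread implementation, only the shape of the coupling.
import asyncio
import threading


class EventLoopThread:
    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def start_loop(self):
        # Callers must start the loop before submitting work; forcing every
        # creator of the engine to remember this is the BC concern above.
        self._thread.start()

    def submit(self, coro):
        """Schedule a coroutine on the background loop and wait for its result."""
        return asyncio.run_coroutine_threadsafe(coro, self.loop).result()
```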
Motivation
Refactor AsyncEngine to improve maintainability.
Modification
BC-breaking (Optional)