[AsyncEngine Refactor 3/N] Introduce Session and SessionManager#4253
lvhan028 merged 53 commits into InternLM:main
Conversation
Pull request overview
This PR introduces SessionManager and InferInstManager as part of an AsyncEngine refactoring effort (3/N). The purpose is to better organize session and inference instance lifecycle management by extracting these concerns into dedicated manager classes.
Key Changes:
- Introduces SessionManager and InferInstManager with singleton pattern for managing sessions and inference instances
- Refactors AsyncEngine to delegate session and instance management to the new managers
- Creates a new Pipeline class to wrap AsyncEngine and provide a user-facing API
- Deprecates the serve() function in api.py
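
To make the division of responsibilities above concrete, here is a minimal sketch of what a Session/SessionManager pair could look like. The attribute and method names (create, get, async_abort, the aborted event) are assumptions for illustration only, not the PR's actual API.

```python
# Hypothetical sketch only: `create`, `get` and `async_abort` are assumed
# names for illustration and may not match the PR's actual API.
import asyncio
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Session:
    session_id: int
    # Event used to signal that the session has been aborted.
    aborted: asyncio.Event = field(default_factory=asyncio.Event)


class SessionManager:
    """Owns the session lifecycle so the engine no longer tracks it directly."""

    def __init__(self):
        self._sessions: Dict[int, Session] = {}

    def create(self, session_id: int) -> Session:
        if session_id in self._sessions:
            raise ValueError(f'session {session_id} already exists')
        session = Session(session_id)
        self._sessions[session_id] = session
        return session

    def get(self, session_id: int) -> Optional[Session]:
        return self._sessions.get(session_id)

    async def async_abort(self, session: Session):
        # Mark the session as aborted and drop it from the registry.
        session.aborted.set()
        self._sessions.pop(session.session_id, None)
```

The point of the extraction, as described in the overview, is that the engine asks the manager for a Session and operates on that object instead of keeping per-session bookkeeping itself.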
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 33 comments.
Summary per file:
| File | Description |
|---|---|
| lmdeploy/serve/session_manager.py | New SessionManager and Session classes for session lifecycle management |
| lmdeploy/serve/inst_manager.py | New InferInstManager for managing inference instance pool |
| lmdeploy/serve/exceptions.py | New SafeRunException for error handling |
| lmdeploy/serve/utils.py | New singleton decorator utility |
| lmdeploy/serve/async_engine.py | Refactored to use SessionManager and InferInstManager |
| lmdeploy/serve/openai/api_server.py | Updated session ID handling and request validation |
| lmdeploy/serve/openai/serving_*.py | Added session_id validation in check_request functions |
| lmdeploy/pipeline.py | New Pipeline class wrapping AsyncEngine |
| lmdeploy/api.py | Refactored to use Pipeline, deprecated serve() |
| lmdeploy/cli/chat.py | Updated to use new session management APIs |
| lmdeploy/archs.py | Updated autoget_backend_config to return tuple |
| lmdeploy/pytorch/kernels/cuda/fused_moe_ep.py | Memory optimization using tensor views instead of allocations |
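
For context on the singleton utility listed for lmdeploy/serve/utils.py above, a common shape for such a decorator is sketched below. This is a generic, thread-safe recipe and may differ from the PR's implementation in its details.

```python
# Generic thread-safe singleton decorator; the PR's utils.py may differ.
import functools
import threading


def singleton(cls):
    """Return a factory that creates at most one instance of ``cls``."""
    instances = {}
    lock = threading.Lock()

    @functools.wraps(cls)
    def get_instance(*args, **kwargs):
        if cls not in instances:
            with lock:
                # Double-checked locking: re-test inside the lock.
                if cls not in instances:
                    instances[cls] = cls(*args, **kwargs)
        return instances[cls]

    return get_instance


# Usage: every call to ExampleManager() then yields the same object.
@singleton
class ExampleManager:
    def __init__(self):
        self.pool = []
```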
lmdeploy/serve/session_manager.py
Outdated
    # Try to get the inference instance if it was already retrieved before cancellation
    try:
        await get_task
    except asyncio.CancelledError:
'except' clause does nothing but pass and there is no explanatory comment.
Suggested change:

    except asyncio.CancelledError:
        # The get_task was cancelled as part of abort handling; no further action is required.
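
Taken out of context, the pattern being reviewed looks roughly like the sketch below; `get_task` stands in for the task that fetches the inference instance, and the snippet only shows why swallowing CancelledError here is intentional.

```python
# Simplified sketch: the task fetching the inference instance may already have
# been cancelled by the time the abort path awaits it, so CancelledError is
# expected here and safe to swallow once it has been documented.
import asyncio


async def drain_get_task(get_task: 'asyncio.Task'):
    try:
        await get_task
    except asyncio.CancelledError:
        # The get_task was cancelled as part of abort handling;
        # no further action is required.
        pass
```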
lmdeploy/serve/core/async_engine.py
Outdated
    await generator.async_cancel(session_id)
    logger.info(f'session {session_id} stopped')
    # else it's not running at all
    await self.session_mgr.async_abort(session_id)
self.session_mgr.async_abort takes a Session object as input, but a session_id is passed here.
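
In other words, the caller presumably needs to resolve the Session object before aborting. A sketch of the implied call shape, where `get` is an assumed lookup method mirroring the SessionManager sketch earlier on this page:

```python
# Sketch of the fix implied by the comment: resolve the Session object from
# its id before passing it to async_abort. `get` is an assumed lookup method.
async def stop_session(session_mgr, session_id: int):
    session = session_mgr.get(session_id)
    if session is not None:
        await session_mgr.async_abort(session)
```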
lmdeploy/pipeline.py
Outdated
        adapter_name=adapter_name,
        stream_response=stream_response,
        **kwargs)
    return self.async_engine._infer(requests, multiplex=True)
The function async_engine._infer should be moved to Pipeline
I'd like to postpone moving async_engine._infer to Pipeline for the following reasons:
- It is tightly coupled with _EventLoopThread, which is also defined within AsyncEngine.
- If we also move _EventLoopThread to Pipeline, any component of LMDeploy or third-party code that creates an AsyncEngine instance would need to explicitly call start_loop. This would introduce a breaking change (BC).
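
For readers unfamiliar with the coupling being described, the general "event loop in a background thread" pattern looks roughly like the sketch below. This is a generic illustration, not the actual _EventLoopThread code; start_loop here only mirrors the method name mentioned above.

```python
# Generic "event loop in a background thread" illustration; not the actual
# _EventLoopThread implementation, only the shape of the coupling.
import asyncio
import threading


class EventLoopThread:
    def __init__(self):
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        asyncio.set_event_loop(self.loop)
        self.loop.run_forever()

    def start_loop(self):
        # Callers must start the loop before submitting work; forcing every
        # creator of the engine to remember this is the BC concern above.
        self._thread.start()

    def submit(self, coro):
        """Schedule a coroutine on the background loop and wait for its result."""
        return asyncio.run_coroutine_threadsafe(coro, self.loop).result()
```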
Motivation
Refactor AsyncEngine to improve maintainability.
Modification
BC-breaking (Optional)