
Improve health endpoint#4615

Merged
lvhan028 merged 6 commits into
InternLM:mainfrom
lvhan028:improve-health
May 28, 2026
Conversation

@lvhan028 lvhan028 commented May 23, 2026

Improve the API server /health endpoint so it reflects inference engine health instead of only reporting that the HTTP server is alive.

This change adds backend health probing for both PyTorch and TurboMind engines. The API server now runs a background EngineHealthMonitor, caches the latest health snapshot, and
returns 503 when the inference backend is unhealthy while keeping 200 for healthy or sleeping engines.
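The monitor-and-cache behaviour described above can be sketched roughly as follows. This is not the PR's `EngineHealthMonitor`; the names `HealthSnapshot`, `HealthMonitorSketch`, and `http_status_for`, the probe signature, and the choice to report unhealthy before the first probe completes are all assumptions made for illustration.

```python
import asyncio
import time
from dataclasses import dataclass, field


@dataclass
class HealthSnapshot:
    """Hypothetical cached result of the most recent probe."""
    status: str = 'unhealthy'  # 'healthy', 'sleeping', or 'unhealthy'
    message: str = ''
    checked_at: float = field(default_factory=time.monotonic)


def http_status_for(snapshot: HealthSnapshot) -> int:
    """Return 200 for healthy or sleeping engines, 503 otherwise."""
    return 200 if snapshot.status in ('healthy', 'sleeping') else 503


class HealthMonitorSketch:
    """Probe periodically in the background and cache the latest snapshot."""

    def __init__(self, probe, interval: float = 5.0):
        self._probe = probe  # assumed: async callable returning (status, message)
        self._interval = interval
        # Assumption: report unhealthy until the first probe has completed.
        self._snapshot = HealthSnapshot(message='No probe has completed yet.')
        self._task = None

    @property
    def snapshot(self) -> HealthSnapshot:
        return self._snapshot

    async def _run(self):
        while True:
            try:
                status, message = await self._probe()
            except Exception as e:  # a failing probe counts as unhealthy
                status, message = 'unhealthy', f'probe failed: {e}'
            self._snapshot = HealthSnapshot(status=status, message=message)
            await asyncio.sleep(self._interval)

    def start(self):
        # Must be called from inside a running event loop.
        self._task = asyncio.get_running_loop().create_task(self._run())

    async def stop(self):
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None
```

With this shape, a `/health` handler only reads `monitor.snapshot` and never waits on the backend itself, so a hung engine cannot hang the endpoint.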

The health probe uses a bounded, non-overlapping backend check and validates scheduler progress with a backend-owned monotonic scheduler_tick. This allows /health to detect cases
where requests have been dispatched but the backend scheduler stops making progress. Idle periods are handled separately so the backend is not marked unhealthy simply because there is
no active work.
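As a rough illustration of the progress check described above, the sketch below compares successive scheduler_tick readings and treats idle periods separately. The class name `ProgressTracker`, the `stall_timeout` threshold, and the exact decision rules are assumptions, not the PR's implementation; only `scheduler_tick` and `num_dispatched` come from the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressTracker:
    """Hypothetical stall detector driven by scheduler_tick readings."""
    stall_timeout: float = 30.0  # assumed threshold, in seconds
    last_tick: Optional[int] = None
    last_progress_at: Optional[float] = None

    def check(self, tick: int, num_dispatched: int, now: float) -> str:
        # Idle: nothing is dispatched, so a frozen tick is not a failure.
        # Reset the baseline so the stall window starts when work arrives.
        if num_dispatched == 0:
            self.last_tick = tick
            self.last_progress_at = now
            return 'healthy'
        # First observation, or the tick changed: the scheduler made progress.
        if self.last_tick is None or tick != self.last_tick:
            self.last_tick = tick
            self.last_progress_at = now
            return 'healthy'
        # Work is dispatched but the tick is frozen; flag it once the
        # grace window has elapsed.
        if now - self.last_progress_at >= self.stall_timeout:
            return 'unhealthy'
        return 'healthy'
```

Passing `now` explicitly (for example from `time.monotonic()`) keeps the check deterministic and easy to test.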

Both engines expose scheduler_tick through their schedule metrics. The tick is updated on every inference iteration, so health probing sees the current sequence/block state.

Besides scheduler_tick, the PyTorch engine health status now also checks engine loop/task liveness.

Copilot AI review requested due to automatic review settings May 23, 2026 07:59

Copilot AI left a comment


Pull request overview

This PR enhances the OpenAI server /health endpoint by adding an engine health monitor that actively probes backend liveness and detects scheduler stalls via a new monotonic scheduler_tick metric surfaced from both TurboMind (C++/pybind) and PyTorch backends.

Changes:

  • Add scheduler_tick to schedule metrics across TurboMind (C++ + Python binding) and PyTorch scheduler metrics.
  • Introduce EngineHealthMonitor + AsyncEngine.health_probe() and wire /health to return structured JSON with 200/503 based on engine status.
  • Add lightweight backend-specific get_health_status() implementations (TurboMind, PyTorch, mp engines) for the health probe.

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/turbomind/utils/metrics.h | Adds scheduler_tick field to TurboMind schedule metrics and prints it in the stream operator. |
| src/turbomind/python/bind.cpp | Exposes scheduler_tick to Python via pybind ScheduleMetrics. |
| src/turbomind/engine/engine.cc | Tracks scheduler_tick and adjusts schedule-metrics update/get logic (also initializes metrics after seq manager creation). |
| lmdeploy/turbomind/turbomind.py | Propagates scheduler_tick into Python ScheduleMetrics and adds TurboMind get_health_status(). |
| lmdeploy/serve/openai/api_server.py | Switches /health to JSON output backed by EngineHealthMonitor; wires monitor into FastAPI lifespan. |
| lmdeploy/serve/managers/session_manager.py | Adds num_dispatched to track checked-out request handles for stall detection logic. |
| lmdeploy/serve/core/health.py | New EngineHealthMonitor background task that periodically probes engine health. |
| lmdeploy/serve/core/async_engine.py | Adds bounded, non-overlapping health probing + scheduler progress validation. |
| lmdeploy/serve/core/__init__.py | Exports EngineHealthMonitor. |
| lmdeploy/pytorch/paging/scheduler.py | Adds scheduler_tick and includes it in schedule metrics. |
| lmdeploy/pytorch/engine/mp_engine/zmq_engine.py | Adds health status check for ZMQ process liveness before probing. |
| lmdeploy/pytorch/engine/mp_engine/base.py | Adds get_health_status() RPC wrapper. |
| lmdeploy/pytorch/engine/mp_engine/base_worker.py | Adds RPC-exposed get_health_status() implementation. |
| lmdeploy/pytorch/engine/engine.py | Adds PyTorch engine get_health_status() checking request/main loop task liveness. |
| lmdeploy/pytorch/engine/engine_loop.py | Increments scheduler tick on each main-loop iteration. |
| lmdeploy/pytorch/engine/base.py | Adds get_health_status() to the engine base interface. |
| lmdeploy/messages.py | Adds scheduler_tick field to the Python ScheduleMetrics dataclass. |


Comment thread lmdeploy/turbomind/turbomind.py Outdated
Comment thread src/turbomind/engine/engine.cc Outdated
Comment thread lmdeploy/serve/core/health.py

Comment thread lmdeploy/serve/core/async_engine.py Outdated
Comment thread lmdeploy/turbomind/turbomind.py
Comment on lines +611 to +617
    @staticmethod
    def _health_check_tasks(tasks):
        done_tasks = []
        for task in list(tasks):
            if task.done():
                done_tasks.append(task.get_name())
        return len(done_tasks) == 0, done_tasks
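To see why `task.done()` works as a liveness signal for long-running loop tasks, here is a small standalone sketch. It restates the helper as a module-level function named `check_tasks_alive` (a hypothetical name) and runs it against one looping task and one task that crashes; the task names are made up for the demo.

```python
import asyncio


def check_tasks_alive(tasks):
    """Return (all_alive, names_of_finished_tasks).

    A loop task is expected to run forever, so a finished task means it
    returned, raised, or was cancelled.
    """
    done = [t.get_name() for t in list(tasks) if t.done()]
    return len(done) == 0, done


async def demo():
    async def loop_forever():
        while True:
            await asyncio.sleep(3600)

    async def crash():
        raise RuntimeError('boom')

    main_loop = asyncio.create_task(loop_forever(), name='MainLoop')
    broken = asyncio.create_task(crash(), name='BrokenLoop')
    await asyncio.sleep(0)  # let both tasks run their first step

    result = check_tasks_alive([main_loop, broken])

    # Clean up: cancel the live task and retrieve both outcomes so asyncio
    # does not warn about an unretrieved exception.
    main_loop.cancel()
    await asyncio.gather(main_loop, broken, return_exceptions=True)
    return result


if __name__ == '__main__':
    print(asyncio.run(demo()))
```

The check only reports that a task has finished, not why; reading `task.exception()` on a finished, non-cancelled task would be needed to surface the cause.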

Comment on lines +305 to +320
    self._health_probe_task = asyncio.create_task(self.engine.get_health_status(), name='EngineHealthProbe')
    try:
        backend_status = await asyncio.wait_for(asyncio.shield(self._health_probe_task), timeout=timeout)
    except asyncio.TimeoutError:
        return self._make_health_result(
            status='unhealthy',
            message=f'Backend health probe timed out after {timeout:.1f}s.',
        )
    except Exception as e:
        self._health_probe_task = None
        return self._make_health_result(
            status='unhealthy',
            message=f'Backend health probe failed: {e}',
        )

    self._health_probe_task = None
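The `wait_for(shield(...))` pattern in the snippet above can be exercised in isolation. The sketch below is a simplified stand-in, not the PR's code: `ProbeRunner`, `backend_check`, and the returned dict shape are hypothetical, and reusing a still-pending probe on the next call is one possible way to keep probes from overlapping.

```python
import asyncio


class ProbeRunner:
    """Bounded, non-overlapping probe around an async backend check."""

    def __init__(self, backend_check):
        self._backend_check = backend_check  # assumed: async callable
        self._task = None

    async def probe(self, timeout: float) -> dict:
        # Non-overlapping: if an earlier probe is still outstanding, wait on
        # it again instead of launching a second backend call.
        if self._task is None:
            self._task = asyncio.create_task(self._backend_check())
        try:
            # shield() means a timeout cancels only the wait, not the probe
            # task itself, so the task can be reused by the next call.
            result = await asyncio.wait_for(asyncio.shield(self._task), timeout)
        except asyncio.TimeoutError:
            return {'status': 'unhealthy',
                    'message': f'probe timed out after {timeout:.1f}s'}
        except Exception as e:
            self._task = None
            return {'status': 'unhealthy', 'message': f'probe failed: {e}'}
        self._task = None
        return {'status': 'healthy', 'detail': result}
```

Without `shield()`, `wait_for` would cancel the backend call on timeout, which may be undesirable when the call crosses a process or RPC boundary.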
Comment thread src/turbomind/engine/engine.cc
      total_blocks=tm_metrics.total_blocks,
      active_blocks=tm_metrics.active_blocks,
    - free_blocks=tm_metrics.free_blocks)
    + free_blocks=tm_metrics.free_blocks,
@lvhan028 lvhan028 requested a review from grimoire May 25, 2026 09:45
Comment thread lmdeploy/pytorch/paging/scheduler.py
@lvhan028 lvhan028 merged commit 4dad4c9 into InternLM:main May 28, 2026
9 checks passed