Description
Motivation.
Currently, all CI workflows run on a single Ascend server node, which limits the maximum available NPU count to 8 cards.
As a result, it is impossible to test multi-node scenarios, which may be closer to real-world use cases.
This RFC aims to provide a solution that enables community developers to write test cases for multi-node vLLM serving deployments.
Proposed Change.
Since the community CI runs on a Kubernetes cluster, there are many out-of-the-box multi-node serving solutions, for example LWS (LeaderWorkerSet), and the vLLM project also provides an example for reference. The straightforward idea is to build directly on LWS. Here is a general plan:
- add a new workflow that contains two jobs:
  - job1: create a new LWS instance that exposes a vLLM service with multiple pods on separate nodes
  - job2: wait until the LWS service is ready, then run the tests; the job must clean up the resources when the tests finish
- add guides to help developers set up more multi-node test cases
Since multi-node serving may be time- and NPU-consuming, it's best not to trigger this workflow from PRs.
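For illustration, the wait/test/cleanup part of job2 could look roughly like the Python sketch below. It assumes readiness is detected by polling the vLLM server's `/health` endpoint on the leader pod; the function names (`http_ready`, `wait_until_ready`, `run_job2`), timeouts and polling interval are hypothetical and not part of any existing workflow. Creating the LWS instance (job1) is left out.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_ready(url: str, timeout_s: float = 5.0) -> Callable[[], bool]:
    """Return a probe that is True once `url` answers with HTTP 200."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                return resp.status == 200
        except (OSError, http.client.HTTPException):
            # Connection refused, DNS failure, timeout or a 4xx/5xx response
            # (HTTPError is an OSError): treat the service as not ready yet.
            return False
    return probe


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 1800.0,  # hypothetical: multi-node startup can be slow
    interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_ready() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


def run_job2(
    is_ready: Callable[[], bool],
    run_tests: Callable[[], int],
    cleanup: Callable[[], None],
    **wait_kwargs,
) -> int:
    """Wait for the LWS service, run the tests, and always clean up."""
    try:
        if not wait_until_ready(is_ready, **wait_kwargs):
            return 1  # the service never became ready: fail the job
        return run_tests()
    finally:
        # Runs on success, test failure, timeout or exception,
        # so the NPUs held by the LWS pods are always released.
        cleanup()
```

In a real workflow, `is_ready` could be `http_ready("http://<leader-service>:8000/health")` (the service name depends on the LWS manifest), `run_tests` could return the exit code of a pytest run via `subprocess.run(...).returncode`, and `cleanup` could run `kubectl delete -f` on the LWS manifest. Putting cleanup in `finally` keeps scarce NPUs from staying allocated when a test run crashes or times out.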
Feedback Period.
Maybe one week
CC List.
@Yikun @wangxiyuan @Potabk @MengqingCao
Any Other Things.
No response