Commit f474ca5: Create AMD-Qwen3-Next-Usage.md (1 parent: b7d325e)

File tree: 1 file changed (+30, -0 lines)

Qwen/AMD-Qwen3-Next-Usage.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
#### Step-by-Step Guide
Follow the steps below to install and run the Qwen3-Next-80B-A3B-Instruct model on AMD MI300X GPUs.
#### Step 1
Pull the latest vLLM Docker image:
```shell
docker pull rocm/vllm-dev:nightly
```
Launch the ROCm vLLM container:
```shell
docker run -d -it --ipc=host --network=host --privileged --cap-add=CAP_SYS_ADMIN --device=/dev/kfd --device=/dev/dri --device=/dev/mem --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v /:/work -e SHELL=/bin/bash --name Qwen3-next rocm/vllm-dev:nightly
```
#### Step 2
Log in to Hugging Face so the model weights can be downloaded:
```shell
huggingface-cli login
```
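If an interactive login is not convenient, `huggingface_hub` also reads an access token from the `HF_TOKEN` environment variable. A non-interactive sketch; the token value below is a placeholder, not a real token:

```shell
# Non-interactive alternative to `huggingface-cli login`:
# export a Hugging Face access token (placeholder value shown;
# create a real token at huggingface.co under your account settings).
export HF_TOKEN="hf_your_token_here"
```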
#### Step 3
##### FP8

Start vLLM online serving. Sample command:
```shell
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 vllm serve Qwen/Qwen3-Next-80B-A3B-Instruct --tensor-parallel-size 4 --max-model-len 32768 --no-enable-prefix-caching
```
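Once the server is up, it exposes an OpenAI-compatible HTTP API. A quick smoke test with `curl`, assuming the server is running on the default host and port (8000):

```shell
# Send a single chat completion request to the local vLLM server.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-Next-80B-A3B-Instruct", "messages": [{"role": "user", "content": "Say hello in one sentence."}], "max_tokens": 64}'
```

A JSON response with a `choices` array indicates the endpoint is serving correctly.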
#### Step 4
Open a new terminal, attach to the running container, and run the following benchmark script:
```shell
docker exec -it Qwen3-next /bin/bash
python3 /vllm-workspace/benchmarks/benchmark_serving.py --model Qwen/Qwen3-Next-80B-A3B-Instruct --dataset-name random --ignore-eos --num-prompts 500 --max-concurrency 128 --random-input-len 3200 --random-output-len 800 --percentile-metrics ttft,tpot,itl,e2el
```
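To compare latency and throughput under different loads, the same benchmark can be swept over several concurrency levels. A sketch only, run inside the container; the `--save-result` and `--result-filename` flags are assumed to be available in this benchmark script version:

```shell
# Hypothetical sweep: rerun the benchmark at several concurrency levels,
# writing one result file per run for later comparison.
for c in 32 64 128; do
  python3 /vllm-workspace/benchmarks/benchmark_serving.py \
    --model Qwen/Qwen3-Next-80B-A3B-Instruct \
    --dataset-name random --ignore-eos \
    --num-prompts 500 --max-concurrency "$c" \
    --random-input-len 3200 --random-output-len 800 \
    --percentile-metrics ttft,tpot,itl,e2el \
    --save-result --result-filename "bench_c${c}.json"
done
```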
