Commit d49f9f4

adjust the gpu deployment to increase max batch size
1 parent a13a123 commit d49f9f4

1 file changed, 6 additions and 0 deletions
config/manifests/vllm/gpu-deployment.yaml

@@ -24,9 +24,15 @@ spec:
  - "1"
  - "--port"
  - "8000"
+ - "--max-num-seq"
+ - "2048"
+ - "--compilation-config"
+ - "3"
  - "--enable-lora"
  - "--max-loras"
  - "2"
+ - "--max-lora-rank"
+ - "8"
  - "--max-cpu-loras"
  - "12"
  env:
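
A note on the flag spelling in this diff: vLLM documents the batch-size option as `--max-num-seqs`, while the manifest passes `--max-num-seq`. The sketch below shows how Python's standard `argparse` resolves an unambiguous prefix to the full option name, which may be why the shorter spelling is accepted. It assumes the server's parser keeps argparse's default abbreviation handling; the parser, program name, and defaults here are hypothetical stand-ins, not vLLM's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for a server CLI; not vLLM's real parser.
    # allow_abbrev defaults to True, so unambiguous prefixes are accepted.
    parser = argparse.ArgumentParser(prog="demo-server")
    parser.add_argument("--max-num-seqs", type=int, default=256)
    parser.add_argument("--max-loras", type=int, default=1)
    parser.add_argument("--max-lora-rank", type=int, default=16)
    return parser


# Same spellings as the manifest: "--max-num-seq" is a unique prefix of
# "--max-num-seqs"; the other two flags match their options exactly.
args = build_parser().parse_args(
    ["--max-num-seq", "2048", "--max-loras", "2", "--max-lora-rank", "8"]
)
print(args.max_num_seqs, args.max_loras, args.max_lora_rank)
```

Spelling the flag out in full as `--max-num-seqs` would avoid depending on prefix matching, which can stop working if a later release adds another option that shares the same prefix.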
