Issues: vllm-project/vllm
[Feature]: Allow head_size smaller than 128 on TPU with Pallas backend (feature request) #10343, opened Nov 14, 2024 by manninglucas
[Bug]: Not able to run LLama3 LoRA with --fully-sharded-loras (bug) #10342, opened Nov 14, 2024 by xyang16
[Bug]: KV Cache Error with KV_cache_dtype=FP8 and Large Sequence Length: Losing Context Length of Model (bug) #10337, opened Nov 14, 2024 by amakaido28
[Bug]: Out of Memory (OOM) Issues During MMLU Evaluation with lm_eval (bug) #10325, opened Nov 14, 2024 by wchen61
[Bug]: custom chat template sends to model [{'type': 'text', 'text': '...'}] (bug) #10324, opened Nov 14, 2024 by victorserbu2709
[Feature]: To adapt to the TTS task, I need to directly pass in the embedding. How should I modify it? (feature request) #10323, opened Nov 14, 2024 by 1nlplearner
[Usage]: using open-webui with vLLM inference engine instead of ollama (usage; see the sketch after this list) #10322, opened Nov 14, 2024 by wolfgangsmdt
[Usage]: Request to include vllm==0.6.2 for cuda 11.8 (usage) #10319, opened Nov 14, 2024 by amew0
[Performance]: Results in "vLLM Blog" article about speculative decoding are unreproducible (performance) #10318, opened Nov 14, 2024 by yeonjoon-jung01
[Bug]: FusedMoE kernel performance depends on input prompt length while decoding (bug) #10313, opened Nov 14, 2024 by taegeonum
[Usage]: how to use vllm to output code only (usage) #10309, opened Nov 14, 2024 by shaoyuyoung
[Installation]: Build vllm environment error (installation) #10303, opened Nov 13, 2024 by Kawai1Ace
[Bug]: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12 (bug) #10300, opened Nov 13, 2024 by yananchen1989
[Bug]: Get meaningless output when running long-context inference of the Qwen2.5 model with vllm>=0.6.3 (bug) #10298, opened Nov 13, 2024 by piamo
[Bug]: vLLM crashes when running Qwen/Qwen2.5-Coder-32B-Instruct on two H100 GPUs (bug) #10296, opened Nov 13, 2024 by noamwies
[Usage]: What does "since, enforce-eager is enabled, async output processor cannot be used" mean exactly? (usage) #10295, opened Nov 13, 2024 by Leon-Sander
[Feature]: Quark quantization format upstream to vLLM (feature request) #10294, opened Nov 13, 2024 by kewang-xlnx
[Bug]: Can't use yarn rope config for long context in Qwen2 model (bug) #10293, opened Nov 13, 2024 by FlyCarrot
[Feature]: Chunked prefill for multimodal models (feature request) #10290, opened Nov 13, 2024 by QiuJingkai
[Misc]: Invariant encountered: value was None when it should not be (misc) #10284, opened Nov 13, 2024 by nithingovindugari
[Bug]: LLM initialization time increases significantly with larger tensor parallel size and Ray (bug) #10283, opened Nov 13, 2024 by piood
[Bug]: A qwen2.5 service behaves differently across vLLM versions for the same input: the SSE output is correct on 0.6.1.post2 but wrong on 0.6.3.post1? (bug) #10280, opened Nov 13, 2024 by mawenju203
[Bug]: Speculative Decoding + TP on Spec Worker + Chunked Prefill does not work. (bug) #10276, opened Nov 13, 2024 by andoorve
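
Several of the usage questions above, e.g. #10322 (pointing open-webui at vLLM instead of ollama), hinge on the fact that vLLM serves an OpenAI-compatible HTTP API. Below is a minimal sketch, assuming a server started locally with `vllm serve Qwen/Qwen2.5-7B-Instruct` on the default port 8000; the model name, host, and port here are placeholder assumptions, not values taken from the issues.

```python
# Minimal sketch: query a locally running vLLM OpenAI-compatible server
# (assumed started with: vllm serve Qwen/Qwen2.5-7B-Instruct).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default OpenAI-compatible endpoint
    api_key="EMPTY",  # any string works unless the server was started with --api-key
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # must match the model the server loaded
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

Any client that speaks the OpenAI API, open-webui included, can be pointed at the same base URL.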