Issues: vllm-project/vllm
[Feature]: Allow head_size smaller than 128 on TPU with Pallas backend (feature request) #10343, opened Nov 14, 2024 by manninglucas
[Bug]: Not able to run LLama3 LoRA with --fully-sharded-loras (bug) #10342, opened Nov 14, 2024 by xyang16
[Bug]: KV Cache Error with KV_cache_dtype=FP8 and Large Sequence Length: Losing Context Length of Model (bug) #10337, opened Nov 14, 2024 by amakaido28
[Bug]: Out of Memory (OOM) Issues During MMLU Evaluation with lm_eval (bug) #10325, opened Nov 14, 2024 by wchen61
[Bug]: custom chat template sends to model [{'type': 'text', 'text': '...'}] (bug) #10324, opened Nov 14, 2024 by victorserbu2709
[Feature]: To adapt to the TTS task, I need to directly pass in the embedding. How should I modify it? (feature request) #10323, opened Nov 14, 2024 by 1nlplearner
[Usage]: using open-webui with vLLM inference engine instead of ollama (usage; see the sketch after this list) #10322, opened Nov 14, 2024 by wolfgangsmdt
[Usage]: Request to include vllm==0.6.2 for cuda 11.8 (usage) #10319, opened Nov 14, 2024 by amew0
[Performance]: Results in "vLLM Blog" article about speculative decoding are unreproducible (performance) #10318, opened Nov 14, 2024 by yeonjoon-jung01
[Bug]: FusedMoE kernel performance depends on input prompt length while decoding (bug) #10313, opened Nov 14, 2024 by taegeonum
[Usage]: how to use vllm to output code only (usage) #10309, opened Nov 14, 2024 by shaoyuyoung
[Installation]: Build vllm environment error (installation) #10303, opened Nov 13, 2024 by Kawai1Ace
[Bug]: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12 (bug) #10300, opened Nov 13, 2024 by yananchen1989
[Bug]: Get meaningless output when running long-context inference of the Qwen2.5 model with vllm>=0.6.3 (bug) #10298, opened Nov 13, 2024 by piamo
[Bug]: vLLM crashes when running Qwen/Qwen2.5-Coder-32B-Instruct on two H100 GPUs (bug) #10296, opened Nov 13, 2024 by noamwies
[Usage]: What does "since, enforce-eager is enabled, async output processor cannot be used" mean exactly? (usage) #10295, opened Nov 13, 2024 by Leon-Sander
[Feature]: Quark quantization format upstream to vLLM (feature request) #10294, opened Nov 13, 2024 by kewang-xlnx
[Bug]: Can't use yarn rope config for long context in Qwen2 model (bug) #10293, opened Nov 13, 2024 by FlyCarrot
[Feature]: Chunked prefill for multimodal models (feature request) #10290, opened Nov 13, 2024 by QiuJingkai
[Misc]: Invariant encountered: value was None when it should not be (misc) #10284, opened Nov 13, 2024 by nithingovindugari
[Bug]: LLM initialization time increases significantly with larger tensor parallel size and Ray (bug) #10283, opened Nov 13, 2024 by piood
[Bug]: A qwen2.5 service behaves differently across vLLM versions for the same input: the SSE output is correct on 0.6.1.post2 but wrong on 0.6.3.post1? (bug) #10280, opened Nov 13, 2024 by mawenju203
[Bug]: Speculative Decoding + TP on Spec Worker + Chunked Prefill does not work. (bug) #10276, opened Nov 13, 2024 by andoorve
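
Several of the usage questions above, e.g. #10322 (pointing open-webui at vLLM instead of ollama), hinge on the fact that vLLM serves an OpenAI-compatible HTTP API. Below is a minimal sketch, assuming a server started locally with `vllm serve Qwen/Qwen2.5-7B-Instruct` on the default port 8000; the model name, host, and port here are placeholder assumptions, not values taken from the issues.

```python
# Minimal sketch: query a locally running vLLM OpenAI-compatible server
# (assumed started with: vllm serve Qwen/Qwen2.5-7B-Instruct).
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's default OpenAI-compatible endpoint
    api_key="EMPTY",  # any string works unless the server was started with --api-key
)

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",  # must match the model the server loaded
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

Any client that speaks the OpenAI API, open-webui included, can be pointed at the same base URL.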