Pull requests: vllm-project/vllm

Pull requests list

Rahul quant merged
#10341 opened Nov 14, 2024 by robertgshaw2-neuralmagic Draft
[Perf] Reduce peak memory usage of llama (labels: ready)
#10339 opened Nov 14, 2024 by andoorve
[bugfix] Fix static asymmetric quantization case (labels: ready)
#10334 opened Nov 14, 2024 by ProExpertProg
[Tool parsing] Improve / correct mistral tool parsing (labels: frontend, ready)
#10333 opened Nov 14, 2024 by patrickvonplaten
DistServe Prototype
#10321 opened Nov 14, 2024 by Jocn2020 Draft
[Bugfix] Fix unable to load some models (labels: ci/build, documentation, frontend, ready, release-blocker)
#10312 opened Nov 14, 2024 by DarkLight1337
[Model] Support telechat2
#10311 opened Nov 14, 2024 by shunxing12345
[Misc] Change RedundantReshapesPass and FusionPass logging from info to debug (labels: ready)
#10308 opened Nov 13, 2024 by tlrmchlsmth
[TPU] Implement prefix caching for TPUs (labels: ci/build, tpu)
#10307 opened Nov 13, 2024 by WoosukKwon Draft
[torch.compile] PostGradPassManager, Inductor code caching fix, fix_functionalization pass refactor + tests (labels: ready)
#10273 opened Nov 12, 2024 by ProExpertProg
[misc] Layerwise profile updates
#10242 opened Nov 12, 2024 by varun-sundar-rabindranath
[Hardware] [HPU]add mark_step for hpu Gaudi
#10239 opened Nov 12, 2024 by jikunshang
[Core] Reduce TTFT with concurrent partial prefills (labels: frontend, ready)
#10235 opened Nov 11, 2024 by joerunde