Pull requests: vllm-project/vllm
Pull requests list
[Kernel] Add CUTLASS sparse support, heuristics, and torch operators
ci/build
#10340
opened Nov 14, 2024 by
Faraz9877
[Perf] Reduce peak memory usage of llama
ready
ONLY add when PR is ready to merge/full CI is needed
#10339
opened Nov 14, 2024 by
andoorve
[bugfix] Fix static asymmetric quantization case
ready
#10334
opened Nov 14, 2024 by
ProExpertProg
[Tool parsing] Improve / correct mistral tool parsing
frontend
ready
#10333
opened Nov 14, 2024 by
patrickvonplaten
[Hardware][Cambricon MLU] Add Cambricon MLU inference backend (#9649)
ci/build
#10315
opened Nov 14, 2024 by
zonghuaxiansheng
[Bugfix] Fix unable to load some models
ci/build
documentation
Improvements or additions to documentation
frontend
ready
release-blocker
This PR/issue blocks the next release, therefore deserves highest priority
#10312
opened Nov 14, 2024 by
DarkLight1337
[Misc] Change RedundantReshapesPass and FusionPass logging from info to debug
ready
#10308
opened Nov 13, 2024 by
tlrmchlsmth
[TPU] Implement prefix caching for TPUs
ci/build
tpu
Related to Google TPUs
#10307
opened Nov 13, 2024 by
WoosukKwon
Draft
[Bugfix] return zero point in static quantization in scaled_int8_quant
#10292
opened Nov 13, 2024 by
danieldk
[Model] Add Support for Multimodal Granite Models
#10291
opened Nov 13, 2024 by
alex-jw-brooks
[Misc] Update benchmark to support image_url file or http
#10287
opened Nov 13, 2024 by
kakao-steve-ai
Draft
[Core][Frontend] Add faster-outlines as guided decoding backend
ci/build
#10277
opened Nov 13, 2024 by
unaidedelf8777
[torch.compile] PostGradPassManager, Inductor code caching fix, fix_functionalization pass refactor + tests
ready
#10273
opened Nov 12, 2024 by
ProExpertProg
[Kernel][Hardware][AMD] Add support for GGUF quantization on ROCm
ci/build
#10254
opened Nov 12, 2024 by
kliuae
[CI/build] update torch to 2.5.1 in publish.yml to match requirements-build.txt
ci/build
#10253
opened Nov 12, 2024 by
tomeras91
[Core] Reduce TTFT with concurrent partial prefills
frontend
ready
#10235
opened Nov 11, 2024 by
joerunde