vllm-project / tpu-inference Public

Pull requests: vllm-project/tpu-inference

71 Open 1,158 Closed

Pull requests list

Fix moe layer from upstream change ready

ONLY add when PR is ready to merge/full CI is needed

#1274 opened Dec 10, 2025 by kyuyeunk

[Quantization] Add option to bypass quantized matmul kernel for W8A8-FP8 Compressed Tensors

#1273 opened Dec 9, 2025 by jrplatin

clear xla compilation cache before each disagg server launch

#1271 opened Dec 9, 2025 by sixiang-google

First check-in to add ci/cd test on tpuv7x ready

#1270 opened Dec 9, 2025 by QiliangCui

add github action for check ready label ready

#1269 opened Dec 9, 2025 by boe20211

Fix TPU7x chip counting to account for chiplet architecture

#1266 opened Dec 8, 2025 by burbajr

Replacing bit_width() with itemized_bits(). ready

#1264 opened Dec 8, 2025 by aman2930

3 tasks done

Add default 'auto' MODEL_IMPL_TYPE that resolves based on architecture ready

#1255 opened Dec 5, 2025 by xingliu14

[Kernel][FusedMoE] Fix MoE crash and hang issues ready

#1252 opened Dec 5, 2025 by bythew3i

docs: update support matrices and improve visuals

#1250 opened Dec 5, 2025 by RobMulla

Avoid installing CUDA related stuff

#1246 opened Dec 4, 2025 by wdhongtw

Reduce image size and enhance caching

#1245 opened Dec 4, 2025 by wdhongtw

update run_in_docker script for running on local env ready

#1243 opened Dec 4, 2025 by ernie-chang

Verify vllm-tpu python package (draft) ready

#1241 opened Dec 4, 2025 by ylangtsou • Draft

[DRAFT] Optimize Dockerfile to reduce image size and build time.

#1226 opened Dec 3, 2025 by py4

[CI] Fix awq dtype ready

#1220 opened Dec 2, 2025 by kyuyeunk

[Oncall] update the SchedulerConfig interface

#1219 opened Dec 2, 2025 by bzgoogle

Add a SP e2e test.

#1209 opened Dec 2, 2025 by vanbasten23

[RPA] Pipeline flash attention in default kernel ready

#1203 opened Dec 1, 2025 by jrplatin

Save size in scalar scratch for bo and bq ready

#1201 opened Dec 1, 2025 by rupengliu-meta

fix(rpa-v3): add sliding window mask to h64 kernel and attention_sink to h128

#1185 opened Nov 26, 2025 by erfanzar

Add 1st iteration of pre-merge pipeline

#1175 opened Nov 25, 2025 by jcyang43 • Draft

[Qwix/Flax] Upgrade to Flax 0.12.0 + Qwix 0.1.4

#1170 opened Nov 25, 2025 by jrplatin

[do not merge] test status check POC ready

#1168 opened Nov 25, 2025 by khluu

[Feat][TPU Offload] KV cache offload to local cpu buffer ready

#1163 opened Nov 24, 2025 by juncgu-google
