A high-throughput and memory-efficient inference and serving engine for LLMs (Python; 64.8k stars, 11.8k forks)
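For orientation, vLLM's offline Python API runs batched generation in a few lines. A minimal sketch; the model name is just a lightweight placeholder:

```python
from vllm import LLM, SamplingParams

# Load a small placeholder model and run batched offline generation.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```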
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM (Python; 2.3k stars, 306 forks)
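A rough sketch of the one-shot quantization flow this library exposes. The import paths, modifier names, and arguments below are assumptions that vary by release, so verify them against the repo's README before use:

```python
# Hypothetical one-shot quantization flow; import paths and
# arguments are assumptions, check the llm-compressor README
# for the exact API of your installed version.
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # placeholder model
    dataset="open_platypus",                     # placeholder calibration set
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    max_seq_length=2048,
    num_calibration_samples=512,
)
```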
Common recipes to run vLLM (Jupyter Notebook; 263 stars, 98 forks)
An intelligent request router for Mixture-of-Models deployments
TPU inference for vLLM, with unified JAX and PyTorch support.
Community-maintained hardware plugin for running vLLM on Ascend NPUs
A high-performance, lightweight router for large-scale vLLM deployments
A framework for efficient inference with omni-modality models
This repo hosts code for vLLM's CI and performance benchmark infrastructure.
Evaluate and enhance your LLM deployments for real-world inference needs
Cost-efficient and pluggable infrastructure components for GenAI inference