    Repositories list

    • llm-d

      Public
      Achieve state of the art inference performance with modern accelerators on Kubernetes
      Shell
      Apache License 2.0
      446 · 0 · 0 · 0 · Updated Apr 30, 2026
    • Helm Chart Repository
      Makefile
      0 · 0 · 0 · 0 · Updated Apr 30, 2026
    • LMCache

      Public
      Redis for LLMs
      Python
      Apache License 2.0
      1.1k · 0 · 0 · 3 · Updated Apr 30, 2026
    • Gateway API Inference Extension
      Go
      Apache License 2.0
      283 · 0 · 0 · 0 · Updated Apr 30, 2026
    • mif

      Public
      MIF: MoAI Inference Framework
      MDX
      0 · 0 · 0 · 1 · Updated Apr 29, 2026
    • Distributed KV cache coordinator
      Go
      Apache License 2.0
      119 · 0 · 0 · 1 · Updated Apr 29, 2026
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      16k · 1 · 0 · 5 · Updated Apr 29, 2026
    • Inference scheduler for llm-d
      Go
      Apache License 2.0
      181 · 0 · 0 · 0 · Updated Apr 28, 2026
    • GenAI inference performance benchmarking tool
      Python
      Apache License 2.0
      87 · 0 · 0 · 0 · Updated Apr 28, 2026
    • git pre-commit hook for automation
      Python
      0 · 0 · 0 · 0 · Updated Apr 26, 2026
    • HTML
      0 · 0 · 0 · 0 · Updated Apr 26, 2026
    • Mooncake

      Public
      Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
      C++
      Apache License 2.0
      719 · 0 · 0 · 0 · Updated Apr 22, 2026
    • A lightweight, configurable, and real-time simulator designed to mimic the behavior of vLLM without the need for GPUs or running actual heavy models.
      Go
      Apache License 2.0
      82 · 0 · 0 · 0 · Updated Apr 19, 2026
    • tt-umd

      Public
      User-Mode Driver for Tenstorrent hardware
      C++
      Apache License 2.0
      35 · 0 · 0 · 0 · Updated Mar 27, 2026
    • skypilot

      Public
      SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 16+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.
      Python
      Apache License 2.0
      1.1k · 0 · 0 · 0 · Updated Mar 15, 2026
    • hwlog

      Public
      hwlog for system health check and logs
      Go
      0 · 0 · 0 · 0 · Updated Mar 11, 2026
    • vllm-pcp

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      16k · 0 · 0 · 3 · Updated Feb 27, 2026
    • SpecForge

      Public
      Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
      Python
      MIT License
      218 · 0 · 0 · 0 · Updated Jan 23, 2026
    • mori

      Public
      Modular RDMA Interface
      C++
      MIT License
      37 · 0 · 0 · 2 · Updated Jan 22, 2026
    • sglang

      Public
      SGLang is a fast serving framework for large language models and vision language models.
      Python
      Apache License 2.0
      5.6k · 0 · 0 · 2 · Updated Jan 14, 2026
    • repo for MI355X benchmark (TensorWave)
      Shell
      Apache License 2.0
      155 · 0 · 0 · 0 · Updated Nov 26, 2025
    • 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference an…
      Python
      Apache License 2.0
      33k · 0 · 0 · 0 · Updated Nov 26, 2025
    • Fast and memory-efficient exact attention
      Python
      BSD 3-Clause "New" or "Revised" License
      2.7k · 0 · 0 · 0 · Updated Nov 17, 2025
    • lws

      Public
      LeaderWorkerSet: An API for deploying a group of pods as a unit of replication
      Go
      Apache License 2.0
      146 · 0 · 0 · 0 · Updated Nov 3, 2025
    • 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
      Python
      Apache License 2.0
      33k · 2 · 0 · 0 · Updated Oct 31, 2025
    • kgateway

      Public
      The Cloud-Native API Gateway and AI Gateway
      Go
      Apache License 2.0
      744 · 0 · 0 · 0 · Updated Oct 20, 2025
    • gpt-oss

      Public
      Inference gpt-oss in one file of pure C
      Python
      2.5k · 1 · 0 · 0 · Updated Oct 12, 2025
    • k0s

      Public
      k0s - The Zero Friction Kubernetes
      Go
      Other
      498 · 0 · 0 · 0 · Updated Oct 2, 2025
    • Kubernetes Operator for OpenTelemetry Collector
      Go
      Apache License 2.0
      621 · 0 · 0 · 0 · Updated Sep 27, 2025
    • Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
      Python
      MIT License
      51 · 0 · 0 · 0 · Updated Sep 25, 2025