Pinned

llama.cpp-1-bit-turbo (public, forked from ggml-org/llama.cpp)
  HIP/ROCm fork optimized for AMD RDNA2 (gfx1030), adding PrismML Q1_0_G128 1-bit quant support, RotorQuant, TurboQuant, EAGLE3 and P-EAGLE speculative decoding, and full Wave32 kernel optimizations.
  C++ · 5

sglang-1-bit-turbo (public, forked from sgl-project/sglang)
  SGLang 1-Bit Turbo: AMD ROCm (gfx1030) inference fork with RotorQuant/TurboQuant KV compression, PHANTOM-X zero-copy draft speculation, EAGLE3 speculative decoding, 12 RDNA2 crash fixes, and Prism…
  Python · 4

vllm-1-bit-turbo (public, forked from mitkox/vllm-turboquant)
  vLLM 0.18.1rc1 fork optimized for HIP/ROCm, adding support for PrismML Bonsai Q1_0 and Q1_0_G128 1-bit GPU inference, TurboQuant TQ3_0 KV cache, and the AMD gfx1030/RDNA2 architecture.
  Python

gfxGRAPH (public)
  CUDA Graph → HIP Graph translation layer for AMD gfx1030 (RDNA2). Bridges all four CUDA Graph parity gaps on ROCm.
  Python

SpecForge (public, forked from sgl-project/SpecForge)
  Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
  Python

ATLAS (public, forked from itigges22/ATLAS)
  Adaptive Test-time Learning and Autonomous Specialization
  Python