NVIDIA Corporation

612 repositories

cccl
Public
CUDA Core Compute Libraries
cpp hpc gpu modern-cpp parallel-computing cuda nvidia gpu-acceleration cuda-kernels gpu-computing
C++
•
Other
283 • 2k • 1.1k • 176 • Updated Oct 17, 2025
TensorRT-Model-Optimizer
Public
A unified library of state-of-the-art model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM or TensorRT to optimize inference speed.
Python
•
Apache License 2.0
177 • 1.5k • 121 • 36 • Updated Oct 17, 2025
Fuser
Public
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
C++
•
Other
69 • 357 • 199 • 190 • Updated Oct 17, 2025
doca-platform
Public
DOCA Platform manages provisioning and service orchestration for BlueField DPUs
Go
•
Apache License 2.0
11 • 53 • 0 • 0 • Updated Oct 17, 2025
TensorRT-LLM
Public
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and support state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
cuda pytorch moe blackwell llm-serving
C++
•
Apache License 2.0
1.8k • 12k • 749 • 405 • Updated Oct 17, 2025
gpu-operator
Public
NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
kubernetes gpu cuda nvidia
Go
•
Apache License 2.0
394 • 2.3k • 394 • 68 • Updated Oct 17, 2025
nvidia-container-toolkit
Public
Build and run containers leveraging NVIDIA GPUs
Go
•
Apache License 2.0
422 • 3.7k • 408 • 22 • Updated Oct 17, 2025
torch-harmonics
Public
Differentiable signal processing on the sphere for PyTorch
machine-learning signal-processing sphere pytorch
Jupyter Notebook
•
Other
53 • 542 • 4 • 2 • Updated Oct 17, 2025
warp
Public
A Python framework for accelerated simulation, data generation and spatial computing.
python gpu cuda nvidia gpu-acceleration differentiable-programming nvidia-warp
Python
•
Apache License 2.0
371 • 5.7k • 255 • 11 • Updated Oct 17, 2025
NeMo-Agent-Toolkit
Public
The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.
Python
•
Apache License 2.0
389 • 1.4k • 59 • 24 • Updated Oct 17, 2025
Megatron-LM
Public
Ongoing research training transformer models at scale
transformers model-para large-language-models
Python
•
Other
3.2k • 14k • 295 • 101 • Updated Oct 17, 2025
TransformerEngine
Public
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
python machine-learning deep-learning gpu cuda pytorch jax fp8
Python
•
Apache License 2.0
522 • 2.8k • 219 • 92 • Updated Oct 17, 2025
bionemo-framework
Public
BioNeMo Framework: For building and adapting AI models in drug discovery at scale
machine-learning gpu pytorch drug-discovery
Jupyter Notebook
88 • 542 • 55 • 87 • Updated Oct 17, 2025
nv-ingest
Public
NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts and images that you can use in downstream generative applications.
Python
•
Apache License 2.0
269 • 2.8k • 94 • 33 • Updated Oct 17, 2025
numba-cuda
Public
The CUDA target for Numba
Python
•
BSD 2-Clause "Simplified" License
40 • 199 • 90 • 21 • Updated Oct 17, 2025
TorchFort
Public
An Online Deep Learning Interface for HPC programs on NVIDIA GPUs
deep-learning fortran pytorch libtorch
C++
•
Apache License 2.0
27 • 173 • 2 • 0 • Updated Oct 17, 2025
accelerated-computing-hub
Public
NVIDIA-curated collection of educational resources related to general-purpose GPU programming.
Jupyter Notebook
•
Other
125 • 750 • 12 • 4 • Updated Oct 17, 2025
k8s-operator-libs
Public
A collection of useful Go libraries to ease the development of NVIDIA Operators for GPU/NIC management.
Go
•
Apache License 2.0
21 • 25 • 2 • 7 • Updated Oct 17, 2025
aistore
Public
AIStore: scalable storage for AI applications
kubernetes high-performance distributed-storage high-availability object-storage multi-cloud batch-jobs s3-compatible multipart-upload ml-training
Go
•
MIT License
219 • 1.6k • 0 • 0 • Updated Oct 17, 2025
cuda-python
Public
CUDA Python: Performance meets Productivity
Python
•
Other
214 • 3k • 176 • 15 • Updated Oct 17, 2025
NeMo-Agent-Toolkit-UI
Public
The NVIDIA AIQToolkit UI streamlines interacting with AIQToolkit workflows in an easy-to-use web application.
TypeScript
•
Other
37 • 50 • 4 • 6 • Updated Oct 17, 2025
nvkind
Public
Go
•
Apache License 2.0
22 • 174 • 9 • 11 • Updated Oct 17, 2025
cuopt
Public
GPU-accelerated decision optimization
gpu optimization cuda linear-programming
Cuda
•
Apache License 2.0
83 • 472 • 62 • 18 • Updated Oct 17, 2025
spark-rapids
Public
Spark RAPIDS plugin - accelerate Apache Spark with GPUs
big-data spark gpu rapids
Scala
•
Apache License 2.0
258 • 939 • 1.7k • 20 • Updated Oct 17, 2025
dgx-spark-playbooks
Public
Collection of step-by-step playbooks for setting up AI/ML workloads on NVIDIA DGX Spark devices with Blackwell architecture.
TypeScript
•
Apache License 2.0
6 • 26 • 1 • 1 • Updated Oct 17, 2025
physicsnemo
Public
Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
machine-learning deep-learning physics pytorch nvidia-gpu
Python
•
Apache License 2.0
453 • 1.9k • 43 • 28 • Updated Oct 17, 2025
spark-rapids-jni
Public
RAPIDS Accelerator JNI For Apache Spark
Cuda
•
Apache License 2.0
74 • 51 • 77 • 7 • Updated Oct 17, 2025
nvidia-resiliency-ext
Public
NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves effective training time by minimizing downtime due to failures and interruptions.
Python
•
Other
34 • 226 • 1 • 10 • Updated Oct 17, 2025
holodeck
Public
Holodeck is a project to create test environments optimised for GPU projects.
Go
•
Apache License 2.0
8 • 19 • 3 • 8 • Updated Oct 17, 2025
cuda-quantum
Public
C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
python cpp quantum quantum-computing hacktoberfest quantum-programming-language quantum-algorithms quantum-machine-learning unitaryhack
C++
•
Other
294 • 821 • 409 • 90 • Updated Oct 17, 2025