# KernelBench

A full-stack web application for benchmarking and comparing AMD GPU kernel performance across multiple providers (MIOpen, Composable Kernel, Triton, hipBLASLt, and custom kernels).
## Features

### Hardware Detection

- Automatic AMD GPU detection via `rocm-smi`
- Supports MI300, MI325, MI350, MI355, MI250, and Radeon Pro series
- Real-time GPU monitoring (temperature, utilization, power)
- Multi-GPU support with device selection
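Detection shells out to `rocm-smi` and parses its text output. A minimal sketch of the approach; the function names and the sample output format are illustrative (based on the `GPU[N]` line prefix that `rocm-smi --showid` prints), not the actual code in `hardware_detector.py`:

```python
import re
import subprocess

def list_gpu_indices(smi_output: str) -> list[int]:
    """Parse device indices from `rocm-smi --showid` text output.

    rocm-smi prefixes each device line with `GPU[N]`; collect the
    distinct N values in first-seen order.
    """
    seen: list[int] = []
    for match in re.finditer(r"GPU\[(\d+)\]", smi_output):
        idx = int(match.group(1))
        if idx not in seen:
            seen.append(idx)
    return seen

def detect_gpus(smi_path: str = "/opt/rocm/bin/rocm-smi") -> list[int]:
    """Run rocm-smi and return detected GPU indices ([] if unavailable)."""
    try:
        out = subprocess.run([smi_path, "--showid"], capture_output=True,
                             text=True, timeout=10).stdout
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return []
    return list_gpu_indices(out)
```

Returning an empty list when `rocm-smi` is missing lets the rest of the stack start on machines without a GPU.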

### Benchmark Configuration

- Kernel categories: GEMM, Pointwise, SDPA, Conv, Norm
- Customizable shape parameters
- Data type support: fp16, fp32, bf16, int8
- Configurable warmup and timing runs
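A benchmark configuration combines all of these choices. An illustrative sketch; the field names are assumptions, not the actual Pydantic schema in `app/models/schemas.py`:

```python
from dataclasses import dataclass

@dataclass
class BenchmarkConfig:
    category: str          # "gemm" | "pointwise" | "sdpa" | "conv" | "norm"
    shape: dict            # kernel-specific, e.g. {"M": 4096, "N": 4096, "K": 4096}
    dtype: str = "fp16"    # fp16 | fp32 | bf16 | int8
    warmup_runs: int = 3
    timing_runs: int = 10

# A typical large-GEMM configuration
cfg = BenchmarkConfig(category="gemm",
                      shape={"M": 4096, "N": 4096, "K": 4096})
```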

### Provider Support

- MIOpen, Composable Kernel, Triton, hipBLASLt
- Provider version management
- Custom kernel upload system (Python, HIP, ASM)

### Results & Visualization

- Latency and throughput metrics
- Power consumption and temperature tracking
- Side-by-side performance comparison
- Interactive charts (bar charts for latency/throughput)
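Throughput in the results view can be derived from measured latency and problem size; for a GEMM the conventional count is 2·M·N·K floating-point operations (one multiply and one add per inner-product element). A sketch of the conversion (the function name is illustrative):

```python
def gemm_tflops(m: int, n: int, k: int, latency_ms: float) -> float:
    """Effective throughput of an M x N x K GEMM in TFLOPS:
    2*M*N*K FLOPs divided by the measured latency."""
    flops = 2 * m * n * k
    return flops / (latency_ms * 1e-3) / 1e12

# A 1024^3 GEMM finishing in 1 ms sustains ~2.15 TFLOPS
print(round(gemm_tflops(1024, 1024, 1024, 1.0), 2))  # → 2.15
```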

### RESTful API

- Hardware management endpoints
- Provider and version CRUD
- Benchmark execution and results
- Custom kernel management

### Modern UI

- React + TypeScript frontend
- Real-time GPU status updates
- Responsive design

### Planned (Phase 2)

- GitHub sync service for provider updates
- Real kernel implementations (benchmarks currently return mock data)
- Custom kernel compilation service
- Regression detection and alerts
- Model-specific presets (LLaMA, GPT, Stable Diffusion)
- Advanced visualizations (roofline plots, heatmaps)

## Tech Stack

- Backend: Python 3.10+, FastAPI, SQLAlchemy, Celery
- Frontend: React 18, TypeScript, Recharts
- Database: PostgreSQL (production), SQLite (development)
- Containerization: Docker, docker-compose
- GPU Integration: ROCm, HIP, rocm-smi

## Prerequisites

### Local Development

- Python 3.10+
- Node.js 18+
- ROCm 5.0+ (for AMD GPU support)
- PostgreSQL (optional, SQLite used by default)
- Redis (for Celery, optional in Phase 1)

### Docker Deployment

- Docker 20.10+
- docker-compose 2.0+
- ROCm installed on host (for GPU access)
## Quick Start

### Option 1: Docker (Recommended)

```bash
# Clone the repository and enter it
git clone <repository-url>
cd KernelBench

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Access the application
# Frontend:    http://localhost:3000
# Backend API: http://localhost:8000
# API Docs:    http://localhost:8000/docs
```

### Option 2: Local Development

Backend:

```bash
# Navigate to backend
cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Initialize database
python -c "from app.models.database import init_db; init_db()"

# Run the server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Frontend (in a new terminal):

```bash
# Navigate to frontend
cd frontend

# Install dependencies
npm install

# Start development server
npm start
```

The application will be available at:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
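The `init_db` one-liner in the backend setup creates the schema. A conceptual sketch of what `app/models/database.py` might contain; only `init_db` appears in the setup above, and the table and columns here are illustrative:

```python
from sqlalchemy import create_engine, Column, Integer, String, Float
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class BenchmarkResult(Base):
    # Illustrative table; the real models cover providers,
    # hardware profiles, configs, and results.
    __tablename__ = "benchmark_results"
    id = Column(Integer, primary_key=True)
    provider = Column(String, nullable=False)
    latency_ms = Column(Float)

def init_db(url: str = "sqlite:///./benchmark.db"):
    """Create all tables (idempotent) and return the engine."""
    engine = create_engine(url)
    Base.metadata.create_all(engine)
    return engine
```

`create_all` is a no-op for tables that already exist, which is why re-running the setup command is safe.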
## Project Structure

```text
KernelBench/
├── backend/
│   ├── app/
│   │   ├── api/                       # API endpoints
│   │   │   ├── hardware.py            # GPU hardware management
│   │   │   ├── providers.py           # Provider management
│   │   │   ├── benchmarks.py          # Benchmark execution
│   │   │   └── custom_kernels.py      # Custom kernel upload
│   │   ├── models/
│   │   │   ├── database.py            # SQLAlchemy models
│   │   │   └── schemas.py             # Pydantic schemas
│   │   ├── services/
│   │   │   ├── hardware_detector.py   # AMD GPU detection
│   │   │   ├── benchmark_runner.py    # Benchmark execution
│   │   │   ├── github_sync.py         # GitHub integration (Phase 2)
│   │   │   └── build_service.py       # Kernel compilation (Phase 2)
│   │   ├── config.py                  # Application configuration
│   │   └── main.py                    # FastAPI application
│   ├── artifacts/                     # Build artifacts and custom kernels
│   ├── requirements.txt
│   └── Dockerfile
│
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── HardwareSelector.tsx   # GPU selection
│   │   │   ├── ConfigurationForm.tsx  # Benchmark config
│   │   │   └── ResultsView.tsx        # Results display
│   │   ├── services/
│   │   │   └── api.ts                 # API client
│   │   ├── App.tsx                    # Main application
│   │   └── index.tsx
│   ├── package.json
│   └── Dockerfile
│
├── config/
│   ├── providers.yaml                 # Provider configuration
│   ├── hardware.yaml                  # GPU specifications
│   └── presets.yaml                   # Model presets
│
├── docker-compose.yml
└── README.md
```
## Configuration

Create a `.env` file in the `backend` directory:

```env
# Database (optional; defaults to SQLite)
DATABASE_URL=postgresql://user:password@localhost:5432/benchmark_db

# GitHub (for Phase 2)
GITHUB_TOKEN=your_github_token

# Celery (for Phase 2)
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0

# ROCm paths (auto-detected for a standard installation)
ROCM_SMI_PATH=/opt/rocm/bin/rocm-smi
HIPCC_PATH=/opt/rocm/bin/hipcc

# Benchmarking
DEFAULT_WARMUP_RUNS=3
DEFAULT_TIMING_RUNS=10
MAX_GPU_TEMPERATURE=85.0
```

## Usage

- Open http://localhost:3000
- Select your target GPU from the dropdown
- Configure benchmark parameters
- Select providers to compare
- Click "Run Benchmark"
- View results and charts
## API Examples

Refresh the hardware list:

```bash
curl -X POST http://localhost:8000/api/hardware/refresh
```

Create a provider:

```bash
curl -X POST http://localhost:8000/api/providers/create \
  -H "Content-Type: application/json" \
  -d '{
    "name": "MIOpen",
    "repo_url": "https://github.com/ROCm/MIOpen",
    "build_system": "cmake",
    "active": true
  }'
```

Run a benchmark:

```bash
curl -X POST http://localhost:8000/api/benchmarks/run \
  -H "Content-Type: application/json" \
  -d '{
    "config_id": 1,
    "provider_version_ids": [1, 2],
    "hardware_profile_id": 1,
    "warmup_runs": 3,
    "timing_runs": 10
  }'
```

Interactive API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
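The benchmark endpoint can also be driven from Python. A minimal standard-library sketch; the endpoint and field names follow the curl example above, while the helper function itself is illustrative:

```python
import json
import urllib.request

def run_benchmark_request(config_id: int, provider_version_ids: list[int],
                          hardware_profile_id: int, warmup_runs: int = 3,
                          timing_runs: int = 10,
                          base_url: str = "http://localhost:8000"):
    """Build the POST /api/benchmarks/run request; pass the returned
    object to urllib.request.urlopen() to execute it."""
    body = json.dumps({
        "config_id": config_id,
        "provider_version_ids": provider_version_ids,
        "hardware_profile_id": hardware_profile_id,
        "warmup_runs": warmup_runs,
        "timing_runs": timing_runs,
    }).encode()
    return urllib.request.Request(
        f"{base_url}/api/benchmarks/run", data=body,
        headers={"Content-Type": "application/json"}, method="POST")
```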

## Current Status (Phase 1)

- Benchmarks return mock data for testing the full pipeline
- Real kernel implementations will be added in Phase 2
- Custom kernel compilation is not yet implemented
- GitHub sync service is not yet active
- Hardware detection works with real AMD GPUs via rocm-smi
- GPU selection and monitoring are fully functional
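Until real kernel dispatch lands in Phase 2, the benchmark runner fabricates plausible numbers. A sketch of what a mock result record might look like; the field names and value ranges are illustrative, not the actual runner's output:

```python
import random

def mock_benchmark_result(provider: str, seed: int = 0) -> dict:
    """Generate a plausible-looking result record, deterministic per seed,
    so the frontend pipeline can be exercised end to end."""
    rng = random.Random(seed)
    latency_ms = rng.uniform(0.5, 5.0)
    return {
        "provider": provider,
        "latency_ms": round(latency_ms, 3),
        "throughput_tflops": round(100.0 / latency_ms, 2),  # toy inverse relation
        "power_w": round(rng.uniform(300.0, 700.0), 1),
        "temperature_c": round(rng.uniform(45.0, 80.0), 1),
    }
```

Seeding makes repeated runs reproducible, which keeps chart comparisons stable while the UI is under development.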
## Troubleshooting

GPU not detected:

```bash
# Check ROCm installation
rocm-smi --version

# List GPUs
rocm-smi --showid

# Check whether the application can access the GPU
docker-compose exec backend rocm-smi
```

Database issues:

```bash
# Reset database (Docker)
docker-compose down -v
docker-compose up -d

# Or, for a local setup
rm backend/benchmark.db
python -c "from app.models.database import init_db; init_db()"
```

## Roadmap

### Phase 1: Core Platform (Complete)

- ✅ Backend API with FastAPI
- ✅ Database models and schemas
- ✅ Hardware detection (real AMD GPU scanning)
- ✅ Mock benchmark runner with GPU selection
- ✅ React + TypeScript frontend
- ✅ Basic visualization

### Phase 2 (Planned)

- Kernel compilation service
- Syntax validation and smoke testing
- Integration with benchmark runner
- Automated provider syncing
- Version tracking and build service
- Change detection
- Real kernel dispatchers for all providers
- Baseline comparison and regression detection
- Multi-GPU parallel benchmarking
- Roofline plots and heatmaps
- Version timeline and diff view
- Performance trends and alerts

## License

MIT License (or your choice)