
AMD GPU Kernel Benchmark Dashboard

A full-stack web application for benchmarking and comparing AMD GPU kernel performance across multiple providers (MIOpen, Composable Kernel, Triton, hipBLASLt, and custom kernels).

Features (Phase 1 - MVP)

✅ Implemented

  • Hardware Detection: Automatic AMD GPU detection using rocm-smi

    • Supports MI300, MI325, MI350, MI355, MI250, Radeon Pro series
    • Real-time GPU monitoring (temperature, utilization, power)
    • Multi-GPU support with device selection
  • Benchmark Configuration

    • Kernel categories: GEMM, Pointwise, SDPA, Conv, Norm
    • Customizable shape parameters
    • Data type support: fp16, fp32, bf16, int8
    • Configurable warmup and timing runs
  • Provider Support

    • MIOpen, Composable Kernel, Triton, hipBLASLt
    • Provider version management
    • Custom kernel upload system (Python, HIP, ASM)
  • Results & Visualization

    • Latency and throughput metrics
    • Power consumption and temperature tracking
    • Side-by-side performance comparison
    • Interactive charts (bar charts for latency/throughput)
  • RESTful API

    • Hardware management endpoints
    • Provider and version CRUD
    • Benchmark execution and results
    • Custom kernel management
  • Modern UI

    • React + TypeScript frontend
    • Real-time GPU status updates
    • Responsive design
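The hardware-detection feature above can be sketched roughly as follows. This is an illustrative outline only, not the project's actual `hardware_detector.py`; the `rocm-smi` flags and CSV field names are assumptions and may differ from what the service really uses.

```python
import subprocess

def parse_smi_csv(text: str) -> list[dict]:
    """Parse CSV-style rocm-smi output into one dict per GPU row."""
    lines = [ln for ln in text.strip().splitlines() if ln]
    header = lines[0].split(",")
    return [dict(zip(header, row.split(","))) for row in lines[1:]]

def detect_gpus() -> list[dict]:
    """Query rocm-smi for temperature, utilization, and power per device."""
    out = subprocess.run(
        ["rocm-smi", "--showtemp", "--showuse", "--showpower", "--csv"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_smi_csv(out)
```

Polling `detect_gpus()` on an interval is enough to drive the real-time GPU status updates shown in the UI.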

🚧 Phase 2+ (Planned)

  • GitHub sync service for provider updates
  • Real kernel implementations (currently using mock data)
  • Custom kernel compilation service
  • Regression detection and alerts
  • Model-specific presets (LLaMA, GPT, Stable Diffusion)
  • Advanced visualizations (roofline plots, heatmaps)

Tech Stack

  • Backend: Python 3.10+, FastAPI, SQLAlchemy, Celery
  • Frontend: React 18, TypeScript, Recharts
  • Database: PostgreSQL (production), SQLite (development)
  • Containerization: Docker, docker-compose
  • GPU Integration: ROCm, HIP, rocm-smi

Prerequisites

For Development (without Docker)

  • Python 3.10+
  • Node.js 18+
  • ROCm 5.0+ (for AMD GPU support)
  • PostgreSQL (optional, SQLite used by default)
  • Redis (for Celery, optional in Phase 1)

For Docker Deployment

  • Docker 20.10+
  • docker-compose 2.0+
  • ROCm installed on host (for GPU access)

Quick Start

Option 1: Docker (Recommended)

# Clone the repository
git clone https://github.com/sa-faizal/KernelBench.git
cd KernelBench

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# API Docs: http://localhost:8000/docs

Option 2: Local Development

Backend Setup

# Navigate to backend
cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Initialize database
python -c "from app.models.database import init_db; init_db()"

# Run the server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Frontend Setup

# Navigate to frontend (in a new terminal)
cd frontend

# Install dependencies
npm install

# Start development server
npm start

The application will be available at http://localhost:3000 (frontend) and http://localhost:8000 (backend API).
Project Structure

KernelBench/
├── backend/
│   ├── app/
│   │   ├── api/              # API endpoints
│   │   │   ├── hardware.py   # GPU hardware management
│   │   │   ├── providers.py  # Provider management
│   │   │   ├── benchmarks.py # Benchmark execution
│   │   │   └── custom_kernels.py  # Custom kernel upload
│   │   ├── models/
│   │   │   ├── database.py   # SQLAlchemy models
│   │   │   └── schemas.py    # Pydantic schemas
│   │   ├── services/
│   │   │   ├── hardware_detector.py  # AMD GPU detection
│   │   │   ├── benchmark_runner.py   # Benchmark execution
│   │   │   ├── github_sync.py        # GitHub integration (Phase 2)
│   │   │   └── build_service.py      # Kernel compilation (Phase 2)
│   │   ├── config.py         # Application configuration
│   │   └── main.py           # FastAPI application
│   ├── artifacts/            # Build artifacts and custom kernels
│   ├── requirements.txt
│   └── Dockerfile
│
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── HardwareSelector.tsx    # GPU selection
│   │   │   ├── ConfigurationForm.tsx   # Benchmark config
│   │   │   └── ResultsView.tsx         # Results display
│   │   ├── services/
│   │   │   └── api.ts        # API client
│   │   ├── App.tsx           # Main application
│   │   └── index.tsx
│   ├── package.json
│   └── Dockerfile
│
├── config/
│   ├── providers.yaml        # Provider configuration
│   ├── hardware.yaml         # GPU specifications
│   └── presets.yaml          # Model presets
│
├── docker-compose.yml
└── README.md

Configuration

Environment Variables

Create a .env file in the backend directory:

# Database (optional, defaults to SQLite)
DATABASE_URL=postgresql://user:password@localhost:5432/benchmark_db

# GitHub (for Phase 2)
GITHUB_TOKEN=your_github_token

# Celery (for Phase 2)
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0

# ROCm paths (auto-detected if standard installation)
ROCM_SMI_PATH=/opt/rocm/bin/rocm-smi
HIPCC_PATH=/opt/rocm/bin/hipcc

# Benchmarking
DEFAULT_WARMUP_RUNS=3
DEFAULT_TIMING_RUNS=10
MAX_GPU_TEMPERATURE=85.0
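For reference, here is a minimal sketch of how the backend might read these variables. The real `app/config.py` may use a different mechanism (e.g. pydantic settings); the attribute names below simply mirror the variables listed above, and the defaults match the documented ones.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    # Falls back to SQLite when DATABASE_URL is not set, as documented above.
    database_url: str = os.getenv("DATABASE_URL", "sqlite:///./benchmark.db")
    rocm_smi_path: str = os.getenv("ROCM_SMI_PATH", "/opt/rocm/bin/rocm-smi")
    default_warmup_runs: int = int(os.getenv("DEFAULT_WARMUP_RUNS", "3"))
    default_timing_runs: int = int(os.getenv("DEFAULT_TIMING_RUNS", "10"))
    max_gpu_temperature: float = float(os.getenv("MAX_GPU_TEMPERATURE", "85.0"))

settings = Settings()
```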

Usage

Via Web UI

  1. Open http://localhost:3000
  2. Select your target GPU from the dropdown
  3. Configure benchmark parameters
  4. Select providers to compare
  5. Click "Run Benchmark"
  6. View results and charts

Via API

1. GPU Detection

curl -X POST http://localhost:8000/api/hardware/refresh

2. Create a Provider

curl -X POST http://localhost:8000/api/providers/create \
  -H "Content-Type: application/json" \
  -d '{
    "name": "MIOpen",
    "repo_url": "https://github.com/ROCm/MIOpen",
    "build_system": "cmake",
    "active": true
  }'

3. Run a Benchmark

curl -X POST http://localhost:8000/api/benchmarks/run \
  -H "Content-Type: application/json" \
  -d '{
    "config_id": 1,
    "provider_version_ids": [1, 2],
    "hardware_profile_id": 1,
    "warmup_runs": 3,
    "timing_runs": 10
  }'
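The same benchmark run can be triggered from Python with only the standard library. This mirrors the curl example above; the host, port, and payload fields are taken from that example, and the request builder is a hypothetical helper, not part of the project's API client.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000"  # adjust if the backend runs elsewhere

def build_request(config_id, provider_version_ids, hardware_profile_id,
                  warmup_runs=3, timing_runs=10):
    """Build the POST request for /api/benchmarks/run."""
    payload = json.dumps({
        "config_id": config_id,
        "provider_version_ids": provider_version_ids,
        "hardware_profile_id": hardware_profile_id,
        "warmup_runs": warmup_runs,
        "timing_runs": timing_runs,
    }).encode()
    return urllib.request.Request(
        f"{API_BASE}/api/benchmarks/run", data=payload,
        headers={"Content-Type": "application/json"}, method="POST",
    )

def run_benchmark(**kwargs):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(**kwargs)) as resp:
        return json.load(resp)
```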

API Documentation

Interactive API documentation is available at http://localhost:8000/docs (Swagger UI).

Current Limitations (Phase 1)

⚠️ Important: This is Phase 1 (MVP/Core Infrastructure)

  • Benchmarks return mock data to exercise the full pipeline
  • Real kernel implementations will be added in Phase 2
  • Custom kernel compilation is not yet implemented
  • The GitHub sync service is not yet active
  • Hardware detection and GPU monitoring, however, are fully functional with real AMD GPUs via rocm-smi

Troubleshooting

GPU Not Detected

# Check ROCm installation
rocm-smi --version

# List GPUs
rocm-smi --showid

# Check if application can access GPU
docker-compose exec backend rocm-smi

Database Issues

# Reset database
docker-compose down -v
docker-compose up -d

# Or for local setup
rm backend/benchmark.db
python -c "from app.models.database import init_db; init_db()"

Roadmap

✅ Phase 1: Core Infrastructure (Current)

  • ✅ Backend API with FastAPI
  • ✅ Database models and schemas
  • ✅ Hardware detection (real AMD GPU scanning)
  • ✅ Mock benchmark runner with GPU selection
  • ✅ React + TypeScript frontend
  • ✅ Basic visualization

🚧 Phase 2: Custom Kernel Support

  • Kernel compilation service
  • Syntax validation and smoke testing
  • Integration with benchmark runner

🚧 Phase 3: GitHub Integration

  • Automated provider syncing
  • Version tracking and build service
  • Change detection

🚧 Phase 4: Advanced Benchmarking

  • Real kernel dispatchers for all providers
  • Baseline comparison and regression detection
  • Multi-GPU parallel benchmarking

🚧 Phase 5: Visualization & UX

  • Roofline plots and heatmaps
  • Version timeline and diff view
  • Performance trends and alerts

License

MIT License
