# KernelBench

A full-stack web application for benchmarking and comparing AMD GPU kernel performance across multiple providers (MIOpen, Composable Kernel, Triton, hipBLASLt, and custom kernels).
## Features

### Hardware Detection

- Automatic AMD GPU detection via `rocm-smi`
- Supports MI300, MI325, MI350, MI355, MI250, and Radeon Pro series
- Real-time GPU monitoring (temperature, utilization, power)
- Multi-GPU support with device selection
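Detection shells out to `rocm-smi` and parses its text output. A minimal sketch of the approach; the function names and the sample output format are illustrative (based on the `GPU[N]` line prefix that `rocm-smi --showid` prints), not the actual code in `hardware_detector.py`:

```python
import re
import subprocess

def list_gpu_indices(smi_output: str) -> list[int]:
    """Parse device indices from `rocm-smi --showid` text output.

    rocm-smi prefixes each device line with `GPU[N]`; collect the
    distinct N values in first-seen order.
    """
    seen: list[int] = []
    for match in re.finditer(r"GPU\[(\d+)\]", smi_output):
        idx = int(match.group(1))
        if idx not in seen:
            seen.append(idx)
    return seen

def detect_gpus(smi_path: str = "/opt/rocm/bin/rocm-smi") -> list[int]:
    """Run rocm-smi and return detected GPU indices ([] if unavailable)."""
    try:
        out = subprocess.run([smi_path, "--showid"], capture_output=True,
                             text=True, timeout=10).stdout
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return []
    return list_gpu_indices(out)
```

Returning an empty list when `rocm-smi` is missing lets the rest of the stack start on machines without a GPU.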

### Benchmark Configuration

- Kernel categories: GEMM, Pointwise, SDPA, Conv, Norm
- Customizable shape parameters
- Data type support: fp16, fp32, bf16, int8
- Configurable warmup and timing runs
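A benchmark configuration combines all of these choices. An illustrative sketch; the field names are assumptions, not the actual Pydantic schema in `app/models/schemas.py`:

```python
from dataclasses import dataclass

@dataclass
class BenchmarkConfig:
    category: str          # "gemm" | "pointwise" | "sdpa" | "conv" | "norm"
    shape: dict            # kernel-specific, e.g. {"M": 4096, "N": 4096, "K": 4096}
    dtype: str = "fp16"    # fp16 | fp32 | bf16 | int8
    warmup_runs: int = 3
    timing_runs: int = 10

# A typical large-GEMM configuration
cfg = BenchmarkConfig(category="gemm",
                      shape={"M": 4096, "N": 4096, "K": 4096})
```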

### Provider Support

- MIOpen, Composable Kernel, Triton, hipBLASLt
- Provider version management
- Custom kernel upload system (Python, HIP, ASM)

### Results & Visualization

- Latency and throughput metrics
- Power consumption and temperature tracking
- Side-by-side performance comparison
- Interactive charts (bar charts for latency/throughput)
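Throughput in the results view can be derived from measured latency and problem size; for a GEMM the conventional count is 2·M·N·K floating-point operations (one multiply and one add per inner-product element). A sketch of the conversion (the function name is illustrative):

```python
def gemm_tflops(m: int, n: int, k: int, latency_ms: float) -> float:
    """Effective throughput of an M x N x K GEMM in TFLOPS:
    2*M*N*K FLOPs divided by the measured latency."""
    flops = 2 * m * n * k
    return flops / (latency_ms * 1e-3) / 1e12

# A 1024^3 GEMM finishing in 1 ms sustains ~2.15 TFLOPS
print(round(gemm_tflops(1024, 1024, 1024, 1.0), 2))  # → 2.15
```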

### RESTful API

- Hardware management endpoints
- Provider and version CRUD
- Benchmark execution and results
- Custom kernel management

### Modern UI

- React + TypeScript frontend
- Real-time GPU status updates
- Responsive design

### Planned (Phase 2)

- GitHub sync service for provider updates
- Real kernel implementations (benchmarks currently return mock data)
- Custom kernel compilation service
- Regression detection and alerts
- Model-specific presets (LLaMA, GPT, Stable Diffusion)
- Advanced visualizations (roofline plots, heatmaps)

## Tech Stack

- Backend: Python 3.10+, FastAPI, SQLAlchemy, Celery
- Frontend: React 18, TypeScript, Recharts
- Database: PostgreSQL (production), SQLite (development)
- Containerization: Docker, docker-compose
- GPU Integration: ROCm, HIP, rocm-smi

## Prerequisites

### Local Development

- Python 3.10+
- Node.js 18+
- ROCm 5.0+ (for AMD GPU support)
- PostgreSQL (optional, SQLite used by default)
- Redis (for Celery, optional in Phase 1)

### Docker Deployment

- Docker 20.10+
- docker-compose 2.0+
- ROCm installed on host (for GPU access)
## Quick Start

### Option 1: Docker (Recommended)

```bash
# Clone the repository and enter it
git clone <repository-url>
cd KernelBench

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f

# Access the application
# Frontend:    http://localhost:3000
# Backend API: http://localhost:8000
# API Docs:    http://localhost:8000/docs
```

### Option 2: Local Development

Backend:

```bash
# Navigate to backend
cd backend

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Initialize database
python -c "from app.models.database import init_db; init_db()"

# Run the server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Frontend (in a new terminal):

```bash
# Navigate to frontend
cd frontend

# Install dependencies
npm install

# Start development server
npm start
```

The application will be available at:
- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
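The `init_db` one-liner in the backend setup creates the schema. A conceptual sketch of what `app/models/database.py` might contain; only `init_db` appears in the setup above, and the table and columns here are illustrative:

```python
from sqlalchemy import create_engine, Column, Integer, String, Float
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class BenchmarkResult(Base):
    # Illustrative table; the real models cover providers,
    # hardware profiles, configs, and results.
    __tablename__ = "benchmark_results"
    id = Column(Integer, primary_key=True)
    provider = Column(String, nullable=False)
    latency_ms = Column(Float)

def init_db(url: str = "sqlite:///./benchmark.db"):
    """Create all tables (idempotent) and return the engine."""
    engine = create_engine(url)
    Base.metadata.create_all(engine)
    return engine
```

`create_all` is a no-op for tables that already exist, which is why re-running the setup command is safe.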
## Project Structure

```text
KernelBench/
├── backend/
│   ├── app/
│   │   ├── api/                       # API endpoints
│   │   │   ├── hardware.py            # GPU hardware management
│   │   │   ├── providers.py           # Provider management
│   │   │   ├── benchmarks.py          # Benchmark execution
│   │   │   └── custom_kernels.py      # Custom kernel upload
│   │   ├── models/
│   │   │   ├── database.py            # SQLAlchemy models
│   │   │   └── schemas.py             # Pydantic schemas
│   │   ├── services/
│   │   │   ├── hardware_detector.py   # AMD GPU detection
│   │   │   ├── benchmark_runner.py    # Benchmark execution
│   │   │   ├── github_sync.py         # GitHub integration (Phase 2)
│   │   │   └── build_service.py       # Kernel compilation (Phase 2)
│   │   ├── config.py                  # Application configuration
│   │   └── main.py                    # FastAPI application
│   ├── artifacts/                     # Build artifacts and custom kernels
│   ├── requirements.txt
│   └── Dockerfile
│
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── HardwareSelector.tsx   # GPU selection
│   │   │   ├── ConfigurationForm.tsx  # Benchmark config
│   │   │   └── ResultsView.tsx        # Results display
│   │   ├── services/
│   │   │   └── api.ts                 # API client
│   │   ├── App.tsx                    # Main application
│   │   └── index.tsx
│   ├── package.json
│   └── Dockerfile
│
├── config/
│   ├── providers.yaml                 # Provider configuration
│   ├── hardware.yaml                  # GPU specifications
│   └── presets.yaml                   # Model presets
│
├── docker-compose.yml
└── README.md
```
## Configuration

Create a `.env` file in the `backend` directory:

```env
# Database (optional; defaults to SQLite)
DATABASE_URL=postgresql://user:password@localhost:5432/benchmark_db

# GitHub (for Phase 2)
GITHUB_TOKEN=your_github_token

# Celery (for Phase 2)
CELERY_BROKER_URL=redis://localhost:6379/0
CELERY_RESULT_BACKEND=redis://localhost:6379/0

# ROCm paths (auto-detected for a standard installation)
ROCM_SMI_PATH=/opt/rocm/bin/rocm-smi
HIPCC_PATH=/opt/rocm/bin/hipcc

# Benchmarking
DEFAULT_WARMUP_RUNS=3
DEFAULT_TIMING_RUNS=10
MAX_GPU_TEMPERATURE=85.0
```

## Usage

- Open http://localhost:3000
- Select your target GPU from the dropdown
- Configure benchmark parameters
- Select providers to compare
- Click "Run Benchmark"
- View results and charts
## API Examples

Refresh the hardware list:

```bash
curl -X POST http://localhost:8000/api/hardware/refresh
```

Create a provider:

```bash
curl -X POST http://localhost:8000/api/providers/create \
  -H "Content-Type: application/json" \
  -d '{
    "name": "MIOpen",
    "repo_url": "https://github.com/ROCm/MIOpen",
    "build_system": "cmake",
    "active": true
  }'
```

Run a benchmark:

```bash
curl -X POST http://localhost:8000/api/benchmarks/run \
  -H "Content-Type: application/json" \
  -d '{
    "config_id": 1,
    "provider_version_ids": [1, 2],
    "hardware_profile_id": 1,
    "warmup_runs": 3,
    "timing_runs": 10
  }'
```

Interactive API documentation is available at:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
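The benchmark endpoint can also be driven from Python. A minimal standard-library sketch; the endpoint and field names follow the curl example above, while the helper function itself is illustrative:

```python
import json
import urllib.request

def run_benchmark_request(config_id: int, provider_version_ids: list[int],
                          hardware_profile_id: int, warmup_runs: int = 3,
                          timing_runs: int = 10,
                          base_url: str = "http://localhost:8000"):
    """Build the POST /api/benchmarks/run request; pass the returned
    object to urllib.request.urlopen() to execute it."""
    body = json.dumps({
        "config_id": config_id,
        "provider_version_ids": provider_version_ids,
        "hardware_profile_id": hardware_profile_id,
        "warmup_runs": warmup_runs,
        "timing_runs": timing_runs,
    }).encode()
    return urllib.request.Request(
        f"{base_url}/api/benchmarks/run", data=body,
        headers={"Content-Type": "application/json"}, method="POST")
```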

## Current Status (Phase 1)

- Benchmarks return mock data for testing the full pipeline
- Real kernel implementations will be added in Phase 2
- Custom kernel compilation is not yet implemented
- GitHub sync service is not yet active
- Hardware detection works with real AMD GPUs via rocm-smi
- GPU selection and monitoring are fully functional
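Until real kernel dispatch lands in Phase 2, the benchmark runner fabricates plausible numbers. A sketch of what a mock result record might look like; the field names and value ranges are illustrative, not the actual runner's output:

```python
import random

def mock_benchmark_result(provider: str, seed: int = 0) -> dict:
    """Generate a plausible-looking result record, deterministic per seed,
    so the frontend pipeline can be exercised end to end."""
    rng = random.Random(seed)
    latency_ms = rng.uniform(0.5, 5.0)
    return {
        "provider": provider,
        "latency_ms": round(latency_ms, 3),
        "throughput_tflops": round(100.0 / latency_ms, 2),  # toy inverse relation
        "power_w": round(rng.uniform(300.0, 700.0), 1),
        "temperature_c": round(rng.uniform(45.0, 80.0), 1),
    }
```

Seeding makes repeated runs reproducible, which keeps chart comparisons stable while the UI is under development.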
## Troubleshooting

GPU not detected:

```bash
# Check ROCm installation
rocm-smi --version

# List GPUs
rocm-smi --showid

# Check whether the application can access the GPU
docker-compose exec backend rocm-smi
```

Database issues:

```bash
# Reset database (Docker)
docker-compose down -v
docker-compose up -d

# Or, for a local setup
rm backend/benchmark.db
python -c "from app.models.database import init_db; init_db()"
```

## Roadmap

### Phase 1: Core Platform (Complete)

- ✅ Backend API with FastAPI
- ✅ Database models and schemas
- ✅ Hardware detection (real AMD GPU scanning)
- ✅ Mock benchmark runner with GPU selection
- ✅ React + TypeScript frontend
- ✅ Basic visualization

### Phase 2 (Planned)

- Kernel compilation service
- Syntax validation and smoke testing
- Integration with benchmark runner
- Automated provider syncing
- Version tracking and build service
- Change detection
- Real kernel dispatchers for all providers
- Baseline comparison and regression detection
- Multi-GPU parallel benchmarking
- Roofline plots and heatmaps
- Version timeline and diff view
- Performance trends and alerts

## License

MIT License (or your choice)