diff --git a/README.md b/README.md index 3f3c9a61..7490a434 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,11 @@ These results underscore HRM’s potential as a transformative advancement towar ## Quick Start Guide 🚀 -### Prerequisites ⚙️ +### Windows Users 🪟 + +**For Windows users, we recommend using WSL2 for the best compatibility.** See our comprehensive [Windows WSL2 Setup Guide](SETUP_WINDOWS_WSL2.md) for detailed instructions. + +### Prerequisites ⚙️ (Linux/WSL2) Ensure PyTorch and CUDA are installed. The repo needs CUDA extensions to be built. If not present, run the following commands: diff --git a/SETUP_WINDOWS_WSL2.md b/SETUP_WINDOWS_WSL2.md new file mode 100644 index 00000000..df7b0b50 --- /dev/null +++ b/SETUP_WINDOWS_WSL2.md @@ -0,0 +1,269 @@ +# Windows WSL2 Setup Guide for HRM + +This guide provides step-by-step instructions for setting up the Hierarchical Reasoning Model (HRM) on Windows using WSL2 (Windows Subsystem for Linux 2). + +## Why WSL2? + +The HRM project requires CUDA extensions (FlashAttention, adam-atan2) that are challenging to compile on native Windows due to: +- Complex Visual Studio/CUDA toolchain compatibility issues +- Missing Linux-specific compilation tools +- Incompatible build environments + +WSL2 provides a full Linux environment with proper CUDA support, making it the recommended approach for Windows users. + +## Prerequisites + +### System Requirements +- Windows 10 version 2004+ or Windows 11 +- NVIDIA GPU with CUDA support (tested with RTX 3050 Ti) +- At least 16GB RAM (recommended) +- 10GB+ free disk space + +### Enable WSL2 +1. Open PowerShell as Administrator +2. Enable WSL2: +```powershell +wsl --install Ubuntu-22.04 +``` +3. Restart your computer when prompted +4. Set Ubuntu as default WSL distribution: +```powershell +wsl --set-default Ubuntu-22.04 +``` + +### Install NVIDIA CUDA Drivers +1. Install the latest NVIDIA drivers for Windows from [NVIDIA's website](https://www.nvidia.com/drivers/) +2. **Do not install CUDA toolkit in Windows** - we'll install it in WSL2 + +## WSL2 Environment Setup + +### 1. Update Ubuntu and Install Dependencies +```bash +# Update system packages +sudo apt update && sudo apt upgrade -y + +# Install essential build tools +sudo apt install -y build-essential curl wget git python3-full python3-dev python3-pip + +# Install NVIDIA CUDA Toolkit via apt (Ubuntu package manager) +sudo apt install -y nvidia-cuda-toolkit + +# Verify CUDA installation +nvcc --version +nvidia-smi # Should show your GPU +``` + +### 2. Verify GPU Access +```bash +nvidia-smi +``` +You should see output showing your NVIDIA GPU. If not, restart WSL2: +```powershell +# In Windows PowerShell +wsl --shutdown +wsl +``` + +### 3. Clone and Setup HRM Project +```bash +# Clone the repository +git clone https://github.com/sapientinc/HRM.git +cd HRM + +# Initialize git submodules for datasets +git submodule update --init --recursive +``` + +### 4. Install Python Dependencies +```bash +# Install PyTorch with CUDA support (Linux version) +python3 -m pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121 --break-system-packages + +# Install project requirements +python3 -m pip install -r requirements.txt --break-system-packages + +# Install FlashAttention (this will compile from source) +python3 -m pip install flash-attn --no-build-isolation --break-system-packages +``` + +**Note**: The `--break-system-packages` flag is needed due to Ubuntu 22.04's externally managed Python environment policy. + +### 5. Verify Installation +Create a test script to verify everything works: +```bash +cat > test_setup.py << 'EOF' +import torch +import flash_attn +print(f'✅ PyTorch version: {torch.__version__}') +print(f'✅ CUDA available: {torch.cuda.is_available()}') +if torch.cuda.is_available(): + print(f'✅ GPU: {torch.cuda.get_device_name()}') +print(f'✅ FlashAttention version: {flash_attn.__version__}') + +# Test FlashAttention +if torch.cuda.is_available(): + from flash_attn import flash_attn_func + q = torch.randn(1, 8, 512, 64, dtype=torch.float16, device='cuda') + k = torch.randn(1, 8, 512, 64, dtype=torch.float16, device='cuda') + v = torch.randn(1, 8, 512, 64, dtype=torch.float16, device='cuda') + out = flash_attn_func(q, k, v) + print('✅ FlashAttention CUDA test passed!') +EOF + +python3 test_setup.py +``` + +Expected output: +``` +✅ PyTorch version: 2.5.1+cu121 +✅ CUDA available: True +✅ GPU: NVIDIA GeForce RTX 3050 Ti Laptop GPU +✅ FlashAttention version: 2.8.2 +✅ FlashAttention CUDA test passed! +``` + +## Running HRM + +### Quick Test with Pre-trained Model +Test the setup using a pre-trained Sudoku model: + +```bash +# Create Sudoku dataset (required for evaluation) +python3 dataset/build_sudoku_dataset.py --output-dir data/sudoku-extreme-1k-aug-1000 --subsample-size 1000 --num-aug 1000 + +# Download and test pre-trained model +python3 -c " +from huggingface_hub import hf_hub_download +import os +os.makedirs('checkpoints', exist_ok=True) +ckpt = hf_hub_download('sapientinc/HRM-checkpoint-sudoku-extreme', 'checkpoint', local_dir='checkpoints/sudoku') +config = hf_hub_download('sapientinc/HRM-checkpoint-sudoku-extreme', 'all_config.yaml', local_dir='checkpoints/sudoku') +print(f'✅ Downloaded checkpoint to: {ckpt}') +" + +# Evaluate the model (this may take a few minutes) +python3 evaluate.py checkpoint=checkpoints/sudoku/checkpoint +``` + +### Training from Scratch +#### From Windows PowerShell +You can run HRM commands from Windows PowerShell using the `wsl` prefix: + +```powershell +# Check training options +wsl python3 /mnt/c/Development/HRM/pretrain.py --help + +# Quick Sudoku training (10 minutes on RTX 4070) +wsl python3 /mnt/c/Development/HRM/pretrain.py data_path=data/sudoku-extreme-1k-aug-1000 epochs=20000 eval_interval=2000 global_batch_size=384 lr=7e-5 puzzle_emb_lr=7e-5 weight_decay=1.0 puzzle_emb_weight_decay=1.0 +``` + +#### From WSL2 Terminal +Or work directly in the WSL2 environment: + +```bash +# Enter WSL2 +wsl + +# Navigate to project +cd /mnt/c/Development/HRM + +# Run training +python3 pretrain.py data_path=data/sudoku-extreme-1k-aug-1000 epochs=20000 eval_interval=2000 global_batch_size=384 lr=7e-5 puzzle_emb_lr=7e-5 weight_decay=1.0 puzzle_emb_weight_decay=1.0 +``` + +## Performance Notes + +- **Memory**: WSL2 can use up to 50% of your system RAM by default +- **GPU Performance**: Expect ~95% of native Linux performance +- **File I/O**: Accessing files on Windows filesystem (`/mnt/c/`) is slower than WSL2 filesystem (`~/`) +- **Compilation**: First-time FlashAttention compilation takes ~10-15 minutes + +## Troubleshooting + +### Common Issues + +#### 1. "CUDA not available" Error +```bash +# Check if nvidia-smi works +nvidia-smi + +# If not working, restart WSL2 +# In Windows PowerShell: +wsl --shutdown +wsl +``` + +#### 2. FlashAttention Compilation Fails +```bash +# Install additional build dependencies +sudo apt install -y ninja-build + +# Retry FlashAttention installation +python3 -m pip install flash-attn --no-build-isolation --force-reinstall --break-system-packages +``` + +#### 3. Out of Memory During Compilation +```bash +# Increase WSL2 memory limit in Windows +# Create/edit: %UserProfile%\.wslconfig +[wsl2] +memory=16GB +processors=8 +``` + +#### 4. Permission Errors with Virtual Environments +Use the system Python with `--break-system-packages` instead of virtual environments in WSL2. + +#### 5. "evaluate.py requires checkpoint parameter" Error +The evaluation script requires a trained model checkpoint: + +```bash +# Wrong (missing checkpoint parameter) +python3 evaluate.py + +# Correct (with checkpoint path) +python3 evaluate.py checkpoint=/path/to/your/checkpoint + +# Use pre-trained models from Hugging Face: +# - sapientinc/HRM-checkpoint-sudoku-extreme +# - sapientinc/HRM-checkpoint-ARC-2 +# - sapientinc/HRM-checkpoint-maze-30x30-hard +``` + +### Performance Optimization + +1. **Move project to WSL2 filesystem for better I/O**: +```bash +# Copy project to WSL2 home directory +cp -r /mnt/c/Development/HRM ~/HRM-wsl2 +cd ~/HRM-wsl2 +``` + +2. **Set WSL2 resource limits** in `%UserProfile%\.wslconfig`: +```ini +[wsl2] +memory=16GB +processors=8 +swap=4GB +``` + +## Success Metrics + +After successful setup, you should be able to: +- [x] Access NVIDIA GPU from WSL2 (`nvidia-smi` shows GPU) +- [x] Import PyTorch with CUDA support +- [x] Import and use FlashAttention +- [x] Run HRM training scripts without errors +- [x] See CUDA operations executing on GPU +- [x] Download and evaluate pre-trained models +- [x] Create datasets and start training + +## Additional Resources + +- [WSL2 CUDA Documentation](https://docs.nvidia.com/cuda/wsl-user-guide/index.html) +- [FlashAttention Installation Guide](https://github.com/Dao-AILab/flash-attention) +- [HRM Paper](https://arxiv.org/abs/2506.21734) + +--- + +*This setup was tested and verified on Windows 11 with WSL2 Ubuntu 22.04 and NVIDIA RTX 3050 Ti.*