Note: Refactored and documented with GitHub Copilot assistance. Based on ATLAS Collaboration code for ML-driven track overlay routing.
Train a neural network to intelligently route ATLAS simulation events:
- MC-overlay: Full simulation (accurate but slow)
- Track-overlay: Fast simulation (approximation)
- Goal: Use Track-overlay when it matches MC-overlay (MatchProb > 0.5), otherwise use MC-overlay
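The routing rule above can be sketched in a few lines (a minimal illustration only; the function name and the score input are assumptions for the sketch, not the framework's actual API):

```python
import numpy as np

def route_events(scores, threshold=0.5):
    """Route each event by its classifier score.

    The scores approximate P(Track-overlay matches MC-overlay); events
    above the threshold use the fast Track-overlay, the rest fall back
    to the full MC-overlay simulation.
    """
    scores = np.asarray(scores, dtype=float)
    return np.where(scores > threshold, "Track-overlay", "MC-overlay")

routes = route_events([0.9, 0.2, 0.7])
```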
The framework requires ATLAS simulation data in specific formats:
/eos/user/f/fatsai/TrackOverlayDATA/matched_JZ7W_data.h5
/eos/user/f/fatsai/TrackOverlayDATA/unmatched_JZ7W_data.h5
/eos/user/f/fatsai/TrackOverlayDATA/MCOverlay_JZ7W/*.csv
/eos/user/f/fatsai/TrackOverlayDATA/TrackOverlay_JZ7W/*.csv
Access: These datasets are stored on CERN EOS and require ATLAS collaboration access rights.
To request access:
- Contact: [email protected]
Expected directory structure:
data/
├── MC-overlay_JZ7W/
│ ├── file1.csv
│ ├── file2.csv
│ └── ...
└── Track-overlay_JZ7W/
├── file1.csv
├── file2.csv
└── ...
For other samples, substitute the sample name in the directory names:
data/
├── MC-overlay_ttbar/
│ └── *.csv
├── Track-overlay_ttbar/
│ └── *.csv
├── MC-overlay_JZ7W/
│ └── *.csv
└── Track-overlay_JZ7W/
└── *.csv
Setting up your data:
# Create directories for your sample
mkdir -p data/MC-overlay_ttbar
mkdir -p data/Track-overlay_ttbar
and copy or link your CSV files accordingly.

The easiest way to run this framework is with the pre-built Singularity container, which includes all dependencies:
# Pull the container (only needed once)
singularity pull docker://fyingtsai/dsnnr_4gpu:v5
# or on Perlmutter
podman-hpc pull docker://fyingtsai/dsnnr_4gpu:v5

If you cannot use Singularity, install dependencies locally:
Option A: Using uv
# Install uv if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate
# Optionally install the project as a package
# uv pip install -e .
uv run python scripts/prepare_data.py --sample JZ7W --path data

Option B: Using Conda
# Create environment from file
conda env create -f environment.yml --prefix /path/to/your/scratch/trackoverlay-ml
# Activate environment
conda activate /path/to/your/scratch/trackoverlay-ml

Option C: Using pip
pip install "tensorflow>=2.8.0" "numpy>=1.21.0" "pandas>=1.3.0" "scikit-learn>=1.0.0" "matplotlib>=3.5.0,<3.9.0" "seaborn>=0.11.0" "tables>=3.7.0" "statsmodels>=0.13.0" "mplhep>=0.3.28,<0.4.0" "xarray>=0.20.0"

Note: All examples in this README assume Singularity usage. For a local installation, drop the singularity exec dsnnr_4gpu_v5.sif prefix.
Example:
# Full pipeline
singularity exec dsnnr_4gpu_v5.sif python scripts/run_pipeline.py --sample JZ7W --epochs 5
# Or run steps individually (recommended: easier debugging and finer control)
singularity exec dsnnr_4gpu_v5.sif python scripts/prepare_data.py --sample JZ7W --path data
singularity exec dsnnr_4gpu_v5.sif python scripts/train_model.py --sample JZ7W --path data --epochs 5
singularity exec dsnnr_4gpu_v5.sif python scripts/evaluate_model.py --sample JZ7W

# Train on a balanced 10k + 10k subset
singularity exec dsnnr_4gpu_v5.sif python scripts/train_model.py --sample ttbar --matched_size 10000 --unmatched_size 10000
# Train on realistic imbalanced ratio (1:10)
singularity exec dsnnr_4gpu_v5.sif python scripts/train_model.py --sample ttbar --matched_size 5000 --unmatched_size 50000
# Use all matched, but limit unmatched
singularity exec dsnnr_4gpu_v5.sif python scripts/train_model.py --sample ttbar --unmatched_size 20000
# Full pipeline with balanced training
singularity exec dsnnr_4gpu_v5.sif python scripts/run_pipeline.py --stage all --sample ttbar --path data --matched_size 5000 --unmatched_size 5000 --epochs 20
# Full pipeline with cross-sample evaluation
singularity exec dsnnr_4gpu_v5.sif python scripts/run_pipeline.py --stage all --sample ttbar --eval_sample JZ7W
# Just train on subset
singularity exec dsnnr_4gpu_v5.sif python scripts/run_pipeline.py --stage train --sample ttbar --matched_size 10000 --unmatched_size 10000

TrackOverlayML/
├── data/ # Data directory (--path to customize)
│ ├── MC-overlay_{sample}/ # MC workflow CSVs (required unless the h5 dataframes already exist)
│ ├── Track-overlay_{sample}/ # Track workflow CSVs (required unless the h5 dataframes already exist)
│ ├── matched_{sample}_data.h5 # Good matches (pre-created)
│ └── unmatched_{sample}_data.h5 # Poor matches (pre-created)
├── scripts/ # Main entry points
│ ├── prepare_data.py # Merge MC/Track, compute features
│ ├── train_model.py # Train classifier
│ ├── evaluate_model.py # Evaluate performance
│ └── run_pipeline.py # Run all steps
├── network/classifier.py # Model architecture
├── utils/ # Evaluation & plotting
└── results/ # Outputs (models, plots, logs)
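The classifier in network/classifier.py is a small feed-forward network. A hedged sketch of what the default configuration (--layers 45 35 30, sigmoid output, Adam with lr 0.001) plausibly looks like in Keras follows; the actual architecture in the repository may differ:

```python
from tensorflow import keras

def build_classifier(n_features, hidden=(45, 35, 30), lr=1e-3):
    # Simple MLP: ReLU hidden layers sized per --layers, single sigmoid
    # output for the binary matched/unmatched decision.
    inputs = keras.Input(shape=(n_features,))
    x = inputs
    for units in hidden:
        x = keras.layers.Dense(units, activation="relu")(x)
    outputs = keras.layers.Dense(1, activation="sigmoid")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_classifier(n_features=10)
```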
# Step 1: Prepare data (merge MC/Track workflows)
singularity exec dsnnr_4gpu_v5.sif python scripts/prepare_data.py --sample ttbar --trainsplit 0.8
# Step 2: Train model
singularity exec dsnnr_4gpu_v5.sif python scripts/train_model.py --sample ttbar --epochs 200
# Step 3: Evaluate (same sample)
singularity exec dsnnr_4gpu_v5.sif python scripts/evaluate_model.py --sample ttbar
# Step 3b: Evaluate on different sample
singularity exec dsnnr_4gpu_v5.sif python scripts/evaluate_model.py --sample ttbar --eval_sample JZ7W

Train multiple models on the same data:
singularity exec dsnnr_4gpu_v5.sif python scripts/prepare_data.py --sample ttbar
singularity exec dsnnr_4gpu_v5.sif python scripts/train_model.py --sample ttbar --layers 32 16 8
singularity exec dsnnr_4gpu_v5.sif python scripts/train_model.py --sample ttbar --layers 64 32 16

Cross-sample evaluation:
# Train on ttbar, test on JZ7W
singularity exec dsnnr_4gpu_v5.sif python scripts/train_model.py --sample ttbar
singularity exec dsnnr_4gpu_v5.sif python scripts/prepare_data.py --sample JZ7W
singularity exec dsnnr_4gpu_v5.sif python scripts/evaluate_model.py --sample ttbar --eval_sample JZ7W

Quick evaluation on a subset:
singularity exec dsnnr_4gpu_v5.sif python scripts/evaluate_model.py --sample ttbar --matched_size 5000 --unmatched_size 50000

| Argument | Default | Description |
|---|---|---|
| --path | data | Data directory path |
| --sample | JZ7W | Sample name (ttbar, JZ7W, etc.) |
| --eval_sample | None | Different sample for evaluation |
| --trainsplit | 0.8 | Train/test split ratio |
| --epochs | 100 | Training epochs |
| --batchsize | 80 | Batch size |
| --lr | 0.001 | Learning rate |
| --layers | 45 35 30 | Hidden layer sizes |
| --patience | 20 | Early stopping patience |
| --rouletter | smart | Roulette type (smart/hard) |
| --matched_size | None | Limit matched samples for training/eval |
| --unmatched_size | None | Limit unmatched samples for training/eval |
Run python scripts/run_pipeline.py --help for full list.
MC-overlay_{sample}/ Track-overlay_{sample}/
└── *.csv └── *.csv
↓ ↓
└──── Merge on EventNumber ─┘
↓
Create labels (MatchProb > 0.5)
↓
matched_*.h5 (good) & unmatched_*.h5 (poor)
↓
Train/Test split
↓
Train classifier
↓
Evaluate performance
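The merge-and-label steps above can be sketched with pandas on toy data (the EventNumber, MatchProb, and TargetLabel names follow this README's descriptions; the real scripts use many more feature columns):

```python
import pandas as pd

# Toy stand-ins for the per-workflow CSVs
mc = pd.DataFrame({"EventNumber": [1, 2, 3], "MatchProb": [0.9, 0.3, 0.6]})
track = pd.DataFrame({"EventNumber": [1, 2, 3], "trk_feature": [10.1, 5.2, 7.7]})

# Merge on EventNumber, then label: MatchProb > 0.5 means Track-overlay is accurate
merged = mc.merge(track, on="EventNumber")
merged["TargetLabel"] = (merged["MatchProb"] > 0.5).astype(int)

matched = merged[merged["TargetLabel"] == 1]    # would go to matched_*.h5
unmatched = merged[merged["TargetLabel"] == 0]  # would go to unmatched_*.h5
```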
results/{sample}/
├── classifier/
│ ├── classifier.h5 # Trained model
│ └── history.pkl # Training history
├── logs/ # Logs for each step
└── {xscore}/{rouletter}/
└── plots/ # ROC, efficiency, fraction plots
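To inspect a training run, history.pkl can be loaded and plotted. A self-contained sketch (the exact keys stored in history.pkl are an assumption, so a stand-in dict is written first):

```python
import pickle
import matplotlib
matplotlib.use("Agg")  # headless backend for batch nodes
import matplotlib.pyplot as plt

# Stand-in for results/{sample}/classifier/history.pkl
history = {"loss": [0.69, 0.55, 0.48], "val_loss": [0.70, 0.58, 0.52]}
with open("history.pkl", "wb") as f:
    pickle.dump(history, f)

with open("history.pkl", "rb") as f:
    hist = pickle.load(f)

plt.plot(hist["loss"], label="train")
plt.plot(hist["val_loss"], label="validation")
plt.xlabel("epoch")
plt.ylabel("binary cross-entropy")
plt.legend()
plt.savefig("loss_curve.png")
```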
- Matched (TargetLabel=1): MatchProb > 0.5 (Track-overlay accurate)
- Unmatched (TargetLabel=0): MatchProb ≤ 0.5 (needs MC-overlay)
- Preprocessed HDF5 files are cached for faster reruns
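The HDF5 caching works through pandas' HDF5 I/O (which requires the tables package); a minimal sketch, with a demo file name mirroring the matched_{sample}_data.h5 convention:

```python
import pandas as pd

# Write the preprocessed dataframe once; requires the `tables` package
df = pd.DataFrame({"EventNumber": [1, 2], "MatchProb": [0.9, 0.2]})
df.to_hdf("matched_demo_data.h5", key="df", mode="w")

# Subsequent runs read the cache instead of re-merging the CSVs
cached = pd.read_hdf("matched_demo_data.h5", key="df")
```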
When making changes:
- Keep function docstrings updated
- Add inline comments for complex physics calculations
- Update this README if workflow changes