seokjuchung/SPINE_Inference_EAF

Fermilab EAF — quick guide

EAF is a Fermilab GPU cluster (NVIDIA A100). If you have a Fermilab Services account, you also have EAF access.

Notes

  • Each user gets a single GPU with a memory quota. Use EAF for in-house sample production and light workloads. For heavy jobs, use Polaris or hand off to the production team.

Access from offsite

You need either a VPN or a browser proxy.

Option: SOCKS proxy for your browser (Firefox)

  1. Start a dynamic SOCKS proxy to a FNAL host:
    ssh -D 3128 USER@FNAL_MACHINE
  2. In Firefox, set the Automatic proxy configuration URL to the provided PAC file: https://www.nevis.columbia.edu/~sc5303/fnal-proxy.pac

Tip: The PAC file assumes port 3128, matching the command above.
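The tunnel can also be kept in the background and checked before changing any browser settings. A minimal sketch, assuming OpenSSH and curl are installed on your local machine. The function names are hypothetical and not part of this repository; USER and FNAL_MACHINE remain placeholders as above.

```shell
# Hypothetical helpers, not part of this repository.

# Open the SOCKS tunnel in the background.
#   -f  go to the background after authentication
#   -N  do not run a remote command
#   -D  dynamic (SOCKS) forwarding on the given local port
start_fnal_proxy() {
    local user="$1" host="$2" port="${3:-3128}"
    ssh -f -N -D "$port" "${user}@${host}"
}

# Request the headers of a URL through the tunnel.
# Returns non-zero if the proxy is not reachable or the request fails.
# --socks5-hostname makes the proxy side resolve host names.
check_fnal_proxy() {
    local url="$1" port="${2:-3128}"
    curl --silent --head --max-time 10 \
         --socks5-hostname "localhost:${port}" "$url" > /dev/null
}
```

Usage: run `start_fnal_proxy USER FNAL_MACHINE`, then call `check_fnal_proxy` with an internal URL you expect to reach. Keep the default port 3128 so it stays consistent with the PAC file.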

Storage and SPINE paths

  • EAF mounts nearly the same disks as the GPVMs, except /pnfs is not mounted.
  • For SPINE software, use: /exp/sbnd/app/users/sc5303/SPINE.

Apptainer

EAF does not include Apptainer by default. The easiest route is to install it inside a Conda environment.

Quota tip: EAF has a small /home quota. Point Conda/Pip/Apptainer caches to /exp/sbnd/data or /exp/sbnd/app to avoid filling /home.

Recommended cache settings (add to your shell startup):

export APPTAINER_CACHEDIR=/exp/sbnd/data/users/USERNAME/apptainer_cache
export PIP_CACHE_DIR=/exp/sbnd/data/users/USERNAME/pip_cache
export XDG_CACHE_HOME=/exp/sbnd/data/users/USERNAME/.cache
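The directories named in these variables may not exist yet. A small sketch that exports the same three variables and creates the directories in one step; the function name `setup_eaf_caches` is hypothetical, and the base directory is passed in rather than hard-coded.

```shell
# Hypothetical helper: export the cache variables and create the directories.
# $1 is the per-user base directory, e.g. /exp/sbnd/data/users/USERNAME
setup_eaf_caches() {
    local base="$1"
    export APPTAINER_CACHEDIR="${base}/apptainer_cache"
    export PIP_CACHE_DIR="${base}/pip_cache"
    export XDG_CACHE_HOME="${base}/.cache"
    # -p creates missing parents and does not fail if a directory exists.
    mkdir -p "$APPTAINER_CACHEDIR" "$PIP_CACHE_DIR" "$XDG_CACHE_HOME"
}
```

Usage: `setup_eaf_caches /exp/sbnd/data/users/USERNAME`, with USERNAME replaced by your own.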

Example .condarc (replace sc5303 with your own username):

pkgs_dirs:
  - /exp/sbnd/data/users/sc5303/apptainer_cache/pkgs
envs_dirs:
  - /exp/sbnd/data/users/sc5303/apptainer_cache/envs
channels:
  - conda-forge
solver: libmamba
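With the caches redirected as above, Apptainer can be installed into its own Conda environment. A sketch only: it assumes `conda` is already on your PATH and that conda-forge provides an `apptainer` package for this platform. The function name and the default environment name are hypothetical.

```shell
# Hypothetical helper: create a dedicated environment containing Apptainer.
# Assumes conda is on PATH and conda-forge ships an `apptainer` package.
install_apptainer_env() {
    local env_name="${1:-apptainer}"
    conda create --yes --name "$env_name" --channel conda-forge apptainer
}
```

Usage: run `install_apptainer_env`, then `conda activate apptainer` and `apptainer --version` to confirm that the binary is available.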

OpT0Finder

Build OpT0Finder (from the repository root):

source configure.sh
make -j

Inference

Inference uses a trained model to produce predictions on unseen data. In this workflow, it creates HDF5 files from LArCV inputs.

LArCV files are created after the reco1 stage in SBND.

Script used: /exp/sbnd/app/users/sc5303/SPINE/inference/inference.sh

Key variables inside the script

  1. CFG — Configuration file path
  2. LOG_DIR — Directory for logs
  3. FNAME — Input LArCV file(s); can be a list
  4. workdir — Output directory (use /exp/sbnd/data due to size)
  5. container — Apptainer image path
  6. CUDA_VISIBLE_DEVICES — Set to 0 (EAF provides one GPU)

Ensure the output directory exists and has sufficient space.
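The checks above can be sketched as a small pre-flight function to run before launching inference. This is illustrative only: `preflight_inference` is a hypothetical name, and the real inference.sh may handle these variables differently.

```shell
# Hypothetical pre-flight check; not part of inference.sh.
# Arguments: config file, output directory, log directory, then input files.
preflight_inference() {
    local cfg="$1" workdir="$2" log_dir="$3" f
    shift 3
    # The configuration file must exist.
    [ -f "$cfg" ] || { echo "missing config: $cfg" >&2; return 1; }
    # Every remaining argument is an input LArCV file that must exist.
    for f in "$@"; do
        [ -f "$f" ] || { echo "missing input: $f" >&2; return 1; }
    done
    # Create the output and log directories if they are missing.
    mkdir -p "$workdir" "$log_dir" || return 1
    # Report free space on the output filesystem.
    df -h "$workdir"
}
```

Usage, assuming FNAME holds a whitespace-separated list of files: `preflight_inference "$CFG" "$workdir" "$LOG_DIR" $FNAME`.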

Merging with CAFs

CAFs and flatCAFs are created from cafmakerjob_sbnd.fcl in SBND.

HDF5 files are created from the inference step above.

Merge HDF5 and CAF using sbn-ml-cafmaker (see link below).

The merged output is a CAF. To convert it to flatCAF format, run

flatten_caf normal_caf_file.caf.root flat_caf_file_name.flat.caf.root

after setting up sbnana.
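When several merged CAFs need flattening, the command can be wrapped in a loop. A sketch with hypothetical helper names; it assumes input files end in `.caf.root` and that `flatten_caf` is on your PATH after setting up sbnana.

```shell
# Hypothetical helper: derive the flatCAF name from a CAF name,
# e.g. sample.caf.root -> sample.flat.caf.root
flat_caf_name() {
    printf '%s.flat.caf.root\n' "${1%.caf.root}"
}

# Hypothetical helper: flatten every CAF passed as an argument.
# Stops at the first failure and returns non-zero.
flatten_all() {
    local caf
    for caf in "$@"; do
        flatten_caf "$caf" "$(flat_caf_name "$caf")" || return 1
    done
}
```

Usage: `flatten_all /path/to/merged/*.caf.root`. Each output file is written next to its input.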

Setting Up

Pick a tagged sbnana version from those listed by ups list -aK+ sbnana.

For example:

setup sbnana v10_01_04 -q e26:prof
cmake /exp/sbnd/app/users/sc5303/sbn_ml_cafmaker -DHDF5_INSTALL="/exp/icarus/app/users/mueller/hdf5/hdf5_install"
make

Sources

Where to get components:

  1. Inference bash script: https://github.com/bear-is-asleep/sbnd_spine_train/blob/master/deghost/train_uresnet.sh (modify for EAF)
  2. Configuration files (CFG): https://github.com/DeepLearnPhysics/spine-prod/tree/main/config/sbnd

Repositories to clone from GitHub:

  1. OpT0Finder: https://github.com/bear-is-asleep/OpT0Finder
  2. spine: https://github.com/DeepLearnPhysics/spine/tree/v0.7.6
  3. spine-prod: https://github.com/DeepLearnPhysics/spine-prod
  4. sbn-ml-cafmaker: https://github.com/justinjmueller/sbn_ml_cafmaker
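The four repositories can be fetched in one go. A sketch assuming `git` is available and that `v0.7.6` is a tag or branch of spine, as the link above suggests. The function name and directory layout are hypothetical.

```shell
# Hypothetical helper: clone the four repositories into a target directory.
# spine is checked out at the v0.7.6 ref named in the link above.
clone_spine_stack() {
    local dest="${1:-.}"
    git clone https://github.com/bear-is-asleep/OpT0Finder "$dest/OpT0Finder" &&
    git clone --branch v0.7.6 https://github.com/DeepLearnPhysics/spine "$dest/spine" &&
    git clone https://github.com/DeepLearnPhysics/spine-prod "$dest/spine-prod" &&
    git clone https://github.com/justinjmueller/sbn_ml_cafmaker "$dest/sbn_ml_cafmaker"
}
```

Usage: `clone_spine_stack /exp/sbnd/app/users/USERNAME`. Cloning under /exp/sbnd/app keeps the checkouts out of the small /home quota.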
