Improve time and memory tracking for Babel runs #679

Open
gaurav wants to merge 2 commits into master from improve-memory-tracking

gaurav commented Feb 27, 2026

Add benchmark: directives, localrules, and download resource limits

  • Add benchmark: to every non-trivial rule across all 19 snakefiles, writing per-rule wall time / max_rss TSVs to babel_outputs/benchmarks/. Wildcard rules include the wildcard in the filename; loop-generated rules in reports.snakefile use an f-string path.
  • Declare localrules for trivial done-marker rules (all, all_outputs, clean_compendia, clean_downloads, export_all_to_kgx, export_all_to_sapbert_training, export_all_{compendia,synonyms,}_to_duckdb, all_reports) so they run on the head node without a SLURM allocation.
  • Add resources: mem="8G", cpus_per_task=1 to pure download rules in datacollect.snakefile and publications.snakefile; add runtime="6h" to large downloads (UniProtKB idmapping/trembl, UMLS).
  • Add slurm/README.md documenting the profile, benchmark TSV fields, efficiency CSV, known resource hotspots, and future improvements.
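
Once a run has written its benchmark TSVs, they can be ranked to find memory hotspots. The following is a minimal sketch, not part of this PR: it assumes each file under `babel_outputs/benchmarks/` uses Snakemake's standard benchmark columns (`s` for wall-clock seconds, `max_rss` in MB) and that the file stem is a usable label for the rule. The function names and the sample rule names are hypothetical, and the demo writes to a temporary directory rather than a real Babel output.

```python
import csv
import tempfile
from pathlib import Path


def parse_number(text):
    """Return text as a float, or None when the field is missing or not numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def summarize_benchmarks(benchmark_dir):
    """Return (label, wall_seconds, max_rss_mb) tuples, largest max_rss first.

    The label is the TSV file stem, which for wildcard rules also
    contains the wildcard value.
    """
    rows = []
    for tsv_path in sorted(Path(benchmark_dir).rglob("*.tsv")):
        with tsv_path.open(newline="") as handle:
            for record in csv.DictReader(handle, delimiter="\t"):
                seconds = parse_number(record.get("s"))
                max_rss = parse_number(record.get("max_rss"))
                if seconds is None or max_rss is None:
                    continue  # skip rows where a measurement was not recorded
                rows.append((tsv_path.stem, seconds, max_rss))
    return sorted(rows, key=lambda row: row[2], reverse=True)


def write_sample(directory, name, seconds, max_rss):
    """Write a hypothetical benchmark TSV with only the two columns used here."""
    path = Path(directory) / f"{name}.tsv"
    path.write_text(f"s\tmax_rss\n{seconds}\t{max_rss}\n")


# Demo on made-up data; point summarize_benchmarks() at
# babel_outputs/benchmarks/ to inspect a real run.
with tempfile.TemporaryDirectory() as tmp:
    write_sample(tmp, "download_umls", 1200.5, 512.0)
    write_sample(tmp, "chemical_compendia", 5400.0, 48000.25)
    summary = summarize_benchmarks(tmp)

for label, seconds, max_rss in summary:
    print(f"{label}\t{seconds:.1f} s\t{max_rss:.1f} MB")
```

Sorting by `max_rss` puts the rules most likely to need a larger `mem` request at the top; sorting by the second field instead would highlight candidates for a longer `runtime`.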

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
gaurav moved this from Backlog to In progress in Babel sprints on Mar 5, 2026
gaurav requested a review from Copilot on March 5, 2026, 22:23

Copilot AI left a comment

Pull request overview

Adds standardized benchmarking and SLURM-oriented execution tweaks across the Babel Snakemake workflow to improve per-rule time/memory tracking and reduce unnecessary cluster allocations.

Changes:

  • Add benchmark: outputs for most non-trivial rules across the workflow, writing TSVs under babel_outputs/benchmarks/.
  • Mark trivial aggregation/done-marker rules as localrules so they run on the head node (no SLURM allocation).
  • Add explicit resources overrides for download-heavy rules and document the SLURM profile/benchmark outputs.

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| uv.lock | Bumps rumdl lock entry to 0.1.26. |
| pyproject.toml | Updates dev dependency constraint to rumdl>=0.1.26. |
| Snakefile | Declares top-level localrules and adds benchmarking for uncompress_synonym_file. |
| src/snakefiles/anatomy.snakefile | Adds per-rule benchmark TSV outputs across anatomy rules. |
| src/snakefiles/cell_line.snakefile | Adds per-rule benchmark TSV outputs across cell line rules. |
| src/snakefiles/chemical.snakefile | Adds per-rule benchmark TSV outputs across chemical rules (incl. multi-step compendia). |
| src/snakefiles/datacollect.snakefile | Adds benchmark TSVs and sets resource overrides for download/I/O-heavy rules. |
| src/snakefiles/diseasephenotype.snakefile | Adds per-rule benchmark TSV outputs across disease/phenotype rules. |
| src/snakefiles/drugchemical.snakefile | Adds per-rule benchmark TSV outputs across drug/chemical conflation rules. |
| src/snakefiles/duckdb.snakefile | Marks aggregation exports as localrules; adds benchmarks for exports and DuckDB reports. |
| src/snakefiles/exports.snakefile | Marks export aggregators as localrules; adds benchmarks for per-file exports. |
| src/snakefiles/gene.snakefile | Adds per-rule benchmark TSV outputs across gene rules. |
| src/snakefiles/genefamily.snakefile | Adds per-rule benchmark TSV outputs across gene family rules. |
| src/snakefiles/geneprotein.snakefile | Adds per-rule benchmark TSV outputs across gene/protein conflation rules. |
| src/snakefiles/leftover_umls.snakefile | Adds benchmark TSV outputs for leftover UMLS and compression steps. |
| src/snakefiles/macromolecular_complex.snakefile | Adds per-rule benchmark TSV outputs across macromolecular complex rules. |
| src/snakefiles/process.snakefile | Adds per-rule benchmark TSV outputs across process/pathway rules. |
| src/snakefiles/protein.snakefile | Adds per-rule benchmark TSV outputs across protein rules. |
| src/snakefiles/publications.snakefile | Adds resource overrides for PubMed download and benchmarks across publication rules. |
| src/snakefiles/reports.snakefile | Marks all_reports as local; adds benchmarks including loop-generated per-compendium report rules. |
| src/snakefiles/taxon.snakefile | Adds per-rule benchmark TSV outputs across taxon rules. |
| slurm/README.md | Documents SLURM profile usage, benchmark TSV fields, and operational notes/hotspots. |
| docs/RunningBabel.md | Minor whitespace update at end of doc. |

The following improvements are tracked here for visibility but not yet implemented:

- **`uv run snakemake` vs `conda activate babel`**: The SLURM job scripts (`slurm/job`,
`run_babel_on_slurm.sh`) still reference the old conda environment and hardcoded paths.

Copilot AI Mar 5, 2026

The README references run_babel_on_slurm.sh, but the repo script is named slurm/run-babel-on-slurm.sh (hyphens). Update the filename/path here so readers can find the script without guesswork.

Suggested change
- `run_babel_on_slurm.sh`) still reference the old conda environment and hardcoded paths.
+ `slurm/run-babel-on-slurm.sh`) still reference the old conda environment and hardcoded paths.
