Improve time and memory tracking for Babel runs by gaurav · Pull Request #679 · NCATSTranslator/Babel

gaurav · 2026-02-27T23:27:42Z

Add `benchmark:` directives, `localrules`, and download resource limits

- Add `benchmark:` to every non-trivial rule across all 19 snakefiles, writing per-rule wall-time / `max_rss` TSVs to `babel_outputs/benchmarks/`. Wildcard rules include the wildcard in the filename; loop-generated rules in `reports.snakefile` use an f-string path.
- Declare `localrules` for trivial done-marker rules (`all`, `all_outputs`, `clean_compendia`, `clean_downloads`, `export_all_to_kgx`, `export_all_to_sapbert_training`, `export_all_{compendia,synonyms,}_to_duckdb`, `all_reports`) so they run on the head node without a SLURM allocation.
- Add `resources: mem="8G", cpus_per_task=1` to pure download rules in `datacollect.snakefile` and `publications.snakefile`; add `runtime="6h"` to large downloads (UniProtKB idmapping/trembl, UMLS).
- Add `slurm/README.md` documenting the profile, benchmark TSV fields, efficiency CSV, known resource hotspots, and future improvements.
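A minimal sketch of how the per-rule benchmark TSVs could be rolled up into a quick time/memory summary. It assumes Snakemake's standard benchmark layout (a header row containing at least `s`, wall-clock seconds, and `max_rss`, peak resident memory in MB, followed by one row per repeat). The function name `summarize_benchmarks` is hypothetical and not part of Babel; the efficiency CSV described in `slurm/README.md` may be produced differently.

```python
import csv
from pathlib import Path


def _to_float(value):
    # Very short jobs can report non-numeric placeholders (e.g. "NA"); treat them as missing.
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def _max_or_none(values):
    numbers = [v for v in values if v is not None]
    return max(numbers) if numbers else None


def summarize_benchmarks(bench_dir):
    """Map each benchmark file to (max wall-clock seconds, max max_rss in MB).

    Assumes Snakemake's benchmark TSV layout: a header row with at least the
    `s` and `max_rss` columns, followed by one row per benchmark repeat.
    """
    root = Path(bench_dir)
    summary = {}
    for path in sorted(root.rglob("*.tsv")):
        with path.open(newline="") as fh:
            rows = list(csv.DictReader(fh, delimiter="\t"))
        wall = [_to_float(row.get("s")) for row in rows]
        rss = [_to_float(row.get("max_rss")) for row in rows]
        # Key by relative path without ".tsv" so wildcard-named files stay distinct.
        key = path.relative_to(root).with_suffix("").as_posix()
        summary[key] = (_max_or_none(wall), _max_or_none(rss))
    return summary
```

For example, calling `summarize_benchmarks("babel_outputs/benchmarks")` and sorting the items by their second element gives a rough list of memory hotspots. Taking the maximum across repeats is a deliberately conservative choice when the numbers are used to size `mem` requests.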

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

Adds standardized benchmarking and SLURM-oriented execution tweaks across the Babel Snakemake workflow to improve per-rule time/memory tracking and reduce unnecessary cluster allocations.

Changes:

- Add `benchmark:` outputs for most non-trivial rules across the workflow, writing TSVs under `babel_outputs/benchmarks/`.
- Mark trivial aggregation/done-marker rules as `localrules` so they run on the head node (no SLURM allocation).
- Add explicit `resources` overrides for download-heavy rules and document the SLURM profile/benchmark outputs.
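To make the download overrides concrete, here is a hypothetical helper that translates strings such as `mem="8G"` and `runtime="6h"` into SLURM-style flags (`--mem` in megabytes, `--time` in minutes). This is not how Snakemake or its SLURM executor parses these values; it assumes 1024 MB per `G` and only understands `M`/`G`/`T` and `m`/`h`/`d` suffixes.

```python
import re
from typing import List, Optional

# Hypothetical conversion tables. 1024-based memory units are an assumption;
# outputs are meant to read as SLURM --mem (MB) and --time (minutes) values.
_MEM_UNITS_MB = {"M": 1, "G": 1024, "T": 1024 * 1024}
_TIME_UNITS_MIN = {"m": 1, "h": 60, "d": 24 * 60}


def mem_to_mb(spec: str) -> int:
    """Convert a size such as "8G" to megabytes."""
    match = re.fullmatch(r"(\d+)([MGT])", spec.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized memory spec: {spec!r}")
    amount, unit = match.groups()
    return int(amount) * _MEM_UNITS_MB[unit]


def runtime_to_minutes(spec: str) -> int:
    """Convert a duration such as "6h" to minutes."""
    match = re.fullmatch(r"(\d+)([mhd])", spec.strip())
    if match is None:
        raise ValueError(f"unrecognized runtime spec: {spec!r}")
    amount, unit = match.groups()
    return int(amount) * _TIME_UNITS_MIN[unit]


def slurm_flags(mem: str, cpus_per_task: int, runtime: Optional[str] = None) -> List[str]:
    """Build sbatch-style flags for a rule's resources (hypothetical helper)."""
    flags = [f"--mem={mem_to_mb(mem)}", f"--cpus-per-task={cpus_per_task}"]
    if runtime is not None:
        flags.append(f"--time={runtime_to_minutes(runtime)}")
    return flags
```

Under these assumptions, a large download with `mem="8G"`, `cpus_per_task=1`, and `runtime="6h"` corresponds to `--mem=8192 --cpus-per-task=1 --time=360`.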

Reviewed changes

Copilot reviewed 22 out of 23 changed files in this pull request and generated 1 comment.


| File | Description |
| --- | --- |
| `uv.lock` | Bumps `rumdl` lock entry to `0.1.26`. |
| `pyproject.toml` | Updates dev dependency constraint to `rumdl>=0.1.26`. |
| `Snakefile` | Declares top-level `localrules` and adds benchmarking for `uncompress_synonym_file`. |
| `src/snakefiles/anatomy.snakefile` | Adds per-rule benchmark TSV outputs across anatomy rules. |
| `src/snakefiles/cell_line.snakefile` | Adds per-rule benchmark TSV outputs across cell line rules. |
| `src/snakefiles/chemical.snakefile` | Adds per-rule benchmark TSV outputs across chemical rules (incl. multi-step compendia). |
| `src/snakefiles/datacollect.snakefile` | Adds benchmark TSVs and sets resource overrides for download/I/O-heavy rules. |
| `src/snakefiles/diseasephenotype.snakefile` | Adds per-rule benchmark TSV outputs across disease/phenotype rules. |
| `src/snakefiles/drugchemical.snakefile` | Adds per-rule benchmark TSV outputs across drug/chemical conflation rules. |
| `src/snakefiles/duckdb.snakefile` | Marks aggregation exports as `localrules`; adds benchmarks for exports and DuckDB reports. |
| `src/snakefiles/exports.snakefile` | Marks export aggregators as `localrules`; adds benchmarks for per-file exports. |
| `src/snakefiles/gene.snakefile` | Adds per-rule benchmark TSV outputs across gene rules. |
| `src/snakefiles/genefamily.snakefile` | Adds per-rule benchmark TSV outputs across gene family rules. |
| `src/snakefiles/geneprotein.snakefile` | Adds per-rule benchmark TSV outputs across gene/protein conflation rules. |
| `src/snakefiles/leftover_umls.snakefile` | Adds benchmark TSV outputs for leftover UMLS and compression steps. |
| `src/snakefiles/macromolecular_complex.snakefile` | Adds per-rule benchmark TSV outputs across macromolecular complex rules. |
| `src/snakefiles/process.snakefile` | Adds per-rule benchmark TSV outputs across process/pathway rules. |
| `src/snakefiles/protein.snakefile` | Adds per-rule benchmark TSV outputs across protein rules. |
| `src/snakefiles/publications.snakefile` | Adds resource overrides for PubMed download and benchmarks across publication rules. |
| `src/snakefiles/reports.snakefile` | Marks `all_reports` as local; adds benchmarks including loop-generated per-compendium report rules. |
| `src/snakefiles/taxon.snakefile` | Adds per-rule benchmark TSV outputs across taxon rules. |
| `slurm/README.md` | Documents SLURM profile usage, benchmark TSV fields, and operational notes/hotspots. |
| `docs/RunningBabel.md` | Minor whitespace update at end of doc. |
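The difference between wildcard rules and the loop-generated report rules in `reports.snakefile` comes down to when the braces in the benchmark path are filled in. A minimal sketch with hypothetical rule, wildcard, and compendium names: a plain string keeps `{source}` for Snakemake to substitute per job, while an f-string inside the loop bakes the compendium name in when the Snakefile is parsed, and would need doubled braces to keep any wildcard literal.

```python
import string

# Hypothetical names throughout; the real rule names, wildcard names and
# compendium list live in the Babel snakefiles.
WILDCARD_RULE_BENCHMARK = "babel_outputs/benchmarks/download_{source}.tsv"
COMPENDIA = ["AnatomicalEntity", "Cell", "Gene"]


def loop_rule_benchmark(compendium: str) -> str:
    # Evaluated once per loop iteration at parse time, so the compendium
    # name becomes part of a concrete path with no placeholders left.
    return f"babel_outputs/benchmarks/reports/{compendium}.tsv"


def loop_rule_benchmark_with_wildcard(compendium: str) -> str:
    # If a loop-generated rule also had a wildcard, its braces must be doubled
    # so the f-string leaves a literal "{part}" for Snakemake to fill per job.
    return f"babel_outputs/benchmarks/reports/{compendium}_{{part}}.tsv"


def placeholders(template: str) -> list:
    # Names of any {placeholders} still present in a path template.
    return [field for _, field, _, _ in string.Formatter().parse(template) if field]


paths = [loop_rule_benchmark(c) for c in COMPENDIA]
# Each generated rule needs its own benchmark file; two rules claiming the
# same path would collide.
assert len(set(paths)) == len(paths)
```

Checking `placeholders(...)` on each generated path is a cheap way to confirm that a loop-generated rule did not accidentally leave (or lose) a wildcard.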


Copilot · 2026-03-05T22:29:29Z

slurm/README.md

+The following improvements are tracked here for visibility but not yet implemented:
+
+- **`uv run snakemake` vs `conda activate babel`**: The SLURM job scripts (`slurm/job`,
+  `run_babel_on_slurm.sh`) still reference the old conda environment and hardcoded paths.


The README references `run_babel_on_slurm.sh`, but the repository script is named `slurm/run-babel-on-slurm.sh` (hyphens). Update the filename/path here so readers can find the script without guesswork.

Suggested change

`run_babel_on_slurm.sh`) still reference the old conda environment and hardcoded paths.

`slurm/run-babel-on-slurm.sh`) still reference the old conda environment and hardcoded paths.
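The filename mismatch flagged here is the kind of drift a small check can catch. A hypothetical sketch, not part of the repository, that lists backticked `*.sh` references in a Markdown file which resolve to no file, looking relative to both the repository root and the README's own directory (an assumed lookup rule):

```python
import re
from pathlib import Path
from typing import List

# Backticked tokens ending in ".sh", e.g. `slurm/run-babel-on-slurm.sh`.
_SCRIPT_REF = re.compile(r"`([^`\s]+\.sh)`")


def missing_script_refs(readme: Path, repo_root: Path) -> List[str]:
    """Return backticked *.sh references in `readme` that match no file.

    A reference counts as found if it names a file relative to either the
    repository root or the README's own directory.
    """
    missing = []
    for ref in _SCRIPT_REF.findall(readme.read_text(encoding="utf-8")):
        candidates = (repo_root / ref, readme.parent / ref)
        if not any(candidate.is_file() for candidate in candidates):
            missing.append(ref)
    return missing
```

Run against `slurm/README.md` before the suggested change, it would be expected to report `run_babel_on_slurm.sh`, since the script is actually named `slurm/run-babel-on-slurm.sh`.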

github-project-automation bot added this to Babel sprints Feb 27, 2026

github-project-automation bot moved this to Backlog in Babel sprints Feb 27, 2026

Updated rumdl and fixed minor Markdown issue.

998f43a

gaurav moved this from Backlog to In progress in Babel sprints Mar 5, 2026

gaurav requested a review from Copilot March 5, 2026 22:23

Copilot started reviewing on behalf of gaurav March 5, 2026 22:23

Copilot AI reviewed Mar 5, 2026


gaurav wants to merge 2 commits into `master` from `improve-memory-tracking`
