Improve time and memory tracking for Babel runs #679
Open
- Add a benchmark: directive to every non-trivial rule across all 19 snakefiles,
writing per-rule wall time / max_rss TSVs to babel_outputs/benchmarks/.
Wildcard rules include the wildcard in the filename; loop-generated
rules in reports.snakefile use an f-string path.
- Declare localrules for trivial done-marker rules (all, all_outputs,
clean_compendia, clean_downloads, export_all_to_kgx,
export_all_to_sapbert_training, export_all_{compendia,synonyms,}_to_duckdb,
all_reports) so they run on the head node without a SLURM allocation.
- Add resources: mem="8G", cpus_per_task=1 to pure download rules in
datacollect.snakefile and publications.snakefile; add runtime="6h" to
large downloads (UniProtKB idmapping/trembl, UMLS).
- Add slurm/README.md documenting the profile, benchmark TSV fields,
efficiency CSV, known resource hotspots, and future improvements.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
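The benchmark TSVs described in this commit can be aggregated with a few lines of Python. The following is a minimal sketch, not part of this PR. It assumes the files use Snakemake's standard benchmark columns (`s` for wall-clock seconds, `max_rss` for peak resident memory in MB), that they sit under `babel_outputs/benchmarks/` (possibly in subdirectories), and that non-numeric cells such as `NA` or `-` mean "not measured". The function name `summarize_benchmarks` is hypothetical.

```python
import csv
from pathlib import Path


def _to_float(value):
    """Parse a benchmark cell; return None for blanks or placeholders such as 'NA' or '-'."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def summarize_benchmarks(bench_dir="babel_outputs/benchmarks"):
    """Map each benchmark file (path relative to bench_dir, without .tsv) to
    (max wall-clock seconds, max peak RSS in MB) across all rows in that file."""
    root = Path(bench_dir)
    summary = {}
    if not root.is_dir():
        return summary
    for tsv in sorted(root.rglob("*.tsv")):
        with tsv.open(newline="") as handle:
            rows = list(csv.DictReader(handle, delimiter="\t"))
        # Keep only cells that parsed as numbers.
        wall = [w for w in (_to_float(r.get("s")) for r in rows) if w is not None]
        rss = [m for m in (_to_float(r.get("max_rss")) for r in rows) if m is not None]
        key = tsv.relative_to(root).with_suffix("").as_posix()
        summary[key] = (max(wall, default=None), max(rss, default=None))
    return summary


if __name__ == "__main__":
    # Show the ten benchmark files with the highest peak memory.
    results = summarize_benchmarks()
    ranked = sorted(results.items(), key=lambda item: item[1][1] or 0.0, reverse=True)
    for name, (wall_s, rss_mb) in ranked[:10]:
        print(f"{name}\twall_s={wall_s}\tmax_rss_mb={rss_mb}")
```

Taking the maximum over rows covers rules benchmarked with Snakemake's `repeat()`, which writes one row per run; keying by relative path keeps wildcard-specific files distinct.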
Pull request overview
Adds standardized benchmarking and SLURM-oriented execution tweaks across the Babel Snakemake workflow to improve per-rule time/memory tracking and reduce unnecessary cluster allocations.
Changes:
- Add `benchmark:` outputs for most non-trivial rules across the workflow, writing TSVs under `babel_outputs/benchmarks/`.
- Mark trivial aggregation/done-marker rules as `localrules` so they run on the head node (no SLURM allocation).
- Add explicit `resources` overrides for download-heavy rules and document the SLURM profile/benchmark outputs.
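The resource overrides in this PR (for example `mem="8G"` and `runtime="6h"`) can be checked against the benchmark numbers to spot rules that request far more, or nearly all, of their allocation. A minimal sketch, assuming memory strings use M/G/T suffixes read as binary multiples (1G = 1024 MB) and runtime strings use s/m/h/d suffixes; Snakemake's own parsing may follow different conventions. All helper names and the 25%/90% thresholds are hypothetical illustration values.

```python
import re

# Assumption: binary multiples (1G = 1024 MB). Snakemake's own parsing may differ.
_MEM_UNITS_MB = {"M": 1, "G": 1024, "T": 1024 * 1024}
_TIME_UNITS_S = {"S": 1, "M": 60, "H": 3600, "D": 86400}


def mem_to_mb(spec):
    """Convert a memory string such as '8G', '512M' or '8GB' to megabytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([MGT])B?\s*", spec, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized memory spec: {spec!r}")
    return float(match.group(1)) * _MEM_UNITS_MB[match.group(2).upper()]


def runtime_to_seconds(spec):
    """Convert a runtime string such as '6h' or '90m' to seconds."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([SMHD])\s*", spec, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized runtime spec: {spec!r}")
    return float(match.group(1)) * _TIME_UNITS_S[match.group(2).upper()]


def usage_fractions(max_rss_mb, wall_s, mem, runtime):
    """Return (fraction of requested memory used, fraction of requested runtime used)."""
    return max_rss_mb / mem_to_mb(mem), wall_s / runtime_to_seconds(runtime)


def classify_memory(fraction, low=0.25, high=0.9):
    """Label a memory request; the default thresholds are arbitrary illustration values."""
    if fraction > high:
        return "near limit"
    if fraction < low:
        return "over-provisioned"
    return "ok"


if __name__ == "__main__":
    # Hypothetical numbers for a download rule that requested mem="8G", runtime="6h".
    mem_frac, time_frac = usage_fractions(1024.0, 5400.0, mem="8G", runtime="6h")
    # prints: memory 12.5%, runtime 25.0%: over-provisioned
    print(f"memory {mem_frac:.1%}, runtime {time_frac:.1%}: {classify_memory(mem_frac)}")
```

In this example a rule that requested 8G but peaked at 1024 MB uses 12.5% of its request, so it is a candidate for a smaller `mem` value.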
Reviewed changes
Copilot reviewed 22 out of 23 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `uv.lock` | Bumps the `rumdl` lock entry to 0.1.26. |
| `pyproject.toml` | Updates the dev dependency constraint to `rumdl>=0.1.26`. |
| `Snakefile` | Declares top-level `localrules` and adds benchmarking for `uncompress_synonym_file`. |
| `src/snakefiles/anatomy.snakefile` | Adds per-rule benchmark TSV outputs across anatomy rules. |
| `src/snakefiles/cell_line.snakefile` | Adds per-rule benchmark TSV outputs across cell line rules. |
| `src/snakefiles/chemical.snakefile` | Adds per-rule benchmark TSV outputs across chemical rules (incl. multi-step compendia). |
| `src/snakefiles/datacollect.snakefile` | Adds benchmark TSVs and sets resource overrides for download/I/O-heavy rules. |
| `src/snakefiles/diseasephenotype.snakefile` | Adds per-rule benchmark TSV outputs across disease/phenotype rules. |
| `src/snakefiles/drugchemical.snakefile` | Adds per-rule benchmark TSV outputs across drug/chemical conflation rules. |
| `src/snakefiles/duckdb.snakefile` | Marks aggregation exports as `localrules`; adds benchmarks for exports and DuckDB reports. |
| `src/snakefiles/exports.snakefile` | Marks export aggregators as `localrules`; adds benchmarks for per-file exports. |
| `src/snakefiles/gene.snakefile` | Adds per-rule benchmark TSV outputs across gene rules. |
| `src/snakefiles/genefamily.snakefile` | Adds per-rule benchmark TSV outputs across gene family rules. |
| `src/snakefiles/geneprotein.snakefile` | Adds per-rule benchmark TSV outputs across gene/protein conflation rules. |
| `src/snakefiles/leftover_umls.snakefile` | Adds benchmark TSV outputs for leftover UMLS and compression steps. |
| `src/snakefiles/macromolecular_complex.snakefile` | Adds per-rule benchmark TSV outputs across macromolecular complex rules. |
| `src/snakefiles/process.snakefile` | Adds per-rule benchmark TSV outputs across process/pathway rules. |
| `src/snakefiles/protein.snakefile` | Adds per-rule benchmark TSV outputs across protein rules. |
| `src/snakefiles/publications.snakefile` | Adds resource overrides for PubMed download and benchmarks across publication rules. |
| `src/snakefiles/reports.snakefile` | Marks `all_reports` as local; adds benchmarks including loop-generated per-compendium report rules. |
| `src/snakefiles/taxon.snakefile` | Adds per-rule benchmark TSV outputs across taxon rules. |
| `slurm/README.md` | Documents SLURM profile usage, benchmark TSV fields, and operational notes/hotspots. |
| `docs/RunningBabel.md` | Minor whitespace update at end of doc. |
The following improvements are tracked here for visibility but not yet implemented:

- **`uv run snakemake` vs `conda activate babel`**: The SLURM job scripts (`slurm/job`,
  `run_babel_on_slurm.sh`) still reference the old conda environment and hardcoded paths.
The README references run_babel_on_slurm.sh, but the repo script is named slurm/run-babel-on-slurm.sh (hyphens). Update the filename/path here so readers can find the script without guesswork.
Suggested change
Before: `run_babel_on_slurm.sh`) still reference the old conda environment and hardcoded paths.
After: `slurm/run-babel-on-slurm.sh`) still reference the old conda environment and hardcoded paths.
Add benchmark: directives, localrules, and download resource limits