Quickstart
This quickstart gets you started with MrBiomics in minutes. It showcases one of our most popular capabilities: taking thousands of hard-to-interpret genomic regions and effortlessly turning them into meaningful biological insights. In this guide, we'll use MrBiomics' Enrichment Analysis module to infer cell identities straight from ATAC-seq analysis results.
We start with genomic region .bed files resulting from differential analysis of healthy hematopoietic ATAC-seq data. Specifically, we focus on four region sets that showed significantly increased accessibility (they are "more open") in four specific cell types compared to the rest of the hematopoietic lineage (one-vs-all comparison):
- B cells: docs/quickstart/data/Bcell_up_features.bed
- CD8 T cells: docs/quickstart/data/CD8Tcell_up_features.bed
- Erythroid cells: docs/quickstart/data/Ery_up_features.bed
- Monocytes: docs/quickstart/data/Mono_up_features.bed
- And a shared background of all consensus regions to compute enrichment against: docs/quickstart/data/ALL_features.bed
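Before running anything, it can help to see how many regions each input file holds. The following is a minimal sketch, assuming you are in the MrBiomics directory after cloning and that the files follow common BED conventions (one region per line, optional `#`, `track`, or `browser` header lines). The helper name `count_regions` is ours, not part of MrBiomics.

```shell
# Count regions in a BED file, skipping blank lines and
# header lines (#, track, browser), which are assumed conventions.
count_regions() {
    grep -Ecv '^(#|track|browser|$)' "$1"
}

# Print "<file><TAB><number of regions>" for every quickstart input file.
for bed in docs/quickstart/data/*.bed; do
    [ -e "$bed" ] || continue   # glob did not match: not in the MrBiomics directory
    printf '%s\t%s\n' "$bed" "$(count_regions "$bed")"
done
```

The shared background file (ALL_features.bed) should report the largest count, since the four cell type sets are subsets of the consensus regions.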
Important
The Question: Starting only from these unannotated coordinate regions in a .bed file, can our enrichment analysis accurately recover the correct hematopoietic cell identities?
Note
These files are selected results from the full ATAC-seq Analysis Recipe.
Important
Operating System: This quickstart is currently designed and tested to work out-of-the-box on Linux systems, covering the most common use case.
To set everything up from scratch (assuming conda is installed) and run the quickstart, just paste these commands into your terminal:
# install Snakemake
conda create -y -n snakemake -c conda-forge -c bioconda snakemake=8.25.3
# clone MrBiomics
git clone https://github.com/epigen/MrBiomics.git
# change to MrBiomics directory
cd MrBiomics
# activate snakemake environment
conda activate snakemake
# run the quickstart workflow
snakemake --software-deployment-method conda --cores 1
Note
Grab a coffee! ☕ The very first run may take around 10 minutes because Snakemake automatically downloads and creates the required software environments for you.
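If you would rather preview what will happen before committing to the full run, Snakemake's `--dry-run` flag lists the planned jobs without executing them. The sketch below wraps the quickstart command with that flag. The function name `preview_quickstart` is hypothetical, and it assumes the `snakemake` environment is active and that you are in the MrBiomics directory.

```shell
# Preview the quickstart jobs without running them.
# --dry-run makes Snakemake print the planned jobs and exit.
preview_quickstart() {
    if command -v snakemake >/dev/null 2>&1; then
        snakemake --software-deployment-method conda --cores 1 --dry-run
    else
        echo "snakemake not found; run 'conda activate snakemake' first" >&2
        return 127
    fi
}
```

Call `preview_quickstart` from the MrBiomics directory, then run the real command above once the job list looks as expected.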
Once the workflow finishes, you can find the summary results here: results/quickstart/enrichment_analysis/hematopoietic/GREAT/Azimuth_2023/
The most insightful result file to check first is: hematopoietic_Azimuth_2023_summary_specificTerms.png
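To confirm from the terminal that the run produced this file, a small check like the following can be used. It is a sketch that relies only on the result path given above. The helper name `check_output` is ours, and the paths assume you are in the MrBiomics directory.

```shell
result_dir="results/quickstart/enrichment_analysis/hematopoietic/GREAT/Azimuth_2023"
summary_plot="$result_dir/hematopoietic_Azimuth_2023_summary_specificTerms.png"

# Report whether a result file exists and is non-empty.
check_output() {
    if [ -s "$1" ]; then
        echo "found: $1"
    else
        echo "missing or empty: $1"
        return 1
    fi
}

check_output "$summary_plot" || echo "Re-run the workflow from the MrBiomics directory."
```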
This plot summarizes the enrichment signal across all four region sets. Let's see if we recovered the expected biology!
Summary plot showing how the genomic region sets are enriched in specific cell type annotations.
🎉 Success! The B-cell, CD8 T-cell, erythroid, and monocyte region sets strongly enrich for their matching cell type annotations. We have successfully shown that these differentially accessible region signatures encode the underlying cell identities, all fully automated and reproducible.
For those curious, the quickstart runs a small custom MrBiomics Project, which is a Snakemake workflow, demonstrating how you can easily use modules within your own analyses. Let's systematically go through the required files and how they connect:
MrBiomics/
├── config/
│   ├── config.yaml
│   └── quickstart/
│       ├── quickstart_enrichment_analysis_annotation.csv
│       └── quickstart_enrichment_analysis_config.yaml
├── docs/
│   └── quickstart/
│       └── data/
│           ├── ALL_features.bed
│           ├── Bcell_up_features.bed
│           └── ... (other .bed input files)
├── results/
│   └── quickstart/
│       └── enrichment_analysis/.../
│           ├── hematopoietic_Azimuth_2023_summary_specificTerms.png
│           └── ... (other result plots and tables)
└── workflow/
    ├── Snakefile
    └── rules/
        └── quickstart.smk
1. Project Workflow (workflow/Snakefile): The project's main Snakefile, which orchestrates the full execution from start to finish. It loads the configuration (2.) and includes the analysis-specific Snakefile (3.).
2. Project Configuration (config/config.yaml): The overarching global project configuration. It links the quickstart analysis directly to its module-specific configuration files.
3. Analysis-specific Workflow (workflow/rules/quickstart.smk): The external Enrichment Analysis workflow is loaded as a module directly from GitHub into the quickstart's analysis-specific Snakefile, using the analysis-specific configuration and annotation (4.).
4. Module-Specific Configuration & Annotation:
   - The Configuration (config/quickstart/quickstart_enrichment_analysis_config.yaml) specifies the analysis settings (like running GREAT on the Azimuth 2023 database).
   - The Annotation (config/quickstart/quickstart_enrichment_analysis_annotation.csv) provides the necessary input files (.bed paths and metadata).

   Together, they describe the analysis using the Enrichment Analysis module.
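To see how these files connect in your own clone, you can list them and look at where the module is declared. This is a minimal sketch, assuming you are in the MrBiomics directory. The function name `list_project_files` is ours. The final `grep` assumes the analysis-specific Snakefile uses Snakemake's `module` statement at the start of a line, and it prints nothing if that assumption does not hold.

```shell
# Report which of the quickstart project files are present.
list_project_files() {
    for f in \
        workflow/Snakefile \
        config/config.yaml \
        workflow/rules/quickstart.smk \
        config/quickstart/quickstart_enrichment_analysis_config.yaml \
        config/quickstart/quickstart_enrichment_analysis_annotation.csv
    do
        if [ -f "$f" ]; then echo "ok       $f"; else echo "missing  $f"; fi
    done
}

list_project_files

# Show the module declaration (and the few lines after it), if any.
if [ -f workflow/rules/quickstart.smk ]; then
    grep -n -A4 '^module' workflow/rules/quickstart.smk || true
fi
```

Reading the files in the order 1 to 4 mirrors the order in which Snakemake resolves them.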
graph TD
classDef file fill:#f9f9f9,stroke:#333,stroke-width:1px,color:#333;
classDef input fill:#e1f5fe,stroke:#0288d1,stroke-width:1px,color:#333;
classDef output fill:#e8f5e9,stroke:#2e7d32,stroke-width:1px,color:#333;
classDef external fill:#fff3e0,stroke:#e65100,stroke-width:1px,color:#333;
%% Data Nodes
InputData["<b>Input Data</b><br/><code>docs/quickstart/data/*.bed</code>"]:::input
ResultPlot["<b>Final Results</b><br/><code>results/.../hematopoietic_Azimuth_2023_summary_specificTerms.png</code>"]:::output
%% Config & Workflow Nodes
Snakefile["<b>1. Project Workflow</b><br/><code>workflow/Snakefile</code>"]:::file
ProjectConfig["<b>2. Project Configuration</b><br/><code>config/config.yaml</code>"]:::file
QuickstartSMK["<b>3. Analysis Workflow</b><br/><code>workflow/rules/quickstart.smk</code>"]:::file
%% External Module Node
GitHubModule["<b>Enrichment Analysis</b><br/><code>(GitHub)</code>"]:::external
ModuleConfig["<b>4a. Module Configuration</b><br/><code>.../quickstart_enrichment_analysis_config.yaml</code>"]:::file
ModuleAnno["<b>4b. Module Annotation</b><br/><code>.../quickstart_enrichment_analysis_annotation.csv</code>"]:::file
%% Logic Connections
InputData -.->|Paths defined in| ModuleAnno
Snakefile -->|Loads global configuration| ProjectConfig
ProjectConfig -.->|Points module to| ModuleConfig
ProjectConfig -.->|Points module to| ModuleAnno
Snakefile -->|Includes analysis rules| QuickstartSMK
%% Module inclusion
QuickstartSMK -->|Loads module from GitHub| GitHubModule
%% Module logic
ModuleConfig -->|Configures analysis| GitHubModule
ModuleAnno -->|Provides metadata to| GitHubModule
GitHubModule ===>|Executes and generates| ResultPlot
Now that you've witnessed the power and simplicity of MrBiomics, explore the rest of the wiki to apply it to your own research:
- Want to use modules on your own data? Head over to Installation, Configuration and Execution to learn how to run modules as standalone workflows.
- Want to build up larger analyses? Read about Module Usage in Projects to see how we recommend loading modules into your own Snakemake workflows (as done in this quickstart).
- Curious about our full end-to-end best practice analyses? Check out How to use Recipes and the modality-specific recipes, e.g., the ATAC-seq Analysis Recipe, to learn exactly how to generate these differentially accessible regions from raw sequencing data.