Quickstart

This quickstart gets you started with MrBiomics in minutes. It showcases one of our most popular capabilities: taking thousands of hard-to-interpret genomic regions and effortlessly turning them into meaningful biological insights. In this guide, we'll use MrBiomics' Enrichment Analysis module to infer cell identities straight from ATAC-seq analysis results.

🧬 Can we recover cell identities?

We start with genomic region .bed files resulting from differential analysis of healthy hematopoietic ATAC-seq data. Specifically, we focus on four region sets that showed significantly increased accessibility (they are "more open") in four specific cell types compared to the rest of the hematopoietic lineage (one-vs-all comparison):

B cells: docs/quickstart/data/Bcell_up_features.bed
CD8 T cells: docs/quickstart/data/CD8Tcell_up_features.bed
Erythroid cells: docs/quickstart/data/Ery_up_features.bed
Monocytes: docs/quickstart/data/Mono_up_features.bed
Shared background of all consensus regions, used to compute enrichment against: docs/quickstart/data/ALL_features.bed
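Before running anything, it can help to see how many regions each input file contains. The following is a minimal sketch, not part of MrBiomics: it assumes the files are plain tab-separated BED files with one region per line, and the helper names `count_bed_regions` and `summarize_bed_dir` are made up for illustration.

```python
from pathlib import Path


def count_bed_regions(path):
    """Count data lines in a BED file, skipping blank and header-style lines."""
    n = 0
    with open(path) as handle:
        for line in handle:
            stripped = line.strip()
            # Skip empty lines, comments, and UCSC-style track/browser headers.
            if not stripped or stripped.startswith(("#", "track", "browser")):
                continue
            n += 1
    return n


def summarize_bed_dir(data_dir):
    """Return {file name: region count} for every .bed file in data_dir."""
    return {p.name: count_bed_regions(p) for p in sorted(Path(data_dir).glob("*.bed"))}


if __name__ == "__main__":
    # Path taken from the quickstart; run from the MrBiomics repository root.
    for name, n in summarize_bed_dir("docs/quickstart/data").items():
        print(f"{name}\t{n}")
```

Run from the repository root after cloning, this prints one line per .bed file with its region count. Outside the repository it prints nothing.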

Important

The Question: Starting only from these unannotated coordinate regions in a .bed file, can our enrichment analysis accurately recover the correct hematopoietic cell identities?

Note

These files are selected results from the full ATAC-seq Analysis Recipe.

▶️ Run the Workflow with 5 Commands

Important

Operating System: This quickstart is currently designed and tested to work out-of-the-box on Linux systems, covering the most common use case.

To set everything up from scratch (assuming conda is installed) and run the quickstart, just paste these commands into your terminal:

# install Snakemake
conda create -y -n snakemake -c conda-forge -c bioconda snakemake=8.25.3
# clone MrBiomics
git clone https://github.com/epigen/MrBiomics.git
# change to MrBiomics directory
cd MrBiomics
# activate snakemake environment
conda activate snakemake
# run the quickstart workflow
snakemake --software-deployment-method conda --cores 1
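
If you would like to preview what Snakemake plans to do before the real run, a dry run is one option. The sketch below is an illustration only: it wraps the command above in Python and adds Snakemake's `--dry-run` option. The function names `build_snakemake_command` and `run_quickstart` are hypothetical helpers, not part of MrBiomics.

```python
import shutil
import subprocess


def build_snakemake_command(cores=1, dry_run=False):
    """Assemble the quickstart command as an argument list."""
    cmd = ["snakemake", "--software-deployment-method", "conda", "--cores", str(cores)]
    if dry_run:
        # A dry run lists the planned jobs without executing them.
        cmd.append("--dry-run")
    return cmd


def run_quickstart(dry_run=True):
    """Run the quickstart from the MrBiomics directory; dry run by default."""
    if shutil.which("snakemake") is None:
        raise RuntimeError("snakemake not found on PATH; activate the snakemake environment first")
    return subprocess.run(build_snakemake_command(dry_run=dry_run), check=True)


# Example (inside the MrBiomics directory, with the environment activated):
# run_quickstart(dry_run=True)   # preview the planned jobs
# run_quickstart(dry_run=False)  # execute the workflow
```

Building the command as a list rather than a single string avoids shell quoting issues and makes it easy to change the number of cores.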

Note

Grab a coffee! ☕ The very first run may take around 10 minutes because Snakemake automatically downloads and creates the required software environments for you.

🔎 Inspect the Results

Once the workflow finishes, you can find the summary results here: results/quickstart/enrichment_analysis/hematopoietic/GREAT/Azimuth_2023/

The most insightful result file to check first is: hematopoietic_Azimuth_2023_summary_specificTerms.png

This plot summarizes the enrichment signal across all four region sets. Let's see if we recovered the expected biology!

[Figure: Grouped GREAT summary plot for the hematopoietic quickstart]

Summary plot showing how the genomic region sets are enriched in specific cell type annotations.

🎉 Success! The B-cell, CD8 T-cell, erythroid, and monocyte region sets strongly enrich for their matching cell type annotations. We have successfully shown that these differentially accessible region signatures encode the underlying cell identities, all fully automated and reproducible.
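
To confirm that the run produced the expected outputs, you could list the contents of the result directory. This is a minimal sketch using only the paths named in this section. The helper names `list_outputs` and `has_summary_plot` are hypothetical, and the other files in the directory are not known in advance, so the script simply prints whatever it finds.

```python
from pathlib import Path

# Paths as given in this quickstart, relative to the MrBiomics directory.
RESULT_DIR = Path("results/quickstart/enrichment_analysis/hematopoietic/GREAT/Azimuth_2023")
SUMMARY_PLOT = "hematopoietic_Azimuth_2023_summary_specificTerms.png"


def list_outputs(result_dir):
    """Return sorted (file name, size in bytes) pairs for files directly in result_dir."""
    result_dir = Path(result_dir)
    if not result_dir.is_dir():
        return []
    return sorted((p.name, p.stat().st_size) for p in result_dir.iterdir() if p.is_file())


def has_summary_plot(result_dir, name=SUMMARY_PLOT):
    """Check whether the summary plot exists in result_dir."""
    return (Path(result_dir) / name).is_file()


if __name__ == "__main__":
    for name, size in list_outputs(RESULT_DIR):
        print(f"{size:>10}  {name}")
    print("summary plot present:", has_summary_plot(RESULT_DIR))
```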

🛠️ How it works

For those curious: the quickstart runs a small custom MrBiomics Project, itself a Snakemake workflow, that demonstrates how to use modules within your own analyses. Let's go through the required files and how they connect:

MrBiomics/
├── config/
│   ├── config.yaml
│   └── quickstart/
│       ├── quickstart_enrichment_analysis_annotation.csv
│       └── quickstart_enrichment_analysis_config.yaml
├── docs/
│   └── quickstart/
│       └── data/
│           ├── ALL_features.bed
│           ├── Bcell_up_features.bed
│           └── ... (other .bed input files)
├── results/
│   └── quickstart/
│       └── enrichment_analysis/.../
│           ├── hematopoietic_Azimuth_2023_summary_specificTerms.png
│           └── ... (other result plots and tables)
└── workflow/
    ├── Snakefile
    └── rules/
        └── quickstart.smk

1. Project Workflow (workflow/Snakefile): The project's main Snakefile orchestrates the full execution from start to finish. It loads the project configuration (2.) and includes the analysis-specific Snakefile (3.).
2. Project Configuration (config/config.yaml): The global project configuration. It links the quickstart analysis to its module-specific configuration files (4.).
3. Analysis-specific Workflow (workflow/rules/quickstart.smk): The analysis-specific Snakefile loads the external Enrichment Analysis workflow as a module directly from GitHub, using the module-specific configuration and annotation (4.).
4. Module-Specific Configuration & Annotation:
   - The Configuration (config/quickstart/quickstart_enrichment_analysis_config.yaml) specifies the analysis settings (like running GREAT on the Azimuth 2023 database).
   - The Annotation (config/quickstart/quickstart_enrichment_analysis_annotation.csv) provides the input files (.bed paths and metadata).
   Together, they describe the analysis to the Enrichment Analysis module.
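
Since the annotation file is what points the module at the input .bed files, one quick sanity check is to confirm that every .bed path it mentions exists. The sketch below is an illustration under stated assumptions: it does not rely on any particular column names (the annotation schema is not described here), but simply collects every cell ending in `.bed`. The helper names `bed_paths_in_annotation` and `missing_paths` are hypothetical.

```python
import csv
from pathlib import Path

# Annotation path as given in this quickstart, relative to the MrBiomics directory.
ANNOTATION = Path("config/quickstart/quickstart_enrichment_analysis_annotation.csv")


def bed_paths_in_annotation(csv_path):
    """Collect every distinct cell value ending in '.bed', in reading order."""
    paths = []
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            for cell in row:
                value = cell.strip()
                if value.endswith(".bed") and value not in paths:
                    paths.append(value)
    return paths


def missing_paths(paths, base_dir="."):
    """Return the paths that do not exist as files relative to base_dir."""
    return [p for p in paths if not (Path(base_dir) / p).is_file()]


if __name__ == "__main__":
    if ANNOTATION.is_file():
        found = bed_paths_in_annotation(ANNOTATION)
        print(f"{len(found)} .bed paths referenced")
        for p in missing_paths(found):
            print("missing:", p)
    else:
        print("annotation file not found; run this from the MrBiomics directory")
```

Scanning all cells instead of a named column keeps the check independent of the exact annotation layout, at the cost of not distinguishing input region sets from the background file.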

graph TD
    classDef file fill:#f9f9f9,stroke:#333,stroke-width:1px,color:#333;
    classDef input fill:#e1f5fe,stroke:#0288d1,stroke-width:1px,color:#333;
    classDef output fill:#e8f5e9,stroke:#2e7d32,stroke-width:1px,color:#333;
    classDef external fill:#fff3e0,stroke:#e65100,stroke-width:1px,color:#333;

    %% Data Nodes
    InputData["<b>Input Data</b><br/><code>docs/quickstart/data/*.bed</code>"]:::input
    ResultPlot["<b>Final Results</b><br/><code>results/.../hematopoietic_Azimuth_2023_summary_specificTerms.png</code>"]:::output

    %% Config & Workflow Nodes
    Snakefile["<b>1. Project Workflow</b><br/><code>workflow/Snakefile</code>"]:::file
    ProjectConfig["<b>2. Project Configuration</b><br/><code>config/config.yaml</code>"]:::file
    QuickstartSMK["<b>3. Analysis Workflow</b><br/><code>workflow/rules/quickstart.smk</code>"]:::file
    
    %% External Module Node
    GitHubModule["<b>Enrichment Analysis</b><br/><code>(GitHub)</code>"]:::external

    ModuleConfig["<b>4a. Module Configuration</b><br/><code>.../quickstart_enrichment_analysis_config.yaml</code>"]:::file
    ModuleAnno["<b>4b. Module Annotation</b><br/><code>.../quickstart_enrichment_analysis_annotation.csv</code>"]:::file

    %% Logic Connections
    InputData -.->|Paths defined in| ModuleAnno
    
    Snakefile -->|Loads global configuration| ProjectConfig
    ProjectConfig -.->|Points module to| ModuleConfig
    ProjectConfig -.->|Points module to| ModuleAnno

    Snakefile -->|Includes analysis rules| QuickstartSMK
    
    %% Module inclusion
    QuickstartSMK -->|Loads module from GitHub| GitHubModule
    
    %% Module logic
    ModuleConfig -->|Configures analysis| GitHubModule
    ModuleAnno -->|Provides metadata to| GitHubModule

    GitHubModule ===>|Executes and generates| ResultPlot

🌱 Next steps & Where to go from here

Now that you've witnessed the power and simplicity of MrBiomics, explore the rest of the wiki to apply it to your own research:

Want to use modules on your own data? Head over to Installation, Configuration and Execution to learn how to run modules as standalone workflows.
Want to build up larger analyses? Read about Module Usage in Projects to see how we recommend loading modules into your own Snakemake workflows (as done in this quickstart).
Curious about our full end-to-end best practice analyses? Check out How to use Recipes and the modality-specific recipes, e.g. the ATAC-seq Analysis Recipe, to learn exactly how to generate these differentially accessible regions from raw sequencing data.
