This is a Snakemake-based 16S QIIME2 pipeline.

To install, we assume you have already installed Miniconda3 (4.7.10+) (https://docs.conda.io/en/latest/miniconda.html).
- Clone this repository: `git clone https://github.com/PennChopMicrobiomeProgram/16S_QIIME2.git` (see the updated QIIME2 docs here: https://library.qiime2.org/quickstart/amplicon)
- Create a conda environment by running `conda create --name qiime2-2023.2 --file environment.yml` from inside the repository (i.e. `cd 16S_QIIME2` first). If this doesn't work or you're not on a Linux platform, you can manually install QIIME2 following these instructions: https://docs.qiime2.org/2023.9/install/native/
To run the pipeline, activate the environment (currently based on QIIME2 2023.2) by entering `conda activate qiime2-2023.2`.
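As a quick sanity check (assuming the environment built successfully and that Snakemake is included in it, as the run instructions below expect), you can confirm the tools are available:

```bash
# Activate the pipeline environment and confirm the tools are on the PATH.
conda activate qiime2-2023.2
qiime info           # prints the QIIME2 version and installed plugins
snakemake --version  # the workflow engine used by this pipeline
```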
The following software also needs to be installed within the environment you created.

To run the pipeline, we need:
- Multiplexed R1/R2 read pairs (Undetermined_S0_L001_R1_001.fastq.gz, Undetermined_S0_L001_R2_001.fastq.gz)
- QIIME2 compatible mapping file (see the example after this list)
  - Tab delimited
  - The first two columns should be `SampleID` (or `#SampleID`) and `BarcodeSequence`
- QIIME2 classifier (https://docs.qiime2.org/2023.2/data-resources/)
- dada2 training set (https://benjjneb.github.io/dada2/training.html)
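For illustration, a minimal mapping file could be created like this; the sample IDs and barcode sequences below are placeholders, and a real file may carry additional metadata columns after the first two:

```bash
# Hypothetical tab-delimited mapping file; IDs and barcodes are made up.
printf '#SampleID\tBarcodeSequence\n' >  test_mapping_file.tsv
printf 'Sample.1\tACGTACGTACGT\n'     >> test_mapping_file.tsv
printf 'Sample.2\tTGCATGCATGCA\n'     >> test_mapping_file.tsv
```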
- Create a project directory, e.g. `~/16S_QIIME2/test`, and put the mapping file, e.g. `test_mapping_file.tsv`, in the project directory. If you are running this on the cluster, the data would be staged on a scratch drive, e.g. `/scr1/username`.
- Edit `qiime2_config.yml` so that it suits your project (see the example after this list). In particular:
  - `all: project:` path to the project directory, e.g. `~/16S_QIIME2/test`
  - `all: mux_dir:` the directory containing multiplexed R1/R2 read pairs, e.g. `~/16S_QIIME2/test/multiplexed_fastq`
  - `all: mapping:` the name of the mapping file, e.g. `test_mapping_file.tsv`
- Edit `config.yaml` for platform specific settings (currently formatted for SLURM on republica)
- (Optional) Edit `rules/targets/targets.rules` to comment out steps you don't need (e.g. `#TARGET_PICRUST2`)
- To run the pipeline, activate the environment by entering `conda activate qiime2-2023.2`, `cd` into `16S_QIIME2`, and execute `snakemake --profile ./` (see the consolidated example after this list)
  - If using `sbatch` you can just execute the script `./run_snakemake.bash`
  - You can also do a dry run: `./dryrun_snakemake.bash`
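As referenced above, here is a sketch of what the three `qiime2_config.yml` settings might look like for the test project. The nesting under a top-level `all:` key is assumed from the `all: ...` notation in the list; keep the rest of the shipped config file unchanged and substitute your own paths:

```bash
# Print a sketch of the project-specific qiime2_config.yml entries.
# The "all:" nesting is an assumption based on the key names above.
cat <<'EOF'
all:
  project: ~/16S_QIIME2/test
  mux_dir: ~/16S_QIIME2/test/multiplexed_fastq
  mapping: test_mapping_file.tsv
EOF
```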
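And the consolidated run sequence mentioned above, assuming the repository was cloned into your home directory:

```bash
# Activate the environment, move into the repository, and run the workflow.
conda activate qiime2-2023.2
cd ~/16S_QIIME2
./dryrun_snakemake.bash      # optional: preview the jobs without running them
snakemake --profile ./       # run the pipeline
# On a SLURM cluster, ./run_snakemake.bash submits the run via sbatch instead.
```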
Each step of the pipeline takes the following inputs and produces the following outputs:

- Demultiplexing
  - Input
    - Multiplexed R1/R2 read pairs
    - QIIME2 compatible mapping file
  - Output
    - Demultiplexed fastq(.gz) files
    - Total read count summary (tsv)
    - QIIME2 compatible manifest file (csv)
- Import into QIIME2
  - Input
    - QIIME2 compatible manifest file
    - Demultiplexed fastq files
  - Output
    - QIIME2 PairedEndSequencesWithQuality artifact and corresponding visualization
    - QIIME2-generated demultiplexing stats
- Denoising (DADA2)
  - Input
    - QIIME2 PairedEndSequencesWithQuality artifact
  - Output
    - Feature table (QIIME2 artifact, tsv)
    - Representative sequences (QIIME2 artifact, fasta)
- Taxonomic classification
  - Input
    - Representative sequences
  - Output
    - Taxonomy classification table (QIIME2 artifact, tsv)
- Phylogenetic tree construction
  - Input
    - Representative sequences
  - Output
    - Aligned sequence
    - Masked (aligned) sequence
    - Unrooted tree
    - Rooted tree
- Diversity analysis
  - Input
    - Rooted tree
  - Output
    - Various QIIME2 diversity metric artifacts
    - Faith phylogenetic diversity vector (tsv)
    - Weighted/unweighted UniFrac distance matrices (tsv)
- Unassigner
  - Input
    - Representative sequences (fasta)
  - Output
    - Unassigner output (tsv) for species level classification of representative sequences
- DADA2 species assignment
  - Input
    - Representative sequences (fasta)
  - Output
    - Dada2 species assignments (tsv)
    - Dada2 raw data for loading in R (RData format)
- Vsearch
  - Input
    - Representative sequences (fasta)
  - Output
    - Vsearch report (tsv) customized to be like BLAST results (see config.yml)
    - Vsearch list of representative sequences that aligned (fasta)
- PICRUSt2 (NB: currently picrust2-2021.11_0 does not work with QIIME2 2023.2, but these would be the inputs and outputs if it did)
  - Input
    - Feature table (QIIME2 artifact, tsv)
    - Representative sequences (QIIME2 artifact, fasta)
  - Output
    - KEGG ortholog counts (tsv)
    - Enzyme classification counts (QIIME2 artifact)
    - KEGG pathway counts (QIIME2 artifact)
To update the pipeline to a new QIIME2 version:

- Manually install a new version of QIIME2 using conda (https://docs.qiime2.org/2023.9/install/native/)
- Update the `environment.yml` using `conda activate myenv` followed by `conda env export > environment.yml`
- git commit / push your changes (to your own fork) and create a pull request for PennChopMicrobiomeProgram/16S_QIIME2
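For example, the export-and-commit sequence might look like this; `myenv` is a placeholder for whatever you named the new QIIME2 environment:

```bash
# Export the new environment over the repo's environment.yml and commit it.
conda activate myenv
conda env export > environment.yml
git add environment.yml
git commit -m "Update environment.yml to the new QIIME2 version"
git push   # push to your fork, then open a pull request against the main repo
```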
To build and push the Docker image, run the commands below, replacing `ctbushman` with your own DockerHub repo name so that you can actually push to DockerHub and use the image elsewhere easily.
```bash
cd 16S_QIIME2/
docker build -t ctbushman/16s_qiime2:latest -f Dockerfile .
docker run --rm -it ctbushman/16s_qiime2:latest snakemake -h
docker push ctbushman/16s_qiime2:latest
```
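If the image was already built under `ctbushman` and you only want to publish it under your own account, re-tagging also works; `<your_dockerhub_user>` is a placeholder:

```bash
# Retag the local image under your own DockerHub namespace and push it.
docker tag ctbushman/16s_qiime2:latest <your_dockerhub_user>/16s_qiime2:latest
docker push <your_dockerhub_user>/16s_qiime2:latest
```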
Do this step once the image is on DockerHub and you want to use it on the HPC. The tmp and cache dirs set here help avoid the errors you would otherwise run into when building big images, because the `/tmp/` dir on the login nodes will fill up.
```bash
mkdir /scr1/users/bushmanc1/tmp
mkdir /scr1/users/bushmanc1/cache
export SINGULARITY_TMPDIR=/scr1/users/bushmanc1/tmp
export SINGULARITY_CACHEDIR=/scr1/users/bushmanc1/cache
singularity build /mnt/isilon/microbiome/analysis/software/16s_qiime2_2025.10.sif docker://ctbushman/16s_qiime2:latest
singularity exec --bind /mnt/isilon/microbiome:/mnt/isilon/microbiome /mnt/isilon/microbiome/analysis/software/16s_qiime2_2025.10.sif snakemake --snakefile /app/Snakefile --profile /path/to/project -n
```
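The trailing `-n` makes this a Snakemake dry run; drop it to launch the pipeline for real, using the same placeholder paths:

```bash
# Actual run: identical to the dry run above, minus the -n flag.
singularity exec --bind /mnt/isilon/microbiome:/mnt/isilon/microbiome \
    /mnt/isilon/microbiome/analysis/software/16s_qiime2_2025.10.sif \
    snakemake --snakefile /app/Snakefile --profile /path/to/project
```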