WGS-FP

Whole Genome Sequencing File Processing

Genomic Analysis Pipeline

Overview

This bash script implements an end-to-end genomic analysis pipeline designed to run on a local server. It takes paired-end raw sequencing reads through quality control, alignment, duplicate marking, base quality score recalibration, and variant calling, using established bioinformatics tools.

Requirements

The following tools must be installed and available in your system PATH:

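- fastp: quality control and trimming
- bwa: alignment with BWA-MEM
- samtools: sorting and indexing BAM files
- gatk: base quality score recalibration
- deepvariant: variant calling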
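Before the first run, it can help to confirm that every tool resolves on PATH. The function below is a minimal sketch (the name `check_tools` is hypothetical, not part of the original script). It uses the shell builtin `command -v` and reports every missing tool instead of stopping at the first one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each tool that is not found on PATH and
# return non-zero if any tool is missing.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing from PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools fastp bwa samtools gatk run_deepvariant || exit 1` near the top of the script stops before any step runs. The DeepVariant name is an assumption: DeepVariant's Docker image provides a `run_deepvariant` entry point, so adjust it to however DeepVariant is invoked on your server.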

Usage

Edit the script to set the following variables:
- INPUT_R1: Path to input FASTQ file for read 1
- INPUT_R2: Path to input FASTQ file for read 2
- REFERENCE_GENOME: Path to reference genome FASTA file
- KNOWN_SITES: Path to known sites VCF file
- OUTPUT_DIR: Path to output directory
- THREADS: Number of threads to use (adjust based on your server's capabilities)
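For illustration, the variable block might look like the following. Every path here is a hypothetical placeholder. `nproc` (GNU coreutils) reports the number of processing units available to the process, which is one reasonable default for THREADS.

```bash
#!/usr/bin/env bash
# Hypothetical example values; replace every path with your own files.
INPUT_R1="/data/sample1_R1.fastq.gz"      # read 1 FASTQ
INPUT_R2="/data/sample1_R2.fastq.gz"      # read 2 FASTQ
REFERENCE_GENOME="/ref/GRCh38.fa"         # reference FASTA
KNOWN_SITES="/ref/known_sites.vcf.gz"     # known variant sites used for BQSR
OUTPUT_DIR="/results/sample1"             # all outputs are written here
# Default to all available processing units; lower this on a shared server.
THREADS="$(nproc)"
```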

Make the script executable:
```
chmod +x genomic_analysis_pipeline.sh
```
Run the script:
```
./genomic_analysis_pipeline.sh
```

Pipeline Steps

- Quality Control and Trimming: Uses fastp to perform quality control and trimming on the input FASTQ files.
- Alignment to Reference Genome: Aligns the trimmed reads to the reference genome using BWA-MEM.
- Sorting and Indexing BAM Files: Sorts and indexes the aligned reads using samtools.
- Marking Duplicates: Marks duplicate reads in the BAM file.
- Base Quality Score Recalibration (BQSR): Recalibrates base quality scores using GATK and the known sites VCF.
- Variant Calling: Calls variants using DeepVariant.
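The six steps above could be wired together roughly as follows. This is a sketch under assumptions, not the original script: the intermediate file names, the read-group string (including `PL:ILLUMINA`), the use of `gatk MarkDuplicates` for step 4 (the README does not name a tool), and the `run_deepvariant` entry point are all illustrative choices. It also assumes the reference is already indexed for BWA (`bwa index`), has a `.fai` index (`samtools faidx`) and a sequence dictionary (`gatk CreateSequenceDictionary`), and that the known sites VCF is indexed, since BWA and GATK expect those files.

```bash
#!/usr/bin/env bash
# Sketch of the six pipeline steps. File names, read-group values, and the
# duplicate-marking tool are assumptions, not taken from the original script.
set -euo pipefail

# Hypothetical sample name used in the read group.
SAMPLE_ID="${SAMPLE_ID:-sample1}"

step_qc() {
    # 1. Quality control and trimming of the paired reads.
    fastp -i "$INPUT_R1" -I "$INPUT_R2" \
          -o "$OUTPUT_DIR/trimmed_R1.fastq.gz" -O "$OUTPUT_DIR/trimmed_R2.fastq.gz" \
          -h "$OUTPUT_DIR/fastp.html" -j "$OUTPUT_DIR/fastp.json" \
          -w "$THREADS"
}

step_align_sort_index() {
    # 2-3. Align with BWA-MEM, stream into coordinate sorting, then index.
    bwa mem -t "$THREADS" \
        -R "@RG\tID:${SAMPLE_ID}\tSM:${SAMPLE_ID}\tPL:ILLUMINA" \
        "$REFERENCE_GENOME" \
        "$OUTPUT_DIR/trimmed_R1.fastq.gz" "$OUTPUT_DIR/trimmed_R2.fastq.gz" \
      | samtools sort -@ "$THREADS" -o "$OUTPUT_DIR/sorted.bam" -
    samtools index "$OUTPUT_DIR/sorted.bam"
}

step_mark_duplicates() {
    # 4. Mark (not remove) duplicate reads; metrics go to a text file.
    gatk MarkDuplicates \
        -I "$OUTPUT_DIR/sorted.bam" \
        -O "$OUTPUT_DIR/dedup.bam" \
        -M "$OUTPUT_DIR/dedup_metrics.txt"
    samtools index "$OUTPUT_DIR/dedup.bam"
}

step_bqsr() {
    # 5. Build the recalibration table from known sites, then apply it.
    gatk BaseRecalibrator \
        -I "$OUTPUT_DIR/dedup.bam" -R "$REFERENCE_GENOME" \
        --known-sites "$KNOWN_SITES" \
        -O "$OUTPUT_DIR/recal.table"
    gatk ApplyBQSR \
        -I "$OUTPUT_DIR/dedup.bam" -R "$REFERENCE_GENOME" \
        --bqsr-recal-file "$OUTPUT_DIR/recal.table" \
        -O "$OUTPUT_DIR/recal.bam"
    samtools index "$OUTPUT_DIR/recal.bam"
}

step_call_variants() {
    # 6. Call variants with DeepVariant's WGS model, writing VCF and gVCF.
    run_deepvariant --model_type=WGS \
        --ref="$REFERENCE_GENOME" \
        --reads="$OUTPUT_DIR/recal.bam" \
        --output_vcf="$OUTPUT_DIR/variants.vcf.gz" \
        --output_gvcf="$OUTPUT_DIR/variants.g.vcf.gz" \
        --num_shards="$THREADS"
}

run_pipeline() {
    mkdir -p "$OUTPUT_DIR"
    step_qc
    step_align_sort_index
    step_mark_duplicates
    step_bqsr
    step_call_variants
}
```

With the Usage variables set, calling `run_pipeline` runs the steps in order. Because of `pipefail`, a failure in `bwa mem` is not hidden by the downstream `samtools sort`. The read group is included because GATK's BaseRecalibrator expects every read to carry one.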

Output

The script generates several output files in the specified output directory, including:

- Trimmed FASTQ files
- Aligned, sorted, and indexed BAM files
- Recalibrated BAM file
- VCF and gVCF files containing called variants
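Once a run finishes, a quick sanity check on the outputs can catch truncated files. The helpers below are hypothetical: `samtools quickcheck` exits non-zero if a BAM is malformed or truncated, and the variant count is simply the number of non-header lines in the compressed VCF. The function names and the `*.bam` glob are assumptions about how the output directory is laid out.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking pipeline outputs.

# Verify every BAM in a directory; quickcheck -v prints the names of bad files.
check_bams() {
    samtools quickcheck -v "$1"/*.bam
}

# Print the number of variant records (non-header lines) in a gzipped VCF.
count_variants() {
    # grep -c prints 0 but exits 1 when nothing matches; || true keeps that 0.
    gzip -dc "$1" | grep -vc '^#' || true
}
```

For example, `count_variants "$OUTPUT_DIR/variants.vcf.gz"` prints the record count; bgzipped VCFs decompress with plain `gzip -dc`.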

Error Handling

The script exits as soon as any command fails, so an error in one step stops the run instead of passing missing or incomplete files to later steps.
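The README does not show how the script achieves this; a common bash idiom is `set -e`. The sketch below is an assumption about that header, extended with `-u` and `pipefail`: without `pipefail`, a failure on the left side of a pipe such as `bwa mem ... | samtools sort` is masked by the exit status of the last command. The `ERR` trap is an optional addition that names the failing command and line before the shell exits.

```bash
#!/usr/bin/env bash
# Stop on the first failing command (-e), treat unset variables as errors (-u),
# and fail a pipeline if any command in it fails (pipefail).
set -euo pipefail

# Optional: report the failing command and its line number before exiting.
trap 'echo "error: \"$BASH_COMMAND\" failed at line $LINENO" >&2' ERR
```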

Customization

You can customize the pipeline by changing the parameters passed to each tool. See each tool's documentation for the available options.
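One way to keep such changes in one place is to collect each tool's extra options in a bash array near the top of the script and expand it in the command. This is a hypothetical pattern, not the original script's structure: the array `FASTP_OPTS`, the function `run_fastp`, and the chosen values are illustrative, while `--detect_adapter_for_pe` and `--length_required` are real fastp options.

```bash
#!/usr/bin/env bash
# Hypothetical: extra fastp options gathered in one array. The values are
# illustrative, not recommendations.
FASTP_OPTS=(--detect_adapter_for_pe --length_required 50)

run_fastp() {
    # "${FASTP_OPTS[@]}" expands to one word per array element, so each
    # option stays a separate, correctly quoted argument.
    fastp "${FASTP_OPTS[@]}" \
          -i "$INPUT_R1" -I "$INPUT_R2" \
          -o "$OUTPUT_DIR/trimmed_R1.fastq.gz" -O "$OUTPUT_DIR/trimmed_R2.fastq.gz" \
          -w "$THREADS"
}
```

Changing a trimming parameter then means editing only the array; the command line itself stays untouched.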

Notes

- This pipeline is designed for whole genome sequencing (WGS) data.
- Ensure the output directory has enough free disk space for the intermediate and final files.
- A run may take several hours, depending on the size of the input data and the computational resources available.
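A hypothetical pre-run disk check is sketched below. It reads the available space on the output filesystem from `df -Pk` (POSIX output format, 1024-byte blocks) and compares it with a threshold you supply; the README gives no figure, so the threshold is yours to choose.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the filesystem holding DIR has at
# least MIN_GB gibibytes available.
has_free_space() {
    local dir="$1" min_gb="$2" avail_kb
    # Line 2, column 4 of POSIX df output is the available space in KiB.
    avail_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
    (( avail_kb >= min_gb * 1024 * 1024 ))
}
```

For example, once the output directory exists, `has_free_space "$OUTPUT_DIR" 500 || exit 1` (500 is a placeholder) aborts before the first step. Because a run can take hours, starting the script with `nohup ./genomic_analysis_pipeline.sh > pipeline.log 2>&1 &` keeps it running after you log out and leaves a log to inspect.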

Troubleshooting

If you encounter any issues:

- Check that all required tools are properly installed and in your PATH.
- Verify that the input files exist and are readable.
- Ensure you have write permission in the output directory.
- Check the server logs for any error messages.
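The input and output-directory checks can be automated. This hypothetical `preflight` function tests that each input is a readable regular file and that the output directory exists and is writable, reporting every problem it finds rather than only the first. Creating the output directory when it is missing is an assumption about the desired behavior.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check for the inputs and output directory named in
# the Usage section. Returns non-zero if any check fails.
preflight() {
    local f status=0
    for f in "$INPUT_R1" "$INPUT_R2" "$REFERENCE_GENOME" "$KNOWN_SITES"; do
        if [[ ! -f "$f" || ! -r "$f" ]]; then
            echo "missing or unreadable: $f" >&2
            status=1
        fi
    done
    # Try to create the output directory; the test below reports any failure.
    mkdir -p "$OUTPUT_DIR" 2>/dev/null || true
    if [[ ! -d "$OUTPUT_DIR" || ! -w "$OUTPUT_DIR" ]]; then
        echo "output directory missing or not writable: $OUTPUT_DIR" >&2
        status=1
    fi
    return "$status"
}
```

Calling `preflight || exit 1` right after the variables are set turns these checks into an automatic gate before any tool runs.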

For further assistance, please contact your system administrator or bioinformatics support team.
