AIRLoM (Adaptive Immune-Receptor Locus Mapper)

Description

This repository contains a bioinformatic tool developed to predict the structure of the IGH locus of any desired genome. It relies on commonly used alignment tools together with purpose-built analysis scripts. The information produced is used to predict the location of the IGH-V, -D, -J, and -C segments within the IGH locus.

Structure

This section lists the general steps performed by AIRLoM to give an overview of the pipeline.

Confirm variables.
Make CDHIT reductions.
Align the RSS reductions with CLUSTALW2.
Make BLAST analysis.
Convert BLAST m6 table to gff.
Extract scaffolds of interest.
Make EXONERATE analysis.
Clean vulgar format.
Make HMMER analysis for RSS.
Filter EXONERATE files for exons and genes hits.
Reduce to eliminate redundancy in filtered EXONERATE files.
Make overlap analysis to detect V segments with exon and signal peptide (SP).
Make MINIPROT analysis.
Correct V segments and RSS-J coordinates.
Predict D segments based on the RSS-D found.

MAIN MODULES (MASTER AND FUNCTION)

AIRLoM is controlled by a master script, structured as a series of functions located in the fun.sh script. This modular design gives finer control over every step of the analysis and makes it possible to exclude parts of it.

SUB MODULES (SUBSCRIPTS)

Subscripts were written to format the results obtained, reduce redundancy among coordinates, and perform the overlap analysis.

blast_m6_to_gff.R

This script converts BLAST output format 6 (tabular with 11 columns) into a GFF file, applying some filters along the way.

Syntax

blastm6_to_gff.py --file [FILE] --source [SOURCE] --bitscore [BITSCORE]

Options

Options	Description
file	The file produced by blast with output format 6 (tabular).
source	The mode in which blast was performed, e.g. blastn, blastp, tblastx, ...
bitscore	Bitscore reported by BLAST, used as an additional filter on the results. If 0, all results are kept.
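
The conversion described above can be illustrated with a minimal Python sketch. It is not the repository's script: it assumes the standard 12-column default layout of BLAST output format 6 (which may differ from the column set the script actually reads), treats the bitscore option as a minimum threshold, and uses a hypothetical function name, feature type (`match`), and `ID=` attribute.

```python
def blast_m6_line_to_gff(line, source, min_bitscore=0.0):
    """Convert one BLAST outfmt 6 line to a GFF line, or None if filtered out."""
    fields = line.rstrip("\n").split("\t")
    # Assumed default column order: qseqid sseqid pident length mismatch
    # gapopen qstart qend sstart send evalue bitscore
    qseqid, sseqid = fields[0], fields[1]
    sstart, send = int(fields[8]), int(fields[9])
    bitscore = fields[11].strip()
    if float(bitscore) < min_bitscore:
        return None  # hit does not pass the bitscore filter
    # BLAST reports minus-strand hits with sstart > send; GFF needs start <= end
    strand = "+" if sstart <= send else "-"
    start, end = min(sstart, send), max(sstart, send)
    attributes = f"ID={qseqid}"  # hypothetical attribute layout
    return "\t".join(
        [sseqid, source, "match", str(start), str(end), bitscore, strand, ".", attributes]
    )


example = "IGHV1\tscaffold_7\t95.0\t300\t15\t0\t1\t300\t5300\t5001\t1e-100\t450\n"
print(blast_m6_line_to_gff(example, "blastn"))
```

With a threshold of 0 every hit passes, which matches the behaviour described for the bitscore option.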

vulgar_to_table.R

This script takes the vulgar format from the exonerate analysis and transforms the vulgar syntax into a table of condensed results.

Syntax

vulgar_to_table.R --file [FILE]

Options

Options	Description
file	The filtered exonerate result file containing only the records with vulgar formats.
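
To show what condensing a vulgar line involves, here is a small Python sketch. It assumes the usual exonerate layout (`vulgar:` followed by nine header fields and then repeated label/query-length/target-length triplets); the function name and the output keys are hypothetical and do not reflect the columns produced by vulgar_to_table.R.

```python
def parse_vulgar(line):
    """Parse one exonerate vulgar line into a flat dictionary."""
    tokens = line.split()
    if not tokens or tokens[0] != "vulgar:":
        raise ValueError("not a vulgar line")
    record = {
        "query": tokens[1],
        "q_start": int(tokens[2]),
        "q_end": int(tokens[3]),
        "q_strand": tokens[4],
        "target": tokens[5],
        "t_start": int(tokens[6]),
        "t_end": int(tokens[7]),
        "t_strand": tokens[8],
        "score": int(tokens[9]),
    }
    # Remaining tokens come in triplets: label, query length, target length
    triplets = tokens[10:]
    record["blocks"] = [
        (triplets[i], int(triplets[i + 1]), int(triplets[i + 2]))
        for i in range(0, len(triplets) - 2, 3)
    ]
    # Condensed summary: number of intron blocks (label "I")
    record["n_introns"] = sum(1 for label, _, _ in record["blocks"] if label == "I")
    return record


example = ("vulgar: IGHV1 0 98 . scaffold_7 5000 5400 + 480 "
           "M 15 45 5 0 2 I 0 102 3 0 2 M 83 249")
print(parse_vulgar(example)["n_introns"])
```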

hmmer_tbl_to_gff.py

This script transforms hmmer tbl format into a gff file.

Syntax

hmmer_tbl_to_gff.py --file [FILE]

Options

Options	Description
file	The file produced by HMMER analysis, in tbl format.
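
As an illustration only, the sketch below converts a HMMER table to GFF lines in Python. It assumes the table was written by `nhmmer --tblout` (whitespace-separated, with alignment coordinates in columns 7 and 8 and the strand in column 12); the README does not state which HMMER program is used, and the function name, the `RSS` feature type, and the attribute layout are hypothetical.

```python
def nhmmer_tbl_to_gff(lines, source="nhmmer"):
    """Convert nhmmer --tblout lines into a list of GFF lines."""
    gff = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip comment and blank lines
        # Assumed columns: target, acc, query, acc, hmmfrom, hmmto, alifrom,
        # alito, envfrom, envto, sqlen, strand, E-value, score, bias, description
        f = line.split(maxsplit=15)
        target, query = f[0], f[2]
        ali_from, ali_to = int(f[6]), int(f[7])
        strand, evalue, score = f[11], f[12], f[13]
        # Minus-strand hits are reported with alifrom > alito
        start, end = sorted((ali_from, ali_to))
        attributes = f"Name={query};evalue={evalue}"  # hypothetical layout
        gff.append("\t".join(
            [target, source, "RSS", str(start), str(end), score, strand, ".", attributes]
        ))
    return gff


example = [
    "# target name  accession  query name ...",
    "scaffold_7 - RSS_V - 1 28 5420 5393 5380 5430 100000 - 2.1e-05 18.4 0.3 -",
]
print(nhmmer_tbl_to_gff(example)[0])
```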

gff_disambiguation.R

This script takes the filtered exonerate files (genes or exons) and reduces the overlapping sequences in order to remove the ambiguity produced by matches located at the same coordinates. The result is one genomic range for each group of overlapping individual ranges.

Syntax

gff_disambiguation.R --file [FILE]

Options

Options	Description
file	The files resul
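
The reduction of overlapping ranges can be sketched in plain Python as follows. This is an assumption-laden illustration, not the R implementation: the function name and the tuple layout are hypothetical, ranges are grouped by sequence and strand, and only strictly overlapping ranges are merged (whether directly adjacent ranges should also be merged is a design choice not covered by the README).

```python
from collections import defaultdict


def reduce_ranges(ranges):
    """Merge overlapping (seqid, strand, start, end) ranges, 1-based inclusive."""
    grouped = defaultdict(list)
    for seqid, strand, start, end in ranges:
        grouped[(seqid, strand)].append((start, end))

    merged = []
    for (seqid, strand), intervals in sorted(grouped.items()):
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlaps the current range: extend it
                cur_end = max(cur_end, end)
            else:
                # Gap found: close the current range and start a new one
                merged.append((seqid, strand, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((seqid, strand, cur_start, cur_end))
    return merged


hits = [("s1", "+", 100, 200), ("s1", "+", 150, 300), ("s1", "+", 400, 500)]
print(reduce_ranges(hits))
```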

predict_ighv_by_overlaps.R

This script takes two filtered files: the reduced exon and gene records from the EXONERATE results. To determine whether a V segment has both its structural exon and its signal peptide, at least two exons need to be detected in each gene record. To do this, the number of exons per gene is counted; every gene with two or more exons is annotated as a gene, whereas genes with one or no associated exons are annotated as pseudogenes. Every record is numbered so that each segment receives a unique name.

Syntax

SCRIPTS/SUBSCRIPTS/predict_ighv_by_overlaps.R --query [FILE] --subject [FILE]

Options

Options	Description
query	The reduced gff file from filtered exon annotated records from EXONERATE
subject	The reduced gff file from filtered gene annotated records from EXONERATE
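
The counting rule described above (two or more exons means gene, otherwise pseudogene) can be sketched in Python as below. The function name, the tuple layout, and the naming scheme (`IGHV_gene_1`, `IGHV_pseudogene_1`) are hypothetical, and strand is ignored for brevity; the actual script may name and match records differently.

```python
def classify_v_segments(genes, exons):
    """Label each gene range as gene or pseudogene from its overlapping exons.

    genes and exons are lists of (seqid, start, end), 1-based inclusive.
    """
    counters = {"gene": 0, "pseudogene": 0}
    results = []
    for seqid, g_start, g_end in sorted(genes):
        # Count exon ranges that overlap this gene range on the same sequence
        n_exons = sum(
            1
            for e_seqid, e_start, e_end in exons
            if e_seqid == seqid and e_start <= g_end and e_end >= g_start
        )
        kind = "gene" if n_exons >= 2 else "pseudogene"
        counters[kind] += 1
        results.append({
            "name": f"IGHV_{kind}_{counters[kind]}",  # hypothetical naming scheme
            "seqid": seqid,
            "start": g_start,
            "end": g_end,
            "n_exons": n_exons,
        })
    return results


genes = [("s1", 1000, 1500), ("s1", 3000, 3400)]
exons = [("s1", 1000, 1045), ("s1", 1150, 1500), ("s1", 3100, 3400)]
for record in classify_v_segments(genes, exons):
    print(record["name"], record["n_exons"])
```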

locate_nearby_rss.R

This script detects which RSS is located near each V/J segment. This is achieved by extending the coordinates of each RSS match at both its start and its end, in order to create an artificial overlap with the nearby V/J segment.

Syntax

locate_nearby_rss.R -n [FILE] -t [FILE] -v [FILE] -r [INT] -m [STRING]

Options

Options	Description
n	The HMMER gff file
t	The HMMER tbl file
v	The overlap gff file prediction containing the names of genes and pseudogenes
r	The number of positions by which to extend both sides of the HMMER matches when looking for possible overlaps with the V segments
m	Mode. Depends on the input files. One of the following: [V_segments/J_segments]
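
The extend-then-overlap idea can be sketched in Python as follows. The function name and the tuple layout are hypothetical; `flank` plays the role of the `-r` option, and the real script may additionally use strand or score information from the HMMER files.

```python
def find_nearby_rss(rss_hits, segments, flank):
    """Pair each segment with RSS hits that overlap it after extension.

    rss_hits: list of (seqid, start, end, rss_name)
    segments: list of (seqid, start, end, segment_name)
    flank:    number of positions added to both sides of every RSS hit
    """
    pairs = []
    for r_seqid, r_start, r_end, r_name in rss_hits:
        # Extend the RSS hit on both sides, never going below position 1
        ext_start = max(1, r_start - flank)
        ext_end = r_end + flank
        for s_seqid, s_start, s_end, s_name in segments:
            if r_seqid == s_seqid and ext_start <= s_end and ext_end >= s_start:
                pairs.append((s_name, r_name))
    return pairs


segments = [("s1", 1000, 1300, "V1"), ("s1", 5000, 5300, "V2")]
rss_hits = [("s1", 1310, 1337, "rss_a"), ("s1", 9000, 9027, "rss_b")]
print(find_nearby_rss(rss_hits, segments, flank=20))
```

In this toy input the RSS hit starts 10 bp after the V segment ends, so it is paired only once the hit has been extended.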

predict_ighd_by_rss.R

This script detects the probable location of true D segments. The reasoning is that a true D segment should be flanked by D RSS signals at both its 5' and 3' ends. Every genomic region delimited by both ends that is less than 50 bp long is considered a potential true D segment.

Syntax

predict_ighd_by_rss.R -f [FILE] -t [FILE]

Options

Options	Description
f	The HMMER gff file produced with the 5' RSS database
t	The HMMER gff file produced with the 3' RSS database
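
A minimal Python sketch of this reasoning is shown below. It rests on several assumptions that the README does not confirm: the 50 bp limit is applied to the gap between the end of the 5' RSS hit and the start of the 3' RSS hit, only forward-strand hits are considered, and the function name and tuple layout are hypothetical.

```python
def predict_d_segments(rss5_hits, rss3_hits, max_len=50):
    """Return candidate D regions lying between a 5' RSS and a 3' RSS hit.

    Both inputs are lists of (seqid, start, end), 1-based inclusive,
    assumed to be on the forward strand.
    """
    candidates = []
    for seqid5, _, end5 in sorted(rss5_hits):
        for seqid3, start3, _ in sorted(rss3_hits):
            if seqid5 != seqid3:
                continue
            # Region strictly between the two RSS hits
            d_start, d_end = end5 + 1, start3 - 1
            length = d_end - d_start + 1
            if 0 < length < max_len:
                candidates.append((seqid5, d_start, d_end))
    return candidates


rss5 = [("s1", 1000, 1027)]
rss3 = [("s1", 1045, 1072), ("s1", 2000, 2027)]
print(predict_d_segments(rss5, rss3))
```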

DOCKER

The script has many dependencies that can be tedious to install, test, and place in the required locations. To avoid this, we designed a Docker image that provides a clean, ready-to-use environment for running the genomic analysis with this script.

DEPENDENCIES

These are the dependencies, and the versions used, that the script needs to run the full analysis. Only programs that are not preinstalled on a fresh Linux installation are listed.

Programming Language	Library, package	Use in	Version
Conda	---	---	---
python	pandas	---	---
R	rtracklayer	---	---
R	GenomicRanges	---	---
