PaReBrick: PArallel REarrangements and BReakpoints Identification Toolkit

Motivation

The high plasticity of bacterial genomes is facilitated by numerous mechanisms, including horizontal gene transfer and recombination via flanking repeats.
Genome rearrangements such as inversions, deletions, insertions, and duplications may occur independently in different strains, leading to parallel adaptation or phenotypic diversity.
Such rearrangements may be responsible for virulence, antibiotic resistance, and antigenic variation.
However, identifying these events often requires laborious manual inspection and verification of phyletic pattern consistency.

Methods and Results

We present PaReBrick, a tool implementing an algorithm for identifying parallel rearrangements in bacterial populations.
We define "parallel rearrangements" as events that occur independently in phylogenetically distant bacterial strains and provide a formalization for calling these events.
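One standard way to make "independent occurrence" precise is convexity: a character with k states is convex on a tree (explainable by a single origin of each state) exactly when its minimal number of state changes equals k - 1. A character needing more changes than that must have arisen more than once. The sketch below illustrates that test with Fitch parsimony on a rooted binary tree written as nested tuples, a stand-in for a Newick tree parsed by ete3. It is a simplified illustration, not PaReBrick's implementation; the tool's actual formalization and parallelism score are defined in the paper.

```python
from typing import Dict, Set, Tuple, Union

# Rooted binary tree as nested tuples; leaves are strain names (strings).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_changes(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate states at this node, minimal state changes in its subtree)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_changes(left, states)
    right_set, right_cost = fitch_changes(right, states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    # Disjoint candidate sets force one extra state change below this node.
    return left_set | right_set, left_cost + right_cost + 1


def is_convex(tree: Tree, states: Dict[str, str]) -> bool:
    """A character with k observed states is convex iff its parsimony length is k - 1."""
    _, changes = fitch_changes(tree, states)
    return changes == len(set(states.values())) - 1


if __name__ == "__main__":
    tree = ((("A", "B"), "C"), ("D", "E"))
    # Inversion shared by sister strains A and B: a single origin suffices.
    print(is_convex(tree, {"A": "inv", "B": "inv", "C": "ref", "D": "ref", "E": "ref"}))
    # Inversion in distant strains A and D: needs two changes, so it is non-convex.
    print(is_convex(tree, {"A": "inv", "B": "ref", "C": "ref", "D": "inv", "E": "ref"}))
```

Here the strain names and the "inv"/"ref" states are hypothetical; any hashable state labels work the same way.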

The tool takes a collection of strains represented as sequences of oriented synteny blocks and a phylogenetic tree as input.
It identifies rearrangements, tests them for consistency with the tree, and ranks events by their parallelism score.
The tool also generates diagrams for each block of interest, facilitating the detection of horizontally transferred blocks, their extra copies, and any inversions involving duplicated blocks.
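To see how a sequence of oriented synteny blocks can be compared across strains, it helps to break each genome into adjacencies between block extremities: block +b is read tail to head, -b head to tail, and each pair of consecutive blocks glues the extremity leaving the first to the extremity entering the second. The following is a generic sketch of that idea, assuming circular chromosomes by default (typical for bacteria); PaReBrick's internal representation may differ. The names `adjacencies`, `first`, and `last` are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

# An extremity is (block id, "t") for the tail or (block id, "h") for the head.
Extremity = Tuple[int, str]
Adjacency = FrozenSet[Extremity]


def adjacencies(chromosome: List[int], circular: bool = True) -> Set[Adjacency]:
    """Return the unordered extremity adjacencies of one chromosome of signed blocks."""

    def first(b: int) -> Extremity:
        # Extremity through which the block is entered when reading left to right.
        return (abs(b), "t" if b > 0 else "h")

    def last(b: int) -> Extremity:
        # Extremity through which the block is left when reading left to right.
        return (abs(b), "h" if b > 0 else "t")

    pairs = list(zip(chromosome, chromosome[1:]))
    if circular and len(chromosome) > 1:
        pairs.append((chromosome[-1], chromosome[0]))  # close the circle
    return {frozenset((last(a), first(b))) for a, b in pairs}


if __name__ == "__main__":
    reference = adjacencies([1, 2, 3])
    inverted = adjacencies([1, -2, 3])  # block 2 inverted
    # An inversion of one block breaks the two adjacencies around it.
    print(sorted(sorted(adj) for adj in reference - inverted))
```

Because adjacencies are unordered pairs, reading the same circular genome in the opposite direction yields the same set, which makes them convenient characters to compare between strains.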

We demonstrated PaReBrick’s efficiency and accuracy, showing its potential for detecting genome rearrangements responsible for pathogenicity and adaptation in bacterial genomes.

Installation

PaReBrick can be installed using pip.
Note that PaReBrick requires Python 3.8 or earlier because of its dependencies. To create and activate a Python 3.8 environment with Conda, run:

conda create -n py38 python=3.8
conda activate py38

Then, install PaReBrick:

pip install PaReBrick

You can now run the tool from any directory with the `PaReBrick` (or `parebrick`) command.

Script Parameters

The main script of the project, which includes all modules, can be run from any location as a console tool.

Required Input

Important: Identifiers in the tree and blocks must match.

`--tree/-t`

Path to a phylogenetic tree in Newick format, parsable by the ete3 library.
For more information about supported formats, see the ete3 documentation.
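Because identifiers in the tree and in the blocks must match, a quick check before a run can save a failed job. The sketch below extracts leaf names from a Newick string with a simplified regular expression (it assumes unquoted names without spaces, parentheses, commas, or colons) and compares them with a list of genome identifiers. How you collect those identifiers from your block files depends on their format, so the function takes them as a plain list. For full Newick support, ete3's `Tree(...).get_leaf_names()` is the more robust choice. The function names here are hypothetical.

```python
import re
from typing import Iterable, Set, Tuple

# Simplified: a leaf name follows "(" or "," and runs until ":", ",", ")" or ";".
LEAF_NAME = re.compile(r"(?<=[(,])\s*([^(),:;\s]+)")


def newick_leaf_names(newick: str) -> Set[str]:
    """Return leaf names; internal node labels (after ")") are not matched."""
    return set(LEAF_NAME.findall(newick))


def id_mismatches(newick: str, genome_ids: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (names found only in the tree, names found only in the blocks)."""
    leaves = newick_leaf_names(newick)
    genomes = set(genome_ids)
    return leaves - genomes, genomes - leaves


if __name__ == "__main__":
    only_tree, only_blocks = id_mismatches("((A:0.1,B:0.2)90:0.3,C:0.4);", ["A", "B", "D"])
    print("only in tree:", only_tree, "only in blocks:", only_blocks)
```

Both returned sets should be empty before running PaReBrick.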

`--blocks_folder/-b`

Path to a folder containing synteny blocks, generated by tools such as Sibelia or maf2synteny.
Refer to BLOCKS-OBTAIN.md for instructions on obtaining synteny blocks using SibeliaZ.

Optional Input

`--labels/-l`

Path to a CSV file with tree labels for visualization.
The file must contain two columns: strain and label.
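A minimal sketch of reading such a file, assuming it starts with a header row naming the two columns `strain` and `label` (the README does not state whether a header is required, so check against the example-data files). The function name `read_labels` and the sample strains are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO


def read_labels(handle: TextIO) -> Dict[str, str]:
    """Map each strain identifier to its display label.

    Assumes a header row naming the columns "strain" and "label".
    """
    reader = csv.DictReader(handle)
    missing = {"strain", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"labels file lacks column(s): {sorted(missing)}")
    return {row["strain"].strip(): row["label"].strip() for row in reader}


if __name__ == "__main__":
    sample = io.StringIO("strain,label\nstrain_A,isolate A\nstrain_B,isolate B\n")
    print(read_labels(sample))
```

In practice you would pass `open("input/labels.csv", newline="")` instead of the in-memory sample; the strain values must match the tree's leaf names.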

`--output/-o`

Path to the output folder.
Default is ./parebrick_output.

Output

The output consists of three main folders:

preprocessed_data: all synteny blocks in infercars, GRIMM, and CSV formats, plus genomes_lengths.csv, which lists the length of each input genome.
balanced_rearrangements_output: a stats.csv file with statistics on non-convex characters derived from balanced rearrangements, and the subfolders characters and tree_colorings, which hold character representations as .csv files and as .pdf tree renderings.
unbalanced_rearrangements_output: the same layout (a stats.csv file plus subfolders with .csv files and .pdf tree renderings), but for unbalanced rearrangements.
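The GRIMM files in preprocessed_data can be handy for your own downstream analysis. The sketch below parses the common GRIMM layout, assuming `>name` header lines, one chromosome per line of signed block ids, `$` (linear) or `@` (circular) terminators, and `#` comments; the exact dialect PaReBrick writes may differ. It then counts block copy numbers per genome, one natural per-block view of unbalanced rearrangements (deletions and duplications change copy numbers, while balanced events such as inversions do not). Both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def parse_grimm(text: str) -> Dict[str, List[List[int]]]:
    """Parse GRIMM-style text into {genome name: [chromosome as signed block ids]}."""
    genomes: Dict[str, List[List[int]]] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            genomes[current] = []
            continue
        if current is None:
            raise ValueError("block line found before any '>genome' header")
        # Remove chromosome terminators; the remaining tokens are signed block ids.
        tokens = line.replace("$", " ").replace("@", " ").split()
        genomes[current].append([int(t) for t in tokens])
    return genomes


def copy_numbers(genomes: Dict[str, List[List[int]]]) -> Dict[str, Counter]:
    """Count occurrences of each block per genome, ignoring orientation."""
    return {
        name: Counter(abs(block) for chrom in chroms for block in chrom)
        for name, chroms in genomes.items()
    }


if __name__ == "__main__":
    sample = ">g1\n1 -2 3 @\n>g2\n# block 2 duplicated\n1 2 2 3 @\n>g3\n1 3 $\n"
    counts = copy_numbers(parse_grimm(sample))
    for name, counter in counts.items():
        print(name, "copies of block 2:", counter[2])
```

A Counter returns 0 for absent blocks, so a deleted block shows up as copy number 0 without special handling.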

Example Run and Data

Example data is available in the example-data folder.

How to Run the Example:

Clone the repository:

git clone https://github.com/ctlab/parallel-rearrangements

Navigate to the example data folder:

cd parallel-rearrangements/example-data/streptococcus_pyogenes

Run PaReBrick using the example input:

PaReBrick -t input/tree.nwk -b input/maf2synteny-output -l input/labels.csv

Or, run with minimal required arguments (without labels):

PaReBrick -t input/tree.nwk -b input/maf2synteny-output

Citation

If you use PaReBrick in your research, please cite:

Alexey Zabelkin, Yulia Yakovleva, Olga Bochkareva, Nikita Alexeev, PaReBrick: PArallel REarrangements and BReakpoints identification toolkit, Bioinformatics, 2021; btab691, https://doi.org/10.1093/bioinformatics/btab691
