Pythia is a lightweight python library to predict the difficulty of Multiple Sequence Alignments (MSA). Phylogenetic analyzes under the Maximum-Likelihood (ML) model are time and resource intensive. To adequately capture the vastness of tree space, one needs to infer multiple independent trees. On some datasets, multiple tree inferences converge to similar tree topologies, on others to multiple, topologically highly distinct yet statistically indistinguishable topologies. Pythia predicts the degree of difficulty of analyzing a dataset prior to initiating ML-based tree inferences. Predicting the difficulty using Pythia is substantially faster than inferring multiple ML trees using RAxML-NG. Pythia can be used to increase user awareness with respect to the amount of signal and uncertainty to be expected in phylogenetic analyzes, and hence inform an appropriate (post-)analysis setup. Further, it can be used to select appropriate search algorithms for easy-, intermediate-, and hard-to-analyze datasets. Pythia supports DNA, AA, and morphological data in phylip and FASTA format.
The easiest (and recommended) way to install PyPythia is by using conda:
conda install pypythia -c conda-forge
This will install the latest version of Pythia. You can verify the correct installation by running pythia -h
.
For further install instructions see the documentation in the wiki.
For detailed instructions on how to install and use Pythia see the wiki.
If you encounter any trouble using Pythia, have a question, or you find a bug, please feel free to open an issue here.
The paper explaining the details of Pythia is published in MBE:
Haag, J., Höhler, D., Bettisworth, B., & Stamatakis, A. (2022). From Easy to Hopeless - Predicting the Difficulty of Phylogenetic Analyses. Molecular Biology and Evolution, 39(12). https://doi.org/10.1093/molbev/msac254
The same functionality is also available as C library here. Since the C library depends on Coraxlib it is not as easy and fast to use as this python library. If you are only interested in the difficulty of your MSA, we recommend using this Python library. If you want to incorporate the difficulty prediction in a phylogenetic tool, we recommend using the faster C library.
-
A. M. Kozlov, D. Darriba, T. Flouri, B. Morel, and A. Stamatakis (2019) RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference Bioinformatics, 35(21): 4453–4455. https://doi.org/10.1093/bioinformatics/btz305
-
D. Höhler, W. Pfeiffer, V. Ioannidis, H. Stockinger, A. Stamatakis (2022) RAxML Grove: an empirical phylogenetic tree database Bioinformatics, 38(6):1741–1742. https://doi.org/10.1093/bioinformatics/btab863