David Dylus1,2,3, Yannis Nevers1,2,3, Adrian M Altenhoff1,4, Antoine Gürtler2,3, Christophe Dessimoz1,2,3,5,6, Natasha M Glover1,2,3.
Abstract
Knowledge of species phylogeny is critical to many fields of biology. In an era of genome data availability, the most common way to make a phylogenetic species tree is by using multiple protein-coding genes, conserved in multiple species. This methodology is composed of several steps: orthology inference, multiple sequence alignment, and inference of the phylogeny with dedicated tools. This can be a difficult task, and orthology inference, in particular, is usually computationally intensive and error-prone if done ad hoc. This tutorial provides protocols to make use of OMA Orthologous Groups, sets of genes that are all orthologous to each other, to infer a phylogenetic species tree. It is designed to be user-friendly and computationally inexpensive, by providing two options: (1) using only precomputed groups with species available on the OMA Browser, or (2) computing orthologs using OMA Standalone for additional species, with the option of using precomputed orthology relations for those present in OMA. A protocol for downstream analyses is provided as well, including creating a supermatrix, tree inference, and visualization. All protocols use publicly available software, and we provide scripts and code snippets to facilitate data handling. The protocols are accompanied by practical examples.
Keywords: OMA; Orthologous Matrix; phylogenetics; phylogenomics; species tree
Year: 2020 PMID: 35722083 PMCID: PMC9194518 DOI: 10.12688/f1000research.23790.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
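The abstract lists "creating a supermatrix" among the downstream analyses. As a rough illustration of that step, the sketch below concatenates per-group alignments into one row per species and pads species missing from a group with gap characters. The function name `build_supermatrix`, the dictionary-based input format, and the toy sequences are assumptions made for this example; they are not the scripts distributed with the tutorial.

```python
from typing import Dict, List


def build_supermatrix(
    alignments: List[Dict[str, str]],
    species: List[str],
    gap: str = "-",
) -> Dict[str, str]:
    """Concatenate per-group alignments into one aligned row per species.

    Each alignment maps a species code to its aligned sequence
    (hypothetical input format). A species absent from a group is
    padded with gap characters so that all rows keep the same length.
    """
    rows: Dict[str, List[str]] = {sp: [] for sp in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for sp in species:
            rows[sp].append(aln.get(sp, gap * width))
    return {sp: "".join(parts) for sp, parts in rows.items()}


# Toy data: two aligned groups over three made-up species codes.
toy_alignments = [
    {"A": "MK-L", "B": "MKAL"},
    {"A": "GG", "C": "GA"},
]
supermatrix = build_supermatrix(toy_alignments, ["A", "B", "C"])
for code, row in supermatrix.items():
    print(code, row)
```

Padding with gaps keeps every row the same length, which is what tree inference programs expect from a concatenated alignment.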
Computational tools needed for making a phylogenetic species tree using OMA.
Note that four phylogenetic tree inference software packages are given, but only one (user preference) is needed to complete this tutorial.
| Tool | Use case | How to get it |
|---|---|---|
| Command line | Mandatory to run the | Installed by default on Unix and Mac |
| Python 3 | Python language | |
| OMA Browser | Needed to import | |
| OMA | Needed to infer orthology | |
| MAFFT | Multiple sequence | |
| High | Needed if a high amount | Institutional infrastructure |
| IQ-Tree | Phylogenetic tree | |
| RAxML | Phylogenetic tree | |
| Phylobayes | Phylogenetic tree | |
| PhyML | Phylogenetic tree | |
| Phylo.io | Phylogenetic tree | |
Figure 1. Exporting data from OMA for building a species tree.
A) Choose which type of data to export from the Download tab on the right-hand side of the home page. B) Select your proteomes from those in the OMA database by using the interactive species tree, which is based on the NCBI taxonomy.
Figure 2. Tree organization of the tarball downloaded through the OMA Browser after exporting an all-against-all of selected species.
The important files and folders are colored. Executable files mentioned in the course of the tutorial are shown in green. Files and folders that will need to be modified are shown in blue. Other files and folders (in black) will not be used in the course of the tutorial. Files and folders not shown are represented by three dots.
Recommended software and example commands for computing a phylogenetic tree.
Parameters, such as memory or threads, may vary based on the size of the dataset.
| Software for making | Example command |
|---|---|
| IQ-Tree | |
| RAxML | |
| PhyloBayes | |
Figure 3. Comparison of phylogenetic trees computed by IQ-TREE, using an LG substitution model (left), and RAxML, using an LG substitution model, a discrete Gamma model of rate heterogeneity with 8 categories, and empirical amino-acid frequencies (right).
Trees were computed with 20 yeast species present in OMA. The leaves of the trees are the UniProt 5-letter species codes. The following export options were used: Minimum species coverage: 1, Maximum nr of markers: -1 (uncapped). 168 marker genes were exported. Visualization was done with phylo.io; different shades of blue show variations in topology. Bootstrap values are reported in red for each bipartition with a bootstrap <100.
Figure 4. Comparison of phylogenetic trees, using additional species, computed by IQ-TREE, under an LG substitution model (left), and RAxML, under an LG substitution model, a discrete Gamma model of rate heterogeneity with 8 categories, and empirical amino-acid frequencies (right).
Trees were computed with 18 yeast species present in OMA, plus two additional proteomes (YEAST and FOMPI). The leaves of the trees are the UniProt 5-letter species codes. Genes used to compute the tree had to be shared by at least 90% of the species (minimum species coverage: 0.9, maximum number of markers: -1). This represents 880 OGs. Visualization was done with phylo.io; different shades of blue show variations in topology (in this case both trees have identical topology). Bootstrap values are reported in red for each bipartition with a bootstrap <100.
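The two export settings named in the captions, minimum species coverage and maximum number of markers, can be read as a simple filter over orthologous groups. The sketch below is one possible interpretation, not the OMA implementation: the function `select_marker_groups`, the set-based input format, and the rule of keeping the best-covered groups first when a cap applies are all assumptions. With 20 species and a coverage of 0.9, a group must contain at least 18 species; a cap of -1 is treated as uncapped, as in the captions.

```python
import math
from typing import Dict, List, Set


def select_marker_groups(
    groups: Dict[str, Set[str]],
    n_species: int,
    min_coverage: float,
    max_markers: int = -1,
) -> List[str]:
    """Return IDs of groups covering at least `min_coverage` of the species.

    `groups` maps a group ID to the set of species codes it contains
    (hypothetical input format). `max_markers = -1` means no cap.
    """
    # Smallest species count that satisfies the coverage fraction; the
    # small epsilon guards against floating-point noise in the product.
    needed = math.ceil(min_coverage * n_species - 1e-9)
    kept = [gid for gid, members in groups.items() if len(members) >= needed]
    # Assumption: when a cap applies, prefer the best-covered groups,
    # breaking ties by group ID so the result is deterministic.
    kept.sort(key=lambda gid: (-len(groups[gid]), gid))
    if max_markers >= 0:
        kept = kept[:max_markers]
    return kept


# Toy data: 20 made-up species codes and three groups of different coverage.
all_species = [f"SP{i:02d}" for i in range(20)]
toy_groups = {
    "OG1": set(all_species),        # 20 of 20 species
    "OG2": set(all_species[:18]),   # 18 of 20 species
    "OG3": set(all_species[:17]),   # 17 of 20 species
}
print(select_marker_groups(toy_groups, n_species=20, min_coverage=0.9))
print(select_marker_groups(toy_groups, n_species=20, min_coverage=1.0))
```

A lower coverage threshold admits more groups at the cost of more missing data in the resulting supermatrix, which is consistent with the captions reporting 168 marker genes at coverage 1 and 880 OGs at coverage 0.9, although those two figures come from different species sets.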