Kevin M Kocot, Mathew R Citarella, Leonid L Moroz, Kenneth M Halanych.
Abstract
Molecular phylogenetics relies on accurate identification of orthologous sequences among the taxa of interest. Most orthology inference programs available for use in phylogenomics rely on small sets of pre-defined orthologs from model organisms or phenetic approaches such as all-versus-all sequence comparisons followed by Markov graph-based clustering. Such approaches have high sensitivity but may erroneously include paralogous sequences. We developed PhyloTreePruner, a software utility that uses a phylogenetic approach to refine orthology inferences made using phenetic methods. PhyloTreePruner checks single-gene trees for evidence of paralogy and generates a new alignment for each group containing only sequences inferred to be orthologs. Importantly, PhyloTreePruner takes into account support values on the tree and avoids unnecessarily deleting sequences in cases where a weakly supported tree topology incorrectly indicates paralogy. A test of PhyloTreePruner on a dataset generated from 11 completely sequenced arthropod genomes identified 2,027 orthologous groups sampled for all taxa. Phylogenetic analysis of the concatenated supermatrix yielded a generally well-supported topology that was consistent with the current understanding of arthropod phylogeny. PhyloTreePruner is freely available from http://sourceforge.net/projects/phylotreepruner/.
Keywords: gene tree; orthology; paralogy; phylogenomic
Year: 2013 PMID: 24250218 PMCID: PMC3825643 DOI: 10.4137/EBO.S12813
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Figure 1. Illustration of the PhyloTreePruner tree-pruning algorithm. (A) PhyloTreePruner reads the single-gene tree and corresponding alignment file. (B) Nodes in the single-gene tree with support values below the user-defined threshold are identified (red box) and (C) collapsed into polytomies (green box). (D) PhyloTreePruner identifies the maximally inclusive subtree in which all taxa are represented by exactly one sequence, or, if there is more than one sequence from a taxon, these sequences form a monophyletic clade or are part of the same polytomy (green box). In this example, PhyloTreePruner identifies a potential paralogy issue with the Ixodes sequences (red box). This example shows the necessity of correct single-gene tree rooting. (E) PhyloTreePruner deletes sequences inferred to be paralogs from the tree and the corresponding sequence alignment file (red boxes). (F) In cases where more than one sequence remains from the same taxon, PhyloTreePruner selects the longest sequence and deletes all others (green boxes). This step can be skipped if preferred and another method (e.g., SCaFoS) can be used to select the best sequence for each taxon.
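Steps D to F of the caption can be illustrated with a minimal Python sketch. It is not the PhyloTreePruner implementation, and it rests on several assumptions: the tree is already rooted and collapsed, a leaf is a string labelled `Taxon|sequence_id` (a hypothetical convention), an internal node is a list of children, "maximally inclusive" is taken to mean the paralogy-free subtree with the most distinct taxa, and sequence length is counted without gap characters. The function names and the toy data are invented for illustration only.

```python
from collections import Counter

def taxon(label):
    # Assumed (hypothetical) label convention: "Taxon|sequence_id".
    return label.split("|")[0]

def leaves(tree):
    """Return all leaf labels under a node (leaf = str, internal node = list)."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]

def is_paralogy_free(tree):
    """True if every taxon with several sequences forms its own clade
    or sits in a single polytomy (simplified reading of step D)."""
    if isinstance(tree, str):
        return True
    child_taxa = [set(map(taxon, leaves(child))) for child in tree]
    counts = Counter(t for taxa in child_taxa for t in taxa)
    for taxa in child_taxa:
        # A taxon shared with a sibling is acceptable only if this
        # child contains nothing but that taxon.
        if len(taxa) > 1 and any(counts[t] > 1 for t in taxa):
            return False
    return all(is_paralogy_free(child) for child in tree)

def subtrees(tree):
    """Yield the node itself and every subtree below it."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from subtrees(child)

def best_subtree(tree):
    """Paralogy-free subtree with the most taxa (ties: most sequences)."""
    clean = [s for s in subtrees(tree) if is_paralogy_free(s)]
    return max(clean, key=lambda s: (len(set(map(taxon, leaves(s)))),
                                     len(leaves(s))))

def prune(tree, seqs):
    """Steps E and F: keep the best subtree, then the longest sequence per taxon."""
    kept = {}
    for label in leaves(best_subtree(tree)):
        t = taxon(label)
        length = len(seqs[label].replace("-", ""))
        if t not in kept or length > len(seqs[kept[t]].replace("-", "")):
            kept[t] = label
    return sorted(kept.values())

# Toy data (invented): two Ixodes sequences that are not monophyletic.
toy_tree = ["Ixodes|1",
            ["Ixodes|2",
             ["Acyrthosiphon|1", ["Pediculus|1", "Pediculus|2"]]]]
toy_seqs = {"Ixodes|1": "MKALV", "Ixodes|2": "MKALV",
            "Acyrthosiphon|1": "MKALV",
            "Pediculus|1": "MK-LV", "Pediculus|2": "MKALV"}

print(prune(toy_tree, toy_seqs))
```

In this toy tree the root fails the check because one Ixodes sequence is sister to a clade that contains the other, so the search falls back to the largest clean subtree. Which Ixodes sequence survives depends entirely on where the tree is rooted, which is the point the caption makes about correct rooting.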
Figure 2. Example of a single-gene tree showing a weakly supported node (red box) that incorrectly recovers two sequences from the same taxon as paralogs. PhyloTreePruner collapses nodes with support values below a user-defined threshold and allows sequences from multiple taxa to be part of the same polytomy. Thus, PhyloTreePruner would “rescue” this group from being discarded if a minimum support value above 21 was used.
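The collapsing step described in this caption can be sketched in Python as follows. This is a self-contained illustration under stated assumptions, not PhyloTreePruner's own code: the `Node` class, the function name `collapse_weak_nodes`, and the toy tree are hypothetical, and "below the threshold" is read as strictly less than the minimum support value.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: Optional[str] = None      # leaf label; None for internal nodes
    support: Optional[float] = None  # support value of an internal node
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return not self.children

def collapse_weak_nodes(node: Node, min_support: float) -> Node:
    """Return a copy of the tree in which internal nodes with support
    below min_support are dissolved into their parent (a polytomy)."""
    if node.is_leaf():
        return Node(label=node.label)
    new_children = []
    for child in node.children:
        collapsed = collapse_weak_nodes(child, min_support)
        weak = (not collapsed.is_leaf()
                and collapsed.support is not None
                and collapsed.support < min_support)
        if weak:
            # Dissolve the weak node: its children attach to this node.
            new_children.extend(collapsed.children)
        else:
            new_children.append(collapsed)
    return Node(support=node.support, children=new_children)

# Toy tree (invented): a node with support 21 separates A|1 from A|2.
toy = Node(children=[
    Node(support=21, children=[Node("A|1"), Node("B|1")]),
    Node("A|2"),
    Node("C|1"),
])

print([c.label for c in collapse_weak_nodes(toy, 50).children])
```

With a minimum support of 50, the node supported at 21 is dissolved and both sequences from taxon A end up in the same polytomy, so the group is no longer flagged. With a minimum support of 21 or lower, the node is kept and the two A sequences still appear to be paralogs.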
Taxon sampling.
| Taxon | Species | Predicted transcripts |
|---|---|---|
| Chelicerata | | 20,486 |
| Branchiopoda | | 30,930 |
| Insecta | | 10,248 |
| | | 15,419 |
| | | 9,093 |
| | | 14,623 |
| | | 18,883 |
| | | 14,076 |
| | | 9,163 |
| | | 11,194 |
| | | 9,761 |
Note: All sequences were downloaded from InParanoid 7.0 [14] from the “processed sequences” directory.
Figure 3. Phylogram of the most likely tree recovered in the RAxML analysis of the concatenated data matrix. The tick Ixodes was used to root the tree. Bootstrap support values above 50 are shown at each node. Scale bar = 0.05 substitutions per site. Notably, bootstrap support for Paraneoptera (Acyrthosiphon + Pediculus) was weak, consistent with the results of other phylogenomic studies [34,35].