Richard H Adams, Todd A Castoe.
Abstract
Genome-scale species tree inference is largely restricted to heuristic approaches that use estimated gene trees to reconstruct species-level relationships. Central to these heuristic species tree methods is the assumption that gene trees are estimated without error. To increase the accuracy of the input gene trees used to infer species trees, several techniques have recently been developed for constructing longer "supergenes" that represent sets of loci inferred to share the same genealogical history. Although these supergene methods concatenate several loci into "supergenes" to increase the amount of data available for estimating each gene tree, and thereby its accuracy, no formal protocols have been proposed to validate this key concatenation step. In a recent study, we developed several supergene validation strategies for assessing the accuracy of a popular supergene method: the so-called "statistical binning" pipeline. In this article, we describe a more generalizable, model-based "supergene validation" protocol for assessing the accuracy of supergenes and supergene methods using model-based tests of phylogenetic congruence.
• Supergenes are validated by adopting model-based tests of topological congruence.
• These model-based procedures outperform non-model-based methods for supergene construction.
• The results of this protocol can be used to assess the overall performance of a supergene method across a phylogenomic dataset.
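The likelihood-ratio idea behind the protocol can be illustrated with a short sketch. It is a hypothetical helper, not the authors' code or Concatepillar's implementation: the names `chi2_sf`, `branch_length_df` and `supergene_lrt` are invented here, the log-likelihoods are assumed to come from an external maximum-likelihood program, and the parameter count assumes only branch lengths differ between models. Because tree topology is not a continuous parameter, the chi-square reference distribution is only an approximation; a parametric bootstrap is often preferred for topology-level tests.

```python
"""Sketch of a likelihood-ratio test for one candidate supergene.

Hypothetical helper, not the authors' implementation. Log-likelihoods are
assumed to come from an external ML program:
- lnl_joint: all loci concatenated and forced onto one tree ("true" model);
- lnl_separate: each locus fitted to its own tree ("false" model).
"""
import math
from typing import Sequence


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square(df) variable (stdlib only).

    Evaluates Q(df/2, x/2), the regularized upper incomplete gamma function,
    via the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    if df < 1:
        raise ValueError("df must be >= 1")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y)
    while a < df / 2.0:
        # add y**a * exp(-y) / Gamma(a + 1), computed in log space
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def branch_length_df(n_taxa: int, n_loci: int) -> int:
    """Extra free parameters of the separate-trees model (assumption).

    Counts only the 2n - 3 branch lengths of each additional unrooted tree;
    substitution-model parameters are assumed to be shared across loci.
    """
    return (n_loci - 1) * (2 * n_taxa - 3)


def supergene_lrt(lnl_separate: Sequence[float], lnl_joint: float,
                  df: int, alpha: float = 0.05):
    """Return (statistic, p-value, selected model) for one supergene."""
    stat = 2.0 * (sum(lnl_separate) - lnl_joint)
    p = chi2_sf(stat, df)
    # A significant improvement from separate trees rejects the shared tree.
    return stat, p, ("false" if p < alpha else "true")


if __name__ == "__main__":
    # Made-up log-likelihoods for a two-locus supergene on five taxa.
    df = branch_length_df(n_taxa=5, n_loci=2)
    print(supergene_lrt([-1000.0, -1200.0], lnl_joint=-2215.0, df=df))
```

Running the test over every supergene produced by a method yields one "true"/"false" call per supergene, which is the input needed for a dataset-wide summary.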
Keywords: Bioinformatics; Coalescent; Concatenation; Phylogenetic inference; Phylogenomics; Species trees; Supergene; Supergene validation protocol
Year: 2019 PMID: 31667118 PMCID: PMC6812401 DOI: 10.1016/j.mex.2019.09.025
Source DB: PubMed Journal: MethodsX ISSN: 2215-0161
Fig. 1 General framework for the supergene validation protocol as applied in Ref. [15]. First, a set of supergenes is obtained (a) either using an a priori concatenation strategy (i.e., partitioning loci into supergenes assumed to share a common tree) or using a heuristic supergene method (such as statistical binning). Here a supergene is depicted as a set of loci (colored alignments) that have been concatenated to form a single alignment. After a set of supergenes has been constructed, we show three approaches (b), with associated example toolsets, that can be used to assess supergene validity: (i) Likelihood Ratio Tests (LRTs), (ii) Tree Topology Tests (TTTs), and (iii) Bayes Factor (BF) model comparison. For the LRTs, the fit of two competing models (“true” vs “false” supergene models) is compared, and the better-fitting model indicates the reliability of the supergene (top box). A number of TTTs can be used to quantify the number of loci that reject or support the overall supergene tree topology (i.e., colored vs gray tree shown in center box). BFs can be used in a similar manner to LRTs to compare the fit of “true” and “false” supergene models (bottom box). Finally, the results of the supergene validation tests can be used to summarize the overall performance of the supergene method by counting the supergenes that selected the “true” or “false” supergene model, providing estimates of the true positive and false positive rates of the supergene method, respectively. Asterisks (*) indicate specific analyses used in the original supergene validation study of [15] (Concatepillar and SH tests).
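The Bayes factor and tree-topology-test panels of Fig. 1, together with the final dataset-level summary, can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the names `bf_verdict`, `sh_verdict` and `summarize` are hypothetical; marginal log-likelihoods and per-locus SH-test p-values are assumed to be computed elsewhere; the threshold 2 ln BF > 2 follows the common Kass and Raftery convention for "positive" evidence; and the rule that a single rejecting locus marks a supergene as "false" is an assumption chosen for simplicity. Supergenes favouring the "false" model are treated here as false positives of the supergene method.

```python
"""Sketch: turn per-supergene test results into a method-level summary.

Assumptions (hypothetical, not the authors' code):
- marginal log-likelihoods and per-locus SH-test p-values are computed
  elsewhere and passed in as plain floats;
- a supergene is called "false" when 2 ln BF > threshold, or when any of its
  loci rejects the concatenated (supergene) topology at level alpha.
"""
from collections import Counter
from typing import Iterable, Sequence


def bf_verdict(lnml_true: float, lnml_false: float,
               threshold: float = 2.0) -> str:
    """Compare a shared-tree ("true") model with a separate-trees ("false") model.

    2 ln BF is computed in favour of the "false" model; weak or absent
    evidence defaults to the simpler "true" model.
    """
    two_ln_bf = 2.0 * (lnml_false - lnml_true)
    return "false" if two_ln_bf > threshold else "true"


def sh_verdict(locus_pvalues: Sequence[float], alpha: float = 0.05) -> str:
    """Call a supergene "false" if any constituent locus rejects its topology."""
    n_reject = sum(1 for p in locus_pvalues if p < alpha)
    return "false" if n_reject > 0 else "true"


def summarize(verdicts: Iterable[str]) -> dict:
    """Count model selections across all supergenes in a dataset."""
    counts = Counter(verdicts)
    total = counts["true"] + counts["false"]
    return {
        "true": counts["true"],
        "false": counts["false"],
        # proportion of supergenes not supported as sharing one tree
        "prop_false": counts["false"] / total if total else 0.0,
    }


if __name__ == "__main__":
    # Made-up inputs for four supergenes.
    verdicts = [
        bf_verdict(-5000.0, -4990.0),  # 2 ln BF = 20 -> "false"
        bf_verdict(-5000.0, -5003.0),  # 2 ln BF = -6 -> "true"
        sh_verdict([0.2, 0.6]),        # no locus rejects -> "true"
        sh_verdict([0.01, 0.3]),       # one locus rejects -> "false"
    ]
    print(summarize(verdicts))  # {'true': 2, 'false': 2, 'prop_false': 0.5}
```

Keeping the per-supergene decision separate from the summary makes it easy to swap in a different test (LRT, BF or SH) without changing how performance is tallied across the dataset.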
| Subject Area: | |
| More specific subject area: | |
| Method name: | |
| Name and reference of original method: | Adams, R.H., and Castoe, T.A. 2019. Statistical binning suffers profound model violation due to gene tree error incurred by trying to avoid gene tree error. Accepted at Molecular Phylogenetics and Evolution. |
| Resource availability: | |