The gene tree delusion.

Mark S Springer, John Gatesy.

Abstract

Higher-level relationships among placental mammals are mostly resolved, but several polytomies remain contentious. Song et al. (2012) claimed to have resolved three of these using shortcut coalescence methods (MP-EST, STAR) and further concluded that these methods, which assume no within-locus recombination, are required to unravel deep-level phylogenetic problems that have stymied concatenation. Here, we reanalyze Song et al.'s (2012) data and leverage these reanalyses to explore key issues in systematics, including the recombination ratchet, gene tree stoichiometry, the proportion of gene tree incongruence that results from deep coalescence versus other factors, and simulations that compare the performance of coalescence and concatenation methods in species tree estimation. Song et al. (2012) reported an average locus length of 3.1 kb for the 447 protein-coding genes in their phylogenomic dataset, but the true mean length of these loci (start codon to stop codon) is 139.6 kb. Empirical estimates of recombination breakpoints in primates, coupled with consideration of the recombination ratchet, suggest that individual coalescence genes (c-genes) approach ~12 bp or less for Song et al.'s (2012) dataset, three to four orders of magnitude shorter than the c-genes reported by these authors. This result has general implications for the application of coalescence methods in species tree estimation. We contend that it is illogical to apply coalescence methods to complete protein-coding sequences. Such analyses amalgamate c-genes with different evolutionary histories (i.e., exons separated by >100,000 bp), distort the true gene tree stoichiometry that is required for accurate species tree inference, and contradict the central rationale for applying coalescence methods to difficult phylogenetic problems.
In addition, Song et al.'s (2012) dataset of 447 genes includes 21 loci with switched taxonomic names, eight duplicated loci, 26 loci with non-homologous sequences that are grossly misaligned, and numerous loci with >50% missing data for taxa that are misplaced in their gene trees. These problems were compounded by inadequate tree searches with nearest-neighbor interchange branch swapping and inadvertent application of substitution models that did not account for among-site rate heterogeneity. Sixty-six gene trees imply unrealistic deep coalescences that exceed 100 million years (MY). Gene trees that were obtained with better-justified models and search parameters show large increases in both likelihood scores and congruence. Coalescence analyses based on a curated set of 413 improved gene trees and a superior coalescence method (ASTRAL) support a Scandentia (treeshrews)+Glires (rabbits, rodents) clade, contradicting one of the three primary systematic conclusions of Song et al. (2012). Robust support for a Perissodactyla+Carnivora clade within Laurasiatheria is also lost, contradicting a second major conclusion of this study. Song et al.'s (2012) MP-EST species tree provided the basis for circular simulations that led these authors to conclude that the multispecies coalescent accounts for 77% of the gene tree conflicts in their dataset, but many internal branches of their MP-EST tree are stunted by an order of magnitude or more due to wholesale gene tree reconstruction errors. An independent assessment of branch lengths suggests the multispecies coalescent accounts for ≤15% of the conflicts among Song et al.'s (2012) 447 gene trees. Unfortunately, Song et al.'s (2012) flawed phylogenomic dataset has been used as a model for additional simulation work that suggests the superiority of shortcut coalescence methods relative to concatenation.
Investigator error was passed on to the subsequent simulation studies, which also incorporated further logical errors that should be avoided in future simulation studies. Illegitimate branch length switches in the simulation routines unfairly protected coalescence methods from their Achilles' heel, high gene tree reconstruction error at short internodes. These simulations therefore provide no evidence that shortcut coalescence methods out-compete concatenation at deep timescales. In summary, the long c-genes that are required for accurate reconstruction of species trees using shortcut coalescence methods do not exist and are a delusion. Coalescence approaches based on SNPs that are widely spaced in the genome avoid problems with the recombination ratchet and merit further pursuit in both empirical systematic research and simulations.
Copyright © 2015 Elsevier Inc. All rights reserved.

Keywords: C-gene; Concatalescence; Deep coalescence; Gene tree; Species tree

Year: 2015; PMID: 26238460; DOI: 10.1016/j.ympev.2015.07.018

Source DB: PubMed; Journal: Mol Phylogenet Evol; ISSN: 1055-7903; Impact factor: 4.286

