To Include or Not to Include: The Impact of Gene Filtering on Species Tree Estimation Methods.

Erin K Molloy, Tandy Warnow.

Abstract

With the increasing availability of whole genome data, many species trees are being constructed from hundreds to thousands of loci. Although concatenation analysis using maximum likelihood is a standard approach for estimating species trees, it does not account for gene tree heterogeneity, which can occur due to many biological processes, such as incomplete lineage sorting. Coalescent species tree estimation methods, many of which are statistically consistent in the presence of incomplete lineage sorting, include Bayesian methods that coestimate the gene trees and the species tree, summary methods that compute the species tree by combining estimated gene trees, and site-based methods that infer the species tree from site patterns in the alignments of different loci. Due to concerns that poor quality loci will reduce the accuracy of estimated species trees, many recent phylogenomic studies have removed or filtered genes on the basis of phylogenetic signal and/or missing data prior to inferring species trees; little is known about the performance of species tree estimation methods when gene filtering is performed. We examine how incomplete lineage sorting, phylogenetic signal of individual loci, and missing data affect the absolute and the relative accuracy of species tree estimation methods and show how these properties affect methods' responses to gene filtering strategies. In particular, summary methods (ASTRAL-II, ASTRID, and MP-EST), a site-based coalescent method (SVDquartets within PAUP*), and an unpartitioned concatenation analysis using maximum likelihood (RAxML) were evaluated on a heterogeneous collection of simulated multilocus data sets, and the following trends were observed. 
Filtering genes based on gene tree estimation error improved the accuracy of the summary methods when levels of incomplete lineage sorting were low to moderate but did not benefit the summary methods under higher levels of incomplete lineage sorting, unless gene tree estimation error was also extremely high (a model condition with few replicates). Neither SVDquartets nor concatenation analysis using RAxML benefited from filtering genes on the basis of gene tree estimation error. Finally, filtering genes based on missing data was either neutral (i.e., did not impact accuracy) or else reduced the accuracy of all five methods. By providing insight into the consequences of gene filtering, we offer recommendations for estimating species trees in the presence of incomplete lineage sorting and reconcile seemingly conflicting observations made in prior studies regarding the impact of gene filtering.
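
As an illustration of what filtering genes on the basis of missing data can look like in practice, the following is a minimal sketch and not the authors' pipeline. It assumes each locus is a mapping from taxon name to aligned sequence, treats a taxon as missing when it is absent or its sequence consists only of gap or ambiguity characters, and drops loci whose fraction of missing taxa exceeds a threshold. The names `missing_fraction`, `filter_loci`, and `max_missing`, and the toy data, are hypothetical.

```python
from typing import Dict, List

# Characters treated as "no data" in this sketch (an assumption, not a standard).
MISSING_CHARS = set("-?Nn")


def missing_fraction(alignment: Dict[str, str], all_taxa: List[str]) -> float:
    """Return the fraction of expected taxa with no usable sequence at this locus."""
    def is_missing(taxon: str) -> bool:
        seq = alignment.get(taxon, "")
        # An absent taxon yields an empty string, for which all() is True.
        return all(ch in MISSING_CHARS for ch in seq)

    return sum(1 for taxon in all_taxa if is_missing(taxon)) / len(all_taxa)


def filter_loci(
    loci: Dict[str, Dict[str, str]], all_taxa: List[str], max_missing: float
) -> Dict[str, Dict[str, str]]:
    """Keep only loci whose missing-taxon fraction is at most max_missing."""
    return {
        name: aln
        for name, aln in loci.items()
        if missing_fraction(aln, all_taxa) <= max_missing
    }


# Toy data: four taxa, three loci with differing amounts of missing data.
taxa = ["A", "B", "C", "D"]
loci = {
    "locus1": {"A": "ACGT", "B": "ACGA", "C": "ACGG", "D": "ACGC"},
    "locus2": {"A": "ACGT", "B": "----"},               # B all gaps; C and D absent
    "locus3": {"A": "ACGT", "B": "ACGT", "C": "ACTT"},  # D absent
}
kept = filter_loci(loci, taxa, max_missing=0.5)
print(sorted(kept))
```

In light of the trends reported above, a threshold like this would be something to evaluate rather than apply by default, since filtering on missing data was either neutral or reduced accuracy in the simulations.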

Year:  2018        PMID: 29029338     DOI: 10.1093/sysbio/syx077

Source DB:  PubMed          Journal:  Syst Biol        ISSN: 1063-5157            Impact factor:   15.683


  35 in total

1.  Target-capture phylogenomics provide insights on gene and species tree discordances in Old World treefrogs (Anura: Rhacophoridae).

Authors:  Kin Onn Chan; Carl R Hutter; Perry L Wood; L Lee Grismer; Rafe M Brown
Journal:  Proc Biol Sci       Date:  2020-12-09       Impact factor: 5.349

2.  Investigating Sources of Conflict in Deep Phylogenomics of Vetigastropod Snails.

Authors:  Tauana Junqueira Cunha; James Davis Reimer; Gonzalo Giribet
Journal:  Syst Biol       Date:  2022-06-16       Impact factor: 9.160

3.  DISCO: Species Tree Inference using Multicopy Gene Family Tree Decomposition.

Authors:  James Willson; Mrinmoy Saha Roddur; Baqiao Liu; Paul Zaharias; Tandy Warnow
Journal:  Syst Biol       Date:  2022-04-19       Impact factor: 9.160

4.  Excluding Loci With Substitution Saturation Improves Inferences From Phylogenomic Data.

Authors:  David A Duchêne; Niklas Mather; Cara Van Der Wal; Simon Y W Ho
Journal:  Syst Biol       Date:  2022-04-19       Impact factor: 9.160

5.  Non-parametric correction of estimated gene trees using TRACTION.

Authors:  Sarah Christensen; Erin K Molloy; Pranjal Vachaspati; Ananya Yammanuru; Tandy Warnow
Journal:  Algorithms Mol Biol       Date:  2020-01-04       Impact factor: 1.405

6.  OCTAL: Optimal Completion of gene trees in polynomial time.

Authors:  Sarah Christensen; Erin K Molloy; Pranjal Vachaspati; Tandy Warnow
Journal:  Algorithms Mol Biol       Date:  2018-03-15       Impact factor: 1.405

7.  Evolutionary Rate Variation among Lineages in Gene Trees has a Negative Impact on Species-Tree Inference.

Authors:  Mezzalina Vankan; Simon Y W Ho; David A Duchêne
Journal:  Syst Biol       Date:  2022-02-10       Impact factor: 15.683

8.  The Legacy of Recurrent Introgression during the Radiation of Hares.

Authors:  Mafalda S Ferreira; Matthew R Jones; Colin M Callahan; Liliana Farelo; Zelalem Tolesa; Franz Suchentrunk; Pierre Boursot; L Scott Mills; Paulo C Alves; Jeffrey M Good; José Melo-Ferreira
Journal:  Syst Biol       Date:  2021-04-15       Impact factor: 15.683

9.  Interrogating Phylogenetic Discordance Resolves Deep Splits in the Rapid Radiation of Old World Fruit Bats (Chiroptera: Pteropodidae).

Authors:  Nicolas Nesi; Georgia Tsagkogeorga; Susan M Tsang; Violaine Nicolas; Aude Lalis; Annette T Scanlon; Silke A Riesle-Sbarbaro; Sigit Wiantoro; Alan T Hitch; Javier Juste; Corinna A Pinzari; Frank J Bonaccorso; Christopher M Todd; Burton K Lim; Nancy B Simmons; Michael R McGowen; Stephen J Rossiter
Journal:  Syst Biol       Date:  2021-10-13       Impact factor: 15.683

10.  Anchored Phylogenomics, Evolution and Systematics of Elateridae: Are All Bioluminescent Elateroidea Derived Click Beetles?

Authors:  Hume B Douglas; Robin Kundrata; Adam J Brunke; Hermes E Escalona; Julie T Chapados; Jackson Eyres; Robin Richter; Karine Savard; Adam Ślipiński; Duane McKenna; Jeremy R Dettman
Journal:  Biology (Basel)       Date:  2021-05-21