How Should Genes and Taxa be Sampled for Phylogenomic Analyses with Missing Data? An Empirical Study in Iguanian Lizards.

Jeffrey W Streicher, James A Schulte, John J Wiens.

Abstract

Targeted sequence capture is becoming a widespread tool for generating large phylogenomic data sets to address difficult phylogenetic problems. However, this methodology often generates data sets in which increasing the number of taxa and loci increases amounts of missing data. Thus, a fundamental (but still unresolved) question is whether sampling should be designed to maximize sampling of taxa or genes, or to minimize the inclusion of missing data cells. Here, we explore this question for an ancient, rapid radiation of lizards, the pleurodont iguanians. Pleurodonts include many well-known clades (e.g., anoles, basilisks, iguanas, and spiny lizards) but relationships among families have proven difficult to resolve strongly and consistently using traditional sequencing approaches. We generated up to 4921 ultraconserved elements with sampling strategies including 16, 29, and 44 taxa, from 1179 to approximately 2.4 million characters per matrix and approximately 30% to 60% total missing data. We then compared mean branch support for interfamilial relationships under these 15 different sampling strategies for both concatenated (maximum likelihood) and species tree (NJst) approaches (after showing that mean branch support appears to be related to accuracy). We found that both approaches had the highest support when including loci with up to 50% missing taxa (matrices with ~40-55% missing data overall). Thus, our results show that simply excluding all missing data may be highly problematic as the primary guiding principle for the inclusion or exclusion of taxa and genes. The optimal strategy was somewhat different for each approach, a pattern that has not been shown previously. For concatenated analyses, branch support was maximized when including many taxa (44) but fewer characters (1.1 million). For species-tree analyses, branch support was maximized with minimal taxon sampling (16) but many loci (4789 of 4921). 
We also show that the choice of these sampling strategies can be critically important for phylogenomic analyses, since some strategies lead to demonstrably incorrect inferences (using the same method) that have strong statistical support. Our preferred estimate provides strong support for most interfamilial relationships in this important but phylogenetically challenging group.
© The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
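
The locus-inclusion rule described in the abstract (keep loci with up to 50% missing taxa, then look at overall missing data in the matrix) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy locus names, and the simplification of counting missing data per locus-by-taxon cell (rather than per aligned character) are all assumptions made for illustration.

```python
from typing import Dict, Set


def filter_loci_by_missing_taxa(
    locus_taxa: Dict[str, Set[str]],
    all_taxa: Set[str],
    max_missing: float = 0.5,
) -> Dict[str, Set[str]]:
    """Keep loci whose fraction of missing taxa is <= max_missing.

    locus_taxa maps a (hypothetical) locus name to the taxa sequenced for it.
    """
    kept = {}
    for locus, present in locus_taxa.items():
        present = present & all_taxa  # ignore taxa outside the sampled set
        missing_fraction = 1 - len(present) / len(all_taxa)
        if missing_fraction <= max_missing:
            kept[locus] = present
    return kept


def missing_cell_fraction(
    locus_taxa: Dict[str, Set[str]], all_taxa: Set[str]
) -> float:
    """Fraction of empty locus-by-taxon cells (assumption: one cell per
    locus and taxon, ignoring alignment length)."""
    total_cells = len(locus_taxa) * len(all_taxa)
    if total_cells == 0:
        return 0.0
    filled = sum(len(present & all_taxa) for present in locus_taxa.values())
    return 1 - filled / total_cells


# Toy data: four taxa, three loci with decreasing taxon coverage.
taxa = {"A", "B", "C", "D"}
loci = {
    "uce1": {"A", "B", "C", "D"},  # 0% missing taxa
    "uce2": {"A", "B"},            # 50% missing taxa
    "uce3": {"A"},                 # 75% missing taxa
}

kept = filter_loci_by_missing_taxa(loci, taxa, max_missing=0.5)
print(sorted(kept))                       # loci retained at the 50% threshold
print(missing_cell_fraction(kept, taxa))  # missing cells in the filtered matrix
```

Raising `max_missing` retains more loci at the cost of more empty cells, which is the trade-off the study evaluates across its sampling strategies.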

Keywords:  Missing data; Reptilia; Squamata; UCEs; phylogenomics; species tree; taxon sampling

Year: 2015   PMID: 26330450   DOI: 10.1093/sysbio/syv058

Source DB: PubMed   Journal: Syst Biol   ISSN: 1063-5157   Impact factor: 15.683


  31 in total

1.  Target-capture phylogenomics provide insights on gene and species tree discordances in Old World treefrogs (Anura: Rhacophoridae).

Authors:  Kin Onn Chan; Carl R Hutter; Perry L Wood; L Lee Grismer; Rafe M Brown
Journal:  Proc Biol Sci       Date:  2020-12-09       Impact factor: 5.349

2.  Accumulation of transposable elements in Hox gene clusters during adaptive radiation of Anolis lizards.

Authors:  Nathalie Feiner
Journal:  Proc Biol Sci       Date:  2016-10-12       Impact factor: 5.349

3.  Combined-evidence analyses of ultraconserved elements and morphological data: an empirical example in iguanian lizards.

Authors:  Simon G Scarpetta
Journal:  Biol Lett       Date:  2020-08-26       Impact factor: 3.703

4.  Phylogenomic analyses of more than 4000 nuclear loci resolve the origin of snakes among lizard families.

Authors:  Jeffrey W Streicher; John J Wiens
Journal:  Biol Lett       Date:  2017-09       Impact factor: 3.703

5.  Escaping the evolutionary trap? Sex chromosome turnover in basilisks and related lizards (Corytophanidae: Squamata).

Authors:  Stuart V Nielsen; Irán Andira Guzmán-Méndez; Tony Gamble; Madison Blumer; Brendan J Pinto; Lukáš Kratochvíl; Michail Rovatsos
Journal:  Biol Lett       Date:  2019-10-09       Impact factor: 3.703

6.  Model Choice, Missing Data, and Taxon Sampling Impact Phylogenomic Inference of Deep Basidiomycota Relationships.

Authors:  Arun N Prasanna; Daniel Gerber; Teeratas Kijpornyongpan; M Catherine Aime; Vinson P Doyle; Laszlo G Nagy
Journal:  Syst Biol       Date:  2020-01-01       Impact factor: 15.683

7.  Impact of reduced-representation sequencing protocols on detecting population structure in a threatened marsupial.

Authors:  B R Wright; C E Grueber; M J Lott; K Belov; R N Johnson; C J Hogg
Journal:  Mol Biol Rep       Date:  2019-07-09       Impact factor: 2.316

8.  OCTAL: Optimal Completion of gene trees in polynomial time.

Authors:  Sarah Christensen; Erin K Molloy; Pranjal Vachaspati; Tandy Warnow
Journal:  Algorithms Mol Biol       Date:  2018-03-15       Impact factor: 1.405

9.  Evolutionary Rate Variation among Lineages in Gene Trees has a Negative Impact on Species-Tree Inference.

Authors:  Mezzalina Vankan; Simon Y W Ho; David A Duchêne
Journal:  Syst Biol       Date:  2022-02-10       Impact factor: 15.683

10.  DNA Barcodes Combined with Multilocus Data of Representative Taxa Can Generate Reliable Higher-Level Phylogenies.

Authors:  Gerard Talavera; Vladimir Lukhtanov; Naomi E Pierce; Roger Vila
Journal:  Syst Biol       Date:  2022-02-10       Impact factor: 15.683
