
Unforeseen Consequences of Excluding Missing Data from Next-Generation Sequences: Simulation Study of RAD Sequences.

Huateng Huang, L Lacey Knowles.

Abstract

There is a lack of consensus on how next-generation sequence (NGS) data should be treated in phylogenetic and phylogeographic estimation. Some studies exclude loci with missing data, whereas others include them, even when sequences are missing from a large number of individuals. Here, we use simulations, focusing specifically on RAD (Restriction site Associated DNA) sequences, to highlight some of the unforeseen consequences of excluding missing data from next-generation sequencing data sets. Specifically, we show that, in addition to the obvious effect of reducing the amount of data used to make historical inferences, the decisions we make about missing data (such as the minimum number of individuals with a sequence for a locus to be included in the study) also affect the types of loci sampled for a study. In particular, as the tolerance for missing data becomes more stringent, the mutational spectrum represented in the sampled loci becomes truncated, such that loci with the highest mutation rates are disproportionately excluded. This effect is further exacerbated by factors involved in the preparation of the genomic library (i.e., the use of reduced representation libraries, as well as the coverage) and by the taxonomic diversity represented in the library (i.e., the level of divergence among the individuals). We demonstrate that the intuitive appeal of being conservative by removing loci may be misguided. [Next-generation sequencing; phylogenetic; phylogeography; RADseq; RADtags; species delimitation.]
© The Author(s) 2014. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
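A toy model can make the truncation effect described in the abstract concrete. The sketch below is not the authors' simulation design. It assumes, hypothetically, that missing data arise only from mutations in the restriction site, that per-locus mutation rates are exponentially distributed (mean 1, arbitrary units), and that each individual loses a locus independently with probability 1 - exp(-rate * divergence) (a star phylogeny that ignores shared ancestry). The names `simulate_loci` and `filter_by_min_individuals` are illustrative only.

```python
import math
import random
from statistics import mean


def simulate_loci(n_loci, n_individuals, divergence, seed=1):
    """Toy model (hypothetical): return (mutation_rate, n_individuals_with_data) per locus.

    Assumptions, not taken from the paper:
    - mutation rates are exponentially distributed with mean 1 (arbitrary units);
    - an individual retains a locus only if its restriction site is intact,
      which happens with probability exp(-rate * divergence), independently
      across individuals (star phylogeny).
    """
    rng = random.Random(seed)
    loci = []
    for _ in range(n_loci):
        rate = rng.expovariate(1.0)
        p_present = math.exp(-rate * divergence)
        # Count individuals whose restriction site survived (True counts as 1).
        n_present = sum(rng.random() < p_present for _ in range(n_individuals))
        loci.append((rate, n_present))
    return loci


def filter_by_min_individuals(loci, min_ind):
    """Keep only loci that have data for at least `min_ind` individuals."""
    return [(rate, n) for rate, n in loci if n >= min_ind]


if __name__ == "__main__":
    N_IND = 20
    for divergence in (0.1, 0.3):
        loci = simulate_loci(n_loci=5000, n_individuals=N_IND, divergence=divergence)
        for min_ind in (1, 10, 15, 20):
            kept = filter_by_min_individuals(loci, min_ind)
            print(f"divergence={divergence} min_ind={min_ind:2d} "
                  f"loci kept={len(kept):4d} "
                  f"mean rate of kept loci={mean(r for r, _ in kept):.2f}")
```

Under these assumptions, requiring data from all n individuals retains loci whose rates are exponentially distributed with mean 1/(1 + n·d), where d is the divergence. With n = 20 this is about 0.33 for d = 0.1 and about 0.14 for d = 0.3, compared with a mean of about 1 when almost no loci are filtered. Both stricter filtering and greater divergence therefore shift the retained loci toward lower mutation rates. Library coverage is not modeled in this sketch.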

Year: 2014; PMID: 24996413; DOI: 10.1093/sysbio/syu046

Source DB: PubMed; Journal: Syst Biol; ISSN: 1063-5157; Impact factor: 15.683
