Demographic model selection using random forests and the site frequency spectrum.

Megan L Smith, Megan Ruffley, Anahí Espíndola, David C Tank, Jack Sullivan, Bryan C Carstens.

Abstract

Phylogeographic data sets have grown from tens to thousands of loci in recent years, but extant statistical methods do not take full advantage of these large data sets. For example, approximate Bayesian computation (ABC) is a commonly used method for the explicit comparison of alternate demographic histories, but it is limited by the "curse of dimensionality" and issues related to the simulation and summarization of data when applied to next-generation sequencing (NGS) data sets. We implement here several improvements to overcome these difficulties. We use a Random Forest (RF) classifier for model selection to circumvent the curse of dimensionality and apply a binned representation of the multidimensional site frequency spectrum (mSFS) to address issues related to the simulation and summarization of large SNP data sets. We evaluate the performance of these improvements using simulation and find low overall error rates (~7%). We then apply the approach to data from Haplotrema vancouverense, a land snail endemic to the Pacific Northwest of North America. Fifteen demographic models were compared, and our results support a model of recent dispersal from coastal to inland rainforests. Our results demonstrate that binning is an effective strategy for the construction of a mSFS and imply that the statistical power of RF when applied to demographic model selection is at least comparable to traditional ABC algorithms. Importantly, by combining these strategies, large sets of models with differing numbers of populations can be evaluated.
© 2017 John Wiley & Sons Ltd.
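
The binned mSFS mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the binning scheme, so the equal-width bins used here are an assumption, and the function names (bin_index, binned_msfs) and the toy SNP data are hypothetical. Each SNP is described by its derived-allele count in every population; each count is mapped to a coarse bin, and SNPs are tallied by their joint bin across populations.

```python
from collections import Counter
from itertools import product


def bin_index(count, sample_size, n_bins):
    """Map an allele count in [0, sample_size] to a bin in [0, n_bins - 1]."""
    if not 0 <= count <= sample_size:
        raise ValueError("count must lie between 0 and sample_size")
    # Assumed scheme: split the sample_size + 1 possible counts into
    # n_bins roughly equal-width classes.
    return count * n_bins // (sample_size + 1)


def binned_msfs(snps, sample_sizes, n_bins=4):
    """Return the binned mSFS as a flat list of SNP tallies, one per joint bin."""
    tallies = Counter(
        tuple(bin_index(c, n, n_bins) for c, n in zip(snp, sample_sizes))
        for snp in snps
    )
    # Enumerate all n_bins ** n_populations joint bins in a fixed order so
    # that vectors from different data sets are directly comparable.
    cells = product(range(n_bins), repeat=len(sample_sizes))
    return [tallies[cell] for cell in cells]


# Toy data: five SNPs, two populations, ten sampled chromosomes in each.
toy_snps = [(0, 3), (1, 0), (10, 10), (5, 5), (0, 3)]
toy_vector = binned_msfs(toy_snps, sample_sizes=(10, 10), n_bins=4)
print(len(toy_vector), sum(toy_vector))
```

With b bins and k populations the vector always has b^k entries regardless of sample size, which keeps the number of summary statistics manageable. A vector of this kind, computed for each simulated data set and labelled with its generating model, is the sort of input on which a Random Forest classifier could be trained.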

Keywords:  RADseq; machine learning; model selection; phylogeography

Year:  2017        PMID: 28665011     DOI: 10.1111/mec.14223

Source DB:  PubMed          Journal:  Mol Ecol        ISSN: 0962-1083            Impact factor:   6.185


  9 in total

1.  A demonstration of unsupervised machine learning in species delimitation.

Authors:  Shahan Derkarabetian; Stephanie Castillo; Peter K Koo; Sergey Ovchinnikov; Marshal Hedin
Journal:  Mol Phylogenet Evol       Date:  2019-07-16       Impact factor: 4.286

2.  Population genomics for symbiotic anthozoans: can reduced representation approaches be used for taxa without reference genomes?

Authors:  Benjamin M Titus; Marymegan Daly
Journal:  Heredity (Edinb)       Date:  2022-04-13       Impact factor: 3.832

3.  Combining allele frequency and tree-based approaches improves phylogeographic inference from natural history collections.

Authors:  Megan Ruffley; Megan L Smith; Anahí Espíndola; Bryan C Carstens; Jack Sullivan; David C Tank
Journal:  Mol Ecol       Date:  2018-02-11       Impact factor: 6.185

4.  Parallel recolonizations generate distinct genomic sectors in kelp following high-magnitude earthquake disturbance.

Authors:  Felix Vaux; Elahe Parvizi; Dave Craw; Ceridwen I Fraser; Jonathan M Waters
Journal:  Mol Ecol       Date:  2022-06-21       Impact factor: 6.622

5.  Chromosome-scale inference of hybrid speciation and admixture with convolutional neural networks.

Authors:  Paul D Blischak; Michael S Barker; Ryan N Gutenkunst
Journal:  Mol Ecol Resour       Date:  2021-03-08       Impact factor: 7.090

6.  The divergence history of European blue mussel species reconstructed from Approximate Bayesian Computation: the effects of sequencing techniques and sampling strategies.

Authors:  Christelle Fraïsse; Camille Roux; Pierre-Alexandre Gagnaire; Jonathan Romiguier; Nicolas Faivre; John J Welch; Nicolas Bierne
Journal:  PeerJ       Date:  2018-07-30       Impact factor: 2.984

7.  Rivers, not refugia, drove diversification in arboreal, sub-Saharan African snakes.

Authors:  Kaitlin E Allen; Eli Greenbaum; Paul M Hime; Walter P Tapondjou N; Viktoria V Sterkhova; Chifundera Kusamba; Mark-Oliver Rödel; Johannes Penner; A Townsend Peterson; Rafe M Brown
Journal:  Ecol Evol       Date:  2021-05-01       Impact factor: 2.912

8.  Genomic evidence of an ancient inland temperate rainforest in the Pacific Northwest of North America.

Authors:  Megan Ruffley; Megan L Smith; Anahí Espíndola; Daniel F Turck; Niels Mitchell; Bryan Carstens; Jack Sullivan; David C Tank
Journal:  Mol Ecol       Date:  2022-04-09       Impact factor: 6.622

9.  Discovery of Ongoing Selective Sweeps within Anopheles Mosquito Populations Using Deep Learning.

Authors:  Alexander T Xue; Daniel R Schrider; Andrew D Kern
Journal:  Mol Biol Evol       Date:  2021-03-09       Impact factor: 16.240
