Extending approximate Bayesian computation with supervised machine learning to infer demographic history from genetic polymorphisms using DIYABC Random Forest.

François-David Collin, Ghislain Durif, Louis Raynal, Eric Lombaert, Mathieu Gautier, Renaud Vitalis, Jean-Michel Marin, Arnaud Estoup.

Abstract

Simulation-based methods such as approximate Bayesian computation (ABC) are well-adapted to the analysis of complex scenarios of populations and species genetic history. In this context, supervised machine learning (SML) methods provide attractive statistical solutions to conduct efficient inferences about scenario choice and parameter estimation. The Random Forest methodology (RF) is a powerful ensemble of SML algorithms used for classification or regression problems. Random Forest allows conducting inferences at a low computational cost, without preliminary selection of the relevant components of the ABC summary statistics, and bypassing the derivation of ABC tolerance levels. We have implemented a set of RF algorithms to process inferences using simulated data sets generated from an extended version of the population genetic simulator implemented in DIYABC v2.1.0. The resulting computer package, named DIYABC Random Forest v1.0, integrates two functionalities into a user-friendly interface: the simulation under custom evolutionary scenarios of different types of molecular data (microsatellites, DNA sequences or SNPs) and RF treatments including statistical tools to evaluate the power and accuracy of inferences. We illustrate the functionalities of DIYABC Random Forest v1.0 for both scenario choice and parameter estimation through the analysis of pseudo-observed and real data sets corresponding to pool-sequencing and individual-sequencing SNP data sets. Because of the properties inherent to the implemented RF methods and the large feature vector (including various summary statistics and their linear combinations) available for SNP data, DIYABC Random Forest v1.0 can efficiently contribute to the analysis of large SNP data sets to make inferences about complex population genetic histories.
© The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd.
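The scenario-choice workflow summarized in the abstract (simulate data sets under competing scenarios, reduce each to summary statistics, train a tree ensemble on that reference table, then classify the observed data) can be sketched as follows. This is a toy illustration under stated assumptions, not the DIYABC Random Forest implementation: the simulator, the two scenarios, the three summary statistics and every function name are hypothetical, and a forest of bootstrap-trained decision stumps stands in for a full Random Forest.

```python
import random
import statistics
from collections import Counter


def simulate(scenario, rng, n=50):
    """Hypothetical simulator: the two scenarios differ only in data spread."""
    sd = 1.0 if scenario == 1 else 3.0
    data = [rng.gauss(0.0, sd) for _ in range(n)]
    # Summary statistics: mean, variance, and one pure-noise statistic.
    # No preliminary selection of statistics is done; the trees pick splits.
    return [statistics.fmean(data), statistics.pvariance(data), rng.random()]


def fit_stump(rows, labels, rng, mtry=2):
    """Fit a one-split tree on a random subset of `mtry` features."""
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in rng.sample(range(len(rows[0])), mtry):
        for t in sorted({r[f] for r in rows}):
            left = Counter(l for r, l in zip(rows, labels) if r[f] <= t)
            right = Counter(l for r, l in zip(rows, labels) if r[f] > t)
            if not right:  # splitting at the maximum leaves one side empty
                continue
            (ll, lc), (rl, rc) = left.most_common(1)[0], right.most_common(1)[0]
            errors = len(rows) - lc - rc  # misclassified training rows
            if best is None or errors < best[0]:
                best = (errors, f, t, ll, rl)
    return best[1:]


def choose_scenario(seed=1, n_ref=100, n_trees=100):
    """Return the share of tree votes received by each scenario."""
    rng = random.Random(seed)
    labels = [1 + i % 2 for i in range(n_ref)]     # alternate scenarios 1 and 2
    table = [simulate(s, rng) for s in labels]     # reference table
    observed = simulate(2, rng)                    # pseudo-observed data, truth = 2
    votes = Counter()
    for _ in range(n_trees):
        idx = [rng.randrange(n_ref) for _ in range(n_ref)]  # bootstrap sample
        f, t, ll, rl = fit_stump(
            [table[i] for i in idx], [labels[i] for i in idx], rng
        )
        votes[ll if observed[f] <= t else rl] += 1
    return {s: votes[s] / n_trees for s in (1, 2)}


votes = choose_scenario()
print(votes)
```

The vote shares play the role of a classification output only; they are not posterior probabilities, and this sketch makes no attempt to estimate one. No tolerance level appears anywhere in the sketch, which mirrors the point made in the abstract.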

Keywords:  SNP; approximate Bayesian computation; demographic history; model or scenario selection; parameter estimation; pool-sequencing; population genetics; random forest; supervised machine learning

Year: 2021   PMID: 33950563   DOI: 10.1111/1755-0998.13413

Source DB: PubMed   Journal: Mol Ecol Resour   ISSN: 1755-098X   Impact factor: 7.090

