A practical introduction to Random Forest for genetic association studies in ecology and evolution.

Marine S O Brieuc, Charles D Waters, Daniel P Drinan, Kerry A Naish.

Abstract

Large genomic studies are becoming increasingly common with advances in sequencing technology, and our ability to understand how genomic variation influences phenotypic variation between individuals has never been greater. The exploration of such relationships first requires the identification of associations between molecular markers and phenotypes. Here, we explore the use of Random Forest (RF), a powerful machine-learning algorithm, in genomic studies to discern loci underlying both discrete and quantitative traits, particularly when studying wild or nonmodel organisms. RF is becoming increasingly used in ecological and population genetics because, unlike traditional methods, it can efficiently analyse thousands of loci simultaneously and account for nonadditive interactions. However, understanding both the power and limitations of Random Forest is important for its proper implementation and the interpretation of results. We therefore provide a practical introduction to the algorithm and its use for identifying associations between molecular markers and phenotypes, discussing such topics as data limitations, algorithm initiation and optimization, as well as interpretation. We also provide short R tutorials as examples, with the aim of providing a guide to the implementation of the algorithm. Topics discussed here are intended to serve as an entry point for molecular ecologists interested in employing Random Forest to identify trait associations in genomic data sets.
© 2018 John Wiley & Sons Ltd.

Keywords:  Random Forest; adaptation; association studies; ecological genetics

Year:  2018        PMID: 29504715     DOI: 10.1111/1755-0998.12773

Source DB:  PubMed          Journal:  Mol Ecol Resour        ISSN: 1755-098X            Impact factor:   7.090


  20 in total

Review 1.  Correlational selection in the age of genomics.

Authors:  Erik I Svensson; Stevan J Arnold; Reinhard Bürger; Katalin Csilléry; Jeremy Draghi; Jonathan M Henshaw; Adam G Jones; Stephen De Lisle; David A Marques; Katrina McGuigan; Monique N Simon; Anna Runemark
Journal:  Nat Ecol Evol       Date:  2021-04-15       Impact factor: 15.460

2.  Machine learning in postgenomic biology and personalized medicine.

Authors:  Animesh Ray
Journal:  Wiley Interdiscip Rev Data Min Knowl Discov       Date:  2022-01-24

3.  Study becomes insight: Ecological learning from machine learning.

Authors:  Qiuyan Yu; Wenjie Ji; Lara Prihodko; C Wade Ross; Julius Y Anchang; Niall P Hanan
Journal:  Methods Ecol Evol       Date:  2021-08-06       Impact factor: 8.335

4.  Identification of Age-Specific and Common Key Regulatory Mechanisms Governing Eggshell Strength in Chicken Using Random Forests.

Authors:  Faisal Ramzan; Selina Klees; Armin Otto Schmitt; David Cavero; Mehmet Gültas
Journal:  Genes (Basel)       Date:  2020-04-24       Impact factor: 4.096

5.  Development of a predictive model for integrated medical and long-term care resource consumption based on health behaviour: application of healthcare big data of patients with circulatory diseases.

Authors:  Tomoyuki Takura; Keiko Hirano Goto; Asao Honda
Journal:  BMC Med       Date:  2021-01-08       Impact factor: 8.775

6.  Comparing the utility of in vivo transposon mutagenesis approaches in yeast species to infer gene essentiality.

Authors:  Anton Levitan; Andrew N Gale; Emma K Dallon; Darby W Kozan; Kyle W Cunningham; Roded Sharan; Judith Berman
Journal:  Curr Genet       Date:  2020-07-17       Impact factor: 3.886

7.  Comprehensive Genomic Investigation of Adaptive Mutations Driving the Low-Level Oxacillin Resistance Phenotype in Staphylococcus aureus.

Authors:  Stefano G Giulieri; Romain Guérillot; Jason C Kwong; Ian R Monk; Ashleigh S Hayes; Diane Daniel; Sarah Baines; Norelle L Sherry; Natasha E Holmes; Peter Ward; Wei Gao; Torsten Seemann; Timothy P Stinear; Benjamin P Howden
Journal:  mBio       Date:  2020-12-08       Impact factor: 7.867

8.  Indications of strong adaptive population genetic structure in albacore tuna (Thunnus alalunga) in the southwest and central Pacific Ocean.

Authors:  Giulia Anderson; John Hampton; Neville Smith; Ciro Rico
Journal:  Ecol Evol       Date:  2019-08-27       Impact factor: 2.912

9.  The use of classification and regression algorithms using the random forests method with presence-only data to model species' distribution.

Authors:  Lei Zhang; Falk Huettmann; Xudong Zhang; Shirong Liu; Pengsen Sun; Zhen Yu; Chunrong Mi
Journal:  MethodsX       Date:  2019-09-28

10.  Combining Random Forests and a Signal Detection Method Leads to the Robust Detection of Genotype-Phenotype Associations.

Authors:  Faisal Ramzan; Mehmet Gültas; Hendrik Bertram; David Cavero; Armin Otto Schmitt
Journal:  Genes (Basel)       Date:  2020-08-05       Impact factor: 4.096
