
A Weighted Random Forests Approach to Improve Predictive Performance.

Stacey J Winham, Robert R Freimuth, Joanna M Biernacka.

Abstract

Identifying genetic variants associated with complex disease in high-dimensional data is a challenging problem, and complicated etiologies such as gene-gene interactions are often ignored in analyses. The data-mining method Random Forests (RF) can handle high-dimensional data; however, in such data RF is not an effective filter for identifying risk factors that are associated with the disease trait through complex genetic models, such as gene-gene interactions without strong marginal components. Here we propose an extension called Weighted Random Forests (wRF), which incorporates tree-level weights to emphasize more accurate trees in prediction and in the calculation of variable importance. We demonstrate through simulation and application to data from a genetic study of addiction that wRF can outperform RF in high-dimensional data, although the improvements are modest and limited to situations with effect sizes larger than what is realistic in the genetics of complex disease. Thus, the current implementation of wRF is unlikely to improve detection of relevant predictors in high-dimensional genetic data, but it may be applicable in other situations where larger effect sizes are anticipated.
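The tree-level weighting idea can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the weights are computed, so the sketch assumes each tree's weight is proportional to its accuracy on held-out samples. The function names `tree_weights` and `weighted_vote` and all numbers are hypothetical.

```python
from collections import defaultdict


def tree_weights(accuracies):
    """Normalize per-tree accuracies into weights that sum to 1.

    Assumption: weight is proportional to held-out accuracy; the
    abstract does not specify the actual weighting scheme.
    """
    total = sum(accuracies)
    if total <= 0:
        raise ValueError("at least one tree must have positive accuracy")
    return [a / total for a in accuracies]


def weighted_vote(votes, weights):
    """Return the class label with the largest total tree weight."""
    if len(votes) != len(weights):
        raise ValueError("need exactly one weight per tree")
    totals = defaultdict(float)
    for label, w in zip(votes, weights):
        totals[label] += w
    return max(totals, key=totals.get)


# Three hypothetical trees vote on one sample (1 = case, 0 = control).
votes = [1, 0, 0]
accuracies = [0.9, 0.3, 0.3]  # made-up held-out accuracies per tree

# Equal weights reproduce the plain RF majority vote.
unweighted = weighted_vote(votes, [1.0, 1.0, 1.0])
# Accuracy-based weights let the single accurate tree outvote the others.
weighted = weighted_vote(votes, tree_weights(accuracies))
print(unweighted, weighted)
```

In this toy case the equal-weight vote and the accuracy-weighted vote disagree, which is the situation where weighting can change a forest's prediction. With trees of similar accuracy the two votes usually coincide.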


Keywords:  Random Forests; gene-gene interactions; genetic data; genome wide association; high-dimensional data; weighting

Year:  2013        PMID: 24501613      PMCID: PMC3912194          DOI: 10.1002/sam.11196

Source DB:  PubMed          Journal:  Stat Anal Data Min        ISSN: 1932-1864            Impact factor:   1.051


