
A comparison of methods for interpreting random forest models of genetic association in the presence of non-additive interactions.

Alena Orlenko, Jason H Moore.

Abstract

BACKGROUND: Non-additive interactions among genes are frequently associated with a number of phenotypes, including known complex diseases such as Alzheimer's, diabetes, and cardiovascular disease. Detecting interactions requires careful selection of analytical methods, because some machine learning algorithms are unable or underpowered to detect or model feature interactions that exhibit non-additivity. The Random Forest method is often employed in these efforts due to its ability to detect and model non-additive interactions. In addition, Random Forest has a built-in ability to estimate feature importance scores, which allows the model to be interpreted in terms of the rank order and effect size of each feature's association with the outcome. This characteristic is very important for epidemiological and clinical studies, where the results of predictive modeling could be used to define the future direction of research efforts. The model can alternatively be interpreted with the permutation feature importance metric, which permutes a feature's values and reports that feature's contribution as the resulting decrease in the model's performance, or with Shapley additive explanations (SHAP), which take a cooperative game theory approach. Currently, it is unclear which Random Forest feature importance metric provides a superior estimation of the true informative contribution of features in genetic association analysis.
RESULTS: To address this issue, and to improve the interpretability of Random Forest predictions, we compared different methods for feature importance estimation in real and simulated datasets with non-additive interactions. We detected a discrepancy between the metrics for the real-world datasets, and further established that the permutation feature importance metric provides more precise feature importance rank estimation for the simulated datasets with non-additive interactions.
CONCLUSIONS: By analyzing both real and simulated data, we established that the permutation feature importance metric provides more precise feature importance rank estimation in the presence of non-additive interactions.
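
The permutation feature importance idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: to stay dependency-free it uses a hypothetical lookup-table classifier (`GenotypeTableClassifier`) as a stand-in for a Random Forest, and a hypothetical simulation (`simulate`) in which the outcome depends only on an XOR-like interaction between two of four SNPs. All names, sample sizes, and the 10% noise rate are assumptions chosen for illustration.

```python
import random
from collections import Counter, defaultdict


def simulate(n, rng):
    """Simulate genotypes (0/1/2) for 4 SNPs. The outcome depends only on an
    XOR-like interaction between SNP0 and SNP1. With genotype frequencies
    1:2:1, neither SNP has a marginal effect on its own."""
    X, y = [], []
    for _ in range(n):
        row = rng.choices((0, 1, 2), weights=(1, 2, 1), k=4)
        signal = (row[0] % 2) ^ (row[1] % 2)
        # Assumed 10% label noise.
        label = signal if rng.random() > 0.1 else 1 - signal
        X.append(row)
        y.append(label)
    return X, y


class GenotypeTableClassifier:
    """Hypothetical stand-in model (not a Random Forest): predicts the
    majority training class for each full genotype combination."""

    def fit(self, X, y):
        counts = defaultdict(Counter)
        for row, label in zip(X, y):
            counts[tuple(row)][label] += 1
        self.table = {k: c.most_common(1)[0][0] for k, c in counts.items()}
        # Fallback for genotype combinations never seen in training.
        self.default = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.table.get(tuple(row), self.default) for row in X]


def accuracy(model, X, y):
    preds = model.predict(X)
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def permutation_importance(model, X, y, n_repeats, rng):
    """For each feature, shuffle its column and record the mean drop in
    accuracy relative to the unshuffled baseline."""
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


rng = random.Random(0)
X_train, y_train = simulate(3000, rng)
X_test, y_test = simulate(1000, rng)  # importance is scored on held-out data
model = GenotypeTableClassifier().fit(X_train, y_train)
scores = permutation_importance(model, X_test, y_test, n_repeats=10, rng=rng)

# Rank features from largest to smallest mean accuracy drop.
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
for j in ranking:
    print(f"SNP{j}: {scores[j]:+.3f}")
```

In this sketch the two interacting SNPs should show a large accuracy drop when permuted, while the two noise SNPs should stay near zero. The score is in units of lost model performance, which is the property the abstract highlights. Because the procedure only needs a `predict` method, the same function could in principle be applied to a fitted Random Forest.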


Keywords:  Alzheimer’s disease; Epistasis; Feature importances; Glaucoma; Machine learning; Random forest; Simulation

Year: 2021; PMID: 33514397; PMCID: PMC7847145; DOI: 10.1186/s13040-021-00243-0

Source DB: PubMed; Journal: BioData Min; ISSN: 1756-0381; Impact factor: 2.522


