Addressing Measurement Error in Random Forests Using Quantitative Bias Analysis.

Tammy Jiang, Jaimie L Gradus, Timothy L Lash, Matthew P Fox.   

Abstract

Although variables are often measured with error, the impact of measurement error on machine-learning predictions is seldom quantified. The purpose of this study was to assess the impact of measurement error on the performance of random-forest models and variable importance. First, we assessed the impact of misclassification (i.e., measurement error of categorical variables) of predictors on random-forest model performance (e.g., accuracy, sensitivity) and variable importance (mean decrease in accuracy) using data from the National Comorbidity Survey Replication (2001-2003). Second, we created simulated data sets in which we knew the true model performance and variable importance measures and could verify that quantitative bias analysis was recovering the truth in misclassified versions of the data sets. Our findings showed that measurement error in the data used to construct random forests can distort model performance and variable importance measures and that bias analysis can recover the correct results. This study highlights the utility of applying quantitative bias analysis in machine learning to quantify the impact of measurement error on study results.
© The Author(s) 2021. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
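The misclassification-and-correction idea summarized in the abstract can be illustrated with a minimal Python sketch. It assumes a single binary predictor subject to nondifferential misclassification with known sensitivity (se) and specificity (sp). The function names and parameter values are hypothetical. The record-level correction shown here, which back-calculates the true prevalence and then redraws each record from its positive or negative predictive value, is one common quantitative bias analysis approach. It is not necessarily the exact procedure used in the study. In a full analysis, the random forest would be refit on each reclassified data set, and accuracy and mean decrease in accuracy would be summarized across iterations. That step is not shown.

```python
import random


def misclassify(truth, se, sp, rng):
    """Apply nondifferential misclassification to a list of 0/1 values.

    A true 1 is recorded as 1 with probability se (sensitivity);
    a true 0 is recorded as 0 with probability sp (specificity).
    """
    observed = []
    for value in truth:
        if value == 1:
            observed.append(1 if rng.random() < se else 0)
        else:
            observed.append(0 if rng.random() < sp else 1)
    return observed


def corrected_prevalence(observed, se, sp):
    """Back-calculate the expected true prevalence from the observed one.

    Inverts p_obs = se * p + (1 - sp) * (1 - p), which requires se + sp > 1.
    """
    if se + sp <= 1:
        raise ValueError("se + sp must exceed 1")
    p_obs = sum(observed) / len(observed)
    p_true = (p_obs - (1 - sp)) / (se + sp - 1)
    # Sampling noise or wrong bias parameters can push the estimate outside [0, 1].
    return min(max(p_true, 0.0), 1.0)


def _safe_ratio(num, den):
    # Degenerate cases (e.g. p = 0 with sp = 1) leave the ratio undefined.
    return num / den if den > 0 else 0.0


def reclassify(observed, se, sp, rng):
    """Record-level correction: redraw each record's value from its predictive value."""
    p = corrected_prevalence(observed, se, sp)
    # PPV = P(true 1 | recorded 1); NPV = P(true 0 | recorded 0).
    ppv = _safe_ratio(se * p, se * p + (1 - sp) * (1 - p))
    npv = _safe_ratio(sp * (1 - p), sp * (1 - p) + (1 - se) * p)
    corrected = []
    for value in observed:
        if value == 1:
            corrected.append(1 if rng.random() < ppv else 0)
        else:
            corrected.append(0 if rng.random() < npv else 1)
    return corrected


if __name__ == "__main__":
    # Hypothetical simulation: true prevalence 0.30, se = 0.8, sp = 0.9.
    rng = random.Random(2021)
    truth = [1 if rng.random() < 0.3 else 0 for _ in range(100_000)]
    observed = misclassify(truth, se=0.8, sp=0.9, rng=rng)
    print("observed prevalence:", sum(observed) / len(observed))
    print("corrected prevalence:", corrected_prevalence(observed, 0.8, 0.9))
    fixed = reclassify(observed, 0.8, 0.9, rng)
    print("reclassified prevalence:", sum(fixed) / len(fixed))
```

With these settings, the expected observed prevalence is 0.8 × 0.30 + 0.1 × 0.70 = 0.31. The back-calculated and reclassified prevalences should both land close to 0.30. Repeating the reclassification many times, and refitting the model each time, spreads the uncertainty in the bias parameters into the downstream performance and variable importance estimates.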

Keywords:  machine learning; measurement error; misclassification; noise; quantitative bias analysis; random forests

Year:  2021        PMID: 33517416      PMCID: PMC8408353          DOI: 10.1093/aje/kwab010

Source DB:  PubMed          Journal:  Am J Epidemiol        ISSN: 0002-9262            Impact factor:   5.363


  7 in total

1.  Predictive models of pregnancy based on data from a preconception cohort study.

Authors:  Jennifer J Yland; Taiyao Wang; Zahra Zad; Sydney K Willis; Tanran R Wang; Amelia K Wesselink; Tammy Jiang; Elizabeth E Hatch; Lauren A Wise; Ioannis Ch Paschalidis
Journal:  Hum Reprod       Date:  2022-03-01       Impact factor: 6.918

2.  Jiang et al. Respond to "Quantitative Bias Analysis".

Authors:  Tammy Jiang; Jaimie L Gradus; Timothy L Lash; Matthew P Fox
Journal:  Am J Epidemiol       Date:  2021-09-01       Impact factor: 4.897

3.  Suicide prediction among men and women with depression: A population-based study.

Authors:  Tammy Jiang; Dávid Nagy; Anthony J Rosellini; Erzsébet Horváth-Puhó; Katherine M Keyes; Timothy L Lash; Sandro Galea; Henrik T Sørensen; Jaimie L Gradus
Journal:  J Psychiatr Res       Date:  2021-08-11       Impact factor: 5.250

4.  Using machine learning to predict suicide in the 30 days after discharge from psychiatric hospital in Denmark.

Authors:  Tammy Jiang; Anthony J Rosellini; Erzsébet Horváth-Puhó; Brian Shiner; Amy E Street; Timothy L Lash; Henrik T Sørensen; Jaimie L Gradus
Journal:  Br J Psychiatry       Date:  2021-08       Impact factor: 9.319

5.  Predicting Sex-Specific Nonfatal Suicide Attempt Risk Using Machine Learning and Data From Danish National Registries.

Authors:  Jaimie L Gradus; Anthony J Rosellini; Erzsébet Horváth-Puhó; Tammy Jiang; Amy E Street; Isaac Galatzer-Levy; Timothy L Lash; Henrik T Sørensen
Journal:  Am J Epidemiol       Date:  2021-12-01       Impact factor: 4.897

6.  Detection of child depression using machine learning methods.

Authors:  Umme Marzia Haque; Enamul Kabir; Rasheda Khanam
Journal:  PLoS One       Date:  2021-12-16       Impact factor: 3.240

7.  Timing errors and temporal uncertainty in clinical databases: A narrative review.

Authors:  Andrew J Goodwin; Danny Eytan; William Dixon; Sebastian D Goodfellow; Zakary Doherty; Robert W Greer; Alistair McEwan; Mark Tracy; Peter C Laussen; Azadeh Assadi; Mjaye Mazwi
Journal:  Front Digit Health       Date:  2022-08-18