
Comparing Machine Learning to Regression Methods for Mortality Prediction Using Veterans Affairs Electronic Health Record Clinical Data.

Bocheng Jing, W John Boscardin, W James Deardorff, Sun Young Jeon, Alexandra K Lee, Anne L Donovan, Sei J Lee.

Abstract

BACKGROUND: It is unclear whether machine learning methods yield more accurate electronic health record (EHR) prediction models compared with traditional regression methods.
OBJECTIVE: The objective of this study was to compare machine learning and traditional regression models for 10-year mortality prediction using EHR data.
DESIGN: This was a cohort study.
SETTING: Veterans Affairs (VA) EHR data.
PARTICIPANTS: Veterans aged over 50 years with a primary care visit in 2005, divided into separate training and testing cohorts (n = 124,360 each).
MEASUREMENTS AND ANALYTIC METHODS: The primary outcome was 10-year all-cause mortality. We considered 924 potential predictors across a wide range of EHR data elements, including demographics (3), vital signs (9), medication classes (399), disease diagnoses (293), laboratory results (71), and health care utilization (149). We compared discrimination (c-statistics), calibration metrics, and diagnostic test characteristics (sensitivity, specificity, and positive and negative predictive values) of machine learning and regression models.
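As an illustration of the discrimination and diagnostic test metrics named above, the following is a minimal Python sketch, not the authors' code. The function names, the toy data, and the 0.5 risk threshold are all hypothetical; the abstract does not report which cutoff or software was used. The c-statistic is computed here as the proportion of (died, survived) pairs in which the person who died received the higher predicted risk, with ties counted as one half.

```python
from typing import Dict, Sequence


def c_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Concordance (c) statistic for a binary outcome.

    y_true: 1 = died within the follow-up window, 0 = survived.
    y_score: predicted risk from any model (higher = more likely to die).
    """
    events = [s for y, s in zip(y_true, y_score) if y == 1]
    non_events = [s for y, s in zip(y_true, y_score) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0   # pair ranked correctly
            elif e == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(events) * len(non_events))


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_characteristics(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float
) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV at a chosen risk threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, y_score):
        predicted_death = s >= threshold
        if predicted_death and y == 1:
            tp += 1
        elif predicted_death and y == 0:
            fp += 1
        elif not predicted_death and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


if __name__ == "__main__":
    # Hypothetical toy data: eight people, four of whom died.
    outcomes = [1, 1, 1, 0, 0, 0, 1, 0]
    risks = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.5]
    print(c_statistic(outcomes, risks))  # 0.8125
    print(diagnostic_characteristics(outcomes, risks, threshold=0.5))
```

The pairwise loop is quadratic in cohort size, so for cohorts as large as the one described here a rank-based implementation from an established statistics library would be the practical choice; the loop is kept only because it mirrors the definition directly.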
RESULTS: The cohort's mean age (SD) was 68.2 (10.5) years; 93.9% were male, and 39.4% died within 10 years. Models yielded testing cohort c-statistics between 0.827 and 0.837. Utilizing all 924 predictors, the Gradient Boosting model yielded the highest c-statistic [0.837, 95% confidence interval (CI): 0.835-0.839]. The full (unselected) logistic regression model had the highest c-statistic of the regression models (0.833, 95% CI: 0.830-0.835) but showed evidence of overfitting. The discrimination of the stepwise selection logistic model (101 predictors) was similar (0.832, 95% CI: 0.830-0.834) with minimal overfitting. All models were well-calibrated and had similar diagnostic test characteristics.
LIMITATION: Our results should be confirmed in non-VA EHRs.
CONCLUSION: The difference in c-statistic between the best machine learning model (924-predictor Gradient Boosting) and the 101-predictor stepwise logistic model for 10-year mortality prediction was modest, suggesting that stepwise regression remains a reasonable approach for developing VA EHR mortality prediction models.
Copyright © 2022 Wolters Kluwer Health, Inc. All rights reserved.

Year: 2022; PMID: 35352701; PMCID: PMC9106858; DOI: 10.1097/MLR.0000000000001720

Source DB: PubMed; Journal: Med Care; ISSN: 0025-7079; Impact factor: 3.178


