Bocheng Jing1,2,3, W John Boscardin1,3,4, W James Deardorff3, Sun Young Jeon1,3, Alexandra K Lee1,3, Anne L Donovan5, Sei J Lee1,3. 1. San Francisco VA Health Care System. 2. Northern California Institute for Research and Education. 3. Division of Geriatrics. 4. Departments of Epidemiology and Biostatistics. 5. Anesthesia and Perioperative Medicine, University of California, San Francisco, San Francisco, CA.
Abstract
BACKGROUND: It is unclear whether machine learning methods yield more accurate electronic health record (EHR) prediction models than traditional regression methods. OBJECTIVE: The objective of this study was to compare machine learning and traditional regression models for 10-year mortality prediction using EHR data. DESIGN: This was a cohort study. SETTING: Veterans Affairs (VA) EHR data. PARTICIPANTS: Veterans aged over 50 with a primary care visit in 2005, divided into separate training and testing cohorts (n = 124,360 each). MEASUREMENTS AND ANALYTIC METHODS: The primary outcome was 10-year all-cause mortality. We considered 924 potential predictors across a wide range of EHR data elements, including demographics (3), vital signs (9), medication classes (399), disease diagnoses (293), laboratory results (71), and health care utilization (149). We compared discrimination (c-statistics), calibration metrics, and diagnostic test characteristics (sensitivity, specificity, and positive and negative predictive values) of machine learning and regression models. RESULTS: Mean cohort age (SD) was 68.2 (10.5) years; 93.9% were male, and 39.4% died within 10 years. Models yielded testing cohort c-statistics between 0.827 and 0.837. Utilizing all 924 predictors, the Gradient Boosting model yielded the highest c-statistic [0.837, 95% confidence interval (CI): 0.835-0.839]. The full (unselected) logistic regression model had the highest c-statistic among regression models (0.833, 95% CI: 0.830-0.835) but showed evidence of overfitting. The discrimination of the stepwise selection logistic model (101 predictors) was similar (0.832, 95% CI: 0.830-0.834) with minimal overfitting. All models were well-calibrated and had similar diagnostic test characteristics. LIMITATION: Our results should be confirmed in non-VA EHRs.
CONCLUSION: The difference in c-statistic between the best machine learning model (924-predictor Gradient Boosting) and the 101-predictor stepwise logistic model for 10-year mortality prediction was modest, suggesting that stepwise regression remains a reasonable method for developing VA EHR mortality prediction models.
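The core comparison in the abstract, discrimination (c-statistic, i.e., area under the ROC curve) of a gradient boosting model versus a logistic regression model evaluated on a held-out testing cohort, can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data; the feature counts, sample size, and models are stand-ins, not the authors' actual VA EHR pipeline, predictor set, or stepwise selection procedure.

```python
# Illustrative comparison of c-statistic (ROC AUC) for logistic regression
# vs. gradient boosting, with separate training and testing cohorts.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for an EHR cohort: 5,000 "patients", 50 candidate predictors.
X, y = make_classification(n_samples=5000, n_features=50, n_informative=10,
                           random_state=0)

# Split into equal-sized training and testing cohorts, mirroring the study design.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
gbm = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# c-statistic = area under the ROC curve, computed on the testing cohort only,
# using each model's predicted probability of the outcome.
auc_logit = roc_auc_score(y_test, logit.predict_proba(X_test)[:, 1])
auc_gbm = roc_auc_score(y_test, gbm.predict_proba(X_test)[:, 1])
print(f"logistic regression c-statistic: {auc_logit:.3f}")
print(f"gradient boosting c-statistic:   {auc_gbm:.3f}")
```

Comparing training-cohort versus testing-cohort c-statistics for the same model is one simple way to gauge the overfitting the abstract describes for the full logistic model: a large drop from training to testing discrimination signals overfitting.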