Premanand Tiwari, Kathryn L Colborn, Derek E Smith, Fuyong Xing, Debashis Ghosh, Michael A Rosenberg.
Abstract
Importance: Atrial fibrillation (AF) is the most common sustained cardiac arrhythmia, and its early detection could lead to significant improvements in outcomes through the appropriate prescription of anticoagulation medication. Although a variety of methods exist for screening for AF, a targeted approach, which requires an efficient method for identifying patients at risk, would be preferred.

Objective: To examine machine learning approaches applied to electronic health record data that have been harmonized to the Observational Medical Outcomes Partnership Common Data Model for identifying risk of AF.

Design, Setting, and Participants: This diagnostic study used data from 2 252 219 individuals cared for in the UCHealth hospital system, which comprises 3 large hospitals in Colorado, from January 1, 2011, to October 1, 2018. Initial analysis was performed in December 2018; follow-up analysis was performed in July 2019.

Exposures: All Observational Medical Outcomes Partnership Common Data Model-harmonized electronic health record features, including diagnoses, procedures, medications, age, and sex.

Main Outcomes and Measures: Classification of incident AF in designated 6-month intervals, adjudicated retrospectively, based on area under the receiver operating characteristic curve and F1 statistic.
Year: 2020 PMID: 31951272 PMCID: PMC6991266 DOI: 10.1001/jamanetworkopen.2019.19396
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
UCHealth Population by AF Diagnosis
| Characteristic | No AF (n = 2 224 183), No. (%) | 6-mo Incident AF (n = 28 036), No. (%) |
|---|---|---|
| Age, mean (SD), y | 42.86 (22.26) | 71.65 (16.47) |
| Women | 1 212 586 (54.51) | 12 919 (46.08) |
| Hypertension | 358 347 (16.11) | 13 349 (47.60) |
| Coronary artery disease | 64 183 (2.88) | 3830 (13.66) |
| Mitral valve disease | 23 192 (1.04) | 1974 (7.04) |
| Heart failure | 34 806 (1.56) | 2906 (10.36) |
| Diabetes | 126 941 (5.7) | 4780 (17.04) |
| Obesity | 123 564 (5.55) | 2715 (9.68) |
| Chronic kidney disease | 38 834 (1.74) | 2229 (7.95) |
Abbreviation: AF, atrial fibrillation.
Diagnoses were based on the presence of International Classification of Diseases, Ninth Revision (ICD-9) or ICD-10 codes, as follows: hypertension (ICD-9, 401.x; ICD-10, I10); coronary artery disease (ICD-9, 414.01; ICD-10, I25.1); mitral valve disease (ICD-9, 394.0 or 424.0; ICD-10, I34.2 or I34.0); heart failure (ICD-9, 428.0; ICD-10, I50.9); diabetes (ICD-9, 250.0; ICD-10, E11.9); obesity (ICD-9, 278.0; ICD-10, E66.9); and chronic kidney disease (ICD-9, 585.9; ICD-10, N18.9).
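The code-based definitions in this footnote can be illustrated with a minimal Python sketch. It is not the study's extraction pipeline: the names `COMORBIDITY_CODES`, `code_matches`, and `flag_comorbidities` are hypothetical, and treating `.x` as a wildcard for any subcode is an assumption about the footnote's notation.

```python
# Code patterns copied from the table footnote. Treating ".x" as "any subcode"
# is an assumption about the notation, not something the source states.
COMORBIDITY_CODES = {
    "hypertension": ["401.x", "I10"],
    "coronary artery disease": ["414.01", "I25.1"],
    "mitral valve disease": ["394.0", "424.0", "I34.2", "I34.0"],
    "heart failure": ["428.0", "I50.9"],
    "diabetes": ["250.0", "E11.9"],
    "obesity": ["278.0", "E66.9"],
    "chronic kidney disease": ["585.9", "N18.9"],
}


def code_matches(code, pattern):
    """Return True if a diagnosis code matches a footnote pattern."""
    code = code.strip().upper()
    pattern = pattern.upper()
    if pattern.endswith(".X"):
        base = pattern[:-2]
        # Wildcard: the bare category or any subcode beneath it.
        return code == base or code.startswith(base + ".")
    # All other patterns are matched exactly as written.
    return code == pattern


def flag_comorbidities(patient_codes):
    """Map each condition to True if any of the patient's codes matches it."""
    return {
        condition: any(
            code_matches(code, pattern)
            for code in patient_codes
            for pattern in patterns
        )
        for condition, patterns in COMORBIDITY_CODES.items()
    }


flags = flag_comorbidities(["401.9", "E11.9"])
```

Here `flags` marks hypertension and diabetes as present and every other condition as absent. Exact matching for the non-wildcard codes is the strictest reading of the footnote; a real pipeline might also accept more specific subcodes.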
Comparison of Resampling Strategies
| Strategy | F1 Score | AUC | Training Time, min |
|---|---|---|---|
| Oversampling | | | |
| Random | 0.101 | 0.800 | 17.1 |
| Synthetic minority oversampling technique | 0.090 | 0.786 | 22.3 |
| Undersampling | | | |
| Random | 0.099 | 0.808 | 5.4 |
| Cluster centroid | 0.062 | 0.743 | 50.8 |
| None | 0.002 | 0.500 | 10.2 |
Abbreviation: AUC, area under the receiver operating characteristic curve.
Resampling strategies were compared using the deep learning model.
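The two random strategies in the table can be sketched in plain Python. This is an illustrative sketch, not the implementation used in the study; the function names and the toy data are hypothetical, and the class ratio below is chosen only for readability.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement to fill the gap to the majority size.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample without replacement so no row is kept twice.
        out_rows.extend(rng.sample(pool, target))
        out_labels.extend([cls] * target)
    return out_rows, out_labels


# Toy data: 5 positive and 95 negative examples (hypothetical).
toy_rows = list(range(100))
toy_labels = [1] * 5 + [0] * 95

over_rows, over_labels = random_oversample(toy_rows, toy_labels)
under_rows, under_labels = random_undersample(toy_rows, toy_labels)
```

Oversampling leaves 95 examples per class and undersampling leaves 5 per class. Undersampling produces a much smaller training set, which is consistent with the shorter training time reported for it in the table. Resampling is normally applied to the training split only, so that test metrics reflect the true class balance.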
Comparison of Machine Learning Approaches
| Approach | F1 Score | AUC | Training Time, min |
|---|---|---|---|
| Naive Bayes | 0.059 | 0.647 | 1.2 |
| Logistic regression with L2 regularization | 0.088 | 0.806 | 66.2 |
| Random forest | 0.076 | 0.792 | 3826.8 |
| Neural network | | | |
| Shallow | 0.110 | 0.800 | 666.1 |
| Deep | 0.101 | 0.800 | 17.1 |
| Gradient boosted machine | 0.108 | 0.762 | 17 223.4 |
Abbreviation: AUC, area under the receiver operating characteristic curve.
All approaches used random oversampling and all features. F1 score and AUC were calculated by applying each model to the held-out testing set (20%); training time refers to fitting the model on the training set (80%).
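For readers unfamiliar with the two metrics, the following sketch computes them from scratch on a tiny made-up example. It is not the study's evaluation code: the function names and the four-patient data are hypothetical, and the pairwise AUC shown here is quadratic in sample size, so it suits illustration rather than millions of records.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels, model scores, and thresholded predictions.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [0, 1, 1, 1]

example_f1 = f1_score(y_true, y_pred)
example_auc = roc_auc(y_true, scores)

# A model that scores every patient identically cannot rank cases above
# non-cases, so its AUC is exactly 0.5.
constant_auc = roc_auc(y_true, [0.0, 0.0, 0.0, 0.0])
```

AUC depends only on how the scores rank patients, whereas F1 depends on the chosen decision threshold, which is one reason the two columns in the table do not move together.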
Figure. Precision-Recall Curve and Area Under the Receiver Operating Characteristic Curve for the Optimal Model
A, Because of the low incidence of atrial fibrillation systemwide, most decision thresholds did not have a high precision (positive predictive value). See text for details. B, Area under the receiver operating characteristic curve was 0.80.
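The link between low incidence and low positive predictive value in panel A follows from Bayes' rule. The sketch below uses the case counts from the population table; the sensitivity and specificity pairs are hypothetical operating points chosen for illustration, not results from the study.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV = true positives / all positives, expressed per unit of population."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        return 0.0
    return true_pos / (true_pos + false_pos)


# Counts taken from the population table.
AF_CASES = 28_036
TOTAL_PATIENTS = 2_252_219
prevalence = AF_CASES / TOTAL_PATIENTS

# Hypothetical (sensitivity, specificity) operating points, not study results.
operating_points = [(0.8, 0.8), (0.5, 0.95), (0.2, 0.99)]

ppv_by_point = {
    point: positive_predictive_value(point[0], point[1], prevalence)
    for point in operating_points
}

for (sens, spec), ppv in ppv_by_point.items():
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.3f}")
```

With roughly 1.2% of patients having incident AF, a threshold with 80% sensitivity and 80% specificity yields a positive predictive value of only about 5%, because false positives from the large unaffected group outnumber true positives. Raising specificity improves PPV but at the cost of sensitivity.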