Jeph Herrin (1), Neena S Abraham (2,3,4), Xiaoxi Yao (3,4,5), Peter A Noseworthy (3,4,5), Jonathan Inselman (3,4), Nilay D Shah (3,4,6), Che Ngufor (3,4,7).
Abstract
Importance: Anticipating the risk of gastrointestinal bleeding (GIB) when initiating antithrombotic treatment (oral antiplatelets or anticoagulants) is limited by existing risk prediction models. Machine learning algorithms may result in superior predictive models to aid in clinical decision-making.

Objective: To compare the performance of 3 machine learning approaches with the commonly used HAS-BLED (hypertension, abnormal kidney and liver function, stroke, bleeding, labile international normalized ratio, older age, and drug or alcohol use) risk score in predicting antithrombotic-related GIB.

Design, Setting, and Participants: This retrospective cross-sectional study used data from the OptumLabs Data Warehouse, which contains medical and pharmacy claims for privately insured patients and Medicare Advantage enrollees in the US. The study cohort included patients 18 years or older with a history of atrial fibrillation, ischemic heart disease, or venous thromboembolism who were prescribed oral anticoagulant and/or thienopyridine antiplatelet agents between January 1, 2016, and December 31, 2019.

Exposures: A cohort of patients prescribed oral anticoagulant and/or thienopyridine antiplatelet agents was divided into development and validation cohorts based on the date of the index prescription. The development cohort was used to train 3 machine learning models to predict GIB at 6 and 12 months: regularized Cox proportional hazards regression (RegCox), random survival forests (RSF), and extreme gradient boosting (XGBoost).

Main Outcomes and Measures: The performance of the models for predicting GIB in the validation cohort was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, and prediction density plots. Relative importance scores were used to identify the variables that were most influential in the top-performing machine learning model.
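The threshold-based measures listed under Main Outcomes and Measures (sensitivity, specificity, positive and negative predictive value) all come from one 2 × 2 table of predicted versus observed GIB. The sketch below is a minimal illustration, assuming a binary outcome at a fixed horizon (for example, 12 months) with complete follow-up; the study's survival setting also has to account for censoring, which this sketch ignores. The function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdMetrics:
    sensitivity: float  # TP / (TP + FN): share of bleeders flagged as high risk
    specificity: float  # TN / (TN + FP): share of non-bleeders flagged as low risk
    ppv: float          # TP / (TP + FP): share of flagged patients who bled
    npv: float          # TN / (TN + FN): share of unflagged patients who did not bleed


def _ratio(num: int, den: int) -> float:
    # Return NaN instead of raising when a denominator is empty.
    return num / den if den else float("nan")


def threshold_metrics(risk: list[float], bled: list[bool], cutoff: float) -> ThresholdMetrics:
    """Flag a patient as high risk when predicted risk >= cutoff (assumed rule)."""
    tp = fp = tn = fn = 0
    for r, y in zip(risk, bled):
        flagged = r >= cutoff
        if flagged and y:
            tp += 1
        elif flagged:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return ThresholdMetrics(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
        npv=_ratio(tn, tn + fn),
    )


if __name__ == "__main__":
    # Toy data, not from the study: 5 patients, 2 of whom bled.
    m = threshold_metrics([0.01, 0.05, 0.02, 0.08, 0.03],
                          [False, True, False, False, True], cutoff=0.03)
    print(m)
```

With an event rate as low as this cohort's (12 322 bleeds among 306 463 patients, about 4%), PPV stays small even when sensitivity and specificity are moderate, which is in line with the PPVs of 0.02 to 0.06 reported in the evaluation table.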
Year: 2021 PMID: 34019087 PMCID: PMC8140376 DOI: 10.1001/jamanetworkopen.2021.10703
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
Characteristics of the Patients Included in Study
| Characteristic | No GI bleed (n = 294 141), No. (%) | GI bleed (n = 12 322), No. (%) | All (N = 306 463), No. (%) |
|---|---|---|---|
| Condition group | |||
| Atrial fibrillation | 46 716 (15.9) | 1689 (13.7) | 48 405 (15.8) |
| Atrial fibrillation and ischemic heart disease | 68 212 (23.2) | 3399 (27.6) | 71 611 (23.4) |
| Atrial fibrillation, ischemic heart disease, and venous thromboembolism | 9015 (3.1) | 597 (4.8) | 9612 (3.1) |
| Ischemic heart disease | 113 136 (38.5) | 4548 (36.9) | 117 684 (38.4) |
| Ischemic heart disease and venous thromboembolism | 18 423 (6.3) | 938 (7.6) | 19 361 (6.3) |
| Venous thromboembolism | 34 616 (11.8) | 972 (7.9) | 35 588 (11.6) |
| Venous thromboembolism and atrial fibrillation | 4023 (1.4) | 179 (1.5) | 4202 (1.4) |
| Age, mean (SD), y | 68.9 (12.7) | 71.2 (11.2) | 69.0 (12.6) |
| Age group, y | |||
| 18-64 | 94 600 (32.2) | 2871 (23.3) | 97 471 (31.8) |
| 65-74 | 92 352 (31.4) | 4239 (34.4) | 96 591 (31.5) |
| 75-86 | 107 189 (36.4) | 5212 (42.3) | 112 401 (36.7) |
| Race/ethnicity | |||
| White | 185 989 (63.2) | 7659 (62.2) | 193 648 (63.2) |
| Black | 31 757 (10.8) | 1552 (12.6) | 33 309 (10.9) |
| Other | 76 395 (26.0) | 3111 (25.2) | 79 506 (25.9) |
| Sex | |||
| Female | 133 703 (45.5) | 6583 (53.4) | 140 286 (45.8) |
| Male | 160 438 (54.5) | 5739 (46.6) | 166 177 (54.2) |
| Treatment group | |||
| Anticoagulants | 166 209 (56.5) | 6618 (53.7) | 172 827 (56.4) |
| Antiplatelets | 123 180 (41.9) | 5519 (44.8) | 128 699 (42.0) |
| Anticoagulants and antiplatelets | 4752 (1.6) | 185 (1.5) | 4937 (1.6) |
| Baseline comorbidities | |||
| Diabetes | 122 796 (41.7) | 5934 (48.2) | 128 730 (42.0) |
| Hypertension | 258 738 (88.0) | 11 546 (93.7) | 270 284 (88.2) |
| Peripheral arterial disease | 45 194 (15.4) | 2377 (19.3) | 47 571 (15.5) |
| Alcoholism | 18 938 (6.4) | 975 (7.9) | 19 913 (6.5) |
| Chronic kidney failure | 13 615 (4.6) | 922 (7.5) | 14 537 (4.7) |
| Chronic liver disease | 30 373 (10.3) | 1708 (13.9) | 32 081 (10.5) |
| Rheumatologic disease | 21 033 (7.2) | 1199 (9.7) | 22 232 (7.3) |
| Carotid revascularization | 13 255 (4.5) | 640 (5.2) | 13 895 (4.5) |
|  | 5467 (1.9) | 456 (3.7) | 5923 (1.9) |
| History of GI bleeding | 66 361 (22.6) | 5217 (42.3) | 71 578 (23.4) |
| Smoking | 134 235 (45.6) | 6261 (50.8) | 140 496 (45.8) |
| Sleep apnea | 37 322 (12.7) | 1830 (14.9) | 39 152 (12.8) |
| Thyroid disease | 84 826 (28.8) | 4049 (32.9) | 88 875 (29.0) |
| Valvular heart disease | 128 578 (43.7) | 6101 (49.5) | 134 679 (43.9) |
| Viral hepatitis | 6867 (2.3) | 386 (3.1) | 7253 (2.4) |
| Percutaneous coronary intervention | 21 485 (7.3) | 1567 (12.7) | 23 052 (7.5) |
| Charlson comorbidities | |||
| Cerebrovascular disease | 66 909 (22.7) | 3296 (26.7) | 70 205 (22.9) |
| Dementia | 17 676 (6.0) | 755 (6.1) | 18 431 (6.0) |
| Hemiplegia | 12 135 (4.1) | 571 (4.6) | 12 706 (4.1) |
| AIDS | 805 (0.3) | 35 (0.3) | 840 (0.3) |
| Medications | |||
| Antiarrhythmic drugs | 23 640 (8.0) | 1041 (8.4) | 24 681 (8.1) |
| Antihyperlipidemic drugs | 170 733 (58.0) | 7592 (61.6) | 178 325 (58.2) |
| Gastroprotective agents, proton pump inhibitors, and/or histamine 2 blockers | 76 972 (26.2) | 5006 (40.6) | 81 978 (26.7) |
| Selective serotonin reuptake inhibitors | 38 368 (13.0) | 1962 (15.9) | 40 330 (13.2) |
| Nonsteroidal anti-inflammatory drugs | 47 964 (16.3) | 1926 (15.6) | 49 890 (16.3) |
| Antihypertensive drugs | 233 458 (79.4) | 10 492 (85.1) | 243 950 (79.6) |
Abbreviation: GI, gastrointestinal.
Other race/ethnicity includes Asian, Hispanic, and unknown.
Charlson comorbidities are defined by the Charlson Comorbidity Index.
Model Evaluation and Validation
| Model | Month | AUC (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | PPV (95% CI) | NPV (95% CI) | Balanced accuracy (95% CI) | Cutoff |
|---|---|---|---|---|---|---|---|---|
| Development cohort | 6 | 0.68 (0.66-0.69) | 0.45 (0.03-0.97) | 0.71 (0.11-0.99) | 0.06 (0.03-0.11) | 0.98 (0.97-0.99) | 0.58 (0.51-0.64) | 0.02 (0.02-0.02) |
|  | 12 | 0.67 (0.65-0.68) | 0.65 (0.21-0.97) | 0.52 (0.10-0.90) | 0.06 (0.04-0.08) | 0.98 (0.97-0.99) | 0.59 (0.53-0.61) | 0.03 (0.03-0.03) |
| Validation cohort | 6 | 0.67 | 0.26 | 0.90 | 0.05 | 0.98 | 0.58 | 0.02 |
|  | 12 | 0.66 | 0.59 | 0.66 | 0.04 | 0.99 | 0.62 | 0.03 |
| Development cohort | 6 | 0.66 (0.64-0.69) | 0.58 (0.42-0.72) | 0.65 (0.51-0.80) | 0.05 (0.04-0.06) | 0.98 (0.98-0.98) | 0.62 (0.61-0.64) | 0.04 (0.03-0.04) |
|  | 12 | 0.65 (0.62-0.67) | 0.62 (0.55-0.67) | 0.60 (0.54-0.64) | 0.06 (0.05-0.06) | 0.98 (0.97-0.98) | 0.61 (0.59-0.62) | 0.06 (0.05-0.07) |
| Validation cohort | 6 | 0.62 | 0.53 | 0.65 | 0.03 | 0.99 | 0.59 | 0.04 |
|  | 12 | 0.60 | 0.58 | 0.58 | 0.03 | 0.98 | 0.58 | 0.06 |
| Development cohort | 6 | 0.68 (0.66-0.70) | 0.67 (0.64-0.70) | 0.59 (0.58-0.59) | 0.04 (0.04-0.05) | 0.98 (0.98-0.99) | 0.63 (0.62-0.64) | 0.03 (0.03-0.03) |
|  | 12 | 0.67 (0.65-0.69) | 0.67 (0.65-0.70) | 0.56 (0.55-0.57) | 0.06 (0.05-0.06) | 0.98 (0.97-0.98) | 0.62 (0.61-0.63) | 0.05 (0.05-0.05) |
| Validation cohort | 6 | 0.67 | 0.69 | 0.57 | 0.03 | 0.99 | 0.63 | 0.03 |
|  | 12 | 0.66 | 0.70 | 0.54 | 0.03 | 0.99 | 0.62 | 0.05 |
| Development cohort | 6 | 0.61 (0.59-0.62) | 0.54 (0.51-0.56) | 0.62 (0.62-0.63) | 0.03 (0.03-0.04) | 0.98 (0.98-0.99) | 0.58 (0.56-0.59) | 3 |
|  | 12 | 0.60 (0.59-0.61) | 0.53 (0.51-0.54) | 0.62 (0.62-0.63) | 0.05 (0.04-0.05) | 0.98 (0.97-0.98) | 0.57 (0.56-0.58) | 3 |
| Validation cohort | 6 | 0.60 | 0.57 | 0.58 | 0.02 | 0.99 | 0.56 | 3 |
|  | 12 | 0.59 | 0.56 | 0.58 | 0.03 | 0.99 | 0.56 | 3 |
Abbreviations: AUC, area under the receiver operating characteristic curve; HAS-BLED, hypertension, abnormal kidney or liver function, stroke, history of and factors associated with presence of bleeding, labile international normalized ratio, older age (>65 years), use of drugs or alcohol concomitantly; NPV, negative predictive value; PPV, positive predictive value; RegCox, regularized Cox proportional hazards regression; RSF, random survival forests; XGBoost, extreme gradient boosting.
The balanced accuracy is the arithmetic mean of sensitivity and specificity. It is designed to judge the performance of a classifier better than simple accuracy (the percentage of correctly classified observations), especially when the classes are highly imbalanced.
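To see why balanced accuracy matters with GIB as the outcome, consider a trivial rule that labels every patient "no bleed". Using the class sizes from the patient characteristics table (12 322 bleeds among 306 463 patients), this minimal sketch shows simple accuracy near 0.96 while balanced accuracy is 0.5, the value expected from a rule with no discriminative ability. The function names are illustrative only.

```python
def simple_accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    # Share of all patients classified correctly.
    return (tp + tn) / (tp + fp + tn + fn)


def balanced_accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    # Arithmetic mean of sensitivity and specificity.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Confusion-matrix counts for a rule that predicts "no bleed" for everyone.
ALL_NEGATIVE = dict(tp=0, fp=0, tn=294_141, fn=12_322)

if __name__ == "__main__":
    print(round(simple_accuracy(**ALL_NEGATIVE), 3))  # prints 0.96
    print(balanced_accuracy(**ALL_NEGATIVE))          # prints 0.5
```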
The cutoff is the classification threshold that minimizes the distance between the receiver operating characteristic curve and the upper left corner of the graph, ie, the point (0, 1).
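The cutoff rule can be sketched as a scan over candidate thresholds. Each threshold yields one ROC point (1 - specificity, sensitivity), and the chosen cutoff is the one whose point lies closest, in Euclidean distance, to (0, 1). This is a minimal quadratic-time illustration on toy data, assuming a binary outcome at a fixed horizon and using the observed predicted risks as the candidate thresholds; it is not the authors' code, and the function name is hypothetical.

```python
import math


def closest_to_top_left(risk: list[float], bled: list[bool]) -> tuple[float, float]:
    """Return (cutoff, distance) for the threshold whose ROC point is nearest (0, 1)."""
    n_pos = sum(1 for y in bled if y)
    n_neg = len(bled) - n_pos
    best_cutoff, best_dist = math.nan, math.inf
    for cutoff in sorted(set(risk)):
        # Patients with risk >= cutoff are classified as high risk.
        tp = sum(1 for r, y in zip(risk, bled) if r >= cutoff and y)
        tn = sum(1 for r, y in zip(risk, bled) if r < cutoff and not y)
        sensitivity = tp / n_pos
        specificity = tn / n_neg
        # Distance from ROC point (1 - specificity, sensitivity) to (0, 1).
        dist = math.hypot(1 - specificity, 1 - sensitivity)
        if dist < best_dist:
            best_cutoff, best_dist = cutoff, dist
    return best_cutoff, best_dist


if __name__ == "__main__":
    risk = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06]
    bled = [False, False, False, True, False, True]
    print(closest_to_top_left(risk, bled))
```

A common alternative is Youden's J (maximizing sensitivity + specificity - 1); the two criteria often, but not always, select the same threshold.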
Because there was only 1 validation data set, results for the validation cohort are expressed as point estimates without 95% CIs.
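The integer cutoff of 3 in the last rows of the evaluation table is consistent with a points score such as HAS-BLED, whose components are spelled out in the abbreviation list. As a sketch, the function below uses the commonly published allocation of 1 point per item, scoring abnormal renal and abnormal liver function separately and drugs and alcohol separately (range 0-9). How each item would be derived from claims data is an assumption here, and the `HasBledInputs` and `has_bled_score` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HasBledInputs:
    hypertension: bool
    abnormal_renal_function: bool
    abnormal_liver_function: bool
    stroke: bool
    bleeding_history_or_predisposition: bool
    labile_inr: bool
    age_years: int
    drugs: bool    # eg, concomitant antiplatelet or NSAID use (assumed definition)
    alcohol: bool


def has_bled_score(p: HasBledInputs) -> int:
    """Sum 1 point per item present; possible range 0-9."""
    items = (
        p.hypertension,
        p.abnormal_renal_function,
        p.abnormal_liver_function,
        p.stroke,
        p.bleeding_history_or_predisposition,
        p.labile_inr,
        p.age_years > 65,  # "older age (>65 years)"
        p.drugs,
        p.alcohol,
    )
    return sum(int(x) for x in items)


def is_high_risk(score: int, cutoff: int = 3) -> bool:
    # Scores at or above the cutoff are classified as high bleeding risk.
    return score >= cutoff


if __name__ == "__main__":
    # Hypothetical 70-year-old with hypertension, prior bleeding, and antiplatelet use.
    patient = HasBledInputs(True, False, False, False, True, False, 70, True, False)
    score = has_bled_score(patient)
    print(score, is_high_risk(score))
```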
Figure 1. Performance Plots for Predicting Gastrointestinal Bleeding at 12 Months in the Development Cohort
The Gini score was computed by dividing the area between the gain curve and the random classifier (dotted diagonal line) by the area between the perfect classifier and the random classifier. AUC indicates area under the ROC curve; HAS-BLED, hypertension, abnormal kidney or liver function, stroke, history of and factors associated with presence of bleeding, labile international normalized ratio, older age (>65 years), use of drugs or alcohol concomitantly; RegCox, regularized Cox proportional hazards regression; RSF, random survival forests; XGBoost, extreme gradient boosting.
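The Gini score described in the caption can be computed from a gain curve built by ranking patients from highest to lowest predicted risk and tracking the cumulative share of bleeders captured. In this sketch (toy data, hypothetical function names, distinct risk values assumed so ties are not handled in the gain curve), the perfect classifier's curve rises to 1 at x = p, where p is the event rate, so the area between it and the random diagonal is (1 - p)/2. A pairwise-concordance AUC is included as a cross-check: for an untied ranking, this gain-curve Gini equals 2 × AUC - 1.

```python
def gain_curve(risk: list[float], bled: list[int]) -> tuple[list[float], list[float]]:
    """Cumulative share of bleeders captured vs share of patients targeted."""
    order = sorted(range(len(risk)), key=lambda i: risk[i], reverse=True)
    n, n_pos = len(risk), sum(bled)
    xs, ys, captured = [0.0], [0.0], 0
    for k, i in enumerate(order, start=1):
        captured += bled[i]
        xs.append(k / n)
        ys.append(captured / n_pos)
    return xs, ys


def trapezoid_area(xs: list[float], ys: list[float]) -> float:
    return sum((xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2 for k in range(1, len(xs)))


def gini_from_gain(risk: list[float], bled: list[int]) -> float:
    xs, ys = gain_curve(risk, bled)
    p = sum(bled) / len(bled)                       # event rate
    model_vs_random = trapezoid_area(xs, ys) - 0.5  # random diagonal has area 0.5
    perfect_vs_random = (1 - p) / 2                 # perfect curve has area 1 - p/2
    return model_vs_random / perfect_vs_random


def roc_auc(risk: list[float], bled: list[int]) -> float:
    """Probability that a random bleeder is ranked above a random non-bleeder."""
    pos = [r for r, y in zip(risk, bled) if y]
    neg = [r for r, y in zip(risk, bled) if not y]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    risk = [0.9, 0.8, 0.7, 0.6, 0.5]
    bled = [1, 0, 1, 0, 0]
    print(round(gini_from_gain(risk, bled), 4), round(2 * roc_auc(risk, bled) - 1, 4))
    # prints 0.6667 0.6667
```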
Figure 2. Survival Prediction Density by Gastrointestinal (GI) Bleed Status at 12 Months for HAS-BLED and Regularized Cox Proportional Hazards Regression (RegCox) Models
HAS-BLED indicates hypertension, abnormal kidney or liver function, stroke, history of and factors associated with presence of bleeding, labile international normalized ratio, older age (>65 years), use of drugs or alcohol concomitantly.
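For a Cox-type model such as RegCox, a patient's predicted 12-month survival (freedom from GIB) is typically obtained as S(12 | x) = S0(12)^exp(β·x), where S0 is the baseline survival function; a density plot like Figure 2 then compares the distribution of these predictions between patients who did and did not bleed. The sketch below assumes a known baseline survival and uses made-up coefficients and covariate names, not estimates from the study.

```python
import math
from collections import defaultdict


def cox_survival(baseline_survival: float, betas: dict[str, float], x: dict[str, float]) -> float:
    """S(t | x) = S0(t) ** exp(sum_j beta_j * x_j); covariates absent from x count as 0."""
    linear_predictor = sum(beta * x.get(name, 0.0) for name, beta in betas.items())
    return baseline_survival ** math.exp(linear_predictor)


def predictions_by_status(baseline_survival: float, betas: dict[str, float],
                          patients: list[dict[str, float]], bled: list[bool]) -> dict[bool, list[float]]:
    """Group predicted survival probabilities by observed bleed status (input for a density plot)."""
    groups: dict[bool, list[float]] = defaultdict(list)
    for x, y in zip(patients, bled):
        groups[bool(y)].append(cox_survival(baseline_survival, betas, x))
    return dict(groups)


# Hypothetical values for illustration only.
BETAS = {"history_of_gib": math.log(2.0), "ppi_or_h2": 0.3}
S0_12_MONTHS = 0.98

if __name__ == "__main__":
    patients = [{}, {"history_of_gib": 1.0}, {"history_of_gib": 1.0, "ppi_or_h2": 1.0}]
    print(predictions_by_status(S0_12_MONTHS, BETAS, patients, [False, True, True]))
```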
Figure 3. Importance Scores for Factors Included in the Regularized Cox Proportional Hazards Regression (RegCox) Machine Learning Model
The importance scores (β coefficients) from the 12-month RegCox model for all factors are shown. AC indicates anticoagulants; AFIB, atrial fibrillation; AP, antiplatelets; CRF, chronic renal failure; GIB, gastrointestinal bleeding; H2, histamine 2 blocker; IHD, ischemic heart disease; PAD, peripheral arterial disease; PCI, percutaneous coronary intervention; PPI, proton pump inhibitor; SSRI, selective serotonin reuptake inhibitor; VTE, venous thromboembolism.
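Because the importance scores are β coefficients from a Cox model, each is a log hazard ratio: exp(β) is the multiplicative change in GIB hazard for a one-unit change in that factor, with the other factors held fixed. The minimal sketch below ranks factors by |β| and reports the corresponding hazard ratios. The factor names and values are hypothetical, ranking by magnitude is only meaningful when factors share a scale (for example, all binary indicators), and with regularization the coefficients are shrunk toward 0, so they are best read as relative rather than unbiased effect sizes.

```python
import math


def rank_factors(betas: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (factor, beta, hazard_ratio) tuples sorted by |beta|, largest first."""
    rows = [(name, beta, math.exp(beta)) for name, beta in betas.items()]
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical coefficients, not the study's estimates.
    example = {"factor_a": 0.60, "factor_b": -0.25, "factor_c": 0.10}
    for name, beta, hr in rank_factors(example):
        print(f"{name}: beta={beta:+.2f}, HR={hr:.2f}")
```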