Girish Dwivedi, Frank M Sanfilippo, Juan Lu, Ling Wang, Mohammed Bennamoun, Isaac Ward, Senjian An, Ferdous Sohel, Benjamin J W Chow.
Abstract
Our aim was to investigate the usefulness of machine learning approaches on linked administrative health data at the population level in predicting older patients' one-year risk of acute coronary syndrome (ACS) and death following the use of non-steroidal anti-inflammatory drugs (NSAIDs). Patients from a Western Australian cardiovascular population who were supplied with NSAIDs between 1 Jan 2003 and 31 Dec 2004 were identified from Pharmaceutical Benefits Scheme data. Comorbidities from linked hospital admissions data and medication history were inputs. Admissions for acute coronary syndrome or death within one year from the first supply date were outputs. Machine learning classification methods were used to build models to predict ACS and death. Model performance was measured by the area under the receiver operating characteristic curve (AUC-ROC), sensitivity and specificity. There were 68,889 patients in the NSAIDs cohort, with a mean age of 76 years; 54% were female. 1882 patients were admitted for acute coronary syndrome and 5405 patients died within one year after their first supply of NSAIDs. The multi-layer neural network, gradient boosting machine and support vector machine were applied to build various classification models. The gradient boosting machine achieved the best performance, with an average AUC-ROC of 0.72 predicting ACS and 0.84 predicting death. Machine learning models applied to linked administrative data can potentially improve adverse outcome risk prediction. Further investigation of additional data and approaches is required to improve the performance of adverse outcome risk prediction.
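The three performance measures named in the abstract can be illustrated with a minimal sketch. This is not the study's code: the labels, scores and the 0.5 decision threshold below are hypothetical toy values, and the AUC-ROC is computed with the pairwise (Mann-Whitney) formulation rather than any particular library routine.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= threshold count as positive."""
    tp = fn = tn = fp = 0
    for label, score in zip(y_true, y_score):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc_roc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = adverse outcome within one year, 0 = no outcome.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70, 0.55, 0.60]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
auc = auc_roc(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC-ROC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC-ROC summarises ranking quality across all thresholds, which is why the record reports all three.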
Year: 2021 PMID: 34526544 PMCID: PMC8443580 DOI: 10.1038/s41598-021-97643-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The machine learning workflow and contribution of our study. The figure was created using Microsoft PowerPoint 365, available from: https://office.microsoft.com/PowerPoint.
Figure 2. Timeline for study cohort showing history, exposure and follow-up periods. The first supply date for the COX-2 inhibitors or ibuprofen within years 2003 and 2004 was defined as . The figure was created using Microsoft Visio 365, available from: https://products.office.com/en/visio/flowchart-software.
Figure 3. Flowchart showing identification of the study cohort. ACS, acute coronary syndrome. The figure was created using Microsoft PowerPoint 365, available from: https://office.microsoft.com/PowerPoint.
Characteristics of the study cohorts for NSAIDs during 2003–2004.
| Features | Total cohort, No. of patients (%), n = 68 889 | ACS, No. of patients (%), n = 2757 (4.0%) | p-value (ACS) | All-cause death, No. of patients (%), n = 5405 (7.8%) | p-value (all-cause death) |
|---|---|---|---|---|---|
| Age (years, mean[SD]) | 76.0 (7.2) | 78.8 (8.0) | < 0.0001 | 80.9 (7.8) | < 0.0001 |
| Female | 37 389 (54.3) | 1324 (48.0) | < 0.0001 | 2733 (54.3) | < 0.0001 |
| Celecoxib | 29 774 (43.2) | 1204 (43.7) | 0.6 | 2373 (43.9) | 0.3 |
| Rofecoxib | 24 432 (35.5) | 953 (34.6) | 0.3 | 1851 (34.3) | 0.051 |
| Indometacin | 4088 (5.9) | 180 (6.5) | 0.2 | 370 (6.9) | 0.004 |
| Sulindac | 286 (0.4) | 11 (0.4) | 0.9 | 29 (0.5) | 0.1 |
| Diclofenac | 11 660 (16.9) | 422 (15.3) | 0.02 | 705 (13.0) | < 0.0001 |
| Diclofenac, combinations | 18 (0.03) | < 5 | < 0.0001 | < 5 | 0.2 |
| Piroxicam | 3414 (5.0) | 130 (4.7) | 0.6 | 203 (3.8) | < 0.0001 |
| Meloxicam | 12 982 (18.8) | 434 (15.7) | < 0.0001 | 686 (12.7) | < 0.0001 |
| Ibuprofen | 4305 (6.3) | 161 (5.8) | 0.4 | 319 (5.9) | 0.3 |
| Naproxen | 5883 (8.5) | 214 (7.8) | 0.1 | 435 (8.1) | 0.2 |
| Ketoprofen | 1951 (2.8) | 65 (2.4) | 0.1 | 122 (2.3) | 0.008 |
| Tiaprofenic acid | 398 (0.6) | 10 (0.4) | 0.1 | 34 (0.6) | 0.6 |
| Fenamates | 46 (0.07) | < 5 | 0.9 | < 5 | 0.4 |
| Ischemic heart disease | 14 445 (21.0) | 1031 (37.4) | < 0.0001 | 1150 (21.3) | 0.6 |
| Hypertension | 1091 (1.6) | 52 (1.9) | 0.2 | 113 (2.1) | 0.002 |
| Atrial fibrillation | 3468 (5.0) | 198 (7.2) | < 0.0001 | 390 (7.2) | < 0.0001 |
| Diabetes | 3779 (5.5) | 186 (6.8) | 0.003 | 409 (7.6) | < 0.0001 |
| COPD | 3659 (5.3) | 255 (9.3) | < 0.0001 | 624 (11.5) | < 0.0001 |
| PVD | 3226 (4.7) | 277 (10.1) | < 0.0001 | 474 (8.8) | < 0.0001 |
| Stroke | 2545 (3.7) | 164 (6.0) | < 0.0001 | 425 (7.9) | < 0.0001 |
| Chronic kidney disease | 1337 (1.9) | 78 (2.8) | < 0.0001 | 159 (2.9) | < 0.0001 |
| Cancer | 3049 (4.4) | 98 (3.6) | 0.02 | 747 (13.8) | < 0.0001 |
| Dementia | 386 (0.6) | 27 (1.0) | 0.003 | 140 (2.6) | < 0.0001 |
| Depression | 831 (1.2) | 43 (1.6) | 0.08 | 94 (1.7) | 0.0002 |
| Heart failure | 2474 (3.6) | 335 (12.2) | < 0.0001 | 629 (11.6) | < 0.0001 |
| Cardiomyopathy | 136 (0.2) | 9 (0.3) | 0.1 | 19 (0.4) | 0.008 |
| Drug group A | 42 495 (61.7) | 1868 (67.8) | < 0.0001 | 4032 (74.6) | < 0.0001 |
| Drug group B | 22 848 (33.2) | 1214 (44.0) | < 0.0001 | 2352 (43.5) | < 0.0001 |
| Drug group C | 57 953 (84.1) | 2389 (86.7) | 0.0002 | 4496 (83.2) | 0.05 |
| Drug group D | 13 710 (19.9) | 546 (19.8) | 0.9 | 1128 (20.9) | 0.06 |
| Drug group G | 6924 (10.1) | 243 (8.8) | 0.03 | 536 (9.9) | 0.7 |
| Drug group H | 12 275 (17.8) | 520 (19.6) | 0.01 | 1494 (27.6) | < 0.0001 |
| Drug group J | 30 857 (44.8) | 1336 (48.5) | < 0.0001 | 3151 (58.3) | < 0.0001 |
| Drug group L | 3192 (4.6) | 131 (4.8) | 0.8 | 585 (10.8) | < 0.0001 |
| Drug group M | 29 698 (43.1) | 1152 (41.8) | 0.2 | 1935 (35.8) | < 0.0001 |
| Drug group N | 46 410 (67.4) | 2075 (75.3) | < 0.0001 | 4517 (83.6) | < 0.0001 |
| Drug group P | 5829 (8.5) | 290 (10.5) | < 0.0001 | 593 (11.0) | < 0.0001 |
| Drug group R | 13 747 (20.0) | 640 (23.2) | < 0.0001 | 1456 (26.9) | < 0.0001 |
| Drug group S | 21 652 (31.4) | 933 (33.8) | 0.005 | 1919 (35.5) | < 0.0001 |
| Drug group V | 2554 (3.7) | 109 (4.0) | 0.5 | 261 (4.8) | < 0.0001 |
SD standard deviation, COPD chronic obstructive pulmonary disease, PVD peripheral vascular disease, ACS acute coronary syndrome. Drug groups: A alimentary tract and metabolism, B blood and blood forming organs, C cardiovascular system, D dermatologicals, G genito urinary system and sex hormones, H systemic hormonal preparations, J anti-infectives for systemic use, L antineoplastic and immunomodulating agents, M musculo-skeletal system, N nervous system, P antiparasitic products, R respiratory system, S sensory organs, V various.
Performance of machine learning models and Cox regression measured by sensitivity, specificity, and AUC-ROC.
| Models | Performance metrics | ACS (95% CI) | All-cause death (95% CI) | ACS or All-cause death (95% CI) |
|---|---|---|---|---|
| GBM | Sensitivity | 0.61 (0.60, 0.63) | 0.78 (0.78, 0.79) | 0.68 (0.67, 0.69) |
| | Specificity | 0.72 (0.70, 0.73) | 0.74 (0.73, 0.75) | 0.75 (0.74, 0.75) |
| | AUC-ROC | 0.72 (0.71, 0.72) | 0.837 (0.836, 0.839) | 0.780 (0.778, 0.781) |
| MLNN | Sensitivity | 0.61 (0.60, 0.63) | 0.76 (0.75, 0.76) | 0.69 (0.68, 0.70) |
| | Specificity | 0.70 (0.69, 0.71) | 0.76 (0.75, 0.77) | 0.75 (0.74, 0.75) |
| | AUC-ROC | 0.70 (0.70, 0.71) | 0.834 (0.833, 0.836) | 0.778 (0.776, 0.780) |
| SVM | Sensitivity | 0.61 (0.60, 0.62) | 0.74 (0.73, 0.75) | 0.70 (0.69, 0.71) |
| | Specificity | 0.72 (0.71, 0.73) | 0.75 (0.74, 0.75) | 0.73 (0.72, 0.74) |
| | AUC-ROC | 0.710 (0.707, 0.712) | 0.813 (0.812, 0.814) | 0.777 (0.776, 0.779) |
| Cox regression | Sensitivity | 0.62 (0.60, 0.64) | 0.71 (0.70, 0.72) | 0.66 (0.66, 0.67) |
| | Specificity | 0.63 (0.61, 0.65) | 0.69 (0.67, 0.70) | 0.66 (0.65, 0.66) |
| | AUC-ROC | 0.659 (0.656, 0.662) | 0.76 (0.75, 0.76) | 0.711 (0.710, 0.713) |
| Cox regression* | Sensitivity | 0.638 (0.602, 0.674) | 0.726 (0.710, 0.742) | 0.653 (0.641, 0.665) |
| | Specificity | 0.66 (0.625, 0.694) | 0.729 (0.712, 0.746) | 0.728 (0.718, 0.739) |
| | AUC-ROC | 0.695 (0.688, 0.702) | 0.795 (0.793, 0.797) | 0.750 (0.745, 0.754) |
*Cox model with binary variables.
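This record does not state how the 95% confidence intervals in the table were derived. As one plausible illustration only, the sketch below summarises a metric measured over repeated evaluation runs as a mean with a normal-approximation 95% CI. The per-run AUC-ROC values and the helper name `mean_with_ci95` are hypothetical, and the study may have used a different procedure.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_with_ci95(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) using mean +/- 1.96 * standard error.

    Assumes the runs are roughly independent and the mean is approximately
    normal; with few runs, a t-based interval would be wider.
    """
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    half_width = 1.96 * standard_error
    return mean, mean - half_width, mean + half_width


# Hypothetical AUC-ROC values from five repeated evaluation runs.
run_aucs = [0.71, 0.72, 0.73, 0.72, 0.72]

mean_auc, lower, upper = mean_with_ci95(run_aucs)
print(f"AUC-ROC {mean_auc:.3f} ({lower:.3f}, {upper:.3f})")
```

Under this approach, narrow intervals such as those reported for the larger drug groups reflect low run-to-run variability, not certainty about performance on new populations.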
Risk prediction performance of GBM models (AUC-ROC, 95% CI) for different NSAIDs.
| NSAID | ACS | All-cause death | ACS or All-cause death |
|---|---|---|---|
| Indometacin | 0.71 (0.70, 0.72) | 0.81 (0.80, 0.81) | 0.77 (0.77, 0.78) |
| Sulindac | 0.84 (0.78, 0.89) | 0.82 (0.80, 0.84) | 0.82 (0.80, 0.84) |
| Diclofenac | 0.67 (0.66, 0.68) | 0.80 (0.79, 0.80) | 0.74 (0.74, 0.75) |
| Piroxicam | 0.66 (0.65, 0.68) | 0.80 (0.79, 0.81) | 0.73 (0.72, 0.74) |
| Meloxicam | 0.70 (0.69, 0.70) | 0.80 (0.80, 0.81) | 0.75 (0.74, 0.75) |
| Ibuprofen | 0.71 (0.70, 0.73) | 0.82 (0.81, 0.82) | 0.76 (0.75, 0.77) |
| Naproxen | 0.71 (0.70, 0.72) | 0.82 (0.81, 0.82) | 0.77 (0.77, 0.78) |
| Ketoprofen | 0.71 (0.69, 0.73) | 0.79 (0.78, 0.80) | 0.73 (0.72, 0.74) |
| Tiaprofenic acid | 0.77 (0.72, 0.82) | 0.85 (0.83, 0.87) | 0.83 (0.80, 0.85) |
| Rofecoxib | 0.710 (0.705, 0.714) | 0.821 (0.819, 0.823) | 0.78 (0.77, 0.78) |
| Celecoxib | 0.72 (0.71, 0.72) | 0.811 (0.809, 0.813) | 0.772 (0.770, 0.774) |
NSAID non-steroidal anti-inflammatory drug, GBM gradient boosting machine.
Figure 4. Ranking of NSAID feature importance from the GBM prediction models for adverse cardiovascular outcomes, controlling for age, sex, comorbidity history and drug history. (a) Feature importance for ACS; (b) feature importance for all-cause death; (c) feature importance for the composite outcome (ACS or all-cause death). The figure was created using scikit-learn[26].