Michael Ripperger1, Sarah C Lotspeich2, Drew Wilimitis1, Carrie E Fry3, Allison Roberts4, Matthew Lenert1, Charlotte Cherry4, Sanura Latham4, Katelyn Robinson1, Qingxia Chen1,2, Melissa L McPheeters1,3, Ben Tyndall4, Colin G Walsh1,5,6.
Abstract
OBJECTIVE: To develop and validate algorithms for predicting 30-day fatal and nonfatal opioid-related overdose using statewide data sources including prescription drug monitoring program data, Hospital Discharge Data System data, and Tennessee (TN) vital records. Current overdose prevention efforts in TN rely on descriptive and retrospective analyses without prognostication.
Keywords: drug overdose; machine learning; opioid epidemic; prescription drug monitoring programs; vital statistics
Year: 2021 PMID: 34665246 PMCID: PMC8714265 DOI: 10.1093/jamia/ocab218
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 7.942
Figure 1. Conceptual diagram of training data splits, weak learners, the ensembling/calibration development step, and the testing step.
Characteristics of both the 20 weak learner models in the development set and the 14 ensemble models in the test set for fatal and nonfatal overdose
| Weak learner/ensemble | Fatal AUROC | Fatal AUPRC | Fatal cases | Fatal controls | Fatal % outcomes | Nonfatal AUROC | Nonfatal AUPRC | Nonfatal cases | Nonfatal controls | Nonfatal % outcomes |
|---|---|---|---|---|---|---|---|---|---|---|
| WL1 | 0.77 | 0.00024 | 224 | 3 566 077 | 0.0063 | 0.79 | 0.0018 | 1131 | 3 580 452 | 0.032 |
| WL2 | 0.73 | 0.00023 | | | | 0.78 | 0.0016 | | | |
| WL3 | 0.76 | 0.00023 | | | | 0.79 | 0.0019 | | | |
| WL4 | 0.75 | 0.00024 | | | | 0.78 | 0.0016 | | | |
| WL5 | 0.72 | 0.00026 | | | | 0.79 | 0.0021 | | | |
| WL6 | 0.73 | 0.00027 | | | | 0.80 | 0.0019 | | | |
| WL7 | 0.71 | 0.00023 | | | | 0.79 | 0.0017 | | | |
| WL8 | 0.78 | 0.00024 | | | | 0.80 | 0.0017 | | | |
| WL9 | 0.75 | 0.00025 | | | | 0.79 | 0.0017 | | | |
| WL10 | 0.72 | 0.00026 | | | | 0.78 | 0.0015 | | | |
| Maximum | 0.83 | 0.00040 | 963 | 14 316 606 | 0.0067 | 0.82 | 0.0014 | 4031 | 14 309 753 | 0.028 |
| Minimum | 0.67 | 0.00032 | | | | 0.76 | 0.0014 | | | |
| Mean | 0.83 | 0.00042 | | | | 0.83 | 0.0015 | | | |
| Median | 0.80 | 0.00041 | | | | 0.82 | 0.0015 | | | |
| LASSO | 0.79 | 0.00038 | | | | 0.82 | 0.0015 | | | |
| Ridge | 0.83 | 0.00042 | | | | 0.83 | 0.0016 | | | |
| Random forest | 0.38 | 0.00007 | | | | 0.49 | 0.0004 | | | |
Note: Ensemble models combined and calibrated weak learner model predictions from the development set.
AUPRC: area under the precision-recall curve; AUROC: area under the receiver operating characteristic curve.
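The four summary-statistic ensembles in the table (maximum, minimum, mean, median) can be read as row-wise aggregates of the weak learners' predicted probabilities. The sketch below illustrates that reading only. It assumes each weak learner emits one probability per prescription, and it does not reproduce the calibration step or the learned ensembles (LASSO, ridge, random forest). The function name `ensemble_predictions` and the toy numbers are hypothetical.

```python
from statistics import mean, median


def ensemble_predictions(weak_learner_probs, method="mean"):
    """Combine per-prescription probabilities from several weak learners.

    weak_learner_probs: list of lists, one inner list per weak learner,
    all of equal length (one probability per prescription).
    """
    aggregators = {"mean": mean, "median": median, "max": max, "min": min}
    if method not in aggregators:
        raise ValueError(f"unknown ensembling method: {method}")
    combine = aggregators[method]
    # zip(*...) regroups the predictions so that each tuple holds every
    # weak learner's probability for a single prescription.
    return [combine(row) for row in zip(*weak_learner_probs)]


# Hypothetical predictions from three weak learners for four prescriptions.
wl_probs = [
    [0.001, 0.020, 0.0005, 0.30],
    [0.002, 0.010, 0.0007, 0.25],
    [0.003, 0.030, 0.0003, 0.35],
]
mean_ensemble = ensemble_predictions(wl_probs, "mean")
median_ensemble = ensemble_predictions(wl_probs, "median")
max_ensemble = ensemble_predictions(wl_probs, "max")
min_ensemble = ensemble_predictions(wl_probs, "min")
```

The maximum and minimum ensembles take the most and least pessimistic learner for each prescription, while the mean and median smooth over disagreement between learners.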
Risk concentration of the ensembled fatal and nonfatal prediction models which were validated in the test set
| Fatal/Nonfatal | Ensembling method | Quantile | Prescriptions | Cases | Proportion of cases | Inclusive lower bound | Exclusive upper bound |
|---|---|---|---|---|---|---|---|
| Fatal | Mean | 1 | 4 106 507 | 32 | 0.033 | 0.00E+00 | 1.65E−08 |
| | | 2 | 210 474 | 4 | 0.004 | 1.65E−08 | 6.28E−08 |
| | | 3 | 1 412 596 | 14 | 0.015 | 6.28E−08 | 3.33E−05 |
| | | 4 | 1 429 211 | 33 | 0.034 | 3.33E−05 | 3.53E−04 |
| | | 5 | 1 431 757 | 40 | 0.042 | 3.53E−04 | 7.07E−04 |
| | | 6 | 1 432 104 | 66 | 0.069 | 7.07E−04 | 1.46E−03 |
| | | 7 | 1 434 688 | 100 | 0.104 | 1.46E−03 | 2.92E−03 |
| | | 8 | 1 428 476 | 171 | 0.178 | 2.92E−03 | 6.80E−03 |
| | | 9 | 1 431 756 | 503 | 0.522 | 6.80E−03 | 3.34E−01 |
| | Ridge regression | 1 | 1 431 758 | 81 | 0.084 | 3.85E−05 | 5.47E−05 |
| | | 2 | 4 776 940 | 43 | 0.045 | 5.47E−05 | 5.48E−05 |
| | | 3 | 950 091 | 4 | 0.004 | 5.48E−05 | 5.48E−05 |
| | | 4 | 1 443 236 | 55 | 0.057 | 5.48E−05 | 5.54E−05 |
| | | 5 | 1 420 274 | 60 | 0.062 | 5.54E−05 | 5.66E−05 |
| | | 6 | 1 431 800 | 103 | 0.107 | 5.66E−05 | 6.00E−05 |
| | | 7 | 1 431 716 | 159 | 0.165 | 6.00E−05 | 6.73E−05 |
| | | 8 | 1 431 754 | 458 | 0.476 | 6.73E−05 | 3.19E−01 |
| Nonfatal | Mean | 1 | 1 929 336 | 43 | 0.011 | 0.00E+00 | 1.93E−08 |
| | | 2 | 933 421 | 25 | 0.006 | 1.93E−08 | 8.68E−06 |
| | | 3 | 1 437 802 | 67 | 0.017 | 8.68E−06 | 2.50E−04 |
| | | 4 | 1 425 055 | 81 | 0.020 | 2.50E−04 | 6.91E−04 |
| | | 5 | 1 432 474 | 123 | 0.031 | 6.91E−04 | 1.41E−03 |
| | | 6 | 1 430 183 | 143 | 0.035 | 1.41E−03 | 2.55E−03 |
| | | 7 | 1 431 883 | 290 | 0.072 | 2.55E−03 | 4.50E−03 |
| | | 8 | 1 431 048 | 415 | 0.103 | 4.50E−03 | 8.04E−03 |
| | | 9 | 1 431 205 | 835 | 0.207 | 8.04E−03 | 1.59E−02 |
| | | 10 | 1 431 377 | 2009 | 0.498 | 1.59E−02 | 2.86E−01 |
| | Ridge regression | 1 | 1 431 493 | 106 | 0.026 | 1.33E−04 | 2.11E−04 |
| | | 2 | 2 073 804 | 45 | 0.011 | 2.11E−04 | 2.11E−04 |
| | | 3 | 788 932 | 19 | 0.005 | 2.11E−04 | 2.11E−04 |
| | | 4 | 1 431 285 | 96 | 0.024 | 2.11E−04 | 2.14E−04 |
| | | 5 | 1 432 043 | 143 | 0.035 | 2.14E−04 | 2.21E−04 |
| | | 6 | 1 430 714 | 172 | 0.042 | 2.21E−04 | 2.31E−04 |
| | | 7 | 1 431 378 | 239 | 0.059 | 2.31E−04 | 2.49E−04 |
| | | 8 | 1 431 379 | 478 | 0.119 | 2.49E−04 | 2.85E−04 |
| | | 9 | 1 431 378 | 807 | 0.200 | 2.85E−04 | 3.85E−04 |
| | | 10 | 1 431 378 | 1926 | 0.478 | 3.85E−04 | 9.97E−01 |
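A risk concentration table of this kind can be sketched as follows: sort prescriptions by predicted probability, cut them into bins, and report the share of all cases that falls in each bin. For instance, the top fatal mean-ensemble bin holds 503 of the 963 test-set cases, which gives the 0.522 shown above. The code is a minimal illustration under stated assumptions, not the authors' procedure. It uses equal-count rank bins and splits tied probabilities arbitrarily, whereas the table's uneven bin sizes suggest ties were handled differently. The function name `risk_concentration` and the toy data are hypothetical.

```python
def risk_concentration(probs, labels, n_bins=10):
    """Summarize how outcome cases concentrate across predicted-risk bins.

    probs: predicted probabilities, one per prescription.
    labels: 1 for a case (overdose within the window), 0 for a control.
    Bins are equal-count groups after sorting from lowest to highest risk.
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    total_cases = sum(labels)
    n = len(order)
    rows = []
    for b in range(n_bins):
        # Integer arithmetic keeps bin sizes within one of each other.
        members = order[b * n // n_bins:(b + 1) * n // n_bins]
        cases = sum(labels[i] for i in members)
        rows.append({
            "quantile": b + 1,
            "prescriptions": len(members),
            "cases": cases,
            "proportion_of_cases": cases / total_cases if total_cases else 0.0,
        })
    return rows


# Hypothetical toy data: ten prescriptions, three cases, five bins.
toy_probs = [0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.10]
toy_labels = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]
toy_rows = risk_concentration(toy_probs, toy_labels, n_bins=5)
```

A model that concentrates risk well places most cases in the highest bins, which is the pattern the top quantiles of the table show for both outcomes.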
Calibration statistics for the mean and ridge regression ensembling methods for the fatal and nonfatal overdose models after application in the test set
| Ensembled model | Brier score | Intercept | Slope | Sz | Sp |
|---|---|---|---|---|---|
| Fatal mean | 0.0001305 | −5.5329 | 0.6205 | −191.59 | 0.00 |
| Fatal ridge regression | 0.0000673 | −0.3313 | 0.9599 | 1.55 | 0.120 |
| Nonfatal mean | 0.0004239 | −4.0305 | 0.7625 | −272.14 | 0.00 |
| Nonfatal ridge regression | 0.0002923 | −1.7524 | 0.7942 | −9.34 | 0.00 |
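For orientation, the Brier score in the first column is the mean squared difference between predicted probability and observed outcome. The sketch below computes it, together with the score a constant prediction equal to the outcome prevalence would obtain, which is a common reference point when outcomes are very rare. The calibration intercept and slope require fitting a logistic recalibration model and are not shown. Function names and toy data are hypothetical.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def prevalence_reference_brier(labels):
    """Brier score of always predicting the observed outcome prevalence.

    Algebraically this equals prevalence * (1 - prevalence).
    """
    prevalence = sum(labels) / len(labels)
    return brier_score([prevalence] * len(labels), labels)


# Hypothetical toy example with three prescriptions and one case.
toy_probs = [0.1, 0.9, 0.2]
toy_labels = [0, 1, 0]
toy_brier = brier_score(toy_probs, toy_labels)
toy_reference = prevalence_reference_brier(toy_labels)
```

Lower is better for both quantities. Because the reference value shrinks with prevalence, Brier scores for rare outcomes are small in absolute terms and are best compared against that reference.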
AUROC and AUPRC for various subgroups in the test set for the fatal and nonfatal ridge regression ensembled models
| Characteristic | Subgroup | Fatal AUROC | Fatal AUPRC | Fatal cases (%) | Fatal controls (%) | Nonfatal AUROC | Nonfatal AUPRC | Nonfatal cases (%) | Nonfatal controls (%) |
|---|---|---|---|---|---|---|---|---|---|
| Age | 20–29 | 0.83 | 0.00030 | 16 (1.74) | 393 406 (3.22) | 0.75 | 0.0014 | 141 (3.50) | 381 438 (3.12) |
| | 30–39 | 0.79 | 0.00036 | 126 (13.74) | 1 417 833 (11.60) | 0.75 | 0.0011 | 502 (12.48) | 1 421 592 (11.63) |
| | 40–49 | 0.79 | 0.00054 | 233 (25.41) | 1 934 651 (15.83) | 0.80 | 0.0013 | 590 (14.66) | 1 945 431 (15.91) |
| | 50–59 | 0.80 | 0.00062 | 351 (38.28) | 2 583 639 (21.14) | 0.82 | 0.0021 | 966 (24.01) | 2 606 848 (21.32) |
| | 60–69 | 0.84 | 0.00045 | 172 (18.76) | 2 695 746 (22.06) | 0.82 | 0.0019 | 1036 (25.75) | 2 704 083 (22.12) |
| | 70–79 | 0.91 | 0.00022 | 14 (1.53) | 1 876 554 (15.36) | 0.83 | 0.0015 | 546 (13.57) | 1 856 381 (15.18) |
| | 80–89 | 0.95 | 0.00020 | 5 (0.55) | 948 381 (7.76) | 0.78 | 0.0013 | 217 (5.39) | 945 839 (7.74) |
| Sex | F | 0.84 | 0.00048 | 470 (48.81) | 7 633 488 (53.42) | 0.81 | 0.0016 | 2439 (60.61) | 7 657 711 (53.61) |
| | M | 0.81 | 0.00044 | 447 (46.42) | 4 556 799 (31.89) | 0.80 | 0.0015 | 1570 (39.02) | 4 542 227 (31.80) |
| | U | 0.74 | 0.00006 | 46 (4.78) | 2 100 560 (14.70) | 0.99 | 0.0009 | 15 (0.37) | 2 084 944 (14.60) |
| Race | Asian-American | N/A | N/A | 0 (0.00) | 12 253 (0.090) | N/A | N/A | 0 (0.00) | 11 428 (0.080) |
| | Black | 0.86 | 0.00023 | 35 (3.63) | 1 101 369 (7.71) | 0.79 | 0.0010 | 198 (4.92) | 1 105 227 (7.74) |
| | Native American | N/A | N/A | 0 (0.00) | 1 888 (0.010) | N/A | N/A | 0 (0.00) | 1947 (0.010) |
| | Other | 0.78 | 0.00021 | 2 (0.21) | 32 659 (0.23) | 0.88 | 0.0006 | 4 (0.10) | 32 400 (0.23) |
| | Unknown | 0.79 | 0.00041 | 105 (10.90) | 2 794 866 (19.56) | 0.92 | 0.0031 | 413 (10.26) | 2 763 247 (19.34) |
| | White | 0.83 | 0.00045 | 821 (85.25) | 10 347 812 (72.41) | 0.80 | 0.0015 | 3409 (84.72) | 10 370 633 (72.60) |
| Ethnicity | Hispanic | 0.83 | 0.00034 | 2 (0.21) | 25 870 (0.18) | 0.81 | 0.0009 | 5 (0.12) | 25 665 (0.18) |
| | Non-Hispanic | 0.81 | 0.00035 | 709 (73.62) | 10 404 402 (72.80) | 0.80 | 0.0014 | 3080 (76.54) | 10 448 420 (73.14) |
| | Unknown | 0.86 | 0.00066 | 252 (26.17) | 3 860 575 (27.01) | 0.89 | 0.0021 | 939 (23.33) | 3 810 797 (26.68) |
| RUCC | 1, metro, >1 000 000 | 0.86 | 0.00064 | 357 (37.07) | 4 856 038 (33.98) | 0.84 | 0.0021 | 1593 (39.59) | 4 898 834 (34.29) |
| | 2, metro, 250 000–1 000 000 | 0.83 | 0.00042 | 301 (31.26) | 3 852 307 (26.96) | 0.83 | 0.0015 | 1006 (25.00) | 3 829 032 (26.8) |
| | 3, metro, <250 000 | 0.78 | 0.00021 | 76 (7.89) | 1 491 989 (10.44) | 0.83 | 0.0010 | 325 (8.08) | 1 491 254 (10.44) |
| | 4, urban, >20 000+ metro adjacent | 0.80 | 0.00046 | 67 (6.96) | 1 346 081 (9.42) | 0.81 | 0.0016 | 398 (9.89) | 1 330 476 (9.31) |
| | 5, urban, >20 000+ | N/A | N/A | 0 (0.00) | 94 940 (0.66) | 0.88 | 0.0007 | 13 (0.32) | 101 859 (0.71) |
| | 6, urban, 2500–19 999 metro adjacent | 0.82 | 0.00030 | 101 (10.49) | 1 714 426 (12.00) | 0.80 | 0.0013 | 459 (11.41) | 1 709 034 (11.96) |
| | 7, urban, 2500–19 999 | 0.71 | 0.00014 | 26 (2.70) | 453 522 (3.17) | 0.82 | 0.0016 | 108 (2.68) | 442 912 (3.10) |
| | 8, rural, <2500, metro adjacent | 0.68 | 0.00020 | 21 (2.18) | 320 187 (2.24) | 0.82 | 0.0011 | 86 (2.14) | 312 688 (2.19) |
| | 9, rural, <2500 | 0.80 | 0.00510 | 14 (1.45) | 161 357 (1.13) | 0.82 | 0.0011 | 36 (0.89) | 168 793 (1.18) |
AUPRC: area under the precision-recall curve; AUROC: area under the receiver operating characteristic curve; RUCC: Rural–Urban Continuum Codes.
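To make the two subgroup metrics concrete, the sketch below computes AUROC as the probability that a randomly chosen case receives a higher predicted probability than a randomly chosen control, and uses average precision as one common estimator of AUPRC. Whether the authors used this estimator is not stated here, so treat that choice as an assumption. Both functions return `None` when a subgroup has no cases, mirroring the N/A entries in the table. The pairwise AUROC loop is written for clarity and would be too slow for millions of prescriptions. Function names and toy data are hypothetical.

```python
def auroc(probs, labels):
    """Probability that a random case outranks a random control (ties count half)."""
    cases = [p for p, y in zip(probs, labels) if y == 1]
    controls = [p for p, y in zip(probs, labels) if y == 0]
    if not cases or not controls:
        return None  # undefined, e.g. a subgroup with zero cases
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def average_precision(probs, labels):
    """Mean of the precision values at the rank of each case.

    Tied probabilities are ordered arbitrarily in this simplified version.
    """
    total_cases = sum(labels)
    if total_cases == 0:
        return None
    ranked = sorted(zip(probs, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_cases


# Hypothetical toy subgroup: four prescriptions, two cases.
toy_probs = [0.9, 0.8, 0.7, 0.6]
toy_labels = [1, 0, 1, 0]
toy_auroc = auroc(toy_probs, toy_labels)
toy_ap = average_precision(toy_probs, toy_labels)
```

When reading the AUPRC columns, a useful reference point is that a random ranking has an expected average precision close to the outcome prevalence, so values should be judged relative to the very low case rates in each subgroup.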
Figure 2. AUPRC of the LASSO, max, mean, median, min, and ridge regression ensembling methods for fatal and nonfatal overdose models when the number of partitions was changed. Note: compressed y-axes used to visualize minimal differences in models by number of partitions. AUPRC: area under the precision-recall curve.
Figure 3. Top 15 predictive features by mean rank of importance for the fatal opioid overdose model.
Figure 4. Top 15 predictive features by mean rank of importance for the nonfatal opioid overdose model.