Majid Afshar, Brihat Sharma, Dmitriy Dligach, Madeline Oguss, Randall Brown, Neeraj Chhabra, Hale M Thompson, Talar Markossian, Cara Joyce, Matthew M Churpek, Niranjan S Karnik.
Abstract
BACKGROUND: Substance misuse is a heterogeneous and complex set of behavioural conditions that are highly prevalent in hospital settings and frequently co-occur. Few hospital-wide solutions exist to comprehensively and reliably identify these conditions to prioritise care and guide treatment. The aim of this study was to apply natural language processing (NLP) to clinical notes collected in the electronic health record (EHR) to accurately screen for substance misuse.
Year: 2022 PMID: 35623797 PMCID: PMC9159760 DOI: 10.1016/S2589-7500(22)00041-3
Source DB: PubMed Journal: Lancet Digit Health ISSN: 2589-7500
Figure 1: Patient flow diagram for training cohort
AUDIT=Alcohol Use Disorders Identification Test. DAST=Drug Abuse Screening Test. Positive DAST are scores of 2 or higher for both sexes and positive AUDIT are scores of 5 or higher for women and 8 or higher for men. More than one type of substance misuse might apply to the same individual.
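The screening cut-offs in the legend above can be written as a small rule. The following is a minimal sketch, not the study's code: the function names, the `"female"`/`"male"` coding of sex, and the use of `None` for a questionnaire that was not administered are all assumptions made for illustration.

```python
from typing import Optional

# Cut-offs as stated in the Figure 1 legend.
AUDIT_CUTOFF = {"female": 5, "male": 8}  # sex coding is an assumption
DAST_CUTOFF = 2                          # same cut-off for both sexes


def audit_positive(score: Optional[float], sex: str) -> Optional[bool]:
    """True/False for a recorded AUDIT score; None if no score was recorded."""
    if score is None:
        return None
    return score >= AUDIT_CUTOFF[sex.lower()]


def dast_positive(score: Optional[float]) -> Optional[bool]:
    """True/False for a recorded DAST score; None if no score was recorded."""
    if score is None:
        return None
    return score >= DAST_CUTOFF


def screen(audit: Optional[float], dast: Optional[float], sex: str) -> dict:
    """Combine both screens; a patient can be positive on more than one."""
    return {
        "audit_positive": audit_positive(audit, sex),
        "dast_positive": dast_positive(dast),
    }
```

Keeping the two screens as separate flags, rather than collapsing them into one label, mirrors the legend's note that more than one type of substance misuse might apply to the same individual.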
Patient characteristics and outcomes from temporal validation cohort (n=16 917)
| | Alcohol misuse only (n=466) | Opioid misuse only (n=341) | Non-opioid misuse only (n=104) | Polysubstance misuse (n=112) | No misuse (n=15 894) |
|---|---|---|---|---|---|
| Age, years | 48·0 (38·3–57·8) | 52·0 (38·0–60·0) | 53·0 (36·8–59·0) | 38·0 (32·0–53·0) | 59·0 (39·0–70·0) |
| Sex | |||||
| Male | 324 (69·5%) | 228 (66·9%) | 70 (67·3%) | 86 (76·8%) | 6265 (39·4%) |
| Female | 142 (30·5%) | 113 (33·1%) | 34 (32·7%) | 26 (23·2%) | 9629 (60·6%) |
| Race and ethnicity | |||||
| Non-Hispanic White | 185 (39·7%) | 115 (33·7%) | 19 (18·3%) | 34 (30·4%) | 5803 (36·5%) |
| Non-Hispanic Black | 155 (33·3%) | 183 (53·7%) | 65 (62·5%) | 42 (37·5%) | 5745 (36·1%) |
| Hispanic | 99 (21·2%) | 22 (6·5%) | 15 (14·4%) | 23 (20·5%) | 3170 (19·9%) |
| Mixed | 27 (5·8%) | 21 (6·2%) | 5 (4·8%) | 13 (11·6%) | 1176 (7·3%) |
| AUDIT score | 22·5 (14·0–29·0; n=466) | 1·0 (0·0–2·3; n=44) | 3·0 (1·0–4·0; n=20) | 23·5 (16·0–31·0; n=112) | 2·0 (1·0–4·0; n=598) |
| DAST score | 1·0 (0·75–2·0; n=71) | 7·0 (5·0–8·0; n=341) | 4·0 (2·0–5·0; n=104) | 6·0 (4·0–8·0; n=112) | 1·0 (0·0–1·0; n=416) |
| Insurance | |||||
| Medicare | 255 (54·7%) | 232 (68·0%) | 20 (19·2%) | 8 (7·1%) | 5553 (34·9%) |
| Medicaid | 63 (13·5%) | 41 (12·0%) | 63 (60·6%) | 72 (64·3%) | 5720 (35·9%) |
| Private | 99 (21·2%) | 52 (15·2%) | 16 (15·4%) | 26 (23·2%) | 4084 (25·7%) |
| Other | 49 (10·5%) | 16 (4·7%) | 5 (4·8%) | 6 (5·4%) | 537 (3·4%) |
| Comorbidities | |||||
| Hypertension | 243 (52·1%) | 167 (48·9%) | 65 (62·5%) | 39 (34·8%) | 9545 (60·1%) |
| Renal failure | 30 (6·4%) | 56 (16·4%) | 19 (18·3%) | 4 (3·6%) | 3571 (22·5%) |
| Neurologic | 104 (22·3%) | 53 (15·5%) | 14 (13·5%) | 14 (12·5%) | 2569 (16·2%) |
| Congestive heart failure | 39 (8·4%) | 64 (18·8%) | 28 (26·9%) | 6 (5·4%) | 2883 (18·1%) |
| Diabetes | 83 (17·8%) | 60 (17·6%) | 30 (28·8%) | 16 (14·3%) | 4767 (29·9%) |
| Liver disease | 168 (36·1%) | 43 (12·6%) | 8 (7·7%) | 12 (10·7%) | 1258 (7·9%) |
| Chronic lung disease | 75 (16·1%) | 132 (38·7%) | 40 (38·5%) | 22 (19·6%) | 3431 (21·6%) |
| Psychiatric disorders | 80 (17·2%) | 65 (19·1%) | 44 (42·3%) | 45 (40·2%) | 904 (5·7%) |
| Depression | 153 (32·8%) | 93 (27·3%) | 27 (25·9%) | 45 (40·2%) | 2680 (16·9%) |
| Alcohol misuse | 387 (83·0%) | 25 (7·3%) | 27 (25·9%) | 91 (81·3%) | 399 (2·5%) |
| Drug misuse | 66 (14·2%) | 326 (95·6%) | 79 (75·9%) | 85 (75·9%) | 467 (2·9%) |
| AIDS | 7 (1·5%) | 13 (3·8%) | 4 (3·8%) | 0 | 103 (<1%) |
| Disposition | |||||
| Home | 336 (72·1%) | 195 (57·2%) | 67 (64·4%) | 85 (75·9%) | 9549 (60·1%) |
| Death | 2 (<1%) | 5 (1·5%) | 0 | 0 | 190 (1·2%) |
| Long-term residential care or short-term post-acute care | 43 (9·2%) | 47 (13·8%) | 14 (13·4%) | 10 (8·9%) | 129 (<1%) |
| Against medical advice | 17 (3·6%) | 36 (10·6%) | 1 (<1%) | 4 (3·6%) | 1417 (8·9%) |
| Other | 68 (14·6%) | 47 (13·8%) | 22 (21·2%) | 13 (11·6%) | 4609 (28·9%) |
Data are n (%) or median (IQR). Comparisons across all variables were significant with p values <0·01. Polysubstance misuse can include patients with alcohol misuse, opioid misuse, non-opioid drug misuse, or any combination of the three.
Mixed=Asian, Native American or Alaskan Native, Native Hawaiian or other Pacific Islander, other, or refused to answer or answer unknown.
Full experiment models and results
| | Temporal validation | | External validation | |
|---|---|---|---|---|
| | AUPRC (95% CI) | AUROC (95% CI) | AUPRC (95% CI) | AUROC (95% CI) |
| | | | | |
| Alcohol | 0·70 (0·66–0·73) | 0·92 (0·90–0·93) | NA | NA |
| Opioid | 0·75 (0·71–0·80) | 0·98 (0·98–0·99) | NA | NA |
| | | | | |
| Overall | 0·65 (0·60–0·70) | 0·96 (0·94–0·97) | NA | NA |
| Alcohol | 0·73 (0·70–0·77) | 0·95 (0·94–0·96) | 0·92 (0·90–0·94) | 0·89 (0·87–0·91) |
| Opioid | 0·84 (0·80–0·87) | 0·99 (0·98–0·99) | 0·88 (0·85–0·91) | 0·90 (0·88–0·92) |
| Non-opioid | 0·39 (0·31–0·48) | 0·94 (0·91–0·96) | NA | NA |
| | | | | |
| Overall | 0·61 (0·56–0·66) | 0·94 (0·92–0·95) | NA | NA |
| Alcohol | 0·72 (0·68–0·76) | 0·93 (0·91–0·94) | 0·87 (0·84–0·89) | 0·79 (0·76–0·82) |
| Opioid | 0·82 (0·78–0·85) | 0·98 (0·97–0·99) | 0·75 (0·71–0·79) | 0·78 (0·75–0·82) |
| Non-opioid | 0·29 (0·22–0·37) | 0·91 (0·88–0·93) | NA | NA |
| | | | | |
| Overall | 0·59 (0·55–0·64) | 0·95 (0·94–0·96) | NA | NA |
| Alcohol | 0·72 (0·68–0·76) | 0·94 (0·93–0·95) | 0·87 (0·85–0·90) | 0·83 (0·80–0·85) |
| Opioid | 0·83 (0·80–0·86) | 0·99 (0·98–0·99) | 0·84 (0·81–0·88) | 0·88 (0·86–0·91) |
| Non-opioid | 0·22 (0·17–0·29) | 0·93 (0·91–0·95) | NA | NA |
| | | | | |
| Overall | 0·66 (0·61–0·71) | 0·97 (0·96–0·97) | NA | NA |
| Alcohol | 0·74 (0·70–0·77) | 0·95 (0·94–0·96) | 0·91 (0·89–0·93) | 0·88 (0·85–0·90) |
| Opioid | 0·84 (0·80–0·88) | 0·99 (0·98–0·99) | 0·87 (0·84–0·90) | 0·90 (0·88–0·92) |
| Non-opioid | 0·40 (0·32–0·49) | 0·96 (0·95–0·97) | NA | NA |
| | | | | |
| Overall | 0·69 (0·64–0·74) | 0·97 (0·96–0·98) | NA | NA |
| Alcohol | 0·78 (0·75–0·82) | 0·96 (0·95–0·97) | 0·92 (0·90–0·93) | 0·88 (0·86–0·90) |
| Opioid | 0·87 (0·84–0·91) | 0·99 (0·99–0·99) | 0·91 (0·88–0·93) | 0·94 (0·92–0·95) |
| Non-opioid | 0·41 (0·34–0·50) | 0·96 (0·94–0·98) | NA | NA |
| | | | | |
| Overall | 0·64 (0·59–0·69) | 0·94 (0·93–0·97) | NA | NA |
| Alcohol | 0·70 (0·66–0·74) | 0·93 (0·92–0·94) | 0·83 (0·80–0·86) | 0·78 (0·75–0·80) |
| Opioid | 0·83 (0·79–0·86) | 0·98 (0·97–0·98) | 0·82 (0·78–0·85) | 0·85 (0·82–0·88) |
| Non-opioid | 0·40 (0·32–0·48) | 0·92 (0·90–0·95) | NA | NA |
| | | | | |
| Overall | 0·58 (0·54–0·63) | 0·95 (0·94–0·96) | NA | NA |
| Alcohol | 0·71 (0·67–0·74) | 0·94 (0·92–0·95) | 0·87 (0·84–0·90) | 0·83 (0·81–0·86) |
| Opioid | 0·82 (0·79–0·86) | 0·98 (0·98–0·99) | 0·85 (0·81–0·88) | 0·89 (0·86–0·91) |
| Non-opioid | 0·22 (0·16–0·28) | 0·94 (0·92–0·95) | NA | NA |
| | | | | |
| Overall | 0·67 (0·62–0·72) | 0·97 (0·95–0·98) | NA | NA |
| Alcohol | 0·75 (0·71–0·78) | 0·96 (0·94–0·97) | 0·91 (0·89–0·93) | 0·87 (0·85–0·90) |
| Opioid | 0·85 (0·81–0·88) | 0·99 (0·98–0·99) | 0·88 (0·85–0·90) | 0·90 (0·87–0·92) |
| Non-opioid | 0·43 (0·35–0·51) | 0·96 (0·94–0·97) | NA | NA |
| | | | | |
| Overall | 0·68 (0·63–0·73) | 0·97 (0·97–0·98) | NA | NA |
| Alcohol | 0·79 (0·75–0·82) | 0·96 (0·95–0·97) | 0·92 (0·90–0·94) | 0·89 (0·87–0·91) |
| Opioid | 0·87 (0·83–0·90) | 0·99 (0·99–0·99) | 0·92 (0·90–0·94) | 0·95 (0·93–0·96) |
| Non-opioid | 0·38 (0·31–0·46) | 0·97 (0·96–0·98) | NA | NA |
Temporal validation occurred at Rush University Medical Center. External validation occurred at Loyola University Medical Center. AUPRC=area under the precision-recall curve. AUROC=area under the receiver operating characteristic curve. CUI=concept unique identifiers. NA=not applicable.
Previously published baseline model for classifying alcohol misuse in the hospital.[22]
Previously published baseline model for classifying opioid misuse in the hospital.[17]
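To make the two discrimination metrics in the table concrete, here is a minimal pure-Python sketch. It is illustrative only: the source does not say which estimator or software the authors used, so the pairwise (Mann-Whitney) form of AUROC and average precision as an AUPRC estimate are assumptions, as are the function names and the toy data in the usage note.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. Quadratic in sample size; fine for a demo.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision values observed at the rank of each positive.

    One common estimate of the area under the precision-recall curve.
    Assumes untied scores; ties are broken by input order.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    if true_positives == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / true_positives
```

With made-up labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.1]`, `auroc` gives 0.75 (three of the four positive-negative pairs are ordered correctly) and `average_precision` gives (1 + 2/3) / 2. Unlike AUROC, AUPRC depends on prevalence, so values from cohorts with different prevalence are not directly comparable.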
Figure 2: Classification plots for temporal validation comparing single-label alcohol (A) and single-label opioid (B) classifiers to multilabel convolutional neural network classifier
(A) Single-label logistic regression alcohol misuse classifier versus multilabel convolutional neural network alcohol misuse classifier. (B) Single-label convolutional neural network opioid misuse classifier versus multilabel convolutional neural network opioid misuse classifier. AUROC=area under the receiver operating characteristic curve.
Bias report for temporal validation dataset (year 2020)
| | N | Substance misuse prevalence | False-discovery rate (95% CI) | False-positive rate (95% CI) | False-omission rate (95% CI) | False-negative rate (95% CI) |
|---|---|---|---|---|---|---|
| All encounters | 16 917 | 1023 | 0·25 (0·23–0·29) | 0·02 (0·02–0·02) | 0·01 (0·01–0·01) | 0·18 (0·15–0·20) |
| Age group, years | ||||||
| 18–44 | 5336 | 433 | 0·19 (0·16–0·23) | 0·02 (0·01–0·02) | 0·01 (0·01–0·02) | 0·16 (0·13–0·20) |
| ≥45 | 11 581 | 590 | 0·30 (0·26–0·33) | 0·02 (0·02–0·02) | 0·01 (0·01–0·01) | 0·18 (0·15–0·21) |
| Sex | ||||||
| Female | 9944 | 315 | 0·29 (0·24–0·34) | 0·01 (0·01–0·01) | 0·01 (0·01–0·01) | 0·22 (0·17–0·27) |
| Male | 6973 | 708 | 0·25 (0·22–0·28) | 0·03 (0·03–0·04) | 0·02 (0·02–0·02) | 0·16 (0·13–0·19) |
| Race and ethnicity | ||||||
| Non-Hispanic Black | 6190 | 445 | 0·29 (0·26–0·34) | 0·03 (0·02–0·03) | 0·03 (0·02–0·03) | 0·18 (0·15–0·22) |
| Non-Hispanic White | 6156 | 353 | 0·22 (0·18–0·26) | 0·01 (0·01–0·02) | 0·01 (0·01–0·02) | 0·19 (0·15–0·23) |
| Hispanic | 3329 | 159 | 0·20 (0·14–0·26) | 0·01 (0·01–0·02) | 0·01 (0·01–0·02) | 0·15 (0·09–0·21) |
| Mixed | 1242 | 66 | 0·32 (0·22–0·43) | 0·02 (0·02–0·03) | 0·02 (0·02–0·03) | 0·14 (0·06–0·24) |
Substance misuse prevalence represents any occurrence of alcohol misuse, opioid misuse, or non-opioid drug misuse. The referent labels and predicted labels for any type of substance misuse were used to calculate the number of false positives (FP), true positives (TP), false negatives (FN), and true negatives (TN). Bias assessment metrics included: false-discovery rate (FP/[FP+TP]); false-positive rate (FP/[FP+TN]); false-omission rate (FN/[FN+TN]); and false-negative rate (FN/[FN+TP]).
Mixed=Asian, Native American, or Pacific Islander, other, or refused to answer or answer unknown.
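The four bias metrics defined in the table footnote follow directly from the confusion counts. The sketch below is a minimal illustration under stated assumptions: the class and function names are hypothetical, labels are coded 1 for any substance misuse and 0 otherwise, and the counts in the usage note are invented, not taken from the study. It does not reproduce the confidence intervals, whose method is not given here.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Confusion:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Tally confusion counts from referent and predicted binary labels."""
    c = Confusion(0, 0, 0, 0)
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            c.tp += 1
        elif truth == 0 and pred == 1:
            c.fp += 1
        elif truth == 1 and pred == 0:
            c.fn += 1
        else:
            c.tn += 1
    return c


def _ratio(numerator: int, denominator: int) -> float:
    # An empty denominator leaves the rate undefined rather than zero.
    return numerator / denominator if denominator else float("nan")


def bias_metrics(c: Confusion) -> Dict[str, float]:
    """Rates as defined in the footnote: FDR, FPR, FOR and FNR."""
    return {
        "false_discovery_rate": _ratio(c.fp, c.fp + c.tp),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
        "false_omission_rate": _ratio(c.fn, c.fn + c.tn),
        "false_negative_rate": _ratio(c.fn, c.fn + c.tp),
    }
```

For example, invented counts of TP=8, FP=2, FN=2, TN=88 give a false-discovery rate of 2/10 and a false-negative rate of 2/10, with false-positive and false-omission rates of 2/90 each. Applying `confusion` and `bias_metrics` separately to each subgroup (age group, sex, race and ethnicity) yields one row of the report per subgroup.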