Jie Yang1,2, Liqin Wang1,2, Neelam A Phadke2,3, Paige G Wickner2,4, Christian M Mancini2,3, Kimberly G Blumenthal2,3, Li Zhou1,2.
Abstract
Importance: Although critical to patient safety, health care-related allergic reactions are challenging to identify and monitor. Objective: To develop a deep learning model to identify allergic reactions in the free-text narrative of hospital safety reports and evaluate its generalizability, efficiency, productivity, and interpretability. Design, Setting, and Participants: This cross-sectional study analyzed hospital safety reports filed between May 2004 and January 2019 at Brigham and Women's Hospital and between April 2006 and June 2018 at Massachusetts General Hospital in Boston. Training and validating a deep learning model involved extracting safety reports using 101 expert-curated keywords from Massachusetts General Hospital (data set I). The model was then evaluated on 3 data sets: reports without keywords (data set II), reports from a different time frame (data set III), and reports from a different hospital (Brigham and Women's Hospital; data set IV). Statistical analyses were performed between March 1, 2019, and July 18, 2020. Main Outcomes and Measures: The area under the receiver operating characteristic curve and area under the precision-recall curve were used on data set I. The precision at top-k was used on data sets II to IV.Entities:
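The "precision at top-k" metric used on data sets II to IV ranks reports by the model's predicted score and measures the fraction of confirmed allergic-reaction cases among the k highest-ranked reports. A minimal sketch, with invented scores and labels (the function name `precision_at_k` is ours, not the study's):

```python
# Hypothetical sketch of precision at top-k: rank reports by model score,
# then compute the fraction of true cases among the k highest-scoring ones.
# Scores and labels below are illustrative, not from the study.

def precision_at_k(scores, labels, k):
    """Fraction of positives among the k reports with the highest scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k

# Toy example: 6 reports, label 1 = confirmed allergic reaction.
scores = [0.95, 0.10, 0.80, 0.40, 0.70, 0.05]
labels = [1,    0,    1,    0,    0,    0]
print(precision_at_k(scores, labels, 3))  # 2 of the top 3 are positive
```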
Mesh:
Year: 2020 PMID: 33196805 PMCID: PMC7670315 DOI: 10.1001/jamanetworkopen.2020.22836
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
Figure 1. Study Data Sets and Overall Design
This diagram depicts the 4 data sets used in this study, including the number of reports in each data set and when these reports were filed. Three data sets were from Massachusetts General Hospital (MGH), and 1 data set was from Brigham and Women’s Hospital (BWH). Data set I was used to train the deep learning model, whereas data sets II, III, and IV were used to assess model performance and generalizability. AUPRC indicates area under the precision-recall curve; and AUROC, area under the receiver operating characteristic curve.
Characteristics of the Hospital Safety Reports, Patient Population, and Data Sets for Machine Learning Model Development and Validation
Values are No. (%) unless otherwise indicated.
| Characteristic | Data set I, annotated (with keywords; MGH) | Data set II (without keywords; MGH) | Data set III (recent reports; MGH) | All MGH reports | Data set IV (all BWH reports) | All reports (total) |
|---|---|---|---|---|---|---|
| Years | April 2006-March 2016 | April 2006-March 2016 | March 2016-June 2018 | April 2006-June 2018 | May 2004-January 2019 | BWH: May 2004-January 2019; MGH: April 2006-June 2018 |
| Patients | 7630 | 63 768 | 27 922 | 97 778 | 75 076 | 172 854 |
| All reports | 9107 | 105 904 | 46 046 | 174 799 | 124 229 | 299 028 |
| Reports of identifiable patients | 9047 | 94 692 | 42 454 | 157 824 | 118 764 | 276 588 |
| No. of reports per patient, mean (range) | 1.2 (1-12) | 1.5 (1-54) | 1.5 (1-34) | 1.6 (1-54) | 1.6 (1-40) | 1.6 (1-54) |
| No. of words per report, median (IQR) | 74 (43-124) | 51 (30-86) | 63 (35-106) | 57 (33-96) | 37 (17-67) | 48 (25-84) |
| Patient demographics | ||||||
| Age, median (IQR), y | 58.3 (38.6-71.5) | 59.3 (43.4-71.9) | 60.1 (43.6-71.7) | 59.3 (43.0-71.6) | 60.2 (44.7-71.6) | 59.7 (43.8-71.6) |
| Sex | ||||||
| Female | 3504 (45.9) | 30 823 (48.3) | 13 594 (48.7) | 47 891 (49.0) | 38 653 (51.5) | 86 544 (50.1) |
| Male | 3977 (52.1) | 31 715 (49.7) | 13 859 (49.6) | 48 016 (49.1) | 32 303 (43.0) | 80 319 (46.5) |
| Unknown | 149 (2.0) | 1230 (1.9) | 469 (1.7) | 1871 (1.9) | 4120 (5.5) | 5991 (3.5) |
| Race | ||||||
| White | 5999 (78.6) | 50 043 (78.5) | 21 617 (77.4) | 76 322 (78.1) | 53 736 (71.6) | 130 058 (75.2) |
| Black | 415 (5.4) | 3543 (5.6) | 1742 (6.2) | 5481 (5.6) | 6832 (9.1) | 12 313 (7.1) |
| Asian | 228 (3.0) | 1956 (3.1) | 1048 (3.8) | 3264 (3.3) | 1877 (2.5) | 5141 (3.0) |
| Others | 94 (1.2) | 841 (1.3) | 280 (1.0) | 1213 (1.2) | 613 (0.8) | 1826 (1.1) |
| Unknown | 894 (11.7) | 7385 (11.6) | 3235 (11.6) | 11 498 (11.8) | 12 018 (16.0) | 23 516 (13.6) |
| Ethnicity | ||||||
| Non-Hispanic | 6605 (86.6) | 55 408 (86.9) | 24 079 (86.2) | 84 579 (86.5) | 62 271 (82.9) | 146 850 (85.0) |
| Hispanic | 588 (7.7) | 4802 (7.5) | 2298 (8.2) | 7610 (7.8) | 5417 (7.2) | 13 027 (7.5) |
| Unknown | 437 (5.7) | 3558 (5.6) | 1545 (5.5) | 5589 (5.7) | 7388 (9.8) | 12 977 (7.5) |
Abbreviations: BWH, Brigham and Women’s Hospital; IQR, interquartile range; MGH, Massachusetts General Hospital.
Table footnotes:
- Summary of the characteristics of the patient demographic information and cases.
- Patients with a complete and valid medical record number.
- Reports, including those with and without a valid patient medical record number.
- The sum of the 3 MGH data sets is less than the total number of MGH reports for the following reason. In the previous study that created data set I,[12] exact matching against a gradually curated keyword list was used, so reports containing only morphological or lexical variations of the keywords were missed. In this study, to strictly evaluate the model’s ability to identify allergic reactions missed by keyword search, data set II was constructed with a more comprehensive keyword-matching algorithm: all reports containing any expert-curated keyword, or any morphological or lexical variation of one (eg, the prefix allerg-, the suffix -cillin, or a change of letter case), were excluded. Consequently, data set I plus data set II is smaller than the full set of MGH reports filed between April 2006 and March 2016.
- Reports linked to a valid patient medical record number.
- Calculated using the reports linked to a valid patient medical record number.
- Calculated using the event date and the patient’s date of birth.
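The variant-aware keyword exclusion described in the footnote above can be sketched with a regular expression that matches prefix forms, suffix forms, and any letter case. The keyword fragments and the function name below are illustrative assumptions, not the study's actual 101-keyword list:

```python
import re

# Illustrative sketch of the "more comprehensive keyword-matching algorithm"
# used to build data set II: match keywords case-insensitively, including
# prefix forms (allerg- matches allergy/allergic) and suffix forms
# (-cillin matches penicillin/amoxicillin). Fragments here are invented.

PREFIXES = ["allerg", "anaphyla"]   # match any word starting with these
SUFFIXES = ["cillin"]               # match any word ending with these

pattern = re.compile(
    r"\b(?:%s)\w*\b|\b\w*(?:%s)\b" % ("|".join(PREFIXES), "|".join(SUFFIXES)),
    re.IGNORECASE,
)

def contains_keyword(report_text):
    """True if the report mentions any keyword variant; such reports
    would be excluded from data set II."""
    return pattern.search(report_text) is not None

print(contains_keyword("Patient developed an ALLERGIC rash after amoxicillin."))
print(contains_keyword("Routine order entry delay, no adverse event noted."))
```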
Figure 2. Deep Learning Model Performance
BWH indicates Brigham and Women’s Hospital; and MGH, Massachusetts General Hospital.
Model Efficiency and Productivity
| Data set | Measure | Keyword-search approach | Attention-based DNN model |
|---|---|---|---|
| II | Cases to review | 0 | 1627 |
| II | True cases | 0 | 184 |
| II | Precision, % | NA | 11.3 |
| III | Cases to review | 10 131 | 1984 |
| III | True cases | 570 | 625 |
| III | Precision, % | 5.6 | 31.5 |
| IV | Cases to review | 15 896 | 5800 |
| IV | True cases | 1344 | 1569 |
| IV | Precision, % | 8.5 | 27.1 |
| Total | Cases to review | 26 027 | 9411 |
| Total | True cases | 1914 | 2378 |
| Total | Precision, % | 7.4 | 25.3 |
Abbreviations: DNN, deep neural network; NA, not applicable.
This table compares the attention-based DNN model with the keyword-search approach on data sets II, III, and IV in terms of efficiency (Cases to review, the number of identified cases requiring manual review), productivity (True cases, the number of positive cases yielded), and precision (positive predictive value, the proportion of true cases among all identified cases); see the eMethods in the Supplement for details.
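The precision percentages above follow directly from the other two rows of each group: precision = true cases / cases to review. A quick re-derivation using the table's own numbers:

```python
# Re-deriving the precision values in the table above: precision (positive
# predictive value) is the number of true cases divided by the number of
# cases flagged for review. Counts are taken directly from the table
# (attention-based DNN model column).

rows = {
    # data set: (cases_to_review, true_cases)
    "II":  (1627, 184),
    "III": (1984, 625),
    "IV":  (5800, 1569),
}

for name, (reviewed, true_cases) in rows.items():
    print(f"Data set {name}: {100 * true_cases / reviewed:.1f}%")

total_reviewed = sum(r for r, _ in rows.values())   # 9411
total_true = sum(t for _, t in rows.values())       # 2378
print(f"Total: {100 * total_true / total_reviewed:.1f}%")  # 25.3%
```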
Figure 3. Attention Heat Maps
These attention heat maps show which words the model attends to when predicting positive and negative cases of allergic reaction; darker colors represent higher attention weights. A, Words associated with prediction of positive cases included itchy, hive, and throat; the model also captured misspellings (eg, Benedryl for Benadryl) and lexical variations (eg, hive for hives). B, Words associated with prediction of negative cases included order, SOB (shortness of breath), BP (blood pressure), and not. Details of individual cases were modified to preserve anonymity; no modification affected the weights shown in this heat map.
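A heat map of this kind can be rendered from per-word attention weights. A minimal text-only sketch, with invented words and weights (the study's actual attention weights are not reproduced here, and the `shade` helper is ours):

```python
# Minimal sketch of rendering an attention heat map from per-word weights:
# a denser ASCII marker stands in for a darker color, so words the model
# attends to more are tagged with denser markers. Weights are invented.

def shade(word, weight, levels=" .:*#"):
    """Tag a word with a marker whose density grows with its attention weight
    (weight assumed in [0, 1])."""
    idx = min(int(weight * len(levels)), len(levels) - 1)
    return f"{word}[{levels[idx]}]"

words   = ["patient", "developed", "itchy", "hives", "and", "throat", "tightness"]
weights = [0.02,      0.05,        0.85,    0.90,    0.01,  0.70,     0.60]

print(" ".join(shade(w, a) for w, a in zip(words, weights)))
```

In a real rendering, the marker density would be replaced by a background-color intensity per word.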