Automated tracking of emergency department abdominal CT findings during the COVID-19 pandemic using natural language processing.

Matthew D Li¹, Peter A Wood², Tarik K Alkasab³, Michael H Lev³, Jayashree Kalpathy-Cramer², Marc D Succi⁴.

Abstract

PURPOSE: During the COVID-19 pandemic, emergency department (ED) volumes have fluctuated. We hypothesized that natural language processing (NLP) models could quantify changes in detection of acute abdominal pathology (acute appendicitis (AA), acute diverticulitis (AD), or bowel obstruction (BO)) on CT reports.
METHODS: This retrospective study included 22,182 radiology reports from CT abdomen/pelvis studies performed at an urban ED between January 1, 2018 and August 14, 2020. Using a subset of 2448 manually annotated reports, we trained random forest NLP models to classify the presence of AA, AD, and BO in report impressions. Performance was assessed using 5-fold cross validation. The NLP classifiers were then applied to all reports.
RESULTS: The NLP classifiers for AA, AD, and BO demonstrated cross-validation classification accuracies between 0.97 and 0.99 and F1-scores between 0.86 and 0.91. When applied to all CT reports, the estimated numbers of AA, AD, and BO cases decreased 43-57% in April 2020 (first regional peak of COVID-19 cases) compared to 2018-2019. However, the number of abdominal pathologies detected rebounded in May-July 2020, with increases above historical averages for AD. The proportions of CT studies with these pathologies did not significantly increase during the pandemic period.
CONCLUSION: Dramatic decreases in numbers of acute abdominal pathologies detected by ED CT studies were observed early on during the COVID-19 pandemic, though these numbers rapidly rebounded. The proportions of CT cases with these pathologies did not increase, which suggests patients deferred care during the first pandemic peak. NLP can help automatically track findings in ED radiology reporting.

Keywords: Appendicitis; Bowel obstruction; COVID-19; CT; Diverticulitis; Emergency

Year: 2021 PMID: 34062318 PMCID: PMC8154187 DOI: 10.1016/j.ajem.2021.05.057

Source DB: PubMed Journal: Am J Emerg Med ISSN: 0735-6757 Impact factor: 4.093

Introduction

Fear of COVID-19 has prevented many patients with acute non-COVID-19-related problems from seeking medical care, with reported dramatic decreases in the number of patients presenting to the emergency department during periods of peak COVID-19 incidence and prevalence [1–4]. Simultaneously, radiology imaging volume has decreased during these peak pandemic periods [5–9]. The radiology report offers a treasure trove of information for more detailed epidemiologic research; however, this information is not easily accessible. Previous work studying the incidence of acute abdominal pathologies detected by imaging during the pandemic using radiology reports has depended on time-consuming expert manual curation of these reports to assess findings [5,10]. Machine learning for natural language processing (NLP) has been applied to radiology reports for various applications [11], and can help automate and facilitate such epidemiologic studies at data scales where manual curation is not realistic. Such an approach has been used to automatically analyze head CT and brain MRI free-text reports to show that acute or subacute ischemic stroke incidence decreased during the COVID-19 pandemic, but the proportion of studies positive for stroke increased [12]. Thus, we hypothesized that an NLP model could be trained and used to quantify changes in the detection of acute abdominal pathology on abdomen/pelvis CT studies during the COVID-19 pandemic, specifically for acute appendicitis, acute diverticulitis, and bowel obstruction. We also used these models to study trends over time.

Methods

This retrospective study was exempted by the Mass General Brigham (Boston, MA) institutional review board, with waiver of patient informed consent. See Fig. 1 for a schematic of the study design.

Fig. 1

Schematic of the study design. This analysis was repeated for 3 acute abdominal pathologies-of-interest including acute appendicitis, acute diverticulitis, and bowel obstruction. ED, emergency department; NLP, natural language processing.

CT report data collection

We extracted free-text radiology reports for CT abdomen/pelvis, CT abdomen only, and CT pelvis only studies, with or without intravenous contrast, performed for emergency or emergency observation patients from January 1, 2018 to August 14, 2020 at a large urban level 1 trauma center (Massachusetts General Hospital, Boston, Massachusetts) that serves approximately 110,000 emergency patients annually. We excluded vascular CTA studies of the abdomen/pelvis. A total of 22,182 studies were extracted, all of which had report “Impressions.” These reports were from 17,999 unique patients.

Annotated NLP model training dataset

To create a training dataset enriched for pathologies-of-interest in the study reports, we used regular expressions to extract random samples of reports with study “Impressions” containing the strings “appendi,” “divertic,” and “bowel obstruction.” For each of these three pathologies-of-interest, 500 studies were randomly sampled from the reports extracted above, dated from January 1, 2018 to May 10, 2020. Some study impressions mentioned more than one of these categories, so duplicated studies in this cohort were removed. We also sampled an additional 1000 studies that did not contain these strings in the study impression, to augment the number of negative cases. The final training set was composed of 2448 CT reports. The purpose of enriching the dataset for these pathologies was to increase the number of positive cases from which the NLP algorithm could learn patterns in the text. A fellowship-trained emergency radiologist (M.D.S.) manually annotated each of these 2448 CT reports based on the study impression for the presence or absence of the following findings: acute appendicitis, acute diverticulitis, and bowel obstruction.
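The enrichment step can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function name `build_enriched_sample`, the dictionary input format, the case-insensitive matching, and the fixed random seed are all assumptions. Only the three search strings and the sample sizes (500 per string, 1000 negatives) come from the text. Under this scheme 3 × 500 + 1000 = 2500 reports are drawn, so the reported final size of 2448 would correspond to 52 duplicates removed.

```python
import random
import re

# Search strings taken from the text; case-insensitive matching is an assumption.
PATTERNS = {
    "appendicitis": re.compile(r"appendi", re.IGNORECASE),
    "diverticulitis": re.compile(r"divertic", re.IGNORECASE),
    "bowel_obstruction": re.compile(r"bowel obstruction", re.IGNORECASE),
}


def build_enriched_sample(impressions, per_keyword=500, n_negative=1000, seed=0):
    """Return sorted report ids for an enriched training set.

    impressions: dict mapping report id -> impression text (hypothetical format).
    """
    rng = random.Random(seed)
    ids = sorted(impressions)
    selected = set()
    for pattern in PATTERNS.values():
        matches = [i for i in ids if pattern.search(impressions[i])]
        # Sample up to per_keyword matching reports; the set drops duplicates
        # when one impression mentions more than one search string.
        selected.update(rng.sample(matches, min(per_keyword, len(matches))))
    # Reports whose impression contains none of the search strings.
    unmatched = [
        i for i in ids
        if not any(p.search(impressions[i]) for p in PATTERNS.values())
    ]
    selected.update(rng.sample(unmatched, min(n_negative, len(unmatched))))
    return sorted(selected)


# Toy impressions, made up for illustration.
demo = {
    1: "Acute appendicitis.",
    2: "Sigmoid diverticulosis without diverticulitis. Normal appendix.",
    3: "No acute abnormality.",
    4: "Small bowel obstruction with transition point.",
}
print(build_enriched_sample(demo, per_keyword=2, n_negative=1))
```

Note that a keyword match does not imply a positive case: report 2 in the toy data matches both “appendi” and “divertic” but describes neither acute pathology, which is why manual annotation of the sampled reports is still needed.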

NLP model training and testing

We adapted a previously published NLP approach for automatic radiology report analysis for this study, with code derived from that work [12]. Model training was performed using the sklearn (version 0.20.3) and nltk (version 3.4) packages in Python (version 3.7.1). Random forest NLP models were trained to parse the radiology report free-text “Impression” to classify the report for the presence or absence of the pathology-of-interest, with separate models for acute appendicitis, acute diverticulitis, and bowel obstruction. Before training each model, we used regular expressions (re Python package version 2.2.1) to extract the sentences with words containing the stems “appendi”, “divertic”, and “bowel” or “obstruct”, for each model respectively. Extracted sentences were then stemmed using the nltk package “snowball.EnglishStemmer” and were then converted to vectors using bag-of-words vectorization with N-grams (N = 2 to 3, minimum term frequency = 1%, the same as in the previous work [12]). The nltk “mark_negation” function was then applied, which adds a “_NEG” suffix to words between a negation term and a punctuation mark, thereby conveying negation information. The random forest NLP classifiers used these vector representations of the sentence excerpts from the radiology report “Impressions” as inputs. A different random forest NLP binary classification model was trained for each of the pathologies-of-interest (all 100-tree forests with default hyperparameters in sklearn version 0.20.3), using the manually annotated labels as the reference standard. We evaluated the performance of each model using 5-fold cross validation, stratified on the pathology-of-interest, with reporting of the average accuracy, precision, recall, and F1 score. For each cross-validation fold, we also calculated the predicted number of positive cases and compared the predictions with the actual case numbers.
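To make the text preprocessing concrete, the following standard-library sketch illustrates sentence extraction by stem, negation marking, and 2- to 3-gram counting. It is not the authors' pipeline. The negation cue list, the tokenizer, and the function names are simplified assumptions standing in for the nltk “mark_negation” behavior described above, and stemming, the 1% minimum term frequency filter, and the random forest itself are omitted.

```python
import re
from collections import Counter

# Simplified stand-ins (assumptions): a short negation cue list and a set of
# clause-ending punctuation marks.
NEGATION_CUES = {"no", "not", "without", "never"}
CLAUSE_END = re.compile(r"^[.,:;!?]$")


def extract_sentences(impression, stems):
    """Keep only sentences containing at least one of the given stems."""
    sentences = re.split(r"(?<=[.!?])\s+", impression.strip())
    return [s for s in sentences if any(stem in s.lower() for stem in stems)]


def tag_negation(sentence):
    """Append _NEG to tokens between a negation cue and the next punctuation mark."""
    tokens = re.findall(r"[a-z0-9]+|[.,:;!?]", sentence.lower())
    tagged, in_scope = [], False
    for tok in tokens:
        if CLAUSE_END.match(tok):
            in_scope = False
            continue  # punctuation only closes the negation scope; it is not kept
        tagged.append(tok + "_NEG" if in_scope else tok)
        if tok in NEGATION_CUES:
            in_scope = True
    return tagged


def ngram_counts(tokens, n_min=2, n_max=3):
    """Bag of word n-grams for n_min <= n <= n_max."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


impression = "No evidence of acute appendicitis. Sigmoid diverticulosis."
sentences = extract_sentences(impression, ["appendi"])
tokens = tag_negation(sentences[0])
print(tokens)
print(ngram_counts(tokens))
```

In this toy example only the first sentence is kept for the appendicitis model, and every token after “no” carries the “_NEG” suffix, so the n-gram “acute_NEG appendicitis_NEG” is a different feature from “acute appendicitis”. That distinction is what lets a bag-of-words classifier separate negated from affirmed findings.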

NLP-based epidemiological analysis

We applied the trained NLP models to the full cohort of 22,182 CT studies to estimate the numbers of cases of acute appendicitis, acute diverticulitis, and bowel obstruction detected on CT scans. We analyzed these trends over time and in relation to the total number of CT studies performed in the emergency department. Proportional changes in relation to the average number of cases per month from 2018 to 2019 were also calculated.
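The proportional change calculation is simple arithmetic, sketched below. The function name and the monthly counts are made up for illustration and are not data from the study; the only element taken from the text is the use of the 2018 to 2019 monthly average as the baseline.

```python
from statistics import mean


def proportional_change(baseline_monthly_counts, month_count):
    """Percent change of one month's count relative to the baseline monthly mean."""
    baseline = mean(baseline_monthly_counts)
    return 100.0 * (month_count - baseline) / baseline


# Made-up counts for illustration only (not data from the study).
baseline_counts = [30, 34, 28, 32] * 6  # 24 hypothetical months, mean 31
print(round(proportional_change(baseline_counts, 14), 1))  # -54.8
```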

Statistics

The Python scipy (version 1.1.0) package was used for statistical analyses, including the Pearson chi-square test of independence. Statistical significance was established as p < 0.05. Bootstrap 95% confidence intervals (CI) were calculated for performance metrics. Time trend plots were made in Microsoft Excel (version 16.43).
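For a 2 × 2 table (pathology present or absent, baseline or pandemic period), the Pearson chi-square test of independence can be written out directly. The sketch below uses only the standard library, applies no continuity correction, and uses made-up counts. Whether the original analysis applied a continuity correction is not stated, so this choice is an assumption (scipy's `chi2_contingency` applies Yates' correction to 2 × 2 tables by default).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom and no continuity
    correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up counts: rows are time periods, columns are positive / negative CT studies.
stat, p = chi_square_2x2(30, 970, 50, 950)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```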

Results

NLP training dataset characteristics

In the training dataset enriched for pathologies-of-interest (2448 sampled CT reports), as determined through manual annotation, the presence of acute appendicitis, acute diverticulitis, and bowel obstruction was seen in 172 (7.0%), 225 (9.2%), and 201 (8.2%) of study impressions respectively. All radiology “impressions” contained free text without a standardized reporting structure. Reports were generated by many different trainees (residents and fellows) and attending radiologists, with subjectively different styles of reporting.

Performance of the NLP models

The 5-fold cross validation results for the NLP models are summarized in Table 1. The acute appendicitis, acute diverticulitis, and bowel obstruction models all showed high performance, with accuracy ranging from 0.97 to 0.99 and F1 scores from 0.86 to 0.91. Because the NLP models were trained for the purpose of estimating the incidence of these pathologies-of-interest on CT reports, we also evaluated the predicted number of cases in comparison to the actual number of cases, averaged across the cross-validation folds. For the acute appendicitis NLP model, we found the model predicted 33.4 positive cases (95% CI 32.8–34.0), 3.0% lower than the average actual number of positive cases (34.4, which was the same in each cross-validation fold due to stratification on the pathology-of-interest). For the acute diverticulitis NLP model, we found the model predicted 45.4 positive cases (95% CI 44.0–47.0), almost the same as the 45.0 average actual number of positive cases. For the bowel obstruction NLP model, we found the model predicted 37.2 positive cases (95% CI 36.4–37.8), 7.5% lower than the 40.2 average actual number of positive cases. Thus, the acute appendicitis and bowel obstruction NLP models tend to slightly and systematically underestimate the total number of cases, while the acute diverticulitis NLP model appears better calibrated.
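The count comparison above can be checked with a few lines of arithmetic. In this sketch the positive totals (172, 225, 201) and the mean predicted counts per fold are taken from the text, while the function name is hypothetical.

```python
def relative_count_error(predicted, actual):
    """Percent difference of the predicted positive count from the actual count."""
    return 100.0 * (predicted - actual) / actual


# Positives in the annotated set (from the text), divided over 5 folds.
positives = {"appendicitis": 172, "diverticulitis": 225, "bowel obstruction": 201}
# Mean predicted positives per fold, as reported in the text.
predicted = {"appendicitis": 33.4, "diverticulitis": 45.4, "bowel obstruction": 37.2}

for name, total in positives.items():
    actual_per_fold = total / 5
    error = relative_count_error(predicted[name], actual_per_fold)
    print(f"{name}: actual {actual_per_fold:.1f}, predicted {predicted[name]:.1f}, {error:+.1f}%")
```

Using these rounded inputs, the appendicitis figure comes out at about 2.9% below the actual count rather than the reported 3.0%; the small gap is presumably because the reported value was computed from unrounded predictions.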

Table 1

Five-fold cross-validation performance of natural language processing (NLP) models for detection of acute appendicitis, acute diverticulitis, and bowel obstruction from 2448 manually annotated abdomen/pelvis CT reports. The bootstrap median average cross-validation accuracy, precision, recall, and F1 score are reported, with bootstrap 95% confidence intervals.

NLP model	Accuracy	Precision	Recall	F1 score
Acute Appendicitis	0.99 (0.99–0.99)	0.93 (0.92–0.93)	0.90 (0.89–0.91)	0.91 (0.91–0.91)
Acute Diverticulitis	0.97 (0.97–0.98)	0.86 (0.84–0.87)	0.86 (0.84–0.88)	0.86 (0.85–0.87)
Bowel Obstruction	0.99 (0.99–0.99)	0.95 (0.94–0.95)	0.87 (0.86–0.89)	0.91 (0.90–0.91)

Trends in acute abdominal pathologies detected on CT during the COVID-19 pandemic

As the NLP models were able to automatically estimate the number of cases of pathologies-of-interest from the CT report “impressions” with reasonable fidelity, we applied the NLP models to the full dataset of 22,182 abdomen/pelvis CT reports from January 1, 2018 until August 14, 2020. There was a 38% decrease in emergency abdomen/pelvis CT volume in April 2020 compared to the average number of CT studies performed per month from 2018 to 2019 (Fig. 2A), which coincided with the peak in new cases from the first wave of the COVID-19 pandemic in Massachusetts, United States [13]. However, in May, June, and July 2020, as the number of COVID-19 cases declined, there were increases in the number of CTs performed each month relative to the monthly average of 2018–2019 by 26, 49, and 68% respectively, as imaging volume rebounded to new highs (Fig. 3A).

Fig. 2

Trends in acute abdominal pathology detected on CT over time from January 2018 to July 2020. (A) Line plot for the number of ED CT abdomen/pelvis studies performed by month. (B) Line plot for the estimated number of cases of acute appendicitis, acute diverticulitis, and bowel obstruction detected by NLP analysis of radiology report impressions by month. (C) Line plot for the estimated proportion of CT studies performed with acute abdominal pathology detected by NLP analysis (case positivity rate) by month. The same figure legend for plots (B) and (C) is shown below plot (C).

Fig. 3

Trends in acute abdominal pathology detected by CT in 2020, represented as the proportional change by month relative to the average over 24 months from 2018 to 2019. (A) Line plot for the proportional change in ED CT abdomen/pelvis studies and estimated number of cases of acute appendicitis, acute diverticulitis and bowel obstruction relative to the average from 2018 to 2019. (B) Line plot for the proportional change in the estimated proportion of total ED CT abdomen/pelvis studies performed with acute appendicitis, acute diverticulitis or bowel obstruction detected by NLP analysis relative to the average from 2018 to 2019.

Decreases in the estimated numbers of acute appendicitis, acute diverticulitis, and bowel obstruction cases detected on CT began in March 2020, before hitting a nadir in April 2020 (Fig. 2B), with decreases of 57, 56, and 43% respectively, relative to the average number detected per month on CT from 2018 to 2019 (Fig. 3A). However, in May 2020, the number of cases detected rebounded (Fig. 2B), with 19, 48, and 37% increases in the number of cases respectively relative to 2018–2019 (Fig. 3A). The number of cases of acute diverticulitis detected increased even further in June and July 2020 (Fig. 2B), while the numbers of acute appendicitis and bowel obstruction cases remained more similar in comparison to 2018–2019 averages (−19 to 24% proportional change relative to 2018–2019) (Fig. 3A).

The proportions of total abdomen/pelvis CT cases with acute appendicitis, acute diverticulitis, and bowel obstruction detected fluctuated during the pandemic time period in our region (March to July 2020) (Fig. 3B). The proportion of CT cases with acute appendicitis significantly decreased from 3.3% over 2018–2019 to 2.5% during the pandemic period (p = 0.01). The proportion of CT cases with acute diverticulitis slightly increased from 3.5% over 2018–2019 to 4.0% during the pandemic period, though this increase was not significant (p = 0.17). The proportion of CT cases with bowel obstruction slightly decreased from 4.5% over 2018–2019 to 3.9% during the pandemic period, though this decrease was not significant (p = 0.16).

Discussion

In this study, we trained random forest NLP models (adapted from previous NLP work on radiology reports [12]) to automatically detect the presence of acute appendicitis, acute diverticulitis, and bowel obstruction on radiology reports from ED abdomen/pelvis CT studies with good performance on cross-validation. The NLP models were then applied to all CT studies performed from January 1, 2018 to August 14, 2020 for the purpose of analyzing trends in the detection of these acute abdominal pathologies before and during the COVID-19 pandemic. Using our NLP models, we found decreases in the detection of these acute abdominal pathologies, though these numbers rapidly rebounded from a nadir in April 2020, at the height of the first wave of the pandemic in our region.

The decrease in detection of acute appendicitis and diverticulitis on CT during the COVID-19 pandemic is in agreement with previous retrospective work [10,14,15]. However, while Romero et al. [10] found that an increased proportion of ordered CT studies for appendicitis evaluation were positive for acute appendicitis, we found a significant decrease in the proportion of all abdomen/pelvis CTs with appendicitis detected. Similarly, O'Brien et al. found significantly increased proportions of abdominal CTs with any positive pathology during the pandemic compared to a matched time period from 2019, with increased proportions of CTs with acute appendicitis and bowel obstruction detected, but decreased proportions of CTs with acute diverticulitis detected [5]. We found no significant change in the proportions of CTs with acute diverticulitis and bowel obstruction detected. These differences in the change in CT case positivity rates during the pandemic between our study and these other studies are likely multifactorial, though they may reflect differences in patient population behaviors. Patients in our study population may have been even more inclined to defer care (or seek care elsewhere) during the COVID-19 pandemic peak compared to these previous studies, as our hospital took care of a large number of COVID-19 patients in the region.

Concurrently, we also found substantial decreases in ED abdomen/pelvis CT volume early in the pandemic, consistent with previously published analyses [5–9], though we also observed a rapid rebound in CT case volume, above historical averages, perhaps reflecting increased demand for ED imaging due to deferred healthcare visits. Given the possible differences in CT case positivity rate across patient populations as shown by our analysis in comparison to previous studies, findings in many retrospective studies may not generalize to other populations. Thus, updated epidemiologic analyses across multiple populations will be important as the pandemic evolves. The automated NLP approach that we used in this study, which has also been applied to the detection of stroke on neuroimaging reports during the pandemic [12], can help facilitate such analyses and alleviate the burden of manual review. This NLP approach could also help to provide input data for predictive modeling of the demand for diagnostic imaging [17] and other healthcare resources during the pandemic.

Other research groups have also developed NLP models for detection of acute appendicitis. Rink et al. used a hybrid machine learning approach involving a customized lexicon and manually defined patterns to identify appendicitis in radiology reports with an F1 score of 0.87 [18]. Lakhani et al. developed multiple NLP algorithms using a rule-based approach, applied to detect a wide range of critical results; their model detected acute appendicitis in radiology reports with an F1 score of 0.94 [19]. By comparison, our non-rule-based NLP model performed similarly on this task, with an F1 score of 0.91. We showed similar performance of our NLP models for detection of acute diverticulitis and bowel obstruction on CT reports. Future work may use more advanced deep learning approaches for NLP to further improve performance [20], but for the intended application in this study, the NLP models that we trained demonstrated adequate performance.

There are limitations to this study. First, our NLP models were trained and applied to data from a single institution. While there is heterogeneity in radiologist report diction within our institution, radiologists at different institutions may have systematically different reporting styles, which would limit the generalizability of the NLP models trained in this study to other institutions. Previous work on the generalizability of such NLP models for radiology reports has shown that performance does decrease when the models are applied to different institutions [12]. However, for the goal of analyzing trends in acute abdominal pathology of ED CTs at our institution, our NLP models performed as designed. Second, our NLP models for acute appendicitis and bowel obstruction slightly underestimated the total number of cases in cross-validation. However, given that the NLP models were trained on report data sampled over an extended time period from 2018 to 2020, this slight underestimation should be consistent over time in our CT study cohort. Third, while the focus of our model is on the binary presence of acute appendicitis, acute diverticulitis, or bowel obstruction, an important avenue of research is related to complications associated with these pathologies [5,10,14,15], which may occur due to delayed clinical presentations. Our models were not trained for identification of complications (e.g., abscess, perforation), though future NLP work may focus on this.

Conclusion

Dramatic decreases in numbers of cases of acute appendicitis, acute diverticulitis, and bowel obstruction detected by ED CT studies were observed early on during the COVID-19 pandemic, though these numbers rapidly rebounded, even higher than historical averages for acute diverticulitis cases. However, the proportions of CT cases with these acute abdominal pathologies did not increase, even decreasing significantly in the case of acute appendicitis, which suggests that patients deferred care during the first peak of the COVID-19 pandemic in the United States. NLP can be helpful for automatic tracking of findings in ED radiology reporting, which may help facilitate epidemiology research.

Author contributions

Matthew D. Li: Conceptualization, Methodology, Software, Formal Analysis, Investigation, Data Curation, Writing – Original Draft. Peter A. Wood: Investigation, Data Curation, Writing – Original Draft. Tarik K. Alkasab: Investigation, Writing – Review & Editing. Michael H. Lev: Investigation, Writing – Review & Editing. Jayashree Kalpathy-Cramer: Investigation, Writing – Review & Editing. Marc D. Succi: Conceptualization, Investigation, Data Curation, Writing – Review & Editing

Funding sources

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Declaration of Competing Interest

MDS reports personal fees and non-financial support from 2 Minute Medicine, Inc., and patent royalties from Frequency Therapeutics for work not related to this manuscript. JKC reports grants from , non-financial support from AWS, and grants from , outside the submitted work. The other authors report no relevant conflicts of interest.

17 in total

1. Analysis of Stroke Detection during the COVID-19 Pandemic Using Natural Language Processing of Radiology Reports.

Authors: M D Li; M Lang; F Deng; K Chang; K Buch; S Rincon; W A Mehan; T M Leslie-Mazwi; J Kalpathy-Cramer
Journal: AJNR Am J Neuroradiol Date: 2020-12-17 Impact factor: 3.825

2. Impact of the COVID-19 Pandemic on Emergency Department Visits - United States, January 1, 2019-May 30, 2020.

Authors: Kathleen P Hartnett; Aaron Kite-Powell; Jourdan DeVies; Michael A Coletta; Tegan K Boehmer; Jennifer Adjemian; Adi V Gundlapalli
Journal: MMWR Morb Mortal Wkly Rep Date: 2020-06-12 Impact factor: 17.586

Review 3. Immediate and long-term impact of the COVID-19 pandemic on delivery of surgical services.

Authors: K Søreide; J Hallet; J B Matthews; A A Schnitzbauer; P D Line; P B S Lai; J Otero; D Callegaro; S G Warner; N N Baxter; C S C Teh; J Ng-Kamstra; J G Meara; L Hagander; L Lorenzon
Journal: Br J Surg Date: 2020-04-30 Impact factor: 6.939

4. Delayed access or provision of care in Italy resulting from fear of COVID-19.

Authors: Marzia Lazzerini; Egidio Barbi; Andrea Apicella; Federico Marchetti; Fabio Cardinale; Gianluca Trobia
Journal: Lancet Child Adolesc Health Date: 2020-04-09

5. Impact of the Coronavirus Disease 2019 (COVID-19) Pandemic on Imaging Case Volumes.

Authors: Jason J Naidich; Artem Boltyenkov; Jason J Wang; Jesse Chusid; Danny Hughes; Pina C Sanelli
Journal: J Am Coll Radiol Date: 2020-05-16 Impact factor: 5.532

6. The cases not seen: Patterns of emergency department visits and procedures in the era of COVID-19.

Authors: Joshua J Baugh; Benjamin A White; Dustin McEvoy; Brian J Yun; David F M Brown; Ali S Raja; Sayon Dutta
Journal: Am J Emerg Med Date: 2020-11-05 Impact factor: 2.469

7. Extracting actionable findings of appendicitis from radiology reports using natural language processing.

Authors: Bryan Rink; Kirk Roberts; Sanda Harabagiu; Richard H Scheuermann; Seth Toomay; Travis Browning; Teresa Bosler; Ronald Peshock
Journal: AMIA Jt Summits Transl Sci Proc Date: 2013-03-18

8. Impact of the COVID-19 pandemic on emergency department CT for suspected diverticulitis.

Authors: Averi L Gibson; Byron Y Chen; Max P Rosen; S Nicolas Paez; Hao S Lo
Journal: Emerg Radiol Date: 2020-10-28

9. Patients avoided important care during the early weeks of the coronavirus pandemic: diverticulitis patients were more likely to present with an abscess on CT.

Authors: Michael P Zintsmaster; Daniel T Myers
Journal: Emerg Radiol Date: 2020-09-26

10. COVID-19: Recovery Models for Radiology Departments.

Authors: Steven Guitron; Oleg S Pianykh; Marc D Succi; Min Lang; James Brink
Journal: J Am Coll Radiol Date: 2020-09-07 Impact factor: 5.532
