
Extracting actionable findings of appendicitis from radiology reports using natural language processing.

Bryan Rink, Kirk Roberts, Sanda Harabagiu, Richard H Scheuermann, Seth Toomay, Travis Browning, Teresa Bosler, Ronald Peshock.

Abstract

Radiology reports often contain findings about the condition of a patient which should be acted upon quickly. These actionable findings in a radiology report can be automatically detected to ensure that the referring physician is notified about such findings and to provide feedback to the radiologist that further action has been taken. In this paper we investigate a method for detecting actionable findings of appendicitis in radiology reports. The method identifies both individual assertions regarding the presence of appendicitis and other findings related to appendicitis using syntactic dependency patterns. All relevant individual statements from a report are collectively considered to determine whether the report is consistent with appendicitis. Evaluation on a corpus of 400 radiology reports annotated by two expert radiologists showed that our approach achieves a precision of 91%, a recall of 83%, and an F1-measure of 87%.


Year: 2013 PMID: 24303268 PMCID: PMC3845763

Source DB: PubMed Journal: AMIA Jt Summits Transl Sci Proc

Introduction

The ability to automatically identify actionable findings in radiology reports, such as appendicitis, can play an important role in clinical quality improvement research. As reported by the National Patient Safety Goals of the Joint Commission, up to 70% of sentinel medical errors result from communication errors [1]. Moreover, a natural language processing system may help alleviate the inadequate communication of critical results, which, as reported by Huntington et al. [2], is the principal cause of malpractice cases involving radiologists in the USA. In the research reported in this paper, we explore the use of natural language processing (NLP) techniques for identifying radiographic findings. We focus on a single actionable finding, namely the detection of records consistent with appendicitis. Appendicitis is a condition in which the appendix becomes inflamed. Our approach uses pattern recognition to detect direct assertions about appendicitis made by the radiologist, along with common findings associated with appendicitis, such as inflammation located in or around the appendix. These patterns use information about the syntactic structure of the text to determine the anatomical location of mentions of inflammation. Syntactic structure is also used to correctly scope negation for mentions of appendicitis (e.g., “No findings of appendicitis.”). Our evaluation shows that this approach achieves an F1-measure of 85%, with a precision of 84% and a recall of 87%, on a set of 400 radiology reports annotated by two radiologists.

Radiology corpus

We created a corpus of 7,230 de-identified radiology reports from the UT Southwestern Medical Center utCRIS data warehouse. The reports were filtered to retrieve only those radiology reports possibly diagnostic for appendicitis. The corpus is divided into an unannotated development set of 6,830 reports and an annotated test set of 400 reports. The test set consists of 100 reports which contain an indicator of inflammation (e.g., inflammation, thickening), 100 reports containing an anatomical term near the appendix (e.g., cecum), 100 reports containing both types of terms, and 100 reports containing neither type of term. The development set was examined for common language patterns in the reports. Those patterns were then encoded into the dependency patterns used in our approach for detecting appendicitis indicators in text. The test set has 87 reports that are positive for indicating appendicitis and 313 that are negative.

Lexicons

Natural language, particularly within the domain of radiology reports, allows for a large degree of variation in expression. For example, some doctors will use the term “appendectomy” while others use “appendicectomy”. Related terms pose a similar problem: inflammation, inflammatory changes, thickening, and dilatation are not direct synonyms, but for many purposes they can be treated as the same medical condition. When detecting appendicitis it is also important to know which anatomical structures are located near other anatomical structures. For instance, inflammation near the cecum is very indicative of appendicitis because the cecum is directly connected to the appendix. Lexicons are used to capture these variations in language. The two largest lexicons encode terms relating to inflammation or other abnormalities associated with appendicitis in different syntactic contexts. Excerpts from the inflammation and anatomy lexicons follow:

Nominal Inflammation Lexicon [INFLAMM]: abscess, appendicitis, appendicolith, change, changes, inflammation, mass, perforation, phlegmon, rupture, stranding, thickening

Adjectival and Verbal Inflammation Lexicon [INFLAMM_VERB_ADJ]: abnormal, change, dilated, distended, enlarged, inflamed, perforated, phlegmonous, ruptured, thickened

Anatomy Lexicon [ANAT]: appendiceal, appendicitis, appendicolith, appendix, cecal, cecum, mesentery, periappendiceal, pericecal, periluminal, quadrant

We chose to create a lexicon limited to anatomical terms related to the appendix rather than use a general vocabulary such as SNOMED CT [3] or the Foundational Model of Anatomy [4]. Our examination of the radiology reports showed that a limited number of terms were used to describe the anatomy important to appendicitis. The lexicons listed in this section are referenced by the patterns used in our approach, which we describe in the next two sections.
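A lexicon lookup of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sets hold just the excerpts listed above (not the full lexicons), and the function name `lexicon_tags` is hypothetical rather than part of the described system.

```python
# Lexicon excerpts copied from the text above; the full lexicons are larger.
INFLAMM = {
    "abscess", "appendicitis", "appendicolith", "change", "changes",
    "inflammation", "mass", "perforation", "phlegmon", "rupture",
    "stranding", "thickening",
}
INFLAMM_VERB_ADJ = {
    "abnormal", "change", "dilated", "distended", "enlarged", "inflamed",
    "perforated", "phlegmonous", "ruptured", "thickened",
}
ANAT = {
    "appendiceal", "appendicitis", "appendicolith", "appendix", "cecal",
    "cecum", "mesentery", "periappendiceal", "pericecal", "periluminal",
    "quadrant",
}

LEXICONS = {
    "INFLAMM": INFLAMM,
    "INFLAMM_VERB_ADJ": INFLAMM_VERB_ADJ,
    "ANAT": ANAT,
}


def lexicon_tags(token):
    """Return the names of all lexicons containing the lower-cased token."""
    word = token.lower()
    return {name for name, terms in LEXICONS.items() if word in terms}


if __name__ == "__main__":
    for tok in ["Thickening", "of", "the", "cecum"]:
        print(tok, sorted(lexicon_tags(tok)))
```

Note that a term may belong to more than one lexicon; in the excerpts, appendicitis and appendicolith appear in both INFLAMM and ANAT, so the lookup returns a set rather than a single label.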

Identification of assertion status for mentions of appendicitis

Often a radiology report will directly mention appendicitis in either an affirmative finding (There are some strandy changes suggesting appendicitis) or a negative finding (No evidence for appendicitis). When a radiology report mentions appendicitis, the default belief status is positive for appendicitis. Based on the context, though, that belief status can be changed to negative. This is known to be a difficult problem, as negation markers such as no and not need to be scoped to the correct term [5, 6]. We solve these problems by capturing the local syntactic context for a mention of appendicitis which is relevant to its assertion status. The context for a mention of appendicitis is composed of several components, including statements of belief and observational statements. Statements of belief are detected using phrases such as the following:

Statements of belief: consistent with, concerning for, suggestive of, suspicious for, worrisome for, evidence of, possibility of, suspicion of, probability of

These phrases are matched using dependency patterns such as [consistent with appendicitis] and [compatible with appendicitis]. These patterns allow the flexibility to match statements such as worrisome for early appendicitis. In addition to detecting belief statements, it is also necessary to detect observational cues such as evidence of appendicitis and Examination indicates appendicitis. Additional lexicons contain terms describing observation through the use of nouns and verbs. The observation lexicons are:

Nominal observation lexicon [OBSERV_NOUN]: finding, findings, examination, examinations, evidence, impression, impressions, appearance, appearances, pattern, patterns, diagnosis, signs, sign, etiology

Verbal observation lexicon [OBSERV_VERB]: seen, noted, identified, visualized, detected, found, observed

Indication lexicon: indicate, indicating, indicates, represent, representing, represents, suggest, suggesting, suggests, reflect, reflecting, reflects, shows, show, demonstrates, demonstrate, presents, present

These expressions are combined in multiple ways to form dependency patterns, as shown in Table 1. In addition to the patterns for positive assertions, we have a small handful of negative patterns as well, including [excludes appendicitis] and [excluded appendicitis] (e.g., appendicitis is excluded). Matches for negative patterns are also checked for negation, and the assertion is inverted if they are negated. For example, appendicitis is not excluded would be considered a positive assertion.
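The interaction between pattern polarity and negation can be illustrated with a small Python sketch. It is a simplification under stated assumptions: dependency edges are hand-written `(head, label, dependent)` triples rather than parser output, the marker list contains only the two markers named in the text (no and not), and the helper names `is_negated` and `assertion_status` are hypothetical.

```python
# Negation markers named in the text; a real system may use more.
NEGATION_MARKERS = {"no", "not"}


def is_negated(matched_words, edges):
    """True if a negation marker directly depends on any matched word.

    edges is a list of (head, label, dependent) triples; here they are
    written by hand, standing in for a dependency parser's output.
    """
    matched = {w.lower() for w in matched_words}
    return any(
        head.lower() in matched and dep.lower() in NEGATION_MARKERS
        for head, _label, dep in edges
    )


def assertion_status(pattern_is_negative, negated):
    """Start from the default (positive) and flip once per inverting cue."""
    positive = True
    if pattern_is_negative:  # e.g. [excluded appendicitis]
        positive = not positive
    if negated:              # e.g. "not", "no" scoped to the match
        positive = not positive
    return "positive" if positive else "negative"


if __name__ == "__main__":
    # "appendicitis is not excluded": negative pattern, negated.
    edges = [
        ("excluded", "nsubjpass", "appendicitis"),
        ("excluded", "auxpass", "is"),
        ("excluded", "neg", "not"),
    ]
    neg = is_negated(["excluded", "appendicitis"], edges)
    print(assertion_status(pattern_is_negative=True, negated=neg))
```

Restricting the negation check to dependents of the matched words is what scopes the marker to the right term: a no attached to an unrelated noun elsewhere in the sentence does not flip the assertion.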

Table 1:

The most frequently matched patterns. Words in ALL CAPS refer to lexicons. Literals are indicated in italics. <ROOT> refers to the root of the dependency parse tree. Words without any restrictions are denoted by [].

Pattern	Examples

INFLAMM →prep S-PREP →pobj ANAT	Inflammatory changes in the right lower quadrant
ANAT →amod INFLAMM_VERB_ADJ	The appendix is mildly distended; patient’s diseased appendix
INFLAMM →amod ANAT	There is no periappendiceal inflammation
INFLAMM_VERB_ADJ →nsubjpass ANAT	The previously dilated appendix is no longer dilated
INFLAMM_VERB_ADJ →prep S-PREP →pobj ANAT	Thickening in the region of the cecum
<ROOT> → appendicitis	This may be perforated appendicitis.
INFLAMM_VERB_ADJ →nsubj ANAT	Acute appendicitis, potentially ruptured.
ANAT →prep with →pobj INFLAMM	Dilated appendix with adjacent stranding
OBSERV_NOUN →prep of →pobj appendicitis	there is no evidence of appendicitis
INFLAMM →prep S-PREP →pobj [] →prep S-PREP →pobj ANAT	thickening of the tip of the cecum
right lower quadrant ←nn INFLAMM	right lower quadrant mass
OBSERV_NOUN →nsubj consistent →prep with →pobj appendicitis	Findings compatible with acute, nonperforated appendicitis.
INFLAMM ←nsubjpass OBSERV_VERB →prep S-PREP →pobj ANAT	An appendicolith is seen at the proximal appendix
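To make the pattern notation concrete, the following Python sketch matches the first pattern in Table 1 (INFLAMM →prep S-PREP →pobj ANAT) against a dependency parse. It is a minimal illustration, not the system's matcher: the parse is written by hand instead of coming from a parser, the lexicons are abbreviated excerpts, and `Token`, `children`, and `match_chain` are hypothetical names.

```python
from collections import namedtuple

# One token of a dependency parse: its index, surface word, index of its
# head (-1 for the root), and the label of the relation to that head.
Token = namedtuple("Token", "idx word head label")

# Abbreviated lexicon excerpts taken from the text.
INFLAMM = {"changes", "inflammation", "stranding", "thickening", "mass"}
S_PREP = {"around", "about", "of", "to", "near", "at", "within", "into",
          "in", "throughout", "along"}
ANAT = {"appendix", "cecum", "quadrant", "mesentery"}


def children(tokens, head_idx, label):
    """Tokens attached to head_idx by a relation with the given label."""
    return [t for t in tokens if t.head == head_idx and t.label == label]


def match_chain(tokens, chain):
    """Match a head-to-dependent chain such as INFLAMM ->prep S-PREP ->pobj ANAT.

    chain is a list of (lexicon, label) pairs; the label of the first pair
    is ignored, and each later label names the relation from the previous
    node down to the current one. Returns tuples of matched words.
    """
    first_lexicon, _ = chain[0]
    paths = [[t] for t in tokens if t.word.lower() in first_lexicon]
    for lexicon, label in chain[1:]:
        paths = [
            path + [child]
            for path in paths
            for child in children(tokens, path[-1].idx, label)
            if child.word.lower() in lexicon
        ]
    return [tuple(t.word for t in path) for path in paths]


# Hand-built parse of "Inflammatory changes in the right lower quadrant".
PARSE = [
    Token(0, "Inflammatory", 1, "amod"),
    Token(1, "changes", -1, "root"),
    Token(2, "in", 1, "prep"),
    Token(3, "the", 6, "det"),
    Token(4, "right", 6, "amod"),
    Token(5, "lower", 6, "amod"),
    Token(6, "quadrant", 2, "pobj"),
]

PATTERN = [(INFLAMM, None), (S_PREP, "prep"), (ANAT, "pobj")]

if __name__ == "__main__":
    print(match_chain(PARSE, PATTERN))
```

Because the match follows dependency edges rather than word order, intervening modifiers such as the, right, and lower do not prevent quadrant from being found as the object of the preposition.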

Spatially locating inflammation

Although an explicit mention of appendicitis can be a strong indicator that the patient has the disease, many reports do not directly state such a finding because it is implied to an expert through other findings, e.g., An inflamed, perforated appendix is seen. Therefore some of the patterns detect mentions of inflammation and the specific anatomy associated with each mention. If the inflammation is associated with an anatomical structure near the appendix, we consider that as evidence toward a finding of appendicitis. The first set of patterns for spatially locating inflammation relies on a prepositional relation between a mention of inflammation and an anatomical term, such as inflammation around the appendix or appendix with inflammation. The set of prepositions considered for spatial relations is:

Spatial prepositions [S-PREP]: around, about, of, to, near, at, within, into, in, throughout, along

Sometimes the radiologist will specify that inflammation has occurred within a specific component of an anatomical structure, such as a wall or a tip, as in inflamed appendiceal tip. A lexicon exists for detecting components of anatomical structures so that spatial relation detection can be performed between inflammation and a component, which is then associated with a structure such as appendix or colon.

Anatomical component lexicon: wall, walls, tip, base, junction, valve, lumen, segment, segments, region, area, aspect, location, fat, orifice, opening, entrance, mucosa, apex, mass, masses, plane, planes

A component is associated with its larger structure through the patterns [component] of [structure], [structure-adjective] [component], and junction with [structure]. These patterns allow for matching examples such as cecal tip thickening or stranding near the junction with the appendix.
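The three component patterns can be approximated with a short Python sketch. This is a deliberate simplification: it works on the surface token sequence, whereas the approach described here uses dependency patterns. The structure list, the adjective-to-structure mapping, and the function name `resolve_component` are assumptions introduced for illustration.

```python
# Component lexicon copied from the text.
COMPONENTS = {
    "wall", "walls", "tip", "base", "junction", "valve", "lumen", "segment",
    "segments", "region", "area", "aspect", "location", "fat", "orifice",
    "opening", "entrance", "mucosa", "apex", "mass", "masses", "plane",
    "planes",
}
# Assumed for illustration: a few structures and their adjectival forms.
STRUCTURES = {"appendix", "cecum", "colon"}
ADJ_TO_STRUCTURE = {"appendiceal": "appendix", "cecal": "cecum"}


def resolve_component(tokens, i):
    """Return the structure owning the component at tokens[i], or None."""
    words = [t.lower() for t in tokens]
    if words[i] not in COMPONENTS:
        return None
    # Pattern: [structure-adjective] [component], e.g. "cecal tip".
    if i > 0 and words[i - 1] in ADJ_TO_STRUCTURE:
        return ADJ_TO_STRUCTURE[words[i - 1]]
    # Patterns: [component] of [structure] and junction with [structure].
    linkers = {"of", "with"} if words[i] == "junction" else {"of"}
    if i + 1 < len(words) and words[i + 1] in linkers:
        # Allow one intervening word such as a determiner.
        for word in words[i + 2:i + 4]:
            if word in STRUCTURES:
                return word
    return None


if __name__ == "__main__":
    print(resolve_component("cecal tip thickening".split(), 1))
    print(resolve_component(
        "stranding near the junction with the appendix".split(), 3))
```

Once a component is resolved to its structure, a spatial relation found between inflammation and the component can be treated as a relation between inflammation and that structure.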

Classification

The indicators for appendicitis we detect in radiology reports have varying degrees of effectiveness for predicting whether the patient has appendicitis. For example, if we detect Patient has acute appendicitis, we are much more confident in an assessment of appendicitis than if we only saw cecum is inflamed. However, the latter statement does provide some evidence towards a decision of appendicitis. Also, the more evidence we see for a decision of appendicitis, the more confidence we can give to our decision. We use an SVM classifier to decide whether a report indicates appendicitis. The features for the classifier are all based on the indicators we find in the report. The simplest type of feature captures individual words that have been detected in the report along with their semantic role (e.g., anatomy=appendix, inflammation=thickening). We also have features which capture the combination of anatomy and inflammation together (e.g., (inflammation, cecum)). The final class of features captures all roles matched in a pattern along with their content (e.g., (observation=evidence, inflammation=inflamed, anatomy=appendix)). All of the features also capture whether the indicator was negated or not.
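The three feature classes can be sketched as follows in Python. The sketch makes several assumptions that the text does not settle: the feature string formats, the `NEG:` prefix used to record negation, the use of counts rather than binary values, and the names `indicator_features` and `report_features` are all hypothetical.

```python
from collections import Counter


def indicator_features(indicator):
    """Build feature strings for one matched indicator.

    indicator is a dict with "roles" (semantic role -> matched word) and
    "negated" (bool). The string formats are illustrative only.
    """
    roles = indicator["roles"]
    neg = "NEG:" if indicator["negated"] else ""
    # Class 1: individual words with their semantic role.
    feats = [f"{neg}{role}={word}" for role, word in sorted(roles.items())]
    # Class 2: inflammation and anatomy combined.
    if "inflammation" in roles and "anatomy" in roles:
        feats.append(
            f"{neg}pair=({roles['inflammation']},{roles['anatomy']})")
    # Class 3: all roles matched by the pattern, with their content.
    joined = ",".join(f"{r}={w}" for r, w in sorted(roles.items()))
    feats.append(f"{neg}all=({joined})")
    return feats


def report_features(indicators):
    """Aggregate the features of every indicator found in one report."""
    counts = Counter()
    for indicator in indicators:
        counts.update(indicator_features(indicator))
    return dict(counts)


if __name__ == "__main__":
    found = [
        {"roles": {"anatomy": "appendix", "inflammation": "thickening"},
         "negated": False},
        {"roles": {"anatomy": "appendix", "inflammation": "inflammation"},
         "negated": True},
    ]
    for name, value in sorted(report_features(found).items()):
        print(name, value)
```

A feature dictionary of this shape could be vectorized and passed to any SVM implementation; which implementation and kernel were used is not specified here, so the sketch stops at feature extraction.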

Automatic Pattern Discovery

Depending upon a set of manually crafted rules presents two primary limitations: (1) radiologists express their findings using a diverse set of lexical and syntactic expressions, and (2) manually created patterns are generally written assuming correct automatic recognition of linguistic structure, whereas syntactic parsers make many mistakes on clinical text. To overcome these limitations, we propose a method for automatically generating syntactic dependency patterns to increase the recall of actionable finding extraction. Our method leverages our manually created patterns in combination with dependency transformation operators to discover new patterns. Two types of transformation operators are applied to the manually created patterns to automatically create candidate patterns. The first operator replaces lexicon and literal restrictions on a single token with a wildcard to match all possible words. For example, given the pattern “[OBSERV_NOUN] of appendicitis”, it would create three patterns: (1) “* of appendicitis”, (2) “[OBSERV_NOUN] * appendicitis”, and (3) “[OBSERV_NOUN] of *”. The second operator replaces each dependency relation with a wildcard. For the above pattern, whose relations are prep and pobj, the expansions would be: (4) “[OBSERV_NOUN] →* of →pobj appendicitis” and (5) “[OBSERV_NOUN] →prep of →* appendicitis”.

Next, we search our dependency-parsed corpus of radiology reports for matches to these patterns. For each match, we observe the word or dependency relation which corresponds to the wildcard (e.g., the wildcard for pattern (1) above might match the word “indication”). We refer to a specific word or dependency relation matching a wildcard as a grounding. All grounded candidate patterns which occur fewer than three times in the corpus are discarded, while the remainder are kept for scoring. Wildcard matching and occurrence counting for grounded patterns were implemented by representing the dependency parses for sentences using RDF [7] and storing them in a Sesame (openrdf.org) native RDF store. Sentences matching candidate patterns were then retrieved by issuing the appropriate database queries (using SPARQL [8]).

Each candidate pattern is assigned a score using Fisher’s exact test, comparing the documents matched by a pattern with the documents marked as positive for appendicitis by the manually created patterns. Candidate patterns which match many of the reports marked as positive and few of the reports marked as negative are assigned the highest scores. The top-ranking patterns are then examined by a human to prune out patterns that would not be expected to generalize well to a new set of reports. Using this technique we were able to add 39 patterns positive for appendicitis and 3 patterns negative for appendicitis.
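The candidate generation and scoring steps can be sketched in Python. Several details here are assumptions rather than facts from the text: a pattern is represented as a list of token restrictions plus a list of dependency relations, the Fisher test is computed as a one-sided (right-tail) p-value with lower values ranking higher, and all function names are hypothetical. The RDF store and SPARQL retrieval are not reproduced.

```python
from collections import Counter
from math import comb


def expand_tokens(tokens, relations):
    """First operator: replace one token restriction at a time with '*'."""
    return [
        (tokens[:i] + ["*"] + tokens[i + 1:], list(relations))
        for i in range(len(tokens))
    ]


def expand_relations(tokens, relations):
    """Second operator: replace one dependency relation at a time with '*'."""
    return [
        (list(tokens), relations[:i] + ["*"] + relations[i + 1:])
        for i in range(len(relations))
    ]


def frequent_groundings(groundings, min_count=3):
    """Keep groundings observed at least min_count times in the corpus."""
    counts = Counter(groundings)
    return {g: c for g, c in counts.items() if c >= min_count}


def fisher_right_tail(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    a: matched by the candidate and marked positive
    b: matched by the candidate and marked negative
    c: not matched, marked positive
    d: not matched, marked negative
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    upper = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k)
        for k in range(a, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    tokens = ["[OBSERV_NOUN]", "of", "appendicitis"]
    relations = ["prep", "pobj"]
    for candidate in expand_tokens(tokens, relations):
        print(candidate)
    for candidate in expand_relations(tokens, relations):
        print(candidate)
    # A made-up contingency table, for illustration only.
    print(fisher_right_tail(8, 1, 2, 9))
```

A candidate that matches mostly reports already marked positive yields a small p-value and so ranks near the top of the list handed to the human reviewer.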

Results

Our test set consists of 400 radiology reports from UT Southwestern Medical Center. The authors ST and TB annotated the reports as positive or negative for appendicitis. Using our lexicons for anatomical terms near the appendix and inflammation terms, we created four groups of 100 reports each: (1) reports containing at least one term from both lexicons, (2) reports containing an inflammation term, but not an anatomical term, (3) reports containing an anatomical term, but not an inflammation term, and (4) reports containing neither inflammation nor anatomical terms. This resulted in a more balanced training set, since many of the initial reports were not relevant to appendicitis.

Evaluation

We evaluated our approach using the F1 measure. The F1 measure is a metric for evaluating both the precision and recall of a classification method. The formula is $F_1 = \frac{2 \cdot P \cdot R}{P + R}$, where $P$ is precision and $R$ is recall. Table 2 shows the results of our evaluation. The evaluation was performed using 5-fold cross validation. The testing set has 87 positive reports and 313 reports negative for indicating appendicitis. Therefore an approach which marked all reports as non-indicative of appendicitis would achieve an accuracy of 78.3%, while our method achieved an accuracy of 93.8%, a significantly better result. We also evaluated an approach which only marked documents as indicating appendicitis if the word appendicitis appeared in the report. The results show that 74.71% of reports indicative of appendicitis mention the word appendicitis. The F1 measure of such a system is a full nine points lower than our approach. The patterns added by the automatic pattern discovery method contributed to a gain in F1 of almost 1.6 points.

Table 2:

Evaluation for the detection of indications of appendicitis in radiology reports.

Method	Accuracy (%)	Precision (%)	Recall (%)	F1 (%)

Always “Non-indicative”	78.3
“Appendicitis” only	90.5	78.31	74.71	76.47
Manual patterns only	93.3	83.52	87.36	85.39
Manual and automatically generated patterns	94.3	91.14	82.76	86.75
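As a quick check of how the F1 column relates to precision and recall, the harmonic-mean formula can be applied to the values in Table 2. The Python sketch below uses only the published precision and recall figures and the 87/313 class split; the function names are hypothetical. Because the inputs are already rounded to two decimals, the recomputed values agree with the F1 column only to within about 0.01.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (same units in and out)."""
    return 2 * precision * recall / (precision + recall)


def majority_baseline_accuracy(n_positive, n_negative):
    """Accuracy of always predicting the larger class."""
    return max(n_positive, n_negative) / (n_positive + n_negative)


# (precision, recall) in percent, copied from Table 2.
TABLE_2 = {
    "“Appendicitis” only": (78.31, 74.71),
    "Manual patterns only": (83.52, 87.36),
    "Manual and automatically generated patterns": (91.14, 82.76),
}

if __name__ == "__main__":
    for method, (p, r) in TABLE_2.items():
        print(f"{method}: F1 = {f1(p, r):.2f}")
    baseline = majority_baseline_accuracy(87, 313)
    print(f"Always non-indicative: accuracy = {baseline:.4f}")
```

The baseline line shows why accuracy alone is a weak measure on this test set: with 313 of 400 reports negative, a classifier that never flags appendicitis is already correct more than three quarters of the time.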

Error Analysis

While our approach performs well, it still makes some errors in judging reports for indications of appendicitis. Given the many components comprising the approach, it is informative to analyze why errors are made. One source of errors arises from the lack of section identification. Reports which mention evaluate for appendicitis in the history section, but which later indicate a normal appendix, are marked as indicative of appendicitis. Several more rounds of interaction with the doctors would be necessary to fine-tune our lexicons and patterns. For example, contrast is somewhat erroneously included as a term of inflammation. There are two types of contrast often given for a CT scan: oral contrast, which defines the lumen of the stomach and intestines, and IV contrast, which opacifies the blood. If there is “contrast enhancement” or just “enhancement” of a given area, it is a sign that there is increased blood flow to that region, possibly caused by an infection or inflammatory process. The term contrast is often used in contexts such as: The appendix is filled with oral contrast and there is no evidence of small or large bowel obstruction. In this context contrast should be understood to mean that oral contrast was able to freely enter and fill the appendix, and hence there is nothing blocking flow into the appendix.

Conclusion

In this paper we described an approach for identifying radiology reports which indicate appendicitis. The approach is based on recognizing direct assertions of appendicitis and indirect evidence such as indications of inflammation near the appendix. These indications are detected through the use of patterns which capture the syntactic dependency structure of the text. Various linguistic phenomena such as negation and conjunctions are handled. Each report is automatically categorized as indicative or non-indicative of appendicitis by combining all relevant statements found within the report. Direct indications of appendicitis are given more influence in the decision, and indications which are general or specific to anatomical structures other than the appendix are given less influence. Our evaluation shows that our approach can identify reports consistent with appendicitis with a precision of 84%, a recall of 87%, and an F1 score of 85%. Error analysis has revealed promising directions for relatively easy improvement, including section identification and continued input from the doctors. The automatic identification of actionable findings in radiology reports can lead to improvements in patient outcomes and quality of care. One application could be an interpretive software layer which could alert the referring clinician about radiology reports containing language associated with a given disease process. A second application would be enabling the radiologist to ensure that a given outcome resulted from the language they used.

5 in total

1. SNOMED clinical terms: overview of the development process and project status.

Authors: M Q Stearns; C Price; K A Spackman; A Y Wang
Journal: Proc AMIA Symp Date: 2001

2. A reference ontology for biomedical informatics: the Foundational Model of Anatomy.

Authors: Cornelius Rosse; José L V Mejino
Journal: J Biomed Inform Date: 2003-12

3. Communication gaffes: a root cause of malpractice claims.

Authors: Beth Huntington; Nettie Kuhn
Journal: Proc (Bayl Univ Med Cent) Date: 2003-04

4. A novel hybrid approach to automated negation detection in clinical radiology reports.

Authors: Yang Huang; Henry J Lowe
Journal: J Am Med Inform Assoc Date: 2007-02-28

5. The ACR guideline on communication: to be or not to be, that is the question.

Authors: Leonard L Lucey; David C Kushner
Journal: J Am Coll Radiol Date: 2010
