Wilson Lau, Kevin Lybarger, Martin L Gunn, Meliha Yetisgen.
Abstract
Radiology reports contain a diverse and rich set of clinical abnormalities documented by radiologists during their interpretation of the images. Comprehensive semantic representations of radiological findings would enable a wide range of secondary use applications to support diagnosis, triage, outcomes prediction, and clinical research. In this paper, we present a new corpus of radiology reports annotated with clinical findings. Our annotation schema captures detailed representations of pathologic findings that are observable on imaging ("lesions") and other types of clinical problems ("medical problems"). The schema used an event-based representation to capture fine-grained details, including assertion, anatomy, characteristics, size, and count. Our gold standard corpus contained a total of 500 annotated computed tomography (CT) reports. We extracted triggers and argument entities using two state-of-the-art deep learning architectures, including BERT. We then predicted the linkages between trigger and argument entities (referred to as argument roles) using a BERT-based relation extraction model. We achieved the best extraction performance using a BERT model pre-trained on 3 million radiology reports from our institution: 90.9-93.4% F1 for finding triggers and 72.0-85.6% F1 for argument roles. To assess model generalizability, we used an external validation set randomly sampled from the MIMIC Chest X-ray (MIMIC-CXR) database. The extraction performance on this validation set was 95.6% for finding triggers and 79.1-89.7% for argument roles, demonstrating that the model generalized well to the cross-institutional data with a different imaging modality. We extracted the finding events from all the radiology reports in the MIMIC-CXR database and provided the extractions to the research community.
Keywords: Deep learning; Event extraction; Information extraction; Natural language processing
Year: 2022 PMID: 36253581 PMCID: PMC9576130 DOI: 10.1007/s10278-022-00717-5
Source DB: PubMed Journal: J Digit Imaging ISSN: 0897-1889 Impact factor: 4.903
Annotation schema of Lesion finding and Medical Problem finding
| Event | Argument | Argument type | Argument subtypes | Span examples |
|---|---|---|---|---|
| Lesion finding | Lesion description (trigger) | Span-only | - | “mass,” “lesion,” “nodule” |
| Lesion finding | Anatomy | Span-only | - | “left lower lobe” |
| Lesion finding | Assertion | Span-with-value | Present ( | “no,” “possible” |
| Lesion finding | Characteristics | Span-only | - | “hypodense,” “septal” |
| Lesion finding | Count | Span-only | - | “2,” “numerous,” “multiple” |
| Lesion finding | Size | Span-only | - | “4.1 × 3.1 cm,” “small” |
| Lesion finding | Size trend | Span-with-value | New, increasing, decreasing, no-change | “stable,” “unchanged” |
| Medical Problem finding | Medical problem (trigger) | Span-only | - | “atherosclerotic calcifications” |
| Medical Problem finding | Anatomy | Span-only | - | “abdominal aorta,” “right kidney” |
| Medical Problem finding | Assertion | Span-with-value | Present ( | “no,” “possible” |
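The event-based schema above can be pictured as a small data structure: one trigger span per event, plus a list of arguments that are either span-only or span-with-value. The following is a minimal sketch under stated assumptions, not the authors' annotation format: the class names (`Span`, `Argument`, `FindingEvent`), the helper `make_span`, the role strings, and the example sentence are all hypothetical, and mapping "Stable" to the `no-change` subtype is an illustrative guess.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    text: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


@dataclass
class Argument:
    role: str                    # e.g. "Anatomy", "Size", "Size-Trend"
    span: Span
    value: Optional[str] = None  # set only for span-with-value arguments


@dataclass
class FindingEvent:
    event_type: str              # "Lesion" or "Medical-Problem"
    trigger: Span
    arguments: List[Argument] = field(default_factory=list)

    def roles(self, role: str) -> List[Argument]:
        """Return all arguments attached to this event with the given role."""
        return [a for a in self.arguments if a.role == role]


def make_span(report: str, phrase: str) -> Span:
    """Hypothetical helper: locate the first occurrence of phrase in report."""
    start = report.index(phrase)
    return Span(phrase, start, start + len(phrase))


# Made-up sentence assembled from the span examples in the schema table.
report = "Stable 4.1 x 3.1 cm hypodense lesion in the right kidney."

event = FindingEvent(
    event_type="Lesion",
    trigger=make_span(report, "lesion"),
    arguments=[
        Argument("Size-Trend", make_span(report, "Stable"), value="no-change"),
        Argument("Size", make_span(report, "4.1 x 3.1 cm")),
        Argument("Characteristics", make_span(report, "hypodense")),
        Argument("Anatomy", make_span(report, "right kidney")),
    ],
)
```

Keeping character offsets alongside the text lets each span be checked against the report, and the optional `value` field is what separates span-with-value arguments from span-only ones.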
Fig. 1 Example annotations for Lesion and Medical Problem events
Fig. 2 Two Medical Problem finding event annotations with equivalent triggers
Fig. 3 Two Lesion finding event annotations with partially matched span-only arguments
Fig. 4 Two Lesion finding event annotations with the same value for Lesion-Size-Trend
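The captions of Figs. 2 to 4 suggest that two annotations can be compared by span overlap for span-only arguments and by label value for span-with-value arguments. The exact matching criteria are not stated in this section, so the sketch below is only one plausible reading. The function names, the dictionary layout, and the offsets are hypothetical.

```python
def spans_overlap(a, b):
    """True when two (start, end) character spans share at least one character.

    Spans are half-open: start is inclusive, end is exclusive.
    """
    return max(a[0], b[0]) < min(a[1], b[1])


def span_only_match(x, y):
    """Assumed criterion: same role and partially overlapping spans."""
    return x["role"] == y["role"] and spans_overlap(x["span"], y["span"])


def span_with_value_match(x, y):
    """Assumed criterion: same role and same assigned value; spans are ignored."""
    return x["role"] == y["role"] and x["value"] == y["value"]


# Made-up annotations: "left lower lobe" versus "lower lobe" in the same report.
first = {"role": "Lesion-Anatomy", "span": (30, 45)}
second = {"role": "Lesion-Anatomy", "span": (35, 45)}
partial = span_only_match(first, second)

# Made-up annotations: "stable" versus "unchanged", both labelled no-change.
trend_a = {"role": "Lesion-Size-Trend", "span": (0, 6), "value": "no-change"}
trend_b = {"role": "Lesion-Size-Trend", "span": (50, 59), "value": "no-change"}
same_value = span_with_value_match(trend_a, trend_b)
```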
Event annotation statistics
| Entity | Number of entities | Argument role | Number of argument roles |
|---|---|---|---|
| Lesion-Description | 2344 | - | |
| Lesion-Anatomy | 2039 | Lesion-Anatomy | 2187 |
| Lesion-Assertion | 945 | Lesion-Assertion | 1008 |
| Lesion-Characteristic | 1931 | Lesion-Characteristic | 1968 |
| Lesion-Count | 235 | Lesion-Count | 237 |
| Lesion-Size | 816 | Lesion-Size (Past) | 94 |
| | | Lesion-Size (Present) | 736 |
| Lesion-Size-Trend | 371 | Lesion-Size-Trend | 387 |
| Medical-Problem | 8065 | - | |
| Medical-Anatomy | 2990 | Medical-Anatomy | 3952 |
| Medical-Assertion | 2793 | Medical-Assertion | 3454 |
Gold standard corpus statistics
| Statistic | Min | Mean | Median | Max |
|---|---|---|---|---|
| Number of words per report | 50 | 327 | 288 | 1383 |
| Number of events per report | 2 | 21 | 18 | 130 |
| Number of Medical Problem events per report | 0 | 16 | 13 | 129 |
| Number of Lesion events per report | 0 | 5 | 3 | 36 |
| Number of arguments per Medical Problem event | 0 | 1 | 1 | 5 |
| Number of arguments per Lesion event | 0 | 3 | 3 | 16 |
Fig. 5 Architecture of the NeuroNER BiLSTM-CRF model
Fig. 6 Architecture of the BERT NER model
Fig. 7 Architecture of the BERT RE model
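Entity extraction results (average precision, recall, and F1 in %), based on 10 runs of fivefold cross-validation. The numbers in brackets are 95% confidence intervals of the averages. The best F1 values are in bold
| Entity | P | R | F1 | P | R | F1 | P | R | F1 | P | R | F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|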
| Medical-Problem | 88.8 | 84.9 | 86.7 (± 0.45) | 89.1 | 83.9 | 86.4 (± 0.37) | 90.5 | 83.6 | 86.8 (± 0.37) | 91.3 | 85.0 | |
| Medical-Anatomy | 79.1 | 79.9 | 79.3 (± 0.92) | 82.3 | 77.9 | 79.9 (± 0.87) | 83.8 | 77.3 | 80.3 (± 0.84) | 85.7 | 78.5 | |
| Medical-Assertion | 85.6 | 84.5 | 84.9 (± 0.79) | 86.9 | 85.7 | 86.3 (± 0.70) | 87.8 | 84.7 | 86.1 (± 0.63) | 88.5 | 86.3 | |
| Lesion-Description | 87.2 | 87.9 | 87.5 (± 0.71) | 89.1 | 86.8 | 87.9 (± 0.66) | 89.0 | 87.6 | 88.2 (± 0.62) | 90.0 | 88.4 | |
| Lesion-Anatomy | 80.2 | 78.6 | 79.0 (± 0.92) | 85.5 | 76.5 | 80.6 (± 0.94) | 85.8 | 76.8 | 80.8 (± 0.89) | 86.8 | 80.7 | |
| Lesion-Assertion | 81.3 | 72.1 | 76.2 (± 1.55) | 86.0 | 70.0 | 76.8 (± 1.60) | 85.6 | 70.5 | 77.1 (± 1.48) | 86.5 | 73.6 | |
| Lesion-Characteristic | 76.6 | 72.6 | 74.1 (± 1.36) | 81.8 | 70.5 | 75.4 (± 1.22) | 82.8 | 71.3 | 76.3 (± 1.11) | 84.2 | 73.6 | |
| Lesion-Size | 84.1 | 85.8 | 84.4 (± 1.88) | 91.1 | 84.2 | 87.3 (± 1.37) | 89.1 | 84.4 | 86.4 (± 1.56) | 90.7 | 88.2 | |
| Lesion-Count | 89.1 | 85.6 | 86.7 (± 2.20) | 90.9 | 86.6 | 88.0 (± 2.15) | 92.0 | 88.0 | (± 2.07) | 91.0 | 87.5 | 88.7 (± 2.16) |
| Lesion-Size-Trend | 69.0 | 63.2 | 65.5 (± 3.20) | 78.0 | 60.7 | 67.6 (± 3.14) | 75.2 | 59.5 | 65.5 (± 2.98) | 77.3 | 63.6 | |
| Overall | 84.2 | 82.1 | 83.1 (± 0.37) | 86.7 | 80.9 | 83.7 (± 0.36) | 87.7 | 80.6 | 84.0 (± 0.28) | 88.8 | 82.4 | |
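The table reports precision, recall, and F1 averaged over repeated cross-validation runs, with a 95% confidence interval on each average. As a rough illustration of how such figures could be produced, the sketch below computes the three metrics from match counts and then a normal-approximation interval (mean ± 1.96 standard errors). The interval method is an assumption, since this section does not say how the intervals were calculated, and the function names and per-run scores are hypothetical.

```python
import math
import statistics


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true positive, false positive,
    and false negative counts. Returns 0.0 where a ratio is undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def mean_with_ci(values, z=1.96):
    """Mean and half-width of a normal-approximation 95% interval
    (assumed method; needs at least two values)."""
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, half_width


# Made-up F1 scores (in %) standing in for individual cross-validation runs.
run_f1 = [86.1, 87.0, 86.5, 86.9, 86.3]
mean_f1, half_width = mean_with_ci(run_f1)
print(f"{mean_f1:.1f} (± {half_width:.2f})")
```

With few runs, a t-distribution multiplier would give a slightly wider interval than the fixed 1.96 used here.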
End-to-end argument role extraction results (average precision, recall, and F1 in %), based on 10 runs of fivefold cross-validation. The numbers in brackets are 95% confidence intervals of the averages. The best F1 values are in bold
| Argument type | Argument role | P | R | F1 | P | R | F1 | P | R | F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| Span-only | Medical-Anatomy | 78.4 | 67.1 | 72.1 (± 1.12) | 80.0 | 66.6 | 72.5 (± 1.02) | 81.4 | 68.3 | |
| Span-with-value | Medical-Assertion | 86.8 | 82.3 | 84.5 (± 0.54) | 87.5 | 81.7 | 84.4 (± 0.43) | 88.6 | 83.0 | |
| Span-only | Lesion-Anatomy | 83.6 | 67.7 | 74.7 (± 1.15) | 84.2 | 68.1 | 75.1 (± 0.98) | 84.7 | 71.3 | |
| Span-only | Lesion-Characteristic | 80.4 | 65.2 | 71.6 (± 1.32) | 81.5 | 66.0 | 72.6 (± 1.21) | 82.6 | 67.9 | |
| Span-only | Lesion-Count | 87.0 | 81.6 | 83.4 (± 2.11) | 89.8 | 83.6 | | 88.1 | 83.3 | 85.1 (± 2.09) |
| Span-only | Lesion-Size | 85.1 | 59.9 | 69.9 (± 2.56) | 85.5 | 60.6 | 70.5 (± 2.10) | 86.4 | 62.5 | |
| Span-with-value | Lesion-Assertion | 85.4 | 79.7 | 82.4 (± 0.69) | 84.9 | 80.0 | 82.3 (± 0.76) | 86.1 | 81.2 | |
| Span-with-value | Lesion-Size-Trend | 82.1 | 71.4 | 76.0 (± 1.94) | 80.3 | 70.4 | 74.4 (± 2.21) | 81.9 | 74.1 | |
Overall extraction performance for each argument type (average precision, recall, and F1 in %)
| Argument type | P | R | F1 | P | R | F1 | P | R | F1 |
|---|---|---|---|---|---|---|---|---|---|
| Trigger | 90.9 | 92.1 | 91.5 (± 0.24) | 91.5 | 92.2 | 91.8 (± 0.26) | 92.6 | 93.2 | |
| Span-only | 79.8 | 67.1 | 72.8 (± 0.71) | 81.1 | 67.0 | 73.3 (± 0.66) | 82.3 | 69.0 | |
| Span-with-value | 86.3 | 81.2 | 83.6 (± 0.46) | 86.3 | 76.3 | 83.5 (± 0.41) | 87.6 | 82.1 | |
Fig. 8 Examples of long text spans being extracted into multiple entities