Surabhi Datta, Kirk Roberts.
Abstract
In this paper, we present a dataset consisting of 2000 chest X-ray reports (available as part of the Open-i image search platform) annotated with spatial information. The annotation is based on Spatial Role Labeling. The annotations cover a radiographic finding, its associated anatomical location, any potential diagnosis described in connection with the spatial relation (between finding and location), and any hedging phrase used to describe the certainty level of a finding or diagnosis. All these annotations are identified with reference to a spatial expression (or Spatial Indicator) that triggers a spatial relation in a sentence. The spatial roles used to encode the spatial information are Trajector, Landmark, Diagnosis, and Hedge. In total, there are 1962 Spatial Indicators (mainly prepositions). There are 2293 Trajectors, 2167 Landmarks, 455 Diagnoses, and 388 Hedges in the dataset. This annotated dataset can be used for developing automatic approaches to spatial information extraction from radiology reports, which can then be applied to numerous clinical applications. We utilize this dataset to develop deep learning-based methods for automatically extracting the Spatial Indicators as well as the associated spatial roles [1].
Keywords: Chest radiology; Information extraction; Natural language processing; Radiology report; Spatial Role Labeling; Spatial relations
Year: 2020 PMID: 32904141 PMCID: PMC7451761 DOI: 10.1016/j.dib.2020.106056
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Annotated dataset descriptions.
| Document | Represents a chest X-ray report |
| Text | Raw text of the report |
| Annotations | Contains the processed text and spatial annotations for a report |
| Token | Contains start character and number of characters of a token |
| Sentence | Contains start token number and number of included tokens to identify a sentence |
| RadSpRLRelation | Indicates the presence of a spatial relation. Includes the start token number and number of tokens of a spatial expression (Spatial Indicator) |
| Trajector | Radiological entity (usually a radiographic finding) whose position is described |
| Landmark | Anatomical location of a Trajector |
| Diagnosis | Potential diagnosis associated with a spatial relation |
| Hedge | Any uncertainty phrase used to describe a finding or diagnosis |
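The table above describes an offset-based encoding: tokens point into the raw text by start character and length, and sentences and spatial expressions point into the token list by start token number and token count. The following is a minimal sketch of how such offsets could be resolved back into text. The class and function names (`Token`, `token_text`, `span_text`, `whitespace_tokens`), the 0-based token numbering, the whitespace tokenization, and the example sentence are all assumptions made for illustration; they are not taken from the dataset's actual schema or contents.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Hypothetical token record: start character and number of characters."""
    start: int
    length: int


def token_text(text: str, token: Token) -> str:
    """Return the surface string of a single token."""
    return text[token.start:token.start + token.length]


def span_text(text: str, tokens: List[Token], start_token: int, num_tokens: int) -> str:
    """Return the text covered by `num_tokens` tokens beginning at `start_token`.

    Assumes 0-based token numbering; the dataset may number tokens differently.
    """
    first = tokens[start_token]
    last = tokens[start_token + num_tokens - 1]
    return text[first.start:last.start + last.length]


def whitespace_tokens(text: str) -> List[Token]:
    """Toy tokenizer for this demo only: one token per run of non-space characters."""
    return [Token(m.start(), m.end() - m.start()) for m in re.finditer(r"\S+", text)]


# Made-up sentence, not drawn from the dataset.
report = "Streaky opacity in the right lung base ."
tokens = whitespace_tokens(report)

indicator = span_text(report, tokens, 2, 1)   # spatial expression
trajector = span_text(report, tokens, 0, 2)   # finding being located
landmark = span_text(report, tokens, 4, 3)    # anatomical location

print(indicator, "|", trajector, "|", landmark)
```

Storing spans as (start token, token count) pairs keeps the annotations independent of the surface text, so the same helper can recover a sentence, a Spatial Indicator, or any spatial role.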
Spatial indicator statistics.
| Total number of Spatial Indicators | 1962 |
| Number of distinct Spatial Indicators | 29 |
| 765 | |
| 526 | |
| 176 | |
| 141 | |
| 102 | |
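As a quick check on how concentrated the distribution is, the counts above can be summed and compared with the total. This sketch assumes that the five counts listed in the table are the frequencies of the five most frequent Spatial Indicator terms (the term labels did not survive in this copy of the table); the variable names are illustrative.

```python
# Figures taken from the table above.
total_indicators = 1962
distinct_indicators = 29
top_counts = [765, 526, 176, 141, 102]  # assumed: five most frequent terms

covered = sum(top_counts)                                 # instances covered by the listed terms
remaining = total_indicators - covered                    # instances spread over the other terms
remaining_terms = distinct_indicators - len(top_counts)   # number of other distinct terms
share = covered / total_indicators

print(f"{covered} of {total_indicators} instances ({share:.1%}) come from {len(top_counts)} terms")
print(f"{remaining} instances are spread over the other {remaining_terms} distinct terms")
```

Under that assumption, the five listed terms account for 1710 of the 1962 instances (about 87%), leaving 252 instances for the remaining 24 distinct terms.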
Most frequent terms for each spatial role.
| 279 (nodular, streaky, interstitial, focal airspace, focal, airspace, vague, patchy, bibasilar, ill-defined, mild streaky, subtle increased, few small nodular, round, scattered, rounded nodular, abnormal, vague nodular, patchy airspace, bilateral, bandlike, vague patchy, dense, minimal, minimal streaky, streaky basilar, alveolar) | ||
| 205 (mild, minimal, diffuse, moderate, severe, multilevel, chronic, advanced) | ||
| 63 (moderate right-sided, large) | ||
| 63 (large, small bilateral, large right) | ||
| 57 (focal, focal airspace, dense) | ||
| Total Distinct | 861 | |
| 285 (also includes lungs) | ||
| 146 (mid, lower) | ||
| 111 | ||
| 43 | ||
| 40 | ||
| Total Distinct | 570 | |
| 53 (pleural, pleural-parenchymal, chronic) | ||
| 83 (subsegmental, focal, chronic subsegmental, foci of subsegmental, lingular) | ||
| 21 (focal) | ||
| 15 (calcified, partially calcified) | ||
| 11 | ||
| Total Distinct | 224 | |
| 40 | ||
| 39 | ||
| 38 (focal) | ||
| 34 (also includes XXXX represents, XXXX representing, XXXX representative of) | ||
| 21 | ||
| Total Distinct | 80 | |
Fig. 1. Most common associated landmarks and trajectors for three frequent Diagnoses. [n LM] indicates that a particular diagnosis is connected to a total of n landmarks, while [n TR] indicates that a particular diagnosis is connected to a total of n trajectors.
Fig. 3. Most common associated diagnoses and trajectors for three frequent Landmarks. [n DG] indicates that a particular landmark is connected to a total of n diagnoses, while [n TR] indicates that a particular landmark is connected to a total of n trajectors.
Overlapping terms between two spatial roles.
| Distinct overlapping terms ( | 45 |
| Distinct overlapping terms ( | 73 |
| Same terms with equal frequency ( | 38 |
| 31 | |
| 5 | |
| 4 | |
Analysis of Hedge terms.
| Frequent | |
| Frequent | |
Fig. 2. Most common associated diagnoses and landmarks for three frequent Trajectors. [n DG] indicates that a particular trajector is connected to a total of n diagnoses, while [n LM] indicates that a particular trajector is connected to a total of n landmarks.
| Subject | |
| Specific subject area | |
| Type of data | |
| How data were acquired | |
| Data format | |
| Parameters for data collection | |
| Description of data collection | |
| Data source location | |
| Data accessibility | Repository name: Mendeley data repository |
| Related research article |