Ricardo Bigolin Lanfredi, Mingyuan Zhang, William F Auffermann, Jessica Chan, Phuong-Anh T Duong, Vivek Srikumar, Trafton Drew, Joyce D Schroeder, Tolga Tasdizen.
Abstract
Deep learning has shown recent success in classifying anomalies in chest x-rays, but datasets are still small compared to natural image datasets. Supervision of abnormality localization has been shown to improve trained models, partially compensating for dataset sizes. However, explicitly labeling these anomalies requires an expert and is very time-consuming. We propose a potentially scalable method for collecting implicit localization data using an eye tracker to capture gaze locations and a microphone to capture a dictation of a report, imitating the setup of a reading room. The resulting REFLACX (Reports and Eye-Tracking Data for Localization of Abnormalities in Chest X-rays) dataset was labeled across five radiologists and contains 3,032 synchronized sets of eye-tracking data and timestamped report transcriptions for 2,616 chest x-rays from the MIMIC-CXR dataset. We also provide auxiliary annotations, including bounding boxes around lungs and heart and validation labels consisting of ellipses localizing abnormalities and image-level labels. Furthermore, a small subset of the data contains readings from all radiologists, allowing for the calculation of inter-rater scores.
Year: 2022 PMID: 35717401 PMCID: PMC9206650 DOI: 10.1038/s41597-022-01441-z
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 8.501
Fig. 1 Overview of the steps in the building of the dataset.
Inter-rater scores for validation of the quality of the data.
| Label | FK P1 | FK P2 | IoU P1 | IoU P2 |
|---|---|---|---|---|
| Airway wall thickening (P1) | 0.03 ± 0.20 | | 0.28 (n = 4) | |
| Atelectasis (P1, P2) | 0.34 ± 0.05 | 0.25 ± 0.08 | 0.29 ± 0.03 (n = 33) | 0.37 ± 0.04 (n = 19) |
| Consolidation (P1, P2) | 0.36 ± 0.07 | 0.50 ± 0.07 | 0.36 ± 0.04 (n = 25) | 0.37 ± 0.04 (n = 17) |
| Emphysema (P1) & High lung volume/emphysema (P2) | 0.26 ± 0.32 | 0.10 ± 0.34 | 0.43 (n = 3) | 0.53 (n = 2) |
| Enlarged cardiac silhouette (P1, P2) | 0.56 ± 0.07 | 0.55 ± 0.08 | 0.75 ± 0.02 (n = 20) | 0.75 ± 0.02 (n = 20) |
| Enlarged hilum (P2) | | −0.03 ± 0.36 | | Undefined |
| Fibrosis (P1) & Interstitial lung disease (P2) | 0.29 ± 0.44 | 0.16 ± 0.57 | 0.53 (n = 1) | 0.23 (n = 1) |
| Fracture (P1) & Acute fracture (P2) | 0.25 ± 0.25 | 0.71 ± 0.36 | 0.42 (n = 4) | 0.21 (n = 1) |
| Groundglass opacity (P1, P2) | 0.10 ± 0.17 | 0.21 ± 0.11 | 0.42 (n = 6) | 0.38 ± 0.05 (n = 16) |
| Hiatal hernia (P2) | | Undefined | | Undefined |
| Mass (P1) | −0.01 ± 0.70 | | Undefined | |
| Nodule (P1) | 0.29 ± 0.25 | | 0.38 (n = 2) | |
| Lung nodule or mass (P2) | | 0.36 ± 0.49 | | 0.83 (n = 1) |
| Pleural abnormality (P2) | | 0.67 ± 0.07 | | 0.27 ± 0.02 (n = 18) |
| Pleural effusion (P1) | 0.53 ± 0.06 | | 0.37 ± 0.03 (n = 22) | |
| Pleural thickening (P1) | 0.06 ± 0.40 | | 0.29 (n = 1) | |
| Pneumothorax (P1, P2) | 0.55 ± 0.25 | 0.62 ± 0.28 | 0.41 (n = 3) | 0.27 (n = 3) |
| Pulmonary edema (P1, P2) | 0.22 ± 0.13 | 0.10 ± 0.14 | 0.25 ± 0.03 (n = 11) | 0.36 ± 0.06 (n = 11) |
| Quality issue (P1) | 0.02 ± 0.30 | | | |
| Support devices (P1, P2) | 0.77 ± 0.05 | 0.47 ± 0.06 | | |
| Wide mediastinum (P1) & Abnormal mediastinal contour (P2) | 0.23 ± 0.34 | 0.05 ± 0.25 | 0.57 (n = 2) | 0.58 (n = 3) |
For phases 1 (P1) and 2 (P2), we present reliability on image-level labels, calculated using Fleiss’ Kappa (FK), and average IoU of the abnormality ellipses. All scores are paired with standard errors. The number of samples given for the IoU values represents the number of independent CXRs used in the calculation. The phases in which each label was present are listed in parentheses. Table cells are left blank for labels that were not present in a specific phase.
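The two agreement measures named in this note can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the count-matrix layout, and the representation of an ellipse by its axis-aligned bounding box `(x_min, y_min, x_max, y_max)` are all hypothetical choices made for the example.

```python
from typing import Sequence, Set, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa. counts[i][j] = number of raters who put item i in category j.

    Every item must be rated by the same number of raters (at least two).
    Returns NaN when all ratings fall in a single category (zero denominator).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Share of all assignments that went to each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    # Observed agreement for each item.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    if p_e == 1.0:
        return float("nan")
    return (p_bar - p_e) / (1.0 - p_e)


def ellipse_pixels(box: Box, width: int, height: int) -> Set[Tuple[int, int]]:
    """Pixels whose centers lie inside the axis-aligned ellipse inscribed in box."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    rx, ry = (x1 - x0) / 2, (y1 - y0) / 2
    if rx <= 0 or ry <= 0:
        return set()                    # degenerate ellipse covers no pixels
    return {(x, y)
            for y in range(height) for x in range(width)
            if ((x + 0.5 - cx) / rx) ** 2 + ((y + 0.5 - cy) / ry) ** 2 <= 1.0}


def iou(a: Set[Tuple[int, int]], b: Set[Tuple[int, int]]) -> float:
    """Intersection over union of two pixel sets; NaN if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else float("nan")
```

For a binary image-level label read by five radiologists, each row of `counts` would be `[n_absent, n_present]`. When no rater ever selects the label, chance agreement equals 1 and the sketch returns NaN instead of a score.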
Statistics of each phase of data collection and the subset of the MIMIC-CXR dataset from which images were sampled.
| Dataset | Phase 1 (P1) | Phase 2 (P2) | Phase 3 (P3) | MIMIC-CXR filtered (M) |
|---|---|---|---|---|
| # cases | 295 | 250 | 2,507 | 194,495 |
| # cases with eye-tracking data | 285 | 240 | 2,507 | 0 |
| # MIMIC-CXR images | 59 | 50 | 2,507 | |
| # subjects | 58 | 50 | 2,110 | 60,018 |
| % female | 63.8 | 54.0 | 50.7 | 53.9 |
| % male | 36.2 | 46.0 | 49.1 | 45.7 |
| % test set | 15.3 | 14.0 | 20.2 | 1.4 |
| % Normal Radiograph (P1, P2, P3) & No Finding (M) | 18.0 | 24.4 | 22.8 | 32.9 |
| % Abnormal mediastinal contour (P2, P3) & Wide mediastinum (P1) | 2.7 | 5.6 | 2.7 | |
| % Acute fracture (P2, P3) & Fracture (P1, M) | 5.1 | 2.8 | 1.0 | 1.9 |
| % Airway wall thickening (P1) | 7.1 | | | |
| % Atelectasis (P1, P2, P3, M) | 41.4 | 27.6 | 25.8 | 20.5 |
| % Cardiomegaly (M) | | | | 19.8 |
| % Consolidation (P1, P2, P3, M) | 28.5 | 28.8 | 25.9 | 4.7 |
| % Enlarged cardiac silhouette (P1, P2, P3) | 28.1 | 28.4 | 21.8 | |
| % Enlarged Cardiomediastinum (M) | | | | 3.2 |
| % Enlarged hilum (P2, P3) | | 2.8 | 1.9 | |
| % Groundglass opacity (P1, P2, P3) | 9.2 | 18.8 | 12.6 | |
| % Hiatal hernia (P2, P3) | | 0.0 | 0.9 | |
| % High lung volume/emphysema (P2, P3) & Emphysema (P1) | 3.1 | 3.2 | 2.9 | |
| % Interstitial lung disease (P2, P3) & Fibrosis (P1) | 1.7 | 1.2 | 1.0 | |
| % Lung nodule or mass (P2, P3) & Lung Lesion (M) | | 1.6 | 5.1 | 2.7 |
| % Lung Opacity (M) | | | | 22.8 |
| % Mass (P1) | 0.7 | | | |
| % Nodule (P1) | 4.7 | | | |
| % Other (P1, P2, P3) | 13.9 | 8.8 | 6.0 | |
| % Pleural abnormality (P2, P3) | | 30.0 | 29.5 | |
| % Pleural Effusion (P1, M) | 31.2 | | | 24.2 |
| % Pleural thickening (P1) | 2.0 | | | |
| % Pleural Other (M) | | | | 0.9 |
| % Pneumonia (M) | | | | 7.2 |
| % Pneumothorax (P1, P2, P3, M) | 4.7 | 4.4 | 2.9 | 4.6 |
| % Pulmonary edema (P1, P2, P3) & Edema (M) | 13.9 | 13.6 | 13.7 | 12.1 |
| % Quality issue (P1) | 3.4 | | | |
| % Support devices (P1, P2, P3, M) | 36.9 | 34.8 | 44.8 | 29.3 |
The datasets in which each label was present are shown in parentheses. “Normal radiograph” represents CXRs for which no other label was selected. Table cells are left blank for labels that were not present in that dataset. See Fig. 5 for how the labels of the different datasets are related.
Fig. 5 Hierarchy of the labels of all the phases of our dataset and the labels of the MIMIC-CXR dataset. Arrows point to a subset of the originating label. The datasets to which each label belongs are listed inside parentheses, according to P1 (Phase 1), P2 (Phase 2), P3 (Phase 3), and M (MIMIC-CXR). Labels that do not have a hierarchical relationship with other labels are not connected to any arrows.
Fig. 2 Screens of the data-collection interface in the sequence they are presented to a radiologist, including instruction screens (a,c,e,h,j,n), calibration of pupil size (b), dictation of reports (d), choice of global labels (f,g), selection of ellipses and certainties (i), drawing of lung/heart box (k), and editing of transcription (l,m). Digital visualization is recommended for reading the content.
Fig. 3 Example of the localization information provided by the eye-tracking data and how it was validated. (a) CXR read by the radiologist. (b) Union of the abnormality ellipses selected by radiologists used to compare against heatmaps. (c) Heatmap generated by the fixations made by the radiologist while dictating the report. (d) Average heatmap for all radiologists and CXRs read in phases 1 and 2, normalized to the location of lung and heart of the CXR.
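One common way to turn a list of fixations into a heatmap like the one in panel (c) is to add a Gaussian at each fixation location, weighted by how long the fixation lasted. The sketch below shows that idea in plain Python. It is an assumption-laden illustration, not the procedure documented for this dataset: the `(x, y, duration)` tuple layout, the duration weighting, the `sigma` parameter, and the peak normalization are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Fixation = Tuple[float, float, float]  # assumed layout: (x, y, duration in seconds)


def fixation_heatmap(fixations: Sequence[Fixation], width: int, height: int,
                     sigma: float) -> List[List[float]]:
    """Sum one duration-weighted Gaussian per fixation, then scale the peak to 1.

    Returns a height x width grid indexed as heat[y][x]. An empty fixation
    list yields an all-zero grid.
    """
    heat = [[0.0] * width for _ in range(height)]
    two_sigma_sq = 2.0 * sigma * sigma
    for fx, fy, duration in fixations:
        for y in range(height):
            dy_sq = (y - fy) ** 2
            for x in range(width):
                dist_sq = (x - fx) ** 2 + dy_sq
                heat[y][x] += duration * math.exp(-dist_sq / two_sigma_sq)
    peak = max(max(row) for row in heat)
    if peak > 0:
        heat = [[value / peak for value in row] for row in heat]
    return heat
```

The nested loops keep the example dependency-free but scale poorly with image size. On full-resolution CXRs an array library would be the practical choice.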
Fig. 4 Time analysis of the correlation between each mention of a label and what percentage of fixations were located inside the ellipses that localized each respective label. We present two lines, one as a function of time and another as a function of the counting of sentences and pauses before the mention. The step lines represent the percentage for separate data bins. We also draw the 95% confidence interval for each bin in each line, calculated with bootstrapping. The number of fixations used to calculate each bin is shown in separate lines.
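The per-bin quantity plotted here, the share of fixations that fall inside a label's ellipse together with a bootstrapped 95% interval, can be sketched as follows. This is a minimal illustration under assumptions, not the authors' analysis code: the helper names, the bounding-box representation of an ellipse, and the percentile bootstrap over individual fixations are hypothetical choices. The published analysis may resample at a different level.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def inside_ellipse(x: float, y: float, box: Box) -> bool:
    """True if (x, y) lies in the axis-aligned ellipse inscribed in box."""
    x0, y0, x1, y1 = box
    rx, ry = (x1 - x0) / 2, (y1 - y0) / 2
    if rx <= 0 or ry <= 0:
        return False
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    return ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1.0


def inside_flags(points: Sequence[Tuple[float, float]], box: Box) -> List[bool]:
    """One flag per fixation: does it land inside the ellipse?"""
    return [inside_ellipse(x, y, box) for x, y in points]


def bootstrap_ci(flags: Sequence[bool], n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the mean of a list of boolean flags."""
    if not flags:
        raise ValueError("need at least one fixation")
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    n = len(flags)
    means = sorted(sum(rng.choices(flags, k=n)) / n for _ in range(n_resamples))
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper
```

For one bin, `flags = inside_flags(points, box)` gives the point estimate `sum(flags) / len(flags)`, and `bootstrap_ci(flags)` gives the interval drawn around it.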
| Measurement(s) | gaze locations • radiology report • abnormality localizations • chest localization (lung and heart) • abnormality presence |
| Technology Type(s) | eye tracking device • Microphone Device • User Interface Device |
| Sample Characteristic - Organism | Homo sapiens |