Ivo M Baltruschat, Hannes Nickisch, Michael Grass, Tobias Knopp, Axel Saalbach.
Abstract
The increased availability of labeled X-ray image archives (e.g. the ChestX-ray14 dataset) has triggered a growing interest in deep learning techniques. To provide better insight into the different approaches and their applications to chest X-ray classification, we investigate a powerful network architecture in detail: the ResNet-50. Building on prior work in this domain, we consider transfer learning with and without fine-tuning as well as the training of a dedicated X-ray network from scratch. To leverage the high spatial resolution of X-ray data, we also include an extended ResNet-50 architecture and a network integrating non-image data (patient age, gender and acquisition type) in the classification process. In a concluding experiment, we also investigate multiple ResNet depths (i.e. ResNet-38 and ResNet-101). In a systematic evaluation, using 5-fold re-sampling and a multi-label loss function, we compare the performance of the different approaches for pathology classification by ROC statistics and analyze differences between the classifiers using rank correlation. Overall, we observe a considerable spread in the achieved performance and conclude that the X-ray-specific ResNet-38 integrating non-image data yields the best overall results. Furthermore, class activation maps are used to understand the classification process, and a detailed analysis of the impact of non-image features is provided.
Year: 2019 PMID: 31011155 PMCID: PMC6476887 DOI: 10.1038/s41598-019-42294-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Four examples from the ChestX-ray14 dataset. ChestX-ray14 consists of 112,120 frontal chest X-rays from 30,805 patients. All images are labeled with up to 14 pathologies or “No Finding”. The dataset includes not only acute findings, such as the pneumothorax in panel (c), but also treated patients with a drain, who are likewise labeled “pneumothorax” (d).
Architecture of the original, off-the-shelf, and fine-tuned ResNet-50.
| Layer name | Output size | Original 50-layer | Off-the-shelf | Fine-tuned |
|---|---|---|---|---|
| conv1 | 112 × 112 | 7 × 7, 64-d, stride 2 | same | |
| pooling1 | 56 × 56 | 3 × 3, 64-d, max pool, stride 2 | same | same |
| conv2_x | 56 × 56 | | same | |
| conv3_0 | 28 × 28 | | same | |
| conv3_x | 28 × 28 | | same | |
| conv4_0 | 14 × 14 | | same | |
| conv4_x | 14 × 14 | | same | |
| conv5_0 | 7 × 7 | | same | |
| conv5_x | 7 × 7 | | same | |
| pooling2 | 1 × 1 | 7 × 7, 2048-d, average pool, stride 1 | same | same |
| dense | 1 × 1 | 1000-d, dense-layer | | |
| loss | 1 × 1 | 1000-d, softmax | | |
In our experiments, we use the ResNet-50 architecture. This table shows the differences between the original architecture and ours (off-the-shelf and fine-tuned ResNet-50). Where a layer does not differ from the original network, the table reads “same”. Violet, bold text marks the parts of the network that are changed for our application. All layers employ automatic padding (i.e. depending on the kernel size) to preserve the spatial size. The conv3_0, conv4_0, and conv5_0 layers down-sample the spatial size with a stride of 2.
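The output sizes listed in the table follow from repeated halving by the stride-2 layers. The following is a minimal sketch of that arithmetic, assuming a 224 × 224 input (implied by the 112 × 112 output of conv1, but not stated in the table) and padding such that a layer with stride s maps a side length n to ceil(n / s). The names `STAGES` and `spatial_sizes` are illustrative only.

```python
import math

# Stride of each stage, in table order (stride-1 stages keep the spatial size).
STAGES = [
    ("conv1", 2),
    ("pooling1", 2),
    ("conv2_x", 1),
    ("conv3_0", 2),
    ("conv3_x", 1),
    ("conv4_0", 2),
    ("conv4_x", 1),
    ("conv5_0", 2),
    ("conv5_x", 1),
]

def spatial_sizes(input_size, stages=STAGES):
    """Return {layer name: output side length} under automatic padding."""
    sizes = {}
    size = input_size
    for name, stride in stages:
        size = math.ceil(size / stride)
        sizes[name] = size
    return sizes

sizes = spatial_sizes(224)  # 224 is an assumed input side length
```

Under the same assumptions, `spatial_sizes(448)["conv5_x"]` evaluates to 14, so a doubled input side length leaves a larger feature map before the final 7 × 7 average pooling unless an additional down-sampling step is inserted.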
Figure 2. Patient-data adapted model architecture: ResNet-50-large-meta. Our architecture is based on the ResNet-50 model. Because of the enlarged input size, we added a max-pooling layer after the first three ResBlocks. In addition, we fused image features and patient features at the end of our model to incorporate patient information.
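One simple way to fuse image and patient features at the end of a network is to concatenate both vectors and feed them to a final dense layer with one sigmoid output per label. The sketch below illustrates that idea only; the caption does not specify the fusion operator, so concatenation, the feature encoding in `encode_patient`, and all names and toy values are assumptions.

```python
import math

def encode_patient(age_years, gender, view_position):
    """Hypothetical encoding: age scaled to [0, 1], binary gender and view flags."""
    return [
        min(age_years, 100) / 100.0,
        1.0 if gender == "F" else 0.0,
        1.0 if view_position == "PA" else 0.0,
    ]

def fuse_and_classify(image_features, patient_features, weights, biases):
    """Concatenate both vectors, then apply one dense layer and a sigmoid per label."""
    fused = list(image_features) + list(patient_features)
    probs = []
    for w_row, b in zip(weights, biases):
        logit = sum(w * x for w, x in zip(w_row, fused)) + b
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs

# Toy stand-in for the pooled image features (the real vector is much longer).
image_features = [0.2, 0.0, 1.3, 0.7]
patient = encode_patient(47, "F", "PA")
weights = [[0.0] * 7, [0.1] * 7]  # two labels, 4 image + 3 patient inputs
biases = [0.0, 0.0]
probs = fuse_and_classify(image_features, patient, weights, biases)
```

Independent sigmoid outputs match the multi-label setting, where one image can carry several pathology labels at once.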
Overview of label distributions in the ChestX-ray14 dataset.
(a) Diseases

| Pathology | True | False | Prevalence [%] |
|---|---|---|---|
| Cardiomegaly | 2,776 | 109,344 | 2.48 |
| Emphysema | 2,516 | 109,604 | 2.24 |
| Edema | 2,303 | 109,817 | 2.05 |
| Hernia | 227 | 111,893 | 0.20 |
| Pneumothorax | 5,302 | 106,818 | 4.73 |
| Effusion | 13,317 | 98,803 | 11.88 |
| Mass | 5,782 | 106,338 | 5.16 |
| Fibrosis | 1,686 | 110,434 | 1.50 |
| Atelectasis | 11,559 | 100,561 | 10.31 |
| Consolidation | 4,667 | 107,453 | 4.16 |
| Pleural Thicken. | 3,385 | 108,735 | 3.02 |
| Nodule | 6,331 | 105,789 | 5.65 |
| Pneumonia | 1,431 | 110,689 | 1.28 |
| Infiltration | 19,894 | 92,226 | 17.74 |
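
The prevalence column can be reproduced from the counts: in every row, True and False add up to the 112,120 images of the dataset, and the prevalence is the share of True labels in percent. A short sketch of that calculation for three rows of the table (the function name is illustrative):

```python
TOTAL_IMAGES = 112_120  # True + False in every row of the table

def prevalence_percent(positives, total=TOTAL_IMAGES):
    """Share of positively labeled images in percent, rounded to two decimals."""
    return round(100.0 * positives / total, 2)

counts = {"Cardiomegaly": 2_776, "Hernia": 227, "Infiltration": 19_894}
prevalence = {name: prevalence_percent(n) for name, n in counts.items()}
```

The spread from about 0.2% (Hernia) to about 17.7% (Infiltration) shows how strongly the classes are imbalanced.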
Figure 3. Distribution of patient age in the ChestX-ray14 dataset. Each bin covers a width of two years. The average patient age is 46.87 years with a standard deviation of 16.60 years.
Overview of label distributions in the ChestX-ray14 dataset.
(b) Meta-information

| Attribute | Categories | Counts | Ratio |
|---|---|---|---|
| Patient Gender | Female / Male | 63,340 / 48,780 | 1.30 |
| View Position | PA / AP | 67,310 / 44,810 | 1.50 |
AUC result overview for all our experiments.
| Pathology | Without: OTS | Without: FT | Without: 1channel | Without: large | With: OTS | With: FT | With: 1channel | With: large |
|---|---|---|---|---|---|---|---|---|
|
| 72.7 ± 1.8 | 88.5 ± 0.7 | 88.9 ± 0.5 | 89.7 ± 0.3 | 75.9 ± 1.4 | 88.4 ± 0.8 | 89.8 ± 0.8 | |
|
| 77.8 ± 2.1 | 89.2 ± 1.0 | 87.0 ± 0.8 | 88.3 ± 1.3 | 79.8 ± 1.9 | 87.4 ± 1.3 | 89.1 ± 1.2 | |
|
| 84.4 ± 0.6 | 88.8 ± 0.5 | 85.7 ± 0.5 | 89.0 ± 0.6 | 88.9 ± 0.3 | |||
|
| 78.8 ± 1.4 | 85.5 ± 3.8 | 88.1 ± 4.2 | 87.5 ± 4.5 | 81.9 ± 2.5 | 88.2 ± 3.2 | 89.3 ± 4.4 | |
|
| 77.3 ± 1.3 | 85.7 ± 0.9 | 85.9 ± 0.9 | 79.1 ± 1.2 | 86.5 ± 0.6 | 85.4 ± 0.7 | 85.9 ± 1.1 | |
|
| 79.4 ± 0.4 | 87.1 ± 0.2 | 80.6 ± 0.4 | 87.2 ± 0.3 | 87.3 ± 0.3 | |||
|
| 66.8 ± 0.6 | 82.2 ± 1.0 | 83.3 ± 0.6 | 68.6 ± 0.6 | 82.2 ± 1.0 | 83.3 ± 0.7 | 83.2 ± 0.3 | |
|
| 72.0 ± 0.9 | 79.9 ± 0.8 | 79.2 ± 1.6 | 73.9 ± 0.8 | 79.6 ± 0.5 | 78.9 ± 0.5 | ||
|
| 71.8 ± 0.6 | 79.9 ± 0.4 | 79.2 ± 0.7 | 73.2 ± 0.7 | 80.1 ± 0.6 | 79.3 ± 0.6 | 79.1 ± 0.4 | |
|
| 74.3 ± 0.3 | 79.5 ± 0.5 | 80.0 ± 0.3 | 75.3 ± 0.3 | 79.6 ± 0.5 | 80.4 ± 0.5 | 80.0 ± 0.7 | |
| 68.8 ± 1.0 | 78.4 ± 0.9 | 78.0 ± 1.1 | 70.8 ± 1.1 | 78.6 ± 1.1 | 78.2 ± 1.3 | 77.1 ± 1.3 | ||
|
| 65.0 ± 0.8 | 72.6 ± 0.9 | 73.3 ± 0.8 | 75.1 ± 1.3 | 66.5 ± 0.7 | 74.7 ± 0.6 | 74.0 ± 0.7 | |
|
| 66.4 ± 2.7 | 74.4 ± 1.6 | 74.3 ± 1.5 | 75.3 ± 2.2 | 68.3 ± 2.3 | 73.3 ± 1.3 | 74.8 ± 1.5 | |
|
| 65.9 ± 0.2 | 69.9 ± 0.6 | 67.0 ± 0.4 | 70.1 ± 0.5 | 70.0 ± 0.7 | |||
|
| 73.0 ± 1.1 | 81.7 ± 1.0 | 81.9 ± 0.9 | 82.1 ± 1.2 | 74.8 ± 1.1 | 82.0 ± 0.9 | 82.0 ± 1.0 | |
|
| 71.6 ± 0.3 | 76.9 ± 0.5 | 77.1 ± 0.4 | 72.5 ± 0.3 | 76.8 ± 0.4 | 77.1 ± 0.4 | 77.1 ± 0.3 | |
In this table, we present results averaged over all five splits, together with the standard deviation (std), for each pathology. Our experiments are grouped along three lines. First, models are trained without or with non-image features. Second, transfer learning uses either an off-the-shelf (OTS) or a fine-tuned (FT) model. Third, for models trained from scratch, “1channel” keeps the input size used in transfer learning but changes the number of channels, and “large” changes the input dimensions to 448 × 448 × 1. For better comparison, we present the average AUC and the standard deviation over all pathologies in the last row. Bold text emphasizes the overall highest AUC value. Values are scaled by 100 for convenience.
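The entries of the form "mean ± std" can be illustrated with a small sketch: compute one AUC per split for a pathology, then average over the splits. The code below uses the pairwise definition of the AUC (the fraction of positive/negative pairs that are ranked correctly) and the sample standard deviation. Whether the reported values use the sample or the population standard deviation is not stated, so that choice is an assumption, as are the function names and the toy numbers.

```python
import statistics

def roc_auc(labels, scores):
    """AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def summarize(fold_aucs):
    """Mean and sample standard deviation over splits, scaled by 100."""
    scaled = [100.0 * a for a in fold_aucs]
    return statistics.mean(scaled), statistics.stdev(scaled)

# Toy example: one binary label, four images.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Toy per-split AUCs for a single pathology (made-up numbers).
mean_auc, std_auc = summarize([0.80, 0.82, 0.81, 0.79, 0.83])
```

The pairwise loop is quadratic in the number of samples and is meant for illustration; a rank-based implementation scales better on a dataset of this size.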
Spearman’s rank correlation coefficient is calculated between all model pairs and is averaged over all five splits.
| Group | Model | Without: OTS | Without: FT | Without: 1channel | Without: large | With: OTS | With: FT | With: 1channel | With: large |
|---|---|---|---|---|---|---|---|---|---|
| Without | OTS | — | 0.65 | 0.74 | 0.73 | 0.46 | 0.38 | 0.40 | 0.59 |
| Without | FT | 0.65 | — | 0.81 | 0.80 | 0.38 | 0.42 | 0.43 | 0.64 |
| Without | 1channel | 0.74 | 0.81 | — | 0.93 | 0.41 | 0.43 | 0.47 | 0.71 |
| Without | large | 0.73 | 0.80 | 0.93 | — | 0.40 | 0.43 | 0.47 | 0.71 |
| With | OTS | 0.46 | 0.38 | 0.41 | 0.40 | — | 0.32 | 0.33 | 0.39 |
| With | FT | 0.38 | 0.42 | 0.43 | 0.43 | 0.32 | — | 0.35 | 0.42 |
| With | 1channel | 0.40 | 0.43 | 0.47 | 0.47 | 0.33 | 0.35 | — | 0.45 |
| With | large | 0.59 | 0.64 | 0.71 | 0.71 | 0.39 | 0.42 | 0.45 | — |
Our experiments are grouped along three lines. First, models are trained “Without” or “With” non-image features. Second, transfer learning uses either an off-the-shelf (OTS) or a fine-tuned (FT) model. Third, for models trained from scratch, “1channel” keeps the input size used in transfer learning but changes the number of channels, and “large” changes the input dimensions to 448 × 448 × 1. We identify three clusters: all models under “With”, the models trained from scratch under “Without”, and the “OTS” model.
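Each entry of the matrix is a Spearman rank correlation, i.e. the Pearson correlation of the ranks of two score lists. A minimal sketch, assuming the two inputs are the scores that two models assign to the same images (the caption does not say exactly which quantities are ranked); tied values receive the mean of their rank positions, and all names are illustrative:

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

rho_same_order = spearman([0.1, 0.4, 0.9], [0.2, 0.3, 0.8])
rho_reversed = spearman([1, 2, 3], [3, 2, 1])
```

Because only the ordering matters, two models can agree perfectly in rank correlation even when their raw scores are on different scales.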
Figure 4. Grad-CAM results for two example images. In the first one, we marked the location of the pneumothorax with a yellow box. As shown in the Grad-CAM image next to it, the model's highest activation for the prediction lies within the correct area. The second row shows a negative example where the highest activation, which was responsible for the final prediction “pneumothorax”, is at the drain. This indicates that our trained CNN picked up drains as a main feature for “pneumothorax”. We marked the drain with yellow arrows.
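For orientation, the core arithmetic of Grad-CAM can be sketched in a few lines: each feature-map channel is weighted by the spatial mean of the class-score gradient with respect to that channel, the weighted channels are summed, and negative values are clipped by a ReLU. The sketch assumes that the activations of the chosen convolutional layer and the matching gradients have already been extracted from a trained network; it uses plain nested lists and toy values, and it is not the pipeline used to produce the figure.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM map from [channel][row][col] nested lists of equal shape.

    Returns a [row][col] map: ReLU of the gradient-weighted sum over channels.
    """
    n_rows = len(activations[0])
    n_cols = len(activations[0][0])
    # Channel weight = spatial mean of the class-score gradient for that channel.
    channel_weights = [
        sum(sum(row) for row in grad) / (n_rows * n_cols) for grad in gradients
    ]
    cam = [[0.0] * n_cols for _ in range(n_rows)]
    for weight, act in zip(channel_weights, activations):
        for r in range(n_rows):
            for c in range(n_cols):
                cam[r][c] += weight * act[r][c]
    # ReLU keeps only regions that push the class score up.
    return [[max(0.0, v) for v in row] for row in cam]

# Toy input: two 2 x 2 channels, one with positive and one with negative gradient.
activations = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
cam = grad_cam(activations, gradients)
```

In the toy input, the second channel has a negative weight, so its strong activation in the lower-right corner is suppressed and only the upper-left location remains highlighted.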
Figure 5. Comparison of our best model to other groups. We sort the pathologies by increasing average AUC over all groups. For our model, we report the minimum and maximum over all folds as error bars to illustrate the effect of splitting.
AUC result overview for our experiments on the official split. In this table, we present results for our best-performing architecture at different depths (i.e. ResNet38-large-meta, ResNet50-large-meta, ResNet101-large-meta) and compare them to other groups.
| Pathology | Wang | Yao | Guendel | ResNet-38 “-large-meta” | ResNet-50 “-large-meta” | ResNet-101 “-large-meta” |
|---|---|---|---|---|---|---|
|
| 0.810 | 0.856 |
| 0.875 | 0.877 | 0.865 |
|
| 0.833 | 0.842 |
|
| 0.875 | 0.868 |
|
| 0.805 | 0.806 | 0.835 |
| 0.842 | 0.828 |
|
| 0.872 | 0.775 | 0.896 |
| 0.916 | 0.855 |
|
| 0.799 | 0.805 |
| 0.840 | 0.819 | 0.839 |
|
| 0.759 | 0.806 |
| 0.822 | 0.818 | 0.818 |
|
| 0.693 | 0.777 |
| 0.820 | 0.810 | 0.796 |
|
| 0.786 | 0.743 |
| 0.816 | 0.800 | 0.778 |
|
| 0.700 | 0.733 |
| 0.763 | 0.755 | 0.747 |
|
| 0.703 | 0.711 | 0.745 |
| 0.742 | 0.734 |
| 0.684 | 0.724 | 0.761 |
| 0.742 | 0.739 | |
|
| 0.669 | 0.724 |
| 0.747 | 0.736 | 0.738 |
|
| 0.658 | 0.684 |
| 0.714 | 0.703 | 0.694 |
|
| 0.661 | 0.673 |
| 0.694 | 0.694 | 0.686 |
|
| 0.745 | 0.761 |
| 0.806 | 0.795 | 0.785 |
|
| — | — | — | 0.727 | 0.725 | 0.720 |
Additionally, we provide an average AUC over all pathologies in the last row. Bold text emphasizes the overall highest AUC value.