| Literature DB >> 35242769 |
Hossein Aboutalebi1,2, Maya Pavlova3, Hayden Gunraj3, Mohammad Javad Shafiee1,2,3, Ali Sabri4, Amer Alaref5,6, Alexander Wong1,2,3.
Abstract
Medical image analysis continues to hold interesting challenges given the subtle characteristics of certain diseases and the significant overlap in appearance between diseases. In this study, we explore the concept of self-attention for tackling such subtleties in and between diseases. To this end, we introduce, a multi-scale encoder-decoder self-attention (MEDUSA) mechanism tailored for medical image analysis. While self-attention deep convolutional neural network architectures in existing literature center around the notion of multiple isolated lightweight attention mechanisms with limited individual capacities being incorporated at different points in the network architecture, MEDUSA takes a significant departure from this notion by possessing a single, unified self-attention mechanism with significantly higher capacity with multiple attention heads feeding into different scales in the network architecture. To the best of the authors' knowledge, this is the first "single body, multi-scale heads" realization of self-attention and enables explicit global context among selective attention at different levels of representational abstractions while still enabling differing local attention context at individual levels of abstractions. With MEDUSA, we obtain state-of-the-art performance on multiple challenging medical image analysis benchmarks including COVIDx, Radiological Society of North America (RSNA) RICORD, and RSNA Pneumonia Challenge when compared to previous work. Our MEDUSA model is publicly available.Entities:
Keywords: COVID-19; chest X-ray (CXR); computer vision; deep neural net; diagnosis
Year: 2022 PMID: 35242769 PMCID: PMC8886730 DOI: 10.3389/fmed.2021.821120
Source DB: PubMed Journal: Front Med (Lausanne) ISSN: 2296-858X
Figure 1Architecture of the proposed multi-scale encoder-decoder self-attention (MEDUSA) and how it can be incorporated to a deep neural network. The global component of MEDUSA is fed by the input data which its output is connected to different scales through the network via the multi-scale specific module. The scale box here refers to the bilinear interpolation operation on the output of function based on the width and height of the feature map of the corresponding convolutional block. Here, we have only drew three convolutional blocks but the network can have an arbitrary number of convolutional blocks.
Figure 2Example chest X-ray images from the benchmark dataset.
Sensitivity, positive predictive value (PPV), and accuracy of the proposed network (MEDUSA) on the test data from the CXR benchmark dataset in comparison to other networks.
|
|
|
|
|
|---|---|---|---|
| ResNet-50 ( | 88.50 | 92.20 | 90.50 |
| COVID-Net ( | 93.50 |
| 94.00 |
| COVID-Net CXR-2 ( | 95.50 | 97.00 | 96.30 |
| SE-ResNet-50 ( | 90.50 | 98.90 | 94.75 |
| CBAM ( | 70.00 |
| 85.00 |
| MEDUSA |
| 99.00 |
|
Best results are highlighted in bold.
Confusion matrix of the proposed network (MEDUSA).
|
|
|
|
|---|---|---|
| Negative | 198 | 2 |
| Positive | 5 | 195 |
Figure 3Example chest X-ray images (A–D) from the COVIDx CXR-2 benchmark dataset overlaid by the global attention output from the proposed MEDUSA network. The red regions indicate higher attention while blue regions indicate lower global attention.
Figure 4Comparison between attention-enforced outputs at different convolutional blocks when using CBAM, SE Block, and MEDUSA self-attention mechanisms. Each row demonstrates the results of the attention mechanisms on a different layer of the ResNet-50 network architecture.
Ablation study. Sensitivity, positive predictive value (PPV), and accuracy from the ablation study for the proposed network (MEDUSA) in comparison to other networks.
|
|
|
|
|
|---|---|---|---|
| ResNet-50 ( | 88.50 | 92.20 | 90.50 |
| Seg Type 1 +ResNet-50 ( | 92.00 | 95.83 | 94.00 |
| Seg Type 2 +ResNet-50 ( | 94.50 | 96.43 | 95.50 |
| MEDUSA (Self-attention is disabled at test time) |
| 88.70 | 93.00 |
| MEDUSA | 97.50 |
|
|
Best results are highlighted in bold.
Figure 5Example chest X-ray images (labeled A–E) with different types of significant distortions and visual anomalies from the Radiological Society of North America (RSNA) Pneumonia Challenge dataset.
Sensitivity, positive predictive value (PPV), and accuracy of the proposed network (multi-scale encoder-decoder self-attention, MEDUSA) on the test data from the RSNA Pneumonia dataset in comparison to other networks.
|
|
|
|
|
|---|---|---|---|
| SE-ResNet-50 ( | 40.0 | 90.9 | 68.0 |
| CheXNet ( | 50.0 |
| 73.0 |
| CBAM ( | 68.0 | 77.3 | 74.0 |
| MEDUSA |
| 83.7 |
|
Best results are highlighted in bold.
Figure 6Impact of MEDUSA global attention on the images with distortion. The images (A–C) are corresponding to images (A–C) in Figure 5.
Figure 7Impact of MEDUSA local attention on the images with distortion. Columns 1, 2, and 3 correspond to images (A–C) in Figure 5, respectively.
Figure 8Example chest X-ray images from the RSNA RICORD COVID-19 Severity dataset.
Sensitivity, positive predictive value (PPV), and accuracy of the proposed network (MEDUSA) on the test data from the RSNA RICORD COVID-19 severity dataset in comparison to other networks.
|
|
|
|
|
|---|---|---|---|
| SE-ResNet-50 ( |
| 77.4 | 76.7 |
| CheXNet ( | 63.46 | 84.62 | 83.33 |
| CBAM ( | 84.0 | 73.0 | 70.0 |
| MEDUSA | 85.0 |
|
|
Best results are highlighted in .