Kumar T Rajamani, Priya Rani, Hanna Siebert, Rajkumar ElagiriRamalingam, Mattias P Heinrich.
Abstract
Deep learning-based image segmentation models rely strongly on capturing sufficient spatial context without requiring complex models that are hard to train with limited labeled data. For COVID-19 infection segmentation on CT images, training data are currently scarce. Attention models, in particular the most recent self-attention methods, have been shown to help gather contextual information within deep networks and to benefit semantic segmentation tasks. The recent attention-augmented convolution model aims to capture long-range interactions by concatenating self-attention and convolution feature maps. This work proposes a novel attention-augmented convolution U-Net (AA-U-Net) that enables a more accurate spatial aggregation of contextual information by integrating attention-augmented convolution in the bottleneck of an encoder-decoder segmentation architecture. A deep segmentation network (U-Net) with this attention mechanism significantly improves performance on the challenging task of COVID-19 lesion segmentation. The validation experiments show that the performance gain of the attention-augmented U-Net comes from its ability to capture dynamic and precise (wider) attention context. The AA-U-Net achieves Dice scores of 72.3% and 61.4% for ground-glass opacity and consolidation lesions in COVID-19 segmentation, and improves the accuracy by 4.2% points against a baseline U-Net and 3.09% points compared to a baseline U-Net with matched parameters.
Supplementary Information: The online version contains supplementary material available at 10.1007/s11760-022-02302-3.
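The concatenation of self-attention and convolution feature maps described in the abstract can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: it uses a single attention head, omits any position encoding, and the names `conv3x3`, `self_attention`, and `attention_augmented_conv` as well as all tensor sizes are hypothetical.

```python
import numpy as np

def conv3x3(x, w):
    """Naive same-padded 3x3 convolution. x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for dy in range(3):
        for dx in range(3):
            patch = xp[:, dy:dy + h, dx:dx + wd]              # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], patch)
    return out

def self_attention(x, wq, wk, wv):
    """Single-head self-attention over all H*W positions. x: (C_in, H, W)."""
    c_in, h, wd = x.shape
    tokens = x.reshape(c_in, h * wd).T                        # (N, C_in), N = H*W
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    logits = q @ k.T / np.sqrt(q.shape[1])                    # (N, N) pairwise scores
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)             # softmax over positions
    return (weights @ v).T.reshape(-1, h, wd)                 # (d_v, H, W)

def attention_augmented_conv(x, w_conv, wq, wk, wv):
    """Concatenate convolution and self-attention feature maps along channels."""
    return np.concatenate([conv3x3(x, w_conv), self_attention(x, wq, wk, wv)], axis=0)

# Hypothetical sizes: 8 input channels, 12 convolution channels, 4 attention channels.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 6))
w_conv = rng.standard_normal((12, 8, 3, 3)) * 0.1
wq, wk, wv = (rng.standard_normal((8, 4)) for _ in range(3))
out = attention_augmented_conv(x, w_conv, wq, wk, wv)
print(out.shape)  # (16, 6, 6)
```

Every output position of the attention branch can draw on every other position, while the convolution branch only sees a 3x3 neighbourhood. That difference is the long-range context the abstract refers to.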
Keywords: Attention mechanism; Attention-augmented convolution; COVID-19; Consolidation; Ground-glass opacities; Segmentation; U-Net
Year: 2022 PMID: 35910403 PMCID: PMC9311338 DOI: 10.1007/s11760-022-02302-3
Source DB: PubMed Journal: Signal Image Video Process ISSN: 1863-1703 Impact factor: 1.583
Fig. 1 Sample slice from one of the datasets, with the corresponding ground-glass opacity (GGO) lesion marking in the first row and the GGO and consolidation lesion markings in the second row. Dataset from website [22]
Fig. 2 A block diagram of the proposed attention-augmented U-Net (AA-U-Net). The input image is progressively filtered and downsampled by a factor of 2 at each scale in the encoding part. The attention-augmented convolution is inserted as an extension of the U-Net’s bottleneck in order to capture only the necessary and meaningful non-local contextual information in an efficient way
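Placing attention at the bottleneck keeps it cheap, as a short calculation shows. The sketch below assumes a 256 x 256 input and four downsampling scales; both values are hypothetical, since the caption gives only the factor of 2 per scale. It counts the entries of the pairwise attention matrix, which has one row and one column per spatial position.

```python
def encoder_shapes(height, width, num_scales):
    """Spatial size of the feature map after each factor-2 downsampling step."""
    shapes = [(height, width)]
    for _ in range(num_scales):
        height, width = height // 2, width // 2
        shapes.append((height, width))
    return shapes

def attention_matrix_entries(height, width):
    """Self-attention compares every position with every other: (H*W)^2 entries."""
    n = height * width
    return n * n

# Assumed input size and depth, for illustration only.
shapes = encoder_shapes(256, 256, num_scales=4)
for h, w in shapes:
    print(f"{h:>3} x {w:<3} -> {attention_matrix_entries(h, w):,} attention entries")
```

Under these assumptions the attention matrix shrinks from 4,294,967,296 entries at full resolution to 65,536 at the 16 x 16 bottleneck.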
Performance (averaged) of infection regions on COVID-19 datasets
| Model | Fold | Dice | Sen. | Spec. | MAE |
|---|---|---|---|---|---|
| Inf-Net [ | | 0.682 | 0.692 | 0.943 | 0.082 |
| Semi-Inf-Net [ | | 0.739 | 0.725 | 0.960 | 0.064 |
| 0.809 | 0.876 | 0.990 | 0.0192 | ||
| 0.798 | 0.781 | 0.888 | 0.986 | 0.0258 | |
| 0.735 | 0.850 | 0.981 | 0.0357 | ||
| 0.814 | 0.889 | 0.989 | 0.0185 | ||
| 0.808 | 0.872 | 0.988 | 0.0240 | ||
| 0.750 | 0.825 | 0.985 | 0.0318 | ||
| 0.800 | 0.879 | 0.989 | 0.0208 | ||
| U-Net | 0.787 | 0.776 | 0.887 | 0.985 | 0.0274 |
| 0.740 | 0.823 | 0.984 | 0.0331 | ||
| 0.806 | 0.883 | 0.989 | 0.02005 | ||
| U-Net(1070K) | 0.780 | 0.844 | 0.990 | ||
| 0.7339 | 0.836 | 0.982 | 0.035 | ||
| 0.878 | 0.990 | ||||
| 0.791 | 0.876 | 0.986 | 0.026 | ||
| 0.832 | 0.984 |
The data were split into three folds, and the results were averaged over multiple runs for each fold. These are quantitative results for infection regions computed fold-wise, with their 3D Dice scores
Bold marks the best results, produced by our proposed method. Bold italics mark the results produced by the models used by the authors in their earlier work
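The four metrics reported in the table (Dice, sensitivity, specificity, MAE) can be sketched for a pair of binary 3D masks as follows. The record does not give the formulas, so the standard definitions are assumed. MAE is computed here between binary masks, where it equals the fraction of mislabeled voxels; the authors may have computed it on probability maps instead. The function name `segmentation_metrics` is hypothetical.

```python
import numpy as np

def _ratio(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return float(num) / float(den) if den else float("nan")

def segmentation_metrics(pred, gt):
    """Dice, sensitivity, specificity and MAE for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.sum(pred & gt)      # lesion voxels correctly predicted
    fp = np.sum(pred & ~gt)     # background predicted as lesion
    fn = np.sum(~pred & gt)     # lesion voxels that were missed
    tn = np.sum(~pred & ~gt)    # background correctly predicted
    return {
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "mae": _ratio(fp + fn, pred.size),
    }

# Toy 3D volume: the prediction covers the lesion plus one extra column.
gt = np.zeros((2, 4, 4), dtype=int)
gt[:, 1:3, 1:3] = 1
pred = np.zeros((2, 4, 4), dtype=int)
pred[:, 1:3, 1:4] = 1
metrics = segmentation_metrics(pred, gt)
print(metrics)
```

In this toy case the prediction over-segments, so sensitivity is 1.0 while Dice drops to 0.8 and specificity to about 0.833.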
Fig. 3 Visual comparison of multi-class lung segmentation results, where the red and green labels indicate the GGO and consolidation, respectively
Performance (averaged) on nine real CT patient data
| Model | Dice | Sen. | Spec. | MAE | % Gain |
|---|---|---|---|---|---|
| Inf-Net [ | 0.579 | 0.87 | 0.974 | 0.047 | |
| 0.8840 | 0.9915 | 0.0135 | |||
| U-Net | 0.7515 | 0.8811 | 0.9904 | 0.0149 | |
| 0.87 |
These are quantitative results for infection regions computed patient-wise, with their 3D Dice scores. The best results are shown in bold, and the % Gain with respect to the baseline U-Net is shown in italics
Bold italics mark the results produced by the models used by the authors in their earlier work
Quantitative results of ground-glass opacities and consolidation
| Model | GGO | Consol. | Avg | % Gain | # Params |
|---|---|---|---|---|---|
| InfNet + FCN | 0.646 | 0.301 | 0.474 | | 33.1M |
| InfNet + MC [ | 0.624 | 0.458 | 0.541 | | 33.1M |
| 0.723 | 0.596 | 0.660 | 847K | ||
| 0.613 | 849K | ||||
| U-Net | 0.717 | 0.566 | 0.641 | | 611K |
| U-Net(1070K) | 0.722 | 0.574 | 0.648 | 1.10 | 1070K |
| 982.7K |
The results have been averaged across multiple folds and multiple runs. The best results are shown in bold font
Bold marks the results produced by the models used by the authors in their current work. Bold italics mark the results produced by the models used by the authors in their earlier work
Quantitative results of ground-glass opacities and consolidation
| Model | GGO | %Gain | Consol | %Gain |
|---|---|---|---|---|
| InfNet + FCN8s [ | 0.646 | | 0.301 | |
| InfNet + MC [ | 0.624 | | 0.458 | |
| U-Net | 0.7167 | 0.57 | 0.5661 | |
| U-Net(1070K) | 0.7221 | 0.60 | 0.5748 | |
The results are reported across three folds and averaged over multiple runs. The best results are shown in bold, and the % Gain with respect to the baseline U-Net is shown in italics
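The tables do not spell out how % Gain is computed. One plausible reading, assumed here, is the relative improvement in average Dice over the baseline U-Net. Applied to the four-decimal GGO and consolidation scores in the table above, that reading gives about 1.10% for U-Net(1070K), which agrees with the 1.10 listed for that model in the earlier GGO and consolidation table. The helper names are hypothetical.

```python
def average_dice(scores):
    """Mean of the GGO and consolidation Dice scores."""
    return (scores["ggo"] + scores["consol"]) / 2

def relative_gain_percent(score, baseline):
    """Relative improvement over the baseline, in percent (assumed definition)."""
    return 100.0 * (score - baseline) / baseline

# Dice scores copied from the table rows for U-Net and U-Net(1070K).
unet = {"ggo": 0.7167, "consol": 0.5661}
unet_1070k = {"ggo": 0.7221, "consol": 0.5748}

gain = relative_gain_percent(average_dice(unet_1070k), average_dice(unet))
print(f"{gain:.2f}")  # 1.10
```

The same computation on the three-decimal averages (0.648 against 0.641) gives 1.09, so the listed figure appears to come from unrounded scores.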
Performance averaged across nine real CT patient data
| Mean Pat. | GGO | % | Cons. | % | Avg. | % |
|---|---|---|---|---|---|---|
| UNet | 0.683 | | 0.651 | | 0.671 | |
| UNet+CCA | 0.679 | | 0.666 | | 0.674 | |
These are quantitative results of multi-label regions computed patient-wise, with their 3D Dice scores
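Patient-wise multi-label scoring of this kind can be sketched as below: compute a Dice score per class on each patient's label volume, then average over patients. The label encoding (1 for GGO, 2 for consolidation), the function names, and the toy volumes are all assumptions made for illustration and are not taken from the study.

```python
import numpy as np

LABELS = {"ggo": 1, "consolidation": 2}  # hypothetical label encoding, 0 = background

def per_class_dice(pred, gt, labels=LABELS):
    """Dice per class for one patient's label volume, plus the class average."""
    scores = {}
    for name, value in labels.items():
        p, g = (pred == value), (gt == value)
        denom = p.sum() + g.sum()
        # NaN marks a class absent from both prediction and ground truth.
        scores[name] = float(2.0 * np.sum(p & g) / denom) if denom else float("nan")
    scores["avg"] = float(np.mean([scores[name] for name in labels]))
    return scores

def mean_over_patients(per_patient_scores):
    """Average each score across patients, ignoring NaN entries."""
    keys = per_patient_scores[0].keys()
    return {k: float(np.nanmean([s[k] for s in per_patient_scores])) for k in keys}

# Two toy "patients" with tiny 3D label volumes.
gt_a = np.array([0, 1, 1, 2, 2, 0]).reshape(1, 2, 3)
pred_a = np.array([0, 1, 0, 2, 2, 2]).reshape(1, 2, 3)
gt_b = np.array([1, 1, 2, 0]).reshape(1, 2, 2)
pred_b = gt_b.copy()  # perfect prediction

patient_scores = [per_class_dice(pred_a, gt_a), per_class_dice(pred_b, gt_b)]
summary = mean_over_patients(patient_scores)
print(summary)
```

Averaging per patient weights every patient equally regardless of lesion volume, which is one reason patient-wise and fold-wise figures can differ.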