R Karthik1, R Menaka1, Hariharan M2, Daehan Won3.
Abstract
Accurate detection of COVID-19 is one of the challenging research topics in today's healthcare sector for controlling the coronavirus pandemic. Automatic, data-driven localization of COVID-19 from medical imaging modalities such as chest CT scans can substantially augment clinical care. In this research, a Contour-aware Attention Decoder CNN is proposed to precisely segment COVID-19 infected tissues. It introduces a novel attention scheme that extracts boundary and shape cues from CT contours and leverages these features to refine the infected areas. For every decoded pixel, the attention module harvests contextual information in its spatial neighborhood from the contour feature maps. Because such rich structural details are incorporated into decoding via dense attention, the CNN is able to capture even intricate morphological details. The decoder is also augmented with Cross Context Attention Fusion Upsampling to robustly reconstruct deep semantic features back to a high-resolution segmentation map. It employs a novel pixel-precise attention model that draws relevant encoder features to aid in effective upsampling. The proposed CNN was evaluated on 3D scans from the MosMedData and Jun Ma benchmark datasets. It achieved state-of-the-art performance with a high Dice similarity coefficient of 85.43% and a recall of 88.10%.
Keywords: Attention; COVID-19; Decoder; CNN; Deep learning; Segmentation
Year: 2022 PMID: 35068591 PMCID: PMC8767763 DOI: 10.1016/j.patcog.2022.108538
Source DB: PubMed Journal: Pattern Recognit ISSN: 0031-3203 Impact factor: 7.740
Fig. 1. Architectural diagram of the proposed CNN.
Fig. 2. Architectural overview of the multi-kernel encoding module.
Fig. 3. Schematic diagram of the cross context attention fusion upsampler.
Fig. 4. Processing pipeline for extracting connected contour regions from chest CT.
Fig. 5. Schematic diagram of the proposed CPA decoder.
Fig. 6. Pixelwise attention module.
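The abstract states that, for every decoded pixel, the attention module gathers context from the contour feature maps in that pixel's spatial neighborhood. The sketch below illustrates that general idea only: it uses plain scaled dot-product attention over a square window, which is an assumption and not the authors' published module. The function name `neighborhood_attention`, the `radius` parameter, and the nested-list layout are all hypothetical, and both feature maps are assumed to have the same height, width, and channel count.

```python
import math


def neighborhood_attention(decoder_feats, contour_feats, radius=1):
    """Attend from each decoder pixel to nearby contour features.

    decoder_feats, contour_feats: H x W grids (nested lists) whose cells are
    feature vectors of equal length. Returns an H x W grid in which each cell
    is a softmax-weighted average of the contour vectors inside a
    (2 * radius + 1) square window, clipped at the image border.
    NOTE: illustrative assumption, not the module described in the paper.
    """
    height, width = len(decoder_feats), len(decoder_feats[0])
    channels = len(contour_feats[0][0])
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            query = decoder_feats[y][x]
            # Collect contour vectors from the clipped spatial neighborhood.
            keys = [
                contour_feats[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            # Scaled dot-product similarity between the pixel and each neighbor.
            scale = math.sqrt(len(query))
            scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
            # Numerically stable softmax over the neighborhood.
            peak = max(scores)
            exps = [math.exp(s - peak) for s in scores]
            total = sum(exps)
            weights = [e / total for e in exps]
            # Weighted sum of contour vectors, one channel at a time.
            row.append(
                [sum(w * key[c] for w, key in zip(weights, keys)) for c in range(channels)]
            )
        output.append(row)
    return output


# Toy 2 x 2 feature maps with 2 channels each.
decoder = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0], [0.0, 0.0]]]
contour = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.0, 0.0]]]
refined = neighborhood_attention(decoder, contour, radius=1)
```

In a real network the attended contour features would typically be fused with the decoder features (for example by concatenation or addition) and the whole operation would run as batched tensor code; the loop form here is only meant to make the per-pixel neighborhood lookup explicit.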
3D CT scan datasets curated from different sources.
| S.No | Source | Details | Number of COVID-19 labeled 3D scans | Average number of slices per scan | Average 2D resolution |
|---|---|---|---|---|---|
| 1 | Jun Ma benchmark dataset | 3D scans from Corona cases Initiative | 10 | 258 | 550 × 550 |
| | | Annotated scans from Radiopaedia | 10 | 94 | 550 × 550 |
| 2 | MosMedData | Municipal hospitals in Moscow | 50 | 41 | 512 |
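As a rough sense of scale, the scan counts and average slices per scan in the table above can be multiplied out to estimate how many 2D slices the labeled data contains. This is back-of-the-envelope arithmetic on the reported averages, so the true slice counts may differ; the variable names are illustrative.

```python
# (source, labeled 3D scans, average slices per scan), copied from the table.
SOURCES = [
    ("Jun Ma: Corona cases Initiative", 10, 258),
    ("Jun Ma: Radiopaedia", 10, 94),
    ("MosMedData", 50, 41),
]

total_scans = sum(scans for _, scans, _ in SOURCES)
# Approximate, because it multiplies by per-source averages.
approx_slices = sum(scans * avg_slices for _, scans, avg_slices in SOURCES)

for name, scans, avg_slices in SOURCES:
    print(f"{name}: about {scans * avg_slices} slices")
print(f"Total: {total_scans} scans, about {approx_slices} slices")
```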
Comparison of various upsampling schemes for semantic segmentation. Results are reported for the individual MosMedData and Jun Ma datasets as well as for the combined set. The same test set partition is used to evaluate every model.
| S. No. | Method | Jun Ma dataset: DSC | Jun Ma dataset: IoU | MosMedData: DSC | MosMedData: IoU | Combined dataset: DSC | Combined dataset: IoU |
|---|---|---|---|---|---|---|---|
| 1 | Bilinear upsampling | 70.49 | 58.80 | 63.54 | 52.17 | 66.27 | 53.15 |
| 2 | Sub-pixel shuffling dense upsample | 73.52 | 59.69 | 67.83 | 55.47 | 69.12 | 57.77 |
| 3 | Global Attention Upsample | 71.63 | 57.56 | 70.97 | 58.38 | 71.35 | 60.01 |
| 4 | Attention-guided dense-upsampling | 78.17 | 66.32 | 73.02 | 58.97 | 74.86 | 61.88 |
| 5 | Data-dependent Upsampling | 77.31 | 64.80 | 74.95 | 58.62 | 75.92 | 63.12 |
| 6 | Proposed CCAF upsampler CNN | 80.43 | 69.87 | 75.19 | 65.30 | 77.67 | 65.79 |
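The DSC and IoU columns are overlap scores between a predicted mask and the ground-truth mask. The sketch below uses the standard definitions, DSC = 2|P ∩ G| / (|P| + |G|) and IoU = |P ∩ G| / |P ∪ G|, on flattened binary masks. How the paper aggregates scores across slices or scans is not stated here, so the empty-mask convention and the function name `dice_and_iou` are assumptions.

```python
def dice_and_iou(pred, truth):
    """Return (DSC, IoU) for two equal-length sequences of 0/1 pixel labels."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks are empty; scoring this as a perfect match is an
        # assumed convention, not something specified by the source.
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou


# Six-pixel toy example: the masks overlap on two of four labeled pixels.
predicted = [1, 1, 1, 0, 0, 0]
reference = [0, 1, 1, 1, 0, 0]
dsc, iou = dice_and_iou(predicted, reference)
print(f"DSC = {dsc:.4f}, IoU = {iou:.4f}")
```

For a single pair of masks DSC is never smaller than IoU, which is consistent with every row of the table.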
Comparison of the proposed CPA decoder with existing decoder architectures for semantic segmentation on the datasets described in Section 4.1. To ensure a fair comparison, the models were trained and tested on the same data partitions.
| S.No. | Method | Jun Ma dataset: DSC | Jun Ma dataset: IoU | MosMedData: DSC | MosMedData: IoU | Combined dataset: DSC | Combined dataset: IoU |
|---|---|---|---|---|---|---|---|
| 1 | Point-wise attention decoder | 75.19 | 64.35 | 72.77 | 62.08 | 73.77 | 62.98 |
| 2 | Stride spatial pyramid pooling and dual attention decoder | 79.33 | 67.29 | 74.14 | 65.24 | 76.25 | 67.85 |
| 3 | Cross-granular attention decoder | 78.13 | 69.81 | 83.85 | 73.19 | 78.89 | 68.65 |
| 4 | Proposed CPA decoder | 82.63 | 72.20 | 83.49 | 72.78 | 80.12 | 70.42 |
Observations from the ablation studies. The results are grouped by dataset. The proposed models were trained and tested on each dataset.
| S No | Dataset | Method | DSC | IoU | Precision | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|---|---|---|
| 1 | Jun Ma dataset | CCAF upsampler CNN | 80.43 | 69.87 | 78.85 | 82.46 | 99.75 | 77.00 |
| | | CPA decoder CNN | 82.63 | 72.20 | 80.26 | 96.06 | 99.76 | 81.69 |
| | | Proposed CNN | 88.01 | 75.03 | 85.57 | 90.05 | 99.77 | 85.03 |
| 2 | MosMedData | CCAF upsampler CNN | 75.19 | 65.30 | 73.11 | 77.32 | 99.70 | 77.34 |
| | | CPA decoder CNN | 83.49 | 72.78 | 84.66 | 82.19 | 99.75 | 83.26 |
| | | Proposed CNN | 83.71 | 71.51 | 82.43 | 84.58 | 99.75 | 82.21 |
| 3 | Combined dataset | CCAF upsampler CNN | 77.67 | 62.65 | 76.32 | 79.21 | 99.72 | 75.60 |
| | | CPA decoder CNN | 80.12 | 68.70 | 80.96 | 79.39 | 99.79 | 78.69 |
| | | Proposed CNN | 85.43 | 73.44 | 81.23 | 89.88 | 99.77 | 84.61 |
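Precision, sensitivity, and specificity in the ablation table are pixel-level confusion-matrix ratios. The sketch below computes them with the standard definitions on a toy mask in which only 2% of pixels are infected. It is meant to illustrate one plausible reason the specificity column stays above 99% for every model: background pixels dominate, so true negatives swamp false positives. The toy counts and the function name `confusion_metrics` are assumptions, not data from the paper.

```python
def confusion_metrics(pred, truth):
    """Pixel-level precision, sensitivity (recall), and specificity."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return {
        # Fraction of predicted-infected pixels that are truly infected.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Fraction of truly infected pixels that were found.
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # Fraction of background pixels correctly left unlabeled.
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical 1000-pixel slice: 20 infected pixels, 980 background pixels.
truth = [1] * 20 + [0] * 980
# The prediction finds 15 lesion pixels, misses 5, and adds 10 false alarms.
pred = [1] * 15 + [0] * 5 + [1] * 10 + [0] * 970

for name, value in confusion_metrics(pred, truth).items():
    print(f"{name}: {value:.4f}")
```

Because specificity saturates under this kind of class imbalance, DSC, IoU, and sensitivity are usually the more discriminating columns when comparing the models.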
Learning curves showing epoch-wise trends in the decay of cross-entropy loss and evolution of DSC. Additionally, the PR curve recorded on the validation set is provided.
Experimental observations from model training, validation, and testing, evaluated on Dice and IoU scores. Runtime figures, including inference time, number of learnable parameters, and number of floating-point operations (FLOPs), are also reported for the ablation models.
| S.No | Experiment | Dice coefficient (%): Training | Dice coefficient (%): Validation | Dice coefficient (%): Testing | Mean IoU (%): Training | Mean IoU (%): Validation | Mean IoU (%): Testing | Inference time (milliseconds/image) | Number of parameters (in millions) | Giga FLOPs |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | CCAF upsampler CNN | 81.68 | 80.11 | 77.67 | 66.52 | 65.99 | 62.65 | 20.47 | 18.82 | 7.04 |
| 2 | CPA Decoder CNN | 85.42 | 83.76 | 80.12 | 73.19 | 71.61 | 66.70 | 26.95 | 22.45 | 8.55 |
| 3 | Proposed CNN | 88.01 | 87.54 | 85.43 | 77.57 | 76.73 | 73.44 | 38.24 | 30.51 | 13.78 |
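The per-image inference times above can be converted into throughput and into an approximate per-scan latency. The sketch below does that arithmetic, assuming each "image" is one 2D slice and using the 258-slice average reported for the Corona cases scans as an example scan length. Both are assumptions made for illustration, as are the dictionary layout and function names.

```python
# Inference time, parameter count, GFLOPs, and test DSC copied from the table.
ABLATION_RUNTIME = {
    "CCAF upsampler CNN": {"ms_per_image": 20.47, "params_m": 18.82, "gflops": 7.04, "test_dsc": 77.67},
    "CPA Decoder CNN": {"ms_per_image": 26.95, "params_m": 22.45, "gflops": 8.55, "test_dsc": 80.12},
    "Proposed CNN": {"ms_per_image": 38.24, "params_m": 30.51, "gflops": 13.78, "test_dsc": 85.43},
}


def images_per_second(ms_per_image):
    """Throughput implied by a per-image latency in milliseconds."""
    return 1000.0 / ms_per_image


def seconds_per_scan(ms_per_image, slices_per_scan):
    """Sequential latency for a scan, assuming one image per slice."""
    return ms_per_image * slices_per_scan / 1000.0


EXAMPLE_SLICES = 258  # assumed scan length, taken from the dataset table

for name, row in ABLATION_RUNTIME.items():
    fps = images_per_second(row["ms_per_image"])
    scan_s = seconds_per_scan(row["ms_per_image"], EXAMPLE_SLICES)
    print(f"{name}: {fps:.1f} images/s, {scan_s:.1f} s per {EXAMPLE_SLICES}-slice scan, "
          f"test DSC {row['test_dsc']:.2f}%")
```

Read this way, the table shows the accuracy gain of the full model alongside its cost: the proposed CNN has the highest test DSC of the three and also the longest inference time.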
Visual comparison of COVID-19 segmentations results from different experiments.
Fig. 7. Effectiveness of the proposed modules in improving segmentation performance for small and large infection regions (in terms of surface area).
Quantitative comparison of the proposed attention model with state-of-the-art attention models for semantic segmentation. The experiments are grouped by dataset. Results are shown for the individual Jun Ma and MosMedData datasets and for the set formed by combining these two sources.
| S No | Dataset | Method | DSC | IoU | Precision | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|---|---|---|
| 1 | Jun Ma dataset | FocusNet | 75.67 | 66.38 | 73.64 | 77.17 | 99.67 | 73.45 |
| | | Dual Attention Network | 80.15 | 70.61 | 77.82 | 81.49 | 99.72 | 79.09 |
| | | Asymmetric Non-local networks | 81.12 | 71.78 | 80.16 | 82.08 | 99.73 | 82.03 |
| | | Multi-scale self-guided attention | 86.67 | 75.31 | 88.42 | 84.05 | 99.75 | 84.45 |
| | | Criss Cross Attention | 85.58 | 74.60 | 82.84 | 88.12 | 99.75 | 83.21 |
| | | Semi Inf Net | 88.45 | 76.07 | 90.47 | 85.11 | 99.78 | 86.55 |
| | | Proposed CNN | 88.01 | 75.03 | 85.57 | 90.05 | 99.77 | 86.74 |
| 2 | MosMedData | FocusNet | 73.49 | 63.23 | 71.22 | 75.88 | 99.70 | 71.54 |
| | | Dual Attention Network | 75.02 | 61.00 | 74.82 | 75.70 | 99.71 | 72.10 |
| | | Asymmetric Non-local networks | 82.17 | 69.19 | 83.25 | 80.67 | 99.74 | 81.67 |
| | | Multi-scale self-guided attention | 80.97 | 68.78 | 80.24 | 81.33 | 99.72 | 77.34 |
| | | Criss Cross Attention | 82.32 | 70.05 | 84.68 | 80.92 | 99.74 | 80.64 |
| | | Semi Inf Net | 83.23 | 72.55 | 85.76 | 79.61 | 99.74 | 82.50 |
| | | Proposed CNN | 83.71 | 71.51 | 82.43 | 84.58 | 99.75 | 81.49 |
| 3 | Combined dataset | FocusNet | 73.81 | 62.13 | 68.41 | 80.15 | 99.71 | 71.95 |
| | | Dual Attention Network | 77.39 | 64.16 | 74.59 | 80.42 | 99.68 | 76.23 |
| | | Asymmetric Non-local networks | 81.96 | 66.08 | 80.25 | 83.74 | 99.72 | 78.76 |
| | | Multi-scale self-guided attention | 82.05 | 71.17 | 79.47 | 84.79 | 99.75 | 80.49 |
| | | Criss Cross Attention | 83.85 | 72.54 | 79.68 | 88.47 | 99.73 | 82.75 |
| | | Semi Inf Net | 84.56 | 72.32 | 80.50 | 89.05 | 99.74 | 83.71 |
| | | Proposed CNN | 85.43 | 73.44 | 81.23 | 89.88 | 99.74 | 84.57 |
Runtime analysis of the attention-based CNN models considered for comparison in Table 7. Wherever possible, ResNet50 was chosen as the backbone so that the different attention approaches could be compared on top of the same CNN.
| S. No. | Method | Backbone | Inference time (milliseconds/image) | Number of Parameters (in millions) | Giga FLOPs |
|---|---|---|---|---|---|
| 1 | FocusNet | SE-Net50 | 12.38 | 26.82 | 2.74 |
| 2 | Dual Attention Network | ResNet50 | 35.45 | 49.51 | 14.27 |
| 3 | Asymmetric Non-local networks | ResNet50 | 52.78 | 44.04 | 12.57 |
| 4 | Multi-scale self-guided attention | ResNet50 | 60.73 | 38.78 | 10.19 |
| 5 | Criss Cross Attention | ResNet50 | 25.14 | 28.18 | 6.32 |
| 6 | Semi Inf Net | Res2Net | 44.23 | 33.12 | 7.36 |
| 7 | Proposed CNN | Inception-ResNet-V2 based MKE module | 38.24 | 30.51 | 13.78 |
Performance comparison of the proposed work against state-of-the-art segmentation methods. All the models were freshly instantiated and run on the individual datasets listed in Table 1 and on the combined set. The results are grouped by dataset.
| S No | Dataset | Method | DSC | IoU | Precision | Sensitivity | Specificity | AUC |
|---|---|---|---|---|---|---|---|---|
| 1 | Jun Ma dataset | U-Net | 68.98 | 52.96 | 65.38 | 72.24 | 99.50 | 64.72 |
| | | Attention U-net | 72.45 | 58.32 | 70.66 | 74.34 | 99.67 | 70.76 |
| | | R2U Net | 77.14 | 64.91 | 79.12 | 74.78 | 99.76 | 76.13 |
| | | FCN8s (ResNet50 backbone) | 75.56 | 63.65 | 75.32 | 74.81 | 99.70 | 73.49 |
| | | Wang et al. | 78.98 | 65.81 | 74.45 | 83.71 | 99.77 | 77.01 |
| | | DeepLabV3 (ResNet50 backbone) | 77.41 | 64.37 | 75.09 | 78.91 | 99.75 | 77.19 |
| | | Link Net | 79.09 | 65.99 | 75.76 | 81.92 | 99.78 | 77.45 |
| | | PSPNet | 84.56 | 72.38 | 85.90 | 83.31 | 99.82 | 80.33 |
| | | Proposed CNN | 88.01 | 75.03 | 85.57 | 90.05 | 99.77 | 86.17 |
| 2 | MosMedData | U-Net | 62.77 | 48.79 | 60.23 | 64.73 | 99.53 | 60.81 |
| | | Attention U-net | 70.41 | 55.08 | 75.86 | 65.13 | 99.61 | 68.39 |
| | | R2U Net | 70.11 | 57.34 | 69.51 | 70.47 | 99.69 | 69.63 |
| | | FCN8s (ResNet50 backbone) | 71.36 | 58.47 | 78.44 | 65.33 | 99.65 | 69.23 |
| | | Wang et al. | 73.99 | 60.00 | 75.57 | 72.33 | 99.70 | 72.72 |
| | | DeepLabV3 (ResNet50 backbone) | 76.46 | 62.80 | 77.21 | 75.37 | 99.74 | 74.22 |
| | | Link Net | 76.94 | 64.15 | 74.83 | 79.10 | 99.76 | 74.51 |
| | | PSPNet | 79.29 | 64.80 | 76.78 | 81.44 | 99.78 | 78.10 |
| | | Proposed CNN | 83.71 | 71.51 | 82.43 | 84.58 | 99.82 | 81.18 |
| 3 | Combined dataset | U-Net | 65.00 | 51.91 | 61.08 | 69.48 | 99.44 | 60.40 |
| | | Attention U-net | 71.76 | 55.52 | 65.09 | 79.50 | 99.46 | 66.97 |
| | | R2U Net | 72.18 | 57.83 | 77.27 | 67.79 | 99.70 | 71.34 |
| | | FCN8s (ResNet50 backbone) | 73.24 | 60.28 | 77.13 | 69.95 | 99.89 | 73.10 |
| | | Wang et al. | 75.31 | 65.82 | 73.60 | 77.10 | 99.69 | 76.15 |
| | | DeepLabV3 (ResNet50 backbone) | 76.78 | 63.47 | 74.50 | 79.17 | 99.71 | 75.56 |
| | | Link Net | 77.45 | 65.36 | 81.67 | 73.65 | 99.76 | 76.51 |
| | | PSPNet | 81.32 | 67.38 | 81.12 | 81.50 | 99.75 | 79.97 |
| | | Proposed CNN | 85.43 | 73.44 | 81.23 | 89.88 | 99.74 | 84.57 |
Inference time analysis of the CNN models compared in Table 10, recorded on NVIDIA Tesla K80 GPUs. In all the compared methods, ResNet50 was used as the common backbone.
| S. No. | Method | Backbone | Inference time (milliseconds/image) | Parameters (in millions) | Giga FLOPs |
|---|---|---|---|---|---|
| 1 | U-Net | ResNet50 encoder | 34.37 | 32.51 | 10.56 |
| 2 | Attention U-net | ResNet50 encoder | 47.12 | 36.07 | 12.45 |
| 3 | R2U Net | ResNet50 | 57.80 | 27.31 | 15.80 |
| 4 | FCN8s | ResNet50 | 25.54 | 26.10 | 7.71 |
| 5 | DeCovNet | ResNet50 | 20.25 | 22.12 | 6.36 |
| 6 | DeepLabV3 | ResNet50 | 62.09 | 39.62 | 40.71 |
| 7 | Link Net | ResNet50 | 27.25 | 31.17 | 10.63 |
| 8 | PSPNet | ResNet50 | 11.05 | 24.29 | 2.83 |
| 9 | Proposed CNN | Inception-ResNet-V2 based MKE module | 38.24 | 30.51 | 13.78 |