| Literature DB >> 34909050 |
Pak Kin Wong1, Tao Yan2, Huaqiao Wang3, In Neng Chan1, Jiangtao Wang4, Yang Li3, Hao Ren4, Chi Hong Wong5.
Abstract
The quick and precise identification of COVID-19 pneumonia, non-COVID-19 viral pneumonia, bacterial pneumonia, mycoplasma pneumonia, and normal lung on chest CT images plays a crucial role in timely quarantine and medical treatment. However, manual identification is subject to potential misinterpretation and is time-consuming owing to the visual similarities of pneumonia lesions. In this study, we propose a novel multi-scale attention network (MSANet) based on a bag of advanced deep learning techniques for the automatic classification of COVID-19 and multiple types of pneumonia. The proposed method automatically attends to discriminative information and multi-scale features of pneumonia lesions for better classification. The experimental results show that the proposed MSANet can achieve an overall precision of 97.31%, recall of 96.18%, F1-score of 96.71%, accuracy of 97.46%, and macro-average area under the receiver operating characteristic curve (AUC) of 0.9981 in distinguishing between multiple classes of pneumonia. These promising results indicate that the proposed method can significantly assist physicians and radiologists in medical diagnosis. The dataset is publicly available at https://doi.org/10.17632/rf8x3wp6ss.1.
Keywords: Attention mechanism; COVID-19; Chest computed tomography; Multi-scale convolution neural network; Pneumonia identification
Year: 2021 PMID: 34909050 PMCID: PMC8660060 DOI: 10.1016/j.bspc.2021.103415
Source DB: PubMed Journal: Biomed Signal Process Control ISSN: 1746-8094 Impact factor: 3.880
Fig. 1 Representative CT images of different types of pneumonia (yellow arrows indicate pneumonia lesions in the right and/or left lobes): (a) COVID-19, (b) non-COVID-19 VP, (c) BP, (d) MP.
Characteristics of enrolled patients and images.
| Class | Training set (∼60%): No. of patients | Training set: No. of images | Validation set (∼20%): No. of patients | Validation set: No. of images | Test set (∼20%): No. of patients | Test set: No. of images |
|---|---|---|---|---|---|---|
| COVID-19 | 123 | 6585 | 41 | 2067 | 42 | 2035 |
| non-COVID-19 VP | 36 | 2320 | 12 | 853 | 12 | 844 |
| BP | 96 | 4415 | 32 | 1707 | 32 | 1644 |
| MP | 54 | 1792 | 18 | 867 | 18 | 784 |
| Normal | 156 | 7298 | 50 | 2265 | 50 | 2103 |
| Total | 465 | 22,410 | 153 | 7759 | 154 | 7410 |
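The split in Table 1 is performed at the patient level (roughly 60/20/20), so slices from one patient never appear in more than one subset. A minimal sketch of such a grouped split is shown below; the `labels.csv` file and its `patient_id`/`class` columns are hypothetical placeholders, not the actual CCAP file layout.

```python
# Patient-level 60/20/20 split sketch (hypothetical file layout; the CCAP
# release may organise images differently).
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

df = pd.read_csv("labels.csv")  # assumed columns: patient_id, image_path, class

# First carve off ~60% of patients for training.
gss = GroupShuffleSplit(n_splits=1, train_size=0.6, random_state=0)
train_idx, rest_idx = next(gss.split(df, groups=df["patient_id"]))
train_df, rest_df = df.iloc[train_idx], df.iloc[rest_idx]

# Split the remaining ~40% of patients evenly into validation and test sets.
gss2 = GroupShuffleSplit(n_splits=1, train_size=0.5, random_state=0)
val_idx, test_idx = next(gss2.split(rest_df, groups=rest_df["patient_id"]))
val_df, test_df = rest_df.iloc[val_idx], rest_df.iloc[test_idx]

# Per-class image counts, analogous to the rows of Table 1.
print(train_df["class"].value_counts())
print(val_df["class"].value_counts())
print(test_df["class"].value_counts())
```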
Comparison of CCAP dataset and other open-source datasets.
| Paper | 2D/3D dataset | No. of patients | No. of scans | No. of slices | Classification statistics** (No. of images per class) | Format | Accuracy (%)† | Accuracy (%)‡ |
|---|---|---|---|---|---|---|---|---|
| – | 2D | 120 | – | 2482 | 1252 / 1230 | PNG | 97.38 | 97.61 |
| – | 2D | 1130 | – | 17,104 | 7593 / 2618 / 6893 | PNG | 95.31 | 96.18 |
| – | 2D | 104 | – | 19,685 | 4001 / 9979 | JPEG | 92.18 | 95.91 |
| – | 2D/3D | 3777 | 6752 | 617,775 | 21,872 / 36,894 | PNG | – | 92.31 |
| – | 2D/3D | 377 | 377 | 63,849 | 2282 / 9776 | TIFF | 98.49 | 98.76 |
| CCAP | 2D/3D | 772 | 772 | 37,579 | COVID-19 10,687 / VP 4017 / BP 7766 / MP 3443 / Normal 11,666 | JPG | – | 97.46 |
Fig. 2 Architecture of the proposed MSANet.
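Fig. 2 combines several CNN learners operating at different scales. This record only confirms that the CNN1 learner uses a ResNet101 backbone with a 256 × 256 input (see the note below the loss-function table), so the sketch below is just one plausible way to wire a multi-scale ensemble in PyTorch: the same slice is resized to several resolutions, each learner produces class scores, and the scores are averaged. The scale set, the shared backbone choice, and the average fusion are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class MultiScaleClassifier(nn.Module):
    """Sketch of a multi-scale ensemble: three backbones at different input
    resolutions whose class scores are averaged (fusion rule is assumed)."""
    def __init__(self, num_classes=5, scales=(320, 256, 192)):
        super().__init__()
        self.scales = scales
        self.learners = nn.ModuleList()
        for _ in scales:
            backbone = models.resnet101(weights=None)   # ResNet101 confirmed only for CNN1
            backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)
            self.learners.append(backbone)

    def forward(self, x):                               # x: (B, 3, H, W) CT slice
        logits = []
        for scale, learner in zip(self.scales, self.learners):
            xi = F.interpolate(x, size=(scale, scale), mode="bilinear",
                               align_corners=False)     # resize to this learner's scale
            logits.append(learner(xi))
        return torch.stack(logits).mean(dim=0)          # average fusion (assumption)

model = MultiScaleClassifier()
scores = model(torch.randn(2, 3, 320, 320))             # -> (2, 5) class scores
```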
Fig. 3 Typical CT images segmented by the lung segmentation module: (a) COVID-19, (b) non-COVID-19 VP, (c) BP, (d) MP, (e) Normal.
Fig. 4 Architecture of the proposed spatial attention block.
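The internals of the spatial attention block (SAB) in Fig. 4 are not reproduced in this record, so the block below is a generic CBAM-style spatial attention sketch (channel-wise average and max pooling, a 7 × 7 convolution, and a sigmoid gate) rather than the authors' exact design; it only illustrates how such a block re-weights the spatial positions of a feature map.

```python
import torch
import torch.nn as nn

class SpatialAttentionBlock(nn.Module):
    """Generic CBAM-style spatial attention; the published SAB may differ."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):                       # x: (B, C, H, W) feature map
        avg_map = x.mean(dim=1, keepdim=True)   # (B, 1, H, W) channel-wise average
        max_map, _ = x.max(dim=1, keepdim=True) # (B, 1, H, W) channel-wise max
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                         # re-weight spatial positions

sab = SpatialAttentionBlock()
out = sab(torch.randn(1, 64, 32, 32))           # same shape as the input
```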
Evaluation metrics.
| Metrics | Calculation equations |
|---|---|
| Accuracy | $\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ |
| Recall | $\mathrm{Recall} = \frac{TP}{TP + FN}$ |
| Precision | $\mathrm{Precision} = \frac{TP}{TP + FP}$ |
| F1-score | $\text{F1-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ |

Note: TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
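A short sketch of how the tabulated metrics and the macro-average AUC can be computed with scikit-learn; the label and score arrays are placeholders, and the macro averaging for precision/recall/F1 is an assumption where this record does not state the scheme explicitly.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             roc_auc_score)

y_true = np.array([0, 1, 2, 3, 4, 0, 2])   # placeholder class labels (5 classes)
y_pred = np.array([0, 1, 2, 3, 4, 0, 1])   # placeholder predictions
y_score = np.eye(5)[y_pred]                # placeholder per-class scores (rows sum to 1)

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
accuracy = accuracy_score(y_true, y_pred)
macro_auc = roc_auc_score(y_true, y_score, multi_class="ovr", average="macro")
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f} Acc={accuracy:.4f} AUC={macro_auc:.4f}")
```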
Effect of loss function on class-wise performance.
| Method | Class | Precision (%) | Recall (%) | F1-score (%) | AUC |
|---|---|---|---|---|---|
| CNN1 learner with cross-entropy loss | COVID-19 | 94.29 | 91.74 | 93.00 | 0.9934 |
| | non-COVID-19 VP | 63.39 | 38.98 | 48.28 | 0.9204 |
| | BP | 75.23 | 93.31 | 83.30 | 0.9827 |
| | MP | 91.95 | 72.83 | 81.28 | 0.9842 |
| | Normal | 89.56 | 95.86 | 92.60 | 0.9945 |
| CNN1 learner with multi-class focal loss | COVID-19 | 94.62 | 94.20 | 94.20 | 0.9962 |
| | non-COVID-19 VP | 66.44 | 68.96 | 67.67 | 0.9591 |
| | BP | 88.12 | 89.29 | 88.70 | 0.9898 |
| | MP | 86.52 | 72.83 | 79.09 | 0.9831 |
| | Normal | 90.33 | 93.72 | 92.00 | 0.9898 |
Note: The CNN1 learner uses ResNet101 as the backbone with an input size of 256 × 256 and is trained with either the conventional cross-entropy loss or the multi-class focal loss.
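The multi-class focal loss down-weights well-classified examples so that minority classes such as non-COVID-19 VP contribute more to training, which is consistent with the recall gain seen above. A minimal PyTorch sketch follows; the focusing parameter `gamma` and the optional class-weight vector `alpha` are assumed values, since the exact hyper-parameters are not given in this record.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiClassFocalLoss(nn.Module):
    """Focal loss FL = -(1 - p_t)^gamma * log(p_t); gamma/alpha are assumed values."""
    def __init__(self, gamma=2.0, alpha=None):
        super().__init__()
        self.gamma = gamma
        self.alpha = alpha                       # optional per-class weight tensor of length C

    def forward(self, logits, targets):          # logits: (B, C), targets: (B,)
        log_p = F.log_softmax(logits, dim=1)
        log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)  # log prob of true class
        pt = log_pt.exp()
        loss = -((1.0 - pt) ** self.gamma) * log_pt                # down-weight easy examples
        if self.alpha is not None:
            loss = loss * self.alpha.to(logits.device)[targets]
        return loss.mean()

criterion = MultiClassFocalLoss(gamma=2.0)
loss = criterion(torch.randn(4, 5), torch.tensor([0, 1, 2, 4]))
```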
Comparison of different backbone networks.
| Backbone network | Precision (%) | Recall (%) | F1-score (%) | Accuracy (%) | macro-average AUC |
|---|---|---|---|---|---|
| Baseline (ResNet101) | 85.20 | 83.80 | 84.37 | 87.84 | 0.9837 |
| Xception | 90.13 | 88.64 | 89.24 | 91.73 | |
| VGG16 | 73.58 | 72.00 | 72.11 | 76.83 | 0.9264 |
| InceptionV3 | 90.61 | 89.60 | 90.03 | 92.05 | 0.9849 |
| MobileNetV2 | 88.08 | 87.24 | 87.44 | 90.97 | 0.9898 |
| DenseNet121 | 91.39 | 88.67 | 89.71 | 92.54 | |
| EfficientNetB0 | | | | | 0.9887 |
Note: Bold denotes the best.
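Most of the backbones compared above ship with torchvision, so swapping them only requires replacing the final classification layer with a 5-class head. A small illustrative helper is sketched below (attribute names follow the torchvision model families; Xception is not in torchvision and would need another source such as timm).

```python
import torch.nn as nn
from torchvision import models

def build_classifier(name: str, num_classes: int = 5) -> nn.Module:
    """Attach a 5-class head to a torchvision backbone (illustrative subset only)."""
    if name == "resnet101":
        net = models.resnet101(weights=None)
        net.fc = nn.Linear(net.fc.in_features, num_classes)
    elif name == "densenet121":
        net = models.densenet121(weights=None)
        net.classifier = nn.Linear(net.classifier.in_features, num_classes)
    elif name == "mobilenet_v2":
        net = models.mobilenet_v2(weights=None)
        net.classifier[1] = nn.Linear(net.classifier[1].in_features, num_classes)
    elif name == "efficientnet_b0":
        net = models.efficientnet_b0(weights=None)
        net.classifier[1] = nn.Linear(net.classifier[1].in_features, num_classes)
    else:
        raise ValueError(f"unsupported backbone: {name}")
    return net

model = build_classifier("densenet121")
```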
Details and average performance of the proposed methods on the test dataset.
| Model | Precision (%) | Recall (%) | F1-score (%) | Accuracy (%) | macro-average AUC | Training time* (s) | Test time** (s) |
|---|---|---|---|---|---|---|---|
| CNN0 | 94.78 | 92.80 | 93.64 | 95.51 | 0.9975 | 26,664 | 569 |
| CNN0 without SAB | 93.67 | 92.89 | 93.21 | 95.53 | 0.9971 | 25,806 | 508 |
| CNN1 | 92.03 | 90.07 | 90.86 | 93.44 | 0.9887 | 7546 | 169 |
| CNN1 without SAB | 91.87 | 88.83 | 90.08 | 93.06 | 0.9901 | 7040 | 161 |
| CNN2 | 83.04 | 79.92 | 81.16 | 84.94 | 0.9671 | 3806 | 96 |
| CNN2 without SAB | 82.75 | 80.11 | 81.18 | 84.25 | 0.9636 | ||
| MSANet | 97.31 | 96.18 | 96.71 | 97.46 | 0.9981 | 39,727 | 917 |
| MSANet without SABs | 95.28 | 92.58 | 93.69 | 95.60 | 0.9970 | 37,599 | 832 |
Note: SAB denotes spatial attention block. Bold means the best. *Training time refers to the total time spent for training 20 epochs on the training set and verification set in Table 1. **Test time represents the total time spent for judging 7410 test images in Table 1.
Fig. 5 Confusion matrices of different CNN configurations: (a) CNN0 learner, (b) CNN1 learner, (c) CNN2 learner, (d) MSANet.
Fig. 6 ROC curves and AUCs obtained using MSANet.
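Figs. 5 and 6 are derived from the test-set predictions: a confusion matrix per model and one-vs-rest ROC curves with per-class AUCs. A sketch with scikit-learn is given below; the label and score arrays are random placeholders standing in for real model outputs.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_curve, auc
from sklearn.preprocessing import label_binarize

classes = ["COVID-19", "non-COVID-19 VP", "BP", "MP", "Normal"]
y_true = np.array([0, 1, 2, 3, 4, 0, 2, 4])           # placeholder test labels
y_score = np.random.rand(len(y_true), len(classes))   # placeholder softmax outputs
y_score /= y_score.sum(axis=1, keepdims=True)
y_pred = y_score.argmax(axis=1)

print(confusion_matrix(y_true, y_pred))                # rows: true class, cols: predicted class

# One-vs-rest ROC curve and AUC per class, as plotted in Fig. 6.
y_bin = label_binarize(y_true, classes=list(range(len(classes))))
for i, name in enumerate(classes):
    fpr, tpr, _ = roc_curve(y_bin[:, i], y_score[:, i])
    print(f"{name}: AUC = {auc(fpr, tpr):.4f}")
```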