| Literature DB >> 35693545 |
Shu Liang, Rencan Nie, Jinde Cao, Xue Wang, Gucheng Zhang.
Abstract
COVID-19 spreads rapidly among people, so diagnosing the disease accurately and in a timely manner is essential for quarantine and medical treatment. RT-PCR plays a crucial role in diagnosing COVID-19, whereas computed tomography (CT) delivers a faster result when combined with artificial-intelligence assistance. Developing a deep learning classification model for detecting COVID-19 from CT images can assist doctors in consultation. We propose a feature complement fusion network (FCF) for detecting COVID-19 in lung CT scan images. This framework extracts local and global features with a CNN extractor and a ViT extractor, respectively, so that each compensates for the limited receptive field of the other. Owing to the attention mechanism in our designed feature complement Transformer (FCT), the extracted local and global feature embeddings achieve a better representation. We combine a supervised with a weakly supervised training strategy, which lets the CNN guide the ViT to converge faster. Our model reaches 99.34% accuracy on the test set, surpassing current state-of-the-art classification models. Moreover, the proposed structure extends readily to other classification tasks by substituting appropriate feature extractors.
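The fusion idea in the abstract — concatenating a 512-D local (CNN) embedding with a 512-D global (ViT) embedding before classification — can be sketched minimally in NumPy. The random embeddings, the linear head `W`, and all shapes here are illustrative assumptions, not the authors' code (the paper inserts the FCT between fusion and prediction):

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)

# Illustrative stand-ins for the two extractor outputs (batch of 4 CT images)
local_emb = rng.normal(size=(4, 512))   # CNN branch: local features
global_emb = rng.normal(size=(4, 512))  # ViT branch: global features

# Feature fusion: concatenate into a 1024-D joint embedding
fused = np.concatenate([local_emb, global_emb], axis=-1)

# Hypothetical linear classification head (the paper applies the FCT first)
W = rng.normal(size=(1024, 2)) * 0.01
probs = softmax(fused @ W)

print(fused.shape)  # (4, 1024)
print(probs.shape)  # (4, 2)
```

The two-class output matches the COVID-19 vs. normal setting described in the paper.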
Keywords: COVID-19 detecting; Deep Learning; Feature complement fusion; Weakly supervised learning
Year: 2022 PMID: 35693545 PMCID: PMC9167685 DOI: 10.1016/j.asoc.2022.109111
Source DB: PubMed Journal: Appl Soft Comput ISSN: 1568-4946 Impact factor: 8.263
Fig. 1 Receptive field: (a) receptive field of CNN, (b) receptive field of ViT.
Fig. 2 The architecture of the proposed FCF-Net comprises two feature extractors, a feature fusion block, and a weakly supervised module. A CT scan image passes through the two feature extractors to generate 512-D local and global feature embeddings. The feature fusion block concatenates these embeddings, processes them with the feature complement Transformer (FCT), and makes the final prediction. The weakly supervised module first maps each embedding to a 2-D feature; these 2-D features are turned into probability distributions by a softmax layer. Finally, the KL divergence between the two distributions is computed to optimize the ViT extractor in a weakly supervised way.
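The weak-supervision step described in the Fig. 2 caption — softmax over each branch's 2-D feature, then a KL-divergence loss that pulls the ViT distribution toward the CNN distribution — can be sketched numerically. The logits below and the `kl_divergence` helper are hypothetical stand-ins, not the authors' implementation:

```python
import numpy as np

def softmax(x):
    # Softmax over the last axis, numerically stabilized
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q), averaged over the batch
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * np.log(p / q), axis=-1)))

# Hypothetical 2-class logits from the two branches for a batch of 3 images
cnn_logits = np.array([[2.0, -1.0], [0.5, 0.3], [-1.5, 2.5]])
vit_logits = np.array([[1.0, 0.0], [0.2, 0.4], [-0.5, 1.5]])

p_cnn = softmax(cnn_logits)  # distribution from the CNN branch (guide)
p_vit = softmax(vit_logits)  # distribution from the ViT branch (guided)

# Minimizing this term pushes the ViT's predictions toward the CNN's
loss = kl_divergence(p_cnn, p_vit)
print(round(loss, 4))
```

Minimizing the KL term alongside the supervised loss is what lets the faster-converging CNN branch guide the ViT, as the abstract describes.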
Fig. 3 FCT structure.
Fig. 4 Dataset configuration.
Fig. 5 Samples in COVID-CTset: (a) COVID-19 samples, where red arrows denote the infection area. (b) Normal samples.
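The caption does not spell out the FCT's internals, but the abstract attributes its benefit to an attention mechanism. Assuming standard scaled dot-product attention applied over the local and global embeddings treated as two tokens (an illustrative assumption, not the paper's exact block layout), a minimal sketch:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    # Standard Transformer attention: softmax(Q K^T / sqrt(d)) V
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    return softmax(scores) @ v

rng = np.random.default_rng(1)
# Treat the local (CNN) and global (ViT) embeddings as two 512-D tokens
tokens = rng.normal(size=(2, 512))
out = scaled_dot_product_attention(tokens, tokens, tokens)
print(out.shape)  # (2, 512)
```

Self-attention over the two tokens lets each embedding be re-weighted by its affinity to the other, which is one plausible reading of "feature complement".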
Parameter Analysis.
| Parameter value | Accuracy (%) | Recall (%) | Precision (%) | F1-Score (%) |
|---|---|---|---|---|
| 0.4 | 97.81 | 98.25 | 97.39 | 97.82 |
| 0.5 | 98.68 | 98.25 | 99.12 | 98.68 |
| 0.6 | 99.56 | | | |
| 0.7 | 97.81 | 96.93 | 98.66 | 97.79 |
| 0.8 | 97.15 | 96.49 | 97.78 | 97.13 |
| 1.0 | 98.90 | 97.85 | 98.92 | |
| 1.2 | 98.46 | 97.81 | 99.11 | 98.45 |
Ablation on FCF-Res model.
| FCT | KL | Accuracy (%) | Recall (%) | Precision (%) | F1-Score (%) |
|---|---|---|---|---|---|
| | | 97.81 | 97.37 | 98.23 | 97.80 |
| ✓ | | 98.46 | 98.68 | 98.25 | 98.47 |
| | ✓ | 98.49 | 98.68 | 98.25 | 98.47 |
| ✓ | ✓ | | | | |
Fig. 6 Variation of accuracy during training: (a) FCF-Res model, (b) FCF-Dense model.
Ablation on FCF-Dense model.
| FCT | KL | Accuracy (%) | Recall (%) | Precision (%) | F1-Score (%) |
|---|---|---|---|---|---|
| | | 98.46 | 99.12 | 97.84 | 98.47 |
| ✓ | | 98.90 | 99.12 | 98.69 | 98.91 |
| | ✓ | 98.68 | 97.85 | 98.70 | |
| ✓ | ✓ | 99.12 | | | |
Compared to Classic SOTA.
| Model type | Model | Params. (M) | FLOPs (G) | Accuracy (%) | Recall (%) | Precision (%) | F1-Score (%) |
|---|---|---|---|---|---|---|---|
| CNN | ResNet 50 | 23.5 | 10.6 | 97.81 | 97.37 | 98.23 | 97.80 |
| | ResNet 101 | 42.5 | 20.2 | 98.03 | 98.25 | 97.82 | 98.03 |
| | ResNet 152 | 58.1 | 30.0 | 98.68 | 97.84 | 98.70 | |
| | DenseNet 121 | 6.9 | 7.2 | 98.46 | 99.12 | 97.84 | 98.47 |
| | DenseNet 161 | 26.5 | 20.0 | 98.46 | 97.42 | 98.48 | |
| | DenseNet 169 | 12.5 | 8.6 | 98.90 | 99.12 | 98.69 | 98.91 |
| | DenseNet 201 | 18.1 | 11.2 | 98.68 | 97.84 | 98.70 | |
| ViT | ViT-b | 15.8 | 8.2 | 81.36 | 88.16 | 77.60 | 82.55 |
| | T2T-ViT-5 | 10.9 | 6.0 | 94.74 | 92.11 | 97.22 | 94.59 |
| Feature Fusion | FPN (ResNet 50) | 26.3 | 17.8 | 98.03 | 97.81 | 98.24 | 98.02 |
| | UNet | 17.2 | 80.0 | | | | |
| Feature Complement Fusion | FCF-Res (ours) | 40.0 | 16.6 | 99.13 | | | |
| | FCF-Dense (ours) | 24.2 | 13.2 | 99.12 | 99.56 | | |
Fig. 7 Confusion matrices of the FCF models and their source feature extractors.
Fig. 8 ROC curves of the FCF models and their source feature extractors.
Fig. 9 ROC curves of the FCF models and other feature fusion methods.
Accuracy performance of COVID-19 SOTA methods on COVID-CTset [48], iCTCF [52] and MosMedData [53].
| Method | Params. (M) | FLOPs (G) | COVID-CTset | iCTCF | MosMedData |
|---|---|---|---|---|---|
| ADLN | 26.2 | 17.8 | 96.05 | 97.56 | 72.40 |
| InstaCovNet-19 | 165.8 | 69.2 | 78.73 | | |
| COVID-Net CT | 15.2 | 0.9 | 78.82 | 97.00 | 66.86 |
| FCF-Res (ours) | 40.0 | 16.6 | 98.19 | | |
Fig. 10 Grad-CAM visualizations of the comparison methods.