Zongqing Ma, Qiaoxue Xie, Pinxue Xie, Fan Fan, Xinxiao Gao, Jiang Zhu.
Abstract
Automatic and accurate optical coherence tomography (OCT) image classification is of great significance to computer-assisted diagnosis of retinal disease. In this study, we propose a hybrid ConvNet-Transformer network (HCTNet) and verify the feasibility of a Transformer-based method for retinal OCT image classification. The HCTNet first utilizes a low-level feature extraction module based on the residual dense block to generate low-level features that facilitate network training. Then, two parallel branches, a Transformer branch and a ConvNet branch, are designed to exploit the global and local context of the OCT images. Finally, a feature fusion module based on an adaptive re-weighting mechanism combines the extracted global and local features to predict the category of OCT images in the testing datasets. The HCTNet thus combines the convolutional neural network's strength in extracting local features with the vision Transformer's strength in establishing long-range dependencies. Evaluation on two public retinal OCT datasets shows that HCTNet achieves overall accuracies of 91.56% and 86.18%, respectively, outperforming the pure ViT and several ConvNet-based classification methods.
Keywords: convolutional neural network; image classification; optical coherence tomography; vision transformer
Year: 2022 PMID: 35884345 PMCID: PMC9313149 DOI: 10.3390/bios12070542
Source DB: PubMed Journal: Biosensors (Basel) ISSN: 2079-6374
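The abstract describes a three-stage pipeline: a residual dense block for low-level features, parallel Transformer (T-branch) and ConvNet (C-branch) paths, and fusion by adaptive re-weighting. Below is a minimal PyTorch sketch of that layout. All module names, channel widths, depths, the patch size, and the exact fusion gating are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a dual-branch ConvNet-Transformer classifier in the
# spirit of the abstract. Every hyperparameter here is an assumption.
import torch
import torch.nn as nn

class ResidualDenseBlock(nn.Module):
    """Simplified residual dense block: densely connected convs + local residual."""
    def __init__(self, channels=32, growth=16, layers=3):
        super().__init__()
        self.convs = nn.ModuleList()
        c = channels
        for _ in range(layers):
            self.convs.append(nn.Sequential(
                nn.Conv2d(c, growth, 3, padding=1), nn.ReLU(inplace=True)))
            c += growth
        self.fuse = nn.Conv2d(c, channels, 1)  # local feature fusion

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(conv(torch.cat(feats, dim=1)))
        return x + self.fuse(torch.cat(feats, dim=1))  # local residual learning

class HCTNetSketch(nn.Module):
    def __init__(self, num_classes=4, channels=32, dim=128):
        super().__init__()
        # Low-level feature extraction based on a residual dense block.
        self.stem = nn.Sequential(nn.Conv2d(1, channels, 3, padding=1),
                                  ResidualDenseBlock(channels))
        # T-branch: patchify with a strided conv, then Transformer encoder layers.
        self.patch = nn.Conv2d(channels, dim, kernel_size=16, stride=16)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2)
        # C-branch: plain convolutional path for local context.
        self.cnn = nn.Sequential(
            nn.Conv2d(channels, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        # Adaptive re-weighting: learned scalar gates over the two branch vectors.
        self.gate = nn.Sequential(nn.Linear(2 * dim, 2), nn.Softmax(dim=-1))
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        low = self.stem(x)
        t = self.encoder(self.patch(low).flatten(2).transpose(1, 2)).mean(dim=1)
        c = self.cnn(low).mean(dim=(2, 3))
        w = self.gate(torch.cat([t, c], dim=-1))   # (B, 2) branch weights
        fused = w[:, :1] * t + w[:, 1:] * c        # re-weighted feature fusion
        return self.head(fused)

logits = HCTNetSketch()(torch.randn(2, 1, 224, 224))  # B-scans as 1-channel input
print(logits.shape)  # torch.Size([2, 4])
```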
Figure 1. Schematic of the spectral domain OCT system.
Figure 2. The framework of the proposed HCTNet.
Figure 3. The architecture of the residual dense block.
Figure 4. The detailed architecture of the encoder module of the T-branch.
Figure 5. The illustration of the feature fusion module.
Quantitative comparison results for retinal OCT image classification on the OCT2017 dataset.
| Method | Class | Accuracy (%) | Sensitivity (%) | Precision (%) | OA (%) | OS (%) | OP (%) | Time (ms) |
|---|---|---|---|---|---|---|---|---|
| Transfer learning [ ] | CNV | 83.86 | 92.64 | 76.52 | 76.26 | 57.34 | 73.47 | 6.31 |
| | DME | 89.53 | 36.00 | 74.61 | | | | |
| | Drusen | 90.13 | 18.56 | 65.22 | | | | |
| | Normal | 88.99 | 92.18 | 77.55 | | | | |
| VGG16 [ ] | CNV | 92.92 | 91.37 | 92.83 | 86.68 | 79.79 | 81.29 | 1.08 |
| | DME | 94.20 | 78.79 | 78.43 | | | | |
| | Drusen | 92.34 | 55.45 | 65.89 | | | | |
| | Normal | 93.90 | 93.57 | 88.02 | | | | |
| ResNet [ ] | CNV | 93.74 | 90.92 | 94.92 | 89.87 | 86.11 | 85.82 | 3.92 |
| | DME | 95.88 | 85.23 | 84.60 | | | | |
| | Drusen | 94.36 | 72.21 | 72.74 | | | | |
| | Normal | 95.75 | 96.08 | 91.01 | | | | |
| IFCNN [ ] | CNV | 93.45 | 91.09 | 94.16 | 88.67 | 83.84 | 84.42 | 1.46 |
| | DME | 95.06 | 83.68 | 80.97 | | | | |
| | Drusen | 93.95 | 65.80 | 72.92 | | | | |
| | Normal | 94.80 | 94.78 | 89.63 | | | | |
| HCTNet | CNV | 94.60 | 92.23 | 95.53 | 91.56 | 88.57 | 88.11 | 3.74 |
| | DME | 96.14 | 87.96 | 84.42 | | | | |
| | Drusen | 95.54 | 77.36 | 79.00 | | | | |
| | Normal | 96.84 | 96.73 | 93.50 | | | | |
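The tables report per-class Accuracy, Sensitivity, and Precision alongside overall scores (OA, OS, OP). A minimal sketch of how such figures can be derived from a confusion matrix, assuming OA is overall accuracy, OS/OP are macro-averages of per-class sensitivity and precision, and per-class accuracy is computed one-vs-rest (the excerpt does not spell out these definitions):

```python
# Hedged sketch of the metric definitions assumed above.
import numpy as np

def classification_metrics(cm):
    """cm[i, j] = number of samples with true class i predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp
    fp = cm.sum(axis=0) - tp
    tn = total - tp - fn - fp
    return {
        "accuracy": (tp + tn) / total,    # one-vs-rest accuracy per class
        "sensitivity": tp / (tp + fn),    # recall per class
        "precision": tp / (tp + fp),
        "OA": tp.sum() / total,           # overall accuracy
        "OS": (tp / (tp + fn)).mean(),    # macro-averaged sensitivity
        "OP": (tp / (tp + fp)).mean(),    # macro-averaged precision
    }

# Example with a hypothetical 4-class (CNV/DME/Drusen/Normal) confusion matrix.
cm = [[230, 5, 10, 5], [8, 220, 7, 15], [12, 6, 200, 32], [3, 9, 13, 225]]
print({k: np.round(v, 4) for k, v in classification_metrics(cm).items()})
```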
Figure 6. Confusion matrix generated by the HCTNet.
Statistical analysis (p-values) of the proposed HCTNet compared with other networks.
| Method | OA | OS | OP |
|---|---|---|---|
| HCTNet & Transfer learning [ | < | < | < |
| HCTNet & VGG16 [ | < | < | 0.0002 |
| HCTNet & ResNet [ | 0.0139 | 0.0038 | 0.0363 |
| HCTNet & IFCNN [ | 0.0001 | < | 0.0022 |
Figure 7. Examples of classification results predicted by the HCTNet on the OCT2017 dataset. The first row shows good cases, and the second row shows bad cases. (a) CNV. (b) DME. (c) DRUSEN. (d) NORMAL.
Quantitative comparison results for retinal OCT image classification on the Srinivasan2014 dataset.
| Method | Class | Accuracy (%) | Sensitivity (%) | Precision (%) | OA (%) | OS (%) | OP (%) | Time (ms) |
|---|---|---|---|---|---|---|---|---|
| Transfer learning [ ] | AMD | 90.90 | 68.37 | 89.40 | 79.41 | 76.25 | 84.01 | 6.82 |
| | DME | 81.45 | 76.88 | 79.10 | | | | |
| | Normal | 86.47 | 83.49 | 83.54 | | | | |
| VGG16 [ ] | AMD | 92.76 | 77.12 | 86.90 | 83.69 | 81.96 | 85.20 | 1.30 |
| | DME | 84.83 | 79.76 | 81.23 | | | | |
| | Normal | 89.79 | 88.99 | 87.45 | | | | |
| ResNet [ ] | AMD | 92.35 | 71.73 | 90.28 | 84.55 | 82.13 | 86.92 | 4.02 |
| | DME | 87.48 | 81.41 | 86.12 | | | | |
| | Normal | 89.28 | 93.26 | 84.36 | | | | |
| IFCNN [ ] | AMD | 92.46 | 71.71 | 92.49 | 84.62 | 81.86 | 87.47 | 1.60 |
| | DME | 86.54 | 82.09 | 83.10 | | | | |
| | Normal | 90.24 | 91.78 | 86.82 | | | | |
| HCTNet | AMD | 95.94 | 82.60 | 95.08 | 86.18 | 85.40 | 88.53 | 3.81 |
| | DME | 86.61 | 80.22 | 85.29 | | | | |
| | Normal | 89.81 | 93.39 | 85.22 | | | | |
Quantitative comparison results on the noisy and original OCT2017 dataset.
| Datasets | OA (%) | OS (%) | OP (%) |
|---|---|---|---|
| Noisy OCT2017 | 91.52 | 88.57 | 88.20 |
| Original OCT2017 | 91.56 | 88.57 | 88.11 |
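The excerpt does not specify how the noisy OCT2017 copy was generated. A generic robustness check of this kind re-evaluates the trained model on corrupted test images; a sketch assuming additive Gaussian noise (the actual corruption may differ, e.g. the speckle noise typical of OCT):

```python
# Hedged sketch of a noise-robustness evaluation; the noise type and
# strength are assumptions, not the paper's protocol.
import torch

def add_gaussian_noise(images, sigma=0.05):
    """images: float tensor in [0, 1]; returns a noisy copy clipped to [0, 1]."""
    return (images + sigma * torch.randn_like(images)).clamp(0.0, 1.0)

@torch.no_grad()
def accuracy(model, loader, noisy=False):
    model.eval()
    correct = total = 0
    for x, y in loader:
        if noisy:
            x = add_gaussian_noise(x)
        correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.numel()
    return 100.0 * correct / total
```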
Figure 8. The impact of different components on classification performance. FF denotes the feature fusion module. (a) The accuracy metric for each independent class. (b) OA, OS, and OP over all classes.