Zhuang Ai, Xuan Huang, Yuan Fan, Jing Feng, Fanxin Zeng, Yaping Lu.
Abstract
Diabetic retinopathy (DR) is one of the common chronic complications of diabetes and the most common blinding eye disease. If not treated in time, it can lead to visual impairment and even blindness in severe cases. Therefore, this article proposes an algorithm for detecting diabetic retinopathy based on deep ensemble learning and an attention mechanism. First, image samples were preprocessed and enhanced to obtain high-quality image data. Second, to improve the adaptability and accuracy of the detection algorithm, we constructed a holistic detection model, DR-IIXRN, which consists of Inception V3, InceptionResNet V2, Xception, ResNeXt101, and NASNetLarge. For each base classifier, we modified the network model using transfer learning, fine-tuning, and attention mechanisms to improve its ability to detect DR. Finally, a weighted voting algorithm was used to determine which category (normal, mild, moderate, severe, or proliferative DR) an image belongs to. We also tuned the trained network model on hospital data, and real test samples from the hospital confirmed the advantages of the algorithm in detecting diabetic retinopathy. Experiments show that, compared with traditional single-network detection algorithms, the AUC, accuracy, and recall rate of the proposed method are improved to 95, 92, and 92%, respectively, which demonstrates the adaptability and correctness of the proposed method.
Keywords: attention mechanism; deep learning; diabetic retinopathy; ensemble learning; image processing
Year: 2021 PMID: 35002666 PMCID: PMC8740273 DOI: 10.3389/fninf.2021.778552
Source DB: PubMed Journal: Front Neuroinform ISSN: 1662-5196 Impact factor: 4.081
Difference between dichotomy and multiclassification in diabetic retinopathy (DR).
| Method | Advantage | Disadvantage |
|---|---|---|
| Dichotomy | High accuracy | Prone to under- or overtreatment |
| Multiclassification | Gives doctors a more accurate staging of the disease, supporting an optimal treatment plan | Small differences between images at adjacent disease levels; average accuracy |
Figure 1. Flow chart of the system architecture.
Experimental structure
| Input: |
| Output: Predicted category for each test set sample |
1. Calculate the weight value of each base classifier. In the weight formula, q represents the number of base classifiers and n is a parameter that controls the gap between strong and weak classifiers.
2. Calculate the final probability value of each category for each test set sample, where i indexes the base classifiers.
3. Select the category with the largest probability value as the predicted category.
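The weighted-voting step can be sketched in NumPy. The exact weight formula is not reproduced in this record, so the sketch assumes weights of the form w = n**F1 (normalized), where a larger n widens the gap between strong and weak base classifiers; the function names are illustrative.

```python
import numpy as np

def classifier_weights(f1_scores, n=3):
    """Assumed weight rule: w_i = n**F1_i, normalized to sum to 1.
    Larger n widens the gap between good and bad classifiers."""
    raw = np.power(float(n), np.asarray(f1_scores, dtype=float))
    return raw / raw.sum()

def weighted_vote(probas, f1_scores, n=3):
    """probas: (q, num_samples, num_classes) probabilities from q base
    classifiers. Returns the predicted class index per sample."""
    w = classifier_weights(f1_scores, n)                  # (q,)
    fused = np.tensordot(w, np.asarray(probas), axes=1)   # (num_samples, num_classes)
    return fused.argmax(axis=1)
```

For example, with two classifiers whose validation F1 values are 0.9 and 0.5 and n = 10, the stronger classifier's vote dominates the fused probabilities.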
Image preprocessing
1. Define the list that stores the preprocessed images.
2. Cut rows and columns of the image whose pixel values are all below 7.
3. Take the smaller of the image height and width.
4. With the center of the image as the circle center and the smaller of height and width as the diameter, determine a circle; replace all pixel values outside the circle with 0.
5. Again cut rows and columns whose pixel values are all below 7.
6. Scale the image to 299 × 299.
7. Add the image to the list.
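These preprocessing steps map naturally to NumPy array operations. A minimal single-channel sketch follows (the paper works on RGB fundus images, and its interpolation method for the final resize is not specified, so nearest-neighbour indexing stands in here):

```python
import numpy as np

def preprocess(img, out_size=299, thresh=7):
    """Sketch of the preprocessing pipeline: crop near-black borders,
    apply a circular mask, crop again, and resize."""
    def crop_dark(a):
        # Drop rows/columns whose pixel values are all below `thresh`.
        rows = np.where((a >= thresh).any(axis=1))[0]
        cols = np.where((a >= thresh).any(axis=0))[0]
        return a[rows.min():rows.max() + 1, cols.min():cols.max() + 1]

    img = crop_dark(img)
    h, w = img.shape
    r = min(h, w) / 2.0
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h / 2.0) ** 2 + (xx - w / 2.0) ** 2 <= r ** 2
    img = np.where(mask, img, 0)      # zero everything outside the circle
    img = crop_dark(img)
    # Nearest-neighbour resize to out_size x out_size.
    h, w = img.shape
    ri = np.arange(out_size) * h // out_size
    ci = np.arange(out_size) * w // out_size
    return img[np.ix_(ri, ci)]
```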
Image enhancement
1. Define the list that stores the enhanced images.
2. Get the set of images for each category.
3. Calculate the difference between each category's sample size and that of the largest category.
4. According to that difference, apply left-right symmetric transformations, up-down symmetric transformations, and random-angle rotations to each image to generate a stack of augmented copies.
5. Add the stacked images to the list.
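A dependency-free sketch of the augmentation step, using flips plus `np.rot90` as a stand-in for the paper's random-angle rotation (the function name and sampling scheme are illustrative):

```python
import numpy as np

def augment_stack(img, copies, seed=0):
    """Generate `copies` augmented variants of `img` via left-right flips,
    up-down flips, and rotation. np.rot90 (multiples of 90 degrees) stands
    in for the paper's random-angle rotation to keep the sketch
    dependency-free."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(copies):
        a = img
        if rng.random() < 0.5:
            a = np.fliplr(a)   # left-right symmetric transformation
        if rng.random() < 0.5:
            a = np.flipud(a)   # up-down symmetric transformation
        a = np.rot90(a, k=int(rng.integers(0, 4)))  # rotation
        out.append(a)
    return out
```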
Calculate the base classifier F1 value
1. Define a list of base classifier F1 values, a list of base classifier models, and a list of initial probability values for the test set samples.
2. For each base classifier, remove the top layer and load the weight parameters pretrained on ImageNet to obtain base_model.
3. Add the CBAM attention mechanism module after base_model.
4. Add the classification output layers after base_model to obtain the final model.
5. Train the model on the training set; then remove the restriction that keeps the base_model parameters fixed and retrain. Add the model to the model list.
6. Compute each classifier's F1 value on the validation set and its predicted probability values on the test set.
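The CBAM module's data flow can be sketched as a NumPy forward pass with randomly initialized weights; this shows shapes and ordering only (channel attention, then spatial attention), not trained behaviour, and a learned 1x1 mix of the pooled maps replaces CBAM's 7x7 spatial convolution for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbam(x, reduction=2, seed=0):
    """Minimal CBAM forward pass on a feature map x of shape (H, W, C).
    Channel attention: shared 2-layer MLP over avg- and max-pooled channel
    vectors. Spatial attention: a learned 1x1 mix of the channel-wise avg
    and max maps (CBAM proper uses a 7x7 convolution here)."""
    rng = np.random.default_rng(seed)
    H, W, C = x.shape
    w0 = rng.standard_normal((C, C // reduction)) * 0.1
    w1 = rng.standard_normal((C // reduction, C)) * 0.1

    # --- channel attention ---
    avg = x.mean(axis=(0, 1))              # (C,)
    mx = x.max(axis=(0, 1))                # (C,)
    mlp = lambda v: np.maximum(v @ w0, 0) @ w1
    ca = sigmoid(mlp(avg) + mlp(mx))       # (C,) channel weights in (0, 1)
    x = x * ca                             # broadcast over H, W

    # --- spatial attention ---
    s_avg = x.mean(axis=2)                 # (H, W)
    s_max = x.max(axis=2)                  # (H, W)
    a, b = rng.standard_normal(2) * 0.1
    sa = sigmoid(a * s_avg + b * s_max)    # (H, W) spatial weights
    return x * sa[..., None]
```

Since both attention maps lie in (0, 1), the module rescales (never amplifies) each feature, which is why it can be appended after any backbone without changing output shapes.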
Figure 2. Datasets. (A) Dataset sample distribution, (B) set sample distribution, and (C) example of each category.
Figure 3. Image processing schematic diagram of each stage. (A) Original fundus image. (B) Image obtained by deleting entire rows and columns whose pixel values are all less than 7. (C) Circle determined with the center of B as the circle center and the smaller of height and width as the diameter; the area outside the circle is filled with pixel value 0. (D) On the basis of C, the image obtained after again deleting entire rows and columns whose pixel values are all less than 7. (E) Image scaled to the specified size. (F) Image obtained by three image transformations on the basis of E. (G) Circle determined with the center of F as the circle center and the smaller of height and width as the diameter; the area outside the circle is filled with pixel value 0.
Upsampling image number information.
| Category | Original number of images | Augmented copies | Number after upsampling |
|---|---|---|---|
| 0 | 15,472 | 0 | 15,472 |
| 1 | 1,445 | 9 | 14,450 |
| 2 | 3,183 | 3 | 12,732 |
| 3 | 553 | 26 | 14,931 |
| 4 | 421 | 35 | 15,156 |
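The multipliers above are consistent with topping each class up toward the majority-class count of 15,472: each image receives floor(15472 / count) − 1 augmented copies, for a final count of count × floor(15472 / count). A quick check reproduces the table:

```python
# Reproduce the upsampling counts in the table above.
majority = 15472
originals = {0: 15472, 1: 1445, 2: 3183, 3: 553, 4: 421}

for label, n in originals.items():
    copies = majority // n - 1   # extra augmented copies per image
    total = n * (copies + 1)     # original + augmented images
    print(label, copies, total)
```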
Figure 4. Model modification process. (A) Diagram of the attention mechanism network structure, (B) model training process.
Figure 6. Comparison of experimental results. (A) Influence of different image preprocessing schemes on the F1 value. (B) Comparison of the performance of DR-IIXRN with the algorithms proposed by Zhao and Bravo. (C) Comparison of the performance of DR-IIXRN with the algorithms proposed by Wu. (D) Comparison of the performance of DR-IIXRN with the five base classifiers. (E) Influence of the n value in formula 10, between 10 and 100, on the evaluation indexes. (F) Comparison of the performance of DR-IIXRN with the algorithm proposed by Pratt. (G) Influence of the n value in formula 10, between 1 and 10, on the evaluation indexes.
The influence of different cases on evaluation indexes.
| Case | Category | Precision | Recall | F1 | Support |
|---|---|---|---|---|---|
| A | No DR | 0.74 | 1 | 0.85 | 5,175 |
| | Mild DR | 0 | 0 | 0 | 493 |
| | Moderate DR | 0 | 0 | 0 | 1,049 |
| | Severe DR | 0 | 0 | 0 | 160 |
| | PDR | 0 | 0 | 0 | 149 |
| B | No DR | 0.79 | 0.23 | 0.36 | 5,175 |
| | Mild DR | 0.08 | 0.46 | 0.14 | 493 |
| | Moderate DR | 0.14 | 0.01 | 0.02 | 1,049 |
| | Severe DR | 0.03 | 0.19 | 0.06 | 160 |
| | PDR | 0.05 | 0.62 | 0.1 | 149 |
| C | No DR | 0.75 | 0.97 | 0.85 | 5,175 |
| | Mild DR | 0.11 | 0.01 | 0.02 | 493 |
| | Moderate DR | 0.33 | 0.06 | 0.11 | 1,049 |
| | Severe DR | 0.26 | 0.12 | 0.17 | 160 |
| | PDR | 0.26 | 0.1 | 0.14 | 149 |
| D | No DR | 0.8 | 0.94 | 0.86 | 5,175 |
| | Mild DR | 0.11 | 0.02 | 0.03 | 493 |
| | Moderate DR | 0.43 | 0.28 | 0.34 | 1,049 |
| | Severe DR | 0.4 | 0.23 | 0.29 | 160 |
| | PDR | 0.53 | 0.21 | 0.3 | 149 |
| E | No DR | 0.83 | 0.92 | 0.87 | 5,175 |
| | Mild DR | 0.1 | 0.04 | 0.05 | 493 |
| | Moderate DR | 0.52 | 0.45 | 0.48 | 1,049 |
| | Severe DR | 0.46 | 0.32 | 0.37 | 160 |
| | PDR | 0.71 | 0.5 | 0.59 | 149 |
| F | No DR | 0.83 | 0.92 | 0.87 | 5,175 |
| | Mild DR | 0.1 | 0.04 | 0.05 | 493 |
| | Moderate DR | 0.53 | 0.44 | 0.48 | 1,049 |
| | Severe DR | 0.46 | 0.33 | 0.38 | 160 |
| | PDR | 0.69 | 0.5 | 0.58 | 149 |
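The F1 column follows from precision and recall as F1 = 2PR/(P + R); for example, case E's PDR row (P = 0.71, R = 0.50) gives 0.59:

```python
def f1(p, r):
    # Harmonic mean of precision and recall (0 when both are 0).
    return 2 * p * r / (p + r) if (p + r) else 0.0

print(round(f1(0.71, 0.50), 2))  # case E, PDR row -> 0.59
```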
Figure 5. Horizontal translations.
The influence of different values of n on the evaluation indexes.
| Step | n | Accuracy | F1 |
|---|---|---|---|
| First step | 10 | 0.7904 | 0.7587 |
| | 20 | 0.7882 | 0.7573 |
| | 30 | 0.7865 | 0.7564 |
| | 40 | 0.7846 | 0.7552 |
| | 50 | 0.7838 | 0.7549 |
| | 60 | 0.7842 | 0.7552 |
| | 70 | 0.7838 | 0.755 |
| | 80 | 0.7832 | 0.7546 |
| | 90 | 0.7825 | 0.7542 |
| | 100 | 0.7816 | 0.7536 |
| Second step | 1 | 0.7931 | 0.7595 |
| | 2 | 0.7933 | 0.7599 |
| | 3 | 0.7933 | 0.7602 |
| | 4 | 0.7929 | 0.7601 |
| | 5 | 0.7921 | 0.7596 |
| | 6 | 0.7916 | 0.7593 |
| | 7 | 0.7913 | 0.7591 |
| | 8 | 0.7913 | 0.7591 |
| | 9 | 0.7913 | 0.7593 |
| | 10 | 0.7905 | 0.7587 |
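The two-step scan over n (coarse steps of 10 up to 100, then single steps from 1 to 10) is a plain coarse-to-fine grid search. A generic sketch with a stand-in objective (the real objective would be the ensemble's validation score at each n):

```python
def two_step_search(objective, coarse, fine):
    """Coarse-to-fine grid search: scan the coarse grid first, then the
    fine grid, keeping the overall best value of n."""
    best_n = max(coarse, key=objective)
    best_n = max(list(fine) + [best_n], key=objective)
    return best_n

# Stand-in objective peaking at n = 3 (the table's best second-step value).
score = lambda n: -(n - 3) ** 2
print(two_step_search(score, range(10, 101, 10), range(1, 11)))  # -> 3
```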
Deep ensemble learning vs. base classifiers.
| Model | Accuracy | Category | F1 |
|---|---|---|---|
| Inception V3 | 0.77 | No DR | 0.87 |
| | | Mild DR | 0.05 |
| | | Moderate DR | 0.48 |
| | | Severe DR | 0.38 |
| | | PDR | 0.58 |
| InceptionResNet V2 | 0.78 | No DR | 0.88 |
| | | Mild DR | 0.07 |
| | | Moderate DR | 0.51 |
| | | Severe DR | 0.42 |
| | | PDR | 0.59 |
| Xception | 0.78 | No DR | 0.88 |
| | | Mild DR | 0.05 |
| | | Moderate DR | 0.5 |
| | | Severe DR | 0.39 |
| | | PDR | 0.61 |
| ResNeXt101 | 0.76 | No DR | 0.87 |
| | | Mild DR | 0.06 |
| | | Moderate DR | 0.48 |
| | | Severe DR | 0.41 |
| | | PDR | 0.55 |
| NASNetLarge | 0.76 | No DR | 0.87 |
| | | Mild DR | 0.07 |
| | | Moderate DR | 0.5 |
| | | Severe DR | 0.38 |
| | | PDR | 0.51 |
| DR-IIXRN | 0.79 | No DR | 0.89 |
| | | Mild DR | 0.06 |
| | | Moderate DR | 0.52 |
| | | Severe DR | 0.43 |
| | | PDR | 0.6 |
DR-IIXRN vs. the model proposed by Harry Pratt.
| Category | DR-IIXRN Precision | Pratt Precision | DR-IIXRN Recall | Pratt Recall | DR-IIXRN Specificity | Pratt Specificity | DR-IIXRN F1 | Pratt F1 |
|---|---|---|---|---|---|---|---|---|
| No DR | 0.95 | 0.95 | 0.83 | 0.78 | 0.47 | 0.19 | 0.89 | 0.85 |
| Mild DR | 0.04 | 0 | 0.16 | 0 | 0.99 | 1 | 0.06 | 0 |
| Moderate DR | 0.46 | 0.23 | 0.61 | 0.4 | 0.95 | 0.93 | 0.52 | 0.29 |
| Severe DR | 0.34 | 0.78 | 0.59 | 0.52 | 0.99 | 0.99 | 0.43 | 0.1 |
| PDR | 0.52 | 0.44 | 0.72 | 0.32 | 0.99 | 0.97 | 0.6 | 0.37 |
The influence of different algorithms on the evaluation index.
| Algorithm | Precision | Recall | Accuracy | Accuracy (Wu comparison) |
|---|---|---|---|---|
| Bravo | 0.5051 | 0.5081 | 0.5052 | – |
| Bi-ResNet [Ziyuan Zhao] | 0.4889 | 0.5503 | 0.4897 | – |
| RA-Net [Ziyuan Zhao] | 0.4717 | 0.5268 | 0.4724 | – |
| BiRA-Net [Ziyuan Zhao] | 0.5431 | 0.5725 | 0.5436 | – |
| VGG19 [Yuchen Wu] | – | – | – | 0.51 |
| Resnet50 [Yuchen Wu] | – | – | – | 0.49 |
| InceptionV3 [Yuchen Wu] | – | – | – | 0.61 |
| DR-IIXRN | 0.6347 | 0.51 | 0.791 | 0.79 |
Dataset category distribution and evaluation indicators.
| Set | Metric | No DR | Mild DR | Moderate DR | Severe DR | PDR | Total/Avg |
|---|---|---|---|---|---|---|---|
| Test set | Precision | 0.9 | 1 | 0.81 | 0.96 | 1 | 0.93 |
| | Recall | 0.95 | 0.69 | 0.97 | 0.89 | 0.97 | 0.92 |
| | F1 | 0.93 | 0.81 | 0.89 | 0.92 | 0.99 | 0.92 |
| | Specificity | 0.98 | 1 | 0.91 | 0.99 | 1 | 0.97 |
| | Accuracy | 0.95 | 0.69 | 0.97 | 0.89 | 0.97 | 0.92 |
| | AUC | 0.97 | 0.84 | 0.94 | 0.94 | 0.99 | 0.95 |
| | Support | 20 | 16 | 36 | 27 | 34 | 133 |
| Data set | Support | 159 | 99 | 289 | 166 | 232 | 945 |
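The rightmost column is consistent with a support-weighted average over the five classes; for the test-set precision row:

```python
# Support-weighted average of per-class precision (test set row).
supports = [20, 16, 36, 27, 34]
precision = [0.90, 1.00, 0.81, 0.96, 1.00]

weighted = sum(p * s for p, s in zip(precision, supports)) / sum(supports)
print(round(weighted, 2))  # -> 0.93, matching the table
```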
Figure 7. The influence of different base classifiers on accuracy.