Jianfeng Cui, Xiaoyun Zhang, Feibing Xiong, Chin-Ling Chen.
Abstract
The automatic diagnosis of retinal diseases from fundus images plays an important role in supporting clinical decision-making. Convolutional neural networks (CNNs) have achieved remarkable results in such tasks, but their high expressive capacity makes them prone to overfitting. Data augmentation (DA) techniques have therefore been proposed to prevent overfitting while enriching datasets. However, recent CNN architectures with ever more parameters render traditional DA techniques insufficient. In this study, we propose a new DA strategy based on multimodal fusion (DAMF), which integrates standard DA, data-disrupting, data-mixing, and auto-adjustment methods to enhance the images in the training dataset and generate new training images. In addition, we fuse the outputs of the classifiers by voting on top of DAMF, which further improves the generalization ability of the model. The experimental results show that the optimal DA mode for an image dataset can be found through our DA strategy. We evaluated DAMF on the iChallenge-PM dataset, comparing the training results of 12 DAMF-processed datasets against the original training dataset. Compared with the original dataset, the optimal DAMF achieved an accuracy increase of 2.85% on iChallenge-PM.
Year: 2021 PMID: 34035883 PMCID: PMC8118733 DOI: 10.1155/2021/5549779
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
Figure 1 Evolution of Inception structures: (a) Inception-V1; (b) Inception-V2; (c) Inception-V3.
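As a rough illustration of the idea behind the Inception structures in Figure 1, parallel branches process the same input and their feature maps are concatenated along the channel axis. A minimal NumPy sketch; the branch functions here are hypothetical stand-ins for the real 1x1/3x3/5x5 convolution paths:

```python
import numpy as np

def inception_concat(x, branches):
    """Apply parallel branch functions to the same input and
    concatenate their outputs along the channel (last) axis."""
    return np.concatenate([branch(x) for branch in branches], axis=-1)

# Hypothetical stand-in branches; a real module uses convolutions.
x = np.ones((8, 8, 3))
branches = [lambda t: t, lambda t: t * 2.0]
out = inception_concat(x, branches)   # channels: 3 + 3 = 6
```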
Figure 2 Residual structures in the ResNet network.
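The residual structure in Figure 2 can be summarized as y = ReLU(F(x) + x): the block learns a residual F(x) that is added to an identity shortcut. A minimal sketch, where `residual_fn` is an arbitrary placeholder for the block's convolutional path:

```python
import numpy as np

def residual_block(x, residual_fn):
    """Identity-shortcut residual block: add the learned residual
    to the input, then apply ReLU."""
    return np.maximum(residual_fn(x) + x, 0.0)

x = np.array([-1.0, 2.0, 3.0])
# With a zero residual path the block reduces to ReLU(x).
y = residual_block(x, lambda t: np.zeros_like(t))
```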
Figure 3 Source: Genevieve B. Orr: (a) SGD without momentum; (b) SGD with momentum.
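Figure 3 contrasts plain SGD with momentum SGD. In the common formulation, a velocity term accumulates an exponentially decaying average of past gradients: v ← βv − η∇L, then w ← w + v. A small sketch of one update step; the learning rate and momentum values are illustrative, not the paper's settings:

```python
def sgd_momentum_step(w, v, grad, lr=0.1, momentum=0.9):
    """One SGD-with-momentum update: the velocity v accumulates past
    gradients, damping oscillations across steep, narrow valleys."""
    v = momentum * v - lr * grad
    return w + v, v

w, v = 0.0, 0.0
w, v = sgd_momentum_step(w, v, grad=1.0)   # v = -0.1,  w = -0.1
w, v = sgd_momentum_step(w, v, grad=1.0)   # v = -0.19, w = -0.29
```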
The 12 DAMF-processed training sets.
| No. | Training set name | DA method | Quantity |
|---|---|---|---|
| 1 | PALM-Training800-overturning | Original dataset + random flip (4 directions: up, down, left, and right) | 800 |
| 2 | PALM-Training800-noise | Original dataset + Gaussian white noise | 800 |
| 3 | PALM-Training800-color | Original dataset + randomly changing colors (brightness, contrast, saturation) | 800 |
| 4 | PALM-Training800-cropping | Original dataset + random cropping | 800 |
| 5 | PALM-Training800-deforming | Original dataset + random scaling, stretching (stretched into a square by the length or width of the images) | 800 |
| 6 | PALM-Training800-dimming | Original dataset + change clarity | 800 |
| 7 | PALM-Training1600-overturning-noise-color | Randomly stack method 3 or 4 (serial number) on the basis of PALM-Training800-overturning | 1600 |
| 8 | PALM-Training1600-overturning-cropping-deforming | Randomly stack method 5 or 6 (serial number) on the basis of PALM-Training800-overturning | 1600 |
| 9 | PALM-Training1600-overturning-dimming | Randomly stack method 7 (serial number) on the basis of PALM-Training800-overturning | 1600 |
| 10 | PALM-Training3200-overturning-noise-color-cropping-deforming-dimming | Randomly superimpose method 5 or 6 or 7 (serial number) on the basis of PALM-Training800-overturning-noise-color | 3200 |
| 11 | PALM-Training800-imgaug1 | Original dataset + random cropping with 0–50 pixels around, 50% probability horizontal flip, Gaussian blur (sigma = 0 to 3.0) | 800 |
| 12 | PALM-Training1600-overturning-dimming-imgaug2 | PALM-Training800-overturning-dimming dataset + multiple mixed random overlay | 1600 |
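The single-method training sets (Nos. 1–6) and the randomly stacked ones (Nos. 7–10) can be sketched as composable image transforms. A hedged NumPy illustration, assuming images are float arrays in [0, 1]; the parameter values (noise sigma, brightness range, crop size) are illustrative, not those used in the paper:

```python
import numpy as np

def random_flip(img, rng):
    # Flip vertically or horizontally at random (cf. set No. 1).
    return np.flip(img, axis=int(rng.integers(0, 2)))

def gaussian_noise(img, rng, sigma=0.05):
    # Add Gaussian white noise (cf. set No. 2).
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def color_jitter(img, rng, lo=0.8, hi=1.2):
    # Scale brightness as a simple stand-in for randomly changing
    # brightness/contrast/saturation (cf. set No. 3).
    return np.clip(img * rng.uniform(lo, hi), 0.0, 1.0)

def random_crop(img, rng, size=48):
    # Random crop (cf. set No. 4).
    h, w = img.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return img[top:top + size, left:left + size]

def stack_randomly(img, rng, ops):
    # Randomly superimpose a subset of transforms, as in sets Nos. 7-10.
    for op in ops:
        if rng.random() < 0.5:
            img = op(img, rng)
    return img

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
augmented = stack_randomly(img, rng, [random_flip, gaussian_noise, color_jitter])
crop = random_crop(img, rng)
```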
Figure 4 DA effects (imagex denotes different images in the original dataset; (r), (s), and (t) were derived from (q)). (a) Original dataset image1. (b) Randomly change direction. (c) Original dataset image2. (d) Randomly add Gaussian noise. (e) Original dataset image3. (f) Randomly change color. (g) Original dataset image4. (h) Random stretching. (i) Original dataset image5. (j) Randomly adjust sharpness. (k) Original dataset image6. (l) Randomly flip, adjust colors, and add Gaussian noise. (m) Original dataset image7. (n) Randomly flip, adjust colors, and add Gaussian noise. (o) Original dataset image8. (p) Random stretching and cropping. (q) Original dataset image9. (r) Random cropping with 0–50 pixels around, 50% probability horizontal flip, and Gaussian blur (sigma = 0 to 3.0). (s) Original dataset image. (t) Randomly stack all operations.
Algorithm 1 Logic diagram of the model fusion algorithm.
Figure 5 Logic diagram of the fusion model.
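The fusion step (Algorithm 1, Figure 5) combines the primary learners' predictions by voting. A minimal per-sample majority-vote sketch; the learner predictions below are made up for illustration:

```python
import numpy as np

def majority_vote(predictions):
    """Fuse classifiers by per-sample majority vote.
    predictions: one array of predicted class labels per model."""
    stacked = np.stack(predictions)          # shape: (n_models, n_samples)

    def winner(column):
        labels, counts = np.unique(column, return_counts=True)
        return labels[np.argmax(counts)]

    return np.apply_along_axis(winner, 0, stacked)

# Hypothetical predictions from three primary learners on four samples.
fused = majority_vote([
    np.array([0, 1, 1, 0]),   # e.g. AlexNet
    np.array([0, 0, 1, 1]),   # e.g. GoogLeNet
    np.array([1, 1, 1, 1]),   # e.g. ResNet-50
])
# fused -> [0, 1, 1, 1]
```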
Figure 6 Accuracy of VGG-16 over 30 epochs of training on the 13 datasets (curve 2: original dataset; curves 1 and 3–13: DAMF datasets).
VGG-16 training results on 13 datasets.
| No. | Dataset | Accuracy | Loss |
|---|---|---|---|
| 1 | PALM-Training1600-overturning-dimming-imgaug2 | 0.95858336 | 0.18674079 |
| 2 | PALM-Training3200-overturning-noise-color-cropping-deforming-dimming | 0.95550001 | 0.27185006 |
| 3 | PALM-Training1600-overturning-cropping-deforming | 0.95266668 | 0.16523545 |
| 4 | PALM-Training800-color | 0.95033336 | 0.17019135 |
| 5 | PALM-Training800-dimming | 0.94875002 | 0.17919912 |
| 6 | PALM-Training800-cropping | 0.94625 | 0.18303553 |
| 7 | PALM-Training1600-overturning-dimming | 0.94525003 | 0.23124305 |
| 8 | PALM-Training1600-overturning-noise-color | 0.94008333 | 0.21350351 |
| 9 | PALM-Training800-overturning | 0.93858335 | 0.20894363 |
| 10 | PALM-Training800-deforming | 0.93708334 | 0.20814224 |
| 11 | PALM-Training800-noise | 0.93608335 | 0.26124661 |
| 12 | PALM-Training800-imgaug1 | 0.93391667 | 0.19876853 |
| 13 | PALM-Training800 (original dataset) |  | 0.19310093 |
Training results of VGG-16, AlexNet, GoogLeNet, and ResNet-50 on the four filtered datasets (D1 = PALM-Training800-color; D2 = PALM-Training1600-overturning-cropping-deforming; D3 = PALM-Training3200-overturning-noise-color-cropping-deforming-dimming; D4 = PALM-Training1600-overturning-dimming-imgaug2).
| Primary learner | D1 accuracy | D1 loss | D2 accuracy | D2 loss | D3 accuracy | D3 loss | D4 accuracy | D4 loss |
|---|---|---|---|---|---|---|---|---|
| AlexNet | 0.946083 | 0.162323 | 0.950833 | 0.160698 | 0.954667 | 0.197096 | 0.957583 | 0.157773 |
| GoogLeNet | 0.909 | 0.203461 | 0.9395 | 0.169759 | 0.962417 | 0.13617727 | 0.947917 | 0.170418 |
| ResNet-50 | 0.9395 | 0.219571 | 0.953583 | 0.151836 | 0.955917 | 0.156737 | 0.95125 | 0.166358 |
| VGG-16 | 0.9503 | 0.17019 | 0.95267 | 0.16523 | 0.9555 | 0.27185 | 0.95858 | 0.18674 |
Figure 7 Loss of VGG-16 over 30 epochs of training on the 13 datasets.
Figure 8 Logic diagram of the fusion model.
Figure 9 Accuracy and loss rate of the three primary learners during training. The four best training results of (a) AlexNet; (b) GoogLeNet; (c) ResNet-50.
Figure 10 Distribution of accuracy and loss rate of VGG-16 under different MF.
Results on the iChallenge-PM dataset.
| Method | Accuracy (%) | Strategy |
|---|---|---|
| Siying Dai [ | 81.82 | Optimize network structure + DA |
| InstDis [ | 95.32 | Optimize network structure + DA |
| Contrastive [ | 96.94 | Optimize network structure + DA |
| Invariant [ | 97.30 | Optimize network structure + DA |
| Xiaomeng Li [ | 98.65 | Optimize network structure + DA |
| DAMF (ours) |  | Optimize network structure + DA |