Qitong Gao, Joshua Amason, Scott Cousins, Miroslav Pajic, Majda Hadziahmetovic.
Abstract
Purpose: This study aims to meet a growing need for a fully automated, learning-based interpretation tool for retinal images obtained remotely (e.g., teleophthalmology) through different imaging modalities that may include imperfect (uninterpretable) images.
Year: 2021 PMID: 34036304 PMCID: PMC8161696 DOI: 10.1167/tvst.10.6.30
Source DB: PubMed Journal: Transl Vis Sci Technol ISSN: 2164-2591 Impact factor: 3.283
Distribution of the Original Dataset, Augmented Training Set, and Testing Dataset
By Modality

| | OCT: Retinal Pathology Negative | OCT: Retinal Pathology Positive | OCT: Uninterpretable | CFP: Retinal Pathology Negative | CFP: Retinal Pathology Positive | CFP: Uninterpretable |
|---|---|---|---|---|---|---|
| Original | 982 | 164 | 2 | 952 | 125 | 71 |
| Augmented training set | 3189 | 1736 | 14 | 2839 | 1302 | 798 |
| Testing set | 73 | 40 | 1 | 68 | 32 | 14 |

By Eye

| | Retinal Pathology Negative | Retinal Pathology Positive | Total |
|---|---|---|---|
| Original | 924 | 224 | 1148 |
| Augmented training set | 2601 | 2338 | 4939 |
| Testing set | 57 | 57 | 114 |
Figure 1. Overview of the proposed CNN model design methodology. The OCT and CFP images obtained from the automated screening system were first labeled separately by experts (step I), and the individual diagnoses were used to generate training labels according to the Label Consensus Mechanism (step II). The two types of images were augmented and pre-processed to constitute the inputs to the CNN (step III), before being used, along with the obtained labels, for CNN training (step IV).
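The Label Consensus Mechanism in step II combines the individual expert diagnoses into one training label per image. This excerpt does not spell out the rule, so the sketch below is a generic majority vote with a positive-leaning tie-break; the function name and the tie-break rule are illustrative assumptions, not the paper's exact mechanism.

```python
from collections import Counter

def consensus_label(expert_labels):
    """Combine per-expert diagnoses into one training label by majority vote.

    `expert_labels` is a list of per-expert strings such as "negative",
    "positive", or "uninterpretable". Ties are resolved conservatively in
    favor of "positive" to keep false negatives low (illustrative rule
    only; the paper's actual Label Consensus Mechanism may differ).
    """
    counts = Counter(expert_labels).most_common()
    best, best_n = counts[0]
    # Collect all labels that share the top count, then break ties.
    tied = [label for label, n in counts if n == best_n]
    if len(tied) > 1 and "positive" in tied:
        return "positive"
    return best
```

For example, `consensus_label(["positive", "negative", "positive"])` returns `"positive"` by simple majority, while a one-one split between "negative" and "positive" also resolves to `"positive"` under the conservative tie-break.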
Figure 2. The architecture of the proposed CNN model with Class Activation Mapping (CAM). The OCT and CFP modalities are first processed by two separate sets of convolutional filters; the resulting features are then concatenated and passed through a fully connected layer (θ3) for classification. CAMs are generated using the outputs of the two global average pooling layers and the weights of the fully connected layer.
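The CAM construction described in the caption weights each feature map from the last convolutional layer by the fully connected layer's weight for the class of interest, then sums across channels. Below is a minimal single-stream NumPy sketch of that idea; in the paper's two-stream model, the same computation would be applied per modality using the corresponding slice of the θ3 weights. Names and the min-max normalization are assumptions for illustration.

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Compute a Class Activation Map from pre-pooling conv features.

    feature_maps : (C, H, W) activations from the last conv layer
                   (the input to global average pooling).
    fc_weights   : (num_classes, C) weights of the fully connected layer
                   that follows global average pooling.
    class_idx    : index of the class whose activation map is requested.

    Returns an (H, W) map: each channel map weighted by the FC weight
    connecting that channel to `class_idx`, summed, then min-max
    normalized to [0, 1] for visualization.
    """
    w = fc_weights[class_idx]               # (C,)
    cam = np.tensordot(w, feature_maps, 1)  # contract over channels -> (H, W)
    cam -= cam.min()
    if cam.max() > 0:
        cam /= cam.max()
    return cam
```

Because global average pooling is linear, weighting the spatial maps before summation is equivalent to the weighting the classifier applies to the pooled features, which is what lets the CAM localize the evidence for a class.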
Performance Comparison Among Our Approach, Baseline A, Baseline B, and Baseline C on the Full Testing Dataset
| | Accuracy/No. (%, 95% CI) | FNR/No. (%, 95% CI) | Recall/No. (%, 95% CI) | Specificity/No. (%, 95% CI) | AUC % (95% CI) | P Value |
|---|---|---|---|---|---|---|
| Baseline A | 93 (81.58%, 74.46%–88.70%) | 19 (33.33%, 24.68%–41.99%) | 38 (66.67%, 58.01%–75.32%) | | 83.58% (76.11%–91.05%) | <0.001 |
| Baseline B | 81 (71.05%, 62.73%–79.38%) | 23 (40.35%, 31.34%–49.36%) | 34 (59.65%, 50.64%–68.66%) | 47 (82.46%, 75.47%–89.44%) | 74.05% (64.96%–83.14%) | <0.001 |
| Baseline C | 91 (79.82%, 72.46%–87.19%) | | 40 (70.18%, 61.78%–78.57%) | | 87.32% (80.71%–93.93%) | <0.001 |
Performance comparison among our approach (alternate gradient descent with binary output), baseline A (two single-modal CNNs posing a 3-output task), baseline B (interpretability classifiers followed by two single-modal CNNs posing a 2-output task), and baseline C (two-stream CNNs representing state-of-the-art methods for two-modal image analysis) on the full testing dataset.†
Statistics in italics indicate better performance achieved by a baseline than by our approach; these cases are discussed in detail in the Results section. CIs for accuracy, FNR, recall, and specificity were generated following the Wilson score interval; the CI for AUC was computed following Hanley et al.
P values were generated by performing McNemar's test between the predictions and the labels.
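McNemar's test looks only at the discordant pairs, i.e. cases where the two paired outcomes disagree. A stdlib-only sketch follows, using the identity that the chi-square survival function with one degree of freedom equals erfc(sqrt(x/2)); this is a generic implementation, not the authors' code.

```python
import math

def mcnemar_p(b, c):
    """McNemar's chi-square test (without continuity correction).

    b, c : counts of the two discordant cells of the 2x2 paired table,
           e.g. pairs where one classifier is right and the other wrong.

    Returns the p-value from the chi-square distribution with 1 degree
    of freedom, via chi2_sf(x, df=1) == erfc(sqrt(x / 2)).
    """
    if b + c == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    chi2 = (b - c) ** 2 / (b + c)
    return math.erfc(math.sqrt(chi2 / 2))
```

For small discordant counts an exact binomial version of the test is usually preferred over this asymptotic form.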
Figure 3. Accuracy-false negative rate (ACC-FNR) curve (A) and ROC curve (B) on the testing dataset. (A) ACC-FNR curves for our approach and baseline C. Baseline C has a lower FNR than our approach at the default decision threshold of 0.5; however, our method achieves both higher accuracy and lower FNR at a threshold chosen for an optimal tradeoff between accuracy and FNR (e.g., the threshold of 0.65 shown by the red dot in the plot). (B) ROC curves for our approach and the baseline methods. Our approach achieves the highest AUC of all compared methods.
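The threshold sweep behind an ACC-FNR curve like panel A can be sketched as below; the function and variable names are illustrative assumptions, not the authors' code.

```python
def acc_fnr_curve(scores, labels, thresholds):
    """Sweep decision thresholds and report (threshold, accuracy, FNR).

    scores : per-eye predicted probability of pathology.
    labels : ground truth, 1 = pathology positive, 0 = negative.

    Raising the threshold generally trades recall (higher FNR) for
    specificity; tabulating the tradeoff lets an operating point such
    as 0.65 be chosen instead of the default 0.5.
    """
    out = []
    for t in thresholds:
        preds = [1 if s >= t else 0 for s in scores]
        correct = sum(p == y for p, y in zip(preds, labels))
        positives = sum(labels)
        false_negs = sum(1 for p, y in zip(preds, labels) if y == 1 and p == 0)
        fnr = false_negs / positives if positives else 0.0
        out.append((t, correct / len(labels), fnr))
    return out
```

Choosing the operating point from such a sweep is especially relevant in screening, where a false negative (missed pathology) is costlier than a false positive.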
Performance Comparison Among Our Approach, Baseline A, Baseline B, and Baseline C on the Dataset Containing Only Interpretable Images
| | Accuracy/No. (%, 95% CI) | FNR/No. (%, 95% CI) | Recall/No. (%, 95% CI) | Specificity/No. (%, 95% CI) | AUC % (95% CI) | P Value |
|---|---|---|---|---|---|---|
| Baseline A | 59 (75.64%, 66.11%–85.17%) | 17 (40.48%, 29.58%–51.37%) | 25 (59.52%, 48.63%–70.42%) | 34 (94.44%, 89.36%–99.53%) | 79.89% (71.00%–88.79%) | <0.001 |
| Baseline B | 65 (83.33%, 75.06%–91.60%) | 10 (23.81%, 14.36%–33.26%) | 32 (76.19%, 66.74%–85.64%) | 33 (91.67%, 85.53%–97.80%) | 89.48% (81.88%–97.09%) | <0.001 |
| Baseline C | 68 (87.18%, 79.76%–94.60%) | 6 (14.29%, 6.52%–22.05%) | 36 (85.71%, 77.95%–93.48%) | 32 (88.89%, 81.91%–95.86%) | 90.21% (83.31%–97.12%) | 0.00443 |
Performance comparison among our approach (alternate gradient descent with binary output), baseline A (two single-modal CNNs posing a 3-output task), baseline B (interpretability classifiers followed by two single-modal CNNs posing a 2-output task), and baseline C (two-stream CNNs representing state-of-the-art methods for two-modal image analysis) on the dataset containing only interpretable images.