Travis K. Redd, N. Venkatesh Prajna, Muthiah Srinivasan, Prajna Lalitha, Tiru Krishnan, Revathi Rajaraman, Anitha Venugopal, Nisha Acharya, Gerami D. Seitzman, Thomas M. Lietman, Jeremy D. Keenan, J. Peter Campbell, Xubo Song.
Abstract
Purpose: Develop computer vision models for image-based differentiation of bacterial and fungal corneal ulcers and compare their performance against human experts.
Design: Cross-sectional comparison of diagnostic performance.
Participants: Patients with acute, culture-proven bacterial or fungal keratitis from 4 centers in South India.
Keywords: AUC, area under the receiver operating characteristic curve; Artificial intelligence; Bacterial keratitis; CI, confidence interval; CNN, convolutional neural network; Computer vision; Convolutional neural networks; Corneal ulcer; Deep learning; Fungal keratitis; Infectious keratitis; MUTT, Mycotic Ulcer Treatment Trials; SCUT, Steroids for Corneal Ulcers Trial
Year: 2022 PMID: 36249698 PMCID: PMC9560557 DOI: 10.1016/j.xops.2022.100119
Source DB: PubMed Journal: Ophthalmol Sci ISSN: 2666-9145
Figure 1. Receiver operating characteristic curves of 5 deep convolutional neural network (CNN) models on a single-center external testing set consisting of 100 corneal ulcer images (50% fungal, 50% bacterial) from Coimbatore. The performance of the CNN ensemble is also depicted, representing the average output of all 5 models for each image. MobileNet demonstrated the highest performance among the model architectures tested. Ninety-five percent confidence intervals are depicted next to each area under the receiver operating characteristic curve (AUC) estimate.
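The CNN ensemble described in Figure 1 is the average of the 5 models' per-image outputs. A minimal sketch of that averaging step (the model names and probability values below are hypothetical, not taken from the study's data):

```python
import numpy as np

# Hypothetical P(fungal) outputs from each of the 5 CNN architectures for 4 test images
model_probs = {
    "mobilenet": [0.90, 0.20, 0.60, 0.10],
    "densenet":  [0.80, 0.30, 0.70, 0.20],
    "resnet":    [0.70, 0.40, 0.50, 0.30],
    "vgg":       [0.60, 0.10, 0.80, 0.40],
    "xception":  [0.90, 0.50, 0.40, 0.10],
}

# Ensemble prediction: the average output of all 5 models for each image
ensemble = np.mean(np.array(list(model_probs.values())), axis=0)
print(ensemble)  # one averaged P(fungal) per image
```

Averaging probabilities (rather than majority-voting hard labels) keeps the ensemble output continuous, which is what allows an ROC curve to be drawn for the ensemble itself.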
Figure 2. Receiver operating characteristic curves of the 5 convolutional neural network (CNN) models and 12 human graders on a multicenter testing set consisting of 80 corneal ulcer images (48 bacterial, 32 fungal). The CNN ensemble demonstrated a statistically significantly higher area under the receiver operating characteristic curve (AUC; 0.84) than the human ensemble (AUC, 0.76; P < 0.01). Ninety-five percent confidence intervals are depicted next to each AUC estimate.
Convolutional Neural Network and Human Grader Performance on External Test Sets
| Classifier | Single-Center Test Set (Coimbatore), AUC (95% CI) | Multicenter Test Set, AUC (95% CI) |
|---|---|---|
| **CNNs** | | |
| MobileNet | 0.86 (0.78-0.93) | 0.83 (0.74-0.92) |
| DenseNet | 0.84 (0.76-0.92) | 0.83 (0.74-0.92) |
| ResNet | 0.76 (0.67-0.85) | 0.82 (0.72-0.91) |
| VGG | 0.74 (0.64-0.84) | 0.75 (0.64-0.86) |
| Xception | 0.68 (0.57-0.78) | 0.75 (0.64-0.86) |
| CNN Ensemble | | |
| **Human Graders** | | |
| Grader 1 | — | 0.79 (0.69-0.89) |
| Grader 2 | — | 0.78 (0.68-0.88) |
| Grader 3 | — | 0.76 (0.65-0.87) |
| Grader 4 | — | 0.73 (0.61-0.83) |
| Grader 5 | — | 0.70 (0.58-0.81) |
| Grader 6 | — | 0.69 (0.57-0.81) |
| Grader 7 | — | 0.67 (0.20-1.00) |
| Grader 8 | — | 0.65 (0.52-0.77) |
| Grader 9 | — | 0.64 (0.41-0.87) |
| Grader 10 | — | 0.61 (0.48-0.73) |
| Grader 11 | — | 0.57 (0.45-0.69) |
| Grader 12 | — | 0.42 (0.05-0.95) |
| Human Grader Ensemble | | |
CNN = convolutional neural network; — = not available.
Boldface values indicate ensemble rather than individual performance. Data are presented as area under the receiver operating characteristic curve (95% confidence interval).
P < 0.01 (DeLong method).
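The DeLong method cited in the footnote compares two correlated AUCs measured on the same test set. An illustrative implementation based on DeLong's placement values (structural components) is sketched below; this is not the study's code, and the sample data in the test are hypothetical:

```python
import math
import numpy as np

def placement_components(y_true, scores):
    """Return the AUC and DeLong structural components V10 (per positive) and V01 (per negative)."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    # psi(x, y) = 1 if x > y, 0.5 if tied, 0 otherwise, over all positive/negative pairs
    psi = (pos[:, None] > neg[None, :]).astype(float) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(), psi.mean(axis=1), psi.mean(axis=0)

def delong_test(y_true, scores_a, scores_b):
    """Two-sided z-test for the difference of two correlated AUCs (DeLong et al., 1988)."""
    auc_a, v10_a, v01_a = placement_components(y_true, scores_a)
    auc_b, v10_b, v01_b = placement_components(y_true, scores_b)
    m, n = len(v10_a), len(v01_a)
    var_a = v10_a.var(ddof=1) / m + v01_a.var(ddof=1) / n
    var_b = v10_b.var(ddof=1) / m + v01_b.var(ddof=1) / n
    cov = np.cov(v10_a, v10_b, ddof=1)[0, 1] / m + np.cov(v01_a, v01_b, ddof=1)[0, 1] / n
    denom = var_a + var_b - 2 * cov
    if denom <= 0:  # identical or degenerate score vectors
        return auc_a, auc_b, 1.0
    z = (auc_a - auc_b) / math.sqrt(denom)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under a standard normal
    return auc_a, auc_b, p
```

The covariance term is what distinguishes this from a naive two-sample test: both classifiers are scored on the same images, so their AUC estimates are correlated.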
Figure 3. Confusion matrices of the convolutional neural network (CNN) and human ensembles, with prediction categories (bacterial or fungal) assigned according to the threshold defined by Youden's index. Percent values indicate column proportions. For example, the CNN ensemble showed 81% accuracy for identifying fungal infections and 75% accuracy for bacterial infections.
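Youden's index, used in Figure 3 to binarize the continuous ensemble outputs, selects the cutoff that maximizes sensitivity + specificity − 1 over all candidate thresholds. A minimal sketch (labels and scores in the test are hypothetical):

```python
import numpy as np

def youden_threshold(y_true, scores):
    """Return the score cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    best_t, best_j = None, -1.0
    for t in np.unique(scores):          # each observed score is a candidate cutoff
        pred = scores >= t
        tp = np.sum(pred & (y_true == 1))
        fn = np.sum(~pred & (y_true == 1))
        tn = np.sum(~pred & (y_true == 0))
        fp = np.sum(pred & (y_true == 0))
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_j, best_t = j, t
    return best_t, best_j
```

Because J weights sensitivity and specificity equally, this cutoff is threshold-agnostic with respect to class prevalence, which matters here since the test set is not balanced (48 bacterial, 32 fungal).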
Figure 4. Receiver operating characteristic curves of the best-performing convolutional neural network (CNN) (MobileNet), best-performing human (grader 1), and the ensemble of the 2 (CNN plus human ensemble). The area under the receiver operating characteristic curve (AUC) of the ensemble was 0.87, compared with an AUC of 0.79 for grader 1 (P = 0.09) and an AUC of 0.83 for MobileNet (P = 0.17). Ninety-five percent confidence intervals are depicted next to each AUC estimate.
Figure 5. Representative gradient class activation heatmaps of the images on which the best-performing convolutional neural network model (MobileNet) achieved highest agreement (top 5 fungal and top 5 bacterial) between model prediction and ground truth and lowest agreement (bottom 5 fungal and bottom 5 bacterial). Red coloration indicates regions of the input image that conferred the highest influence on the model's prediction. Superimposed on each image is the percent agreement between the model's prediction and the ground truth (for fungal images, P(fungal) × 100%; for bacterial images, [1 − P(fungal)] × 100%). Adjacent to each heatmap is the raw image input to the model. The model tended to perform well when focusing attention on the corneal infiltrate, with relatively worse performance when areas of attention strayed from the cornea or when tested on images with quality limitations including overexposure (O), underexposure (U), or eccentric fixation (E).
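The gradient class activation heatmaps in Figure 5 follow the standard Grad-CAM recipe: channel weights are the spatially averaged gradients of the class score with respect to a convolutional feature map, and the heatmap is the ReLU of the weighted channel sum. A framework-agnostic sketch of that combination step, assuming the activations and gradients have already been extracted from the network (the study's actual extraction pipeline is not shown here):

```python
import numpy as np

def grad_cam(activations, gradients):
    """Combine conv activations (H, W, K) and gradients dScore/dA (H, W, K) into a heatmap.

    Returns an (H, W) map normalized to [0, 1]; the red regions in Figure 5
    correspond to the largest values after upsampling to the input image size.
    """
    weights = gradients.mean(axis=(0, 1))                        # global-average-pool the gradients
    cam = np.maximum((activations * weights).sum(axis=-1), 0.0)  # ReLU of weighted channel sum
    peak = cam.max()
    return cam / peak if peak > 0 else cam
```

The ReLU step is what restricts the map to regions that push the prediction *toward* the class of interest, which is why low-agreement images in Figure 5 show attention drifting away from the corneal infiltrate rather than negative evidence within it.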