Amitojdeep Singh¹,², Janarthanam Jothi Balaji³, Mohammed Abdul Rasheed¹, Varadharajan Jayakumar¹, Rajiv Raman⁴, Vasudevan Lakshminarayanan¹,².
Abstract
BACKGROUND: The lack of explanations for the decisions made by deep learning algorithms has hampered their acceptance by the clinical community despite highly accurate results on multiple problems. Attribution methods explaining deep learning models have been tested on medical imaging problems. The performance of various attribution methods has been compared for models trained on standard machine learning datasets but not on medical images. In this study, we performed a comparative analysis to determine the method with the best explanations for retinal OCT diagnosis.
Keywords: choroidal neovascularization; deep learning; diabetic macular edema; drusen; explainable AI; image processing; machine learning; optical coherence tomography; retina
Year: 2021 PMID: 34177258 PMCID: PMC8219310 DOI: 10.2147/OPTH.S312236
Source DB: PubMed Journal: Clin Ophthalmol ISSN: 1177-5467
Confusion Matrix for the Model on the Test Set of 1000 Images
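| True \ Predicted | CNV | DME | Drusen | Normal | Total |
|---|---|---|---|---|---|
| CNV | 249 | 0 | 1 | 0 | 250 |
| DME | 1 | 249 | 0 | 0 | 250 |
| Drusen | 3 | 0 | 247 | 0 | 250 |
| Normal | 0 | 0 | 2 | 248 | 250 |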
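The headline figures implied by the confusion matrix can be recomputed directly. The minimal Python sketch below assumes that rows are true classes and columns are predicted classes (inferred from every row summing to 250, the per-class test size) and keeps the class order of the table. It is an illustration, not the authors' evaluation code.

```python
# Confusion matrix copied from the table (assumed orientation:
# rows = true class, columns = predicted class, same class order as the table).
CONFUSION = [
    [249, 0, 1, 0],
    [1, 249, 0, 0],
    [3, 0, 247, 0],
    [0, 0, 2, 248],
]


def accuracy(cm):
    """Fraction of all images on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def recall(cm):
    """Per-class recall (sensitivity): diagonal entry / row total."""
    return [cm[i][i] / sum(cm[i]) for i in range(len(cm))]


def precision(cm):
    """Per-class precision: diagonal entry / column total."""
    n = len(cm)
    col_totals = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    return [cm[c][c] / col_totals[c] for c in range(n)]


print("accuracy :", accuracy(CONFUSION))  # prints 0.993
print("recall   :", [round(v, 3) for v in recall(CONFUSION)])
print("precision:", [round(v, 3) for v in precision(CONFUSION)])
```

Under that reading, 993 of the 1000 test images fall on the diagonal (accuracy 0.993). The third row has the lowest recall (247 of 250), and all three of its errors fall in the first column.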
Figure 1. Heatmaps for scans with the larger pathologies: (top) choroidal neovascularization (CNV) and (bottom) diabetic macular edema (DME). For each case, Row 1: input image, DeConvNet, Deep Taylor, DeepLIFT. Row 2: Gradient, GBP, Input times gradient, IG. Row 3: LRP – EPS, LRP – Z, Occlusion, Salience. Row 4: SHAP Random, SHAP Selected, SmoothGrad. The scale in the bottom right shows that regions highlighted in magenta provide positive evidence for the presence of disease, while those in blue provide negative evidence, indicating that the image is closer to normal. Deep Taylor and GBP perform best, and SHAP highlights partial but precise regions. The better-performing methods highlighted the fluid accumulation in CNV and the edges of the edema in DME.
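The magenta/blue convention reflects signed attributions: a positive value means a pixel pushes the class score up, a negative value that it pushes the score down. As a rough illustration of how some of the gradient-based methods listed in Figure 1 produce such signs, the sketch below applies Gradient, Input times gradient, Integrated Gradients (IG) and SmoothGrad to a hypothetical four-pixel logistic model with an analytic gradient. The model, its weights, the zero baseline, the step count and the noise level are all assumptions chosen for the example; they are not the network or settings used in the study.

```python
import numpy as np

# Hypothetical toy "model": a logistic score over a flattened 4-pixel input.
# It stands in for a CNN class score; the weights are illustrative only.
W = np.array([1.5, -2.0, 0.5, 0.0])
B = -0.25


def score(x):
    return 1.0 / (1.0 + np.exp(-(W @ x + B)))


def gradient(x):
    # d score / d x for the logistic model: s * (1 - s) * W
    s = score(x)
    return s * (1.0 - s) * W


def input_times_gradient(x):
    return x * gradient(x)


def integrated_gradients(x, baseline=None, steps=200):
    # Midpoint Riemann approximation of the path integral from baseline to x.
    if baseline is None:
        baseline = np.zeros_like(x)
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([gradient(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)


def smoothgrad(x, sigma=0.1, samples=50, seed=0):
    # Average the gradient over Gaussian-noised copies of the input.
    rng = np.random.default_rng(seed)
    noisy = x + rng.normal(0.0, sigma, size=(samples, x.size))
    return np.mean([gradient(n) for n in noisy], axis=0)


x = np.array([0.8, 0.6, 0.9, 0.3])
ig = integrated_gradients(x)
print("gradient        :", np.round(gradient(x), 3))
print("input x gradient:", np.round(input_times_gradient(x), 3))
print("IG              :", np.round(ig, 3))
print("SmoothGrad      :", np.round(smoothgrad(x), 3))
# IG completeness: attributions sum to score(x) - score(baseline).
print("IG sum vs score gap:", round(ig.sum(), 4), round(score(x) - score(np.zeros_like(x)), 4))
```

The second pixel has a negative weight, so all four methods assign it negative evidence (the blue end of the scale), and the fourth pixel, which the model ignores, receives zero attribution. The last line checks IG's completeness property: up to the Riemann-sum error, its attributions sum to the difference between the score at the input and at the baseline.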
Figure 2. Heatmaps for two scans with drusen, the smaller pathology. Top: correct diagnosis; bottom: incorrect diagnosis. The pathological structures are smaller than those in CNV and DME, so most methods also highlight regions outside them. SHAP is the most precise here. In the incorrect case there is more negative evidence (blue), especially with occlusion. The methods can be compared by how well they positively highlight the bumpy retinal pigment epithelium (RPE).
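Occlusion, noted in Figure 2 for its strong negative evidence, is a perturbation method rather than a gradient method: it hides one region at a time and records how the class score changes. A minimal NumPy sketch follows. The patch size, stride, fill value and the toy scoring function are hypothetical choices for illustration, not the configuration used in the study.

```python
import numpy as np


def occlusion_map(image, score_fn, patch=2, stride=2, fill=0.0):
    """Occlusion attribution for a 2-D image.

    Slides a patch x patch square filled with `fill` over the image and assigns
    each covered pixel the resulting drop in score (averaged over overlapping
    windows). Positive values: hiding the region lowered the score (positive
    evidence). Negative values: hiding it raised the score (negative evidence).
    """
    h, w = image.shape
    base = score_fn(image)
    heat = np.zeros((h, w))
    counts = np.zeros((h, w))
    for r in range(0, h - patch + 1, stride):
        for c in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[r:r + patch, c:c + patch] = fill
            heat[r:r + patch, c:c + patch] += base - score_fn(occluded)
            counts[r:r + patch, c:c + patch] += 1
    return heat / np.maximum(counts, 1)  # pixels never covered stay 0


# Hypothetical toy score: brightness in the top-left block raises it,
# brightness in the bottom-right block lowers it.
WEIGHTS = np.zeros((4, 4))
WEIGHTS[:2, :2] = 1.0
WEIGHTS[2:, 2:] = -1.0


def toy_score(img):
    return float((WEIGHTS * img).sum())


heat = occlusion_map(np.ones((4, 4)), toy_score)
print(heat)
```

In this toy case, hiding the top-left block lowers the score, so that block receives positive attribution (4.0). Hiding the bottom-right block raises the score, so it receives negative attribution (-4.0), the kind of region that appears blue in the heatmaps.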
Figure 3. Violin plots of the normalized ratings of all methods. The width of each plot shows the probability density of the data, and the median value is reported above each plot. Deep Taylor was rated highest overall, followed by GBP and SHAP.
Median Ratings (with IQR) for Each Disease for All Attribution Methods. Deep Taylor (Bold) Had the Highest Ratings
| Method | CNV | DME | Drusen | All |
|---|---|---|---|---|
| DcNet | 2.17 (1.71–2.61) | 2.47 (1.74–3.09) | 2.32 (1.71–2.61) | 2.32 (1.71–2.82) |
| DLift-Res | 2.44 (1.85–2.72) | 2.44 (1.96–2.53) | 2.53 (2.32–3.09) | 2.47 (2.06–2.82) |
| Grad | 2.32 (1.77–2.53) | 2.47 (2.19–2.95) | 2.44 (2.03–2.61) | 2.44 (1.96–2.72) |
| GBP | 3.23 (3.09–3.80) | 3.26 (3.07–3.80) | 3.71 (3.22–3.99) | 3.29 (3.09–3.97) |
| I*Grad | 2.50 (2.32–2.95) | 2.47 (2.28–2.82) | 2.53 (2.44–3.04) | 2.50 (2.32–2.95) |
| IG | 2.50 (2.32–2.95) | 2.47 (2.19–2.82) | 2.57 (2.44–3.20) | 2.50 (2.32–2.95) |
| LRP.E | 2.50 (2.32–2.95) | 2.50 (2.32–2.95) | 2.53 (2.41–3.04) | 2.50 (2.32–2.95) |
| LRP.Z | 2.50 (2.32–2.95) | 2.50 (2.32–2.95) | 2.53 (2.41–3.04) | 2.50 (2.32–2.95) |
| Occ64 | 1.71 (1.55–1.96) | 1.71 (1.42–1.85) | 1.71 (1.42–1.96) | 1.71 (1.52–1.96) |
| Saliency | 2.47 (1.74–3.29) | 2.72 (1.74–3.29) | 2.61 (1.74–3.29) | 2.61 (1.74–3.29) |
| SHAP-R | 3.23 (2.53–3.85) | 3.23 (2.53–3.85) | 3.58 (2.89–3.96) | 3.23 (2.53–3.85) |
| SHAP-S | 3.23 (2.53–3.85) | 3.23 (2.53–3.85) | 3.53 (2.61–3.96) | 3.26 (2.53–3.96) |
| SmoothGrad | 2.45 (1.85–2.95) | 2.47 (1.96–3.09) | 2.47 (1.85–3.04) | 2.47 (1.93–3.04) |
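Each cell in the table is a median with its interquartile range (IQR). The short sketch below shows one way to compute that summary with Python's standard `statistics` module. The ratings are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption: IQR conventions differ between tools, and the one used for the table is not stated here.

```python
import statistics


def median_iqr(ratings):
    """Return (median, first quartile, third quartile) of a list of ratings."""
    median = statistics.median(ratings)
    # quantiles(n=4) returns the three quartile cut points; the "inclusive"
    # interpolation method is an assumed convention, not the paper's.
    q1, _, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
    return median, q1, q3


# Hypothetical ratings of one method by several graders (illustrative only).
ratings = [2, 3, 3, 4, 4, 4, 5, 5]
med, q1, q3 = median_iqr(ratings)
print(f"{med:.2f} ({q1:.2f}–{q3:.2f})")  # prints 4.00 (3.00–4.25)
```

Switching to the default `method="exclusive"` can shift the quartiles for small samples, which is one reason IQRs reported by different tools may not match exactly.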
Figure 4. Spearman correlation of the clinicians’ ratings.
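Spearman's rank correlation is Pearson's correlation computed on ranks, so it measures how consistently two sets of ratings order the same items, regardless of their scale. The pure-Python sketch below assigns average ranks to ties. The two graders' scores are hypothetical, and this record does not specify exactly which pairs of ratings Figure 4 correlates. `scipy.stats.spearmanr` provides an equivalent library-backed computation.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores given by two graders to the same five heatmaps.
grader_a = [4, 3, 5, 2, 1]
grader_b = [5, 3, 4, 2, 1]
print(round(spearman(grader_a, grader_b), 3))  # prints 0.9
```

Without ties this agrees with the textbook shortcut 1 - 6Σd²/(n(n² - 1)): here the rank differences are (-1, 0, 1, 0, 0), so Σd² = 2 and rho = 1 - 12/120 = 0.9.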