Purpose: Heatmapping techniques can support explainability of deep learning (DL) predictions in medical image analysis. However, individual techniques have mainly been applied descriptively, without objective and systematic evaluation. We compared their performance using diabetic retinopathy lesion detection as a benchmark task.
Methods: The publicly available Indian Diabetic Retinopathy Image Dataset (IDRiD) contains fundus images of patients with diabetes, with pixel-level annotations of diabetic retinopathy (DR) lesions that served as the ground truth for this study. Three pretrained DL models (ResNet50, VGG16, and InceptionV3) were used for DR detection in these images. Next, explainability was visualized with each of the 10 most commonly used heatmapping techniques. The quantitative correspondence between a heatmap's output and the ground truth was evaluated with the Explainability Consistency Score (ECS), a metric between 0 and 1 developed for this comparative task.
Results: For overall DR lesion detection, the ECS ranged from 0.21 to 0.51 across all model/heatmapping combinations. The highest score was for VGG16+Grad-CAM (ECS = 0.51; 95% confidence interval [CI], 0.46 to 0.55). For individual lesions, VGG16+Grad-CAM performed best on hemorrhages and hard exudates, ResNet50+SmoothGrad performed best on soft exudates, and ResNet50+Guided Backpropagation performed best on microaneurysms.
Conclusions: Our empirical evaluation on the IDRiD database demonstrated that the combination of DL model and heatmapping technique affects explainability when considering common DR lesions. Our approach found considerable disagreement between the regions highlighted by heatmaps and expert annotations.
Translational Relevance: Our findings warrant a more systematic investigation and analysis of heatmaps for reliable explanation of image-based predictions of deep learning models.
Copyright 2020 The Authors.
Keywords: deep learning; diabetic retinopathy; explainability; heatmap
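The abstract describes scoring how well a heatmap corresponds to pixel-level lesion annotations, but it does not give the ECS formula. The sketch below is therefore not the ECS. It is a hypothetical stand-in that illustrates the general idea of such a comparison: keep the highest-valued fraction of heatmap pixels and compute their intersection-over-union with the annotated lesion mask, which also yields a value between 0 and 1. The function names, the `fraction` parameter, and the thresholding rule are all assumptions made for illustration.

```python
def top_fraction_mask(heatmap, fraction=0.1):
    """Mark the highest-valued `fraction` of pixels as highlighted.

    Assumption for illustration: ties at the threshold are all kept,
    so slightly more than `fraction` of the pixels may be marked.
    """
    values = sorted((v for row in heatmap for v in row), reverse=True)
    k = max(1, round(fraction * len(values)))
    threshold = values[k - 1]
    return [[v >= threshold for v in row] for row in heatmap]


def overlap_score(heatmap, lesion_mask, fraction=0.1):
    """Intersection-over-union of highlighted pixels and annotated pixels.

    Hypothetical stand-in for a heatmap/annotation agreement score;
    it is NOT the paper's Explainability Consistency Score.
    """
    highlighted = top_fraction_mask(heatmap, fraction)
    intersection = 0
    union = 0
    for h_row, m_row in zip(highlighted, lesion_mask):
        for h, m in zip(h_row, m_row):
            annotated = bool(m)
            intersection += int(h and annotated)
            union += int(h or annotated)
    return intersection / union if union else 0.0


# Toy 3x3 example: the two hottest pixels coincide with the annotated lesion.
heatmap = [
    [0.9, 0.8, 0.1],
    [0.2, 0.1, 0.0],
    [0.0, 0.3, 0.7],
]
lesion_mask = [
    [1, 1, 0],
    [0, 0, 0],
    [0, 0, 0],
]
print(overlap_score(heatmap, lesion_mask, fraction=2 / 9))
```

A score near 1 means the highlighted region and the annotation largely coincide, and a score near 0 means they barely overlap. Any real evaluation would need the metric as defined by the authors.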