P Tschandl (1,2), G Argenziano (3), M Razmara (4), J Yap (4)
(1) School of Computing Science, Simon Fraser University, Burnaby, Canada. (2) Department of Dermatology, Medical University of Vienna, Vienna, Austria. (3) Department of Dermatology, University of Campania, Naples, Italy. (4) MetaOptima Technology Inc., Vancouver, BC, Canada.
Abstract
BACKGROUND: Automated classification of medical images through neural networks can reach high accuracy rates but lacks interpretability.
OBJECTIVES: To compare the diagnostic accuracy obtained by using content-based image retrieval (CBIR) to retrieve visually similar dermatoscopic images with corresponding disease labels against predictions made by a neural network.
METHODS: A neural network was trained to predict disease classes on dermatoscopic images from three retrospectively collected image datasets containing 888, 2750 and 16 691 images, respectively. Diagnosis predictions were made based on the most commonly occurring diagnosis in visually similar images, or based on the top-1 class prediction of the softmax output from the network. Outcome measures were area under the receiver operating characteristic curve (AUC) for predicting a malignant lesion, multiclass-accuracy and mean average precision (mAP), measured on unseen test images of the corresponding dataset.
RESULTS: In all three datasets the skin cancer predictions from CBIR (evaluating the 16 most similar images) showed AUC values similar to softmax predictions (0·842, 0·806 and 0·852 vs. 0·830, 0·810 and 0·847, respectively; P > 0·99 for all). Similarly, the multiclass-accuracy of CBIR was comparable with softmax predictions. Compared with softmax predictions, networks trained for detecting only three classes performed better on a dataset with eight classes when using CBIR (mAP 0·184 vs. 0·368 and 0·198 vs. 0·403, respectively).
CONCLUSIONS: Presenting visually similar images based on features from a neural network shows comparable accuracy with the softmax probability-based diagnoses of convolutional neural networks. CBIR may be more helpful than a softmax classifier in improving diagnostic accuracy of clinicians in a routine clinical setting.
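The two prediction routes compared in METHODS can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measure or how a malignancy score was derived from the retrieved images, so cosine similarity on feature vectors and "fraction of malignant neighbours" are assumptions made here, and all function and parameter names (`retrieve_top_k`, `cbir_predict`, `softmax_predict`, `malignant_classes`) are hypothetical. Only the majority vote over retrieved images, the default of 16 retrieved images and the top-1 softmax rule come from the abstract.

```python
from collections import Counter

import numpy as np


def retrieve_top_k(query, gallery, k=16):
    """Return indices of the k gallery rows most similar to the query.

    Assumption: similarity is cosine similarity between feature vectors.
    """
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q
    k = min(k, len(sims))
    # Sort by descending similarity; a stable sort keeps gallery order on ties.
    return np.argsort(-sims, kind="stable")[:k]


def cbir_predict(query, gallery, labels, malignant_classes, k=16):
    """CBIR route: majority vote over the labels of the k retrieved images."""
    idx = retrieve_top_k(query, gallery, k)
    retrieved = [labels[i] for i in idx]
    # Most commonly occurring diagnosis; ties go to the label seen first,
    # i.e. the one attached to the more similar image.
    diagnosis = Counter(retrieved).most_common(1)[0][0]
    # Assumption: malignancy score = share of retrieved images that are malignant.
    malignancy_score = sum(l in malignant_classes for l in retrieved) / len(retrieved)
    return diagnosis, malignancy_score


def softmax_predict(logits, class_names):
    """Softmax route: top-1 class of the softmax output."""
    z = logits - np.max(logits)  # subtract the max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return class_names[int(np.argmax(probs))], probs
```

In this sketch `gallery` would hold feature vectors extracted by the trained network for the labelled training images, and `query` the feature vector of an unseen test image. The malignancy score from `cbir_predict` and the summed softmax probabilities of the malignant classes could then each be fed to an AUC computation.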