Katharina Sies1, Julia K Winkler1, Christine Fink1, Felicitas Bardehle1, Ferdinand Toberer1, Timo Buhl2, Alexander Enk1, Andreas Blum3, Albert Rosenberger4, Holger A Haenssle5. 1. Department of Dermatology, University of Heidelberg, Heidelberg, Germany. 2. Department of Dermatology, University of Göttingen, Göttingen, Germany. 3. Office Based Clinic of Dermatology, Konstanz, Germany. 4. Department of Genetic Epidemiology, University of Goettingen, Goettingen, Germany. 5. Department of Dermatology, University of Heidelberg, Heidelberg, Germany. Electronic address: holger.haenssle@med.uni-heidelberg.de.
Abstract
BACKGROUND: Convolutional neural networks (CNNs) have shown dermatologist-level performance in the classification of skin lesions. We aimed to compare, head to head, a conventional image analyser (CIA), which relies on segmentation and the weighting of handcrafted features, with a CNN trained by deep learning. METHODS: Cross-sectional study using a real-world, prospectively acquired, dermoscopic dataset of 1981 skin lesions to compare the diagnostic performance of a market-approved CNN (Moleanalyzer-Pro™, developed in 2018) with that of a CIA (Moleanalyzer-3™/Dynamole™, developed in 2004; both from FotoFinder Systems Inc, Germany). As the reference standard, we used histopathological diagnoses (n = 785) or, for non-excised benign lesions (n = 1196), expert consensus plus uneventful follow-up by sequential digital dermoscopy for at least 2 years. RESULTS: A total of 281 malignant lesions and 1700 benign lesions from 435 patients (62.2% male, mean age: 52 years) were prospectively imaged. The CNN showed a sensitivity of 77.6% (95% confidence interval [CI]: [72.4%-82.1%]), a specificity of 95.3% (95% CI: [94.2%-96.2%]), and a receiver operating characteristic (ROC) area under the curve (AUC) of 0.945 (95% CI: [0.930-0.961]). In contrast, the CIA achieved a sensitivity of 53.4% (95% CI: [47.5%-59.1%]), a specificity of 86.6% (95% CI: [84.9%-88.1%]), and an ROC-AUC of 0.738 (95% CI: [0.701-0.774]). The dataset included melanomas originally diagnosed by dynamic changes during sequential digital dermoscopy (52 of 201, 20.6%), which reduced the sensitivities of both classifiers. Pairwise comparisons of sensitivities, specificities, and ROC-AUCs showed that the CNN clearly outperformed the CIA (all p < 0.001). CONCLUSIONS: The superior diagnostic performance of the CNN argues against the continued use of older CIAs as an aid to physicians' clinical management decisions.
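The sensitivities and specificities reported above are proportions of correctly classified malignant (n = 281) and benign (n = 1700) lesions. The minimal sketch below shows how such proportions and their 95% CIs can be computed. It rests on two assumptions that are not stated in the abstract. First, the CIs are taken to be Wilson score intervals. Second, the correct-classification counts are back-calculated from the reported percentages (for example, 77.6% of 281 ≈ 218). Under these assumptions, the computed intervals come out close to the published ones.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%).

    Assumption: the abstract does not name its CI method; Wilson is used here
    for illustration only.
    """
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Hypothetical counts back-calculated from the reported percentages;
# the abstract gives only percentages, not true-positive/true-negative counts.
COUNTS = {
    "CNN": {"tp": 218, "tn": 1620},
    "CIA": {"tp": 150, "tn": 1472},
}
N_MALIGNANT, N_BENIGN = 281, 1700


def summarise(tp: int, tn: int, n_pos: int, n_neg: int):
    """Return sensitivity, its CI, specificity, and its CI."""
    sens = tp / n_pos  # correctly flagged malignant lesions / all malignant
    spec = tn / n_neg  # correctly cleared benign lesions / all benign
    return sens, wilson_interval(tp, n_pos), spec, wilson_interval(tn, n_neg)


for name, c in COUNTS.items():
    sens, sens_ci, spec, spec_ci = summarise(c["tp"], c["tn"], N_MALIGNANT, N_BENIGN)
    print(
        f"{name}: sensitivity {sens:.1%} [{sens_ci[0]:.1%}-{sens_ci[1]:.1%}], "
        f"specificity {spec:.1%} [{spec_ci[0]:.1%}-{spec_ci[1]:.1%}]"
    )
```

A different CI method (for example, Clopper-Pearson exact intervals) would give slightly different bounds. The sketch is meant to show the arithmetic, not to reconstruct the authors' analysis.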