Holger Andreas Haenssle (1), Julia Katharina Winkler (2), Christine Fink (2), Ferdinand Toberer (2), Alexander Enk (2), Wilhelm Stolz (3), Teresa Deinlein (4), Rainer Hofmann-Wellenhof (4), Harald Kittler (5), Philipp Tschandl (5), Cliff Rosendahl (6), Aimilios Lallas (7), Andreas Blum (8), Mohamed Souhayel Abassi (9), Luc Thomas (10), Isabelle Tromme (11), Albert Rosenberger (12).
1. Department of Dermatology, University of Heidelberg, Heidelberg, Germany. Electronic address: Holger.Haenssle@med.uni-heidelberg.de.
2. Department of Dermatology, University of Heidelberg, Heidelberg, Germany.
3. Department of Dermatology, Allergology and Environmental Medicine II, Hospital Thalkirchner Street, Munich, Germany.
4. Department of Dermatology and Venerology, Medical University of Graz, Graz, Austria.
5. Department of Dermatology, Medical University of Vienna, Vienna, Austria.
6. School of Medicine, The University of Queensland, Queensland, Australia.
7. First Department of Dermatology, Aristotle University, Thessaloniki, Greece.
8. Office Based Clinic of Dermatology, Konstanz, Germany.
9. Faculty of Computer Science and Mathematics, University of Passau, Passau, Germany.
10. Department of Dermatology, Lyons Cancer Research Center, Lyon 1 University, Lyon, France.
11. Department of Dermatology, Université Catholique de Louvain, St. Luc University Hospital, Brussels, Belgium.
12. Department of Genetic Epidemiology, University of Goettingen, Goettingen, Germany.
Abstract
BACKGROUND: The clinical differentiation of face and scalp lesions (FSLs) is challenging even for trained dermatologists. Studies comparing the diagnostic performance of a convolutional neural network (CNN) with that of dermatologists in FSLs are lacking.
METHODS: A market-approved CNN (Moleanalyzer-Pro, FotoFinder Systems) was used for binary classifications of 100 dermoscopic images of FSLs. The same lesions were used in a two-level reader study including 64 dermatologists (level I: dermoscopy only; level II: dermoscopy, clinical close-up images, textual information). Primary endpoints were the CNN's sensitivity and specificity in comparison with the dermatologists' management decisions in level II. Generalizability of the CNN results was tested by using four additional external data sets.
RESULTS: The CNN's sensitivity, specificity and ROC AUC were 96.2% [87.0%-98.9%], 68.8% [54.7%-80.1%] and 0.929 [0.880-0.978], respectively. In level II, the dermatologists' management decisions showed a mean sensitivity of 84.2% [82.2%-86.2%] and specificity of 69.4% [66.0%-72.8%]. With the CNN's specificity fixed at the dermatologists' mean specificity (69.4%), the CNN's sensitivity (96.2% [87.0%-98.9%]) was significantly higher than that of dermatologists (84.2% [82.2%-86.2%]; p < 0.001). Dermatologists of all training levels were outperformed by the CNN (all p < 0.001). In confirmation, the CNN's accuracy (83.0%) was significantly higher than dermatologists' accuracies in level II management decisions (all p < 0.001). The CNN's performance was largely confirmed in three additional external data sets, but specificity was notably reduced in one Australian data set including FSLs on severely sun-damaged skin.
CONCLUSIONS: When applied as an assistant system, the CNN's higher sensitivity at an equivalent specificity may result in an improved early detection of face and scalp skin cancers.
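For readers who want to see how the reported sensitivity, specificity, accuracy and their confidence intervals relate to one another, the following is a minimal sketch rather than the authors' analysis. Two assumptions are made that the abstract does not state: the counts (52 malignant and 48 benign lesions, with 50 true positives and 33 true negatives) were chosen because they are consistent with the reported percentages for 100 images, and the 95% intervals are computed with the Wilson score method. The function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# ASSUMED counts: not given in the abstract, only consistent with its percentages.
tp, fn = 50, 2    # malignant lesions flagged / missed by the classifier
tn, fp = 33, 15   # benign lesions cleared / flagged by the classifier

sensitivity = tp / (tp + fn)                # share of malignant lesions detected
specificity = tn / (tn + fp)                # share of benign lesions cleared
accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all lesions classified correctly

sens_ci = wilson_interval(tp, tp + fn)
spec_ci = wilson_interval(tn, tn + fp)

print(f"sensitivity {sensitivity:.1%} [{sens_ci[0]:.1%}-{sens_ci[1]:.1%}]")
print(f"specificity {specificity:.1%} [{spec_ci[0]:.1%}-{spec_ci[1]:.1%}]")
print(f"accuracy    {accuracy:.1%}")
```

Under these assumed counts the sketch yields values that round to the figures reported in RESULTS (96.2% [87.0%-98.9%], 68.8% [54.7%-80.1%], and 83.0% accuracy). That agreement suggests, but does not prove, that the study used a comparable interval method.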