Roman C Maron1, Michael Weichenthal2, Jochen S Utikal3, Achim Hekler1, Carola Berking4, Axel Hauschild2, Alexander H Enk5, Sebastian Haferkamp6, Joachim Klode7, Dirk Schadendorf7, Philipp Jansen7, Tim Holland-Letz8, Bastian Schilling9, Christof von Kalle1, Stefan Fröhling1, Maria R Gaiser3, Daniela Hartmann4, Anja Gesierich9, Katharina C Kähler2, Ulrike Wehkamp2, Ante Karoglan10, Claudia Bär10, Titus J Brinker11. 1. National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany. 2. Department of Dermatology, University Hospital Kiel, Kiel, Germany. 3. Department of Dermatology, Heidelberg University, Mannheim, Germany; Skin Cancer Unit, German Cancer Research Center, Heidelberg, Germany. 4. Department of Dermatology, University Hospital Munich (LMU), Munich, Germany. 5. Department of Dermatology, University Hospital Heidelberg, Heidelberg, Germany. 6. Department of Dermatology, University Hospital Regensburg, Regensburg, Germany. 7. Department of Dermatology, University Hospital Essen, Essen, Germany. 8. Department of Biostatistics, German Cancer Research Center, Heidelberg, Germany. 9. Department of Dermatology, University Hospital Würzburg, Würzburg, Germany. 10. Department of Dermatology, University Hospital Magdeburg, Magdeburg, Germany. 11. National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany; Department of Dermatology, University Hospital Heidelberg, Heidelberg, Germany. Electronic address: titus.brinker@dkfz.de.
Abstract
BACKGROUND: Convolutional neural networks (CNNs) have recently been shown to systematically outperform dermatologists in distinguishing dermoscopic images of melanoma and nevi. However, such a binary classification does not reflect the clinical reality of skin cancer screenings, in which multiple diagnoses need to be taken into account. METHODS: Using 11,444 dermoscopic images, which covered dermatologic diagnoses comprising the majority of pigmented skin lesions commonly faced in skin cancer screenings, a CNN was trained through novel deep learning techniques. A test set of 300 biopsy-verified images was used to compare the classifier's performance with that of 112 dermatologists from 13 German university hospitals. The primary end-point was the correct classification of the different lesions into benign and malignant. The secondary end-point was the correct classification of the images into one of the five diagnostic categories. FINDINGS: Sensitivity and specificity of dermatologists for the primary end-point were 74.4% (95% confidence interval [CI]: 67.0-81.8%) and 59.8% (95% CI: 49.8-69.8%), respectively. At equal sensitivity, the algorithm achieved a specificity of 91.3% (95% CI: 85.5-97.1%). For the secondary end-point, the mean sensitivity and specificity of the dermatologists were 56.5% (95% CI: 42.8-70.2%) and 89.2% (95% CI: 85.0-93.3%), respectively. At equal sensitivity, the algorithm achieved a specificity of 98.8%. Two-sided McNemar tests revealed significance for the primary end-point (p < 0.001). For the secondary end-point, the algorithm outperformed the dermatologists (p < 0.001) in every category except basal cell carcinoma, where performance was on par. INTERPRETATION: Our findings show that automated classification of dermoscopic melanoma and nevi images can be extended to a multiclass classification problem, thus better reflecting clinical differential diagnoses, while still significantly outperforming dermatologists (p < 0.001).
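The comparison reported in the findings rests on three computations: sensitivity and specificity, the specificity the classifier reaches when its threshold is set to match a reference sensitivity, and a two-sided McNemar test on paired decisions. The following is a minimal Python sketch of those computations, not the study's actual analysis code. The function names, the toy labels and scores, and the choice of the exact binomial form of the McNemar test are all assumptions made for illustration; the abstract does not state which variant of the test was used.

```python
from math import comb


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = malignant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def specificity_at_sensitivity(y_true, scores, target_sensitivity):
    """Scan every observed score as a threshold and return the
    (threshold, specificity) pair with the highest specificity among
    thresholds whose sensitivity is at least the target.
    Returns None if no threshold reaches the target."""
    best = None
    for thr in sorted(set(scores)):
        y_pred = [1 if s >= thr else 0 for s in scores]
        sens, spec = sensitivity_specificity(y_true, y_pred)
        if sens >= target_sensitivity and (best is None or spec > best[1]):
            best = (thr, spec)
    return best


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value (assumed variant).
    b = cases only rater A classified correctly,
    c = cases only rater B classified correctly."""
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy data, NOT from the study: 4 malignant and 4 benign lesions.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

if __name__ == "__main__":
    print(specificity_at_sensitivity(toy_labels, toy_scores, 0.75))
    print(mcnemar_exact(1, 9))
```

Matching the classifier's sensitivity to that of the dermatologists before comparing specificity keeps the comparison on a single axis, which is what "at equal sensitivity" refers to in the findings.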