Achim Hekler1, Jochen S Utikal2, Alexander H Enk3, Axel Hauschild4, Michael Weichenthal4, Roman C Maron1, Carola Berking5, Sebastian Haferkamp6, Joachim Klode7, Dirk Schadendorf7, Bastian Schilling8, Tim Holland-Letz9, Benjamin Izar10, Christof von Kalle1, Stefan Fröhling1, Titus J Brinker11.
1. National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany.
2. Department of Dermatology, Heidelberg University, Mannheim, Germany; Skin Cancer Unit, German Cancer Research Center, Heidelberg, Germany.
3. Department of Dermatology, University Hospital Heidelberg, Heidelberg, Germany.
4. Department of Dermatology, University Hospital Kiel, Kiel, Germany.
5. Department of Dermatology, University Hospital Munich (LMU), Munich, Germany.
6. Department of Dermatology, University Hospital Regensburg, Regensburg, Germany.
7. Department of Dermatology, University Hospital Essen, Essen, Germany.
8. Department of Dermatology, University Hospital Würzburg, Würzburg, Germany.
9. Division of Biostatistics, German Cancer Research Center, Heidelberg, Germany.
10. Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA.
11. National Center for Tumor Diseases, German Cancer Research Center, Heidelberg, Germany; Department of Dermatology, University Hospital Heidelberg, Heidelberg, Germany. Electronic address: titus.brinker@dkfz.de.
Abstract
BACKGROUND: In recent studies, convolutional neural networks (CNNs) outperformed dermatologists in distinguishing dermoscopic images of melanoma and nevi. In these studies, dermatologists and artificial intelligence were treated as opponents. However, combining classifiers frequently yields superior results, both in machine learning and among humans. In this study, we investigated the potential benefit of combining human and artificial intelligence for skin cancer classification.
METHODS: A single CNN was trained with novel deep learning techniques on 11,444 dermoscopic images divided into five diagnostic categories. Then, 112 dermatologists from 13 German university hospitals and the trained CNN independently classified a set of 300 biopsy-verified skin lesions into those five classes. Taking into account the certainty of the decisions, the two independently determined diagnoses were combined into a new classifier with the help of a gradient boosting method. The primary end-point of the study was the correct classification of the images into the five designated categories, whereas the secondary end-point was the correct classification of lesions as either benign or malignant (binary classification).
FINDINGS: In the multiclass task, the combination of man and machine achieved an accuracy of 82.95%. This was 1.36 percentage points higher than the better of the two individual classifiers (81.59%, achieved by the CNN). Owing to the class imbalance in the binary problem, sensitivity rather than accuracy was examined; the combined classifier's sensitivity (89%) was superior to that of the best individual classifier (CNN with 86.1%). The specificity of the combined classifier decreased from 89.2% to 84%. However, at an equal sensitivity of 89%, the CNN achieved a specificity of only 81.5%.
INTERPRETATION: Our findings indicate that the combination of human and artificial intelligence achieves superior results over the independent results of both of these systems.
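The comparison "at an equal sensitivity" reported in the findings can be illustrated with a minimal sketch. This is not the authors' code: the function names, the threshold sweep, and the toy labels and scores are all hypothetical, chosen only to show how a score-based classifier's specificity can be read off at a target sensitivity so that two classifiers are compared at the same operating point.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, 1 = malignant, 0 = benign."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


def specificity_at_sensitivity(
    y_true: Sequence[int], scores: Sequence[float], target_sens: float
) -> Tuple[float, float, float]:
    """Lower the decision threshold until sensitivity reaches the target.

    Returns (threshold, sensitivity, specificity) at the first (highest)
    threshold whose sensitivity is at least target_sens.
    """
    for thr in sorted(set(scores), reverse=True):
        y_pred: List[int] = [1 if s >= thr else 0 for s in scores]
        sens, spec = sensitivity_specificity(y_true, y_pred)
        if sens >= target_sens:
            return thr, sens, spec
    raise ValueError("target sensitivity not reachable")


# Hypothetical toy data: 4 malignant and 6 benign lesions with malignancy scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.05]

thr, sens, spec = specificity_at_sensitivity(y_true, scores, target_sens=0.75)
print(f"threshold={thr}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Fixing the sensitivity first and then comparing specificity avoids the ambiguity of comparing two classifiers that trade the two metrics differently, which is the reason the abstract reports the CNN's specificity at the combined classifier's sensitivity of 89%.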