Titus J Brinker (1), Achim Hekler (2), Axel Hauschild (3), Carola Berking (4), Bastian Schilling (5), Alexander H Enk (6), Sebastian Haferkamp (7), Ante Karoglan (8), Christof von Kalle (2), Michael Weichenthal (3), Elke Sattler (4), Dirk Schadendorf (9), Maria R Gaiser (10), Joachim Klode (9), Jochen S Utikal (10).
1. National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120 Heidelberg, Germany; Department of Dermatology, University Hospital Heidelberg, Heidelberg, Germany. Electronic address: titus.brinker@nct-heidelberg.de.
2. National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Im Neuenheimer Feld 460, 69120 Heidelberg, Germany.
3. Department of Dermatology, University Hospital Kiel, Kiel, Germany.
4. Department of Dermatology, University Hospital Munich (LMU), Munich, Germany.
5. Department of Dermatology, University Hospital Wuerzburg, Wuerzburg, Germany.
6. Department of Dermatology, University Hospital Heidelberg, Heidelberg, Germany.
7. Department of Dermatology, University Hospital Regensburg, Regensburg, Germany.
8. Department of Dermatology, University Hospital Magdeburg, Magdeburg, Germany.
9. Department of Dermatology, University Hospital Essen, Essen, Germany.
10. Department of Dermatology, Heidelberg University, Mannheim, Germany; Skin Cancer Unit, German Cancer Research Center (DKFZ), Heidelberg, Germany.
Abstract
BACKGROUND: Several recent publications have demonstrated that convolutional neural networks can classify images of melanoma on par with board-certified dermatologists. However, the lack of a public human benchmark limits the comparability of these algorithms' performance and thereby technical progress in this field.
METHODS: An electronic questionnaire was sent to dermatologists at 12 German university hospitals. Each questionnaire comprised 100 dermoscopic and 100 clinical images (80 nevus images and 20 biopsy-verified melanoma images in each set), all open-source. The questionnaire recorded factors such as years of experience in dermatology, skin checks performed, age, sex, and rank within the university hospital or status as resident physician. For each image, the dermatologists were asked to provide a management decision (treat/biopsy the lesion or reassure the patient). The main outcome measures were sensitivity, specificity and the receiver operating characteristics (ROC).
RESULTS: A total of 157 dermatologists assessed all 100 dermoscopic images with an overall sensitivity of 74.1%, specificity of 60.0% and an ROC of 0.67 (range = 0.538-0.769); 145 dermatologists assessed all 100 clinical images with an overall sensitivity of 89.4%, specificity of 64.4% and an ROC of 0.769 (range = 0.613-0.9). Results between the two test sets differed significantly (P < 0.05), confirming the need for a standardised benchmark.
CONCLUSIONS: We present the first public melanoma classification benchmark for both non-dermoscopic and dermoscopic images for comparing artificial intelligence algorithms with the diagnostic performance of 145 or 157 dermatologists. The Melanoma Classification Benchmark should be considered a reference standard for white-skinned Western populations in the field of binary algorithmic melanoma classification.
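The outcome measures named in the abstract can be illustrated with a minimal sketch. It assumes each management decision is coded as 1 (treat/biopsy) or 0 (reassure) and each ground-truth label as 1 (melanoma) or 0 (nevus). The function names and the example rater are hypothetical and not taken from the study. The single-point ROC area used here, (sensitivity + specificity) / 2, is the standard trapezoidal area for one binary operating point; the reported overall values are numerically consistent with it, but whether the authors computed their ROC figures this way is an assumption.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(truth: Sequence[int],
                            decisions: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels and decisions.

    truth:     1 = melanoma, 0 = nevus (assumed coding)
    decisions: 1 = treat/biopsy, 0 = reassure (assumed coding)
    """
    if len(truth) != len(decisions):
        raise ValueError("truth and decisions must have the same length")
    tp = sum(1 for t, d in zip(truth, decisions) if t == 1 and d == 1)
    fn = sum(1 for t, d in zip(truth, decisions) if t == 1 and d == 0)
    tn = sum(1 for t, d in zip(truth, decisions) if t == 0 and d == 0)
    fp = sum(1 for t, d in zip(truth, decisions) if t == 0 and d == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one melanoma and one nevus")
    return tp / (tp + fn), tn / (tn + fp)


def single_point_roc_area(sensitivity: float, specificity: float) -> float:
    """Area under the ROC polyline (0,0) -> (1 - spec, sens) -> (1,1)."""
    return (sensitivity + specificity) / 2


# Hypothetical rater on a 100-image set with 20 melanomas and 80 nevi:
# biopsies 15 of the 20 melanomas and 32 of the 80 nevi.
truth = [1] * 20 + [0] * 80
decisions = [1] * 15 + [0] * 5 + [1] * 32 + [0] * 48

sens, spec = sensitivity_specificity(truth, decisions)
area = single_point_roc_area(sens, spec)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} ROC area={area:.3f}")
```

An algorithm that outputs a continuous score rather than a binary decision would yield a full ROC curve instead of a single operating point; comparing it with the benchmark would then mean checking whether the dermatologists' operating points fall below that curve.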