Philipp Tschandl1, Noel Codella2, Bengü Nisa Akay3, Giuseppe Argenziano4, Ralph P Braun5, Horacio Cabo6, David Gutman7, Allan Halpern8, Brian Helba9, Rainer Hofmann-Wellenhof10, Aimilios Lallas11, Jan Lapins12, Caterina Longo13, Josep Malvehy14, Michael A Marchetti8, Ashfaq Marghoob15, Scott Menzies16, Amanda Oakley17, John Paoli18, Susana Puig14, Christoph Rinner19, Cliff Rosendahl20, Alon Scope21, Christoph Sinz1, H Peter Soyer22, Luc Thomas23, Iris Zalaudek24, Harald Kittler25.
1. ViDIR Group, Department of Dermatology, Medical University of Vienna, Vienna, Austria.
2. IBM Research AI, T J Watson Research Center, Yorktown Heights, NY, USA.
3. Department of Dermatology, Medicine Faculty, Ankara University, Ankara, Turkey.
4. Dermatology Unit, University of Campania, Naples, Italy.
5. Skin Cancer Center, Department of Dermatology, University Hospital Zürich, Zürich, Switzerland.
6. Department of Dermatology, Instituto de Investigaciones Médicas, Buenos Aires, Argentina.
7. Department of Neurology, Emory University School of Medicine, Atlanta, GA, USA.
8. Dermatology Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
9. Kitware, Clifton Park, NY, USA.
10. Department of Dermatology, Medical University Graz, Graz, Austria.
11. First Department of Dermatology, Aristotle University, Thessaloniki, Greece.
12. Department of Dermatology, Karolinska University Hospital and Karolinska Institutet, Stockholm, Sweden.
13. Department of Dermatology, University of Modena and Reggio Emilia, Modena, Italy; Azienda Unità Sanitaria Locale-IRCCS di Reggio Emilia, Centro Oncologico ad Alta Tecnologia Diagnostica-Dermatologia, Reggio Emilia, Italy.
14. Melanoma Unit, Dermatology Department, Hospital Clínic Barcelona, Universitat de Barcelona, IDIBAPS, Barcelona, Spain; Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBER ER), Instituto de Salud Carlos III, Barcelona, Spain.
15. Memorial Sloan Kettering Cancer Center, Hauppauge, NY, USA.
16. Sydney Melanoma Diagnostic Centre & Sydney Medical School, Faculty of Medicine and Health, The University of Sydney, Sydney, NSW, Australia.
17. Department of Dermatology, Waikato District Health Board and Waikato Clinical Campus, University of Auckland, Hamilton, New Zealand.
18. Department of Dermatology and Venereology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden.
19. Center for Medical Statistics, Informatics and Intelligent Systems (CeMSIIS), Medical University of Vienna, Vienna, Austria.
20. School of Clinical Medicine, University of Queensland, Brisbane, QLD, Australia.
21. Medical Screening Institute, Sheba Medical Center and Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
22. Dermatology Research Centre, The University of Queensland Diamantina Institute, University of Queensland, Brisbane, QLD, Australia.
23. Department of Dermatology, Hospitalier Lyon Sud, Lyon, France; Lyon Cancer Research Center INSERM U1052-CNRS UMR5286, Lyon, France; Lyon 1 University, Lyon, France.
24. Dermatology Clinic, Maggiore Hospital, University of Trieste, Trieste, Italy.
25. ViDIR Group, Department of Dermatology, Medical University of Vienna, Vienna, Austria. Electronic address: harald.kittler@meduniwien.ac.at.
Abstract
BACKGROUND: Whether machine-learning algorithms can diagnose all pigmented skin lesions as accurately as human experts is unclear. The aim of this study was to compare the diagnostic accuracy of state-of-the-art machine-learning algorithms with human readers for all clinically relevant types of benign and malignant pigmented skin lesions.
METHODS: For this open, web-based, international, diagnostic study, human readers were asked to diagnose dermatoscopic images selected randomly in 30-image batches from a test set of 1511 images. The diagnoses from human readers were compared with those of 139 algorithms created by 77 machine-learning labs, who participated in the International Skin Imaging Collaboration 2018 challenge and received a training set of 10 015 images in advance. The ground truth of each lesion fell into one of seven predefined disease categories: intraepithelial carcinoma including actinic keratoses and Bowen's disease; basal cell carcinoma; benign keratinocytic lesions including solar lentigo, seborrheic keratosis and lichen planus-like keratosis; dermatofibroma; melanoma; melanocytic nevus; and vascular lesions. The two main outcomes were the differences in the number of correct specific diagnoses per batch between all human readers and the top three algorithms, and between human experts and the top three algorithms.
FINDINGS: Between Aug 4, 2018, and Sept 30, 2018, 511 human readers from 63 countries had at least one attempt in the reader study. 283 (55·4%) of 511 human readers were board-certified dermatologists, 118 (23·1%) were dermatology residents, and 83 (16·2%) were general practitioners. When comparing all human readers with all machine-learning algorithms, the algorithms achieved a mean of 2·01 (95% CI 1·97 to 2·04; p<0·0001) more correct diagnoses (17·91 [SD 3·42] vs 19·92 [4·27]). 27 human experts with more than 10 years of experience achieved a mean of 18·78 (SD 3·15) correct answers, compared with 25·43 (1·95) correct answers for the top three machine algorithms (mean difference 6·65, 95% CI 6·06-7·25; p<0·0001). The difference between human experts and the top three algorithms was significantly lower for images in the test set that were collected from sources not included in the training set (human underperformance of 11·4%, 95% CI 9·9-12·9 vs 3·6%, 0·8-6·3; p<0·0001).
INTERPRETATION: State-of-the-art machine-learning classifiers outperformed human experts in the diagnosis of pigmented skin lesions and should have a more important role in clinical practice. However, a possible limitation of these algorithms is their decreased performance for out-of-distribution images, which should be addressed in future research.
FUNDING: None.
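The outcome measure described in the abstract, the number of correct specific diagnoses per batch and the difference in group means, can be illustrated with a minimal Python sketch. The function names, the short category labels, and the six-image toy batch are all hypothetical and are not taken from the study; only the four group means in the final check come from the abstract. The study's confidence intervals and p values are not reproduced here.

```python
from statistics import mean


def correct_per_batch(truth, answers):
    """Count answers that exactly match the ground-truth category."""
    if len(truth) != len(answers):
        raise ValueError("truth and answers must have the same length")
    return sum(t == a for t, a in zip(truth, answers))


def mean_difference(scores_a, scores_b):
    """Mean of scores_b minus mean of scores_a."""
    return mean(scores_b) - mean(scores_a)


# Toy batch with made-up labels (a real batch in the study had 30 images).
truth = ["mel", "nv", "bcc", "nv", "df", "vasc"]
reader = ["nv", "nv", "bcc", "nv", "bkl", "vasc"]
algorithm = ["mel", "nv", "bcc", "nv", "df", "bkl"]

reader_score = correct_per_batch(truth, reader)        # 4 of 6 correct
algorithm_score = correct_per_batch(truth, algorithm)  # 5 of 6 correct

# Consistency check on the means reported in the abstract.
experts_vs_top3 = round(25.43 - 18.78, 2)     # reported difference: 6.65
all_readers_vs_all = round(19.92 - 17.91, 2)  # reported difference: 2.01
```

With 30 images per batch, the reported expert mean of 18·78 corresponds to about 63% correct and the top-three algorithm mean of 25·43 to about 85% correct.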