Seung Seog Han1, Young Jae Kim2, Ik Jun Moon2, Joon Min Jung2, Mi Young Lee2, Woo Jin Lee2, Chong Hyun Won2, Mi Woo Lee2, Seong Hwan Kim3, Cristian Navarrete-Dechent4, Sung Eun Chang5. 1. Department of Dermatology, I Dermatology Clinic, Seoul, Korea; IDerma, Inc, Seoul, Korea. 2. Department of Dermatology, Asan Medical Center, Ulsan University College of Medicine, Seoul, Korea. 3. Department of Plastic and Reconstructive Surgery, Kangnam Sacred Heart Hospital, Hallym University College of Medicine, Seoul, Korea. 4. Department of Dermatology, School of Medicine, Pontificia Universidad Catolica de Chile, Santiago, Chile. 5. Department of Dermatology, Asan Medical Center, Ulsan University College of Medicine, Seoul, Korea. Electronic address: csesnumd@gmail.com.
Abstract
TRIAL DESIGN: This was a single-center, unmasked, parallel-group, randomized controlled trial. METHODS: The trial was conducted at a tertiary care institute in South Korea to test whether artificial intelligence (AI) could augment the accuracy of nonexpert physicians in real-world settings, which included diverse out-of-distribution conditions. Consecutive patients aged >19 years with one or more skin lesions suspicious for skin cancer, detected by either the patient or a physician, were randomly allocated to four nondermatology trainees and four dermatology residents. The attending dermatologists examined the randomly allocated patients with (AI-assisted group) or without (unaided group) the real-time assistance of an AI algorithm (https://b2020.modelderm.com#world; convolutional neural networks; unmasked design) after simple randomization of the patients. RESULTS: Among 576 consecutive cases (Fitzpatrick skin phototypes III or IV) with suspicious lesions, out of 603 initially recruited, the accuracy of the AI-assisted group (n = 295, 53.9%) was significantly higher than that of the unaided group (n = 281, 43.8%; P = 0.019). The augmentation was larger in the nondermatology trainees, who had the least experience in dermatology (54.7% [n = 150] with AI assistance vs. 30.7% [n = 138] unaided; P < 0.0001), whereas it was not significant in the dermatology residents. The algorithm helped trainees in the AI-assisted group include more differential diagnoses than those in the unaided group (2.09 vs. 1.95 diagnoses; P = 0.0005). However, a 12.2% drop in the trainees' Top-1 accuracy was observed in cases in which all Top-3 predictions given by the algorithm were incorrect. CONCLUSIONS: The multiclass AI algorithm augmented the diagnostic accuracy of nonexpert physicians in dermatology.
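The abstract reports Top-1 accuracy and refers to the algorithm's Top-3 predictions without defining them. Below is a minimal sketch of how such metrics are commonly computed, assuming each case has a ranked list of predicted diagnoses and a single reference diagnosis. The function name `top_k_accuracy` and the toy cases are hypothetical illustrations, not the trial's code or data.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose reference diagnosis is among the first k ranked predictions."""
    if len(ranked_predictions) != len(truths):
        raise ValueError("ranked_predictions and truths must have the same length")
    if not truths:
        raise ValueError("at least one case is required")
    hits = sum(truth in preds[:k] for preds, truth in zip(ranked_predictions, truths))
    return hits / len(truths)


# Hypothetical toy data (NOT from the trial): ranked predictions per case.
cases = [
    ["basal cell carcinoma", "seborrheic keratosis", "melanocytic nevus"],
    ["melanocytic nevus", "melanoma", "seborrheic keratosis"],
    ["seborrheic keratosis", "actinic keratosis", "wart"],
    ["wart", "melanocytic nevus", "dermatofibroma"],
]
truths = [
    "basal cell carcinoma",
    "melanoma",
    "squamous cell carcinoma",
    "dermatofibroma",
]

top1 = top_k_accuracy(cases, truths, k=1)  # only the first case is correct at rank 1
top3 = top_k_accuracy(cases, truths, k=3)  # the third case is missed by all three predictions
print(f"Top-1: {top1:.2f}, Top-3: {top3:.2f}")
```

In this toy set, the third case is one where all Top-3 predictions are incorrect, which is the kind of case in which the abstract reports a drop in the trainees' Top-1 accuracy.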
Authors: Roxana Daneshjou; Kailas Vodrahalli; Roberto A Novoa; Melissa Jenkins; Weixin Liang; Veronica Rotemberg; Justin Ko; Susan M Swetter; Elizabeth E Bailey; Olivier Gevaert; Pritam Mukherjee; Michelle Phung; Kiana Yekrang; Bradley Fong; Rachna Sahasrabudhe; Johan A C Allerup; Utako Okata-Karigane; James Zou; Albert S Chiou Journal: Sci Adv Date: 2022-08-12 Impact factor: 14.957