Roman C Maron(1), Sarah Haggenmüller(1), Christof von Kalle(2), Jochen S Utikal(3), Friedegund Meier(4), Frank F Gellrich(4), Axel Hauschild(5), Lars E French(6), Max Schlaak(7), Kamran Ghoreschi(8), Heinz Kutzner(9), Markus V Heppt(10), Sebastian Haferkamp(11), Wiebke Sondermann(12), Dirk Schadendorf(12), Bastian Schilling(13), Achim Hekler(1), Eva Krieghoff-Henning(1), Jakob N Kather(14), Stefan Fröhling(15), Daniel B Lipka(15), Titus J Brinker(16).
1. Digital Biomarkers for Oncology Group, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Heidelberg, Germany.
2. Department of Clinical-Translational Sciences, Charité University Medicine and Berlin Institute of Health (BIH), Berlin, Germany.
3. Department of Dermatology, Heidelberg University, Mannheim, Germany; Skin Cancer Unit, German Cancer Research Center (DKFZ), Heidelberg, Germany.
4. Skin Cancer Center at the University Cancer Centre and National Center for Tumor Diseases Dresden, Department of Dermatology, University Hospital Carl Gustav Carus, Technische Universität Dresden, Germany.
5. Department of Dermatology, University Hospital (UKSH), Kiel, Germany.
6. Department of Dermatology and Allergy, University Hospital, LMU Munich, Munich, Germany; Dr. Phillip Frost Department of Dermatology and Cutaneous Surgery, University of Miami, Miller School of Medicine, Miami, FL, USA.
7. Department of Dermatology and Allergy, University Hospital, LMU Munich, Munich, Germany.
8. Department of Dermatology, Venereology and Allergology, Charité - Universitätsmedizin Berlin, Berlin, Germany.
9. Dermatopathology Laboratory, Friedrichshafen, Germany.
10. Department of Dermatology, University Hospital Erlangen, Erlangen, Germany.
11. Department of Dermatology, University Hospital Regensburg, Regensburg, Germany.
12. Department of Dermatology, University Hospital Essen, Essen, Germany.
13. Department of Dermatology, University Hospital Würzburg, Würzburg, Germany.
14. Division of Translational Medical Oncology, German Cancer Research Center (DKFZ), Heidelberg, Germany.
15. National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Heidelberg, Germany.
16. Digital Biomarkers for Oncology Group, National Center for Tumor Diseases (NCT), German Cancer Research Center (DKFZ), Heidelberg, Germany. Electronic address: titus.brinker@dkfz.de.
Abstract
BACKGROUND: A basic requirement for artificial intelligence (AI)-based image analysis systems that are to be integrated into clinical practice is high robustness. Minor changes in how images are acquired, for example during routine skin cancer screening, should not change the diagnosis made by such assistance systems.
OBJECTIVE: To quantify the extent to which minor image perturbations affect convolutional neural network (CNN)-mediated skin lesion classification and to evaluate three possible solutions to this problem (additional data augmentation, test-time augmentation, anti-aliasing).
METHODS: We trained three commonly used CNN architectures to differentiate between dermoscopic melanoma and nevus images. Subsequently, their performance and susceptibility to minor changes ('brittleness') were tested on two distinct test sets with multiple images per lesion. For the first set, image changes, such as rotations or zooms, were generated artificially. The second set contained natural changes that stemmed from multiple photographs taken of the same lesions.
RESULTS: All architectures exhibited brittleness on both the artificial and the natural test sets. The three reviewed methods decreased brittleness to varying degrees while maintaining performance. The observed improvement was greater for the artificial test set than for the natural test set, where enhancements were minor.
CONCLUSIONS: Minor image changes, relatively inconspicuous to humans, can affect the robustness of CNNs differentiating skin lesions. The methods tested here can reduce this effect but not fully eliminate it. Thus, further research to sustain the performance of AI classifiers is needed to facilitate the translation of such systems into the clinic.
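To make the notions of brittleness and test-time augmentation more concrete, here is a minimal Python sketch. It rests on several assumptions that are not taken from the abstract: brittleness is approximated as the fraction of lesions whose predicted class is not identical across all images of that lesion, test-time augmentation is approximated by averaging the predicted melanoma probability over the four 90-degree rotations of an image, and `toy_model` is a hypothetical, deliberately orientation-sensitive stand-in for a trained CNN. The study's actual metric, augmentations, and models may differ.

```python
from statistics import mean

def rotate(img, k):
    """Rotate a nested-list image clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        img = [list(row) for row in zip(*img[::-1])]
    return img

def toy_model(img):
    """Hypothetical classifier: returns the top-left pixel as the
    melanoma probability, so its output depends on image orientation."""
    return img[0][0]

def tta_predict(model, img):
    """Assumed test-time augmentation: average the model output
    over all four 90-degree rotations of the input image."""
    return mean(model(rotate(img, k)) for k in range(4))

def brittleness(labels_per_lesion):
    """Assumed metric: fraction of lesions whose predicted labels
    are not all identical across that lesion's images."""
    changed = sum(1 for labels in labels_per_lesion if len(set(labels)) > 1)
    return changed / len(labels_per_lesion)

# Two toy "lesions"; each is shown to the model in four orientations,
# mimicking the artificial test set with rotated copies.
lesions = [
    [[0.9, 0.2], [0.4, 0.1]],  # orientation changes the toy prediction
    [[0.8, 0.8], [0.8, 0.8]],  # uniform image, prediction is stable
]
THRESHOLD = 0.5  # assumed decision threshold for the melanoma class

plain_labels = [
    [int(toy_model(rotate(img, k)) >= THRESHOLD) for k in range(4)]
    for img in lesions
]
tta_labels = [
    [int(tta_predict(toy_model, rotate(img, k)) >= THRESHOLD) for k in range(4)]
    for img in lesions
]

print("brittleness without TTA:", brittleness(plain_labels))
print("brittleness with TTA:", brittleness(tta_labels))
```

In this toy setting the averaged prediction is identical for every rotated view, so the label can no longer flip under rotation. Real perturbations such as zooms or re-photographed lesions are not covered by the averaged transforms, which is consistent with the abstract's finding that brittleness can be reduced but not eliminated.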
Authors: Marc Combalia; Noel Codella; Veronica Rotemberg; Cristina Carrera; Stephen Dusza; David Gutman; Brian Helba; Harald Kittler; Nicholas R Kurtansky; Konstantinos Liopyris; Michael A Marchetti; Sebastian Podlipnik; Susana Puig; Christoph Rinner; Philipp Tschandl; Jochen Weber; Allan Halpern; Josep Malvehy Journal: Lancet Digit Health Date: 2022-05