Maximilian Pfau1,2, Guenther Walther3, Leon von der Emde4, Philipp Berens5,6, Livia Faes7,8, Monika Fleckenstein9, Tjebo F C Heeren8, Karsten Kortüm10,11, Sandrine H Künzel4, Philipp L Müller4,5, Peter M Maloca8,12,13, Sebastian M Waldstein14,15, Maximilian W M Wintergerst4, Steffen Schmitz-Valckenberg4,9, Robert P Finger4, Frank G Holz4. 1. Department of Biomedical Data Science, Stanford University, Medical School Office Building (MSOB), 1265 Welch Road, 94305-5479, Stanford, CA, USA. maximilian.pfau@ukbonn.de. 2. Universitäts-Augenklinik Bonn, Bonn, Germany. maximilian.pfau@ukbonn.de. 3. Department of Statistics, Stanford University, Stanford, USA. 4. Universitäts-Augenklinik Bonn, Bonn, Germany. 5. Forschungsinstitut für Augenheilkunde, Universität Tübingen, Tübingen, Germany. 6. Interfakultäres Institut für Bioinformatik und Medizininformatik, Universität Tübingen, Tübingen, Germany. 7. Augenklinik, Luzerner Kantonsspital, Lucerne, Switzerland. 8. Moorfields Eye Hospital NHS Foundation Trust, London, United Kingdom. 9. John A. Moran Eye Center, University of Utah, Salt Lake City, USA. 10. Augenklinik, Ludwig-Maximilians-Universität München, Munich, Germany. 11. Augenarztpraxis Dres. Kortüm, Ludwigsburg, Germany. 12. Institute of Molecular and Clinical Ophthalmology Basel (IOB), Basel, Switzerland. 13. OCTlab, Universitätsspital Basel, Basel, Switzerland. 14. Univ.-Klinik für Augenheilkunde und Optometrie, Medizinische Universität Wien, Vienna, Austria. 15. Department of Ophthalmology, Westmead Hospital, University of Sydney, Sydney, Australia.
Abstract
BACKGROUND: Empirical models have been an integral part of everyday clinical practice in ophthalmology since the introduction of the Sanders-Retzlaff-Kraff (SRK) formula. Recent developments in the field of statistical learning (artificial intelligence, AI) now enable an empirical approach to a wide range of ophthalmological questions with unprecedented precision.
OBJECTIVE: Which criteria must be considered when evaluating AI-related studies in ophthalmology?
MATERIAL AND METHODS: Exemplary prediction of visual acuity (continuous outcome) and classification of healthy and diseased eyes (discrete outcome) using retrospectively compiled optical coherence tomography data (50 eyes of 50 patients, 50 healthy eyes of 50 subjects). The data were analyzed with nested cross-validation (for learning algorithm selection and hyperparameter optimization).
RESULTS: Based on nested cross-validation for training, visual acuity could be predicted in the separate test dataset with a mean absolute error (MAE) of 0.142 LogMAR (95% confidence interval, CI: [0.077; 0.207]). Healthy versus diseased eyes could be classified in the test dataset with an agreement of 0.92 (Cohen's kappa). An exemplary, deliberately incorrect approach to learning algorithm and variable selection resulted in an MAE for visual acuity prediction of 0.229 LogMAR [0.150; 0.309] for the test dataset. The drastic overfitting became obvious when this MAE was compared with the null model MAE (0.235 LogMAR [0.148; 0.322]).
CONCLUSION: Selecting an unsuitable measure of goodness-of-fit, validating inadequately, or withholding a null or reference model can obscure the actual goodness-of-fit of AI models. The illustrated pitfalls can help clinicians to identify such shortcomings.
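The comparison against a null model and the agreement measure mentioned in the results can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract does not state how the null model or the confidence intervals were computed, so the sketch assumes a null model that always predicts the mean of the training outcomes and a normal-approximation interval; all function names are hypothetical.

```python
import math
import statistics


def mae_with_ci(y_true, y_pred, z=1.96):
    """Mean absolute error with a normal-approximation 95% CI (assumed method)."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    mae = statistics.fmean(errors)
    # Standard error of the mean of the absolute errors.
    sem = statistics.stdev(errors) / math.sqrt(len(errors))
    return mae, (mae - z * sem, mae + z * sem)


def null_model_predictions(y_train, n_test):
    """Assumed null model: predict the training mean for every test case."""
    return [statistics.fmean(y_train)] * n_test


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    categories = set(labels_a) | set(labels_b)
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    p_chance = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if p_chance == 1.0:
        # Both label sets consist of a single identical category.
        return 1.0
    return (p_observed - p_chance) / (1 - p_chance)


if __name__ == "__main__":
    # Made-up LogMAR values for illustration only, not study data.
    y_train = [0.1, 0.3, 0.2, 0.4]
    y_test = [0.2, 0.4, 0.0, 0.2]
    y_model = [0.1, 0.3, 0.1, 0.3]

    model_mae, model_ci = mae_with_ci(y_test, y_model)
    null_mae, null_ci = mae_with_ci(
        y_test, null_model_predictions(y_train, len(y_test))
    )
    print("model MAE:", model_mae, model_ci)
    print("null MAE: ", null_mae, null_ci)
    print("kappa:", cohens_kappa([1, 1, 0, 0], [1, 1, 0, 1]))
```

A model MAE that is barely lower than the null model MAE, as with 0.229 versus 0.235 LogMAR in the results, suggests that the model adds little beyond predicting the average outcome.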
Keywords:
Automated analysis; Deep learning; Empirical approach; Machine-learning; Statistical learning
Authors: Maximilian Pfau; Guenther Walther; Leon von der Emde; Philipp Berens; Livia Faes; Monika Fleckenstein; Tjebo F C Heeren; Karsten Kortüm; Sandrine H Künzel; Philipp L Müller; Peter M Maloca; Sebastian M Waldstein; Maximilian W M Wintergerst; Steffen Schmitz-Valckenberg; Robert P Finger; Frank G Holz Journal: Ophthalmologe Date: 2020-10 Impact factor: 1.059