Lin Lawrence Guo (1), Stephen R Pfohl (2), Jason Fries (2), Jose Posada (2), Scott Lanyon Fleming (2), Catherine Aftandilian (3), Nigam Shah (2), Lillian Sung (1, 4).
1. Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, Canada.
2. Biomedical Informatics Research, Stanford University, Palo Alto, California, United States.
3. Division of Pediatric Hematology/Oncology, Stanford University, Palo Alto, United States.
4. Division of Haematology/Oncology, The Hospital for Sick Children, Toronto, Canada.
Abstract
OBJECTIVE: The change in performance of machine learning models over time as a result of temporal dataset shift is a barrier to machine learning-derived models facilitating decision-making in clinical practice. Our aim was to describe technical procedures used to preserve the performance of machine learning models in the presence of temporal dataset shifts.

METHODS: Studies were included if they were fully published articles that used machine learning and implemented a procedure to mitigate the effects of temporal dataset shift in a clinical setting. We described how dataset shift was measured, the procedures used to preserve model performance, and their effects.

RESULTS: Of 4,457 potentially relevant publications identified, 15 were included. The impact of temporal dataset shift was primarily quantified using changes, usually deterioration, in calibration or discrimination. Calibration deterioration was more common (n = 11) than discrimination deterioration (n = 3). Mitigation strategies were categorized as model level or feature level. Model-level approaches (n = 15) were more common than feature-level approaches (n = 2), with the most common approaches being model refitting (n = 12), probability calibration (n = 7), model updating (n = 6), and model selection (n = 6). In general, all mitigation strategies were successful at preserving calibration but not uniformly successful in preserving discrimination.

CONCLUSION: There was limited research in preserving the performance of machine learning models in the presence of temporal dataset shift in clinical medicine. Future research could focus on the impact of dataset shift on clinical decision making, benchmark the mitigation strategies on a wider range of datasets and tasks, and identify optimal strategies for specific settings.

Thieme. All rights reserved.
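The distinction between calibration and discrimination in the results above can be illustrated with a small simulation. The sketch below is not taken from any of the reviewed studies. It assumes a hypothetical risk model whose intercept has become outdated because outcome prevalence rose over time, and it applies one simple form of probability calibration, an intercept-only update fitted on a recent window. All function names, the simulated data, and the choice of calibration-in-the-large as the calibration measure are illustrative assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def auroc(y, p):
    """Discrimination: probability that a positive case outranks a negative one."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(y, p):
    """Calibration summary: mean predicted risk minus observed event rate."""
    return mean(p) - mean(y)


def fit_intercept_shift(y, p, iterations=60):
    """Find the logit-scale shift that matches mean prediction to the observed rate."""
    target = mean(y)
    low, high = -10.0, 10.0
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if mean(sigmoid(logit(pi) + mid) for pi in p) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


rng = random.Random(0)


def simulate(n, true_intercept):
    """Hypothetical later-period data scored by a model trained with intercept -2."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [1 if rng.random() < sigmoid(true_intercept + xi) else 0 for xi in x]
    p = [sigmoid(-2.0 + xi) for xi in x]  # stale model predictions
    return y, p


# Recent window used to fit the update, and a separate window used to evaluate it.
y_recent, p_recent = simulate(500, true_intercept=-1.0)
y_eval, p_eval = simulate(500, true_intercept=-1.0)

delta = fit_intercept_shift(y_recent, p_recent)
p_eval_recal = [sigmoid(logit(pi) + delta) for pi in p_eval]

cal_before = calibration_in_the_large(y_eval, p_eval)
cal_after = calibration_in_the_large(y_eval, p_eval_recal)
auc_before = auroc(y_eval, p_eval)
auc_after = auroc(y_eval, p_eval_recal)

print(f"calibration-in-the-large: {cal_before:+.3f} -> {cal_after:+.3f}")
print(f"AUROC: {auc_before:.3f} -> {auc_after:.3f}")
```

Because the intercept shift is a monotonic transformation of the predicted risks, it leaves their ranking, and therefore the AUROC, unchanged. This is one reason a recalibration step of this kind can repair calibration without affecting discrimination; a shift that alters how predictors relate to the outcome would call for refitting or updating the model itself.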
Authors: Sharon E Davis; Robert A Greevy; Christopher Fonnesbeck; Thomas A Lasko; Colin G Walsh; Michael E Matheny Journal: J Am Med Inform Assoc Date: 2019-12-01 Impact factor: 4.497
Authors: Sabrina Siregar; Daan Nieboer; Michel I M Versteegh; Ewout W Steyerberg; Johanna J M Takkenberg Journal: Interact Cardiovasc Thorac Surg Date: 2019-03-01
Authors: Andreas N Strobl; Andrew J Vickers; Ben Van Calster; Ewout Steyerberg; Robin J Leach; Ian M Thompson; Donna P Ankerst Journal: J Biomed Inform Date: 2015-05-16 Impact factor: 6.317
Authors: Carlos Sáez; Alba Gutiérrez-Sacristán; Isaac Kohane; Juan M García-Gómez; Paul Avillach Journal: Gigascience Date: 2020-08-01 Impact factor: 6.524
Authors: Lin Lawrence Guo; Stephen R Pfohl; Jason Fries; Alistair E W Johnson; Jose Posada; Catherine Aftandilian; Nigam Shah; Lillian Sung Journal: Sci Rep Date: 2022-02-17 Impact factor: 4.379