Sharon E Davis (1), Jeremiah R Brown (2,3), Chad Dorn (1), Dax Westerman (1), Richard J Solomon (4), Michael E Matheny (1,5,6,7).
1. Departments of Biomedical Informatics (S.E.D., C.D., D.W., M.E.M.), Vanderbilt University Medical Center, Nashville, TN.
2. Departments of Epidemiology (J.R.B.), Dartmouth Geisel School of Medicine, Hanover, NH.
3. Biomedical Data Science (J.R.B.), Dartmouth Geisel School of Medicine, Hanover, NH.
4. Department of Medicine, Larner College of Medicine, University of Vermont, Burlington (R.J.S.).
5. Biostatistics (M.E.M.), Vanderbilt University Medical Center, Nashville, TN.
6. Medicine (M.E.M.), Vanderbilt University Medical Center, Nashville, TN.
7. Tennessee Valley Healthcare System VA Medical Center, Veterans Health Administration, Nashville, TN (M.E.M.).
Abstract
BACKGROUND: The utility of quality dashboards to inform decision-making and improve clinical outcomes is tightly linked to the accuracy of the information they provide and, in turn, the accuracy of the underlying prediction models. Despite recognition of the need to update prediction models to maintain accuracy over time, there is limited guidance on updating strategies. We compare predefined and surveillance-based updating strategies applied to a model supporting quality evaluations among US veterans.
METHODS: We evaluated the performance of a US Department of Veterans Affairs-specific model for postcardiac catheterization acute kidney injury using routinely collected observational data over the 6 years following model development (n=90 295 procedures in 2013-2019). Predicted probabilities were generated from the original model, an annually retrained model, and a surveillance-based approach that monitored performance to inform the timing and method of updates. We evaluated how updating the national model impacted regional quality profiles. We compared observed-to-expected outcome ratios, where values above and below 1 indicated more and fewer adverse outcomes than expected, respectively.
RESULTS: The original model overpredicted risk at the national level (observed-to-expected outcome ratio, 0.75 [0.74-0.77]). Annual retraining updated the model 5 times; surveillance-based updating retrained once and recalibrated twice. While both strategies improved performance, the surveillance-based approach provided superior calibration (observed-to-expected outcome ratio, 1.01 [0.99-1.03] versus 0.94 [0.92-0.96]). Overprediction by the original model led to optimistic quality assessments, incorrectly indicating that most of the US Department of Veterans Affairs' 18 regions observed fewer acute kidney injury events than predicted. Both updating strategies revealed that 16 regions performed as expected and 2 regions increasingly underperformed, having more acute kidney injury events than predicted.
CONCLUSIONS: Miscalibrated clinical prediction models provide inaccurate pictures of performance across clinical units, and degrading calibration further complicates our understanding of quality. Updating strategies tailored to health system needs and capacity should be incorporated into model implementation plans to promote the utility and longevity of quality reporting tools.
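The observed-to-expected outcome ratio used throughout the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `oe_ratio` and `bootstrap_ci`, the percentile bootstrap interval, and the toy data are all assumptions made for illustration, and the abstract does not state how its intervals were computed.

```python
import random


def oe_ratio(outcomes, predicted):
    """Observed event count divided by the sum of predicted probabilities."""
    expected = sum(predicted)
    if expected <= 0:
        raise ValueError("expected event count must be positive")
    return sum(outcomes) / expected


def bootstrap_ci(outcomes, predicted, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the O:E ratio (an assumed method)."""
    rng = random.Random(seed)
    n = len(outcomes)
    ratios = []
    for _ in range(n_boot):
        # Resample procedures with replacement, keeping each outcome
        # paired with its own predicted probability.
        idx = [rng.randrange(n) for _ in range(n)]
        expected = sum(predicted[i] for i in idx)
        ratios.append(sum(outcomes[i] for i in idx) / expected)
    ratios.sort()
    lower = ratios[int((alpha / 2) * n_boot)]
    upper = ratios[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (hypothetical): 20 procedures, 3 observed events,
# each with a predicted risk of 0.2, so 4 events are expected.
outcomes = [1, 1, 1] + [0] * 17
predicted = [0.2] * 20

ratio = oe_ratio(outcomes, predicted)
lower, upper = bootstrap_ci(outcomes, predicted)
print(f"O:E = {ratio:.2f}, 95% interval = [{lower:.2f}, {upper:.2f}]")
```

In this toy example, 3 observed events against 4 expected gives a ratio of 0.75. A value below 1 means fewer events occurred than the model predicted, which is the pattern of overprediction described in the RESULTS. A regional profile would apply the same calculation to each region's procedures separately.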