Gijs F N Berkelmans1, Stephanie H Read2, Soffia Gudbjörnsdottir3, Sarah H Wild4, Stefan Franzen3, Yolanda van der Graaf5, Björn Eliasson6, Frank L J Visseren7, Nina P Paynter8, Jannick A N Dorresteijn1. 1. Department of Vascular Medicine, University Medical Center Utrecht, the Netherlands. 2. Usher Institute, University of Edinburgh, Edinburgh, Scotland, UK and on behalf of the Scottish Diabetes Research Network epidemiology group; Women's College Research Institute, Canada. 3. Swedish National Diabetes Register, Center of Registers in Region, Gothenburg, Sweden. 4. Usher Institute, University of Edinburgh, Edinburgh, Scotland, UK and on behalf of the Scottish Diabetes Research Network epidemiology group. 5. Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, the Netherlands. 6. Swedish National Diabetes Register, Center of Registers in Region, Gothenburg, Sweden; Department of Molecular and Clinical Medicine, Institute of Medicine, University of Gothenburg, Gothenburg, Sweden. 7. Department of Vascular Medicine, University Medical Center Utrecht, the Netherlands. Electronic address: f.l.j.visseren@umcutrecht.nl. 8. Harvard Medical School, Brigham & Women's Hospital, Boston, USA.
Abstract
OBJECTIVES: To compare the validity and robustness of five methods for handling missing characteristics when using cardiovascular disease risk prediction models for individual patients in a real-world clinical setting. STUDY DESIGN AND SETTING: The performance of the missing data methods was assessed using data from the Swedish National Diabetes Registry (n = 419,533), with external validation using the Scottish Care Information-Diabetes database (n = 226,953). Five methods for handling missing data were compared: two methods using submodels for each combination of available data; two imputation methods, conditional imputation and median imputation; and one alternative modeling method, called the naïve approach, based on hazard ratios and population statistics of known risk factors only. Validity was compared using calibration plots and c-statistics. RESULTS: C-statistics were similar across methods in both the development and validation data sets: 0.82 (95% CI 0.82-0.83) in the Swedish National Diabetes Registry and 0.74 (95% CI 0.74-0.75) in the Scottish Care Information-Diabetes database. Differences were observed only after missing data were randomly introduced in the most important predictor variable (i.e., age). CONCLUSION: The validity and robustness of median imputation were not dissimilar to those of more complex methods for handling missing values, provided that the most important predictor variables, such as age, are not missing.
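The Python below is a minimal sketch of two ideas named above: median imputation of a missing predictor before applying a risk model to an individual patient, and Harrell's c-statistic as a measure of discrimination. It assumes a hypothetical Cox-type model (risk = 1 - S0^exp(lp)); the names `BETAS`, `MEANS`, `BASELINE_SURVIVAL`, `fit_medians`, `median_impute`, `predicted_risk` and `harrell_c`, and all numbers, are invented for illustration and are not the study's models, coefficients or data. The conditional imputation, submodel and naïve approaches are not shown.

```python
import math
from statistics import median

# HYPOTHETICAL model parameters for illustration only (not from the study).
BETAS = {"age": 0.06, "sbp": 0.01, "hba1c": 0.02}   # log hazard ratios per unit
MEANS = {"age": 60.0, "sbp": 140.0, "hba1c": 55.0}  # centering values
BASELINE_SURVIVAL = 0.90  # hypothetical 10-year baseline survival S0


def fit_medians(rows):
    """Per-predictor medians over non-missing values in the development data."""
    return {
        name: median([r[name] for r in rows if r.get(name) is not None])
        for name in BETAS
    }


def median_impute(patient, medians):
    """Replace each missing (None or absent) predictor with its development median."""
    return {
        name: patient[name] if patient.get(name) is not None else medians[name]
        for name in BETAS
    }


def predicted_risk(patient):
    """10-year risk from the hypothetical Cox-type model: 1 - S0 ** exp(lp)."""
    lp = sum(BETAS[k] * (patient[k] - MEANS[k]) for k in BETAS)
    return 1.0 - BASELINE_SURVIVAL ** math.exp(lp)


def harrell_c(times, events, risks):
    """Harrell's c-statistic.

    A pair (i, j) is usable when patient i had an event and a shorter observed
    time than patient j. It is concordant when i also has the higher predicted
    risk; ties in predicted risk count 0.5. Pairs with tied times are skipped
    (a simplification).
    """
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs for the c-statistic")
    return concordant / usable
```

For example, with a toy development set, a patient whose systolic blood pressure and HbA1c are unrecorded can still receive a risk estimate:

```python
dev = [
    {"age": 50, "sbp": 130, "hba1c": 48},
    {"age": 70, "sbp": None, "hba1c": 60},
    {"age": 62, "sbp": 150, "hba1c": None},
]
medians = fit_medians(dev)                  # {"age": 62, "sbp": 140.0, "hba1c": 54.0}
patient = median_impute({"age": 65, "sbp": None}, medians)
risk = predicted_risk(patient)
```

Median imputation is the simplest of the compared strategies because it needs only one stored value per predictor; the abstract's caveat is that it works well only while the strongest predictor (age) is observed, since imputing age pulls every affected patient toward the same risk.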