Sengwee Toh (1), Luis A García Rodríguez, Miguel A Hernán.
(1) Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care Institute, Boston, MA, USA. darrentoh@post.harvard.edu
Abstract
PURPOSE: Electronic healthcare databases are commonly used in comparative effectiveness and safety research of therapeutics. Many databases now include additional confounder information for a subset of the study population, obtained through data linkage or data collection. We described and compared existing methods for analyzing such datasets.
METHODS: Using data from The Health Improvement Network and the relation between non-steroidal anti-inflammatory drugs and upper gastrointestinal bleeding as an example, we employed several methods to handle partially missing confounder information.
RESULTS: The crude odds ratio (OR) of upper gastrointestinal bleeding was 1.50 (95% confidence interval: 0.98, 2.28) among selective cyclo-oxygenase-2 inhibitor initiators (n = 43 569) compared with traditional non-steroidal anti-inflammatory drug initiators (n = 411 616). The OR dropped to 0.81 (0.52, 1.27) upon adjustment for confounders recorded for all patients. After further adjustment for three additional variables (smoking, alcohol consumption, body mass index) that were missing in 22% of the study population, the OR was between 0.80 and 0.83 for the missing-category approach, the missing-indicator approach, single imputation by the most common category, multiple imputation by chained equations, and propensity score calibration. The OR was 0.65 (0.39, 1.09) for the unweighted complete-case analysis and 0.67 (0.38, 1.16) for the inverse probability weighted complete-case analysis.
CONCLUSIONS: Existing methods for handling partially missing confounder data require different assumptions and may produce different results. The unweighted complete-case analysis, the missing-category/indicator approach, and single imputation often require unrealistic assumptions and should be avoided. In this study, differences across methods were not substantial, likely because of the relatively low proportion of missingness and the weak confounding effect of the three additional variables after adjustment for the other variables.
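To make the simpler approaches named in the abstract concrete, the following minimal Python sketch shows how three of them would treat one partially recorded categorical confounder: the unweighted complete-case analysis, the missing-category approach, and single imputation by the most common category. The toy records and the function names (`complete_case`, `missing_category`, `impute_most_common`) are hypothetical illustrations; they are not the authors' code, and the sketch does not reproduce the study's data or results.

```python
from collections import Counter

# Hypothetical toy records: smoking status is None where it was not recorded.
records = [
    {"id": 1, "smoking": "never"},
    {"id": 2, "smoking": "current"},
    {"id": 3, "smoking": None},
    {"id": 4, "smoking": "never"},
    {"id": 5, "smoking": None},
    {"id": 6, "smoking": "former"},
]


def complete_case(rows, var):
    """Unweighted complete-case analysis: keep only rows where var is recorded."""
    return [r for r in rows if r[var] is not None]


def missing_category(rows, var, label="missing"):
    """Missing-category approach: treat missingness as its own level of var."""
    return [{**r, var: label if r[var] is None else r[var]} for r in rows]


def impute_most_common(rows, var):
    """Single imputation: replace missing values with the modal observed category."""
    observed = [r[var] for r in rows if r[var] is not None]
    mode = Counter(observed).most_common(1)[0][0]
    return [{**r, var: mode if r[var] is None else r[var]} for r in rows]


if __name__ == "__main__":
    print(len(complete_case(records, "smoking")))
    print(Counter(r["smoking"] for r in missing_category(records, "smoking")))
    print(Counter(r["smoking"] for r in impute_most_common(records, "smoking")))
```

Each function returns a new list and leaves the input untouched, so the three datasets can be analyzed side by side. The complete-case version shrinks the sample, whereas the other two keep every patient but rest on assumptions about the unrecorded values, which the abstract describes as often unrealistic.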