BACKGROUND AND OBJECTIVE: Epidemiologic studies commonly estimate associations between predictors (risk factors) and an outcome. Most software automatically excludes subjects with missing values. This commonly causes bias because missing values seldom occur completely at random (MCAR); more often they occur selectively, depending on other (observed) variables, that is, missing at random (MAR). Multiple imputation (MI) of missing predictor values using all observed information, including the outcome, is advocated to deal with selective missing values. Because the outcome is then used both to impute the predictors and to estimate their associations with it, this seems a self-fulfilling prophecy.

METHODS: We tested this hypothesis using data from a study on the diagnosis of pulmonary embolism. We selected five predictors of pulmonary embolism without missing values. Their regression coefficients and standard errors (SEs), estimated from the original sample, were considered the "true" values. We assigned missing values to these predictors, both MCAR and MAR, and repeated this 1,000 times using simulations. In each simulation we multiply imputed the missing values without and with the outcome, and compared the resulting regression coefficients and SEs to the truth.

RESULTS: Regression coefficients based on MI including the outcome were close to the truth. MI without the outcome yielded severely biased (underestimated) coefficients. SEs and coverage of the 90% confidence intervals did not differ between MI with and without the outcome. Results were the same for MCAR and MAR.

CONCLUSION: For all types of missing values studied, imputation of missing predictor values using the outcome is preferred over imputation without the outcome and is not a self-fulfilling prophecy.
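
The comparison described in the methods can be illustrated with a small simulation. The sketch below is not the study's analysis. It assumes synthetic data, a single continuous predictor, a linear (not logistic) model, MCAR missingness only, and a simplified imputation step that draws from a fitted normal model without redrawing the model parameters. All names (`simulate`, `fit_line`, `p_missing`, and so on) are hypothetical. The pooled point estimate is the mean of the per-imputation estimates, as in Rubin's rules.

```python
import random
import statistics


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def fit_line(xs, ys):
    """Intercept, slope and residual SD of ys regressed on xs."""
    b = slope(xs, ys)
    a = statistics.fmean(ys) - b * statistics.fmean(xs)
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def simulate(n=2000, true_beta=1.0, p_missing=0.4, m=5, seed=1):
    """Compare imputation of a predictor without and with the outcome."""
    rng = random.Random(seed)
    # Synthetic "complete" data (assumption): y = true_beta * x + noise.
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [true_beta * xi + rng.gauss(0, 1) for xi in x]
    # MCAR: each predictor value is missing with the same probability.
    missing = [rng.random() < p_missing for _ in range(n)]
    obs_x = [xi for xi, mi in zip(x, missing) if not mi]
    obs_y = [yi for yi, mi in zip(y, missing) if not mi]

    # Imputation model WITHOUT the outcome: marginal distribution of x.
    mu_x, sd_x = statistics.fmean(obs_x), statistics.stdev(obs_x)
    # Imputation model WITH the outcome: x regressed on y in complete cases.
    a, b, sd_res = fit_line(obs_y, obs_x)

    est_without, est_with = [], []
    for _ in range(m):
        # Fill in each missing x with a random draw from the imputation model.
        x_wo = [rng.gauss(mu_x, sd_x) if mi else xi
                for xi, mi in zip(x, missing)]
        x_w = [rng.gauss(a + b * yi, sd_res) if mi else xi
               for xi, yi, mi in zip(x, y, missing)]
        est_without.append(slope(x_wo, y))
        est_with.append(slope(x_w, y))

    # Pooled point estimates: mean over the m imputed data sets.
    return {
        "full_data": slope(x, y),
        "mi_without_outcome": statistics.fmean(est_without),
        "mi_with_outcome": statistics.fmean(est_with),
    }


result = simulate()
print(result)
```

In this setup, values imputed without the outcome are unrelated to it by construction, so the pooled coefficient is pulled toward zero roughly in proportion to the fraction imputed, whereas imputing with the outcome preserves the predictor-outcome association. This mirrors the direction of the bias reported in the results; the magnitudes here come from the toy assumptions, not from the study.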