Iris Eekhout (1), Henrica C W de Vet (2), Jos W R Twisk (3), Jaap P L Brand (4), Michiel R de Boer (5), Martijn W Heymans (3).
1. Department of Epidemiology and Biostatistics, VU University Medical Center, P.O. box 7057, 1007 MB Amsterdam, The Netherlands; EMGO Institute for Health and Care Research, VU University Medical Center, Van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands; Department of Methodology and Applied Biostatistics, Faculty of Earth and Life Sciences, Institute for Health Sciences, VU University, De Boelelaan 1085, 1081 HV Amsterdam, The Netherlands. Electronic address: i.eekhout@vumc.nl.
2. Department of Epidemiology and Biostatistics, VU University Medical Center, P.O. box 7057, 1007 MB Amsterdam, The Netherlands; EMGO Institute for Health and Care Research, VU University Medical Center, Van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands.
3. Department of Epidemiology and Biostatistics, VU University Medical Center, P.O. box 7057, 1007 MB Amsterdam, The Netherlands; EMGO Institute for Health and Care Research, VU University Medical Center, Van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands; Department of Methodology and Applied Biostatistics, Faculty of Earth and Life Sciences, Institute for Health Sciences, VU University, De Boelelaan 1085, 1081 HV Amsterdam, The Netherlands.
4. Skyline Diagnostics, Marconistraat 16, 3029 AK Rotterdam, The Netherlands.
5. Department of Methodology and Applied Biostatistics, Faculty of Earth and Life Sciences, Institute for Health Sciences, VU University, De Boelelaan 1085, 1081 HV Amsterdam, The Netherlands; Department of Public Health, University Medical Center Groningen, PO box 196, 9700 AD Groningen, The Netherlands.
Abstract
OBJECTIVES: Regardless of the proportion of missing values, complete-case analysis is the most frequently applied approach, although advanced techniques such as multiple imputation (MI) are available. The objective of this study was to explore the performance of simple and more advanced methods for handling missing data when some, many, or all item scores are missing in a multi-item instrument. STUDY DESIGN AND SETTING: Real-life missing data situations were simulated in a multi-item variable used as a covariate in a linear regression model. Various missing data mechanisms were simulated with an increasing percentage of missing data. Subsequently, several techniques for handling missing data were applied to identify the best-performing technique for each scenario. Fitted regression coefficients were compared using bias and coverage as performance parameters. RESULTS: Mean imputation caused biased estimates in every missing data scenario in which data were missing for more than 10% of the subjects. Furthermore, when a large percentage of subjects had missing items (>25%), MI methods applied to the items outperformed methods applied to the total score. CONCLUSION: We recommend applying MI to the item scores to obtain the most accurate regression model estimates. Moreover, we advise against using any form of mean imputation to handle missing data.
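The kind of comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' design: the number of items, the true slope of 0.5, the sample size, the missing-completely-at-random mechanism, and all function names are assumptions chosen for illustration. It compares complete-case analysis with person-mean imputation of missing items, and reports bias and coverage of the regression slope for the total score. MI is not shown because it would require an imputation library; the confidence interval uses a normal approximation (1.96 standard errors).

```python
import math
import random

TRUE_BETA = 0.5  # assumed true slope of the outcome on the total score


def simulate_subject(rng, n_items=5):
    """One subject: items share a latent trait; the outcome depends on the item total."""
    trait = rng.gauss(0, 1)
    items = [trait + rng.gauss(0, 1) for _ in range(n_items)]
    y = TRUE_BETA * sum(items) + rng.gauss(0, 1)
    return items, y


def make_missing(rng, items, p_subject=0.3, n_drop=2):
    """Assumed MCAR mechanism: with probability p_subject, blank n_drop random items."""
    items = list(items)
    if rng.random() < p_subject:
        for i in rng.sample(range(len(items)), n_drop):
            items[i] = None
    return items


def total_complete_case(items):
    """Total score, or None when any item is missing (subject is dropped)."""
    return None if any(v is None for v in items) else sum(items)


def total_person_mean(items):
    """Total score after replacing missing items by the subject's observed item mean."""
    observed = [v for v in items if v is not None]
    return len(items) * sum(observed) / len(observed)


def ols_slope(x, y):
    """Slope and standard error of a simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope, math.sqrt(sse / (n - 2) / sxx)


def run_simulation(n_reps=200, n_subjects=200, seed=1):
    """Return bias and 95% CI coverage of the slope for each handling method."""
    rng = random.Random(seed)
    methods = {"complete_case": total_complete_case,
               "person_mean": total_person_mean}
    estimates = {name: [] for name in methods}
    covered = {name: 0 for name in methods}
    for _ in range(n_reps):
        data = []
        for _ in range(n_subjects):
            items, y = simulate_subject(rng)
            data.append((make_missing(rng, items), y))
        for name, total_fn in methods.items():
            pairs = [(total_fn(items), y) for items, y in data]
            pairs = [(t, y) for t, y in pairs if t is not None]
            slope, se = ols_slope([t for t, _ in pairs], [y for _, y in pairs])
            estimates[name].append(slope)
            if slope - 1.96 * se <= TRUE_BETA <= slope + 1.96 * se:
                covered[name] += 1
    return {name: {"bias": sum(estimates[name]) / n_reps - TRUE_BETA,
                   "coverage": covered[name] / n_reps}
            for name in methods}


results = run_simulation()
for name, stats in results.items():
    print(f"{name}: bias={stats['bias']:.4f}, coverage={stats['coverage']:.2f}")
```

Under these assumptions, person-mean imputation adds noise to the total score of subjects with missing items, which tends to pull the estimated slope toward zero, whereas complete-case analysis stays roughly unbiased under this particular mechanism at the cost of discarding subjects. Other missing data mechanisms, as studied in the paper, can behave differently.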
Authors: Sara J Singer; Anna D Sinaiko; Maike V Tietschert; Michaela Kerrissey; Russell S Phillips; Veronique Martin; Grace Joseph; Hassina Bahadurzada; Denis Agniel Journal: Health Serv Res Date: 2020-12 Impact factor: 3.402
Authors: Katharina Ackermann; Marietta Kirchner; Anka Bernhard; Anne Martinelli; Chrysanthi Anomitri; Rosalind Baker; Sarah Baumann; Roberta Dochnal; Aranzazu Fernandez-Rivas; Karen Gonzalez-Madruga; Beate Herpertz-Dahlmann; Amaia Hervas; Lucres Jansen; Kristina Kapornai; Linda Kersten; Gregor Kohls; Ronald Limprecht; Helen Lazaratou; Ana McLaughlin; Helena Oldenhof; Jack C Rogers; Réka Siklósi; Areti Smaragdi; Esther Vivanco-Gonzalez; Christina Stadler; Graeme Fairchild; Arne Popma; Stephane A De Brito; Kerstin Konrad; Christine M Freitag Journal: J Abnorm Child Psychol Date: 2019-10
Authors: Katharina Ackermann; Anne Martinelli; Anka Bernhard; Christine M Freitag; Gerhard Büttner; Christina Schwenck Journal: Child Psychiatry Hum Dev Date: 2019-10
Authors: Nicole R van Veenendaal; Jennifer N Auxier; Sophie R D van der Schoor; Linda S Franck; Mireille A Stelwagen; Femke de Groof; Johannes B van Goudoever; Iris E Eekhout; Henrica C W de Vet; Anna Axelin; Anne A M W van Kempen Journal: PLoS One Date: 2021-06-09 Impact factor: 3.240
Authors: Dana Lee Olstad; Karen E Lamb; Lukar E Thornton; Sarah A McNaughton; David A Crawford; Leia M Minaker; Kylie Ball Journal: Int J Epidemiol Date: 2017-10-01 Impact factor: 7.196