Yi Pan (1), Yulei He (2), Ruiguang Song (3), Guoshen Wang (3), Qian An (3). 1. Division of HIV/AIDS Prevention, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, GA. Electronic address: jnu5@cdc.gov. 2. National Center of Health Statistics, Centers for Disease Control and Prevention, Hyattsville, MD. 3. Division of HIV/AIDS Prevention, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, GA.
Abstract
PURPOSE: Multiple imputation (MI) is a widely accepted approach to missing data problems in epidemiological studies. Composite variables are often used to summarize information from multiple, correlated items. This study aims to assess and compare different MI methods for handling missing categorical composite variables. METHODS: We investigate the problem in the context of a real application: estimating the prevalence of HIV transmission category, which is a composite variable generated by applying a hierarchical algorithm to a group of binary risk source variables from a national program data set. We use simulation studies to compare and assess the performance of alternative MI strategies. These strategies include the active imputation, just another variable, and passive imputation approaches. RESULTS: Our study suggests that the passive imputation approach performs better than the direct imputation approach, and that the inclusive and general imputation model (i.e., passive imputation with interactions) performs best. There is no need to embed the information from the variable-combining algorithm in the passive imputation modeling. CONCLUSION: We recommend that practitioners adopt an inclusive and general passive imputation modeling strategy. Published by Elsevier Inc.
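The passive imputation idea described in the abstract (impute the binary risk source variables, then derive the composite category from each completed data set) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the indicator names `risk_a`, `risk_b`, `risk_c`, the rule that the first indicator equal to 1 decides the category, and the imputation step itself, which draws each missing indicator independently at its observed rate. That step is a deliberately simple stand-in for a real imputation model; in particular, it does not reproduce the inclusive model with interactions that the study recommends.

```python
import random

# Hypothetical hierarchy: the first indicator equal to 1 decides the category.
HIERARCHY = ["risk_a", "risk_b", "risk_c"]
LABELS = HIERARCHY + ["none"]


def derive_category(record):
    """Apply the hierarchical rule to fully observed binary indicators."""
    for name in HIERARCHY:
        if record[name] == 1:
            return name
    return "none"


def impute_components(records, rng):
    """Fill each missing indicator (None) with a Bernoulli draw at its observed rate.

    Assumes every indicator is observed at least once. This is a toy
    stand-in for a proper imputation model, not the study's method.
    """
    rates = {}
    for name in HIERARCHY:
        observed = [r[name] for r in records if r[name] is not None]
        rates[name] = sum(observed) / len(observed)
    completed = []
    for r in records:
        filled = {}
        for name in HIERARCHY:
            if r[name] is None:
                filled[name] = int(rng.random() < rates[name])
            else:
                filled[name] = r[name]
        completed.append(filled)
    return completed


def passive_prevalence(records, m=20, seed=0):
    """Average the category prevalence over m imputed data sets.

    The composite category is never imputed directly; it is recomputed
    from the imputed components in every completed data set.
    """
    rng = random.Random(seed)
    totals = {label: 0.0 for label in LABELS}
    for _ in range(m):
        completed = impute_components(records, rng)
        categories = [derive_category(r) for r in completed]
        for label in LABELS:
            totals[label] += categories.count(label) / len(categories)
    return {label: totals[label] / m for label in LABELS}


if __name__ == "__main__":
    data = [
        {"risk_a": 1, "risk_b": 0, "risk_c": None},
        {"risk_a": 0, "risk_b": None, "risk_c": 1},
        {"risk_a": None, "risk_b": 1, "risk_c": 0},
        {"risk_a": 0, "risk_b": 0, "risk_c": 0},
    ]
    print(passive_prevalence(data))
```

Averaging the per-data-set prevalences gives the pooled point estimate; a fuller treatment would also combine within- and between-imputation variances.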
Authors: Guangyu Zhang; Charles E Rose; Yujia Zhang; Rui Li; Florence C Lee; Greta Massetti; Laura E Adams Journal: Int J Stat Med Res Date: 2022-01-28
Authors: Rameela Raman; Wencong Chen; Michael O Harhay; Jennifer L Thompson; E Wesley Ely; Pratik P Pandharipande; Mayur B Patel Journal: BMC Med Res Methodol Date: 2021-05-06 Impact factor: 4.615