BACKGROUND: Correlated data are ubiquitous in epidemiologic research, particularly in nutritional and environmental epidemiology, where mixtures of factors are often studied. Our objectives are to demonstrate how highly correlated data arise in epidemiologic research and to provide guidance, using a directed acyclic graph approach, on how to proceed analytically when faced with highly correlated data.

METHODS: We identified three fundamental structural scenarios in which high correlation between a given variable and the exposure can arise: intermediates, confounders, and colliders. For each scenario, we evaluated how increasing correlation between the given variable and the exposure affects the bias and variance of the estimated total effect of the exposure on the outcome, using unadjusted and adjusted models. We derived closed-form solutions for continuous outcomes using linear regression and present empirical findings for binary outcomes using logistic regression.

RESULTS: For properly specified models, total effect estimates remained unbiased even when there was almost perfect correlation between the exposure and a given intermediate, confounder, or collider. In general, as the correlation increased, the variance of the parameter estimate for the exposure increased in the adjusted models, while in the unadjusted models the variance increased to a lesser extent or decreased.

CONCLUSION: Our findings highlight the importance of considering the causal framework under study when specifying regression models. Strategies that do not take the causal structure into consideration may lead to biased effect estimation for the original question of interest, even under high correlation.
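The pattern described in the RESULTS can be illustrated with a small simulation for the continuous-outcome case. The sketch below is not the authors' analysis (they report closed-form solutions); it is a minimal illustration under assumed settings: standard normal variables, unit error variance, a total effect of 1.0, and, in the intermediate scenario, an effect split evenly between a direct and an indirect path. The function names `ols_slope` and `simulate` and all parameter values are hypothetical choices made for this example. The collider scenario is omitted for brevity.

```python
import numpy as np


def ols_slope(y, *covariates):
    """Coefficient on the first covariate from an OLS fit with an intercept."""
    design = np.column_stack([np.ones_like(y), *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


def simulate(scenario, rho, n=200, reps=1000, total_effect=1.0, seed=0):
    """Mean and variance of the exposure coefficient over simulated data sets.

    rho is the correlation between the exposure x and the third variable z.
    All distributions and effect sizes are illustrative assumptions.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    rng = np.random.default_rng(seed)
    noise_sd = np.sqrt(1.0 - rho**2)  # keeps Var(x) = Var(z) = 1
    est = np.empty((reps, 2))
    for r in range(reps):
        if scenario == "confounder":
            # z -> x and z -> y: the adjusted model is the correct one.
            z = rng.standard_normal(n)
            x = rho * z + noise_sd * rng.standard_normal(n)
            y = total_effect * x + z + rng.standard_normal(n)
        elif scenario == "intermediate":
            # x -> z -> y plus a direct path: the unadjusted model is correct.
            x = rng.standard_normal(n)
            z = rho * x + noise_sd * rng.standard_normal(n)
            direct = total_effect / 2
            # Indirect path contributes rho * (direct / rho) = direct.
            y = direct * x + (direct / rho) * z + rng.standard_normal(n)
        else:
            raise ValueError("scenario must be 'confounder' or 'intermediate'")
        est[r] = ols_slope(y, x), ols_slope(y, x, z)
    return {
        "unadjusted": (est[:, 0].mean(), est[:, 0].var(ddof=1)),
        "adjusted": (est[:, 1].mean(), est[:, 1].var(ddof=1)),
    }


if __name__ == "__main__":
    for scenario in ("confounder", "intermediate"):
        for rho in (0.3, 0.6, 0.95):
            print(scenario, rho, simulate(scenario, rho))
```

Under these assumptions, the adjusted estimate in the confounder scenario should stay centered on the total effect while its variance grows as `rho` approaches 1, and the unadjusted estimate in the intermediate scenario should stay centered on the total effect, whereas adjusting for the intermediate recovers only the direct path.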