Sebastien Haneuse (1), Scott Bartell.
(1) Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA. shaneuse@hsph.harvard.edu
Abstract
BACKGROUND: Studies of ecologic or aggregate data suffer from a broad range of biases when scientific interest lies with individual-level associations. To overcome these biases, epidemiologists can choose from a range of designs that combine these group-level data with individual-level data. The individual-level data provide information to identify, evaluate, and control bias, whereas the group-level data are often readily accessible and provide gains in efficiency and power. Within this context, the literature on developing models, particularly multilevel models, is well-established, but little work has been published to help researchers choose among competing designs and plan additional data collection.

METHODS: We review recently proposed "combined" group- and individual-level designs and methods that collect and analyze data at 2 levels of aggregation. These include aggregate data designs, hierarchical related regression, two-phase designs, and hybrid designs for ecologic inference.

RESULTS: The various methods differ in (i) the data elements available at the group and individual levels and (ii) the statistical techniques used to combine the 2 data sources. Implementing these techniques requires care, and it may often be simpler to ignore the group-level data once the individual-level data are collected. A simulation study, based on birth-weight data from North Carolina, is used to illustrate the benefit of incorporating group-level information.

CONCLUSIONS: Our focus is on settings where there are individual-level data to supplement readily accessible group-level data. In this context, no single design is ideal. Choosing which design to adopt depends primarily on the model of interest and the nature of the available group-level data.
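The ecologic bias described in the background can be illustrated with a small simulation. The sketch below is not the authors' simulation study and does not use the North Carolina birth-weight data; the risk model, the parameter values, and the function names (`simulate`, `ecologic_slope`, `within_group_risk_difference`) are all assumptions made for illustration. Each group's baseline risk is made to depend on its exposure prevalence, so a regression on group-level means overstates the individual-level risk difference, whereas a within-group comparison based on individual-level data recovers it.

```python
import numpy as np


def simulate(n_groups=200, n_per_group=500, seed=0):
    """Simulate binary exposure x and outcome y for individuals in groups.

    Assumed (hypothetical) risk model:
        P(y = 1) = 0.1 + 0.1 * x + 0.3 * prevalence_of_group
    so the true individual-level risk difference is 0.1, while the
    group's exposure prevalence acts as a group-level confounder.
    """
    rng = np.random.default_rng(seed)
    prevalence = rng.uniform(0.1, 0.9, size=n_groups)
    x = rng.random((n_groups, n_per_group)) < prevalence[:, None]
    risk = 0.1 + 0.1 * x + 0.3 * prevalence[:, None]
    y = rng.random((n_groups, n_per_group)) < risk
    return x, y


def ecologic_slope(x, y):
    """Least-squares slope of group outcome means on group exposure means."""
    xbar = x.mean(axis=1)
    ybar = y.mean(axis=1)
    slope, _intercept = np.polyfit(xbar, ybar, 1)
    return float(slope)


def within_group_risk_difference(x, y):
    """Average over groups of (risk among exposed) - (risk among unexposed)."""
    diffs = []
    for xg, yg in zip(x, y):
        # Skip groups where one exposure level is empty.
        if xg.any() and (~xg).any():
            diffs.append(yg[xg].mean() - yg[~xg].mean())
    return float(np.mean(diffs))


if __name__ == "__main__":
    x, y = simulate()
    print("ecologic (group-level) slope:", round(ecologic_slope(x, y), 3))
    print("within-group risk difference:",
          round(within_group_risk_difference(x, y), 3))
```

Under these assumed parameters, the group-level slope is expected to fall near 0.4 (the individual effect of 0.1 plus the contextual effect of 0.3), while the within-group estimate is expected to fall near the true value of 0.1. The combined designs reviewed here go further than this sketch by using both data sources in a single analysis.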