Torleif Halkjelsvik, Antonio Gasparrini, Rannveig Kaldager Hart.
Abstract
Greater availability of administrative data and better infrastructure for electronic surveys allow for large sample sizes in evaluations of national and other large-scale policies. Although larger datasets have many advantages, the use of big disaggregate data (e.g., on individuals, households, stores, municipalities) can be challenging in terms of statistical inference. Measurements made at the same point in time may be jointly influenced by contemporaneous factors and produce more variation across time than suggested by the model. This excess variation, or co-movement over time, produces observations that are not truly independent (i.e., cross-sectional dependence). If this dependency is not accounted for, statistical uncertainty will be underestimated, and studies may indicate reform effects where there are none. In the context of interrupted time series (segmented regression), we illustrate the potential for bias in inference when using large disaggregate data, and we describe two simple solutions that are available in standard statistical software.
Keywords: Contemporaneous error; Cross-sectional dependence; Disaggregate data; Individual-level data; Interrupted time series; Intervention; Public health; Segmented regression
Year: 2022 PMID: 35057853 PMCID: PMC8772208 DOI: 10.1186/s13690-022-00795-5
Source DB: PubMed Journal: Arch Public Health ISSN: 0778-7367
Fig. 1. The effect of ignoring the time variable in analyses of policy change on disaggregate data. The Cluster-Robust Standard Error (Rob. SE) with clustering on Stores gives increasingly biased standard errors as a function of increasing sample size (panels A to C). The Rob. SE with clustering on both Stores and Time gives results in line with the analysis on aggregated data (panel D)
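
The contrast described in the Fig. 1 caption can be reproduced in a small simulation. The sketch below is not the authors' code: it assumes a balanced panel of hypothetical stores with a shared shock at each time point and no true policy effect, and it computes the sandwich estimators by hand with NumPy, without small-sample corrections. All function names, parameter values, and the data-generating process are assumptions made for illustration.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares: coefficients, residuals and (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    return beta, y - X @ beta, XtX_inv

def cluster_cov(X, resid, XtX_inv, groups):
    """Cluster-robust sandwich covariance (no small-sample correction)."""
    _, idx = np.unique(groups, return_inverse=True)
    idx = idx.ravel()
    sums = np.zeros((idx.max() + 1, X.shape[1]))
    np.add.at(sums, idx, X * resid[:, None])  # sum the scores within each cluster
    return XtX_inv @ (sums.T @ sums) @ XtX_inv

def compare_standard_errors(n_stores=200, n_times=48, seed=0):
    """Simulate store-level data with a common time shock and NO policy effect."""
    rng = np.random.default_rng(seed)
    store = np.repeat(np.arange(n_stores), n_times)
    time = np.tile(np.arange(n_times), n_stores)
    post = (time >= n_times // 2).astype(float)   # hypothetical reform at mid-series
    shock = rng.normal(0.0, 1.0, n_times)         # contemporaneous shock shared by all stores
    y = 10 + 0.05 * time + shock[time] + rng.normal(0.0, 1.0, time.size)

    # Segmented regression on disaggregate data: intercept, trend, level change.
    X = np.column_stack([np.ones(time.size), time, post])
    beta, resid, XtX_inv = ols(X, y)
    v_store = cluster_cov(X, resid, XtX_inv, store)
    v_time = cluster_cov(X, resid, XtX_inv, time)
    v_cell = cluster_cov(X, resid, XtX_inv, store * n_times + time)
    v_two_way = v_store + v_time - v_cell         # clustering on both stores and time

    # Same model on data aggregated to one mean per time point.
    t = np.arange(n_times)
    y_bar = y.reshape(n_stores, n_times).mean(axis=0)
    Xa = np.column_stack([np.ones(n_times), t, (t >= n_times // 2).astype(float)])
    beta_a, resid_a, XtX_inv_a = ols(Xa, y_bar)
    v_agg = XtX_inv_a * (resid_a @ resid_a) / (n_times - Xa.shape[1])

    return {
        "effect": beta[2],
        "effect_aggregated": beta_a[2],
        "se_store": np.sqrt(v_store[2, 2]),
        "se_two_way": np.sqrt(v_two_way[2, 2]),
        "se_aggregated": np.sqrt(v_agg[2, 2]),
    }

if __name__ == "__main__":
    for name, value in compare_standard_errors().items():
        print(f"{name}: {value:.4f}")
```

In this balanced setup the level-change estimate is identical in the disaggregate and aggregated analyses; only the standard errors differ. Because the regressors vary only over time, the store-clustered standard error reflects just the store-specific noise and so shrinks as stores are added, whereas the two-way and aggregated standard errors retain the common time shock.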