Nguyen K Tran, Timothy L Lash, Neal D Goldstein.
Abstract
As an inherent part of epidemiologic research, practical decisions made during data collection and analysis have the potential to impact the measurement of disease occurrence as well as statistical and causal inference from the results. However, the computational skills needed to collect, manipulate, and evaluate data have not always been a focus of educational programs, and the increasing interest in "data science" suggests that data literacy has become paramount to ensure valid estimation. In this article, we first motivate such practical concerns for the modern epidemiology student, particularly as they relate to challenges in causal inference; second, we discuss how such concerns may manifest in typical epidemiological analyses and identify the potential for bias; third, we present a case study that exemplifies the entire process; and finally, we draw attention to resources that can help epidemiology students connect the theoretical underpinnings of the science to the practical considerations described herein.
Keywords: Biostatistics; Causal inference; Data science; Education and training; Epidemiology
Year: 2021 PMID: 35844206 PMCID: PMC9286486 DOI: 10.1016/j.gloepi.2021.100066
Source DB: PubMed Journal: Glob Epidemiol ISSN: 2590-1133
Summary of the potential impact on causal inference given various practical considerations of epidemiological data.
| Research stage | Practical consideration | Hypothetical example | Potential impact on inference | Possible technical solutions |
|---|---|---|---|---|
| Data Management | 1. Missing data | Complete case analysis omits important data | Selection or information bias | Multiple imputations using chained equations [ |
| Data Management | 2. Duplicate observations | Data reported from a registry without de-duplication | Selection bias | Calculate predicted values of record linkage [ |
| Data Management | 3. Inconsistent variable definition | Data linkage with resulting inconsistent operationalization | Information bias | Comparison of linked and unlinked data, sensitivity analysis of linkage procedure [ |
| Analysis | 4. Study design | Failure to consider an appropriate model for the survey design | Biased error | Evaluate research questions and hypotheses to implement appropriate model |
| Analysis | 5. Model specification and assumption | Unresolved heteroskedasticity, relationship not linear, or correlated observations | Biased error | Evaluate distributions of data and use of model diagnostics |
| Analysis | 6. Variable selection | Inclusion or omission of covariates in the statistical model, mismeasurement of key variables | Uncontrolled confounding or information bias | Causal diagrams [ |
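The first two considerations in the table (missing data and duplicate observations) can be illustrated with a minimal sketch. It assumes a hypothetical toy registry extract: the field names (`id`, `age`, `case`), the values, and the helper functions are all invented for illustration and do not come from the article. It does not implement multiple imputation or record linkage; it only shows how an estimated proportion of cases can shift when duplicates are left in or when incomplete records are dropped.

```python
# Hypothetical toy registry extract; every field name and value is invented.
records = [
    {"id": 1, "age": 34, "case": 1},
    {"id": 1, "age": 34, "case": 1},    # duplicate report of the same person
    {"id": 2, "age": 51, "case": 0},
    {"id": 3, "age": None, "case": 1},  # covariate missing
    {"id": 4, "age": 47, "case": 0},
    {"id": 5, "age": None, "case": 1},  # covariate missing
    {"id": 6, "age": 29, "case": 0},
]


def deduplicate(rows, key="id"):
    """Keep the first record seen for each value of `key`."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def complete_cases(rows, fields):
    """Keep only records with no missing value in any of `fields`."""
    return [r for r in rows if all(r[f] is not None for f in fields)]


def proportion(rows, field="case"):
    """Proportion of records with `field` equal to 1."""
    return sum(r[field] for r in rows) / len(rows)


raw_estimate = proportion(records)            # duplicates still included
deduped = deduplicate(records)
dedup_estimate = proportion(deduped)          # one record per person
cc_estimate = proportion(complete_cases(deduped, ["age"]))  # complete cases only

print(f"raw: {raw_estimate:.3f}")             # raw: 0.571
print(f"de-duplicated: {dedup_estimate:.3f}") # de-duplicated: 0.500
print(f"complete case: {cc_estimate:.3f}")    # complete case: 0.250
```

In this invented example the records with a missing `age` are all cases, so the complete-case estimate falls well below the de-duplicated one. This is the kind of shift the table classifies as selection or information bias. Real data would call for the approaches named in the table, such as multiple imputation, rather than this simple comparison.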