Saloni Dattani, David M Howard, Cathryn M Lewis, Pak C Sham.
Abstract
As research in genetics has advanced, some findings have been unexpected or shown to be inconsistent between studies or datasets. The reasons these inconsistencies arise are complex. Results from genetic studies can be affected by various factors including statistical power, linkage disequilibrium, quality control, confounding and selection bias, as well as real differences from interactions and effect modifiers, which may be informative about the mechanisms of traits and disease. Statistical artefacts can manifest as differences between results but they can also conceal underlying differences, which implies that their critical examination is important for understanding the underpinnings of traits. In this review, we examine these factors and outline how they can be identified and conceptualised with structural causal models. We explain the consequences they have on genetic estimates, such as genetic associations, polygenic scores, family- and genome-wide heritability, and describe methods to address them to aid in the estimation of true effects of genetic variation. Clarifying these factors can help researchers anticipate when results are likely to diverge and aid researchers' understanding of causal relationships between genes and complex traits.
Keywords: GWAS; causal inference; confounding; consistency; heritability; replications; selection bias
Year: 2022 PMID: 35652173 PMCID: PMC9544854 DOI: 10.1002/gepi.22459
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.344
Figure 1 (a–g) Directed acyclic graphs depicting causal relationships in genetic association analyses, in which nodes (variables) are connected with each other by arcs. Dashed arcs represent non‐causal statistical associations, while filled arcs represent causal statistical associations. Boxes represent variables which have been selected on, for example, by regression adjustment or inclusion/exclusion criteria in a study. Panels (a) and (b) represent effect modification, where the magnitude or direction of a causal effect is modified by a third variable, which acts upon a mediating mechanism. Panels (c) and (d) represent confounding, where a presumed exposure and outcome have a shared cause. Panels (e–g) depict selection bias, in which a presumed exposure and outcome both affect a third variable which is selected upon in the analysis. CNV, copy number variant; PRS, polygenic risk score; SNP, single nucleotide polymorphism.
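The selection-bias structure described for panels (e–g) can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the variable names (`prs`, `trait`), the selection rule and the sample size are hypothetical and are not taken from the review. A score and a trait are simulated as independent, and an association appears only once the analysis is restricted to individuals selected on a variable that both of them affect.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_selection_bias(n=20000, seed=1):
    """Return (correlation in the full population, correlation among the selected)."""
    rng = random.Random(seed)
    # Hypothetical polygenic score and trait, simulated as independent.
    prs = [rng.gauss(0, 1) for _ in range(n)]
    trait = [rng.gauss(0, 1) for _ in range(n)]
    # Assumed selection rule: both variables raise the chance of taking part.
    selected = [
        (g, y) for g, y in zip(prs, trait) if g + y + rng.gauss(0, 1) > 0
    ]
    r_full = pearson(prs, trait)
    r_selected = pearson([g for g, _ in selected], [y for _, y in selected])
    return r_full, r_selected


if __name__ == "__main__":
    r_full, r_selected = simulate_selection_bias()
    print(f"full population r = {r_full:.3f}")
    print(f"selected sample r = {r_selected:.3f}")
```

Under these assumptions the correlation is close to zero in the full population and negative in the selected sample, even though neither variable affects the other.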
Approaches to identify the impact of residual confounding or selection bias with negative controls.
| Approach | Aim and description | Methods | Notes |
|---|---|---|---|
| Identification | To identify the presence and potential impact of residual confounding or selection bias. Researchers can use negative controls, which are situations in which the exposure cannot have its hypothesised effect. Researchers can also use positive controls, which are situations in which an exposure should show a known effect. | Negative control exposures: a comparison to the procedure without the presence of the exposure (analogous to placebo controls in randomised controlled trials). This is used to detect whether the procedure or analysis would identify effects regardless of the presence of the exposure, due to residual confounders or selection bias. (Lipsitch et al., | Requires domain knowledge of the ‘active ingredient’ of the exposure and the potential sources of confounding: leads to underestimating the effect of the exposure if the negative control itself has effects on the outcome; leads to overestimating the effect of the exposure if the negative control does not adequately cover the procedure or analysis method used for analysing the exposure |
| | | Negative control outcomes: a comparison to outcomes which are not expected to be affected by the exposure. This is used to detect whether the analysis method would identify outcomes that should be unaffected by the exposure, but may still be affected by residual confounders or selection bias (Dusetzina et al., | Requires domain knowledge of outcomes that are unaffected by the exposure: leads to overestimating the effect of the exposure if the negative control outcome is actually unaffected by residual confounders |
Note: Shown are the aims of these methods, approaches used to achieve these aims and notes on their usage.
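The logic of a negative control outcome can be sketched with simulated data. Everything in the example is an assumption made for illustration: the unmeasured confounder `u`, the effect sizes and the function names are hypothetical. The exposure has no effect on the negative control outcome by construction, so a non-zero regression slope for that outcome indicates residual confounding in the analysis.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def negative_control_demo(n=20000, seed=2, true_effect=0.5):
    """Return (slope for the real outcome, slope for the negative control outcome)."""
    rng = random.Random(seed)
    # Hypothetical unmeasured confounder that affects every other variable.
    u = [rng.gauss(0, 1) for _ in range(n)]
    exposure = [ui + rng.gauss(0, 1) for ui in u]
    outcome = [
        true_effect * xi + ui + rng.gauss(0, 1) for xi, ui in zip(exposure, u)
    ]
    # Negative control outcome: shares the confounder, has no exposure term.
    negative_control = [ui + rng.gauss(0, 1) for ui in u]
    return ols_slope(exposure, outcome), ols_slope(exposure, negative_control)


if __name__ == "__main__":
    slope_outcome, slope_control = negative_control_demo()
    print(f"slope for outcome          = {slope_outcome:.3f}")
    print(f"slope for negative control = {slope_control:.3f}")
```

In this simulation the slope for the negative control outcome is clearly non-zero, and the slope for the real outcome overstates the simulated effect of 0.5. As the table notes, interpreting such a result in real data depends on domain knowledge that the control outcome is truly unaffected by the exposure.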
Approaches to quantify or minimise the impact of residual confounding.
| Approach | Aim and description | Methods | Strengths (+) and limitations (−) |
|---|---|---|---|
| Quantification | To estimate the impacts of residual confounding in the sample (Liu et al., | Target‐adjustment sensitivity analysis: estimate the values of bias parameters required to overturn the results observed (Cinelli & Hazlett, | (+) Simple and easy to implement with statistical software, for example, (+) Some identify a range of values of bias parameters, for example, standardised mean difference or the partial (+) Bias parameter values can easily be obtained from summary statistics in the literature, for example, in the form of odds ratios (−) Requires domain knowledge of the confounders (−) Tend to assume no effect modification and no three‐way interaction between exposure, unobserved confounder and outcome (Rosenbaum, |
| | | Fixed‐bias parameter analysis: estimate the underlying effect size using fixed values of bias parameters, with confidence intervals to show their impact (Greenland, | (+) Easy to implement and interpret with free software in R, Stata and Excel, for example, (+) Relaxes the assumption of no three‐way interaction between exposure, unobserved confounder and outcome |
| Study design | To ensure that the study population is selected in such a way that the impact of confounders is reduced, using exclusion or inclusion criteria to limit the variation in known confounders (C. Y. Lu, | Exclusion or inclusion criteria: restrict the study population to categories where variation in confounders is limited | (−) Requires domain knowledge of the relevant confounders (−) Residual confounding may remain within categories that are included in the study (−) Reduces the external validity of results |
| Covariate adjustment | To minimise the statistical impact of known confounders, using domain knowledge of the variables that can lead to confounding. | Regression: estimate and adjust for the relationship between the covariate, exposure and outcome; Matching: estimate and adjust for the closest matches between the exposure and control groups on covariates; Stratification: analyse the data within strata of the covariates | (−) Requires domain knowledge of the relevant confounders; colliders could be adjusted for unintentionally (−) Residual confounding may remain within strata or within covariates that are estimated with noise (−) Matching methods reduce statistical power by retaining only matched sets |
| Propensity scoring | To minimise the statistical impact of known confounders, by estimating the propensity for participants to receive the treatment and matching, stratifying or adjusting for this score. | Propensity score matching, stratification, adjustment as covariates: estimate the likelihood that participants will receive the exposure, and match, stratify or adjust for these propensity scores (Austin, | (−) Requires domain knowledge of the relevant confounders; colliders could be added unintentionally (−) Matching retains only matched subsets, which reduces statistical power |
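The idea behind target‐adjustment sensitivity analysis, asking how strong unmeasured confounding would need to be to overturn a result, can be made concrete with the E-value of VanderWeele and Ding. This is offered as one well-known instance of the idea and is an assumption of this sketch: the table entries above do not confirm which software or statistic the authors had in mind. The function names are hypothetical. The E-value is defined on the risk ratio scale, and odds ratios approximate risk ratios only when the outcome is rare.

```python
import math


def e_value(rr):
    """E-value for a risk ratio point estimate: rr + sqrt(rr * (rr - 1))."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective effects are handled by inverting the ratio
    return rr + math.sqrt(rr * (rr - 1))


def e_value_for_ci(lower, upper):
    """E-value for the confidence limit closest to the null (1.0 if the CI includes 1)."""
    if lower <= 1 <= upper:
        return 1.0
    limit = lower if lower > 1 else upper
    return e_value(limit)


if __name__ == "__main__":
    # Illustrative numbers only, not results from any study.
    print(f"E-value for RR = 2.0: {e_value(2.0):.2f}")
    print(f"E-value for CI (1.5, 3.0): {e_value_for_ci(1.5, 3.0):.2f}")
```

A larger E-value means that an unmeasured confounder would need stronger associations with both exposure and outcome to fully explain away the observed estimate.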
Approaches to quantify or minimise the impact of residual selection bias.
| Approach | Aim and description | Methods | Strengths (+) and limitations (−) |
|---|---|---|---|
| Quantification | To estimate the impact of selection into the study sample (Lash et al., | Target‐adjustment sensitivity analysis: estimate the values of bias parameters required to overturn the results observed. | (+) Simple to implement and interpret (−) Uninformative about the range of plausible effect sizes (−) Difficult to apply when multiple sources of bias are present |
| | | Fixed‐bias parameter analysis: estimate the underlying effect size using fixed values of bias parameters, with confidence intervals to show their impact (Lash et al., | (+) Simple to implement and interpret |
| | | Probabilistic bias analysis: estimate the underlying effect size using a distribution of values of bias parameters (such as uniform, normal or triangular distributions) (Knox et al., | (+) Can be implemented with software such as (−) Requires domain knowledge for choice of probabilistic distribution |
| Study design | To adjust the study design to ensure the retained sample matches the population of interest on relevant characteristics | Adherence: increase uptake of the treatment or measurement in the study; Non‐response: increase response rates to the measurement in the study; Dropouts: reduce dropouts and withdrawals from the study | (−) Can be unfeasible or impractical (−) Inapplicable to datasets that have already been collected |
| Covariate adjustment | To minimise the impact of selection bias, by breaking the association between exposure and selection variables | Avoidance of adjustment for colliders: identify potential colliders and avoid their adjustment (Cinelli, Forney, et al., | (−) Requires domain knowledge of causal relations to identify colliders |
| | | Adjustment for covariates that affect the exposure and selection into the study: identify causes of selection into the study and stratify, adjust, match or exclude data at levels of selection (Hernán et al., | (−) Requires domain knowledge of causal relations (−) Can only be applied to measured covariates that affect exposure and selection into the study (−) Inapplicable when the exposure is also affected by other variables that affect these covariates |
| Propensity score weighting | To minimise the impact of selection bias, by estimating the likelihood of participants' inclusion in the sample and inversely weighting these likelihoods | Inverse probability weighting (IPW): weight participants inversely according to their likelihood to participate in the study (i.e., participants who are the least likely to participate are upwardly weighted) (Hernán et al., | (+) More flexible than covariate adjustment, because additional covariates need not be measured and effect estimates are unconditional of them (−) Requires domain knowledge and measurement of variables associated with selection into the study |
| Multiple imputation | To minimise the impact of selection bias from missing data, by modelling the distribution of missing values given the observed data, and predicting and filling them (Huque et al., | Joint modelling: impute missing values, with the assumption that incomplete variables follow a multivariate normal distribution; Fully conditional specification: impute missing values, with the assumption that incomplete variables follow a univariate conditional distribution given the other variables | (+) Generally more efficient than IPW to address missing data, because it can use information from participants with partially missing data (−) Only applicable when selection bias is at the level of missing data (−) Difficult and imprecise when participants with missing data tend to have missing values on most variables (−) Difficult to specify correct model when there are many variables to be imputed or if there are interactions in the analysis model |
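Inverse probability weighting, as described in the table, can be sketched in a few lines. The example is a simplified illustration under stated assumptions: a single hypothetical binary covariate `z` drives both the outcome and participation, the participation probabilities are invented, and `z` is assumed to be known for non-participants so that participation rates can be estimated within each level. Real applications would typically estimate participation with a model over several measured covariates.

```python
import random


def ipw_demo(n=50000, seed=3):
    """Return (population mean, naive sample mean, IPW-weighted sample mean)."""
    rng = random.Random(seed)
    population = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0      # hypothetical binary covariate
        y = 1.0 * z + rng.gauss(0, 1)           # outcome depends on z
        p_participate = 0.8 if z == 1 else 0.2  # assumed selection mechanism
        took_part = rng.random() < p_participate
        population.append((z, y, took_part))

    sample = [(z, y) for z, y, s in population if s]

    # Estimate the participation probability within each level of z. This
    # assumes z is known for non-participants too (e.g. census-style data).
    p_hat = {}
    for level in (0, 1):
        total = sum(1 for z, _, _ in population if z == level)
        taking_part = sum(1 for z, _ in sample if z == level)
        p_hat[level] = taking_part / total

    pop_mean = sum(y for _, y, _ in population) / n
    naive_mean = sum(y for _, y in sample) / len(sample)
    # Participants who were less likely to take part receive larger weights.
    weights = [1 / p_hat[z] for z, _ in sample]
    ipw_mean = sum(w * y for w, (_, y) in zip(weights, sample)) / sum(weights)
    return pop_mean, naive_mean, ipw_mean


if __name__ == "__main__":
    pop_mean, naive_mean, ipw_mean = ipw_demo()
    print(f"population mean   = {pop_mean:.3f}")
    print(f"naive sample mean = {naive_mean:.3f}")
    print(f"IPW sample mean   = {ipw_mean:.3f}")
```

In this simulation the unweighted sample mean is pulled towards the over-represented group, while the weighted mean lies close to the population mean. The correction works here only because the variable driving selection is measured, which mirrors the limitation listed in the table.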