Lavinia Paternoster, Kate Tilling, George Davey Smith.
Abstract
The past decade has been proclaimed as a hugely successful era of gene discovery through the high yields of many genome-wide association studies (GWAS). However, much of the perceived benefit of such discoveries lies in the promise that the identification of genes that influence disease would directly translate into the identification of potential therapeutic targets, but this has yet to be realized at a level reflecting expectation. One reason for this, we suggest, is that GWAS, to date, have generally not focused on phenotypes that directly relate to the progression of disease and thus speak to disease treatment.
Year: 2017 PMID: 28981501 PMCID: PMC5628782 DOI: 10.1371/journal.pgen.1006944
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Fig 1. DAG of the Mendelian randomization method.
Abbreviation: DAG, directed acyclic graph.
Fig 2. DAG demonstrating the issue of collider bias in studies with participants selected according to disease status.
In this situation, collider bias can induce an association (dashed line) between any factors (A, C, and U) that affect disease incidence (or other study selection criteria). When 1 or more of these factors are also associated with disease progression (C, U), a path is opened up from A to disease progression through the induced association. If A is a genetic risk factor, it can appear that A is associated with disease progression only because of the induced association with C or U. If C is measured and can be adjusted for, then the induced association through C is blocked, but unmeasured U cannot be adjusted for in the analysis. Only when the genetic risk factor for progression is not also a risk factor for incidence (i.e., B) will it be unaffected by selection bias. The arrows in Fig 2 show causal paths between variables (e.g., variable A causes disease incidence). A collider is a variable that has 2 arrows entering it, e.g., disease incidence. A path is blocked by a collider: the path from A to disease progression is blocked by disease incidence. If a collider is conditioned on, then that path is unblocked: if disease incidence is conditioned upon (as when only cases are studied), the path from A to disease progression becomes unblocked and collider bias may occur. Abbreviation: DAG, directed acyclic graph.
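The collider mechanism described in this caption can be illustrated with a small simulation. The sketch below is not the authors' code: the function name `collider_demo`, the intercept, the OR of 3 for U, and the progression coefficient of 1.5 are illustrative assumptions, and only A and U from Fig 2 are modelled. A and U are generated independently, both raise disease risk, and progression depends on U alone, so any A-progression slope among cases is induced purely by selecting on disease.

```python
import numpy as np


def collider_demo(n=1_000_000, seed=0):
    """Selection on disease status with two independent causes, A and U.

    All effect sizes are illustrative assumptions, not values taken
    from the authors' analysis code.
    """
    rng = np.random.default_rng(seed)
    a = rng.binomial(2, 0.2, size=n).astype(float)  # genotype A, MAF 0.2
    u = rng.standard_normal(n)                      # unmeasured factor U

    # Logistic incidence model; the intercept is hand-picked (assumption)
    # to give a prevalence of roughly 0.2.
    lin = -1.8 + np.log(1.3) * a + np.log(3.0) * u
    case = rng.random(n) < 1.0 / (1.0 + np.exp(-lin))

    # Progression depends on U only: the true effect of A is zero.
    progression = 1.5 * u + rng.standard_normal(n)

    def slope(x, y):
        # Simple least-squares slope of y on x.
        return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

    return {
        "prevalence": case.mean(),
        "corr_A_U_everyone": np.corrcoef(a, u)[0, 1],
        "corr_A_U_cases": np.corrcoef(a[case], u[case])[0, 1],
        "slope_A_progression_everyone": slope(a, progression),
        "slope_A_progression_cases": slope(a[case], progression[case]),
    }


if __name__ == "__main__":
    for name, value in collider_demo().items():
        print(f"{name}: {value:.4f}")
```

In the full simulated population the A-U correlation and the A-progression slope should sit near zero, while restricting to cases should push both below zero, which is the induced (dashed-line) association acting through U.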
Fig 3. DAG to demonstrate how the introduction of collider bias through the selection of cases (grey paths) can impact an MR analysis between an exposure and disease progression as an outcome.
Associations are induced because the SNP causes disease (via the exposure), and thus conditioning on disease induces an association between all variables causing disease. In a model not adjusting for exposure (e.g., relating SNP to progression), there is an association between SNP and the confounders, which biases the SNP-progression association. Abbreviations: DAG, directed acyclic graph; MR, Mendelian randomization.
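A minimal sketch of how case selection could distort an MR estimate is given below. It assumes a single SNP instrument, a Wald ratio estimator (SNP-outcome covariance divided by SNP-exposure covariance), a disease caused by the exposure, and a true causal effect of exposure on progression of zero. The names `wald_ratio` and `mr_selection_demo` and every coefficient are hypothetical choices made for illustration, not the authors' model.

```python
import numpy as np


def wald_ratio(g, x, y):
    """Single-instrument MR estimate: cov(SNP, outcome) / cov(SNP, exposure)."""
    return np.cov(g, y)[0, 1] / np.cov(g, x)[0, 1]


def mr_selection_demo(n=1_000_000, seed=1):
    """Compare the Wald ratio in everyone with the Wald ratio in cases only.

    Effect sizes are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.2, size=n).astype(float)  # SNP (instrument)
    u = rng.standard_normal(n)                      # confounder of exposure and progression
    x = 0.5 * g + u + rng.standard_normal(n)        # exposure

    # Disease is caused by the exposure (assumed OR of 3 per unit);
    # the intercept is hand-picked for a prevalence near 0.2.
    lin = -2.2 + np.log(3.0) * x
    case = rng.random(n) < 1.0 / (1.0 + np.exp(-lin))

    # Progression depends on the confounder only: the true causal
    # effect of the exposure on progression is zero by construction.
    y = u + rng.standard_normal(n)

    return {
        "prevalence": case.mean(),
        "wald_everyone": wald_ratio(g, x, y),
        "wald_cases": wald_ratio(g[case], x[case], y[case]),
    }


if __name__ == "__main__":
    for name, value in mr_selection_demo().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions the estimate in the whole population should be close to the true value of zero, whereas the cases-only estimate should be shifted away from zero because selection on disease links the SNP to the confounder.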
Estimated effects of the risk factor for incidence only (A) and the risk factor for incidence and progression (C) from Fig 2 under different degrees of unmeasured confounding of incidence and progression.
| | Degree of confounding by unmeasured confounder(s) (U) | | | |
|---|---|---|---|---|
| Simulated scenario | Low | Moderate | High | Strong |
| | OR for disease = 1.5 | OR for disease = 2 | OR for disease = 2.5 | OR for disease = 3 |
| | Beta for progression = 0.5 | Beta for progression = 0.8 | Beta for progression = 1 | Beta for progression = 1.5 |
| | −0.01 (0.01) | −0.02 (0.02) | −0.03 (0.02) | −0.06 (0.03) |
| | 90% | 78% | 66% | 35% |
| | 0.10 (0.01) | 0.08 (0.01) | 0.07 (0.01) | 0.04 (0.02) |
| | 72% | 35% | 18% | 1% |
Each cell represents results from 500 simulations with a sample size of 50,000.
Uppercase letters refer to factors in Fig 2, lowercase letters refer to effect sizes of paths in Fig 2.
In all scenarios, the ORs of A and C for disease incidence are 1.3, and the MAF of genetic variant A is 0.2.
C and U are standard normal variables, disease is a binary variable (with prevalence of approximately 0.2), and prognosis is a normally distributed variable.
Abbreviations: A, risk factor for incidence; C, measured factor for incidence and progression; MAF, minor allele frequency; OR, odds ratio; SE, standard error; U, unmeasured factor for incidence and progression.
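The simulation set-up described in these notes could be sketched roughly as follows. This is an assumption-laden reconstruction, not the authors' code: the intercept, the effect of C on progression (`beta_c`), the unit residual variance, and the use of ordinary least squares among cases are all hypothetical choices. The coverage figure returned is just one possible summary of the repeated simulations.

```python
import numpy as np


def simulate_once(rng, n=50_000, or_u=2.0, beta_u=0.8, beta_c=0.1, intercept=-1.8):
    """One simulated study: estimate the effect of A on progression in cases.

    beta_c and intercept are hypothetical values; the true effect of A
    on progression is zero by construction.
    """
    a = rng.binomial(2, 0.2, size=n).astype(float)  # variant A, MAF 0.2
    c = rng.standard_normal(n)                      # measured factor C
    u = rng.standard_normal(n)                      # unmeasured factor U
    lin = intercept + np.log(1.3) * a + np.log(1.3) * c + np.log(or_u) * u
    case = rng.random(n) < 1.0 / (1.0 + np.exp(-lin))
    progression = beta_c * c + beta_u * u + rng.standard_normal(n)

    # Ordinary least squares among cases, adjusting for measured C.
    design = np.column_stack([np.ones(case.sum()), a[case], c[case]])
    y = progression[case]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (len(y) - design.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(design.T @ design)))
    return coef[1], se[1]  # estimate and SE for A


def run_scenario(n_sims=500, n=50_000, or_u=2.0, beta_u=0.8, seed=0):
    """Repeat the simulation; return mean estimate, mean SE, and CI coverage of zero."""
    rng = np.random.default_rng(seed)
    est = np.empty(n_sims)
    se = np.empty(n_sims)
    for i in range(n_sims):
        est[i], se[i] = simulate_once(rng, n=n, or_u=or_u, beta_u=beta_u)
    covered = np.abs(est) <= 1.96 * se  # does the 95% CI include the true value 0?
    return est.mean(), se.mean(), covered.mean()


if __name__ == "__main__":
    # The "Strong" column of the table: OR for disease = 3, beta for progression = 1.5.
    mean_est, mean_se, coverage = run_scenario(n_sims=100, or_u=3.0, beta_u=1.5)
    print(f"mean estimate {mean_est:.3f}, mean SE {mean_se:.3f}, coverage {coverage:.0%}")
```

Passing the OR and beta pairs from the other columns to `run_scenario` gives the corresponding scenarios. Because several inputs here are guesses, the output should be read as a qualitative illustration of the bias pattern rather than a reproduction of the tabulated values.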