Yangqing Deng, Wei Pan.
Abstract
Transcriptome-wide association studies (TWAS and PrediXcan) have been increasingly applied to detect associations between genetically predicted gene expression and GWAS traits. Such associations may suggest, but do not fully determine, causal genes for GWAS traits, because the strong assumptions imposed for causal inference are likely to be violated. Testing colocalization moves a step closer to establishing causal relationships: if a GWAS trait and a gene's expression share the same associated SNP, this may suggest a regulatory (and thus putative causal) role of the SNP, mediated through the gene, on the GWAS trait. Accordingly, it is of interest to develop and apply various colocalization testing approaches. Each of the existing approaches has some severe limitations. For instance, some methods test the null hypothesis that there is colocalization, which is not ideal because the null hypothesis often cannot be rejected simply due to limited statistical power (with sample sizes that are too small). Other methods arbitrarily restrict the maximum number of causal SNPs in a locus, which may lead to a loss of power in the presence of widespread allelic heterogeneity. Importantly, most methods cannot be applied to either GWAS/eQTL summary statistics or cases with more than two possibly correlated traits. Here we present a simple and general approach based on conditional analysis of a locus on multiple traits, overcoming the above and other shortcomings of the existing methods. We demonstrate that, compared with other methods, our new method can be applied to a wider range of scenarios and often performs better. We showcase its applications to both simulated and real data, including a large-scale Alzheimer's disease GWAS summary dataset and a gene expression dataset, and a large-scale blood lipid GWAS summary association dataset. An R package "jointsum" implementing the proposed method is publicly available on GitHub.
Year: 2020 PMID: 32275709 PMCID: PMC7176287 DOI: 10.1371/journal.pcbi.1007778
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.779
50). This choice gives a fairly reasonable su, the largest number of nonzero effects that can be considered under the null. When su is fixed, choosing a larger u and a smaller s means adding more models. This usually leads to higher power but also to a larger type I error rate, partly because the more models are added, the more likely it is that one or more of them gives a significant result with a non-negligible weight. Based on our experience, and to keep things simple, we suggest using u = 10 by default, though caution is warranted (e.g. by running sensitivity analyses).
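As a rough illustration of this trade-off, the sketch below tabulates su for the (u, s) settings quoted in the table captions of this paper. It assumes that "su" denotes the product s × u, which is an interpretation rather than something stated in this excerpt; the function name `max_nonzero_effects` is hypothetical and is not part of the jointsum package.

```python
def max_nonzero_effects(u: int, s: int) -> int:
    """Assumed reading of 'su': the product of s and u."""
    if u <= 0 or s <= 0:
        raise ValueError("u and s must be positive integers")
    return s * u


# (u, s) settings taken from the table captions; labels are descriptive only.
settings = {
    "simulations with q = 13": (10, 1),
    "loci with at most 50 SNPs": (10, 2),
    "loci with more than 50 SNPs": (10, 3),
    "windows of 100 SNPs": (10, 5),
}

for label, (u, s) in settings.items():
    print(f"{label}: u = {u}, s = {s}, su = {max_nonzero_effects(u, s)}")
```

Under this reading, su stays below the number of SNPs in the locus for every setting listed.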
Fig 1. Density curves of Tconditional in 3 different null scenarios based on 10000 replications. Scenario 1: All Z's are centered at 0. Scenario 2: Z11, Z21, Z12, Z22 are centered at 1. Scenario 3: Z11, Z21, Z12, Z22, Z13, Z23, Z14, Z24 are centered at 1. In all scenarios, Z's are independent and have variance 1.
Rejection rates (type I errors for JLIM and conditional, power for coloc and HEIDI).
q = 13. 1000 iterations. α = 0.05. The correlation is -0.69 between SNP 1 and SNP 3, 0.68 between SNP 1 and SNP 4, 0.02 between SNP 1 and SNP 5, -0.35 between SNP 4 and SNP 5. Different subjects for two traits. For CMC, B = 10^4. For MA, B = 10^3, u = 10 and s = 1.
| Causal locations for trait 1 (size) | Causal locations for trait 2 (size) | JLIM | Conditional: 1st SNP | Conditional: CB (w/o adj.) | Conditional: CMC | Conditional: MA | coloc | HEIDI |
|---|---|---|---|---|---|---|---|---|
| 1 (0.3) | None | 0.047 | 0.054 | 0.005 (0.079) | 0.031 | 0.020 | 0.056 | 0.194 (0.006) |
| 1 (0.3) | 3 (0.3) | 0 | 0.054 | 0.007 (0.117) | 0.039 (0.025) [0.018] | 0.025 | 0.964 | 0.076 (0.059) |
| 1 (0.3) | 4 (0.3) | 0 | 0.054 | 0.007 (0.121) | 0.039 (0.029) [0.020] | 0.030 | 0.974 | 0.040 (0.038) |
| 1 (0.3) | 4 (-0.3) | 0 | 0.054 | 0.007 (0.121) | 0.051 (0.029) [0.025] | 0.031 | 0.970 | 0.042 (0.040) |
| 1 (0.2), 5 (-0.3) | 4 (0.3) | 0.393 | 0.049 | 0.009 (0.145) | 0.048 (0.028) [0.022] | 0.030 | 0.932 | 0.090 (0.085) |
* The tuning parameter θ was chosen as 0.05 (0.1) [0.2].
** When there was no causal location for trait 2, HEIDI sometimes output NA (after detecting the violation of its assumptions). We thus recorded two ratios A (B). A: #rejected/#non-NA results. (B): #rejected/#iterations.
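The two HEIDI ratios described in this footnote can be computed as in the minimal sketch below. It assumes that a rejection means p < α and that NA outputs are represented as `None` or NaN; the function name `rejection_ratios` and the toy p-values are made up for illustration and are not taken from the paper.

```python
import math
from typing import Optional, Sequence, Tuple


def rejection_ratios(
    p_values: Sequence[Optional[float]], alpha: float = 0.05
) -> Tuple[float, float]:
    """Return (A, B): A = #rejected / #non-NA results, B = #rejected / #iterations."""
    n_iterations = len(p_values)
    # Keep only iterations where the method returned a usable p-value.
    valid = [p for p in p_values if p is not None and not math.isnan(p)]
    n_rejected = sum(1 for p in valid if p < alpha)
    ratio_a = n_rejected / len(valid) if valid else float("nan")
    ratio_b = n_rejected / n_iterations if n_iterations else float("nan")
    return ratio_a, ratio_b


# Toy input: 10 iterations, 3 of which returned NA (illustrative values only).
toy_p_values = [0.01, None, 0.20, 0.03, None, 0.50, 0.70, None, 0.04, 0.90]
ratio_a, ratio_b = rejection_ratios(toy_p_values)
print(f"A = {ratio_a:.3f}, B = {ratio_b:.3f}")
```

Ratio A is never smaller than ratio B, and the two coincide when no iteration returns NA.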
Rejection rates (power for JLIM and conditional methods, type I errors for coloc and HEIDI).
q = 13. 1000 iterations. Different subjects for two traits. For CMC, B = 10^3. For MA, B = 10^3, u = 10 and s = 1.
| Causal locations for trait 1 (size) | Causal locations for trait 2 (size) | JLIM | Conditional: 1st SNP | Conditional: CB | Conditional: CMC | Conditional: MA | coloc | HEIDI |
|---|---|---|---|---|---|---|---|---|
| 1 (0.2) | 1 (0.2) | 0.977 | 0.798 | 0.391 | 0.640 | 0.636 | 0.041 | 0.105 (0.048) |
| 1 (0.7) | 1 (0.2) | 0.986 | 0.911 | 0.646 | 0.823 | 0.820 | 0.045 | 0.230 (0.105) |
| 1 (0.2) | 1 (0.2), 4 (0.4) | 0.011 | 0.798 | 0.392 | 0.611 | 0.611 | 0.488 | 0.020 (0.020) |
| 1 (0.2), 4 (0.2) | 1 (0.2) | 0.514 | 0.798 | 0.395 | 0.613 | 0.623 | 0.262 | 0.171 (0.078) |
| 1 (0.2), 4 (0.4) | 1 (0.2) | 0.009 | 0.798 | 0.397 | 0.612 | 0.617 | 0.548 | 0.327 (0.210) |
| 1 (0.2) | 1 (0.2), 2 (0.3), 3 (0.3), 4 (0.4) | 0.004 | 0.798 | 0.393 | 0.567 | 0.565 | 0.971 | 0.595 (0.549) |
| 1 (0.2), 2 (0.3), 3 (0.4), 4 (0.5) | 1 (0.2) | 0.004 | 0.798 | 0.399 | 0.575 | 0.568 | 0.992 | 0.776 (0.524) |
Fig 2. ROC curves of JLIM, CB, CMC, eCAVIAR and coloc (Bayesian), combining all the scenarios in Table 1 and Table 2 (12000 samples in total).
Rejection rates.
q = 13. 1000 iterations. α = 0.05. Same subjects for two traits. For CMC, B = 10^3. Colocalization = No: type I errors for JLIM and conditional, power for coloc. Colocalization = Yes: power for JLIM and conditional, type I errors for coloc. For MA, B = 10^3, u = 10 and s = 1.
| Colocalization | Causal locations for trait 1 (size) | Causal locations for trait 2 (size) | JLIM | Conditional: 1st SNP | Conditional: CB | Conditional: CMC | Conditional: MA | coloc | HEIDI |
|---|---|---|---|---|---|---|---|---|---|
| No | 1 (0.3) | None | 0.042 | 0.057 | 0.002 | 0.030 | 0.032 | 0.029 | 0.164 (0.027) |
| No | 1 (0.3) | 4 (0.3) | 0 | 0.057 | 0.004 | 0.035 | 0.041 | 0.954 | 0.012 (0.011) |
| No | 1 (0.2), 5 (-0.3) | 4 (0.3) | 0.395 | 0.040 | 0.004 | 0.029 | 0.028 | 0.849 | 0.040 (0.034) |
| Yes | 1 (0.2) | 1 (0.2) | 0.980 | 0.815 | 0.439 | 0.621 | 0.680 | 0.001 | 0.017 (0.014) |
| Yes | 1 (0.2) | 1 (0.2), 4 (0.4) | 0.013 | 0.815 | 0.439 | 0.584 | 0.655 | 0.292 | 0.006 (0.006) |
Rejection rates.
1000 iterations. α = 0.05. Same subjects for two traits. For CMC, B = 10^3. τ = 0.2. Regions A1-A2: without colocalization (type I errors for JLIM and conditional, power for coloc). Regions B1-B6: with colocalization (power for JLIM and conditional, type I errors for coloc). For MA, B = 10^3, u = 10. s = 3 for loci with more than 50 SNPs, s = 2 otherwise.
| Region | # SNPs (*) | JLIM | Conditional: 1st SNP | Conditional: CB | Conditional: CMC | Conditional: MA | coloc |
|---|---|---|---|---|---|---|---|
| A1 | 43 (2/0/0) | 0.039 | 0.011 | 0 | 0.014 | 0.026 | 0.027 |
| A2 | 58 (17/2/0) | 0.054 | 0.010 | 0.010 | 0.030 | 0.046 | 0.047 |
| B1 | 28 (9/4/1) | 0 | 0.018 | 0.189 | 0.276 | 0.288 | 1 |
| B2 | 53 (7/18/2) | 0 | 0.005 | 0.429 | 0.542 | 0.581 | 1 |
| B3 | 62 (12/6/2) | 0.724 | 0.011 | 0.546 | 0.721 | 0.739 | 1 |
| B4 | 43 (18/10/5) | 0 | 0.009 | 0.154 | 0.238 | 0.281 | 1 |
| B5 | 43 (9/7/6) | 0.179 | 0.016 | 0.683 | 0.859 | 0.877 | 1 |
(*) Numbers of SNPs that are causal for trait 1 / trait 2 / both traits.
Numbers of SNPs and loci with colocalization.
LDL and HDL only. For the methods that test each locus, α = 0.05/#loci. For the marginal method that tests each SNP separately, the numbers are the numbers of significant SNPs, and α = 0.05/(#SNPs in "Marginal"). The numbers in parentheses were obtained using the cut-off α = 5E-8. For coloc, the numbers of loci with colocalization are the ones that were not rejected for its null hypothesis under α.
| Chr | Marginal: # SNPs | Marginal: # Significant SNPs | Regional: # Loci | Regional: # SNPs | # Loci with Colocalization: CB | # Loci with Colocalization: CMC | # Loci with Colocalization: MA | # Loci with Colocalization: Coloc | # Significant Lead SNPs* |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 181458 | 19 (17) | 26 | 134 | 0 (0) | 9 (4) | 9 (4) | 16 (21) | 6/9 |
| 2 | 209518 | 77 (55) | 33 | 230 | 0 (0) | 16 (3) | 16 (3) | 13 (22) | 5/13 |
| 11 | 124669 | 129 (114) | 26 | 239 | 0 (0) | 23 (19) | 24 (19) | 8 (21) | 20/23 |
| 12 | 116372 | 10 (7) | 2 | 17 | 0 (0) | 1 (0) | 1 (0) | 0 (2) | 0/1 |
| 16 | 67080 | 32 (29) | 5 | 40 | 0 (0) | 3 (0) | 3 (0) | 3 (5) | 1/3 |
| 19 | 33311 | 21 (18) | 21 | 81 | 0 (0) | 7 (4) | 7 (4) | 10 (13) | 2/7 |
| 20 | 58830 | 1 (1) | 6 | 45 | 0 (0) | 1 (0) | 1 (0) | 1 (2) | 1/1 |
* Testing lead SNPs with α = 0.05/(#SNPs in marginal). Lead SNPs were defined as the ones with the smallest p-values for LDL / smallest p-value sums for LDL + HDL. These p-values were obtained from the conditional models.
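The Bonferroni-style cut-offs described in the caption above (α = 0.05/#loci for locus-level tests and α = 0.05/#SNPs for the marginal test) can be reproduced from the table's counts. The sketch below is only an arithmetic illustration; the helper name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level after dividing alpha by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# (chromosome, # SNPs in "Marginal", # loci in "Regional"), copied from the table above.
rows = [
    (1, 181458, 26),
    (2, 209518, 33),
    (11, 124669, 26),
    (12, 116372, 2),
    (16, 67080, 5),
    (19, 33311, 21),
    (20, 58830, 6),
]

for chrom, n_snps, n_loci in rows:
    marginal = bonferroni_threshold(n_snps)
    regional = bonferroni_threshold(n_loci)
    print(f"chr {chrom}: marginal alpha = {marginal:.2e}, locus alpha = {regional:.2e}")
```

For chromosome 1, for example, the marginal threshold is about 2.8E-7, which is less stringent than the alternative cut-off of 5E-8 used for the numbers in parentheses.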
Numbers of loci with colocalization.
LDL, HDL and TG. For the methods testing each locus, α = 0.05/#loci. For the marginal method that tests each SNP separately, the numbers are the numbers of significant SNPs, and α = 0.05/(#SNPs in "Marginal"). For testing lead SNPs, α = 0.05/(#SNPs in "Regional").
| Chr | Marginal: # SNPs | Marginal: # Significant SNPs | Regional: # Loci | Regional: # SNPs | # Loci with Colocalization: CB | # Loci with Colocalization: CMC | # Loci with Colocalization: MA | # Significant Lead SNPs* |
|---|---|---|---|---|---|---|---|---|
| 1 | 181434 | 6 | 26 | 134 | 6 | 6 | 7 | 4/4 |
| 2 | 209518 | 70 | 33 | 230 | 11 | 11 | 12 | 5/9 |
| 11 | 124669 | 125 | 26 | 239 | 22 | 23 | 23 | 20/21 |
| 12 | 116372 | 0 | 2 | 17 | 1 | 1 | 1 | 0/1 |
| 16 | 67080 | 28 | 5 | 40 | 1 | 1 | 1 | 0/0 |
| 19 | 33311 | 16 | 21 | 81 | 5 | 5 | 6 | 5/3 |
| 20 | 58830 | 0 | 6 | 45 | 1 | 1 | 1 | 0/1 |
* Lead SNPs were defined as the ones with the smallest p-values for LDL / smallest p-value sums for LDL + HDL + TG. These p-values were obtained from the conditional models.
Numbers of significant SNPs and windows.
Window size = 100. Test colocalization of LDL, HDL, TG and TC. For MA, u = 10 and s = 5.
| Chr | Marginal: # SNPs | Marginal: # Significant SNPs | Regional: # Windows | # Significant windows: CB | # Significant windows: CMC | # Significant windows: MA |
|---|---|---|---|---|---|---|
| 2 | 209518 | 48 | 2095 | 1 | 4 | 6 |
| 12 | 116372 | 0 | 1163 | 0 | 0 | 3 |
| 16 | 67080 | 27 | 670 | 0 | 0 | 1 |
| 19 | 33311 | 16 | 333 | 1 | 1 | 3 |
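The window counts in the table above are consistent with splitting each chromosome's SNPs into consecutive, non-overlapping windows of 100 SNPs and counting only complete windows. This is an inference from the reported numbers, not something the caption states, and how any leftover SNPs were handled is not specified here. A minimal check, with the hypothetical helper `count_full_windows`:

```python
def count_full_windows(n_snps: int, window_size: int = 100) -> int:
    """Number of complete, non-overlapping windows of `window_size` SNPs."""
    if n_snps < 0 or window_size <= 0:
        raise ValueError("n_snps must be >= 0 and window_size must be > 0")
    return n_snps // window_size


# chromosome -> (# SNPs, # windows reported in the table above)
reported = {
    2: (209518, 2095),
    12: (116372, 1163),
    16: (67080, 670),
    19: (33311, 333),
}

for chrom, (n_snps, n_windows) in reported.items():
    computed = count_full_windows(n_snps)
    leftover = n_snps - computed * 100
    print(f"chr {chrom}: computed {computed}, reported {n_windows}, leftover SNPs {leftover}")
```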
Fig 3. One locus associated with schizophrenia on chromosome 6.
JLIM did not detect colocalization but the conditional method's result was more significant. Top: LocusZoom plots of 15 SNPs' p-values in marginal analysis. Bottom: LocusZoom plots of the same SNPs' p-values in conditional analysis. Smaller p-values are truncated at 1e-15.