Marissa LeBlanc, Verena Zuber, Wesley K Thompson, Ole A Andreassen, Arnoldo Frigessi, Bettina Kulle Andreassen.
Abstract
BACKGROUND: There is considerable evidence that many complex traits have a partially shared genetic basis, termed pleiotropy. It is therefore useful to consider integrating genome-wide association study (GWAS) data across several traits, usually at the summary statistic level. A major practical challenge arises when these GWAS have overlapping subjects. This is particularly an issue when estimating pleiotropy using methods that condition the significance of one trait on the significance of a second, such as the covariate-modulated false discovery rate (cmfdr).
Keywords: Covariate-modulated false discovery rate; Cross-phenotype association; Data integration; Meta-analysis with shared subjects
Year: 2018 PMID: 29940862 PMCID: PMC6019513 DOI: 10.1186/s12864-018-4859-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Simulated GWAS pairs with overlapping samples. Data was simulated for two quantitative trait GWAS with no genetic effects but overlap in samples (each with n=12,500 including 5000 overlapping samples). d=100,000 SNPs were simulated under the null model (phenotype is simulated independently of genotype). Panel a: the p-value distribution for trait 1; Panel b: the p-value distribution for trait 2; Panel c: the p-value distribution for trait 2 given that the p-value in study 1 was less than 0.1; Panel d: quantile-quantile plot for the p-values in study 2, stratified by the p-value in study 1
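The effect shown in panels b and c can be reproduced with a small summary-level sketch. It is not the authors' simulation: instead of generating genotypes and phenotypes, it draws null z-scores directly and assumes that the shared samples induce a correlation of 0.4 between the two studies (the value quoted for this sample layout in the simulation table footnote). The names `RHO`, `D` and `two_sided_p` are illustrative.

```python
import math
import numpy as np

RHO = 0.4      # assumed correlation induced by the 5000 shared samples
D = 100_000    # number of null SNPs, as in the figure

def two_sided_p(z):
    """Two-sided p-values for standard-normal test statistics."""
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])

rng = np.random.default_rng(1)
z1 = rng.standard_normal(D)
# Build z2 so that corr(z1, z2) = RHO while z2 stays standard normal.
z2 = RHO * z1 + math.sqrt(1.0 - RHO**2) * rng.standard_normal(D)

p1, p2 = two_sided_p(z1), two_sided_p(z2)
marginal_rate = float(np.mean(p2 < 0.05))               # analogue of panel b
conditional_rate = float(np.mean(p2[p1 < 0.1] < 0.05))  # analogue of panel c

print(f"P(p2 < 0.05)            = {marginal_rate:.3f}")
print(f"P(p2 < 0.05 | p1 < 0.1) = {conditional_rate:.3f}")
```

Under these assumptions the marginal rejection rate stays close to the nominal 0.05, while the rate conditional on a small p-value in study 1 is clearly inflated even though every SNP is null.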
Fig. 2 Simulated GWAS pairs with overlapping samples, after correction for sample overlap using the decorrelation transform. Data before correction is presented in Fig. 1. Data was simulated for two quantitative trait GWAS with no genetic effects but overlap in samples (each with n=12,500 including 5000 overlapping samples). d=100,000 SNPs were simulated under the null model (phenotype is simulated independently of genotype). The decorrelation transformation proposed here was applied to the simulated summary statistics. Panel a: the p-value distribution for trait 1; Panel b: the p-value distribution for trait 2; Panel c: the p-value distribution for trait 2 given that the p-value in study 1 was less than 0.1; Panel d: quantile-quantile plot for the p-values in study 2, stratified by the p-value in study 1
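This excerpt does not spell out the transform itself. As a minimal sketch, assuming it is a symmetric whitening of each pair of z-scores by the inverse square root of their 2x2 correlation matrix, the correction could look as follows. The function name `decorrelate` and the summary-level simulation of the inputs are illustrative assumptions, not the authors' code.

```python
import numpy as np

def decorrelate(z, rho):
    """Whiten paired z-scores (shape (d, 2)) with C^{-1/2}, C = [[1, rho], [rho, 1]]."""
    corr = np.array([[1.0, rho], [rho, 1.0]])
    evals, evecs = np.linalg.eigh(corr)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T  # symmetric C^{-1/2}
    return z @ inv_sqrt

# Null z-scores for two studies, correlated only through (assumed) sample overlap.
rng = np.random.default_rng(2)
rho = 0.4
z1 = rng.standard_normal(100_000)
z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(100_000)
z = np.column_stack([z1, z2])

corr_before = float(np.corrcoef(z, rowvar=False)[0, 1])
z_star = decorrelate(z, rho)
corr_after = float(np.corrcoef(z_star, rowvar=False)[0, 1])
print(f"correlation before: {corr_before:.3f}, after: {corr_after:.3f}")
```

The symmetric square root treats both studies alike, so neither set of statistics is left untouched; a Cholesky-based whitening would also remove the correlation but would modify only one of the two studies.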
Mean false discovery proportion (FDP), mean number of falsely rejected null hypotheses out of 99,600, i.e. false positives (FP) and mean number of correctly rejected non-null hypotheses i.e. true positives (TP) over 100 simulation runs and a covariate-modulated false discovery rate (cmfdr) cut-off of 0.05
| Model | Independent | Independent, eq. power | Overlapping | Overlapping, corrected |
|---|---|---|---|---|
| Null | ||||
| FDP | – | – | – | – |
| TP | – | – | – | – |
| FP | 0 | 0 | 245.14 (237.9, 252.4) | 0 |
| Positive Pleiotropy A (400 non-null SNPs) | ||||
| FDP | 0.0053 (0.0044, 0.0063) | 0.0030 (0.0024, 0.0037) | 0.39 (0.39, 0.40) | 0.0056 (0.0048, 0.0065) |
| TP | 260.9 (259.6, 262.1) | 283 (282.0, 284.1) | 330.4 (329.5, 331.4) | 243.4 (242.1, 244.7) |
| FP | 1.4 (1.2, 1.7) | 0.9 (0.7, 1.1) | 215.7 (210.1, 221.4) | 1.4 (1.2, 1.6) |
| Positive Pleiotropy + Univariate (400 non-null SNPs) | ||||
| FDP | 0.008 (0.007, 0.009) | 0.005 (0.004, 0.006) | 0.48 (0.48, 0.49) | 0.01 (0.008, 0.01) |
| TP | 233.4 (232.1, 234.7) | 270.4 (269.4, 271.5) | 306.1 (304.7, 307.4) | 209.8 (208.4, 211.2) |
| FP | 2.0 (1.7, 2.2) | 1.3 (1.1, 1.5) | 289.2 (282.4, 296.1) | 2.08 (1.8, 2.4) |
| Positive + Antagonistic Pleiotropy (400 non-null SNPs) | ||||
| FDP | 0.005 (0.005, 0.006) | 0.004 (0.003, 0.005) | 0.46 (0.45, 0.47) | 0.008 (0.007, 0.010) |
| TP | 261.5 (260.4, 262.6) | 290.8 (289.6, 291.9) | 280.9 (280.0, 282.2) | 228.7 (227.3, 230.1) |
| FP | 1.4 (1.2, 1.6) | 1.2 (1.0, 1.4) | 240.1 (233.8, 246.4) | 2.0 (1.7, 2.2) |
| Positive Pleiotropy B (1200 non-null SNPs) | ||||
| FDP | 0.018 (0.008, 0.020) | 0.013 (0.012, 0.014) | 0.32 (0.31, 0.33) | 0.029 (0.027, 0.031) |
| TP | 295.65 (293.01, 298.29) | 425.38 (422.42, 428.34) | 618.30 (615.22, 621.38) | 310.94 (308.22, 313.66) |
| FP | 5.51 (5.05, 5.97) | 5.60 (5.11, 6.09) | 294.64 (288.18, 301.10) | 9.36 (8.70, 10.02) |
| Positive Pleiotropy C (2200 non-null SNPs) | ||||
| FDP | 0.019 (0.017, 0.021) | 0.018 (0.016, 0.020) | 0.36 (0.35, 0.36) | 0.034 (0.032, 0.037) |
| TP | 159.71 (157.63, 161.79) | 243.98 (241.91, 246.04) | 575.33 (570.58, 580.08) | 184.10 (181.68, 186.52) |
| FP | 3.16 (2.80, 3.52) | 4.49 (4.05, 4.92) | 324.94 (317.67, 332.20) | 6.59 (6.08, 7.10) |
Results are presented for six simulation scenarios: the null model, where both traits are independent of genotype (all SNPs are null); positive pleiotropy A, with 400 SNPs that are non-null for both traits; positive pleiotropy plus univariate effects for trait 1, where 200 SNPs were non-null for traits 1 and 2 and 200 SNPs were non-null for trait 1 only; positive plus antagonistic pleiotropy, where 400 SNPs were non-null for both traits 1 and 2, and half of these non-null SNPs have effects in opposing directions for traits 1 and 2; positive pleiotropy B, with 1200 SNPs that are non-null for both traits, 200 with large effects and 1000 with small effects; and positive pleiotropy C, with 2200 SNPs that are non-null for both traits, 200 with large effects and 2000 with small effects. In all six scenarios d=100,000 SNPs were simulated, the correlation due to overlap is 0.4, and the test statistics for study 2 were used as a covariate for study 1 in the covariate-modulated fdr. For each simulation scenario, we divided the simulated subjects into the following GWAS pairs: independent GWASs with no overlap (each with n=10,000); independent, equally powered GWASs (each with n=12,500, like the GWASs with overlapping subjects); uncorrected overlapping GWASs (each with n=12,500, including 5000 overlapping subjects); and the GWASs with 5000 overlapping subjects after correction for sample overlap. Data is presented as mean (95% confidence interval)
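As a reading aid for the three metrics, the false discovery proportion of a single run is the share of rejections that are false, FP / (FP + TP). The sketch below applies that standard definition to the mean FP and TP of the "Positive Pleiotropy A" rows; the helper name is illustrative. Because the table averages FDP over 100 runs, the ratio of the mean counts only approximates the tabulated mean FDP.

```python
def false_discovery_proportion(fp, tp):
    """FDP = FP / (FP + TP); taken to be 0 when nothing is rejected."""
    rejections = fp + tp
    return fp / rejections if rejections > 0 else 0.0

# Mean (FP, TP) for "Positive Pleiotropy A", copied from the table above.
scenario_a = {
    "Independent": (1.4, 260.9),
    "Independent, eq. power": (0.9, 283.0),
    "Overlapping": (215.7, 330.4),
    "Overlapping, corrected": (1.4, 243.4),
}
fdp_a = {name: false_discovery_proportion(fp, tp)
         for name, (fp, tp) in scenario_a.items()}
for name, value in fdp_a.items():
    print(f"{name}: {value:.4f}")
```

For the uncorrected overlapping pair this gives about 0.39, in line with the tabulated value, while the other three columns stay well below the 0.05 cut-off.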
Mean false discovery proportion (FDP), mean number of falsely rejected null hypotheses out of 99,600, i.e. false positives (FP) and mean number of correctly rejected non-null hypotheses out of 400, i.e. true positives (TP) over 100 simulation runs and a covariate-modulated false discovery rate (cmfdr) cut-off of 0.05
| #, ρ | Independent | Overlapping | Overlapping, corrected |
|---|---|---|---|
| # = 0, ρ = 0 | | | |
| FDP | 5.96E-03 (4.99E-03, 6.92E-03) | 5.92E-03 (4.97E-03, 6.88E-03) | 6.03E-03 (5.07E-03, 7.00E-03) |
| TP | 268.55 (267.90, 269.90) | 268.52 (267.18, 269.86) | 268.59 (267.18, 269.86) |
| FP | 1.62 (1.36, 1.88) | 1.61 (1.35, 1.87) | 1.64 (1.37, 1.91) |
| # = 500, ρ = 0.04 | | | |
| FDP | 5.32E-03 (4.41E-03, 6.22E-03) | 5.58E-03 (4.58E-03, 6.59E-03) | 4.77E-03 (3.81E-03, 5.73E-03) |
| TP | 262.75 (261.58, 263.92) | 266.3 (264.99, 267.61) | 260.47 (259.05, 261.61) |
| FP | 1.41 (1.17, 1.65) | 1.5 (1.23, 1.77) | 1.25 (1.00, 1.50) |
| # = 1000, ρ = 0.08 | | | |
| FDP | 5.83E-03 (4.87E-03, 6.78E-03) | 8.02E-03 (6.81E-03, 9.23E-03) | 5.69E-03 (4.76E-03, 6.63E-03) |
| TP | 263.43 (262.08, 264.78) | 271.85 (270.59, 273.11) | 258.92 (257.57, 260.27) |
| FP | 1.55 (1.29, 1.81) | 2.21 (1.87, 2.55) | 1.49 (1.24, 1.74) |
| # = 1500, ρ = 0.12 | | | |
| FDP | 5.25E-03 (4.44E-03, 6.06E-03) | 1.21E-02 (1.08E-02, 1.34E-02) | 6.00E-03 (5.08E-03, 6.92E-03) |
| TP | 263.67 (262.43, 264.91) | 277.11 (275.82, 278.40) | 257.79 (256.51, 259.07) |
| FP | 1.4 (1.18, 1.62) | 3.4 (3.02, 3.78) | 1.56 (1.32, 1.80) |
| # = 2000, ρ = 0.16 | | | |
| FDP | 4.42E-03 (3.61E-03, 5.22E-03) | 1.77E-02 (1.61E-02, 1.92E-02) | 4.06E-03 (3.26E-03, 4.86E-03) |
| TP | 255.16 (253.98, 256.34) | 274.18 (273.10, 275.26) | 248.52 (247.27, 249.77) |
| FP | 1.14 (0.93, 1.35) | 4.96 (4.51, 5.41) | 1.02 (0.82, 1.22) |
| # = 2500, ρ = 0.20 | | | |
| FDP | 5.03E-03 (4.16E-03, 5.90E-03) | 3.64E-02 (3.38E-02, 3.91E-02) | 5.20E-03 (4.28E-03, 6.12E-03) |
| TP | 258.84 (257.51, 260.17) | 288.47 (287.29, 289.65) | 249.59 (248.22, 250.96) |
| FP | 1.31 (1.08, 1.54) | 10.98 (10.15, 11.81) | 1.31 (1.08, 1.54) |
| # = 3000, ρ = 0.24 | | | |
| FDP | 5.08E-03 (4.18E-03, 5.97E-03) | 7.08E-02 (6.74E-02, 7.42E-02) | 6.32E-03 (5.41E-03, 7.22E-03) |
| TP | 261.65 (260.32, 262.98) | 300.52 (299.39, 301.65) | 250.14 (248.75, 251.53) |
| FP | 1.34 (1.10, 1.58) | 23.03 (21.83, 24.23) | 1.6 (1.37, 1.83) |
| # = 3500, ρ = 0.28 | | | |
| FDP | 4.24E-03 (3.52E-03, 4.96E-03) | 1.25E-01 (1.21E-01, 1.30E-01) | 5.57E-03 (4.74E-03, 6.40E-03) |
| TP | 268.5 (267.37, 269.63) | 315.07 (314.00, 316.14) | 256.42 (255.08, 257.76) |
| FP | 1.15 (0.95, 1.35) | 45.42 (43.46, 47.38) | 1.44 (1.23, 1.65) |
| # = 4000, ρ = 0.32 | | | |
| FDP | 3.62E-03 (2.84E-03, 4.41E-03) | 1.98E-01 (1.93E-01, 2.03E-01) | 4.74E-03 (3.91E-03, 5.56E-03) |
| TP | 262.39 (261.27, 263.51) | 316.5 (315.46, 317.54) | 249.16 (247.94, 250.38) |
| FP | 0.96 (0.75, 1.17) | 78.65 (76.05, 81.25) | 1.19 (0.98, 1.40) |
| # = 4500, ρ = 0.36 | | | |
| FDP | 4.81E-03 (3.99E-03, 5.63E-03) | 2.89E-01 (2.83E-01, 2.94E-01) | 5.49E-03 (4.54E-03, 6.44E-03) |
| TP | 259.29 (258.16, 260.42) | 319.99 (319.04, 320.94) | 245.16 (243.92, 246.40) |
| FP | 1.26 (1.04, 1.48) | 130.41 (127.08, 133.74) | 1.36 (1.12, 1.60) |
| # = 5000, ρ = 0.40 | | | |
| FDP | 5.44E-03 (4.57E-03, 6.31E-03) | 3.98E-01 (3.92E-01, 4.04E-01) | 6.79E-03 (5.78E-03, 7.80E-03) |
| TP | 262.26 (261.02, 263.50) | 334.25 (333.28, 335.22) | 245.02 (243.66, 246.38) |
| FP | 1.44 (1.21, 1.67) | 222.52 (216.73, 228.31) | 1.68 (1.43, 1.93) |
Here d=100,000 SNPs were simulated, of which 400 were non-null in both study 1 and study 2, i.e., the positive pleiotropy scenario. The test statistics for study 2 were used as a covariate for study 1 in the covariate-modulated fdr. For each simulation, we divided the simulated subjects into the following GWAS pairs: independent GWASs with no overlap (each with n=10,000); uncorrected overlapping GWASs (each including between 0 and 5000 overlapping subjects); and the GWASs with overlapping subjects after correction for sample overlap. Data is presented as mean (95% confidence interval)
#, number overlapping. ρ, correlation due to overlap
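The ρ values listed for each overlap size are consistent with a simple ratio: the number of shared subjects divided by the geometric mean of the two study sizes. The sketch below checks this under the assumption that both overlapping GWASs have n=12,500, the size quoted for the overlapping pairs in the other simulation table. The relation and the function name are inferred from the tabulated numbers, not taken from the source.

```python
import math

def overlap_correlation(n_overlap, n1, n2):
    """Assumed relation: shared samples over the geometric mean of the study sizes."""
    return n_overlap / math.sqrt(n1 * n2)

N_PER_STUDY = 12_500  # assumption: size of each overlapping GWAS

rho_by_overlap = {
    n0: round(overlap_correlation(n0, N_PER_STUDY, N_PER_STUDY), 2)
    for n0 in range(0, 5001, 500)
}
for n0, rho in rho_by_overlap.items():
    print(f"# = {n0:>4}  rho = {rho:.2f}")
```

Under that assumption the loop reproduces the pairs in the table, from # = 0 with ρ = 0 up to # = 5000 with ρ = 0.40.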
Fig. 3 Mean false discovery proportion (FDP) versus the correlation due to sample overlap over 100 simulation runs and a covariate-modulated false discovery rate (cmfdr) cut-off of 0.05. Here d=100,000 SNPs were simulated, of which 400 were non-null in both study 1 and study 2, i.e., have positive pleiotropic effects. The test statistics for study 2 were used as a covariate for study 1
Robustness of the proposed correction
| True correlation | Plug-in correlation | TP | FP | FDP |
|---|---|---|---|---|
| 0.4 | 0.3 | 261.16 (260.27, 262.85) | 2.42 (2.12, 2.71) | 0.0091 (0.0080, 0.0102) |
| 0.4 | 0.35 | 252.20 (250.92, 253.48) | 1.56 (1.32, 1.80) | 0.0061 (0.0052, 0.0070) |
| 0.4 | 0.375 | 247.78 (246.79, 249.06) | 1.48 (1.22, 1.73) | 0.0059 (0.0049, 0.0069) |
| 0.4 | 0.4 | 243.59 (242.31, 244.87) | 1.40 (1.17, 1.63) | 0.0057 (0.0048, 0.0066) |
| 0.4 | 0.425 | 238.72 (237.42, 240.02) | 1.60 (1.38, 1.82) | 0.0066 (0.0057, 0.0075) |
| 0.4 | 0.45 | 235.11 (233.88, 236.34) | 1.96 (1.72, 2.20) | 0.0082 (0.0072, 0.0092) |
| 0.4 | 0.5 | 234.81 (233.57, 236.04) | 1.96 (1.72, 2.20) | 0.0082 (0.0072, 0.0092) |
For the “positive pleiotropy A” scenario the correlation due to overlap is 0.4. Here we varied the correlation value used in the decorrelation step from 0.3 to 0.5. TP, true positives; FP, false positives; FDP, false discovery proportion
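One way to see what a misspecified plug-in value does is to compute the correlation that remains after the correction. The sketch below assumes the decorrelation step is a symmetric whitening with the inverse square root of the plug-in correlation matrix, which this excerpt does not confirm; both function names are illustrative. The calculation is exact, with no simulation involved.

```python
import numpy as np

def inv_sqrt_corr(r):
    """Symmetric inverse square root of the 2x2 correlation matrix [[1, r], [r, 1]]."""
    corr = np.array([[1.0, r], [r, 1.0]])
    evals, evecs = np.linalg.eigh(corr)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T

def residual_correlation(true_rho, plug_in):
    """Correlation left between the two studies after whitening with plug_in."""
    true_corr = np.array([[1.0, true_rho], [true_rho, 1.0]])
    w = inv_sqrt_corr(plug_in)
    cov = w @ true_corr @ w  # covariance of the transformed statistics
    return float(cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1]))

plug_ins = (0.3, 0.35, 0.375, 0.4, 0.425, 0.45, 0.5)
residuals = {r: residual_correlation(0.4, r) for r in plug_ins}
for r, value in residuals.items():
    print(f"plug-in {r}: residual correlation {value:+.3f}")
```

Under this assumption, a plug-in value below the true 0.4 leaves a positive residual correlation and a value above it leaves a negative one, which is in line with the number of false positives in the table being lowest at the true value.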
Psychiatric Genetics Consortium data, with varying amounts of overlapping controls
| #Overlapping | Correlation | #Discoveries, raw | #Discoveries, adjusted |
|---|---|---|---|
| 0 | 0 | 255.3 (239.8, 270.8) | 256.5 (239.7, 273.3) |
| 2000 | 0.09 | 322.3 (310.1, 334.5) | 206.5 (190.1, 222.9) |
| 4000 | 0.18 | 479 (437.4, 520.6) | 194.5 (172.8, 216.2) |
| 6000 | 0.27 | 827.6 (762.1, 893.1) | 186.4 (162.9, 209.9) |
| 8000 | 0.36 | 1442.7 (1325.2, 1560.2) | 188.9 (156.8, 221.0) |
| 10000 | 0.45 | 2985.7 (2785.6, 3185.8) | 212.7 (181.3, 244.1) |
The test statistics for bipolar disorder were used as a covariate for schizophrenia in the covariate-modulated fdr (cmfdr). SNPs having a cmfdr < 0.05 were called as discoveries. Data is presented as mean (95% confidence interval)