Hon-Cheong So, Benjamin H K Yip, Pak Chung Sham.
Abstract
Genome-wide association studies (GWAS) have recently identified numerous susceptibility variants for complex diseases. In this study we propose several approaches to estimate the total number of variants underlying these diseases. We assume that the variance explained by genetic markers (Vg) follows an exponential distribution, which is justified by previous studies on theories of adaptation. Our aim is to fit the observed distribution of Vg from GWAS to its theoretical distribution. The number of variants is obtained as the heritability divided by the estimated mean of the exponential distribution. In practice, due to limited sample sizes, there is insufficient power to detect variants with small effects, so power was taken into account in the fitting. Besides considering only the most significant variants, we also relaxed the significance threshold, allowing more markers to be fitted. The effects of false positive variants were removed by considering the local false discovery rates. In addition, we developed an alternative approach that directly fits the z-statistics from GWAS to their theoretical distribution. In all cases, the "winner's curse" effect was corrected analytically. Confidence intervals were also derived. Simulations were performed to compare and verify the performance of the different estimators (which incorporate various means of winner's curse correction) and the coverage of the proposed analytic confidence intervals. Our methodology requires only summary statistics and can handle both binary and continuous traits. Finally, we applied the methods to a few real disease examples (lipid traits, type 2 diabetes and Crohn's disease) and estimated that hundreds to nearly a thousand variants underlie these traits.
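The relation between the exponential assumption and the number of variants can be written out explicitly. This is a sketch in assumed notation: the symbols N (number of variants) and h² (heritability) are introduced here for illustration, and λ is taken to be the rate parameter of the exponential distribution.

```latex
f(V_g) = \lambda e^{-\lambda V_g}, \qquad
\mathbb{E}[V_g] = \frac{1}{\lambda}, \qquad
N = \frac{h^2}{\mathbb{E}[V_g]} = \lambda h^2
```

Under this reading, a larger λ means a smaller average effect per variant and therefore more variants for the same heritability.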
Year: 2010 PMID: 21103334 PMCID: PMC2984437 DOI: 10.1371/journal.pone.0013898
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Probability density of the Vg of detected variants in a GWAS with adjustment for power.
We assume an exponential distribution of Vg for susceptibility variants under unlimited sample size. In practice, sample size and power are limited, so small-effect variants will be under-represented and the probability density should be adjusted for power. “Orig density” denotes the original exponential distribution, and labels such as 2000/2000 denote the numbers of cases and controls. The significance threshold was set at 5×10⁻⁷, the prevalence at 0.001, λ at 2000, and the risk allele frequency at 0.5.
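The power adjustment described in this caption can be sketched numerically. The example below is a minimal illustration and not the computation behind the figure: it assumes a quantitative trait tested with a two-sided z-test whose mean is approximated by sqrt(n × Vg), instead of the case-control design with prevalence and risk allele frequency stated above. The function names, the sample size `n` and the grid are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_z_test(vg, n, alpha=5e-7):
    """Approximate power of a two-sided z-test for a variant explaining vg.

    Assumption (not from the source): quantitative trait, z ~ N(sqrt(n * vg), 1).
    """
    crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    ncp = math.sqrt(n * vg)
    return (1 - STD_NORMAL.cdf(crit - ncp)) + STD_NORMAL.cdf(-crit - ncp)

def exp_density(vg, lam):
    """Exponential density with rate lam."""
    return lam * math.exp(-lam * vg)

def detected_density(grid, n, lam, alpha=5e-7):
    """Power times exponential density, renormalised over an evenly spaced grid."""
    step = grid[1] - grid[0]
    weights = [power_z_test(v, n, alpha) * exp_density(v, lam) for v in grid]
    total = sum(weights) * step  # Riemann-sum normalising constant
    return [w / total for w in weights]

grid = [i * 1e-5 for i in range(1, 2001)]  # Vg from 1e-5 to 0.02
dens = detected_density(grid, n=4000, lam=2000)
mode_vg = grid[dens.index(max(dens))]
print(f"mode of the power-adjusted density: Vg = {mode_vg:.5f}")
```

Unlike the original exponential density, whose maximum is at zero, the adjusted density peaks at a positive Vg because very small effects are almost never detected.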
An overview of the proposed estimators of λ.
| Name of estimator | SNP inclusion criteria | Distribution fitting method | Winner's curse correction | Corresponding numbering in figures |
| Bonf | Bonf | pwr*exp_density | None | 1 |
| Bonf.corr | Bonf | pwr*exp_density | Avg of Ghosh 1&2 | 2 |
| Bonf.corr1 | Bonf | pwr*exp_density | Ghosh 1 | 3 |
| Bonf.corr2 | Bonf | pwr*exp_density | Ghosh 2 | 4 |
| Bonf.corr.med | Bonf | pwr*exp_density | Median (Zhong) | 5 |
| Bonf.corr.MSEmedian | Bonf | pwr*exp_density | MSE median (Zhong) | 6 |
| Bonf.fitfZ.conv | Bonf | f1conv | Not necessary | 7 |
| truncfdr | fdrthres | pwr*exp_density | None | 1 |
| truncfdr.corr | fdrthres | pwr*exp_density | Avg of Ghosh 1&2 | 2 |
| truncfdr.corr1 | fdrthres | pwr*exp_density | Ghosh 1 | 3 |
| truncfdr.corr2 | fdrthres | pwr*exp_density | Ghosh 2 | 4 |
| truncfdr.corr.median | fdrthres | pwr*exp_density | Median (Zhong) | 5 |
| truncfdr.corr.MSEmedian | fdrthres | pwr*exp_density | MSE median (Zhong) | 6 |
| truncfdr.fitfZ.conv | fdrthres | f1conv | Not necessary | 7 |
Bonf, Bonferroni correction; fdrthres, local fdr threshold; pwr*exp_density, fitting by the power times exponential density curve; f1conv, fitting by considering the convolution density of non-null observations f1(z); Avg, average. The winner's curse correction methods are named by the first author of the corresponding reference papers. Please see the text for details.
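Why a winner's curse correction matters can be seen from the mean of a truncated normal distribution. The sketch below is an illustration only and does not implement any of the correction methods listed in the table: it assumes a z-statistic distributed as N(μ, 1) and conditions on the positive tail exceeding the critical value, using the 5×10⁻⁷ threshold from Figure 1. The function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def expected_z_given_significant(mu, crit):
    """E[Z | Z > crit] for Z ~ N(mu, 1), i.e. the mean of a truncated normal."""
    a = crit - mu
    tail = 1 - STD_NORMAL.cdf(a)
    return mu + STD_NORMAL.pdf(a) / tail

# Critical value of a two-sided test at 5e-7, applied to the positive tail only.
crit = STD_NORMAL.inv_cdf(1 - 5e-7 / 2)
for mu in (3.0, 4.0, 5.0, 6.0):
    inflated = expected_z_given_significant(mu, crit)
    print(f"true mean z = {mu:.1f}, mean z among significant results = {inflated:.2f}")
```

The inflation is largest for variants whose true mean lies below the threshold and shrinks as the true effect grows, which is why uncorrected estimates from significant variants overstate Vg.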
Root mean squared error of different estimators from simulations when number of cases and controls each equals 3500.
| Estimator | RMSE | rank | RMSE (λ = 2000) | rank | RMSE (λ = 3000) | rank | RMSE (λ = 4000) | rank |
| Bonf | 198 | 12 | 783 | 13 | 1553 | 14 | 2403 | 11 |
| Bonf.corr | 201 | 13 | 773 | 12 | 1520 | 13 | 2841 | 12 |
| Bonf.corr1 | 159 | 11 | 645 | 11 | 1408 | 11 | 3065 | 13 |
| Bonf.corr2 | 237 | 14 | 845 | 14 | 1477 | 12 | 2318 | 10 |
| Bonf.corr.med | 123 | 8 | 443 | 8 | 1001 | 8 | 2311 | 9 |
| Bonf.corr.MSEmedian | 93 | 6 | 323 | 4 | 802 | 7 | 1866 | 7 |
| Bonf.fitfZ.conv | 91 | 4 | 352 | 6 | 1021 | 9 | 3795 | 14 |
| truncfdr | 140 | 10 | 625 | 10 | 1335 | 10 | 2151 | 8 |
| truncfdr.corr | 110 | 7 | 407 | 7 | 692 | 5 | 1035 | 5 |
| truncfdr.corr1 | 92 | 5 | 339 | 5 | 615 | 4 | 998 | 4 |
| truncfdr.corr2 | 126 | 9 | 446 | 9 | 694 | 6 | 936 | 2 |
| truncfdr.corr.median | 71 | 1 | 221 | 1 | 418 | 1 | 730 | 1 |
| truncfdr.corr.MSEmedian | 77 | 3 | 282 | 3 | 593 | 3 | 970 | 3 |
| truncfdr.fitfZ.conv | 74 | 2 | 245 | 2 | 520 | 2 | 1236 | 6 |
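The two quantities reported in this table, root mean squared error and rank, can be computed as in the following minimal sketch. The helper names are hypothetical, and the list of values is the first RMSE column of the table above, copied in row order.

```python
import math

def rmse(estimates, true_value):
    """Root mean squared error of a list of estimates around the true value."""
    return math.sqrt(sum((e - true_value) ** 2 for e in estimates) / len(estimates))

def rank_ascending(values):
    """Rank 1 is the smallest value; ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks

# First RMSE column of the table above (3500 cases and 3500 controls).
first_column = [198, 201, 159, 237, 123, 93, 91, 140, 110, 92, 126, 71, 77, 74]
ranks = rank_ascending(first_column)
print(ranks)
```

The computed ranks agree with the rank column printed next to these RMSE values.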
Root mean squared error of different estimators from simulations when number of cases and controls each equals 5000.
| Estimator | RMSE | rank | RMSE (λ = 2000) | rank | RMSE (λ = 3000) | rank | RMSE (λ = 4000) | rank |
| Bonf | 131 | 12 | 570 | 13 | 1212 | 12 | 1991 | 14 |
| Bonf.corr | 139 | 13 | 567 | 12 | 1233 | 13 | 1874 | 12 |
| Bonf.corr1 | 114 | 11 | 451 | 11 | 1045 | 11 | 1670 | 10 |
| Bonf.corr2 | 161 | 14 | 656 | 14 | 1326 | 14 | 1894 | 13 |
| Bonf.corr.med | 103 | 10 | 348 | 9 | 746 | 9 | 1184 | 9 |
| Bonf.corr.MSEmedian | 84 | 6 | 243 | 5 | 532 | 4 | 959 | 7 |
| Bonf.fitfZ.conv | 82 | 5 | 245 | 6 | 574 | 6 | 1155 | 8 |
| truncfdr | 88 | 7 | 431 | 10 | 978 | 10 | 1679 | 11 |
| truncfdr.corr | 91 | 8 | 294 | 7 | 671 | 7 | 916 | 5 |
| truncfdr.corr1 | 81 | 4 | 242 | 4 | 569 | 5 | 794 | 4 |
| truncfdr.corr2 | 100 | 9 | 335 | 8 | 724 | 8 | 943 | 6 |
| truncfdr.corr.median | 69 | 2 | 182 | 1 | 371 | 1 | 491 | 1 |
| truncfdr.corr.MSEmedian | 65 | 1 | 217 | 3 | 437 | 3 | 721 | 3 |
| truncfdr.fitfZ.conv | 72 | 3 | 182 | 2 | 407 | 2 | 620 | 2 |
Root mean squared error of different estimators from simulations when number of cases and controls each equals 7000.
| Estimator | RMSE | rank | RMSE (λ = 2000) | rank | RMSE (λ = 3000) | rank | RMSE (λ = 4000) | rank |
| Bonf | 87 | 11 | 395 | 12 | 905 | 12 | 1552 | 12 |
| Bonf.corr | 91 | 13 | 403 | 13 | 915 | 13 | 1587 | 13 |
| Bonf.corr1 | 78 | 8 | 319 | 11 | 733 | 11 | 1338 | 11 |
| Bonf.corr2 | 104 | 14 | 474 | 14 | 1045 | 14 | 1715 | 14 |
| Bonf.corr.med | 72 | 5 | 258 | 9 | 548 | 8 | 1011 | 9 |
| Bonf.corr.MSEmedian | 64 | 3 | 190 | 6 | 381 | 5 | 716 | 6 |
| Bonf.fitfZ.conv | 65 | 4 | 180 | 4 | 357 | 4 | 678 | 4 |
| truncfdr | 60 | 1 | 291 | 10 | 684 | 10 | 1247 | 10 |
| truncfdr.corr | 84 | 10 | 222 | 7 | 487 | 7 | 817 | 7 |
| truncfdr.corr1 | 78 | 9 | 190 | 5 | 397 | 6 | 680 | 5 |
| truncfdr.corr2 | 89 | 12 | 252 | 8 | 555 | 9 | 897 | 8 |
| truncfdr.corr.median | 72 | 6 | 162 | 3 | 304 | 3 | 486 | 3 |
| truncfdr.corr.MSEmedian | 64 | 2 | 156 | 1 | 283 | 2 | 471 | 2 |
| truncfdr.fitfZ.conv | 73 | 7 | 158 | 2 | 282 | 1 | 470 | 1 |
Figure 2. Boxplots of different estimators of λ, with inclusion threshold based on Bonferroni correction.
Figure 3. Boxplots of different estimators of λ, with inclusion threshold based on local fdr.
Simulation results for different estimates of 95% confidence intervals (CI).
| Measure | CI method | True lambda = 1000 | True lambda = 2000 | True lambda = 3000 |
| Coverage probability | Info | 0.949 | 0.959 | 0.953 |
| Coverage probability | Info weighted | 0.936 | 0.955 | 0.969 |
| Coverage probability | MLRT | 0.953 | 0.954 | 0.946 |
| Coverage probability | MLRT weighted | 0.93 | 0.942 | 0.948 |
| Average width of CI | Info | 359 | 1348 | 3703 |
| Average width of CI | Info weighted | 285 | 877 | 2025 |
| Average width of CI | MLRT | 360 | 1370 | 4007 |
| Average width of CI | MLRT weighted | 285 | 883 | 2069 |
| SD of width of CI | Info | 39 | 296 | 2213 |
| SD of width of CI | Info weighted | 24 | 122 | 454 |
| SD of width of CI | MLRT | 39 | 307 | 2452 |
| SD of width of CI | MLRT weighted | 25 | 124 | 477 |
| Mean value | lowCI (info) | 831 | 1402 | 1423 |
| Mean value | upCI (info) | 1190 | 2750 | 5127 |
| Mean value | lowCI (info weighted) | 882 | 1623 | 2175 |
| Mean value | upCI (info weighted) | 1166 | 2500 | 4201 |
| Mean value | lowCI (MLRT) | 843 | 1492 | 1865 |
| Mean value | upCI (MLRT) | 1203 | 2862 | 5872 |
| Mean value | lowCI (MLRT weighted) | 889 | 1663 | 2318 |
| Mean value | upCI (MLRT weighted) | 1174 | 2545 | 4387 |
“Info” refers to CIs obtained by inversion of the Fisher information matrix. “Weighted” refers to weighting by 1 − local fdr.
MLRT, maximum likelihood ratio test; lowCI, lower 95% CI; upCI, upper 95% CI.
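Coverage probability and average CI width, as reported in the table above, can be estimated by simulation along the following lines. This is a deliberately simplified sketch: it assumes fully observed exponential draws with no power truncation, winner's curse or fdr weighting, so it does not reproduce the table's numbers. The function names, the number of variants per simulation and the seed are hypothetical.

```python
import math
import random

def wald_ci_for_rate(sample, z=1.96):
    """95% CI for an exponential rate from the inverse Fisher information.

    For n i.i.d. exponential draws the MLE is n / sum(x) and its
    asymptotic standard error is rate_hat / sqrt(n).
    """
    n = len(sample)
    rate_hat = n / sum(sample)
    se = rate_hat / math.sqrt(n)
    return rate_hat - z * se, rate_hat + z * se

def simulate_coverage(true_rate, n_variants, n_sims, seed=0):
    """Return (coverage probability, average CI width) over n_sims replicates."""
    rng = random.Random(seed)
    hits, widths = 0, []
    for _ in range(n_sims):
        sample = [rng.expovariate(true_rate) for _ in range(n_variants)]
        low, high = wald_ci_for_rate(sample)
        hits += low <= true_rate <= high
        widths.append(high - low)
    return hits / n_sims, sum(widths) / n_sims

coverage, mean_width = simulate_coverage(true_rate=2000, n_variants=200, n_sims=2000)
print(f"coverage = {coverage:.3f}, average width = {mean_width:.0f}")
```

A well-calibrated 95% interval should give a coverage close to 0.95, which is the criterion the table applies to the Info and MLRT intervals.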
Estimates and confidence intervals of lambda for a few complex traits, variants included according to fdr threshold.
| Estimator | HDL | LDL | TG | DM (all SNPs) | DM (pruned) | Crohn (all SNPs) | Crohn (pruned) |
| truncfdr | 634 | 828 | 989 | 692 | 652 | 1086 | 1249 |
| truncfdr.corr | 827 | 1203 | 1534 | 1380 | 1215 | 1615 | 2062 |
| truncfdr.corr1 | 813 | 1190 | 1522 | 1344 | 1150 | 1569 | 1976 |
| truncfdr.corr2 | 835 | 1198 | 1519 | 1374 | 1256 | 1642 | 2117 |
| **truncfdr.corr.median** | | | | | | | |
| truncfdr.corr.MSEmedian | NA | NA | NA | 1116 | 926 | 1329 | 1603 |
| **truncfdr.fitfZ.conv** | | | | | | | |
| | | | | | | | |
| upCI.wt (fdrthres) | 751 | 1056 | 1257 | 1363 | 2420 | 1553 | 2348 |
| loCI.wt (fdrthres) | 635 | 895 | 1128 | 960 | −36 | 1222 | 1083 |
| upCI.MLRT.wt (fdrthres) | 881 | 1152 | 1460 | 1380 | 3344 | 1562 | 2438 |
| loCI.MLRT.wt (fdrthres) | 561 | 835 | 996 | 975 | 397 | 1229 | 1183 |
| | | | | | | | |
| based on truncfdr.corr.median | 484 | 401 | 519 | 527 | 442 | 793 | 983 |
| based on truncfdr.fitfZ.conv | 437 | 351 | 441 | 493 | 505 | 763 | 943 |
The bolded lines refer to estimators having the best overall performance in simulations.
HDL, high density lipoprotein; LDL, low density lipoprotein; TG, triglyceride; DM, type 2 diabetes mellitus; Crohn, Crohn's disease.
Bonf, inclusion threshold based on Bonferroni correction; fdrthres, inclusion threshold set at local fdr of 0.3.
“wt” refers to weighting by 1 − local fdr. Please refer to the previous tables for abbreviations of the estimators and the types of CI calculated.
Estimated number of susceptibility variants assuming a gamma distribution of effect sizes.
| Trait | Shape | Lambda | Mean | Number of variants |
| LDL | 0.9 | 937 | 9.60E-04 | 375 |
| LDL | 0.7 | 845 | 8.28E-04 | 435 |
| LDL | 0.5 | 754 | 6.63E-04 | 543 |
| LDL | 0.3 | 664 | 4.52E-04 | 797 |
| HDL | 0.9 | 657 | 1.37E-03 | 460 |
| HDL | 0.7 | 585 | 1.20E-03 | 526 |
| HDL | 0.5 | 514 | 9.73E-04 | 648 |
| HDL | 0.3 | 445 | 6.74E-04 | 935 |
| TG | 0.9 | 1152 | 7.81E-04 | 474 |
| TG | 0.7 | 1038 | 6.75E-04 | 548 |
| TG | 0.5 | 812 | 6.16E-04 | 601 |
| TG | 0.3 | 924 | 3.25E-04 | 1140 |
| Crohn (all) | 0.9 | 1328 | 6.78E-04 | 812 |
| Crohn (all) | 0.7 | 1211 | 5.78E-04 | 951 |
| Crohn (all) | 0.5 | 1093 | 4.57E-04 | 1203 |
| Crohn (all) | 0.3 | 970 | 3.09E-04 | 1779 |
| Crohn (pruned) | 0.9 | 1649 | 5.46E-04 | 1008 |
| Crohn (pruned) | 0.7 | 1512 | 4.63E-04 | 1188 |
| Crohn (pruned) | 0.5 | 1375 | 3.64E-04 | 1513 |
| Crohn (pruned) | 0.3 | 1219 | 2.46E-04 | 2235 |
| DM (all) | 0.9 | 1122 | 8.02E-04 | 528 |
| DM (all) | 0.7 | 1042 | 6.72E-04 | 631 |
| DM (all) | 0.5 | 962 | 5.20E-04 | 816 |
| DM (all) | 0.3 | 884 | 3.40E-04 | 1249 |
| DM (pruned) | 0.9 | 1165 | 7.73E-04 | 549 |
| DM (pruned) | 0.7 | 1095 | 6.39E-04 | 663 |
| DM (pruned) | 0.5 | 1026 | 4.87E-04 | 870 |
| DM (pruned) | 0.3 | 994 | 3.02E-04 | 1405 |
When the shape parameter equals one, the gamma distribution reduces to an exponential distribution; those results are listed in Table 6. As the shape parameter decreases, the distribution becomes more skewed towards zero, implying that more variants are assumed to have small effect sizes.
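The columns of the gamma table can be cross-checked with a short calculation. The sketch below assumes that the Lambda column is the rate parameter of the gamma distribution, so that the mean equals shape divided by Lambda; this reading is an inference from the printed values, not a statement in the text. The variable names are hypothetical and the rows are the LDL entries copied from the table.

```python
def gamma_mean(shape, rate):
    """Mean of a gamma distribution parameterised by shape and rate."""
    return shape / rate

# (shape, Lambda, reported mean, reported number of variants) for LDL.
ldl_rows = [
    (0.9, 937, 9.60e-4, 375),
    (0.7, 845, 8.28e-4, 435),
    (0.5, 754, 6.63e-4, 543),
    (0.3, 664, 4.52e-4, 797),
]

implied_totals = []
for shape, lam, reported_mean, n_variants in ldl_rows:
    mean = gamma_mean(shape, lam)
    # Number of variants times mean Vg recovers the total variance being divided up.
    implied_totals.append(n_variants * reported_mean)
    print(f"shape={shape}: shape/Lambda={mean:.3e}, reported mean={reported_mean:.3e}")

print([round(t, 3) for t in implied_totals])
```

For every LDL row the product of the number of variants and the mean is close to the same value, consistent with the number of variants being a fixed total divided by the mean, so a smaller mean at lower shape values translates directly into more variants.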