James J Yang, Jia Li, L Keoki Williams, Anne Buu.
Abstract
BACKGROUND: In genome-wide association studies (GWAS) for complex diseases, the association between a SNP and each phenotype is usually weak. Combining multiple related phenotypic traits can increase the power of gene search and is therefore a practically important area that requires methodological work. This study provides a comprehensive review of existing methods for conducting GWAS on complex diseases with multiple phenotypes, including multivariate analysis of variance (MANOVA), principal component analysis (PCA), generalized estimating equations (GEE), the trait-based association test involving the extended Simes procedure (TATES), and the classical Fisher combination test. We propose a new method that relaxes the unrealistic independence assumption of the classical Fisher combination test and is computationally efficient. To demonstrate applications of the proposed method, we also present the results of statistical analysis on the Study of Addiction: Genetics and Environment (SAGE) data.
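As a point of reference for the classical Fisher combination test named in the abstract, the following is a minimal sketch and not the authors' proposed method. It combines m marginal p-values into the statistic T = −2 Σ log p_i and refers T to a chi-squared distribution with 2m degrees of freedom, which is valid only when the p-values are independent. The function names and the example p-values are hypothetical.

```python
import math


def chi2_sf_even_df(x, m):
    """P(X > x) for a chi-squared variable with 2*m degrees of freedom.

    For even degrees of freedom the survival function has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= half / k
        total += term
    return math.exp(-half) * total


def fisher_combination(p_values):
    """Classical Fisher combination test (assumes independent p-values > 0)."""
    m = len(p_values)
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_sf_even_df(stat, m)


# Hypothetical marginal p-values for one SNP tested against five phenotypes.
stat, p_combined = fisher_combination([0.04, 0.10, 0.20, 0.03, 0.15])
print(f"T = {stat:.3f}, combined p-value = {p_combined:.4f}")
```

With a single p-value the combined p-value equals the input, which is a quick sanity check on the closed-form survival function.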
Year: 2016 PMID: 26729364 PMCID: PMC4704475 DOI: 10.1186/s12859-015-0868-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The relationship between the covariance δ (y-axis) and the correlation ρ (x-axis)
Simulation results when the multivariate phenotypes come from a multivariate normal distribution
| ϱ | MANOVA | PCA | GEE | TATES | FC-χ² | FC-Permutation | FC-Pearson | FC-Kendall |
|---|---|---|---|---|---|---|---|---|
| No effect | | | | | | | | |
| 0 | 0.0477 | 0.0514 | 0.0109 | 0.0487 | 0.0468 | 0.0455 | 0.0455 | 0.0451 |
| | (0.0021) | (0.0022) | (0.0010) | (0.0022) | (0.0021) | (0.0021) | (0.0021) | (0.0021) |
| 0.25 | 0.0477 | 0.0499 | 0.0763 | 0.0498 | 0.0631 | 0.0488 | 0.0482 | 0.0477 |
| | (0.0021) | (0.0022) | (0.0027) | (0.0022) | (0.0024) | (0.0022) | (0.0021) | (0.0021) |
| 0.5 | 0.0477 | 0.0496 | 0.1518 | 0.0506 | 0.0942 | 0.0473 | 0.0482 | 0.0484 |
| | (0.0021) | (0.0022) | (0.0036) | (0.0022) | (0.0029) | (0.0021) | (0.0021) | (0.0021) |
| 0.75 | 0.0477 | 0.0496 | 0.2202 | 0.0494 | 0.1263 | 0.0467 | 0.0489 | 0.0485 |
| | (0.0021) | (0.0022) | (0.0041) | (0.0022) | (0.0033) | (0.0021) | (0.0022) | (0.0021) |
| Moderate effects | | | | | | | | |
| 0 | 0.7595 | 0.5679 | 0.9333 | 0.7359 | 0.9067 | 0.9058 | 0.9047 | 0.9040 |
| | (0.0043) | (0.0050) | (0.0025) | (0.0044) | (0.0029) | (0.0029) | (0.0029) | (0.0029) |
| 0.25 | 0.4086 | 0.7075 | 0.8570 | 0.6406 | 0.8076 | 0.7748 | 0.7749 | 0.7749 |
| | (0.0049) | (0.0045) | (0.0035) | (0.0048) | (0.0039) | (0.0042) | (0.0042) | (0.0042) |
| 0.5 | 0.2655 | 0.5295 | 0.8113 | 0.5668 | 0.7411 | 0.6314 | 0.6420 | 0.6421 |
| | (0.0044) | (0.0050) | (0.0039) | (0.0050) | (0.0044) | (0.0048) | (0.0048) | (0.0048) |
| 0.75 | 0.2011 | 0.4144 | 0.7827 | 0.4949 | 0.6927 | 0.5169 | 0.5272 | 0.5278 |
| | (0.0040) | (0.0049) | (0.0041) | (0.0050) | (0.0046) | (0.0050) | (0.0050) | (0.0050) |
| Varied effects | | | | | | | | |
| 0 | 0.8550 | 0.6646 | 0.9272 | 0.8731 | 0.9457 | 0.9454 | 0.9448 | 0.9445 |
| | (0.0035) | (0.0047) | (0.0026) | (0.0033) | (0.0023) | (0.0023) | (0.0023) | (0.0023) |
| 0.25 | 0.6334 | 0.7243 | 0.8500 | 0.8237 | 0.8864 | 0.8604 | 0.8631 | 0.8621 |
| | (0.0048) | (0.0045) | (0.0036) | (0.0038) | (0.0032) | (0.0035) | (0.0034) | (0.0034) |
| 0.5 | 0.6203 | 0.5437 | 0.8043 | 0.7758 | 0.8283 | 0.7252 | 0.7334 | 0.7333 |
| | (0.0049) | (0.0050) | (0.0040) | (0.0042) | (0.0038) | (0.0045) | (0.0044) | (0.0044) |
| 0.75 | 0.8177 | 0.4227 | 0.7756 | 0.7512 | 0.7721 | 0.5821 | 0.5942 | 0.5941 |
| | (0.0039) | (0.0049) | (0.0042) | (0.0043) | (0.0042) | (0.0049) | (0.0049) | (0.0049) |
The three effect sizes are: no effect, β=(0,0,0,0,0)′; moderate effects, β=(0.3,0.3,0.3,0.3,0.3)′; and varied effects, β=(0.1,0.2,0.3,0.4,0.5)′. The correlation between genes is ϱ, ranging from 0 to 0.75. The competing methods are MANOVA (multivariate analysis of variance), PCA (principal component analysis), GEE (generalized estimating equations), TATES (trait-based association test involving the extended Simes procedure), FC-χ² (the Fisher combination test referred to the chi-squared distribution with 2m degrees of freedom under the independence assumption), FC-Permutation (the permutation method based on 1,000 permutations), FC-Pearson (the proposed method with the correlation estimated by Pearson's sample correlation coefficient), and FC-Kendall (the proposed method with the correlation estimated by Kendall's τ). The numbers in each cell are the mean (standard deviation) of the indicator variable for p-value < 0.05 among the 10,000 replications.
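The inflated no-effect rejection rates in the column that refers the Fisher statistic to a chi-squared distribution with 2m degrees of freedom can be illustrated with a small Monte Carlo sketch. This is not the paper's simulation design. It assumes m = 5 marginal z-statistics that are equicorrelated through a shared normal factor, two-sided p-values, and a hypothetical function name `rejection_rate`; only the qualitative pattern (close to the nominal level at zero correlation, inflated at high correlation) is the point.

```python
import math
import random


def chi2_sf_even_df(x, m):
    """P(X > x) for a chi-squared variable with 2*m degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, m):
        term *= half / k
        total += term
    return math.exp(-half) * total


def two_sided_p(z):
    """Two-sided p-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def rejection_rate(rho, m=5, reps=4000, alpha=0.05, seed=1):
    """Null rejection rate of the independence-based Fisher test.

    Assumption: the m z-statistics share pairwise correlation rho, generated
    as sqrt(rho) * shared + sqrt(1 - rho) * own noise.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        shared = rng.gauss(0.0, 1.0)
        stat = 0.0
        for _ in range(m):
            z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            stat += -2.0 * math.log(two_sided_p(z))
        if chi2_sf_even_df(stat, m) < alpha:
            rejections += 1
    return rejections / reps


for rho in (0.0, 0.25, 0.5, 0.75):
    print(f"rho = {rho:.2f}: rejection rate = {rejection_rate(rho):.4f}")
```

Positive correlation increases the variance of the Fisher statistic beyond what the chi-squared reference allows for, so the upper tail is crossed more often than the nominal 5%.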
Simulation results when the multivariate phenotypes come from a mixture of two multivariate normal distributions
| ϱ | MANOVA | PCA | GEE | TATES | FC-χ² | FC-Permutation | FC-Pearson | FC-Kendall |
|---|---|---|---|---|---|---|---|---|
| No effect | | | | | | | | |
| 0 | 0.0535 | 0.0543 | 0.0135 | 0.0481 | 0.0487 | 0.0482 | 0.0461 | 0.0477 |
| | (0.0023) | (0.0023) | (0.0012) | (0.0021) | (0.0022) | (0.0021) | (0.0021) | (0.0021) |
| 0.25 | 0.0553 | 0.0514 | 0.0771 | 0.0496 | 0.0627 | 0.0465 | 0.0458 | 0.0469 |
| | (0.0023) | (0.0022) | (0.0027) | (0.0022) | (0.0024) | (0.0021) | (0.0021) | (0.0021) |
| 0.5 | 0.0537 | 0.0501 | 0.1505 | 0.0522 | 0.0895 | 0.0480 | 0.0491 | 0.0501 |
| | (0.0023) | (0.0022) | (0.0036) | (0.0022) | (0.0029) | (0.0021) | (0.0022) | (0.0022) |
| 0.75 | 0.0525 | 0.0538 | 0.2206 | 0.0481 | 0.1296 | 0.0493 | 0.0526 | 0.0513 |
| | (0.0022) | (0.0023) | (0.0041) | (0.0021) | (0.0034) | (0.0022) | (0.0022) | (0.0022) |
| Moderate effects | | | | | | | | |
| 0 | 0.5943 | 0.3299 | 0.8172 | 0.5683 | 0.7677 | 0.7633 | 0.7595 | 0.7619 |
| | (0.0049) | (0.0047) | (0.0039) | (0.0050) | (0.0042) | (0.0043) | (0.0043) | (0.0043) |
| 0.25 | 0.3038 | 0.5414 | 0.7487 | 0.5003 | 0.6779 | 0.6330 | 0.6333 | 0.6332 |
| | (0.0046) | (0.0050) | (0.0043) | (0.0050) | (0.0047) | (0.0048) | (0.0048) | (0.0048) |
| 0.5 | 0.2073 | 0.3981 | 0.7135 | 0.4402 | 0.6168 | 0.4989 | 0.5083 | 0.5082 |
| | (0.0041) | (0.0049) | (0.0045) | (0.0050) | (0.0049) | (0.0050) | (0.0050) | (0.0050) |
| 0.75 | 0.1601 | 0.3135 | 0.6847 | 0.3870 | 0.5779 | 0.4038 | 0.4111 | 0.4116 |
| | (0.0037) | (0.0046) | (0.0046) | (0.0049) | (0.0049) | (0.0049) | (0.0049) | (0.0049) |
| Varied effects | | | | | | | | |
| 0 | 0.6972 | 0.4002 | 0.8087 | 0.7328 | 0.8451 | 0.8425 | 0.8379 | 0.8408 |
| | (0.0046) | (0.0049) | (0.0039) | (0.0044) | (0.0036) | (0.0036) | (0.0037) | (0.0037) |
| 0.25 | 0.4766 | 0.5579 | 0.7427 | 0.6698 | 0.7656 | 0.7269 | 0.7236 | 0.7259 |
| | (0.0050) | (0.0050) | (0.0044) | (0.0047) | (0.0042) | (0.0045) | (0.0045) | (0.0045) |
| 0.5 | 0.4728 | 0.4083 | 0.7073 | 0.6237 | 0.7036 | 0.5766 | 0.5855 | 0.5862 |
| | (0.0050) | (0.0049) | (0.0046) | (0.0048) | (0.0046) | (0.0049) | (0.0049) | (0.0049) |
| 0.75 | 0.6576 | 0.3172 | 0.6799 | 0.5976 | 0.6394 | 0.4532 | 0.4624 | 0.4617 |
| | (0.0047) | (0.0047) | (0.0047) | (0.0049) | (0.0048) | (0.0050) | (0.0050) | (0.0050) |
The three effect sizes are: no effect, β=(0,0,0,0,0)′; moderate effects, β=(0.3,0.3,0.3,0.3,0.3)′; and varied effects, β=(0.1,0.2,0.3,0.4,0.5)′. The correlation between genes is ϱ, ranging from 0 to 0.75. The competing methods are MANOVA (multivariate analysis of variance), PCA (principal component analysis), GEE (generalized estimating equations), TATES (trait-based association test involving the extended Simes procedure), FC-χ² (the Fisher combination test referred to the chi-squared distribution with 2m degrees of freedom under the independence assumption), FC-Permutation (the permutation method based on 1,000 permutations), FC-Pearson (the proposed method with the correlation estimated by Pearson's sample correlation coefficient), and FC-Kendall (the proposed method with the correlation estimated by Kendall's τ). The numbers in each cell are the mean (standard deviation) of the indicator variable for p-value < 0.05 among the 10,000 replications.
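The parenthesized values appear consistent with the binomial standard error of a rejection rate estimated from 10,000 replications, sqrt(p(1 − p)/n). The sketch below checks that reading against three entries taken from the mixture-of-normals table (GEE with no effect at ϱ = 0, and MANOVA and GEE with moderate effects at ϱ = 0). The function name `rejection_rate_se` is hypothetical, and the formula is an inference from the reported numbers, not something the table note states.

```python
import math


def rejection_rate_se(rate, replications=10_000):
    """Standard error of the mean of a 0/1 rejection indicator."""
    return math.sqrt(rate * (1.0 - rate) / replications)


# Rejection rates copied from the table; the printed values can be
# compared with the parenthesized entries beneath them.
for rate in (0.0135, 0.5943, 0.8172):
    print(f"{rate:.4f} ({rejection_rate_se(rate):.4f})")
```

Because this quantity is largest at p = 0.5, where it equals 0.0050 for n = 10,000, no entry in either table can exceed (0.0050).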
Fig. 2 The distributions of phenotypes for alcohol, nicotine, marijuana and cocaine dependence. The x-axis is the number of symptoms, and the y-axis is the frequency
The Kendall rank pairwise correlations between alcohol, nicotine, marijuana, and cocaine outcomes
| | Alcohol | Nicotine | Marijuana | Cocaine |
|---|---|---|---|---|
| Alcohol | 1 | 0.4554 | 0.4236 | 0.5029 |
| Nicotine | | 1 | 0.3373 | 0.3375 |
| Marijuana | | | 1 | 0.5067 |
| Cocaine | | | | 1 |
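Pairwise rank correlations of this kind can be computed with a few lines of code. The sketch below implements Kendall's tau-b, a variant that corrects for ties, which seems a reasonable choice for symptom counts. The source does not say which variant of Kendall's τ was used, so that choice is an assumption, and the symptom counts shown are made-up illustrative values, not SAGE data.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b via a direct O(n^2) pass over all pairs."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = untied_x = untied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx != 0:
                untied_x += 1
            if dy != 0:
                untied_y += 1
            if dx != 0 and dy != 0:
                if dx * dy > 0:
                    concordant += 1
                else:
                    discordant += 1
    if untied_x == 0 or untied_y == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


# Hypothetical symptom counts for eight subjects (illustration only).
alcohol = [0, 1, 3, 7, 2, 0, 5, 4]
nicotine = [1, 0, 2, 6, 2, 0, 3, 5]
print(f"tau-b = {kendall_tau_b(alcohol, nicotine):.4f}")
```

The quadratic loop is fine for a sketch; for cohort-sized data a library routine with an O(n log n) algorithm would be the practical choice.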
Fig. 3 The QQ-plots of p-values for the marginal tests of association between the SNPs and each of the four addiction symptomatology variables. The x-axis is the expected −log10(p-value), and the y-axis is the observed −log10(p-value). The diagonal gray straight lines have slope 1 and intercept 0
Fig. 4 The QQ-plot of p-values for the Fisher combination test of association between the SNPs and the multivariate phenotype of addiction. The x-axis is the expected −log10(p-value), and the y-axis is the observed −log10(p-value). The diagonal gray straight line has slope 1 and intercept 0
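The coordinates of such a QQ-plot can be sketched as follows. The expected value for the i-th smallest of n p-values is taken to be i/(n + 1), a common plotting convention that is assumed here because the captions do not state which one was used. The function name `qq_points` and the example p-values are hypothetical.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a QQ-plot.

    Assumption: under the null the i-th smallest of n p-values is expected
    to be i / (n + 1).
    """
    n = len(p_values)
    ordered = sorted(p_values)
    points = []
    for i, p in enumerate(ordered, start=1):
        expected = -math.log10(i / (n + 1))
        observed = -math.log10(p)
        points.append((expected, observed))
    return points


# Hypothetical p-values; points near the line y = x suggest no inflation.
for expected, observed in qq_points([0.91, 0.02, 0.45, 0.30, 0.68]):
    print(f"expected = {expected:.3f}, observed = {observed:.3f}")
```

Points that rise above the diagonal across the whole range suggest inflated test statistics, while departures confined to the smallest p-values are the pattern expected from a few true associations.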