Xiang Zhou, Matthew Stephens.
Abstract
Multivariate linear mixed models (mvLMMs) are powerful tools for testing associations between single-nucleotide polymorphisms and multiple correlated phenotypes while controlling for population stratification in genome-wide association studies. We present efficient algorithms in the genome-wide efficient mixed model association (GEMMA) software for fitting mvLMMs and computing likelihood ratio tests. These algorithms offer improved computation speed, power and P-value calibration over existing methods, and can deal with more than two phenotypes.
Year: 2014 PMID: 24531419 PMCID: PMC4211878 DOI: 10.1038/nmeth.2848
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Comparison of computing time of different methods for parameter estimation in a single mvLMM, and for performing likelihood ratio tests in GWASs. Results are shown for both the HMDP and NFBC1966 data sets. All computation was performed on a single core of an Intel Xeon L5420 2.50 GHz CPU. $n$ is the number of individuals, $s$ is the number of SNPs, $d$ is the number of traits, $c$ is the number of covariates ($c=1$ here), $t_1$ is the number of iterations used in the EM-type algorithm and $t_2$ is the number of iterations used in the NR-type algorithm. Notice that the computing time for GEMMA is essentially the same for all $d$, because in GEMMA the computing time is dominated by the initial $O(n^3)$ eigen-decomposition step; the following optimization iterations are negligible. The $sn^2$ step in GEMMA could be replaced with an $snr$ step if the relatedness matrix is of rank $r$.
| Method | Time Complexity | Computation Time, HMDP (n=656, s=108,562) | | | Computation Time, NFBC1966 (n=5255, s=319,111) | | |
|---|---|---|---|---|---|---|---|
| Fitting a single mvLMM | | | | | | | |
| GEMMA | $O(n^3 + n^2 d + n^2 c + t_1 n c^2 d^2 + t_2 n c^2 d^6)$ | < 1 s | < 1 s | < 1 s | 6.7 min | 6.7 min | 6.7 min |
| WOMBAT | $O(t_1 n^3 (d + c)^3 + t_2 n^3 d^7)$ | 12.5 s | 39.2 s | 71.0 s | 31.0 min | 127.6 min | 477.3 min |
| GCTA | $O(t_1 n^3 (d + c)^3 + t_2 n^3 d^7)$ | 11.2 s | -- | -- | 38.2 min | -- | -- |
| Genome-wide applications | | | | | | | |
| GEMMA | $O(n^3 + n^2 d + n^2 c + s(n^2 + t_1 n c^2 d^2 + t_2 n c^2 d^6))$ | 6.2 min | 13.7 min | 28.5 min | 4.4 h | 4.8 h | 5.8 h |
| MTMM | $O(t_1 n^3 (d + c)^3 + t_2 n^3 d^7 + s n^2 d^2)$ | 16.4 min | -- | -- | 58.0 h | -- | -- |
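The claim that the eigen-decomposition dominates a single GEMMA fit can be checked with rough arithmetic on the complexity terms in the table. The sketch below is only an illustration: it treats each big-O term as a raw operation count (ignoring constant factors), and the iteration counts `t1 = t2 = 100` and the trait counts `d = 2, 3, 4` are hypothetical values chosen for the example, not figures from the paper. The function names are likewise made up here.

```python
def gemma_single_fit_terms(n, d, c=1, t1=100, t2=100):
    """Split the single-fit complexity from the table into its four terms.

    t1 and t2 (EM-type and NR-type iteration counts) default to
    hypothetical values; big-O constants are ignored.
    """
    return {
        "n^3": n**3,
        "n^2 d + n^2 c": n**2 * d + n**2 * c,
        "t1 n c^2 d^2": t1 * n * c**2 * d**2,
        "t2 n c^2 d^6": t2 * n * c**2 * d**6,
    }


def eigen_share(n, d, c=1, t1=100, t2=100):
    """Fraction of the summed term counts contributed by the n^3 term."""
    terms = gemma_single_fit_terms(n, d, c, t1, t2)
    return terms["n^3"] / sum(terms.values())


# NFBC1966 sample size from the table; the d values are hypothetical.
shares = {d: eigen_share(n=5255, d=d) for d in (2, 3, 4)}
for d, share in shares.items():
    print(f"d={d}: n^3 term is {share:.1%} of the summed terms")
```

Under these assumptions the $n^3$ term accounts for nearly all of the count at each small $d$, which is consistent with the flat GEMMA timings in the first block of the table. Because the last term grows as $d^6$, the picture would change for much larger $d$ or many more iterations.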
Figure 1. Illustration of the statistical benefits of our new algorithms implemented in GEMMA. (a) A QQ-plot showing the improved calibration of GEMMA p values compared with those from MTMM for simulated null data. The gray shaded area indicates the 0.025 and 0.975 point-wise quantiles of the ordered p values under the null distribution. (b) GEMMA p values are consistently more significant than MTMM p values for the HMDP data. (c) Gain in power for GEMMA compared with MTMM in four different simulation scenarios based on the HMDP data. The x-axis shows the proportion of phenotypic variance in the first phenotype explained (PVE) by the SNP, while the point symbol and line type indicate the SNP effect direction (compared with its effect on the first phenotype) and size (quantified by PVE) on the second phenotype (+: opposite direction, 0.8PVE; ×: opposite direction, 0.2PVE; o: same direction, 0.8PVE; Δ: same direction, 0.2PVE). (d) Simulation results illustrating the potential gain in power from four-phenotype vs two-phenotype analyses.
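The shaded band described for panel (a) can be approximated by simulation. The following is a minimal sketch, assuming that null p values are independent and uniform on (0, 1); it is not the authors' plotting code, and the function name `null_qq_band` and its parameters are hypothetical. For each rank it records the expected ordered p value together with the empirical 0.025 and 0.975 quantiles across simulated null data sets.

```python
import random


def null_qq_band(n_tests, n_reps=1000, seed=0):
    """Simulate point-wise 0.025/0.975 quantiles of ordered null p values.

    Assumes independent Uniform(0, 1) p values under the null.
    Returns three lists (expected, lower, upper), one entry per rank.
    """
    rng = random.Random(seed)
    # Each replicate is one sorted set of null p values.
    sims = [sorted(rng.random() for _ in range(n_tests)) for _ in range(n_reps)]
    lo_idx = n_reps * 25 // 1000        # index of the 0.025 empirical quantile
    hi_idx = n_reps * 975 // 1000 - 1   # index of the 0.975 empirical quantile
    expected, lower, upper = [], [], []
    for k in range(n_tests):
        # Distribution of the k-th smallest p value across replicates.
        column = sorted(sim[k] for sim in sims)
        expected.append((k + 1) / (n_tests + 1))  # mean of the k-th order statistic
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return expected, lower, upper


expected, lower, upper = null_qq_band(n_tests=200)
print(f"smallest p value: expected {expected[0]:.4f}, "
      f"band [{lower[0]:.4f}, {upper[0]:.4f}]")
```

Observed ordered p values that fall outside the band at many ranks suggest miscalibration. QQ-plots of this kind are often drawn on a $-\log_{10}$ scale, in which case the same three lists are transformed before plotting.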