Xiang Zhou, Matthew Stephens.
Abstract
Linear mixed models have attracted considerable attention recently as a powerful and effective tool for accounting for population stratification and relatedness in genetic association tests. However, existing methods for exact computation of standard test statistics are computationally impractical for even moderate-sized genome-wide association studies. To address this issue, several approximate methods have been proposed. Here, we present an efficient exact method, which we refer to as genome-wide efficient mixed-model association (GEMMA), that makes approximations unnecessary in many contexts. This method is approximately n times faster than the widely used exact method known as efficient mixed-model association (EMMA), where n is the sample size, making exact genome-wide association analysis computationally practical for large numbers of individuals.
Year: 2012 PMID: 22706312 PMCID: PMC3386377 DOI: 10.1038/ng.2310
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Performance of different methods for GWAS with the linear mixed model. All computations were performed on a single core of an Intel Xeon L5420 2.50 GHz CPU. The time for the EMMA method is projected from a selection of 10,000 and 100 genetic markers in the HMDP data set and WTCCC data set, respectively. Note that EMMA is implemented in R while the others are implemented in C; a C implementation of EMMA could be a few times faster. p is the number of genetic markers, n is the number of individuals, m is the number of strains (equal to n for human studies), and c is the number of covariates (fixed effects) in addition to the genotypes. t1 and t2 are the numbers of optimization iterations required for Brent's method (super-linear rate of convergence) and the Newton-Raphson method (quadratic rate of convergence), respectively. Note that t2 is expected to be smaller than t1.
| Category | Method | Time Complexity | Computing Time (HDL-C) | Computing Time (Crohn's Disease) |
|---|---|---|---|---|
| Exact Methods | GEMMA | | 33 minutes | 3.3 hours |
| Exact Methods | EMMA | | ~ 9 days | ~ 27 years |
| Exact Methods | FaST-LMM | | 6.8 hours | 6.2 hours |
| Approximate Methods | EMMAX | | 44 minutes | 6.4 hours |
| Approximate Methods | GRAMMAR | | 1.6 minutes | 12 minutes |
Complexities are given assuming the usual genome-wide relatedness matrix, which has rank n. In the current implementation of various methods except EMMA, the first terms are actually n, but it would be straightforward to make them mn in principle.
m=99, n=681 and p=1,885,197 for HDL-C.
m=n=4686 and p=442,001 for Crohn's disease.
These results are for the algorithm in FaST-LMM that uses the standard full-rank relatedness matrix, which produces p values that are identical to GEMMA and EMMA. See main text for further discussion.
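As a rough check on the "approximately n times faster" claim, the reported run times for GEMMA and EMMA can be turned into speedup ratios and set beside the sample sizes. The following is a minimal sketch that uses only the figures quoted in the table and its notes; the dictionary layout, the `speedup` helper, and the 365.25-day year used to convert "27 years" are assumptions made for illustration.

```python
# Reported single-core run times from the table, converted to hours.
HOURS_PER_DAY = 24.0
HOURS_PER_YEAR = 365.25 * HOURS_PER_DAY  # assumption: one year = 365.25 days

# n is the number of individuals reported for each data set.
runtimes_hours = {
    "HDL-C": {"n": 681, "GEMMA": 33 / 60, "EMMA": 9 * HOURS_PER_DAY},
    "Crohn's disease": {"n": 4686, "GEMMA": 3.3, "EMMA": 27 * HOURS_PER_YEAR},
}


def speedup(dataset):
    """Return the ratio of EMMA run time to GEMMA run time (hypothetical helper)."""
    row = runtimes_hours[dataset]
    return row["EMMA"] / row["GEMMA"]


for name, row in runtimes_hours.items():
    print(f"{name}: EMMA/GEMMA ratio ~ {speedup(name):.0f} (n = {row['n']})")
```

These ratios are only indicative: the EMMA times are projections from a subset of markers, and EMMA is implemented in R while GEMMA is implemented in C, so part of the gap reflects implementation language and not the algorithm alone.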
Figure 1. Comparison of -log10 p values obtained from GEMMA with those from EMMA (a, b), and EMMAX and GRAMMAR (c, d). In (a) and (b) the p values are shown for the top 10,000 markers and top 100 markers, respectively. In (c) and (d) the p values are shown for all markers (1.9 million and 442k, respectively).
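The kind of comparison plotted in such a figure can be sketched as follows: p values from two methods are put on the -log10 scale and their largest disagreement is reported. This is an illustrative sketch only; the p values below are made up for the example and are not results from the paper, and the function names are hypothetical.

```python
import math


def neg_log10(p_values):
    """Transform p values to the -log10 scale."""
    return [-math.log10(p) for p in p_values]


def max_abs_difference(p_a, p_b):
    """Largest absolute gap between two methods on the -log10 scale."""
    a, b = neg_log10(p_a), neg_log10(p_b)
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical p values for three markers (not data from the study).
exact_method_a = [1e-8, 3e-5, 0.02]
exact_method_b = [1e-8, 3e-5, 0.02]   # a second exact method should agree
approx_method = [5e-8, 4e-5, 0.03]    # an approximate method may deviate

print(max_abs_difference(exact_method_a, exact_method_b))
print(max_abs_difference(exact_method_a, approx_method))
```

Two exact methods would be expected to give a difference at or near zero, whereas an approximate method can show a visible gap, most noticeably for the smallest p values.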