
Fast and Accurate Genome-Wide Association Test of Multiple Quantitative Traits.

Abstract

Multiple correlated traits are often collected in genetic studies. By jointly analyzing multiple traits, we can increase power by aggregating multiple weak effects and reveal additional insights into the genetic architecture of complex human diseases. In this article, we propose a multivariate linear regression-based method to test the joint association of multiple quantitative traits. It is flexible to accommodate any covariates, has very accurate control of type I errors, and offers very competitive performance. We also discuss fast and accurate significance p value computation especially for genome-wide association studies with small-to-medium sample sizes. We demonstrate through extensive numerical studies that the proposed method has competitive performance. Its usefulness is further illustrated with application to genome-wide association analysis of diabetes-related traits in the Atherosclerosis Risk in Communities (ARIC) study. We found some very interesting associations with diabetes traits which have not been reported before. We implemented the proposed methods in a publicly available R package.

Year: 2018 PMID: 29743933 PMCID: PMC5878919 DOI: 10.1155/2018/2564531

Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238

1. Introduction

Over the past ten years, many epidemiologic studies have used genome-wide association studies (GWAS) to identify genetic components of complex human diseases. These large cohort studies often collect a broad array of correlated traits that reflect common physiological processes. By jointly analyzing these correlated traits, we can often gain power by aggregating multiple weak effects and shed light on the mechanisms underlying complex human diseases [1].

Many methods have been proposed recently to detect SNP associations with multiple correlated traits (see, e.g., [2-13]). A direct approach is based on the minimum trait p value [6], which typically requires permutations to compute the significance p value. A related approach is the trait-based association test using an extended Simes procedure (TATES; [10]), which combines the univariate trait p values while correcting for the correlations among the traits. Various dimension reduction methods have also been proposed that summarize the multivariate traits into a univariate outcome and then apply a traditional univariate association test. Examples include principal component analysis (PCA) [2], principal components of heritability (PCH) [3], and averaging of longitudinally observed traits [7, 14]. PCA is an unsupervised dimension reduction, and the top PC may not necessarily reflect the association signal. Sample splitting is typically used in PCH for significance calculations and may lead to a loss of power.

Multivariate trait testing methods generally perform better than univariate analysis-based approaches [15]. Among the multivariate testing methods, a popular approach is canonical correlation analysis (CCA) [4, 16, 17], which is fast to compute but cannot accommodate covariates. Liu et al. [5] proposed a generalized estimating equation (GEE) model [18] to jointly analyze one continuous and one binary trait. In Avery et al. [19] and He et al. [11], GEE-based marginal generalized linear modeling of multivariate traits is adopted for efficient multitrait association testing. Schifano et al. [20] proposed a closely related GEE-based scaled marginal association test of multiple secondary continuous traits. Sitlani et al. [13] explored GEE modeling of longitudinally measured traits for association testing. These GEE-based methods typically avoid explicitly modeling the trait correlations.

Another set of multivariate approaches is based on the inverted regression of genotypes to test the overall trait effects. For example, proportional odds regression modeling of genotypes was proposed as a convenient approach to testing multitrait associations [8, 21, 22]. A related adjacent category logistic regression of genotypes was proposed by Wu and Pankow [12]. The inverted regression approach does not easily accommodate imputed SNPs and has generally used the "best-guess" genotypes, which is known to lead to a loss of power. In contrast, the multivariate trait regression approach can easily test imputed SNPs by using the imputation dosage as a covariate.

In this article, we explore an alternative multivariate regression framework that explicitly models the trait correlation and adjusts for covariates to test multitrait associations. We compute analytical p values for the proposed tests based on F-distributions, which offer very accurate type I error control with good finite sample performance. We also exploit the parallel nature of genome-wide association tests to develop very efficient numerical algorithms that scale to genome-wide association tests of millions of SNPs. We demonstrate through extensive numerical studies that the proposed methods have very competitive performance compared to existing methods. We further illustrate the usefulness of the proposed methods through an application to a genome-wide association study of multiple diabetes-related glycemic traits.

2. Methods

We first discuss a multivariate linear regression-based framework for modeling the multiple quantitative traits and then derive the Wald type statistics for testing multitrait associations.

2.1. Multivariate Linear Regression Model

Consider $m$ continuous traits $Y = (y_1, \ldots, y_m)^T$, a covariate vector $X = (x_1, \ldots, x_p)^T$ of length $p$ (which could contain an ancestry indicator or principal components), and a genotype score $G$ coding the number of minor alleles. Consider the multivariate normal trait model

$$Y = \beta_0 + \beta_x X + \beta_1 G + \epsilon, \qquad (1)$$

where $\beta_0$ is a vector of length $m$, $\beta_x$ is an $m \times p$ matrix, $\beta_1$ is a vector of length $m$, and the random error $\epsilon$ is of length $m$ and is assumed to follow a zero-mean multivariate normal distribution with covariance $\Sigma$, $\epsilon \sim N(0, \Sigma)$. Multivariate trait association amounts to testing $H_0: \beta_1 = 0$. Here we have assumed the same covariates for all traits, which is the case for our ARIC GWAS example (see Section 3.2) and many typical GWAS. In the supplementary materials, we discuss the possible scenario with different covariates for each trait. The trait model (1) is a multivariate linear model (MLM; see, e.g., [23, chapter 8] and [24, chapter 9]).

Given observations for $n$ unrelated individuals, for individual $i$, denote $Y_i$ as the outcome, $X_i$ as the covariate, and $G_i$ as the genotype score. Denote $Y = (Y_1, \ldots, Y_n)^T$, $X = (X_1, \ldots, X_n)^T$, $G = (G_1, \ldots, G_n)^T$, and the design matrix $Z = (1_n, X, G)$ of dimension $n \times (p + 2)$, where $1_n = (1, \ldots, 1)^T$ is a column vector of $n$ ones. Denote the $m \times (p + 2)$ parameter matrix $\beta = (\beta_0, \beta_x, \beta_1)$. We can check that the maximum likelihood estimators (MLEs) are (see, e.g., [23, p. 294])

$$\hat\beta^T = (Z^T Z)^{-1} Z^T Y, \qquad \hat\Sigma = \frac{1}{n}\,(Y - Z\hat\beta^T)^T (Y - Z\hat\beta^T). \qquad (2)$$
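As an illustration of the estimation step, the following Python sketch fits the multivariate linear model by ordinary least squares, which coincides with the MLE of the regression coefficients under normal errors, and forms the covariance MLE by dividing the residual cross-product by n. It is not the authors' R implementation; the function name `fit_mlm` and the toy data are hypothetical.

```python
import numpy as np

def fit_mlm(Y, X, G):
    """Fit Y = beta0 + beta_x X + beta1 G + eps by least squares (the MLE).

    Y: (n, m) traits, X: (n, p) covariates, G: (n,) genotype scores.
    Returns beta_hat with shape (m, p + 2) and the MLE Sigma_hat (m, m).
    """
    n = Y.shape[0]
    Z = np.column_stack([np.ones(n), X, G])        # design matrix, n x (p + 2)
    coef, *_ = np.linalg.lstsq(Z, Y, rcond=None)   # (p + 2) x m, i.e. beta_hat^T
    resid = Y - Z @ coef
    Sigma_hat = resid.T @ resid / n                # MLE divides by n
    return coef.T, Sigma_hat

# Toy data (all values below are made up for illustration).
rng = np.random.default_rng(0)
n, m, p = 200, 3, 2
X = rng.normal(size=(n, p))
G = rng.binomial(2, 0.3, size=n).astype(float)
Y = (1.0 + X @ rng.normal(size=(p, m))
     + np.outer(G, [0.2, 0.0, 0.1]) + rng.normal(size=(n, m)))

beta_hat, Sigma_hat = fit_mlm(Y, X, G)
print(beta_hat[:, -1])   # estimated genotype effects (last column is beta1)
```

Using `lstsq` rather than forming the inverse of the cross-product matrix explicitly is a numerical convenience; both give the same estimates when the design matrix has full column rank.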

2.2. Conducting Multivariate Association Tests

Denote by $\mathrm{vec}(\cdot)$ the vector operator, which stacks the columns of a matrix into a vector. Denote $A = Z^T Z$. For the MLEs (2) of the MLM (1), we can check that (see, e.g., [23, p. 296])

$$\mathrm{vec}(\hat\beta) \sim N\big(\mathrm{vec}(\beta),\; A^{-1} \otimes \Sigma\big), \qquad (3)$$

where $\otimes$ denotes the Kronecker product, and $n\hat\Sigma$ independently follows a Wishart distribution, $W(\Sigma, n - p - 2)$, with $n - p - 2$ degrees of freedom (DFs) and scale matrix $\Sigma$. Define the $n \times (p + 1)$ design matrix $Z_0 = (1_n, X)$ and the corresponding $n \times n$ hat matrix $H = Z_0 (Z_0^T Z_0)^{-1} Z_0^T$. Let $P = I - H$ and $\tilde G = PG$, where $I$ is the $n \times n$ identity matrix. We can check that

$$\hat\beta_1 = \frac{Y^T \tilde G}{\tilde G^T \tilde G}, \qquad \hat\beta_1 \sim N\!\left(\beta_1,\; \frac{\Sigma}{\tilde G^T \tilde G}\right). \qquad (4)$$

We test the multitrait association with the following Wald statistic:

$$Q = (\tilde G^T \tilde G)\, \hat\beta_1^T \hat\Sigma^{-1} \hat\beta_1.$$

Note that $\hat\beta_1$ and $\hat\Sigma$ are independent. Under the null hypothesis, $((n - p - 1 - m)/(mn))\,Q$ follows the F-distribution with $(m, n - p - 1 - m)$ DFs (see, e.g., [25, p. 541]).

In the supplementary materials, we analytically show that the CCA test approach [4] is equivalent to a Score test statistic under the MLM (1) when there are no covariates other than the genotype. Therefore, the proposed MLM-based Wald test can be treated as a natural and flexible generalization of the CCA: (I) it can accommodate any covariates; (II) it is based on the more powerful Wald test instead of the Score test for an association test of quantitative traits; (III) it has an exact F-distribution for multivariate normally distributed traits and hence has very accurate control of type I errors for any sample size without the need for asymptotic approximation; and (IV) it is very fast to compute (see the next section for details) and scales to genome-wide association tests of millions of SNPs.

When genetic effects are similar across traits, we can further improve the multivariate association test power using a test statistic with 1 DF along the lines of O'Brien [26], who performed a Wald test of linear combinations of $\beta_1$. We can derive similar Wald tests under the MLM (1) (see supplementary materials for technical details).
When the genotype effects are the same across different traits, we study a 1-DF test statistic T for a linear combination of β1, defined through 1, an m × 1 column vector of ones. When the scaled genotype effects are the same across different traits, we study an analogous 1-DF test statistic T′, defined through S, a column vector of estimated standard errors. Under the null hypothesis, both T and T′ asymptotically follow a standard normal distribution. To improve the finite sample performance, we can compare ((n − p − 1 − m)/n)T and ((n − p − 1 − m)/n)T′ to a t-distribution with n − p − 1 − m DFs.
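The omnibus Wald test of this section can be sketched in Python as follows. The sketch assumes the statistic Q is the quadratic form of the estimated genotype effect in the inverse covariance MLE, scaled by the squared norm of the covariate-adjusted genotype, and it uses the F scaling stated in the text, (n − p − 1 − m)/(mn) with (m, n − p − 1 − m) DFs. The function name `wald_q_test` and the simulated inputs are hypothetical, and the 1-DF tests T and T′ are not covered.

```python
import numpy as np
from scipy import stats

def wald_q_test(Y, X, G):
    """Omnibus Wald test of H0: beta1 = 0 for one SNP.

    Y: (n, m) traits, X: (n, p) covariates, G: (n,) genotype scores.
    Returns (Q, F, pval), where F = (n - p - 1 - m) / (m n) * Q.
    """
    n, m = Y.shape
    p = X.shape[1]
    Z0 = np.column_stack([np.ones(n), X])
    # Adjust genotype and traits for the covariates (apply P = I - H).
    G_t = G - Z0 @ np.linalg.lstsq(Z0, G, rcond=None)[0]
    E = Y - Z0 @ np.linalg.lstsq(Z0, Y, rcond=None)[0]
    gg = G_t @ G_t
    beta1 = E.T @ G_t / gg                                 # genotype effects
    Sigma_hat = (E.T @ E - gg * np.outer(beta1, beta1)) / n  # covariance MLE
    Q = gg * (beta1 @ np.linalg.solve(Sigma_hat, beta1))
    df2 = n - p - 1 - m
    F = df2 / (m * n) * Q
    return Q, F, stats.f.sf(F, m, df2)

# Toy data (made up for illustration).
rng = np.random.default_rng(1)
n, m, p = 300, 3, 2
X = rng.normal(size=(n, p))
G = rng.binomial(2, 0.3, size=n).astype(float)
Y = 1.0 + X @ rng.normal(size=(p, m)) + np.outer(G, [0.3, 0.0, 0.1]) \
    + rng.normal(size=(n, m))

Q, F, pval = wald_q_test(Y, X, G)
print(Q, F, pval)
```

For a single trait (m = 1) the scaled statistic reduces to the squared t statistic of the genotype coefficient in ordinary linear regression, which is a convenient sanity check.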

2.3. Efficient Computation of GWAS Wald Test Statistics

For a typical GWAS with millions of SNPs, rather than fitting an MLM for each SNP, we developed a very efficient algorithm to estimate the MLMs for all SNPs using matrix decomposition tricks along the lines of Voorman et al. [27], as follows. For $Z_0$, denote its singular value decomposition (SVD) as $Z_0 = UDV^T$, where $U$ is an $n \times (p + 1)$ matrix with orthonormal columns, $D$ is a $(p + 1) \times (p + 1)$ diagonal matrix, and $V$ is a $(p + 1) \times (p + 1)$ orthogonal matrix. The null MLM hat matrix can then be computed as $H = UU^T$, and $\tilde G = G - U(U^T G)$. Denote the null MLM residual matrix as $E = Y - U(U^T Y)$, and let $V_0 = E^T E$. In (4), we have shown that the genotype effect can be efficiently computed as $\hat\beta_1 = Y^T \tilde G / (\tilde G^T \tilde G)$. We can then compute the covariance matrix MLE as $\hat\Sigma = (V_0 - (\tilde G^T \tilde G)\,\hat\beta_1 \hat\beta_1^T)/n$. Here both $V_0$ and $U$ need to be precomputed only once and can be stored for use with all SNPs.

Operationally, we can also apply the popular PLINK tool [28] to test multitrait association. We first obtain the residuals of the multivariate traits and genotypes after adjusting for all covariates. We then input the residuals into the PLINK CCA test approach [4]. Technically, we need to adjust the PLINK output p value using an F-distribution with different DFs (see supplementary materials for technical details).
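The precompute-once strategy can be sketched in Python for a block of SNPs as below. The SVD factor U, the null residuals E, and V0 are computed once, and every SNP then needs only matrix products. One step goes beyond the text and is an assumption of this sketch: instead of inverting the covariance MLE per SNP, it applies the Sherman-Morrison identity to the rank-one update of V0, which gives Q = n r / (1 − r) with r the squared multiple correlation between the adjusted genotype and the adjusted traits. The function name `wald_scan` and the toy data are hypothetical.

```python
import numpy as np
from scipy import stats

def wald_scan(Y, X, G_all):
    """Omnibus Wald test for every column (SNP) of G_all.

    Y: (n, m) traits, X: (n, p) covariates, G_all: (n, K) genotype scores.
    Monomorphic SNPs (zero adjusted variance) should be filtered beforehand.
    """
    n, m = Y.shape
    p = X.shape[1]
    Z0 = np.column_stack([np.ones(n), X])
    U, _, _ = np.linalg.svd(Z0, full_matrices=False)  # U: n x (p + 1)
    E = Y - U @ (U.T @ Y)               # null residuals, computed once
    V0 = E.T @ E                        # m x m, computed once
    Gt = G_all - U @ (U.T @ G_all)      # covariate-adjusted genotypes
    gg = np.einsum("ij,ij->j", Gt, Gt)  # squared norms, one per SNP
    EtG = E.T @ Gt                      # m x K cross-products
    # r = (E^T g)^T V0^{-1} (E^T g) / (g^T g) for each adjusted genotype g
    r = np.einsum("ij,ij->j", EtG, np.linalg.solve(V0, EtG)) / gg
    Q = n * r / (1.0 - r)               # Sherman-Morrison shortcut (see lead-in)
    df2 = n - p - 1 - m
    pval = stats.f.sf(df2 / (m * n) * Q, m, df2)
    return Q, pval

# Toy data (made up for illustration).
rng = np.random.default_rng(2)
n, m, p, K = 300, 3, 2, 5
X = rng.normal(size=(n, p))
G_all = rng.binomial(2, 0.3, size=(n, K)).astype(float)
Y = 1.0 + X @ rng.normal(size=(p, m)) + rng.normal(size=(n, m))

Q, pval = wald_scan(Y, X, G_all)
print(Q, pval)
```

In a real scan the genotype matrix would be processed in chunks of columns so that memory use stays bounded; the chunk size is a tuning choice outside the scope of this sketch.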

3. Results

3.1. Simulation Studies

We consider three forms of Wald statistics: Q is the omnibus test, and T and T′ are the 1-DF tests assuming common or common scaled effects. The GEE-based approaches of He et al. [11] are computationally very efficient, have been shown to appropriately control the type I errors, and have the overall best detection power compared to the other methods (e.g., TATES of [10] and other univariate test-based methods) in extensive numerical studies. Here we compared the proposed methods to their GEE score tests, denoted as (Q_s, T_s, T_s′), which are the m-DF omnibus test and the 1-DF tests assuming a common effect or common scaled effect.

We consider a standard normal covariate X1 and a Bernoulli covariate X2 with probability 0.5. The SNP genotype score G is simulated from a binomial distribution, Binom(2, f0), where the minor allele frequency (MAF) is f0 = p0 + p1X2. Here X2 is essentially a population indicator, so the SNPs are simulated under population stratification. We conducted simulations for testing m = 2, 4, and 8 related traits of 1,000 unrelated individuals. Each time, we simulate the m traits from a multivariate normal distribution with a compound symmetry correlation matrix with correlation ρ. The first trait has a variance of 2 and all the other traits have unit variance. We set E(Y_i) = 1 + 0.5X1 + 0.5X2 + γ_i G for i = 1, 3, …, m − 1, and E(Y_k) = 1 + X1 + X2 + γ_k G for k = 2, 4, …, m. We used 10 million experiments to evaluate the type I error and 10⁵ experiments to evaluate the power under various combinations of (γ_1, …, γ_m). We conducted simulations for p0 = 0.1 and 0.3, p1 = 0.1, and ρ = 0, 0.2, 0.5, and 0.8. Here we report the results for m = 2 and 8, ρ = 0 and 0.5, and p0 = 0.1. The conclusions remain the same for other settings (data not shown).

Tables 1 and 2 summarize the estimated type I errors. Overall, the type I errors are well controlled for the proposed methods, while the GEE score tests are conservative, especially for a large number of traits (m = 8). In general, the proposed Wald tests follow the exact F-distribution under the null hypothesis and hence the type I errors are well controlled under all settings. The GEE tests rely on a large-sample asymptotic distribution and therefore generally need a large sample size for good control of type I errors, especially for a larger number of traits (which involve more model parameters).
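The simulation design described above can be sketched in Python as one data-generating function. The function name `simulate_traits`, its argument names, and the seed are hypothetical; the sketch generates a single replicate and leaves the testing and the repetition over millions of replicates to the caller.

```python
import numpy as np

def simulate_traits(n, m, rho, gamma, p0=0.1, p1=0.1, seed=None):
    """Simulate one replicate: traits Y (n, m), covariates X (n, 2), genotype G (n,)."""
    rng = np.random.default_rng(seed)
    X1 = rng.normal(size=n)                  # standard normal covariate
    X2 = rng.binomial(1, 0.5, size=n)        # population indicator
    G = rng.binomial(2, p0 + p1 * X2)        # MAF depends on the population
    # Compound symmetry correlation; trait 1 has variance 2, the others 1.
    sd = np.ones(m)
    sd[0] = np.sqrt(2.0)
    R = np.full((m, m), float(rho))
    np.fill_diagonal(R, 1.0)
    Sigma = R * np.outer(sd, sd)
    # Traits 1, 3, ... use covariate coefficient 0.5; traits 2, 4, ... use 1.
    coef = np.where(np.arange(m) % 2 == 0, 0.5, 1.0)
    mean = 1.0 + np.outer(X1 + X2, coef) + np.outer(G, gamma)
    Y = mean + rng.multivariate_normal(np.zeros(m), Sigma, size=n)
    return Y, np.column_stack([X1, X2]), G

# One replicate with m = 4 traits and a genetic effect on the first trait only.
Y, X, G = simulate_traits(n=1000, m=4, rho=0.5, gamma=[0.3, 0.0, 0.0, 0.0], seed=1)
print(Y.shape, X.shape, G.mean() / 2)        # last value is the pooled sample MAF
```

Setting all entries of `gamma` to zero gives null replicates for estimating type I error; nonzero entries give replicates for estimating power.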

Table 1

Type I error of multitrait tests (m = 2, p0 = 0.1) divided by the nominal significance level α. The MAFs of the SNP are 0.1 and 0.2 in the two populations, respectively. Q is the m-DF omnibus Wald test; T and T′ are the 1-DF Wald tests assuming a common or common scaled effect. (Q_s, T_s, T_s′) are the corresponding GEE-based m-DF omnibus test and 1-DF tests assuming a common effect or common scaled effect.

α	ρ = 0			ρ = 0.5
α	10⁻⁵	10⁻⁴	10⁻³	10⁻⁵	10⁻⁴	10⁻³
Q_s	0.69	0.79	0.89	0.67	0.79	0.89
T_s	0.74	0.85	0.93	0.71	0.83	0.92
T_s′	0.74	0.85	0.89	0.71	0.83	0.92
Q	1.04	1.00	1.00	1.03	1.01	1.00
T	0.98	0.99	1.01	0.97	0.99	1.00
T′	0.96	0.98	1.00	0.96	0.98	0.99

Table 2

Type I error divided by the nominal significance level α for multitrait tests (m = 8, p0 = 0.1).

α	ρ = 0			ρ = 0.5
α	10⁻⁵	10⁻⁴	10⁻³	10⁻⁵	10⁻⁴	10⁻³
Q_s	0.43	0.62	0.75	0.44	0.60	0.75
T_s	0.74	0.84	0.93	0.77	0.85	0.93
T_s′	0.74	0.84	0.93	0.78	0.85	0.93
Q	0.94	0.99	1.00	0.94	1.00	1.00
T	1.03	1.03	1.02	1.05	1.04	1.03
T′	1.03	1.03	1.03	1.03	0.99	0.99

Tables 3 and 4 summarize the power for m = 2 and m = 8, respectively. T is the most powerful when γ are close to each other, and T′ is the most powerful when γ/σ are close to each other. In general, the proposed MLM-based Wald tests perform better than the corresponding GEE-based score tests, especially when testing a large number of traits. This agrees with the general principle that the Wald test is typically more powerful than the GEE-based test.

Table 3

Power of multitrait tests for m = 2 continuous traits (Y1, Y2) under significance level α = 10⁻⁴. The MAFs of the SNP are 0.1 and 0.2 in the two populations, respectively. Q is the m-DF omnibus Wald test; T and T′ are the 1-DF Wald tests assuming a common or common scaled effect. (Q_s, T_s, T_s′) are the corresponding GEE-based m-DF omnibus test and 1-DF tests assuming a common effect or common scaled effect. σ_i is the standard error of Y_i and γ_i is the SNP coefficient, i = 1, 2. The highest powered tests are bold-faced.

(γ₁, γ₂)	(γ₁/σ₁, γ₂/σ₂)	Q	T	T′	Q_s	T_s	T_s′
ρ = 0.5

(0.3,0)	(0.21,0)	0.375	0.001	0.024	0.334	0.001	0.019
(0.3,0.1)	(0.21,0.1)	0.206	0.047	0.146	0.177	0.039	0.126
(0.25,0.18)	(0.18,0.18)	0.180	0.221	0.258	0.154	0.194	0.233
(0.3,0.25)	(0.21,0.25)	0.523	0.617	0.619	0.476	0.573	0.582
(0.2,0.2)	(0.14,0.2)	0.179	0.257	0.215	0.154	0.23	0.193
(0.2,0.25)	(0.14,0.25)	0.410	0.501	0.369	0.367	0.466	0.333
(0.25,0.25)	(0.18,0.25)	0.449	0.560	0.492	0.403	0.521	0.455
(0,0.25)	(0,0.25)	0.638	0.278	0.052	0.59	0.247	0.040
(0,0.3)	(0,0.3)	0.893	0.525	0.121	0.865	0.477	0.093
(0.1,0.25)	(0.07,0.25)	0.465	0.485	0.372	0.418	0.448	0.330
(0.1,0.3)	(0.07,0.3)	0.744	0.726	0.590	0.700	0.688	0.534
(0.2,0.3)	(0.14,0.3)	0.845	0.891	0.842	0.810	0.870	0.810

ρ = 0

(0.3,0)	(0.21,0)	0.206	0.026	0.063	0.178	0.020	0.051
(0.3,0.1)	(0.21,0.1)	0.316	0.249	0.337	0.278	0.215	0.304
(0.25,0.18)	(0.18,0.18)	0.419	0.510	0.530	0.376	0.471	0.494
(0.3,0.25)	(0.21,0.25)	0.830	0.891	0.892	0.796	0.868	0.870
(0.2,0.2)	(0.14,0.2)	0.375	0.486	0.462	0.333	0.449	0.427
(0.2,0.25)	(0.14,0.25)	0.631	0.727	0.677	0.584	0.692	0.636
(0.25,0.25)	(0.18,0.25)	0.734	0.820	0.801	0.690	0.792	0.771
(0,0.25)	(0,0.25)	0.405	0.249	0.134	0.36	0.217	0.107
(0,0.3)	(0,0.3)	0.701	0.485	0.29	0.657	0.437	0.235
(0.1,0.25)	(0.07,0.25)	0.451	0.385	0.165	0.406	0.356	0.140
(0.1,0.3)	(0.07,0.3)	0.769	0.639	0.301	0.728	0.605	0.257
(0.2,0.3)	(0.14,0.3)	0.700	0.743	0.545	0.655	0.713	0.500

Table 4

Power of multitrait tests for m = 8 continuous traits under significance level α = 10⁻⁴. The MAFs of the SNP are 0.1 and 0.2 in the two populations, respectively. Q is the m-DF omnibus Wald test; T and T′ are the 1-DF Wald tests assuming a common or common scaled effect. (Q_s, T_s, T_s′) are the corresponding GEE-based m-DF omnibus test and 1-DF tests assuming a common effect or common scaled effect. The highest powered tests are bold-faced.

(γ₁, …, γ₈)	Q	T	T′	Q_s	T_s	T_s′
ρ = 0.5

γ₁ = 0.3, γ_{i>1} = 0	0.303	0.001	0	0.229	0	0
(0.3, 0.2, 0.1, 0.05, 0, …, 0)	0.696	0	0.008	0.599	0	0.005
γ₁ = 0.2, γ_{i>1} = 0.15	0.045	0.201	0.220	0.030	0.169	0.195
γ_i = 0.15	0.048	0.237	0.193	0.032	0.204	0.170

ρ = 0

γ₁ = 0.3, γ_{i>1} = 0	0.063	0.001	0.004	0.043	0.001	0.002
(0.3, 0.2, 0.1, 0.05, 0, …, 0)	0.467	0.156	0.224	0.372	0.102	0.152
γ₁ = 0.2, γ_{i>1} = 0.15	0.934	0.996	0.997	0.887	0.992	0.993
γ_i = 0.15	0.912	0.995	0.994	0.855	0.989	0.988

In practice, the statistic ((n − p − 1)/n)Q is commonly referred to an m-DF chi-square distribution to compute the p values of multitrait association tests, which can lead to significantly inflated type I errors at stringent genome-wide significance levels. Figure 1 shows the ratio of the actual significance levels of the Wald test p values computed using the chi-square distribution and the F-distribution, respectively. We can see that the type I error based on the chi-square distribution is inflated, more so for a larger number of traits, a smaller significance level, and a smaller sample size. For example, when testing m = 8 traits with p = 2 covariates and n = 500 samples at the genome-wide significance level 5 × 10⁻⁸, the actual significance level of the chi-square distribution p value is 3.42 × 5 × 10⁻⁸ = 1.7 × 10⁻⁷. Using the chi-square distribution to compute p values leads to very small inflation only when the sample size is large, such as in a meta-analysis of multiple GWAS. For typical GWAS with small-to-medium sample sizes, we recommend using the appropriate F-distribution to compute significance p values to reduce false positive findings.
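The inflation factor quoted above can be checked with a short Python calculation. The sketch assumes the chi-square rule rejects when ((n − p − 1)/n)Q exceeds the upper-α quantile of the m-DF chi-square distribution, converts that cutoff to the F scale using ((n − p − 1 − m)/(mn))Q, and evaluates the exact F tail probability there. The function name `chisq_inflation` is hypothetical.

```python
from scipy import stats

def chisq_inflation(alpha, n, m, p):
    """Ratio of the actual to the nominal level when the m-DF chi-square
    reference is used in place of the exact F-distribution."""
    df2 = n - p - 1 - m
    chi2_crit = stats.chi2.isf(alpha, m)   # cutoff for ((n - p - 1) / n) * Q
    q_crit = n * chi2_crit / (n - p - 1)   # the same cutoff on the Q scale
    f_crit = df2 / (m * n) * q_crit        # and on the exact F scale
    actual = stats.f.sf(f_crit, m, df2)    # true null rejection probability
    return actual / alpha

ratio_500 = chisq_inflation(5e-8, n=500, m=8, p=2)
ratio_5000 = chisq_inflation(5e-8, n=5000, m=8, p=2)
print(round(ratio_500, 2))    # about 3.42, matching the example in the text
print(round(ratio_5000, 2))   # much closer to 1 with a larger sample
```

Because the calculation is deterministic, it can also be used to tabulate the inflation over a grid of n, m, and α when deciding whether the chi-square approximation is acceptable for a given study size.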

Figure 1

Ratio of the actual significance levels of m-DF chi-square test versus the F-test with (m, n − p − 1 − m) DFs. The x-axis is the type I error rate. (a) Shows the results for testing m = 4 traits with p = 2 covariates based on n individuals. (b) Shows the results for testing m = 8 traits with p = 2 covariates.

3.2. Application to ARIC GWAS of Glycemic Traits

The Atherosclerosis Risk in Communities (ARIC) study [29] is a population-based, multicenter prospective investigation of cardiovascular disease. Men and women aged 45–64 years at baseline were recruited from four US communities: Forsyth County, North Carolina; Jackson, Mississippi; suburban areas of Minneapolis, Minnesota; and Washington County, Maryland. A total of 15,792 individuals participated in the baseline examination during the period of 1987–1989. The vast majority of ARIC participants are of European (73%) or African (26%) ancestry. We conducted two analyses of diabetes-related glycemic traits in the ARIC GWAS data, which have been imputed to around 2.5 million HapMap SNPs using MaCH [30]. We included in the analysis those common SNPs with MAF ≥ 0.05 and imputation score R² ≥ 0.3.

As a proof of concept, we first analyzed four fasting glucose levels in 5947 nondiabetic ARIC white participants measured at four visits (visits 1–4) conducted approximately three years apart. The average correlation of glucose levels is 0.55. We applied an additive genetic model with imputed dosage as a covariate and adjusted for age, gender, and study center in all tests. By analyzing four fasting glucose measures jointly, T′ identified 104 significant SNPs, T identified 103, T′ identified 102, T identified 101, and Q and Q identified the same set of 95 SNPs at the genome-wide significance level 5 × 10⁻⁸. Analyzing each glucose measure separately identified 34, 84, 37, and 64 genome-wide significant SNPs at visits 1, 2, 3, and 4, respectively. All the SNPs identified by the different methods are genome-wide significant in the MAGIC Consortium, a meta-analysis of 21 fasting glucose GWAS which together included 46,186 nondiabetic participants [31]. Compared to T′, the two additional SNPs identified by T′, rs780093 and rs780094, had p values of 4.8 × 10⁻⁸ and 4.8 × 10⁻⁸ using T′. Their respective MAGIC meta-analysis p values were 2.9 × 10⁻¹³ and 2.5 × 10⁻¹². Compared to T, the two additional SNPs identified by T, rs1260326 and rs11688384, had p values of 4.7 × 10⁻⁸ and 4.0 × 10⁻⁸ using T. Their respective MAGIC meta-analysis p values were 4.3 × 10⁻¹³ and 4.1 × 10⁻¹⁰.

Second, we jointly analyzed three distinct diabetes-related glycemic traits measured at visit 4 in 5068 nondiabetic white ARIC participants: fasting glucose, fasting insulin, and glucose level 2 hours after an oral glucose challenge. We applied an additive genetic model with imputed dosage as a covariate and adjusted for age, gender, and study center. To account for the skewed distribution of fasting insulin, we adopted the Box-Cox transformation with an estimated power of 0.35 [32]. The three traits had an average pairwise correlation of 0.31. When analyzing fasting insulin or 2-hour glucose levels individually, we did not identify any significant SNPs at the genome-wide significance level (5 × 10⁻⁸). For joint testing of all three traits, T, T′, T, T′ identified none, Q identified 139, and Q identified 140 genome-wide significant SNPs, among which 61 and 61 SNPs were reported as genome-wide significant in the MAGIC meta-analyses of fasting glucose, fasting insulin, or 2-hour glucose levels [31, 33]. Compared to Q, Q identified two additional genome-wide significant SNPs, rs4665987 and rs853780, with p values of 4.9 × 10⁻⁸ and 4.9 × 10⁻⁸, respectively. The MAGIC meta-analysis of fasting glucose reported a p value of 2.1 × 10⁻³⁸ for rs853780. Its MAGIC meta-analysis p values for fasting insulin and 2-hour glucose are 0.054 and 0.477, respectively. For rs4665987 (near GCKR on chromosome 2:27755825), the MAGIC meta-analysis p values for the fasting glucose, fasting insulin, and 2-hour glucose levels are 4.6 × 10⁻⁶, 0.04, and 9.3 × 10⁻⁵, respectively. This SNP was genome-wide significantly associated with human serum metabolite levels in a GWAS of 8330 Finnish individuals [34] and several other GWAS [35-38]. Compared to Q, Q reported one additional genome-wide significant SNP, rs17540154, with a p value of 4.3 × 10⁻⁸. The MAGIC meta-analysis of fasting glucose reported a p value of 8.7 × 10⁻³⁸ for rs17540154. Its MAGIC meta-analysis p values for fasting insulin and 2-hour glucose are 0.101 and 0.720, respectively.

Among the significant SNPs identified by joint testing, there were 79 novel genome-wide significant SNPs that have not been reported before as significantly associated with diabetes-related fasting glucose and insulin levels. Among them, one SNP, rs4665987, is located on chromosome 2:27755825 and the 78 other SNPs are clustered on chromosome 15 between positions 62132921 and 62396389. Interestingly, six of them (listed in Table 5) were genome-wide significant in the MAGIC meta-analysis of proinsulin level [39]. The list of all identified SNPs with detailed analysis results is available in the supplementary materials.

Table 5

Six novel SNPs identified in the ARIC joint association test, which were not significant in the corresponding MAGIC consortium meta-analyses of fasting glucose (FG), fasting insulin (FI), and 2-hour fasting glucose (2hFG) but were significant in the MAGIC meta-analysis of fasting proinsulin (FP). We listed the ARIC joint test's p values (the proposed MLM Wald test and the GEE chi-square test) and the corresponding MAGIC consortium meta-analyses' p values for FG, FI, 2hFG, and FP.

SNP	Chr	bp	ARIC joint test p value		MAGIC meta-analysis p value			
			Wald	GEE	FG	FI	2hFG	FP
rs4502156	15	62383155	5.4E-09	7.9E-09	8.4E-08	6.7E-01	8.2E-05	3.8E-11
rs7163757	15	62391608	1.4E-08	1.8E-08	4.2E-07	5.7E-01	1.9E-05	3.9E-11
rs8037894	15	62394264	1.2E-08	1.6E-08	4.1E-07	4.8E-01	3.5E-05	8.7E-11
rs6494307	15	62394690	1.7E-08	2.1E-08	3.3E-07	4.9E-01	2.7E-05	4.1E-11
rs7167878	15	62396189	1.7E-08	2.1E-08	4.6E-07	4.5E-01	2.4E-05	4.1E-11
rs7172432	15	62396389	1.7E-08	2.2E-08	6.5E-07	3.3E-01	1.9E-05	4.3E-11

4. Discussion

So far, the typical effect sizes of most identified genetic variants for many diseases or traits are very small, and these variants explain only a very small proportion of the overall disease heritability or trait variation. It is commonly accepted that there are many more common variants with relatively small-to-medium effect sizes or rare variants with larger effect sizes yet to be discovered. To identify these additional variants, very large sample sizes will be needed. One approach is to form a consortium to facilitate meta-analysis of many studies, but development of these genetics consortia is generally time-consuming and logistically challenging. Meanwhile, the recently studied joint association test of multiple correlated traits offers an alternative approach to boost power: it can often dramatically improve the association test power by "enlarging the sample size" through the incorporation of many correlated traits that are typically collected in most large genetic studies and may share genetic determinants. Another strategy to further improve the detection power is to use a variant-set association test, which has been proven to be very useful (see, e.g., [16, 17, 40–42]). It is worthwhile to generalize the proposed Wald tests to develop more accurate and powerful association tests of variant sets across multiple traits.

Here we have focused on testing a relatively small number of correlated quantitative traits, which has enabled us to develop accurate and powerful association tests without the asymptotic approximations adopted in the more general, though conservative, GEE approach, which can be applied to any mix of quantitative and discrete traits. It will be interesting to extend the proposed methods to phenome-wide association studies (PheWAS) with a large collection of phenotypes [43-45] and to develop more powerful joint association tests of quantitative and discrete traits. In the previous discussions, we have assumed the same set of covariates across all traits. With differing covariates, we provide technical details regarding model estimation and extensive simulation studies to confirm that the proposed methods accurately control type I errors and perform favorably compared to existing methods (see the supplementary materials for complete results).

In summary, we recommend the proposed multivariate linear regression-based test as a complementary approach to enhancing the power of analyzing multiple quantitative traits in unrelated individuals. Our numerical studies suggest that the omnibus Wald test generally has robust and good performance. The 1-DF Wald tests can perform well due to reduced DFs, but they can be sensitive to the underlying assumptions. It will be worthwhile to develop adaptive and powerful tests. We have implemented the proposed methods in an R package available at http://www.github.com/baolinwu/MTAR. We provide sample R code to install and use the package in the supplementary materials. The developed algorithms are very efficient and scale to genome-wide association tests.

40 in total

1. A gene-based test of association using canonical correlation analysis.

Authors: Clara S Tang; Manuel A R Ferreira
Journal: Bioinformatics Date: 2012-01-31 Impact factor: 6.937

2. Powerful SNP-set analysis for case-control genome-wide association studies.

Authors: Michael C Wu; Peter Kraft; Michael P Epstein; Deanne M Taylor; Stephen J Chanock; David J Hunter; Xihong Lin
Journal: Am J Hum Genet Date: 2010-06-11 Impact factor: 11.025

3. A multivariate test of association.

Authors: Manuel A R Ferreira; Shaun M Purcell
Journal: Bioinformatics Date: 2008-11-19 Impact factor: 6.937

4. Generalized estimating equations for genome-wide association studies using longitudinal phenotype data.

Authors: Colleen M Sitlani; Kenneth M Rice; Thomas Lumley; Barbara McKnight; L Adrienne Cupples; Christy L Avery; Raymond Noordam; Bruno H C Stricker; Eric A Whitsel; Bruce M Psaty
Journal: Stat Med Date: 2014-10-09 Impact factor: 2.373

Review 5. Pleiotropy in complex traits: challenges and strategies.

Authors: Nadia Solovieff; Chris Cotsapas; Phil H Lee; Shaun M Purcell; Jordan W Smoller
Journal: Nat Rev Genet Date: 2013-06-11 Impact factor: 53.242

6. The Atherosclerosis Risk in Communities (ARIC) Study: design and objectives. The ARIC investigators.

Authors:
Journal: Am J Epidemiol Date: 1989-04 Impact factor: 4.897

7. A general framework for association tests with multivariate traits in large-scale genomics studies.

Authors: Qianchuan He; Christy L Avery; Dan-Yu Lin
Journal: Genet Epidemiol Date: 2013-11-05 Impact factor: 2.135

8. A phenomics-based strategy identifies loci on APOC1, BRAP, and PLCG1 associated with metabolic syndrome phenotype domains.

Authors: Christy L Avery; Qianchuan He; Kari E North; Jose L Ambite; Eric Boerwinkle; Myriam Fornage; Lucia A Hindorff; Charles Kooperberg; James B Meigs; James S Pankow; Sarah A Pendergrass; Bruce M Psaty; Marylyn D Ritchie; Jerome I Rotter; Kent D Taylor; Lynne R Wilkens; Gerardo Heiss; Dan Yu Lin
Journal: PLoS Genet Date: 2011-10-13 Impact factor: 5.917

9. Association Tests of Multiple Phenotypes: ATeMP.

Authors: Xiaobo Guo; Yixi Li; Xiaohu Ding; Mingguang He; Xueqin Wang; Heping Zhang
Journal: PLoS One Date: 2015-10-19 Impact factor: 3.240

10. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS.

Authors: Paul F O'Reilly; Clive J Hoggart; Yotsawat Pomyen; Federico C F Calboli; Paul Elliott; Marjo-Riitta Jarvelin; Lachlan J M Coin
Journal: PLoS One Date: 2012-05-02 Impact factor: 3.240
