Literature DB >> 26860061

A method for analyzing multiple continuous phenotypes in rare variant association studies allowing for flexible correlations in variant effects.

Jianping Sun^1,2, Karim Oualkacha³, Vincenzo Forgetta^1,2,4,5, Hou-Feng Zheng^6,7, J Brent Richards^1,2,4,5,8, Antonio Ciampi¹, Celia Mt Greenwood^1,2,4,9.

Abstract

For region-based sequencing data, power to detect genetic associations can be improved through analysis of multiple related phenotypes. With this motivation, we propose a novel test to detect association simultaneously between a set of rare variants, such as those obtained by sequencing in a small genomic region, and multiple continuous phenotypes. We allow arbitrary correlations among the phenotypes and build on a linear mixed model by assuming the effects of the variants follow a multivariate normal distribution with a zero mean and a specific covariance matrix structure. In order to account for the unknown correlation parameter in the covariance matrix of the variant effects, a data-adaptive variance component test based on score-type statistics is derived. As our approach can calculate the P-value analytically, the proposed test procedure is computationally efficient. Broad simulations and an application to the UK10K project show that our proposed multivariate test is generally more powerful than univariate tests, especially when there are pleiotropic effects or highly correlated phenotypes.

Entities: Chemical

Mesh：

Year: 2016 PMID： 26860061 PMCID： PMC4989219 DOI： 10.1038/ejhg.2016.8

Source DB: PubMed Journal: Eur J Hum Genet ISSN： 1018-4813 Impact factor: 4.246

Introduction

Advances in next-generation sequencing technologies have led to identification of millions of rare variants (with minor allele frequencies (MAFs) <0.05) that can be tested in genetic association studies. There is a growing literature on methods for analysis of association for a set of rare variants: for example, the burden test,[1, 2] sequence kernel association test (SKAT),[3] optimal SKAT (SKAT-O),[4] mixed effects score test,[5] and many others.[6] Most of the existing rare-variant association tests can be thought of as univariate tests. That is, they focus on the association between a set of rare variants and a single phenotype. However, multiple measures of related traits, that is, a multivariate phenotype, are often available and are of interest. Some phenotypes such as blood pressure, which is measured by both systolic and diastolic pressure, are intrinsically multivariate. In other cases, there may be several correlated measures of a trait of interest; however, the one most directly influenced by genetic variation is unknown. For example, excess weight can be measured by body mass index, waist circumference or skin-fold thickness; by bringing in serum lipids as well, the composite phenotype will have a different emphasis. Many patterns of genetic architectures could link variants at a locus to a set of phenotypes:[7] pleiotropy describes one genetic variant influencing multiple traits;[8] endophenotypes (measures assumed to be proximal to genetic effects) can be highly multidimensional; locus heterogeneity can refer to a single trait caused by variants at different chromosomal loci,[9] or alternatively several variants at a locus may influence a disease, whereas symptoms (visible phenotypes) vary from person to person (Figure 1). Thus, multivariate methods may help in rare-variant association studies, as they may add ability to investigate the architectures discussed above, and increase power to detect disease-associated loci.[10]

Figure 1

Possible patterns of genetic architecture that could link variants at a locus to a set of continuous phenotypes. The solid line represents pleiotropy, where a single variant influences multiple phenotypes. The dashed line describes the locus heterogeneity, where different variants in the same gene or region each influence a different phenotype. The dotted line represents a situation where correlation among phenotypes arises indirectly due to the variants' effects on disease; the phenotypes could be symptoms, endophenotypes, or continuous manifestations of disease.

For associations between a single genetic variant and multiple traits, a recent review paper[11] summarizes several proposed methods. However, these single-variant-based multivariate methods usually have limited power to detect rare-variant effects. Although there are a few approaches that have been developed to test associations between multivariate traits and multiple variants in a genetic region, several of these[12, 13] employ the idea of combining the test statistics obtained from single-variant-based multivariate tests or multiple-variant-based univariate tests. Notably, Maity et al[14] recently proposed a kernel machine regression to test multiple phenotypes and a set of markers simultaneously, and the method of Maity et al showed power improvements over single phenotype tests when the correlations are strong. However, this method can be computationally inefficient and may not have optimal power to detect pleiotropic associations, as there is no explicit consideration of correlations between variant effects. Given the rapid increase in large-scale sequencing studies, there is a need for a powerful yet easily implemented statistical method to simultaneously detect associations between multiple phenotypes and rare variants in a region. In this study, we develop a novel region-based multivariate test, MURAT (Multivariate Rare-Variant Association Test), for identifying rare-variant associations when multiple correlated continuous phenotypes are observed. By assuming the variant effects to be random and correlated, we propose a linear mixed model that leads to a data-adaptive variance component test based on a score-type statistic. This choice accounts for unknown correlations among variant effects, while testing for association. We show that there are two special cases of our model depending on the assumed correlation structure, one where it reduces to SKAT and another where it becomes a multivariate fixed effects model. In addition, we show that our analytic P-value derivation only depends on a one-dimensional numerical integration, and hence that this method is computationally efficient and can be considered for analysis of genome-wide sequencing data.

Methods

Notation and model

Suppose for the ith individual, i=1,…, N, we observe K correlated continuous phenotypes, Y=(y,y,…,y), m covariates, X=(x,x,…,x), and genotype data for a set of SNPs that contains p variants G=(g,g,…,g), where g for j=1,…,p are coded as 0, 1, or 2, representing the number of minor alleles that individual i carries for the jth variant. It is noteworthy that these p variants can be both common and rare, although we focus on rare variants in this study. We further assume that (Y,X,G), i=1,…, N, are independent and identically distributed, and our goal is to test the association between G, a set of (rare) variants, and the multiple traits Assuming that the relationship between Y, X, and G can be modeled as where for the kth phenotype, k=1,…,K, α=(α,…,α), and β=(β,…,β) are regression coefficients for X and G, respectively. In addition, we also assume α is a vector of fixed effects and β~N(0,τ2W) is a vector of random effects, where τ2 is an unknown variance component and W=diag{w1,…,w} is a weight matrix such that w represents the weight for the jth variant. The weight matrix is predetermined and usually a decreasing function of observed MAFs for the corresponding variants. Finally, we suppose the model error term, ɛ=(ɛ,ɛ,…,ɛ), is multivariate normal with a mean vector of zeros and a variance covariance matrix Σ. As there are correlations among y's for k=1,…,K, we assume Σ is an arbitrary symmetric and positive definite matrix. Moreover, if we denote , X*=(I⊗X1,⋯,I⊗X), and G*=(I⊗1,⋯,I⊗), where ⊗ is the Kronecker product, then the above equation (1) can be written in matrix form as Here we assume is a Km × 1 vector denoting all fixed covariate effects, and is a Kp × 1 vector representing all random variant effects, assumed to follow a multivariate normal distribution with variance covariance matrix Σ. The assumed correlation structures in Σ and Σ are crucial to this model. To capture trait pleiotropy effects, we assume the correlation between β and β is Corr(β,β)=ρI for k≠k′ and k,k′∈{1,…,K}, where ρ is unknown. That is, we suppose there is a common correlation for the effects of the same variant on different phenotypes, Corr(β,β)=ρ for j=1,…,p. This assumption also captures the situation where the phenotypes are symptoms of a latent trait. In addition, we assume that effects of different variants are uncorrelated, so that Corr(β,β)=0 and Corr(β,β)=0 for j≠j′. Therefore, we have β~N(0,Σ), where Σ=τ2R⊗W is a Kp × Kp symmetric and positive definite matrix, , and 1 is a K-dimensional vector of ones. For example, when K=2, we have . Consequently, the variance covariance matrix of Y is The proposed model (2) is a generalization of the underlying model used for the SKAT test, where variant effects are assumed to be random, independent, and to follow a distribution with mean 0 and variance τ2. In particular, model (2) reduces to SKAT when K=1 (one observed phenotype). Although a straightforward multivariate generalization of SKAT could be achieved by implementing SKAT models for each (y, G) pair, k=1,...,K, and then combining these K models allowing correlations among {y}'s, our proposed MURAT model is richer, because it allows the effects of the same variant on different phenotypes to be correlated as shown in equation (3). On the contrary, if we simply generalized SKAT as described above, the corresponding covariance matrix would be Var(Y)=τ2G*[I⊗W]G*+Σ, which is a special case of equation (3) when ρ=0, under which, in fact model (2) is also equivalent to Maity's test with a linear kernel. Thus, through the correlation parameter ρ, MURAT has the ability to explore pleiotropic effects.

Proposed test statistics

In model (2), the null hypothesis of no association between G* and Y is H0:β=0. This is equivalent to H0:τ2=0. We propose to develop a score test, because it does not require an estimate of τ2. For a fixed ρ, the variance component score statistic for τ2 is where and are estimated under the null by regressing Y on only the covariates X*. The subscript ρ indicates that this score statistic is a function of ρ. As we assume that the K phenotypes are correlated, and can be efficiently estimated using the generalized least-squares method, achieved by using the function gls() in the R package nlme. For fixed ρ, it can be shown that under the null hypothesis, S follows a mixture of χ2-distributions[15, 16, 17] when R is a symmetric matrix (Supplementary S.1). Several methods exist to approximate this mixture χ2-distribution. For example, both the Davies method[18] based on the inverse characteristic function and the Liu method[19] based on moment matching have been shown to work well. However, the common correlation ρ is usually unknown. To construct a test statistic for unknown ρ, as suggested by Lee et al,[4] we apply a data adaptive approach to select an optimal ρ, to maximize test power. We first calculate S and its P-value for a grid of values of ρ over [0,1] and then use the smallest P-value as an overall test statistic for testing H0:τ2=0. Specifically, for a grid 0=ρ1<ρ2<⋯<ρ=1, we obtain and the corresponding P-value, , for v=1,…,b. Then, the overall test statistic is defined to be We have shown that when ρ=0, S is the same as the test statistic used in Maity's method with linear kernel or the one obtained from multivariate SKAT by simply assuming correlated model errors. In addition, we can also show that when ρ=1, the proposed S is equivalent to the square of a score statistic obtained from a multivariate fixed effect model, where we assume the same effect of each variant across the multiple traits (Supplementary S.2). Hence, by searching 0≤ρ≤1, MURAT covers all score tests between Maity's method (or multivariate SKAT) and multivariate multiple regression.

Null distribution of the test statistic T

For an observed T, say Tobs, let q(ρ) be the (1−Tobs)th percentile of the null distribution of S, such that p>Tobs is equivalent to S We show in the Supplementary S.3 that we can approximate S as a weighted sum of two asymptotically independent random variables, say κ and η, which do not depend on ρ, and both follow a mixture of χ2-distributions. Consequently, p can be calculated by using one-dimensional numerical integration, where τ(ρ) is a constant, F(⋅) is the distribution function of κ, and f(⋅) is the density function of η. To improve tail probability estimates, we used the moment matching method with mean, variance, and kurtosis instead of the first three moments,[4] to approximate F(⋅). Furthermore, to improve accuracy in numerical integration, we used a convolution of independent random variables to calculate f(⋅) (see Supplementary S.3).

Results

Simulation studies

We conducted a variety of simulation studies to evaluate the performance of MURAT. In the simulations, we considered K (K=2 or 3) correlated continuous traits and compared type I error and power between MURAT, Maity's test, and SKAT, which tests the multiple traits separately. The genotypes we used for the simulations were taken from sequencing data for a sample of 1000 individuals from the UK10K project (http://www.uk10k.org/). A set of 10 rare variants were randomly selected from a single gene, BET1L, in order to preserve the linkage disequilibrium pattern. The MAFs for these 10 variants range from 0.001 to 0.043 in the sample. For each individual, we generated K correlated continuous phenotypes following the model (1), where two covariates were generated from Normal(0,1) and Bernoulli(0.5), respectively, both with coefficient 0.5; a common intercept of 0.1 was also included and =(ɛ,⋯,ɛ) was generated from a multivariate normal distribution, N(0,Σ), with . In simulations, we used four different levels of the residual correlation, ρ=0, 0.1, 0.4, and 0.7. Different values for the genetic effects, β, where k=1,…,K and j=1,…,10, were considered, in order to evaluate type I error and power. To avoid potential zero denominators in equation (6), we searched over a grid of values for ρ from 0 to 0.99, with step size 0.1 in MURAT. For simplicity, we used the identity weight matrix, that is, W=I, through all the simulation studies. A total of 10 000 simulated data sets were generated for each scenario. Finally, to eliminate the effects caused by different phenotype scales, we first standardized all phenotypes within each simulation before implementing our multivariate test. Unless otherwise specified, to correct for multiple testing across phenotypes, all SKAT results are based on adjusted P-values, defined as K times the minimum of the univariate-based P-values obtained via SKAT.

Type I error rates

To evaluate type I error, we generated phenotypes with β=0 for k=1,…,K where K=2 or 3, and j=1,…,10. We assessed type I error first at moderate significance levels of α=0.05, α=0.01, and α=0.001, and the results are summarized in Table 1. All three methods control type I error well under these moderate significance levels, except that SKAT tends to have slightly inflated type I error when α=0.001 and ρ>0.

Table 1

Type I error comparison among MURAT, SKAT, and Maity's method at moderate significance levels, α=0.05, 0.01, and 0.001

	K=2
	ρ_e=0			ρ_e=0.1
α	MURAT	Maity	SKAT	MURAT	Maity	SKAT
5 × 10⁻²	5.17 × 10⁻²	5.17 × 10⁻²	4.76 × 10⁻²	5.12 × 10⁻²	5.10 × 10⁻²	4.81 × 10⁻²
1 × 10⁻²	0.90 × 10⁻²	0.94 × 10⁻²	0.94 × 10⁻²	0.97 × 10⁻²	0.99 × 10⁻²	0.95 × 10⁻²
1 × 10⁻³	1.01 × 10⁻³	1.30 × 10⁻³	0.90 × 10⁻³	1.10 × 10⁻³	1.10 × 10⁻³	1.20 × 10⁻³
	ρ_e=0.4			ρ_e=0.7
5 × 10⁻²	4.99 × 10⁻²	4.80 × 10⁻²	4.78 × 10⁻²	5.01 × 10⁻²	4.78 × 10⁻²	4.41 × 10⁻²
1 × 10⁻²	1.02 × 10⁻²	0.91 × 10⁻²	0.98 × 10⁻²	0.98 × 10⁻²	0.93 × 10⁻²	0.97 × 10⁻²
1 × 10⁻³	1.02 × 10⁻³	0.70 × 10⁻³	1.40 × 10⁻³	1.10 × 10⁻³	0.80 × 10⁻³	1.30 × 10⁻³

Abbreviations: MURAT, Multivariate Rare-Variant Association Test; SKAT, sequence kernel association test.

The simulations are based on 10 000 simulated data sets. The results for SKAT are based on adjusted P-values, which are defined as K times the minimum of univariate-based P-values obtained via SKAT.

However, in genome-wide studies, such moderate significance levels are usually of secondary interest. As there are around 20,000 genes in the exome, a Bonferroni-adjusted threshold for testing would be 0.05/20 000=2.5 × 10−6. Therefore, we also examined the type I error of MURAT and SKAT for smaller α levels of 10−4, 10−5, and 2.5 × 10−6 by performing 107 simulations. Owing to the computational burden, these extra simulations only examined ρ=0 and 0.4. In addition, as Maity's test is much slower than the other two methods (Supplementary S.5), we excluded it from this comparison. Table 2 shows that both MURAT and SKAT have inflated type I error when α is very small, and that SKAT performs better than MURAT in this situation. However, the inflation for MURAT is not severe. MURAT's type I error inflation could be due to the approximations required to calculate the significance level analytically, such as matching the first three moments, or the numerical integration in equation (6). Permutation analysis can be used at key regions to obtain more accurate P-values if necessary. To perform permutation, one needs to preserve the dependence structure between the phenotypes and between the genotypes as well. Specifically, for one individual, all his/her phenotypes should be viewed as a single phenotype group or block, and similarly all genotypes in the region being studied can be considered as a single genotype block. Permuting the phenotype blocks with respect to the genotype blocks across individuals leads to data sets that retain the necessary within-block correlations, yet are generated under the null hypothesis of no association between the blocks.

Table 2

Type I error comparison between MURAT and SKAT at stringent significance levels, α=1 × 10−4, 1 × 10−5, and 2.5 × 10−6

	ρ_e=0		ρ_e=0.4
α	MURAT	SKAT	MURAT	SKAT
K=2
1 × 10⁻⁴	1.68 × 10⁻⁴	1.27 × 10⁻⁴	1.48 × 10⁻⁴	1.24 × 10⁻⁴
1 × 10⁻⁵	2.78 × 10⁻⁵	1.16 × 10⁻⁵	1.29 × 10⁻⁵	1.54 × 10⁻⁵
2.5 × 10⁻⁶	7.60 × 10⁻⁶	3.30 × 10⁻⁶	2.80 × 10⁻⁶	3.30 × 10⁻⁶

K=3
1 × 10⁻⁴	1.64 × 10⁻⁴	1.29 × 10⁻⁴	1.35 × 10⁻⁴	1.23 × 10⁻⁴
1 × 10⁻⁵	2.15 × 10⁻⁵	1.25 × 10⁻⁵	1.82 × 10⁻⁵	1.35 × 10⁻⁵
2.5 × 10⁻⁶	9.80 × 10⁻⁶	4.10 × 10⁻⁶	10.20 × 10⁻⁶	3.10 × 10⁻⁶

Abbreviations: MURAT, Multivariate Rare-Variant Association Test; SKAT, sequence kernel association test.

The simulations are based on 107 simulated data sets. The results for SKAT are based on adjusted P-values, which are defined as K times the minimum of univariate-based P-values obtained via SKAT.

Power

To assess power of the proposed multivariate test, we randomly selected one, two, or five rare variants, to be causal. That is, let {v1,v2,…,v} be u selected causal variants, where u=1, 2, or 5. We set variant effects β1=⋯=β=0 for j∉̸{v1,v2,…,v} in the simulations, where K=2 or 3. The nonzero variant effects, βs were generated from N(0,Σ) with Σ=τ2R⊗I and for k=1,…,K and j′∈{v1,v2,…,v}. In the simulations, different values for τ2 were chosen under different scenarios so that the resulting power was high enough to facilitate comparisons. We evaluated power at significance level α=0.05 for each scenario.

Scenario 1

We first considered the scenario where u causal variants were associated with both K=2 phenotypes. In this setting, we selected τ2=3 and considered six different correlations among the causal variant effects: ρ=0, 0.1, 0.3, 0.5, 0.7, and 0.9, in order to explore pleiotropic effects on power. The top and middle panels in Figure 2 compare MURAT with SKAT for u=2 causal variants associated with both phenotypes and show that MURAT is overall more powerful than SKAT under all possible combinations of ρ and ρ. For small residual correlation ρ, the pleiotropic correlation ρ has little effect on MURAT's power; however, for large ρ, the power of MURAT decreases as ρ increases. This is not surprising, because when ρ and ρ are both large, the two phenotypes are highly correlated. Hence, in situations of very strong correlation, a multivariate test may gain little power relative to analysis of only a single trait. For all values of ρ, the power of SKAT decreases as ρ increases. Similar conclusions can also be drawn when there are only one or five causal variants associated with both phenotypes (Supplementary Figures S1 and S2).

Figure 2

Power comparisons among MURAT, SKAT, and Maity's method under power simulation scenario 1. The top and middle panels show empirical powers of MURAT, which tests associations with two phenotypes simultaneously, versus SKAT, which tests associations with the two phenotypes separately, when two causal variants are associated with both traits. The bottom panel shows empirical powers of MURAT, Maity's test with a linear kernel, and SKAT, when five causal variants are associated with multiple traits. The significance level is 0.05 for all tests and the univariate tests, SKAT, are corrected for multiple comparisons.

We also compared the power of MURAT and SKAT with that of Maity's test with a linear kernel when K=2 and 3. Owing to the computational time associated with Maity's method, we only evaluated its performance with u=5 causal variants, ρ=0.4 and ρ=0, 0.3 or 0.7. The bottom panel of Figure 2 shows that for K=2, when ρ=0, MURAT and Maity's test have similar power, consistent with our expectation, but when K=3 and ρ=0, MURAT seems a little less powerful than Maity's test. As MURAT searches grids of ρ, there could exist slight power differences between MURAT and Maity's method even when the true ρ is zero. It is notable that when ρ is large, MURAT tends to be more powerful than Maity's test for both values of K. For example, when K=2, MURAT is 10% (for ρ=0.3) or 40% (for ρ=0.7) more powerful than Maity's test. Hence, MURAT can exploit pleiotropic effects to gain additional power over a multivariate method that only incorporates correlations among phenotypes. In addition, in this scenario, MURAT gains power as K increases.

Scenario 2

We further conducted a set of simulations in which u causal variants were only associated with the first of two traits. That is, we assumed β2=0 for j=1,…,10 and generated β1, j′∈{v1,v2,…,v}, independent and identically distributed from N(0,τ2) with τ2=5. Figure 3 shows that there is usually some power loss for MURAT compared with SKAT when causal variants are only associated with trait 1. However, the power loss diminishes as ρ increases and there is a power gain when ρ is 0.7. Hence, when traits are strongly correlated, there is a potential benefit of a multivariate test even if only one trait is truly under genetic influence.

Figure 3

Empirical powers of MURAT versus SKAT for trait 1 at significance level of 0.05. Causal variants are associated with only the first trait. The results for SKAT on trait 1 are not adjusted for multiple testing.

Using scenario 2, we also compared MURAT with Maity's test with a linear kernel when there were five causal variants and ρ=0.4. The empirical power for Maity's test (0.64) was higher than for MURAT (0.57) but lower than the power for testing only the first associated trait with SKAT (0.71).

Bone mineral density and exome-sequencing variants

We have analyzed bone mineral density (BMD) phenotypes and tested for genetic associations with sequence-identified variants in data from the TwinsUK participants (www.twinsuk.ac.uk) in the UK10K consortium. The UK10K project (http://www.uk10k.org/) is a whole-genome and exome-sequencing study focused on the genetic architecture of complex diseases. This analysis is intended to demonstrate the performance of our methods and is not definitive; investigations of genetic associations with BMD in larger sample sizes have been undertaken by the GEFOS consortium.[20] In our analysis, we selected measures of BMD at two positions: lumbar spine (LS) and femoral neck (FN), in 1005 individuals. These are strongly correlated traits with a correlation of 0.685. We performed gene-based analysis on variants identified by exome sequencing on all autosomes, therefore testing association with BMD at 19 123 gene-based sets. All analyses included two covariates, weight and age, as these strongly affect BMD. As in our simulation studies, we separately standardized the 1005 measurements of LS and FN before applying MURAT. We compared the strategy of simultaneously testing LS and FN for association using MURAT, with separate tests of LS and FN using SKAT. As the analysis included both common and rare variants, we applied SKAT with the weighted linear kernel and set the MURAT weight matrix as W=diag{w1,...,w}, where (w)1/2=β(MAF,1,25) is the weight for the jth variant in a particular gene as suggested by Wu et al.[3] The Bonferroni-corrected significance threshold for 19 123 genes is around 0.05/19 123=2.61 × 10−6. As this correction is stringent and the sample size is small, in Table 3 we list all genes that have P-values <1 × 10−5, from either MURAT or SKAT (SKAT P-values here are not adjusted for multiple comparisons). For genes SLC17A5 and C20orf187, the MURAT P-values are smaller than those obtained from SKAT and the difference is notable at C20orf187. In addition, we applied Maity's test to C20orf187 and obtained a P-value of 8 × 10−6, supporting the benefit of a multivariate test and the potential benefits of MURAT. Gene SNX17 showed association with only one of the bone phenotypes; the result with MURAT was less significant than the better of the two univariate tests; similar results were seen in our simulations. However, MURAT was only slightly less significant than the better univariate result.

Table 3

Top genes selected from association testing of BMD phenotypes with exome-sequencing variants

			P-values
			SKAT		MURAT
Chr	Gene	No. of SNPs	FN	LS	FN and LS	ρ_v in MURAT
2	SNX17	34	1.02 × 10⁻¹	2.46 × 10⁻⁹	3.05 × 10⁻⁸	0.5
6	SLC17A5	42	2.39 × 10⁻⁶	4.43 × 10⁻²	6.12 × 10⁻⁷	0.6
20	C20orf187	3	1.12 × 10⁻²	1.14 × 10⁻¹	7.73 × 10⁻⁶	0

Abbreviations: BMD, bone mineral density; FN, femoral neck; LS, lumbar spine; MURAT, Multivariate Rare-Variant Association Test; SKAT, sequence kernel association test; SNP, single-nucleotide polymorphism.

Results with P-values ≤1 × 10−5 are shown. SKAT tests were not corrected for multiple comparisons. The last column reports the value of the correlation parameter ρv that gave the minimum P-value in the MURAT test.

To adjust single phenotype results for multiple testing, MURAT region-based P-values are compared with adjusted SKAT results in the right panel of Figure 4, clearly showing that many MURAT P-values tend to be smaller than the adjusted SKAT P-values. In addition, we also show the corresponding Q–Q plots for MURAT and adjusted SKAT, respectively, in the left panel of Figure 4. The MURAT Q–Q plot indicates a small amount of inflation from the null distribution. In contrast, SKAT suffers from conservativeness for large P-values (ie, earlier departure from the null distribution) and later inflation that exceeds MURAT. Hence, MURAT has the potential to improve test power over single phenotype tests by reducing adjustments for multiple testing in the tail of the distribution.

Figure 4

The left panel shows the Q–Q plot for MURAT P-values and adjusted SKAT P-values on 19 123 genes in UK10K data analysis. The slopes for the MURAT Q–Q plot and adjusted SKAT Q–Q plot are 1.04 and 1.02, respectively. The right panel shows the comparison of −log10(P-values) between MURAT and SKAT tests on each of the 19 123 genes. The SKAT results are corrected for multiple comparisons and the adjusted P-values are defined as twice the minimum of the LS- and FN-based P-values obtained via SKAT.

Several genes are well-known to contain variants that influence BMD.[21] Our results did not identify these genes as significant, either because of the limited sample size or due to performing region-based analysis of windows containing too many non-associated variants. In Supplementary S.4, we report the region-based P-values at 15 well-replicated bone-density-associated genes; MURAT has smaller or equivalent P-values at 11 of these 15 genes. The computation time for running MURAT on all 22 autosomes was manageable. On chromosome 1, for example, 2004 genes were analyzed and computational time was ~6 h using one processor on a node equipped with 64 GB RAM and two Intel Xeon CPUs E5645 at 2.40 GHz. In contrast, analysis of the 241 genes on chromosome 21 took only 52 min with MURAT. The runtimes for each chromosome are summarized in Supplementary Table S2.

Discussion

We have developed a new method, MURAT, to detect association between multiple correlated continuous phenotypes and a set of rare genetic variants. Our test can be applied to studies where features of a complex disease are measured by multiple correlated traits. Similarly, MURAT may be also beneficial for studies in which multiple questionnaires assess psychological or behavioral phenotypes. Through simulations, we show that MURAT has the potential to increase test power. As for most other rare-variant association tests, weights can be used in MURAT, to prioritize variants either based on MAFs or external annotation information. By assuming the variant effects to be random yet correlated, MURAT is based on a linear mixed model that incorporates arbitrary correlations among multivariate phenotypic traits. It is a generalization of SKAT when there exist correlations among observed phenotypes. Through the analytic derivation of the distribution of the P-value, we can avoid the need for permutation tests, except for very small significance thresholds. The computational costs for running our proposed test are much less expensive than for Maity's method (Supplementary S.5). An R package for MURAT can be downloaded from http://greenwoodlab.github.io/software/. We assumed a common correlation for the effect of the same variant on different traits, that is, the variance–covariance matrix for β is Σ=τ2R⊗W, where . This form for Σ is a generalization of the correlation structures used in other popular rare-variant association tests also based on mixed models. For example, when K=1, the matrix Σ reduces to τ2W, which is exactly the variance used for SKAT with p variants. When p=1 and K>1, our test becomes a single-variant-based multivariate test and Σ reduces to a K × K matrix Σ=τ2R. Hence, the matrix Σ in MURAT becomes the same as that used in SKAT-O, if the identity weight matrix is assumed for a set of K variants, as SKAT-O assumes an exchangeable correlation structure for variant effects. If we further assume Σ=I⊗Σ=I⊗(σ2I) in equation (3), that is, the residuals are uncorrelated, then our proposed model (2) reduces to the one used for the SKAT-O test. The specific correlation matrix structure that we assumed is of course a simplification. A more realistic assumption might allow an arbitrary correlation matrix without any constraints. However, this assumption leads to many parameters and it becomes impractical to derive a feasible procedure to estimate significance. Therefore, to compromise between biological reality and statistical feasibility, we assumed there is only one common correlation, ρ, for the effects of the same variant. Simulation shows that MURAT power is affected very little by the misspecified correlation matrix for variant effects (Supplementary S.6). Through extra simulations, we also find that MURAT is robust to different densities of grid for ρ (Supplementary S.7) and direction of variant effect β, especially under scenarios of antagonistic pleiotropy,[22] where one gene is associated with multiple traits; however, the variants are beneficial for some traits and detrimental for others (Supplementary S.8). It could be argued that observing K phenotypes for an individual is conceptually similar to observing only one phenotype in a family with K members. Although equation (1) could be adapted to univariate-trait family-based designs, the specific assumptions of our model apply only to the multiple phenotype situation. We assume there is a correlation (0≤ρ≤1) between the effects of a single variant on different phenotypes; in a family situation, it would make more sense to assume a variant always has the same effect on different family members (ρ=1). As we have previously demonstrated, our test simplifies in this situation to a score test for a multivariate fixed effect model, where we assume the same effect of each variant across the multiple traits. In families, controlling for relatedness not linked to the locus of interest is necessary and several linear mixed models[23, 24, 25] have been proposed that use the kinship matrix or realized relationship matrix to capture these residual genetic correlations. In contrast, in MURAT, the residual correlations in Σ could be due to either genetic or other factors. Covariates capturing cryptic population structure (eg, principal components[26]) can be added as fixed effect covariates, if necessary. A useful extension of our current work might be to generalize the correlated phenotypes from continuous variables to binary ones, or a mixture of these two types, and to extend the proposed multivariate test to family-based studies. Research along this line will be presented in future work.

22 in total

1. Optimal tests for rare variant effects in sequencing association studies.

Authors: Seunggeun Lee; Michael C Wu; Xihong Lin
Journal: Biostatistics Date: 2012-06-14 Impact factor: 5.899

2. Fishing for pleiotropic QTLs in a polygenic sea.

Authors: L E Bauman; L Almasy; J Blangero; R Duggirala; J S Sinsheimer; K Lange
Journal: Ann Hum Genet Date: 2005-09 Impact factor: 1.670

3. Semiparametric regression of multidimensional genetic pathway data: least-squares kernel machines and linear mixed models.

Authors: Dawei Liu; Xihong Lin; Debashis Ghosh
Journal: Biometrics Date: 2007-12 Impact factor: 2.571

Review 4. A sequence of methodological changes due to sequencing.

Authors: Kelly Burkett; Celia Greenwood
Journal: Curr Opin Allergy Clin Immunol Date: 2013-10

Review 5. Genetics of osteoporosis from genome-wide association studies: advances and challenges.

Authors: J Brent Richards; Hou-Feng Zheng; Tim D Spector
Journal: Nat Rev Genet Date: 2012-07-18 Impact factor: 53.242

6. Multivariate phenotype association analysis by marker-set kernel machine regression.

Authors: Arnab Maity; Patrick F Sullivan; Jun-Ying Tzeng
Journal: Genet Epidemiol Date: 2012-08-16 Impact factor: 2.135

7. Sequence kernel association test for quantitative traits in family samples.

Authors: Han Chen; James B Meigs; Josée Dupuis
Journal: Genet Epidemiol Date: 2012-12-26 Impact factor: 2.135

8. Gene-based multiple trait analysis for exome sequencing data.

Authors: Jingyuan Zhao; Anbupalam Thalamuthu
Journal: BMC Proc Date: 2011-11-29

9. Moving toward System Genetics through Multiple Trait Analysis in Genome-Wide Association Studies.

Authors: Daniel Shriner
Journal: Front Genet Date: 2012-01-16 Impact factor: 4.599

10. A groupwise association test for rare mutations using a weighted sum statistic.

Authors: Bo Eskerod Madsen; Sharon R Browning
Journal: PLoS Genet Date: 2009-02-13 Impact factor: 5.917

12 in total

1. Inference on phenotype-specific effects of genes using multivariate kernel machine regression.

Authors: Arnab Maity; Jing Zhao; Patrick F Sullivan; Jung-Ying Tzeng
Journal: Genet Epidemiol Date: 2018-01-03 Impact factor: 2.135

2. Statistical Inference for High-Dimensional Pathway Analysis with Multiple Responses.

Authors: Yang Liu; Wei Sun; Li Hsu; Qianchuan He
Journal: Comput Stat Data Anal Date: 2022-01-13 Impact factor: 1.681

3. Multivariate association analysis with somatic mutation data.

Authors: Qianchuan He; Yang Liu; Ulrike Peters; Li Hsu
Journal: Biometrics Date: 2017-07-19 Impact factor: 2.571

4. Testing cross-phenotype effects of rare variants in longitudinal studies of complex traits.

Authors: Pratyaydipta Rudra; K Alaine Broadaway; Erin B Ware; Min A Jhun; Lawrence F Bielak; Wei Zhao; Jennifer A Smith; Patricia A Peyser; Sharon L R Kardia; Michael P Epstein; Debashis Ghosh
Journal: Genet Epidemiol Date: 2018-03-30 Impact factor: 2.135

5. Multi-SKAT: General framework to test for rare-variant association with multiple phenotypes.

Authors: Diptavo Dutta; Laura Scott; Michael Boehnke; Seunggeun Lee
Journal: Genet Epidemiol Date: 2018-10-08 Impact factor: 2.135

6. Accelerating Gene Discovery by Phenotyping Whole-Genome Sequenced Multi-mutation Strains and Using the Sequence Kernel Association Test (SKAT).

Authors: Tiffany A Timbers; Stephanie J Garland; Swetha Mohan; Stephane Flibotte; Mark Edgley; Quintin Muncaster; Vinci Au; Erica Li-Leger; Federico I Rosell; Jerry Cai; Suzanne Rademakers; Gert Jansen; Donald G Moerman; Michel R Leroux
Journal: PLoS Genet Date: 2016-08-10 Impact factor: 5.917

7. Joint analysis of multiple blood pressure phenotypes in GAW19 data by using a multivariate rare-variant association test.

Authors: Jianping Sun; Sahir R Bhatnagar; Karim Oualkacha; Antonio Ciampi; Celia M T Greenwood
Journal: BMC Proc Date: 2016-10-18

8. A novel method to test associations between a weighted combination of phenotypes and genetic variants.

Authors: Huanhuan Zhu; Shuanglin Zhang; Qiuying Sha
Journal: PLoS One Date: 2018-01-12 Impact factor: 3.240

9. Exome-wide rare variant analyses of two bone mineral density phenotypes: the challenges of analyzing rare genetic variation.

Authors: Jianping Sun; Karim Oualkacha; Vincenzo Forgetta; Hou-Feng Zheng; J Brent Richards; Daniel S Evans; Eric Orwoll; Celia M T Greenwood
Journal: Sci Rep Date: 2018-01-09 Impact factor: 4.379

10. A rare-variant test for high-dimensional data.

Authors: Marika Kaakinen; Reedik Mägi; Krista Fischer; Jani Heikkinen; Marjo-Riitta Järvelin; Andrew P Morris; Inga Prokopenko
Journal: Eur J Hum Genet Date: 2017-05-24 Impact factor: 4.246