Testing an optimally weighted combination of common and/or rare variants with multiple traits.

Zhenchuan Wang¹, Qiuying Sha¹, Shurong Fang², Kui Zhang¹, Shuanglin Zhang¹.

Abstract

Recently, joint analysis of multiple traits has become popular because it can increase statistical power to identify genetic variants associated with complex diseases. In addition, there is increasing evidence indicating that pleiotropy is a widespread phenomenon in complex diseases. Currently, most existing methods test the association between multiple traits and a single genetic variant. However, analyzing one variant at a time may not be ideal for rare variant association studies because of allelic heterogeneity and the extreme rarity of rare variants. In this article, we developed a statistical method, testing an optimally weighted combination of variants with multiple traits (TOWmuT), to test the association between multiple traits and a weighted combination of variants (rare and/or common) in a genomic region. TOWmuT is robust to the directions of effects of causal variants and is applicable to different types of traits. Using extensive simulation studies, we compared the performance of TOWmuT with the following five existing methods: gene association with multiple traits (GAMuT), multiple sequence kernel association test (MSKAT), adaptive weighting reverse regression (AWRR), single-TOW, and MANOVA. Our results showed that, in all of the simulation scenarios, TOWmuT has correct type I error rates and is consistently more powerful than the other five tests. We also illustrated the usefulness of TOWmuT by analyzing whole-genome genotyping data from a lung function study.

Year: 2018 PMID: 30048520 PMCID: PMC6062080 DOI: 10.1371/journal.pone.0201186

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Many large cohort studies have collected many correlated traits that can reflect the underlying mechanisms of complex diseases. For example, the UK10K cohort study collected 64 correlated phenotypic traits [1]. Usually, complex diseases are characterized by multiple endophenotypes. For example, hypertension can be characterized by systolic and diastolic blood pressure [2]; metabolic syndrome is evaluated by four component traits: high-density lipoprotein (HDL) cholesterol, plasma glucose and Type 2 diabetes, abdominal obesity, and diastolic blood pressure [3]; and schizophrenia can be diagnosed by eight neurocognitive domains [4]. Multiple correlated traits can be influenced by a gene simultaneously. Therefore, by joint analysis of multiple traits, we can not only gain more statistical power to detect pleiotropic variants [5-12], but also better understand the genetic architecture of the disease of interest [13].

Several statistical methods have been developed to test the association between multiple traits and a single common variant. These methods can be roughly divided into three groups: dimension reduction methods [10, 13–15], regression methods [16-18], and methods that combine test statistics from univariate analyses [9, 19–23]. However, because of allelic heterogeneity and the extreme rarity of rare variants, methods that analyze one variant at a time, as used in common variant association studies, may not be ideal for rare variant association studies [24].

Recent genetic association studies show that complex diseases are affected by both common and rare variants [25-31]. Next-generation sequencing technology allows sequencing of the whole genome of a large number of individuals and makes rare variant association studies viable [32, 33]. Statistical methods for rare variant association studies with a single trait have already been developed.
These methods summarize genotype information from multiple rare variants and can be divided into three groups: burden tests [24, 34–37], quadratic tests [38-41], and combined tests [42-45].

As we pointed out above, it is essential to develop statistical methods to test the association between multiple traits and multiple variants (common and/or rare). Very recently, a few statistical methods for this purpose have appeared [11, 46–50]. Casale et al. [47] proposed a set-based association test based on a linear mixed model. This method enables joint analysis of multiple correlated traits in rare variant association studies while accounting for population structure and relatedness. Wang et al. [11] proposed a multivariate functional linear model approach to test the association between multiple traits and rare variants in a genomic region. In this approach, the genetic effects of variants are treated as smooth functions of the genomic positions of these variants. Gene association with multiple traits (GAMuT), proposed by Broadaway et al. [46], provides a nonparametric test of independence between a set of traits and a set of genetic variants. This method compares the similarities of multiple traits with the similarities of genotypes at variants in a genomic region. The Multivariate Rare-Variant Association Test (MURAT), proposed by Sun et al. [48], tests the association between multiple correlated quantitative traits and a set of rare variants based on a linear mixed model. This method assumes that the effects of the variants follow a multivariate normal distribution with zero mean and a specific covariance structure. Wu and Pankow [50] extended the commonly used sequence kernel association test (SKAT) [40] for a single trait to multiple traits and proposed the multiple sequence kernel association test (MSKAT). Wang et al. [11] proposed an adaptive weighting reverse regression (AWRR) method.
This method uses the score test based on the reverse regression, in which the sum of adaptively weighted genotypes is treated as the response variable and multiple traits are treated as independent variables.

In this article, we developed a new statistical method, testing an optimally weighted combination of variants with multiple traits (TOWmuT), to test the association between multiple traits and a weighted combination of variants (rare and/or common) in a genomic region. TOWmuT is based on the score test under a linear model, in which the weighted combination of variants is treated as the response variable and the multiple traits, together with covariates, are treated as independent variables. The statistic of TOWmuT is the maximum of the score test statistic over weights. The weights at which the score test statistic reaches its maximum are called the optimal weights. TOWmuT is applicable to different types of traits and can include covariates. Using extensive simulation studies, we compared the performance of TOWmuT with single-TOW [39], GAMuT [46], MSKAT [50], AWRR [11], and MANOVA [7]. Our results showed that, in all the simulation scenarios, TOWmuT is either the most powerful test or comparable to the most powerful test among the six tests. We also illustrated the usefulness of TOWmuT by analyzing real whole-genome genotyping data from a lung function study.

Methods

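We consider a sample of $n$ unrelated individuals. Each individual has $K$ potentially correlated quantitative or qualitative traits (1 for cases and 0 for controls for a qualitative trait) and has been genotyped at $M$ variants in a genomic region. Let $y_{ik}$ denote the $k$th trait value of the $i$th individual and $x_{im}$ denote the genotype score of the $i$th individual at the $m$th variant, where $x_{im} \in \{0, 1, 2\}$ is the number of minor alleles that the $i$th individual carries at the $m$th variant. We first center $y_{ik}$ and $x_{im}$ as $y_{ik} - \bar{y}_k$ and $x_{im} - \bar{x}_m$, where $\bar{y}_k = \frac{1}{n}\sum_{i=1}^{n} y_{ik}$ and $\bar{x}_m = \frac{1}{n}\sum_{i=1}^{n} x_{im}$; from here on, $y_{ik}$ and $x_{im}$ refer to the centered values. Let $Y_k = (y_{1k},\dots,y_{nk})^T$, $X_m = (x_{1m},\dots,x_{nm})^T$, $Y = (Y_1,\dots,Y_K)$, and $X = (X_1,\dots,X_M)$. For the $i$th individual, we consider a linear combination of the variants, $x_i = \sum_{m=1}^{M} w_m x_{im}$, where $w = (w_1,\dots,w_M)^T$ are weights whose values will be decided later.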

Without covariates

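We first describe our method without covariates. Consider the linear model

$$x_i = \beta_0 + \beta_1 y_{i1} + \cdots + \beta_K y_{iK} + \varepsilon_i.$$

The score test statistic to test the null hypothesis $H_0: \beta_1 = \cdots = \beta_K = 0$ is given by

$$T_{score}(w) = \frac{U^T V^{-1} U}{\hat{\sigma}^2}, \tag{2}$$

where $U = Y^T X w$, $V = Y^T Y$, and $\hat{\sigma}^2 = \frac{1}{n} w^T X^T X w$. To simplify the computation of Eq (2), we replace $X^T X/n$ with the diagonal of $X^T X/n$ and let $A = \mathrm{diag}(X^T X/n)$. This simplification was also used by Pan [51] and Sha et al. [39]. Then $\hat{\sigma}^2$ becomes $w^T A w$ and $T_{score}(w)$ becomes

$$T_{score}(w) = \frac{w^T X^T Y (Y^T Y)^{-1} Y^T X w}{w^T A w}.$$

We define the test statistic of TOWmuT as

$$T = \max_{w} T_{score}(w).$$

Let $W = A^{1/2} w$, then

$$T = \max_{W} \frac{W^T A^{-1/2} X^T Y (Y^T Y)^{-1} Y^T X A^{-1/2} W}{W^T W} = \lambda_{\max}\left(A^{-1/2} X^T Y (Y^T Y)^{-1} Y^T X A^{-1/2}\right),$$

where $\lambda_{\max}(\cdot)$ indicates the largest eigenvalue of a matrix. Let $W_0$ denote the eigenvector of $A^{-1/2} X^T Y (Y^T Y)^{-1} Y^T X A^{-1/2}$ corresponding to the largest eigenvalue; then $w_0 = A^{-1/2} W_0$ gives the optimal weights. In fact, we do not need to calculate $w_0$ in order to calculate $T$. If we let $C = X A^{-1} X^T$, then

$$T = \lambda_{\max}\left((Y^T Y)^{-1} Y^T C Y\right).$$

We use a permutation test to evaluate the p-value of $T$. In detail, we randomly shuffle the traits in each permutation. Note that $C$ and $(Y^T Y)^{-1}$ do not change in each permutation. Suppose that we perform $B$ permutations. Let $T^{(b)}$ denote the value of $T$ based on the $b$th permuted data, where $b = 0$ represents the original data. Then, the p-value of $T$ is given by

$$p = \frac{\#\{b : T^{(b)} \ge T^{(0)},\ b = 1,\dots,B\}}{B}.$$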
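The computation described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function name `towmut`, the handling of monomorphic variants (dropped, because their diagonal entry in A is zero), and the use of `>=` when counting permutations are assumptions made for the sketch.

```python
import numpy as np


def towmut(X, Y, n_perm=1000, seed=0):
    """Return (T, permutation p-value) for genotypes X (n x M) and traits Y (n x K)."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                 # centre genotype scores per variant
    Yc = Y - Y.mean(axis=0)                 # centre trait values per trait
    a = (Xc ** 2).mean(axis=0)              # diagonal of X'X / n, i.e. the entries of A
    keep = a > 0                            # assumption: skip monomorphic variants
    G = Xc[:, keep] / np.sqrt(a[keep])      # X A^{-1/2}, so that C = G G'
    YtY_inv = np.linalg.inv(Yc.T @ Yc)      # unchanged when the rows of Y are permuted

    def stat(Yp):
        H = Yp.T @ G                        # K x M matrix Y' X A^{-1/2}
        eig = np.linalg.eigvals(YtY_inv @ (H @ H.T))
        return float(eig.real.max())        # largest eigenvalue of (Y'Y)^{-1} Y' C Y

    t_obs = stat(Yc)
    n = Yc.shape[0]
    # Shuffle individuals' trait vectors; genotypes stay fixed.
    hits = sum(stat(Yc[rng.permutation(n)]) >= t_obs for _ in range(n_perm))
    return t_obs, hits / n_perm
```

Working with the K x K matrix keeps each permutation cheap when the number of traits is much smaller than the number of variants; only the product of the permuted traits with the fixed matrix G has to be recomputed.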

With covariates

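Assume that there are $p$ covariates and let $z_{i1},\dots,z_{ip}$ denote the $p$ covariates of the $i$th individual. Consider the linear model

$$x_i = \alpha_0 + \alpha_1 z_{i1} + \cdots + \alpha_p z_{ip} + \beta_1 y_{i1} + \cdots + \beta_K y_{iK} + \varepsilon_i. \tag{6}$$

In the appendix, we show that under model (6), the score test statistic with covariates to test the null hypothesis $H_0: \beta_1 = \cdots = \beta_K = 0$ is given by

$$T^{c}_{score}(w) = \frac{U^T V^{-1} U}{\hat{\sigma}^2},$$

where $U = \tilde{Y}^T \tilde{X} w$, $V = \tilde{Y}^T \tilde{Y}$, $\hat{\sigma}^2 = \frac{1}{n} w^T \tilde{X}^T \tilde{X} w$, $\tilde{Y} = (\tilde{y}_{ik})$, $\tilde{X} = (\tilde{x}_{im})$, and $\tilde{y}_{ik}$ and $\tilde{x}_{im}$ denote the residuals of $y_{ik}$ and $x_{im}$ under

$$y_{ik} = a_{0k} + a_{1k} z_{i1} + \cdots + a_{pk} z_{ip} + \varepsilon_{ik} \quad \text{and} \quad x_{im} = b_{0m} + b_{1m} z_{i1} + \cdots + b_{pm} z_{ip} + \tau_{im}. \tag{8}$$

We can see that the score test statistic with covariates has the same form as the score test statistic without covariates. That is, replacing $y_{ik}$ and $x_{im}$ by their residuals $\tilde{y}_{ik}$ and $\tilde{x}_{im}$ in the score test statistic without covariates, $T_{score}$, yields the score test statistic with covariates, $T^{c}_{score}$. Therefore, we define the TOWmuT statistic with covariates as

$$T^{c} = \max_{w} T^{c}_{score}(w).$$

In summary, to apply TOWmuT with covariates, we adjust both the trait values $y_{ik}$ and the genotypic scores $x_{im}$ for the covariates by applying the linear regressions in (8), and then apply TOWmuT without covariates to the residuals $\tilde{y}_{ik}$ and $\tilde{x}_{im}$.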
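The covariate adjustment summarized above amounts to one least-squares regression per trait and per variant. A minimal sketch, assuming NumPy arrays with one row per individual; the helper name `residualize` is hypothetical and not part of any published TOWmuT software.

```python
import numpy as np


def residualize(V, Z):
    """Least-squares residuals of each column of V regressed on an intercept and Z."""
    Z1 = np.column_stack([np.ones(Z.shape[0]), Z])   # design matrix with intercept
    coef, *_ = np.linalg.lstsq(Z1, V, rcond=None)    # one regression per column of V
    return V - Z1 @ coef                             # equals (I - P) V with P = Z1 (Z1'Z1)^{-1} Z1'


# Toy data: 2 covariates, 3 traits, 5 variants for 100 individuals (illustrative only).
rng = np.random.default_rng(0)
Z = np.column_stack([rng.normal(size=100), rng.integers(0, 2, size=100)])
Y_res = residualize(rng.normal(size=(100, 3)), Z)
X_res = residualize(rng.binomial(2, 0.1, size=(100, 5)).astype(float), Z)
```

The residual matrices `Y_res` and `X_res` then take the place of the trait and genotype matrices in the statistic without covariates.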

Comparison of methods

We compare the performance of our proposed method with the following methods: multivariate analysis of variance (MANOVA) [9], MSKAT [50], GAMuT [46], AWRR [11], and single-TOW [39]. In the following, we briefly introduce each of these methods using the notation of the Methods section.

MANOVA: Consider the multivariate multiple linear regression model $Y = X\beta + \varepsilon$, where $Y$ denotes the $n \times K$ matrix of phenotypes; $X$ denotes the $n \times M$ matrix of genotypes; $\beta$ is an $M \times K$ matrix of coefficients; and $\varepsilon$ is the $n \times K$ matrix of random errors whose rows are i.i.d. $MVN(0, \Sigma)$, where $\Sigma$ is the covariance matrix of each row of $\varepsilon$. To test $H_0: \beta = 0$, the likelihood ratio test is equivalent to the Wilks' Lambda test statistic of MANOVA, that is,

$$-2\log\Lambda = -2\left[l(0, \hat{\Sigma}_0) - l(\hat{\beta}, \hat{\Sigma}_1)\right] = n \log\frac{|\hat{\Sigma}_0|}{|\hat{\Sigma}_1|}.$$

Here $\Lambda$ denotes the ratio of the likelihood function under $H_0$ to the likelihood function under $H_1$, $l(\beta, \Sigma)$ is the log-likelihood function, and $\hat{\Sigma}_0 = Y^T Y/n$ and $\hat{\Sigma}_1 = (Y - X\hat{\beta})^T (Y - X\hat{\beta})/n$, where $\hat{\beta}$ is the maximum likelihood estimator (MLE) of $\beta$, and $|\cdot|$ denotes the determinant of a matrix. The test statistic has an asymptotic $\chi^2_{MK}$ distribution.

MSKAT: MSKAT extends the commonly used SKAT [40] for single trait analysis to test for the joint association of a rare variant set with multiple continuous traits.

GAMuT: GAMuT compares the similarity in multivariate phenotypes to the similarity in rare-variant genotypes in a genomic region using a machine-learning framework called kernel distance covariance.

AWRR: By collapsing genotypes using adaptive weights, AWRR uses the score test to test association based on the reverse regression, in which the collapsed genotypes are treated as the response variable and multiple traits are treated as independent variables.

Single-TOW: Let $T_k$ denote the test statistic of TOW to test the association between the $k$th trait and the genotypes at the variants in a genomic region. The test statistic of single-TOW is given by $T_{single} = \min_{1 \le k \le K} p_k$, where $p_k$ is the p-value of $T_k$ for $k = 1,\dots,K$. The p-value of $T_{single}$ is estimated using a permutation procedure.
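For the MANOVA comparison, the likelihood ratio statistic reduces to n log(|Σ̂0| / |Σ̂1|), the standard form for the multivariate normal linear model. The sketch below is a hedged illustration under the assumption that both X and Y are centered so that no intercept is needed; the function name `manova_lrt` is hypothetical, and the reference chi-square tail probability is left to the reader's preferred statistics library.

```python
import numpy as np


def manova_lrt(X, Y):
    """Return (-2 log Lambda, degrees of freedom) for H0: beta = 0 in Y = X beta + eps."""
    n, K = Y.shape
    Xc = X - X.mean(axis=0)                              # centring absorbs the intercept
    Yc = Y - Y.mean(axis=0)
    beta, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)       # least-squares (= ML) estimate
    resid = Yc - Xc @ beta
    _, logdet0 = np.linalg.slogdet(Yc.T @ Yc / n)        # log |Sigma_0 hat| under H0
    _, logdet1 = np.linalg.slogdet(resid.T @ resid / n)  # log |Sigma_1 hat| under H1
    df = int(np.linalg.matrix_rank(Xc)) * K              # M * K when X has full column rank
    return n * (logdet0 - logdet1), df
```

Using `slogdet` rather than `det` avoids overflow or underflow of the determinants when the number of traits grows.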

Simulations

In our simulation studies, we use the empirical Mini-Exome genotype data provided by the Genetic Analysis Workshop 17 (GAW17) to generate genotypes. This dataset contains genotypes of 697 unrelated individuals on 3205 genes. As in the simulation studies of Sha et al. [39] and Fang et al. [52], we choose four genes in the empirical Mini-Exome genotype data. These four genes are ELAVL4 (gene1), MSH4 (gene2), PDE4B (gene3), and ADAMTS4 (gene4), which contain 10, 20, 30, and 40 variants, respectively. Then, we merge the four genes to form a super gene (Sgene) with 100 variants. We generate genotypes based on the genotypes of the 697 individuals in the Sgene, since the distribution of the minor allele frequencies (MAFs) in the Sgene is similar to the distribution of MAFs in all of the 3205 genes (Figure A in S1 File).

To generate a qualitative trait, we use a liability threshold model based on a quantitative trait [44]. An individual is classified as affected if the individual's trait is at least one standard deviation larger than the mean of the trait. This leads to a prevalence of 16% for the simulated disease in the general population. In the following, we only describe how to generate a quantitative trait.

We assume that all causal variants are rare (MAF < 0.01). We randomly choose $n_c$ rare variants as causal variants, where $n_c$ is determined by the percentage of causal variants among rare variants. We use $n_r$ and $n_p$ to denote the number of risk rare variants and protective rare variants, respectively, where $n_r + n_p = n_c$. Let $x^{r}_{iq}$ and $x^{p}_{ij}$ denote the genotypic scores of the $q$th risk rare variant and the $j$th protective rare variant for the $i$th individual, respectively. We assume that genotypes affect $L$ traits. Let $h$ and $h_l$ denote the heritability of all the $n_c$ rare causal variants for the $L$ traits and for the $l$th trait among the $L$ traits, respectively. We generate $L$ random numbers $t_1,\dots,t_L$ from a uniform distribution between 0 and 1. Then, the heritability of the $l$th trait among the $L$ traits is $h_l = h\, t_l / \sum_{j=1}^{L} t_j$.
Given the heritability of the $l$th trait $h_l$, we generate $n_c$ random numbers from a uniform distribution between 0 and 1. The heritability of the $m$th causal variant for the $l$th trait is given by .

In our simulation studies, we consider two covariates $Z_1$ and $Z_2$, where $Z_1$ is a continuous covariate generated from a standard normal distribution, and $Z_2$ is a binary covariate taking values 0 and 1 with a probability of 0.5. We generate $K$ traits by considering the factor model [10, 13, 21] where $y = (y_1,\dots,y_K)^T$; $e = (1,\dots,1)^T$; $\lambda = (\lambda_1,\dots,\lambda_K)^T$ is the vector involving genotypes; $f = (f_1,\dots,f_R)^T \sim MVN(0,\Sigma)$, $\Sigma = (1-\rho)I + \rho A$, $A$ is a matrix with all elements equal to 1, $I$ is the identity matrix, and $\rho$ is the correlation between $f_i$ and $f_j$; $R$ is the number of factors; $\gamma$ is a $K \times R$ matrix; $c$ is a constant; $\varepsilon = (\varepsilon_1,\dots,\varepsilon_K)^T$ is a vector of residuals; and $\varepsilon_1,\dots,\varepsilon_K$ are independent, with $\varepsilon_k \sim N(0,1)$ for $k = 1,\dots,K$. As in Wang et al. [10], we consider the following six models with different numbers of factors and different numbers of traits affected by genotypes. In these models, the within-factor correlation is $c^2$ and the between-factor correlation is $\rho_1 = \rho c^2$.

Model 1: There is only one factor and genotypes affect 6 traits with the same effect size. This is equivalent to setting $R = 1$ and $\gamma = (1,\dots,1)^T$. In detail,

Model 2: There are five factors and genotypes affect 6 traits. We set $R = 5$ and $\gamma = \mathrm{diag}(D_1,D_2,D_3,D_4,D_5)$, where for $i = 1,\dots,5$. In detail,

Model 3: There are two factors and genotypes affect 6 traits. That is, $R = 2$ and $\gamma = \mathrm{diag}(D_1,D_2)$, where for $i = 1,2$. In detail,

Model 4: There are five factors and genotypes affect one trait. That is, $R = 5$ and $\gamma = \mathrm{diag}(D_1,D_2,D_3,D_4,D_5)$, where for $i = 1,\dots,5$. In detail,

Model 5: There are only two factors and genotypes affect one trait. That is, $R = 2$ and $\gamma = \mathrm{diag}(D_1,D_2)$, where for $i = 1,2$. In detail,

Model 6: There are $K$ factors and genotypes affect 6 traits. That is, $R = K$, $\gamma = I$, and $c = 1$. In detail,
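The 16% prevalence quoted for the liability threshold model can be checked directly. The sketch below assumes the underlying quantitative trait is approximately normally distributed, which the text does not state explicitly; under that assumption, the prevalence is the upper tail probability of a standard normal variable at one standard deviation.

```python
import math


def normal_upper_tail(z):
    """P(N(0, 1) > z), computed from the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Threshold: one standard deviation above the mean (normality is an assumption).
prevalence = normal_upper_tail(1.0)
print(f"prevalence = {prevalence:.4f}")   # about 0.1587, i.e. roughly 16%
```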

Results

To evaluate the type I error rates of the proposed test TOWmuT, we set $\lambda_k = 0$ for $k = 1,\dots,K$ in all of the 6 models. We consider different models, different sample sizes, different significance levels, and different types of traits. In our simulations we consider 10 traits ($K = 10$). In each simulation scenario, we estimate the p-values of TOWmuT using 1000 permutations and evaluate the type I error rates of TOWmuT using 10,000 replicated samples. For 10,000 replicated samples, the 95% confidence interval (CI) for the estimated type I error rates at the nominal level of 0.05 is (0.046, 0.054) and the 95% CI at the nominal level of 0.01 is (0.008, 0.012). Tables 1 and 2 summarize the estimated type I error rates of TOWmuT. From these two tables, we can see that 70 out of 72 (greater than 95%) estimated type I error rates are within the 95% CIs, and the two estimated type I error rates not within the 95% CIs (0.05555 and 0.01295) are very close to the bound of the corresponding 95% CI, which indicates that TOWmuT is valid.
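The two confidence intervals quoted above follow from the normal approximation to a binomial proportion. A minimal check, assuming the usual Wald interval with z = 1.96 (the text does not name the interval type):

```python
import math


def type1_error_ci(alpha, n_rep, z=1.96):
    """Wald 95% interval for an estimated rejection rate when the true rate is alpha."""
    half_width = z * math.sqrt(alpha * (1.0 - alpha) / n_rep)
    return alpha - half_width, alpha + half_width


for alpha in (0.05, 0.01):
    lower, upper = type1_error_ci(alpha, 10_000)
    print(alpha, round(lower, 3), round(upper, 3))
```

With 10,000 replicates this reproduces (0.046, 0.054) at the 0.05 level and (0.008, 0.012) at the 0.01 level after rounding to three decimals.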

Table 1

The estimated type I error rates of TOWmuT for 10 quantitative traits under each model with covariates.

Significance level	Model	n = 500	n = 1000	n = 2000
α = 0.05	1	0.05365	0.0515	0.0515
	2	0.0521	0.0528	0.0504
	3	0.0513	0.0540	0.0503
	4	0.0514	0.0511	0.05
	5	0.05381	0.04825	0.05
	6	0.0482	0.0508	0.05325
α = 0.01	1	0.01165	0.0098	0.0117
	2	0.012	0.01015	0.0102
	3	0.01175	0.01075	0.0113
	4	0.01145	0.01075	0.0118
	5	0.01141	0.01095	0.0117
	6	0.0097	0.0105	0.01185

Table 2

The estimated type I error rates of TOWmuT for the mixture of five quantitative traits and five qualitative traits under each model with covariates.

Significance level	Model	n = 500	n = 1000	n = 2000
α = 0.05	1	0.05365	0.05385	0.05005
	2	0.0511	0.0483	0.05115
	3	0.0508	0.05375	0.052
	4	0.0529	0.04915	0.0536
	5	0.054	0.05355	0.04825
	6	0.05555	0.0493	0.0529
α = 0.01	1	0.0105	0.01295	0.00995
	2	0.0105	0.009	0.0097
	3	0.01145	0.0104	0.0101
	4	0.01065	0.00945	0.01165
	5	0.0118	0.0105	0.00875
	6	0.01195	0.00935	0.01105

For power comparisons, we consider different models, different types of traits, different percentages of protective variants, different values of heritability, different values of between-factor correlation, and different values of within-factor correlation. In each simulation scenario, we estimate the p-values of TOWmuT, AWRR, and single-TOW using 1,000 permutations, and we estimate the p-values of MANOVA, GAMuT, and MSKAT using their asymptotic distributions. We evaluate the power of all six tests using 1,000 replicated samples at a significance level of 0.05.

Fig 1 shows the power of the six tests (single-TOW, MSKAT, AWRR, MANOVA, GAMuT, and TOWmuT) as a function of the total heritability under the six models for 10 quantitative traits. This figure shows that (1) TOWmuT is consistently the most powerful of the six tests; (2) MANOVA is the second most powerful when genotypes affect multiple traits (models 1–3 and 6), while AWRR is the second most powerful when genotypes affect a single trait (models 4–5); (3) MSKAT is consistently less powerful than the other multivariate tests, probably because SKAT gives larger weights than TOW only to variants with MAF in the range (0.01, 0.035), and only 8% of the variants in the Sgene, on which our simulations are based, have MAF in this range; and (4) MSKAT and GAMuT have similar power in all six models.

Fig 1

Power comparisons of the six tests (Single-TOW, MSKAT, AWRR, MANOVA, GAMuT and TOWmuT) for the power as a function of total heritability for 10 quantitative traits with covariates.

The sample size is 1000. The between-factor correlation is 0.3 and the within-factor correlation is 0.7. The percentage of the causal variants is 0.2. All causal variants are risk variants.

Fig 2 shows the power of the five tests (single-TOW, AWRR, MSKAT, GAMuT, and TOWmuT) as a function of the total heritability for the mixture of 5 quantitative traits and 5 qualitative traits. We only compare the power of five tests because MANOVA has an inflated type I error rate in this case. This figure shows that (1) TOWmuT is consistently the most powerful of the five tests; (2) AWRR is the second most powerful when genotypes affect multiple traits (models 1–3 and 6), while MSKAT and GAMuT are the second most powerful when genotypes affect a single trait (models 4–5); (3) MSKAT and GAMuT have similar power in all six models; and (4) single-TOW is consistently less powerful than the other four multivariate tests, because we keep the correlations between traits similar to those in Fig 1, so that the correlations between the original quantitative traits are larger than those in Fig 1.

Fig 2

Power comparisons of the five tests (Single-TOW, AWRR, GAMuT, MSKAT and TOWmuT) for the power as a function of heritability for the mixture of half quantitative traits and half qualitative traits with covariates.

The sample size is 1000. The covariance matrix of the 10 traits is similar to that of 10 quantitative traits with the between-factor correlation being 0.3 and the within-factor correlation being 0.7. The percentage of the causal variants is 0.2. All causal variants are risk variants.

We also compare the power of the six tests as a function of the within-factor correlation for models 1–5 and of the between-factor correlation for model 6 for 10 quantitative traits (Figure B in S1 File). As shown in this figure, the power of single-TOW is robust to the between-factor correlation and the within-factor correlation, since the minimum p-value-based approach is largely unaffected by the trait correlation [50]. However, as the between-factor correlation or within-factor correlation increases, the power of the other five tests essentially increases. Other patterns of the power comparisons are similar to those in Fig 1.

Power comparisons of the six tests as a function of the percentage of protective variants for 10 quantitative traits are given in Figure C in S1 File. This figure shows that the power of all six tests is robust to the percentage of protective variants; therefore, all of these methods are robust to the directions of the genetic effects. Other patterns of the power comparisons are similar to those in Fig 1.

Application to the COPDGene

Chronic obstructive pulmonary disease (COPD) is a common disease in elderly patients that causes significant morbidity and mortality [53]. The Genetic Epidemiology of COPD Study (COPDGene) [54] was designed to identify genetic factors associated with COPD. In the COPDGene study, more than 10,000 subjects have been enrolled, including 2/3 non-Hispanic Whites (NHW) and 1/3 African-Americans (AA). In this analysis, we only include the 5,430 NHW with no missing phenotypes. Each of the 5,430 NHW has been genotyped at 630,860 SNPs. Based on the literature on COPD [9, 55, 56], we chose BMI, Age, Pack-Years (PackYear), and Sex as covariates and selected seven quantitative COPD-related phenotypes. These seven phenotypes are FEV1 (% predicted FEV1), Emphysema (Emph), Emphysema Distribution (EmphDist), Gas Trapping (GasTrap), Airway Wall Area (Pi10), Exacerbation frequency (ExacerFreq), and Six-minute walk distance (6MWD) [9]. The correlation structure of the seven COPD-related phenotypes is given in Figure D in S1 File.

To evaluate the performance of our proposed method on a real data set, we applied six methods (TOWmuT, MANOVA, MSKAT, GAMuT, AWRR, and single-TOW) to the COPDGene NHW population to test the association between each 50-SNP block and the seven quantitative COPD-related phenotypes. To identify significant 50-SNP blocks associated with the phenotypes, we used the Bonferroni correction to decide the significance level. The total number of 50-SNP blocks is 12617; therefore, the Bonferroni-corrected significance level is 0.05/12617 ≈ 4×10⁻⁶. Table 3 summarizes the significant blocks identified by at least one method. There are six significant blocks in total in Table 3. All six blocks have previously been reported to be associated with COPD or lung function [57-60]. PDSS1 and ABI1 are located between LOC107984176 and LOC105376467, which are intergenic regions and contain SNPs associated with pulmonary function [60, 61].

From Table 3, we can see that TOWmuT identified four blocks; AWRR identified two blocks; MANOVA, MSKAT, and GAMuT each identified one block; and single-TOW did not identify any blocks. Thus TOWmuT identified the largest number of significant 50-SNP blocks among the six methods, which is consistent with the results of our simulation studies.
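The block count and the Bonferroni threshold used above can be reproduced with a few lines of arithmetic. The sketch assumes that only complete 50-SNP blocks are counted (floor division), which matches the 12617 blocks reported; how the remaining SNPs were handled is not stated in the text.

```python
n_snps = 630_860          # SNPs genotyped in the NHW sample
block_size = 50           # SNPs per block

# Assumption: only complete blocks are tested, so the remainder is dropped.
n_blocks = n_snps // block_size
threshold = 0.05 / n_blocks

print(n_blocks, f"{threshold:.2e}")
```

The unrounded threshold is slightly below 4×10⁻⁶, which matters for p-values reported as exactly 4.00E-06.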

Table 3

Significant blocks identified by at least one method (p-values less than 4×10⁻⁶) and the corresponding p-values in the analysis of COPDGene.

CHR	POS1	POS2	Genes	TOWmuT	MANOVA	MSKAT	GAMuT	AWRR	Single-TOW
2	178000985	178419117	NFE2L2	0.20883	2.62E-06	0.02508	0.02505	0.25796	0.15468
4	145278837	145697040	HHIP	1.00E-07	7.71E-06	0.03992	0.03984	0	0.00085
10	26908475	27150093	PDSS1, ABI1	4.00E-06	0.04050	0.01242	0.01247	1.6E-05	0.02845
15	78593362	78825917	IREB2, AGPHD1	1.00E-07	0.00191	0.70349	0.70357	5.6E-06	0.23484
15	78826180	79006442	PSMA4, CHRNA5, CHRNA3, CHRNB4	2.90E-06	0.00037	0.06255	0.06252	0	0.37643
15	79006582	79267817	ADAMTS7	9.01E-05	4.78E-05	2.25E-06	6.42E-07	0.04849	0.01953

Discussion

In this article, we developed TOWmuT to perform joint analysis of multiple traits in gene-based association studies. The motivations for developing this method are the following: (1) for complex diseases, multiple correlated traits are usually measured in genetic association studies; (2) there is increasing evidence demonstrating that pleiotropy is a widespread phenomenon in complex diseases [5]; and (3) there is a shortage of gene-based approaches for multiple traits. We used extensive simulation studies to compare the performance of TOWmuT with MANOVA, MSKAT, AWRR, GAMuT, and single-TOW. Our simulation results showed that TOWmuT has correct type I error rates and is consistently more powerful than the other five methods we compared. Furthermore, the results from the real data analysis showed that the proposed method has great potential in gene-based association studies for complex diseases with multiple phenotypes, such as COPD.

Recently, identifying a small number of rare causal variants that contribute to complex diseases has become a major focus of investigation [62]. Several methods to pinpoint the causal variants have been developed for testing the association with a single trait. These methods include the backward elimination (BE) method [63], the hierarchical model method [63], and the adaptive combination of p-values method [64]. To extend TOWmuT to identify a small number of causal variants that are associated with multiple traits, we can use the BE method. In each step, we remove the one variant that has the smallest contribution to the association between multiple traits and the set of variants, and then we use TOWmuT to evaluate the p-value for testing the association between multiple traits and the remaining variants. The causal variants are the set of variants corresponding to the smallest p-value.

The computation time required for running TOWmuT depends on the number of traits, the sample size, the number of permutations, and the number of variants in a genomic region. The running time of TOWmuT with 1000 permutations on a data set with 5000 individuals, seven traits, and 10 variants in a genomic region, on a laptop with 4 Intel cores @ 3.30 GHz and 4 GB of memory, is about 0.14 s. To perform real data analysis at a genome-wide level, we can first select genomic regions that show evidence of association based on a small number of permutations (e.g., 1,000), and then use a large number of permutations to test the selected regions.
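The two-stage permutation strategy mentioned above can be sketched as follows. This is an illustration only: the function names, the screening cutoff `screen_cut`, and the default permutation counts are hypothetical choices, and `toy_stat` is a simple stand-in statistic on integer lists, not the TOWmuT statistic.

```python
import random


def perm_pvalue(stat, traits, geno, n_perm, rng):
    """Share of trait shuffles whose statistic is at least the observed one."""
    observed = stat(traits, geno)
    hits = 0
    for _ in range(n_perm):
        shuffled = traits[:]          # copy, then permute traits across individuals
        rng.shuffle(shuffled)
        hits += stat(shuffled, geno) >= observed
    return hits / n_perm


def two_stage_pvalue(stat, traits, geno, screen_perm=1000, full_perm=100_000,
                     screen_cut=0.01, seed=0):
    """Screen with few permutations; rerun with many only if the region looks promising."""
    rng = random.Random(seed)
    p_screen = perm_pvalue(stat, traits, geno, screen_perm, rng)
    if p_screen > screen_cut:         # hypothetical cutoff: stop early for weak regions
        return p_screen, screen_perm
    return perm_pvalue(stat, traits, geno, full_perm, rng), full_perm


def toy_stat(traits, geno):
    """Stand-in statistic (not TOWmuT): absolute scaled covariance of two integer lists."""
    n = len(traits)
    return abs(n * sum(t * g for t, g in zip(traits, geno)) - sum(traits) * sum(geno))
```

Most regions exit after the cheap first stage, so the expensive second stage runs only for the small fraction of regions that could plausibly reach genome-wide significance.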

Appendix

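We use the same notation as in the Methods section. Let $Y = (Y_1,\dots,Y_K)$, $Z_i = (1, z_{i1},\dots,z_{ip})^T$, $Z = (Z_1,\dots,Z_n)^T$, and $x = (x_1,\dots,x_n)^T$. Under the linear model

$$x = Z\alpha + Y\beta + \varepsilon,$$

the log-likelihood (up to a constant) is given by

$$l(\alpha, \beta, \sigma^2) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}(x - Z\alpha - Y\beta)^T (x - Z\alpha - Y\beta),$$

where $\alpha = (\alpha_0,\dots,\alpha_p)^T$, $\beta = (\beta_1,\dots,\beta_K)^T$, $\varepsilon = (\varepsilon_1,\dots,\varepsilon_n)^T$, and $\varepsilon_1,\dots,\varepsilon_n$ are independent with $\varepsilon_i \sim N(0,\sigma^2)$. Then,

$$\frac{\partial l}{\partial \alpha} = \frac{1}{\sigma^2} Z^T (x - Z\alpha - Y\beta), \qquad \frac{\partial l}{\partial \beta} = \frac{1}{\sigma^2} Y^T (x - Z\alpha - Y\beta).$$

Let $\hat{\alpha}$ and $\hat{\sigma}^2$ denote the maximum likelihood estimates of $\alpha$ and $\sigma^2$ under the null hypothesis $H_0: \beta = 0$. Then, $\hat{\alpha} = (Z^T Z)^{-1} Z^T x$ and $\hat{\sigma}^2 = \frac{1}{n} x^T (I - P) x$, where $P = Z (Z^T Z)^{-1} Z^T$. Let $\theta = (\alpha^T, \beta^T)^T$. The score for $\beta$ and the information matrix for $\theta$, evaluated under $H_0$, are

$$\frac{\partial l}{\partial \beta}\bigg|_{H_0} = \frac{U}{\hat{\sigma}^2} \qquad \text{and} \qquad I(\theta) = \frac{1}{\hat{\sigma}^2}\begin{pmatrix} Z^T Z & Z^T Y \\ Y^T Z & Y^T Y \end{pmatrix},$$

where $U = Y^T (I - P) x = Y^T (I - P) X w$. The score test statistic is given by

$$T^{c}_{score}(w) = \frac{U^T V^{-1} U}{\hat{\sigma}^2},$$

where $V = Y^T (I - P) Y$. Note that $(I - P)^2 = I - P$. We have $(I - P) X = \tilde{X}$, $(I - P) Y = \tilde{Y}$, $U = \tilde{Y}^T \tilde{X} w$, and $V = \tilde{Y}^T \tilde{Y}$, where $\tilde{X} = (\tilde{x}_{im})$ and $\tilde{x}_{im}$ is the residual of $x_{im}$ under the linear regression model (8); and $\tilde{Y} = (\tilde{y}_{ik})$ and $\tilde{y}_{ik}$ is the residual of $y_{ik}$ under the linear regression model (8). Therefore,

$$T^{c}_{score}(w) = \frac{w^T \tilde{X}^T \tilde{Y} (\tilde{Y}^T \tilde{Y})^{-1} \tilde{Y}^T \tilde{X} w}{\frac{1}{n} w^T \tilde{X}^T \tilde{X} w}.$$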

Supplementary information (S1 File, PDF).

64 in total

1. Extending rare-variant testing strategies: analysis of noncoding sequence and imputed genotypes.

Authors: Matthew Zawistowski; Shyam Gopalakrishnan; Jun Ding; Yun Li; Sara Grimm; Sebastian Zöllner
Journal: Am J Hum Genet Date: 2010-11-12 Impact factor: 11.025

2. Ten genes for inherited breast cancer.

Authors: Tom Walsh; Mary-Claire King
Journal: Cancer Cell Date: 2007-02 Impact factor: 31.743

3. Neurocognitive endophenotypes in a multiplex multigenerational family study of schizophrenia.

Authors: Raquel E Gur; Vishwajit L Nimgaonkar; Laura Almasy; Monica E Calkins; J Daniel Ragland; Michael F Pogue-Geile; Stephen Kanes; John Blangero; Ruben C Gur
Journal: Am J Psychiatry Date: 2007-05 Impact factor: 18.112

4. Pleiotropy analysis of quantitative traits at gene level by multivariate functional linear models.

Authors: Yifan Wang; Aiyi Liu; James L Mills; Michael Boehnke; Alexander F Wilson; Joan E Bailey-Wilson; Momiao Xiong; Colin O Wu; Ruzong Fan
Journal: Genet Epidemiol Date: 2015-03-23 Impact factor: 2.135

5. Detecting association of rare variants by testing an optimally weighted combination of variants for quantitative traits in general families.

Authors: Shurong Fang; Shuanglin Zhang; Qiuying Sha
Journal: Ann Hum Genet Date: 2013-08-22 Impact factor: 1.670

6. Joint Analysis of Multiple Traits in Rare Variant Association Studies.

Authors: Zhenchuan Wang; Xuexia Wang; Qiuying Sha; Shuanglin Zhang
Journal: Ann Hum Genet Date: 2016-03-16 Impact factor: 1.670

7. An Adaptive Association Test for Multiple Phenotypes with GWAS Summary Statistics.

Authors: Junghi Kim; Yun Bai; Wei Pan
Journal: Genet Epidemiol Date: 2015-10-22 Impact factor: 2.135

Review 8. Exome sequencing: the sweet spot before whole genomes.

Authors: Jamie K Teer; James C Mullikin
Journal: Hum Mol Genet Date: 2010-08-12 Impact factor: 6.150

9. Joint Analysis of Multiple Traits Using "Optimal" Maximum Heritability Test.

Authors: Zhenchuan Wang; Qiuying Sha; Shuanglin Zhang
Journal: PLoS One Date: 2016-03-07 Impact factor: 3.240

10. The UK10K project identifies rare variants in health and disease.

Authors: Klaudia Walter; Josine L Min; Jie Huang; Lucy Crooks; Yasin Memari; Shane McCarthy; John R B Perry; ChangJiang Xu; Marta Futema; Daniel Lawson; Valentina Iotchkova; Stephan Schiffels; Audrey E Hendricks; Petr Danecek; Rui Li; James Floyd; Louise V Wain; Inês Barroso; Steve E Humphries; Matthew E Hurles; Eleftheria Zeggini; Jeffrey C Barrett; Vincent Plagnol; J Brent Richards; Celia M T Greenwood; Nicholas J Timpson; Richard Durbin; Nicole Soranzo
Journal: Nature Date: 2015-09-14 Impact factor: 49.962