Jianping Sun, Karim Oualkacha, Vincenzo Forgetta, Hou-Feng Zheng, J Brent Richards, Antonio Ciampi, Celia Mt Greenwood.
Abstract
For region-based sequencing data, power to detect genetic associations can be improved through analysis of multiple related phenotypes. With this motivation, we propose a novel test to detect association simultaneously between a set of rare variants, such as those obtained by sequencing in a small genomic region, and multiple continuous phenotypes. We allow arbitrary correlations among the phenotypes and build on a linear mixed model by assuming the effects of the variants follow a multivariate normal distribution with a zero mean and a specific covariance matrix structure. To account for the unknown correlation parameter in the covariance matrix of the variant effects, a data-adaptive variance component test based on score-type statistics is derived. As our approach can calculate the P-value analytically, the proposed test procedure is computationally efficient. Broad simulations and an application to the UK10K project show that our proposed multivariate test is generally more powerful than univariate tests, especially when there are pleiotropic effects or highly correlated phenotypes.
Year: 2016 PMID: 26860061 PMCID: PMC4989219 DOI: 10.1038/ejhg.2016.8
Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246
Figure 1. Possible patterns of genetic architecture that could link variants at a locus to a set of continuous phenotypes. The solid line represents pleiotropy, where a single variant influences multiple phenotypes. The dashed line represents locus heterogeneity, where different variants in the same gene or region each influence a different phenotype. The dotted line represents a situation where correlation among phenotypes arises indirectly due to the variants' effects on disease; the phenotypes could be symptoms, endophenotypes, or continuous manifestations of disease.
Type I error comparison among MURAT, SKAT, and Maity's method at moderate significance levels, α=0.05, 0.01, and 0.001
| K= | ||||||
|---|---|---|---|---|---|---|
| ρ | ρ | |||||
| α | ||||||
| 5 × 10⁻² | 5.17 × 10⁻² | 5.17 × 10⁻² | 4.76 × 10⁻² | 5.12 × 10⁻² | 5.10 × 10⁻² | 4.81 × 10⁻² |
| 1 × 10⁻² | 0.90 × 10⁻² | 0.94 × 10⁻² | 0.94 × 10⁻² | 0.97 × 10⁻² | 0.99 × 10⁻² | 0.95 × 10⁻² |
| 1 × 10⁻³ | 1.01 × 10⁻³ | 1.30 × 10⁻³ | 0.90 × 10⁻³ | 1.10 × 10⁻³ | 1.10 × 10⁻³ | 1.20 × 10⁻³ |
| ρ | ρ | |||||
| 5 × 10⁻² | 4.99 × 10⁻² | 4.80 × 10⁻² | 4.78 × 10⁻² | 5.01 × 10⁻² | 4.78 × 10⁻² | 4.41 × 10⁻² |
| 1 × 10⁻² | 1.02 × 10⁻² | 0.91 × 10⁻² | 0.98 × 10⁻² | 0.98 × 10⁻² | 0.93 × 10⁻² | 0.97 × 10⁻² |
| 1 × 10⁻³ | 1.02 × 10⁻³ | 0.70 × 10⁻³ | 1.40 × 10⁻³ | 1.10 × 10⁻³ | 0.80 × 10⁻³ | 1.30 × 10⁻³ |
Abbreviations: MURAT, Multivariate Rare-Variant Association Test; SKAT, sequence kernel association test.
The simulations are based on 10 000 simulated data sets. The results for SKAT are based on adjusted P-values, which are defined as K times the minimum of univariate-based P-values obtained via SKAT.
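The SKAT adjustment described in this footnote is a Bonferroni-style correction. A minimal Python sketch follows, assuming the adjusted value is capped at 1; the cap and the function name `adjusted_min_p` are illustrative assumptions and are not taken from the paper.

```python
def adjusted_min_p(p_values):
    """Return K times the smallest univariate P-value, capped at 1.

    K is the number of phenotypes, taken here as len(p_values).
    The cap at 1 is an assumption; the footnote only states K * min(P).
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    k = len(p_values)
    return min(1.0, k * min(p_values))


# Hypothetical univariate P-values for K = 2 phenotypes
print(adjusted_min_p([0.004, 0.30]))
```

The adjusted value is then compared with the same α used for the multivariate test, which keeps the two procedures on an equal footing.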
Type I error comparison between MURAT and SKAT at stringent significance levels, α=1 × 10⁻⁴, 1 × 10⁻⁵, and 2.5 × 10⁻⁶
| ρ | ρ | |||
|---|---|---|---|---|
| K= | ||||
| 1 × 10⁻⁴ | 1.68 × 10⁻⁴ | 1.27 × 10⁻⁴ | 1.48 × 10⁻⁴ | 1.24 × 10⁻⁴ |
| 1 × 10⁻⁵ | 2.78 × 10⁻⁵ | 1.16 × 10⁻⁵ | 1.29 × 10⁻⁵ | 1.54 × 10⁻⁵ |
| 2.5 × 10⁻⁶ | 7.60 × 10⁻⁶ | 3.30 × 10⁻⁶ | 2.80 × 10⁻⁶ | 3.30 × 10⁻⁶ |
| K= | ||||
| 1 × 10⁻⁴ | 1.64 × 10⁻⁴ | 1.29 × 10⁻⁴ | 1.35 × 10⁻⁴ | 1.23 × 10⁻⁴ |
| 1 × 10⁻⁵ | 2.15 × 10⁻⁵ | 1.25 × 10⁻⁵ | 1.82 × 10⁻⁵ | 1.35 × 10⁻⁵ |
| 2.5 × 10⁻⁶ | 9.80 × 10⁻⁶ | 4.10 × 10⁻⁶ | 10.20 × 10⁻⁶ | 3.10 × 10⁻⁶ |
Abbreviations: MURAT, Multivariate Rare-Variant Association Test; SKAT, sequence kernel association test.
The simulations are based on 10⁷ simulated data sets. The results for SKAT are based on adjusted P-values, which are defined as K times the minimum of univariate-based P-values obtained via SKAT.
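As an aid to reading such tables: an empirical type I error is the fraction of null-simulation P-values at or below α, and under a binomial model its Monte Carlo standard error is sqrt(α(1 − α)/N) for N replicates. The Python sketch below is illustrative only; the function names are hypothetical and the binomial approximation is an assumption, not a procedure stated in the paper.

```python
import math


def empirical_type1_error(null_p_values, alpha):
    """Fraction of P-values from null simulations that are <= alpha."""
    n = len(null_p_values)
    if n == 0:
        raise ValueError("need at least one P-value")
    return sum(p <= alpha for p in null_p_values) / n


def monte_carlo_se(alpha, n_sims):
    """Binomial standard error of an empirical rate whose true value is alpha."""
    return math.sqrt(alpha * (1.0 - alpha) / n_sims)


# Toy check with four hypothetical null P-values
print(empirical_type1_error([0.01, 0.20, 0.04, 0.90], alpha=0.05))

# Precision available at a stringent level with ten million replicates
print(monte_carlo_se(2.5e-6, 10**7))
```

For example, with N = 10⁷ replicates and α = 2.5 × 10⁻⁶, only about 25 rejections are expected under the null, so the standard error is roughly 5 × 10⁻⁷ and estimates at that level are comparatively noisy.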
Figure 2. Power comparisons among MURAT, SKAT, and Maity's method under power simulation scenario 1. The top and middle panels show empirical powers of MURAT, which tests associations with two phenotypes simultaneously, versus SKAT, which tests associations with the two phenotypes separately, when two causal variants are associated with both traits. The bottom panel shows empirical powers of MURAT, Maity's test with a linear kernel, and SKAT, when five causal variants are associated with multiple traits. The significance level is 0.05 for all tests, and the univariate SKAT tests are corrected for multiple comparisons.
Figure 3. Empirical powers of MURAT versus SKAT for trait 1 at a significance level of 0.05. Causal variants are associated with only the first trait. The results for SKAT on trait 1 are not adjusted for multiple testing.
Top genes selected from association testing of BMD phenotypes with exome-sequencing variants
| ρ | ||||||
|---|---|---|---|---|---|---|
| 2 | 34 | 1.02 × 10⁻¹ | 2.46 × 10⁻⁹ | 3.05 × 10⁻⁸ | 0.5 | |
| 6 | 42 | 2.39 × 10⁻⁶ | 4.43 × 10⁻² | 6.12 × 10⁻⁷ | 0.6 | |
| 20 | 3 | 1.12 × 10⁻² | 1.14 × 10⁻¹ | 7.73 × 10⁻⁶ | 0 | |
Abbreviations: BMD, bone mineral density; FN, femoral neck; LS, lumbar spine; MURAT, Multivariate Rare-Variant Association Test; SKAT, sequence kernel association test; SNP, single-nucleotide polymorphism.
Results with P-values ≤1 × 10⁻⁵ are shown. SKAT tests were not corrected for multiple comparisons. The last column reports the value of the correlation parameter ρv that gave the minimum P-value in the MURAT test.
Figure 4. The left panel shows the Q–Q plot for MURAT P-values and adjusted SKAT P-values on 19 123 genes in UK10K data analysis. The slopes for the MURAT Q–Q plot and adjusted SKAT Q–Q plot are 1.04 and 1.02, respectively. The right panel shows the comparison of −log₁₀(P-values) between MURAT and SKAT tests on each of the 19 123 genes. The SKAT results are corrected for multiple comparisons and the adjusted P-values are defined as twice the minimum of the LS- and FN-based P-values obtained via SKAT.
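A Q–Q plot of this kind pairs sorted observed P-values with the quantiles expected under a uniform null, both on the −log10 scale. The Python sketch below shows one way to compute those coordinates and a slope. The plotting positions (i − 0.5)/n, the least-squares fit through the origin, and the function names are all assumptions, since the caption does not say how the reported slopes were obtained.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10 P-value pairs for a Q-Q plot.

    Expected quantiles use plotting positions (i - 0.5) / n, an assumed
    convention; other choices such as i / (n + 1) are also common.
    All P-values must be strictly positive.
    """
    n = len(p_values)
    observed = sorted(p_values)
    expected = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]


def slope_through_origin(points):
    """Least-squares slope of observed on expected, with no intercept."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / sxx


# Hypothetical P-values that sit exactly on the null quantiles
null_like = [(i - 0.5) / 1000 for i in range(1, 1001)]
print(slope_through_origin(qq_points(null_like)))
```

On this scale a slope near 1 indicates P-values close to the uniform null, while a slope above 1 indicates P-values smaller than expected.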