Ping Zeng¹, Yang Zhao², Liwei Zhang², Shuiping Huang³, Feng Chen².
Abstract
This paper uses likelihood-based tests to detect rare variants associated with a continuous phenotype within the kernel machine learning framework. Both the likelihood ratio test (LRT) and the restricted likelihood ratio test (ReLRT) are investigated, and the relationship between kernel machine learning and the mixed effects model is discussed. Using the eigenvalue representations of the LRT and ReLRT, their exact finite-sample null distributions are obtained by simulation. Numerical studies evaluate the proposed approaches under both the standard mixed effects model and kernel machine learning. The results show that the LRT and ReLRT control the type I error at the nominal α level. The LRT and ReLRT consistently outperform SKAT regardless of the sample size and the proportion of causal rare variants with negative effects, and they suffer smaller power losses than SKAT when rare variants with both positive and negative effects are present. The LRT and ReLRT performed under kernel machine learning have slightly higher power than those performed under the standard mixed effects model. The Genetic Analysis Workshop 17 (GAW17) exome sequencing SNP data are analyzed as an illustrative example; the analysis yields several interesting results, and the paper closes with a discussion.
Year: 2014 PMID: 24675868 PMCID: PMC3968153 DOI: 10.1371/journal.pone.0093355
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
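The abstract says the exact finite-sample distributions of the LRT and ReLRT are obtained by simulating from their eigenvalue representation. A minimal sketch of that idea follows. It assumes the one-variance-component spectral representation of Crainiceanu and Ruppert (2004), which we take to be the representation meant here. It also assumes a kernel matrix K standing in for ZΣZᵀ and a log-spaced grid search over λ = τ/σ². The helper names `kernel_eigenvalues` and `simulate_null`, the toy genotypes, the linear kernel and the observed value 2.5 are all hypothetical and are not taken from the paper.

```python
import numpy as np


def kernel_eigenvalues(K, X, tol=1e-10):
    """Nonzero eigenvalues for the spectral representation (one variance component).

    mu: eigenvalues of P0 K P0 with P0 = I - X (X'X)^{-1} X'.
    xi: eigenvalues of K.
    With K = Z Sigma Z' these equal the nonzero eigenvalues of
    Sigma^{1/2} Z' P0 Z Sigma^{1/2} and Sigma^{1/2} Z' Z Sigma^{1/2}.
    """
    n = X.shape[0]
    P0 = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

    def nonzero(v):
        return v[v > tol * max(v.max(), 1.0)]

    return nonzero(np.linalg.eigvalsh(P0 @ K @ P0)), nonzero(np.linalg.eigvalsh(K))


def simulate_null(mu, xi, n, p, restricted=True, nsim=2000, seed=0):
    """Draw nsim null values of the ReLRT (restricted=True) or the LRT (restricted=False)."""
    rng = np.random.default_rng(seed)
    grid = np.concatenate(([0.0], np.logspace(-5, 5, 200)))   # lambda = tau / sigma^2
    k = mu.size
    w2 = rng.standard_normal((nsim, n - p)) ** 2               # squared N(0, 1) draws
    frac = grid[:, None] * mu[None, :]                         # lambda * mu_s, shape (G, k)
    num = w2[:, :k] @ (frac / (1.0 + frac)).T                  # N_n(lambda), shape (nsim, G)
    den = w2[:, :k] @ (1.0 / (1.0 + frac)).T + w2[:, k:].sum(axis=1, keepdims=True)
    if restricted:
        m, logdet = n - p, np.log1p(frac).sum(axis=1)          # ReLRT: (n - p), mu
    else:
        m, logdet = n, np.log1p(grid[:, None] * xi[None, :]).sum(axis=1)  # LRT: n, xi
    f = m * np.log1p(num / den) - logdet[None, :]              # f_n(lambda) on the grid
    return f.max(axis=1)                                       # sup over lambda >= 0 (f = 0 at 0)


rng = np.random.default_rng(1)
n = 100
G = rng.binomial(2, 0.02, size=(n, 20)).astype(float)   # hypothetical rare-variant genotypes
X = np.ones((n, 1))                                      # intercept-only fixed effects
K = G @ G.T                                              # linear kernel (assumption)
mu, xi = kernel_eigenvalues(K, X)
null_rlrt = simulate_null(mu, xi, n, X.shape[1], restricted=True)
null_lrt = simulate_null(mu, xi, n, X.shape[1], restricted=False)
observed = 2.5                                           # hypothetical observed ReLRT
p_value = np.mean(null_rlrt >= observed)
```

Because λ = 0 is on the grid and f_n(0) = 0, every simulated value is non-negative. A sizable fraction is exactly zero, which reflects the point mass at zero that makes the usual χ² reference distribution unreliable for variance-component tests.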
Simulation characteristics.

| Sample size | Total SNPs | Selected SNPs | Used rare variants | Causal rare variants |
|---|---|---|---|---|
| 300 | 417 | 125 | 41 | 12 |
| 400 | 434 | 130 | 47 | 14 |
| 500 | 447 | 134 | 51 | 15 |
Estimated type I error.

| Sample size | Burden | SKAT | SKAT-O | LRT.K | ReLRT.K | LRT.M | ReLRT.M |
|---|---|---|---|---|---|---|---|
| α = 0.05 | | | | | | | |
| 300 | 0.051 | 0.042 | 0.043 | 0.047 | 0.052 | 0.041 | 0.046 |
| 400 | 0.060 | 0.044 | 0.055 | 0.050 | 0.056 | 0.048 | 0.051 |
| 500 | 0.058 | 0.043 | 0.054 | 0.042 | 0.046 | 0.040 | 0.042 |
| α = 0.01 | | | | | | | |
| 300 | 0.010 | 0.008 | 0.008 | 0.011 | 0.012 | 0.010 | 0.010 |
| 400 | 0.012 | 0.008 | 0.012 | 0.010 | 0.012 | 0.010 | 0.010 |
| 500 | 0.010 | 0.008 | 0.010 | 0.010 | 0.010 | 0.009 | 0.010 |

The suffix .K denotes tests performed under kernel machine learning, and .M denotes tests performed under the standard mixed effects model.
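Whether an empirical rejection rate such as 0.060 counts as "controlling" the type I error depends on its Monte Carlo error. That error is driven by the number of null replicates, and this excerpt does not report it. The sketch below assumes a hypothetical `N_REP = 1000` and a normal-approximation binomial band α ± 1.96·√(α(1 − α)/N_REP). It checks the tabulated estimates against that band. The helper name `mc_band` is our own.

```python
import math

# Estimated type I error rates from the table above (all tests, n = 300, 400, 500).
ESTIMATES = {
    0.05: [0.051, 0.042, 0.043, 0.047, 0.052, 0.041, 0.046,
           0.060, 0.044, 0.055, 0.050, 0.056, 0.048, 0.051,
           0.058, 0.043, 0.054, 0.042, 0.046, 0.040, 0.042],
    0.01: [0.010, 0.008, 0.008, 0.011, 0.012, 0.010, 0.010,
           0.012, 0.008, 0.012, 0.010, 0.012, 0.010, 0.010,
           0.010, 0.008, 0.010, 0.010, 0.010, 0.009, 0.010],
}

N_REP = 1000  # hypothetical: the number of null replicates is not stated here


def mc_band(alpha, n_rep, z=1.96):
    """Approximate 95% Monte Carlo band for an empirical rejection rate at nominal alpha."""
    half = z * math.sqrt(alpha * (1.0 - alpha) / n_rep)
    return alpha - half, alpha + half


for alpha, values in ESTIMATES.items():
    lo, hi = mc_band(alpha, N_REP)
    outside = [v for v in values if not lo <= v <= hi]
    print(f"alpha={alpha}: band=({lo:.4f}, {hi:.4f}), outside band: {outside}")
```

Under the assumed 1000 replicates, every tabulated entry lies inside its band. A larger replicate count would narrow the bands and could change that conclusion.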
Figure 1. Estimated power for all the tests.
The top panel is for α = 0.01 and the bottom panel for α = 0.05. "M% negative" means that, among the causal SNPs, M% have effects −0.3|log₁₀MAF| and the remaining (100 − M)% have effects 0.3|log₁₀MAF|.
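The effect-size rule in the caption can be written out directly. Each causal variant gets a magnitude of 0.3·|log₁₀ MAF|, so rarer variants get larger effects, and a fraction M% of them get a negative sign. This is a minimal sketch. The MAF values are hypothetical, and choosing the negative variants uniformly at random is our assumption; the caption does not say how they were chosen. The helper name `causal_effects` is illustrative.

```python
import numpy as np


def causal_effects(maf, frac_negative, rng):
    """Effects 0.3*|log10(MAF)|, with a randomly chosen fraction made negative (assumption)."""
    maf = np.asarray(maf, dtype=float)
    beta = 0.3 * np.abs(np.log10(maf))                 # magnitude grows as MAF shrinks
    n_neg = int(round(frac_negative * maf.size))       # number of negative causal variants
    neg = rng.choice(maf.size, size=n_neg, replace=False)
    beta[neg] *= -1.0
    return beta


rng = np.random.default_rng(2014)
maf = [0.001, 0.005, 0.01, 0.002, 0.0005, 0.008, 0.003, 0.004, 0.006, 0.009]  # hypothetical
beta = causal_effects(maf, 0.3, rng)   # the "30% negative" scenario
```

For example, a variant with MAF = 0.001 gets |β| = 0.3 × 3 = 0.9, while one with MAF = 0.01 gets |β| = 0.6.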
Power losses for α = 0.05.

| Sample size | Burden | SKAT | SKAT-O | LRT.K | ReLRT.K | LRT.M | ReLRT.M |
|---|---|---|---|---|---|---|---|
| 30% negative | | | | | | | |
| 300 | 0.456 | 0.100 | 0.280 | 0.098 | 0.087 | 0.099 | 0.096 |
| 400 | 0.574 | 0.089 | 0.279 | 0.086 | 0.079 | 0.081 | 0.084 |
| 500 | 0.593 | 0.059 | 0.233 | 0.058 | 0.065 | 0.052 | 0.063 |
| Average | 0.541 | 0.083 | 0.264 | 0.081 | 0.077 | 0.077 | 0.081 |
| 50% negative | | | | | | | |
| 300 | 0.537 | 0.151 | 0.356 | 0.128 | 0.132 | 0.124 | 0.127 |
| 400 | 0.667 | 0.096 | 0.313 | 0.073 | 0.066 | 0.073 | 0.071 |
| 500 | 0.692 | 0.068 | 0.247 | 0.050 | 0.052 | 0.043 | 0.053 |
| Average | 0.632 | 0.105 | 0.305 | 0.084 | 0.083 | 0.080 | 0.084 |

Values are the power when none (0%) of the causal variants are negative minus the power when 30% or 50% of them are negative.
30% (50%) negative: 30% (50%) of the causal variants are negatively associated with the phenotype, with effects −0.3|log₁₀MAF|, and the remaining 70% (50%) are positively associated, with effects 0.3|log₁₀MAF|.
Average: mean across the three sample sizes.
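The "Average" rows are plain column means over n = 300, 400 and 500. The short check below recomputes them from the α = 0.05 table and compares them with the reported three-decimal values. The numbers are copied from the table; the names `LOSSES`, `REPORTED_AVERAGE` and `average_losses` are our own.

```python
import math

TESTS = ["Burden", "SKAT", "SKAT-O", "LRT.K", "ReLRT.K", "LRT.M", "ReLRT.M"]

# Power losses at alpha = 0.05 from the table above; rows are n = 300, 400, 500.
LOSSES = {
    "30% negative": [[0.456, 0.100, 0.280, 0.098, 0.087, 0.099, 0.096],
                     [0.574, 0.089, 0.279, 0.086, 0.079, 0.081, 0.084],
                     [0.593, 0.059, 0.233, 0.058, 0.065, 0.052, 0.063]],
    "50% negative": [[0.537, 0.151, 0.356, 0.128, 0.132, 0.124, 0.127],
                     [0.667, 0.096, 0.313, 0.073, 0.066, 0.073, 0.071],
                     [0.692, 0.068, 0.247, 0.050, 0.052, 0.043, 0.053]],
}
REPORTED_AVERAGE = {
    "30% negative": [0.541, 0.083, 0.264, 0.081, 0.077, 0.077, 0.081],
    "50% negative": [0.632, 0.105, 0.305, 0.084, 0.083, 0.080, 0.084],
}


def average_losses(rows):
    """Column-wise mean across the sample-size rows."""
    return [sum(column) / len(column) for column in zip(*rows)]


for scenario, rows in LOSSES.items():
    for name, mean, reported in zip(TESTS, average_losses(rows), REPORTED_AVERAGE[scenario]):
        # The table rounds to three decimals, so allow half a unit in the last place.
        assert math.isclose(mean, reported, abs_tol=5e-4), (scenario, name, mean)
print("all reported averages reproduced")
```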
Power losses for α = 0.01.

| Sample size | Burden | SKAT | SKAT-O | LRT.K | ReLRT.K | LRT.M | ReLRT.M |
|---|---|---|---|---|---|---|---|
| 30% negative | | | | | | | |
| 300 | 0.355 | 0.080 | 0.233 | 0.072 | 0.078 | 0.068 | 0.074 |
| 400 | 0.490 | 0.092 | 0.330 | 0.081 | 0.086 | 0.083 | 0.083 |
| 500 | 0.530 | 0.052 | 0.279 | 0.039 | 0.038 | 0.038 | 0.036 |
| Average | 0.458 | 0.075 | 0.281 | 0.064 | 0.067 | 0.063 | 0.064 |
| 50% negative | | | | | | | |
| 300 | 0.393 | 0.124 | 0.301 | 0.094 | 0.094 | 0.090 | 0.091 |
| 400 | 0.545 | 0.105 | 0.373 | 0.089 | 0.092 | 0.090 | 0.087 |
| 500 | 0.592 | 0.074 | 0.310 | 0.054 | 0.065 | 0.054 | 0.056 |
| Average | 0.510 | 0.101 | 0.328 | 0.079 | 0.084 | 0.078 | 0.078 |

Values are the power when none (0%) of the causal variants are negative minus the power when 30% or 50% of them are negative.
30% (50%) negative: 30% (50%) of the causal variants are negatively associated with the phenotype, with effects −0.3|log₁₀MAF|, and the remaining 70% (50%) are positively associated, with effects 0.3|log₁₀MAF|.
Average: mean across the three sample sizes.
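The abstract states that the LRT and ReLRT lose less power than SKAT when effects of both signs are present. The check below tests that claim cell by cell on the α = 0.01 table. It lists every (scenario, n, test) cell where a likelihood-based test loses more power than SKAT in the same row. The data are copied from the table; the function name `cells_worse_than_skat` is our own.

```python
TESTS = ["Burden", "SKAT", "SKAT-O", "LRT.K", "ReLRT.K", "LRT.M", "ReLRT.M"]
SIZES = [300, 400, 500]

# Power losses at alpha = 0.01 from the table above; rows are n = 300, 400, 500.
LOSSES_001 = {
    "30% negative": [[0.355, 0.080, 0.233, 0.072, 0.078, 0.068, 0.074],
                     [0.490, 0.092, 0.330, 0.081, 0.086, 0.083, 0.083],
                     [0.530, 0.052, 0.279, 0.039, 0.038, 0.038, 0.036]],
    "50% negative": [[0.393, 0.124, 0.301, 0.094, 0.094, 0.090, 0.091],
                     [0.545, 0.105, 0.373, 0.089, 0.092, 0.090, 0.087],
                     [0.592, 0.074, 0.310, 0.054, 0.065, 0.054, 0.056]],
}


def cells_worse_than_skat(losses):
    """Return (scenario, n, test) cells where an LRT/ReLRT variant loses more power than SKAT."""
    skat = TESTS.index("SKAT")
    worse = []
    for scenario, rows in losses.items():
        for n, row in zip(SIZES, rows):
            for name, value in zip(TESTS[3:], row[3:]):   # LRT.K, ReLRT.K, LRT.M, ReLRT.M
                if value > row[skat]:
                    worse.append((scenario, n, name))
    return worse


print(cells_worse_than_skat(LOSSES_001))
```

At α = 0.01 the list is empty. Feeding in the α = 0.05 table instead flags two cells, ReLRT.K and ReLRT.M at n = 500 with 30% negative effects (0.065 and 0.063 versus 0.059 for SKAT). At that level the advantage therefore holds on average rather than in every cell.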
Characteristics of the GAW17 data used.

| Gene | Chr | Total | Rare | Causal | MAF | Causal effects |
|---|---|---|---|---|---|---|
| | 19 | 21 | 15 | 3 | 7.17×10⁻³ ∼ 0.385 | 0.174668, 0.51468, 0.265181 |
| | 13 | 35 | 25 | 11 | 7.17×10⁻³ ∼ 0.291 | 0.18047, 0.457361, 0.732566, 0.839669, 0.38582, 0.549816, 0.623466, 0.653351, 0.59670, 0.549214, 0.090586 |
| | 4 | 16 | 14 | 10 | 7.17×10⁻³ ∼ 0.165 | 0.598271, 0.715613, 0.503025, 1.17194, 0.149975, 0.610938, 0.318125, 0.312058, 1.171940, 0.417977 |

Chr: chromosome; Total: total number of SNPs in the gene; Rare: number of rare SNPs in the gene; Causal: number of causal SNPs in the gene; MAF: range of minor allele frequencies; Causal effects: effect sizes of the causal SNPs.
Results of the GAW17 data analysis.

| Gene | Burden p value | SKAT p value | SKAT-O p value | LRT p value | ReLRT p value | LRT λ | ReLRT λ |
|---|---|---|---|---|---|---|---|
| | 0.262 | 0.483 | 0.420 | 0.388 | 0.387 | <0.001 | <0.001 |
| | 6.12×10⁻⁸ | 9.01×10⁻⁷ | 1.03×10⁻⁹ | 6.28×10⁻⁷ | 5.44×10⁻⁷ | 0.750 | 0.748 |
| | 9.27×10⁻⁷ | 1.29×10⁻³ | 2.78×10⁻⁶ | 4.99×10⁻⁵ | 4.83×10⁻⁵ | 1.778 | 1.767 |
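The λ columns above are reported next to the LRT and ReLRT results. We read them as the maximizing variance ratio λ = τ/σ², but that reading is an assumption, since this excerpt does not define λ. The sketch assumes the usual kernel machine model y = Xβ + h + ε with h ~ N(0, τK), which can be written as a mixed model with Var(y) = σ²(I + λK). It computes the observed LRT or ReLRT statistic and λ̂ by maximizing the profile ML or REML log-likelihood over a grid, using one eigendecomposition of K. The function names, the grid and the toy data are hypothetical.

```python
import numpy as np


def profile_loglik(lam, d, uy, uX, restricted):
    """Profile (RE)ML log-likelihood, up to a constant, at variance ratio lam = tau / sigma^2."""
    n, p = uX.shape
    w = 1.0 / (1.0 + lam * d)                                  # eigenvalues of V^{-1}
    XtVX = uX.T @ (w[:, None] * uX)
    XtVy = uX.T @ (w * uy)
    q = np.sum(w * uy ** 2) - XtVy @ np.linalg.solve(XtVX, XtVy)   # y' P_V y
    logdet_v = np.sum(np.log1p(lam * d))
    if restricted:
        return -0.5 * (logdet_v + np.linalg.slogdet(XtVX)[1] + (n - p) * np.log(q))
    return -0.5 * (logdet_v + n * np.log(q))


def variance_component_test(y, X, K, restricted=True):
    """Return (statistic, lambda_hat) for H0: tau = 0 in y = X b + h + e, h ~ N(0, tau K)."""
    grid = np.concatenate(([0.0], np.logspace(-4, 4, 400)))
    d, U = np.linalg.eigh(K)
    d = np.clip(d, 0.0, None)                                  # drop tiny negative round-off
    uy, uX = U.T @ y, U.T @ X                                  # rotate once; V shares U
    ll = np.array([profile_loglik(lam, d, uy, uX, restricted) for lam in grid])
    best = int(np.argmax(ll))
    return 2.0 * (ll[best] - ll[0]), grid[best]               # grid[0] = 0 is the null


rng = np.random.default_rng(17)
n = 200
G = rng.binomial(2, 0.2, size=(n, 10)).astype(float)       # hypothetical genotypes
X = np.ones((n, 1))                                         # intercept only
y = G @ rng.normal(0.0, 1.0, size=10) + rng.normal(size=n)  # true lambda = 1 under K = G G'
K = G @ G.T                                                 # linear kernel (assumption)
relrt, lam_relrt = variance_component_test(y, X, K, restricted=True)
lrt, lam_lrt = variance_component_test(y, X, K, restricted=False)
```

Rotating y and X by the eigenvectors of K once makes each grid evaluation cheap, because V = I + λK shares those eigenvectors. The observed statistic would then be compared with a simulated null distribution rather than a χ² reference.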