Stefan Konigorski, Yildiz E Yilmaz, Tobias Pischon.
Abstract
In genetic association studies of rare variants, low statistical power and potential violations of established estimator properties are among the main challenges of association tests. Multi-marker tests (MMTs) have been proposed to target these challenges, but any comparison with single-marker tests (SMTs) has to consider that their aim is to identify causal genomic regions instead of variants. Valid power comparisons have been performed for the analysis of binary traits indicating that MMTs have higher power, but there is a lack of conclusive studies for quantitative traits. The aim of our study was therefore to fairly compare SMTs and MMTs in their empirical power to identify the same causal loci associated with a quantitative trait. The results of extensive simulation studies indicate that previous results for binary traits cannot be generalized. First, we show that for the analysis of quantitative traits, conventional estimation methods and test statistics of single-marker approaches have valid properties yielding association tests with valid type I error, even when investigating singletons or doubletons. Furthermore, SMTs lead to more powerful association tests for identifying causal genes than MMTs when the effect sizes of causal variants are large, and less powerful tests when causal variants have small effect sizes. For moderate effect sizes, whether SMTs or MMTs have higher power depends on the sample size and percentage of causal SNVs. For a more complete picture, we also compare the power in studies of quantitative and binary traits, and the power to identify causal genes with the power to identify causal rare variants. In a genetic association analysis of systolic blood pressure in the Genetic Analysis Workshop 19 data, SMTs yielded smaller p-values compared to MMTs for most of the investigated blood pressure genes, and were least influenced by the definition of gene regions.
Year: 2017 PMID: 28562689 PMCID: PMC5451057 DOI: 10.1371/journal.pone.0178504
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Table 1. Detailed overview of the different scenarios considered for the type I error and power study.
| Investigation | Scenario | % of causal variants | Effect size weights | Direction of effect: Positive / Negative | Median (MAD) of explained variance in % |
|---|---|---|---|---|---|
| Type I error | 0 | 0% | c = 0 | - | - |
| Power | 1 | 5% | c = 0.6 | 100% / 0% | 0.9% (0.6) |
| | 2 | 5% | c = 0.3 | 100% / 0% | 0.2% (0.2) |
| | 3 | 5% | c = 0.2 | 100% / 0% | 0.1% (0.1) |
| | 4 | 10% | c = 0.6 | 100% / 0% | 1.9% (1.4) |
| | 5 | 10% | c = 0.3 | 100% / 0% | 0.5% (0.3) |
| | 6 | 10% | c = 0.2 | 100% / 0% | 0.2% (0.2) |
| | 7 | 20% | c = 0.6 | 100% / 0% | 3.8% (2.1) |
| | 8 | 20% | c = 0.3 | 100% / 0% | 1.0% (0.6) |
| | 9 | 20% | c = 0.2 | 100% / 0% | 0.4% (0.2) |
| | 10 | 50% | c = 0.6 | 100% / 0% | 9.1% (3.0) |
| | 11 | 50% | c = 0.3 | 100% / 0% | 2.4% (0.9) |
| | 12 | 50% | c = 0.2 | 100% / 0% | 1.1% (0.4) |
| | 13–24 | As in scenarios 1–12 | As in scenarios 1–12 | 80% / 20% | As in scenarios 1–12 |
| | 25–36 | As in scenarios 1–12 | As in scenarios 1–12 | 50% / 50% | As in scenarios 1–12 |
The scenarios vary the percentage of causal variants, their effect size, and the percentage of causal variants with effects in the positive / negative direction. Scenarios 13–24 and 25–36 have the same percentage of causal SNVs and the same effect sizes as scenarios 1–12, but with 80% / 20% and 50% / 50% of effects in the positive / negative direction, respectively. The percentage of causal rare variants is relative to the total number of rare variants with MAF ≤ 0.03 in the gene. The effect size of a variant with a given MAF on the trait Y is β = c ∙ |log₁₀(MAF)|. The percentage of explained variance for a given gene is calculated as the sum of per-variant contributions over all variants i in the gene. Reported are the median and the median absolute deviation (MAD) of this heritability estimate over the 10,000 replicates.
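To make the effect size rule concrete, the following minimal sketch evaluates β = c ∙ |log₁₀(MAF)| for the three weights used in the scenarios. The function name `effect_size` and the example MAF values are illustrative assumptions, not taken from the study; 0.0005 corresponds to a singleton among 1,000 diploid individuals (1 of 2,000 alleles), and 0.03 is the rare-variant threshold stated above.

```python
import math


def effect_size(maf: float, c: float) -> float:
    """Effect size beta = c * |log10(MAF)|, as stated in the table note.

    `effect_size` is a hypothetical helper name used for illustration only.
    """
    if not 0.0 < maf < 1.0:
        raise ValueError("MAF must lie strictly between 0 and 1")
    return c * abs(math.log10(maf))


# Illustrative MAF values (assumptions, not study data).
for maf in (0.0005, 0.01, 0.03):
    betas = [round(effect_size(maf, c), 3) for c in (0.6, 0.3, 0.2)]
    print(f"MAF={maf}: beta for c=0.6, 0.3, 0.2 -> {betas}")
```

Because the weight grows with |log₁₀(MAF)|, rarer variants receive larger effects under this rule, and halving c halves every effect size.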
Fig 1. Quantile-quantile plots of SMT statistic values for singletons, doubletons, and all SNVs.
Datasets of size n = 1,000 were generated from the null model described in scenario 0 in Table 1 for m = 10,000,000 replicates. Quantile-quantile plots are shown comparing the empirical quantiles of the t-test statistics of singletons (left panel), doubletons (middle panel), and all SNVs (right panel) to the theoretical quantiles of the t distribution with df = 1000 − 4 degrees of freedom. For computational purposes, each plot is based on a random sample of 1,000,000 t-test statistics, drawn from the 132,797,000 t-test statistics of all singletons in all replicates, from the 41,341,000 t-test statistics of all doubletons in all replicates, and from the 325,393,000 t-test statistics of all SNVs in all replicates. Approximate 95% point-wise confidence intervals are shown as grey ribbons.
Empirical type I error of MMTs and SMT for different nominal α levels.
| Nominal α | SKAT (MMT) | SKAT-O (MMT) | Burden (MMT) | SMT: linear regression, Bonferroni correction | SMT: linear regression, BH correction |
|---|---|---|---|---|---|
| 5 ∙ 10⁻² | 4.94 ∙ 10⁻² | 5.22 ∙ 10⁻² | 5.01 ∙ 10⁻² | 4.39 ∙ 10⁻² | 4.93 ∙ 10⁻² |
| 1 ∙ 10⁻² | 0.97 ∙ 10⁻² | 1.10 ∙ 10⁻² | 1.00 ∙ 10⁻² | 0.90 ∙ 10⁻² | 0.99 ∙ 10⁻² |
| 1 ∙ 10⁻³ | 0.94 ∙ 10⁻³ | 1.12 ∙ 10⁻³ | 1.00 ∙ 10⁻³ | 0.88 ∙ 10⁻³ | 0.97 ∙ 10⁻³ |
| 1 ∙ 10⁻⁴ | 0.92 ∙ 10⁻⁴ | 1.17 ∙ 10⁻⁴ | 1.03 ∙ 10⁻⁴ | 0.91 ∙ 10⁻⁴ | 0.98 ∙ 10⁻⁴ |
| 1 ∙ 10⁻⁵ | 0.92 ∙ 10⁻⁵ | 1.10 ∙ 10⁻⁵ | 1.12 ∙ 10⁻⁵ | 1.16 ∙ 10⁻⁵ | 1.25 ∙ 10⁻⁵ |
| 2.5 ∙ 10⁻⁶ | 2.40 ∙ 10⁻⁶ | 2.40 ∙ 10⁻⁶ | 3.50 ∙ 10⁻⁶ | 2.40 ∙ 10⁻⁶ | 2.90 ∙ 10⁻⁶ |
Data were generated from the null model with sample size n = 1,000 for m = 10,000,000 replicates.
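When reading the smallest α levels in this table, it helps to know how much Monte Carlo noise to expect. The sketch below is not part of the original analysis: it assumes each replicate yields one independent reject / do-not-reject decision, so the empirical type I error is a binomial proportion. The helper name `mc_summary` is hypothetical.

```python
import math


def mc_summary(alpha: float, m: int):
    """Expected rejection count and binomial standard error of the
    empirical type I error, assuming the true rejection rate equals alpha
    and the m replicates are independent (an assumption of this sketch)."""
    expected_rejections = alpha * m
    std_error = math.sqrt(alpha * (1.0 - alpha) / m)
    return expected_rejections, std_error


m = 10_000_000  # number of null replicates stated in the table note
for alpha in (5e-2, 1e-2, 1e-3, 1e-4, 1e-5, 2.5e-6):
    n_rej, se = mc_summary(alpha, m)
    print(f"alpha={alpha:g}  expected rejections={n_rej:g}  SE={se:.2e}")
```

Under these assumptions, α = 2.5 ∙ 10⁻⁶ corresponds to only about 25 expected rejections and a standard error of roughly 5 ∙ 10⁻⁷, so the entries in the last row, which range from 2.40 ∙ 10⁻⁶ to 3.50 ∙ 10⁻⁶, all lie within about two standard errors of the nominal level.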
Fig 2. Power estimates of the SMT and MMTs.
Datasets of size n = 1,000 were generated under the alternative-hypothesis models described in scenarios 1–36 in Table 1 for m = 10,000 replicates. The nominal α was set to 0.05 (upper panel) and 2.5 ∙ 10⁻⁶ (lower panel). In the lower panel with α = 2.5 ∙ 10⁻⁶, the coordinate system is shown on a log₁₀ scale to better visualize the small power differences between the approaches. Multiple testing corrections for the SMT of all SNVs in a gene were done using the BH correction.
Gene-level p-values for the association tests of candidate genes with SBP in the Genetic Analysis Workshop 19 data analysis.
| Gene | SKAT, all SNVs | SKAT, rare SNVs | SKAT-O, all SNVs | SKAT-O, rare SNVs | Burden, all SNVs | Burden, rare SNVs | SMT, all SNVs | SMT, rare SNVs |
|---|---|---|---|---|---|---|---|---|
| INSR | 0.51 | 0.73 | 0.72 | 0.76 | 0.27 | 0.70 | 0.14 | 0.07 |
| RRAS | 0.33 | 0.32 | 0.49 | 0.46 | 0.59 | 0.34 | 0.26 | 0.18 |
| ZNF101 | 0.94 | 0.93 | 0.91 | 0.87 | 0.97 | 0.72 | 0.98 | 0.97 |
| ELAVL3 | 0.46 | 0.44 | 0.29 | 0.42 | 0.12 | 0.27 | 0.76 | 0.74 |
| RGL3 | 0.21 | 0.13 | 0.32 | 0.22 | 0.46 | 0.22 | 0.03 | 0.02 |
| AMH | 0.30 | 0.24 | 0.46 | 0.38 | 0.65 | 0.73 | 0.18 | 0.12 |
| DOT1L | 0.59 | 0.60 | 0.78 | 0.79 | 0.43 | 0.73 | 0.70 | 0.51 |
| PLEKHJ1 | 0.59 | 0.50 | 0.79 | 0.72 | 0.33 | 0.67 | 0.18 | 0.13 |
| SF3A2 | 0.50 | 0.36 | 0.70 | 0.54 | 0.41 | 0.97 | 0.23 | 0.15 |
Unadjusted p-values from gene-level genetic association analysis of all common and rare SNVs (“all SNVs”), and only rare SNVs (“rare SNVs”) in 9 candidate genes with SBP. Adjustments for multiple testing of all SNVs in SMTs in a gene were done using the BH-correction.
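The step from per-SNV SMT p-values to a single gene-level p-value can be sketched as follows. This is a minimal illustration and not the authors' code: it assumes the gene-level SMT p-value is the smallest multiplicity-adjusted p-value among the SNVs in the gene, which the notes above suggest but do not spell out. The function names and the example p-values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bonferroni_adjust(pvalues):
    """Bonferroni adjusted p-values, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def gene_level_p(pvalues, method="bh"):
    """Smallest adjusted p-value in the gene (an assumption of this sketch)."""
    adjust = bh_adjust if method == "bh" else bonferroni_adjust
    return min(adjust(pvalues))


# Hypothetical per-SNV p-values for one gene (not study data).
snv_p = [0.01, 0.012, 0.015, 0.6, 0.9]
print("BH gene-level p:", gene_level_p(snv_p, "bh"))
print("Bonferroni gene-level p:", gene_level_p(snv_p, "bonferroni"))
```

With this definition, the BH-based gene-level p-value can never exceed the Bonferroni-based one, which fits the pattern in the type I error table, where the Bonferroni column is the more conservative of the two SMT columns at most α levels.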