Abstract
New high-throughput sequencing technologies have brought forth opportunities for unbiased analysis of thousands of rare genomic variants in genome-wide association studies of complex diseases. Because it is hard to detect single rare variants with appreciable effect sizes at the population level, existing methods mostly aggregate effects of multiple markers by collapsing the rare variants in genes (or genomic regions). We hypothesize that a higher level of aggregation can further improve association signal strength. Using the Genetic Analysis Workshop 17 simulated data, we test a two-step strategy that first applies a collapsing method in a gene-level analysis and then aggregates the gene-level test results by performing an enrichment analysis in gene sets. We find that the gene set approach, which combines signals across multiple genes, outperforms testing individual genes separately and that the power of the gene set enrichment test is further improved by proper adjustment of statistics to account for gene-wise differences.
Year: 2011 PMID: 22373052 PMCID: PMC3287890 DOI: 10.1186/1753-6561-5-S9-S52
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
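The collapsing step described in the abstract can be illustrated with a minimal sketch. It assumes a genotype matrix with one row per individual and one column per rare variant in a single gene, holding minor-allele counts (0, 1, or 2). The function names `collapse_indicator` and `collapse_count` are hypothetical; they loosely echo the CMC-1 and CMC-count labels used in the tables, which this page does not define, so the exact scoring used by the authors may differ.

```python
def collapse_indicator(genotypes):
    """Return 1 for each individual carrying at least one rare allele in the gene, else 0.

    genotypes: list of rows, one per individual; each row lists minor-allele
    counts (0, 1, or 2) for the rare variants in one gene (assumed layout).
    """
    return [1 if any(count > 0 for count in person) else 0 for person in genotypes]


def collapse_count(genotypes):
    """Return the total number of rare alleles each individual carries in the gene."""
    return [sum(person) for person in genotypes]


if __name__ == "__main__":
    # Three individuals, three rare variants in one hypothetical gene.
    gene = [
        [0, 0, 0],
        [1, 0, 2],
        [0, 1, 0],
    ]
    print(collapse_indicator(gene))
    print(collapse_count(gene))
```

Either collapsed score can then be tested against the phenotype as a single gene-level predictor, which is what makes rare variants with individually weak signals testable together.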
False positive rates at a nominal significance level of 0.05 in step 1
| | CMC-1 | CMC-count | WeightSum1 | WeightSum2 |
|---|---|---|---|---|
| Irrelevant genes | 0.087 | 0.087 | 0.108 | 0.086 |
| Irrelevant genes excluding spurious ones | 0.060 | 0.060 | 0.059 | 0.052 |
| Permutations | 0.050 | 0.050 | 0.035 | 0.050 |
False positive rates were calculated by counting significant values in either irrelevant genes or in 2,000 permutation tests.
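The counting described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the test statistic (absolute difference in mean phenotype between carriers and non-carriers) and the names `permutation_p_value` and `false_positive_rate` are hypothetical, and only the default of 2,000 permutations and the 0.05 threshold come from the text.

```python
import random


def mean_difference(phenotype, carrier):
    """Absolute difference in mean phenotype between carriers and non-carriers."""
    carriers = [y for y, c in zip(phenotype, carrier) if c]
    others = [y for y, c in zip(phenotype, carrier) if not c]
    if not carriers or not others:
        return 0.0
    return abs(sum(carriers) / len(carriers) - sum(others) / len(others))


def permutation_p_value(phenotype, carrier, n_perm=2000, seed=0):
    """Empirical p-value obtained by shuffling phenotypes across individuals."""
    rng = random.Random(seed)
    observed = mean_difference(phenotype, carrier)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_difference(shuffled, carrier) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def false_positive_rate(p_values, alpha=0.05):
    """Fraction of null tests (irrelevant genes or permutations) declared significant."""
    return sum(p < alpha for p in p_values) / len(p_values)


if __name__ == "__main__":
    # Hypothetical p-values from four irrelevant genes.
    print(false_positive_rate([0.01, 0.20, 0.04, 0.50]))
```

A rate close to the nominal 0.05 indicates a well-calibrated test; values well above it, as in the first row of the table, point to inflation.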
Power to detect the risk genes for Q2 in step 1
| Gene | CMC-1 | CMC-count | WeightSum1 | WeightSum2 |
|---|---|---|---|---|
| | 0.36 | 0.28 | 0.04 | <0.01 |
| | 0.52 | 0.52 | 0.64 | 0.52 |
| | 0.08 | 0.08 | <0.01 | 0.12 |
| | 0.24 | 0.24 | 0.04 | 0.04 |
| | 0.52 | 0.52 | 0.04 | <0.01 |
| | <0.01 | <0.01 | <0.01 | <0.01 |
| | 0.16 | 0.16 | 0.24 | 0.12 |
| | 0.40 | 0.52 | 0.52 | 0.36 |
| | 0.44 | 0.44 | <0.01 | <0.01 |
| | 0.04 | 0.04 | <0.01 | 0.12 |
| | 0.88 | 0.92 | 0.96 | 0.96 |
| | 0.80 | 0.80 | 0.24 | 0.12 |
| | 0.04 | 0.04 | 0.28 | 0.20 |
Power is shown at a significance level of 0.05.
False positive rates of GSEA and VSEA at a nominal significance level of 0.05 in step 2
| | | Gene-based test | | | |
|---|---|---|---|---|---|
| | Method | CMC-1 | CMC-count | WeightSum1 | WeightSum2 |
| Spurious genes present | GSEA | 0.057 | 0.060 | 0.097 | 0.089 |
| | VSEA | <0.001 | 0.086 | 0.080 | 0.080 |
| Spurious genes excluded | GSEA | 0.047 | 0.047 | 0.056 | 0.070 |
| | VSEA | 0.061 | 0.060 | 0.070 | 0.070 |
Figure 1. Power of two types of gene set enrichment tests in step 2. Gene set enrichment analysis aggregates the results of gene-based tests for a group of genes. We tested the 13 genes contributing to the Q2 phenotype and used the genes for the Q1 phenotype as a negative reference. Noise was introduced to the Q2 gene set by adding 5, 10, 15, or 20 genes. In the last two gene sets, part of the true signal was removed by randomly excluding 5 or 10 risk genes. Power is shown for two types of enrichment tests: GSEA (gene-level test scores unadjusted) and VSEA (gene-level test scores adjusted). Tests were performed before (dashed lines) and after (solid lines) excluding spurious genes. These gene set tests were based on four gene-level tests (CMC-1, CMC-count, WeightSum1, and WeightSum2).
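The second step, aggregating gene-level results within a gene set, can be sketched with an unweighted, Kolmogorov-Smirnov-like running-sum enrichment score of the kind used in GSEA-style tests. Everything below is an assumption made for illustration: the page does not specify how VSEA adjusts gene scores, so `standardize` (a z-score against each gene's own permutation null) is only one plausible form of gene-wise adjustment, and the gene-label permutation in `enrichment_p_value` is a simplification rather than the authors' procedure.

```python
import random
import statistics


def enrichment_score(ranked_genes, gene_set):
    """Signed maximum deviation of a running sum over the ranked gene list."""
    in_set = set(gene_set)
    n_hit = sum(g in in_set for g in ranked_genes)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for g in ranked_genes:
        running += 1.0 / n_hit if g in in_set else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


def standardize(observed, null_scores):
    """Hypothetical gene-wise adjustment: z-score against the gene's own null scores."""
    sd = statistics.stdev(null_scores)
    if sd == 0:
        return 0.0
    return (observed - statistics.mean(null_scores)) / sd


def enrichment_p_value(scores, gene_set, n_perm=1000, seed=0):
    """Observed score and a one-sided empirical p-value from shuffling gene ranks."""
    rng = random.Random(seed)
    ranked = sorted(scores, key=scores.get, reverse=True)
    observed = enrichment_score(ranked, gene_set)
    shuffled = list(ranked)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if enrichment_score(shuffled, gene_set) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical gene-level scores; genes "a" and "b" form the tested set.
    scores = {"a": 4.0, "b": 3.5, "c": 1.0, "d": 0.8, "e": 0.3, "f": 0.1}
    print(enrichment_p_value(scores, {"a", "b"}))
```

Passing raw gene-level scores corresponds to the unadjusted case, while applying `standardize` to each gene before ranking illustrates how an adjustment for gene-wise differences could change the ranking that the enrichment score sees.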