Alessandra Valcarcel, Kelsey Grinde, Kaitlyn Cook, Alden Green, Nathan Tintle.
Abstract
The aggregation of functionally associated variants given a priori biological information can aid in the discovery of rare variants associated with complex diseases. Many methods exist that aggregate rare variants into a set and compute a single p value summarizing association between the set of rare variants and a phenotype of interest. These methods are often called gene-based, rare variant tests of association because the variants in the set are often all contained within the same gene. A reasonable extension of these approaches involves aggregating variants across an even larger set of variants (e.g., all variants contained in genes within a pathway). Testing sets of variants such as pathways for association with a disease phenotype reduces multiple testing penalties, may increase power, and allows for straightforward biological interpretation. However, a significant variant-set association test does not indicate precisely which variants contained within that set are causal. Because pathways often contain many variants, it may be helpful to follow up significant pathway tests by conducting gene-based tests on each gene in that pathway to narrow in on the region of causal variants. In this paper, we propose such a multistep approach for variant-set analysis that can also account for covariates and complex pedigree structure. We demonstrate this approach on simulated phenotypes from Genetic Analysis Workshop 19. We find generally better power for the multistep approach when compared to a more conventional, single-step approach that simply runs gene-based tests of association on each gene across the genome. Further work is necessary to evaluate the multistep approach on different data sets with different characteristics.
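The two-step logic described in the abstract can be illustrated with a minimal sketch. It assumes that pathway-level and gene-level p values have already been computed by some variant-set test (the covariate and pedigree adjustments are not shown). The function name `multistep_hits`, the parameters `pathway_alpha` and `gene_alpha`, and the toy p values are hypothetical and do not come from the paper.

```python
from typing import Dict, List


def multistep_hits(
    pathway_pvals: Dict[str, float],
    gene_pvals: Dict[str, float],
    pathways: Dict[str, List[str]],
    pathway_alpha: float = 0.05,  # assumed threshold for step 1
    gene_alpha: float = 0.05,     # assumed threshold for step 2
) -> List[str]:
    """Return genes that pass a pathway test and then a gene-based test."""
    hits = []
    for name, genes in pathways.items():
        # Step 1: only follow up pathways with a significant set-level test.
        if pathway_pvals[name] >= pathway_alpha:
            continue
        # Step 2: gene-based tests restricted to genes in that pathway.
        for gene in genes:
            if gene_pvals[gene] < gene_alpha:
                hits.append(gene)
    return hits


# Toy input (made-up values, for illustration only).
pathways = {"P1": ["A", "B"], "P2": ["C"]}
pathway_pvals = {"P1": 0.01, "P2": 0.20}
gene_pvals = {"A": 0.001, "B": 0.30, "C": 0.0001}

multistep = multistep_hits(pathway_pvals, gene_pvals, pathways)
# Conventional single-step alternative: test every gene at a stricter level.
single_step = [g for g, p in gene_pvals.items() if p < 0.005]
```

In this toy input, gene C is never tested in the multistep procedure because its pathway is not significant, which shows the trade-off: fewer gene-level tests, at the cost of depending on the pathway-level signal.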
Year: 2016 PMID: 27980661 PMCID: PMC5133510 DOI: 10.1186/s12919-016-0055-4
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Settings for creation of synthetic pathways
| Number of genes | 5 | 5 | 5 | 5 | 10 | 10 | 10 | 10 | Total |
|---|---|---|---|---|---|---|---|---|---|
| Percent causal | 20 | 40 | 60 | 100 | 20 | 40 | 60 | 100 | |
| Number of pathways | 15 | 11 | 10 | 5 | 15 | 11 | 10 | 5 | 82 |
Genes were randomly sampled without replacement and placed into sets of genes, or pathways, of varying sizes. The percent of genes in the pathway containing at least 1 causal variant also varied across sets.
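As a rough illustration of the sampling scheme, the following sketch builds 82 pathways from the settings in the table above. It assumes that genes are already split into two pools, those with at least 1 causal variant and those without; the pool names, the `build_pathways` function, and the seed are hypothetical, and the authors' actual procedure may differ.

```python
import random

# Percent causal -> number of pathways, applied to each pathway size.
SETTINGS = {20: 15, 40: 11, 60: 10, 100: 5}
SIZES = (5, 10)


def build_pathways(causal_pool, noncausal_pool, seed=0):
    """Sample genes without replacement into synthetic pathways."""
    rng = random.Random(seed)
    causal = list(causal_pool)
    noncausal = list(noncausal_pool)
    rng.shuffle(causal)
    rng.shuffle(noncausal)
    pathways = []
    for size in SIZES:
        for pct, count in SETTINGS.items():
            n_causal = size * pct // 100
            for _ in range(count):
                # pop() removes a gene from its pool, so no gene is reused;
                # it raises IndexError if a pool is too small.
                genes = [causal.pop() for _ in range(n_causal)]
                genes += [noncausal.pop() for _ in range(size - n_causal)]
                rng.shuffle(genes)
                pathways.append(
                    {"size": size, "percent_causal": pct, "genes": genes}
                )
    return pathways


# Placeholder gene labels, sized so that both pools are exactly used up.
causal_genes = [f"C{i}" for i in range(276)]
noncausal_genes = [f"N{i}" for i in range(339)]
synthetic = build_pathways(causal_genes, noncausal_genes)
```

Under these settings the design needs 615 distinct genes in total, 276 of them causal, so the pools must be at least that large.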
Fig. 1 Comparison of power for pathway and gene-based tests. a. The power of VC for each gene is shown, as well as the power of VC for the corresponding pathway that contains that gene. Blue dots (above the line y = x) represent genes for which the pathway power is higher than the gene power. b. This is the same setup as panel a, except it shows the power of Burden for each gene compared to the power of Burden for its corresponding pathway. In general, pathway tests are more powerful than gene-based tests.
Fig. 2 Comparison of power for multistep and conventional approaches. a. Comparison of power at each gene for multistep VC-VC versus the single-step VC approach, with significance evaluated at alpha levels of 0.05 and 0.005, respectively. The red points (below the line y = x) are genes for which the single-step approach has higher power, and the blue points (above the line y = x) are genes for which our multistep approach has higher power. There are quite a few instances in which the multistep approach offers relatively large improvements in power compared to the single-step gene-based test. b. Comparison of power for Burden-VC versus VC. c. Comparison of power for VC-Burden versus Burden. d. Comparison of power for Burden-Burden versus Burden. Notice again that there are many instances where the multistep approach offers improvements in power over the single-step gene-based test. Overall, our method outperforms the single-step tests by a larger amount, on average, when the first and second steps use the same test (VC-VC and Burden-Burden).
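The power comparison in the figure can be mimicked with a small sketch of empirical power, taken here as the fraction of simulated replicates in which a test is significant. This definition, the rule that a multistep "hit" needs both the pathway and the gene test to be significant in the same replicate, and all names and p values below are assumptions made for illustration; the paper's exact computation is not given in this section.

```python
from typing import Sequence


def empirical_power(pvals: Sequence[float], alpha: float) -> float:
    """Fraction of replicates with p < alpha."""
    return sum(p < alpha for p in pvals) / len(pvals)


def multistep_power(
    pathway_pvals: Sequence[float],
    gene_pvals: Sequence[float],
    pathway_alpha: float,
    gene_alpha: float,
) -> float:
    """Fraction of replicates where pathway and gene tests are both significant."""
    if len(pathway_pvals) != len(gene_pvals):
        raise ValueError("need one pathway and one gene p value per replicate")
    both = sum(
        pp < pathway_alpha and gp < gene_alpha
        for pp, gp in zip(pathway_pvals, gene_pvals)
    )
    return both / len(gene_pvals)


# Made-up p values for one gene and its pathway across four replicates.
gene_p = [0.001, 0.02, 0.3, 0.004]
pathway_p = [0.01, 0.03, 0.5, 0.2]

single = empirical_power(gene_p, alpha=0.005)
multi = multistep_power(pathway_p, gene_p, pathway_alpha=0.05, gene_alpha=0.05)
```

Plotting `multi` against `single` for every gene would give a scatter of the kind described in the caption, with points above the line y = x favoring the multistep approach.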
Comparison of power for genes showing at least a 10 percentage point increase in power for multistep pathway analysis
| Gene name | Pathway description | | Single-step power | | Multistep power | | | |
|---|---|---|---|---|---|---|---|---|
| | Number of genes | Percent causal | VCtest | | | VCtest-VCtest | VCtest- | |
| | 5 | 60 % | 0.010 | | | 0.015 | 0.010 | 0.005 |
| | 5 | 60 % | 0.000 | | | 0.035 | 0.035 | 0.040 |
| | 10 | 60 % | 0.000 | | | 0.050 | 0.020 | 0.005 |
| | 10 | 60 % | | 0.000 | 0.000 | 0.000 | | 0.000 |
| | 10 | 60 % | | 0.000 | 0.000 | 0.000 | | 0.005 |
| | 10 | 60 % | | 0.010 | 0.105 | | 0.010 | 0.000 |
| | 5 | 100 % | | 0.010 | 0.055 | | 0.085 | 0.005 |
Power for genes, as well as characteristics of the pathway they are contained in. Bolded entries represent the largest power achieved by each of the methods (single-step and multistep). In each of these scenarios, the multistep approach offers anywhere from a 10 to 40 % increase in power over the conventional single-step approaches.