Carmen Dering1, Inke R König1, Laura B Ramsey2, Mary V Relling2, Wenjian Yang2, Andreas Ziegler3.
Abstract
The advent of next-generation sequencing (NGS) technologies has enabled investigation of the rare variant-common disease hypothesis in unrelated individuals, even at the genome-wide level. Analysis of this hypothesis requires tailored statistical methods because single-marker tests fail on rare variants. An entire class of statistical methods collapses, that is, aggregates, rare variants from a genomic region of interest (ROI). In an extensive simulation study using data from the Genetic Analysis Workshop 17, we compared the performance of 15 collapsing methods across a variety of pre-defined ROIs that differed in minor allele frequency threshold and variant functionality. Findings of the simulation study were additionally confirmed in a real data set investigating the association between methotrexate clearance and the SLCO1B1 gene in patients with acute lymphoblastic leukemia. Our analyses showed substantially inflated type I error levels for many of the proposed collapsing methods. Only four approaches yielded valid type I errors in all considered scenarios. None of the statistical tests was able to detect true associations in a substantial proportion of replicates in the simulated data. Detailed annotation of the functionality of variants is crucial for detecting true associations. These findings were confirmed in the analysis of the real data. Recent theoretical work showed that large power is achieved in gene-based analyses only if large sample sizes are available and a substantial proportion of causal rare variants is present in the gene-based analysis. Many of the investigated statistical approaches rely on permutation, which incurs high computational cost. There is a clear need for valid, powerful, and fast-to-calculate test statistics for studies investigating rare variants.
Keywords: SLCO1B1; burden test; collapsing; comparison; rare variants; simulation study
Year: 2014 PMID: 25309579 PMCID: PMC4164031 DOI: 10.3389/fgene.2014.00323
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Properties of 15 collapsing methods: Year of publication, burden (B) or non-burden (NB) test, both common and rare variants used in the analysis [yes (Y)/no (N)], test considers different effect directions (Y/N), test able to handle the presence of non-causal variants (Y/N), phenotype quantitative (Q) and/or binary (B), covariates can be included (Y/N), .
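| Method | Year | B/NB | Common and rare | Effect directions | Non-causal variants | Phenotype | Covariates | D/P |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| CAST | 2007 | B | N | N | N | B | N | D |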
| CMC | 2008 | B | Y | N | N | B | N | P |
| RVT1 | 2009 | B | N | N | N | B/Q | Y | D |
| RVT2 | 2009 | B | N | N | N | B/Q | Y | D |
| WSS | 2009 | B | N | N | N | B | N | P |
| RC | 2010 | B | N | N | N | B | N | P |
| aSum | 2010 | NB | N | Y | N | B | N | P |
| VT | 2010 | B | N | Y | N | B/Q | Y | P |
| KBAC | 2010 | NB | N | N | Y | B | N | P |
| CMAT | 2010 | B | N | N | N | B | N | P |
| C-α | 2011 | NB | N | Y | Y | B | N | D |
| FPCA | 2011 | B | N | Y | N | B | Y | D |
| PWST | 2011 | NB | N | Y | N | B/Q | N | P |
| SKAT | 2011 | NB | Y | Y | Y | B/Q | Y | D |
| SKAT-O | 2012 | NB | Y | Y | Y | B/Q | Y | D |
aSum, adaptive summation; CAST, cohort allelic sum test; CMAT, cumulative minor-allele test; CMC, combined multivariate cluster; FPCA, functional principal component analysis; KBAC, kernel-based adaptive cluster; PWST, p-value weighted sum test; RC, RARECOVER; RVT, rare variant test 1 and 2; SKAT, sequencing kernel association test; SKAT-O, optimal unified SKAT; VT, variable threshold; WSS, weighted sum statistic.
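To make the idea of collapsing concrete, the following is a minimal sketch of a carrier-indicator burden test in the spirit of methods such as CAST. It is not the implementation evaluated in the study: the function names, the toy genotype data, and the choice of a two-sided Fisher exact test on the collapsed 2x2 table are assumptions made for illustration only.

```python
from math import comb

def collapse_carriers(genotypes, maf_threshold=0.01):
    """Return one 0/1 flag per individual: carries at least one rare variant.

    genotypes: one list per individual with minor-allele counts (0, 1, 2),
    one entry per variant in the region of interest (hypothetical layout).
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    # Minor allele frequency per variant, estimated from the sample itself.
    maf = [sum(g[j] for g in genotypes) / (2 * n) for j in range(n_variants)]
    rare = [j for j in range(n_variants) if 0 < maf[j] < maf_threshold]
    return [int(any(g[j] > 0 for j in rare)) for g in genotypes]

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x carriers in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Toy data (hypothetical): 100 cases followed by 100 controls, 5 variants.
genotypes = [[i % 2, 0, 0, 0, 0] for i in range(200)]  # variant 0 is common
rare_carriers = {1: (0, 1, 2), 2: (3, 4, 5), 3: (6, 7, 8), 4: (9, 100, 101)}
for variant, people in rare_carriers.items():
    for person in people:
        genotypes[person][variant] = 1
is_case = [True] * 100 + [False] * 100

carrier = collapse_carriers(genotypes)
case_carriers = sum(f for f, y in zip(carrier, is_case) if y)
control_carriers = sum(f for f, y in zip(carrier, is_case) if not y)
p_value = fisher_exact_two_sided(case_carriers, 100 - case_carriers,
                                 control_carriers, 100 - control_carriers)
print(case_carriers, control_carriers, p_value)
```

In this sketch the common variant is excluded by the MAF threshold, and each individual contributes a single indicator regardless of how many rare alleles they carry. That loss of information is one reason the weighted and non-burden alternatives listed in the table were proposed.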
Figure 1. Q-Q plots for the 15 collapsing methods at a minor allele frequency (MAF) threshold of 0.01, restricted to non-synonymous variants, with affection status as the phenotype and no covariates. The x-axis shows the expected −log10-transformed p-values under a uniform distribution; the y-axis shows the observed median −log10-transformed p-values over 200 replicates, surrounded by a ribbon spanning the first and third quartiles of the p-values in the 200 replicates. aSum, adaptive summation; CAST, cohort allelic sum test; CMAT, cumulative minor-allele test; CMC, combined multivariate cluster; FPCA, functional principal component analysis; KBAC, kernel-based adaptive cluster; PWST, p-value weighted sum test; RC, RARECOVER; RVT, rare variant test 1 and 2; SKAT, sequencing kernel association test; SKAT-O, optimal unified SKAT; VT, variable threshold; WSS, weighted sum statistic.
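The coordinates of such a Q-Q plot with a quartile ribbon could be computed roughly as follows. This is a sketch under stated assumptions rather than the authors' plotting code: the plotting position (i − 0.5)/m for the expected uniform quantiles, the quartile definition used by `statistics.quantiles`, and the simulated null p-values are all illustrative choices.

```python
import random
from math import log10
from statistics import quantiles

def qq_coordinates(replicate_pvalues):
    """Return (expected, q1, median, q3) on the -log10 scale for each rank.

    replicate_pvalues: list of replicates, each a list of m p-values in (0, 1].
    """
    m = len(replicate_pvalues[0])
    # Expected Uniform(0, 1) quantiles; (i - 0.5) / m is an assumed convention.
    expected = [-log10((i - 0.5) / m) for i in range(1, m + 1)]
    # Sort within each replicate so that rank i is compared across replicates.
    ranked = [sorted(p) for p in replicate_pvalues]
    rows = []
    for i in range(m):
        observed = [-log10(r[i]) for r in ranked]
        q1, med, q3 = quantiles(observed, n=4)
        rows.append((expected[i], q1, med, q3))
    return rows

rng = random.Random(0)
# 200 simulated null replicates of 50 p-values each; 1 - random() lies in (0, 1].
replicates = [[1.0 - rng.random() for _ in range(50)] for _ in range(200)]
rows = qq_coordinates(replicates)
print(rows[0])
```

For a valid test under the null hypothesis, the median curve should follow the diagonal; a median curve that lies systematically above it indicates inflated type I error.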
Type I error levels and power for collapsing with minor allele frequency < 0.01 for both non-synonymous and gene-based variants.
| Variants | Method | Type I error | Avg power | Min power |
| --- | --- | --- | --- | --- |
| Non-synonymous | aSum | 0.12 | (0.19) | (1.00) |
| Non-synonymous | C-α | 0.07 | 0.13 | 0.90 |
| Non-synonymous | CAST | 0.06 | 0.11 | 0.93 |
| Non-synonymous | CMAT | 0.05 | 0.10 | 1.00 |
| Non-synonymous | CMC | 0.10 | (0.16) | (0.93) |
| Non-synonymous | FPCA | 0.05 | 0.05 | 0.57 |
| Non-synonymous | KBAC | 0.03 | 0.06 | 0.77 |
| Non-synonymous | PWST | 0.50 | (0.58) | (1.00) |
| Non-synonymous | RC | 0.07 | 0.12 | 0.93 |
| Non-synonymous | RVT1 | 0.05 | 0.11 | 0.77 |
| Non-synonymous | RVT2 | 0.06 | 0.12 | 0.93 |
| Non-synonymous | SKAT | 0.08 | (0.12) | (0.93) |
| Non-synonymous | SKAT-O | 0.10 | (0.16) | (0.93) |
| Non-synonymous | VT | 0.07 | 0.04 | 0.70 |
| Non-synonymous | WSS | 0.05 | 0.11 | 0.90 |
| Gene-based | aSum | 0.12 | (0.15) | (1.00) |
| Gene-based | C-α | 0.06 | 0.10 | 0.91 |
| Gene-based | CAST | 0.06 | 0.09 | 0.97 |
| Gene-based | CMAT | 0.12 | (0.16) | (1.00) |
| Gene-based | CMC | 0.11 | (0.20) | (0.97) |
| Gene-based | FPCA | 0.05 | 0.07 | 0.82 |
| Gene-based | KBAC | 0.04 | 0.06 | 0.88 |
| Gene-based | PWST | 0.65 | (0.85) | (1.00) |
| Gene-based | RC | 0.08 | (0.13) | (0.94) |
| Gene-based | RVT1 | 0.06 | 0.10 | 0.91 |
| Gene-based | RVT2 | 0.05 | 0.11 | 0.85 |
| Gene-based | SKAT | 0.08 | (0.11) | (0.97) |
| Gene-based | SKAT-O | 0.10 | (0.14) | (1.00) |
| Gene-based | VT | 0.06 | 0.04 | 0.85 |
| Gene-based | WSS | 0.12 | (0.17) | (0.94) |
Type I error levels and average (avg) power were averaged over 200 replicates. Minimal (min) power is the proportion of replicates in which at least one associated region of interest was detected. Power is given in parentheses if the type I error was inflated.
aSum, adaptive summation; CAST, cohort allelic sum test; CMAT, cumulative minor-allele test; CMC, combined multivariate cluster; FPCA, functional principal component analysis; KBAC, kernel-based adaptive cluster; PWST, p-value weighted sum test; RC, RARECOVER; RVT, rare variant test 1 and 2; SKAT, sequencing kernel association test; SKAT-O, optimal unified SKAT; VT, variable threshold; WSS, weighted sum statistic.
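The three summary measures defined in the note above can be computed from per-replicate p-values along the following lines. This is a sketch only: the data layout (one dictionary of ROI name to p-value per replicate), the ROI names, and the nominal significance level of 0.05 are assumptions, since the level is not stated in this extract.

```python
def summarize_replicates(replicates, associated, alpha=0.05):
    """Return (type I error, average power, minimal power).

    replicates: list of dicts mapping ROI name -> p-value (hypothetical layout).
    associated: set of ROI names that are truly associated in the simulation.
    alpha: assumed nominal significance level.
    """
    type1_rates, power_rates, replicates_with_hit = [], [], 0
    for rep in replicates:
        null_rois = [r for r in rep if r not in associated]
        true_rois = [r for r in rep if r in associated]
        # Share of non-associated ROIs that were falsely declared significant.
        type1_rates.append(sum(rep[r] < alpha for r in null_rois) / len(null_rois))
        # Share of associated ROIs that were detected in this replicate.
        hits = sum(rep[r] < alpha for r in true_rois)
        power_rates.append(hits / len(true_rois))
        # Minimal power counts replicates with at least one detection.
        replicates_with_hit += hits > 0
    n = len(replicates)
    return sum(type1_rates) / n, sum(power_rates) / n, replicates_with_hit / n

# Toy example (hypothetical): g1 and g2 are associated, g3 and g4 are not.
toy = [
    {"g1": 0.01, "g2": 0.50, "g3": 0.20, "g4": 0.03},
    {"g1": 0.30, "g2": 0.60, "g3": 0.40, "g4": 0.90},
]
type1, avg_power, min_power = summarize_replicates(toy, {"g1", "g2"})
print(type1, avg_power, min_power)
```

Because minimal power only requires one detection per replicate, it is always at least as large as average power, which is the pattern seen in every row of the table.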
| Method | Damaging | Non-synonymous | Gene-based |
| --- | --- | --- | --- |
| C-α | 1.13 · 10^−21 | 1.63 · 10^−05 | 9.93 · 10^−12 |
| FPCA | 6.75 · 10^−06 | 1.26 · 10^−06 | 3.61 · 10^−01 |
| KBAC | 2.89 · 10^−02 | 4.46 · 10^−01 | 3.44 · 10^−01 |
| VT | 2.37 · 10^−04 | 1.11 · 10^−01 | 1.27 · 10^−01 |
| ---------- | ---------- | ---------- | ---------- |
| CAST | 5.79 · 10^−02 | 7.91 · 10^−01 | 6.02 · 10^−01 |
| RVT2 | 2.00 · 10^−03 | 3.80 · 10^−01 | 3.71 · 10^−01 |
| RVT1 | 1.30 · 10^−03 | 7.79 · 10^−01 | 5.36 · 10^−01 |
| CMAT | <1.0 · 10^−09 | 8.74 · 10^−01 | 6.69 · 10^−01 |
| RC | 6.30 · 10^−07 | 7.84 · 10^−06 | 2.91 · 10^−06 |
| WSS | <1.0 · 10^−09 | 6.13 · 10^−01 | 1.97 · 10^−01 |
| aSum | 5.30 · 10^−08 | 1.14 · 10^−01 | 1.02 · 10^−02 |
| CMC | 1.76 · 10^−07 | 3.77 · 10^−06 | 1.03 · 10^−02 |
| PWST | 1.36 · 10^−03 | 1.18 · 10^−04 | 1.95 · 10^−07 |
| SKAT | 2.92 · 10^−02 | 1.30 · 10^−02 | 2.36 · 10^−01 |
| SKAT-O | 2.38 · 10^−03 | 5.77 · 10^−02 | 4.04 · 10^−01 |
Rare variants with minor allele frequency <0.05 were collapsed by functionality: damaging, non-synonymous, or gene-based. The number of permutations for permutation-based tests was 10^9. Only the first four reported methods (above the dashed line) did not show inflated type I error levels in any scenario of the simulation study and were therefore considered valid.
aSum, adaptive summation; CAST, cohort allelic sum test; CMAT, cumulative minor-allele test; CMC, combined multivariate cluster; FPCA, functional principal component analysis; KBAC, kernel-based adaptive cluster; PWST, p-value weighted sum test; RC, RARECOVER; RVT, rare variant test 1 and 2; SKAT, sequencing kernel association test; SKAT-O, optimal unified SKAT; VT, variable threshold; WSS, weighted sum statistic.
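Permutation-based p-values of the kind mentioned in the note above have a resolution limit set by the number of permutations, which is why some entries in the table can only be reported as an upper bound. The sketch below illustrates this with a deliberately simple statistic. It is not one of the 15 methods: the statistic (absolute difference in mean burden between cases and controls), the toy data, and the plain hits/permutations estimator are assumptions for illustration.

```python
import random

def mean_difference(phenotype, burden):
    """Absolute difference in mean burden between cases (1) and controls (0)."""
    cases = [b for y, b in zip(phenotype, burden) if y == 1]
    controls = [b for y, b in zip(phenotype, burden) if y == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def permutation_pvalue(phenotype, burden, n_perm=2000, seed=1):
    """Return (estimated p-value, resolution) from shuffling the phenotype."""
    rng = random.Random(seed)
    observed = mean_difference(phenotype, burden)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any phenotype-burden link
        if mean_difference(shuffled, burden) >= observed:
            hits += 1
    # With zero hits, all that can be said is p < 1 / n_perm.
    return hits / n_perm, 1 / n_perm

# Toy data (hypothetical): 10 cases then 10 controls, 0/1 collapsed burden.
phenotype = [1] * 10 + [0] * 10
burden = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
p_perm, resolution = permutation_pvalue(phenotype, burden)
print(p_perm, resolution)
```

Every permutation requires recomputing the test statistic, so pushing the resolution far below conventional significance levels multiplies the run time accordingly. This is the computational cost the abstract refers to.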