Elodie Persyn1, Matilde Karakachoff1,2, Solena Le Scouarnec1, Camille Le Clézio3, Dominique Campion3, French Exome Consortium1, Jean-Jacques Schott1,2, Richard Redon1,2, Lise Bellanger4, Christian Dina1,2.
Abstract
Next-generation sequencing technologies have made it possible to assay the effect of rare variants on complex diseases. As an extension of the "common disease-common variant" paradigm, rare variant studies are necessary to get a more complete insight into the genetic architecture of human traits. Association studies of these rare variations pose new challenges in terms of statistical analysis. Due to their low frequency, rare variants must be tested in groups. This approach is then hindered by the fact that an unknown proportion of the variants could be neutral. The risk level of a rare variation may be determined by its impact but also by its position in the protein sequence. More generally, the molecular mechanisms underlying the disease architecture may involve specific protein domains or inter-genic regulatory regions. While a large variety of methods optimize functionality weights for each single marker, few evaluate variant position differences between cases and controls. Here, we propose a test called DoEstRare, which aims to simultaneously detect clusters of disease risk variants and global allele frequency differences in genomic regions. This test estimates, for cases and controls, variant position densities in the genetic region by a kernel method, weighted by a function of allele frequencies. We compared DoEstRare with previously published strategies through simulation studies as well as re-analysis of real datasets. Based on simulations under various scenarios, DoEstRare was the only test to consistently show the highest performance in terms of type I error and power, whether or not variants were clustered. DoEstRare was also applied to Brugada syndrome and early-onset Alzheimer's disease data and provided results complementary to those of other existing tests. DoEstRare, by integrating variant position information, gives new opportunities to explain disease susceptibility. DoEstRare is implemented in a user-friendly R package.
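The core idea in the abstract, estimating weighted variant position densities separately for cases and controls and then comparing them, can be illustrated with a minimal sketch. This is not the DoEstRare statistic or the R package's code: the Gaussian kernel, the fixed bandwidth, the toy positions and weights, and the L1 distance between the two densities are all assumptions chosen for illustration.

```python
import math

def weighted_kde(positions, weights, grid, bandwidth):
    """Weighted Gaussian kernel density estimate evaluated on `grid`."""
    total = sum(weights)
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    density = []
    for x in grid:
        value = sum(
            w * math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
            for p, w in zip(positions, weights)
        )
        density.append(value / (total * norm))
    return density

def density_distance(dens_a, dens_b, step):
    """Riemann-sum approximation of the integral of |f_a - f_b| over the grid."""
    return step * sum(abs(a - b) for a, b in zip(dens_a, dens_b))

# Hypothetical toy data: variant positions (bp offsets within a region) and
# per-variant weights standing in for "a function of allele frequencies".
step = 10
grid = list(range(0, 1001, step))
case_pos, case_w = [100, 120, 130, 800], [1.0, 1.0, 1.0, 0.5]
ctrl_pos, ctrl_w = [100, 400, 650, 800], [1.0, 1.0, 1.0, 1.0]

case_density = weighted_kde(case_pos, case_w, grid, bandwidth=50.0)
ctrl_density = weighted_kde(ctrl_pos, ctrl_w, grid, bandwidth=50.0)

# Larger values mean the two position distributions differ more.
distance = density_distance(case_density, ctrl_density, step)
# Comparing a density with itself gives zero by construction.
same = density_distance(case_density, case_density, step)
```

In this toy example the case variants cluster near the start of the region while the control variants are spread out, so `distance` is positive. A real test would additionally need a significance assessment for such a distance, for instance by permuting case/control labels.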
Year: 2017 PMID: 28742119 PMCID: PMC5524342 DOI: 10.1371/journal.pone.0179364
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 4. Power results at nominal level α = 5% based on 1,000 replicates.
P5, P10, P15 and P20 correspond to 5%, 10%, 15% and 20% of DRVs in the gene. DRVs: disease-risk variants.
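As a reading aid for the caption, empirical power at a nominal level is commonly computed as the fraction of simulated replicates whose p-value falls at or below α. The sketch below assumes that convention; the p-values are invented placeholders, not results from the paper, and the function names are hypothetical.

```python
import math

def empirical_power(p_values, alpha=0.05):
    """Fraction of replicates rejected at level alpha."""
    rejected = sum(1 for p in p_values if p <= alpha)
    return rejected / len(p_values)

def monte_carlo_se(rate, n_replicates):
    """Binomial standard error of an estimated rejection rate."""
    return math.sqrt(rate * (1.0 - rate) / n_replicates)

# Hypothetical p-values for 1,000 replicates (placeholders for illustration).
p_values = [0.01] * 600 + [0.20] * 400

power = empirical_power(p_values, alpha=0.05)
se = monte_carlo_se(power, len(p_values))
```

With 1,000 replicates, an estimated power of 0.6 carries a Monte Carlo standard error of roughly 0.015, which gives a sense of how small a difference between two power curves can be meaningfully distinguished.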
Rare variant association tests under comparison.
| Uses variant positions | Category | Description of the strategy | Methods |
|---|---|---|---|
| No | Burden tests | Computation of a genetic score per individual corresponding to a binary variable. | CAST[ |
| No | Burden tests | Computation of a genetic score per individual corresponding to a weighted sum of genotypes. | WSS[ |
| No | Variance-component tests | Test the variance of genetic effects. | C-alpha[ |
| No | P-value combination tests | Combination of p-values from single-marker tests. | ADA[ |
| No | Multi-genotype pattern | Analysis of multi-locus genotypes. | KBAC[ |
| Yes | Sliding-window tests | A statistic is computed by genetic sliding window. | BOMP[ |
| Yes | Kernel matrix tests | A kernel matrix is used in the statistic to take into account physical distance between variants. | CLUSTER[ |
| Yes | Test on inter-marker distances | Physical distances between rare variants are computed. Weighted distance distribution functions are compared between cases and controls. | DBM[ |
| Yes | Rare variant density test | Comparison of rare variant position distributions and average allele frequencies on the gene, between cases and controls. | DoEstRare |
Abbreviations: ADA, adaptive combination of P-values for rare variant association testing; aSum, data-adaptive sum test; BOMP, burden or mutation position; CAST, cohort allelic sum test; CLUSTER, test from Lin (2014); DBM, distance-based measure; KBAC, kernel-based adaptive cluster; KERNEL, test from Schaid et al. (2013); PODKAT, position-dependent kernel association test; SKAT, sequence kernel association test; VT, variable threshold; WSS, weighted sum statistic.
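The two burden-test strategies in the table differ only in how each individual's genotypes are collapsed into one score. The sketch below shows both collapsing rules on a toy genotype matrix. The weights are arbitrary placeholders, not the weighting scheme of WSS or any other published method, and the function names are hypothetical.

```python
def binary_score(genotypes):
    """1 if the individual carries at least one rare allele, else 0."""
    return 1 if any(g > 0 for g in genotypes) else 0

def weighted_sum_score(genotypes, weights):
    """Weighted sum of minor-allele counts across variants."""
    return sum(w * g for g, w in zip(genotypes, weights))

# Hypothetical data: rows are individuals, columns are rare variants,
# entries are minor-allele counts (0, 1 or 2).
genotype_matrix = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 2, 1],
]
# Placeholder per-variant weights (assumed for illustration only).
weights = [1.0, 2.0, 0.5]

binary = [binary_score(row) for row in genotype_matrix]
weighted = [weighted_sum_score(row, weights) for row in genotype_matrix]
```

Either score would then be compared between cases and controls. Neither uses variant positions, which is the information the position-aware tests in the lower half of the table add.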
User CPU times for the different methods.
| Test | Permutations/Bootstrap | Average time per gene (sec) | Total time (1000 genes) |
|---|---|---|---|
| CAST | No | 0.013 | 0h 0min 13sec |
| WSS | Yes | 26.221 | 7h 17min 1sec |
| VT | Yes | 111.678 | 31h 1min 18sec |
| aSum | Yes | 10.582 | 2h 56min 22sec |
| C-alpha | Yes | 3.598 | 0h 59min 58sec |
| SKAT | No | 0.091 | 0h 1min 31sec |
| SKAT | Yes (bootstrap) | 1.326 | 0h 22min 6sec |
| SKAT-O | No | 1.051 | 0h 17min 31sec |
| SKAT-O | Yes (bootstrap) | 160.124 | 44h 28min 44sec |
| KBAC | Yes | 0.187 | 0h 3min 7sec |
| ADA | Yes | 25.318 | 7h 1min 58sec |
| DBM | Yes | 7.933 | 2h 12min 13sec |
| CLUSTER | Yes | 27.095 | 7h 31min 35sec |
| KERNEL | Yes | 6.450 | 1h 47min 30sec |
| PODKAT | No | 0.108 | 0h 1min 48sec |
| PODKAT | Yes (bootstrap) | 1.459 | 0h 24min 19sec |
| BOMP | Yes | 4.757 | 1h 19min 17sec |
| DoEstRare | Yes (standard) | 22.617 | 6h 16min 57sec |
| DoEstRare | Yes (adaptive) | 12.916 | 3h 35min 16sec |
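The totals in the last column follow from multiplying the average time per gene by 1,000 genes and converting the result to hours, minutes and seconds. A small sketch of that arithmetic, with a hypothetical helper name and input values taken from the table:

```python
def total_time(avg_seconds_per_gene, n_genes=1000):
    """Format avg time per gene times n_genes as 'Hh Mmin Ssec'."""
    total = round(avg_seconds_per_gene * n_genes)
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}h {minutes}min {seconds}sec"

cast_total = total_time(0.013)
wss_total = total_time(26.221)
skato_bootstrap_total = total_time(160.124)

# Ratio of standard to adaptive DoEstRare time per gene.
doestrare_ratio = 22.617 / 12.916
```

By the same arithmetic, the standard DoEstRare procedure takes about 1.75 times as long per gene as the adaptive one.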