Ozvan Bocher¹,², Thomas E Ludwig¹,³, Marie-Sophie Oglobinsky¹, Gaëlle Marenne¹, Jean-François Deleuze⁴, Suryakant Suryakant⁵, Jacob Odeberg⁶,⁷, Pierre-Emmanuel Morange⁸, David-Alexandre Trégouët⁵, Hervé Perdry⁹, Emmanuelle Génin¹,³.
Abstract
Rare variant association tests (RVAT) have been developed to study the contribution of rare variants, which are now widely accessible through high-throughput sequencing technologies. RVAT require aggregating rare variants into testing units and filtering them to retain only the most likely causal ones. In the exome, genes are natural testing units, and variants are usually filtered based on their functional consequences. With whole-genome sequence (WGS) data, however, both steps are challenging, and no natural biological unit is available for aggregating rare variants. Sliding-window procedures have been proposed to circumvent this difficulty, but they ignore biological information and result in a large number of tests. We propose a new strategy to perform RVAT on WGS data, "RAVA-FIRST" (RAre Variant Association using Functionally-InfoRmed STeps), comprising three steps. (1) New testing units, referred to as "CADD regions", are defined genome-wide based on functionally-adjusted Combined Annotation Dependent Depletion (CADD) scores of variants observed in the gnomAD populations. (2) A region-dependent filtering of rare variants is applied in each CADD region. (3) A functionally-informed burden test is performed with sub-scores computed for each genomic category within each CADD region. On both simulated and real data, RAVA-FIRST outperformed other WGS-based RVAT. Applied to a WGS dataset of venous thromboembolism (VTE) patients, RAVA-FIRST identified an intergenic region on chromosome 18 enriched for rare variants in early-onset patients. This region, which was missed by standard sliding-window procedures, lies within a topologically associating domain (TAD) that contains a strong candidate gene. RAVA-FIRST enables new investigations of rare non-coding variants in complex diseases, facilitated by its implementation in the R package Ravages.
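Steps (2) and (3) of the abstract can be illustrated with a minimal Python sketch. It is not the Ravages implementation: the region labels, ACS thresholds and toy genotypes are hypothetical, the sub-scores are plain minor-allele counts rather than the WSS-based scores used in the paper, and how the per-category sub-scores are combined into a single test is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Variant:
    region: str    # CADD region label (hypothetical IDs below)
    category: str  # genomic category, e.g. "coding" or "regulatory"
    acs: float     # adjusted CADD score of the variant
    maf: float     # minor allele frequency in a reference population


def kept_indices(variants, region_thresholds, maf_max=0.01):
    """Step 2 (sketch): keep rare variants whose ACS reaches the threshold
    of the CADD region they fall in."""
    return [
        j for j, v in enumerate(variants)
        if v.maf <= maf_max and v.acs >= region_thresholds[v.region]
    ]


def category_subscores(genotypes, variants, keep, region):
    """Step 3 (sketch): per-individual burden sub-scores, one per genomic
    category, using unweighted minor-allele counts within one region."""
    idx_by_cat = defaultdict(list)
    for j in keep:
        if variants[j].region == region:
            idx_by_cat[variants[j].category].append(j)
    return {
        cat: [sum(g[j] for j in idx) for g in genotypes]
        for cat, idx in idx_by_cat.items()
    }


# Toy data: all values are made up for illustration.
variants = [
    Variant("R1", "coding", 25.0, 0.001),
    Variant("R1", "regulatory", 12.0, 0.005),
    Variant("R1", "regulatory", 3.0, 0.002),   # below the R1 threshold
    Variant("R1", "coding", 30.0, 0.20),       # too common
    Variant("R2", "coding", 18.0, 0.003),      # below the R2 threshold
]
thresholds = {"R1": 10.0, "R2": 20.0}
genotypes = [[1, 0, 1, 0, 0], [0, 2, 0, 1, 1], [0, 0, 0, 0, 0]]

keep = kept_indices(variants, thresholds)
subscores = category_subscores(genotypes, variants, keep, "R1")
print(keep)       # [0, 1]
print(subscores)  # {'coding': [1, 0, 0], 'regulatory': [0, 2, 0]}
```

Keeping coding and regulatory counts separate, instead of summing them into one burden, is what lets a downstream model weigh each genomic category differently within the same region.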
Year: 2022 PMID: 36112662 PMCID: PMC9518893 DOI: 10.1371/journal.pgen.1009923
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 6.020
Fig 1. Steps performed in RAVA-FIRST: definition of adjusted CADD scores (ACS), CADD regions, region-specific thresholds and functionally-informed burden tests.
Summary statistics of the lengths of CADD regions.
| | Min (0%) | 25% | Median (50%) | 75% | Max (100%) | Mean |
|---|---|---|---|---|---|---|
| Length (kb) | 0.002 | 2.576 | 13.006 | 24.323 | 1,731.228 | 19.852 |
Fig 2. True positive rate (TPR), true negative rate (TNR) and precision of different filtering strategies applied to ClinVar pathogenic coding or non-coding variants, compared with rare 1000 Genomes polymorphisms.
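The three metrics in Fig 2 follow the standard confusion-matrix definitions. The sketch below assumes, as the caption suggests, that pathogenic ClinVar variants are the positives and rare 1000 Genomes polymorphisms are treated as negatives; the counts in the example are hypothetical.

```python
def filter_metrics(kept_pathogenic, n_pathogenic, kept_neutral, n_neutral):
    """TPR, TNR and precision of a variant filter.

    Positives: pathogenic variants (e.g. ClinVar); negatives: rare
    polymorphisms assumed neutral (e.g. from 1000 Genomes).
    """
    tp = kept_pathogenic                  # pathogenic variants passing the filter
    fn = n_pathogenic - kept_pathogenic   # pathogenic variants filtered out
    fp = kept_neutral                     # neutral variants passing the filter
    tn = n_neutral - kept_neutral         # neutral variants filtered out
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return tpr, tnr, precision


# Hypothetical counts: 80 of 100 pathogenic and 50 of 1,000 neutral variants pass.
tpr, tnr, precision = filter_metrics(80, 100, 50, 1_000)
print(round(tpr, 3), round(tnr, 3), round(precision, 3))  # 0.8 0.95 0.615
```

Unlike TPR and TNR, precision depends on the ratio of neutral to pathogenic variants in the evaluation set, so it is only comparable across filters evaluated on the same variant sets.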
Association scenarios simulated to assess the performance of the RAVA-FIRST burden test.
| | R019233 | | R019234 | |
|---|---|---|---|---|
| Scenario | Coding | Regulatory | Coding | Regulatory |
| S1 | 50% | |||
| S2A | 50% | 0% | ||
| S2B | 0% | 50% | ||
| S3 | 50% | |||
| S4A | 50% | 0% | 50% | 0% |
| S4B | 0% | 50% | 0% | 50% |
Power at the genome-wide significance level of 2.5×10⁻⁶ under the different simulation scenarios, using either the classical weighted sum statistic (WSS) or the RAVA-FIRST WSS, at the scale of either the entire gene or CADD regions.
| Scenario | Classical WSS, by gene | RAVA-FIRST WSS, by gene | Classical WSS, by CADD regions | RAVA-FIRST WSS, by CADD regions |
|---|---|---|---|---|
| S1 | 0.409 | 0.370 | 0.782 | 0.701 |
| S2A | 0 | 0.431 | 0.002 | 0.602 |
| S2B | 0.408 | 0.404 | 0.689 | 0.706 |
| S3 | 0.751 | 0.678 | 0.512 | 0.433 |
| S4A | 0.004 | 0.564 | 0.012 | 0.474 |
| S4B | 0.657 | 0.64 | 0.39 | 0.391 |
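Each power value above is the fraction of simulated replicates reaching the significance level. A minimal sketch follows; it assumes the 2.5×10⁻⁶ level comes from the common Bonferroni correction for about 20,000 genes (0.05 / 20,000), and the number of replicates used below is hypothetical because this record does not state it.

```python
import math

# 0.05 / 20,000 genes; assumed here to be the origin of the 2.5e-6 threshold.
GENOME_WIDE_ALPHA = 0.05 / 20_000


def empirical_power(p_values, alpha=GENOME_WIDE_ALPHA):
    """Fraction of simulated replicates whose p-value falls below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


def power_standard_error(power, n_replicates):
    """Binomial standard error of an empirical power estimate."""
    return math.sqrt(power * (1 - power) / n_replicates)


# Hypothetical p-values from four replicates.
print(empirical_power([1e-8, 3e-6, 2e-7, 0.01]))  # 0.5
# With a hypothetical 1,000 replicates, a power of 0.39 has an SE of about 0.015.
print(round(power_standard_error(0.39, 1_000), 3))  # 0.015
```

The standard error gives a rough yardstick for reading the table: a difference of a few thousandths, such as 0.39 vs 0.391 in S4B, is smaller than this SE unless the number of replicates is in the hundreds of thousands.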
Number of testing units and variants kept under the three strategies.
| Strategy | Testing units | Filtering | Number of testing units | Number of variants |
|---|---|---|---|---|
| WGScan | Sliding windows | MAF ≤ 1% | 377,092 | 96,347 |
| RAVA-FIRST units (CADD regions) | CADD regions | MAF ≤ 1% | 103,439 | 9,423,012 |
| RAVA-FIRST units (CADD regions) | MAF ≤ 1% | 10,389 | 96,294 | |
| RAVA-FIRST units (CADD regions) | MAF ≤ 1% | 95,220 | 3,494,327 |
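The number of testing units directly sets the multiple-testing burden. This record does not say which correction the authors applied to these analyses; as an assumption, the sketch below uses the simple Bonferroni rule (α divided by the number of tests) to show how fewer units relax the per-test threshold. Row labels are abbreviated from the table above.

```python
# Number of testing units per row of the table above (labels abbreviated).
TESTING_UNITS = [
    ("WGScan, sliding windows", 377_092),
    ("CADD regions, MAF <= 1%", 103_439),
    ("RAVA-FIRST units, row 3", 10_389),
    ("RAVA-FIRST units, row 4", 95_220),
]


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level controlling the family-wise error rate."""
    return alpha / n_tests


for label, n_units in TESTING_UNITS:
    print(f"{label}: {n_units:,} tests -> p < {bonferroni_threshold(n_units):.2e}")
```

Bonferroni is valid but conservative when tests are correlated, which is the case for overlapping sliding windows, so it overstates the effective number of tests most for the WGScan row.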
Fig 3. QQ-plot of WSS analyses of the venous thromboembolism (VTE) data using the four analysis strategies.
Early-onset patients (<50 years old) were compared to late-onset patients (≥50 years old).
Fig 4. WSS scores in the CADD region as a function of age at first VTE event.
The dashed line marks age 50, which separates early-onset from late-onset events.
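Fig 4 plots per-individual WSS scores. As a sketch, the weighted-sum score of Madsen and Browning (2009) gives each variant j a weight w_j = sqrt(n · q_j · (1 − q_j)), where n is the number of individuals and q_j = (m_j + 1) / (2·n_ref + 2) is a smoothed minor-allele frequency estimated in a reference group. Treating late-onset patients as that reference group, assuming no missing genotypes, and the toy data are all assumptions; the Ravages WSS implementation may differ in detail, and the rank- or permutation-based test built on these scores is not shown.

```python
import math


def wss_scores(genotypes, is_reference):
    """Madsen-Browning weighted-sum scores, one per individual.

    genotypes: per-individual lists of minor-allele counts (0/1/2); no
               missing genotypes are assumed.
    is_reference: True for individuals in the group used to estimate allele
                  frequencies (here, hypothetically, late-onset patients).
    """
    n_total = len(genotypes)
    reference = [g for g, ref in zip(genotypes, is_reference) if ref]
    n_ref = len(reference)
    weights = []
    for j in range(len(genotypes[0])):
        m_ref = sum(g[j] for g in reference)   # minor alleles in reference group
        q = (m_ref + 1) / (2 * n_ref + 2)      # smoothed allele frequency
        weights.append(math.sqrt(n_total * q * (1 - q)))
    # For q < 0.5, rarer variants get smaller weights and so add more to the score.
    return [sum(gj / w for gj, w in zip(g, weights)) for g in genotypes]


# Toy data: two early-onset then two late-onset patients, two variants.
ages = [35, 42, 61, 70]
is_late_onset = [age >= 50 for age in ages]
genotypes = [[1, 0], [0, 0], [0, 1], [0, 0]]
scores = wss_scores(genotypes, is_late_onset)
print([round(s, 3) for s in scores])  # [1.342, 0.0, 1.061, 0.0]
```

The variant carried only by an early-onset patient is absent from the reference group, so it receives the smallest weight and the largest contribution, which is how a region enriched for rare variants in early-onset patients stands out in a plot like Fig 4.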