Steven Flygare, Edgar Javier Hernandez, Lon Phan, Barry Moore, Man Li, Anthony Fejes, Hao Hu, Karen Eilbeck, Chad Huff, Lynn Jorde, Martin G Reese, Mark Yandell.
Abstract
BACKGROUND: Prioritization of sequence variants for diagnosis and discovery of Mendelian diseases is challenging, especially in large collections of whole genome sequences (WGS). Fast, scalable solutions are needed for discovery research, for clinical applications, and for curation of massive public variant repositories such as dbSNP and gnomAD. In response, we have developed VVP, the VAAST Variant Prioritizer. VVP is ultrafast, scales to even the largest variant repositories and genome collections, and its outputs are designed to simplify clinical interpretation of variants of uncertain significance.
Keywords: Genomics; Human genome; Variant prioritization; Variants of uncertain significance
Year: 2018 PMID: 29463208 PMCID: PMC5819680 DOI: 10.1186/s12859-018-2056-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 CRD curves normalize raw scores across genes. VVP raw score CRD curves for BRCA2 (purple) and CFTR (black). Note that a given CFTR raw score achieves a lower percentile score than does the same raw score for BRCA2. Red and green dots correspond to the canonical pathogenic CFTR variant ΔF508 scored as a homozygote and heterozygote, respectively
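The per-gene normalization described in this caption can be illustrated with a minimal sketch. It assumes a percentile score is the share of a gene's background raw scores that fall at or below a given raw score (an empirical cumulative distribution). The function name `percentile_score` and both background lists are hypothetical and are not taken from VVP.

```python
from bisect import bisect_right

def percentile_score(raw_score, background_scores):
    """Percent of a gene's background raw scores that are <= raw_score."""
    ranked = sorted(background_scores)
    return 100.0 * bisect_right(ranked, raw_score) / len(ranked)

# Hypothetical background raw scores for two genes (illustrative values only).
gene_a_background = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
gene_b_background = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]

# The same raw score maps to different percentiles in different genes.
same_raw = 10
print(percentile_score(same_raw, gene_a_background))  # 100.0
print(percentile_score(same_raw, gene_b_background))  # 20.0
```

In this toy setting, a gene whose background contains many high raw scores assigns a lower percentile to the same raw score, which is the qualitative behavior the caption describes for CFTR relative to BRCA2.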
Runtimes. Seconds required by VVP and CADD to process 1000, 10,000, and 100,000 variants
| Number of variants | VVP | CADD |
|---|---|---|
| 1000 | 0.1 | 130.9 |
| 10,000 | 0.9 | 1388.5 |
| 100,000 | 8.2 | 12,716.3 |
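The relative speed implied by the runtime table can be computed directly from its rows. The sketch below only divides the CADD time by the VVP time for each row; the variable and function names are illustrative.

```python
# Runtimes in seconds, copied from the table: variants -> (VVP, CADD).
runtimes = {
    1_000: (0.1, 130.9),
    10_000: (0.9, 1388.5),
    100_000: (8.2, 12716.3),
}

def speedups(table):
    """Return the CADD/VVP runtime ratio per row, rounded to an integer."""
    return {n: round(cadd / vvp) for n, (vvp, cadd) in table.items()}

for n, ratio in speedups(runtimes).items():
    print(f"{n} variants: VVP is roughly {ratio}x faster than CADD")
```

On these figures the ratio stays between roughly 1300x and 1550x across the three input sizes.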
Fig. 2 ROC analyses for ClinVar. a Coding variants. b Non-coding variants. The points on the curves labeled with circles correspond to score thresholds resulting in each tool’s maximum accuracy. That score is shown beside the circle. Points denoted with squares correspond to the score threshold for SIFT and CADD required to reproduce VVP’s call rate for damaging variants on the NA12878 WGS. See Discussion and Table 3 for details. VVP was run using its default dominant model, whereby every variant is scored as a heterozygote. No data are shown for SIFT in panel b, as it does not score non-coding variants
Fig. 3 J curves for ClinVar. a Coding variants. b Non-coding variants. The units on the x-axis are percentile ranks for each tool’s score, i.e. score/max for each tool. Youden’s statistic (J) is plotted for each normalized score on the y-axis. As in Fig. 2, the points labeled with circles on the curves correspond to score thresholds resulting in each tool’s maximal accuracy. Squares denote the score threshold required to obtain VVP’s call rate on the NA12878 WGS. See Table 3 and Discussion for additional details. All tools were run using their recommended command lines. VVP J curves were compiled using percentile scores. No data are shown in b for SIFT, as it does not score non-coding variants
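Youden's statistic is defined as J = sensitivity + specificity - 1. The following minimal sketch computes J at each candidate threshold and picks the threshold that maximizes it. The scores and labels are made up for illustration (1 = pathogenic, 0 = benign), and the rule "score >= threshold is called pathogenic" is an assumption of this example rather than a detail taken from the paper.

```python
def youden_j(scores, labels, threshold):
    """J = sensitivity + specificity - 1, calling score >= threshold pathogenic."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1

def best_threshold(scores, labels):
    """Return the observed score that maximizes J when used as the threshold."""
    return max(sorted(set(scores)), key=lambda t: youden_j(scores, labels, t))

# Hypothetical percentile scores and truth labels (illustrative only).
scores = [10, 20, 35, 40, 60, 70, 85, 95]
labels = [0, 0, 0, 0, 1, 0, 1, 1]

t = best_threshold(scores, labels)
print(t, round(youden_j(scores, labels, t), 3))  # 60 0.8
```

Plotting `youden_j` against every threshold would give a J curve of the kind shown in the figure; the sketch only reports its maximum.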
Clinical Utility. Top panel. Gene-specific clinical utilities for the top ten ClinVar genes ranked by number of submitted variants. Bottom panel. Coding, non-coding and combined clinical utility for all ClinVar variants. Pathogenic thresholds for each tool were determined as in Fig. 3
| Gene | VVP | CADD | SIFT |
|---|---|---|---|
| BRCA2 | 0.971 | 0.893 | 0.004 |
| BRCA1 | 0.971 | 0.876 | 0.003 |
| SCN1A | 0.966 | 0.914 | 0.277 |
| MLH1 | 0.943 | 0.950 | 0.057 |
| MSH2 | 0.984 | 0.973 | 0.050 |
| LDLR | 0.989 | 0.890 | 0.033 |
| DMD | 0.959 | 0.932 | 0.030 |
| ATM | 0.957 | 0.953 | 0.021 |
| FBN1 | 0.974 | 0.935 | 0.233 |
| CFTR | 0.945 | 0.930 | 0.073 |
| Utility (All ClinVar Variants) | | | |
| Coding | 0.970 | 0.900 | 0.792 |
| Non-coding | 0.917 | 0.715 | 0.000 |
| Both | 0.947 | 0.818 | 0.134 |
Fig. 5 VVP percentile scores for ClinVar CFTR and BRCA2 variants. Violin and box plots are described in Fig. 4. Percentile scores are shown on the y-axis; benign variants on the left, pathogenic on the right. a CFTR. Pathogenic: 897 variants, mean score: 100. Benign: 466 variants, mean score: 17. b BRCA2. Pathogenic: 249 variants, mean score: 93. Benign: 6 variants, mean score: 34. All scores were generated without using genotype information, i.e. the variant was scored as a heterozygote
Call rates on reference genome NA12878, a healthy individual. Although the number of damaging coding and non-coding variants in a healthy individual’s genome is still unknown, presumably damaging variants comprise a low percentage of the total. Relative percentages are shown in the top panel, absolute numbers are shown in the bottom. Rare variants denotes variants with gnomAD population frequencies < 1/1000
| | All Variants (variants) | | | Rare Variants (variants) | | |
|---|---|---|---|---|---|---|
| CODING | VVP | CADD | SIFT | VVP | CADD | SIFT |
| Pathogenic | 577 | 1577 | 1883 | 48 | 64 | 50 |
| Benign | 13,710 | 12,710 | 8304 | 156 | 140 | 116 |
| Not Scored | 0 | 0 | 4100 | 0 | 0 | 38 |
| NON-CODING | VVP | CADD | SIFT | VVP | CADD | SIFT |
| Pathogenic | 31,079 | 64,571 | 0 | 3769 | 378 | 0 |
| Benign | 1,825,253 | 1,791,761 | 0 | 4949 | 8340 | 0 |
| Not Scored | 0 | 0 | 1,856,322 | 0 | 0 | 8718 |
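Relative call rates can be derived from the absolute counts above. The sketch below assumes that a tool's call rate is the number of variants it calls pathogenic divided by all variants in that category (pathogenic + benign + not scored). That denominator is an assumption of this example, so the resulting percentages may not match the paper's own relative figures exactly.

```python
def call_rate_percent(pathogenic, benign, not_scored=0):
    """Percent of all variants in a category that a tool calls pathogenic."""
    total = pathogenic + benign + not_scored
    return 100.0 * pathogenic / total

# Coding, all variants: (pathogenic, benign, not scored), copied from the table.
coding_all = {
    "VVP": (577, 13710, 0),
    "CADD": (1577, 12710, 0),
    "SIFT": (1883, 8304, 4100),
}

for tool, counts in coding_all.items():
    rate = call_rate_percent(*counts)
    print(f"{tool}: {rate:.1f}% of coding variants called pathogenic")
```

The same function applies unchanged to the rare-variant and non-coding rows.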
Fig. 4 Global analysis of dbSNP using VVP. Columns are violin plots wherein the width (x-axis) of the shape represents a rotated kernel density plot. Boxplots lie within the violins, with white dots denoting the median VVP score, solid black bars representing the interquartile range (IQR), and thin black lines corresponding to 1.5 * IQR. The far left-hand (grey) column summarizes the results for the entirety of dbSNP. The remaining columns represent the data by ClinVar category. All variants were scored as heterozygotes (VVP dominant model). All: entirety of dbSNP (155,062,628 variants, mean score: 60). Valid: all variants with valid status in dbSNP (1,402,274 variants, mean score: 35). Pathogenic: all ClinVar pathogenic variants in dbSNP (33,693, mean score: 93). Benign: all ClinVar benign variants in dbSNP (21,443, mean score: 19). Likely Pathogenic: ClinVar variants annotated as likely pathogenic (7587, mean score: 92). Likely Benign: ClinVar variants annotated as likely benign (36,719, mean score: 41). Drug Interaction: dbSNP variants implicated in drug response (230, mean score: 45). Additional file 2: Figure S2 provides plots for CADD and SIFT for the pathogenic and benign portions of dbSNP