Eddie Ip, Gavin Chapman, David Winlaw, Sally L Dunwoodie, Eleni Giannoulatou.
Abstract
Next-generation sequencing (NGS) technologies generate thousands to millions of genetic variants per sample. Identifying potential disease-causal variants is labor-intensive, as it relies on filtering with various annotation metrics and on weighing multiple pathogenicity prediction scores. We have developed VPOT (variant prioritization ordering tool), a Python-based command line tool that allows researchers to create a single, fully customizable pathogenicity ranking score from any number of annotation values, each with a user-defined weighting. VPOT can be informative when analyzing entire cohorts, as variants in a cohort can be prioritized. VPOT also provides additional functions for filtering variants by a candidate gene list or by affected status in a family pedigree. VPOT outperforms similar tools in terms of efficacy, flexibility, scalability, and computational performance. VPOT is freely available for public use at GitHub (https://github.com/VCCRI/VPOT/). Documentation for installation, along with a user tutorial, a default parameter file, and test data, is provided.
Keywords: Customizable ranking; Genomic annotation; Next-generation sequencing; Pathogenicity predictions; Variant prioritization
Year: 2019 PMID: 31765830 PMCID: PMC7056850 DOI: 10.1016/j.gpb.2019.11.001
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1. Variant prioritization ordering tool (VPOT) workflow
A. Step 1: prioritization of variants. VPOT is run with annotated VCF or TSV files and a PPF to create the variant priority ordered list (VPOL). B. Step 2: post-processing of the VPOL. The VPOL can be filtered according to user needs, for example against a gene list to select candidate genes or variants (genef). It can also be filtered for case-control variant reporting or, when applied to a family trio, for inheritance models (DN/AD/AR/CH); this sample filtering option (samplef) requires a pedigree format file. A quick variant report can be produced from the VPOL (stats), and multiple VPOL files can be combined into a single VPOL to allow large cross-cohort evaluation across samples (merge). VPOT, variant prioritization ordering tool; VCF, variant call format; TSV, tab-separated values; PPF, prioritization parameter file; VPOL, variant priority ordered list; DN, de novo; AD, autosomal dominant; AR, autosomal recessive; CH, compound heterozygous.
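The two workflow steps can be illustrated with a minimal Python sketch: each annotation contributes user-weighted points to a single score, variants are ordered by that score, and the ordered list is then restricted to a gene list. This is not VPOT's code or its PPF syntax. The rule dictionaries, the point values, the field names, and the functions `score_variant`, `prioritize`, and `filter_by_genes` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

Variant = Dict[str, object]

# Hypothetical scoring rules (NOT VPOT's PPF format or default weights).
# Numeric annotations: (minimum value, points), listed from highest band down.
NUMERIC_RULES = {
    "CADD": [(30.0, 4), (20.0, 2)],
    "GERP": [(4.0, 1)],
}
# Categorical annotations: label -> points.
CATEGORY_RULES = {
    "MetaSVM": {"D": 3, "T": 0},
    "LRT": {"D": 2, "N": 0},
}


def score_variant(variant: Variant) -> int:
    """Sum the points awarded by every rule that matches this variant."""
    total = 0
    for field, bands in NUMERIC_RULES.items():
        value = variant.get(field)
        if not isinstance(value, (int, float)):
            continue  # missing or "NA" annotations contribute nothing
        for minimum, points in bands:
            if value >= minimum:
                total += points
                break  # only the highest matching band counts
    for field, points_by_label in CATEGORY_RULES.items():
        total += points_by_label.get(variant.get(field), 0)
    return total


def prioritize(variants: List[Variant]) -> List[Variant]:
    """Step 1 (sketch): order variants from highest to lowest score."""
    return sorted(variants, key=score_variant, reverse=True)


def filter_by_genes(ranked: List[Variant], genes: Set[str]) -> List[Variant]:
    """Step 2 (sketch): keep only variants in the candidate gene list."""
    return [v for v in ranked if v.get("gene") in genes]
```

Because the rules are plain data, changing a weight or adding an annotation does not require touching the scoring function, which mirrors the idea of a user-editable parameter file.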
Figure 2.
Family B from [17] is a consanguineous family, with proband sample B.1 having CHD and other extra-cardiac phenotypes and all other siblings being unaffected. Samples within the shaded region of the pedigree have undergone whole genome sequencing. CHD, congenital heart disease.
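For a pedigree like this one, with one affected proband and unaffected relatives, an autosomal recessive filter can be sketched as follows. This is a simplified homozygous-alternate model written for illustration, not VPOT's actual samplef logic. The genotype strings follow the common VCF `0/1` convention, and the sample identifiers other than B.1, as well as the function names, are hypothetical.

```python
from typing import Dict, Iterable


def alt_allele_count(genotype: str) -> int:
    """Count non-reference alleles in a VCF-style genotype such as '0/1'."""
    alleles = genotype.replace("|", "/").split("/")
    return sum(1 for allele in alleles if allele not in ("0", "."))


def fits_autosomal_recessive(
    genotypes: Dict[str, str],
    affected: Iterable[str],
    unaffected: Iterable[str],
) -> bool:
    """Simplified AR check: every affected sample is homozygous alternate,
    and no unaffected sample is. Missing genotypes are treated as './.'."""
    for sample in affected:
        if alt_allele_count(genotypes.get(sample, "./.")) != 2:
            return False
    for sample in unaffected:
        if alt_allele_count(genotypes.get(sample, "./.")) >= 2:
            return False
    return True


# Hypothetical genotypes for one variant; only "B.1" is named in the source.
example = {"B.1": "1/1", "mother": "0/1", "father": "0/1", "sibling": "0/0"}
keep = fits_autosomal_recessive(
    example, affected=["B.1"], unaffected=["mother", "father", "sibling"]
)
```

A production filter would also need to handle compound heterozygous variants, sex chromosomes, and genotype quality, none of which are covered here.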
Top ten variants for Family B following autosomal recessive inheritance model filtering (samplef – AR).
| VPOT score | Variant | Type | gnomAD | LRT | MutationTaster2 | PolyPhen-2 HVAR | CADD | MetaSVM | GERP |
|---|---|---|---|---|---|---|---|---|---|
| 67 | c.558G > A | Stop-gain | 4.07E−06 | D | Adc | NA | 39 | NA | 5.26 |
| 57 | c.1621_1622insAAAAA | FS-I | NA | NA | NA | NA | NA | NA | NA |
| 30 | c.916G > A | NS-SNV | 5.14E−05 | D | Dc | Dp | 28.3 | Dm | 4.69 |
| 27 | c.419C > T | NS-SNV | 0.000384 | D | Dc | Dp | 32 | T | 4.02 |
| 24 | c.457A > G | NS-SNV | 0.000134 | D | Dc | P | 21.9 | T | 4.16 |
| 23 | c.625C > A | NS-SNV | NA | D | Dc | Dp | 28.4 | T | 3.76 |
| 23 | c.184C > T | NS-SNV | 9.02E−05 | N | Dc | P | 26.4 | Dm | 4.66 |
| 21 | c.2186T > A | NS-SNV | 6.23E−05 | N | Dc | Dp | 18.1 | T | 4.69 |
| 18 | c.2074C > A | NS-SNV | NA | D | Dc | P | 20.3 | T | 4.42 |
| 18 | c.685C > G | NS-SNV | NA | D | Dc | P | 25.5 | T | 3.57 |
Note: Detail of top ten variants for Family B [17]. VPOT prioritization was performed using the default PPF supplied within GitHub (https://github.com/VCCRI/VPOT/). LRT values – D (deleterious, when LRT value = 0.000), N (neutral). MutationTaster2 values – Adc (disease-causing automatic, when probability value from Bayes classifier used is >0.5 and variant is marked as probable-pathogenic or pathogenic in ClinVar), Dc (disease-causing, when probability value from Bayes classifier used is >0.5). PolyPhen-2 HVAR values – Dp (probably damaging, when naïve Bayes posterior probability of damaging’s estimate of false positive rate is ≤10%), P (possibly damaging, when naïve Bayes posterior probability of damaging’s estimate of false positive rate is ≤20%). MetaSVM values – Dm (deleterious, when value >0), T (tolerated). See Table S1 for full scoring details with all predictors’ values. FS-I, Frameshift-insertion; NS-SNV, non-synonymous single nucleotide variant; NA, not applicable; gnomAD, genome aggregation database; LRT, likelihood ratio test; CADD, combined annotation dependent depletion; GERP, genomic evolutionary rate profiling.
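The label thresholds stated in the note can be written out as a short Python sketch. It encodes only the cut-offs given above and returns `None` for any case the note does not define. The function and parameter names are hypothetical, and the sketch does not reproduce how ANNOVAR or the predictors themselves assign these labels.

```python
from typing import Optional


def metasvm_label(value: float) -> str:
    """Dm (deleterious) when the MetaSVM value is > 0, otherwise T (tolerated)."""
    return "Dm" if value > 0 else "T"


def mutationtaster2_label(probability: float, clinvar_pathogenic: bool) -> Optional[str]:
    """Dc when the Bayes classifier probability is > 0.5; Adc when the variant
    is additionally marked probable-pathogenic or pathogenic in ClinVar."""
    if probability <= 0.5:
        return None  # not covered by the note
    return "Adc" if clinvar_pathogenic else "Dc"


def polyphen2_hvar_label(false_positive_rate: float) -> Optional[str]:
    """Dp when the estimated false positive rate is <= 10%, P when <= 20%."""
    if false_positive_rate <= 0.10:
        return "Dp"
    if false_positive_rate <= 0.20:
        return "P"
    return None  # not covered by the note
```

Categorical labels like these are what a prioritization rule would then map to weights.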
Feature comparison of VPOT with similar variant prioritization tools.
| Feature | VPOT | VaRank | Variant Ranker |
|---|---|---|---|
| Process location | Local | Local | Web |
| Input format | VCF (gz)/TXT (multiple files) | VCF (gz) (multiple files) | VCF/TXT (single file) |
| File size limit | No limit | No limit | 500 MB |
| Annotation | ANNOVAR (freeware), performed by user prior to using tool | Alamut (commercial tool)/SnpEff (freeware), performed by tool | ANNOVAR (freeware), performed by tool |
| Reference genome | No restriction | No restriction | Hg19 |
| Annotation resources that can be applied to VCF | User-defined | User-defined | Defined by tool |
| Pathogenicity prediction tools supported | Based on user-defined annotations (no limit) | phastCons, SIFT, PolyPhen-2 | PolyPhen-2, SIFT, LRT, MutationTaster, MutationAssessor, RadialSVM, FATHMM |
| Disease/inheritance model | DN/AD/AR/CH | DN/AD/AR/CH | AD/AR/XR |
| Quality control check | Total coverage depth, allele balance | NA | Total coverage depth, variant allele coverage depth, allele balance |
| Score weighting range | User-defined | User-defined | 0–1 |
| Number of scoring intervals for each annotation category | User-defined | NA | Defined by tool |
| Output format | TXT – Local | TSV – Local | TXT – Web |
Note: DN, De novo; AD, autosomal dominant; AR, autosomal recessive; CH, compound heterozygous; XR, X-linked-recessive.
Figure 3. Comparison of computational performance of VPOT with similar variant prioritization tools.
Prioritization computation time for VPOT, VaRank, and Variant Ranker plotted against the number of variants. VaRank exceeded the processing time limit (48 h) when attempting ≥2 million variants. Variant Ranker exceeded its file size limit when attempting >2 million variants. More information on the settings and parameters used is provided in Table S2.