Geert Vandeweyer, Lut Van Laer, Bart Loeys, Tim Van den Bulcke, R Frank Kooy.
Abstract
Interpretation of the multitude of variants obtained from next-generation sequencing (NGS) is labor-intensive and complex. Web-based interfaces such as Galaxy streamline the generation of variant lists but lack flexibility in the downstream annotation and filtering that are necessary to identify causative variants in medical genomics. To this end, we built VariantDB, a web-based interactive annotation and filtering platform that automatically annotates variants with allele frequencies, functional impact, pathogenicity predictions and pathway information. VariantDB allows filtering by all annotations, under dominant, recessive or de novo inheritance models, and is freely available at http://www.biomina.be/app/variantdb/.
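The inheritance-model filtering mentioned in the abstract can be illustrated with a minimal sketch. This is not VariantDB's implementation: the genotype encoding (alternate-allele counts), the `child`/`father`/`mother` keys and the function names are assumptions made for illustration, and the rules deliberately ignore affected status, compound heterozygosity and X-linked inheritance.

```python
from typing import Dict, List

# Assumed encoding: alternate-allele count per sample
# (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate).
Genotypes = Dict[str, int]  # hypothetical keys: "child", "father", "mother"


def matches_model(gt: Genotypes, model: str) -> bool:
    """Return True if a trio's genotypes fit a simplified inheritance model."""
    child, father, mother = gt["child"], gt["father"], gt["mother"]
    if model == "de_novo":
        # Present in the child, absent from both parents.
        return child > 0 and father == 0 and mother == 0
    if model == "recessive":
        # Homozygous in the child, both parents heterozygous carriers.
        return child == 2 and father == 1 and mother == 1
    if model == "dominant":
        # Heterozygous in the child and carried by at least one parent.
        return child == 1 and (father > 0 or mother > 0)
    raise ValueError(f"unknown inheritance model: {model}")


def filter_variants(variants: List[dict], model: str) -> List[dict]:
    """Keep only the variants whose trio genotypes fit the model."""
    return [v for v in variants if matches_model(v["genotypes"], model)]
```

A call such as `filter_variants(variants, "de_novo")` would keep only variants seen in the child and in neither parent; a production filter would additionally need genotype-quality checks on all three samples.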
Year: 2014 PMID: 25352915 PMCID: PMC4210545 DOI: 10.1186/s13073-014-0074-6
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Figure 1. Schematic representation of VariantDB implementation. Depending on the expected platform load, server elements can be hosted either on a single machine (default) or on separate physical hosts. If high performance computing (HPC) infrastructure is available, annotation processes can be distributed. HPO, Human Phenotype Ontology.
Summary of annotations available in VariantDB
| Annotation source | Annotations | Reference |
|---|---|---|
| GATK genotypers | Variant coverage, allelic ratio, genotype, Phred polymorphism, Phred genotype, quality by depth, mapping quality, ranksums, strand bias | [ |
| ANNOVAR | Allele frequencies (1KG/ESP/dbSNP), pathogenicity (dbNSFP, CADD, GERP++), segdups, genes (symbol, exon, location, effect; UCSC/RefGene/Ensembl) | [ |
| SnpEff | Variant effect, effect impact, location, protein change, gene (Ensembl) | [ |
| Web tools | MutationTaster, SIFT, PROVEAN, Grantham | [ |
| Gene Ontology | Associated Gene Ontology IDs, terms, and term types. First level parental terms | [ |
| ClinVar | Link to ClinVar, variant type, pathogenic class, class comment, affected gene and transcript, latest update, associated disease, links to external data sources, publications | [ |
| Gene panels | Affected gene, comments, panel name | |
Figure 2. Selection of annotations. Top left: sample selection box, using either a dropdown menu or auto-completion. Top right: when raw data files are available, hyperlinks are presented to download VCF/BAM files or load the files into IGV. Bottom left: all available annotations are listed. Users can select annotations using checkboxes for inclusion in the filtering results. Bottom right: previously saved sets of annotations can be enabled at once by selecting the checkbox and pressing ‘Add Annotations’.
Figure 3. Selection of filters. Left: filtering criteria are organized in high-level categories. Filters are added by selecting the relevant filter and settings from dropdown menus. Numeric (for example, quality control values) or textual (for example, Gene Symbol) criteria can be added in text fields where appropriate. Right: previously saved filtering schemes can be enabled at once by selecting the checkbox and pressing ‘Apply Filter’.
Figure 4. Graphical representation of the selected filtering scheme. Individual filters can be grouped using logic AND/OR rules. Grouping and ordering are handled using a drag-and-drop interface.
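The AND/OR grouping of filters described for Figure 4 can be sketched as a small tree of nested groups. This is an illustrative model only, not VariantDB's code: the `Group` class and the field names `quality_by_depth` and `effect_impact` are hypothetical.

```python
from typing import Any, Callable, Dict, List, Union

Variant = Dict[str, Any]
Predicate = Callable[[Variant], bool]


class Group:
    """A node combining filters or nested groups with AND or OR."""

    def __init__(self, op: str, children: List[Union["Group", Predicate]]):
        if op not in ("AND", "OR"):
            raise ValueError(f"unsupported operator: {op}")
        self.op = op
        self.children = children

    def matches(self, variant: Variant) -> bool:
        # Evaluate children lazily so all()/any() can short-circuit.
        results = (
            child.matches(variant) if isinstance(child, Group) else child(variant)
            for child in self.children
        )
        return all(results) if self.op == "AND" else any(results)


# Example scheme: a quality threshold AND (high OR moderate impact).
scheme = Group("AND", [
    lambda v: v["quality_by_depth"] >= 5,
    Group("OR", [
        lambda v: v["effect_impact"] == "HIGH",
        lambda v: v["effect_impact"] == "MODERATE",
    ]),
])
```

Reordering or regrouping filters in such a model amounts to moving nodes within the tree, which is the kind of operation a drag-and-drop interface exposes.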
Figure 5. Results table. For each of the resulting variants, selected annotations are presented. At the top, the genomic position (also a hyperlink to that position in IGV) and other essential variant information are provided. If relevant, annotations are grouped in sub-tables by affected feature. User-specified information related to validation and classification is presented in a separate box on the left-hand side.
Performance examples of VariantDB
| Dataset | Filters | Variants returned | | Runtime, from database(a) | Runtime, from cache(b) |
|---|---|---|---|---|---|
| Exome (77 K variants) | | 859 | 31 | 8 s | 6 s |
| Exome (78 K variants) | Five quality thresholds, SnpEff high/moderate impact | 1,007 | 110 | 14 s | 8 s |
| Exome (78 K variants) | None(c) | 78,423 | 110 | 12 s | 11 s |

(a) Results are retrieved from the database and cached for future use. (b) Results are retrieved from cache. (c) No filters are specified. As only the first 100 variants, ordered by genomic position, are initially presented, runtime is not significantly longer.
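The caching and paging behaviour described in the table notes can be sketched as follows. This is a hedged illustration, not the platform's actual mechanism: the `CachedQuery` class, the hashed cache key and the `(chromosome rank, position)` record layout are assumptions, and only the page size of 100 comes from the note above.

```python
import hashlib
import json
from typing import Callable, Dict, List, Tuple

# Hypothetical minimal record: (chromosome rank, position).
Variant = Tuple[int, int]


class CachedQuery:
    """Run a filter query once, cache the result, and serve the first page."""

    def __init__(self, run_query: Callable[[dict], List[Variant]], page_size: int = 100):
        self._run_query = run_query
        self._page_size = page_size
        self._cache: Dict[str, List[Variant]] = {}
        self.db_hits = 0  # number of times the database was actually queried

    @staticmethod
    def _key(filters: dict) -> str:
        # Serialise with sorted keys so equal settings give equal cache keys.
        payload = json.dumps(filters, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def first_page(self, filters: dict) -> List[Variant]:
        key = self._key(filters)
        if key not in self._cache:
            self.db_hits += 1
            # Order by genomic position before caching.
            self._cache[key] = sorted(self._run_query(filters))
        return self._cache[key][: self._page_size]
```

Because only the first page is returned, an unfiltered query need not render many more rows than a selective one, which is consistent with the similar runtimes reported in the table.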
Functional comparison of VariantDB with publicly available alternatives
| | | | | | | | | | | VariantDB |
|---|---|---|---|---|---|---|---|---|---|---|
| Citation | [ | [ | [ | [ | [ | [ | [ | [ | [ | |
| Online | - | - | + | + | + | - | - | + | + | + |
| Collaborative projects | - | - | - | - | + | - | - | + | + | + |
| Inter-sample relations(a) | + | + | + | - | + | + | + | + | + | + |
| RefSeq annotations | + | + | + | + | + | + | + | + | + | + |
| Ensembl annotations | - | + | + | - | - | + | - | - | + | + |
| | - | - | + | - | - | - | + | - | + | + |
| Public (ESP, 1KG, dbSNP) | + | + | + | - | + | + | + | + | + | + |
| In-house samples(b) | + | - | - | - | - | - | - | - | - | + |
| dbNSFP(c) | + | - | + | - | + | + | + | - | + | + |
| CADD | - | - | - | - | - | - | - | - | - | + |
| PROVEAN | - | - | - | - | - | - | - | - | - | + |
| Disease information source | GSEA | - | ClinVar | MIM | - | - | HuGe | - | MIM | ClinVar |
| Annotation updates(d) | A | M | A | . | . | M | M | M | A | A |
| Retrospective updates | - | - | - | - | . | - | - | - | - | + |
| Upstream integration(e) | - | - | - | - | + | + | - | - | - | + |
| Alignment visualization | - | - | + | - | + | - | - | - | - | + |

(a) Relations might be either specified at sample level or provided as pedigree files at runtime. (b) User-accessible sample genotypes are used to calculate a private set of MAFs. (c) Both full and partial dbNSFP annotations are considered here. (d) A, automatic; M, manual annotation updates; or not specified (period). (e) Direct integration with genotyping tools or modules.
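The private minor allele frequencies (MAFs) mentioned in the table notes could be derived from in-house genotypes along these lines. This is a minimal sketch under stated assumptions: sites are biallelic, samples are diploid, genotypes are encoded as alternate-allele counts with `None` for missing calls, and the function name is hypothetical.

```python
from typing import Iterable, Optional


def minor_allele_frequency(alt_counts: Iterable[Optional[int]]) -> Optional[float]:
    """MAF at one biallelic site from diploid alternate-allele counts (0, 1 or 2).

    Missing genotypes (None) are skipped; returns None if no sample is called.
    """
    called = [g for g in alt_counts if g is not None]
    if not called:
        return None
    # Each called diploid sample contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    # The minor allele is whichever of ref/alt is rarer in this cohort.
    return min(alt_freq, 1 - alt_freq)
```

Restricting the input to the genotypes a given user is allowed to access yields a frequency that is private to that user's sample set.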