
VarioWatch: providing large-scale and comprehensive annotations on human genomic variants in the next generation sequencing era.

Yu-Chang Cheng¹, Fang-Chih Hsiao, Erh-Chan Yeh, Wan-Jia Lin, Cheng-Yang Louis Tang, Huan-Chin Tseng, Hsing-Tsung Wu, Chuan-Kun Liu, Chih-Cheng Chen, Yuan-Tsong Chen, Adam Yao.

Abstract

VarioWatch (http://genepipe.ncgm.sinica.edu.tw/variowatch/) has been vastly improved since its predecessor, GenoWatch, was published in the 2008 Web Server Issue. It now annotates a variant at least 10 000 times faster. This speed increase, achieved through a complete redesign of its working mechanism, makes VarioWatch capable of annotating millions of human genomic variants generated by next generation sequencing in minutes, if not seconds. While using the MegaQuery function of VarioWatch to annotate variants quickly, users can apply various filters to retrieve the subgroup of variants that satisfies their requirements, for example by risk level or region of interest. In addition to the performance leap, many new features have been added, such as annotation of novel variants, functional analyses of splice sites and in/dels, detailed variant information in tabulated form, and a risk level decision tree for the analyzed variant. Up to 1000 target variants can be visualized with our carefully designed Genome View, Gene View, Transcript View and Variation View. Two commonly used reference versions, NCBI build 36.3 and NCBI build 37.2, are supported. VarioWatch is unique in its ability to annotate millions of variants online comprehensively and efficiently, to deliver the results in real time, and to visualize up to 1000 annotated variants.

Year: 2012 PMID: 22618869 PMCID: PMC3394242 DOI: 10.1093/nar/gks397

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Over the past few years, the throughput of next generation sequencing (NGS) technologies has increased exponentially to a massive scale, greatly changing the face of genomic research and making post-sequencing data analysis tremendously difficult. This technological improvement calls for powerful and handy bioinformatics tools that can process NGS data, such as genomic variants, with high performance and that offer the analysis features needed to facilitate research. Many tools for annotating genomic variants are available: published online tools (1–4), unpublished online tools such as SeattleSeq Annotation (http://snp.gs.washington.edu/SeattleSeqAnnotation134/) and offline tools (5–8). VarioWatch, however, is unique in its ability to annotate millions of variants online comprehensively and efficiently, to deliver the results in real time, and to visualize up to 1000 annotated variants. Based on GenoWatch (9), which has been in service since 2006 and was published in the 2008 Web Server issue, VarioWatch was developed to offer the research community an extremely efficient online annotation service for human genomic variants in the NGS era.

VarioWatch has two major improvements: speed and comprehensiveness. Regarding speed, the superseded GenoWatch relied on web robots to retrieve data from many public domain websites, such as NCBI (10–12), UniProt (13), KEGG (14) and GO (15), to annotate batches of variants. It always provided up-to-date annotations, and this strategy was sufficient before NGS prevailed. Because of slow responses from the source websites, however, GenoWatch could not cope with massive online annotation. To solve the problem, we replaced the idea of always providing the most up-to-date information from the Internet with the idea of providing information from frequently updated local databases. Constructing local databases made annotation at least 10 000 times faster and improved data integrity, because the system no longer retrieves source information over the Internet and is no longer exposed to the instability of external websites. Now that the system has been restructured, reprogrammed and fine-tuned, millions of variants can be analyzed and downloaded in CSV format with MegaQuery in minutes, if not seconds, and up to 1000 variants can be easily visualized and browsed. In addition, we provide filters in MegaQuery to help users narrow down candidate variants and expedite their research.

On top of the speed increase, VarioWatch also offers more comprehensive analysis. Whereas GenoWatch annotated only known SNPs, VarioWatch analyzes both known SNPs and novel variants. By incorporating features similar to those of FANS (16), VarioWatch investigates a novel variant in its genomic context, analyzes the functional effect if the variant is located in a protein coding region or in a GT-AG splice site, presents information on nearby genes, checks whether the variant affects ESE and ESS hexamer patterns [from Rescue-ESE (17) and Fas-ESS (18)] if it is in an exon, and predicts the risk of the variant based on the above-mentioned information. If the variant is reported in dbSNP or the 1000 Genomes Project (19), related details are listed as well.

Creating an annotation database for VarioWatch not only improves system performance, but also enables VarioWatch to serve more than one reference version at the same time. VarioWatch currently provides annotations for two popular human genome reference versions (NCBI build 36.3 and NCBI build 37.2), including gene annotation, pre-computed variation risks, and known variants from dbSNP, the 1000 Genomes Project (released in October 2011), OMIM (20) and other minor variant databases (see Supplementary Data).
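The check of an exonic variant against ESE and ESS hexamer patterns can be illustrated with a minimal Python sketch. It is not the VarioWatch implementation: the function names are hypothetical, and `TOY_ESE` is a two-entry placeholder set rather than the real Rescue-ESE or Fas-ESS list. The idea is simply to compare the hexamers overlapping the variant position before and after the substitution.

```python
def hexamers_covering(seq: str, pos: int) -> set[str]:
    """Return every 6-mer of seq that overlaps the 0-based index pos."""
    start = max(0, pos - 5)
    end = min(len(seq) - 6, pos)
    return {seq[i:i + 6] for i in range(start, end + 1)}


def motif_changes(exon: str, pos: int, alt: str, motifs: set[str]) -> tuple[set[str], set[str]]:
    """Report which motifs are lost and which are gained when exon[pos] becomes alt."""
    mutated = exon[:pos] + alt + exon[pos + 1:]
    before = hexamers_covering(exon, pos) & motifs
    after = hexamers_covering(mutated, pos) & motifs
    return before - after, after - before


# Placeholder motif set for illustration only (assumption, not a published list).
TOY_ESE = {"GAAGAA", "AAGAAG"}

# Substitute G -> C at 0-based index 5 of a made-up exon fragment.
lost, gained = motif_changes("TTGAAGAAGTT", 5, "C", TOY_ESE)
print("lost:", sorted(lost), "gained:", sorted(gained))
```

A variant that removes enhancer motifs or creates silencer motifs would then be a candidate for a higher predicted risk; how VarioWatch weighs such changes is not specified here.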

INPUT

Users can easily query and visualize up to 1000 regions by chromosome positions, markers, gene symbols, a batch file input, etc. (Figure 1A). For instance, as in GenoWatch, they can define a chromosome region with a physical position or a single marker (e.g. a SNP) plus upstream and downstream spans. VarioWatch also supports sequence upload: it finds all variants on the uploaded sequence with BLAT (21) and then annotates them automatically. By incorporating human variation data sets, such as OMIM, VarioWatch allows queries by disease name. It first translates the input disease name into a group of relevant genes, then shows all annotations of these genes as well as the variants within them.
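Defining a region from a position plus upstream and downstream spans amounts to simple interval arithmetic, sketched below in Python. The `Region` type and `region_around` function are hypothetical names, the coordinates are made up, and the sketch assumes 1-based inclusive coordinates on the forward strand, where "upstream" means lower coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def region_around(chrom: str, position: int, upstream: int, downstream: int) -> Region:
    """Build a query region around a 1-based position, clipped at the chromosome start."""
    if position < 1 or upstream < 0 or downstream < 0:
        raise ValueError("position must be >= 1 and spans must be non-negative")
    return Region(chrom, max(1, position - upstream), position + downstream)


# Illustrative coordinates only: a marker at 1,000,000 with 5000-base spans.
region = region_around("19", 1_000_000, 5000, 5000)
print(region)
```

The sketch does not clip at the chromosome end, because that would require the chromosome length for the chosen reference build.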

Figure 1.

Input pages for normal query and MegaQuery. (A) An example that retrieves and visualizes genomic annotations for the gene APOE plus 5000 bases upstream and downstream. (B) MegaQuery Download is capable of taking a massive number of variants as input, labeling them with genomic annotations, filtering out unwanted records and returning the filtered annotation results.

Furthermore, VarioWatch has a special unit called MegaQuery (Figure 1B) dedicated to annotating millions of variants generated by NGS. MegaQuery currently supports batch queries for both single nucleotide substitutions and in/del variants. Users can upload a file containing a list of variants, and an example is provided for each input type. Result files, e.g. snp.txt or indel.txt from Illumina CASAVA variant detection, or files in VCF format, can also be uploaded directly through MegaQuery for processing. Often, instead of examining all the variants identified by NGS, researchers only want to examine those that satisfy their research needs. Previously, upon receiving variant annotation data, they either sought further help from an IT specialist or turned to a spreadsheet and did tedious work to achieve this goal. To address this issue, MegaQuery provides four handy filters that help researchers list variants with functional impacts, variants with a predicted risk above a certain threshold, variants in a specific gene region, or variants not reported in either dbSNP or the 1000 Genomes Project.
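The four filters can be pictured as independent predicates applied to each annotated record. The Python sketch below is only an illustration under stated assumptions: the `AnnotatedVariant` fields, the integer risk scale and the `apply_filters` signature are all hypothetical and do not reflect the actual MegaQuery output columns.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class AnnotatedVariant:
    variant_id: str
    gene: Optional[str]       # None when the variant lies outside any gene
    functional_impact: bool   # True if a functional effect was predicted
    risk_level: int           # hypothetical scale: higher means riskier
    in_dbsnp: bool
    in_1000_genomes: bool


def apply_filters(
    variants: Iterable[AnnotatedVariant],
    require_impact: bool = False,
    min_risk: Optional[int] = None,
    genes: Optional[set[str]] = None,
    novel_only: bool = False,
) -> Iterator[AnnotatedVariant]:
    """Yield the variants that pass every filter that is switched on."""
    for v in variants:
        if require_impact and not v.functional_impact:
            continue
        if min_risk is not None and v.risk_level < min_risk:
            continue
        if genes is not None and v.gene not in genes:
            continue
        if novel_only and (v.in_dbsnp or v.in_1000_genomes):
            continue
        yield v


# Made-up records for demonstration.
records = [
    AnnotatedVariant("v1", "APOE", True, 3, False, False),
    AnnotatedVariant("v2", "APOE", True, 1, True, False),
    AnnotatedVariant("v3", None, False, 0, False, False),
]
selected = [v.variant_id for v in apply_filters(records, require_impact=True, min_risk=2, novel_only=True)]
print(selected)
```

Because each filter is optional, any combination of the four can be applied in a single pass over the records.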

OUTPUT

The results page comprises Genome View, Gene View, Transcript View and Variation View. Genome View and Gene View are largely inherited from GenoWatch. Genome View (Figure 2A) displays an overview of the input markers plus their nearby genes. If a marker is a variant with a risky functional impact, it is coloured according to its risk level. Clicking on a marker leads to Gene View (Figure 2B), which shows gene structures and their corresponding annotations, including gene functions, tissue specificity, diseases and so on. Instead of showing only SNP annotations as GenoWatch did, VarioWatch also lists disease-associated mutations and reveals the relation between query variants and these known mutations in this view. Transcript View (Figure 2C) presents the transcript structure, the functional impacts of the same variant on different transcript isoforms and the distribution of known variants within the transcript. Variation View (Figure 2D) discloses the annotation details of a variant. It comprises three areas. The top area tabulates detailed variant information: location; allele change; gene ID and gene symbol if the variant sits in a gene; cDNA change if the variant changes the transcript; protein and codon change if the variant falls in a translated region; estimated risk level; SNP information if the variant is a SNP; and related diseases and literature references. The middle area graphs a risk-level decision tree with a highlighted path that shows how the risk level of the variant was decided. Users can click on the steps of the path to obtain detailed reasons and references to data sources. In addition, the upper right corner of this area holds links for downloading the variant-containing sequence and designing primers for the variant. Finally, the bottom area presents population diversity information extracted from the 1000 Genomes Project and HapMap (22). All views can be exported to a text file for further analysis.
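The idea of a decision tree with a highlighted path can be sketched in Python as a function that returns both a risk label and the list of decisions taken to reach it. The rules, labels and dictionary keys below are invented purely for illustration; the actual VarioWatch decision tree and its thresholds are not described in this text.

```python
def classify(variant: dict) -> tuple[str, list[str]]:
    """Return a risk label and the ordered decisions that led to it (toy rules)."""
    path = []
    if not variant["in_gene"]:
        path.append("not in a gene")
        return "low", path
    path.append("in a gene")
    if variant["in_splice_site"]:
        path.append("hits a GT-AG splice site")
        return "high", path
    if not variant["in_coding_region"]:
        path.append("outside coding region")
        return "low", path
    path.append("in coding region")
    if variant["changes_protein"]:
        path.append("changes the protein")
        return "high", path
    path.append("synonymous")
    return "medium", path


# A made-up coding variant that alters the protein.
risk, path = classify({
    "in_gene": True,
    "in_splice_site": False,
    "in_coding_region": True,
    "changes_protein": True,
})
print(risk, " -> ".join(path))
```

Returning the path alongside the label is what makes it possible to highlight each step and attach a reason or data source to it.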

Figure 2.

Example output pages for visualized annotation results. (A) Genome View provides a bird's-eye view of the query result on the genome scale. It shows the distribution of the query items across the whole genome, and colours each item according to the risk level derived from the annotation results. (B) Gene View displays each query item in the context of genes and mutations known to cause diseases. In addition to providing a diagram of gene structures, including introns and exons, it also annotates each gene within the viewport with known functions, tissue specificity, ontology, pathways involved and diseases caused. Disease-relevant mutations are also revealed. This view was designed to expedite gene-relevant literature searching. (C) Transcript View displays a query item in the transcript context. Since one variant may have different effects on different transcript isoforms, this view provides a precise genomic context in which the query item is analyzed. Transcript View also depicts known SNPs within the specified transcript along with disease-relevant mutations. (D) Variation View shows the annotation details of a query item, the decision tree of risk evaluation, and the relevant allele frequencies in different human populations.

Figure 3.

MegaQuery Download responds to a query with one zip file containing three different reports: SNV/Indel Variation Annotation, 1000 Genome Allele Frequency and Gene Annotation. The variation annotation report provides a text-based annotation and risk analysis result for each query item in CSV format, while the other two auxiliary reports provide the relevant allele frequencies and information on the genes that contain the variants.
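A downloaded archive of this kind can be read with the Python standard library alone, as the following sketch suggests. The file names and column headers are assumptions made up for the demonstration, and the sketch assumes the reports carry a `.csv` extension; the real report layout may differ.

```python
import csv
import io
import zipfile


def read_reports(zip_bytes: bytes) -> dict[str, list[dict[str, str]]]:
    """Parse every CSV file in a zip archive into a list of row dictionaries."""
    reports = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reports[name] = list(csv.DictReader(text))
    return reports


# Build a tiny stand-in archive so the sketch runs without a real download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("variation_annotation.csv", "variant,risk\nchr1:100A>G,high\n")
    archive.writestr("allele_frequency.csv", "variant,frequency\nchr1:100A>G,0.01\n")
    archive.writestr("gene_annotation.csv", "gene,description\nGENE1,example\n")

reports = read_reports(buffer.getvalue())
print(sorted(reports))
```

Keeping the three reports as separate row lists mirrors the split between the main annotation and the two auxiliary reports.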

IMPLEMENTATION

VarioWatch is written in Java with the Struts framework and JDBC. To further improve the user experience, JavaScript is used to render the interactive input and output pages, which makes it easier for users to define a genomic region on the query page and to browse the classified results page. To construct the VarioWatch database, we built a script that mirrors all the required source data files from public domain FTP sites. Once each data source has been verified to be consistent with its reference version, a pipeline processes the data into databases. In addition, a simple computer cluster was built to host SIFT (23), a tool for predicting the effect of non-synonymous variants. By combining these pre-computed and stored results, every variant that can be generated by substituting a base in a coding region or a GT-AG splice site is given a functional risk level and type.
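Pre-computing the effect of every possible substitution in a coding region can be illustrated for a single codon. The Python sketch below enumerates the nine single-base substitutions of a codon and labels each one using the standard genetic code. It is a simplified illustration, not the VarioWatch pipeline: the function name and effect labels are assumptions, and it does not include the SIFT predictions that the system combines with such results.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, listed in TCAG order with the first base varying slowest.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitutions(codon: str) -> dict[tuple[int, str], str]:
    """Classify all nine single-base substitutions of one codon."""
    ref_aa = CODON_TABLE[codon]
    effects = {}
    for i, ref_base in enumerate(codon):
        for alt in BASES:
            if alt == ref_base:
                continue
            alt_aa = CODON_TABLE[codon[:i] + alt + codon[i + 1:]]
            if alt_aa == ref_aa:
                effect = "synonymous"
            elif alt_aa == "*":
                effect = "nonsense"
            elif ref_aa == "*":
                effect = "stop-loss"
            else:
                effect = "missense"
            effects[(i, alt)] = effect
    return effects


# TGG encodes tryptophan; two of its nine substitutions create a stop codon.
tgg_effects = classify_substitutions("TGG")
for (index, alt), effect in sorted(tgg_effects.items()):
    print(index, alt, effect)
```

Storing such results once per reference version means that annotating a coding variant at query time reduces to a lookup rather than a recomputation.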

CONCLUSION

VarioWatch provides an easy way for researchers to annotate a large number of human genomic variants online, directly and quickly, without having to run an offline annotation application or seek help from an IT specialist. The annotation is comprehensive, the input interface is intuitive, and the returned results are displayed in a carefully designed results page. Because of database localization, its reliability, availability and serviceability are much better than those of GenoWatch. VarioWatch should substantially help researchers with variant annotation and prioritization in the NGS era.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Table 1 and Supplementary References [24-26].

FUNDING

Academia Sinica Life Sciences [40-05-GMM]; National Science Council, Taiwan, R.O.C. [NSC100-2319-B-001-001]; National Center for Genomic Medicine. Funding for open access charge: Academia Sinica Life Sciences [40-05-GMM].

Conflict of interest statement. None declared.

26 in total

1. dbSNP: the NCBI database of genetic variation.

Authors: S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal: Nucleic Acids Res Date: 2001-01-01 Impact factor: 16.971

2. The KEGG resource for deciphering the genome.

Authors: Minoru Kanehisa; Susumu Goto; Shuichi Kawashima; Yasushi Okuno; Masahiro Hattori
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

3. SIFT: Predicting amino acid changes that affect protein function.

Authors: Pauline C Ng; Steven Henikoff
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

4. SNPper: retrieval and analysis of human SNPs.

Authors: A Riva; I S Kohane
Journal: Bioinformatics Date: 2002-12 Impact factor: 6.937

5. RESCUE-ESE identifies candidate exonic splicing enhancers in vertebrate exons.

Authors: William G Fairbrother; Gene W Yeo; Rufang Yeh; Paul Goldstein; Matthew Mawson; Phillip A Sharp; Christopher B Burge
Journal: Nucleic Acids Res Date: 2004-07-01 Impact factor: 16.971

6. pfSNP: An integrated potentially functional SNP resource that facilitates hypotheses generation through knowledge syntheses.

Authors: Jingbo Wang; Mostafa Ronaghi; Samuel S Chong; Caroline G L Lee
Journal: Hum Mutat Date: 2011-01 Impact factor: 4.878

7. CandiSNPer: a web tool for the identification of candidate SNPs for causal variants.

Authors: Armin O Schmitt; Jens Assmus; Ralf H Bortfeldt; Gudrun A Brockmann
Journal: Bioinformatics Date: 2010-02-19 Impact factor: 6.937

8. A new face and new challenges for Online Mendelian Inheritance in Man (OMIM®).

Authors: Joanna Amberger; Carol Bocchini; Ada Hamosh
Journal: Hum Mutat Date: 2011-04-05 Impact factor: 4.878

9. A map of human genome variation from population-scale sequencing.

Authors: Gonçalo R Abecasis; David Altshuler; Adam Auton; Lisa D Brooks; Richard M Durbin; Richard A Gibbs; Matt E Hurles; Gil A McVean
Journal: Nature Date: 2010-10-28 Impact factor: 49.962

10. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data.

Authors: Kai Wang; Mingyao Li; Hakon Hakonarson
Journal: Nucleic Acids Res Date: 2010-07-03 Impact factor: 16.971
