
WEGO: a web tool for plotting GO annotations.

Jia Ye, Lin Fang, Hongkun Zheng, Yong Zhang, Jie Chen, Zengjin Zhang, Jing Wang, Shengting Li, Ruiqiang Li, Lars Bolund, Jun Wang.

Abstract

Unified, structured vocabularies and classifications freely provided by the Gene Ontology (GO) Consortium are widely accepted in most large-scale gene annotation projects. Consequently, many tools have been created for use with the GO ontologies. WEGO (Web Gene Ontology Annotation Plot) is a simple but useful tool for visualizing, comparing and plotting GO annotation results. Unlike general-purpose commercial charting software, WEGO is designed to deal with the directed acyclic graph structure of GO to facilitate histogram creation of GO annotation results. WEGO has been used widely in many important biological research projects, such as the rice genome project and the silkworm genome project. It has become one of the daily tools for downstream gene annotation analysis, especially when performing comparative genomics tasks. WEGO, along with two other tools, namely External to GO Query and GO Archive Query, is freely available to all users at http://wego.genomics.org.cn. There are two mirror sites at http://wego2.genomics.org.cn and http://wego.genomics.com.cn. Any suggestions are welcome at wego@genomics.org.cn.

Year:  2006        PMID: 16845012      PMCID: PMC1538768          DOI: 10.1093/nar/gkl031

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Unified, structured vocabularies and classifications freely provided by the Gene Ontology (GO) Consortium are widely accepted in most large-scale gene annotation projects. Three ontologies (molecular function, biological process and cellular component) were developed to represent common and basic biological information in annotation. Not only the original organizations SGD (Saccharomyces Genome Database), FlyBase and MGD (Mouse Genome Database), but also additional model organism database groups are involved in the project, including TAIR (The Arabidopsis Information Resource), WormBase, RGD (Rat Genome Database), TIGR and so on (1–3). It is not easy, however, for a biologist with little computing background to analyze and understand genes using the GO information. The difficulties have two aspects: (i) how to annotate anonymous sequences with the GO vocabularies, and (ii) how to find the differences or anything new in the dataset. Many tools and software programs have been developed to tackle the first problem through an automatic or manually curated search for associations between GO terms and genes (4–8). The Web Gene Ontology Annotation Plot (WEGO) was therefore designed as a web application mainly to deal with the second problem. The main purpose of WEGO is to visualize the annotation of sets of genes by comparing the provided gene datasets and plotting the distribution of GO annotation results as a histogram. General histograms can be drawn by many commercial software programs. However, the GO terms are structured as a directed acyclic graph (DAG) that represents a network of complex ‘child’ and ‘parent’ relationships (1). To spare users the tedious task of plotting the distribution of GO annotations by hand, WEGO presents the DAG structures of the ontologies as hierarchical trees so that users can easily choose the levels and GO terms to display.
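The consequence of the DAG structure for gene counting can be illustrated with a small sketch. This is not the WEGO implementation; it only assumes the usual convention that a gene annotated to a term is also counted under every ancestor of that term, and that a gene reaching an ancestor through several paths is counted once. The term labels and the `parents` mapping are hypothetical toy data, not real GO terms.

```python
def ancestors(term, parents):
    """Return the term plus every term reachable by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def genes_per_term(gene_annotations, parents):
    """Count distinct genes under each term, propagating up the DAG."""
    genes_by_term = {}
    for gene, terms in gene_annotations.items():
        covered = set()
        for term in terms:
            covered |= ancestors(term, parents)
        # A gene is added once per covered term, however many paths lead there.
        for term in covered:
            genes_by_term.setdefault(term, set()).add(gene)
    return {term: len(genes) for term, genes in genes_by_term.items()}


# Hypothetical toy DAG: term C has two parents (A and B), both children of root.
toy_parents = {"A": ["root"], "B": ["root"], "C": ["A", "B"]}
toy_annotations = {"gene1": {"C"}, "gene2": {"A"}}
toy_counts = genes_per_term(toy_annotations, toy_parents)
```

In the toy data, gene1 reaches the root through both A and B but contributes only one count there, which is the bookkeeping a plain spreadsheet histogram would not do automatically.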
WEGO is not the only software to address this problem, nor is it the most powerful one (9–13), but it is an excellent tool in several respects. First, it is very user-friendly. For example, biologists can use the output of InterProScan as the input data of WEGO without any conversion. Second, WEGO is a web server, which avoids the tedious steps of application installation and testing. It is operating system independent as well. Third, WEGO provides a visualization of the annotation results, which is useful not only for customizing output but also for understanding GO annotations. In addition, WEGO is not restricted to any particular organism. Finally, WEGO supports comparison between several gene datasets, which is a key capability in the post-genomic era. WEGO has been applied in many important biological research studies, such as the comparative genomics study between the rice genome and the Arabidopsis genome (14,15) and the silkworm genome analysis (16). It has become one of the daily tools for downstream gene annotation analysis, especially when performing comparative genomics tasks. As an example, Figure 1D, which is from the analysis of silkworm draft sequences, illustrates how WEGO can help analyze and compare annotation results. In this histogram, significant differences in several categories are clearly shown by comparing expressed sequence tag (EST)-confirmed genes in silk gland with those in other libraries.
Figure 1

WEGO interfaces. (A–C) A screenshot montage of the WEGO interface showing the three steps of the WEGO procedure: uploading annotation results, editing the hierarchical GO tree and setting the output. (D) A sample figure from the analysis of silkworm draft sequences, showing how WEGO can help analyze and compare annotation results. In this histogram, EST-confirmed genes in silk gland are compared with 11 other libraries. Significant differences are evident in several categories.

DESCRIPTION OF THE WEB INTERFACE

The web interface of WEGO is based on common gateway interface (CGI) and scalable vector graphics (SVG) technologies. It is implemented in Perl. Three freely accessible tools are available through the web interface: WEGO, External to GO Query and GO Archive Query. The GO data, dating from April 1, 2001, are downloaded from the GO FTP archive and updated monthly.

WEGO

Input of WEGO. Currently, WEGO supports four kinds of input format: the WEGO native format and the InterProScan raw (our default input format), text and XML output formats. The ‘-goterms’ option should be switched on when running InterProScan so that the corresponding GO annotations are produced. The WEGO native format is a simple text file with one gene record per line. Columns are tab delimited. The first column is the gene name and the remaining columns are the associated GO IDs. The InterProScan output formats are accepted for the convenience of the user, so that the annotation results of InterProScan can be uploaded to WEGO without any conversion. We are planning to support more output formats from other GO annotation tools in the near future.
Uses of WEGO. There are two ways to work with WEGO. The first is to upload the annotation files (up to three files at one time). The input files must be in one of the four formats described above. The version of the GO archive used for the downstream analysis of the GO annotation results in WEGO should of course be the same as the one used in annotation. Therefore, the GO archive version can optionally be chosen when uploading the input files. The second way is simply to enter the job ID if the user carried out a WEGO analysis within the previous three days. A process window shows the job ID after the file is uploaded. The user is then redirected to a webpage with a hierarchical GO tree that includes all the GO terms contained in the uploaded files. Both the displayed level of the GO tree and the selected GO terms can be changed by the user. GO terms that are not contained in the chosen GO archive are listed on the ‘view error’ page. This error occurs frequently because different versions of the GO archive are used in annotation and in WEGO. Another tool, named GO Archive Query, was developed to help users (especially those who do not know the GO version used in annotation) deal with this problem.
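The WEGO native format described above (one gene per line, tab-delimited, gene name first, then its GO IDs) is simple enough to read in a few lines. The following is a minimal sketch of such a reader, not code from WEGO itself; the function name `parse_wego_native` and the sample gene names are hypothetical, and the choice to skip blank lines is an assumption.

```python
import io
from collections import defaultdict


def parse_wego_native(handle):
    """Read tab-delimited records: gene name, then zero or more GO IDs."""
    annotations = defaultdict(set)
    for raw in handle:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # assumption: blank lines carry no record
        gene, *go_ids = line.split("\t")
        # Merge repeated records for the same gene and drop empty columns.
        annotations[gene].update(g.strip() for g in go_ids if g.strip())
    return dict(annotations)


# Hypothetical two-gene input in the native format.
sample = "geneA\tGO:0003677\tGO:0005634\ngeneB\tGO:0016020\n"
parsed = parse_wego_native(io.StringIO(sample))
```

Storing the GO IDs as a set per gene means that duplicate IDs on a line, or a gene listed on two lines, do not inflate later counts.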
The user can switch between the three ontology trees to choose any GO terms of interest to display in the output histogram. The gene number, the percentage and the P-value of the Pearson Chi-square test for each GO term are listed on the same line. The Pearson Chi-square test is applied to indicate significant relationships between two input datasets. Compared with Fisher's exact test, the Pearson Chi-square test is appropriate and efficient for 2 × 2 matrices if all the expected counts are greater than 5. Red arrows indicate remarkable relationships at the 5% significance level. The ‘Gene List’ function presents all the gene names under a specific GO term in XML format, so that users can obtain the gene content of each branch of the GO tree as well as the gene number. Most users choose GO terms by setting the tree level, which may include many GO terms that carry no specific meaning. The anonymous terms filter was designed to exclude these uninformative items. Only two keywords, ‘unknown’ and ‘obsolete’, have currently been adopted. There is also a custom terms filter, which allows the user to define the filter's keywords. All GO terms that include these keywords are dropped from the output histogram by the filter. Alternatively, users can use the specially designed ‘arrowed’ function to select all the independent nodes and so present all significant differences between the input datasets.
Output of WEGO. SVG is the default output format of WEGO, since it is widely supported by many commercial and open source software programs, such as CorelDRAW®, Illustrator®, Inkscape and ImageMagick. With the help of the SVG plug-in, SVG can be viewed in the browser. Another advantage of SVG is its easy conversion to other graphics formats and its suitability for publishing. WEGO also supports other common graphics formats, including the bitmap formats PNG, JPEG and GIF, which are suitable for on-screen display, and the vector formats PostScript and EPS.
The output file will be compressed for downloading and the user could also supply an email address to receive results.
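The per-term Pearson Chi-square test described above can be sketched for a single GO term as follows. This is an illustration under stated assumptions rather than WEGO's own code: the 2 × 2 table is assumed to be genes with and without the term in each of two datasets, no continuity correction is applied, and the function and parameter names are hypothetical. With one degree of freedom, the upper-tail probability of the Chi-square distribution equals erfc(sqrt(x/2)), so only the standard library is needed.

```python
import math


def chi_square_2x2(hits_a, total_a, hits_b, total_b):
    """Pearson Chi-square test on a 2 x 2 table (no continuity correction).

    Returns (statistic, p_value, expected_ok), where expected_ok is True
    only if every expected count is greater than 5.
    """
    a, b = hits_a, total_a - hits_a  # dataset A: with term, without term
    c, d = hits_b, total_b - hits_b  # dataset B: with term, without term
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero; test undefined")
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the Chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, min(expected) > 5


# Hypothetical counts: 30 of 100 genes versus 10 of 100 genes carry the term.
stat, p, ok = chi_square_2x2(30, 100, 10, 100)
```

The third return value mirrors the condition stated in the text: when any expected count is 5 or less, the approximation is unreliable and an exact test would be the safer choice.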

Two associated tools

External to GO Query

The structured vocabularies and classifications of GO are now widely accepted. However, GO is not the only attempt to build structured vocabularies for genome annotation. A series of other catalogs are also in current use, such as EC (Enzyme Commission), Swiss-Prot and Pfam domains. The External to GO Query attempts to translate between these catalogs and GO terms. It is an interface based on the GO Consortium's external2go database. Users can query both GO IDs and entries of external systems with External to GO Query. The corresponding entries or GO IDs are given as output (Figure 2). Compared with QuickGO (17,18), which was developed by the GOA (Gene Ontology Annotation) project, the External to GO Query is a simpler but handier tool. It is designed to help biologists better understand the annotation results, even though these mappings are not currently complete or exact.
Figure 2

External to GO Query. Screen capture from the External to GO Query, which attempts to translate between other catalogs and GO. Users can query both GO IDs and entries of external systems with External to GO Query. The complex relationships among the external catalogs are not considered by External to GO Query, so if an entry of an external database is queried, only the associated GO terms are returned.
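The two-way lookup described above can be pictured as a pair of indexes built from (external entry, GO ID) mappings. The sketch below is an assumption about how such a lookup could be organized, not the actual External to GO Query code; the function names are hypothetical, and the sample pairs are illustrative placeholders rather than data taken from the external2go files.

```python
from collections import defaultdict


def build_indexes(pairs):
    """Build forward and reverse indexes from (external_id, go_id) pairs."""
    ext_to_go = defaultdict(set)
    go_to_ext = defaultdict(set)
    for external_id, go_id in pairs:
        ext_to_go[external_id].add(go_id)
        go_to_ext[go_id].add(external_id)
    return ext_to_go, go_to_ext


def query(term, ext_to_go, go_to_ext):
    """Return GO IDs for an external entry, or external entries for a GO ID."""
    index = go_to_ext if term.startswith("GO:") else ext_to_go
    return sorted(index.get(term, ()))


# Illustrative placeholder mappings (not copied from external2go).
example_pairs = [
    ("EC:1.1.1.1", "GO:0004022"),
    ("Pfam:PF00001", "GO:0004930"),
    ("Pfam:PF00001", "GO:0007186"),
]
ext_index, go_index = build_indexes(example_pairs)
go_hits = query("Pfam:PF00001", ext_index, go_index)
ext_hits = query("GO:0004022", ext_index, go_index)
```

As in the tool described in the text, a query on an external entry returns only its associated GO terms; relationships among the external catalogs themselves are not modeled.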

GO Archive Query

As the GO terms, definitions and ontologies are frequently updated, it is important to choose the correct version of the GO archive. The version of GO used in the analysis should be the same as the one used in annotation. As stated above, the choice is difficult for users who have no information about the version of the GO archive used in the annotation. Consequently, another tool, GO Archive Query, was developed to help users solve this problem. Users can query a GO ID, especially a GO ID from the ‘view error’ page, and are then presented with all the versions of the GO archive that contain the GO ID, from which they can choose the correct version or a close one (Figure 3).
Figure 3

GO Archive Query. GO Archive Query provides an interface that allows users to query a GO ID in the format GO:0001955, 0001955 or just 1955. All the versions of the GO archive containing the GO ID are presented. This helps users choose the correct version, or at least a similar version, of the GO archive to use.
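The three accepted spellings of a GO ID (GO:0001955, 0001955 or 1955) can be reduced to one canonical form before any lookup. The following is a minimal sketch of that normalization, assuming the standard seven-digit GO identifier; the function name `normalize_go_id` is hypothetical and this is not the tool's own code.

```python
import re

# Optional "GO:" prefix followed by one to seven digits.
_GO_ID_PATTERN = re.compile(r"^(?:GO:)?(\d{1,7})$", re.IGNORECASE)


def normalize_go_id(text):
    """Convert 'GO:0001955', '0001955' or '1955' to 'GO:0001955'."""
    match = _GO_ID_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognizable GO ID: {text!r}")
    # Left-pad the numeric part with zeros to seven digits.
    return "GO:" + match.group(1).zfill(7)


normalized = [normalize_go_id(s) for s in ("GO:0001955", "0001955", "1955")]
```

Normalizing first means each archive version only needs to be searched for a single string per query.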

AVAILABILITY AND PROSPECTS

WEGO, along with two other tools, namely External to GO Query and GO Archive Query, is freely available to all users at http://wego.genomics.org.cn. There are two mirror sites at http://wego2.genomics.org.cn and http://wego.genomics.com.cn. WEGO is operating system independent and has been tested on Mozilla/Netscape/Firefox, Opera, Galeon and Internet Explorer. An SVG plug-in is necessary for online preview of the figure. Aiming for the greatest ease of use for biologists, especially those without a computing background, we are trying to develop WEGO into a GO-application-friendly tool as well as a user-friendly one. Additional output formats of other GO annotation tools will be accepted as WEGO input, and more output choices and better integration with other GO tools are planned as future features of WEGO.
  18 in total

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  GoFigure: automated Gene Ontology annotation.

Authors:  Salim Khan; Gang Situ; Keith Decker; Carl J Schmidt
Journal:  Bioinformatics       Date:  2003-12-12       Impact factor: 6.937

3.  The Gene Ontology Annotation (GOA) project: implementation of GO in SWISS-PROT, TrEMBL, and InterPro.

Authors:  Evelyn Camon; Michele Magrane; Daniel Barrell; David Binns; Wolfgang Fleischmann; Paul Kersey; Nicola Mulder; Tom Oinn; John Maslen; Anthony Cox; Rolf Apweiler
Journal:  Genome Res       Date:  2003-03-12       Impact factor: 9.043

4.  The Gene Ontology (GO) database and informatics resource.

Authors:  M A Harris; J Clark; A Ireland; J Lomax; M Ashburner; R Foulger; K Eilbeck; S Lewis; B Marshall; C Mungall; J Richter; G M Rubin; J A Blake; C Bult; M Dolan; H Drabkin; J T Eppig; D P Hill; L Ni; M Ringwald; R Balakrishnan; J M Cherry; K R Christie; M C Costanzo; S S Dwight; S Engel; D G Fisk; J E Hirschman; E L Hong; R S Nash; A Sethuraman; C L Theesfeld; D Botstein; K Dolinski; B Feierbach; T Berardini; S Mundodi; S Y Rhee; R Apweiler; D Barrell; E Camon; E Dimmer; V Lee; R Chisholm; P Gaudet; W Kibbe; R Kishore; E M Schwarz; P Sternberg; M Gwinn; L Hannick; J Wortman; M Berriman; V Wood; N de la Cruz; P Tonellato; P Jaiswal; T Seigfried; R White
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

5.  Automated Gene Ontology annotation for anonymous sequence data.

Authors:  Steffen Hennig; Detlef Groth; Hans Lehrach
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

6.  OntoBlast function: From sequence similarities directly to potential functional annotations by ontology terms.

Authors:  Günther Zehetner
Journal:  Nucleic Acids Res       Date:  2003-07-01       Impact factor: 16.971

7.  A draft sequence of the rice genome (Oryza sativa L. ssp. indica).

Authors:  Jun Yu; Songnian Hu; Jun Wang; Gane Ka-Shu Wong; Songgang Li; Bin Liu; Yajun Deng; Li Dai; Yan Zhou; Xiuqing Zhang; Mengliang Cao; Jing Liu; Jiandong Sun; Jiabin Tang; Yanjiong Chen; Xiaobing Huang; Wei Lin; Chen Ye; Wei Tong; Lijuan Cong; Jianing Geng; Yujun Han; Lin Li; Wei Li; Guangqiang Hu; Xiangang Huang; Wenjie Li; Jian Li; Zhanwei Liu; Long Li; Jianping Liu; Qiuhui Qi; Jinsong Liu; Li Li; Tao Li; Xuegang Wang; Hong Lu; Tingting Wu; Miao Zhu; Peixiang Ni; Hua Han; Wei Dong; Xiaoyu Ren; Xiaoli Feng; Peng Cui; Xianran Li; Hao Wang; Xin Xu; Wenxue Zhai; Zhao Xu; Jinsong Zhang; Sijie He; Jianguo Zhang; Jichen Xu; Kunlin Zhang; Xianwu Zheng; Jianhai Dong; Wanyong Zeng; Lin Tao; Jia Ye; Jun Tan; Xide Ren; Xuewei Chen; Jun He; Daofeng Liu; Wei Tian; Chaoguang Tian; Hongai Xia; Qiyu Bao; Gang Li; Hui Gao; Ting Cao; Juan Wang; Wenming Zhao; Ping Li; Wei Chen; Xudong Wang; Yong Zhang; Jianfei Hu; Jing Wang; Song Liu; Jian Yang; Guangyu Zhang; Yuqing Xiong; Zhijie Li; Long Mao; Chengshu Zhou; Zhen Zhu; Runsheng Chen; Bailin Hao; Weimou Zheng; Shouyi Chen; Wei Guo; Guojie Li; Siqi Liu; Ming Tao; Jian Wang; Lihuang Zhu; Longping Yuan; Huanming Yang
Journal:  Science       Date:  2002-04-05       Impact factor: 47.728

8.  The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology.

Authors:  Evelyn Camon; Michele Magrane; Daniel Barrell; Vivian Lee; Emily Dimmer; John Maslen; David Binns; Nicola Harte; Rodrigo Lopez; Rolf Apweiler
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

9.  GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies.

Authors:  Bing Zhang; Denise Schmoyer; Stefan Kirov; Jay Snoddy
Journal:  BMC Bioinformatics       Date:  2004-02-18       Impact factor: 3.169

10.  GObar: a gene ontology based analysis and visualization tool for gene sets.

Authors:  Jason S M Lee; Gurpreet Katari; Ravi Sachidanandam
Journal:  BMC Bioinformatics       Date:  2005-07-25       Impact factor: 3.169

