
WEGO 2.0: a web tool for analyzing and plotting GO annotations, 2018 update.

Jia Ye¹, Yong Zhang¹, Huihai Cui¹, Jiawei Liu¹, Yuqing Wu¹,², Yun Cheng³, Huixing Xu¹, Xingxin Huang¹, Shengting Li¹, An Zhou¹, Xiuqing Zhang¹, Lars Bolund⁴,⁵, Qiang Chen⁶,⁷,⁸, Jian Wang¹, Huanming Yang¹, Lin Fang¹,⁹, Chunmei Shi⁶,⁷,⁸.

Abstract

WEGO (Web Gene Ontology Annotation Plot), created in 2006, is a simple but useful tool for visualizing, comparing and plotting GO (Gene Ontology) annotation results. Owing largely to the rapid development of high-throughput sequencing and the increasing acceptance of GO, WEGO has seen strong growth in users and citations in recent years, which motivated us to update it to version 2.0. WEGO takes GO annotation results as input. Based on GO's standardized vocabulary, which is structured as a DAG (Directed Acyclic Graph), the number of genes corresponding to each GO ID is calculated and shown in graphical form. The WEGO 2.0 updates target four aspects, aiming to provide a more efficient and up-to-date approach for comparative genomic analyses. First, the number of input files, previously limited to three, is now unlimited, allowing WEGO to analyze multiple datasets. Second, this version adds reference datasets for nine model species that can be used as baselines in comparative genomic analyses. Third, for each GO term a single Chi-square test is now carried out across all datasets, instead of separate tests for every pair of samples. Finally, WEGO 2.0 provides an additional output graph alongside the traditional WEGO histogram, displaying the sorted P-values of GO terms and indicating which terms differ significantly. WEGO 2.0 also features an entirely new user interface. WEGO is available for free at http://wego.genomics.org.cn.


Year: 2018 PMID: 29788377 PMCID: PMC6030983 DOI: 10.1093/nar/gky400

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Gene Ontology (GO) was started by the GO Consortium in 1998, initially focusing on the genomes of three model organisms: Drosophila melanogaster (fruit fly), Mus musculus (mouse) and Saccharomyces cerevisiae (brewer's or baker's yeast) (1–9). Owing to its unified and well-structured vocabulary, GO was quickly adopted across an array of genome projects (10), transcriptome projects (11–14), proteome projects (15) and more. GO consists of three sub-ontologies: biological process, cellular component and molecular function. These sub-ontologies and the terms within them are organized as a Directed Acyclic Graph (DAG). Because a term can have more than one parent, calculating the number of genes for each GO ID, which underlies gene enrichment statistics and the presentation of gene expression levels, requires careful handling of the DAG structure. Several tools have been created to carry out these analyses, such as agriGO 2.0 (16), BiNGO (17), g:Profiler (18) and GOrilla (19), among others (20,21). WEGO (Web Gene Ontology Annotation Plot) (22) is a tool that focuses on analyzing GO annotations in a comparative manner. Created in 2006, it was quickly adopted by a large number of researchers. In the past 12 years the website has been visited more than 12 636 545 times by users in more than 186 countries and regions (as of the end of 2017). WEGO has been cited more than 1536 times by publications on a wide range of species, from Bryum argenteum (Bryophytes) (23) to Polynoidae (scale worms) (24) and from Gossypium (cotton) (25) to Bombus terrestris (bumblebee) (26). We have also received a great deal of positive feedback and many constructive suggestions from users worldwide. With the rapid development of high-throughput sequencing, genome (10), transcriptome (11–14) and proteome (15) big data have become a major driver of downstream annotation and data analyses. Following this trend, and in response to user feedback, WEGO was updated to version 2.0 in 2018.
It is now better suited to big data, while its original strengths of user friendliness and graphical presentation have been enhanced. Major changes include the ability to upload multiple input files for analysis, the addition of reference datasets for nine species, and an additional output plot showing the GO terms that differ significantly between datasets.
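
Counting genes per GO ID on a DAG is less trivial than on a tree. Under the GO "true path rule", a gene annotated to a term is implicitly annotated to all of that term's ancestors, and because a term may have several parents, naive path-by-path counting would count the same gene more than once. The following is a minimal Python sketch of that propagation step, not WEGO's actual code; the four-term DAG (`GO:A` to `GO:D`) and the gene names are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical toy DAG: each child term maps to its direct parents.
# GO:D has two parents, which is what makes this a DAG rather than a tree.
PARENTS = {
    "GO:B": {"GO:A"},
    "GO:C": {"GO:A"},
    "GO:D": {"GO:B", "GO:C"},
}


def ancestors(term, parents):
    """Return the term itself plus every ancestor reachable through parent edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already reached through another path
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def genes_per_term(annotations, parents):
    """Count distinct genes per GO ID after propagating annotations to ancestors."""
    genes = defaultdict(set)
    for gene, terms in annotations.items():
        for term in terms:
            for t in ancestors(term, parents):
                genes[t].add(gene)  # a set, so each gene counts once per term
    return {t: len(members) for t, members in genes.items()}


demo = {"g1": {"GO:D"}, "g2": {"GO:B"}, "g3": {"GO:C", "GO:D"}}
print(sorted(genes_per_term(demo, PARENTS).items()))
# [('GO:A', 3), ('GO:B', 3), ('GO:C', 2), ('GO:D', 2)]
```

Gene g1 reaches GO:A along two paths but is counted once; storing gene sets rather than running totals is what prevents the double count.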

WEGO INTERFACE

WEGO 2.0 runs on the Tomcat 7.0 application server with a MariaDB 5.5 backend database (MariaDB is a fork of the popular open-source MySQL database system). The entire WEGO 2.0 service was developed in Java and JavaScript, using the NodeJS, Bootstrap 3, jQuery, ECharts 3 and DropzoneJS libraries and frameworks. The External2GO Query and GO Archive Query tools are unchanged from the original version of WEGO and can be found in the ‘Tools’ tab. These tools help translate between GO terms and identifiers from other biological databases, and select the corresponding GO archive versions. If the GO vocabulary version selected in a WEGO analysis differs from the one used during annotation, some outdated GO IDs will not be matched; these are listed under the ‘View error’ option. For example, in the demo data GO:0004785 is an unmatched GO ID listed under ‘View error’. Looking this ID up with GO Archive Query shows that it existed only in GO vocabulary versions before March 2008, which helps users identify the correct GO file version. A WEGO analysis workflow consists of submitting input files, selecting GO terms and editing the output results. In addition to the improvements in user interface and user friendliness, WEGO 2.0 includes several substantial updates, which are explained in detail in the following sections.
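
The ‘View error’ check described above amounts to comparing each annotated GO ID against the set of IDs in the selected GO release. A minimal Python sketch follows, grouping the affected genes under each unmatched ID; the three-ID vocabulary and the gene names are hypothetical, while GO:0004785 is the unmatched ID from the demo data.

```python
from collections import defaultdict


def find_unmatched(annotations, vocabulary_ids):
    """Return {GO ID: sorted genes} for annotated IDs absent from the selected GO release."""
    unmatched = defaultdict(set)
    for gene, go_ids in annotations.items():
        for go_id in go_ids:
            if go_id not in vocabulary_ids:
                unmatched[go_id].add(gene)
    return {go_id: sorted(genes) for go_id, genes in sorted(unmatched.items())}


# Hypothetical vocabulary: a real one would hold every ID in the chosen GO release.
vocabulary = {"GO:0008150", "GO:0003674", "GO:0005575"}
annotations = {
    "geneA": ["GO:0008150", "GO:0004785"],
    "geneB": ["GO:0004785"],
    "geneC": ["GO:0003674"],
}
print(find_unmatched(annotations, vocabulary))
# {'GO:0004785': ['geneA', 'geneB']}
```

Listing the genes behind each unmatched ID is an illustrative choice; it shows how many annotations would be lost by picking the wrong GO release.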

INPUT

Input files are uploaded to WEGO by drag and drop (Figure 1A), and any number of files can be uploaded for a single analysis. WEGO supports the WEGO native format, InterProScan results (XML, TXT, RAW) (27) and the GAF (GO Annotation File) format (1–9). Three optional parameters can be set before submission:

Figure 1.

The WEGO interface. This composite figure demonstrates the use of WEGO. Panel (A) shows the homepage, where input files are uploaded; panel (B) shows the data settings; panels (C) and (D) are the two output plots. (A) Homepage with an example submission; (B) Data Setting, GO tree tab, showing the statistical summary and GO term selections; (C) Graph A, the traditional WEGO histogram, comparing gene numbers and percentages for the selected GO terms; each additional dataset adds another series of bars to the histogram. (D) Graph B, log10 of the P-values obtained from all datasets for the selected GO terms, sorted in descending order, indicating differences between datasets, especially significant ones.

File format: GAF is the GO Consortium's standard format for GO annotation data, so it is the default WEGO input format. WEGO provides a demo analysis in the submission area to help new users familiarize themselves with its operation. Sample input files can be found in the documentation.

Gene Ontology files: Since the GO vocabulary is frequently updated, WEGO lets users select the version that exactly matches the one used for their GO annotations. The default GO file is the latest version.

Reference data: WEGO provides reference data for nine model species: baker's yeast, Caenorhabditis elegans, Escherichia coli, house mouse, human, fruit fly, brown rat, rice and zebrafish (http://www.geneontology.org/page/download-go-annotations). The backend data for these nine species are obtained from the GO Consortium website (1–9). By default, no reference data are selected.

ANALYSIS OUTPUT

A serial number, called a job ID, is generated for every submitted job. It can be entered in the top right corner of the homepage to return to the editing page. The job ID remains valid for 3 months, so users can retrieve their results with it instead of re-analyzing large datasets. A collapsible summary of basic statistics for the input files is shown in tabular form (Figure 1B), listing the number of genes in each of the three sub-ontologies for each sample. GO terms are listed as a hierarchical GO tree, also shown in Figure 1B. Each GO term occupies one row, with the following columns in order: gene number, percentage, P-value, GO term, term description and a hyperlink to the gene list. The three sub-ontologies are listed in separate tabs, making it easy for users to switch between them. Two sets of buttons are provided for displaying and selecting GO terms. These buttons work globally, meaning that all three sub-ontologies are affected. The ‘View Error’ page lists GO terms that are not contained in the GO tree because the GO archive version used for annotation does not match the one selected in WEGO. For each GO term, a Chi-square test is carried out across all datasets, and the resulting P-value indicates whether the samples differ. A difference is considered significant when P < 0.05, in which case a star is added to the GO term. Clicking the ‘Star’ button automatically selects all starred GO terms, making it easier for users to identify GO terms that differ significantly across the input datasets. Both output graphs automatically synchronize with edits to the GO tree. In the ‘Graphs’ tab, graph properties such as size, colors and legends can be edited, as shown in Figure 1. Graphs can be exported in SVG, PNG or JPEG format using the ‘Export’ button.
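
The per-term test can be sketched as a Chi-square test of independence on a 2 × k table: genes annotated versus not annotated to the term, across k datasets. Below is a minimal, dependency-free Python sketch. Using each dataset's total number of annotated genes as the denominator is our assumption, and WEGO's own implementation may differ (for example, in how it handles small expected counts). `scipy.stats.chi2_contingency` computes the same statistic when called with `correction=False`; by default it applies Yates' continuity correction to 2 × 2 tables.

```python
import math


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) of a chi-square distribution with integer dof.

    Uses the regularized upper incomplete gamma recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x / 2,
    stepping a up to dof / 2.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if dof % 2:  # odd dof: start from Q(1/2, y)
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:  # even dof: start from Q(1, y)
        a, q = 1.0, math.exp(-y)
    while a < dof / 2:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def go_term_chi2(counts, totals):
    """Chi-square test of independence for one GO term across k >= 2 datasets.

    counts[j]: genes in dataset j annotated to the term.
    totals[j]: all annotated genes in dataset j (assumed denominator).
    Returns (statistic, degrees of freedom, P-value).
    """
    if len(counts) < 2 or len(counts) != len(totals):
        raise ValueError("need matching counts/totals for at least two datasets")
    table = [list(counts), [t - c for c, t in zip(counts, totals)]]
    row_sums = [sum(row) for row in table]
    grand = sum(totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * totals[j] / grand
            if expected > 0:  # an all-zero row contributes nothing
                stat += (observed - expected) ** 2 / expected
    dof = len(counts) - 1
    return stat, dof, chi2_sf(stat, dof)


stat, dof, p = go_term_chi2([50, 20, 35], [100, 100, 100])
print(f"chi2 = {stat:.2f}, df = {dof}, P = {p:.2e}, starred = {p < 0.05}")
```

A single test over all k datasets yields one P-value per term, which is what the star marking and Graph B rely on.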

VISUALIZATION OF OUTPUT

Figure 1C and D show an example of the two types of graphs produced by WEGO. Graph A is the traditional WEGO histogram retained from the previous version. The x-axis displays the GO terms selected in the GO trees. The right y-axis shows the gene numbers for the selected GO terms, while the left y-axis shows the percentages. The y-axis can be either linear or log-scaled; a log scale is recommended when gene numbers differ greatly. Graph B is new in the 2018 update. Its y-axis shows the user-selected GO terms and its x-axis shows the log10 of the P-values from the Chi-square tests across all samples. The Chi-square test of independence determines whether the observed frequencies of genes annotated with each GO term differ significantly from their expected frequencies. When P < 0.05, the proportion of genes annotated with that GO term is concluded to differ between samples. Both graphs can be exported with the ‘Export’ button in their top right corners. WEGO supports SVG output because it is a vector format that stays sharp at any scale; PNG and JPEG formats are also supported. The graphs show only the data selected in the ‘GO Tree’ tab, and if the selection changes, both graphs update automatically.
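
For a single GO term compared across k datasets, the Chi-square test of independence described above can be written out on a 2 × k contingency table. The notation is ours rather than the paper's, and we assume that dataset j contributes N_j annotated genes in total, of which n_j are annotated to the term:

```latex
\begin{aligned}
O_{1j} &= n_j, \qquad O_{2j} = N_j - n_j, \qquad j = 1, \dots, k,\\
E_{ij} &= \frac{R_i \, N_j}{N}, \qquad R_1 = \sum_{j=1}^{k} n_j, \quad R_2 = \sum_{j=1}^{k} (N_j - n_j), \quad N = \sum_{j=1}^{k} N_j,\\
\chi^2 &= \sum_{i=1}^{2} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad \mathrm{df} = (2 - 1)(k - 1) = k - 1.
\end{aligned}
```

The P-value is the upper-tail probability of a χ² distribution with k − 1 degrees of freedom. One such test per GO term replaces the k(k − 1)/2 tests that pairwise comparison of k datasets would require.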

UPDATES IN 2018

To improve the user-friendliness of WEGO and to keep up with the big data era, WEGO has been updated to version 2.0. The following three updates substantially improve its functions and usability.

First, WEGO now supports an unlimited number of input files, whereas the previous version was restricted to three. As high-throughput sequencing becomes cheaper and easier, it is now common to analyze 8–10 files at the same time (28–33), so this change is widely applicable. Moreover, the Chi-square test is now applied to all datasets together (instead of to every pair of datasets), so only one P-value is calculated for each GO term.

Second, WEGO 2.0 provides the genomic annotations of nine species as reference data, which can be used as baselines in comparative genomic analyses. The data are obtained from the GO annotations on the Gene Ontology Consortium website, which provides comprehensive and non-redundant annotation files for each organism (1–9).

Third, WEGO 2.0 adds a bar graph (Figure 1D) that shows the GO terms of the user's interest in descending order of significance. The horizontal axis shows the log10 of the P-values, which are calculated from Chi-square tests of the gene numbers of each GO term across all datasets. Graph B therefore helps identify and visualize the GO terms that differ most significantly across all datasets.

Besides these three updates, WEGO 2.0 includes several minor improvements, such as a brief tabular summary of the statistics of the input datasets and an entirely new interface. To improve efficiency and user experience, the analysis workflow has been reduced from four steps to two. WEGO now also uses modern user interface technologies, such as web-based drag-and-drop file uploading, interactive chart editing and faster switching between GO trees.
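
Graph B's ordering and the star threshold can be sketched in a few lines of Python, assuming one P-value per GO term has already been computed. The term-to-P-value mapping below is made up for illustration, and clamping P = 0 before taking the logarithm is our own choice rather than documented WEGO behaviour.

```python
import math
import sys


def graph_b_rows(p_values, alpha=0.05):
    """Prepare rows for a Graph B-style bar chart.

    p_values: mapping of GO term -> P-value from the multi-dataset Chi-square test.
    Returns (term, log10 P, starred) tuples ordered from most to least significant.
    """
    rows = []
    for term, p in p_values.items():
        # Clamp P = 0 (possible through floating-point underflow) so log10 stays finite.
        log_p = math.log10(max(p, sys.float_info.min))
        rows.append((term, log_p, p < alpha))
    # A smaller P gives a more negative log10 P, so ascending order puts the
    # most significant terms first.
    rows.sort(key=lambda row: row[1])
    return rows


example = {"GO:0008150": 0.2, "GO:0003674": 1e-6, "GO:0005575": 0.03}
for term, log_p, starred in graph_b_rows(example):
    print(f"{term}\t{log_p:.2f}\t{'*' if starred else ''}")
```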

PROSPECTIVE DISCUSSION

The large number of WEGO visitors and citations over the past 12 years exceeded the authors' expectations. We fully acknowledge that WEGO's wide acceptance did not stand on its own: it is closely tied to the extensive use of the GO vocabulary (1–9). We conclude that the most important features of a tool, especially a web-based tool, are a clear and user-friendly interface, ease of use, and constant maintenance and improvement, rather than complex backend statistical methods and development techniques. To keep up with developments in genomics, WEGO now accepts any number of input files and provides enhanced visual presentations, including the visualization of significantly different GO terms. Further functions, such as support for other well-structured annotation results (e.g. KEGG) (34,35), are likely to be developed in the future. Maintaining WEGO will remain our most important task, so feedback from users is greatly appreciated. The authors sincerely welcome feedback through the contact page (http://wego.genomics.org.cn/contact).

DATA AVAILABILITY

WEGO 2.0 is available at http://wego.genomics.org.cn. This website is free and open to all users, and no login is required.

REFERENCES (35 in total)

1. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors: M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal: Nat Genet Date: 2000-05 Impact factor: 38.330

2. The Gene Ontology (GO) database and informatics resource.

Authors: M A Harris; J Clark; A Ireland; J Lomax; M Ashburner; R Foulger; K Eilbeck; S Lewis; B Marshall; C Mungall; J Richter; G M Rubin; J A Blake; C Bult; M Dolan; H Drabkin; J T Eppig; D P Hill; L Ni; M Ringwald; R Balakrishnan; J M Cherry; K R Christie; M C Costanzo; S S Dwight; S Engel; D G Fisk; J E Hirschman; E L Hong; R S Nash; A Sethuraman; C L Theesfeld; D Botstein; K Dolinski; B Feierbach; T Berardini; S Mundodi; S Y Rhee; R Apweiler; D Barrell; E Camon; E Dimmer; V Lee; R Chisholm; P Gaudet; W Kibbe; R Kishore; E M Schwarz; P Sternberg; M Gwinn; L Hannick; J Wortman; M Berriman; V Wood; N de la Cruz; P Tonellato; P Jaiswal; T Seigfried; R White
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

3. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003.

Authors: Brigitte Boeckmann; Amos Bairoch; Rolf Apweiler; Marie-Claude Blatter; Anne Estreicher; Elisabeth Gasteiger; Maria J Martin; Karine Michoud; Claire O'Donovan; Isabelle Phan; Sandrine Pilbout; Michel Schneider
Journal: Nucleic Acids Res Date: 2003-01-01 Impact factor: 16.971

4. Gene Ontology annotations and resources.

Authors: J A Blake; M Dolan; H Drabkin; D P Hill; Ni Li; D Sitnikov; S Bridges; S Burgess; T Buza; F McCarthy; D Peddinti; L Pillai; S Carbon; H Dietze; A Ireland; S E Lewis; C J Mungall; P Gaudet; R L Chrisholm; P Fey; W A Kibbe; S Basu; D A Siegele; B K McIntosh; D P Renfro; A E Zweifel; J C Hu; N H Brown; S Tweedie; Y Alam-Faruque; R Apweiler; A Auchinchloss; K Axelsen; B Bely; M -C Blatter; C Bonilla; L Bouguerleret; E Boutet; L Breuza; A Bridge; W M Chan; G Chavali; E Coudert; E Dimmer; A Estreicher; L Famiglietti; M Feuermann; A Gos; N Gruaz-Gumowski; R Hieta; C Hinz; C Hulo; R Huntley; J James; F Jungo; G Keller; K Laiho; D Legge; P Lemercier; D Lieberherr; M Magrane; M J Martin; P Masson; P Mutowo-Muellenet; C O'Donovan; I Pedruzzi; K Pichler; D Poggioli; P Porras Millán; S Poux; C Rivoire; B Roechert; T Sawford; M Schneider; A Stutz; S Sundaram; M Tognolli; I Xenarios; R Foulgar; J Lomax; P Roncaglia; V K Khodiyar; R C Lovering; P J Talmud; M Chibucos; M Gwinn Giglio; H -Y Chang; S Hunter; C McAnulla; A Mitchell; A Sangrador; R Stephan; M A Harris; S G Oliver; K Rutherford; V Wood; J Bahler; A Lock; P J Kersey; D M McDowall; D M Staines; M Dwinell; M Shimoyama; S Laulederkind; T Hayman; S -J Wang; V Petri; T Lowry; P D'Eustachio; L Matthews; R Balakrishnan; G Binkley; J M Cherry; M C Costanzo; S S Dwight; S R Engel; D G Fisk; B C Hitz; E L Hong; K Karra; S R Miyasato; R S Nash; J Park; M S Skrzypek; S Weng; E D Wong; T Z Berardini; E Huala; H Mi; P D Thomas; J Chan; R Kishore; P Sternberg; K Van Auken; D Howe; M Westerfield
Journal: Nucleic Acids Res Date: 2012-11-17 Impact factor: 16.971

5. REVIGO summarizes and visualizes long lists of gene ontology terms.

Authors: Fran Supek; Matko Bošnjak; Nives Škunca; Tomislav Šmuc
Journal: PLoS One Date: 2011-07-18 Impact factor: 3.240

6. RNA Sequencing Reveals that Endoplasmic Reticulum Stress and Disruption of Membrane Integrity Underlie Dimethyl Trisulfide Toxicity against Fusarium oxysporum f. sp. cubense Tropical Race 4.

Authors: Cunwu Zuo; Weina Zhang; Zhongjian Chen; Baihong Chen; Yonghong Huang
Journal: Front Microbiol Date: 2017-07-24 Impact factor: 5.640

7. Sex and tissue specific gene expression patterns identified following de novo transcriptomic analysis of the Norway lobster, Nephrops norvegicus.

Authors: Guiomar Rotllant; Tuan Viet Nguyen; Valerio Sbragaglia; Lifat Rahi; Kevin J Dudley; David Hurwood; Tomer Ventura; Joan B Company; Vincent Chand; Jacopo Aguzzi; Peter B Mather
Journal: BMC Genomics Date: 2017-08-16 Impact factor: 3.969

8. The Gene Ontology in 2010: extensions and refinements.

Authors: Gene Ontology Consortium
Journal: Nucleic Acids Res Date: 2009-11-17 Impact factor: 16.971

9. The Gene Ontology project in 2008.

Authors: Gene Ontology Consortium
Journal: Nucleic Acids Res Date: 2007-11-04 Impact factor: 16.971

10. De novo transcriptome sequencing of Isaria cateniannulata and comparative analysis of gene expression in response to heat and cold stresses.

Authors: Dingfeng Wang; Liangde Li; Guangyuan Wu; Liette Vasseur; Guang Yang; Pengrong Huang
Journal: PLoS One Date: 2017-10-12 Impact factor: 3.240
