GONUTS: the Gene Ontology Normal Usage Tracking System.

Daniel P Renfro, Brenley K McIntosh, Anand Venkatraman, Deborah A Siegele, James C Hu.

Abstract

The Gene Ontology Normal Usage Tracking System (GONUTS) is a community-based browser and usage guide for Gene Ontology (GO) terms and a community system for general GO annotation of proteins. GONUTS uses wiki technology to allow registered users to share and edit notes on the use of each term in GO, and to contribute annotations for specific genes of interest. By providing a site for generation of third-party documentation at the granularity of individual terms, GONUTS complements the official documentation of the Gene Ontology Consortium. To provide examples for community users, GONUTS displays the complete GO annotations from seven model organisms: Saccharomyces cerevisiae, Dictyostelium discoideum, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Mus musculus and Arabidopsis thaliana. To support community annotation, GONUTS allows automated creation of gene pages for gene products in UniProt. GONUTS will improve the consistency of annotation efforts across genome projects, and should be useful in training new annotators and consumers in the production of GO annotations and the use of GO terms. GONUTS can be accessed at http://gowiki.tamu.edu. The source code for generating the content of GONUTS is available upon request.

Year:  2011        PMID: 22110029      PMCID: PMC3245169          DOI: 10.1093/nar/gkr907

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


GENE ONTOLOGY AND THE NEED FOR A GRANULAR USAGE GUIDE

The Gene Ontology has become a standard for the consistent functional annotation of genes and gene products across all organisms (1–3). Well-established model organism databases (MODs) devote considerable resources to providing high-quality GO annotation based on manual curation of the experimental literature. Genes from other organisms are often annotated by inference from homology to genes in one of the established model organisms. Two situations inspired us to build a general system to support community GO annotation. First, several well-established bacterial systems with extensive experimental literature (e.g. Escherichia coli and Bacillus subtilis) were not well covered by GO annotations based on direct experimental evidence. Second, the wealth of new complete genome sequences includes many organisms with a significant experimental literature whose research communities are too small to support a model organism database. In some of these cases, expertise in the experimental corpus is held by a small number of researchers focusing on specific biological problems. These experts could provide the most scientifically accurate and rich GO annotation, as well as useful input about the structure of the ontology itself, provided they familiarize themselves with the Gene Ontology. Typically, however, these experts are interested in only a small number of genes and have little incentive to learn the intricacies of GO. Conversely, researchers who are intimately familiar with GO could create many useful annotations, but typically lack the expertise to annotate optimally across the wide variety of subject areas in the scientific literature. Either professional curators need to learn many new areas of biology, or experts need to become sufficiently acquainted with GO to provide quality annotations via community curation.
Training large numbers of community members in GO annotation would be aided by more detailed documentation of GO best practices and pitfalls. The Gene Ontology is structured as a directed acyclic graph (DAG), so documentation can be applied to an entire branch or, preferably, to individual terms. Building such detailed documentation is a Herculean task for any small group, such as the GO consortium. However, we reasoned that large-scale documentation could be built incrementally through community collaboration, especially if community members record the nuances they encounter while learning the annotation process. Wiki-based systems allow false starts and errors made during learning to be revised and corrected. A wiki therefore provides an exceptional environment for this type of self-documenting system, along with valuable features such as revision history and a dedicated page for discussion. To capture detailed usage notes for GO terms and to aid the annotation of specific genes by research experts, we constructed the Gene Ontology Normal Usage Tracking System (GONUTS), a wiki-based GO browser and community annotation system (http://gowiki.tamu.edu). GONUTS currently supports two main kinds of wiki-based content: (i) pages that capture community usage notes for specific GO terms and (ii) gene pages displaying editable GO annotations for specific genes.

GONUTS AS A GENE ONTOLOGY TERM BROWSER

Figure 1 shows an example of a GONUTS page for a GO term. GO terms are represented by MediaWiki Category pages, which, like biological ontologies, can be organized as DAGs: a category can belong to several parent categories as well as contain subcategories. Each page contains information about the GO term taken from the publicly downloadable ontology files at http://geneontology.org. A set of PHP scripts updates these GO term pages once per week, as described in the Supplementary Methods. MediaWiki automatically generates links to the child terms as subcategories and links to associated genes as category members. We modified the standard MediaWiki category-page display with a set of extensions that provide AJAX-based expansion and contraction, allowing users to view the descendants of each child term. For terms with a large number of members, users can also filter the associated genes. Each page also includes a link to the term in AmiGO, the official web-based term browser of the GO consortium. GONUTS term pages also include a graphical representation of that ontology node from the EMBL-EBI (4); clicking on the graphic takes the user to the EBI's QuickGO browser (4,5). The key component of each GO term page is the area for user-editable usage notes. MediaWiki pages have associated ‘Talk’ pages, which provide a place for discussion, commentary and questions for other users. In GONUTS, the ‘Talk’ pages for GO terms are seeded with links to the GO consortium's SourceForge tracker, which allows users to find related prior discussions.
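Because a GO term can have more than one parent, the category hierarchy is a DAG rather than a tree, and expanding a term to show all of its descendants must avoid listing a shared descendant twice. The Python sketch below illustrates that traversal on a hypothetical toy ontology; it is not the GONUTS AJAX extension, and the `children` mapping (term to direct child terms, i.e. its subcategories) is an assumed input.

```python
from collections import deque


def descendants(children: dict[str, list[str]], term: str) -> list[str]:
    """Return every descendant of `term` exactly once, in breadth-first order.

    `children` maps a term to its direct child terms. In a DAG a term can be
    reached through several parents, so a `seen` set prevents duplicates.
    """
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(children.get(term, []))
    while queue:
        child = queue.popleft()
        if child in seen:
            continue  # already reached through another parent
        seen.add(child)
        order.append(child)
        queue.extend(children.get(child, []))
    return order


# Hypothetical toy ontology: "d" has two parents ("b" and "c"),
# which is what makes the structure a DAG rather than a tree.
toy_children = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
}

print(descendants(toy_children, "a"))  # ['b', 'c', 'd', 'e']
```

Breadth-first order keeps nearer descendants first, which roughly matches expanding one level of a category tree at a time.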
Figure 1.

A typical GO term page in GONUTS. (A) Information about the GO term derived from the GO consortium's ontology files, including id, definition, relationships and parent terms. (B) An ontology graph from the EBI. (C) User-edited notes section; in this example, a user has embedded an uploaded diagram explaining the term in more detail. (D) Child terms. (E) Genes annotated to this term. The page also contains all the typical elements of a MediaWiki page, including a sidebar for site navigation and tabs along the top for various actions.

Users can search for GO terms, or for genes with GO annotations, using GO id numbers or keywords in the standard MediaWiki search system. Although MediaWiki search does not currently support wildcards, it finds complete words in any order. Thus, terms related to cAMP-dependent protein kinases can be found using either ‘cAMP protein kinase’ or ‘kinase protein cAMP’. To address the shortcomings of the native search, GONUTS can also be searched using Google.
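The search behaviour described above (complete words, any order, no wildcards) can be modelled as a subset test on word sets. This Python sketch only models that observable behaviour, under the assumption that words are runs of letters and digits; it is not how MediaWiki's search backend is implemented, and the example title is illustrative.

```python
import re


def words(text: str) -> set[str]:
    """Lower-cased whole words; punctuation such as '-' separates words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def matches(query: str, title: str) -> bool:
    """True if every query word appears as a complete word in the title,
    in any order. There are no wildcards, so 'kinas' does not match 'kinase'."""
    return words(query) <= words(title)


title = "cAMP-dependent protein kinase complex"  # example title
print(matches("cAMP protein kinase", title))  # True
print(matches("kinase protein cAMP", title))  # True
print(matches("kinas", title))                # False
```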

USER-EDITABLE NOTES FOR EVERY GO TERM

What sets GONUTS apart from other ontology browsers is the editable notes section on every term page. Browsing and searching for ontology terms or annotated genes is unrestricted, but registration is required to edit the notes. This is mainly to deter vandalism; ‘wiki spam’ is a serious problem on many open wikis (6–9). We use a ‘vampire model’ for user registration on GONUTS: any registered user can create additional registered users without applying to the central resource for permission, and these new users can in turn register their colleagues. This method of user creation removes the dependence on administrators and allows the community to grow dynamically. Clicking the edit link to the right of the ‘Notes’ heading takes a logged-in user to a MediaWiki text entry page. Although editing the notes is not ‘What You See Is What You Get’ (WYSIWYG), the standard MediaWiki markup is easy to learn and will be familiar to users of Wikipedia and many other wikis. Help pages about editing are available on GONUTS, on Wikipedia and across the web. An important aspect of all wiki-based systems is that users do not need to learn any markup in order to contribute or edit content. Because wiki editing is collaborative, information can be added in plain text by one user and then formatted by another who is more familiar with the markup. Moreover, even incomplete comments are potentially useful starting points for elaboration by other users. To keep the GO term pages easy to use, we have made a few modifications that make editing differ from editing on Wikipedia. For example, we have simplified the handling of references, which are important to the credibility of any community-edited text. A link is provided to insert a citation marker that accepts a PubMed ID number and retrieves the relevant reference information from PubMed's E-utilities (10). These citations are then collected in a references section that includes links to wiki pages for each publication.
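A citation marker of this kind needs only the PubMed ID to request a summary record from NCBI E-utilities. The Python sketch below builds an ESummary request for one PMID and, separately, fetches the JSON summary; the helper names are hypothetical, and the GONUTS extension itself may request a different format or fields. NCBI's usage guidelines also ask clients to identify themselves (the `tool` and `email` parameters) and to respect rate limits, which this sketch omits.

```python
import json
import re
from urllib.parse import urlencode
from urllib.request import urlopen

ESUMMARY = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def esummary_url(pmid: str) -> str:
    """Build an ESummary URL asking for one PubMed record as JSON."""
    if not re.fullmatch(r"[0-9]+", pmid):
        raise ValueError(f"not a PubMed ID: {pmid!r}")
    query = urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
    return f"{ESUMMARY}?{query}"


def fetch_summary(pmid: str) -> dict:
    """Fetch the summary record for `pmid` (requires network access).

    In the JSON response the record sits under result[<pmid>] and carries
    fields such as 'title', 'source' (journal) and 'pubdate'.
    """
    with urlopen(esummary_url(pmid), timeout=10) as resp:
        data = json.load(resp)
    return data["result"][pmid]


print(esummary_url("22110029"))
```

Validating the PMID before building the URL keeps malformed input (for example a pasted `PMID:` prefix) from producing a confusing upstream error.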

GENE PAGES

When we first conceived GONUTS, we wanted examples of high-quality annotations using specific GO terms that community curators could use as models for their own annotation. For this purpose, we included all of the GO annotations from a set of well-curated model organisms from the GO consortium's Reference Genome project (11,12). Gene pages generated for the reference genomes include a link to a details page in the submitting model organism database and lists of all the GO annotations associated with that gene. The references field is converted into a citation, and a references section lists the references with links to PubMed and the citing database, as appropriate. As with the GO term pages, a Notes section is provided for user commentary. Notes on how the source MOD annotated a gene can inform the annotation of similar genes in other organisms. Although we initially envisioned GONUTS as an aid to annotation at other sites, such as EcoliWiki, we realized that there is demand for community annotation of gene function for many genomes that do not have well-established model organism databases (13). To meet this need, we implemented an automated gene-page creator. Clicking on ‘Create new gene page’ in GONUTS takes the user to a form where the desired gene can be specified by any accession that UniProt recognizes (UniProt id, EMBL id, PDB id, etc.). Submitting the form launches a script that creates a gene page for the protein of interest and preloads it with various types of information. At present we do not support user-created pages for genes encoding RNA products; this is planned for a later version. Figure 2 shows one of these user-created gene pages. The top section of the page is constructed from gene names, synonyms and accessions from UniProt (14,15). GO annotations from UniProt are automatically placed in a table in the lower section of the gene page. This table is editable using our TableEdit MediaWiki extension.
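A gene-page creator of this kind can be thought of as filling a wikitext template from a protein record. The Python sketch below assumes the UniProt data have already been retrieved and reduced to a small hypothetical `ProteinRecord` (accession, gene name, synonyms, and GO id plus evidence code pairs); it renders a plain MediaWiki table rather than the TableEdit format, and the page layout and example values are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinRecord:
    """Hypothetical subset of a UniProt entry used to seed a gene page."""
    accession: str
    gene_name: str
    synonyms: list[str] = field(default_factory=list)
    # (GO id, evidence code) pairs
    go_annotations: list[tuple[str, str]] = field(default_factory=list)


def gene_page_wikitext(rec: ProteinRecord) -> str:
    """Render a minimal gene page: header facts, then a GO annotation table."""
    synonyms = ", ".join(rec.synonyms) if rec.synonyms else "none"
    lines = [
        f"== {rec.gene_name} ==",
        f"* UniProt accession: {rec.accession}",
        f"* Synonyms: {synonyms}",
        "",
        "== GO annotations ==",
        '{| class="wikitable"',
        "! GO term !! Evidence",
    ]
    for go_id, evidence in rec.go_annotations:
        lines += ["|-", f"| {go_id} || {evidence}"]  # one table row per annotation
    lines.append("|}")
    return "\n".join(lines)


rec = ProteinRecord("P00000", "geneX", ["gX"], [("GO:0003677", "IEA")])  # hypothetical values
print(gene_page_wikitext(rec))
```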
Figure 2.

A user-created GONUTS page for E. coli ihfB generated from its UniProt record.

Community curation can involve providing information that supports or refutes existing annotations, or entering new GO annotations. In the former case, contributing can be as simple as adding a NOT qualifier, a PubMed ID to a table row, or text to the comments field. When adding new annotations, we expect that the user will have GONUTS, AmiGO or their favorite ontology browser open in another web browser window to find appropriate GO terms. The user need only know the GO id (the unique seven-digit number) when adding an annotation, as the TableEdit extension fills in the corresponding term name and ontology code. Pulldown menus allow the user to choose qualifiers and evidence codes, and a custom extension recognizes when with/from identifiers are needed. References entered as PMID:number are automatically converted to links and endnotes, with links to a literature page where users can document how the data in the linked paper support each relevant GO annotation. Each gene page is automatically added to the appropriate Category for each annotated GO term. Each gene page is also categorized by the taxonomy of the source organism using NCBI taxonomy, which allows searching for genes by a GO term id and a taxonomy node. Because gene pages are categorized under all parent nodes in the taxonomy tree, searches can be done at levels above species; this should be useful for comparing, for example, zebrafish and mouse annotations. A set of checkboxes controls which gene products are displayed; users can select any subset based on the source organism.
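The category scheme above can be illustrated with a small model: each gene page carries one category per annotated GO term plus one per node in its taxonomic lineage, and a combined search is a set test. The Python sketch assumes a hypothetical, heavily simplified lineage (real NCBI lineages have many more nodes), hypothetical page names and an arbitrary example GO id; it shows why categorizing pages under every ancestor taxon lets a query at a higher node find both zebrafish and mouse genes.

```python
import re

# Hypothetical, heavily simplified lineage (child -> parent); the real NCBI
# taxonomy has many more intermediate nodes.
PARENT = {
    "Danio rerio": "Danio",
    "Danio": "Euteleostomi",
    "Mus musculus": "Mus",
    "Mus": "Euteleostomi",
    "Euteleostomi": None,
}


def normalize_go_id(raw: str) -> str:
    """Accept 'GO:0003677' or the bare seven-digit '0003677'."""
    digits = raw.strip().upper().removeprefix("GO:")
    if not re.fullmatch(r"[0-9]{7}", digits):
        raise ValueError(f"not a GO id: {raw!r}")
    return "GO:" + digits


def lineage(taxon: str) -> list[str]:
    """The taxon itself followed by every ancestor up to the root."""
    nodes = []
    node = taxon
    while node is not None:
        nodes.append(node)
        node = PARENT[node]
    return nodes


def page_categories(go_ids: list[str], taxon: str) -> set[str]:
    """One category per annotated GO term plus one per taxonomy node."""
    return {normalize_go_id(g) for g in go_ids} | set(lineage(taxon))


def search(pages: dict[str, set[str]], go_id: str, taxon: str) -> list[str]:
    """Pages that are in both the GO term category and the taxonomy category."""
    wanted = {normalize_go_id(go_id), taxon}
    return sorted(name for name, cats in pages.items() if wanted <= cats)


pages = {
    "zebrafish_gene": page_categories(["0003677"], "Danio rerio"),
    "mouse_gene": page_categories(["GO:0003677"], "Mus musculus"),
}
print(search(pages, "GO:0003677", "Euteleostomi"))  # ['mouse_gene', 'zebrafish_gene']
print(search(pages, "GO:0003677", "Mus"))           # ['mouse_gene']
```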

GONUTS IS_EDITED WEB SERVICE

GONUTS contains thousands of pages with content seeded from other resources. Pages where community members have added or edited content are currently a small subset of these. GONUTS provides a REST web service to allow other sites to display whether or not a GO term or gene page includes contributions from the community. The service, which can be accessed at http://gowiki.tamu.edu/rest/is_edited.php, takes input parameters via HTTP GET that specify either a GO term identifier or a query string specifying a set of wiki pages. The service returns the number of times the page has been revised by a registered user since it was created and the number of times it was edited after an optional date parameter. Further documentation of the web service is at http://gowiki.tamu.edu/wiki/index.php/Help:Web_Services. This service allows other web resources to incorporate intelligent links to GONUTS that display whether or not preexisting community annotation is present at the linked page. For example, AmiGO uses the web service to indicate whether user comments are present for GO terms (16). By using appropriate SQL wildcards in the page query, a list of all user-edited pages relating to a particular organism can be returned.
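The two counts returned by the service can be illustrated with a single grouped query. The Python sketch below uses an in-memory SQLite table with a hypothetical, simplified schema (not MediaWiki's actual revision table), a hypothetical convention that `user_id` 0 marks automated seeding, and hypothetical page names; the service's real parameter names are documented on the Help:Web_Services page. It also shows how a LIKE wildcard (`%`) in the page query selects all pages for one organism.

```python
import sqlite3
from typing import Optional


def make_demo_db() -> sqlite3.Connection:
    """In-memory table with a hypothetical, simplified revision schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE revision ("
        " page_title TEXT NOT NULL,"
        " user_id INTEGER NOT NULL,"  # hypothetical: 0 marks automated seeding
        " rev_date TEXT NOT NULL)"    # ISO dates compare correctly as text
    )
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?, ?)",
        [
            ("ECOLI:geneA", 0, "2011-01-01"),  # page created by the seeding bot
            ("ECOLI:geneA", 7, "2011-03-15"),
            ("ECOLI:geneA", 9, "2011-09-02"),
            ("ECOLI:geneB", 0, "2011-01-01"),  # seeded, never edited
            ("YEAST:geneC", 4, "2011-05-05"),
        ],
    )
    return conn


def is_edited(conn: sqlite3.Connection, page_pattern: str,
              since: Optional[str] = None) -> dict[str, tuple[int, Optional[int]]]:
    """For each page matching the LIKE pattern, return (edits by registered
    users, edits by registered users after `since`, or None if no date)."""
    rows = conn.execute(
        """
        SELECT page_title,
               SUM(user_id > 0),
               SUM(user_id > 0 AND rev_date > ?)
        FROM revision
        WHERE page_title LIKE ?
        GROUP BY page_title
        ORDER BY page_title
        """,
        (since or "", page_pattern),
    ).fetchall()
    return {
        title: (edits, recent if since is not None else None)
        for title, edits, recent in rows
    }


conn = make_demo_db()
result = is_edited(conn, "ECOLI:%", since="2011-06-01")
print(result)  # {'ECOLI:geneA': (2, 1), 'ECOLI:geneB': (0, 0)}
print([title for title, (edits, _) in result.items() if edits > 0])  # ['ECOLI:geneA']
```

In SQL LIKE patterns `_` also matches any single character, so page names containing underscores may need escaping in a real query.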

ANNOTATION JAMBOREES WITH GONUTS

One of the strengths of wikis is that they can easily be adapted to uses beyond their original design. In the summer of 2008, GONUTS began hosting a series of electronic annotation jamborees for the RefGenome project (12) (see http://gowiki.tamu.edu/wiki/index.php/Electronic_Jamboree). The purpose of these jamborees was to compare the annotation practices of different groups and to develop stronger guidelines for annotation best practices. Genes targeted for annotation from all species are added to the jamboree's page (a MediaWiki category page). A custom extension aggregates all of the annotations into a table, and a web service from the Mouse Genome Informatics database at the Jackson Laboratory generates an image comparing the annotations on a graphical representation of the ontologies (Figure 3) (17). Using these tools, annotators can see what others have done and compare annotations across both the phylogenetic tree and the Gene Ontology graph.
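Aggregating a jamboree category's annotations into a comparison table amounts to grouping annotation rows by GO term and species. This Python sketch assumes the annotations are already available as (species, gene, GO term) tuples; gene and term labels are placeholders, and the layout is not that of the actual GONUTS extension.

```python
from collections import defaultdict


def comparison_matrix(rows: list[tuple[str, str, str]],
                      species_order: list[str]) -> list[list[str]]:
    """rows: (species, gene, go_term) tuples from the genes in one category.

    Returns a header row plus one row per GO term; each cell lists the genes
    of that species annotated to the term, or '-' if there are none.
    """
    cells = defaultdict(lambda: defaultdict(set))  # term -> species -> genes
    for species, gene, go_term in rows:
        cells[go_term][species].add(gene)
    matrix = [["GO term", *species_order]]
    for go_term in sorted(cells):
        by_species = cells[go_term]
        matrix.append(
            [go_term] + [", ".join(sorted(by_species[sp])) or "-" for sp in species_order]
        )
    return matrix


rows = [  # placeholder annotations
    ("human", "geneA", "term-1"),
    ("mouse", "geneA_mouse", "term-1"),
    ("human", "geneA", "term-2"),
    ("zebrafish", "geneZ", "term-2"),
]
for row in comparison_matrix(rows, ["human", "mouse", "zebrafish"]):
    print("\t".join(row))
```

A '-' cell makes gaps visible at a glance: a term annotated in one species but not in its orthologs is exactly what a jamboree sets out to discuss.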
Figure 3.

Screenshots of a user-created annotation jamboree category page for human SMAD signaling proteins, started as part of the cardiovascular GO annotation initiative. Genes of interest are placed in the jamboree category. An extension generates a sortable and filterable table (A) and diagrams showing the annotations made to genes in this category (B). The table is truncated in this figure and only one of the three diagrams is shown. The full page can be viewed at http://gowiki.tamu.edu/wiki/index.php/Category:SMAD_signaling.

USAGE OF GONUTS

Use of the GONUTS website, as measured by Google Analytics, has grown dramatically since its launch in 2007. GONUTS had approximately 4000 unique visitors in September 2011, compared with just 1200 in September 2010. Although this is well below the usage of EcoliWiki, which had over 28 000 unique visitors during the same period, GONUTS is much more heavily edited than EcoliWiki. Since the two wikis launched, non-staff users have contributed only about 1500 page revisions to EcoliWiki; in contrast, non-staff GONUTS users have contributed almost 9700 revisions. Bot edits are excluded from both counts. Figure 4 shows a histogram of editing activity by GONUTS users.
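The figures above can be put on a common footing with simple ratios. Note that the visitor counts cover a single month while the revision counts are cumulative since launch, so each ratio compares like with like only within its own line and the lines should not be combined into a per-visitor rate.

```latex
\begin{aligned}
\text{GONUTS visitors, Sept.\ 2011 vs.\ Sept.\ 2010:}\quad & \tfrac{4000}{1200} \approx 3.3\\
\text{EcoliWiki vs.\ GONUTS visitors, same period:}\quad & \tfrac{28\,000}{4000} \gtrsim 7\\
\text{GONUTS vs.\ EcoliWiki non-staff revisions:}\quad & \tfrac{9700}{1500} \approx 6.5
\end{aligned}
```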
Figure 4.

Distribution of user contributions to GONUTS. Contributed revisions are grouped in bins of 10. Users with no edits are not shown.

DISCUSSION

Manually curated, literature-based annotations using controlled vocabularies such as the Gene Ontology are the gold standard for describing gene function. However, the need for human curation of papers creates a serious bottleneck in annotation. GONUTS was created to help spread the annotation effort across a larger community of biologists by providing both detailed documentation for GO terms and infrastructure for unofficial annotation and reannotation of any protein in UniProt (15). Although we ultimately envision GONUTS as a means of supporting broad community annotation, we have been pleased that it has turned out to be useful even for professional curators in the GO consortium, who have access to a wide variety of alternative tools. A basic feature of wikis is the flexibility with which users can not only modify content on existing pages, but also create and organize pages in ways that the site owners did not anticipate. We were therefore especially pleased to find that users had created categories to organize focused annotation efforts. The collaborative nature of wikis is another fundamental part of GONUTS, and over the past few years this feature has led several other groups to create wiki-based systems for biology (18–21). The free-text nature of basic wiki software limits the usefulness of user-created content. The custom tables, interfaces and markup tools in GONUTS gently guide users to create content that can be mined more easily than basic wikitext. Our approach could potentially be merged with existing Semantic MediaWiki tools used by others to integrate with the larger semantic web. Even without these tools, however, we can already easily create web services and data dumps for usage notes and annotations. GONUTS also differs from most biology wikis in its extensive use of categories to reflect the DAG structure of the knowledge encoded in the Gene Ontology.
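As an illustration of an annotation data dump, the sketch below serializes annotations in the tab-separated GO Annotation File (GAF) 2.0 layout, which has 17 columns and begins with a `!gaf-version: 2.0` header line. The source does not say which format GONUTS dumps use, so the choice of GAF is an assumption, as are all example values (the accession, PMID and assigned-by value are placeholders).

```python
# Column order of GAF 2.0 (17 tab-separated columns).
GAF2_COLUMNS = [
    "DB", "DB Object ID", "DB Object Symbol", "Qualifier", "GO ID",
    "DB:Reference", "Evidence Code", "With/From", "Aspect",
    "DB Object Name", "DB Object Synonym", "DB Object Type", "Taxon",
    "Date", "Assigned By", "Annotation Extension", "Gene Product Form ID",
]


def gaf_line(annotation: dict[str, str]) -> str:
    """One annotation as a GAF 2.0 line; absent columns are left empty."""
    values = [annotation.get(col, "") for col in GAF2_COLUMNS]
    if any("\t" in v or "\n" in v for v in values):
        raise ValueError("GAF values must not contain tabs or newlines")
    return "\t".join(values)


def write_gaf(annotations: list[dict[str, str]]) -> str:
    """A complete GAF 2.0 document: version header, then one line per annotation."""
    lines = ["!gaf-version: 2.0"] + [gaf_line(a) for a in annotations]
    return "\n".join(lines) + "\n"


example = {  # placeholder values for illustration only
    "DB": "UniProtKB",
    "DB Object ID": "P00000",
    "DB Object Symbol": "geneX",
    "GO ID": "GO:0003677",
    "DB:Reference": "PMID:0000000",
    "Evidence Code": "IDA",
    "Aspect": "F",
    "DB Object Type": "protein",
    "Taxon": "taxon:562",
    "Date": "20111101",
    "Assigned By": "GONUTS",
}
print(write_gaf([example]), end="")
```

Because the structured annotation table already separates qualifier, evidence code, with/from and reference, each field maps onto its own column with no free-text parsing.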
The approach used in GONUTS could easily be adapted for other biological ontologies that share the OBO file format and relationship types. Indeed, we have provided GONUTS loading code to the Sequence Ontology (SO) project to help them generate their wiki-based term browser (22).
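Adapting a loader to another OBO ontology mainly requires reading [Term] stanzas and their is_a and relationship tags. The Python sketch below parses only that small subset of the OBO flat-file format (no escapes, trailing qualifiers, or other stanza types such as [Typedef]) and inverts is_a links into the child lists that category pages display; the sample identifiers are hypothetical. A production loader would more likely rely on an established parser, for example the Python packages pronto or obonet.

```python
def parse_obo_terms(text: str) -> dict[str, dict]:
    """Map term id -> {'id', 'name', 'namespace', 'is_a', 'relationship'}."""
    terms: dict[str, dict] = {}
    current = None  # the [Term] stanza being filled, or None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"is_a": [], "relationship": []} if line == "[Term]" else None
            continue
        if current is None or not line or line.startswith("!"):
            continue
        tag, _, value = line.partition(":")
        value = value.split(" ! ", 1)[0].strip()  # drop a trailing '! comment'
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag in ("name", "namespace"):
            current[tag] = value
        elif tag == "is_a":
            current["is_a"].append(value)
        elif tag == "relationship":
            rel_type, target = value.split(maxsplit=1)
            current["relationship"].append((rel_type, target))
    return terms


def children_map(terms: dict[str, dict]) -> dict[str, list[str]]:
    """Invert is_a links: parent id -> direct child ids."""
    children: dict[str, list[str]] = {}
    for term_id, term in terms.items():
        for parent in term["is_a"]:
            children.setdefault(parent, []).append(term_id)
    return children


SAMPLE = """format-version: 1.2

[Term]
id: EX:0000001
name: example parent
namespace: example_namespace

[Term]
id: EX:0000002
name: example child
namespace: example_namespace
is_a: EX:0000001 ! example parent
relationship: part_of EX:0000003 ! example whole

[Typedef]
id: part_of
name: part of
"""

terms = parse_obo_terms(SAMPLE)
print(children_map(terms))  # {'EX:0000001': ['EX:0000002']}
```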

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Methods.

FUNDING

EcoliWiki is funded as a component of PortEco from subcontracts from grant U24 GM077905-01 (2006–2009) and 1U24GM088849-01 (2009–present) from the National Institutes of Health/National Institutes of General Medical Sciences. Funding for open access charge: National Institutes of Health (1U24GM088849). Conflict of interest statement. None declared.
REFERENCES (10 of 18 shown)

1.  Creating the gene ontology resource: design and implementation.

Journal:  Genome Res       Date:  2001-08       Impact factor: 9.043

2.  The Gene Ontology (GO) database and informatics resource.

Authors:  M A Harris; J Clark; A Ireland; J Lomax; M Ashburner; R Foulger; K Eilbeck; S Lewis; B Marshall; C Mungall; J Richter; G M Rubin; J A Blake; C Bult; M Dolan; H Drabkin; J T Eppig; D P Hill; L Ni; M Ringwald; R Balakrishnan; J M Cherry; K R Christie; M C Costanzo; S S Dwight; S Engel; D G Fisk; J E Hirschman; E L Hong; R S Nash; A Sethuraman; C L Theesfeld; D Botstein; K Dolinski; B Feierbach; T Berardini; S Mundodi; S Y Rhee; R Apweiler; D Barrell; E Camon; E Dimmer; V Lee; R Chisholm; P Gaudet; W Kibbe; R Kishore; E M Schwarz; P Sternberg; M Gwinn; L Hannick; J Wortman; M Berriman; V Wood; N de la Cruz; P Tonellato; P Jaiswal; T Seigfried; R White
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

3.  A wiki for the life sciences where authorship matters.

Authors:  Robert Hoffmann
Journal:  Nat Genet       Date:  2008-09       Impact factor: 38.330

4.  Ongoing and future developments at the Universal Protein Resource.

Journal:  Nucleic Acids Res       Date:  2010-11-04       Impact factor: 16.971

5.  Entrez Gene: gene-centered information at NCBI.

Authors:  Donna Maglott; Jim Ostell; Kim D Pruitt; Tatiana Tatusova
Journal:  Nucleic Acids Res       Date:  2010-11-28       Impact factor: 16.971

6.  The Sequence Ontology: a tool for the unification of genome annotations.

Authors:  Karen Eilbeck; Suzanna E Lewis; Christopher J Mungall; Mark Yandell; Lincoln Stein; Richard Durbin; Michael Ashburner
Journal:  Genome Biol       Date:  2005-04-29       Impact factor: 13.583

7.  The Gene Ontology's Reference Genome Project: a unified framework for functional annotation across species.

Journal:  PLoS Comput Biol       Date:  2009-07-03       Impact factor: 4.475

8.  AmiGO: online access to ontology and annotation data.

Authors:  Seth Carbon; Amelia Ireland; Christopher J Mungall; ShengQiang Shu; Brad Marshall; Suzanna Lewis
Journal:  Bioinformatics       Date:  2008-11-25       Impact factor: 6.937

9.  The Universal Protein Resource (UniProt) 2009.

Journal:  Nucleic Acids Res       Date:  2008-10-04       Impact factor: 16.971

10.  The Gene Ontology project in 2008.

Journal:  Nucleic Acids Res       Date:  2007-11-04       Impact factor: 16.971
