Harold J Drabkin, Karen R Christie, Mary E Dolan, David P Hill, Li Ni, Dmitry Sitnikov, Judith A Blake.
Abstract
The Gene Ontology (GO) is an important component of modern biological knowledge representation with great utility for computational analysis of genomic and genetic data. The Gene Ontology Consortium (GOC) consists of a large team of contributors, including curation teams from most model organism database groups as well as curation teams focused on representation of data relevant to specific human diseases. Key to the generation of consistent and comprehensive annotations is the development and use of shared standards and measures of curation quality. The GOC engages all contributors to work to a defined standard of curation that is presented here in the context of annotation of genes in the laboratory mouse. Comprehensive understanding of the origin, epistemology, and coverage of GO annotations is essential for the most effective use of GO resources. Here the application of comparative approaches to capturing functional data in the mouse system is described.
Year: 2015 PMID: 26141960 PMCID: PMC4602061 DOI: 10.1007/s00335-015-9580-0
Source DB: PubMed Journal: Mamm Genome ISSN: 0938-8990 Impact factor: 2.957
Sequence-based evidence codes
| Evidence code | Abbreviation |
|---|---|
| Inferred from sequence or structural similarity | ISS |
| Inferred from sequence orthology | ISO |
| Inferred from sequence alignment | ISA |
| Inferred from sequence model | ISM |
| Inferred from genomic context | IGC |
| Inferred from biological aspect of ancestor | IBA |
| Inferred from biological aspect of descendant | IBD |
| Inferred from key residues | IKR |
| Inferred from rapid divergence | IRD |
A complete list of all evidence codes used by GO can be found at http://geneontology.org/page/guide-go-evidence-codes
Fig. 1 Importing mouse annotations from rat or human genes based on orthology to mouse genes. Each specific load is assigned a specific MGD reference. Since the evidence code is assertion by orthology as determined by MGD, the provider of the annotations is MGD. Annotations are obtained from the designated authorities for GO annotation for human (GOA) or rat (RGD) genes.
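The import described in Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration and not the MGD load itself: the `Annotation` record, the `transfer_by_orthology` function, the `ortholog_of` mapping, and the restriction to experimental evidence codes are all assumptions made for the example. Only the use of the ISO code, a per-load reference, and MGD as the provider come from the caption.

```python
from dataclasses import dataclass

# Experimental evidence codes treated as transferable (an assumption of this sketch).
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass(frozen=True)
class Annotation:
    gene: str            # identifier of the annotated gene
    go_term: str         # GO term identifier
    evidence: str        # GO evidence code
    reference: str       # reference supporting the annotation
    assigned_by: str     # provider of the annotation
    with_from: str = ""  # gene on which an inference is based


def transfer_by_orthology(source_annotations, ortholog_of, load_reference):
    """Derive mouse ISO annotations from annotations of orthologous genes.

    ortholog_of maps a human or rat gene identifier to its mouse ortholog.
    """
    transferred = []
    for ann in source_annotations:
        if ann.evidence not in EXPERIMENTAL_CODES:
            continue  # skip annotations that are themselves inferred
        mouse_gene = ortholog_of.get(ann.gene)
        if mouse_gene is None:
            continue  # no mouse ortholog known for this gene
        transferred.append(
            Annotation(
                gene=mouse_gene,
                go_term=ann.go_term,
                evidence="ISO",            # inferred from sequence orthology
                reference=load_reference,  # one reference per load
                assigned_by="MGD",         # MGD asserts the orthology
                with_from=ann.gene,        # source gene of the inference
            )
        )
    return transferred
```

Recording the source gene in `with_from` keeps each transferred annotation traceable to the human or rat annotation it was derived from.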
Summary of GO annotations in MGD from literature curation, orthology or electronic pipelines
| Annotation method | Total: # genes | Total: # annotations | Annotated only by orthology, phylogenetic, or electronic method: # genes | Annotated only by orthology, phylogenetic, or electronic method: # annotations |
|---|---|---|---|---|
| Manual curation of experimental literature | 11,123 | 98,944 | NA | NA |
| Orthology transfer methods | 11,728^a | 98,987 | 3,728 | 18,012 |
| Transferred from human (GOA) via orthology | 10,515 | 65,988 | 3,379 | 14,104 |
| Transferred from rat (RGD) via orthology | 4,631 | 29,861 | 816 | 3,271 |
| Curated by MGI curators | 1,322 | 3,138 | 268 | 637 |
| Phylogenetic methods | | | | |
| PAINT | 4,356 | 19,703 | 2,285 | 10,841 |
| Electronic pipelines (IEA) | 14,653^b | 98,980 | 5,308 | 35,276 |
| Enzyme Commission (EC) | 1,690 | 18,549 | 692 | 8,848 |
| Swiss-Prot keywords | 14,270 | 55,754 | 5,107 | 18,369 |
| InterPro | 9,970 | 24,677 | 3,346 | 8,060 |
| All annotation methods | 24,179 | 357,251 | 7,219 | 64,129 |
Numbers are as of May 5, 2015
^a Genes can be annotated by multiple orthology methods, so this represents the total number of genes annotated by any orthology method
^b Genes can be annotated by multiple electronic pipelines, so this represents the total number of genes annotated by any of them
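The footnotes explain why the gene totals are not column sums: the three orthology rows add up to 16,468 genes, whereas 11,728 distinct genes carry an orthology-based annotation. A minimal sketch of that counting rule follows, using toy gene sets; the function name `count_genes_any_method` and the example identifiers are hypothetical.

```python
def count_genes_any_method(genes_by_method):
    """Count distinct genes annotated by at least one of the given methods.

    genes_by_method maps a method name to the set of genes it annotates.
    """
    return len(set().union(*genes_by_method.values()))


# Toy data: genes B and C are annotated by two methods, gene A by two as well.
genes_by_method = {
    "human_orthology": {"A", "B", "C"},
    "rat_orthology": {"B", "C", "D"},
    "curated_orthology": {"A"},
}

naive_sum = sum(len(genes) for genes in genes_by_method.values())
distinct = count_genes_any_method(genes_by_method)
print(naive_sum, distinct)  # the naive sum counts shared genes more than once
```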
Fig. 2 Exporting mouse annotations to non-mouse genes based on orthology. The orthologous non-mouse gene becomes the gene that is annotated by an experimental method described in the publication. The bottom two panels depict the non-mouse annotation at either the GOC site (AmiGO browser) or GOA (QuickGO).
Twelve model organisms (MODs) used for GO annotations backed by experimental evidence
Fig. 3 The PAINT tool overlays experimental GO annotations onto externally constructed PANTHER phylogenetic trees and allows curators to remove any inappropriate or misplaced sequences before propagating annotations. When needed, new annotations can be made; these are included in PAINT once they have been added to the GO Consortium annotation database. The curator then determines which annotations represent ancestral functions that should be propagated to an ancestral sequence node. PAINT automatically propagates GO terms from the ancestor node to all descendant sequences that are not already annotated to that term experimentally, except where the curator blocks propagation due to divergence in function. The annotations are exported from PAINT and incorporated into the GO Consortium annotation database.
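The propagation rule in Fig. 3 can be illustrated with a short tree walk. This is a sketch under stated assumptions, not PAINT's implementation: the tree is a plain dictionary of child lists, the function name `propagate` and all node names are hypothetical, and inferred annotations are labeled IBA (inferred from biological aspect of ancestor).

```python
def propagate(children, ancestor, go_term, experimental, blocked=frozenset()):
    """Propagate go_term from an ancestral node to its descendant sequences.

    children     maps a tree node to the list of its child nodes
    experimental maps a sequence to the set of GO terms it already has
                 from experimental evidence
    blocked      holds nodes where the curator stopped propagation
    """
    inferred = []
    stack = [ancestor]
    while stack:
        node = stack.pop()
        if node in blocked:
            continue  # functional divergence: skip this whole lineage
        kids = children.get(node, [])
        if kids:
            stack.extend(kids)  # internal node: keep descending
        elif go_term not in experimental.get(node, set()):
            # leaf sequence without experimental support for this term
            inferred.append((node, go_term, "IBA"))
    return sorted(inferred)


# Hypothetical four-sequence tree with one blocked lineage.
tree = {
    "root": ["clade_1", "clade_2"],
    "clade_1": ["mouse_seq", "human_seq"],
    "clade_2": ["fly_seq", "worm_seq"],
}
result = propagate(
    tree,
    "root",
    "GO:TERM_A",
    experimental={"human_seq": {"GO:TERM_A"}},
    blocked={"clade_2"},
)
print(result)
```

In this toy tree only `mouse_seq` receives an inferred annotation: `human_seq` already has the term experimentally, and `clade_2` is blocked.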
Fig. 4 Complex query for mouse genes located on chromosome 3 that are annotated to protein tyrosine kinase activity and are associated with diabetes.
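The query in Fig. 4 is a conjunction of three independent criteria. The sketch below shows that logic on in-memory records; it does not reflect the MGD query form or its schema. The `GeneRecord` type, the `complex_query` function, and the gene symbols are hypothetical, and terms are matched by exact label only, ignoring the ontology hierarchy.

```python
from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    symbol: str
    chromosome: str
    go_terms: set = field(default_factory=set)   # GO term labels
    diseases: set = field(default_factory=set)   # associated disease labels


def complex_query(genes, chromosome=None, go_term=None, disease=None):
    """Return symbols of genes that satisfy every criterion that is given."""
    hits = []
    for gene in genes:
        if chromosome is not None and gene.chromosome != chromosome:
            continue
        if go_term is not None and go_term not in gene.go_terms:
            continue
        if disease is not None and disease not in gene.diseases:
            continue
        hits.append(gene.symbol)
    return hits


# Hypothetical records; only the first one meets all three criteria.
genes = [
    GeneRecord("GeneA", "3", {"protein tyrosine kinase activity"}, {"diabetes"}),
    GeneRecord("GeneB", "3", {"protein tyrosine kinase activity"}, set()),
    GeneRecord("GeneC", "7", {"protein tyrosine kinase activity"}, {"diabetes"}),
]
matches = complex_query(
    genes,
    chromosome="3",
    go_term="protein tyrosine kinase activity",
    disease="diabetes",
)
print(matches)
```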
Fig. 5 Complex GXD query for mouse genes that are annotated to protein tyrosine kinase activity and expressed in Theiler Stages 17–19 metanephric mesenchyme.
Tools available at MGD for GO analysis
| Tool | Use | Comments | URL |
|---|---|---|---|
| GO Term Mapper | A tool for analyzing a mouse gene set based on mouse annotations using a method based on the GO Term Finder (Boyle et al.) | Can exclude IEA annotations if desired | |
| GO Slim Chart Tool | A tool for categorizing a gene set according to a set of high-level GO terms, a ‘GO slim’ | Can exclude IEA annotations if desired | |
| VLAD | A GO Term Finder type tool with a graphical output | Can select annotation set (MGI GO or user supplied). Can supply a reference set and filter on several evidence codes. Output can be graphical or tabular | |
| MouseMine | An InterMine tool (Kalderimis et al.) | Can use the premade template queries in the FUNCTION section to access GO data in a variety of ways. Results can be further filtered to increase specificity of the query | |
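The GO slim categorization and the option to exclude IEA annotations can be sketched as follows. This is an illustration of the general idea only and not the code behind any MGD tool. The function name `slim_chart`, the tuple layout of the annotations, and the `slim_map` lookup from specific terms to high-level slim terms are assumptions.

```python
from collections import defaultdict


def slim_chart(annotations, slim_map, exclude_iea=False):
    """Count distinct genes per high-level GO slim term.

    annotations is an iterable of (gene, go_term, evidence_code) tuples.
    slim_map    maps a specific GO term to its slim term.
    """
    genes_per_slim = defaultdict(set)
    for gene, go_term, evidence in annotations:
        if exclude_iea and evidence == "IEA":
            continue  # drop electronically inferred annotations on request
        slim_term = slim_map.get(go_term)
        if slim_term is not None:
            genes_per_slim[slim_term].add(gene)  # a set avoids double counting
    return {slim: len(genes) for slim, genes in genes_per_slim.items()}


# Hypothetical annotations and slim mapping.
annotations = [
    ("GeneA", "GO:TERM_A1", "IDA"),
    ("GeneA", "GO:TERM_A2", "IEA"),
    ("GeneB", "GO:TERM_A1", "IEA"),
    ("GeneC", "GO:TERM_B1", "IMP"),
]
slim_map = {
    "GO:TERM_A1": "slim_A",
    "GO:TERM_A2": "slim_A",
    "GO:TERM_B1": "slim_B",
}
print(slim_chart(annotations, slim_map))
print(slim_chart(annotations, slim_map, exclude_iea=True))
```

Comparing the two calls shows how much of a category's gene count rests on electronic annotation alone.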