Rama Balakrishnan, Midori A Harris, Rachael Huntley, Kimberly Van Auken, J Michael Cherry.
Abstract
The Gene Ontology Consortium (GOC) is a community-based bioinformatics project that classifies gene product function through the use of structured controlled vocabularies. A fundamental application of the Gene Ontology (GO) is in the creation of gene product annotations, evidence-based associations between GO definitions and experimental or sequence-based analysis. Currently, the GOC disseminates 126 million annotations covering >374,000 species including all the kingdoms of life. This number includes two classes of GO annotations: those created manually by experienced biocurators reviewing the literature or by examination of biological data (1.1 million annotations covering 2226 species) and those generated computationally via automated methods. As manual annotations are often used to propagate functional predictions between related proteins within and between genomes, it is critical to provide accurate, consistent manual annotations. Toward this goal, we present here the conventions defined by the GOC for the creation of manual annotations. This guide represents the best practices for manual annotation as established by the GOC project over the past 12 years. We hope this guide will encourage research communities to annotate gene products of their interest to enhance the corpus of GO annotations available to all. DATABASE URL: http://www.geneontology.org.
Year: 2013 PMID: 23842463 PMCID: PMC3706743 DOI: 10.1093/database/bat054
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
This table lists a small subset of the GO terms used as examples in the main text of the article.
| GOID | Aspect | GO term name | Definition |
|---|---|---|---|
| GO:0003674 | Molecular function | Molecular_function | Elemental activities, such as catalysis or binding, describing the actions of a gene product at the molecular level. A given gene product may exhibit one or more molecular functions. |
| GO:0008150 | Biological process | Biological_process | Any process specifically pertinent to the functioning of integrated living units: cells, tissues, organs and organisms. A process is a collection of molecular events with a defined beginning and end. |
| GO:0005575 | Cellular component | Cellular_component | The part of a cell or its extracellular environment in which a gene product is located. A gene product may be located in one or more parts of a cell and its location may be as specific as a particular macromolecular complex, that is, a stable, persistent association of macromolecules that function together. |
| GO:0004672 | Molecular function | Protein kinase activity | Catalysis of the phosphorylation of an amino acid residue in a protein, usually according to the reaction: a protein + ATP = a phosphoprotein + ADP. |
| GO:0022858 | Molecular function | Alanine transmembrane transporter activity | Catalysis of the transfer of alanine from one side of a membrane to the other. Alanine is 2-aminopropanoic acid. |
| GO:0003872 | Molecular function | 6-phosphofructokinase activity | Catalysis of the reaction: ATP + D-fructose-6-phosphate = ADP + D-fructose 1,6-bisphosphate. |
| GO:0000981 | Molecular function | Sequence-specific DNA binding RNA polymerase II transcription factor activity | Interacting selectively and noncovalently with a specific DNA sequence in order to modulate transcription by RNA polymerase II. The transcription factor may or may not also interact selectively with a protein or macromolecular complex. |
| GO:0000988 | Molecular function | Protein binding transcription factor activity | Interacting selectively and noncovalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules), in order to modulate transcription. A protein binding transcription factor may or may not also interact with the template nucleic acid (either DNA or RNA) as well. |
| GO:0043565 | Molecular function | Sequence-specific DNA binding | Interacting selectively and noncovalently with DNA of a specific nucleotide composition, e.g. GC-rich DNA binding, or with a specific sequence motif or type of DNA, e.g. promoter binding or rDNA binding. |
| GO:0004674 | Molecular function | Protein serine/threonine kinase activity | Catalysis of the reactions: ATP + protein serine = ADP + protein serine phosphate, and ATP + protein threonine = ADP + protein threonine phosphate. |
| GO:0004712 | Molecular function | Protein serine/threonine/tyrosine kinase activity | Catalysis of the reactions: ATP + a protein serine = ADP + protein serine phosphate; ATP + a protein threonine = ADP + protein threonine phosphate; and ATP + a protein tyrosine = ADP + protein tyrosine phosphate. |
| GO:0005515 | Molecular function | Protein binding | Interacting selectively and noncovalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules). |
| GO:0042393 | Molecular function | Histone binding | Interacting selectively and noncovalently with a histone, any of a group of water-soluble proteins found in association with the DNA of plant and animal chromosomes. They are involved in the condensation and coiling of chromosomes during cell division and have also been implicated in nonspecific suppression of gene activity. |
| GO:0016887 | Molecular function | ATPase activity | Catalysis of the reaction: ATP + H2O = ADP + phosphate + 2 H+. May or may not be coupled to another reaction. |
| GO:0008121 | Molecular function | Ubiquinol-cytochrome-c reductase activity | Catalysis of the transfer of a solute or solutes from one side of a membrane to the other according to the reaction: CoQH2 + 2 ferricytochrome c = CoQ + 2 ferrocytochrome c + 2 H+. |
| GO:0004463 | Molecular function | Leukotriene-A4 hydrolase activity | Catalysis of the reaction: H( |
| GO:0008152 | Biological process | Metabolic process | The chemical reactions and pathways, including anabolism and catabolism, by which living organisms transform chemical substances. Metabolic processes typically transform small molecules, but also include macromolecular processes such as DNA repair and replication, and protein synthesis and degradation. |
| GO:0023052 | Biological process | Signaling | The entirety of a process in which information is transmitted within a biological system. This process begins with an active signal and ends when a cellular response has been triggered. |
| GO:0016265 | Biological process | Death | A permanent cessation of all vital functions: the end of life; can be applied to a whole organism or to a part of an organism. |
| GO:0008219 | Biological process | Cell death | Any biological process that results in permanent cessation of all vital functions of a cell. A cell should be considered dead when any one of the following molecular or morphological criteria is met: (i) the cell has lost the integrity of its plasma membrane; (ii) the cell, including its nucleus, has undergone complete fragmentation into discrete bodies (frequently referred to as ‘apoptotic bodies’); and/or (iii) its corpse (or its fragments) have been engulfed by an adjacent cell. |
| GO:0006915 | Biological process | Apoptotic process | A programmed cell death process which begins when a cell receives an internal (e.g. DNA damage) or external signal (e.g. an extracellular death ligand), and proceeds through a series of biochemical events (signaling pathways) which typically lead to rounding-up of the cell, retraction of pseudopodes, reduction of cellular volume (pyknosis), chromatin condensation, nuclear fragmentation (karyorrhexis), plasma membrane blebbing and fragmentation of the cell into apoptotic bodies. The process ends when the cell has died. The process is divided into a signaling pathway phase and into an execution phase, which is triggered by the former. |
| GO:0030263 | Biological process | Apoptotic chromosome condensation | The compaction of chromatin during apoptosis. |
| GO:0032329 | Biological process | Serine transport | The directed movement of |
| GO:0015826 | Biological process | Threonine transport | The directed movement of threonine, (2R*,3S*)-2-amino-3-hydroxybutanoic acid, into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. |
| GO:0032328 | Biological process | Alanine transport | The directed movement of alanine, 2-aminopropanoic acid, into, out of or within a cell, or between cells, by means of some agent such as a transporter or pore. |
| GO:0034605 | Biological process | Cellular response to heat | Any process that results in a change in state or activity of a cell (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a heat stimulus, a temperature stimulus above the optimal temperature for that organism. |
| GO:0071470 | Biological process | Cellular response to osmotic stress | Any process that results in a change in state or activity of a cell (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus indicating an increase or decrease in the concentration of solutes outside the organism or cell. |
| GO:0034599 | Biological process | Cellular response to oxidative stress | Any process that results in a change in state or activity of a cell (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of oxidative stress, a state often resulting from exposure to high levels of reactive oxygen species, e.g. superoxide anions, hydrogen peroxide (H2O2), and hydroxyl radicals. |
| GO:0033554 | Biological process | Cellular response to stress | Any process that results in a change in state or activity of a cell (in terms of movement, secretion, enzyme production, gene expression, etc.) as a result of a stimulus indicating the organism is under stress. The stress is usually, but not necessarily, exogenous (e.g. temperature, humidity, ionizing radiation). |
| GO:0006351 | Biological process | Transcription, DNA dependent | The cellular synthesis of RNA on a template of DNA. |
| GO:0006357 | Biological process | Regulation of transcription from RNA polymerase II promoter | Any process that modulates the frequency, rate or extent of transcription from an RNA polymerase II promoter. |
| GO:0007067 | Biological process | Mitosis | A cell cycle process comprising the steps by which the nucleus of a eukaryotic cell divides; the process involves condensation of chromosomal DNA into a highly compacted form. Canonically, mitosis produces two daughter nuclei whose chromosome complement is identical to that of the mother cell. |
| GO:0000084 | Biological process | S phase of mitotic cell cycle | S phase occurring as part of the mitotic cell cycle. S phase is the part of the cell cycle during which DNA synthesis takes place. A mitotic cell cycle is one which canonically comprises four successive phases called G1, S, G2 and M and includes replication of the genome and the subsequent segregation of chromosomes into daughter cells. |
| GO:0031028 | Biological process | Septation initiation signaling cascade | The series of molecular signals, mediated by the small GTPase Ras, that results in the initiation of contraction of the contractile ring, at the beginning of cytokinesis and cell division by septum formation. The pathway coordinates chromosome segregation with mitotic exit and cytokinesis. |
| GO:0051321 | Biological process | Meiotic cell cycle | Progression through the phases of the meiotic cell cycle, in which canonically a cell replicates to produce four offspring with half the chromosomal content of the progenitor cell. |
| GO:0005634 | Cellular component | Nucleus | A membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. In most cells, the nucleus contains all of the cell’s chromosomes except the organellar chromosomes, and is the site of RNA synthesis and processing. In some species, or in specialized cell types, RNA metabolism or DNA replication may be absent. |
| GO:0005681 | Cellular component | Spliceosomal complex | Any of a series of ribonucleoprotein complexes that contain RNA and small nuclear ribonucleoproteins (snRNPs), and are formed sequentially during the splicing of a messenger RNA primary transcript to excise an intron. |
| GO:0044428 | Cellular component | Nuclear part | Any constituent part of the nucleus, a membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. |
| GO:0005826 | Cellular component | Actomyosin contractile ring | A cytoskeletal structure composed of actin filaments and myosin that forms beneath the plasma membrane of many cells, including animal cells and yeast cells, in a plane perpendicular to the axis of the spindle, i.e. the cell division plane. Ring contraction is associated with centripetal growth of the membrane that divides the cytoplasm of the two daughter cells. In animal cells, the contractile ring is located inside the plasma membrane at the location of the cleavage furrow. In budding fungal cells, e.g. mitotic |
| GO:0005750 | Cellular component | Mitochondrial respiratory chain complex III | A protein complex located in the mitochondrial inner membrane that forms part of the mitochondrial respiratory chain. Contains about 10 polypeptide subunits including four redox centers: cytochrome b/b6, cytochrome c1 and a 2Fe-2S cluster. Catalyzes the oxidation of ubiquinol by oxidized cytochrome c1. |
The GO Term is a short phrase that is typically used to represent the individual components of the ontologies, while the definitions provide the precise meaning of the GO terms. As emphasized in the text, it is important to create annotations to the definition and not to the GO Term. Curators should explore the ontologies using AmiGO (17, http://amigo.geneontology.org) or QuickGO (18, http://www.ebi.ac.uk/QuickGO/) to identify appropriate terms for annotation. More information on the structure of the Gene Ontology and how it is developed is available online from http://www.geneontology.org.
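Before looking a term up in AmiGO or QuickGO, a curation script might first check that an identifier is well formed. The following is a minimal sketch, not part of any GOC tool: it assumes GO identifiers consist of `GO:` followed by exactly seven digits, takes the three root terms from the table above, and uses hypothetical function names.

```python
import re

# Assumption: a GO identifier is "GO:" followed by exactly seven digits.
GO_ID_PATTERN = re.compile(r"GO:\d{7}")

# Root term of each ontology, as listed in the table above.
ROOT_TERMS = {
    "molecular_function": "GO:0003674",
    "biological_process": "GO:0008150",
    "cellular_component": "GO:0005575",
}


def is_valid_go_id(go_id: str) -> bool:
    """Return True if go_id has the expected 'GO:NNNNNNN' shape."""
    return GO_ID_PATTERN.fullmatch(go_id) is not None


def is_root_term(go_id: str) -> bool:
    """Return True if go_id is one of the three ontology root terms."""
    return go_id in ROOT_TERMS.values()
```

A check of this kind only confirms the shape of the identifier. It says nothing about whether the term exists or whether its definition fits the gene product, which still has to be judged against the definition itself.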
Figure 1. GO term ‘leukotriene-A4 hydrolase activity’ [GO:0004463], one of the terms mentioned in the main text of the article, as seen in AmiGO (16, http://amigo.geneontology.org). (a) Graphical view of the ontology structure showing the most granular term ‘leukotriene-A4 hydrolase activity’ [GO:0004463] at the bottom (highlighted in red), and all its parent terms leading up to the root node (‘molecular_function’ [GO:0003674]) at the top. Each box representing a GO term includes the GO identifier, and the blue lines connecting the terms represent the ontological relationship ‘is_a’ (implying that a child term is a subtype of the parent term). (b) Alternate text display for viewing the ontology structure. ‘leukotriene-A4 hydrolase activity’ [GO:0004463] is highlighted in red. Each child term is indented from its parent to indicate the depth of the tree. Apart from the GOID and GO term, each row includes other pieces of information that are important for understanding the ontology and the annotations to each term. Starting from the left end of the row, the + sign indicates that there are child terms for that node; clicking on the + sign expands the view to display the child terms. Next, the small icon ‘i’ indicates that the term is related to its parent by an is_a relationship (explained above). At the right end of the row, in brackets, is the total number of gene products annotated to that term and all its child terms. (c) Term information relevant to making an annotation is highlighted in red, which includes the GOID, Aspect of the ontology (Molecular Function), Synonyms and Definition of the term.
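The two ideas in this caption, following is_a links up to the root and counting gene products annotated to a term and all its child terms, can be sketched in a few lines. The example below is illustrative only: the edge list is a heavily simplified assumption that links three kinase terms from the table directly to the root (intermediate terms are omitted), and the gene product names are hypothetical.

```python
# Hypothetical, simplified is_a edges (child -> parents).
# Intermediate terms of the real ontology are omitted.
IS_A = {
    "GO:0004674": ["GO:0004672"],  # protein serine/threonine kinase activity
    "GO:0004712": ["GO:0004672"],  # protein serine/threonine/tyrosine kinase activity
    "GO:0004672": ["GO:0003674"],  # protein kinase activity
    "GO:0003674": [],              # molecular_function (root)
}

# Hypothetical direct annotations (term -> gene products).
DIRECT = {
    "GO:0004674": {"geneA", "geneB"},
    "GO:0004712": {"geneB", "geneC"},
    "GO:0004672": {"geneD"},
}


def ancestors(term):
    """Collect every term reachable from `term` by following is_a links."""
    seen = set()
    stack = list(IS_A.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(IS_A.get(parent, []))
    return seen


def annotated_count(term):
    """Count distinct gene products annotated to `term` or any of its descendants."""
    products = set()
    for annotated_term, gene_products in DIRECT.items():
        if annotated_term == term or term in ancestors(annotated_term):
            products |= gene_products
    return len(products)
```

Because a gene product annotated to a child term is implicitly annotated to every parent, the count for a parent is a union of sets rather than a sum, so a product annotated to two sibling terms is counted once.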
Evidence code categories, evidence code names and definitions
| Category | Evidence code | Types of evidence | Supporting data |
|---|---|---|---|
| Experimental | Inferred from Experiment (EXP) | Any experimental assay | |
| | Inferred from Direct Assay (IDA) | (i) Enzyme assays | |
| | | (ii) | |
| | | (iii) Immunofluorescence (for cellular component) | |
| | | (iv) Cell fractionation (for cellular component) | |
| | | (v) Physical interaction/binding assay (sometimes appropriate for cellular component or molecular function) | |
| | Inferred from Physical Interaction (IPI) | (i) Two-hybrid interactions | With column should be filled with identifier of the interacting protein |
| | | (ii) Co-purification | |
| | | (iii) Co-immunoprecipitation | |
| | | (iv) Ion/protein binding experiments | |
| | Inferred from Mutant Phenotype (IMP) | (i) Mutations, natural or introduced, that result in partial or complete impairment or alteration of the function of that gene | |
| | | (ii) Polymorphism or allelic variation (including where no allele is designated wild-type or mutant) | |
| | | (iii) Any procedure that disturbs the expression or function of the gene, including RNAi, anti-sense RNAs, antibody depletion, or the use of any molecule or experimental condition that may disturb or affect the normal functioning of the gene, including: inhibitors, blockers, modifiers, any type of antagonists, temperature jumps, changes in pH or ionic strength | |
| | | (iv) Overexpression or ectopic expression of wild-type or mutant gene | |
| | Inferred from Genetic Interaction (IGI) | (i) ‘Traditional’ genetic interactions such as suppressors, synthetic lethals, etc. | With column should be filled with identifier of the interacting gene |
| | | (ii) Functional complementation | |
| | | (iii) Rescue experiments | |
| | | (iv) Inference about one gene drawn from the phenotype of a mutation in a different gene | |
| | Inferred from Expression Pattern (IEP) | (i) Transcript levels or timing (e.g. Northerns, microarray data) | |
| | | (ii) Protein levels (e.g. Western blots) | |
| Computational Analysis | Inferred by Sequence Similarity (ISS) | Sequence, structural similarity-based analysis | With column should be filled with identifier of the similar gene/protein |
| | Inferred by Sequence Alignment (ISA) | Pairwise or multiple alignment | With column should be filled with identifier of the similar gene/protein |
| | Inferred by Sequence Orthology (ISO) | Assertion of orthology between the gene product and a gene product in another organism | With column should be filled with identifier of the similar gene/protein |
| | Inferred from Sequence Model (ISM) | Prediction methods for noncoding RNA genes such as tRNASCAN-SE, Snoscan, and Rfam | |
| | Inferred from Genomic Context (IGC) | Information about the genomic context of a gene product forms part of the evidence for a particular annotation. | |
| | | (i) Operon structure | |
| | | (ii) Syntenic regions | |
| | | (iii) Pathway analysis | |
| | | (iv) Genome scale analysis of processes | |
| | Inferred from Key Residues (IKR) | Sequence analysis, where lack of key sequence residues is used to make a negative (NOT) annotation | Should include NOT qualifier and With column is required if analysis is carried out by a curator and GO_reference:0000047 |
| Author | Traceable Author Statement (TAS) | Original evidence is referenced in the article and therefore can be traced to another source | |
| | Non-traceable Author Statement (NAS) | Statements in articles that cannot be traced to another article or experiment | |
| Curatorial | No Data (ND) | Used for annotations when information about the molecular function, biological process, or cellular component of the gene or gene product being annotated is not available | Should be used only with root nodes |
| | Inferred by Curator (IC) | Annotation reasonably inferred by a curator from other GO annotations, for which direct evidence is available | GO ID should be filled in the from column |
The evidence codes are used to represent the method or type of results used to define the annotation. Annotators should use this table as a quick reference guide and consult the detailed documentation available online (http://www.geneontology.org/GO.evidence.shtml) for specific details (including the do’s and don’ts) on how the method or results are matched to each evidence code.
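Some of the "Supporting data" rules in the table lend themselves to an automated sanity check. The sketch below is one possible way to encode them, under stated assumptions: the function name and message strings are hypothetical, only the unconditional rules are checked (the IKR rule depends on who carried out the analysis and is left out), and the online evidence code documentation remains the authority.

```python
# Evidence code -> category, transcribed from the table above.
EVIDENCE_CATEGORY = {
    "EXP": "Experimental", "IDA": "Experimental", "IPI": "Experimental",
    "IMP": "Experimental", "IGI": "Experimental", "IEP": "Experimental",
    "ISS": "Computational Analysis", "ISA": "Computational Analysis",
    "ISO": "Computational Analysis", "ISM": "Computational Analysis",
    "IGC": "Computational Analysis", "IKR": "Computational Analysis",
    "TAS": "Author", "NAS": "Author",
    "ND": "Curatorial", "IC": "Curatorial",
}

# Codes for which the table says the With/From column should be filled.
WITH_FROM_EXPECTED = {"IPI", "IGI", "ISS", "ISA", "ISO", "IC"}

# Root terms of the three ontologies; ND should be used only with these.
ROOT_TERMS = {"GO:0003674", "GO:0008150", "GO:0005575"}


def check_annotation(evidence_code, go_id, with_from=""):
    """Return a list of problems found; an empty list means none were detected."""
    if evidence_code not in EVIDENCE_CATEGORY:
        return [f"unknown evidence code: {evidence_code}"]
    problems = []
    if evidence_code in WITH_FROM_EXPECTED and not with_from:
        problems.append(f"{evidence_code} expects a With/From identifier")
    if evidence_code == "ND" and go_id not in ROOT_TERMS:
        problems.append("ND should be used only with root nodes")
    return problems
```

Returning a list of messages rather than raising on the first problem lets a curator see every issue with an annotation in one pass.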
A sample annotation in the GAF 2.0 format
| Column | Content | Required? | Example |
|---|---|---|---|
| 1 | DB | Required | MGI |
| 2 | DB Object ID | Required | MGI:1350922 |
| 3 | DB Object Symbol | Required | Cadps |
| 4 | Qualifier | Optional | NOT |
| 5 | GO ID | Required | GO:0006887 |
| 6 | DB:Reference (\|DB:Reference) | Required | MGI:MGI:3583730\|PMID:15820695 |
| 7 | Evidence Code | Required | IMP |
| 8 | With (or) From | Optional | MGI:MGI:3583931 |
| 9 | Aspect | Required | P |
| 10 | DB Object Name | Optional | Ca2+-dependent secretion activator |
| 11 | DB Object Synonym (\|Synonym) | Optional | CAPS1 |
| 12 | DB Object Type | Required | Protein |
| 13 | Taxon (\|taxon) | Required | Taxon:10090 |
| 14 | Date | Required | 20060202 |
| 15 | Assigned By | Required | MGI |
| 16 | Annotation Extension | Optional | occurs_in(CL:0000001)\|occurs_in(CL:0000336) |
| 17 | Gene Product Form ID | Optional | UniProtKB:Q80TJ1 |
This table provides an example of an annotation from the Mouse Genome Informatics group (from February 2013). The Cadps protein (MGI identifier MGI:1350922) was annotated by the MGI project to ‘exocytosis’ [GO:0006887], a term in the Biological Process ontology indicated by ‘P’ in column 9. This annotation used the ‘NOT’ qualifier, indicating that the authors of PMID:15820695 (5) showed that this protein is ‘NOT’ involved in ‘exocytosis’. The non-PMID reference number, MGI:MGI:3583730, is MGI’s internal identifier for the same reference. The curators arrived at this annotation based on the phenotype of the Cadps mutant, which is indicated with the IMP evidence code. The identifier of the allele (MGI:MGI:3583931) used in the experiment is captured in column 8 (WITH/FROM). In addition, the annotation extension field (column 16) indicates the cell types where this protein (CL:0000001, primary cell culture or CL:0000336, adrenal medulla chromaffin cell) was NOT found to be involved in this process (exocytosis). Finally, the last column represents the UniProtKB identifier for the isoform of the mouse Cadps protein that was studied.
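To make the column layout concrete, the sample annotation can be read with a few lines of code. This is a minimal sketch rather than a reference parser: it assumes one annotation per line with the 17 columns separated by tabs, treats as multi-valued only the columns that the table shows with a `|` separator, and uses hypothetical field and function names.

```python
# Field names for the 17 GAF 2.0 columns, in the order given in the table.
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type", "taxon",
    "date", "assigned_by", "annotation_extension", "gene_product_form_id",
]

# Columns marked "Required" in the table.
REQUIRED = [
    "db", "db_object_id", "db_object_symbol", "go_id", "db_reference",
    "evidence_code", "aspect", "db_object_type", "taxon", "date", "assigned_by",
]

# Columns the table shows as holding several '|'-separated values.
MULTI_VALUED = ["db_reference", "db_object_synonym", "taxon", "annotation_extension"]


def parse_gaf_line(line):
    """Split one tab-separated annotation line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GAF_COLUMNS):
        raise ValueError(f"expected {len(GAF_COLUMNS)} columns, got {len(fields)}")
    record = dict(zip(GAF_COLUMNS, fields))
    missing = [name for name in REQUIRED if not record[name]]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    for name in MULTI_VALUED:
        record[name] = record[name].split("|") if record[name] else []
    return record


# The sample annotation from the table, joined into a single line.
example = "\t".join([
    "MGI", "MGI:1350922", "Cadps", "NOT", "GO:0006887",
    "MGI:MGI:3583730|PMID:15820695", "IMP", "MGI:MGI:3583931", "P",
    "Ca2+-dependent secretion activator", "CAPS1", "Protein", "Taxon:10090",
    "20060202", "MGI", "occurs_in(CL:0000001)|occurs_in(CL:0000336)",
    "UniProtKB:Q80TJ1",
])
record = parse_gaf_line(example)
```

Here `record["qualifier"]` holds `NOT` and `record["db_reference"]` holds both the MGI and PubMed identifiers, mirroring the explanation above. Optional columns such as the qualifier may be empty strings in other annotations.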
Figure 2. GO evidence code decision tree describing the process of choosing an evidence code. This flow chart is meant to orient the biocurator to the different categories of evidence codes and does not include the complete definitions of the evidence codes (Table 2). The chart helps the biocurator evaluate the reported method or results and map them to an appropriate evidence code; the biocurator should consult the detailed evidence code documentation available online from http://www.geneontology.org/GO.evidence.shtml.