Jennifer I Deegan née Clark, Emily C Dimmer, Christopher J Mungall.
Abstract
BACKGROUND: The Gene Ontology project supports categorization of gene products according to their location of action, the molecular functions that they carry out, and the processes that they are involved in. Although the ontologies are intentionally developed to be taxon neutral, and to cover all species, there are inherent taxon specificities in some branches. For example, the process 'lactation' is specific to mammals and the location 'mitochondrion' is specific to eukaryotes. The lack of an explicit formalization of these constraints can lead to errors and inconsistencies in automated and manual annotation.
Year: 2010 PMID: 20973947 PMCID: PMC3098089 DOI: 10.1186/1471-2105-11-530
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Lactation. The GO class 'lactation' is restricted for use with gene products from species of the taxonomic grouping Mammalia. The class inherits this restriction from the superclass 'mammary gland development'. In this figure, the GO classes are shown in blue, and the taxonomic classes are shown in yellow. The relationship types are labeled in the diagram.
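The inheritance described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the dictionaries `GO_PARENTS`, `ONLY_IN_TAXON` and `TAXON_PARENTS`, the function names, and the heavily simplified lineages (species attached directly to a high-level group) are all assumptions made for the example.

```python
# Hypothetical toy data: the class and taxon names come from the caption,
# but the data layout and the shortened lineages are assumptions.
GO_PARENTS = {
    "lactation": ["mammary gland development"],
    "mammary gland development": [],
}
ONLY_IN_TAXON = {"mammary gland development": "Mammalia"}
TAXON_PARENTS = {
    "Homo sapiens": "Mammalia",
    "Arabidopsis thaliana": "Viridiplantae",
    "Mammalia": None,
    "Viridiplantae": None,
}


def taxon_lineage(taxon):
    """Return the taxon followed by all of its ancestors."""
    lineage = []
    while taxon is not None:
        lineage.append(taxon)
        taxon = TAXON_PARENTS.get(taxon)
    return lineage


def inherited_constraints(go_class):
    """Collect taxon restrictions on a GO class and on all its superclasses."""
    found, seen, stack = set(), set(), [go_class]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in ONLY_IN_TAXON:
            found.add(ONLY_IN_TAXON[current])
        stack.extend(GO_PARENTS.get(current, []))
    return found


def annotation_is_consistent(go_class, taxon):
    """True if the species lies inside every taxon the class is restricted to."""
    lineage = set(taxon_lineage(taxon))
    return all(required in lineage for required in inherited_constraints(go_class))


print(annotation_is_consistent("lactation", "Homo sapiens"))
print(annotation_is_consistent("lactation", "Arabidopsis thaliana"))
```

Although 'lactation' carries no restriction of its own in this toy data, the check still rejects the plant annotation because the restriction is picked up while walking to the superclass.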
Figure 2. C4 photosynthesis. The GO class 'C4 photosynthesis' is restricted for use with gene products from species of the taxonomic grouping Viridiplantae. This is a narrower taxonomic group than that to which the GO superclass 'photosynthesis' is restricted. The GO class 'photosynthesis' is restricted for use with gene products from any sub-type of the Viridiplantae, Euglenozoa, Archaea or Bacteria. The relationship between 'photosynthesis' and these four taxonomic groups is shown by the relationship only_in_taxon from 'photosynthesis' to the union term 'Viridiplantae or Euglenozoa or Archaea or Bacteria', and by the relationships between this union term and the four individual taxonomic groups. These latter relationships are shown as union_of relationships (marked 'un'). In this figure, the GO classes are shown in blue, and the taxonomic classes are shown in yellow. The relationship types are labeled in the diagram.
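A union term of this kind can be handled by expanding it into its member taxa before testing a species' lineage. The sketch below is illustrative only: `UNION_OF`, `RESTRICTED_TO`, `LINEAGE_PARENTS` and the function names are hypothetical, and the lineages are shortened to a single step for brevity.

```python
# Hypothetical toy data based on the caption; the layout is an assumption.
UNION_OF = {
    "Viridiplantae or Euglenozoa or Archaea or Bacteria": {
        "Viridiplantae", "Euglenozoa", "Archaea", "Bacteria",
    },
}
RESTRICTED_TO = {
    "photosynthesis": "Viridiplantae or Euglenozoa or Archaea or Bacteria",
    "C4 photosynthesis": "Viridiplantae",
}
# Simplified lineages: each species is attached directly to one high-level group.
LINEAGE_PARENTS = {
    "Zea mays": "Viridiplantae",
    "Synechocystis": "Bacteria",
    "Homo sapiens": "Metazoa",
}


def expand(taxon_term):
    """Expand a union term into its member taxa; plain taxa map to themselves."""
    return UNION_OF.get(taxon_term, {taxon_term})


def lineage(taxon):
    """Return the taxon and its ancestors under the simplified toy taxonomy."""
    result = []
    while taxon is not None:
        result.append(taxon)
        taxon = LINEAGE_PARENTS.get(taxon)
    return result


def satisfies_only_in_taxon(go_class, taxon):
    """True if some ancestor of the species is among the allowed taxa."""
    allowed = expand(RESTRICTED_TO[go_class])
    return any(ancestor in allowed for ancestor in lineage(taxon))


for go_class in ("photosynthesis", "C4 photosynthesis"):
    for species in ("Zea mays", "Synechocystis", "Homo sapiens"):
        print(go_class, species, satisfies_only_in_taxon(go_class, species))
```

In this toy setup a bacterial species passes the check for 'photosynthesis' but fails it for 'C4 photosynthesis', which reflects the narrower restriction on the subclass.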
Numbers of annotation inconsistencies found, classified by evidence code.
| Evidence code type | Evidence code | Annotation errors | Total annotations | Percentage error rate |
|---|---|---|---|---|
| Experimental | EXP | 977 | 5360 | 18.23 |
| Experimental | IDA | 12 | 105764 | 0.01 |
| Experimental | IMP | 84 | 88283 | 0.10 |
| Experimental | IEP | 0 | 10129 | 0.00 |
| Experimental | IPI | 0 | 29877 | 0.00 |
| Experimental | IGI | 0 | 12914 | 0.00 |
| Computational Analysis | ISS | 85 | 228605 | 0.04 |
| Computational Analysis | ISO | 1 | 2975 | 0.03 |
| Computational Analysis | ISA | 0 | 5921 | 0.00 |
| Computational Analysis | ISM | 0 | 143 | 0.00 |
| Computational Analysis | IGC | 0 | 483 | 0.00 |
| Computational Analysis | RCA | 3 | 75175 | 0.00 |
| Author statement | TAS | 4070 | 46888 | 8.68 |
| Author statement | NAS | 3 | 23578 | 0.01 |
| Curator statement | IC | 0 | 5682 | 0.00 |
| Curator statement | ND | 0 | 171817 | 0.00 |
| Automatically Assigned | IEA | 639 | 844441 | 0.08 |
A large number of inconsistencies were found and the problems corrected. The number of inconsistencies in each evidence code group is shown here, both as an absolute number and as a percentage of the total annotations with that code. Note that the high rate of flagged EXP annotations is due to Reactome virus annotations; when this is corrected for, the EXP error rate drops to nearly zero. For interpretation of evidence codes see http://www.geneontology.org/GO.evidence.shtml
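The percentage column is simply errors divided by total annotations, expressed as a percentage and rounded to two decimal places. As a small worked check, the following sketch recomputes four rows of the table; the dictionary `COUNTS` and the function `error_rate_percent` are names introduced here for illustration.

```python
# (errors, total annotations) copied from four rows of the table above.
COUNTS = {
    "EXP": (977, 5360),
    "ISS": (85, 228605),
    "TAS": (4070, 46888),
    "IEA": (639, 844441),
}


def error_rate_percent(errors, total):
    """Errors as a percentage of total annotations, rounded to two decimals."""
    return round(100 * errors / total, 2)


for code, (errors, total) in COUNTS.items():
    print(f"{code}: {error_rate_percent(errors, total):.2f}%")
# prints:
# EXP: 18.23%
# ISS: 0.04%
# TAS: 8.68%
# IEA: 0.08%
```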
Numbers of annotation inconsistencies found by certain rules.
| Constraint: GO class | Constraint: taxon | Errors detected | Evidence class |
|---|---|---|---|
| GO:0030879 'mammary gland development' | NCBITaxon:40674 'Mammalia' | 19 | IEA |
| GO:0012511 'monolayer-surrounded lipid storage body' | NCBITaxon:33090 'Viridiplantae' | 21 | IEA |
| GO:0001701 'in utero embryonic development' | NCBITaxon:32525 'Theria' | 51 | IEA |
| GO:0001541 'ovarian follicle development' | NCBITaxon:40674 'Mammalia' | 10 | IEA |
| GO:0051300 'spindle pole body organization' | NCBITaxon:4751 'Fungi' | 13 | Mixture of ISO, ISS, IEA and IMP |
| GO:0015979 'photosynthesis' | NCBITaxon_Union:0000021 'Viridiplantae or Bacteria or Euglenozoa or Archaea' | 9 | IEA |
| GO:0015995 'chlorophyll biosynthetic process' | NCBITaxon_Union:0000007 'Viridiplantae or Bacteria or Euglenozoa' | 9 | IEA |
A large number of inconsistencies have been found and various repairs made. This table gives a summary of the numbers of annotation errors found using a selection of the rules that we have implemented. For interpretation of evidence codes see http://www.geneontology.org/GO.evidence.shtml
Numbers of annotation inconsistencies found, classified by ontology.
| Ontology | Annotation errors | Total number of annotations | Percentage errors |
|---|---|---|---|
| Biological process | 237 | 568306 | 0.04 |
| Molecular function | 35 | 627858 | 0.01 |
| Cellular component | 5602 | 461910 | 1.21 |
A large number of inconsistencies were found and the problems corrected. The number of inconsistencies in each ontology is shown here, both as an absolute number and as a percentage of the total annotations to that ontology.