Yu Rang Park, Jihun Kim, Hye Won Lee, Young Jo Yoon, Ju Han Kim.
Abstract
BACKGROUND: The Gene Ontology (GO) provides a controlled vocabulary for describing genes and gene products. Despite the undoubted importance of GO, several drawbacks of GO and GO-based annotations have been reported. We identified three types of semantic inconsistency in GO-based annotations: semantically redundant, biological-domain-inconsistent, and taxonomy-inconsistent annotations.
Year: 2011 PMID: 21342572 PMCID: PMC3044297 DOI: 10.1186/1471-2105-12-S1-S40
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
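The first inconsistency type named in the abstract, the semantically redundant annotation, can be illustrated with a minimal sketch. It assumes (the abstract does not spell this out) that an annotation is redundant when a gene product is annotated with a term and also with one of that term's ancestors in the ontology graph. The term names, the `PARENTS` map, and both function names are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical toy ontology: term -> set of direct parent terms.
PARENTS = {
    "child": {"mid"},
    "mid": {"root"},
    "root": set(),
}

def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen

def redundant_annotations(annotations, parents):
    """Return (gene_product, term) pairs whose term is an ancestor of
    another term annotated to the same gene product (assumed definition)."""
    terms_by_product = defaultdict(set)
    for product, term in annotations:
        terms_by_product[product].add(term)

    redundant = set()
    for product, terms in terms_by_product.items():
        for term in terms:
            # Any ancestor that is also annotated directly is implied already.
            for ancestor in ancestors(term, parents) & terms:
                redundant.add((product, ancestor))
    return redundant

# Hypothetical annotations: gp1 carries both "child" and its ancestor "root".
ANNOTATIONS = [("gp1", "child"), ("gp1", "root"), ("gp2", "mid")]
found = redundant_annotations(ANNOTATIONS, PARENTS)
print(sorted(found))
```

In this toy input only the `root` annotation on `gp1` is flagged, because it is already implied by the more specific `child` annotation on the same gene product.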
Redundant annotations in biological databases
| Database | | | Gene products annotated with GO terms: redundant annotation | Gene products annotated with GO terms: redundant annotation (same evidence code) | Gene products annotated with GO terms: total | GO annotations applied to gene products: redundant annotation | GO annotations applied to gene products: redundant annotation (same evidence code) | GO annotations applied to gene products: total | GO terms used in gene product annotations: redundant annotation | GO terms used in gene product annotations: redundant annotation (same evidence code) | GO terms used in gene product annotations: total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Ensembl (a) | 09/01/09 | 01/01/10 | 307,467 | 299,911 | 673,180 | 783,687 | 707,335 | 4,395,125 | 2,978 | 2,551 | 12,309 |
| Gene (b) | 12/15/09 | 01/01/10 | 88,797 | 73,001 | 235,852 | 223,772 | 143,537 | 1,234,220 | 3,369 | 2,632 | 15,363 |
| AspGD (c) | 12/21/09 | 01/01/10 | 523 | 225 | 3,425 | 640 | 259 | 15,340 | 239 | 107 | 3,259 |
| CGD | 11/24/09 | 01/01/10 | 474 | 229 | 4,040 | 772 | 309 | 20,009 | 254 | 104 | 3,332 |
| dictyBase | 12/27/09 | 01/01/10 | 2,590 | 1,619 | 7,489 | 4,651 | 2,377 | 31,064 | 368 | 241 | 2,403 |
| EcoCyc | 12/14/09 | 01/01/10 | 173 | 155 | 1,869 | 273 | 219 | 4,992 | 132 | 111 | 1,388 |
| FB | 10/30/09 | 01/01/10 | 3,355 | 1,823 | 12,509 | 7,301 | 2,740 | 68,316 | 1,077 | 656 | 4,924 |
| GeneDB_Pfalciparum | 10/27/05 | 01/01/10 | 21 | 16 | 2,206 | 21 | 16 | 4,632 | 17 | 15 | 663 |
| GeneDB_Spombe | 09/28/09 | 01/01/10 | 2,797 | 1,330 | 5,213 | 4,009 | 1,662 | 34,114 | 495 | 297 | 3,394 |
| GeneDB_Tbrucei | 07/18/07 | 01/01/10 | 234 | 191 | 2,977 | 251 | 202 | 10,414 | 61 | 52 | 935 |
| GR_protein | 08/26/09 | 01/01/10 | 426 | 369 | 41,321 | 552 | 445 | 49,721 | 90 | 77 | 646 |
| JCVI_CMR | 07/22/09 | 01/01/10 | 446 | 412 | 21,271 | 455 | 417 | 54,398 | 90 | 83 | 2,350 |
| MGI | 12/17/09 | 01/01/10 | 14,927 | 13,466 | 18,167 | 50,970 | 33,966 | 151,652 | 1,564 | 1,214 | 7,327 |
| NCBI | 03/03/08 | 01/01/10 | 324 | 187 | 11,274 | 457 | 319 | 27,647 | 66 | 63 | 492 |
| PDB | 12/17/09 | 01/01/10 | 10,234 | 10,234 | 21,849 | 18,263 | 18,263 | 83,588 | 283 | 283 | 1,884 |
| PseudoCAP | 06/28/06 | 01/01/10 | 584 | 244 | 1,519 | 720 | 275 | 7,284 | 54 | 30 | 859 |
| RefSeq | 12/14/09 | 01/01/10 | 1,945 | 1,945 | 12,166 | 2,748 | 2,748 | 36,201 | 125 | 125 | 1,440 |
| RGD | 10/02/09 | 01/01/10 | 9,932 | 8,008 | 17,352 | 29,961 | 15,120 | 180,606 | 1,893 | 1,341 | 9,094 |
| SGD | 12/25/09 | 01/01/10 | 5,273 | 4,482 | 6,353 | 23,815 | 11,575 | 76,188 | 1,118 | 766 | 4,222 |
| SGN | 10/23/09 | 01/01/10 | 12 | 9 | 155 | 12 | 9 | 1,253 | 10 | 8 | 653 |
| TAIR | 12/23/09 | 01/01/10 | 8,102 | 6,871 | 51,713 | 10,615 | 8,656 | 149,466 | 646 | 473 | 4,103 |
| TIGR_CMR | 11/14/07 | 01/01/10 | 757 | 731 | 40,653 | 782 | 756 | 101,965 | 95 | 92 | 2,441 |
| UniProt | 12/17/09 | 01/01/10 | 206 | 41 | 1,290 | 381 | 67 | 9,381 | 11 | 9 | 173 |
| UniProtKB/Swiss-Prot | 12/17/09 | 01/01/10 | 384,061 | 380,296 | 419,241 | 1,303,909 | 1,279,924 | 3,416,194 | 1,709 | 1,514 | 11,507 |
| UniProtKB/TrEMBL | 12/17/09 | 01/01/10 | 3,615,614 | 3,615,469 | 5,981,451 | 9,116,513 | 9,115,708 | 28,760,356 | 1,420 | 1,402 | 9,262 |
| WB | 11/26/09 | 01/01/10 | 5,252 | 5,041 | 15,667 | 9,904 | 8,926 | 91,611 | 497 | 381 | 2,738 |
| ZFIN | 12/23/09 | 01/01/10 | 7,047 | 6,856 | 15,074 | 17,683 | 16,454 | 101,152 | 603 | 509 | 3,019 |
a http://www.ensembl.org/index.html
b http://www.ncbi.nlm.nih.gov/gene
c http://www.geneontology.org/GO.downloads.annotations.shtml
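To make the table easier to compare across databases of very different sizes, the redundant-annotation counts can be turned into proportions. The sketch below does this for three rows copied from the table (redundant GO annotations and total GO annotations); the choice of rows and the names `ROWS` and `redundancy_rate` are illustrative assumptions, not part of the original analysis.

```python
# Selected rows copied from the table above:
# database -> (redundant GO annotations, total GO annotations)
ROWS = {
    "Ensembl": (783_687, 4_395_125),
    "PDB": (18_263, 83_588),
    "SGD": (23_815, 76_188),
}

def redundancy_rate(redundant, total):
    """Fraction of a database's GO annotations that are redundant."""
    return redundant / total

rates = {db: redundancy_rate(r, t) for db, (r, t) in ROWS.items()}

# Print databases from highest to lowest redundancy rate.
for db, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{db:8s} {rate:6.1%}")
```

The same function applies to the gene-product and GO-term column groups by swapping in the corresponding redundant and total counts.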
Biological-domain-inconsistent annotations in biological databases
| Biological domain | Database | | | Gene products annotated with GO terms: biological-domain inconsistent | Gene products annotated with GO terms: total | GO annotations applied to gene products: biological-domain inconsistent | GO annotations applied to gene products: total | GO terms used in gene product annotations: biological-domain inconsistent | GO terms used in gene product annotations: total |
|---|---|---|---|---|---|---|---|---|---|
| Non-prokaryotic gene product | Ensembl (a) | 09/01/09 | 01/01/10 | 711 | 1,891,586 | 760 | 4,395,125 | 13 | 12,309 |
| | Gene (b) | 12/15/09 | 01/01/10 | 1,517 | 2,391,443 | 1,647 | 1,133,060 | 34 | 14,762 |
| | AspGD (c) | 12/21/09 | 01/01/10 | 2 | 3,425 | 2 | 15,340 | 1 | 3,259 |
| | dictyBase | 12/27/09 | 01/01/10 | 1 | 7,489 | 1 | 31,064 | 1 | 2,403 |
| | FB | 10/30/09 | 01/01/10 | 1 | 12,509 | 1 | 68,316 | 1 | 4,924 |
| | GeneDB_Tbrucei | 07/18/07 | 01/01/10 | 2 | 2,977 | 2 | 10,414 | 1 | 935 |
| | MGI | 12/17/09 | 01/01/10 | 2 | 18,167 | 2 | 151,652 | 2 | 7,327 |
| | PDB | 12/17/09 | 01/01/10 | 26 | 9,170 | 26 | 31,686 | 3 | 1,024 |
| | RGD | 10/02/09 | 01/01/10 | 2 | 18,363 | 4 | 180,606 | 3 | 9,094 |
| | TAIR | 12/23/09 | 01/01/10 | 3 | 51,713 | 3 | 149,466 | 1 | 4,103 |
| | UniProtKB/Swiss-Prot | 12/17/09 | 01/01/10 | 2,680 | 122,261 | 3,803 | 1,035,209 | 17 | 10,589 |
| | UniProtKB/TrEMBL | 12/17/09 | 01/01/10 | 20,573 | 2,333,592 | 23,876 | 12,234,060 | 24 | 8,586 |
| | WB | 11/26/09 | 01/01/10 | 12 | 15,667 | 13 | 91,611 | 5 | 2,738 |
| Non-eukaryotic gene product | Gene (b) | 12/15/09 | 01/01/10 | 53,088 | 3,595,041 | 76,597 | 101,160 | 319 | 2,497 |
| | EcoCyc (c) | 12/14/09 | 01/01/10 | 2 | 1,869 | 2 | 4,992 | 1 | 1,388 |
| | JCVI_CMR | 07/22/09 | 01/01/10 | 16 | 21,271 | 16 | 54,398 | 3 | 2,350 |
| | PDB | 12/17/09 | 01/01/10 | 83 | 16,580 | 85 | 66,027 | 12 | 1,689 |
| | TIGR_CMR | 11/14/07 | 01/01/10 | 70 | 40,653 | 70 | 101,965 | 4 | 2,441 |
| | UniProt | 12/17/09 | 01/01/10 | 48 | 248 | 67 | 7,870 | 3 | 44 |
| | UniProtKB/Swiss-Prot | 12/17/09 | 01/01/10 | 4,454 | 324,523 | 5,297 | 2,581,774 | 30 | 3,306 |
| | UniProtKB/TrEMBL | 12/17/09 | 01/01/10 | 77,047 | 4,459,834 | 83,965 | 20,009,318 | 49 | 4,048 |
a http://www.ensembl.org/index.html
b http://www.ncbi.nlm.nih.gov/gene
c http://www.geneontology.org/GO.current.annotations.shtml
Distribution of incorrect Gene Ontology annotations across evidence codes and related factors
| Evidence code | Redundant annotations, no. (correlation coefficient) | Biological-domain-inconsistent annotations, no. (correlation coefficient) | Taxonomy-inconsistent annotations, no. (correlation coefficient) | Total inaccurate annotations, no. (correlation coefficient) | Total no. of GO annotations |
|---|---|---|---|---|---|
| NR | 0 (*) | 0 (*) | 0 (*) | 0 (*) | 6 |
| ISM | 30 (-0.07) | 2 (0.26) | 0 (*) | 32 (-0.06) | 279 |
| ISA | 287 (-0.05) | 1,385 (0.40) | 0 (*) | 1,672 (-0.04) | 11,756 |
| IGC | 322 (-0.05) | 11 (0.43) | 0 (*) | 333 (-0.04) | 888 |
| IC | 1,193 (-0.03) | 1,265 (0.40) | 0 (*) | 2,458 (-0.02) | 12,490 |
| IEP | 2,344 (-0.03) | 2,467 (0.46) | 0 (*) | 4,811 (-0.02) | 27,889 |
| EXP | 3,273 (0.08) | 1,221 (0.16) | 0 (*) | 4,494 (0.08) | 20,781 |
| IGI | 3,628 (-0.02) | 4,171 (0.41) | 0 (*) | 7,799 (-0.02) | 34,985 |
| RCA | 6,469 (-0.07) | 7,710 (0.18) | 0 (*) | 14,179 (-0.06) | 85,014 |
| NAS | 7,921 (0.01) | 4,285 (0.39) | 0 (*) | 12,206 (0.02) | 58,687 |
| IPI | 10,555 (0.11) | 1,163 (0.31) | 0 (*) | 11,718 (0.11) | 72,597 |
| ISO | 14,119 (-0.05) | 15,956 (0.39) | 16 (-0.06) | 30,091 (-0.04) | 115,268 |
| TAS | 15,944 (0.01) | 8,331 (0.39) | 5 (-0.01) | 24,280 (0.01) | 113,414 |
| ND | 18,987 (-0.07) | 1 (0.51) | 0 (*) | 18,988 (-0.05) | 366,152 |
| ISS | 24,314 (0.01) | 12,828 (0.53) | 49 (0.01) | 37,191 (0.03) | 377,770 |
| IMP | 25,994 (-0.03) | 29,932 (0.38) | 160 (-0.06) | 56,086 (-0.03) | 242,825 |
| IDA | 44,327 (0.01) | 29,863 (0.38) | 17 (-0.02) | 74,207 (0.01) | 311,481 |
| IEA | 11,433,355 (0.99) | 219,560 (0.75) | 56,180 (0.99) | 11,709,095 (0.99) | 42,984,075 |
| NA (Not Available) | 803 (-0.02) | 3,654 (0.61) | 12 (-0.04) | 4,469 (-0.01) | 18,102 |
| No. of gene products | (0.71) | (0.97) | (0.69) | (0.72) | |
| No. of species | (0.99) | (0.78) | (0.99) | (0.99) | |
| No. of GO terms | (0.35) | (0.57) | (0.34) | (0.36) | |
| Average No. of annotations | (0.99) | (0.76) | (0.99) | (0.99) | |
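The counts in this table can be checked and normalized with a little arithmetic. The sketch below, using five rows copied from the table, verifies that each "total inaccurate" count equals the sum of its three components and then computes the share of inaccurate annotations per evidence code. The row selection and the names `ROWS`, `is_consistent`, and `inaccuracy_rate` are illustrative assumptions; the correlation coefficients are not recomputed here because the underlying per-database data are not part of this record.

```python
# Rows copied from the table above: evidence code ->
# (redundant, domain-inconsistent, taxonomy-inconsistent,
#  total inaccurate, total GO annotations)
ROWS = {
    "ND":  (18_987, 1, 0, 18_988, 366_152),
    "ISS": (24_314, 12_828, 49, 37_191, 377_770),
    "IMP": (25_994, 29_932, 160, 56_086, 242_825),
    "IDA": (44_327, 29_863, 17, 74_207, 311_481),
    "IEA": (11_433_355, 219_560, 56_180, 11_709_095, 42_984_075),
}

def is_consistent(row):
    """True when the three inconsistency counts add up to the stated total."""
    redundant, domain, taxonomy, total_inaccurate, _ = row
    return redundant + domain + taxonomy == total_inaccurate

def inaccuracy_rate(row):
    """Share of all annotations with this evidence code that are inaccurate."""
    _, _, _, total_inaccurate, total_annotations = row
    return total_inaccurate / total_annotations

checks = {code: is_consistent(row) for code, row in ROWS.items()}
rates = {code: inaccuracy_rate(row) for code, row in ROWS.items()}

for code in ROWS:
    print(f"{code:4s} consistent={checks[code]} rate={rates[code]:.1%}")
```

Normalizing this way separates the sheer volume of IEA annotations from their error rate, which is useful context when reading the raw counts.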