Yasmin Alam-Faruque, Rachael P Huntley, Varsha K Khodiyar, Evelyn B Camon, Emily C Dimmer, Tony Sawford, Maria J Martin, Claire O'Donovan, Philippa J Talmud, Peter Scambler, Rolf Apweiler, Ruth C Lovering.
Abstract
The Gene Ontology (GO) resource provides dynamic controlled vocabularies that form an information-rich resource for the consistent description of the functional attributes and subcellular locations of gene products from all taxonomic groups (www.geneontology.org). System-focused projects, such as the Renal and Cardiovascular GO Annotation Initiatives, aim to provide detailed GO data for proteins implicated in the development and function of specific organs. Such projects support the rapid evaluation of new experimental data and aid in generating novel biological insights to help alleviate human disease. This paper describes the improvement of GO data for the renal and cardiovascular research communities and demonstrates that the cardiovascular-focused GO annotations created over the past three years have led to a clear improvement in microarray interpretation. The reanalysis of cardiovascular microarray datasets confirms the need to continue improving the annotation of the human proteome.
Availability: GO annotation data is freely available from: ftp://ftp.geneontology.org/pub/go/gene-associations/
Year: 2011 PMID: 22174742 PMCID: PMC3235096 DOI: 10.1371/journal.pone.0027541
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Distribution of GO term specificity by annotation source.
Cumulative frequency distribution of the GO terms applied in human annotations, by term granularity. Manual annotations created by the Cardiovascular and Renal Initiatives are compared with those created by annotation groups that do not take a system-focused approach. The median granularity of the GO terms used in human protein annotation by the Cardiovascular and Renal Initiatives is 8.0 (interquartile range 6–10), compared with a median of 7.0 (interquartile range 5–9) for the GO terms used by other groups manually annotating the human proteome; a Mann-Whitney U test confirms that this difference is significant (P<0.0001).
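The caption reports medians, interquartile ranges and a Mann-Whitney U test on GO term granularity. As a rough illustration of that kind of comparison, the Python sketch below implements a two-sided Mann-Whitney U test with a tie-corrected normal approximation, using only the standard library. The granularity lists are hypothetical, and the caption does not say how granularity was computed or which U-test variant was used, so this is not a reproduction of the paper's analysis.

```python
import math
from statistics import median, quantiles

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (tie-corrected normal approximation).

    Returns (U statistic for x, two-sided p-value); no continuity correction.
    """
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail

def summary(levels):
    """Median and interquartile range (inclusive quartile method)."""
    q1, _, q3 = quantiles(levels, n=4, method="inclusive")
    return median(levels), (q1, q3)

# Hypothetical granularity values, for illustration only.
focused = [8, 9, 7, 10, 8, 6, 9, 11, 8, 7]
other = [7, 6, 5, 8, 7, 9, 6, 7, 5, 8]
u, p = mann_whitney_u(focused, other)
print(summary(focused), summary(other), u, p)
```

Quartile conventions differ between statistics packages, so an interquartile range computed this way may differ slightly from one produced by other software on the same data.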
Figure 2. AmiGO ‘Tree View’ image of part of the kidney developmental process ontology.
The ‘tree view’ in AmiGO (http://amigo.geneontology.org) showing the GO term parents of GO:0003337 ‘mesenchymal to epithelial transition involved in metanephros morphogenesis’. The twelve most specific GO terms (shaded) were among the 470 new terms created following the kidney development ontology workshop. The numbers in brackets indicate the number of human proteins annotated to the GO term or to one of its child terms (as of 7 October 2011). ‘I’ marks an is_a parent-child relationship and ‘P’ a part_of parent-child relationship.
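Each bracketed number counts proteins annotated to a term or to any of its descendants, because an annotation to a specific term also implies annotation to that term's ancestors along is_a and part_of edges. The Python sketch below shows that propagation on a made-up graph; the term and protein IDs are hypothetical, and AmiGO's own counting rules (for example, how it handles NOT annotations or other relationship types) may differ.

```python
from collections import defaultdict

def ancestors(term, parents):
    """Return `term` plus every term reachable by following parent edges."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagated_counts(annotations, parents):
    """Count distinct proteins annotated to each term or to any of its descendants.

    annotations: protein ID -> set of directly annotated term IDs
    parents:     term ID -> list of parent term IDs (is_a or part_of)
    """
    proteins = defaultdict(set)
    for protein, terms in annotations.items():
        for term in terms:
            for t in ancestors(term, parents):
                proteins[t].add(protein)  # a set, so each protein counts once
    return {t: len(ids) for t, ids in proteins.items()}

# Hypothetical toy graph: C is_a B, B part_of A (both edge types treated alike).
parents = {"C": ["B"], "B": ["A"]}
annotations = {"P1": {"C"}, "P2": {"B"}, "P3": {"B", "C"}}
print(propagated_counts(annotations, parents))
```

Collecting protein IDs in sets, rather than incrementing counters, matters in a directed acyclic graph: a protein reaching the same ancestor by two paths is still counted once.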
Comparison of PAH-SSc microarray data analysis using MAPPFinder in 2008 and GO-Elite, Ontologizer and ProfCom GO in 2011.
| | GenMAPP analysis (Grigoryev) | | | GenMAPP GO-Elite analysis, June 2011 | | | Ontologizer analysis, March 2011 | | | ProfCom GO analysis, March 2011 | | |
| GO term | Z-score | S | P | Z-score | S (St = 262) | P (Pt = 17158) | p-value (Adj) | S (St = 264) | P (Pt = 18249) | p-value | S (St = 265) | P (Pt = 18257) |
| angiogenesis | | 6 | 41 | | 11 | 137 | 1 | 15 | 248 | 8.00E-02 | 9 | 122 |
| chemotaxis | | 10 | 111 | | 20 | 484 | 1 | 20 | 525 | 7.50E-02 | 9 | 121 |
| inflammatory response | | 18 | 179 | | 17 | 265 | 1 | 23 | 361 | | 16 | 224 |
| cellular component movement | | 11 | 108 | | 28 | 506 | | 35 | 701 | 9.90E-02 | 8 | 98 |
| G-protein coupled receptor signaling | | 15 | 825 | | 18 | 524 | 1 | 19 | 573 | N/A | N/A | N/A |
| cell-cell signaling | | 11 | 283 | −0.36 | 8 | 597 | 1 | 15 | 877 | N/A | N/A | N/A |
| sensory perception | | 7 | 472 | N/A | N/A | N/A | 1 | 3 | 835 | N/A | N/A | N/A |
| antimicrobial humoral response | | 6 | 84 | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A |
| negative regulation of cell proliferation | | 9 | 136 | 4.07 | 16 | 403 | 1 | 16 | 399 | N/A | N/A | N/A |
Significant Z-scores and p-values are highlighted in bold text. GO processes are considered significantly enriched if they have Z-scores > 1.96 in MAPPFinder or GO-Elite [22], adjusted p-values < 0.1 in Ontologizer [24], or p-values < 0.01 in ProfCom GO [25]. S = study count, P = population count, t = number of protein IDs.
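MAPPFinder and GO-Elite rank terms by a Z-score that compares a term's observed study count with the count expected if the study genes were drawn at random from the population. A minimal Python sketch, assuming the standard MAPPFinder formula z = (r − nR/N) / √(n(R/N)(1 − R/N)(1 − (n − 1)/(N − 1))), is below, together with the three significance cut-offs quoted in the footnote (collected in a hypothetical `IS_SIGNIFICANT` helper). Applied to the GO-Elite counts in the table, it gives about 4.05 for ‘negative regulation of cell proliferation’ and about −0.38 for ‘cell-cell signaling’, close to but not identical with the reported 4.07 and −0.36, so treat it as an approximation of the tools' output rather than a reimplementation.

```python
import math

def mappfinder_z(r, n, R, N):
    """Standardized difference between observed and expected hits for one GO term.

    r: study count for the term (S)     n: population count for the term (P)
    R: study set size (St)              N: population size (Pt)
    """
    frac = R / N
    expected = n * frac
    # Hypergeometric-style variance with a finite-population correction.
    variance = n * frac * (1 - frac) * (1 - (n - 1) / (N - 1))
    return (r - expected) / math.sqrt(variance)

# Significance rules quoted in the table footnote.
IS_SIGNIFICANT = {
    "MAPPFinder/GO-Elite": lambda z: z > 1.96,
    "Ontologizer": lambda p_adj: p_adj < 0.1,
    "ProfCom GO": lambda p: p < 0.01,
}

# GO-Elite columns for 'negative regulation of cell proliferation'.
z = mappfinder_z(16, 403, 262, 17158)
print(round(z, 2), IS_SIGNIFICANT["MAPPFinder/GO-Elite"](z))
```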
Figure 3. QuickGO term display.
QuickGO (www.ebi.ac.uk/QuickGO) ancestor chart showing information for GO:0006935 ‘chemotaxis’ and its ‘is_a’ parent relationships within the hierarchical directed acyclic graph. The GO terms ‘chemotaxis’, ‘locomotion’ and ‘response to stimulus’ are highlighted to illustrate their parent-child relationships. The child term details are displayed for the GO term ‘chemotaxis’.
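An ancestor chart of this kind is the set of terms reachable from the selected term by following parent edges upward through the directed acyclic graph. The Python sketch below computes that set by breadth-first search, recording the fewest is_a steps from the selected term to each ancestor. The edge list in `IS_A` is a simplified, assumed subset chosen to connect ‘chemotaxis’ to ‘locomotion’ and ‘response to stimulus’; the caption does not name the intermediate terms, so consult QuickGO for the actual, current graph.

```python
from collections import deque

def ancestor_distances(term, is_a):
    """Breadth-first walk up is_a edges: {ancestor: fewest edges from term}."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for parent in is_a.get(current, []):
            if parent not in dist:  # first visit in BFS = shortest path
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist

# Simplified, assumed is_a edges around 'chemotaxis' (illustration only).
IS_A = {
    "chemotaxis": ["taxis", "response to chemical stimulus"],
    "taxis": ["locomotion", "response to external stimulus"],
    "response to chemical stimulus": ["response to stimulus"],
    "response to external stimulus": ["response to stimulus"],
    "locomotion": ["biological_process"],
    "response to stimulus": ["biological_process"],
}

dist = ancestor_distances("chemotaxis", IS_A)
for name, steps in sorted(dist.items(), key=lambda kv: (kv[1], kv[0])):
    print(steps, name)
```

Because ‘response to stimulus’ is reachable along two routes, the visited-set check is what keeps it from being reported twice.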
Comparison of Ontologizer PAH-SSc microarray data analysis using GO annotation dataset with and without the human protein annotations submitted by the Cardiovascular Initiative.
| | | GO dataset including Cardiovascular Initiative annotations | | | | GO dataset without Cardiovascular Initiative annotations | | | |
| GO ID | GO term | p-value | p-value (Adj) | Study count (t = 264) | Population count (t = 18249) | p-value | p-value (Adj) | Study count (t = 264) | Population count (t = 18244) |
| GO:0002376 | immune system process | 1.38E-20 | | 77 | 1487 | 2.73E-18 | | 71 | 1406 |
| GO:0065007 | biological regulation | 1.15E-10 | | 183 | 8119 | 3.02E-10 | | 179 | 7943 |
| GO:0050896 | response to stimulus | 2.93E-10 | | 155 | 6423 | 1.57E-09 | | 151 | 6318 |
| GO:0040011 | locomotion | 8.55E-09 | | 42 | 970 | 4.88E-07 | | 36 | 883 |
| GO:0016265 | death | 1.56E-08 | | 53 | 1431 | 6.33E-09 | | 52 | 1354 |
| GO:0023052 | signaling | 1.76E-08 | | 107 | 4017 | 6.16E-09 | | 106 | 3898 |
| GO:0006928 | cellular component movement | 6.01E-08 | | 35 | 701 | 9.27E-07 | | 30 | 612 |
| GO:0032502 | developmental process | 1.15E-07 | | 98 | 3678 | 1.67E-07 | | 95 | 3553 |
| GO:0001775 | cell activation | 1.73E-07 | | 32 | 632 | 7.56E-06 | | 27 | 575 |
| GO:0006950 | response to stress | 4.68E-07 | | 92 | 2552 | 1.57E-05 | | 84 | 2448 |
| GO:0008283 | cell proliferation | 3.15E-06 | | 42 | 1205 | 2.37E-05 | | 37 | 1091 |
| GO:0009987 | cellular process | 1.27E-05 | | 229 | 12453 | 7.36E-06 | | 228 | 12356 |
| GO:0032501 | multicellular organismal process | 3.01E-05 | | 117 | 5194 | 3.81E-05 | | 114 | 5054 |
| GO:0048518 | positive regulation of biological process | 3.91E-05 | | 87 | 2786 | 1.60E-04 | | 78 | 2541 |
| GO:0022610 | biological adhesion | 4.59E-05 | | 30 | 827 | 5.74E-04 | 0.186 | 26 | 782 |
| GO:0009605 | response to external stimulus | 6.49E-05 | | 44 | 1033 | 1.66E-04 | | 40 | 952 |
| GO:0001816 | cytokine production | 6.75E-05 | | 18 | 289 | 1.64E-03 | 0.458 | 13 | 227 |
| GO:0048519 | negative regulation of biological process | 7.13E-05 | | 77 | 2404 | 4.42E-04 | 0.144 | 68 | 2190 |
| GO:0051674 | localization of cell | 7.73E-05 | | 29 | 606 | 1.27E-03 | 0.37 | 23 | 515 |
| GO:0051179 | localization | 1.14E-04 | | 87 | 3669 | 2.48E-04 | | 82 | 3482 |
| GO:0065008 | regulation of biological quality | 1.80E-04 | | 72 | 2199 | 2.39E-04 | | 67 | 2024 |
| GO:0042221 | response to chemical stimulus | 2.13E-04 | | 74 | 2177 | 1.25E-03 | 0.364 | 67 | 2048 |
| GO:0046209 | nitric oxide metabolic process | 2.41E-04 | | 6 | 51 | 2.88E-02 | 1 | 3 | 40 |
| GO:0003013 | circulatory system process | 9.57E-04 | | 12 | 263 | 2.16E-04 | | 12 | 226 |
Significant adjusted p-values are highlighted in bold text. GO processes with adjusted p-values < 0.1 identified by Ontologizer [24] are considered significantly enriched. t = number of protein IDs.
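In its simplest, term-for-term mode, Ontologizer scores each term with a one-sided hypergeometric test of whether the term is over-represented in the study set, then applies a multiple-testing correction. The table does not say which Ontologizer calculation method or correction was used, so the Python sketch below assumes a term-for-term test with a Bonferroni correction; the function names are hypothetical and its numbers are not expected to reproduce the table exactly.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n), computed exactly with integers.

    k: study count for the term       K: population count for the term
    n: study set size (t)             N: population size (t)
    """
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / comb(N, n)  # big-int true division returns a float

def bonferroni(p, m):
    """Bonferroni-adjusted p-value for m tests, capped at 1."""
    return min(1.0, p * m)

# 'immune system process' with and without the Cardiovascular Initiative annotations.
p_with = hypergeom_upper_tail(77, 1487, 264, 18249)
p_without = hypergeom_upper_tail(71, 1406, 264, 18244)
print(p_with, p_without)
```

Summing exact binomial-coefficient products avoids the underflow that a naive floating-point loop would hit for p-values this small; Python's integer division of the two large sums still yields a correctly rounded float.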
Comparison of ProfCom GO PAH-SSc microarray data analysis using GO annotation dataset with and without the human protein annotations submitted by the Cardiovascular Initiative.
| | | GO dataset including Cardiovascular Initiative annotations | | | GO dataset without Cardiovascular Initiative annotations | | |
| GO ID | GO term | p-value | Study count (t = 265) | Population count (t = 18257) | p-value | Study count (t = 265) | Population count (t = 18252) |
| GO:0032496 | response to lipopolysaccharide | | 12 | 107 | | 10 | 95 |
| GO:0006954 | inflammatory response | | 16 | 224 | 1.59E-02 | 13 | 214 |
| GO:0045768 | positive regulation of anti-apoptosis | | 7 | 34 | | 6 | 27 |
| GO:0045429 | positive regulation of nitric oxide biosynthetic process | | 6 | 26 | N/A | N/A | N/A |
| GO:0006955 | immune response | | 23 | 510 | | 23 | 502 |
| GO:0048661 | positive regulation of smooth muscle cell proliferation | | 6 | 27 | | 5 | 19 |
| GO:0014070 | response to organic cyclic substance | | 10 | 104 | | 10 | 104 |
| GO:0050900 | leukocyte migration | | 10 | 106 | 1.94E-02 | 9 | 103 |
| GO:0051412 | response to corticosterone stimulus | | 5 | 19 | | 5 | 19 |
| GO:0019221 | cytokine-mediated signaling pathway | | 13 | 203 | 1.32E-01 | 11 | 194 |
Significant p-values are highlighted in bold text. GO processes with p-values < 0.01 identified by ProfCom GO [25] are considered significantly enriched. t = number of protein IDs.
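The ‘with’ and ‘without’ comparisons amount to re-running the same enrichment after removing one group's annotations from the GO annotation file, which also shrinks the population (18257 versus 18252 proteins here). A Python sketch of that filtering step for GAF 2.0 files is below. It assumes the Cardiovascular Initiative's annotations can be identified by the Assigned By column (shown here with the value BHF-UCL, an assumption), and it counts direct annotations only, without propagating them to parent terms as the enrichment tools do. The `gaf_line` helper and all protein IDs and lines are hypothetical.

```python
from collections import defaultdict

def term_counts(gaf_lines, exclude_assigned_by=None):
    """Map GO ID -> set of directly annotated protein IDs from GAF 2.x lines.

    Skips comment lines ('!'), annotations qualified with NOT and, if
    exclude_assigned_by is given, annotations whose Assigned By column
    (column 15) equals that value.
    """
    proteins = defaultdict(set)
    for line in gaf_lines:
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        db_object_id, qualifier, go_id, assigned_by = cols[1], cols[3], cols[4], cols[14]
        if "NOT" in qualifier.split("|"):
            continue
        if exclude_assigned_by is not None and assigned_by == exclude_assigned_by:
            continue
        proteins[go_id].add(db_object_id)
    return proteins

def population_size(counts):
    """Number of distinct proteins with at least one retained annotation."""
    return len(set().union(*counts.values()))

def gaf_line(protein, go_id, assigned_by, qualifier=""):
    """Build a hypothetical 17-column GAF 2.0 line for illustration."""
    cols = ["UniProtKB", protein, protein, qualifier, go_id, "PMID:0", "IDA", "",
            "P", "", "", "protein", "taxon:9606", "20110101", assigned_by, "", ""]
    return "\t".join(cols)

lines = [
    gaf_line("P1", "GO:0006954", "BHF-UCL"),
    gaf_line("P2", "GO:0006954", "UniProt"),
    gaf_line("P3", "GO:0006954", "UniProt", qualifier="NOT"),
    gaf_line("P1", "GO:0002376", "UniProt"),
    gaf_line("P4", "GO:0001775", "BHF-UCL"),
]
with_ci = term_counts(lines)
without_ci = term_counts(lines, exclude_assigned_by="BHF-UCL")
print(population_size(with_ci), population_size(without_ci))
```

In this toy input, protein P4 is annotated only by the excluded source, so it drops out of the population entirely, which mirrors how removing one group's annotations can change both the study and the population counts.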
Comparison of Ontologizer macrophage data analysis using GO annotation datasets from December 2010 and April 2011.
| | | April 2011 | | | December 2010 | | |
| GO ID | GO term | p-value (Adj) | Study count (t = 257) | Population count (t = 14241) | p-value (Adj) | Study count (t = 258) | Population count (t = 14386) |
| GO:0001775 | cell activation | | 25 | 390 | | 19 | 363 |
| GO:0002376 | immune system process | | 39 | 885 | | 31 | 833 |
| GO:0008283 | cell proliferation | | 38 | 862 | 0.236 | 27 | 800 |
| GO:0001816 | cytokine production | | 16 | 220 | 1 | 8 | 205 |
| GO:0042221 | response to chemical stimulus | | 55 | 1538 | 1 | 37 | 1356 |
| GO:0006928 | cellular component movement | | 24 | 531 | 1 | 12 | 483 |
| GO:0051674 | localization of cell | | 23 | 507 | 1 | 11 | 457 |
| GO:0032502 | developmental process | | 81 | 3006 | 0.982 | 65 | 2832 |
| GO:0071887 | leukocyte apoptosis | | 7 | 41 | 1 | 2 | 24 |
Significant p-values are highlighted in bold text. GO processes with adjusted p-values < 0.1 identified by Ontologizer [24] are considered significantly enriched. t = number of protein IDs.
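Adjusted p-values aside, the raw counts already show how the April 2011 annotation set shifted the macrophage analysis. One simple way to read them is fold enrichment, a derived ratio that the table itself does not report: the fraction of the study set annotated to a term divided by the fraction of the population annotated to it. A short Python sketch using counts copied from the table (the `fold_enrichment` name is hypothetical):

```python
def fold_enrichment(study_count, study_size, pop_count, pop_size):
    """Term frequency in the study set divided by its frequency in the population."""
    return (study_count / study_size) / (pop_count / pop_size)

# (study count, study size, population count, population size) from the table above.
releases = {
    "April 2011": {"cell activation": (25, 257, 390, 14241),
                   "leukocyte apoptosis": (7, 257, 41, 14241)},
    "December 2010": {"cell activation": (19, 258, 363, 14386),
                      "leukocyte apoptosis": (2, 258, 24, 14386)},
}

for release, terms in releases.items():
    for term, counts in terms.items():
        print(f"{release:14} {term:20} {fold_enrichment(*counts):.2f}")
```

For ‘cell activation’ this gives roughly 3.55 with the April 2011 annotations versus 2.92 with the December 2010 set. Fold enrichment carries no significance information on its own, so it complements rather than replaces the adjusted p-values.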