Bart H J van den Berg, Fiona M McCarthy, Susan J Lamont, Shane C Burgess.
Abstract
One motivation of systems biology research is to understand gene functions and interactions from functional genomics data such as that derived from microarrays. Up-to-date structural and functional annotations of genes are an essential foundation of systems biology modeling. We propose that the first essential step in any systems biology modeling of functional genomics data, especially for species with recently sequenced genomes, is gene structural and functional re-annotation. To demonstrate the impact of such re-annotation, we structurally and functionally re-annotated a microarray developed, and previously used, as a tool for disease research. We quantified the impact of this re-annotation on the array based on the total numbers of structural- and functional-annotations, the Gene Annotation Quality (GAQ) score, and canonical pathway coverage. We next quantified the impact of re-annotation on systems biology modeling using a previously published experiment that used this microarray. We show that re-annotation improves the quantity and quality of structural- and functional-annotations, allows a more comprehensive Gene Ontology based modeling, and improves pathway coverage for both the whole array and a differentially expressed mRNA subset. Our results also demonstrate that re-annotation can result in a different knowledge outcome derived from previous published research findings. We propose that, because of this, re-annotation should be considered to be an essential first step for deriving value from functional genomics data.
Year: 2010 PMID: 20498845 PMCID: PMC2871057 DOI: 10.1371/journal.pone.0010642
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Structural re-annotation results.
| | Whole microarray data | | | Differentially expressed mRNA data | | |
| Structural annotation | Original | Re-annotated | Fold Δ | Original | Re-annotated | Fold Δ |
| Total EST | 15,609 | 15,227 | 0.98 | 57 | 54 | 0.95 |
| EST to Chicken gene | 1,457 | 14,206 | 9.75 | 16 | 49 | 3.06 |
| EST to Human gene | 3,951 | 0 | n/a | 13 | 0 | n/a |
| EST to Mouse gene | 1,487 | 0 | n/a | 5 | 0 | n/a |
| EST to Rat gene | 450 | 0 | n/a | 3 | 0 | n/a |
| EST to genes other species | 1,409 | 0 | n/a | 7 | 0 | n/a |
| EST with no gene annotation | 6,855 | 1,032 | 0.15 | 13 | 5 | 0.38 |
| Total unique chicken gene annotations | 1,136 | 11,868 | 10.45 | 16 | 43 | 2.69 |
The results of structural annotation are compared between the original annotation data and the re-annotation dataset, for both the whole microarray and the differentially expressed mRNA dataset. In both datasets, re-annotation increased the number of ESTs mapped to chicken genes and the total number of unique chicken genes, while removing the need for structural annotations based on orthology.
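The Fold Δ column appears to be the re-annotated count divided by the original count, rounded to two decimals, with n/a where re-annotation left no entries. A minimal Python sketch under that assumption follows; the function name `fold_change` and the use of `None` for n/a are illustrative choices, not taken from the paper.

```python
from typing import Optional


def fold_change(original: int, reannotated: int) -> Optional[float]:
    """Re-annotated / original, rounded to 2 decimals.

    Returns None (shown as n/a in the table) when either count is zero;
    this convention is an assumption that mirrors the table's n/a cells.
    """
    if original == 0 or reannotated == 0:
        return None
    return round(reannotated / original, 2)


# Whole-microarray counts copied from the structural re-annotation table.
rows = {
    "Total EST": (15609, 15227),
    "EST to Chicken gene": (1457, 14206),
    "EST to Human gene": (3951, 0),
    "EST with no gene annotation": (6855, 1032),
    "Total unique chicken gene annotations": (1136, 11868),
}

for label, (original, reannotated) in rows.items():
    fc = fold_change(original, reannotated)
    print(f"{label}: {'n/a' if fc is None else fc}")
```

A value below 1 (for example, ESTs with no gene annotation) indicates a decrease after re-annotation.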
Functional re-annotation results.
| | Whole microarray data | | | Differentially expressed mRNA data | | |
| Functional annotation | Original | Re-annotated | Fold Δ | Original | Re-annotated | Fold Δ |
| Total unique chicken protein annotations | 785 | 11,868 | 15.10 | 12 | 43 | 3.58 |
| Total proteins GO annotated | 615 | 3,845 | 6.25 | 9 | 38 | 4.22 |
| Total GO terms | 3,929 | 27,815 (9,595) | 7.08 (2.44) | 39 | 365 (190) | 9.36 (4.87) |
| Unique GO terms | 1,050 | 2,652 (1,662) | 2.53 (1.58) | 26 | 160 (92) | 6.15 (3.54) |
| Total GAQ score | 43,245 | 305,996 (107,006) | 7.08 (2.47) | 375 | 4,158 (2,663) | 11.08 (6.04) |
| Mean GAQ score | 70 | 80 (174) | 1.13 (2.49) | 42 | 109 (296) | 2.63 (7.05) |
| GO depth score | 21,142 | 143,206 (51,391) | 6.77 (2.43) | 177 | 1,921 (934) | 10.85 (5.28) |
| ∑GO annotation confidence score including IEA | 8,037 | 57,696 (20,040) | 7.18 (2.49) | 81 | 781 (527) | 9.64 (6.51) |
| ∑GO annotation confidence score excluding IEA | 1,325 | 14,258 (3,460) | 10.76 (2.61) | 5 | 143 (283) | 28.60 (56.6) |
The results of functional annotation are compared between the original annotation data and the re-annotation dataset, for both the whole microarray and the differentially expressed mRNA dataset. Re-annotation increased the number of GO terms, the total GAQ score, and the detail of and confidence in the assigned GO annotations. For the whole array data, the numbers in parentheses are the re-annotation results for only the original 615 chicken proteins; these values serve as a baseline for the improvement due to re-annotation. For the differentially expressed mRNA data, the numbers in parentheses are the re-annotation results for the original 9 differentially expressed mRNAs.
Figure 1. Whole microarray GOSlim modelling.
The difference in number of GO annotations in the GOSlim groups for the GO ontologies ‘cellular component’, ‘molecular function’ and ‘biological process’ between the original and re-annotated whole microarray gene dataset. The whole microarray GOSlim modeling shows that re-annotation increases the number of GO annotations in each GOSlim group for each ontology.
Figure 2. Differentially expressed mRNA GOSlim modelling.
The difference in number of GO annotations in the GOSlim groups for the GO ontologies ‘cellular component’, ‘molecular function’ and ‘biological process’ between the original and re-annotated differentially expressed mRNA dataset. The differentially expressed mRNA GOSlim modeling shows that re-annotation increases the number of GO annotations for most GOSlim groups. The negative value for the GOSlim group ‘transporter activity’ in the ‘molecular function’ ontology is caused by GO annotations being updated to the more detailed ‘protein transporter activity’ GOSlim group.
Top 10 significant pathways for the original whole microarray dataset.
| Rank | Pathway Original | Ratio coverage | # Genes from dataset |
| 1 | | 1.17E-01 | 11 |
| 2 | Axonal Guidance Signaling | 6.42E-02 | 26 |
| 3 | | 8.37E-02 | 17 |
| 4 | Amyotrophic Lateral Sclerosis Signaling | 9.09E-02 | 10 |
| 5 | Actin Cytoskeleton Signaling | 7.14E-02 | 17 |
| 6 | CDK5 Signaling | 9.68E-02 | 9 |
| 7 | Caveolar-mediated Endocytosis | 9.76E-02 | 8 |
| 8 | Neurotrophin/TRK Signaling | 1.15E-01 | 9 |
| 9 | VEGF Signaling | 9.28E-02 | 9 |
| 10 | Clathrin-mediated Endocytosis | 7.19E-02 | 12 |
Top 10 significant pathways identified by Ingenuity Pathway Analysis for the original whole microarray dataset. Pathways in bold were found in the top 10 lists of both the original and re-annotated datasets. Ratio coverage is the number of genes from the dataset that map to the pathway divided by the total number of genes that map to the canonical pathway (Ingenuity® Systems).
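The ratio coverage definition above can be sketched in Python. The function names are hypothetical, and the pathway sizes are not reported in the table; the second function only back-calculates an approximate size from a reported gene count and a rounded ratio, so its result should be read as an estimate.

```python
def ratio_coverage(genes_in_dataset: int, genes_in_pathway: int) -> float:
    """Dataset genes mapping to the pathway / all genes in the canonical pathway."""
    if genes_in_pathway <= 0:
        raise ValueError("genes_in_pathway must be positive")
    return genes_in_dataset / genes_in_pathway


def implied_pathway_size(genes_in_dataset: int, reported_ratio: float) -> int:
    """Approximate pathway size implied by a reported (rounded) ratio coverage."""
    return round(genes_in_dataset / reported_ratio)


# Axonal Guidance Signaling row: 26 dataset genes, ratio coverage 6.42E-02.
size = implied_pathway_size(26, 6.42e-2)       # estimated, not reported in the table
print(size, f"{ratio_coverage(26, size):.2E}")
```

Because ratio coverage depends on the pathway size, a pathway can rank highly with few dataset genes if the canonical pathway itself is small.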
Top 10 significant pathways for the re-annotated whole microarray dataset.
| Rank | Pathway Re-annotated | Ratio coverage | # Genes from dataset |
| 1 | CD28 Signaling in T Helper Cells | 4.52E-01 | 56 |
| 2 | | 4.68E-01 | 95 |
| 3 | NF-κB Signaling | 4.29E-01 | 63 |
| 4 | Insulin Receptor Signaling | 4.49E-01 | 62 |
| 5 | | 4.79E-01 | 45 |
| 6 | IL-9 Signaling | 4.59E-01 | 17 |
| 7 | Role of NFAT in Regulation of the Immune Response | 3.72E-01 | 70 |
| 8 | Angiopoietin Signaling | 4.17E-01 | 30 |
| 9 | Ceramide Signaling | 4.76E-01 | 40 |
| 10 | Virus Entry via Endocytic Pathways | 4.48E-01 | 43 |
Top 10 significant pathways identified by Ingenuity Pathway Analysis for the re-annotated whole microarray dataset. Pathways in bold were found in the top 10 lists of both the original and re-annotated datasets. Ratio coverage is the number of genes from the dataset that map to the pathway divided by the total number of genes that map to the canonical pathway (Ingenuity® Systems).
Top 10 significant pathways for the original differentially expressed mRNA dataset.
| Rank | Pathway Original | Ratio coverage | Genes from dataset |
| 1 | | 2.22E-02 | CCL5, FN1, IL1B |
| 2 | Acute Phase Response Signaling | 1.12E-02 | FN1, IL1B |
| 3 | | 1.08E-02 | CD3E, CCL5, IL1B |
| 4 | | 1.79E-02 | IL1B |
| 5 | Cytotoxic T Lymphocyte-mediated Apoptosis of Target Cells | 3.7E-02 | CD3E |
| 6 | Docosahexaenoic Acid (DHA) Signaling | 2.22E-02 | IL1B |
| 7 | TREM1 Signaling | 1.45E-02 | IL1B |
| 8 | IL-10 Signaling | 1.41E-02 | IL1B |
| 9 | Calcium-induced T Lymphocyte Apoptosis | 1.61E-02 | CD3E |
| 10 | LXR/RXR Activation | 1.18E-02 | IL1B |
Top 10 significant pathways identified by Ingenuity Pathway Analysis for the original differentially expressed mRNA dataset. Pathways in bold were found in the top 10 lists of both the original and re-annotated datasets. Genes in bold and italics were only identified in the re-annotated dataset. Ratio coverage is the number of genes from the dataset that map to the pathway divided by the total number of genes that map to the canonical pathway (Ingenuity® Systems).
Top 10 significant pathways for the re-annotated differentially expressed mRNA dataset.
| Rank | Pathway Re-annotated | Ratio coverage | Genes from dataset |
| 1 | Cytotoxic T Lymphocyte-mediated Apoptosis of Target Cells | 7.41E-02 | CD3E, |
| 2 | | 3.7E-02 | CCL5, FN1, IL1B, |
| 3 | CCR5 Signaling in Macrophages | 3.45E-02 | CD3E, CCL5, |
| 4 | Death Receptor Signaling | 3.08E-02 | BIRC2 |
| 5 | Induction of Apoptosis by HIV1 | 3.03E-02 | BIRC2, |
| 6 | | 1.44E-02 | CD3E, CCL5, IL1B, |
| 7 | p38 MAPK Signaling | 2.11E-02 | IL1B, |
| 8 | CTLA4 Signaling in Cytotoxic T Lymphocytes | 2.25E-02 | CD3E, |
| 9 | Apoptosis Signaling | 2.13E-02 | BIRC2, |
| 10 | | 1.79E-02 | IL1B |
Top 10 significant pathways identified by Ingenuity Pathway Analysis for the re-annotated differentially expressed mRNA dataset. Pathways in bold were found in the top 10 lists of both the original and re-annotated datasets. Genes in bold and italics were only identified in the re-annotated dataset. Ratio coverage is the number of genes from the dataset that map to the pathway divided by the total number of genes that map to the canonical pathway (Ingenuity® Systems).