Prashanti Manda, Seval Ozkan, Hui Wang, Fiona McCarthy, Susan M Bridges.
Abstract
The Gene Ontology (GO) has become the internationally accepted standard for representing the function, process, and location aspects of gene products. The wealth of GO annotation data provides a valuable source of implicit knowledge about relationships among these aspects. We describe a new association rule mining method that discovers implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction and/or between terms from the same sub-ontology. We have developed a bottom-up generalization procedure called Cross-Ontology Data Mining-Level by Level (COLL) that takes into account the structure and semantics of the GO, generates generalized transactions from annotation data, and mines interesting multi-level cross-ontology association rules. We applied our method to publicly available chicken and mouse GO annotation datasets and mined 5368 and 3959 multi-level cross-ontology rules from the two datasets, respectively. As evaluated by biologists, our approach discovers more, and higher-quality, association rules from the GO than previously published methods. Biologically interesting rules discovered by our method reveal unknown and surprising knowledge about co-occurring GO terms.
Year: 2012 PMID: 23071802 PMCID: PMC3470562 DOI: 10.1371/journal.pone.0047411
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
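The abstract describes COLL only at a high level. As a rough illustration of the general idea (generalize each gene's annotations to one GO level at a time, starting from the deepest level and moving upward, then mine rules whose antecedent and consequent come from different sub-ontologies), the Python sketch below runs on a made-up toy ontology. Several choices are assumptions rather than the authors' method: a term's level is its shortest-path depth from the root (roots at level 1), a term is generalized to all of its ancestors at the target level, terms shallower than the target level are dropped, and each rule has exactly one term on each side, filtered by hypothetical `min_support` and `min_conf` thresholds. `ONTOLOGY`, `ANNOTATIONS`, `level`, `generalize`, `mine_level`, and `mine_bottom_up` are all hypothetical names.

```python
from collections import Counter
from functools import lru_cache
from itertools import permutations

# Hypothetical toy ontology (NOT real GO data): term -> (sub-ontology, parent terms).
ONTOLOGY = {
    "BP:root": ("BP", ()), "BP:a": ("BP", ("BP:root",)), "BP:a1": ("BP", ("BP:a",)),
    "MF:root": ("MF", ()), "MF:b": ("MF", ("MF:root",)), "MF:b1": ("MF", ("MF:b",)),
    "CC:root": ("CC", ()), "CC:c": ("CC", ("CC:root",)), "CC:c1": ("CC", ("CC:c",)),
}

# Hypothetical annotations: gene -> set of directly annotated terms.
ANNOTATIONS = {
    "gene1": {"BP:a1", "MF:b1"},
    "gene2": {"BP:a1", "MF:b1", "CC:c1"},
    "gene3": {"BP:a", "CC:c1"},
}


@lru_cache(maxsize=None)
def level(term):
    """Assumed level: shortest-path depth from a root, with roots at level 1."""
    parents = ONTOLOGY[term][1]
    return 1 if not parents else 1 + min(level(p) for p in parents)


def generalize(term, target):
    """Map a term to its ancestors at the target level (empty if it sits above it)."""
    lvl = level(term)
    if lvl < target:
        return set()  # assumption: terms shallower than the target level are dropped
    if lvl == target:
        return {term}
    # With shortest-path depth, every parent is at or below the target level here.
    return set().union(*(generalize(p, target) for p in ONTOLOGY[term][1]))


def mine_level(annotations, target, min_support, min_conf):
    """Mine single-term cross-ontology rules X -> Y from generalized transactions."""
    transactions = [
        set().union(*(generalize(t, target) for t in terms))
        for terms in annotations.values()
    ]
    n = len(transactions)
    item_counts, pair_counts = Counter(), Counter()
    for tx in transactions:
        item_counts.update(tx)
        for x, y in permutations(sorted(tx), 2):
            if ONTOLOGY[x][0] != ONTOLOGY[y][0]:  # keep cross-ontology pairs only
                pair_counts[(x, y)] += 1
    rules = []
    for (x, y), count in pair_counts.items():
        support, confidence = count / n, count / item_counts[x]
        if support >= min_support and confidence >= min_conf:
            rules.append((x, y, support, confidence))
    return sorted(rules)


def mine_bottom_up(annotations, min_support, min_conf):
    """Visit levels from the deepest one upward, mining rules at each level."""
    deepest = max(level(t) for t in ONTOLOGY)
    return {lvl: mine_level(annotations, lvl, min_support, min_conf)
            for lvl in range(deepest, 0, -1)}


if __name__ == "__main__":
    for lvl, rules in mine_bottom_up(ANNOTATIONS, 0.5, 0.9).items():
        for x, y, sup, conf in rules:
            print(f"level {lvl}: {x} -> {y} (support {sup:.2f}, confidence {conf:.2f})")
```

Using shortest-path depth guarantees that every parent of a term deeper than the target level is itself at or below that level, which keeps the recursion in `generalize` simple. The real GO is a directed acyclic graph in which a term can sit at different depths along different paths, so a full implementation has to make more careful choices than this toy version does.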
Figure 1. Issues in generalization in the Gene Ontology.
Figure 2. Number of terms at each level of the GO (data version 1.1.2633).
Figure 3. Distribution of terms from different levels of the GO (data version 1.1.2633) across the CC, MF, and BP sub-ontologies.
Figure 4. Distribution of GO annotations in the synthetic datasets generated using the three approaches, compared with the distribution in the target dataset, for each sub-ontology: (a) Cellular Component, (b) Biological Process, (c) Molecular Function.
Average false discovery rate of random cross-ontology rules from 50 synthetic datasets at each level of generalization.
| Level of generalization in the GO | MF → CC, CC → MF | BP → MF, MF → BP | CC → BP, BP → CC |
|---|---|---|---|
| 16 | 0 | 0 | 0 |
| 15 | 0 | 0 | 0 |
| 14 | 0 | 0 | 0 |
| 13 | 0 | 0 | 0 |
| 12 | 0 | 0 | 0 |
| 11 | 0 | 0 | 0 |
| 10 | 0 | 0 | 0 |
| 9 | 0.00020 | 0.00032 | 0.00016 |
| 8 | 0.00150 | 0.00000 | 0.00422 |
| 7 | 0.00372 | 0.00032 | 0.01000 |
| 6 | 0.00438 | 0.00130 | 0.00924 |
| 5 | 0.02076 | 0.02088 | 0.01974 |
| 4 | 0.01724 | 0.03904 | 0.01644 |
| 3 | 0.01378 | 0.02792 | 0.04646 |
Summary of the number of rules mined before and after pruning by COLL and the Burgun approach.
| Dataset | COLL: rules mined | COLL: cross-ontology rules after pruning | Burgun: rules mined | Burgun: cross-ontology rules after pruning |
|---|---|---|---|---|
| Chicken | 178362 | 5368 | 12422 | 2693 |
| Mouse (All annotations) | 83602 | 3959 | 4936 | 1517 |
Number of rules mined by COLL from the chicken and mouse datasets at each level of generalization.
| Level of generalization in the GO | Chicken (all annotations) | Mouse (all annotations) | Mouse (IEA annotations removed) |
|---|---|---|---|
| 14 | 2 | 0 | 0 |
| 13 | 11 | 10 | 6 |
| 12 | 24 | 12 | 17 |
| 11 | 91 | 24 | 33 |
| 10 | 208 | 99 | 110 |
| 9 | 595 | 327 | 317 |
| 8 | 938 | 870 | 953 |
| 7 | 1467 | 1152 | 1562 |
| 6 | 2025 | 1465 | 2131 |
Number of rules mined by COLL in each cross-ontology category.
| Cross-ontology rule category | Chicken (all annotations) | Mouse (all annotations) | Mouse (IEA annotations removed) |
|---|---|---|---|
| CC → BP | 658 | 246 | 872 |
| BP → CC | 1669 | 1532 | 2129 |
| MF → BP | 1510 | 1240 | 1272 |
| BP → MF | 950 | 326 | 472 |
| MF → CC | 421 | 538 | 321 |
| CC → MF | 153 | 77 | 63 |
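The six categories in this table follow from the sub-ontologies of each rule's antecedent and consequent. The Python sketch below shows one way to discard same-ontology rules and tally the remaining rules by category. It models only this cross-ontology filter; any other pruning criteria COLL applies are not described in this record and are not reproduced here. The `NAMESPACE` dictionary is a hand-written stand-in for the namespace recorded for each term in a GO release, limited to terms from the chicken examples table; `category` and `keep_cross_ontology` are hypothetical helpers, and the CC → CC rule in the usage example is invented for illustration.

```python
from collections import Counter

# Stand-in namespace lookup for a few GO terms from the chicken examples table.
# In practice this would be read from a GO release rather than typed by hand.
NAMESPACE = {
    "GO:0005901": "CC",  # caveola
    "GO:0005929": "CC",  # cilium
    "GO:0016459": "CC",  # myosin complex
    "GO:0031325": "BP",  # positive regulation of cellular metabolic process
    "GO:0003774": "MF",  # motor activity
}


def category(antecedent, consequent, namespace):
    """Return a label such as 'CC → BP', or None when both terms share a sub-ontology."""
    a, c = namespace[antecedent], namespace[consequent]
    return None if a == c else f"{a} → {c}"


def keep_cross_ontology(rules, namespace):
    """Drop same-ontology rules and count the surviving rules per category."""
    kept, tally = [], Counter()
    for antecedent, consequent in rules:
        cat = category(antecedent, consequent, namespace)
        if cat is not None:
            kept.append((antecedent, consequent, cat))
            tally[cat] += 1
    return kept, tally


if __name__ == "__main__":
    rules = [
        ("GO:0005901", "GO:0031325"),  # caveola -> positive regulation of ...
        ("GO:0016459", "GO:0003774"),  # myosin complex -> motor activity
        ("GO:0005901", "GO:0005929"),  # hypothetical CC -> CC rule, dropped
    ]
    kept, tally = keep_cross_ontology(rules, NAMESPACE)
    print(kept)
    print(dict(tally))
```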
Number of rules mined by COLL in each confidence range.
| Confidence range | Chicken (all annotations) | Mouse (all annotations) | Mouse (IEA annotations removed) |
|---|---|---|---|
| 100% | 1759 | 593 | 603 |
| 90%–99% | 85 | 539 | 206 |
| 80%–89% | 740 | 590 | 852 |
| 70%–79% | 1196 | 792 | 942 |
| 60%–69% | 1581 | 1445 | 2526 |
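Confidence here is the standard association-rule measure: the fraction of transactions containing the antecedent that also contain the consequent. The Python sketch below bins rules into the ranges used in this table. It assumes a minimum confidence of 60% (the lowest range shown) and assigns a rule to a range by truncating its confidence to a whole percentage, so 99.5% falls in 90%–99%; neither detail is stated in this record. `confidence_range` and `confidence_histogram` are hypothetical names.

```python
from collections import Counter

# Ranges as labelled in the table; the 60% floor is an assumption.
RANGES = [
    (100, 100, "100%"),
    (90, 99, "90%–99%"),
    (80, 89, "80%–89%"),
    (70, 79, "70%–79%"),
    (60, 69, "60%–69%"),
]


def confidence_range(rule_count, antecedent_count):
    """Bin confidence = rule_count / antecedent_count using exact integer math."""
    percent = (100 * rule_count) // antecedent_count  # truncated whole percentage
    for low, high, label in RANGES:
        if low <= percent <= high:
            return label
    return None  # below the assumed 60% minimum confidence


def confidence_histogram(rules):
    """Count rules per range; rules is an iterable of (rule_count, antecedent_count)."""
    labels = (confidence_range(r, a) for r, a in rules)
    return Counter(label for label in labels if label is not None)


if __name__ == "__main__":
    print(confidence_histogram([(5, 5), (199, 200), (3, 5), (4, 5), (1, 2)]))
```

Working from integer counts rather than floating-point ratios avoids rounding surprises at the range boundaries.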
Examples of cross-ontology rules mined from the chicken dataset.
| Antecedent | GO Term Name | Consequent | GO Term Name | Cross-Ontology Category |
|---|---|---|---|---|
| GO:0005901 | caveola | GO:0031325 | positive regulation of cellular metabolic process | CC → BP |
| GO:0005929 | cilium | GO:0042058 | regulation of epidermal growth factor receptor signaling pathway | CC → BP |
| GO:0015491 | cation:cation antiporter activity | GO:0045895 | regulation of protein kinase activity | MF → BP |
| GO:0015491 | cation:cation antiporter activity | GO:0015707 | nitrite transport | MF → BP |
| GO:0043091 | L-arginine import | GO:0051139 | metal ion:hydrogen antiporter activity | BP → MF |
| GO:0002286 | T cell activation involved in immune response | GO:0043231 | intracellular membrane-bounded organelle | BP → CC |
| GO:0015491 | cation:cation antiporter activity | GO:0045859 | regulation of protein kinase activity | MF → BP |
| GO:0016459 | myosin complex | GO:0003774 | motor activity | CC → MF |
Number of rules in each evaluation category from a random set of 25 rules mined by COLL and the Burgun approach.
| Evaluation criterion | Rating | Chicken (all annotations): COLL | Chicken (all annotations): Burgun | Mouse (all annotations): COLL | Mouse (all annotations): Burgun | Mouse (IEA annotations removed): COLL | Mouse (IEA annotations removed): Burgun |
|---|---|---|---|---|---|---|---|
| Surprisingness | Unknown/Surprising | 5 | 0 | 4 | 1 | 0 | 1 |
| Surprisingness | Somewhat Known | 4 | 5 | 2 | 2 | 2 | 3 |
| Surprisingness | Widely Known | 15 | 18 | 19 | 22 | 18 | 17 |
| Meaningfulness | Meaningful | 16 | 22 | 19 | 22 | 19 | 19 |
| Meaningfulness | Maybe Meaningful | 3 | 2 | 6 | 2 | 0 | 3 |
| Meaningfulness | Not Meaningful | 5 | 0 | 0 | 0 | 0 | 0 |
Number of rules in each evaluation category from a set of 50 rules with confidence between 60% and 64%, mined from the mouse dataset (all annotations) by COLL and the Burgun approach.
| Evaluation criterion | Rating | Mouse (all annotations): COLL | Mouse (all annotations): Burgun |
|---|---|---|---|
| Surprisingness | Unknown/Surprising | 4 | 0 |
| Surprisingness | Somewhat Known | 8 | 3 |
| Surprisingness | Widely Known | 35 | 41 |
| Meaningfulness | Meaningful | 39 | 35 |
| Meaningfulness | Maybe Meaningful | 11 | 11 |
| Meaningfulness | Not Meaningful | 0 | 0 |