Rebekka Alm, Dagmar Waltemath, Markus Wolfien, Olaf Wolkenhauer, Ron Henkel.
Abstract
BACKGROUND: Model repositories such as BioModels Database provide computational models of biological systems for the scientific community. These models contain rich semantic annotations that link model entities to concepts in well-established bio-ontologies such as the Gene Ontology. Consequently, thematically similar models are likely to share similar annotations. Based on this assumption, we argue that semantic annotations are a suitable tool to characterize sets of models. These characteristics improve model classification, help to identify additional features for model retrieval tasks, and enable the comparison of sets of models.
Keywords: Bio-ontologies; Feature extraction; Model similarity; SBML
Year: 2015 PMID: 25904997 PMCID: PMC4405863 DOI: 10.1186/s13326-015-0014-4
Source DB: PubMed Journal: J Biomed Semantics
Figure 1. Concept vs. annotation distribution in SBO. Overview of the concept distribution in the seven branches of the Systems Biology Ontology (SBO). The size of the colored circles visualizes the number of concepts summarized by each branch. The bottom mirrored image visualizes the distribution of annotations from all models in the BioModels Database test set (BMDB). Figure adapted from [3].
Extracted features for different sets (CC, RS1, RS2 and BMDB), methods and feature size
| Ontology | M2: CC | M2: RS1 | M2: RS2 | M2: BMDB | M4: CC | M4: RS1 | M4: RS2 | M4: BMDB |
|---|---|---|---|---|---|---|---|---|
| ChEBI | 33285 | 24870 | 24870 | 24870 | 22563 | 22563 | 26816 | 24870 |
| | 33302 | 33302 | 33302 | 33302 | 33608 | 26082 | 33695 | 26082 |
| | 33304 | 33304 | 33304 | 33304 | 33694 | 33241 | 47019 | 33241 |
| | 35701 | 33582 | 33582 | 33582 | 37096 | 33695 | 61120 | 33695 |
| | 36357 | 36357 | 36357 | 36357 | 37787 | 61120 | 63367 | 61120 |
| avg | 5.4 | 4.2 | 4.4 | 4.2 | 7.2 | 5.6 | 8.2 | 5.4 |
| GO | 8152 | 3674 | 3674 | 3674 | 22411 | 3674 | 3674 | 3674 |
| | 9987 | 8152 | 5575 | 8152 | 30163 | 5575 | 9987 | 5575 |
| | 44699 | 9987 | 8152 | 9987 | 51726 | 6810 | 22607 | 9987 |
| | 65007 | 44699 | 9987 | 44699 | 65009 | 9987 | 43170 | 43170 |
| | 71840 | 51234 | 44699 | 65007 | 71822 | 43170 | 71822 | 71822 |
| avg | 2 | 1.8 | 1.8 | 1.8 | 4.4 | 2.2 | 3.6 | 2.6 |
| SBO | 003 | 064 | 231 | 003 | 009 | 009 | 009 | 003 |
| | 236 | 231 | 245 | 064 | 231 | 064 | 167 | 009 |
| | 374 | 240 | 247 | 231 | 252 | 176 | 240 | 064 |
| | 375 | 241 | 291 | 236 | 336 | 252 | | 167 |
| | 545 | 545 | 545 | 545 | | | | 240 |
| avg | 2.4 | 2.4 | 2.4 | 2 | 4 | 4.3 | 3.5 | 3 |
|
|
|
| ||||||
|
|
|
|
|
|
|
|
| |
| 16646 | 18059 | 18059 | 18059 | 22563 | 22563 | 24875 | 24835 | |
| 24651 | 24835 | 24835 | 24835 | 33608 | 24835 | 25107 | 24870 | |
| 25367 | 24870 | 24870 | 24870 | 33694 | 25741 | 26816 | 26082 | |
| 25699 | 25367 | 25367 | 25367 | 37096 | 26082 | 33252 | 33241 | |
| 25741 | 25806 | 26082 | 26082 | 37787 | 33241 | 33620 | 33259 | |
| 26082 | 26082 | 33259 | 33241 | 33252 | 33636 | 33636 | ||
| 33241 | 26835 | 33304 | 33259 | 33259 | 33695 | 33695 | ||
| ChEBI | 33839 | 33241 | 33581 | 33285 | 33608 | 35155 | 35155 | |
| 35701 | 33259 | 33674 | 33304 | 33695 | 35569 | 35569 | ||
| 36358 | 33285 | 33839 | 33674 | 35701 | 47019 | 35701 | ||
| 36606 | 33674 | 35701 | 33839 | 61120 | 61120 | 47019 | ||
| 51143 | 33694 | 37577 | 35701 | 63367 | 63161 | 61120 | ||
| 63161 | 35701 | 50906 | 50906 | 64709 | 63367 | 63161 | ||
| 63299 | 51143 | 51143 | 51143 | 63367 | ||||
| 64709 | 64709 | 64709 | 64709 | 64709 | ||||
|
| 5.9 | 5.3 | 4.8 | 4.8 | 7.2 | 5.4 | 7.0 | 6.3 |
| 3674 | 3674 | 3674 | 3674 | 216 | 3674 | 3674 | 3674 | |
| 5575 | 5575 | 5575 | 5575 | 4693 | 5575 | 5834 | 5575 | |
| 6807 | 6807 | 6807 | 8152 | 5575 | 6810 | 6826 | 9987 | |
| 9056 | 9056 | 9056 | 9987 | 22411 | 9987 | 8943 | 43170 | |
| 9058 | 9058 | 9058 | 32501 | 30163 | 16088 | 9987 | 71822 | |
| 40007 | 44237 | 32501 | 32502 | 32268 | 43170 | 22607 | ||
| 44237 | 44238 | 44237 | 40007 | 45750 | 45750 | 43170 | ||
| GO | 44238 | 44699 | 44238 | 44699 | 51726 | 71822 | ||
| 44699 | 44710 | 44699 | 48511 | 65009 | ||||
| 50896 | 48511 | 44710 | 50896 | 71822 | ||||
| 51234 | 50896 | 50896 | 51234 | |||||
| 65007 | 51234 | 51234 | 51704 | |||||
| 71704 | 65007 | 65007 | 65007 | |||||
| 71840 | 71704 | 71704 | 71840 | |||||
| 71840 | 71840 | |||||||
|
| 2.3 | 2.3 | 1.9 | 1.8 | 4.1 | 2.1 | 3.0 | 2.6 |
| 009 | 064 | 016 | 003 | 009 | 009 | 009 | 003 | |
| 177 | 177 | 017 | 064 | 231 | 064 | 167 | 009 | |
| 179 | 179 | 046 | 241 | 252 | 176 | 240 | 064 | |
| 180 | 180 | 153 | 245 | 336 | 252 | 167 | ||
| 181 | 182 | 156 | 247 | 240 | ||||
| 182 | 185 | 231 | 253 | |||||
| 205 | 205 | 241 | 285 | |||||
| SBO | 245 | 241 | 245 | 290 | ||||
| 253 | 247 | 247 | 291 | |||||
| 290 | 250 | 253 | 374 | |||||
| 291 | 253 | 290 | 375 | |||||
| 308 | 285 | 291 | 405 | |||||
| 342 | 290 | 308 | 409 | |||||
| 360 | 377 | 360 | 412 | |||||
| 374 | 545 | 380 | 545 | |||||
|
| 4.6 | 4.5 | 3.7 | 3.3 | 4 | 4.3 | 3.5 | 3 |
The upper table shows a maximum of five features, the bottom table a maximum of 15 features. IDs are shortened (e.g. SBO:0000064 is represented by 064) and listed in ascending order. The average depth (avg) of features per ontology is emphasized for the test sets.
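The shortening convention described in the table note can be reversed mechanically. The following is a minimal sketch, not part of the original work: the helper name `expand_id` is hypothetical, and the padding widths (seven digits for SBO and GO, no padding for ChEBI) are assumptions based on the usual identifier formats of these ontologies.

```python
# Assumed identifier formats (not stated in the table itself):
#   SBO:0000064  -> seven zero-padded digits
#   GO:0003674   -> seven zero-padded digits
#   CHEBI:33285  -> plain number, no padding
PAD_WIDTH = {"SBO": 7, "GO": 7, "ChEBI": 0}
PREFIX = {"SBO": "SBO", "GO": "GO", "ChEBI": "CHEBI"}


def expand_id(ontology: str, short_id: str) -> str:
    """Expand a shortened table ID (e.g. '064') to a full ontology ID."""
    if ontology not in PAD_WIDTH:
        raise ValueError(f"unknown ontology: {ontology!r}")
    # zfill leaves the string unchanged if it is already long enough
    digits = short_id.strip().zfill(PAD_WIDTH[ontology])
    return f"{PREFIX[ontology]}:{digits}"


def expand_column(ontology: str, short_ids: list[str]) -> list[str]:
    """Expand every ID of one table column, keeping the ascending order."""
    return [expand_id(ontology, s) for s in short_ids]
```

For example, `expand_id("SBO", "064")` yields `SBO:0000064`, matching the example given in the note.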
Figure 2. Concept depth of annotations. Overview of the distribution of annotated model entities in relation to the depth of the annotation. The x-axis shows the depth of the annotated concepts in the corresponding ontology, the y-axis shows the number of annotated entities on a logarithmic scale (exact values are stated at the bottom of the figure). The figure legend states the ontology name, the model set and the average depth.
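To make the notion of annotation depth concrete, the sketch below computes concept depths, their distribution, and an average depth on a toy hierarchy. Everything in it is illustrative: the concept names and the `PARENTS` mapping are hypothetical, and depth is taken here as the length of the shortest path to the root, which is an assumption since the exact definition is not given in this excerpt.

```python
from collections import Counter
from functools import lru_cache
from statistics import mean

# Hypothetical toy ontology: each concept maps to its direct parents.
PARENTS = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["c", "b"],  # a concept may have several parents (DAG)
}


@lru_cache(maxsize=None)
def depth(concept: str) -> int:
    """Shortest distance from the concept to the root (root has depth 0)."""
    parents = PARENTS[concept]
    if not parents:
        return 0
    return 1 + min(depth(p) for p in parents)


def depth_distribution(annotations):
    """Count how many annotated entities fall on each depth level."""
    return Counter(depth(concept) for concept in annotations)


def average_depth(features):
    """Average depth of a set of characteristic features."""
    return mean(depth(f) for f in features)
```

With real ontologies the `PARENTS` mapping would be derived from the is-a relations of SBO, GO or ChEBI; a histogram of `depth_distribution` over all annotations of a model set corresponds to the kind of plot the figure describes.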
Figure 3. Feature overlaps. Visualization of feature overlaps of the four test sets. Each diagram shows the overlap of the results of one ontology (SBO, GO or ChEBI), method (M2 or M4) and number of features (F5 or F15).
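The overlaps visualized in such diagrams are plain set intersections of the extracted feature IDs. The sketch below illustrates this with three ChEBI feature columns taken from the five-feature table above; the neutral names `set_a`, `set_b` and `set_c` are placeholders, and the Jaccard index is included only as a simple overlap score, not as the similarity measure used in the study.

```python
from itertools import combinations

# Three ChEBI feature columns from the five-feature table (shortened IDs).
feature_sets = {
    "set_a": {33285, 33302, 33304, 35701, 36357},
    "set_b": {24870, 33302, 33304, 33582, 36357},
    "set_c": {22563, 33608, 33694, 37096, 37787},
}


def pairwise_overlap(sets):
    """Return the shared features for every pair of feature sets."""
    return {
        (a, b): sets[a] & sets[b]
        for a, b in combinations(sorted(sets), 2)
    }


def jaccard(x, y):
    """Size of the intersection divided by size of the union."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


overlaps = pairwise_overlap(feature_sets)
```

Here `set_a` and `set_b` share three of their five features, while `set_c` shares none with either of them.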
Similarity between thematic and arbitrary model sets, calculated based on the similarity of their characteristic features
| Sets | Ontology | F5, M2 | F5, M4 | F15, M2 | F15, M4 |
|---|---|---|---|---|---|
| BMDB & CC | ChEBI | 0.82 | 0.57 | 0.75 | 0.20 |
| | GO | 0.80 | 0.40 | 0.71 | 0.30 |
| | SBO | 0.75 | 0.44 | 0.50 | 0.43 |
| BMDB & RS1 | ChEBI | 1.00 | 0.94 | 0.91 | 0.71 |
| | GO | 0.87 | 0.84 | 0.67 | 0.59 |
| | SBO | 0.75 | 0.65 | 0.63 | 0.65 |
| CC & RS1 | ChEBI | 0.82 | 0.63 | 0.77 | 0.29 |
| | GO | 0.67 | 0.25 | 0.90 | 0.36 |
| | SBO | 0.50 | 0.63 | 0.70 | 0.63 |
Number of curated models contained in each thematic data set
| Set | Models | SBO | GO | ChEBI |
|---|---|---|---|---|
| BMDB | 490 | 13012 | 10882 | 5729 |
| CC | 34 | 227 | 954 | 37 |
| CA | 13 | 6 | 62 | 9 |
| APOP | 13 | 31 | 43 | 3 |
| NFKB | 12 | 28 | 35 | 0 |
Additionally, the number of distinct annotations contained in each set is shown for SBO, GO and ChEBI.
Extracted features for thematic test sets, methods and feature size
| Ontology | M2: CC | M2: APOP | M2: CA | M2: NFKB | M4: CC | M4: APOP | M4: CA | M4: NFKB |
|---|---|---|---|---|---|---|---|---|
| GO | 8152 | 3674 | 3674 | 3674 | 22411 | 5515 | 5217 | 5515 |
| | 9987 | 5575 | 9987 | 8152 | 30163 | 30693 | 5829 | 6886 |
| | 44699 | 8152 | 44699 | 9987 | 51726 | 44257 | 6816 | 22607 |
| | 65007 | 9987 | 51234 | 44699 | 65009 | 65003 | 15085 | 44257 |
| | 71840 | 71840 | 65007 | 71840 | 71822 | 71822 | 51480 | 71822 |
| avg | 2.0 | 1.6 | 1.8 | 1.8 | 4.4 | 4.2 | 8.2 | 4.6 |
|
|
|
| ||||||
|
|
|
|
|
|
|
|
| |
| 3674 | 3824 | 3824 | 3674 | 216 | 2090 | 5217 | 5515 | |
| 5575 | 5488 | 4872 | 5575 | 4693 | 5575 | 5783 | 5634 | |
| 6807 | 5575 | 5215 | 6807 | 5575 | 16265 | 5829 | 6886 | |
| 9056 | 9056 | 5488 | 9056 | 22411 | 30693 | 6816 | 16563 | |
| 9058 | 9987 | 5575 | 9058 | 30163 | 31264 | 15085 | 22607 | |
| 40007 | 30234 | 7204 | 44237 | 32268 | 43027 | 17111 | 44257 | |
| 44237 | 32501 | 22411 | 44238 | 45750 | 44257 | 38023 | 71822 | |
| GO | 44238 | 44238 | 32469 | 44699 | 51726 | 65003 | 51480 | |
| 44699 | 44699 | 44237 | 44710 | 65009 | 71822 | |||
| 50896 | 50896 | 50789 | 50896 | 71822 | ||||
| 51234 | 51234 | 51234 | 51234 | |||||
| 65007 | 65007 | 51481 | 65007 | |||||
| 71704 | 71704 | 51716 | 71704 | |||||
| 71840 | 71840 | 60089 | 71840 | |||||
| 65009 | ||||||||
|
| 2.2 | 2.1 | 4.0 | 2.4 | 4.1 | 4.2 | 7.0 | 4.3 |
The upper table shows a maximum of five features, the bottom table a maximum of 15 features. IDs are shortened (e.g. GO:0003674 is represented by 3674) and listed in ascending order (Additional file 3).
Similarity between two model sets, calculated based on the similarity of their characteristic GO features
| 5 features | BMDB | RS1 | RS2 | CC | APOP | CA | NFKB |
|---|---|---|---|---|---|---|---|
| BMDB | - | 0.8395 | 0.4720 | 0.3989 | 0.3522 | 0.0747 | 0.3629 |
| RS1 | 0.8720 | - | 0.3203 | 0.2472 | 0.1917 | 0.1072 | 0.2746 |
| RS2 | 0.8720 | 0.8720 | - | 0.5752 | 0.4078 | 0.1332 | 0.4632 |
| CC | 0.8000 | 0.6720 | 0.8000 | - | 0.4669 | 0.1116 | 0.5222 |
| APOP | 0.6720 | 0.6720 | 0.8000 | 0.6000 | - | 0.0912 | 0.7550 |
| CA | 0.8720 | 0.8720 | 0.7440 | 0.6720 | 0.5440 | - | 0.1758 |
| NFKB | 0.8720 | 0.8720 | 1.0000 | 0.8000 | 0.8000 | 0.7440 | - |

| 15 features | BMDB | RS1 | RS2 | CC | APOP | CA | NFKB |
|---|---|---|---|---|---|---|---|
| BMDB | - | 0.5997 | 0.4800 | 0.2995 | 0.2016 | 0.0467 | 0.2592 |
| RS1 | 0.6706 | - | 0.4230 | 0.3596 | 0.1573 | 0.0536 | 0.2476 |
| RS2 | 0.9543 | 0.6706 | - | 0.3236 | 0.3202 | 0.0833 | 0.4105 |
| CC | 0.7185 | 0.8907 | 0.6727 | - | 0.3711 | 0.0811 | 0.3080 |
| APOP | 0.6449 | 0.6533 | 0.6449 | 0.7000 | - | 0.1543 | 0.5082 |
| CA | 0.3095 | 0.3364 | 0.3095 | 0.3315 | 0.4679 | - | 0.2022 |
| NFKB | 0.6681 | 0.9333 | 0.6681 | 0.9496 | 0.6953 | 0.3291 | - |
Values for M4 are shown above the main diagonal, values for M2 below it.
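Because each table packs two symmetric similarity matrices into one (M4 above the diagonal, M2 below), it can be convenient to separate them before further analysis. The following sketch shows one way to do this for the BMDB, RS1 and RS2 entries of the first GO table; the function name `split_triangles` is hypothetical, and leaving the diagonal as `None` is a choice made here because the table reports no self-similarity values.

```python
labels = ["BMDB", "RS1", "RS2"]

# Combined matrix: upper triangle holds M4 values, lower triangle M2 values.
combined = [
    [None,   0.8395, 0.4720],
    [0.8720, None,   0.3203],
    [0.8720, 0.8720, None],
]


def split_triangles(matrix):
    """Split a combined matrix into two symmetric ones (upper, lower)."""
    n = len(matrix)
    upper = [[None] * n for _ in range(n)]
    lower = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i < j:
                # value above the diagonal: mirror it into both halves
                upper[i][j] = upper[j][i] = matrix[i][j]
            elif i > j:
                # value below the diagonal: mirror it into both halves
                lower[i][j] = lower[j][i] = matrix[i][j]
    return upper, lower


m4, m2 = split_triangles(combined)


def lookup(matrix, a, b):
    """Similarity of two sets by label, independent of argument order."""
    return matrix[labels.index(a)][labels.index(b)]
```

After the split, `lookup(m4, "RS1", "BMDB")` and `lookup(m4, "BMDB", "RS1")` return the same M4 value, and likewise for `m2`.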