Toralf Kirsten, Anika Gross, Michael Hartung, Erhard Rahm.
Abstract
BACKGROUND: Ontologies are increasingly used to structure and semantically describe entities of domains, such as genes and proteins in life sciences. Their increasing size and the high frequency of updates, which result in a large set of ontology versions, necessitate efficient management and analysis of these data.
Year: 2011 PMID: 21914205 PMCID: PMC3198872 DOI: 10.1186/2041-1480-2-6
Source DB: PubMed Journal: J Biomed Semantics
Comparison of existing platforms and systems that provide and apply life science ontologies
| OLS | OBO | BioPortal | GOMMA | SAMBO | |
|---|---|---|---|---|---|
| Service to query, browse and navigate biomedical ontologies | Collaborative platform having shared principles to govern and coordinate ontology development | System to access and share ontologies that are actively used in biomedical communities | Infrastructure to manage, analyze and match ontologies taking their evolution into account | System for aligning and merging biomedical ontologies | |
| OBO | OBO, OWL | OBO, OWL, RDF, RRF, ... | OBO, OWL, RDF, ... (extensible via flexible importers) | OWL | |
| - | x | x | x | x | |
| - (only latest versions are accessible) | x (downloadable versions via CVS repository) | x (access and download of ontology versions) | x (efficient versioning of ontologies in a repository) | - (no explicit ontology versioning possible) | |
| - | x (information about changes via newsletters) | - | - (version comparison to detect changes) | - | |
| - | - | - | x | - | |
| - | - | - | x (metadata, external knowledge, instances) | x (metadata, external knowledge, documents, learner) | |
| ontologies | ontologies | ontologies and mappings | ontologies, ontology evolution | - | |
| auto completion, term hierarchies via graphs | discussion lists and wiki to support collaborative development | automatic text annotation | enhanced Diff, annotation migration | merging, user interaction | |
| x | x | x | - | - | |
| x (web application) | - (infrastructure to share the ontologies) | x (web portal) | x (web application OnEX) | x (desktop GUI) | |
| x (web service: Ontology QueryService) | - | x (web service to access and query available ontologies) | x (query ontology and mapping versions, statistics, diff, match via API) | unknown | |
The table provides a comparative overview of platforms that provide and apply life science ontologies. The systems are compared by different characteristics such as versioning support or search/navigation facilities.
Figure 1. Versions of ontologies and entity sources and mappings among them. The figure shows the versioning of ontologies and entity sources and their interrelation using ontology, annotation, and evolution mappings.
Figure 2. Overview of GOMMA's component-based infrastructure.
Figure 3. The match process in GOMMA. GOMMA uses the sketched process to create ontology mappings. The process iteratively generates mappings between selected input ontologies and incorporates feedback from human experts.
Selected matchers of GOMMA
| Category | Name | Description |
|---|---|---|
| | Linguistic Matcher | This matcher computes the linguistic similarity between two ontology concepts. The matcher is configured by two sets of attributes specifying which attribute values are used to align the concepts of O and O'. The linguistic similarity functions include nGram, Loom, and others. |
| | Child | The child matcher computes the similarity between two ontology concepts based on the similarity of their children. |
| | Path Matcher | The path matcher computes the similarity between two ontology concepts taking the paths from the concepts to their root element into account. Each path is represented by concatenating concept names. Finally, the matcher computes the linguistic similarity between the paths. |
| | Similarity Flooding | This structural matcher computes the similarity between two concepts based on the Similarity Flooding algorithm. |
| | Annotation-based | The annotation-based matcher computes the similarity between two ontology concepts by taking the associated entities into account. The matcher utilizes an annotation mapping to determine the degree of shared entities of two concepts to compare. The similarity functions include Dice, Jaccard, and Cosine. |
The table lists selected metadata-based and annotation-based matchers of GOMMA. The matchers take two ontologies O and O' as input and produce a set of correspondences interrelating the input ontologies. All matchers are configured by the similarity function that is used to compute the concept similarity of correspondences.
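The similarity functions named in the table can be illustrated with a minimal Python sketch. It is not GOMMA's implementation: the function names, the `#` padding of n-grams, the choice of Dice for combining n-gram sets, and the gene identifiers are all assumptions made for illustration. It shows a character n-gram similarity for concept names and set-based Dice, Jaccard, and Cosine similarities over the entities annotated to two concepts.

```python
from math import sqrt


def ngrams(text, n=3):
    """Return the set of character n-grams of a lower-cased, '#'-padded string."""
    padded = "#" * (n - 1) + text.lower() + "#" * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def dice(a, b):
    """Dice coefficient of two sets: 2|A n B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Jaccard coefficient of two sets: |A n B| / |A u B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(a, b):
    """Set-based cosine similarity: |A n B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def ngram_similarity(name1, name2, n=3):
    """Linguistic similarity of two concept names (Dice over n-gram sets)."""
    return dice(ngrams(name1, n), ngrams(name2, n))


def annotation_similarity(entities1, entities2, sim=dice):
    """Similarity of two concepts based on their shared annotated entities."""
    return sim(set(entities1), set(entities2))


# Hypothetical entities annotated to a concept of O and a concept of O'.
genes_o = {"g1", "g2", "g3"}
genes_o_prime = {"g2", "g3", "g4"}
print(annotation_similarity(genes_o, genes_o_prime, dice))     # 2*2 / (3+3)
print(annotation_similarity(genes_o, genes_o_prime, jaccard))  # 2 / 4
print(ngram_similarity("protein binding", "protein-binding"))
```

A correspondence between two concepts would typically be kept only if its similarity exceeds a chosen threshold; the threshold value is a configuration decision not specified here.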
Figure 4. Application scenario: Term enrichment analysis. The figure shows analysis results for a term enrichment analysis of a gene set using a hypergeometric test from the FUNC package [11]. The experiment was executed for two Gene Ontology Molecular Function (GO-MF) versions: 2009-09 (a) and 2011-03 (b). The gene set and the annotation set were not modified. Colored categories denote significantly enriched categories with respect to the gene set and ontology version used. The table (c) shows more detailed information for each significant category, e.g., the number of indirect (propagated) gene annotations (|A|).
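For readers unfamiliar with the hypergeometric test used in term enrichment analysis, the following is a minimal Python sketch of the underlying one-sided p-value. It is a generic textbook formulation, not the FUNC package's code; the function name, parameter names, and the example numbers are assumptions chosen for illustration.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, sample, hits):
    """Return P(X >= hits) for a hypergeometric random variable X.

    population: number of genes in the background set
    annotated:  background genes annotated to the category (incl. propagated)
    sample:     number of genes in the analyzed gene set
    hits:       genes of the analyzed set that are annotated to the category
    """
    lower = max(hits, 0)
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass of observing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(lower, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 10 background genes, 4 annotated to a category,
# 3 genes in the analyzed set, 2 of them annotated to the category.
print(hypergeom_enrichment_pvalue(10, 4, 3, 2))  # (6*6 + 4*1) / 120
```

Because propagated annotation counts depend on the ontology structure, the same gene and annotation set can yield different p-values for different ontology versions, which is the effect the figure illustrates.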
Figure 5. Evolution statistics in OnEX. The figure shows selected use cases of the web-based system Ontology Evolution Explorer (http://www.izbi.de/onex). The overview shows statistics for all ontologies currently integrated in OnEX. Tracking changes, the list of changed concepts and quantitative difference statistics are shown for the Mammalian Phenotype Ontology.
Evolution statistics for selected biomedical ontologies
| Name | O0 | On | \|C0\| | \|Cn\| | growthC | \|R0\| | \|Rn\| | growthR | growthC+R |
|---|---|---|---|---|---|---|---|---|---|
| Protein Protein Interaction Ontology | 2005-08 | 2009-06 | 194 | 960 | 4.95 | 211 | 1,006 | 4.77 | 4.85 |
| Biological Processes (GO) | 2002-12 | 2010-03 | 6,741 | 19,099 | 2.83 | 0 | 39,391 | | 8.68 |
| NCI Thesaurus | 2003-10 | 2009-12 | 28,740 | 77,448 | 2.69 | 33,847 | 86,803 | 2.56 | 2.62 |
| Cellular Components (GO) | 2002-12 | 2010-03 | 1,124 | 2,810 | 2.50 | 0 | 5,185 | | 7.11 |
| Chemical Entities of Biological Interest | 2004-10 | 2009-08 | 10,236 | 24,225 | 2.37 | 11,592 | 43,085 | 3.72 | 3.08 |
| Mammalian Phenotype Ontology | 2005-08 | 2010-03 | 4,175 | 7,571 | 1.81 | 4,620 | 8,560 | 1.85 | 1.83 |
| Sequence Ontology | 2005-08 | 2010-03 | 981 | 1,764 | 1.80 | 1,181 | 2,014 | 1.71 | 1.75 |
| Molecular Functions (GO) | 2002-12 | 2010-03 | 5,298 | 9,487 | 1.79 | 0 | 10,972 | | 3.86 |
| Pathway Ontology | 2005-11 | 2010-03 | 427 | 751 | 1.76 | 478 | 923 | 1.93 | 1.85 |
| Zebrafish Anatomy | 2005-11 | 2009-12 | 1,389 | 2,431 | 1.75 | 4,272 | 8,819 | 2.06 | 1.99 |
| Cell Type Ontology | 2004-06 | 2010-01 | 687 | 1,049 | 1.53 | 1,251 | 1,799 | 1.44 | 1.47 |
| Plant Structure Ontology | 2005-07 | 2009-08 | 681 | 868 | 1.27 | 980 | 1,274 | 1.30 | 1.29 |
| Protein Modification Ontology | 2006-06 | 2009-02 | 1,074 | 1,338 | 1.25 | 1,568 | 1,982 | 1.26 | 1.26 |
| Adult Mouse Anatomy | 2005-08 | 2010-03 | 2,416 | 2,947 | 1.22 | 2,939 | 3,722 | 1.27 | 1.25 |
| Fly Anatomy | 2004-12 | 2010-02 | 6,090 | 6,707 | 1.10 | 9,826 | 12,319 | 1.25 | 1.20 |
| Flybase Controlled Vocabulary | 2004-12 | 2010-02 | 658 | 713 | 1.08 | 653 | 698 | 1.07 | 1.08 |
The table shows evolution statistics for the analyzed ontologies, including the number of concepts and relationships in the first (O0) and last (On) considered version and the resulting growth rates. (|C|: number of concepts, |R|: number of relationships)
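The growth columns can be reproduced with a few lines of Python. The formulas are not stated in this excerpt; the ratio definitions below (last version divided by first version) are an assumption that matches the table's values when rounded to two decimals. The function name is hypothetical.

```python
def growth_rates(c0, cn, r0, rn):
    """Return (growthC, growthR, growthC+R) as last/first ratios.

    A ratio is None when the first version contains no elements of that
    kind, e.g. |R0| = 0 for the GO sub-ontologies in the table.
    """
    growth_c = cn / c0 if c0 else None
    growth_r = rn / r0 if r0 else None
    total0 = c0 + r0
    growth_cr = (cn + rn) / total0 if total0 else None
    return growth_c, growth_r, growth_cr


def rounded(values, digits=2):
    """Round each ratio, passing undefined (None) values through."""
    return tuple(None if v is None else round(v, digits) for v in values)


# Protein Protein Interaction Ontology, 2005-08 to 2009-06
print(rounded(growth_rates(194, 960, 211, 1006)))    # (4.95, 4.77, 4.85)
# Biological Processes (GO), 2002-12 to 2010-03; growthR is undefined
print(rounded(growth_rates(6741, 19099, 0, 39391)))  # (2.83, None, 8.68)
```

Under this reading, the single growth value reported after |Rn| for the three GO sub-ontologies corresponds to growthC+R, since growthR cannot be computed when |R0| is 0.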
Figure 6. Change history of GO:0003700 in OnEX. Detailed change history for concept GO:0003700 ("sequence-specific DNA binding transcription factor activity") using the concept-based analysis module of OnEX.
Figure 7. Long-term region analysis for top-level concepts of NCI Thesaurus. Tracking of average costs for sample regions in NCI Thesaurus between 2004 and 2009.
Figure 8. Comparative region analysis for top-level concepts of ChEBI. The figure shows the results of a region analysis for ChEBI top-level concepts. Red (green) categories evolved heavily (marginally) in the observation period and are thus unstable (stable). We analysed monthly released versions in 2009 (top) and 2010 (bottom).
Figure 9. Stable and unstable ontology regions in GO Molecular Functions using the Region Analyzer. The figure shows the region stability of GO Molecular Functions concepts between 2009-09 and 2011-03 (monthly versions). Red (green) categories evolved heavily (marginally) in the observation period and are thus unstable (stable). (a) Region stability of slim terms on the first level of GO Molecular Functions. (b) Region stability of the detected significant result concepts and their parents (from our application scenario in Figure 4).
Figure 10. Complex change operations in Mammalian Phenotype Ontology (left) and ChEBI (right). The diff for both ontologies was computed between the versions 2009-12 and 2010-12.
Figure 11. Complex change operations in GO Molecular Functions. The figure shows three complex change operations that occurred in the region of the significant categories from our application scenario. The diff was computed between GO MF versions 2009-09 and 2011-03.