Anika Oellrich, Georgios V Gkoutos, Robert Hoehndorf, Dietrich Rebholz-Schuhmann.
Abstract
Researchers use animal studies to better understand human diseases. In recent years, large-scale phenotype studies such as Phenoscape and EuroPhenome have been initiated to identify genetic causes of a species' phenome. Species-specific phenotype ontologies are required to capture and report all findings and to automatically infer results relevant to human diseases. Integrating the different phenotype ontologies into a coherent framework is necessary to achieve interoperability for cross-species research.

Here, we investigate the quality and completeness of two different methods for aligning the Human Phenotype Ontology (HP) and the Mammalian Phenotype Ontology (MP). The first method combines lexical matching with inference over the ontologies' taxonomic structures, while the second uses a mapping algorithm based on the formal definitions of the ontologies. Neither method could map all concepts. Although the formal definitions method provides mappings for more concepts than the lexical matching method, it does not outperform lexical matching in a biological use case. Our results suggest that combining both approaches will yield better mappings in terms of completeness, specificity and application purposes.
Year: 2012 PMID: 23046555 PMCID: PMC3448526 DOI: 10.1186/2041-1480-3-S2-S1
Source DB: PubMed Journal: J Biomed Semantics
Content of both generated mappings
| | HP: # | HP: % total | HP: avg # mapped | MP: # | MP: % total | MP: avg # mapped |
|---|---|---|---|---|---|---|
| # concepts | 10104 | 100% | - | 8507 | 100% | - |
| # with formal definition | 4860 | 48.10% | - | 5389 | 63.35% | - |
| # mapped with lexical | 2740 | 27.12% | 7.17 | 1046 | 12.30% | 6.97 |
| # mapped with ontological | 8184 | 80.10% | 5.48 | 4446 | 52.26% | 6.64 |
Shows the number of concepts in each ontology together with the results of both mapping methods. % total: percentage of the total number of concepts in the ontology; avg # mapped: average number of concepts mapped to one particular concept in the ontology.
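The column definitions above can be illustrated with a minimal sketch. It assumes a mapping is stored as a dictionary from a source concept to the set of target concepts, and that "avg # mapped" is averaged over mapped concepts only; both are assumptions, not details given in the source. The function name `mapping_stats` and the toy identifiers are hypothetical.

```python
from typing import Dict, Set, Tuple


def mapping_stats(mapping: Dict[str, Set[str]], n_concepts: int) -> Tuple[int, float, float]:
    """Return (# mapped, % total, avg # mapped) for one direction of a mapping.

    Assumption: the average is taken over concepts that have at least one
    mapped target, not over every concept in the ontology.
    """
    # Keep only concepts that received at least one target concept.
    mapped = {concept: targets for concept, targets in mapping.items() if targets}
    n_mapped = len(mapped)
    pct_total = 100.0 * n_mapped / n_concepts
    avg_mapped = sum(len(t) for t in mapped.values()) / n_mapped if n_mapped else 0.0
    return n_mapped, pct_total, avg_mapped


# Hypothetical toy mapping; the identifiers are made up for illustration.
toy = {"HP:A": {"MP:1", "MP:2"}, "HP:B": {"MP:3"}, "HP:C": set()}
n_mapped, pct_total, avg_mapped = mapping_stats(toy, n_concepts=4)
print(n_mapped, pct_total, avg_mapped)
```

In the toy example, two of four concepts are mapped, to three target concepts in total.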
Figure 1. Overlap groups obtained when comparing both mappings directly. Shows the types of overlap obtained when directly comparing the mappings generated by both methods, regardless of the ontology the mapping is provided for. The number of concepts mapped by the formal definitions method is represented by a yellow circle and the number mapped by lexical matching by a turquoise circle. We identified the following five categories: a) exact (both lexical matching and the formal definitions method generated exactly the same list of mapped concepts), b) formal ⊂ lexical (the mapping generated by the formal definitions method is a subset of the list generated by lexical matching), c) lexical ⊂ formal (the mapping generated by lexical matching is a subset of the list generated by the formal definitions method), d) overlap (the lists share some mapped concepts, but each also contains concepts the other lacks), and e) nothing (both methods generated a list of mapped concepts for a specific concept, but the lists have nothing in common).
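The five categories can be expressed as plain set comparisons. The following is a minimal sketch, assuming each method's mapping for a given concept is available as a Python set of target concept identifiers; the function name `overlap_category` and the identifiers in the usage lines are hypothetical.

```python
def overlap_category(formal: set, lexical: set) -> str:
    """Classify the overlap between two non-empty lists of mapped concepts."""
    # The comparison is only defined when both methods mapped the concept.
    if not formal or not lexical:
        raise ValueError("both methods must provide a mapping for the concept")
    if formal == lexical:
        return "exact"
    if formal < lexical:  # proper subset
        return "formal ⊂ lexical"
    if lexical < formal:  # proper subset
        return "lexical ⊂ formal"
    if formal & lexical:  # shared concepts, but each list has extras
        return "overlap"
    return "nothing"


# Hypothetical identifiers, for illustration only.
print(overlap_category({"MP:1", "MP:2"}, {"MP:1", "MP:2"}))
print(overlap_category({"MP:1"}, {"MP:1", "MP:2"}))
print(overlap_category({"MP:1", "MP:3"}, {"MP:1", "MP:2"}))
print(overlap_category({"MP:3"}, {"MP:2"}))
```

Checking equality before the subset tests keeps the categories mutually exclusive, since the proper-subset operator `<` already excludes equal sets.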
Coverage overlap groups when comparing both mappings
| | HP to MP | MP to HP |
|---|---|---|
| # exact | 155 | 70 |
| # lexical ⊂ formal | 755 | 287 |
| # formal ⊂ lexical | 496 | 114 |
| # overlap | 952 | 215 |
| # nothing | 74 | 0 |
| # concepts | 2432 | 686 |
Shows the number of mappings falling into each of the overlap categories when both methods are compared. The mappings for HP to MP and MP to HP are compared independently because the mappings are not symmetrical.
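As a quick arithmetic check, the category counts in the table can be summed and converted to shares per direction. The sketch below only restates the table's numbers; the variable and function names are hypothetical.

```python
# Counts copied from the table above, per mapping direction.
counts = {
    "HP to MP": {"exact": 155, "lexical ⊂ formal": 755, "formal ⊂ lexical": 496,
                 "overlap": 952, "nothing": 74},
    "MP to HP": {"exact": 70, "lexical ⊂ formal": 287, "formal ⊂ lexical": 114,
                 "overlap": 215, "nothing": 0},
}
# The "# concepts" row of the table.
totals = {"HP to MP": 2432, "MP to HP": 686}


def category_shares(direction: str) -> dict:
    """Return each category's share (in percent) of the compared concepts."""
    groups = counts[direction]
    total = sum(groups.values())
    # The categories should add up to the reported number of concepts.
    if total != totals[direction]:
        raise ValueError(f"category counts do not add up for {direction}")
    return {name: 100.0 * n / total for name, n in groups.items()}


for direction in counts:
    shares = category_shares(direction)
    print(direction, {name: round(share, 1) for name, share in shares.items()})
```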
Figure 2. Receiver Operating Characteristics. Shows the Receiver Operating Characteristic (ROC) curves for both scenarios: the left panel shows the case where alleles are "translated" to HP and the right panel the case where diseases are "translated" to MP. In the first scenario the lexical mappings (AUC: 0.74) seem to perform better than the formal definitions mappings (AUC: 0.72), whereas in the second scenario the formal definitions mappings (AUC: 0.66) seem to yield better results in the biological use case than the lexical mappings (AUC: 0.61).
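For readers unfamiliar with the AUC values quoted above, the following minimal sketch computes the area under a ROC curve through its rank interpretation: the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The source does not state how scores or AUC were computed, so the scores, labels and the function name `roc_auc` are hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical similarity scores; label 1 marks a known association.
scores = [0.9, 0.8, 0.4, 0.35, 0.1]
labels = [1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is the scale on which the reported values of 0.61 to 0.74 should be read.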