Karin Verspoor, Daniel Dvorkin, K Bretonnel Cohen, Lawrence Hunter.
Abstract
MOTIVATION: It is important for the quality of biological ontologies that similar concepts be expressed consistently, or univocally. Univocality is relevant for the usability of the ontology for humans, as well as for computational tools that rely on regularity in the structure of terms. However, in practice, terms are not always expressed consistently, and we must develop methods for identifying terms that are not univocal so that they can be corrected.
Year: 2009 PMID: 19478020 PMCID: PMC2687949 DOI: 10.1093/bioinformatics/btp195
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Number, mean and maximum (max) size of clusters for each xyz transformation combination
| xyz | Count | Mean | Max | xyz | Count | Mean | Max |
|---|---|---|---|---|---|---|---|
| 000 | 23 478 | 1.088 | 29 | 100 | 12 704 | 2.010 | 2999 |
| 001 | 23 395 | 1.092 | 29 | 101 | 12 594 | 2.028 | 3003 |
| 010 | 23 400 | 1.091 | 31 | 110 | 12 564 | 2.033 | 3012 |
| 011 | 23 294 | 1.096 | 31 | 111 | 12 354 | 2.067 | 3054 |
x is abstraction, y is stopword removal and z is token reordering.
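The three transformations can be read as switches on a term-normalization step, with terms that normalize to the same string falling into one cluster. The following is a minimal sketch of that reading, not the authors' implementation: the stopword list, the `abstractions` mapping from embedded phrases to placeholders such as CTERM or GTERM, and the function names are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; the one used in the paper is not given here.
STOPWORDS = {"of", "the", "in", "by", "to", "via"}


def normalize(term, flags, abstractions=None, stopwords=STOPWORDS):
    """Normalize a term according to an 'xyz' flag string such as '101'.

    x: replace known embedded phrases with a placeholder (abstraction)
    y: drop stopwords
    z: sort tokens so that word order no longer matters
    """
    x, y, z = (c == "1" for c in flags)
    text = term.lower()
    if x and abstractions:
        # Replace longer phrases first so they are not split by shorter ones.
        for phrase in sorted(abstractions, key=len, reverse=True):
            placeholder = abstractions[phrase]
            pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
            text = re.sub(pattern, lambda _m: placeholder, text)
    tokens = text.split()
    if y:
        tokens = [t for t in tokens if t not in stopwords]
    if z:
        tokens = sorted(tokens)
    return " ".join(tokens)


def cluster(terms, flags, abstractions=None):
    """Group terms whose normalized forms are identical."""
    groups = defaultdict(list)
    for term in terms:
        groups[normalize(term, flags, abstractions)].append(term)
    return list(groups.values())


# Illustrative terms and mapping (not taken from the paper's data).
terms = [
    "regulation of glucose transport",
    "glucose transport regulation",
    "regulation of lipid transport",
]
abstractions = {"glucose": "CTERM", "lipid": "CTERM"}

for flags in ("000", "011", "111"):
    print(flags, len(cluster(terms, flags, abstractions)))
```

Under this reading, turning on more transformations can only merge clusters, which is consistent with the counts falling and the mean and maximum sizes rising from xyz=000 to xyz=111 in the table.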
Fig. 1. Log distribution of cluster sizes, xyz=000.
Fig. 3. Log distribution of cluster sizes, xyz=111.
Breakdown of the xyz=100 clusters by abstraction type
| Abstraction | Count | Percentage |
|---|---|---|
| CTERM only | 2489 | 20 |
| GTERM only | 3840 | 30 |
| Both CTERM and GTERM | 1415 | 11 |
| No abstraction | 4960 | 39 |
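The four counts sum to 12 704, the same as the cluster count reported for xyz=100 in the first table, so the percentages appear to be shares of that total. A quick check, with variable names chosen here for illustration:

```python
# Counts copied from the abstraction-type table.
counts = {
    "CTERM only": 2489,
    "GTERM only": 3840,
    "Both CTERM and GTERM": 1415,
    "No abstraction": 4960,
}

total = sum(counts.values())

# Share of each abstraction type, rounded to whole percentages.
shares = {name: round(100 * n / total) for name, n in counts.items()}

print(total)
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```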
Results of heuristic search for univocality violations
| | No. of clusters | Proportion (%) |
|---|---|---|
| Total candidates | 237 | |
| Identical | 47 | |
| False positive | 123 | 65 |
| True positive | 67 | 35 |
Breakdown of false positives
| | No. of clusters | False positives proportion (%) |
|---|---|---|
| Semantic import of stopword | 61 | 50 |
| Non-parallel structure | 33 | 27 |
| Semantic import of stemming | 21 | 17 |
| Syntactic variation | 6 | 5 |
| Semantic import of word order | 1 | 1 |
| Misclassified content word | 1 | 1 |
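The denominators behind the two results tables are not stated in this record. The sketch below assumes that the false-positive and true-positive proportions are taken over the candidates remaining after the identical clusters are set aside, and that the breakdown percentages are shares of the 123 false positives. Both assumptions reproduce the tabulated figures; the variable names are illustrative.

```python
# Figures copied from the heuristic-search results table.
total_candidates = 237
identical = 47
false_positive = 123
true_positive = 67

# Assumption: proportions are computed over non-identical candidates.
judged = total_candidates - identical  # 190
fp_pct = round(100 * false_positive / judged)
tp_pct = round(100 * true_positive / judged)

# Figures copied from the false-positive breakdown table.
fp_breakdown = {
    "Semantic import of stopword": 61,
    "Non-parallel structure": 33,
    "Semantic import of stemming": 21,
    "Syntactic variation": 6,
    "Semantic import of word order": 1,
    "Misclassified content word": 1,
}

# Assumption: breakdown percentages are shares of all false positives.
fp_total = sum(fp_breakdown.values())
fp_shares = {k: round(100 * v / fp_total) for k, v in fp_breakdown.items()}

print(judged, fp_pct, tp_pct)
print(fp_total, fp_shares)
```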