Clement Jonquet, Mark A Musen, Nigam H Shah.
Abstract
BACKGROUND: Researchers in biomedical informatics use ontologies and terminologies to annotate their data in order to facilitate data integration and translational discoveries. As the use of ontologies for annotation of biomedical datasets has risen, a common challenge is to identify the ontologies best suited to annotating specific datasets. The number and variety of biomedical ontologies are large, and it is cumbersome for a researcher to figure out which ontology to use.
Year: 2010 PMID: 20626921 PMCID: PMC2903720 DOI: 10.1186/2041-1480-1-S1-S1
Source DB: PubMed Journal: J Biomed Semantics
Figure 1. Recommender service workflow.
Figure 2. NCBO Annotator web service workflow. Direct annotations are created from raw text by syntactic concept recognition (matching concept names and synonyms). Next, different components expand this first set of annotations using the knowledge represented in one or more ontologies.
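The two-step workflow in the Figure 2 caption can be sketched in a few lines. This is only an illustration under stated assumptions, not the NCBO Annotator implementation: the lexicon, the parent map, the function names, and the whole-word matching rule are all hypothetical.

```python
import re
from collections import deque

# Hypothetical toy lexicon: term (preferred name or synonym) -> concept.
LEXICON = {"melanoma": "Melanoma", "malignant melanoma": "Melanoma"}

# Hypothetical toy hierarchy: concept -> list of direct parents.
PARENTS = {
    "Melanoma": ["Skin neoplasm"],
    "Skin neoplasm": ["Neoplasm"],
    "Neoplasm": ["Disease"],
}


def direct_annotations(text, lexicon):
    """Step 1 (assumed): whole-word matching of names and synonyms."""
    lowered = text.lower()
    found = []
    for term, concept in lexicon.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            if concept not in found:
                found.append(concept)
    return found


def expand_annotations(direct, parents):
    """Step 2 (assumed): add one annotation per ancestor, tagged with its level n."""
    annotations = [(concept, "direct", 0) for concept in direct]
    for source in direct:
        seen = {source}
        queue = deque([(source, 0)])
        while queue:
            concept, level = queue.popleft()
            for parent in parents.get(concept, ()):
                if parent not in seen:
                    seen.add(parent)
                    annotations.append((parent, "parent", level + 1))
                    queue.append((parent, level + 1))
    return annotations
```

Calling `expand_annotations(direct_annotations("A case of malignant melanoma", LEXICON), PARENTS)` yields one direct annotation plus one expanded annotation per ancestor, each carrying the parent level that a weighting scheme could later use.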
Annotation weights per context
| Annotation context | Weights |
|---|---|
| Direct annotation done with a concept preferred name | 10 |
| Direct annotation done with a concept synonym | 8 |
| Expanded annotation done with a mapping | 7 |
| Expanded annotation done with a parent at level n (e.g., 9 for n=1; 7 for n=2; 4 for n=5; 3 for n=8; 1 for n>12) | $1 + 10\,e^{-0.2n}$ |
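
The weights in the table above can be checked with a short sketch. It reads the parent-level formula as 1 + 10·e^(−0.2·n) and assumes the result is rounded down, since that is what reproduces the listed examples (9, 7, 4, 3, 1). The rounding rule, the context keys, and the function names are assumptions for illustration.

```python
import math

# Fixed weights from the table; the dictionary keys are hypothetical labels.
CONTEXT_WEIGHTS = {
    "preferred_name": 10,  # direct annotation, concept preferred name
    "synonym": 8,          # direct annotation, concept synonym
    "mapping": 7,          # expanded annotation via a mapping
}


def parent_weight(n):
    """Weight of an annotation expanded through a parent at level n >= 1.

    Assumption: the value 1 + 10 * e^(-0.2 * n) is rounded down.
    """
    if n < 1:
        raise ValueError("parent level n must be at least 1")
    return math.floor(1 + 10 * math.exp(-0.2 * n))


def annotation_weight(context, level=None):
    """Look up the weight for a context; 'parent' requires a level."""
    if context == "parent":
        return parent_weight(level)
    return CONTEXT_WEIGHTS[context]
```

For example, `[parent_weight(n) for n in (1, 2, 5, 8, 13)]` gives the five example values listed in the table row.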
Recommender’s heuristics and corresponding research questions.
| Annotator’s method | Output value | Question |
|---|---|---|
| CR | score | Which ontologies offer maximum coverage for a set of data? |
| CR+M | score | Which ontologies are reference ontologies for a set of data? |
| CR | normalized-score | Which small ontologies are specialized for a set of data? |
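
To make the difference between the two output values concrete, the sketch below sums annotation weights per ontology to get a score, then divides by a size factor to get a normalized score. This section does not define the normalization, so dividing by log10 of the concept count is purely an assumption for illustration, as are the ontology names, sizes, and function names.

```python
import math
from collections import defaultdict


def score_ontologies(annotations):
    """Sum annotation weights per ontology.

    annotations: iterable of (ontology_id, weight) pairs.
    """
    scores = defaultdict(float)
    for ontology_id, weight in annotations:
        scores[ontology_id] += weight
    return dict(scores)


def normalize_scores(scores, sizes):
    """Assumed normalization: divide each score by log10(concept count)."""
    return {ont: s / math.log10(sizes[ont]) for ont, s in scores.items()}


def rank(values):
    """Ontology ids ordered from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


# Hypothetical input: a large ontology with broad coverage and a small one.
annotations = [("BIG", 10), ("BIG", 8), ("BIG", 7), ("SMALL", 10), ("SMALL", 10)]
sizes = {"BIG": 100_000, "SMALL": 100}

scores = score_ontologies(annotations)
normalized = normalize_scores(scores, sizes)
```

In this toy case the raw score ranks the large ontology first, while the normalized score favors the small one, which matches the intent of the third question in the table.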
Figure 3. Recommender web service user interface. A user can select the recommendation scenario and the repository of ontologies to use, then enter the text data for which ontologies should be recommended. A tag cloud is generated in which the size of each ontology name represents its score.
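One simple way to turn scores into tag-cloud sizes is linear min-max scaling, sketched below. The caption does not say how the interface scales names, so the linear rule, the pixel range, and the function name are assumptions.

```python
def font_sizes(scores, min_px=10, max_px=40):
    """Map each ontology score linearly onto a font size in [min_px, max_px]."""
    if not scores:
        return {}
    lowest, highest = min(scores.values()), max(scores.values())
    if highest == lowest:
        # All scores equal: give every name the same (largest) size.
        return {name: max_px for name in scores}
    scale = (max_px - min_px) / (highest - lowest)
    return {
        name: round(min_px + (score - lowest) * scale)
        for name, score in scores.items()
    }
```

With hypothetical scores `{"A": 100, "B": 50, "C": 0}` the names are sized at the top, middle, and bottom of the range respectively.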
Source and size of the six datasets.
| Dataset | Source | Size |
|---|---|---|
| | Provided by evaluator | 420 |
| | Methods sections of 3 ECG-related papers | 2750 |
| | Provided by evaluator | 9615 |
| | Concatenated ‘name’, ‘description’ and ‘species’ sections of 30 randomly selected ArrayExpress entries | 6520 |
| | Provided by evaluator | 72 |
| | National Comprehensive Cancer Network (NCCN) Breast Cancer Guideline | 12540 |
Evaluation of Recommender results.
| Method | Output | UC1-key-word | UC1-corpus | UC2-key-word | UC2-corpus | UC3-key-word | UC3-corpus |
|---|---|---|---|---|---|---|---|
| | Score | 5 | 4.5 | 4 | 3 | 5 | 5 |
| | Normalized score | 4.5 | 4.5 | 4 | 2 | 2 | 1 |
| | Score | 4 | 4 | 3 | 4 | 4.5 | 4.5 |
| | Normalized score | 3.5 | 4 | 2.5 | 2 | 1 | 1 |
Figure 4. Comparison of ontology selection approaches.
How each question is addressed (automation, speed, and accuracy).
| Question – Recommender’s method | Automation | Fast enough | Accuracy |
|---|---|---|---|
| Which ontologies offer the maximum coverage for my data? – (CR – score) | Yes | Yes | Yes |
| Which ontologies are reference ontologies for my data? – (CR+M – score) | Yes | No | Yes |
| Which small ontologies are specialized for my data? – (CR – normalized score) | Yes | Yes | Not enough |

Qualifiers listed with the table: for keyword-based recommendation; for corpus-based recommendation; for UMLS Metathesaurus repository; for NCBO BioPortal repository; for both repositories; for CR+M method; for CR method.