Daniela Oliveira1, Anila Sahar Butt2, Armin Haller3, Dietrich Rebholz-Schuhmann1, Ratnesh Sahay1.
Abstract
MOTIVATION: Searching for precise terms and terminological definitions in the biomedical data space is problematic, because researchers encounter overlapping, closely related and even equivalent concepts within a single ontology or across multiple ontologies. Search engines that retrieve ontological resources often return an extensive list of results for a given input term, which leaves the user with the tedious task of selecting the best-fit ontological resource (class or property) and reduces confidence in the retrieval engines. A systematic evaluation of these search engines is necessary to understand their strengths and weaknesses under different search requirements. RESULTS: We implemented seven comparable Information Retrieval ranking algorithms to search through ontologies and compared them against four search engines for ontologies. Free-text queries were performed, the outcomes were judged by experts, and the ranking algorithms and search engines were evaluated against the expert-based ground truth (GT). In addition, we propose a probabilistic GT (PGT) that is developed automatically to provide deeper insights into, and confidence in, the expert-based GT, and to allow the evaluation of a broader range of search queries.
Keywords: healthcare and life sciences; information retrieval; linked data; ontology; ranking algorithms; semantic Web
Year: 2019 PMID: 29579141 PMCID: PMC6781604 DOI: 10.1093/bib/bby015
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Notation used
| Variable | Description |
|---|---|
|  | Ontology collection |
| N | Number of ontologies in |
|  | An ontology: |
|  | Collection of all resources ( |
|  | A resource: |
| Q | Query string |
|  | Query word i of Q |
|  | Set of matched resources |
|  | Set of matched resources |
Summary of IR algorithms
| Algorithm | Scoring | Global | WPM | Remarks |
|---|---|---|---|---|
| tf-idf | Term frequency | No | No | Resources that are frequent in the collection receive a low score. In ontologies, however, a common term is not necessarily less relevant: frequent terms can be a product of reuse by other ontologies |
| BM25 | Term frequency | Yes | No | Suffers from the same issue as tf-idf, but the cumulative score ranks domain ontologies higher |
| VSM | Vector similarity | No | No | Uses tf-idf to weight vectors and also considers the tf-idf of the query, aggravating the tf-idf drawback |
| PageRank | Links between ontologies | Yes | No | Ranks based on popularity, which may lead to popular but less relevant resources being ranked higher |
| CMM | Coverage of the set of queries | Yes | Yes | Ontologies with a large number of partial matches will be scored higher than ontologies with few exact matches |
| SMM | Closeness between ontological resources | Yes | No | Although this algorithm can be useful when considering similarity among the matched resources of two or more query terms of a multi-keyword query, it performs poorly on single-word queries |
Note: Scoring summarizes the main scoring method of the algorithm. Global indicates whether the score attributed by the algorithm is per resource or per ontology. WPM (weights partial matches) shows whether the algorithm distinguishes between partial and exact matches.
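The tf-idf drawback noted in the table can be made concrete with a minimal sketch. The resource labels below are hypothetical, and the weighting (raw term frequency multiplied by log(N/df)) is a textbook variant assumed for illustration; it is not necessarily the configuration used in the benchmark.

```python
import math
from collections import Counter

def tf_idf_scores(query, documents):
    """Score each document against the query with a textbook tf-idf.

    tf is the raw count of the term in the document; idf is log(N / df),
    where df is the number of documents that contain the term.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term]:  # terms absent from the collection contribute nothing
                score += tf[term] * math.log(n_docs / df[term])
        scores.append(score)
    return scores

# Hypothetical resource labels, for illustration only.
labels = [
    "ovarian carcinoma",
    "carcinoma",
    "carcinoma of ovary",
    "ovarian teratoma",
]

common = tf_idf_scores("carcinoma", labels)
rare = tf_idf_scores("ovarian teratoma", labels)
```

In this toy collection the term "carcinoma" occurs in three of the four labels, so every matching label receives the same small score and the exact match is not separated from the partial matches, whereas the rarer query "ovarian teratoma" singles out its exact match.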
Figure 1. Evaluation workflow: from input search queries to evaluation results.
Ontologies used in this benchmark with name, acronym, number of triples and reference
| Name | Acronym | # Triples |
|---|---|---|
| Chemical Entities of Biological Interest Ontology | ChEBI | 8187078 |
| Cell Ontology | CL | 69796 |
| Human Disease Ontology | DOID | 203125 |
| The Drug Ontology | DRON | 138898 |
| EMBRACE Data And Methods | EDAM | 33300 |
| Experimental Factor Ontology | EFO | 469954 |
| Foundational Model of Anatomy | FMA | 612982 |
| Gene Ontology | GO | 1575776 |
| Human Phenotype Ontology | HP | 350017 |
| Mouse Adult Gross Anatomy Ontology | MA | 25523 |
| Mammalian Phenotype Ontology | MP | 335821 |
| Mouse Pathology Ontology | MPATH | 11992 |
| Neuro Behavior Ontology | NBO | 10376 |
| National Cancer Institute Thesaurus | NCIT | 5784846 |
| Ontology of Adverse Events | OAE | 54334 |
| Ontology of Genes and Genomes | OGG | 1211539 |
| Phenotypic Quality Ontology | PATO | 31644 |
| Plant Ontology | PO | 59932 |
| Uber Anatomy Ontology | UBERON | 690529 |
| Vertebrate Trait Ontology | VT | 44183 |
|  | WPhenotype | 31991 |
| Xenopus Anatomy and Development Ontology | XAO | 40611 |
| Zebrafish Anatomy and Development Ontology | ZFA | 82964 |
Cancer-related queries and their number of search results on Google, BioPortal and OLS, in April 2017
| Query terms | Abbreviation | Type | Google | BioPortal | OLS |
|---|---|---|---|---|---|
| Ovary | Ovary | Organ | 25.400.000 | 29 | 1054 |
| MYH7 | MYH7 | Gene | 86.500 | 8 | 22 |
| Paclitaxel | Paclitaxel | Drug | 4.640.000 | 18 | 149 |
| Carcinoma | Carcinoma | Disease | 32.800.000 | 25 | 4025 |
| Carboplatin | Carboplatin | Drug | 2.710.000 | 19 | 212 |
| Ovarian teratoma | OT | Tumour | 434.000 | 18 | 1164 |
| Ovarian cystadenoma | OCys | Tumour | 148.000 | 18 | 1100 |
| Ovarian choriocarcinoma | OChor | Tumour | 317.000 | 20 | 1129 |
| Ovarian embryonal carcinoma | OEC | Tumour | 164.000 | 19 | 5069 |
| Ovarian mucinous adenocarcinoma | OMA | Tumour | 117.000 | 15 | 2235 |
Level of self-assessed knowledge of the experts in the biomedical and knowledge engineering fields
| Expert | Biomedical | Works with BD | Produces BD | Applies BD | Knowledge engineering | Worked with Ont. | Developed a BmO |
|---|---|---|---|---|---|---|---|
| 1 | 5 | Yes | Yes | Yes | 2 | Yes | No |
| 2 | 3 | Yes | Yes | Yes | 5 | Yes | Yes |
| 3 | 5 | Yes | No | Yes | 5 | Yes | Yes |
| 4 | 4 | Yes | No | Yes | 5 | Yes | Yes |
| 5 | 5 | Yes | Yes | No | 4 | Yes | No |
| 6 | 4 | Yes | No | Yes | 5 | Yes | Yes |
| 7 | 5 | No | No | Yes | 3 | Yes | Yes |
| 8 | 4 | Yes | Yes | No | 3 | Yes | No |
| 9 | 5 | Yes | Yes | Yes | 5 | Yes | Yes |
| 10 | 5 | No | No | Yes | 5 | Yes | Yes |
|  | **4.5** | 8:2 | 5:5 | 8:2 | **4.2** | 10:0 | 7:3 |
Note: BD = biomedical data; Ont. = ontology; BmO = biomedical ontology. Bold numbers indicate averages.
Expanded query set obtained from [62]
| Type | Queries |
|---|---|
| General | Concentration unit, daily living, electron microscopy, health belief, health services, body weight, cell mass, cell proliferation, disease staging, dose response, clinical trial, compound treatment, differential scanning calorimetry, growth protocol, high-performance liquid, high throughput, sequence alignment |
| Cell or tissue | Bone marrow, brown adipose, connective tissue, connective tissue development, granulosa cell, haemoglobin E |
| Anatomy | Collecting duct, digestive system, embryonic structure, frontal lobe, harderian gland, heart ventricle |
| Genetic | Copy number, gene expression phenotype, gene regulation, genetic modification |
| Condition | Convulsive status epilepticus, fatty liver, generalized anxiety, heart failure, heart rate, venous thrombosis |
| Disease | Breast cancer, eye disease, haemoglobin E thalassaemia, hepatitis b, hepatitis c, ovarian cancer |
| Disorders | Cystathione synthase deficiency, Dowling-Degos syndrome, epileptic encephalopathy, fever infection syndrome, Goldstein-Hutt, nephrotic syndrome |
Figure 2. Box plot of the results of the GT questionnaire. The y axis displays the possible ranks for each item (i.e. the number of answers plus the additional not-relevant rank). The x axis shows the class id for each of the possible answers to the queries. The dotted line represents the median, and the dashed line represents the mean, by which the results were ordered.
Ranking of ‘Carcinoma’ in the GT
| Rank | Mean | URI |
|---|---|---|
| 1 | 1.9 |  |
| 2 | 2.1 |  |
| 3 | 3.7 |  |
| 4 | 3.2 |  |
| 5 | 3.2 |  |
Figure 3. Expected and observed results of the chi-square goodness-of-fit test, represented by a line and bars, respectively. Each chart contains one bar for each ranking available for the query and one additional bar representing the ranking 'Not-Relevant' (NR). A bold and underlined query term indicates that the test rejected the null hypothesis, with α = 0.05.
Figure 4. Relevancy and ranking agreement between the GT and the PGT.
AP@3. The colours code the AP@3 values and range from dark green (highest AP@3, i.e. 1.0) to red (lowest AP@3, i.e. 0.0)
Note: The last column and last row represent the mean of each column/row, colour coded from blue (high mean) to light yellow (low mean).
NDCG. The colours code the NDCG values and range from dark green (highest NDCG, i.e. 1.0) to red (lowest NDCG, i.e. 0.0)
Note: The last column and last row represent the mean of each column/row, colour coded from blue (high mean) to light yellow (low mean).
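For readers unfamiliar with the metrics named in these tables, the following sketch shows one common way of computing P@k, AP@k and NDCG from a list of relevance judgements given in the order the results were returned. The exact variants are assumptions made for illustration: AP@k is normalised by the number of relevant results found in the top k, and the ideal ordering for NDCG is taken from the returned results only. The formulas used in the evaluation may differ in these details.

```python
import math

def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant (relevance > 0)."""
    return sum(1 for r in relevance[:k] if r > 0) / k

def average_precision_at_k(relevance, k):
    """Mean of P@i over the relevant positions i within the top k.

    Returns 0.0 when no relevant result appears in the top k.
    """
    hits = 0
    total = 0.0
    for i, r in enumerate(relevance[:k], start=1):
        if r > 0:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

def dcg(gains):
    """Discounted cumulative gain with a log2(position + 1) discount."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg(gains, k=None):
    """DCG of the returned order divided by the DCG of the ideal order."""
    k = len(gains) if k is None else k
    ideal = dcg(sorted(gains, reverse=True)[:k])
    return dcg(gains[:k]) / ideal if ideal else 0.0

# Hypothetical graded judgements for three returned results.
returned = [3, 0, 2]
p3 = precision_at_k(returned, 3)
ap3 = average_precision_at_k(returned, 3)
score = ndcg(returned)
```

Under these definitions a ranking that places every relevant result first in decreasing order of gain reaches 1.0 on both AP@3 and NDCG, and a ranking with no relevant result in the top three scores 0.0, which matches the colour scale described in the captions.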
Figure 5. NDCG, P@k, AP@k and MAP results for the 10-query collection, considering partial matches, against the GT.
Figure 6. NDCG, P@k, AP@k and MAP results for the 10-query collection, considering exact matches only, against the GT.
Figure 7. NDCG, P@k, AP@k and MAP results for the 10-query collection, considering exact matches only, against the PGT.
Correlation and average distance between GT and PGT results for each metric tested (P-value < 0.01)
| Metric | Pearson’s correlation | Average distance |
|---|---|---|
| NDCG | 0.75 | 0.03 |
| P@K | 0.64 | 0.07 |
| AP@K | 0.69 | 0.05 |
| MAP | 0.93 | 0.05 |
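A comparison of this kind can be sketched as follows. The code assumes that "average distance" means the mean absolute difference between paired metric values; that reading is an assumption, since the table does not define the term. The input values are hypothetical and are not the benchmark results.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def average_distance(xs, ys):
    """Mean absolute difference between paired values (assumed definition)."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical per-system scores for one metric against each ground truth.
gt_scores = [0.80, 0.65, 0.90, 0.40]
pgt_scores = [0.78, 0.70, 0.85, 0.45]

r = pearson(gt_scores, pgt_scores)
d = average_distance(gt_scores, pgt_scores)
```

A high correlation combined with a small average distance would indicate that the two ground truths rank the systems similarly and assign them similar absolute scores.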
Figure 8. NDCG, P@k, AP@k and MAP results for the extended query and ontology collection, considering exact matches only, against the PGT.
Comparing the ranking for ‘ovary’ between the GT, BioPortal (BP), OLS, Solr and PGT
| Class URI | GT | BP | OLS | Solr | PGT |
|---|---|---|---|---|---|
|  | 1 | 1 | 1 | 3 | 1 |
|  | 2 | 4 | 3 | 2 | 4 |
|  | 3 | 2 | 5 | - | 5 |
|  | 4 | 3 | 4 | 1 | 3 |
|  | 5 | - | 2 | 4 | 2 |