Mazen Alobaidi, Khalid Mahmood Malik, Susan Sabra.
Abstract
BACKGROUND: Fulfilling the vision of the Semantic Web requires an accurate data model for organizing knowledge and sharing a common understanding of a domain. Ontologies fit this description: they are the cornerstones of the Semantic Web and can be used to solve many problems in clinical information and biomedical engineering, such as word sense disambiguation, semantic similarity, question answering, and ontology alignment. Manual ontology construction is labor intensive and requires both domain experts and ontology engineers. To reduce this labor and minimize the need for domain experts, we present a novel automated ontology generation framework, the Linked Open Data approach for Automatic Biomedical Ontology Generation (LOD-ABOG), which is empowered by Linked Open Data (LOD). LOD-ABOG performs concept extraction using knowledge bases, mainly UMLS and LOD, along with Natural Language Processing (NLP) operations, and applies relation extraction using LOD, the Breadth-First Search (BFS) graph method, and Freepal repository patterns.
Keywords: Linked open data; Ontology generation; Semantic enrichment; Semantic web
Year: 2018 PMID: 30200874 PMCID: PMC6131949 DOI: 10.1186/s12859-018-2339-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
A comparison of LOD-ABOG with existing knowledge base approaches
| Modules | Harris et al. (2015) | Cahyani et al. (2017) | Qawasmeh et al. (2018) | Proposed Approach |
|---|---|---|---|---|
| Text processing | ||||
| Methods | NLP | NLP | Manual | NLP |
| Concept Extraction | ||||
| Methods |
| Manual | ||
| Evaluation | Accuracy 60% (domain independent), 90% (domain specific) | Accuracy 72% (representing concepts and relations) | Not available | Recall 81.13%, Precision 45.29%, F-measure 58.12% |
| Relation Extraction | ||||
| Methods | | | | |
| Evaluation | Accuracy 31–67% | Accuracy 72% (representing concepts and relations) | Accuracy 15–50% | Recall 63.82%, Precision 66.77%, F-measure 65.26% |
| Type of extracted data | List of concepts, relations between them, and synonyms | List of concepts and relations between them | List of classes, relations between them, and instances of these classes | OWL ontology |
Fig. 1 Illustration of the LOD-ABOG framework architecture
The main modules of LOD-ABOG
| Module Name | Functionality |
|---|---|
| NLP | Performs the linguistic analysis tasks such as tokenization, segmentation, Part-of-Speech (POS) [ |
| Entity Discovery | Identifies biomedical concepts from free-form text by UMLS and LOD authentication |
| Semantic Entity Enrichment | Identifies biomedical concepts from free-form text using UMLS and LOD |
| RDF Triple Extraction | Extracts well-defined information and URIs, as well as taxonomic relations to enrich discovered concepts using LOD. |
| Syntactic Patterns | Extracts non-taxonomic relations by matching predefined word patterns against each input sentence to identify triples |
| Ontology Factory | Generates the ontology with respect to RDF, RDFS, OWL and SKOS schemas. |
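
The abstract names breadth-first search (BFS) over LOD as one ingredient of relation extraction. The sketch below illustrates only the general idea, assuming a small in-memory graph in place of a real LOD source. The graph contents and the function name `bfs_path` are hypothetical, and the framework's actual traversal may differ.

```python
from collections import deque

# Hypothetical toy graph: each concept maps to its broader (parent) concepts.
# A real system would retrieve these links from LOD sources instead.
BROADER = {
    "Ileus": ["Intestinal obstruction"],
    "Intestinal obstruction": ["Intestinal disease"],
    "Intestinal disease": ["Digestive system disease"],
    "Digestive system disease": [],
}

def bfs_path(graph, start, goal):
    """Return the shortest path from start to goal, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

path = bfs_path(BROADER, "Ileus", "Intestinal disease")
# Each consecutive pair on the path is a candidate is-a (taxonomic) relation.
candidate_relations = list(zip(path, path[1:]))
```

Because BFS explores the graph level by level, the first path it finds between two concepts is also the shortest one, which keeps the candidate relations close to the starting concept.
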
Fig. 2 An example of semantic entity enrichment output
URIs that represent concept “Ileus”
| URI1= | |
| URI2= | |
| URI3= | |
| URI4= | |
| URI5= | |
| URI6= | |
| URI7= |
Fig. 3 Syntactic Patterns module workflow
Patterns and their corresponding observed relations and mapping predicates
| Pattern | Observed Relations in Freepal | Predicates in lifesci |
|---|---|---|
| [X] causes by [Y] | ns:medicine.disease.causes | |
| [X] disability [Y] | ns:medicine.symptom.symptom_of | |
| [X] treatment of [Y] | treatrel.used_to_treat | |
| [X] drug treatment [Y] | treatrel.used_to_treat | |
| [X] cancer [Y] | ns:medicine.risk_factor.diseases | |
| example of [X] include [Y] | s:medicine.drug_class.drugs | |
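
Matching a pattern such as `[X] treatment of [Y]` against a sentence can be sketched as follows. This is a minimal illustration, not the framework's implementation: it assumes each pattern is turned into a regular expression and applied to a whole sentence, and the helper names `compile_pattern` and `extract_triples` and the example sentence are hypothetical.

```python
import re

# Two pattern templates and their Freepal relations, taken from the table above.
PATTERNS = {
    "[X] causes by [Y]": "ns:medicine.disease.causes",
    "[X] treatment of [Y]": "treatrel.used_to_treat",
}

def compile_pattern(template):
    """Turn '[X] words [Y]' into a regex with named groups X and Y."""
    escaped = re.escape(template)
    escaped = escaped.replace(re.escape("[X]"), r"(?P<X>.+?)")
    escaped = escaped.replace(re.escape("[Y]"), r"(?P<Y>.+)")
    return re.compile("^" + escaped + "$", re.IGNORECASE)

def extract_triples(sentence):
    """Return (X, relation, Y) for every pattern the sentence matches."""
    text = sentence.strip().rstrip(".")
    triples = []
    for template, relation in PATTERNS.items():
        match = compile_pattern(template).match(text)
        if match:
            triples.append((match.group("X"), relation, match.group("Y")))
    return triples

# Hypothetical input sentence, used only to exercise the matcher.
triples = extract_triples("Donepezil treatment of Alzheimer disease.")
```

Whole-sentence matching is a simplification made here for brevity; matching inside longer sentences would need entity boundaries from the concept extraction step rather than greedy wildcards.
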
Fig. 4 A simplified partial example of an ontology generated by LOD-ABOG
LOD-ABOG Ontology Relations
| Semantic Enrichment/Triple Candidate | Ontology Relation |
|---|---|
| Concept | owl:Class |
| Synonym | owl:equivalentClass, skos:altLabel |
| PrefLabel | skos:prefLabel |
| Is-a | rdfs:subClassOf |
| Concept scheme resource | skos:inScheme |
| High ranked URI | rdf:ID |
| Most high ranked URIs | owl:sameAs |
| Semantic type | rdf:type |
| Definition | skos:definition |
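
The mapping in this table can be read as a lookup from enrichment item to ontology predicate. The sketch below shows one way such a lookup could drive serialization, assuming Turtle-style output built with plain string formatting. The function name `to_turtle`, the `RESOURCE_VALUED` set, and the sample enrichment items are hypothetical, and prefix declarations are omitted.

```python
# Subset of the mapping from the table above (one predicate per row).
RELATION_MAP = {
    "Synonym": "skos:altLabel",
    "PrefLabel": "skos:prefLabel",
    "Is-a": "rdfs:subClassOf",
    "Definition": "skos:definition",
}
# Assumption: these enrichment kinds point at another resource,
# while all other kinds carry a plain text literal.
RESOURCE_VALUED = {"Is-a"}

def to_turtle(concept, enrichment):
    """Render one concept and its enrichment items as Turtle-style statements."""
    lines = [f":{concept} a owl:Class ."]
    for kind, value in enrichment:
        predicate = RELATION_MAP[kind]
        obj = f":{value}" if kind in RESOURCE_VALUED else f'"{value}"'
        lines.append(f":{concept} {predicate} {obj} .")
    return "\n".join(lines)

# Hypothetical enrichment output for the concept "Ileus".
turtle = to_turtle(
    "Ileus",
    [("PrefLabel", "Ileus"), ("Is-a", "IntestinalObstruction")],
)
print(turtle)
```

A production serializer would more likely use an RDF library so that escaping, prefixes, and URI handling are done correctly; the string version is used here only to keep the mapping visible.
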
Comparison of different methods for concepts discovery
| Method | Recall % | Precision % | F-Measure % |
|---|---|---|---|
| UMLS | 63.12 | 22.53 | 33.20 |
| LOD | 77.01 | 23.36 | 35.84 |
| UMLS + LOD | 81.13 | 45.29 | 58.12 |
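
The F-measure column can be cross-checked from the other two columns. The sketch below assumes the balanced F-measure, that is, the harmonic mean of recall and precision; the record does not state the formula, but the reported values are consistent with it.

```python
def f_measure(recall, precision):
    """Harmonic mean of recall and precision (both given in percent)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)

# (recall %, precision %, reported F-measure %) from the table above.
ROWS = {
    "UMLS": (63.12, 22.53, 33.20),
    "LOD": (77.01, 23.36, 35.84),
    "UMLS + LOD": (81.13, 45.29, 58.12),
}

for method, (recall, precision, reported) in ROWS.items():
    computed = f_measure(recall, precision)
    print(f"{method}: computed {computed:.2f}, reported {reported:.2f}")
```

Under this assumption, each computed value differs from the reported one by less than 0.01 percentage points.
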
Evaluation of hierarchy extraction results
| Hierarchical Relation Extraction | Recall % | Precision % | F-Measure % |
|---|---|---|---|
| Disease Concepts | 77.44 | 80.11 | 78.75 |
| Chemical Concepts | 50.20 | 53.43 | 51.76 |
| Disease + Chemical Concepts | 63.82 | 66.77 | 65.26 |
Evaluation of non-hierarchy extraction results
| Recall % | Precision % | F-Measure % |
|---|---|---|
| 77.20 | 40.1 | 52.78 |
Fig. 5 Evaluation results for the primary ontology generation tasks in LOD-ABOG
Fig. 6 Comparison of recall between LOD-ABOG and the OntoGain framework
Fig. 7 Comparison of precision between LOD-ABOG and the OntoGain framework
Comparison of results with baseline ontology (Alzheimer ontology)
| Extraction | Recall % | Precision % | F-measure % |
|---|---|---|---|
| Concepts | 87.28 | 62.50 | 72.48 |
| Relations | 77.47 | 75.12 | 76.27 |
| Properties | 87.21 | 79.68 | 83.28 |
Comparison of results with SemMedDB
| Extraction | Recall % | Precision % | F-Measure % |
|---|---|---|---|
| Concepts | 89.34 | 75.23 | 81.68 |
| Hierarchy relations | 82.64 | 72.86 | 77.44 |
| Non-Hierarchy relations | 45.25 | 81.25 | 58.12 |