Robert Stevens, James Malone, Sandra Williams, Richard Power, Allan Third.
Abstract
BACKGROUND: Text definitions for entities within bio-ontologies are a cornerstone of the effort to gain a consensus in understanding and usage of those ontologies. Writing these definitions is, however, a considerable effort and there is often a lag between specification of the main part of an ontology (logical descriptions and definitions of entities) and the development of the text-based definitions. The goal of natural language generation (NLG) from ontologies is to take the logical description of entities and generate fluent natural language. The application described here uses NLG to automatically provide text-based definitions from an ontology that has logical descriptions of its entities, so avoiding the bottleneck of authoring these definitions by hand.
Year: 2011 PMID: 21624160 PMCID: PMC3102894 DOI: 10.1186/2041-1480-2-S2-S5
Source DB: PubMed Journal: J Biomed Semantics
Figure 1. OWL and natural language definitions for the HeLa cell line. This shows an example of the OWL and hand-written textual definitions for the HeLa cell line class as seen in EFO. The two definitions are similar in that both say what a HeLa cell is, but the hand-written one brings in more background information, such as the name of the individual from whom the cells came. The rightmost pane shows the definition generated by the version of our program used for the second evaluation study.
Figure 2. Ungrouped, grouped, and grouped and aggregated verbalisations of OWL descriptions. The left-hand box (A) shows a list of ungrouped sentences, each representing an axiom from EFO. The sentences appear in the order in which they occur in the input file. The middle box (B) shows the emboldened sentences from (A), grouped according to the ‘subject’ or topic of the sentence, in this case genetic disorder. This gathers all the sentences pertinent to genetic disorder into one place. The right-hand box (C) shows the aggregated version of the grouped output, in which the repetition of the subclass axioms is replaced by a list construct.
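The grouping and aggregation steps described in the caption can be illustrated with a minimal sketch. It is not the verbaliser's actual code: the function names, the `(subclass, superclass)` pair representation, the article heuristic and the exact sentence wording are all assumptions made for illustration, and only subclass axioms are handled.

```python
# Illustrative sketch only: names and sentence templates are assumptions,
# not the published verbaliser's implementation.

def article(noun: str) -> str:
    """Pick 'an' before a vowel letter, otherwise 'a' (a crude heuristic)."""
    return "an" if noun[0].lower() in "aeiou" else "a"

def verbalise(sub: str, sup: str) -> str:
    """Render one subclass axiom as a single sentence."""
    return f"{article(sub).capitalize()} {sub} is {article(sup)} {sup}."

def group(axioms, topic):
    """Keep, in input order, only the axioms that mention the topic."""
    return [(sub, sup) for sub, sup in axioms if topic in (sub, sup)]

def aggregate(grouped, topic):
    """Verbalise a group, merging repeated 'X is a <topic>' sentences into one list."""
    sentences = [verbalise(sub, sup) for sub, sup in grouped if sub == topic]
    children = [sub for sub, sup in grouped if sup == topic]
    if len(children) == 1:
        sentences.append(verbalise(children[0], topic))
    elif children:
        items = [f"{article(c)} {c}" for c in children]
        listed = ", ".join(items[:-1]) + " and " + items[-1]
        sentences.append(f"{listed[0].upper()}{listed[1:]} are kinds of {topic}.")
    return sentences

# Hypothetical input order, mixing axioms about several topics.
axioms = [
    ("B117H", "Ara-C-resistant murine leukemia"),
    ("HeLa", "cell line"),
    ("Ara-C-resistant murine leukemia", "cell line"),
    ("B140H", "Ara-C-resistant murine leukemia"),
    ("genetic disorder", "disease"),
]
topic = "Ara-C-resistant murine leukemia"

ungrouped = [verbalise(sub, sup) for sub, sup in axioms]  # box (A)
grouped = group(axioms, topic)                            # box (B)
aggregated = aggregate(grouped, topic)                    # box (C)

for sentence in aggregated:
    print(sentence)
```

Grouping here is a simple filter on the topic, and aggregation only collapses subclass sentences that share the topic as their superclass; a fuller treatment would also need to handle restrictions and disjointness axioms.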
Comparisons of OWL verbalisers
| System | TBox | ABox | Coverage | Grouping | Aggregation | Lexicon | Domain |
|---|---|---|---|---|---|---|---|
| ACE | Yes | Yes | OWL-2 | Yes | No | Automatic | Generic |
| ROA | Yes | Yes | Unclear | Unclear | Yes | Automatic | Generic |
| SWOOP | Yes | Unclear | OWL-DL | No | No | Automatic | Generic |
| MIAKT | No | Yes | RDF | Yes | Yes | Handcrafted | Specific |
| NaturalOWL | Yes | Yes | OWL-DL | Yes | Yes | User-defined | Specific |
| GINO | Yes | Unclear | Unclear | No | No | Automatic | Generic |
| LIBER | No | Yes | RDF | Yes | Yes | User-defined | Specific |
| SWAT Tools | Yes | Yes | OWL-2 | Yes | Yes | Automatic | Generic |
Comparison of OWL verbalisers. ‘TBox’ contains the ontology’s classes and properties; ‘ABox’ contains the individuals and the assertions upon them. Coverage is approximate: for instance, ACE and SWAT cover nearly all of OWL-2, but with a few omissions. ‘Lexicon’ indicates the source of lexical entries for atomic entities; ‘Domain’ is generic if the system can produce text for any ontology (of the stated OWL coverage) with English-based names, and specific if handcrafted lexical entries or grammar rules are needed.
Results from Survey One
| Judgements | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Totals | 5.9% (11) | 9.1% (17) | 27.3% (51) | 32.1% (60) | 25.7% (48) |
Summary of survey results on natural language definitions from iteration 1. Question: ‘How understandable are the definitions?’ Judgements range from 1 (not understandable) to 5 (understandable). The survey was completed by 21 people (questions did not require an answer).
Examples of output used in Survey One
| Class label | OWL axioms (Manchester syntax) | Generated Natural Language Definition |
|---|---|---|
| 22rv1 | SubClassOf: ’cell line’ | A 22rv1 is a cell line. A 22rv1 is all of the following: something that is bearer of a prostate carcinoma, something that derives from a homo sapiens, and something that derives from a prostate. |
| HeLa | SubClassOf: ’cell line’ | A he la is a cell line. A he la is all of the following: something that is bearer of a cervical carcinoma, something that derives from a homo sapiens, something that derives from an epithelial cell, and something that derives from a cervix. |
| Ara-C-resistant murine leukemia | SubClassOf: ’cell line’ | A ara c resistant murine leukemia is a cell line. A b117h, and a b140h are kinds of ara c resistant murine leukemias. |
| GM18507 | SubClassOf: ’cell line’ | A gm18507 is a cell line. A gm18507 is all of the following: something that has as quality a male, something that derives from a homo sapiens, and something that derives from a lymphoblast. |
| BDCM | SubClassOf: ’cell line’ | A bdcm is a cell line. |
Example of natural language definitions generated from corresponding OWL axioms from the first iteration (Survey 1). *Note: these subclass relations are placed on the subclasses but we illustrate them here for context.
Results from Survey Two
| Judgements | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| Totals | 3.6% (5) | 5.0% (7) | 10.8% (15) | 37.4% (52) | 43.2% (60) |
Summary of results on natural language definitions for the second iteration (Survey 2 Part 1). Question: ‘How well does the text capture the OWL meaning?’ Judgements range from 1 (Not at all) to 5 (Totally captured meaning). The survey was completed by 16 people (questions did not require an answer).
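As a check on the arithmetic, the percentages in the two survey tables can be recomputed from the raw judgement counts. The sketch below does this and also derives a mean judgement per survey; the mean is not reported in the tables and is added here only as an illustrative summary. Because the two surveys asked different questions, the means should not be read as a direct like-for-like comparison.

```python
# Judgement counts copied from the Survey One and Survey Two tables.
survey_one = {1: 11, 2: 17, 3: 51, 4: 60, 5: 48}
survey_two = {1: 5, 2: 7, 3: 15, 4: 52, 5: 60}

def summarise(counts):
    """Return (total judgements, percentage per score, mean score)."""
    total = sum(counts.values())
    shares = {score: round(100 * n / total, 1) for score, n in counts.items()}
    mean = sum(score * n for score, n in counts.items()) / total
    return total, shares, mean

total_one, shares_one, mean_one = summarise(survey_one)
total_two, shares_two, mean_two = summarise(survey_two)

for name, total, shares, mean in [
    ("Survey One", total_one, shares_one, mean_one),
    ("Survey Two", total_two, shares_two, mean_two),
]:
    print(f"{name}: {total} judgements, mean {mean:.2f}, shares {shares}")
```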
Examples of output used in Survey Two
| Class label | OWL axioms (Manchester syntax) | Generated Natural Language Definition |
|---|---|---|
| HeLa | bearer_of some ’cervical carcinoma’ | A HeLa is all of the following: something that is bearer of a cervical carcinoma, something that derives from a Homo sapiens, something that derives from an epithelial cell, and something that derives from a cervix. A HeLa is a cell line. |
| 4470 | derives_from some ’Mus musculus’ | A 4470 is both something that derives from a Mus musculus, and something that derives from a bone marrow. A 4470 is a cell line. |
| Ara-C-resistant murine leukemia cell line | SubClassOf: ’cell line’ | An Ara-C-resistant murine leukemia is a cell line. B117Hs, and B140Hs are Ara-C-resistant murine leukemias. An Ara-C-resistant murine leukemia derives from a Mus musculus. |
| genetic disorder | SubClassOf: disease | A genetic disorder is a disease. No genetic disorder is any of the following: a normal or an uninfected. |
Example of natural language definitions generated from corresponding OWL axioms for the second iteration (Survey 2). *Note: these subclass relations are placed on the subclasses but we illustrate them here for context.
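The sentence patterns visible in the table ("is both ..., and ...", "is all of the following: ...") can be mimicked with a small template-based sketch. This is an assumption-laden illustration rather than the verbaliser's implementation: `verbalise_class`, the property phrases ("is bearer of", "derives from") and the article heuristic are hypothetical, and the restrictions beyond those shown in the axiom column are inferred from the generated text.

```python
# Illustrative sketch only: the templates are inferred from the example output,
# not taken from the published tool.

def article(noun: str) -> str:
    """Pick 'an' before a vowel letter, otherwise 'a' (a crude heuristic)."""
    return "an" if noun[0].lower() in "aeiou" else "a"

def verbalise_class(label, restrictions, parent):
    """Verbalise 'property some filler' restrictions plus one named superclass.

    restrictions is a list of (property phrase, filler label) pairs.
    """
    subject = f"{article(label).capitalize()} {label}"
    clauses = [
        f"something that {prop} {article(filler)} {filler}"
        for prop, filler in restrictions
    ]
    sentences = []
    if len(clauses) == 1:
        sentences.append(f"{subject} is {clauses[0]}.")
    elif len(clauses) == 2:
        sentences.append(f"{subject} is both {clauses[0]}, and {clauses[1]}.")
    elif clauses:
        listed = ", ".join(clauses[:-1]) + ", and " + clauses[-1]
        sentences.append(f"{subject} is all of the following: {listed}.")
    sentences.append(f"{subject} is {article(parent)} {parent}.")
    return " ".join(sentences)

# Restrictions inferred from the generated text in the table (assumption).
hela = verbalise_class(
    "HeLa",
    [
        ("is bearer of", "cervical carcinoma"),
        ("derives from", "Homo sapiens"),
        ("derives from", "epithelial cell"),
        ("derives from", "cervix"),
    ],
    "cell line",
)
cell_4470 = verbalise_class(
    "4470",
    [("derives from", "Mus musculus"), ("derives from", "bone marrow")],
    "cell line",
)
print(hela)
print(cell_4470)
```

Fixed templates of this kind are simple to read and test, but they leave label tokenisation, pluralisation and article selection for irregular cases to separate handling.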
Figure 3. Alternative renderings for Survey Two. Alternative renderings for a selection of definitions (Survey 2, Part 2). Participants were asked, in two separate questions, to pick which they thought was the most natural to read and which best captured the meaning of the OWL.
Figure 4. The architecture of the OWL verbaliser. Architecture of the natural language definition generator.