Daniel Schober, Ilinca Tudose, Vojtech Svatek, Martin Boeker.
Abstract
BACKGROUND: Although policy providers have outlined minimal metadata guidelines and naming conventions, today's ontologies still display inter- and intra-ontology heterogeneities in class labelling schemes and metadata completeness. This is at least partially due to missing or inappropriate tools. Software support can ease this situation and contribute to overall ontology consistency and quality by helping to enforce such conventions.
Year: 2012 PMID: 23046606 PMCID: PMC3448530 DOI: 10.1186/2041-1480-3-S2-S4
Source DB: PubMed Journal: J Biomed Semantics
Requirements for a naming convention and metadata verification tool
| Requirement | Aspects met and Implementation | OntoCheck Panel |
|---|---|---|
| Easy installation, usage and intuitive navigation. | Protégé plugin, structured into 3 self-explaining tabs. Tooltips providing on-the-spot guidance. | All |
| Generation and display of numeric counts for selectable ontology metrics. | Making use of the Protégé and Java API, diverse metrics are available, supplementing the already present 'Ontology Metrics'. | All |
| Selection of an 'entry class node' from which a check is run leaf-wards. | Allows testing for a certain postfix, e.g. '_Disposition', only within a selected 'Disposition' entry node subtree. Allows checking for metadata availability in selectable subtrees. | All |
| Display of classes failing a specified test and export as list. | Found classes can be sorted according to different criteria and exported for later curation. | All |
| Display of quantitative results on detected issues in terms of absolute and percentage counts in a given subtree. | A statistical data pane verbalizes the numerical results in a copyable natural language sentence. | All |
| Storage and reload capabilities for created checks allowing for later re-use and propagation. | An XML file is generated storing all checks in a reproducible way. | All |
| Detection for 'presence' and 'required cardinality' of labels and metadata. | Checks are available on OWL elements capturing lexical information, i.e. rdf:ID, rdfs:label, own annotation properties and standard annotation properties e.g. from Dublin Core or SKOS. | Check |
| Check for syntactical and typographical patterns and label length, e.g. to discover names that are too short or too long within string values of selectable entities. | Allows checking naming conventions via simple string matches and full regular expressions. Checks the length of labels. A significant fraction of the OBO Foundry naming conventions can be checked, i.e. case and separator conventions, but also morphemic ones. | Check |
| Detection and counts of redundant class labels. | Label repetition can be checked for via the ComparePanel. | Compare |
| Comparison of values between pairs of entities to detect similarities and avoid redundancies. | Operators like equals, contains or starts with can be used to compare selectable entities. | Compare |
| Quantification of ontology measures useful for ontology evaluation, progress monitoring and complexity analysis. | Displays the percentage or absolute number of entities having 'exactly', 'at least' or 'at most' a certain number of annotation properties, direct sub-/superclasses, or 'usages', i.e. indicating 'hub nodes'. | Count |
The high-level requirements are listed in the first column, followed by their specific implementations, indicating the extent to which each requirement is fulfilled in our tool. The last column indicates in which tab the function is implemented.
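The subtree-scoped checks listed in the table can be illustrated with a minimal Python sketch. It is not OntoCheck's implementation (the tool is a Protégé plugin); the toy hierarchy `SUBCLASSES`, the `ANNOTATIONS` dictionary and all function names are hypothetical, and the sketch assumes that the entry node itself is excluded from the tested set.

```python
from collections import deque

# Hypothetical toy ontology: class name -> direct subclasses.
SUBCLASSES = {
    "Thing": ["Disposition", "Disease"],
    "Disposition": ["Fragility_Disposition", "Solubility"],
    "Disease": ["Influenza"],
}

# Hypothetical annotations: class name -> {property: [values]}.
ANNOTATIONS = {
    "Disposition": {"rdfs:label": ["disposition"]},
    "Fragility_Disposition": {"rdfs:label": ["fragility disposition"]},
    "Solubility": {},
    "Disease": {"rdfs:label": ["disease"]},
    "Influenza": {"rdfs:label": ["influenza", "flu"]},
}


def descendants(entry_node, subclasses=SUBCLASSES):
    """Return all classes below entry_node (leaf-wards), breadth-first.

    Assumption: the entry node itself is not part of the result.
    """
    found, seen = [], set()
    queue = deque(subclasses.get(entry_node, []))
    while queue:
        cls = queue.popleft()
        if cls in seen:  # guard against multiple inheritance
            continue
        seen.add(cls)
        found.append(cls)
        queue.extend(subclasses.get(cls, []))
    return found


def failing_min_cardinality(entry_node, prop, minimum=1):
    """Descendants of entry_node with fewer than `minimum` values for prop."""
    return [
        cls
        for cls in descendants(entry_node)
        if len(ANNOTATIONS.get(cls, {}).get(prop, [])) < minimum
    ]


def failing_postfix(entry_node, postfix):
    """Descendants of entry_node whose name does not end with postfix."""
    return [cls for cls in descendants(entry_node) if not cls.endswith(postfix)]
```

In this toy data, calling `failing_postfix("Disposition", "_Disposition")` restricts the postfix test to the 'Disposition' subtree, mirroring the entry-node idea from the table.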
Figure 1. The Check panel. The Check panel displays the specification (left) of a test for an 'all lower case, space separator' naming convention on rdfs:label for the active ontology. The 'statistical data' view (middle) lists a history of launched checks and quantifies their results as the absolute number of classes failing a test. Percentages are given with respect to the overall number of entry node descendants. One of the result classes, 'BuildingPart', is activated to show the violation found in its label, which is MixedCase, as seen in the metadata pane below. Clicking on a class in the result pane (right) activates it in the class hierarchy pane (left) and opens a metadata edit pane to allow for corrections (below). The lower right corner shows how a file name and location can be selected to export the result list.
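A naming check of the kind shown in Figure 1 can be sketched with a regular expression. The pattern below is only one possible reading of 'all lower case, space separator', and the label values and the function name `check_labels` are hypothetical; the percentage is computed against the number of tested classes, as described in the caption.

```python
import re

# Assumed pattern: lower-case alphanumeric words separated by single spaces.
LOWER_SPACE = re.compile(r"[a-z0-9]+( [a-z0-9]+)*")


def check_labels(labels, pattern=LOWER_SPACE):
    """Return (failing class names, percentage of tested classes that fail).

    `labels` maps a class name to its rdfs:label value.
    """
    failing = sorted(
        cls for cls, label in labels.items() if not pattern.fullmatch(label)
    )
    percent = 100.0 * len(failing) / len(labels) if labels else 0.0
    return failing, percent


# Hypothetical labels; 'BuildingPart' carries a MixedCase label as in Figure 1.
labels = {
    "BuildingPart": "BuildingPart",
    "Wall": "wall",
    "RoofTile": "roof tile",
    "Door_Frame": "door_frame",
}
failing, percent = check_labels(labels)
print(f"{len(failing)} of {len(labels)} classes ({percent:.1f}%) fail the check")
```

Using `fullmatch` rather than `search` matters here: the whole label has to conform, not just a fragment of it.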
Figure 2. The OntoCheck Compare panel. The Compare panel displays a check that verifies whether the rather dynamic rdfs:label still matches a previously assigned static semantic ID (OWLClassName), ignoring word separators. A considerable number of classes with deviant labels is found, e.g. the detected class 'Nitroimidazole' (marked), whose label here reads 'Nitromidazole' (without the "i" after "Nitro"; see the annotation metadata below).
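The comparison in Figure 2 can be approximated by normalising both strings before testing for equality. This is a sketch under assumptions: the set of separators (whitespace, underscore, hyphen) and the case-insensitive comparison are choices made for illustration, and `normalise` and `deviant_labels` are hypothetical names.

```python
import re


def normalise(name):
    """Drop word separators and lower-case, so 'BloodCell' equals 'blood cell'."""
    return re.sub(r"[\s_\-]+", "", name).lower()


def deviant_labels(classes):
    """Return (class ID, label) pairs whose normalised forms differ."""
    return [
        (class_id, label)
        for class_id, label in classes
        if normalise(class_id) != normalise(label)
    ]


# Hypothetical input; the first pair reproduces the typo shown in Figure 2.
classes = [
    ("Nitroimidazole", "Nitromidazole"),
    ("BloodCell", "blood cell"),
    ("Cell_Membrane", "cell membrane"),
]
print(deviant_labels(classes))
```

Only the 'Nitroimidazole' pair is reported, because the other two differ from their IDs only in separators and case.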
Figure 3. The OntoCheck Count panel. A count for 'hub-node' classes is carried out over the whole BioTop ontology (entry node is Thing). A list of 23 classes used more than 10 times is displayed in the 'Result classes' pane.
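A hub-node count of this kind boils down to tallying how often each class is referenced and filtering by a threshold. The sketch below assumes that 'usages' are available as a flat list of referenced class names, which is a simplification; `COMPARATORS` and `count_by_usage` are hypothetical names, and the 'more than' mode is added to match the figure alongside the 'exactly', 'at least' and 'at most' modes of the Count panel.

```python
import operator
from collections import Counter

# Comparison modes; 'more than' is added here to mirror the Figure 3 example.
COMPARATORS = {
    "exactly": operator.eq,
    "at least": operator.ge,
    "at most": operator.le,
    "more than": operator.gt,
}


def count_by_usage(references, mode, n):
    """Return the sorted classes whose usage count satisfies `mode` n.

    `references` lists one class name per usage; classes that are never
    referenced do not appear in the tally and are therefore never returned.
    """
    usage = Counter(references)
    compare = COMPARATORS[mode]
    return sorted(cls for cls, used in usage.items() if compare(used, n))


# Hypothetical usage data: 'Cell' is referenced three times, and so on.
references = ["Cell"] * 3 + ["Organ"] + ["Process"] * 2
print(count_by_usage(references, "more than", 2))
```

With real data, the Figure 3 query would correspond to `count_by_usage(references, "more than", 10)`.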
Exemplary OntoCheck tests with quantification of detected violations
| Ontology | Entry Node | Entity | Panel | Check | Classes: abs (%) |
|---|---|---|---|---|---|
| BioTop | root | <rdfs:label> | Check | Upper case start | 12 (4) |
| BioTop | root | <owl:Class rdf:about> | Check | CamelCase | 34 (8) |
| DCO | root | <ru-meta:definition> | Check | Min card.=1 | 37 (8) |
| DCO | 'Disease' | <SNOMED_ID> | Check | Min card.=1 | 2 (2) |
| DCO | root | <ru-meta:synonym> | Count | Min card.>2 | 238 (40) |
| DCO | root | <ru-meta:shortLabel> | Check | Max Char Count < 20 | 3 (0.5) |
| DCO | root | n/a | Count | CountClsHavingAtLeast15Subclasses | 15 (1) |
| DCO | root | n/a | Count | CountClsUsedAtLeast15times | 48 (3.3) |
| NTDO | root | <rdfs:label> | Check | Doesn'tContain'Class'or'class' | 3 (1) |
| Good | root | <rdfs:label> | Check | Min card.=1 | 6 (15) |
| Vertical | root | <rdfs:label> | Check | Length regex.{4,50}+ | 1 (1.5) |
| Vertical | root | <rdf:ID> | Check | Doesn'tContain'Or' | 7 (10) |
| Vertical | root | n/a | Count | ClsUsedOnlyOnce | 13 (20) |
| @neurist | root | n/a | Count | CountClsHavingExactlyOneSubclass | 150 (5.3) |
'Entry Node' refers to the selected class in the hierarchy for which all descendants are tested. The entity selected to be checked is described via its OWL syntax element. The last column gives the number of classes found violating (Check panel) or fulfilling (Count panel) a specified pattern, in the form abs (%). For the naming checks, 'abs' is the absolute count of entities of the specified type failing the test, and '%' is the ratio of abs to the number of all entry node descendants.