Ling Zheng, Hua Min, Yan Chen, Vipina Keloth, James Geller, Yehoshua Perl, George Hripcsak.
Abstract
BACKGROUND: Summarization networks are compact summaries of ontologies. The "Big Picture" view offered by summarization networks makes it possible to identify sets of concepts that are more likely to have errors than control concepts. For ontologies that have outgoing lateral relationships, we have developed the "partial-area taxonomy" summarization network. Prior research has identified one kind of outlier concept: concepts in small partial-areas within partial-area taxonomies. Previously, we have shown that the small partial-area technique works successfully for four ontologies (or their hierarchies).
Keywords: Auditing BioPortal ontologies; Biomedical ontologies; Meta-ontology; Ontology auditing scalability; Ontology error concentration; Ontology quality assurance; Summarization network
Year: 2020 PMID: 33319713 PMCID: PMC7737254 DOI: 10.1186/s12911-020-01311-x
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Glossary
| Term | Definition | Example |
|---|---|---|
| The subsumption relationship underlying the hierarchy of an ontology is called | A hierarchical | |
| Lateral relationship | The non-hierarchical semantic relationship is called lateral relationship, in contrast to the hierarchical | The NCIt concept |
| Area | An area is the group of all concepts that have exactly the same set of lateral relationship types | Figure |
| Partial-area | A partial-area is a subunit of an area. It is defined by a root concept, which describes the semantics of the partial-area, and also includes all descendants of the root within the area, which share the same semantics | Figure |
| Small partial-area | A partial-area is small if its size is not larger than a bound b, where b is a small number, typically less than or equal to 10 | The partial-area |
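The glossary definitions of area, partial-area, and small partial-area can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy concept names (`Root`, `A` to `E`), the relationship type names (`r1`, `r2`), and the helper functions `areas`, `partial_areas`, and `small_partial_areas` are hypothetical and are not taken from the paper or from any ontology tool.

```python
from collections import defaultdict

def areas(rel_types):
    """Group concepts by their exact set of lateral relationship types."""
    groups = defaultdict(set)
    for concept, types in rel_types.items():
        groups[frozenset(types)].add(concept)
    return dict(groups)

def partial_areas(parents, rel_types):
    """Map each root concept to the concepts of its partial-area.

    A root is a concept with no is-a parent inside its own area; its
    partial-area is the root plus all of its descendants within that area.
    """
    children = defaultdict(set)
    for concept, concept_parents in parents.items():
        for parent in concept_parents:
            children[parent].add(concept)

    result = {}
    for members in areas(rel_types).values():
        roots = [c for c in members if not (set(parents.get(c, ())) & members)]
        for root in roots:
            seen, stack = {root}, [root]
            while stack:  # depth-first walk restricted to the area
                node = stack.pop()
                for child in children.get(node, ()):
                    if child in members and child not in seen:
                        seen.add(child)
                        stack.append(child)
            result[root] = seen
    return result

def small_partial_areas(pareas, b):
    """Keep the partial-areas whose size does not exceed the bound b."""
    return {root: members for root, members in pareas.items() if len(members) <= b}

# Hypothetical toy hierarchy: concept -> list of is-a parents.
parents = {"Root": [], "A": ["Root"], "B": ["A"], "C": ["A"], "D": ["Root"], "E": ["D", "B"]}
# Hypothetical lateral relationship types of each concept.
rel_types = {"Root": set(), "A": {"r1"}, "B": {"r1"}, "C": {"r1", "r2"}, "D": {"r1"}, "E": {"r1"}}

pareas = partial_areas(parents, rel_types)
small = small_partial_areas(pareas, b=2)
print(sorted(small))  # ['C', 'D', 'Root']
```

In this toy hierarchy the area with relationship type `r1` has two roots, `A` and `D`, and concept `E` falls into both of their partial-areas because it has a parent in each. With the bound set to 2, only the partial-area rooted at `A` (three concepts) is not small.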
Fig. 1 (a) An excerpt of 12 concepts from NCIt’s Gene hierarchy. Concepts are denoted by round-corner boxes and are connected by is-a relationships represented by upward arrows. Colored rectangles enclose concepts with the same set of relationship types (in bold). Root concepts are shown as bold boxes. (b) The area taxonomy for (a). Areas are presented as colored boxes based on the number of relationship types, i.e., areas with the same number of relationship types have the same color. An area is labeled by the set of its relationship types and the number of concepts that it summarizes in parentheses. Areas are connected by child-of links shown as bold upward arrows. (c) The partial-area taxonomy for (a). Partial-areas are shown as white boxes inside areas. A partial-area is labeled by its root concept and the number of concepts that it summarizes in parentheses. Partial-areas are connected by child-of links represented as bold arrows, as in the area taxonomy.
Fig. 2 The structured-based meta-ontology for BioPortal ontologies in August 2019
Fig. 3 The flow chart summarizing the process of the small partial-area based QA study
The distribution of complete SNOMED CT specimen concepts, sample concepts and erroneous concepts by partial-area size
| Partial-area size | # of partial-areas | # of concepts | # of sample concepts | # of erroneous concepts | Error percentage (%) |
|---|---|---|---|---|---|
| 1 | 345 | 345 | 22 | 3 | 13.6 |
| 2 | 72 | 120 | 8 | 1 | 12.5 |
| 3 | 25 | 61 | 4 | 2 | 50.0 |
| 4 | 12 | 40 | 3 | 1 | 33.3 |
| 5 | 11 | 39 | 2 | 1 | 50.0 |
| 6 | 10 | 51 | 3 | 0 | 0 |
| 7 | 7 | 36 | 2 | 1 | 50.0 |
| 8 | 4 | 28 | 2 | 0 | 0 |
| 9 | 6 | 52 | 3 | 2 | 66.7 |
| 10 | 2 | 10 | 1 | 0 | 0 |
| > 10 | 36 | 681 | 50 | 3 | 6.0 |
| Total | 530 | 1463 | 100 | 14 | 14 |
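The per-size rows above can be collapsed into a small versus large comparison. The sketch below is illustrative only: the helper `split_by_bound` is hypothetical, the bound of 9 is chosen because it reproduces the counts of the 2 × 2 contingency table for this hierarchy, and the "> 10" row is encoded as size 11 purely so that it falls into the large group.

```python
# (partial-area size, # of sample concepts, # of erroneous concepts),
# copied from the SNOMED CT specimen table; 11 stands in for "> 10".
rows = [
    (1, 22, 3), (2, 8, 1), (3, 4, 2), (4, 3, 1), (5, 2, 1), (6, 3, 0),
    (7, 2, 1), (8, 2, 0), (9, 3, 2), (10, 1, 0), (11, 50, 3),
]

def split_by_bound(rows, bound):
    """Return (errors, non-errors, error %) for sizes <= bound and > bound."""
    def summarize(group):
        sampled = sum(r[1] for r in group)
        errors = sum(r[2] for r in group)
        return errors, sampled - errors, round(100 * errors / sampled, 1)

    small_rows = [r for r in rows if r[0] <= bound]
    large_rows = [r for r in rows if r[0] > bound]
    return summarize(small_rows), summarize(large_rows)

small, large = split_by_bound(rows, bound=9)
print(small)  # (11, 38, 22.4)
print(large)  # (3, 48, 5.9)
```

With this split, 49 sampled concepts come from partial-areas of size 1 to 9 and 51 from partial-areas of size 10 or more.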
Four examples of errors for SNOMED CT specimen concepts identified in the review
| Concept | Partial-area size | Error | Suggested correction |
|---|---|---|---|
| Urethra biopsy sample | 1 | The target | Replace with |
| Bursa tissue sample | 2 | Incorrect parent concept | Change to |
| Tissue specimen from eye | 9 | The target | Replace with |
| Extradural lesion sample | 22 | The target | Replace with |
The 2 × 2 contingency table for erroneous small partial-area concepts and erroneous large partial-area concepts in the SNOMED CT Specimen hierarchy (with a two-tailed p value = 0.0226 < 0.05 by Fisher’s exact test)
| | # Erroneous concepts | # Concepts w/o errors | Error percentage (%) |
|---|---|---|---|
| Small partial-areas (1–9) | 11 | 38 | 22.4 |
| Large partial-areas (≥ 10) | 3 | 48 | 5.9 |
The distribution of complete NCIt Gene concepts, sample concepts and erroneous concepts by partial-area size
| Partial-area size | # of partial-areas | # of concepts | # of sample concepts | # of erroneous concepts | Error percentage (%) |
|---|---|---|---|---|---|
| 1 | 5450 | 5450 | 10 | 9 | 90 |
| 2 | 90 | 180 | 5 | 4 | 80 |
| 3 | 4 | 12 | 5 | 1 | 20 |
| 4 | 5 | 20 | 5 | 3 | 60 |
| 5 | 2 | 10 | 5 | 3 | 60 |
| 6 | 1 | 6 | 5 | 3 | 60 |
| 7 | 2 | 14 | 5 | 2 | 40 |
| 8 | 2 | 16 | 5 | 1 | 20 |
| 9 | 1 | 9 | 5 | 3 | 60 |
| > 10 | 37 | 4288 | 50 | 33 | 66 |
| Total | 5594 | 10,005 | 100 | 62 | 62 |
Five examples of errors for NCIt Gene concepts identified in the review
| Concept | Partial-area size | Error | Suggested correction |
|---|---|---|---|
| RBM5 wt Allele | 1 | Missing the relationship | Add the relationship |
| NUP98 Gene | 1 | Missing the relationship | Add the relationship |
| ZNF365 Gene | 2 | Missing the relationship | Add the relationship |
| BCAR4 wt Allele | 5 | Missing the relationship | Add the two relationships |
| BRS3 Gene | 654 | Missing the relationship | Add the relationship |
The 2 × 2 contingency table for erroneous small partial-area concepts and erroneous large partial-area concepts in the NCIt’s Gene hierarchy (with a two-tailed p value = 0.043 < 0.05 by Fisher’s exact test)
| | # Erroneous concepts | # Concepts w/o errors | Error percentage (%) |
|---|---|---|---|
| Small partial-areas (1–2) | 13 | 2 | 86.7 |
| Large partial-areas (≥ 3) | 49 | 36 | 57.6 |
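The two-tailed Fisher's exact test reported for the contingency tables can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' procedure: the source does not say which software or two-tailed convention was used, so the code assumes the common convention of summing the probabilities of all tables that are no more likely than the observed one. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed convention: sum the hypergeometric probabilities of every
    table (with the same margins) that is no more likely than the
    observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that the top-left cell equals k, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )

# Rows: small vs large partial-areas; columns: erroneous vs without errors.
p_ncit = fisher_exact_two_sided(13, 2, 49, 36)    # NCIt Gene hierarchy
p_snomed = fisher_exact_two_sided(11, 38, 3, 48)  # SNOMED CT Specimen hierarchy
print(f"NCIt Gene: p = {p_ncit:.4f}")
print(f"SNOMED CT Specimen: p = {p_snomed:.4f}")
```

Under this convention, both p values should fall below 0.05. Small differences from the reported figures are possible if a different two-tailed convention was used.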