Ling Zheng1, Yan Chen2, Hua Min3, P Lloyd Hildebrand4, Hao Liu5, Michael Halper6, James Geller5, Sherri de Coronado7, Yehoshua Perl5.
Abstract
BACKGROUND: Ontologies house various kinds of domain knowledge in formal structures, primarily in the form of concepts and the associative relationships between them. Ontologies have become integral components of many health information processing environments. Hence, quality assurance of the conceptual content of any ontology is critical. Relationships are foundational to the definition of concepts. Missing relationship errors (i.e., unintended omissions of important definitional relationships) can have a deleterious effect on the quality of an ontology. An abstraction network is a structure that overlays an ontology and provides an alternate, summarization view of its contents. One kind of abstraction network is called an area taxonomy, and a variation of it is called a subtaxonomy. A methodology based on these taxonomies for more readily finding missing relationship errors is explored.
Keywords: Abstraction network; Error concentration; Missing relationship error; National Cancer Institute thesaurus (NCIt); Omission error; Ontology modeling; Ontology quality assurance; SNOMED CT; Taxonomy
Year: 2020 PMID: 33319709 PMCID: PMC7737264 DOI: 10.1186/s12911-020-01319-3
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Concept Cellular Process from NCIt shown in Protégé, including the subclass (IS-A) relationship to Biological Process, and the relationship (role) Biological Process Has Associated Location to Cell
Relationships in NCIt’s Biological Process hierarchy and their abbreviations
| Relationship | Abbreviated name |
|---|---|
Fig. 2 (a) Excerpt of 13 concepts from the NCIt’s Biological Process hierarchy. Upward arrows represent IS-A relationships. Concepts with the same set of relationships are enclosed in a common, colored area. E.g., Cancer Cell Growth Regulation and Morphogenesis have one relationship Part of Process. Areas with the same number of relationships have the same color. E.g., the area {Location} and the area {Part of Process} are green. Area roots, e.g., Cellular Process, have bold outlines. (b) Area taxonomy for (a), composed of five areas. Areas are represented by colored boxes labeled with their sets of relationships and numbers of concepts. They are organized in color-coded levels, according to number of relationships. The three concepts having the Location relationship are now represented by an area box named {Location}. Child-of links between areas are bold arrows; e.g., {Location, Part of Process} on Level 2 and {Location, Initiator BP, Part of Process} on Level 3 are child-of area {Location}
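The grouping described in the Fig. 2 caption can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names `build_areas` and `areas_by_relationship_count` are invented here, the two "Hypothetical Concept" entries are placeholders rather than NCIt content, and child-of links between areas are not modeled.

```python
from collections import defaultdict

# Toy input: concept -> set of relationship types it carries.
# The first three entries follow the Fig. 1 and Fig. 2 captions; the last two
# concept names are hypothetical placeholders, not NCIt content.
concept_relationships = {
    "Cellular Process": {"Location"},
    "Cancer Cell Growth Regulation": {"Part of Process"},
    "Morphogenesis": {"Part of Process"},
    "Hypothetical Concept A": {"Location", "Part of Process"},
    "Hypothetical Concept B": set(),
}


def build_areas(rels_by_concept):
    """Group concepts that share exactly the same set of relationship types."""
    areas = defaultdict(list)
    for concept, rels in rels_by_concept.items():
        areas[frozenset(rels)].append(concept)
    return dict(areas)


def areas_by_relationship_count(areas):
    """Organize areas into levels by the number of relationship types."""
    levels = defaultdict(list)
    for rel_set in areas:
        levels[len(rel_set)].append(rel_set)
    return dict(levels)


areas = build_areas(concept_relationships)
levels = areas_by_relationship_count(areas)

for level in sorted(levels):
    for rel_set in levels[level]:
        name = "{" + ", ".join(sorted(rel_set)) + "}"
        print(f"Level {level}: {name} ({len(areas[rel_set])} concepts)")
```

Using a `frozenset` as the dictionary key makes the area name independent of the order in which relationships are listed for a concept.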
Fig. 3 Complete area taxonomy for the NCIt’s Biological Process hierarchy. Most child-of’s have been omitted to avoid overload. Note how the importance of the relationship Location is reflected in the area taxonomy. Area {Location} has 207 concepts, and Location appears in 20 of 37 area names
Fig. 4 An excerpt of the subtaxonomy for the Eye/vision finding subhierarchy in SNOMED CT, presenting 48 areas out of 97 areas in the complete subtaxonomy
Fig. 5 Path of seven IS-As to the root in the NCIt Biological Process hierarchy
Missing relationship error distribution by level in the top area of NCIt’s BP hierarchy
| Level | # concepts | # concepts missing relationships | % of concepts missing relationships |
|---|---|---|---|
| 0 | 1 | 0 | 0 |
| 1 | 7 | 0 | 0 |
| 2 | 69 | 15 | 21.7 |
| 3 | 138 | 53 | 38.4 |
| 4 | 125 | 58 | 46.4 |
| 5 | 88 | 61 | 69.3 |
| 6 | 44 | 32 | 72.7 |
| 7 | 14 | 8 | 57.1 |
| 8 | 23 | 5 | 21.7 |
| 9 | 4 | 0 | 0 |
| Total | 513 | 232 | 45.2 |
Number of concepts in the NCIt’s BP top area reported missing relationship for each relationship type
| Relationship | # concepts missing relationship | # concepts confirmed by (SdC) |
|---|---|---|
| | 103 | 84 |
| | 1 | 0 |
| | 2 | 0 |
| | 1 | 1 |
| | 3 | 1 |
| | 20 | 10 |
| | 113 | 4 |
| Total | 232 | 99 |
Examples of concepts confirmed to have missing relationships in the NCIt’s BP top area for different relationships by (SdC)
| Relationship | Example confirmed concept missing relationship | Target of missing relationship |
|---|---|---|
Rejected examples of concepts missing relationships in the NCIt’s BP top area for different relationships by (SdC)
| Relationship | Reported example of concept missing relationship | Proposed target of missing relationship | Reason |
|---|---|---|---|
| | | | Not always true |
| | | | Not always true |
| | | | Secretion processes do not produce chemicals |
The 2 × 2 contingency table for the concept errors in NCIt’s Biological Process top area versus concepts from other areas of the area taxonomy
| | # erroneous concepts | # concepts w/o errors |
|---|---|---|
| Non-top areas | 13 | 87 |
| Top area | 232 | 281 |
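As a worked illustration of how a 2 × 2 table like this one can be analyzed, the sketch below computes the error rates, the odds ratio, and Pearson's chi-square statistic (without continuity correction) from the counts above. The choice of test is an assumption made here for illustration; this excerpt does not say which test the authors applied, and the function names are hypothetical.

```python
# Counts copied from the 2 x 2 table above: (erroneous, without errors).
non_top = (13, 87)
top = (232, 281)


def error_rate(row):
    """Fraction of concepts in the row that are erroneous."""
    erroneous, clean = row
    return erroneous / (erroneous + clean)


def odds_ratio(group, reference):
    """Odds of an error in `group` divided by the odds in `reference`."""
    a, b = group
    c, d = reference
    return (a * d) / (b * c)


def pearson_chi_square(row1, row2):
    """Pearson chi-square statistic for a 2 x 2 table, no continuity correction."""
    a, b = row1
    c, d = row2
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Standard critical value of the chi-square distribution (alpha = 0.05, 1 df).
CRITICAL_0_05_DF1 = 3.841

chi2 = pearson_chi_square(top, non_top)
print(f"Error rate, top area:      {error_rate(top):.3f}")
print(f"Error rate, non-top areas: {error_rate(non_top):.3f}")
print(f"Odds ratio (top vs non-top): {odds_ratio(top, non_top):.2f}")
print(f"Chi-square: {chi2:.2f}, exceeds critical value: {chi2 > CRITICAL_0_05_DF1}")
```

With small cell counts, an exact test such as Fisher's would be the more cautious choice; the closed-form statistic is used here only because it needs no external library.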
The 2 × 2 contingency table for concept errors between the lower-indexed-half levels and higher-indexed-half levels
| Level range | # erroneous concepts | # concepts w/o errors | Error percentage |
|---|---|---|---|
| 0–4 (lower-indexed-half) | 126 | 214 | 37.1 |
| 5–9 (higher-indexed-half) | 106 | 67 | 61.3 |
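The two rows above can be reproduced from the per-level distribution table for the top area of NCIt's BP hierarchy. The sketch below is only a consistency check under that assumption; the per-level counts are copied from that table, and the helper name `summarize` is hypothetical.

```python
# Per-level counts from the missing-relationship distribution table:
# level -> (# concepts, # concepts missing relationships)
per_level = {
    0: (1, 0),
    1: (7, 0),
    2: (69, 15),
    3: (138, 53),
    4: (125, 58),
    5: (88, 61),
    6: (44, 32),
    7: (14, 8),
    8: (23, 5),
    9: (4, 0),
}


def summarize(levels):
    """Return (# erroneous, # without errors, error percentage) for the levels."""
    total = sum(per_level[lv][0] for lv in levels)
    erroneous = sum(per_level[lv][1] for lv in levels)
    return erroneous, total - erroneous, round(100 * erroneous / total, 1)


lower_half = summarize(range(0, 5))   # levels 0-4
upper_half = summarize(range(5, 10))  # levels 5-9

print("Levels 0-4:", lower_half)
print("Levels 5-9:", upper_half)
```

Summing the per-level rows this way yields the same counts and percentages as the contingency table, which confirms that the two tables are consistent with each other.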
The QA study results on the SNOMED CT’s Eye/vision finding subhierarchy
| Level | # concepts | # audited concepts | % of concepts audited | # concepts missing relationships | % of concepts missing relationships |
|---|---|---|---|---|---|
| 0 | 1 | 0 | 0 | 0 | |
| 1 | 19 | 0 | 0 | 0 | |
| 2 | 58 | 0 | 0 | 0 | |
| 3 | 132 | 8 | 6.06 | 6 | 75 |
| 4 | 250 | 18 | 7.20 | 6 | 33.33 |
| 5 | 323 | 29 | 8.98 | 8 | 27.59 |
| 6 | 272 | 19 | 6.99 | 9 | 47.37 |
| 7 | 165 | 18 | 10.91 | 11 | 61.11 |
| 8 | 54 | 4 | 7.41 | 2 | 50 |
| 9 | 25 | 0 | 0 | 0 | |
| 10 | 2 | 0 | 0 | 0 | |
| Total | 1301 | 96 | 7.38 | 42 | 43.75 |
Five example concepts in the Eye/vision finding top area missing two relationships
| Concept | Level in the top area | Missing relationship type 1 | Target 1 | Missing relationship type 2 | Target 2 |
|---|---|---|---|---|---|
| Normal intraocular pressure | 3 | Interprets | Intraocular pressure | Has interpretation | Normal |
| Decreased red reflex | 3 | Interprets | Red reflex | Has interpretation | Decreased |
| Irregular tear film | 4 | Interprets | Ocular tear film observable | Has interpretation | Abnormal |
| Enophthalmos due to orbital tissue atrophy | 5 | Due to | Atrophy of soft tissue of orbit | Associated morphology | Posterior displacement |
| Impairment level: better eye: severe impairment: lesser eye: total impairment | 7 | Interprets | Visual function | Has interpretation | Impaired |
Affected descendants of the 68 non-leaf concepts missing relationships in the NCIt’s BP top area
| | # concepts | Total # descendants outside top area | # affected descendants |
|---|---|---|---|
| All descendants are in non-top areas | 5 | 15 | 5 |
| Some descendants are in top area | 23 | 102 | 50 |
| All descendants are in the top area | 40 | N/A | N/A |
| Total | 68 | 117 | 55 |
Top areas of 11 hierarchies in NCIt (15.02d release)
| Hierarchy | # concepts | # concepts in top area | % |
|---|---|---|---|
| | 10,633 | 10,087 | 94.9 |
| | 6747 | 1730 | 25.6 |
| | 1145 | 513 | 44.8 |
| | 3419 | 41 | 1.2 |
| | 12,409 | 8851 | 71.3 |
| | 25,360 | 14,347 | 56.6 |
| | 17,681 | 16,139 | 91.3 |
| | 1701 | 327 | 19.2 |
| | 8914 | 395 | 4.4 |
| | 5256 | 90 | 1.7 |
| | 1244 | 192 | 15.4 |
Top areas of eight hierarchies in SNOMED CT (2020-01-31 release)
| Hierarchy | # concepts | # concepts in top area | % |
|---|---|---|---|
| | 39,323 | 27,224 | 69.2 |
| | 114,397 | 6427 | 5.6 |
| | 3189 | 3006 | 94.3 |
| | 9144 | 8744 | 95.6 |
| | 22,244 | 418 | 1.9 |
| | 58,154 | 2628 | 4.7 |
| | 4739 | 61 | 1.3 |
| | 1702 | 34 | 2.0 |
Fig. 6 Revised area taxonomy for the NCIt BP hierarchy incorporating the confirmed corrections. Pink highlights the areas that are different from the original in Fig. 3
The 2 × 2 contingency table for erroneous concepts in the top area and non-top areas confirmed by (SdC)
| | # erroneous concepts | # concepts w/o errors | Total concepts in the study |
|---|---|---|---|
| Non-top areas | 10 | 90 | 100 |
| Top area | 99 | 414 | 513 |