Pablo López-García, Stefan Schulz.
Abstract
Unprincipled modeling decisions in large-domain ontologies, such as SNOMED CT, are problematic and might act as a barrier for their quality assurance and successful use in electronic health records. Most previous work has focused on clustering problematic concepts, which is helpful for quality control but faces difficulties in pinpointing the origin of those modeling problems. In this study, we examined the underlying structural patterns in SNOMED CT's data model as such patterns directly reflect the modeling strategies of editors. Our results showed that 92% of all structural patterns found accumulated in the Procedure and Clinical finding sub-hierarchies, and pattern reuse was low; over 30% of patterns were only used once. A qualitative analysis of a sample of 50 such singleton patterns revealed modeling problems, including redundancy, omission, and inconsistency. The problems detected in the sample suggest that the analysis of structural patterns is a valuable technique for revealing problematic areas of SNOMED CT and modeling the styles of terminology editors. Furthermore, the patterns that describe the modeling of a large number of concepts could provide insights for template creation and refinement in SNOMED CT.
Year: 2016 PMID: 27812127 PMCID: PMC5094788 DOI: 10.1371/journal.pone.0165619
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Synopsis of SNOMED CT's relational release format with its underlying semantics in description logics (OWL-EL) and text for the concept “Neoplasm of kidney” in the stated version.
Throughout the paper, we use the OWL Manchester syntax [28], italics for concept (class) names and boldface for relationship names (SNOMED CT linkage concepts / OWL subClassOf and object properties).
| SNOMED CT triple | Corresponding OWL axiom | Textual description |
|---|---|---|
| | | Every kidney neoplasm is a neoplastic disease |
| | | Every kidney neoplasm is located at some anatomical site of the type kidney structure (i.e., kidney or any of its parts) |
| | | Every kidney neoplasm has an associated morphology of the type neoplasm (i.e., neoplasm or any of its descendants) |
Stated relationships and structural pattern for the SNOMED CT concept “Fine needle biopsy of kidney (procedure)” in the stated version.
See Table 3 for sub-hierarchy abbreviations.
| Stated Relationships | Structural Pattern |
|---|---|
Distribution of structural patterns, attributes, and concepts among SNOMED CT’s 18 top-level sub-hierarchies.
| Abbreviation | SNOMED CT Top-level Sub-hierarchy | Structural Patterns | Attributes | Concepts |
|---|---|---|---|---|
| PR | Procedure | 587 | 29 | 54 063 |
| CF | Clinical finding | 195 | 16 | 102 217 |
| SN | Specimen | 28 | 5 | 1 455 |
| SI | Situation with explicit context | 15 | 7 | 3 762 |
| EV | Event | 8 | 5 | 3 665 |
| PB | Pharmaceutical / biologic product | 6 | 2 | 16 965 |
| PO | Physical object | 3 | 2 | 14 288 |
| BS | Body structure | 2 | 2 | 30 774 |
| EG | Environment or geographical location | 1 | 0 | 1 814 |
| OE | Observable entity | 1 | 0 | 8 372 |
| OR | Organism | 1 | 0 | 32 912 |
| PF | Physical force | 1 | 0 | 170 |
| QV | Qualifier value | 1 | 0 | 9 363 |
| RA | Record artifact | 1 | 0 | 225 |
| SO | Social context | 1 | 0 | 4 710 |
| SP | Special concept | 1 | 0 | 648 |
| ST | Staging and scales | 1 | 0 | 1 325 |
| SU | Substance | 1 | 0 | 24 719 |
ᵃ Eleven concepts belonged to both the Pharmaceutical / biologic product and Physical object sub-hierarchies and shared two structural patterns: [Is-a-Pharmaceutical / biologic product, Is-a-Physical object] and [Is-a-Pharmaceutical / biologic product, Is-a-Physical object, Has active ingredient-Substance]. Therefore, the overall totals are smaller than the sums of the rows above (311 447 − 11 = 311 436 concepts, and 854 − 2 = 852 structural patterns).
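The footnote's arithmetic can be checked against the table rows. The following is a minimal sketch for illustration only; the names `DISTRIBUTION` and `corrected_totals` are introduced here and are not part of the original study, and the figures are copied from the table above.

```python
# Rows of the distribution table: abbreviation -> (structural patterns, concepts).
DISTRIBUTION = {
    "PR": (587, 54_063), "CF": (195, 102_217), "SN": (28, 1_455),
    "SI": (15, 3_762), "EV": (8, 3_665), "PB": (6, 16_965),
    "PO": (3, 14_288), "BS": (2, 30_774), "EG": (1, 1_814),
    "OE": (1, 8_372), "OR": (1, 32_912), "PF": (1, 170),
    "QV": (1, 9_363), "RA": (1, 225), "SO": (1, 4_710),
    "SP": (1, 648), "ST": (1, 1_325), "SU": (1, 24_719),
}

# Overlap reported in the footnote: items counted in both PB and PO.
SHARED_PATTERNS = 2
SHARED_CONCEPTS = 11


def corrected_totals(rows, shared_patterns, shared_concepts):
    """Sum the per-row counts and subtract what was counted twice."""
    raw_patterns = sum(patterns for patterns, _ in rows.values())
    raw_concepts = sum(concepts for _, concepts in rows.values())
    return raw_patterns - shared_patterns, raw_concepts - shared_concepts


total_patterns, total_concepts = corrected_totals(
    DISTRIBUTION, SHARED_PATTERNS, SHARED_CONCEPTS
)
print(total_patterns, total_concepts)  # 852 311436
```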
Fig 1. Steps followed to tag SNOMED CT concepts by top-level concept, depth, and structural pattern.
Note that some SNOMED CT concepts belonged to several sub-hierarchies, as shown in this example (Thrombin embedded bandage, ID 412025006), and needed to be tagged twice. In SNOMED CT’s stated relationships file, sourceId denotes the concept to be defined, destinationId the target concept, and typeId the relationship, i.e., [sourceId, typeId, destinationId] triples using our notation. See Table 3 for abbreviations.
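The tagging step described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the stated relationships are already loaded as (sourceId, typeId, destinationId) triples, that hypothetical lookup tables `attribute_name` and `top_level` map each typeId to its attribute name and each target concept to one top-level sub-hierarchy abbreviation, and that a pattern is the sorted set of attribute/target-sub-hierarchy pairs. Role groups are ignored, and all identifiers in the example are placeholders rather than real SNOMED CT IDs.

```python
from collections import defaultdict


def structural_patterns(triples, attribute_name, top_level):
    """Derive one pattern per defined concept from stated triples.

    Assumption: a pattern is the sorted set of
    (attribute name, top-level sub-hierarchy of the target) pairs.
    """
    pairs = defaultdict(set)
    for source_id, type_id, destination_id in triples:
        pairs[source_id].add(
            (attribute_name[type_id], top_level[destination_id])
        )
    return {cid: tuple(sorted(p)) for cid, p in pairs.items()}


# Placeholder data (hypothetical IDs, for illustration only).
triples = [
    ("c1", "isa", "p1"), ("c1", "method", "q1"), ("c1", "site", "b1"),
    ("c2", "site", "b2"), ("c2", "isa", "p2"), ("c2", "method", "q2"),
]
attribute_name = {"isa": "Is-a", "method": "Method", "site": "Procedure site"}
top_level = {"p1": "PR", "p2": "PR", "q1": "QV", "q2": "QV",
             "b1": "BS", "b2": "BS"}

patterns = structural_patterns(triples, attribute_name, top_level)
for concept_id, pattern in patterns.items():
    print(concept_id, [f"{attr}-{target}" for attr, target in pattern])
```

Both placeholder concepts end up with the same pattern even though their triples arrive in a different order, which is the property needed to count how many concepts share a pattern.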
Fig 2. Distribution of structural patterns for the most populated top-level sub-hierarchies.
The x-axis lists the patterns found; the y-axis shows the number of concepts corresponding to each pattern (on a logarithmic scale). See Table 3 for abbreviations.
Fig 3. Absolute accumulation of patterns (p) per sub-hierarchy and depth.
Orange and red indicate an accumulation of more than 10 patterns. The scale is logarithmic.
Fig 4. Concepts (in blue) and patterns (in red) by depth in SNOMED CT top-level sub-hierarchies.
See Table 3 for abbreviations.
Fig 5. Relative usage of patterns per sub-hierarchy and depth (c/p).
Reds indicate that, on average, each pattern is used by fewer than 100 concepts. The scale is logarithmic. See Table 3 for abbreviations.
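As a rough illustration of the c/p ratio (concepts per pattern), the sketch below applies it to whole sub-hierarchies using the totals from the distribution table. The per-depth counts behind the figure are not reproduced here, so this is a coarser view than the one plotted; the names `COUNTS` and `concepts_per_pattern` are introduced only for this example.

```python
# abbreviation -> (structural patterns, concepts), copied from the distribution table.
COUNTS = {
    "PR": (587, 54_063),
    "CF": (195, 102_217),
    "SN": (28, 1_455),
    "SI": (15, 3_762),
    "EV": (8, 3_665),
}

# Threshold named in the caption: fewer than 100 concepts per pattern.
LOW_REUSE_THRESHOLD = 100


def concepts_per_pattern(patterns, concepts):
    """Average number of concepts that share one pattern (c/p)."""
    return concepts / patterns


ratios = {abbr: concepts_per_pattern(p, c) for abbr, (p, c) in COUNTS.items()}
for abbr, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    note = "below threshold" if ratio < LOW_REUSE_THRESHOLD else ""
    print(f"{abbr}: {ratio:7.1f} {note}")
```

At this level of aggregation only Specimen and Procedure fall below the threshold; Clinical finding averages several hundred concepts per pattern.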
Fig 6. Accumulation of singleton patterns per sub-hierarchy and depth.
Red indicates that over 50 patterns are accumulated. See Table 3 for abbreviations.
Distribution of singleton patterns and assessed stratified sample of 50 concepts.
| SNOMED CT Top-level Sub-hierarchy | Depth | Singleton Patterns | Sample |
|---|---|---|---|
| Clinical finding (CF) | 2 | 3 | 1 |
| | 3 | 4 | 2 |
| | 4 | 6 | 2 |
| | 5 | 10 | 4 |
| | 6 | 14 | 5 |
| | 7 | 11 | 4 |
| | 8 | 2 | 1 |
| | 9 | 2 | 1 |
| | 10 | 3 | 1 |
| Event (EV) | 2 | 2 | 2 |
| Procedure (PR) | 1 | 2 | 1 |
| | 2 | 2 | 1 |
| | 3 | 25 | 2 |
| | 4 | 40 | 4 |
| | 5 | 54 | 5 |
| | 6 | 34 | 3 |
| | 7 | 26 | 2 |
| | 8 | 5 | 1 |
| | 9 | 2 | 1 |
| | 10 | 2 | 1 |
| Situation with explicit context (SI) | 1 | 1 | 1 |
| | 2 | 2 | 1 |
| | 3 | 3 | 1 |
| Specimen (SN) | 1 | 2 | 1 |
| | 2 | 3 | 1 |
| | 3 | 1 | 1 |
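The rows above can be summed to check them against the figures quoted in the abstract (a sample of 50, and over 30% of patterns used only once). This is a small sketch for illustration; `ROWS` is a name introduced here, and the total of 852 patterns is the corrected figure from the footnote of the distribution table.

```python
# (sub-hierarchy, depth, singleton patterns, sampled concepts), from the table above.
ROWS = [
    ("CF", 2, 3, 1), ("CF", 3, 4, 2), ("CF", 4, 6, 2), ("CF", 5, 10, 4),
    ("CF", 6, 14, 5), ("CF", 7, 11, 4), ("CF", 8, 2, 1), ("CF", 9, 2, 1),
    ("CF", 10, 3, 1),
    ("EV", 2, 2, 2),
    ("PR", 1, 2, 1), ("PR", 2, 2, 1), ("PR", 3, 25, 2), ("PR", 4, 40, 4),
    ("PR", 5, 54, 5), ("PR", 6, 34, 3), ("PR", 7, 26, 2), ("PR", 8, 5, 1),
    ("PR", 9, 2, 1), ("PR", 10, 2, 1),
    ("SI", 1, 1, 1), ("SI", 2, 2, 1), ("SI", 3, 3, 1),
    ("SN", 1, 2, 1), ("SN", 2, 3, 1), ("SN", 3, 1, 1),
]

TOTAL_PATTERNS = 852  # corrected total of structural patterns

singleton_total = sum(singletons for _, _, singletons, _ in ROWS)
sample_total = sum(sampled for _, _, _, sampled in ROWS)
singleton_share = singleton_total / TOTAL_PATTERNS

print(singleton_total, sample_total, f"{singleton_share:.1%}")  # 261 50 30.6%
```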
Stated relationships and role groups for the SNOMED CT concept Ossiculectomy with tympanoplasty revision (procedure).
| Attribute | Role group | Target concept with semantic tag |
|---|---|---|
| | 0 | |
| | 0 | |
| | 1 | |
| | 1 | |
| | 2 | |
| | 2 | |
| | 2 | |
An alternative, more parsimonious model for representing Ossiculectomy with tympanoplasty revision (procedure).
| Attribute | Role group | Target concept with semantic tag |
|---|---|---|
| | 0 | |
| | 0 | |
Catecholamines, fractionation measurement, urine (procedure): a Procedure concept definition that lacks a Specimen but specifies a measurement method not reflected in the name.
| Attribute | Role group | Target concept with semantic tag |
|---|---|---|
| | 0 | |
| | 0 | |
| | 0 | |
| | 0 | |
Example of different patterns for obviously similar concepts in the Events sub-hierarchy.
| Concept defined | Attribute | Role group | Target concept with semantic tag |
|---|---|---|---|
| | | 0 | |
| | | 0 | |
| | | 0 | |
| | | 0 | |
| | | 0 | |
| | | 0 | |
| | | 0 | |
Disease, combined with the qualifier Abnormal, as a modeling idiosyncrasy in Accessory ossification center (disorder).
| Attribute | Role group | Target concept with semantic tag |
|---|---|---|
| | 0 | |
| | 1 | |
| | 1 | |
| | 1 | |
| | 2 | |
| | 2 | |
Lymphadenopathy due to congenital toxoplasmosis (disorder): a complex concept without direct reference to the underlying etiology (Congenital toxoplasmosis).
| Attribute | Role group | Target concept with semantic tag |
|---|---|---|
| | 0 | |
| | 1 | |
| | 1 | |
| | 1 | |
| | 1 | |
Adverse effect of radiation therapy (disorder): example of uniqueness, due to non-consequential modeling of concepts that include a reference to their etiology in their definition but not in their formal representation.
| Attribute | Role group | Target concept with semantic tag |
|---|---|---|
| | 0 | |
| | 0 | |
| | 0 | |
Procedure with explicit context (situation): a high-level singleton pattern.
| Attribute | Role group | Target concept with semantic tag |
|---|---|---|
| | 0 | |
| | 0 | |
| | 0 | |
• Body structure:
(1) [
(2) [
Interestingly, no patterns with the
3. The Physical object sub-hierarchy was the only one showing three patterns:
• Physical object:
(1) [
(2) [
(3) [
4. Procedure (PR, 587) and Clinical finding (CF, 195) alone accounted for 92% of the total patterns found. There were a small number of patterns among Specimen (SN, 28), Situation with explicit context (SI, 15), Event (EV, 8), and Pharmaceutical / biologic product (PB, 6).
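The 92% figure follows from the pattern counts quoted in this item. The sketch below is illustrative only; it assumes the corrected total of 852 structural patterns given in the footnote of the distribution table, and the variable names are introduced here.

```python
# Pattern counts quoted in the text, by sub-hierarchy abbreviation.
PATTERN_COUNTS = {"PR": 587, "CF": 195, "SN": 28, "SI": 15, "EV": 8, "PB": 6}

# Assumption: corrected overall total from the distribution-table footnote.
TOTAL_PATTERNS = 852

# Share held by Procedure and Clinical finding together.
dominant = PATTERN_COUNTS["PR"] + PATTERN_COUNTS["CF"]
dominant_share = dominant / TOTAL_PATTERNS

# Share held by the four smaller sub-hierarchies named in the text.
minor = sum(PATTERN_COUNTS[k] for k in ("SN", "SI", "EV", "PB"))
minor_share = minor / TOTAL_PATTERNS

print(f"PR + CF: {dominant} patterns, {dominant_share:.0%} of the total")
print(f"SN + SI + EV + PB: {minor} patterns, {minor_share:.1%} of the total")
```

Using the uncorrected sum of 854 instead of 852 changes the first share only in the first decimal place, so it still rounds to 92%.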