
Categorizing the Relationships between Structurally Congruent Concepts from Pairs of Terminologies for Semantic Harmonization.

Zhe He, James Geller, Gai Elhanan.

Abstract

In this paper, we use "structurally congruent concepts" in pairs of terminologies to suggest methods for harmonizing the terminologies. Two concepts are structurally congruent if they are children of the same more general concept and parents of the same more specific concept in two different terminologies. We show that structurally congruent concepts can be interpreted in six useful ways, e.g., as new synonyms. All structurally congruent concepts were found for six terminologies from the UMLS, each paired with SNOMED CT. In total, 1384 concept pairs were discovered. A sample of 241 pairs was analyzed by a human expert. It was found that 59.3% indicated alternative classifications of the same general concept. This discovery allows an ontology designer to make existing, implicit knowledge explicit. Another 14.5% were newly discovered synonyms, 23.6% suggested the import of a concept into a terminology, and 2.5% indicated errors in a terminology.


Year:  2014        PMID: 25717400      PMCID: PMC4333698     

Source DB:  PubMed          Journal:  AMIA Jt Summits Transl Sci Proc


Introduction

Semantic interoperability is one of the big challenges in biomedical informatics. In order to enrich the semantics and coverage of a terminology and to facilitate the use of translational biomedical informatics in clinical and research applications, semantic harmonization efforts have recently been extended to various terminologies, e.g., SNOMED CT [1]. However, structural methodologies for semantic harmonization of terminologies have not been studied sufficiently. Weng et al. [2] discussed a conceptual design of a collaborative system for semantic harmonization. Three key design principles were defined: (1) reuse, (2) collaboration, (3) harmonization as modeling. The BRIDG model was presented as a user-centric semantic harmonization framework [3]. The harmonization in the BRIDG model is based on the concept definitions, attributes, and concept relationships. Because BRIDG participants are distributed across organizations and no implementation-specific information is provided, application-oriented users may find it hard to use this approach directly. Tao et al. have discussed the importance of ontology harmonization before using ontologies to annotate clinical data [4]. In this paper, we approach semantic harmonization by analyzing the relationships between structurally congruent concepts from pairs of terminologies in the UMLS. An outline of the implementation details for finding such structurally congruent pairs is provided. Auditing of terminologies may uncover problems such as omissions [5]. Previously, we have developed algorithmic and mixed human-computer auditing methods for the UMLS and some of its source terminologies [6, 7]. Auditing may also discover concepts that are synonymous in real life but are coded as different in the UMLS. Occasionally two terminologies in overlapping domains “cut the world at different joints,” which makes ontology alignment [8] and ontology integration difficult. In such a situation, the same conceptual knowledge may be classified in different (often orthogonal) ways. We call these “alternative classifications.” In this paper, we describe the use of structural congruency in pairs of terminologies to alert a human auditor to possible cases of harmonization and correction. Due to the importance of SNOMED CT (abbreviated as “SNOMED”), we focus on its concepts.

Background

SNOMED CT (Systematized Nomenclature of Medicine – Clinical Terms) [9-11] is considered to be of increasing importance in Medical Informatics. One reason for this status is related to government mandates of using Electronic Health Record (EHR) systems, meaningful use and incentive payments to physicians. By 2015, SNOMED will become the standard terminology for EHR encoding of diagnoses and problem lists [12]. SNOMED is to be used to “enable a user to electronically record, modify, and retrieve a patient’s problem list for longitudinal care (i.e., over multiple office visits).” Thus, in this paper, we focus on categorizing the relationships between structurally congruent concepts, one from SNOMED, the other from one of six reference terminologies. The Unified Medical Language System’s (UMLS) [13-16] Metathesaurus [17, 18] is an excellent source of pairs of terminologies with matched concepts. The 2012AB Metathesaurus contains more than 2.8 million concepts and 8.6 million unique concept names from about 160 source vocabularies [19]. SNOMED is also included in the UMLS. Previously, Bodenreider performed a study of redundant relations and similarity across families of terminologies and discussed the relationship between redundancy and semantic consistency [20]. Bodenreider observed [21] that it is the policy in the UMLS that ‘PAR’ represents an explicit parent-child relationship in a source, and ‘RB’ indicates an implied one (as interpreted by the UMLS editorial team). In this paper, we focus on explicit hierarchical relationships, thus only terminologies in the UMLS with ‘PAR’ links annotated with ‘IS_A’ relationship attributes were chosen. The current work is also marginally related to research on density and granularity of terminologies. Kumar et al. [22] lay out a comprehensive theory of granularity in the context of medical terminologies. Schulz et al. identify granularity-related problems with “cross-granularity integration” in the biomedical domain [23]. Rector et al.’s analysis provides logical formulations of important distinctions between density and related properties [24].

Methods

Our method is based on comparing two medical terminologies from the UMLS. We formally define the targets of our investigation as follows.

Definition: The concepts X (from Terminology 1) and Y (from Terminology 2) are called “structurally congruent” if:
(1) Concept X in Terminology 1 and concept Y in Terminology 2 have the same parent A, which appears in both terminologies.
(2) Concept X in Terminology 1 and concept Y in Terminology 2 have the same child B, which appears in both terminologies.
(3) The concept X does not appear anywhere in Terminology 2.
(4) The concept Y does not appear anywhere in Terminology 1.
(5) There is no synonymy relationship and no hierarchical relationship between X and Y (in the UMLS).

Figure 1 shows an abstract layout of two structurally congruent concepts to elucidate the above definition.
Figure 1.

An abstract layout of structurally congruent concepts
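
The definition above can be illustrated with a short sketch. It is not the PL/SQL implementation used in this study. It assumes that each terminology is given as a list of (child, parent) pairs over shared concept identifiers (such as UMLS CUIs), so that two synonymous terms already carry the same identifier. The function names and the `related` parameter, which stands for pairs that the UMLS already links hierarchically or as synonyms, are hypothetical.

```python
from collections import defaultdict


def children_map(edges):
    """Map each parent to the set of its direct children."""
    kids = defaultdict(set)
    for child, parent in edges:
        kids[parent].add(child)
    return kids


def concepts_of(edges):
    """All concepts that occur anywhere in a terminology's hierarchy."""
    return {concept for edge in edges for concept in edge}


def congruent_pairs(edges1, edges2, related=frozenset()):
    """Return (A, X, Y, B) tuples that satisfy the definition.

    edges1, edges2: iterables of (child, parent) pairs (assumed input format).
    related: (X, Y) pairs already linked by synonymy or hierarchy.
    """
    edges1, edges2 = list(edges1), list(edges2)
    kids1, kids2 = children_map(edges1), children_map(edges2)
    all1, all2 = concepts_of(edges1), concepts_of(edges2)
    found = set()
    for a in kids1.keys() & kids2.keys():      # shared parent A
        for x in kids1[a] - all2:              # X is absent from Terminology 2
            for y in kids2[a] - all1:          # Y is absent from Terminology 1
                if (x, y) in related or (y, x) in related:
                    continue
                # shared child B of X (Terminology 1) and Y (Terminology 2)
                for b in kids1.get(x, set()) & kids2.get(y, set()):
                    found.add((a, x, y, b))
    return found


t1 = [("X", "A"), ("B", "X")]
t2 = [("Y", "A"), ("B", "Y")]
print(congruent_pairs(t1, t2))  # {('A', 'X', 'Y', 'B')}
```

The toy input mirrors the abstract layout of Figure 1: A and B occur in both terminologies, while X and Y each occur in only one.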

It is hypothesized that there are six possible cases for how X and Y may relate to each other.
1) The concepts X and Y are alternative classifications. That means that concept A may be validly assigned X and Y as its children. However, these two assignments are indicative of two different ways of clustering the grandchildren of A. Furthermore, concept B may be correctly classified as a child of X and as a child of Y. However, Terminology 1 omits the classification by Y and Terminology 2 omits the classification by X.
2) It holds that B IS_A Y, Y IS_A X, and X IS_A A. In other words, Y may be inserted as a child of X into Terminology 1, thereby adding more detailed information to Terminology 1. Similarly, X may be inserted as a parent of Y into Terminology 2. Such insertions should only be done with approval of a subject matter expert.
3) It holds that B IS_A X, X IS_A Y, and Y IS_A A. This is the mirror case of Case 2) in that now X may be inserted as a child of Y into Terminology 2 and Y may be inserted as a parent of X into Terminology 1.
4) Concept X is a real world synonym of concept Y, which was previously not recognized by the UMLS editors.
5) There might be a structural error in Terminology 1, e.g., X is not really a child of A.
6) There might be a structural error in Terminology 2.
Every one of these six cases may be utilized in a human review, possibly leading to an improvement and harmonization of both terminologies.

To further probe the potential of this idea, we performed the following study. Six terminologies were selected from the 2012AB release of the UMLS to function as reference terminologies for SNOMED. (Note: It is a coincidence that there are six cases and six terminologies.) Only English-language terminologies using the “PAR” relationship annotated with “IS_A” relationship attributes were chosen. They are MEDCIN3_2012_07_16, National Cancer Institute Thesaurus (NCI2012_02D), Gene Ontology (GO2012_04_03), Medical Entities Dictionary (CPM2003), UMDNS: product category thesaurus (UMD2012) and Foundational Model of Anatomy Ontology (FMA3_1). Because the University of Washington Digital Anatomist (UWDA) consists of the Anatomy component and selected structural relationships of FMA, UWDA was excluded even though it also uses “PAR” relationships and “IS_A” relationship attributes. The algorithms were implemented in PL/SQL, the native programming language of the Oracle Relational Database Management System (RDBMS). The algorithms were used for finding all structurally congruent pairs of concepts, one taken from one of the six reference terminologies, the other one from the July 2012 version of SNOMED. The UMLS is well known to contain many cycles [21, 25], which were eliminated during processing.
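
The imports described in cases 2) and 3) amount to placing one concept between two others in a hierarchy. The following is a minimal sketch of that operation, assuming a terminology is held as a set of (child, parent) pairs. The function `import_child` and its `drop_redundant` option are hypothetical, and whether the former direct link should be removed is an assumption that this paper does not settle. Any such change would still require approval by a subject matter expert.

```python
def import_child(edges, upper, new, lower, drop_redundant=True):
    """Return a copy of `edges` with `new` placed between `upper` and `lower`.

    edges: set of (child, parent) pairs (assumed representation).
    """
    if (lower, upper) not in edges:
        raise ValueError("lower must be a direct child of upper")
    result = set(edges)
    result.add((new, upper))    # new IS_A upper
    result.add((lower, new))    # lower IS_A new
    if drop_redundant:
        # The direct link is now implied by the two-step path (assumption).
        result.discard((lower, upper))
    return result


# Case 2): B IS_A Y, Y IS_A X, and X IS_A A.
terminology1 = {("X", "A"), ("B", "X")}
terminology2 = {("Y", "A"), ("B", "Y")}

# Insert Y as a child of X into Terminology 1 ...
harmonized1 = import_child(terminology1, upper="X", new="Y", lower="B")
# ... and X as a parent of Y into Terminology 2.
harmonized2 = import_child(terminology2, upper="A", new="X", lower="Y")

print(harmonized1 == harmonized2)  # True
```

After both imports the two toy hierarchies contain the same chain B IS_A Y IS_A X IS_A A.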

Results

Table 1 shows the numbers of pairs of congruent concepts for the six reference terminologies relative to SNOMED and the sizes of the samples we randomly chose for human review. The third column shows the number of pairs of congruent concepts found by the program. For reference terminologies with over 100 pairs of congruent concepts, random samples of 70 were chosen for human review; for the others, all of the congruent concepts were reviewed. In total, we reviewed 241/1384 = 17.4% of all the congruent concept pairs discovered by the program.
Table 1.

Comparison of SNOMED CT with six reference terminologies

Reference Terminology   Size of Terminology   # of Pairs of Congruent Concepts   Sample Size
MEDCIN3_2012_07_16      279529                655                                70
NCI2012_02D             95523                 582                                70
FMA3_1                  82062                 116                                70
UMD2012                 15956                 18                                 18
GO2012_04_03            61925                 6                                  6
CPM2003                 3078                  7                                  7
Total                                         1384                               241

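
The sampling rule and the totals of Table 1 can be checked with a few lines. This is an illustrative sketch only: the pair counts are taken from Table 1, while the function names and the fixed random seed are assumptions and not part of the original study.

```python
import random

# Pairs of congruent concepts per reference terminology, as reported in Table 1.
PAIR_COUNTS = {
    "MEDCIN3_2012_07_16": 655,
    "NCI2012_02D": 582,
    "FMA3_1": 116,
    "UMD2012": 18,
    "GO2012_04_03": 6,
    "CPM2003": 7,
}


def sample_size(n_pairs, threshold=100, cap=70):
    """Review 70 random pairs if there are over 100 pairs, otherwise all of them."""
    return cap if n_pairs > threshold else n_pairs


def draw_sample(pairs, seed=0):
    """Pick the pairs for human review; the seed is an illustrative assumption."""
    pairs = list(pairs)
    return random.Random(seed).sample(pairs, sample_size(len(pairs)))


total_pairs = sum(PAIR_COUNTS.values())
total_sampled = sum(sample_size(n) for n in PAIR_COUNTS.values())
print(total_pairs, total_sampled, round(100 * total_sampled / total_pairs, 1))
# 1384 241 17.4
```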
The author GE, a medical informaticist and MD with many years of experience in auditing terminologies, reviewed the sample. Table 2 shows the results according to the six cases defined in the Methods section. The results show that 59.3% are alternative classifications. Another 14.9% + 8.7% = 23.6% fall into the category where the congruent concept in the reference terminology could be imported into SNOMED, and vice versa.
Table 2.

Review results by reference terminology

Reference Terminology   Sample Size   Alternative Classific.   Y IS_A X   X IS_A Y   Error in Trmgy 1   Error in Trmgy 2   Synonym
MEDCIN3_2012_07_16      70            44                       10         7          -                  1                  8
NCI2012_02D             70            38                       12         6          -                  3                  11
GO2012_04_03            6             2                        -          4          -                  -                  -
CPM2003                 7             5                        -          -          -                  -                  2
UMD2012                 18            9                        1          -          -                  -                  8
FMA3_1                  70            45                       13         4          2                  -                  6
Total                   241           143                      36         21         2                  4                  35
Percentage              100%          59.3%                    14.9%      8.7%       0.8%               1.7%               14.5%

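
The percentage row of Table 2 follows directly from the column totals. The sketch below recomputes it. The dictionary keys are spelled-out versions of the column headers and are chosen here for illustration only.

```python
# Column totals of Table 2 (reviewed sample of 241 pairs).
SAMPLE = 241
TOTALS = {
    "Alternative classification": 143,
    "Y IS_A X": 36,
    "X IS_A Y": 21,
    "Error in Terminology 1": 2,
    "Error in Terminology 2": 4,
    "Synonym": 35,
}

# Every reviewed pair falls into exactly one of the six cases.
assert sum(TOTALS.values()) == SAMPLE

percentages = {case: round(100 * n / SAMPLE, 1) for case, n in TOTALS.items()}
for case, pct in percentages.items():
    print(f"{case}: {pct}%")

# Combined shares, computed from the raw counts.
import_share = round(100 * (TOTALS["Y IS_A X"] + TOTALS["X IS_A Y"]) / SAMPLE, 1)
error_share = round(
    100 * (TOTALS["Error in Terminology 1"] + TOTALS["Error in Terminology 2"]) / SAMPLE, 1
)
```

Computed from the raw counts, the two import cases together (57 of 241) round to 23.7%; the 23.6% reported in the text is the sum of the two rounded column percentages. The two error columns together give 2.5%.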
Figure 2 shows an example where congruent concepts were identified as alternative classifications. Thus, Eleventh posterior intercostal vein in the FMA is a classification by cardinality, while in SNOMED Lower right posterior intercostal vein is a classification by position.
Figure 2.

An example of alternative classification

The discovery of alternative classifications is useful because it makes explicit the ontology designers’ implicit assumptions about how they view the world. This view could then be codified in the ontology. Figure 3 shows the utilization of the findings in Figure 2 by adding two new concepts (with labels shown in italics).
Figure 3.

An example of making explicit an implicit assumption of the ontology designers

Figure 4 shows a case where one congruent concept was deemed a parent of the other by the auditor. In this example, the congruent concept Finding by Site or System can be a parent of Finding by site, thus the congruent concept Finding by Site or System from FMA may be added as a parent of Finding by site in SNOMED, and vice versa, if this is desirable in the judgment of the owners of the FMA and/or SNOMED.
Figure 4.

An example of one structurally congruent concept being a parent of the other

The congruent concepts Chemical Viewed Structurally from CPM and Chemical categorized structurally from SNOMED were deemed by our auditor to be synonyms that had not been recognized before (Figure 5) and should be merged.
Figure 5.

An example of one middle concept being synonymous with the other

During the review of the sample, a few errors within terminologies emerged. The concept from SNOMED Artificial Implant was deemed incorrect by the auditor because it should not be considered as “artificial,” in the structure with A = Prosthesis, C0175649, Y = Artificial Implants, C0021113, and B = Blood Vessel Prosthesis, C0005846.

Discussion

The UMLS provides many concept pairs from different terminologies, where algorithmically made structural observations raise the question of how to harmonize those concepts. In this paper, we identified one such structural observation, “structurally congruent concepts,” and indicated the different ways in which such a congruency can be resolved. However, the semantic harmonization cannot be done without the consent of terminology curators. Moreover, modeling differences between terminologies make semantic harmonization difficult. For UMD2012 (Table 2), eight pairs of congruent concepts were found to be synonyms. For GO, more cases where one congruent concept is a potential parent of the other were found than alternative classifications. For our cases 2) and 3), relevant work in MIREOT [26] defines a set of guidelines for importing classes from external ontologies and proposes an automated mechanism and a minimal information standard for selectively importing classes into an ontology. However, it only supports OBO foundry ontologies (OWL format). In this paper, all the terminologies are in UMLS RRF format. Thus, the import guidelines introduced in MIREOT cannot be used here directly. A possible limitation of this work is that it uses SNOMED concepts and all reference terminology concepts in the formats in which they were provided by the UMLS. There may be differences between the original concept representation of SNOMED (or the reference terminologies) and the representation that is accessible through the UMLS.

Conclusions and Future Work

Six terminologies of the UMLS were compared with SNOMED with respect to structurally congruent concepts. In a sample study it was found that the majority of cases corresponded to alternative classifications (143 out of 241, corresponding to 59.3%). The second most common situation indicated the possibility of adding more detail to SNOMED CT or the reference terminologies (57 out of 241, corresponding to 23.6%). In 35 cases new synonyms were discovered, and six pairs of concepts indicated errors. As future work, we plan to conduct a study to analyze structurally congruent concepts between pairs of any two META terminologies with explicitly defined hierarchical relationships, i.e., not limited to SNOMED CT being Terminology 2. We are also planning a more extensive evaluation of the results. The work in this paper was limited to pairs of structurally congruent concepts. However, we have noticed cases of congruency that involve three, four and even more concepts. An analysis of these cases is under way.
  18 in total

1.  SNOMED clinical terms: overview of the development process and project status.

Authors:  M Q Stearns; C Price; K A Spackman; A Y Wang
Journal:  Proc AMIA Symp       Date:  2001

2.  The Unified Medical Language System (UMLS): integrating biomedical terminology.

Authors:  Olivier Bodenreider
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

3.  Strength in numbers: exploring redundancy in hierarchical relations across biomedical terminologies.

Authors:  Olivier Bodenreider
Journal:  AMIA Annu Symp Proc       Date:  2003

4.  Approaches to eliminating cycles in the UMLS Metathesaurus: naïve vs. formal.

Authors:  Fleur Mougin; Olivier Bodenreider
Journal:  AMIA Annu Symp Proc       Date:  2005

5.  A call for collaborative semantics harmonization.

Authors:  Chunhua Weng; Douglas B Fridsma
Journal:  AMIA Annu Symp Proc       Date:  2006

6.  User-centered semantic harmonization: a case study.

Authors:  Chunhua Weng; John H Gennari; Douglas B Fridsma
Journal:  J Biomed Inform       Date:  2007-03-21       Impact factor: 6.317

7.  The UMLS Metathesaurus: representing different views of biomedical concepts.

Authors:  P L Schuyler; W T Hole; M S Tuttle; D D Sherertz
Journal:  Bull Med Libr Assoc       Date:  1993-04

8.  The Unified Medical Language System.

Authors:  D A Lindberg; B L Humphreys; A T McCray
Journal:  Methods Inf Med       Date:  1993-08       Impact factor: 2.176

9.  Structural group-based auditing of missing hierarchical relationships in UMLS.

Authors:  Yan Chen; Huanying Helen Gu; Yehoshua Perl; James Geller
Journal:  J Biomed Inform       Date:  2008-08-20       Impact factor: 6.317

10.  Biomedical informatics and granularity.

Authors:  Anand Kumar; Barry Smith; Daniel D Novotny
Journal:  Comp Funct Genomics       Date:  2004