Ling Zheng, Zhe He, Duo Wei, Vipina Keloth, Jung-Wei Fan, Luke Lindemann, Xinxin Zhu, James J Cimino, Yehoshua Perl.
Abstract
OBJECTIVE: The study sought to describe the literature related to the development of methods for auditing the Unified Medical Language System (UMLS), with particular attention to identifying errors and inconsistencies of attributes of the concepts in the UMLS Metathesaurus.
Keywords: auditing; quality assurance; review; unified medical language system
Year: 2020 PMID: 32766692 PMCID: PMC7566540 DOI: 10.1093/jamia/ocaa108
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
Selection criteria for article inclusion and exclusion
| Type | Criteria | Rationale |
|---|---|---|
| Inclusion criteria | Methods for finding errors or inconsistencies in aspects of UMLS concepts | Errors and inconsistencies of concept names, synonyms, ST assignments, hierarchical (IS-A) relationships, and lateral relationships. |
| | UMLS auditing tools, surveys, and auditors’ performance | Owing to their relevance to the auditing process. |
| | Auditing observed during the integration of sources into the UMLS | The review is limited to this side effect of the integration. |
| | Topological pattern techniques and alignment techniques for enhancing UMLS sources | Although their purpose is to enhance UMLS sources, these techniques make major use of the UMLS, and the enhancements indirectly lead to modifications of the UMLS. In addition, these techniques identify missing synonyms for UMLS concepts as a byproduct. |
| Exclusion criteria | Coverage of the UMLS | Assessing the coverage of UMLS concepts is not relevant to QA. |
| | Applications of the UMLS | Applications of the UMLS, such as information retrieval or natural language processing, are not relevant to QA. |
| | Auditing of sources of the UMLS | Auditing the sources is not relevant to UMLS QA. |
| | Integration of sources into the UMLS | Integration of the sources into the UMLS is not relevant if no auditing of the UMLS is observed. |
| | Refinements, extensions, or summarization networks of the UMLS SN | Refinement, extension, partition, and summarization of the UMLS SN do not focus on QA of UMLS concepts. |
| | Not related to UMLS (eg, UML) | Some articles irrelevant to the UMLS (eg, on the Unified Modeling Language) were retrieved by the PubMed search. |
| | General UMLS article not relevant to QA | Some general UMLS development articles were retrieved by the PubMed search. |
| | Not an article | Conference abstracts are excluded. |
QA: quality assurance; SN: Semantic Network; ST: semantic type; UML: Unified Modeling Language; UMLS: Unified Medical Language System
Figure 1. The Unified Medical Language System (UMLS) Metathesaurus Browser user interface, displaying information for the concept Bipolar Disorder. The focus concept Bipolar Disorder appears at the top of the right box, followed by its semantic type and 83 synonyms from different sources (only 36 of which fit on the screen). Relationships (including hierarchical, lateral, and qualifiers) between Bipolar Disorder and 1691 (not necessarily distinct) target concepts are listed below the synonyms (shown to the right of the synonyms in this figure). For each related concept, the list gives the relation, relationship attribute, source terminology, term name in the source terminology, and concept unique identifier (CUI). For example, Mood Disorders appears 6 times, each time mapped to the same CUI, because this relationship is found in 6 source terminologies. The screenshot was taken on January 31, 2020, using UMLS version 2019AB.
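The repetition of Mood Disorders in Figure 1 occurs because the browser lists one relationship row per source terminology. The following minimal sketch collapses such rows by target CUI to recover how many sources assert each link. It assumes the relationship rows are available as (relation, source, target name, target CUI) tuples. The relation labels PAR (parent) and RO (other relationship) follow the UMLS MRREL conventions. The source abbreviations, CUIs, and the helper `sources_per_target` are hypothetical.

```python
from collections import defaultdict

# Hypothetical relationship rows: (relation, source, target name in source, target CUI).
# Source abbreviations and CUIs are placeholders, not real UMLS data.
rows = [
    ("PAR", "SRC_A", "Mood Disorders", "C_MOOD"),
    ("PAR", "SRC_B", "Mood disorder", "C_MOOD"),
    ("PAR", "SRC_C", "Affective disorder", "C_MOOD"),
    ("RO", "SRC_A", "Lithium", "C_LITHIUM"),
]

def sources_per_target(rows):
    """Map each target CUI to the set of source terminologies that link to it."""
    by_cui = defaultdict(set)
    for _rel, source, _name, cui in rows:
        by_cui[cui].add(source)
    return dict(by_cui)

# Number of distinct sources asserting a relationship to each target concept.
counts = {cui: len(srcs) for cui, srcs in sources_per_target(rows).items()}
print(counts)  # {'C_MOOD': 3, 'C_LITHIUM': 1}
```

Counting distinct sources, rather than raw rows, keeps repeated rows such as the 6 Mood Disorders entries from being read as separate relationships.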
Figure 2. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow chart for identifying the articles to be included in this review.
Definitions of the characteristics of the auditing techniques
| Characteristics of the auditing technique | Types | Definitions |
|---|---|---|
| Automation level | Automated systematic | Automated systematic methods are implemented as rule-checking programs or algorithms that automatically identify potential errors and inconsistencies in the terminology. |
| | Automated heuristic | Automated heuristic methods are based on rules that make inferences about terminology content and seek to identify those inferences to find likely errors and inconsistencies in the terminology. |
| | Manual | Manual review relies on a terminology reviewer (often a domain expert) to audit one or more aspects of a terminology manually, with or without the support of a computerized user interface. |
| Knowledge source | Intrinsic knowledge | Intrinsic knowledge is information derived from the classification scheme, hierarchy, relationships, or other attributes present within the terminology itself. |
| | Extrinsic knowledge | Extrinsic knowledge is derived from an outside source, such as other terminologies or human expert knowledge. |
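The toy sketch below illustrates the difference between the two automated levels, using hypothetical concept records. The systematic check enforces a hard rule: every concept must carry at least one semantic type. The heuristic check infers an expectation from the concept name and flags mismatches as likely, not certain, errors. Both rules, the record layout, and the keyword are illustrative assumptions. They are not methods taken from the reviewed studies.

```python
# Hypothetical concept records: CUI -> (preferred name, set of semantic types).
concepts = {
    "C1": ("Neoplasms, Experimental", {"Neoplastic Process", "Experimental Model of Disease"}),
    "C2": ("Lung neoplasm", {"Disease or Syndrome"}),
    "C3": ("Orphan concept", set()),
}

def systematic_check(concepts):
    """Hard rule: every concept must have at least one semantic type."""
    return [cui for cui, (_name, sts) in concepts.items() if not sts]

def heuristic_check(concepts, keyword="neoplasm", expected_st="Neoplastic Process"):
    """Inference: a name mentioning `keyword` suggests `expected_st`; flag if it is absent."""
    return [
        cui
        for cui, (name, sts) in concepts.items()
        if keyword in name.lower() and expected_st not in sts
    ]

print(systematic_check(concepts))  # ['C3']
print(heuristic_check(concepts))   # ['C2']
```

The heuristic output is a candidate list for review. An expert may still judge C2 correct, which is what separates a heuristic flag from a rule violation.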
Figure 3. The 2 levels of the Unified Medical Language System. The Semantic Network (SN) level shows the semantic types Neoplastic Process (NP) and Experimental Model of Disease (EMD), with the intersection semantic type (IST) NP∩EMD between them. The Metathesaurus level shows concepts assigned the intersection semantic type and concepts assigned only one of the 2 pure semantic types, colored to match their assigned semantic types. For example, the concept Neoplasms, Experimental (as its name suggests) is assigned both semantic types.
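In Figure 3, each concept falls into the region that matches its exact set of semantic types. The following minimal sketch computes that partition. It assumes concepts are given as a mapping from CUI to semantic type names and groups them by their exact semantic type combination. The `max_size` cut-off for flagging small intersection groups is a hypothetical parameter. It reflects the assumption that rarely used semantic type combinations deserve closer review.

```python
from collections import defaultdict

# Hypothetical concepts with their assigned semantic types (STs).
concepts = {
    "C1": {"Neoplastic Process"},
    "C2": {"Neoplastic Process"},
    "C3": {"Experimental Model of Disease"},
    "C4": {"Neoplastic Process", "Experimental Model of Disease"},
}

def group_by_st_combination(concepts):
    """Partition concepts by their exact ST set; sets with 2+ STs are intersection STs."""
    groups = defaultdict(list)
    for cui, sts in concepts.items():
        groups[frozenset(sts)].append(cui)
    return groups

def small_intersections(groups, max_size=1):
    """Return intersection-ST groups with at most max_size concepts, keyed by a readable label."""
    return {
        " ∩ ".join(sorted(sts)): cuis
        for sts, cuis in groups.items()
        if len(sts) > 1 and len(cuis) <= max_size
    }

print(small_intersections(group_by_st_combination(concepts)))
# {'Experimental Model of Disease ∩ Neoplastic Process': ['C4']}
```

Keying groups by `frozenset` makes the partition independent of the order in which semantic types were assigned.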
Figure 4. The Neighborhood Auditing Tool (NAT) interface and the corresponding “neighborhood” network. (A) A screenshot of the NAT for the concept Bipolar Disorder (as in Figure 1 for the Unified Medical Language System [UMLS] interface). The focus concept is shown in the central box, its parents and grandparents in the top box (with indentation), and its children (and grandchildren [not displayed]) in the bottom box. The synonyms are to the left, and the relationships (or siblings) are to the right. For each concept on the screen, the semantic type is shown in blue, the UMLS sources in green, and the concept unique identifier in red. Each box holds more concepts than fit in it, so the boxes are scrollable. This 2011 screenshot is of interest because it is rich enough to display a forbidden cycle of 3 concepts: Mood Disorders (the top child) → Bipolar Disorder (the focus concept) → Affective Disorders, Psychotic (a second parent, third line from the bottom) → Mood Disorders (the sixth grandparent), closing the cycle. This error was reported to the UMLS team, and the cycle no longer exists for Bipolar Disorder in UMLS version 2019AB (Figure 1). (B) Excerpt of the neighborhood of Bipolar Disorder: the boxes highlighted in yellow show the cycle of 3 concepts. The light blue rectangles correspond to the various windows in panel (A).
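The error in Figure 4 can be caught by an automated systematic check: cycle detection over the IS-A hierarchy, since no concept may be its own ancestor. The following minimal sketch runs a depth-first search over a hypothetical child-to-parents mapping and marks the concepts on the current path. Reaching such a concept again closes a cycle. The three edges below reproduce the cycle from the 2011 screenshot. They are not a dump of UMLS data, and the function name `find_cycle` is illustrative.

```python
# Hypothetical IS-A edges (child -> list of parents) reproducing the cycle in Figure 4.
parents = {
    "Mood Disorders": ["Bipolar Disorder"],
    "Bipolar Disorder": ["Affective Disorders, Psychotic"],
    "Affective Disorders, Psychotic": ["Mood Disorders"],
}

def find_cycle(parents):
    """Return one IS-A cycle (first concept repeated at the end), or None if acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unseen / on current path / fully explored
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for parent in parents.get(node, []):
            state = color.get(parent, WHITE)
            if state == GRAY:  # parent is already on the path: a cycle is closed
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                cycle = visit(parent)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in list(parents):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None

print(find_cycle(parents))
# ['Mood Disorders', 'Bipolar Disorder', 'Affective Disorders, Psychotic', 'Mood Disorders']
```

Recursion keeps the sketch short. For a hierarchy of Metathesaurus size, an explicit stack would avoid Python's default recursion limit.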
A 3-dimensional table categorizing studies by the aspects audited, the automation level and the kind of knowledge used
| Aspect audited | Number of references |
|---|---|
| Concepts, concept names, and synonyms | 37 |
| Semantic type assignment | 36 |
| Hierarchical relationships | 24 |
| Lateral relationships | 12 |

Studies by knowledge source: intrinsic knowledge (n = 23), extrinsic knowledge (n = 5), and intrinsic and extrinsic knowledge (n = 55). Within each aspect, studies are further categorized by automation level.
Direct access to categorization data
| ASPE | AT | KNW |
|---|---|---|
| STA, HREL | AH | IEK |
| HREL, LREL | AH | IK |
| STA | AH | IEK |
| CCNS, STA, HREL, REL | AH, AS, AS, AS | IK, IK, IK, IK |
| STA | AH | IEK |
| STA | AH | IEK |
| HREL, LREL | AH | IK |
| STA | AH | IEK |
| STA | AS | EK |
| STA | AH | IEK |
| STA | AH | IEK |
| STA | AS | IEK |
| STA | AH | IEK |
| HREL | AS | IK |
| STA | MN | EK |
| STA | AH | IEK |
| STA | AH | IEK |
| HREL | AS | IK |
| STA | AS | IK |
| STA | AH | IEK |
| HREL | AS | IK |
| STA | AS | IEK |
| STA | AH | IEK |
| HREL | AS | IK |
| CCNS | AH | IEK |
| STA | AS | IK |
| HREL | AH | IEK |
| CCNS | AH | IEK |
| STA | AH | IEK |
| HREL, LREL | AH | IEK |
| CCNS | AH | IEK |
| CCNS | AS | IEK |
| LREL, STA | AH | IK |
| HREL, LREL | AS | IK |
| CCNS, HREL | AH | IK |
| CCNS | AH | IEK |
| STA, CCNS, HREL | AS, AH, AS | IK, IK, IK |
| CCNS | AS | EK |
| CCNS | AH | IEK |
| CCNS | AH | IK |
| CCNS | MN | IEK |
| CCNS | AH | IEK |
| CCNS | AH | IK |
| STA | AH | IEK |
| CCNS | AH | IEK |
| CCNS | AS | EK |
| ALL | AS | IK |
| CCNS | AH | IEK |
| HREL, LREL, CCNS | AS | IEK |
| HREL | AS | IK |
| HREL, LREL | AH | IEK |
| HREL | AS | IK |
| ALL | AS | IK |
| STA | AH | IEK |
| HREL | AS | IK |
| STA | AH | IEK |
| CCNS | AH | IEK |
| HREL | AH | IK |
| STA | AH | IEK |
| CCNS | AH | IEK |
| HREL, LREL, STA | AH | IEK |
| STA | AS | IK |
| CCNS | AH | IEK |
| HREL, LREL, CCNS | AH | IEK |
| STA | AH | IEK |
| CCNS | AS | IEK |
| STA | AH | IEK |
| STA | MN | IEK |
This table enables direct access to the categorization properties for each study.
AH: automated heuristic; AS: automated systematic; ASPE: aspect; AT: automation level; CCNS: concepts, concept names, and synonyms; EK: extrinsic knowledge; HREL: hierarchical relationships; IEK: intrinsic and extrinsic knowledge; IK: intrinsic knowledge; KNW: knowledge source; LREL: lateral relationships; MN: manual; Ref: reference; STA: semantic type assignment.
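A study can audit several aspects, so the per-aspect reference counts in the 3-dimensional table (37 + 36 + 24 + 12 = 109) exceed the number of studies (23 + 5 + 55 = 83). The following small tally over four rows taken from the categorization table illustrates this. The list-of-tuples layout is an assumption for illustration. Entries coded ALL would need to be expanded into every aspect before counting.

```python
from collections import Counter

# Four rows from the categorization table: (aspects, automation level, knowledge source).
rows = [
    (["STA", "HREL"], "AH", "IEK"),
    (["HREL", "LREL"], "AH", "IK"),
    (["STA"], "AS", "EK"),
    (["CCNS"], "MN", "IEK"),
]

# Each study contributes once per aspect it audits, but only once per knowledge source.
aspect_counts = Counter(aspect for aspects, _at, _knw in rows for aspect in aspects)
knowledge_counts = Counter(knw for _aspects, _at, knw in rows)

print(len(rows), sum(aspect_counts.values()))  # 4 6
print(dict(knowledge_counts))  # {'IEK': 2, 'IK': 1, 'EK': 1}
```

The knowledge-source tally sums to the number of studies, while the aspect tally does not. This is the same reason Figure 5 notes that the yearly total may be less than the sum over aspects.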
Figure 5. Publication trend over time. The number of publications on Unified Medical Language System (UMLS) auditing between 1998 and 2019, stratified by the aspect of a UMLS concept that was audited. Note that an article may audit several aspects of a concept, so the total may be less than the sum over all aspects. Overall, there were 2 surges of publications, in 2007 and 2009, with 11 and 12 articles, respectively. These were possibly due to National Library of Medicine funding support for UMLS quality assurance in 2005-2009 and to the first special issue on terminology auditing in 2009. Apart from those 2 years, there were on average about 3 publications a year. During 2010-2012, publications resulting from this funding continued to appear. In the last 7 years, interest in UMLS quality assurance has declined, with an average of 2.4 articles per year. For example, the second special issue on terminology auditing, in 2018, did not include any UMLS articles. In 2007, most articles focused on concept names and synonyms and on semantic type assignments (STAs). In 2009, most articles audited STAs, while in 2010, most focused on concept names and synonyms. The number of articles auditing relationships was consistently low, but more articles audited hierarchical relationships (HREL) than lateral relationships (LREL). CCNS: concepts, concept names, and synonyms.
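The "about 3 publications a year" figure can be checked against the other counts in this section. This check assumes two things: that the 83 studies categorized by knowledge source (23 + 5 + 55) are the full set plotted in Figure 5, and that the 1998-2019 window spans 22 calendar years. Removing the 2 surge years and their 11 + 12 articles gives:

$$
N = 23 + 5 + 55 = 83, \qquad \frac{N - 11 - 12}{22 - 2} = \frac{60}{20} = 3 \ \text{articles per year.}
$$

Under the same assumptions, an average of 2.4 articles per year over the last 7 years corresponds to roughly $2.4 \times 7 \approx 17$ articles.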