
A review of auditing techniques for the Unified Medical Language System.

Ling Zheng¹, Zhe He², Duo Wei³, Vipina Keloth⁴, Jung-Wei Fan⁵, Luke Lindemann⁶, Xinxin Zhu⁶, James J Cimino⁷, Yehoshua Perl⁴.

Abstract

OBJECTIVE: The study sought to describe the literature related to the development of methods for auditing the Unified Medical Language System (UMLS), with particular attention to identifying errors and inconsistencies of attributes of the concepts in the UMLS Metathesaurus.
MATERIALS AND METHODS: We applied the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) approach by searching the MEDLINE database and Google Scholar for studies referencing the UMLS and any of several terms related to auditing, error detection, and quality assurance. A qualitative analysis and summarization of articles that met inclusion criteria were performed.
RESULTS: Eighty-three studies were reviewed in detail. We first categorized techniques based on various aspects including concepts, concept names, and synonymy (n = 37), semantic type assignments (n = 36), hierarchical relationships (n = 24), lateral relationships (n = 12), ontology enrichment (n = 8), and ontology alignment (n = 18). We also categorized the methods according to their level of automation (ie, automated systematic, automated heuristic, or manual) and the type of knowledge used (ie, intrinsic or extrinsic knowledge).
CONCLUSIONS: This study is a comprehensive review of the published methods for auditing the various conceptual aspects of the UMLS. Categorizing the auditing techniques according to the various aspects will give the curators of the UMLS, as well as researchers, comprehensive and easy access to this wealth of knowledge (eg, for auditing lateral relationships in the UMLS). We also reviewed ontology enrichment and alignment techniques due to their critical use of and impact on the UMLS.

Keywords: auditing; quality assurance; review; unified medical language system

Year: 2020 PMID: 32766692 PMCID: PMC7566540 DOI: 10.1093/jamia/ocaa108

Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497

INTRODUCTION

The Unified Medical Language System (UMLS) is a unique system designed by the National Library of Medicine (NLM), spearheaded by Donald Lindberg, Betsy Humphreys, and Alexa McCray, to integrate a large collection of biomedical terminology and ontology sources (currently 213 [https://www.nlm.nih.gov/research/umls/sourcereleasedocs/index.html]) into a Metathesaurus. In the UMLS, synonymous terms from multiple sources are mapped to the same UMLS concept; each concept is classified as belonging to 1 or more of 127 semantic types (STs), taken from the UMLS Semantic Network (SN). The SN also includes 54 semantic relations (SRs) that indicate potential relationships among concepts based on their STs (counts taken from file SRDEF, archived at https://semanticnetwork.nlm.nih.gov/download/sn_current.tgz). Integrating 213 biomedical sources of various models and naming standards poses difficulties; errors and inconsistencies are inevitable. Starting with Cimino, many researchers have designed techniques for “auditing the UMLS” (ie, finding and categorizing errors and inconsistencies of the UMLS). The purpose of this article is to review and categorize these various auditing techniques, also known as quality assurance (QA) techniques, to summarize the wealth of experience applied to this task. A general UMLS users study by Chen et al reported concerns about errors in UMLS concepts, particularly with hierarchical relationships. A special issue of the Journal of Biomedical Informatics on methods for auditing biomedical terminologies contained a review article in which 51 of the cited articles described work involving the UMLS. This was more than for any single terminology, demonstrating the importance researchers attribute to the UMLS. This importance stems from the unique design of the UMLS, which enabled the rich body of research reviewed in this survey. Furthermore, it opened the possibility of comparing and contrasting multiple UMLS source terminologies. 
In that review article, Zhu et al described articles according to several dimensions, including quality factors, knowledge source, automation level, and aspects of terminology content. Amith et al presented a review of general ontology evaluation techniques, which is not a systematic review. In addition, some approaches that are appropriate for evaluating single ontologies (for example, “compare the target ontology to a ‘gold standard,’” as outlined in Amith et al) are at best only partially applicable to the UMLS due to its unique content (ie, integration of many source vocabularies) and its unique purpose of serving as multipurpose middleware for a wide range of applications and systems. The Amith et al review has 10 UMLS references, but only a few are discussed. In contrast to Zhu et al, this article considers auditing of just the UMLS and is restricted to methods for identifying errors and inconsistencies in the various aspects of the Metathesaurus concepts. We include alignment and topological pattern enhancements of the UMLS sources due to the intensive use of the UMLS as a matching intermediary. Furthermore, enhancements to the UMLS sources have an indirect impact on future releases of the UMLS. To limit the scope of this review, we did not consider refinements, extensions, partitions, and summarization of the SN, which were reviewed by Zhu et al. Out of the 51 UMLS references in Zhu et al, only 23 conformed to the strict interpretation of “auditing the UMLS” used in this review. Table 1 provides the criteria used for inclusion and exclusion of articles considered in this review.

Table 1.

Selection criteria for article inclusions

Type	Criteria	Rationale
Inclusion criteria	Methods for finding errors or inconsistencies of aspects of UMLS concepts	Errors and inconsistencies of concept names, synonyms, ST assignments, hierarchical (IS-A) relationships, and lateral relationships.
	UMLS auditing tools, surveys, and auditors’ performance	Owing to their relevance for the auditing process.
	Auditing observed during the integration of sources into the UMLS	Limiting the review to this side effect of the integration.
	Topological pattern techniques and alignment techniques for enhancement of the UMLS sources	Owing to their major use of the UMLS, although their purpose is to enhance UMLS sources; the enhancement will indirectly lead to modifications of the UMLS. In addition, identifying missing synonyms for UMLS concepts is another byproduct of these techniques.
Exclusion criteria	Coverage of the UMLS	Assessing the coverage of the UMLS concepts is not relevant to QA.
	Applications of the UMLS	Applications of the UMLS such as information retrieval or natural language processing are not relevant to QA.
	Auditing of sources of the UMLS	Auditing the sources is not relevant to UMLS QA.
	Integration of sources into the UMLS	Integration of the sources into the UMLS is not relevant if no auditing of the UMLS is observed.
	Refinements, extensions, or summarization networks of the UMLS SN	Refinement, extension, partition, and summarization of the UMLS SN are not focused on QA of UMLS concepts.
	Not related to UMLS (eg, UML)	Some articles that are irrelevant to the UMLS were retrieved by PubMed search (eg, Unified Modeling Language).
	General UMLS article not relevant to QA	Some general UMLS development articles were retrieved by PubMed search.
	Not an article	Conference abstracts are excluded.

QA: quality assurance; SN: Semantic Network; ST: semantic type; UML: Unified Modeling Language; UMLS: Unified Medical Language System

Our study concentrates on the methodology of the auditing techniques. Furthermore, our review categorizes studies differently from Zhu et al, in which the major categorization is by quality factors and levels of automation. In this review, the 83 studies are categorized according to audited aspects of the concepts, which provides a clear and comprehensive picture of methodologies for UMLS auditing. We classify techniques based on particular concept characteristics: names, synonyms, ST assignments, hierarchical (IS-A) relationships, and lateral relationships. Figure 1 shows the UMLS interface for the concept Bipolar Disorder illustrating the various aspects. For each article, we provide a brief description of its technique(s), identifying the audited aspect(s), the degree of manual versus automated approach, and the source of knowledge used to support the technique. Results appear in the Supplementary Appendix.

Figure 1.

The Unified Medical Language System (UMLS) Metathesaurus Browser user interface, displaying information for the concept Bipolar Disorder: The interface shows the focus concept Bipolar Disorder at the top of the right box, followed by the semantic type of the concept and 83 synonyms from different sources (out of which only 36 fit on the screen). Relationships (including hierarchical, lateral, and qualifiers) between Bipolar Disorders and 1691 (not necessarily different) target concepts are listed below the synonyms (shown to the right of synonyms in this figure), showing the relation, relationship attribute, source terminology, the term name in the source terminology, and the concept unique identifier (CUI) for each related concept. For example, Mood Disorders appears 6 times, each mapped to the same CUI, because this relationship is found in 6 source terminologies. The screenshot was taken on January 31, 2020, using UMLS version 2019AB.

MATERIALS AND METHODS

Identifying the references

To identify relevant articles, we followed the Institute of Medicine’s standards for systematic review and PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses). Our process consisted of 4 steps: (1) identifying relevant keywords; (2) formulating the search query to identify relevant articles from PubMed and Google Scholar; (3) screening titles based on predefined inclusion and exclusion criteria; and (4) reviewing abstracts and full texts to exclude irrelevant articles and code the reasons for exclusion. For details on each step, see Figure 2 and the Supplementary Appendix.

Figure 2.

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow chart for identifying the articles to be included in this review.

Errors in the UMLS

Errors may occur with every aspect of a concept in the Metathesaurus. We distinguish between 2 kinds of errors. The first is errors that are imported to the UMLS from source terminologies. Sometimes these errors are invisible within the source terminology and are exposed in the UMLS as part of an illegal structure (eg, a cycle of IS-A relationships). The second kind of errors are “made in the UMLS.” One example is the ST assignment. This is a unique feature of the UMLS, in which editors assign STs to concepts. Another example is the identification of concepts from different terminologies. In the process of integrating a source into the UMLS (eg, Systematized Nomenclature of Medicine [SNOMED] or Gene Ontology [GO]) editors must establish new concepts that do not yet appear in the Metathesaurus. When this concept appears later in another terminology with the same name, it is associated with the same concept. If this concept appears in another terminology with a different name, it is assigned as a synonym for this concept. However, those decisions are not always simple. Different terminologies have various naming conventions. There are also cases of homonyms, in which different terminologies use the same name with varying semantics. Terminologies sometimes have conflicting views about which of 2 concepts is more specific. Thus, UMLS editors sometimes erroneously multiply singular concepts or unify multiple concepts. Such errors can also cause cycles in the UMLS. Auditors of the UMLS should identify whether the error is made in the UMLS and should be reported to the NLM, or whether the error is in one source terminology and should be reported to its curators. When the error is corrected in the source terminology, it will disappear from the next release of the UMLS. An interesting phenomenon is that sometimes several errors come together for 1 or several similar concepts. 
For example, an auditor looking for an explanation for a missing lateral relationship may discover a wrong or missing IS-A relationship. Hence, detecting 1 error may propagate the detection of more errors that would otherwise be hidden.

Categorization of the articles

We first categorized and discussed the 83 included articles based on the aspects that they focus on (1) concepts, concept names, and synonyms; (2) semantic type assignments; (3) hierarchical (IS-A) relationships; (4) lateral relationships; (5) topological pattern–based ontology enrichment; and (6) ontology alignment. Then we coded the automation level (ie, automated systematic, automated heuristic, or manual) and knowledge source of the techniques (ie, intrinsic knowledge, extrinsic knowledge, or combined intrinsic and extrinsic knowledge), based on the definitions of the characteristics (shown in Table 2) following Zhu et al.

Table 2.

Definitions of the characteristics of the auditing techniques

Characteristics of the auditing technique	Types	Definitions
Automation level	Automated systematic	Automated systematic methods are implemented as rule-checking programs or algorithms that can automatically identify potential errors and inconsistencies in the terminology.
	Automated heuristic	Automated heuristic methods are based on rules that make inferences about terminology content and seek to identify those inferences to find likely errors and inconsistencies in the terminology.
	Manual	Manual review relies on a terminology reviewer (often a domain expert) to manually audit a certain aspect(s) of a terminology, with or without the support of a computerized user interface.
Knowledge source	Intrinsic knowledge	Intrinsic knowledge is the information derived from the classification scheme, hierarchy, relationships, or other attributes present within the terminology itself.
	Extrinsic knowledge	Extrinsic knowledge is derived from an outside source, such as other terminologies or human expert knowledge.

Categorizing the articles according to these aspects allows us to combine the description of similar studies and present in our review the progress of the research ideas underlying them (see Discussion). The descriptions are succinct and focus on the essence of the techniques used, rather than the results, in order to cover a large number of studies in a limited space. The auditing results appear in Supplementary Table 1.

RESULTS

Auditing techniques for the various aspect categories

Concepts, concept names, and synonyms

Synonym detection serves as a critical check against the creation of redundant concepts when the UMLS takes in updates from source terminologies. Redundant (synonymous) concepts were detected lexically by Cimino by looking for pairs of concept names with the same words in a different order or with different punctuation. Those comparisons were expanded by identifying interchangeable keyword synonyms. Hole and Srinivasan introduced several heuristics used by the NLM to boost the sensitivity: lexical tweaks (eg, trimming space or punctuation), synonymous word swaps (eg, “renal” vs “kidney”), and enhanced matching through synonymous token discovery (originally credited to R.A. Miller). Huang et al further expanded the third heuristic into a formal algorithm (GSMake). The assumption is that nonoverlapping word(s) from 2 overlapping synonyms of the same concept unique identifier (CUI) may be collected as interchangeable components to facilitate sensitive matching. As an extension, Huang et al substituted GSMake for WordNet synsets in generating the alternative match terms. They devised ancillary heuristics to control the explosion of variants by limiting the maximum number of allowed word swaps per term and the maximum length of a term being processed. Huang et al applied these combined methods to audit duplicate concepts that had been incorporated from the SNOMED Clinical Terms (CT) into the UMLS. Bodenreider and McCray and McCray et al aggregated the STs into 15 semantic groups. Erdogan et al discovered missing concepts by identifying concepts that hierarchically belonged to 2 inconsistent semantic groups and yet had no ancestors carrying such inconsistency. They implemented an answer set programming method to enhance efficiency. They found that an additional concept sharing a synonym at the inconsistent spot must be created to untangle the 2 semantic lineages, a common type of error. 
The coverage of UMLS genomic and proteomic concepts and relationships is compared with terms from GO and LocusLink. The relationships are well-represented but the coverage of fine-grained concepts is limited (year of study: 2002). Disambiguation of ambiguous terms is recommended with a systematic polysemy focus using context rather than integrating gene and gene product names. Liu et al performed a systematic analysis of abbreviations in the UMLS synonyms. The prevalent ambiguity (1 abbreviation shared by multiple CUIs) was not necessarily a quality indicator, but the evaluation of abbreviation used in clinical text could be viewed as a benchmark of lexical completeness for representing the medical domain. Merrill applied a formal semantic analysis to clarify fundamental notions underlying the UMLS: atom, term, and concept. He proposed an approach based on “synonymy-based Metathesaurus models,” with theoretical principles for maintaining the UMLS semantic integrity.
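
The lexical comparisons described above (same words in a different order, different punctuation, and synonymous word swaps) can be illustrated with a minimal Python sketch. It is not the code used by Cimino, by Hole and Srinivasan, or by Huang et al. The function names, the one-entry synonym table, and the identifiers X001 to X003 are hypothetical and serve only as an illustration; only the “renal” vs “kidney” pair is taken from the text.

```python
import re
from collections import defaultdict

# Hypothetical word-level synonym table; only the "renal"/"kidney" pair
# comes from the text, everything else here is an assumption.
WORD_SYNONYMS = {"renal": "kidney"}


def normalize(name):
    """Lowercase, drop punctuation, swap known synonyms, and sort the words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return tuple(sorted(WORD_SYNONYMS.get(word, word) for word in words))


def candidate_duplicates(names_by_cui):
    """Return groups of identifiers whose names share a normalized key."""
    cuis_by_key = defaultdict(set)
    for cui, names in names_by_cui.items():
        for name in names:
            cuis_by_key[normalize(name)].add(cui)
    return sorted(sorted(cuis) for cuis in cuis_by_key.values() if len(cuis) > 1)


# Made-up identifiers and names, for illustration only.
sample = {
    "X001": ["Failure, Renal"],
    "X002": ["kidney failure"],
    "X003": ["Heart failure"],
}
duplicates = candidate_duplicates(sample)
print(duplicates)
```

Groups returned this way are only candidates for review: as the section notes, homonyms and differing naming conventions mean that a lexical match does not by itself establish synonymy.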

Semantic type assignment

The UMLS consists of 2 levels: the Metathesaurus and the SN of 127 STs. Each Metathesaurus concept is assigned 1 or more STs. As an outgrowth of the development of an object-oriented database version of the UMLS described in Geller et al and Gu et al, Gu et al introduced the Refined SN (RSN) Abstraction Network, in which each concept is assigned only 1 Refined ST (RST). The intersection of multiple STs is described by an Intersection ST (IST). Concepts assigned ISTs are more complex, with multiple semantics and a higher probability of errors. For example, Cimino found concepts assigned to mutually exclusive STs to be contradictions. Figure 3 illustrates an IST and the mapping between the 2 levels of the UMLS.

Figure 3.

The 2 levels of the Unified Medical Language System. In the Semantic Network (SN) level, we have the semantic types Neoplastic Process (NP), Experimental Model of Disease (EMD), and the intersection semantic type (IST) NP∩EMD between them. The Metathesaurus level shows concepts assigned the intersection semantic type and the 2 pure semantic types, colored to correspond to the colors of their assigned semantic type. For example, the concept Neoplasms, Experimental (as suggested by its name) is assigned both STs.

Consider an IST A∩B such that A is an ancestor of B in the SN. According to the specificity rule, the assignment of A is redundant and the concepts should be assigned only B. Peng et al designed an algorithm to find and remove all redundant ST assignments, which was implemented by Srinivasan of the UMLS and is used in the UMLS production (YP personal communication with Suresh Srinivasan in 2005). Cimino used multiple ST assignments to identify potentially ambiguous UMLS “concepts” with multiple meanings. ST assignments were also used to identify inconsistent hierarchical relationships where a type assigned to a child concept was neither identical to, nor a descendant of, any of the types assigned to its parent. In 2 versions of the UMLS, inconsistent classifications were detected. Cimino et al refined the use of ST assignments to detect and classify errors in hierarchical relationships, finding concept pairs with inconsistent classification. Gu et al showed that concepts of small, uncommonly modeled ISTs had higher error rates than concepts of larger ISTs did. ST assignment errors were found in Gu et al. For example, Scotch Tape Mount, a laboratory procedure for detecting pinworms, was assigned to Bacterium and Laboratory Procedure. A broader study of all 232 such concepts by 4 domain experts showed that multiple auditors are required to achieve reliability in auditing complex IST concepts. They experimented with auditing using the Neighborhood Auditing Tool (NAT) (Figure 4). 
A study found it more effective than the UMLS browser (Figure 1). Ochs et al extended the NAT for a relationship-centric browsing and auditing tool.
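
The specificity rule for an IST A∩B, where A is an ancestor of B, can be sketched in a few lines of Python. This is an illustration of the rule only, not the algorithm of Peng et al or the NLM production code. It assumes that each ST has at most 1 parent in the SN, and the `ST_PARENT` fragment and the function names are hypothetical.

```python
# Toy fragment of the SN IS-A hierarchy (child ST -> parent ST).
# Assumption: every ST has at most one parent; the fragment is illustrative.
ST_PARENT = {
    "Neoplastic Process": "Disease or Syndrome",
    "Disease or Syndrome": "Pathologic Function",
}


def st_ancestors(st, parent_of):
    """Collect all ancestors of a semantic type by walking up the hierarchy."""
    ancestors = set()
    while st in parent_of:
        st = parent_of[st]
        ancestors.add(st)
    return ancestors


def remove_redundant_sts(assigned, parent_of):
    """Drop every assigned ST that is an ancestor of another assigned ST."""
    assigned = set(assigned)
    redundant = set()
    for st in assigned:
        redundant |= st_ancestors(st, parent_of) & assigned
    return assigned - redundant


cleaned = remove_redundant_sts(
    {"Neoplastic Process", "Disease or Syndrome"}, ST_PARENT
)
print(cleaned)
```

An assignment such as Neoplastic Process together with Experimental Model of Disease is left unchanged by this check, because neither ST is an ancestor of the other in the toy fragment.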

Figure 4.

The Neighborhood Auditing Tool (NAT) interface and corresponding “neighborhood” network: (A) A screenshot of the NAT for the concept Bipolar Disorder (as in Figure 1 for the Unified Medical Language System [UMLS] interface): the focus concept is shown in the central box. The parents and grandparents are in the top box (with indentation), and the children (and grandchildren [not displayed]) are in the bottom box. The synonyms are to the left and relationships (or siblings) are to the right. The semantic type for each concept in the screen is in blue, the UMLS sources in green, and the concept unique identifier in red. The number of concepts in each box overflows its capacity and the box is scrollable. This screenshot from 2011 is interesting because it is rich enough to display a forbidden cycle of 3 concepts: Mood Disorders as the top child → Bipolar Disorder as the focus concept → Affective Disorders, Psychotic as a second parent (third line from the bottom) → Mood Disorders as the sixth grandparent, closing the cycle. This error was reported to the UMLS team and this cycle does not exist in the UMLS 2019AB version of Bipolar Disorder (Figure 1). (B) Excerpt of the neighborhood for Bipolar Disorder: the boxes highlighted in yellow show the cycle of 3 concepts. The light blue rectangles correspond to the various windows in panel (A).

The cohesive meta-schema is a partition of the STs of the SN into Meta Semantic Types (MSTs). In auditing concepts of small pure MSTs, a higher error rate is found because concepts in the intersection of different MSTs are more likely to have errors. Mougin et al analyzed concepts with multiple semantic groups in the UMLS. Categorization inconsistency between parent and child concepts is an indicator of categorization error. Chen et al considered group auditing by expanding the RST extent (the set of concepts assigned an RST). For each RST, an envelope of parents and children of extent concepts is defined. 
The expansion algorithm iteratively suggests concepts for review by a domain expert. Analyzing the results, Chen et al reported on additional concepts missing ST that were not found because some concepts are assigned ST Classification, which blocks the expansion. A revised algorithm overcoming the blockage was applied to the ST Experimental Model of Disease and found many extra concepts. Geller et al introduced 2 structural inconsistency patterns of the dual hierarchical relationships of 2 UMLS concepts and their STs: ST inversion (ie, ST assigned to child concept is more general than ST assigned to parent concept) and lack of ancestry. The former is a better indicator of errors than the latter. Wei et al evaluated ST assignment consistency in the UMLS using SNOMED CT Specimen concepts. Overlapping concepts in intersections of semantic uniformity groups, defined by concepts’ structural features, are strong indicators of inconsistency. Gu et al partitioned a SNOMED CT hierarchy into disjoint semantic uniformity groups based on concepts’ STs in the UMLS. Concepts in small groups are more likely to have ST assignment errors. Gu et al detected ST assignment errors for UMLS concepts if their STs are inconsistent with the mapped STs of their SNOMED CT semantic tags. Mejino and Rosse illustrated that inconsistent ST assignments of anatomical concepts in UMLS can be reconciled following the principles of the Digital Anatomist and Foundational Model of Anatomy. Chen et al modified RSN to model chemical concepts with multiple “Chemical Viewed Structurally” STs, identifying concepts with an invalid combination of STs and with incorrect ST assignments. Morrey et al resolved redundant ST assignments for chemical composites with multiple STs based on the relative sizes of components in the molecular structure of a composite. A Chemical Specialty Semantic Network was developed to provide a better categorization of chemical concepts in the UMLS. 
Rare STs in the Chemical Specialty Semantic Network highlight errors. Fan et al proposed a corpus-driven approach to auditing ST assignments. A set of more than 14 million MetaMap-processed PubMed abstracts was used to identify CUIs and their shallow-parsed contextual features. After CUI-to-ST grouping, a distributional classifier was trained for reclassifying CUIs into more appropriate semantic categories. Fan et al added a text classifier using the CUI lexical synonyms and found it complementary to the earlier context-based classifier. He et al addressed auditing ST assignments for the 10 STs in the top levels of the SN. By the specificity rule, a concept is assigned the most specific ST possible. Hence, top STs should be assigned only to a few general or abstract concepts. Reviewers found that two-thirds of these concepts have incorrect ST assignments that are too general. UMLS editors should avoid “erring up” in assigning top STs. He et al monitored the longitudinal changes in ST assignments via the lens of the RSN. They showed that many intersections that were removed from the RSN due to error reports reappear due to the categorization of new concepts to nonsensical or forbidden ST combinations. To cope with this problem, Geller et al created a rule-based system for the UMLS editors to test whether any combination of up to 5 STs is allowed, forbidden, or questionable.
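
The parent-child consistency checks discussed in this section (a child's ST should be identical to or a descendant of its parent's ST) can be sketched as follows. The sketch assumes that each concept carries a single ST and that each ST has at most 1 parent. It also assumes that “lack of ancestry” means the 2 STs are unrelated in the SN hierarchy, which this review does not define precisely. The hierarchy fragment and the function names are hypothetical, and this is not the published implementation of Cimino or Geller et al.

```python
# Toy fragment of the SN IS-A hierarchy (child ST -> parent ST); illustrative only.
ST_PARENT = {
    "Neoplastic Process": "Disease or Syndrome",
    "Disease or Syndrome": "Pathologic Function",
    "Laboratory Procedure": "Health Care Activity",
}


def st_ancestors(st, parent_of):
    """Collect all ancestors of a semantic type by walking up the hierarchy."""
    ancestors = set()
    while st in parent_of:
        st = parent_of[st]
        ancestors.add(st)
    return ancestors


def classify_pair(child_st, parent_st, parent_of):
    """Classify the ST assignments of a child concept and its parent concept."""
    if child_st == parent_st or parent_st in st_ancestors(child_st, parent_of):
        return "consistent"
    if child_st in st_ancestors(parent_st, parent_of):
        # The child concept carries a more general ST than its parent concept.
        return "ST inversion"
    # Assumed reading: the two STs are unrelated in the hierarchy.
    return "lack of ancestry"


print(classify_pair("Neoplastic Process", "Disease or Syndrome", ST_PARENT))
print(classify_pair("Disease or Syndrome", "Neoplastic Process", ST_PARENT))
print(classify_pair("Laboratory Procedure", "Disease or Syndrome", ST_PARENT))
```

Concept pairs flagged by such a check are candidates for expert review rather than confirmed errors; as noted above, ST inversion was reported to be the better indicator of the 2 patterns.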

Hierarchical (IS-A) relationships

Hierarchical relationships constitute the backbone of a terminology, enabling inheritance of lateral relationships and efficient use of a whole class of concepts (eg, Myocardial Infarction and all its descendants) for information retrieval, data mining, etc. Researchers pay special attention to auditing the hierarchical relationships of the UMLS. In a terminology, a cycle of hierarchical relationships is forbidden. In the UMLS, such cycles indicate errors in or inconsistencies between sources. Detection and resolution of cycles between 2 and among 3 UMLS concepts are discussed by Bodenreider, Mougin and Bodenreider, and Halper et al, respectively. Pisanelli et al detected redundancies, cycles, and misuse of hierarchical relationships by an ontological analysis of the Metathesaurus. Bodenreider examined redundancy and semantic consistency in hierarchical relationships in UMLS sources by indexing hierarchical paths between 2 concepts. A weak link is found between redundancy and semantic consistency. Semantic inconsistency in redundant hierarchical relationships indicates potential miscategorization. Xing et al developed a tool (FEDRR) to detect redundant hierarchical relationships in source vocabularies, in linear time, using UMLS files. The overall completeness, consistency, and usability of the UMLS are evaluated by Bodenreider et al using a multiaxial coding system (MAOUSSC). They note inconsistency in hierarchical relationships and a paucity of lateral relationships (year of study: 1998). Bodenreider et al further examined the occurrence of noun phrase modifiers for concepts to assess the consistency of biomedical terminologies. The study compared disease and procedure terms in SNOMED to the UMLS. They counted the frequency of modifier pairs (eg, acute and chronic, primary and secondary) in the noun phrases and noted the lack of certain terms and relationships. 
Another method, COHeRE (Cross-Ontology Hierarchical Relation Examination), detects inconsistencies and possible errors in hierarchical relationships across UMLS sources. COHeRE leverages the UMLS knowledge sources and the MapReduce cloud computing technique for systematic, large-scale ontology QA. Research indicates that the majority of inconsistent relationships exist in the sources rather than being introduced in the UMLS integration process. An algorithm to identify missing IS-A relationships from concepts of an extent of an RST is described by Chen et al. The extent of each RST is divided into singly rooted components. The recursive algorithm suggests that an editor check for missing IS-A relationships from the roots of small components to concepts of large components. Gu et al examined conflicting hierarchical relationships, redundant hierarchical relationships, mixed hierarchical and lateral relationships, and multiple lateral relationships in the UMLS. They investigated whether multiple relationships between 2 concepts are from the same source terminologies.
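
Because a cycle of IS-A relationships is forbidden, cycle detection is one of the simplest systematic checks on the hierarchy. The following minimal Python sketch uses a standard depth-first search. It is a generic graph technique, not the specific method of any study cited above. The `find_cycle` function is hypothetical, and the sample data reproduce only the 3-concept cycle described in the caption of Figure 4.

```python
def find_cycle(parents_of):
    """Return one IS-A cycle as a list of concepts, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for parent in parents_of.get(node, ()):
            state = color.get(parent, WHITE)
            if state == GRAY:
                # The parent is already on the current path: a cycle is closed.
                return path[path.index(parent):] + [parent]
            if state == WHITE:
                found = visit(parent)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for start in list(parents_of):
        if color.get(start, WHITE) == WHITE:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


# Child -> parents, following the 2011 cycle described for Figure 4.
is_a = {
    "Mood Disorders": ["Bipolar Disorder"],
    "Bipolar Disorder": ["Affective Disorders, Psychotic"],
    "Affective Disorders, Psychotic": ["Mood Disorders"],
}
cycle = find_cycle(is_a)
print(cycle)
```

The recursive form keeps the sketch short; on a hierarchy as deep and large as the full Metathesaurus, an iterative traversal would be needed to stay within Python's recursion limit.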

Lateral relationships

It is important to audit lateral relationships because they bear the nonhierarchical semantic connections between concepts. For certain concepts, it is impossible to reduce the ambiguity of relationships but possible to limit ambiguity by suggesting other relationship types. Mary et al proposed to extend the relationships with several relationships defined between semantic types in the SN, which may improve web searches. Other researchers focused on auditing concepts associated with multiple relationships that differ in granularity or are contradictory, heterogeneous, or homogeneous. A semantic method was proposed for auditing lateral relationships by transforming them into a relationship signature and mapping signatures from the Metathesaurus to the SN. Vizenor et al argued that the semantics of lateral relationships need to be more explicitly defined by ontology developers and extend the SN. Schulz and Hahn created a terminological knowledge base using the Metathesaurus. They extracted anatomy and pathology concepts from the Metathesaurus and mapped them in a semi-automated way to a representation model that emphasizes part-whole reasoning. The process revealed inconsistencies in lateral relationships.

Topological pattern–based ontology enrichment

He et al introduced a topological pattern–based ontology enrichment method for source ontologies in the UMLS. Topological patterns are derived from the UMLS based on the IS-A links between identical pairs of concepts from 2 ontologies. An m:n pattern has m(n) IS-A links in the first (second) ontology. The intermediate concepts are candidates to enrich a source ontology pending the review of domain experts. This method can also help audit the UMLS by detecting missing synonyms or erroneous classifications. The 2:2, k:1, 1:k, and m:n patterns were considered to enrich SNOMED CT. In other studies, the NCI Thesaurus (NCIt) was enriched. Additionally, a mathematical formula was used to compute the number of potential placements of new concepts in a target ontology. The formula was extended to the cases where cross-ontology synonyms are possible. These methods leveraged the vertical density differences between 2 ontologies. Keloth et al considered the horizontal density differences (number of siblings) in 2 ontologies. In most cases, the differences in sets of siblings are due to alternative classifications by ontology designers, which does not enable enrichment. Keloth et al also used a mathematical criterion for likely cases of alternative classification to reduce human efforts for finding potential cases. They designed randomized controlled trials to compare the recommendations with the decisions of a human expert.
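
The m:n pattern idea can be illustrated with a small Python sketch. It assumes that both ontologies are given as child-to-parents maps whose nodes are already identified by shared UMLS concept identifiers, and it uses the shortest IS-A path between a pair of shared concepts as a simplification; the review does not specify how paths are selected in the published method. The function names and the identifiers C1 to C3 are hypothetical.

```python
from collections import deque


def shortest_isa_path(parents_of, start, goal):
    """Breadth-first search upward along IS-A links; return the path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for parent in parents_of.get(path[-1], ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


def all_nodes(parents_of):
    """Every concept that appears as a child or as a parent."""
    return set(parents_of) | {p for ps in parents_of.values() for p in ps}


def topological_patterns(onto1, onto2):
    """List m:n patterns with m != n between pairs of shared concepts."""
    shared = sorted(all_nodes(onto1) & all_nodes(onto2))
    patterns = []
    for low in shared:
        for high in shared:
            if low == high:
                continue
            path1 = shortest_isa_path(onto1, low, high)
            path2 = shortest_isa_path(onto2, low, high)
            if path1 and path2 and len(path1) != len(path2):
                label = f"{len(path1) - 1}:{len(path2) - 1}"
                # Intermediate concepts are candidates for expert review.
                patterns.append((low, high, label, path1[1:-1], path2[1:-1]))
    return patterns


# Made-up identifiers: C2 exists only in the first ontology.
first = {"C1": ["C2"], "C2": ["C3"]}
second = {"C1": ["C3"]}
patterns = topological_patterns(first, second)
print(patterns)
```

In this toy case the 2:1 pattern suggests C2 as a candidate for insertion into the second ontology between C1 and C3, pending review by a domain expert as described above.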

Ontology alignment

Alignment techniques serve to investigate the equivalence between concepts based on various kinds of mapping across ontologies. The mappings are based on similar concept names, definitions, and relationships. Bodenreider and Burgun applied 2 methods based on lexical and conceptual similarity for node alignment of the SN with the UMLS Metathesaurus. Vizenor et al aligned the Metathesaurus relationships with SN relationships. One of the main applications enabled by these alignment strategies is auditing the consistency between the SN and the Metathesaurus. A limited review uncovered wrong and missing occurrences in both ST assignments and hierarchical relationships. Schulz et al provided methods for, and assessed, the alignment of the UMLS SN with BioTop, and identified inconsistent multiple ST combinations. Several studies have used UMLS synonymy to identify anchor concepts for point-to-point mappings across ontologies, such as NCIt, Adult Mouse Anatomical Dictionary, MeSH, ATC, Foundational Model of Anatomy, and GALEN, finding mappings that could not be found by lexical similarity and conceptual similarity. When the inconsistencies uncovered in these studies are corrected in the different ontologies, the corrections are indirectly carried into the next UMLS release. The results of the different alignment methods for anatomical ontologies were also compared and analyzed along with the challenges of the alignment process. Furthermore, ontology alignment and auditing have been facilitated by automated approaches, including logic-based and string similarity–based approaches. Jimenez-Ruiz et al described a mapping among NCIt, Foundational Model of Anatomy, and SNOMED CT using the UMLS as the alignment reference. A logic-based semantics technique was shown to effectively detect errors in the UMLS, using the conservativity, consistency, and locality principles. To enhance source integration and auditing, a SPED (Shortest Path Edit Distance) algorithm was proposed as a string similarity measure for UMLS terms.
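
The use of UMLS synonymy to find anchor concepts can be sketched as follows. The sketch assumes input rows of the form (concept identifier, source abbreviation, source code), similar in spirit to selected columns of a Metathesaurus concept-name file; the function name, the source labels `ONT_A` and `ONT_B`, and all identifiers are hypothetical and invented for illustration.

```python
from collections import defaultdict


def synonymy_anchors(rows, source_a, source_b):
    """Pair codes from 2 source ontologies that fall under the same UMLS concept.

    rows: iterable of (cui, source, code) tuples
    Returns a sorted list of (cui, code_in_source_a, code_in_source_b).
    """
    by_cui = defaultdict(lambda: defaultdict(set))
    for cui, source, code in rows:
        by_cui[cui][source].add(code)

    anchors = set()
    for cui, by_source in by_cui.items():
        for code_a in by_source.get(source_a, ()):
            for code_b in by_source.get(source_b, ()):
                anchors.add((cui, code_a, code_b))
    return sorted(anchors)


# Toy rows (invented identifiers).
rows = [
    ("C0000001", "ONT_A", "A1"),
    ("C0000001", "ONT_B", "B7"),
    ("C0000002", "ONT_A", "A2"),  # no counterpart in ONT_B, so no anchor
]
print(synonymy_anchors(rows, "ONT_A", "ONT_B"))
```

Because anchors depend entirely on the synonymy asserted in the UMLS, an erroneous merge of 2 source concepts under one UMLS concept would produce a wrong anchor, which is one reason alignment results are also useful for auditing.
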

Summary of study properties

Table 3 presents the categorization data of the reviewed articles. For each technique, we recorded the level of automation and the kind of knowledge source, 2 critical issues for applying the technique. This 3-dimensional table groups together studies with similar qualities, for example, those that report on auditing hierarchical relationships using automated systematic techniques or those that use only intrinsic knowledge. However, finding the automation level or knowledge type of a given article requires searching for its reference across the table entries. Thus, Table 4 lists the qualities of each article. In Supplementary Table 1, we report on the auditing results for the articles. Figure 5 shows the distribution of articles of different categories over time.

Table 3.

A 3-dimensional table categorizing studies by the aspects audited, the automation level and the kind of knowledge used

Automation level	Intrinsic knowledge (n = 23)	Extrinsic knowledge (n = 5)	Intrinsic and extrinsic knowledge (n = 55)
Concepts, concept names, and synonyms (37 references)
Automated systematic (n = 7)	⁶³, ⁶⁵	³⁰, ⁵⁷	³¹, ⁵², ¹¹⁵
Automated heuristic (n = 29)	¹¹, ²⁷–²⁹, ⁵⁵, ⁵⁶		²³–²⁵, ³⁶, ⁸⁸–⁹², ¹⁰¹–¹¹⁴
Manual (n = 1)			⁵⁸
Semantic type assignment (36 references)
Automated systematic (n = 13)	¹¹, ²¹, ²⁷, ⁴⁶, ⁶³, ⁶⁵, ⁷⁰	⁷⁷, ⁷⁸	²², ⁴⁸, ⁷⁹, ⁸⁰
Automated heuristic (n = 21)	⁸⁷		¹⁰, ¹⁸–²⁰, ³⁵, ³⁷, ³⁹–⁴¹, ⁴³–⁴⁵, ⁴⁷, ⁶², ⁶⁸, ⁶⁹, ⁷¹, ⁷⁵, ⁷⁶, ⁹⁴
Manual (n = 2)		⁸¹	⁷²
Hierarchical relationships (24 references)
Automated systematic (n = 13)	¹¹, ²⁶, ²⁷, ³², ³³, ⁴², ⁶³, ⁶⁴, ⁶⁵, ⁸²–⁸⁴		³¹
Automated heuristic (n = 11)	¹⁷, ³⁴, ³⁸, ⁵⁵, ⁵⁶		¹⁰, ³⁵, ³⁶, ⁸⁵, ⁸⁶, ⁹³
Lateral relationships (12 references)
Automated systematic (n = 5)	¹¹, ²⁶, ⁶³, ⁶⁵		³¹
Automated heuristic (n = 7)	¹⁷, ³⁸, ⁸⁷		³⁵, ³⁶, ⁸⁶, ⁹³

Table 4.

Direct access for categorization data

Ref	ASPE	AT	KNW	Ref	ASPE	AT	KNW	Ref	ASPE	AT	KNW
¹⁰	STA HREL	AH	IEK	³⁸	HREL LREL	AH	IK	⁷⁵	STA	AH	IEK
¹¹	CCNS STA HREL LREL	AH AS AS AS	IK IK IK IK	³⁹	STA	AH	IEK	⁷⁶	STA	AH	IEK
¹⁷	HREL LREL	AH	IK	⁴⁰	STA	AH	IEK	⁷⁷, ⁷⁸	STA	AS	EK
¹⁸	STA	AH	IEK	⁴¹	STA	AH	IEK	⁴⁸, ⁷⁹, ⁸⁰	STA	AS	IEK
¹⁹	STA	AH	IEK	⁴²	HREL	AS	IK	⁸¹	STA	MN	EK
²⁰	STA	AH	IEK	⁴³	STA	AH	IEK	⁸²	HREL	AS	IK
²¹	STA	AS	IK	⁴⁴	STA	AH	IEK	⁸³	HREL	AS	IK
²²	STA	AS	IEK	⁴⁵	STA	AH	IEK	⁸⁴	HREL	AS	IK
²³	CCNS	AH	IEK	⁴⁶	STA	AS	IK	⁸⁵	HREL	AH	IEK
²⁴	CCNS	AH	IEK	⁴⁷	STA	AH	IEK	⁸⁶	HREL LREL	AH	IEK
²⁵	CCNS	AH	IEK	⁵²	CCNS	AS	IEK	⁸⁷	LREL STA	AH	IK
²⁶	HREL LREL	AS	IK	⁵⁵, ⁵⁶	CCNS HREL	AH	IK	⁸⁸	CCNS	AH	IEK
²⁷	STA CCNS HREL	AS AH AS	IK IK IK	⁵⁷	CCNS	AS	EK	⁸⁹	CCNS	AH	IEK
²⁸	CCNS	AH	IK	⁵⁸	CCNS	MN	IEK	⁹⁰	CCNS	AH	IEK
²⁹	CCNS	AH	IK	⁶²	STA	AH	IEK	⁹¹	CCNS	AH	IEK
³⁰	CCNS	AS	EK	⁶³	ALL	AS	IK	⁹²	CCNS	AH	IEK
³¹	HREL LREL CCNS	AS	IEK	⁶⁴	HREL	AS	IK	⁹³	HREL LREL	AH	IEK
³²	HREL	AS	IK	⁶⁵	ALL	AS	IK	⁹⁴	STA	AH	IEK
³³	HREL	AS	IK	⁶⁸	STA	AH	IEK	¹⁰¹–¹⁰⁷	CCNS	AH	IEK
³⁴	HREL	AH	IK	⁶⁹	STA	AH	IEK	¹⁰⁸–¹¹¹	CCNS	AH	IEK
³⁵	HREL LREL STA	AH	IEK	⁷⁰	STA	AS	IK	¹¹²–¹¹⁴	CCNS	AH	IEK
³⁶	HREL LREL CCNS	AH	IEK	⁷¹	STA	AH	IEK	¹¹⁵	CCNS	AS	IEK
³⁷	STA	AH	IEK	⁷²	STA	MN	IEK

This table enables direct access to the categorization properties for each study.

AH: automated heuristic; AS: automated systematic; ASPE: aspect; AT: automation level; CCNS: concepts, concept names, and synonyms; EK: extrinsic knowledge; HREL: hierarchical relationships; IEK: intrinsic and extrinsic knowledge; IK: intrinsic knowledge; KNW: knowledge source; LREL: lateral relationships; MN: manual; Ref: reference; STA: semantic type assignment.

Figure 5.

Publication trend over time. The figure shows the number of publications on Unified Medical Language System (UMLS) auditing between 1998 and 2019, stratified by the aspect of a UMLS concept that was audited. Note that an article may audit several aspects of a concept, so the total may be less than the sum over all aspects. Overall, there are 2 surges of publications, in 2007 and 2009, with 11 and 12 articles, respectively, possibly due to National Library of Medicine funding support for UMLS quality assurance during 2005-2009 and the first special issue on terminology auditing in 2009. Except for those 2 years, there were on average about 3 publications a year. During 2010-2012, publications resulting from that funding continued to appear. In the last 7 years, we see a decline of interest in quality assurance of the UMLS, with an average of 2.4 articles per year. For example, the second special issue on terminology auditing in 2018 did not include any UMLS articles. In 2007, most articles focused on concept names and synonyms and on semantic type assignments (STAs). In 2009, most articles were about auditing STAs, while in 2010, most articles focused on concept names and synonyms. The number of articles that audited relationships was consistently low, but there were more articles on auditing hierarchical relationships (HREL) than lateral relationships (LREL). CCNS: concepts, concept names, and synonyms.

DISCUSSION

Terminology developers follow the desirable characteristics for terminology models to varying degrees. Errors and inconsistencies are therefore expected when integrating terminologies into the UMLS. The 83 reviewed studies demonstrate the special role the UMLS plays in the field of QA of terminologies. We have given a short description of the QA techniques in each of the 83 surveyed articles in order to describe the various available techniques in a single place. Classifying the techniques according to the aspects of a UMLS concept will enable practitioners, such as UMLS curators at the NLM, and researchers to learn from previously developed techniques, for example, for QA of synonyms or hierarchical relationships. From Table 3, we observe the following: Most studies (n = 55) combine the use of intrinsic and extrinsic knowledge (IEK) sources. Most studies (n = 56) use automated heuristic (AH) techniques. Some studies (n = 26) are automated systematic (AS), and very few (n = 3) are purely manual (MN), while 46 are both AH and IEK. The 2 most common aspects are concepts, concept names, and synonyms (n = 37) and ST assignments (n = 36). As much as researchers try to develop automatic techniques, there is typically a need for a domain expert to review the results. This is not surprising, since QA is as complicated as terminology modeling, which is not automatic. Only specific errors that are detectable by logical rules can be handled fully automatically (eg, redundancy in ST assignments, hierarchical relationships, and lateral relationships). The 2 most investigated aspects are concepts, concept names, and synonyms and ST assignments, which may be because they are the most important features of the UMLS and are widely used for various downstream applications, including natural language processing, data mining, information retrieval, mapping from local terminologies, and creation of clinical data warehouses. In addition, errors in these aspects can be corrected in the UMLS itself, as explained in the Introduction.
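
As an example of an error detectable purely by a logical rule, the sketch below flags redundant ST assignments under one possible reading of redundancy: a concept is assigned both an ST and one of that ST's ancestors in the ST hierarchy. The single-parent hierarchy dictionary, the function names, and the sample identifiers are assumptions made for illustration only.

```python
def st_ancestors(st, st_parent):
    """Collect all ancestors of a semantic type in a single-parent ST hierarchy."""
    ancestors = set()
    while st in st_parent:
        st = st_parent[st]
        if st in ancestors:  # guard against malformed, cyclic input
            break
        ancestors.add(st)
    return ancestors


def redundant_st_assignments(concept_sts, st_parent):
    """Map each concept to the assigned STs that are ancestors of another
    ST assigned to the same concept."""
    flagged = {}
    for concept, sts in concept_sts.items():
        redundant = {
            st
            for st in sts
            if any(st in st_ancestors(other, st_parent) for other in sts if other != st)
        }
        if redundant:
            flagged[concept] = redundant
    return flagged


# Toy data (invented): ST_child IS-A ST_mid IS-A ST_root.
st_parent = {"ST_child": "ST_mid", "ST_mid": "ST_root"}
concept_sts = {
    "C1": {"ST_child", "ST_root"},   # ST_root is implied by ST_child
    "C2": {"ST_child", "ST_other"},  # unrelated STs, nothing redundant
}
print(redundant_st_assignments(concept_sts, st_parent))
```

Rules of this kind need no expert judgment to detect a violation, which is what distinguishes them from the heuristic techniques that dominate Table 3.
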
The techniques that concentrate on ST assignment use the mapping from the Metathesaurus to the SN, which is an Abstraction Network designed independently for the UMLS to capture the semantics of concepts, in contrast to Abstraction Networks that are derived from a terminology. Thus, the mapping of concepts to STs provides a reality check on the mapping of terms to concepts. In particular, several studies demonstrate that concepts assigned to multiple STs are susceptible to errors due to their semantic complexity. Designers of terminologies could mimic the UMLS by creating an a priori Abstraction Network for a terminology rather than an a posteriori Abstraction Network, as has been done previously. Learning from the UMLS experience, such networks can help with terminology QA. Finally, an error in the ST assignment might indicate confusion or ambiguity about the semantics of a concept, which may be manifested in the existence of other errors. This is only one way in which the unique design of the UMLS opened research opportunities in the field of auditing biomedical terminologies. As a compendium knowledge base that integrates multiple terminologies, the UMLS opens the possibility of comparing and contrasting multiple terminologies. This quality was exploited in the research on alignment of terminologies and topological pattern–based ontology enrichment. In the Materials and Methods section, we described how some kinds of errors occur in the UMLS. Errors migrating to the UMLS from source terminologies can be detected in a context in which the modeling of several terminologies is contrasted. These errors were not detected in the context of their own terminologies. But even errors made in the UMLS, like erroneously matching concepts from different terminologies to the same UMLS concept, provide valuable feedback on semantic issues and improper naming to the curators of the source terminologies. Figure 5 shows a constant flow of approximately 3 articles a year.
The peaks in 2007 and 2009 show the impact funding can have on this niche research area. In some subjects, one can trace research progress along a particular approach. For example, auditing IS-A cycles of length 2 started with Bodenreider, continued with Mougin and Bodenreider, and was extended to length 3 in Halper et al. Another example of longitudinal progress concerns the likelihood of errors in small ISTs. The early detection of errors was a side effect. Several rigorous studies established this observation. In Gu et al, all concepts of small ISTs were reviewed. In Gu et al, 2 refinements were presented. In Chen et al and Morrey et al, special modeling was required for chemical concepts and for ISTs in which intersections are frequent. Then, in He et al, a longitudinal study showed that, while the NLM corrected reported errors, eliminating nonsensical or forbidden ISTs (by SN use notes), those ISTs reappeared after a year or 2. To provide a systematic solution preventing such cases, Geller et al designed a system to check the legitimacy of any IST before assigning it to concepts. This chain of articles demonstrates a development from an accidental observation, through studies reported to the UMLS, to the creation of a tool that provides a systematic solution to prevent errors by editors, obtaining better quality and saving resources. The topological pattern–based method started with vertical density differences between a pair of source terminologies in terms of IS-A paths. It continued with the construction of different topological patterns. Recently, Keloth et al expanded the approach to investigate the horizontal density differences between source terminologies. Alignment research began with specific ontologies. The techniques evolved from manual to automated, rule-based systems and hybrid strategies combining direct and indirect alignment techniques, with Zhang and Bodenreider summarizing lessons learned.
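
The detection of short IS-A cycles mentioned above can be illustrated with a small sketch. It assumes the hierarchy is available as a child-to-parents dictionary and simply enumerates cycles of 2 or 3 IS-A links; the function name and sample data are hypothetical, and the cited studies may have used different procedures.

```python
def short_isa_cycles(parents, max_len=3):
    """Find IS-A cycles with 2..max_len links. Each cycle is reported once,
    rotated so that its smallest concept identifier comes first."""
    cycles = set()

    def walk(start, node, path):
        for parent in parents.get(node, ()):
            if parent == start and len(path) >= 2:
                # Closed a cycle; normalize its rotation to avoid duplicates.
                i = path.index(min(path))
                cycles.add(tuple(path[i:] + path[:i]))
            elif parent not in path and len(path) < max_len:
                walk(start, parent, path + [parent])

    for concept in parents:
        walk(concept, concept, [concept])
    return sorted(cycles)


# Toy hierarchy (invented): a and b form a 2-cycle; b, c, and d form a 3-cycle.
parents = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["b"]}
print(short_isa_cycles(parents))
```

Since IS-A is meant to be acyclic, any cycle found this way indicates at least one erroneous link, although deciding which link to remove still requires review.
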

Future research

To predict future directions for auditing the Metathesaurus, we take cues from recent trends in QA of UMLS source terminologies (eg, SNOMED, GO, NCIt). Initial efforts harness machine learning (ML) for QA of hierarchical (IS-A) relationships. A critical issue for ML is obtaining training data, perhaps by comparing consecutive releases of the UMLS and tracing error corrections. The techniques we reviewed are each based on a single idea. In terminology QA, we observe a recent trend of hybrid techniques combining multiple ideas. Hybrid techniques hold promise for better performance. For example, Cui et al combined the structural technique of nonlattice subgraphs with a natural language processing technique. Others combine structural and lexical techniques. The 2 ML techniques mentioned previously are hybrid. Scalable approaches based on distributed computing frameworks for big data (eg, MapReduce) audit IS-A relationships in the UMLS and SNOMED CT. To reduce the level of human effort involved in auditing, we expect that ML, hybrid, and big data techniques will improve the auditing yield by increasing the ratio of errors found to the number of concepts reviewed in the UMLS and by reducing false positives. Improvements in this area will lead to greater application of the methods.
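
One way to picture the suggestion of deriving training data from consecutive releases is the sketch below. It assumes each release is reduced to a set of (child, parent) IS-A pairs and treats links removed in the later release as candidate corrected errors and links added as candidate previously missing relationships. The function name and data are hypothetical, and in practice not every change between releases is an error correction, so such labels would be noisy.

```python
def release_diff_labels(old_isa, new_isa):
    """Compare the IS-A links of 2 consecutive releases.

    old_isa, new_isa: iterables of (child, parent) pairs
    Returns a dict with sorted lists of removed, added, and kept links.
    """
    old_isa, new_isa = set(old_isa), set(new_isa)
    return {
        "removed": sorted(old_isa - new_isa),  # candidate erroneous links
        "added": sorted(new_isa - old_isa),    # candidate formerly missing links
        "kept": sorted(old_isa & new_isa),     # unchanged links
    }


# Toy releases (invented).
old_release = [("a", "b"), ("c", "d")]
new_release = [("a", "b"), ("c", "e")]
labels = release_diff_labels(old_release, new_release)
print(labels)
```

The auditing yield mentioned above could then be estimated on held-out data as the number of confirmed errors divided by the number of concepts an auditor had to review.
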

CONCLUSION

The UMLS contains innumerable errors and inconsistencies of varying importance that originate in its component terminologies or in the addition of its unique features. Researchers of QA techniques for the UMLS have found creative methods to expose errors and inconsistencies in this enormous problem space. This exhaustive survey of the state of the art in UMLS auditing will assist researchers, terminology resource developers, and advanced UMLS users to identify and adapt existing methods that may be applicable to their own needs.

FUNDING

JC is supported in part by research funds from the University of Alabama School of Medicine Informatics Institute and by the Center for Clinical and Translational Sciences under grant UL1TR001417 and the National Center for Advancing Translational Sciences. ZH is supported in part by the University of Florida Clinical and Translational Science Institute funded by National Center for Advancing Translational Sciences under award number UL1TR001427 and National Institute on Aging awards R01AG064529 and R21AG061431. LL is supported by Medical Informatics Fellowship with Veteran Administration funding from the Office of Academic Affairs, Department of Veterans Affairs. LZ is supported by Monmouth University Summer Faculty Fellowship. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

AUTHOR CONTRIBUTIONS

YP, JC, and XZ conceived, designed, guided, and coordinated the study and the writing. ZH and DW identified publication records from MEDLINE and screened the titles, abstracts, and full text of the articles. LZ identified additional articles that were not retrieved in MEDLINE. VK prepared the figures. The tables were prepared by LZ, VK, DW, and ZH. Each of the 9 authors reviewed and summarized a subset of articles. All the authors contributed to the writing of the article. The Abstract, Introduction, Discussion, and Conclusion were written jointly by YP and JC. ZH wrote the Materials and Methods section. LL performed thorough editing of the article. All the authors revised and approved the final article.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Journal of the American Medical Informatics Association online.

