Licong Cui (1), Olivier Bodenreider (2), Jay Shi (3), Guo-Qiang Zhang (4)
Affiliations:
1. Department of Computer Science, University of Kentucky, Lexington, KY, USA; Institute for Biomedical Informatics, University of Kentucky, Lexington, KY, USA. Electronic address: licong.cui@uky.edu.
2. National Library of Medicine, Bethesda, MD, USA.
3. Department of Internal Medicine, University of Kentucky, Lexington, KY, USA.
4. Department of Computer Science, University of Kentucky, Lexington, KY, USA; Institute for Biomedical Informatics, University of Kentucky, Lexington, KY, USA; Department of Internal Medicine, University of Kentucky, Lexington, KY, USA.
Abstract
OBJECTIVE: We introduce a structural-lexical approach for auditing SNOMED CT that combines non-lattice subgraphs of the underlying hierarchical relations with enriched lexical attributes of fully specified concept names. Our goal is to develop a scalable and effective approach that automatically identifies missing hierarchical IS-A relations.

METHODS: Our approach involves 3 stages. In stage 1, all non-lattice subgraphs of SNOMED CT's IS-A hierarchical relations are extracted. In stage 2, lexical attributes of the fully specified concept names in these non-lattice subgraphs are extracted. For each concept in a non-lattice subgraph, we enrich its set of attributes with the attributes of its ancestor concepts within that subgraph. In stage 3, subset inclusion relations between the lexical attribute sets of each pair of concepts in each non-lattice subgraph are compared with the existing IS-A relations in SNOMED CT. For a concept pair within a non-lattice subgraph, if a subset relation is identified but no corresponding IS-A relation is present in the transitive closure of SNOMED CT's IS-A relations, a missing IS-A relation is reported. The September 2017 release of SNOMED CT (US edition) was used in this investigation.

RESULTS: A total of 14,380 non-lattice subgraphs were extracted, from which we suggested a total of 41,357 missing IS-A relations. For evaluation purposes, 200 non-lattice subgraphs were randomly selected from 996 smaller subgraphs (of size 4, 5, or 6) within the "Clinical Finding" and "Procedure" sub-hierarchies. Two domain experts confirmed 185 of the 223 suggested missing IS-A relations, a precision of 82.96%.

CONCLUSIONS: Our results demonstrate that analyzing the lexical features of concepts in non-lattice subgraphs is an effective approach for auditing SNOMED CT.
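Stages 2 and 3 of the methods can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: lexical attributes are taken to be the lowercased words of a concept name, a concept whose enriched attribute set is a proper superset of another's is treated as the candidate child, and the transitive closure is computed only over the edges of the subgraph rather than over all of SNOMED CT. The function names, the concept keys, and the four concept names are hypothetical and are not SNOMED CT content. Stage 1 (extracting the non-lattice subgraph) is taken as given.

```python
from itertools import permutations


def lexical_attributes(name):
    """Word set of a concept name (assumption: lowercase whitespace tokens)."""
    return frozenset(name.lower().split())


def ancestors(concept, parents):
    """All ancestors of `concept` reachable through the IS-A edges in `parents`."""
    seen, stack = set(), list(parents.get(concept, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def suggest_missing_is_a(names, parents):
    """Return (child, parent) pairs suggested as missing IS-A relations.

    names:   concept key -> concept name, for one non-lattice subgraph
    parents: concept key -> direct IS-A parents within that subgraph
    """
    closure = {c: ancestors(c, parents) for c in names}

    # Stage 2: enrich each concept's attributes with those of its ancestors.
    enriched = {}
    for c in names:
        attrs = set(lexical_attributes(names[c]))
        for a in closure[c]:
            if a in names:
                attrs |= lexical_attributes(names[a])
        enriched[c] = attrs

    # Stage 3: a proper subset relation without an IS-A path is a suggestion.
    suggestions = []
    for child, parent in permutations(names, 2):
        if enriched[parent] < enriched[child] and parent not in closure[child]:
            suggestions.append((child, parent))
    return sorted(suggestions)


# Hypothetical subgraph: "k" and "h" share two maximal common descendants.
toy_names = {
    "k": "disorder of kidney",
    "h": "chronic disorder",
    "ck": "chronic disorder of kidney",
    "ckh": "chronic disorder of kidney due to hypertension",
}
toy_parents = {"ck": ["k", "h"], "ckh": ["k", "h"]}

print(suggest_missing_is_a(toy_names, toy_parents))
```

In this toy subgraph the only suggestion is that "ckh" should be a child of "ck": the words of the shorter name are all contained in the longer one, yet no IS-A path connects the two. A real audit would check against the full SNOMED CT transitive closure and would need a more careful treatment of names (for example stop words and semantic tags), which this sketch leaves out.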