Byungkon Kang (1), Jisang Yoon (2), Ha Young Kim (2), Sung Jin Jo (3), Yourim Lee (4), Hye Jin Kam (5)

1. Department of Computer Science, State University of New York, Incheon, South Korea.
2. Graduate School of Information, Yonsei University, Seoul, South Korea.
3. Department of Industrial and Management Engineering, Pohang University of Science and Technology, Pohang, North Gyeongsang, South Korea.
4. RWE Analytics, EvidNet, Seongnam-si, Gyeonggi-do, South Korea.
5. Healthcare, Life Solution Cluster, New Business Unit, Hanwha Life, Seoul, South Korea.
Abstract
OBJECTIVE: Accessing medical data from multiple institutions is difficult because each institution uses its own vocabulary. Standardization schemes, such as the common data model, have been proposed as solutions to this problem, but such schemes require expensive human supervision. This study aims to construct a trainable system that automates semantic interinstitutional code mapping.

MATERIALS AND METHODS: To automate mapping between source and target codes, we compute the embedding-based semantic similarity between their corresponding descriptive sentences. We also implement a systematic approach for preparing training data for similarity computation. Experimental results are compared to traditional word-based mappings.

RESULTS: The proposed model is compared against Usagi, the state-of-the-art automated matching system of the Observational Medical Outcomes Partnership common data model. By incorporating multiple negative training samples per positive sample, our semantic matching method significantly outperforms Usagi. Its matching accuracy is at least 10% greater than that of Usagi, and this trend is consistent across various top-k measurements.

DISCUSSION: The proposed deep learning-based mapping approach outperforms previous simple word-level matching algorithms because it can account for contextual and semantic information. Additionally, we demonstrate that the manner in which negative training samples are selected significantly affects the overall performance of the system.

CONCLUSION: Incorporating the semantics of code descriptions increases matching accuracy significantly more than traditional text co-occurrence-based approaches do. The negative training sample collection methodology is also an important component of the proposed trainable system and can be adopted in both present and future related systems.
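The pipeline summarized in the abstract (embed each code description, rank target codes by similarity, score with top-k accuracy, and train with several negatives per positive) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not specify the embedding model or the negative selection strategy, so `embed` is a stand-in that hashes character trigrams instead of a learned sentence embedding, `training_pairs` draws negatives uniformly at random, and the codes `T1` to `T4` are hypothetical.

```python
import math
import random
import zlib


def embed(text, dim=64):
    """Stand-in embedding (assumption): hashed character-trigram counts, L2-normalized."""
    vec = [0.0] * dim
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        vec[zlib.crc32(padded[i:i + 3].encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def top_k(source_text, targets, k):
    """Return the k target codes whose descriptions are most similar to source_text."""
    query = embed(source_text)
    ranked = sorted(
        targets,
        key=lambda code: cosine(query, embed(targets[code])),
        reverse=True,
    )
    return ranked[:k]


def top_k_accuracy(gold, targets, k):
    """Fraction of (source description, gold code) pairs whose gold code is in the top k."""
    hits = sum(code in top_k(text, targets, k) for text, code in gold)
    return hits / len(gold)


def training_pairs(gold, targets, n_neg, seed=0):
    """Emit one positive and n_neg negatives per gold mapping.

    Negatives are sampled uniformly at random (assumption); n_neg must not
    exceed the number of non-matching target codes.
    """
    rng = random.Random(seed)
    pairs = []
    for text, code in gold:
        pairs.append((text, code, 1))
        others = [c for c in targets if c != code]
        for neg in rng.sample(others, n_neg):
            pairs.append((text, neg, 0))
    return pairs


# Hypothetical target vocabulary and gold mappings.
targets = {
    "T1": "Type 2 diabetes mellitus",
    "T2": "Essential hypertension",
    "T3": "Acute myocardial infarction",
    "T4": "Chronic kidney disease",
}
gold = [("diabetes mellitus, type 2", "T1"), ("hypertension, essential", "T2")]

print(top_k_accuracy(gold, targets, k=3))
print(training_pairs(gold, targets, n_neg=2))
```

Because the abstract reports that the way negatives are selected strongly affects performance, the uniform sampling in `training_pairs` should be read as a placeholder for whichever selection scheme is actually used, and `embed` as a placeholder for a trained sentence encoder.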