Qiong Wang1, Zongcheng Ji2, Jingqi Wang2, Stephen Wu2, Weiyan Lin3, Wenzhen Li3, Li Ke3, Guohong Xiao3, Qing Jiang4, Hua Xu2, Yi Zhou5. 1. Biomedical Engineering School, Sun Yat-sen University, Guangzhou, China; The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China. 2. School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA. 3. The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China. 4. Biomedical Engineering School, Sun Yat-sen University, Guangzhou, China. 5. Department of Biomedical Engineering, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China. Electronic address: zhouyi@mail.sysu.edu.cn.
Abstract
OBJECTIVE: This study aims to develop and evaluate effective methods for normalizing diagnosis and procedure terms written by physicians to standard concepts in the Chinese version of the International Classification of Diseases (ICD), with the goal of facilitating automated medical coding in China. METHODS: We applied an entity-linking framework, which consists of two steps (candidate concept generation and candidate concept ranking), to normalize Chinese diagnosis and procedure terms. For candidate concept generation, we implemented both the traditional BM25 algorithm and an extended version that integrates a synonym knowledgebase. For candidate concept ranking, we investigated five groups of algorithms: (1) the BM25 algorithm, (2) ranking support vector machines (RankSVM), (3) a previously reported Convolutional Neural Network (CNN) approach, (4) 11 deep ranking-based methods from the MatchZoo toolkit, and (5) a new ranking method based on BERT (Bidirectional Encoder Representations from Transformers). Using two manually annotated datasets (8,547 diagnoses and 8,282 procedures) collected from a Tier 3A hospital in China, we evaluated these methods and reported their performance (i.e., accuracy) at different cutoffs. RESULTS: Integrating the synonym knowledgebase greatly improved the coverage of candidate concept generation, which reached 97.9% for diagnoses and 93.4% for procedures. Overall, the new BERT-based ranking method achieved the best performance on both diagnosis and procedure normalization, with accuracies of 92.1% for diagnoses and 80.1% for procedures when only the top-ranked concept was considered and exact match was required. CONCLUSIONS: This study developed and compared diverse entity-linking methods for normalizing clinical terms in Chinese. Our evaluation shows good performance on mapping disease terms to ICD codes, demonstrating the feasibility of automated encoding of clinical terms in Chinese.
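The candidate generation step described in METHODS can be illustrated with a minimal sketch. It is not the authors' implementation: the character-bigram tokenizer, the BM25 parameters (`k1`, `b`), the helper names (`build_index`, `generate_candidates`, `hit_rate_at_k`), and the toy concept codes and synonyms are all assumptions made for illustration. The sketch indexes concept names with BM25, optionally adds synonym entries that point to the same concept, and measures how often the gold concept appears within the top k candidates.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Split a term into character bigrams (an assumed tokenizer for unsegmented Chinese)."""
    text = text.strip()
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


class BM25:
    """Plain BM25 scorer over a small in-memory list of strings."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.docs = [char_bigrams(d) for d in docs]
        self.k1, self.b = k1, b
        self.tf = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        n = len(self.docs)
        df = Counter(t for d in self.docs for t in set(d))
        # Smoothed IDF that stays positive for every indexed token.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tf[i], len(self.docs[i])
        total = 0.0
        for t in char_bigrams(query):
            f = tf.get(t, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * f * (self.k1 + 1) / norm
        return total


def build_index(concepts, synonyms=None):
    """Index concept names; synonym entries (if given) map back to the same code."""
    entries = [(name, code) for code, name in concepts.items()]
    for code, alts in (synonyms or {}).items():
        entries.extend((alt, code) for alt in alts)
    return BM25([name for name, _ in entries]), [code for _, code in entries]


def generate_candidates(bm25, codes, term, k=10):
    """Return up to k distinct concept codes with a positive BM25 score, best first."""
    scored = sorted(((bm25.score(term, i), i) for i in range(len(codes))),
                    key=lambda pair: -pair[0])
    out = []
    for s, i in scored:
        if s <= 0 or len(out) == k:
            break
        if codes[i] not in out:
            out.append(codes[i])
    return out


def hit_rate_at_k(candidate_lists, gold, k):
    """Fraction of terms whose gold code is among the first k candidates."""
    hits = sum(1 for cands, g in zip(candidate_lists, gold) if g in cands[:k])
    return hits / len(gold)


# Toy data with hypothetical codes; not taken from the study's datasets.
concepts = {"D001": "原发性高血压", "D002": "2型糖尿病", "D003": "急性心肌梗死"}
synonyms = {"D001": ["高血压病"], "D003": ["心梗"]}
terms, gold = ["心梗", "高血压"], ["D003", "D001"]

plain_index = build_index(concepts)
syn_index = build_index(concepts, synonyms)
plain = [generate_candidates(*plain_index, t) for t in terms]
expanded = [generate_candidates(*syn_index, t) for t in terms]
```

In this toy setup the abbreviation "心梗" shares no bigram with the canonical name "急性心肌梗死", so plain BM25 returns no candidate for it, while the synonym-extended index retrieves the concept. The same `hit_rate_at_k` function applies to the output of a ranking model, where k = 1 corresponds to a top-one criterion.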