Chi-Jen Chen (1), Neha Warikoo (2,3), Yung-Chun Chang (1,4,5), Jin-Hua Chen (1), Wen-Lian Hsu (5,6).
Affiliations: 1. Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, Taiwan. 2. Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan. 3. Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan. 4. Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, Taiwan. 5. Pervasive AI Research Labs, Ministry of Science and Technology, Taipei, Taiwan. 6. Institute of Information Science, Academia Sinica, Taipei, Taiwan.
Abstract
OBJECTIVE: In this era of digitized health records, there has been marked interest in using de-identified patient records to conduct health-related surveys. To assist this research effort, we developed a novel clinical data representation model, the medical knowledge-infused convolutional neural network (MKCNN), which learns whether patients meet clinical trial eligibility criteria for participation in cohort studies.
MATERIALS AND METHODS: We propose a clinical text representation infused with medical knowledge (MK). First, we separate relevant data from noise using a medically relevant description extractor. We then use log-likelihood ratio based weights from the selected sentences to highlight "met" and "not-met" knowledge-infused representations in a bichannel setting for each instance. The combined MK representation from these modules helps identify significant clinical criteria semantics, which in turn enables effective learning when used with a convolutional neural network architecture.
RESULTS: MKCNN outperforms other MK-relevant learning architectures by approximately 3%, notably the SVM and XGBoost implementations developed in this study. MKCNN scored 86.1% on the F1 metric, a gain of 6% above the average performance of the submissions to the n2c2 task. Although pattern/rule-based methods show a higher average performance on the n2c2 clinical data set, MKCNN significantly improves the performance of machine learning implementations for clinical datasets.
CONCLUSION: MKCNN scored 86.1% on the F1 metric. In contrast to many of the rule-based systems introduced during the n2c2 challenge workshop, our system draws heavily on machine learning. In addition, the MK representations add value to clinical comprehension and interpretation of natural-language text.
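The abstract does not give the exact log-likelihood ratio formulation used to weight terms, so the following is only a minimal sketch under stated assumptions: it uses Dunning's G² statistic on a 2x2 contingency table of sentence counts, with whitespace tokenization. The function names `llr` and `term_weights` and the toy sentences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def llr(k11, k12, k21, k22):
    """Dunning's G^2 statistic for a 2x2 contingency table (assumed formulation).

    k11: "met" sentences containing the term
    k12: "not-met" sentences containing the term
    k21: "met" sentences without the term
    k22: "not-met" sentences without the term
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    g2 = 0.0
    for k, i, j in cells:
        if k > 0:  # empty cells contribute nothing to the sum
            g2 += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * g2


def term_weights(met_sentences, not_met_sentences):
    """Return {term: (dominant_label, llr_score)} from two non-empty sentence lists."""
    # Document frequency per class: count each term once per sentence.
    met_df = Counter(t for s in met_sentences for t in set(s.lower().split()))
    not_df = Counter(t for s in not_met_sentences for t in set(s.lower().split()))
    n_met, n_not = len(met_sentences), len(not_met_sentences)

    weights = {}
    for term in set(met_df) | set(not_df):
        a, b = met_df[term], not_df[term]
        score = llr(a, b, n_met - a, n_not - b)
        # The class in which the term is relatively more frequent is its label.
        label = "met" if a / n_met >= b / n_not else "not-met"
        weights[term] = (label, score)
    return weights


# Toy, invented sentences for illustration only.
met = ["patient has diabetes", "diabetes treated with insulin"]
not_met = ["no history reported", "patient denies symptoms"]
weights = term_weights(met, not_met)
```

In this sketch, a term that appears only in "met" sentences receives a high score with the label "met", while a term spread evenly across both classes scores zero. Such per-term weights could then scale the two input channels of a CNN, although how MKCNN actually combines them is not specified in the abstract.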