Yang Xiang1, Jun Xu1, Yuqi Si1, Zhiheng Li1,2, Laila Rasmy1, Yujia Zhou1, Firat Tiryaki1, Fang Li1, Yaoyun Zhang1, Yonghui Wu3, Xiaoqian Jiang1, Wenjin Jim Zheng1, Degui Zhi1, Cui Tao1, Hua Xu4. 1. School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA. 2. School of Computer Science and Technology, Dalian University of Technology, Dalian, China. 3. Department of Health Outcomes & Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA. 4. School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA. Hua.Xu@uth.tmc.edu.
Abstract
BACKGROUND: Learning distributional representations of clinical concepts (e.g., diseases, drugs, and labs) is an important research area of deep learning in the medical domain. However, many existing methods do not consider temporal dependencies along the longitudinal sequence of a patient's records, which may lead to incorrect selection of contexts. METHODS: To address this issue, we extended three popular concept embedding learning methods, word2vec, positive pointwise mutual information (PPMI), and FastText, to incorporate time-sensitive information. We then trained them on a large electronic health record (EHR) database containing about 50 million patients and evaluated the resulting concept embeddings both intrinsically, on a concept similarity measure, and extrinsically, on the task of predicting disease onset. RESULTS: Our experiments show that embeddings learned from information within a single visit (time window zero) improve performance on the concept similarity measure, and that FastText usually outperformed the other two algorithms. For the predictive modeling task, the best result was achieved by word2vec embeddings with a 30-day sliding window. CONCLUSIONS: Considering time constraints is important when training clinical concept embeddings. We expect time-sensitive embeddings to benefit a range of downstream applications.
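The core idea described in METHODS, restricting a concept's training context to co-occurring events that fall within a fixed time window, can be illustrated with a minimal sketch. This is not the authors' implementation; the function name `time_window_pairs`, the `(day, code)` event representation, and the toy concept codes are all assumptions introduced for illustration:

```python
from itertools import combinations

def time_window_pairs(events, window_days):
    """Generate symmetric (target, context) concept pairs for embedding
    training, keeping only pairs of events whose timestamps (in days)
    differ by at most `window_days`. With window_days=0, only same-day
    (i.e., same-visit) co-occurrences are kept."""
    pairs = []
    for (day_i, code_i), (day_j, code_j) in combinations(events, 2):
        if abs(day_i - day_j) <= window_days:
            pairs.append((code_i, code_j))
            pairs.append((code_j, code_i))  # skip-gram pairs are symmetric
    return pairs

# A toy longitudinal record: (day offset, concept code)
record = [(0, "E11.9"), (0, "metformin"), (45, "I10"), (46, "lisinopril")]

print(time_window_pairs(record, 0))   # same-day pairs only
print(time_window_pairs(record, 30))  # adds the day-45/day-46 pair
```

The resulting pairs would then feed a standard embedding learner (e.g., word2vec's skip-gram objective), so the time constraint changes only which contexts are counted, not the learning algorithm itself.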
Keywords:
Clinical concept embedding; Concept similarity; Distributional representation; Electronic medical records; Predictive modeling; Time sensitive concept embedding