Literature DB >> 34297000

Patient Representation From Structured Electronic Medical Records Based on Embedding Technique: Development and Validation Study.

Yanqun Huang1,2, Ni Wang1,2, Zhiqiang Zhang1,2, Honglei Liu1,2, Xiaolu Fei3, Lan Wei3, Hui Chen1,2.   

Abstract

BACKGROUND: The secondary use of structured electronic medical record (sEMR) data has become a challenge due to the diversity, sparsity, and high dimensionality of the data representation. Constructing an effective representation for sEMR data is becoming more and more crucial for subsequent data applications.
OBJECTIVE: We aimed to apply the embedding technique used in the natural language processing domain for the sEMR data representation and to explore the feasibility and superiority of the embedding-based feature and patient representations in clinical application.
METHODS: The entire training corpus consisted of records of 104,752 hospitalized patients with 13,757 medical concepts of disease diagnoses, physical examinations and procedures, laboratory tests, medications, etc. Each medical concept was embedded into a 200-dimensional real number vector using the Skip-gram algorithm with some adaptive changes from shuffling the medical concepts in a record 20 times. The average of vectors for all medical concepts in a patient record represented the patient. For embedding-based feature representation evaluation, we used the cosine similarities among the medical concept vectors to capture the latent clinical associations among the medical concepts. We further conducted a clustering analysis on stroke patients to evaluate and compare the embedding-based patient representations. The Hopkins statistic, Silhouette index (SI), and Davies-Bouldin index were used for the unsupervised evaluation, and the precision, recall, and F1 score were used for the supervised evaluation.
RESULTS: The dimension of patient representation was reduced from 13,757 to 200 using the embedding-based representation. The average cosine similarity of the selected disease (subarachnoid hemorrhage) and its 15 clinically relevant medical concepts was 0.973. Stroke patients were clustered into two clusters with the highest SI (0.852). Clustering analyses conducted on patients with the embedding representations showed higher applicability (Hopkins statistic 0.931), higher aggregation (SI 0.862), and lower dispersion (Davies-Bouldin index 0.551) than those conducted on patients with reference representation methods. The clustering solutions for patients with the embedding-based representation achieved the highest F1 scores of 0.944 and 0.717 for two clusters.
CONCLUSIONS: The feature-level embedding-based representations can reflect the potential clinical associations among medical concepts effectively. The patient-level embedding-based representation is easy to use as continuous input to standard machine learning algorithms and can bring performance improvements. It is expected that the embedding-based representation will be helpful in a wide range of secondary uses of sEMR data. ©Yanqun Huang, Ni Wang, Zhiqiang Zhang, Honglei Liu, Xiaolu Fei, Lan Wei, Hui Chen. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 23.07.2021.

Entities:  

Keywords:  Skip-gram; electronic medical records; feature representation; patient representation; stroke

Year:  2021        PMID: 34297000     DOI: 10.2196/19905

Source DB:  PubMed          Journal:  JMIR Med Inform


  1 in total

1.  Improving the Performance of Outcome Prediction for Inpatients With Acute Myocardial Infarction Based on Embedding Representation Learned From Electronic Medical Records: Development and Validation Study.

Authors:  Yanqun Huang; Zhimin Zheng; Moxuan Ma; Xin Xin; Honglei Liu; Xiaolu Fei; Lan Wei; Hui Chen
Journal:  J Med Internet Res       Date:  2022-08-03       Impact factor: 7.076

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.