Warning: Undefined array key "mm" in /www/wwwroot/www.ai-bt.com/si.php on line 10 Deprecated: trim(): Passing null to parameter #1 ($string) of type string is deprecated in /www/wwwroot/www.ai-bt.com/si.php on line 10 Patient Representation From Structured Electronic Medical Records Based on Embedding Technique: Development and Validation Study.

Literature DB >> 34297000

Patient Representation From Structured Electronic Medical Records Based on Embedding Technique: Development and Validation Study.

Yanqun Huang^1,2, Ni Wang^1,2, Zhiqiang Zhang^1,2, Honglei Liu^1,2, Xiaolu Fei³, Lan Wei³, Hui Chen^1,2.

Abstract

BACKGROUND: The secondary use of structured electronic medical record (sEMR) data has become a challenge due to the diversity, sparsity, and high dimensionality of the data representation. Constructing an effective representation for sEMR data is becoming more and more crucial for subsequent data applications.
OBJECTIVE: We aimed to apply the embedding technique used in the natural language processing domain for the sEMR data representation and to explore the feasibility and superiority of the embedding-based feature and patient representations in clinical application.
METHODS: The entire training corpus consisted of records of 104,752 hospitalized patients with 13,757 medical concepts of disease diagnoses, physical examinations and procedures, laboratory tests, medications, etc. Each medical concept was embedded into a 200-dimensional real number vector using the Skip-gram algorithm with some adaptive changes from shuffling the medical concepts in a record 20 times. The average of vectors for all medical concepts in a patient record represented the patient. For embedding-based feature representation evaluation, we used the cosine similarities among the medical concept vectors to capture the latent clinical associations among the medical concepts. We further conducted a clustering analysis on stroke patients to evaluate and compare the embedding-based patient representations. The Hopkins statistic, Silhouette index (SI), and Davies-Bouldin index were used for the unsupervised evaluation, and the precision, recall, and F1 score were used for the supervised evaluation.
RESULTS: The dimension of patient representation was reduced from 13,757 to 200 using the embedding-based representation. The average cosine similarity of the selected disease (subarachnoid hemorrhage) and its 15 clinically relevant medical concepts was 0.973. Stroke patients were clustered into two clusters with the highest SI (0.852). Clustering analyses conducted on patients with the embedding representations showed higher applicability (Hopkins statistic 0.931), higher aggregation (SI 0.862), and lower dispersion (Davies-Bouldin index 0.551) than those conducted on patients with reference representation methods. The clustering solutions for patients with the embedding-based representation achieved the highest F1 scores of 0.944 and 0.717 for two clusters.
CONCLUSIONS: The feature-level embedding-based representations can reflect the potential clinical associations among medical concepts effectively. The patient-level embedding-based representation is easy to use as continuous input to standard machine learning algorithms and can bring performance improvements. It is expected that the embedding-based representation will be helpful in a wide range of secondary uses of sEMR data. ©Yanqun Huang, Ni Wang, Zhiqiang Zhang, Honglei Liu, Xiaolu Fei, Lan Wei, Hui Chen. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 23.07.2021.

Entities: Chemical Disease Gene Species

Keywords: Skip-gram; electronic medical records; feature representation; patient representation; stroke

Year: 2021 PMID： 34297000 DOI： 10.2196/19905

Source DB: PubMed Journal: JMIR Med Inform

1 in total

1. Improving the Performance of Outcome Prediction for Inpatients With Acute Myocardial Infarction Based on Embedding Representation Learned From Electronic Medical Records: Development and Validation Study.

Authors: Yanqun Huang; Zhimin Zheng; Moxuan Ma; Xin Xin; Honglei Liu; Xiaolu Fei; Lan Wei; Hui Chen
Journal: J Med Internet Res Date: 2022-08-03 Impact factor: 7.076

1 in total