
Disease Concept-Embedding Based on the Self-Supervised Method for Medical Information Extraction from Electronic Health Records and Disease Retrieval: Algorithm Development and Validation Study.

Yen-Pin Chen, Yuan-Hsun Lo, Feipei Lai, Chien-Hua Huang.

Abstract

BACKGROUND: The electronic health record (EHR) contains a wealth of medical information. An organized EHR can greatly help doctors treat patients. In some cases, only limited patient information is collected to help doctors make treatment decisions. Because EHRs can serve as a reference for this limited information, doctors' treatment capabilities can be enhanced. Natural language processing and deep learning methods can help organize and translate EHR information into medical knowledge and experience.
OBJECTIVE: In this study, we aimed to create a model to extract concept embeddings from EHRs for disease pattern retrieval and further classification tasks.
METHODS: We collected 1,040,989 emergency department visits from the National Taiwan University Hospital Integrated Medical Database and 305,897 samples from the National Hospital and Ambulatory Medical Care Survey Emergency Department data. After data cleansing and preprocessing, the data sets were divided into training, validation, and test sets. We proposed a Transformer-based model to embed EHRs. Bidirectional Encoder Representations from Transformers (BERT) was used to extract features from free text, and these features were concatenated with structural data as input to the proposed model. Then, Deep InfoMax (DIM) and Simple Contrastive Learning of Visual Representations (SimCLR) were used for the unsupervised embedding of the disease concept. The pretrained disease concept-embedding model, named EDisease, was further finetuned to adapt to the critical care outcome prediction task. We evaluated the embedding by using t-distributed stochastic neighbor embedding (t-SNE) to perform dimension reduction for visualization. The performance of the finetuned predictive model was evaluated against published models using the area under the receiver operating characteristic curve (AUROC).
RESULTS: On the outcome prediction task, our model achieved the highest AUROC, 0.876. In the ablation study, using a smaller data set or fewer unsupervised methods for pretraining degraded prediction performance. The AUROCs were 0.857, 0.870, and 0.868 for the model without pretraining, the model pretrained by only SimCLR, and the model pretrained by only DIM, respectively. On the smaller finetuning set, the AUROC was 0.815 for the proposed model.
CONCLUSIONS: Through contrastive learning methods, disease concepts can be embedded meaningfully. Moreover, these methods can be used for disease retrieval tasks to enhance clinical practice capabilities. The disease concept model is also suitable as a pretrained model for subsequent prediction tasks.
© Yen-Pin Chen, Yuan-Hsun Lo, Feipei Lai, Chien-Hua Huang. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 27.01.2021.
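
The abstract names SimCLR as one of the two self-supervised objectives but does not describe its loss. As an illustration only, the following is a minimal sketch of the normalized temperature-scaled cross-entropy (NT-Xent) loss generally associated with SimCLR, written in plain Python. The function names, the temperature of 0.5, and the toy two-dimensional vectors are assumptions made for this example and are not taken from the EDisease implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as sequences of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """Mean NT-Xent loss over a batch of paired embeddings.

    view_a[i] and view_b[i] are assumed to be two views of the same record
    (the positive pair); every other embedding in the batch is a negative.
    """
    n = len(view_a)
    z = list(view_a) + list(view_b)  # 2n embeddings in one pool
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of i
        # Similarities of i to every other embedding (self excluded).
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        positive = cosine(z[i], z[j]) / temperature
        # Numerically stable log-sum-exp for the denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - positive
    return total / (2 * n)


# Toy example (hypothetical vectors): aligned views versus swapped views.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]
swapped = [[0.1, 0.9], [0.9, 0.1]]
loss_aligned = nt_xent_loss(anchors, aligned)
loss_swapped = nt_xent_loss(anchors, swapped)
```

In this toy batch the loss is lower when each anchor is paired with its similar view than when the pairs are swapped, which is the behavior a contrastive objective is meant to reward.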

Keywords:  EHR; NLP; concept; deep learning; disease embedding; disease retrieval; electronic health record; emergency department; extraction; machine learning; natural language processing

Year:  2021        PMID: 33502324      PMCID: PMC7875703          DOI: 10.2196/25113

Source DB:  PubMed          Journal:  J Med Internet Res        ISSN: 1438-8871            Impact factor:   5.428

