
Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction.

Laila Rasmy, Yang Xiang, Ziqian Xie, Cui Tao, Degui Zhi.

Abstract

Deep learning (DL)-based predictive models from electronic health records (EHRs) deliver impressive performance in many clinical tasks. Large training cohorts, however, are often required by these models to achieve high accuracy, hindering the adoption of DL-based models in scenarios with limited training data. Recently, bidirectional encoder representations from transformers (BERT) and related models have achieved tremendous success in the natural language processing domain. The pretraining of BERT on a very large training corpus generates contextualized embeddings that can boost the performance of models trained on smaller datasets. Inspired by BERT, we propose Med-BERT, which adapts the BERT framework originally developed for the text domain to the structured EHR domain. Med-BERT is a contextualized embedding model pretrained on a structured EHR dataset of 28,490,650 patients. Fine-tuning experiments showed that Med-BERT substantially improves the prediction accuracy, boosting the area under the receiver operating characteristic curve (AUC) by 1.21-6.14% in two disease prediction tasks from two clinical databases. In particular, pretrained Med-BERT obtains promising performance on tasks with small fine-tuning training sets and can boost the AUC by more than 20% or obtain an AUC as high as a model trained on a training set ten times larger, compared with deep learning models without Med-BERT. We believe that Med-BERT will benefit disease prediction studies with small local training datasets, reduce data collection expenses, and accelerate the pace of artificial intelligence-aided healthcare.
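The core adaptation the abstract describes is mapping structured EHR data onto BERT-style inputs: each patient's ordered visits, each a set of diagnosis codes, become one token sequence (with a visit index playing the role of BERT's segment embedding), and pretraining masks codes for a masked-language-model-style objective. A minimal, hypothetical sketch of that serialization and masking step (not the authors' released code; function names and the masking rate are illustrative, and the ICD-10 codes are examples only):

```python
import random

MASK_TOKEN = "[MASK]"

def serialize_patient(visits):
    """Flatten a list of visits (each a list of diagnosis codes) into
    (code, visit_index, position) triples, the analogue of BERT's
    token / segment / position inputs."""
    tokens = []
    pos = 0
    for visit_idx, visit in enumerate(visits):
        for code in visit:
            tokens.append((code, visit_idx, pos))
            pos += 1
    return tokens

def mask_codes(tokens, mask_prob=0.15, rng=None):
    """Randomly replace codes with MASK_TOKEN for masked-code pretraining;
    return the masked sequence and (position, original_code) labels that
    the prediction head must recover."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility here
    masked, labels = [], []
    for code, visit_idx, pos in tokens:
        if rng.random() < mask_prob:
            masked.append((MASK_TOKEN, visit_idx, pos))
            labels.append((pos, code))
        else:
            masked.append((code, visit_idx, pos))
    return masked, labels

# Example: three visits for one patient (ICD-10 codes, illustrative).
patient = [["E11.9", "I10"], ["I10", "N18.3"], ["E11.9"]]
tokens = serialize_patient(patient)
masked, labels = mask_codes(tokens, mask_prob=0.3)
```

After pretraining a transformer encoder on millions of such sequences, the fine-tuning step the abstract reports would attach a small classification head to the encoder output and train it on the (much smaller) disease-prediction cohort.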


Year:  2021        PMID: 34017034     DOI: 10.1038/s41746-021-00455-y

Source DB:  PubMed          Journal:  NPJ Digit Med        ISSN: 2398-6352


References:  2 in total

1.  Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data.

Authors:  Andrew L Beam; Benjamin Kompa; Allen Schmaltz; Inbar Fried; Griffin Weber; Nathan Palmer; Xu Shi; Tianxi Cai; Isaac S Kohane
Journal:  Pac Symp Biocomput       Date:  2020

2.  BioBERT: a pre-trained biomedical language representation model for biomedical text mining.

Authors:  Jinhyuk Lee; Wonjin Yoon; Sungdong Kim; Donghyeon Kim; Sunkyu Kim; Chan Ho So; Jaewoo Kang
Journal:  Bioinformatics       Date:  2020-02-15       Impact factor: 6.937

Citations:  25 in total

Review 1.  Shifting machine learning for healthcare from development to deployment and from models to data.

Authors:  Angela Zhang; Lei Xing; James Zou; Joseph C Wu
Journal:  Nat Biomed Eng       Date:  2022-07-04       Impact factor: 25.671

2.  Analyzing transfer learning impact in biomedical cross-lingual named entity recognition and normalization.

Authors:  Renzo M Rivera-Zavala; Paloma Martínez
Journal:  BMC Bioinformatics       Date:  2021-12-17       Impact factor: 3.169

3.  Word embeddings trained on published case reports are lightweight, effective for clinical tasks, and free of protected health information.

Authors:  Zachary N Flamholz; Andrew Crane-Droesch; Lyle H Ungar; Gary E Weissman
Journal:  J Biomed Inform       Date:  2021-12-14       Impact factor: 6.317

Review 4.  Advances in Machine Learning Approaches to Heart Failure with Preserved Ejection Fraction.

Authors:  Faraz S Ahmad; Yuan Luo; Ramsey M Wehbe; James D Thomas; Sanjiv J Shah
Journal:  Heart Fail Clin       Date:  2022-03-04       Impact factor: 3.179

Review 5.  Overview of Noninterpretive Artificial Intelligence Models for Safety, Quality, Workflow, and Education Applications in Radiology Practice.

Authors:  Yasasvi Tadavarthi; Valeria Makeeva; William Wagstaff; Henry Zhan; Anna Podlasek; Neil Bhatia; Marta Heilbrun; Elizabeth Krupinski; Nabile Safdar; Imon Banerjee; Judy Gichoya; Hari Trivedi
Journal:  Radiol Artif Intell       Date:  2022-02-02

6.  Identification of Uncontrolled Symptoms in Cancer Patients Using Natural Language Processing.

Authors:  Lisa DiMartino; Thomas Miano; Kathryn Wessell; Buck Bohac; Laura C Hanson
Journal:  J Pain Symptom Manage       Date:  2021-11-04       Impact factor: 3.612

7.  AdaDiag: Adversarial Domain Adaptation of Diagnostic Prediction with Clinical Event Sequences.

Authors:  Tianran Zhang; Muhao Chen; Alex A T Bui
Journal:  J Biomed Inform       Date:  2022-08-17       Impact factor: 8.000

8.  Using machine learning to predict subsequent events after EMS non-conveyance decisions.

Authors:  Jani Paulin; Akseli Reunamo; Jouni Kurola; Hans Moen; Sanna Salanterä; Heikki Riihimäki; Tero Vesanen; Mari Koivisto; Timo Iirola
Journal:  BMC Med Inform Decis Mak       Date:  2022-06-23       Impact factor: 3.298

9.  Preparing for the next pandemic via transfer learning from existing diseases with hierarchical multi-modal BERT: a study on COVID-19 outcome prediction.

Authors:  Khushbu Agarwal; Sutanay Choudhury; Sindhu Tipirneni; Pritam Mukherjee; Colby Ham; Suzanne Tamang; Matthew Baker; Siyi Tang; Veysel Kocaman; Olivier Gevaert; Robert Rallo; Chandan K Reddy
Journal:  Sci Rep       Date:  2022-06-24       Impact factor: 4.996

10.  Natural Language Processing Enhances Prediction of Functional Outcome After Acute Ischemic Stroke.

Authors:  Sheng-Feng Sung; Chih-Hao Chen; Ru-Chiou Pan; Ya-Han Hu; Jiann-Shing Jeng
Journal:  J Am Heart Assoc       Date:  2021-11-19       Impact factor: 6.106


Beijing Coyote Bioscience Co., Ltd. © 2022-2023.