Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review.

Cao Xiao, Edward Choi, Jimeng Sun.

Abstract

Objective: To conduct a systematic review of deep learning models for electronic health record (EHR) data, and illustrate various deep learning architectures for analyzing different data sources and their target applications. We also highlight ongoing research and identify open challenges in building deep learning models of EHRs. Design/method: We searched PubMed and Google Scholar for papers on deep learning studies using EHR data published between January 1, 2010, and January 31, 2018. We summarize them according to these axes: types of analytics tasks, types of deep learning model architectures, special challenges arising from health data and tasks and their potential solutions, as well as evaluation strategies.
Results: We surveyed and analyzed multiple aspects of the 98 articles we found and identified the following analytics tasks: disease detection/classification, sequential prediction of clinical events, concept embedding, data augmentation, and EHR data privacy. We then studied how deep architectures were applied to these tasks. We also discussed some special challenges arising from modeling EHR data and reviewed a few popular approaches. Finally, we summarized how performance evaluations were conducted for each task. Discussion: Despite the early success in using deep learning for health analytics applications, there still exist a number of issues to be addressed. We discuss them in detail including data and label availability, the interpretability and transparency of the model, and ease of deployment.

Year:  2018        PMID: 29893864      PMCID: PMC6188527          DOI: 10.1093/jamia/ocy068

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


INTRODUCTION

Electronic health record (EHR) data from millions of patients are now routinely collected across diverse healthcare institutions. They consist of heterogeneous data elements, including patient demographic information, diagnoses, laboratory test results, medication prescriptions, clinical notes, and medical images. However, it is challenging to create accurate analytic models from EHR data because of data quality, data and label availability, and heterogeneity of data types. Traditional health analytics modeling often depends on labor-intensive efforts, such as expert-defined phenotyping and ad hoc feature engineering. The resulting models often have limited generalizability across datasets or institutions. Deep learning has had a profound impact on many data analytic applications, such as speech recognition, image classification, computer vision, and natural language processing. It has changed the data analytic modeling paradigm from expert-driven feature engineering to data-driven feature construction. Over the past few years, an increasing body of literature has confirmed the success of feature construction using deep learning methods (ie, models with multiple layers of neural networks). Interest in deep learning for healthcare has grown for two reasons. First, for healthcare researchers, deep learning models yield better performance in many tasks than traditional machine learning methods and require less manual feature engineering. Second, large and complex datasets (eg, longitudinal event sequences and continuous monitoring data) are available in healthcare and enable training of complex deep learning models. However, EHR data also introduce many interesting modeling challenges for deep learning research. This review summarizes the recent development of deep learning models for EHR data and suggests future research directions.

METHOD

Literature selection

We conducted a systematic review of deep learning studies using EHR [or electronic medical records (EMR)] data from PubMed and Google Scholar. The combined search includes, but is not limited to, Journal of the American Medical Informatics Association (JAMIA), Journal of Biomedical Informatics (JBI), Nature Scientific Reports, PLoS One, ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Neural Information Processing Systems (NIPS), and the Machine Learning for Health Care (MLHC) conference. We searched using combinations of the keywords “deep learning,” “neural networks,” “EHR,” “EMR,” and “health.” We limited our search to papers published between January 1, 2010, and January 30, 2018, and found a total of 361 articles. We filtered the initial result set in three steps. First, we removed duplicate articles based on titles and authors. After deduplication, 290 articles remained. Second, we conducted a topic relevance review of these articles by examining titles and abstracts. Because we focus on deep learning models that use EHR data, we excluded works that did not use deep learning approaches or did not use EHR data. We included a small number of articles related to medical imaging or genetic data if such data were used in combination with EHR data. For example, deep learning for image classification in healthcare and for predicting the effects of gene expression mutations, when applied without EHR data, is out of the scope of this review. Readers who are interested in those topics could refer to surveys dedicated to them. The topic relevance review based on titles and abstracts left 182 articles (159 studies about traditional EHR data, and 23 studies that use medical images and genetic data in addition to EHR data). In the third step, we read the full text of the remaining articles, using the same inclusion criteria to confirm their final relevance.
This left 98 articles to be included in this survey. The literature selection procedure is illustrated and described in Figure 1.
Figure 1. Illustration of literature search and selection procedure.

Assessment focuses

We summarize the basic information of the selected papers in Supplementary Table S1. For each paper, we evaluated three aspects: 1) the category of the venue (eg, medical, informatics, or computer science journal or conference), 2) use of EHR data, and 3) target task, model, and performance. For the use of EHR data, we assessed the sample size, number of clinical events, the existence of labels (ie, the availability of gold standard targets of interest, such as mortality and target disease diagnosis), use of longitudinal or temporal information, and handling of data quality (eg, missing or irregularly sampled data). We divided target tasks into the following categories: disease detection, sequential prediction of clinical events, concept embedding, data augmentation, and EHR data privacy. Finally, we identified the type of deep learning models used in the articles [eg, recurrent neural networks (RNN) or convolutional neural networks (CNN)] and the corresponding performance results [eg, area under the receiver operating characteristic curve (AUC) = 0.8]. We grouped the modeling challenges identified in the reviewed articles into four categories, together with the solutions proposed by existing work. Likewise, we identified several open challenges that could become promising directions for future research. We present the challenges and solutions for each article in Supplementary Table S2.

Task categories

After reviewing the selected articles, we identified five categories of analytics tasks. Disease detection/classification refers to the task of detecting whether specific diseases can be confirmed in the EHR data. Sequential prediction of clinical events refers to predicting future clinical events based on past longitudinal event sequences. Concept embedding refers to algorithmically deriving feature representations of clinical concepts or phenotypes from EHR data. Data augmentation refers to creating realistic data elements or patient records based on real EHR data. EHR data privacy refers to techniques that protect patient EHR privacy and confidentiality, eg, de-identification. The chosen analytics tasks balance the following priorities: 1) they are supported by the EHR data, 2) they correspond to diverse machine learning problems, and 3) they are motivated by important clinical problems, such as phenotyping complex diseases, prediction of disease onset, and readmission.

RESULTS

We included 98 articles for full-text review. Of these, two studies were published in medical journals, 40 in medical informatics venues, and 56 in computer science venues. While detailed information for all papers is provided in Supplementary Table S1, a brief summary is provided here. The summary is structured as follows. First, we describe the analytics tasks and the associated EHR data. Second, we examine the tasks for several commonly used deep learning architectures. Third, we discuss special challenges arising from modeling EHR data with deep learning, and present the approaches used in the reviewed articles. Last, we discuss the evaluation of these tasks.

Analytics tasks using EHR data

Disease classification

The goal of developing a deep learning model for disease classification is to map the input EHR data to the output disease target via multiple layers of neural networks. Of the surveyed articles, some used disease-specific datasets. Examples include the Pooled Resource Open-Access Amyotrophic Lateral Sclerosis (ALS) Clinical Trials data and the Parkinson’s Progression Markers Initiative data. Some studies include data from multiple modalities (eg, cognitive assessments, vital signs, medical images) and support both binary classification (eg, onset of disease) and multi-class classification (eg, classification of stages of Parkinson’s disease). Besides disease-specific multimodal data, some studies used multivariate time series data. For instance, one study applied convolutional neural networks to multivariate electroencephalogram (EEG) signals for automated classification of normal, preictal, and seizure subjects. In another, a long short-term memory (LSTM) model was developed using vital sign series from the Medical Information Mart for Intensive Care III (MIMIC III) for sepsis detection. Automatic coding of clinical notes according to diagnosis or disease codes is another type of classification task, in this case multilabel. In one study, clinical documents from the MIMIC III dataset were automatically tagged with related diagnosis codes using a hierarchical attention bidirectional gated recurrent unit (GRU) model. In another, an interpretable model based on a convolution-plus-attention architecture was introduced to explain the classification from clinical notes to diagnosis codes. In two further studies, deep feedforward neural networks and convolutional neural networks were applied, respectively, to free-text pathology reports to automate the extraction of primary cancer sites and their laterality.

Sequential prediction of clinical events

When modeling longitudinal EHR data, neural networks were used to establish relationships between historical observations and future events. In such cases, one can build predictive models of future events (eg, clinical outcomes such as mortality) based on a patient’s history. Among the reviewed articles, some predicted the future onset of a new disease condition, such as heart failure (HF) onset prediction using an RNN on longitudinal outpatient data from Sutter Health. In one study, using a cohort of 1 328 384 patients (3 295 775 visits) from the New Zealand National Minimum Dataset, a deep feedforward neural network was shown to have the best AUC performance (AUC = 0.734) in predicting the next hospital admission. In another, the authors used 114 003 patient records from the University of California, San Francisco (UCSF), from 2012 to 2016, and the University of Chicago Medicine (UCM), from 2009 to 2016, for prediction tasks. They tried three deep learning models: one based on recurrent neural networks, one on an attention-based time-aware neural network model, and one on a neural network with boosted time-based decision stumps. They found that deep learning methods could accurately predict multiple medical events (eg, in-hospital mortality, readmission, length of stay, and discharge diagnoses) from multiple centers without site-specific data harmonization. In addition, many articles performed multilabel sequential prediction of clinical events using EHR data from a large number of patients. Multilabel prediction means that each patient can have multiple target labels that co-occur at the same visit (eg, multiple diagnoses in one visit). For instance, in one study, encounter records (eg, diagnosis codes, medication codes, or procedure codes) of 263 706 patients from Sutter Health were used as input to an RNN model to predict all the diagnosis categories for a subsequent visit.
Besides predicting disease diagnoses or hospital admissions, several studies formulated medication prescription as a sequential prediction problem. For instance, in one study, 610 076 patient records from Vanderbilt’s Electronic Medical Record were used to perform sequential prediction of medications. A later study used 50 206 medical encounter records from MIMIC III and 2 415 414 medical encounters from Sutter Health to provide treatment recommendations, using a sequence-to-sequence model to represent the relationship between comorbid conditions and a set of medications.
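
The multilabel next-visit setup described above can be illustrated with a minimal sketch. It assumes that each visit is a list of code strings and that the target for a visit is the multi-hot vector of the following visit; the function names (`build_vocab`, `multi_hot`, `next_visit_pairs`) and the toy codes are hypothetical and are not taken from any reviewed article. In the reviewed RNN models, earlier visits would also reach the prediction through the hidden state rather than only the immediately preceding visit.

```python
from typing import Dict, List, Tuple

Visit = List[str]  # one visit = list of medical codes recorded at that encounter


def build_vocab(patients: List[List[Visit]]) -> Dict[str, int]:
    """Give every distinct code an index, in sorted order for reproducibility."""
    codes = sorted({code for visits in patients for visit in visits for code in visit})
    return {code: i for i, code in enumerate(codes)}


def multi_hot(visit: Visit, vocab: Dict[str, int]) -> List[int]:
    """Encode one visit as a 0/1 vector with one slot per code in the vocabulary."""
    vec = [0] * len(vocab)
    for code in visit:
        vec[vocab[code]] = 1
    return vec


def next_visit_pairs(visits: List[Visit], vocab: Dict[str, int]) -> List[Tuple[List[int], List[int]]]:
    """Pair each visit (input) with the visit that follows it (multilabel target)."""
    encoded = [multi_hot(v, vocab) for v in visits]
    return list(zip(encoded[:-1], encoded[1:]))


# Toy patient with three visits; the code strings are illustrative only.
patient = [["I10", "E11"], ["I10"], ["I50", "E11"]]
vocab = build_vocab([patient])
pairs = next_visit_pairs(patient, vocab)
```

Because a target vector may contain several ones, such a task is usually scored per label rather than as a single-class decision.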

Concept embedding

It is noteworthy that clinical phenotyping is a special case of concept embedding in which various EHR data elements are mapped to the phenotype of interest. General concept embedding, such as med2vec, also provides a feature representation of those phenotypes (ie, a vector associated with each phenotype). For concept embedding tasks, deep learning models are often trained in an unsupervised setting without target labels. To ensure good generalization power, these tasks often leverage massive EHR databases. For example, the aggregated EHRs of about 700 000 patients from the Mount Sinai data warehouse were used to extract patient representations (embeddings). The resulting concept embedding was evaluated via disease prediction tasks and compared against well-known shallow feature learning algorithms, such as principal component analysis, k-means clustering, and the Gaussian mixture model. Results showed that disease prediction based on concept embedding outperformed prediction based on the other feature learning strategies. In another study, concept embedding was learned from the data of 550 339 patients from Children’s Healthcare of Atlanta (CHOA) and demonstrated improved performance in multiple real-world prediction problems. Other types of concept embedding take only free text as input, eg, to extract predefined medical concepts from MIMIC III discharge summaries and use them to predict patient phenotypes. However, deep learning models do not always outperform traditional models: one study compared deep models with shallow models (eg, random forest) on classification tasks over clinical notes and found that when the training sample size is small (eg, 662 total subjects in that case), deep learning shows inferior performance.

Data augmentation

Data augmentation includes various data synthesis and generation techniques that create either more training data to avoid overfitting or more labeled data to reduce the cost of label acquisition, or even generate adverse drug reaction trajectories to inform potential risks. For example, one study included patients from the Columbia University Irving Medical Center/New York Presbyterian database who were exposed to HMG-CoA reductase inhibitors (statins) at any point in time. Their total cholesterol measurements were collected and augmented using generative adversarial networks (GANs). The generated records were evaluated on the task of predicting drug-induced laboratory test trajectories and demonstrated good performance. In another study, a GAN was used to generate static patient records of discrete events such as diagnosis counts. The synthetic data achieved performance comparable to real data in many experiments, including distribution statistics, predictive modeling tasks, and medical expert review.

EHR data privacy

De-identification is a crucial task in preserving the privacy of patient EHR data. Dernoncourt et al. built an RNN-based de-identification system, evaluated it using i2b2 2014 data (1304 notes with a 46 803 word vocabulary) and MIMIC de-identification data (1635 notes with a 69 525 word vocabulary), and showed better performance than existing systems. In later work, an RNN hybrid model was developed for clinical note de-identification, in which a bidirectional LSTM model was used for character-level representation to capture the morphological information of words.

Deep learning architectures for analytics tasks

Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. This has dramatically improved machine learning performance in many domains, such as computer vision, natural language processing, and speech recognition, and has also demonstrated great performance in healthcare and medical domains, such as using deep neural networks to detect referable diabetic retinopathy. Various deep learning architectures besides fully connected neural networks were used to tackle different challenges as elaborated below. Figure 2 illustrates commonly used deep architectures. Table 1 shows the architecture distribution over all tasks.
Figure 2. Longitudinal EHR data are transformed into input vectors (top left), which can support the different analytics tasks described in the survey (top right). The underlying deep learning models are visually described at the bottom. (a): Feedforward neural networks use multiple layers of fully connected neural networks and non-linear activations (eg, sigmoid or rectified linear unit). (b): Recurrent neural networks can process variable-length input sequences using their recurrent connections. (c): Restricted Boltzmann Machines are bipartite neural networks that consist of binary stochastic nodes. They can capture the latent representation of the input data by learning their generative probability. (d): Generative adversarial networks can generate realistic synthetic samples by training the generator and the discriminator in an adversarial game. (e): Convolutional neural networks capture local features of the input data, and stack those features up via a sequence of convolutions to derive global features. (f): Word2vec exploits the co-occurrence information of discrete concepts (eg, words in text, codes in EHR data) to derive concept representations. (g): Denoising autoencoders (AE) try to reconstruct the original input from its corrupted version, thus learning robust representations of the input data.

Table 1. Distributions of models over analytic tasks

| Model | Disease Detection or Classification | Sequential Prediction of Clinical Events | Concept Embedding | Data Augmentation | EHR Privacy |
| --- | --- | --- | --- | --- | --- |
| RNN and its variants | [13, 20, 41–53] | [23, 26–28, 42, 48–50, 54–56, 57, 45, 58–62, 41, 25, 45] | [11, 14, 63–66] | [67] | [36, 37] |
| CNN and its variants | [12, 15, 20, 68, 51, 69, 70] | [71, 72, 57, 73] | [31, 74, 22, 75–77] | NA | NA |
| AE and its variants | [78–81] | NA | [10, 30, 63, 82–87, 11, 30, 88] | [53, 89, 90, 86] | NA |
| Unsupervised embedding | [91–93] | [21, 24, 70, 91, 94] | [29, 32, 95, 96, 85, 97] | NA | [36] |
| GANs | NA | [35] | NA | [33, 35, 98, 89, 56, 98] | [34] |

Recurrent neural networks (RNNs)

RNNs are an extension of feedforward neural networks for modeling sequential data, such as time series, event sequences, and natural language text. In particular, the recurrent structure of RNNs can capture the complex temporal dynamics in longitudinal EHR data, making them the preferred architecture for several EHR modeling tasks, including sequential clinical event prediction, disease classification, and computational phenotyping. The hidden states of the RNN work as its memory, since the current state of the hidden layer depends on the previous state of the hidden layer and the input at the current time. This also enables the RNN to handle variable-length sequence input. Two prominent RNN variants with gating mechanisms are widely used: the LSTM unit and the GRU. They are designed to overcome the vanishing gradient problem and to capture the effect of long-term dependencies.
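
To make the recurrence concrete, the following minimal sketch implements a plain (ungated) RNN step with Python lists. The tanh activation, the weight names `W_x`, `W_h`, `b`, and the toy values are assumptions chosen for illustration; the reviewed articles mostly use the gated LSTM and GRU variants, which add gates on top of this basic update.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rnn_step(x: Vector, h_prev: Vector, W_x: Matrix, W_h: Matrix, b: Vector) -> Vector:
    """One recurrence: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    h = []
    for i in range(len(b)):
        total = b[i]
        total += sum(w * v for w, v in zip(W_x[i], x))       # contribution of the current input
        total += sum(w * v for w, v in zip(W_h[i], h_prev))  # contribution of the previous state
        h.append(math.tanh(total))
    return h


def run_rnn(sequence: List[Vector], W_x: Matrix, W_h: Matrix, b: Vector) -> Vector:
    """Fold a sequence of any length into a single fixed-size hidden state."""
    h = [0.0] * len(b)  # the initial memory is empty
    for x in sequence:
        h = rnn_step(x, h, W_x, W_h, b)
    return h


# Arbitrary toy weights: 3 input features, 2 hidden units.
W_x = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
W_h = [[0.1, 0.4], [-0.3, 0.2]]
b = [0.0, 0.1]

short_history = [[1.0, 0.0, 0.0]]
long_history = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
h_short = run_rnn(short_history, W_x, W_h, b)
h_long = run_rnn(long_history, W_x, W_h, b)
```

Both histories yield a hidden state of the same size, which is how the same set of weights can serve patients with different numbers of visits.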

Autoencoders (AEs)

AEs are unsupervised dimensionality reduction models based on non-linear transformations. For medical concept embedding (eg, embedding different medical codes in a common space), AEs are a preferred family of models. An AE [see Figure 2(g)] maps inputs to an internal code representation through an encoder, and then maps the low-dimensional representation back to the input space through a decoder. The composition of encoder and decoder is called the reconstruction function. A typical implementation of the AE minimizes the reconstruction loss, which lets the AE focus on capturing essential properties of the data while reducing the dimensionality. In one study, AEs were used to model EHRs in an unsupervised manner to capture stable structures and regular patterns in the data. Sparse AE (SAE) and denoising AE (DAE) are two AE variants. For SAE, the reconstruction loss is regularized via a sparsity penalty on the internal code representation, so that the model learns a sparse representation. SAE has often been used for unsupervised EHR phenotyping or sparse EEG feature representation. For DAE, the reconstruction is based on randomly corrupted inputs, through which the model gains robustness against missing data or noise. DAE has been used for learning robust representations of human physiology, deriving robust patient representations from EHRs, or extracting EHR phenotypes that can be paired with genetic data to identify disease-gene associations.
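
The denoising objective can be sketched as follows. This is a forward pass only, under stated assumptions: masking noise as the corruption, sigmoid encoder and decoder, and mean squared error as the reconstruction loss. The helper names and the toy weights are hypothetical, and no training loop is shown.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def corrupt(x: Vector, drop_prob: float, rng: random.Random) -> Vector:
    """Masking noise: zero each entry independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def affine(W: Matrix, b: Vector, v: Vector) -> Vector:
    """Compute W v + b with plain lists."""
    return [sum(w * a for w, a in zip(row, v)) + bias for row, bias in zip(W, b)]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def dae_loss(x: Vector, W_enc: Matrix, b_enc: Vector, W_dec: Matrix, b_dec: Vector,
             drop_prob: float, rng: random.Random) -> float:
    """Mean squared error between the CLEAN input and the reconstruction
    computed from the CORRUPTED input."""
    code = sigmoid(affine(W_enc, b_enc, corrupt(x, drop_prob, rng)))  # encoder
    recon = sigmoid(affine(W_dec, b_dec, code))                       # decoder
    return sum((a - r) ** 2 for a, r in zip(x, recon)) / len(x)


# Toy example: a 4-dimensional binary record compressed to a 2-dimensional code.
rng = random.Random(0)
x = [1.0, 0.0, 1.0, 1.0]
W_enc = [[0.2, -0.1, 0.4, 0.3], [-0.3, 0.5, 0.1, 0.2]]
b_enc = [0.0, 0.0]
W_dec = [[0.6, -0.4], [-0.2, 0.7], [0.5, 0.1], [0.3, 0.3]]
b_dec = [0.0, 0.0, 0.0, 0.0]
loss = dae_loss(x, W_enc, b_enc, W_dec, b_dec, drop_prob=0.25, rng=rng)
```

Setting `drop_prob` to zero recovers the ordinary AE loss. A sparse AE would instead add a penalty on `code` to the same reconstruction term.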

CNNs

In image, speech, and video analysis, CNNs exploit local properties of data (stationarity and compositionality through local statistics) and use convolutional and pooling layers to progressively extract abstract patterns. For example, CNNs greatly improved the performance of automatic classification of skin lesions from image data. CNNs work as follows: the convolutional layers apply multiple local filters to their input data (raw data or outputs of previous layers) and produce translation-invariant local features. Then, pooling layers progressively reduce the size of the output to avoid overfitting. Both convolution and pooling are performed locally, such that (in image analysis) the representation of one local feature does not influence other regions. Because temporal EHR information is often informative, modeling it with CNNs requires considering how to capture temporality. For example, in some studies, an additional convolutional operation was conducted over the temporal dimension. In another, a hybrid convolutional recurrent neural network was used for joint feature extraction and temporal summarization. Besides modeling images and event sequences, CNNs have been used to label clinical text.
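
As a rough illustration of convolution over the temporal dimension followed by pooling, the sketch below slides one filter along a sequence of per-visit feature vectors. The filter values, the "valid" (no padding) convention, and the function names are assumptions made for this example and do not reproduce any specific reviewed architecture.

```python
from typing import List

Vector = List[float]


def conv1d_over_time(seq: List[Vector], kernel: List[Vector]) -> List[float]:
    """Valid 1D convolution (cross-correlation) along the time axis.

    seq holds T feature vectors; kernel holds K weight vectors of the same width.
    The output has T - K + 1 entries, one per window position.
    """
    K = len(kernel)
    out = []
    for t in range(len(seq) - K + 1):
        total = 0.0
        for k in range(K):
            total += sum(w * v for w, v in zip(kernel[k], seq[t + k]))
        out.append(total)
    return out


def max_pool(values: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is kept."""
    return [max(values[i:i + size]) for i in range(0, len(values), size)]


# Five time steps with two binary features each (toy data).
seq = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0]]
# This filter responds to "feature 0 now, feature 1 at the next step".
kernel = [[1, 0], [0, 1]]
feature_map = conv1d_over_time(seq, kernel)
pooled = max_pool(feature_map, size=2)
```

The same filter is applied at every time position, so the pattern is detected regardless of when in the record it occurs.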

Unsupervised embedding

Several unsupervised learning methods besides AEs have been applied to EHR concept representations. Word2vec variants have been applied to learn representations for medical codes. In particular, word2vec has been extended to create a two-level representation for medical codes and clinical visits jointly. Word2vec has two variants: the continuous bag of words (CBOW), which predicts the target (code) given its surrounding context, and the skip-gram, which predicts the surrounding context given the target (code). The goal of these models is to embed terminologies from different domains into the same space to discover the relations among them (eg, relationships between diseases and drugs). In addition, the Restricted Boltzmann Machine (RBM) has been used for latent concept embedding. It takes a generative approach to model the underlying data generation process of the input, which can also provide latent representations for EHR data.
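
The difference between the two word2vec variants lies in the direction of prediction, which the following sketch shows by building the training examples for each. It assumes the codes are given as an ordered sequence with a fixed context window; the function names and the prefixed code strings are hypothetical. Codes within a single visit are often unordered in practice, in which case all other codes of the visit could serve as context.

```python
from typing import List, Tuple


def skipgram_pairs(codes: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: one (target, context) pair per neighbour within the window."""
    pairs = []
    for i, target in enumerate(codes):
        lo, hi = max(0, i - window), min(len(codes), i + window + 1)
        pairs.extend((target, codes[j]) for j in range(lo, hi) if j != i)
    return pairs


def cbow_examples(codes: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: one (context list, target) example per position."""
    examples = []
    for i, target in enumerate(codes):
        lo, hi = max(0, i - window), min(len(codes), i + window + 1)
        context = [codes[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples


# Toy sequence mixing a diagnosis, a medication, and a lab test.
visit_codes = ["dx:diabetes", "rx:metformin", "lab:hba1c"]
sg = skipgram_pairs(visit_codes, window=1)
cb = cbow_examples(visit_codes, window=1)
```

Because diagnoses, medications, and lab codes appear in each other's contexts, training on such examples places them in one shared embedding space.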

Generative adversarial network (GAN)

A GAN is an approach to data generation via a game-theoretical process. The main idea is to train two neural networks: a generator and a discriminator. The generator takes random noise as input and generates samples, while the discriminator takes both real samples and the generated samples as input and tries to distinguish between the two. The two networks are trained alternately, with the expectation that the competition will drive the generator to produce more realistic samples and the discriminator to achieve greater distinguishing power. Recently, GANs have been used in the healthcare domain for generating continuous medical time series and discrete codes.
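
The adversarial game is commonly written as a minimax objective. The form below is the standard formulation from the general GAN literature and is given only for orientation; the reviewed EHR studies may use variants of it. The symbols are notational assumptions: $G$ is the generator, $D$ the discriminator, $p_{\text{data}}$ the distribution of real samples, and $p_z$ the noise distribution.

```latex
\min_{G} \max_{D} \;
  \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr]
  + \mathbb{E}_{z \sim p_{z}}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]
```

The discriminator raises the objective by assigning high probability to real samples and low probability to generated ones, while the generator lowers it by producing samples that the discriminator accepts as real.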

Special challenges and possible solutions

Special challenges arise from EHR data (eg, temporality, irregularity, multiple modalities, lack of labels) and model characteristics (eg, interpretability). In this section, we elaborate on those challenges and describe possible solutions from the surveyed articles. The detailed summary can be found in Supplementary Table S2.

Temporality and irregularity

Longitudinal EHR data describe the trajectories of patients’ health conditions over time. The short-term dependencies among medical events in EHRs are considered local context for patient history, and the long-term effects provide global context. Such contexts affect the hidden relations among the clinical variables (eg, diagnoses, procedures, medications) and future patient health outcomes (ie, disease or readmission). However, it is challenging to identify the true signals from the long-term context due to the complex associations among the clinical events. In addition, some studies found that patient records vary significantly in terms of data density, since events are irregularly sampled. Such irregularity, if not properly handled, would affect model performance.

Gated architecture

LSTM or GRU units are the preferred choice for extracting informative long-term context because their gated structures can handle long-term dependencies. Several of the reviewed studies applied LSTMs or GRUs to model long-term dependencies between clinical events and to make predictions. In one study, LSTM was used to find long-term dependencies of codes in discharge notes.

Strategies for irregularity

To address time irregularity, several strategies were proposed. One study borrowed the idea of dynamic time warping, an algorithm that measures the similarity between two temporal sequences of varying speed, and built it into the gate parameters of a 2D-GRU, thus aligning EHR sequences pairwise. Another proposed to learn a subspace decomposition of the LSTM memory, thus discounting the effect of the memory according to the elapsed time.
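
The idea of discounting memory by elapsed time can be sketched as follows. This is a simplified illustration, not the method of the cited study: it assumes an elementwise (diagonal) decomposition with hypothetical weights `w_d` and `b_d`, and it assumes the decay function $1/\log(e + \Delta t)$, which is one monotonically decreasing choice among many.

```python
import math
from typing import List


def time_discount(delta_t: float) -> float:
    """Weight in (0, 1] that shrinks as the elapsed time delta_t >= 0 grows."""
    return 1.0 / math.log(math.e + delta_t)


def adjust_memory(c_prev: List[float], w_d: List[float], b_d: List[float],
                  delta_t: float) -> List[float]:
    """Split each memory entry into short-term and long-term parts, then
    discount only the short-term part by the elapsed time."""
    g = time_discount(delta_t)
    adjusted = []
    for c, w, b in zip(c_prev, w_d, b_d):
        short_term = math.tanh(w * c + b)  # learned short-term component
        long_term = c - short_term         # remainder is kept unchanged
        adjusted.append(long_term + g * short_term)
    return adjusted


# Toy memory and weights; delta_t could be measured in days between visits.
c_prev = [0.8, -0.5]
w_d = [1.0, 1.0]
b_d = [0.0, 0.0]
fresh = adjust_memory(c_prev, w_d, b_d, delta_t=0.0)
stale = adjust_memory(c_prev, w_d, b_d, delta_t=365.0)
```

With no elapsed time the memory passes through unchanged, while a long gap shrinks the short-term part and leaves the long-term part intact.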

Multi-modality

EHR data encompass multiple data modalities, including numeric values such as lab tests; free-text clinical notes; continuous monitoring data, such as electrocardiography (ECG) and electroencephalography (EEG); medical images; and discrete codes for diagnosis, medication, and procedures. Researchers have confirmed that finding patterns among multimodal data can increase the accuracy of diagnosis, prediction, and overall performance of the learning system. However, multimodal learning is challenging due to the heterogeneity of the data. Existing work often took a multitask learning approach to jointly learn from data across multiple modalities.

Multitask learning

Multimodal EHR learning often uses a strategy in which certain neurons in the neural network model are shared among all tasks, and certain neurons are specialized for specific tasks. The tasks could be different types of lab tests or data modalities. For example, in one study, the authors took a multitask learning approach to jointly model the prediction tasks based on two data modalities, medical codes and natural language text from clinical notes, and empirically demonstrated improved performance. In another, each modality, composed of observed counts, is represented as a Poisson distribution parameterized in terms of hidden binary units. Information from different modalities was then shared via a feedforward network of common hidden units.

Lack of labels

In our setting, labels refer to the gold standard target of interest, such as true states of clinical outcomes or the true disease phenotypes. Gold standard labels are often not consistently captured in EHR data and are thus typically unavailable in large numbers for training models. Identifying effective ways to label EHR records is one of the biggest obstacles to deep learning on EHR data. Label acquisition requires domain knowledge, often involving highly trained domain experts. In practice, a “silver standard” is often adopted. For example, in most of the surveyed articles that took a supervised learning approach, patient labels were derived from the occurrences of codes, such as diagnosis, procedure, and medication codes. Besides manually crafting labels, transfer learning could offer an alternative approach.

Transfer learning

Some articles attempt to label EHR data implicitly. For example, one study used LSTM to model sequences of diagnostic codes, a proxy problem for disease progression, and showed that the learned knowledge could be transferred to new datasets for the same task. In another, an autoencoder variant was applied to perform transfer learning from generic EHR data to predict a specific target, such as inferring prescriptions from diagnostic codes.

Interpretability

Although deep learning models can produce accurate predictions, they are often treated as black-box models that lack interpretability and transparency in their inner workings. This is an important problem because clinicians are often unwilling to accept machine recommendations without clarity as to the underlying reasoning. Recently, there have been some efforts to explain black-box deep models. Below we list several approaches from the reviewed articles for enhancing interpretability or transparency in EHR modeling.

Attention mechanism

Attention-based learning is a recent trend for understanding which parts of a patient’s history weigh more in predicting disease onset or future events. The original attention mechanism was proposed to improve the performance of neural machine translation. When introduced to EHR modeling, attention weights indicate the degree to which individual clinical events contribute to the model’s prediction of disease onset or future events. The attention mechanism is also used to derive a latent representation of medical codes (eg, diagnosis codes, medication codes).
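
A minimal sketch of how attention weights over visits can be computed and read is given below. It assumes dot-product scoring against a single query vector followed by a softmax; the reviewed models typically learn the scoring function, so the `query` here is a hypothetical stand-in, as are the toy visit vectors.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(visit_vectors: List[Vector], query: Vector) -> Tuple[List[float], Vector]:
    """Weight each visit by its similarity to the query and return both the
    weights and the weighted sum of visit vectors (the context vector)."""
    scores = [sum(q * v for q, v in zip(query, vec)) for vec in visit_vectors]
    weights = softmax(scores)
    dim = len(visit_vectors[0])
    context = [sum(w * vec[d] for w, vec in zip(weights, visit_vectors)) for d in range(dim)]
    return weights, context


# Three visits represented by toy 2-dimensional vectors.
visits = [[0.1, 0.0], [2.0, 1.0], [0.5, 0.2]]
query = [1.0, 1.0]
weights, context = attend(visits, query)
most_influential = max(range(len(weights)), key=lambda i: weights[i])
```

The weights form a distribution over visits, so inspecting the largest ones indicates which visits the prediction relied on most.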

Knowledge injection via attention

Biomedical ontology is a major source of biomedical knowledge that has been jointly modeled with the attention mechanism to add interpretability and model robustness. In one reviewed approach, this is achieved by learning the latent embedding of a clinical code (eg., diagnosis code) as a convex combination of the embeddings of the code itself and its ancestors on the ontology graph.
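The convex combination described above can be sketched in a few lines of Python. The toy ontology path, the basic embeddings, and the fixed compatibility scores are all hypothetical; in a trained model the scores would be produced by a learned function rather than supplied by hand.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax; outputs are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ontology_embedding(code: str,
                       ancestors: Dict[str, List[str]],
                       basic: Dict[str, List[float]],
                       compat: Dict[str, float]) -> Tuple[List[float], Dict[str, float]]:
    """Embed `code` as a convex combination of its own and its ancestors' embeddings."""
    nodes = [code] + ancestors.get(code, [])
    weights = softmax([compat[n] for n in nodes])  # attention weights over the path
    dim = len(basic[code])
    emb = [sum(w * basic[n][d] for w, n in zip(weights, nodes)) for d in range(dim)]
    return emb, dict(zip(nodes, weights))


# Toy ontology path (hypothetical): leaf code -> parent -> broader category.
ancestors = {"E11.9": ["E11", "E08-E13"]}
basic = {"E11.9": [1.0, 0.0], "E11": [0.5, 0.5], "E08-E13": [0.0, 1.0]}
# Assumed compatibility scores standing in for a learned scoring function.
compat = {"E11.9": 2.0, "E11": 1.0, "E08-E13": 0.0}

embedding, weights = ontology_embedding("E11.9", ancestors, basic, compat)
print(embedding, weights)
```

Inspecting `weights` shows how much the final representation relies on the specific code versus its more general ancestors, which is the source of the added interpretability.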

Knowledge distillation

Knowledge distillation compresses the knowledge learned from a complex model into a simpler model that is much easier to deploy. The recent development of mimic learning/knowledge distillation has provided a way of transferring information from a complex model (eg., a deep neural network) to a simpler model (eg., a decision tree). There are recent attempts to apply mimic learning to the healthcare domain in order to enhance the interpretability of deep models via boosting trees. The main idea is to use the complex model to generate more soft-labeled examples to train a simpler model.
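The soft-label idea can be illustrated with a deliberately small Python sketch. Here a fixed logistic function stands in for a trained deep model, and a one-split regression stump stands in for the boosted trees mentioned above; both choices, along with the one-dimensional synthetic inputs, are assumptions made to keep the example self-contained.

```python
import math
import random
from typing import List, Tuple


def teacher(x: float) -> float:
    """Stand-in for a trained deep model: returns P(y=1 | x). Purely illustrative."""
    return 1.0 / (1.0 + math.exp(-4.0 * (x - 0.5)))


def fit_stump(xs: List[float], soft: List[float]) -> Tuple[float, float, float]:
    """Fit a one-split regression stump to soft labels by minimizing squared error."""
    pairs = sorted(zip(xs, soft))
    best = None
    for i in range(1, len(pairs)):
        left = [s for _, s in pairs[:i]]
        right = [s for _, s in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((s - lm) ** 2 for s in left) + sum((s - rm) ** 2 for s in right)
        if best is None or err < best[0]:
            split = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best = (err, split, lm, rm)
    _, split, lm, rm = best
    return split, lm, rm


rng = random.Random(0)
xs = [rng.random() for _ in range(200)]   # unlabeled inputs
soft_labels = [teacher(x) for x in xs]    # the teacher supplies soft targets
threshold, left_value, right_value = fit_stump(xs, soft_labels)


def student(x: float) -> float:
    """Simple, inspectable model trained only on the teacher's soft labels."""
    return left_value if x < threshold else right_value


print(threshold, left_value, right_value)
```

The student never sees ground-truth labels; it only mimics the teacher's outputs, so its single threshold can be read directly as a summary of what the teacher learned.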

Evaluation of analytics tasks

For supervised models, evaluation was often done directly on the learning task via quantitative metrics, such as accuracy and AUC. For unsupervised models, evaluation was often done indirectly using separate prediction tasks. Popular evaluation metrics for binary prediction or classification include AUC, the area under the precision-recall curve (PRAUC), and the F1 score. For multiclass prediction or classification, micro-F1 and macro-F1 scores are popular choices. In addition, some studies also use mean squared error for performance evaluation. Performance details are summarized in Supplementary Table S1.
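For readers less familiar with these metrics, the following Python sketch computes AUC for a binary task and micro-F1 and macro-F1 for a multiclass task using only the standard library. The toy labels and scores are made up for illustration, and the pairwise AUC computation is chosen for clarity rather than efficiency.

```python
from typing import List, Sequence, Tuple


def auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Micro-F1 pools counts over classes; macro-F1 averages per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class: List[float] = []
    tp_sum = fp_sum = fn_sum = 0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        per_class.append(f1_from_counts(tp, fp, fn))
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    micro = f1_from_counts(tp_sum, fp_sum, fn_sum)
    macro = sum(per_class) / len(per_class)
    return micro, macro


binary_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
micro, macro = micro_macro_f1([0, 1, 2, 2, 1, 0], [0, 2, 2, 2, 1, 1])
print(binary_auc, micro, macro)
```

Micro-F1 is dominated by frequent classes, while macro-F1 weights every class equally, so reporting both is informative when class frequencies are imbalanced, as is common for clinical codes.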

DISCUSSION

In this review, we provided an overview of the current deep learning models for EHR data. Results from the reviewed articles have shown that, compared to other machine learning approaches, deep learning models excel in modeling raw data, minimizing the need for preprocessing and feature engineering, and significantly improving performance in many analytical tasks. It is noteworthy that deep learning models are ideal tools for recognizing diseases or predicting clinical events or outcomes (eg., mortality or treatment response) given time series data, such as EEG or biosignals from the ICU, or imaging data. However, although deep learning techniques have shown promising results in performing many analytics tasks, several open challenges remain. First, despite various attempts, there is still a significant need to improve the quality of generated data and labels. For data augmentation, current challenges include: 1) generated data lack variety; 2) data generation is often conducted under supervision, making the generated data biased toward the prediction task; and 3) there is a need for more accurate quantitative measures to evaluate the utility and privacy preservation of the generated data. Challenges for transfer learning of data and labels arise from the fact that deep models often do not explicitly capture uncertainties. This makes the models less robust in handling changes in the underlying data distribution. Thus, there is a risk that real EHR data encountered after deployment could invalidate the models’ future predictions. This could be a significant risk, especially in the healthcare setting. General methods have attempted to solve these challenges. These include better calibration of uncertainties and adversarial learning that relaxes the shared label space assumption. However, this is still an open area for deep learning on EHR data.
Moreover, regarding the interpretability and transparency of the model, current efforts (eg., attention mechanism, visualization, explanation by examples) often attempt to explain the predictions. However, to bring deep models built from EHR data into real use, users often need to understand the mechanisms by which models operate. Such a level of model transparency is still challenging to achieve. Last, for direct clinical impact, deployment and automation of deep learning models must be considered. For instance, large amounts of EHR data are processed to create standardized inputs to train deep models. The difficulty of obtaining large EHR datasets needs to be dealt with in order for deep EHR models to be integrated into actual EHR systems.

Funding

This work was supported by the National Science Foundation, award IIS-#1418511 and CCF-#1533768, and National Institutes of Health award 1R01MD011682-01 and R56HL138415, Children’s Healthcare of Atlanta and UCB.

CONTRIBUTORS

All the authors contributed to the conception of the work, surveying the literature, and drafting the paper.

Conflict of interest statement. None declared.