Riccardo Miotto, Li Li, Brian A Kidd, Joel T Dudley.
Abstract
Secondary use of electronic health records (EHRs) promises to advance clinical research and better inform clinical decision making. Challenges in summarizing and representing patient data prevent widespread practice of predictive modeling using EHRs. Here we present a novel unsupervised deep feature learning method to derive a general-purpose patient representation from EHR data that facilitates clinical predictive modeling. In particular, a three-layer stack of denoising autoencoders was used to capture hierarchical regularities and dependencies in the aggregated EHRs of about 700,000 patients from the Mount Sinai data warehouse. The result is a representation we name "deep patient". We evaluated this representation as broadly predictive of health states by assessing the probability that patients would develop various diseases. We performed the evaluation using 76,214 test patients and 78 diseases from diverse clinical domains and temporal windows. Our results significantly outperformed those achieved using representations based on raw EHR data and alternative feature learning strategies. Predictions for severe diabetes, schizophrenia, and various cancers were among the best performing. These findings indicate that deep learning applied to EHRs can derive patient representations that offer improved clinical predictions, and could provide a machine learning framework for augmenting clinical decision systems.
Year: 2016 PMID: 27185194 PMCID: PMC4869115 DOI: 10.1038/srep26094
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Conceptual framework used to derive the deep patient representation through unsupervised deep learning of a large EHR data warehouse.
(A) Pre-processing stage to obtain raw patient representations from the EHRs. (B) The raw representations are modeled by the unsupervised deep architecture leading to a set of general and robust features. (C) The deep features are applied to the entire hospital database to derive patient representations that can be applied to a number of clinical tasks.
Figure 2. Diagram of the unsupervised deep feature learning pipeline to transform a raw dataset into the deep patient representation through multiple layers of neural networks.
Each layer of the neural network is trained to produce a higher-level representation from the result of the previous layer.
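The layer-wise scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the source only states that a three-layer stack of denoising autoencoders was used, so the masking noise, tied weights, sigmoid units, cross-entropy loss, full-batch gradient descent, layer sizes, and all names (`DenoisingAutoencoder`, `fit_stack`, `transform`) are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DenoisingAutoencoder:
    """One layer (assumed design: tied weights, masking noise, cross-entropy loss)."""

    def __init__(self, n_in, n_hidden, rng):
        bound = np.sqrt(6.0 / (n_in + n_hidden))
        self.W = rng.uniform(-bound, bound, size=(n_in, n_hidden))
        self.b = np.zeros(n_hidden)  # encoder bias
        self.c = np.zeros(n_in)      # decoder bias

    def encode(self, x):
        return sigmoid(x @ self.W + self.b)

    def decode(self, h):
        return sigmoid(h @ self.W.T + self.c)

    def fit(self, x, rng, noise=0.05, lr=0.1, epochs=50):
        n = x.shape[0]
        for _ in range(epochs):
            # Corrupt the input by zeroing a random fraction of entries.
            x_noisy = x * (rng.random(x.shape) >= noise)
            h = self.encode(x_noisy)
            z = self.decode(h)
            # The loss compares the reconstruction with the CLEAN input.
            d_out = (z - x) / n                     # gradient at decoder pre-activation
            d_hid = (d_out @ self.W) * h * (1 - h)  # gradient at encoder pre-activation
            grad_W = x_noisy.T @ d_hid + d_out.T @ h  # tied weights: both paths
            self.W -= lr * grad_W
            self.b -= lr * d_hid.sum(axis=0)
            self.c -= lr * d_out.sum(axis=0)
        return self

def fit_stack(x, layer_sizes, seed=0):
    """Train each layer on the encoded output of the previous one."""
    rng = np.random.default_rng(seed)
    layers, current = [], x
    for n_hidden in layer_sizes:
        layer = DenoisingAutoencoder(current.shape[1], n_hidden, rng).fit(current, rng)
        layers.append(layer)
        current = layer.encode(current)
    return layers

def transform(layers, x):
    """Map raw features to the top-layer representation."""
    for layer in layers:
        x = layer.encode(x)
    return x

# Toy data standing in for raw patient vectors scaled to [0, 1].
raw = np.random.default_rng(1).random((20, 12))
stack = fit_stack(raw, layer_sizes=[8, 8, 8])
deep = transform(stack, raw)
print(deep.shape)  # (20, 8)
```

Only the encoder half of each layer is kept at inference time; the decoder exists to provide a training signal.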
Disease classification results in terms of area under the ROC curve (AUC-ROC), accuracy and F-score.
Time Interval = 1 year (76,214 patients)

| Patient Representation | AUC-ROC | Accuracy (threshold = 0.6) | F-Score (threshold = 0.6) |
|---|---|---|---|
| RawFeat | 0.659 | 0.805 | 0.084 |
| PCA | 0.696 | 0.879 | 0.104 |
| GMM | 0.632 | 0.891 | 0.072 |
| K-Means | 0.672 | 0.887 | 0.093 |
| ICA | 0.695 | 0.882 | 0.101 |
| DeepPatient | | | |
(*) The difference with the corresponding second best measurement is statistically significant (p < 0.05, t-test).
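For readers who want to see how the three metrics in this table relate, here is a small pure-Python sketch. It assumes binary labels, that a score at or above the threshold counts as a positive prediction, and that the F-score is the balanced F1; none of these details are stated in the source, and the toy labels and scores are invented for illustration.

```python
def auc_roc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy_and_f_score(y_true, scores, threshold=0.6):
    """Accuracy and F1 after thresholding the scores (assumed rule: score >= threshold)."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    accuracy = sum(1 for y, p in zip(y_true, y_pred) if y == p) / len(y_true)
    denom = 2 * tp + fp + fn
    f_score = 2 * tp / denom if denom else 0.0
    return accuracy, f_score

# Toy example: 2 patients out of 10 develop the disease.
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
scores = [0.1, 0.2, 0.3, 0.1, 0.7, 0.2, 0.4, 0.9, 0.5, 0.3]

print(auc_roc(y_true, scores))               # 0.9375
print(accuracy_and_f_score(y_true, scores))  # (0.8, 0.5)
```

With few positives, accuracy is dominated by the many true negatives while the F-score depends only on the positive class, which is one plausible reason accuracy and F-score sit so far apart in the table.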
Area under the ROC curve obtained in the disease classification experiment using patient data represented with original descriptors (“RawFeat”) and pre-processed by principal component analysis (“PCA”) and three-layer stacked denoising autoencoders (“DeepPatient”).
Time Interval = 1 year (76,214 patients)

| Disease | RawFeat (AUC-ROC) | PCA (AUC-ROC) | DeepPatient (AUC-ROC) |
|---|---|---|---|
| Diabetes mellitus with complications | 0.794 | 0.861 | |
| Cancer of rectum and anus | 0.863 | 0.821 | |
| Cancer of liver and intrahepatic bile duct | 0.830 | 0.867 | |
| Regional enteritis and ulcerative colitis | 0.814 | 0.843 | |
| Congestive heart failure (non-hypertensive) | 0.808 | 0.808 | |
| Attention-deficit and disruptive behavior disorders | 0.730 | 0.797 | |
| Cancer of prostate | 0.692 | 0.820 | |
| Schizophrenia | 0.791 | 0.788 | |
| Multiple myeloma | 0.783 | 0.739 | |
| Acute myocardial infarction | 0.771 | 0.775 | |
Patient disease tagging results for diagnoses assigned during different time intervals in terms of precision-at-k, with k = 1, 3, 5; UppBnd shows the best results achievable (i.e., all the correct diagnoses assigned to all the patients).
| Time Interval | Metric | UppBnd | RawFeat | PCA | ICA | DeepPatient |
|---|---|---|---|---|---|---|
| | Prec@1 | 1.000 | 0.319 | 0.343 | 0.345 | |
| | Prec@3 | 0.492 | 0.217 | 0.251 | 0.255 | |
| | Prec@5 | 0.319 | 0.191 | 0.214 | 0.215 | |
| | Prec@1 | 1.000 | 0.329 | 0.349 | 0.353 | |
| | Prec@3 | 0.511 | 0.221 | 0.254 | 0.259 | |
| | Prec@5 | 0.335 | 0.199 | 0.216 | 0.219 | |
| | Prec@1 | 1.000 | 0.332 | 0.353 | 0.360 | |
| | Prec@3 | 0.521 | 0.243 | 0.257 | 0.262 | |
| | Prec@5 | 0.345 | 0.201 | 0.219 | 0.220 | |
| | Prec@1 | 1.000 | 0.331 | 0.361 | 0.363 | |
| | Prec@3 | 0.549 | 0.246 | 0.261 | 0.265 | |
| | Prec@5 | 0.370 | 0.207 | 0.221 | 0.224 | |
(*) The difference with the corresponding second best measurement is statistically significant (p < 0.05, t-test).
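The precision-at-k metric and its upper bound can be illustrated with a short sketch. It assumes the standard definition (fraction of the top k ranked diseases that the patient actually receives) and that the upper bound is what a perfect ranking would score for each patient; the function names and the two toy patients are hypothetical.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked diseases that are correct diagnoses."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def upper_bound_at_k(relevant, k):
    """Best achievable precision-at-k: a perfect ranking puts all true diagnoses first."""
    return min(len(relevant), k) / k

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Two toy patients: (diseases ranked by predicted probability, true future diagnoses).
patients = [
    (["diabetes", "chf", "ami", "schizophrenia", "myeloma"], {"diabetes"}),
    (["chf", "ami", "diabetes", "myeloma", "schizophrenia"], {"ami", "diabetes", "myeloma"}),
]

for k in (1, 3):
    prec = mean(precision_at_k(r, t, k) for r, t in patients)
    bound = mean(upper_bound_at_k(t, k) for _, t in patients)
    print(k, prec, bound)
```

Under these definitions the upper bound falls below 1 for k = 3 and k = 5 whenever patients have fewer than k true diagnoses, which matches the pattern in the UppBnd column.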
Figure 3. R-precision obtained in the disease tagging experiment by the different patient representations over several prediction time intervals (expressed as number of days).
We report results for patients represented with original descriptors (RawFeat) and pre-processed by principal component analysis (PCA), independent component analysis (ICA), Gaussian mixture model (GMM), k-means clustering (K-Means), and three-layer stacked denoising autoencoders (DeepPatient).
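As a reminder of what R-precision measures, the sketch below uses the usual information-retrieval definition: precision at rank R, where R is the number of true diagnoses for that patient. Whether the paper computes it in exactly this way is an assumption, and the toy patients are invented.

```python
def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of true diagnoses for the patient."""
    r = len(relevant)
    if r == 0:
        raise ValueError("R-precision is undefined for a patient with no diagnoses")
    return sum(1 for d in ranked[:r] if d in relevant) / r

# Toy patients: (diseases ranked by predicted probability, true future diagnoses).
patients = [
    (["diabetes", "chf", "ami", "schizophrenia"], {"diabetes"}),
    (["chf", "ami", "diabetes", "myeloma"], {"ami", "diabetes", "myeloma"}),
]

per_patient = [r_precision(ranked, truth) for ranked, truth in patients]
print(per_patient)                          # first patient 1.0, second 2/3
print(sum(per_patient) / len(per_patient))  # mean R-precision across patients
```

Because the cutoff adapts to each patient, a perfect ranking always scores 1.0, so no separate upper bound is needed as it is for precision-at-k.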