Shan Yang, Xiangwei Zheng, Cun Ji, Xuanchi Chen.
Abstract
Electronic Health Records (EHRs) are digital records associated with hospitalization, diagnoses, medications and so on. Secondary use of EHRs can promote clinical informatics applications and the development of healthcare. EHRs have the unique characteristic that patient visits are temporally ordered while the diagnosis codes within a visit are randomly ordered. This hierarchical structure requires a multi-layer network to explore the different relational information in EHRs. In this paper, we propose a Multi-Layer Representation Learning method (MLRL), which learns effective patient representations by hierarchically exploring the valuable information in both diagnosis codes and patient visits. First, MLRL utilizes the multi-head attention mechanism to explore potential connections among diagnosis codes, and a linear transformation maps the code vectors to non-negative real-valued representations. The initial visit vectors are then obtained by summarizing all the code representations. Second, the proposed method combines Bidirectional Long Short-Term Memory (BiLSTM) with a self-attention mechanism to learn weighted visit vectors, which are aggregated to form the patient representation. Finally, to evaluate the performance of MLRL, we apply it to patient mortality prediction on real EHRs; the experimental results demonstrate that MLRL significantly improves prediction performance. MLRL achieves an Area Under the Curve (AUC) of around 0.915, which is superior to the results obtained by the baseline methods. Furthermore, compared with raw data and other data representations, the representation learned by MLRL shows outstanding results and applicability across multiple different classifiers.
Keywords: Attention; Bidirectional long short-term memory; Electronic health records; Multi-layer representation learning
Year: 2021 PMID: 33623481 PMCID: PMC7891814 DOI: 10.1007/s11063-021-10449-2
Source DB: PubMed Journal: Neural Process Lett ISSN: 1370-4621 Impact factor: 2.908
Fig. 1 The structure of MLRL
Fig. 2 The visits for a patient
Fig. 3 Multi-head attention mechanism
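The figures above describe a two-layer aggregation: multi-head self-attention over the diagnosis codes of each visit, a non-negative mapping whose outputs are summed into a visit vector, and an attention-weighted aggregation of visit vectors into the patient representation. A minimal NumPy sketch of that flow, with toy dimensions, randomly initialized weights, and the BiLSTM stage omitted (all names and sizes here are illustrative, not the authors' implementation):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, n_heads):
    """Scaled dot-product self-attention over the rows of X, split into heads."""
    d_model = X.shape[1]
    d_head = d_model // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    heads = []
    for h in range(n_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_head)
        heads.append(softmax(scores, axis=-1) @ V[:, s])
    return np.concatenate(heads, axis=1)

def visit_vector(code_embeddings, Wq, Wk, Wv, W_lin, n_heads=2):
    """Layer 1: attend over the codes of one visit, map to non-negative
    values (ReLU stands in here for the non-negative mapping), and sum
    the code representations into one visit vector."""
    H = multi_head_self_attention(code_embeddings, Wq, Wk, Wv, n_heads)
    return np.maximum(H @ W_lin, 0.0).sum(axis=0)

def patient_representation(visit_vectors, w_att):
    """Layer 2: attention weights over visits, weighted aggregation.
    (The paper feeds the visits through a BiLSTM first; omitted here.)"""
    alpha = softmax(visit_vectors @ w_att, axis=-1)
    return alpha, (alpha[:, None] * visit_vectors).sum(axis=0)

rng = np.random.default_rng(0)
d = 8  # toy embedding size; the paper uses larger dimensions
Wq, Wk, Wv, W_lin = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
w_att = rng.normal(size=d)

# three visits with 3, 5 and 2 diagnosis codes respectively
visits = [rng.normal(size=(n, d)) for n in (3, 5, 2)]
visit_vecs = np.stack([visit_vector(v, Wq, Wk, Wv, W_lin) for v in visits])
alpha, patient = patient_representation(visit_vecs, w_att)
```

The sketch only shows the forward aggregation; in the paper the whole stack is trained end-to-end against the prediction loss.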
Basic statistics of the MIMIC-III database

| Data set | MIMIC-III |
|---|---|
| # of patients | 7537 |
| # of visits | 19,993 |
| Avg. # of visits per patient | 2.65 |
| # of unique diagnosis codes | 849 |
| Avg. # of codes per visit | 11.92 |
| Max # of codes per visit | 39 |
Meaning of TP, FP, TN and FN in the confusion matrix

| | Real status: Positive | Real status: Negative |
|---|---|---|
| Predicted status: Positive | True Positive (TP) | False Positive (FP) |
| Predicted status: Negative | False Negative (FN) | True Negative (TN) |
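The metrics reported in the result tables (Accuracy, Recall, F1 score) follow directly from these four counts, and AUC can be read as the probability that a random positive case is scored above a random negative one. A minimal, self-contained sketch; the labels and scores below are made up purely for illustration:

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def recall(tp, fn):
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    rec = tp / (tp + fn)
    return 2 * precision * rec / (precision + rec)

def auc(y_true, scores):
    """Probability that a random positive is scored above a random
    negative (ties count 0.5) -- the rank formulation of ROC AUC."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# toy example: 8 patients, 4 true positives
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)  # -> (3, 1, 1, 3)
```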
MLRL parameter settings

| Parameters | Values |
|---|---|
| Optimizer | Adam |
| Learning rate | 1e−4 |
| | 8 |
| | 50 |
| | 50 |
| | 400 |
| Hidden layer size | 200 |
| Batch size | 500 |
| Epochs | 50 |
The prediction performance of MLRL and baseline methods
| Methods | AUC | Accuracy | Recall | F1 score |
|---|---|---|---|---|
| LR | 0.812 ± 0.007 | 0.762 ± 0.006 | 0.767 ± 0.007 | 0.767 ± 0.009 |
| MLP | 0.811 ± 0.008 | 0.760 ± 0.006 | 0.766 ± 0.009 | 0.767 ± 0.012 |
| Deep Patient | 0.822 ± 0.011 | 0.776 ± 0.009 | 0.775 ± 0.012 | 0.772 ± 0.012 |
| Med2Vec | 0.901 ± 0.010 | 0.778 ± 0.008 | 0.780 ± 0.013 | 0.778 ± 0.012 |
| BiLSTM-Soft | 0.889 ± 0.009 | 0.766 ± 0.008 | 0.767 ± 0.012 | 0.766 ± 0.011 |
| BiLSTM-Att-Soft | 0.897 ± 0.009 | 0.775 ± 0.009 | 0.773 ± 0.011 | 0.773 ± 0.013 |
| MLRL | 0.915 ± 0.009 | 0.785 ± 0.007 | 0.791 ± 0.012 | 0.792 ± 0.012 |
Fig. 4 The training process of MLRL and baseline methods
Mortality prediction results of different data representations
| Metrics | Classifiers | Raw data | Deep Patient | Med2Vec | BiLSTM | BiLSTM-Att | MLRL |
|---|---|---|---|---|---|---|---|
| AUC | LR | 0.812 ± 0.005 | 0.817 ± 0.009 | 0.829 ± 0.010 | 0.816 ± 0.007 | 0.818 ± 0.009 | 0.837 ± 0.009 |
| | MLP | 0.811 ± 0.006 | 0.820 ± 0.008 | 0.832 ± 0.010 | 0.818 ± 0.009 | 0.821 ± 0.008 | 0.839 ± 0.008 |
| | LSTM | 0.826 ± 0.006 | 0.865 ± 0.011 | 0.889 ± 0.009 | 0.869 ± 0.009 | 0.879 ± 0.010 | 0.908 ± 0.009 |
| | RF | 0.819 ± 0.003 | 0.822 ± 0.007 | 0.826 ± 0.009 | 0.823 ± 0.006 | 0.824 ± 0.007 | 0.835 ± 0.007 |
| | SVM | 0.748 ± 0.006 | 0.811 ± 0.013 | 0.825 ± 0.011 | 0.822 ± 0.010 | 0.824 ± 0.009 | 0.837 ± 0.009 |
| Accuracy | LR | 0.762 ± 0.004 | 0.762 ± 0.008 | 0.775 ± 0.007 | 0.756 ± 0.006 | 0.766 ± 0.008 | 0.783 ± 0.007 |
| | MLP | 0.760 ± 0.004 | 0.770 ± 0.010 | 0.777 ± 0.009 | 0.758 ± 0.008 | 0.777 ± 0.008 | 0.783 ± 0.007 |
| | LSTM | 0.763 ± 0.006 | 0.781 ± 0.009 | 0.785 ± 0.012 | 0.764 ± 0.007 | 0.785 ± 0.009 | 0.793 ± 0.010 |
| | RF | 0.773 ± 0.004 | 0.776 ± 0.007 | 0.779 ± 0.008 | 0.778 ± 0.006 | 0.778 ± 0.008 | 0.784 ± 0.006 |
| | SVM | 0.709 ± 0.005 | 0.762 ± 0.012 | 0.773 ± 0.010 | 0.761 ± 0.007 | 0.763 ± 0.010 | 0.788 ± 0.009 |
| Recall | LR | 0.767 ± 0.006 | 0.760 ± 0.010 | 0.780 ± 0.010 | 0.764 ± 0.010 | 0.765 ± 0.011 | 0.789 ± 0.011 |
| | MLP | 0.766 ± 0.006 | 0.765 ± 0.012 | 0.784 ± 0.009 | 0.767 ± 0.009 | 0.768 ± 0.013 | 0.790 ± 0.010 |
| | LSTM | 0.768 ± 0.009 | 0.785 ± 0.014 | 0.787 ± 0.013 | 0.785 ± 0.013 | 0.788 ± 0.012 | 0.798 ± 0.012 |
| | RF | 0.772 ± 0.006 | 0.775 ± 0.010 | 0.778 ± 0.008 | 0.771 ± 0.009 | 0.773 ± 0.009 | 0.786 ± 0.009 |
| | SVM | 0.671 ± 0.008 | 0.769 ± 0.011 | 0.774 ± 0.012 | 0.770 ± 0.011 | 0.771 ± 0.012 | 0.790 ± 0.011 |
| F1 score | LR | 0.767 ± 0.006 | 0.760 ± 0.012 | 0.781 ± 0.011 | 0.765 ± 0.011 | 0.766 ± 0.011 | 0.791 ± 0.010 |
| | MLP | 0.767 ± 0.007 | 0.770 ± 0.014 | 0.784 ± 0.011 | 0.768 ± 0.011 | 0.770 ± 0.012 | 0.791 ± 0.013 |
| | LSTM | 0.769 ± 0.007 | 0.785 ± 0.014 | 0.789 ± 0.014 | 0.786 ± 0.013 | 0.789 ± 0.013 | 0.802 ± 0.012 |
| | RF | 0.772 ± 0.006 | 0.770 ± 0.011 | 0.776 ± 0.010 | 0.771 ± 0.010 | 0.773 ± 0.011 | 0.788 ± 0.010 |
| | SVM | 0.680 ± 0.009 | 0.769 ± 0.015 | 0.776 ± 0.013 | 0.772 ± 0.012 | 0.774 ± 0.015 | 0.792 ± 0.012 |
Fig. 5 Comparison of the results for different data representations