Rui Li, Changchang Yin, Samuel Yang, Buyue Qian, Ping Zhang.
Abstract
BACKGROUND: Deep learning models have attracted significant interest from health care researchers during the last few decades. Many studies have applied deep learning to medical applications and achieved promising results. However, existing models have three limitations: (1) most clinicians are unable to interpret their results, (2) they cannot incorporate complicated medical domain knowledge (eg, one disease causing another), and (3) most lack visual exploration and interaction. Both the electronic health record (EHR) data set and the deep model results are complex and abstract, which impedes clinicians from exploring and communicating with the model directly.
Keywords: electronic health records; interpretable deep learning; knowledge graph; visual analytics
Year: 2020 PMID: 32985996 PMCID: PMC7551124 DOI: 10.2196/20645
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Figure 1. A screenshot of DG-Viz. (A) The patient distribution view shows an overview of all patients. (B) The demographic chart shows the demographics distribution of all patients. (C) The patient history view shows the contributions of all visits and medical codes of a single patient. The line chart presents the prediction results over time. (D) The knowledge graph view shows the whole network structure. v1-v28: different visits.
Figure 2. Framework of the domain-knowledge–guided recurrent neural network (DG-RNN), which takes the medical event embeddings and the corresponding time encoding vectors as inputs. For each event input, DG-RNN generates two output vectors. After all input events have been fed to DG-RNN, we concatenate the output vectors and apply a global max pooling layer and a fully connected (FC) layer to predict the clinical risk. We adopt t-distributed stochastic neighbor embedding (t-SNE) to map the global pooling layer’s output vectors to a 2D space (the Distribution View A in DG-Viz), where the distance between patients represents their similarity. The attention results are displayed in the Knowledge Graph View D to show the knowledge graph’s contribution to DG-RNN. The input medical codes and the output clinical risks are displayed in the History View C in DG-Viz, which shows the trend of the patient’s risk over time. LSTM: long short-term memory; FC: fully connected layer; t-SNE: t-distributed stochastic neighbor embedding.
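The overall flow described in the Figure 2 caption (event embedding plus time encoding, recurrent update, global max pooling, then an FC layer producing a risk score) can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: it uses a plain tanh recurrence in place of the LSTM cell, random toy weights, and an assumed sinusoidal time encoding; only the pipeline shape follows the caption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 8-d medical-event embeddings, 16-d hidden state.
EMB, HID = 8, 16

# Toy parameters (learned in DG-RNN; the paper uses an LSTM cell, not this
# single-tanh recurrence).
W_in = rng.normal(scale=0.1, size=(HID, EMB))
W_h = rng.normal(scale=0.1, size=(HID, HID))
w_fc = rng.normal(scale=0.1, size=HID)

def time_encoding(delta_days, dim=EMB):
    """Sinusoidal encoding of the time gap before an event (one simple choice)."""
    freqs = 1.0 / (10.0 ** (np.arange(dim) / dim))
    return np.sin(delta_days * freqs)

def predict_risk(event_embs, deltas):
    """Run the recurrence over events, max-pool the outputs, apply FC + sigmoid."""
    h = np.zeros(HID)
    outputs = []
    for e, d in zip(event_embs, deltas):
        x = e + time_encoding(d)            # event embedding + time encoding
        h = np.tanh(W_in @ x + W_h @ h)     # recurrent update (LSTM in the paper)
        outputs.append(h)
    pooled = np.max(np.stack(outputs), axis=0)      # global max pooling over events
    return 1.0 / (1.0 + np.exp(-(w_fc @ pooled)))   # FC layer + sigmoid risk

events = rng.normal(size=(5, EMB))   # 5 medical events for one patient
deltas = [0, 3, 3, 10, 1]            # days since the previous event
risk = predict_risk(events, deltas)
print(round(float(risk), 4))         # a probability in (0, 1)
```

The pooled vector in this sketch is also what would be handed to t-SNE to place the patient in the 2D Distribution View.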
Figure 3. Attention mechanism. In the knowledge graph, the yellow node represents the current input medical event and the other nodes are its adjacent nodes. Our attention mechanism takes the embeddings of the adjacent nodes as inputs and generates the graph attention vector.
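The step in Figure 3 (score each adjacent knowledge-graph node against the current event, then build an attention-weighted context vector) can be sketched with a standard softmax attention. The bilinear scoring function and the toy weight matrix `W` below are assumptions; the caption does not specify the exact formulation.

```python
import numpy as np

def graph_attention(current, neighbors, W):
    """Weight each adjacent node by its relevance to the current event and
    return the attention weights plus the weighted-sum context vector."""
    scores = np.array([current @ W @ n for n in neighbors])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over adjacent nodes
    context = weights @ np.stack(neighbors)   # graph attention vector
    return weights, context

rng = np.random.default_rng(1)
dim = 6
current = rng.normal(size=dim)                        # current input medical event
neighbors = [rng.normal(size=dim) for _ in range(4)]  # its adjacent KG nodes
W = rng.normal(scale=0.3, size=(dim, dim))            # toy scoring matrix

weights, context = graph_attention(current, neighbors, W)
print(weights.round(3))
```

The weights sum to 1, so the context vector always stays inside the convex hull of the neighbor embeddings.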
Figure 4Distribution view: (a) the projection scatter plot of all patients in the test data set, (b1) the race distribution chart, (b2) the gender distribution chart, and (b3) the age distribution histogram.
Figure 5. Patient history view. Top: the visit view, which arranges all visit records at equal spacing. (a1): the prediction results over time; (a2): the visits and medical codes of the patient; (a3): added or removed medical codes. Bottom: (b) the temporal view, which arranges all visit records based on their time intervals.
Figure 6. Code edit panel: users can see all the medical codes within a specific visit and add drugs to this visit. kg: knowledge graph; cont: contribution.
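The what-if interaction the code edit panel supports (add a drug to a visit, observe how the predicted risk changes) can be illustrated with a toy scorer. The code names and contribution values below are invented for illustration; in DG-Viz the contributions come from the trained DG-RNN, not from a lookup table.

```python
import numpy as np

# Hypothetical per-code risk contributions (illustrative numbers only;
# the real model learns these).
contrib = {"I50.9": 1.2, "E11.9": 0.4, "furosemide": -0.6}

def visit_risk(codes):
    """Toy stand-in for the model: sigmoid of summed code contributions."""
    s = sum(contrib.get(c, 0.0) for c in codes)
    return 1.0 / (1.0 + np.exp(-s))

visit = ["I50.9", "E11.9"]
before = visit_risk(visit)
after = visit_risk(visit + ["furosemide"])  # what-if: add a drug to the visit
print(round(before, 3), round(after, 3))    # risk drops after adding the drug
```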
Figure 7. The whole knowledge graph in the Knowledge Graph View.
Figure 8. The local network of a specific medical code and its neighbors in the Knowledge Graph View.
Statistics of data sets.
| Characteristics | EHRa-120 | EHR-90 | EHR-60 | EHR-30 | EHR-14 | EHR-7 |
| --- | --- | --- | --- | --- | --- | --- |
| Number of case patients | 442 | 462 | 494 | 517 | 536 | 554 |
| Number of control patients | 1326 | 1386 | 1482 | 1551 | 1608 | 1662 |
| Number of events in the data set | 134,666 | 140,984 | 152,389 | 160,584 | 169,636 | 176,460 |
| Number of unique events | 967 | 974 | 978 | 983 | 989 | 995 |
| Average length of EHRs | 76.17 | 76.29 | 77.11 | 77.65 | 79.12 | 79.62 |
| Average number of events per visit | 2.17 | 2.36 | 2.29 | 2.41 | 2.35 | 2.39 |
aEHR: electronic health record.
Area under the receiver operating characteristic curve for the heart failure prediction task.
| Model | EHRa-120 | EHR-90 | EHR-60 | EHR-30 | EHR-14 | EHR-7 |
| --- | --- | --- | --- | --- | --- | --- |
| LRb | 0.6883 | 0.6956 | 0.6932 | 0.7139 | 0.7347 | 0.7386 |
| RFc | 0.6726 | 0.6913 | 0.6965 | 0.7212 | 0.7217 | 0.7336 |
| SVMd | 0.6173 | 0.6339 | 0.6213 | 0.6258 | 0.6323 | 0.6372 |
| GRUe | 0.6504 | 0.6670 | 0.6939 | 0.7178 | 0.7438 | 0.7638 |
| LSTMf | 0.6628 | 0.6792 | 0.6982 | 0.7282 | 0.7459 | 0.7631 |
| RETAINg | 0.6962 | 0.7115 | 0.7318 | 0.7437 | 0.7561 | 0.7683 |
| GRAMh | 0.7081 | 0.7292 | 0.7378 | 0.7525 | 0.7648 | 0.7656 |
| KAMEi | 0.7168 | 0.7319 | 0.7392 | 0.7573 | 0.7662 | 0.7717 |
| DG-RNNj-nk | 0.7158 | 0.7310 | 0.7368 | 0.7486 | 0.7583 | 0.7663 |
| DG-RNN-np | 0.6995 | 0.7075 | 0.7182 | 0.7425 | 0.7596 | 0.7723 |
| DG-RNN | 0.7288 | 0.7437 | 0.7510 | 0.7663 | 0.7789 | 0.7863 |
aEHR: electronic health record.
bLR: logistic regression.
cRF: random forest.
dSVM: support vector machine.
eGRU: gated recurrent unit.
fLSTM: long short-term memory.
gRETAIN: reverse time attention model.
hGRAM: graph-based attention model.
iKAME: knowledge-based attention model.
jDG-RNN: domain-knowledge–guided recurrent neural network.
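The AUROC values reported above can be reproduced from raw scores with the rank-sum (Mann–Whitney U) formulation: the probability that a randomly chosen case patient scores higher than a randomly chosen control. A self-contained sketch:

```python
import numpy as np

def auc(labels, scores):
    """AUROC via concordant case–control pairs; ties count as half."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    pos, neg = scores[labels == 1], scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

y = [1, 1, 0, 0, 0]
s = [0.9, 0.4, 0.6, 0.3, 0.2]
print(auc(y, s))  # 5 of 6 case–control pairs are concordant
```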
Sensitivity of the heart failure prediction task.
| Model | EHRa-120 | EHR-90 | EHR-60 | EHR-30 | EHR-14 | EHR-7 |
| --- | --- | --- | --- | --- | --- | --- |
| LRb | 0.6262 | 0.6441 | 0.6452 | 0.6512 | 0.6522 | 0.6684 |
| RFc | 0.6235 | 0.6456 | 0.6549 | 0.6612 | 0.6636 | 0.6723 |
| SVMd | 0.5689 | 0.5835 | 0.5732 | 0.5769 | 0.5822 | 0.5862 |
| GRUe | 0.6120 | 0.6227 | 0.6348 | 0.6524 | 0.6837 | 0.7001 |
| LSTMf | 0.6322 | 0.6407 | 0.6564 | 0.6869 | 0.6874 | 0.7006 |
| RETAINg | 0.6556 | 0.6612 | 0.6719 | 0.6916 | 0.6938 | 0.7018 |
| GRAMh | 0.6614 | 0.6627 | 0.6718 | 0.6914 | 0.7030 | 0.7046 |
| KAMEi | 0.6645 | 0.6714 | 0.6759 | 0.6828 | 0.6991 | 0.7036 |
| DG-RNNj-nk | 0.6634 | 0.6712 | 0.6790 | 0.6817 | 0.6926 | 0.7132 |
| DG-RNN-np | 0.6513 | 0.6569 | 0.6727 | 0.6801 | 0.6997 | 0.7101 |
| DG-RNN | 0.6754 | 0.6816 | 0.6856 | 0.7012 | 0.7145 | 0.7206 |
aEHR: electronic health record.
bLR: logistic regression.
cRF: random forest.
dSVM: support vector machine.
eGRU: gated recurrent unit.
fLSTM: long short-term memory.
gRETAIN: reverse time attention model.
hGRAM: graph-based attention model.
iKAME: knowledge-based attention model.
jDG-RNN: domain-knowledge–guided recurrent neural network.
Specificity of the heart failure prediction task.
| Model | EHRa-120 | EHR-90 | EHR-60 | EHR-30 | EHR-14 | EHR-7 |
| --- | --- | --- | --- | --- | --- | --- |
| LRb | 0.6402 | 0.6437 | 0.6429 | 0.6528 | 0.6727 | 0.6887 |
| RFc | 0.6301 | 0.6414 | 0.6484 | 0.6674 | 0.6720 | 0.6802 |
| SVMd | 0.5897 | 0.5904 | 0.5948 | 0.6041 | 0.6062 | 0.6079 |
| GRUe | 0.6231 | 0.6458 | 0.6510 | 0.6718 | 0.6947 | 0.7020 |
| LSTMf | 0.6106 | 0.6252 | 0.6293 | 0.6427 | 0.6563 | 0.6595 |
| RETAINg | 0.6602 | 0.6619 | 0.6755 | 0.7016 | 0.7041 | 0.7165 |
| GRAMh | 0.6673 | 0.6835 | 0.6901 | 0.7014 | 0.7108 | 0.7114 |
| KAMEi | 0.6720 | 0.6806 | 0.6842 | 0.6951 | 0.7119 | 0.7131 |
| DG-RNNj-nk | 0.6773 | 0.6819 | 0.6893 | 0.6924 | 0.7158 | 0.7190 |
| DG-RNN-np | 0.6707 | 0.6769 | 0.6791 | 0.7037 | 0.7078 | 0.7166 |
| DG-RNN | 0.6862 | 0.6976 | 0.7022 | 0.7128 | 0.7254 | 0.7273 |
aEHR: electronic health record.
bLR: logistic regression.
cRF: random forest.
dSVM: support vector machine.
eGRU: gated recurrent unit.
fLSTM: long short-term memory.
gRETAIN: reverse time attention model.
hGRAM: graph-based attention model.
iKAME: knowledge-based attention model.
jDG-RNN: domain-knowledge–guided recurrent neural network.
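Sensitivity and specificity, as reported in the two tables above, reduce to ratios over confusion-matrix counts: sensitivity is the fraction of case patients correctly flagged, specificity the fraction of controls correctly cleared. A minimal sketch:

```python
def sensitivity_specificity(labels, preds):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

y = [1, 1, 1, 0, 0, 0, 0]  # 3 cases, 4 controls
p = [1, 1, 0, 0, 0, 1, 0]  # model's binary predictions
sens, spec = sensitivity_specificity(y, p)
print(round(sens, 3), round(spec, 3))  # 0.667 0.75
```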