Fan Yang, Jian Zhang, Wanyi Chen, Yongxuan Lai, Ying Wang, Quan Zou.
Abstract
BACKGROUND: Accurate and precise approaches for modeling mortality risk in intensive care unit (ICU) patients have so far not been developed. Conventional mortality risk prediction methods can hardly extract the information in longitudinal electronic health records (EHRs) effectively, because they simply aggregate the heterogeneous variables in EHRs and ignore both the complex relationships and interactions between variables and the time dependence in longitudinal records. Recently, deep learning approaches have been widely used to model longitudinal EHR data. However, most existing deep learning-based risk prediction approaches use only the information of a single disease, neglecting the interactions between multiple diseases and different conditions.
Keywords: Deep learning; Electronic health records; Mortality risk prediction
Year: 2022 PMID: 36241976 PMCID: PMC9561325 DOI: 10.1186/s12859-022-04975-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 Three diagnostic records of a patient: HADM_ID refers to the record ID. Each record contains multiple variables, such as gender, age, admission time, discharge time, diagnosis codes, and prescription codes
Fig. 2 The framework of DeepMPM: a mortality risk prediction model that uses a two-level attention mechanism and integrates multiple data types
Table of notations
| Notation | Meaning |
|---|---|
| | Diagnoses codes set, |
| | DRGs codes set, |
| | Representation vector of diagnosis |
| | Representation vector of treatment |
| | Diagnoses codes of a record, |
| | DRGs codes of a record, |
| | Weight of embedding layer for diagnoses codes |
| | Weight of embedding layer for DRGs codes |
| | Forget gate of LSTM at time step |
| | Weight of the forget gate of LSTM |
| | Input gate of LSTM at time step |
| | Weight of the input gate of LSTM |
| | Candidate cell state of LSTM at time step |
| | Cell state of LSTM at time step |
| | Output gate of LSTM at time step |
| | Weight of the output gate of LSTM |
| | Hidden state of LSTM at time step |
| | Type of hospitalization |
| | Hospital stay vector |
| | Weight of |
| | Weight of |
| | Weight of |
| | Weight of |
| | Hospital stay during |
| | Adjacent hospital stay intervals |
| | Weight of |
| | Weight of |
| | Output of the hidden layer of Care-LSTM at time step |
| | Variable-level weight vector, |
| | Weight matrix in attention module |
| | Output of the hidden layer of Care-LSTM at time step |
| | Visit-level weight vector, |
| | Weight matrix in attention module |
| | Harmonic weight coefficient |
| | Final weight vector of the two-level attention module |
| | Patient health status vector |
Fig. 3 The encoder in DeepMPM: the varying-length sequences of diagnoses and DRGs codes are represented as equal-length vectors in a specific vector space
Fig. 4 DeepMPM’s representation learning on disease codes: HADM_ID is the ID of a diagnosis record, and each record has a corresponding varying-length code sequence. First, all sequences are represented as a binary matrix; they are then mapped to a specific vector space, so that each varying-length sequence is transformed into a multi-dimensional, equal-length, non-negative vector
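The two steps described in the Fig. 4 caption (binary matrix, then projection to fixed-length non-negative vectors) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record IDs, the embedding size, the random weights, and the use of ReLU to obtain non-negative outputs are all assumptions made for the demo.

```python
import random

def multi_hot(codes, vocab):
    """Binary vector with a 1 at the index of every code present in the record."""
    index = {code: i for i, code in enumerate(vocab)}
    row = [0] * len(vocab)
    for code in codes:
        row[index[code]] = 1
    return row

def embed(row, weights):
    """Project a multi-hot row through `weights` (vocab x dim), then apply ReLU.

    ReLU is an assumption here: it is one simple way to get a non-negative vector.
    """
    dim = len(weights[0])
    out = [0.0] * dim
    for i, bit in enumerate(row):
        if bit:
            for j in range(dim):
                out[j] += weights[i][j]
    return [max(0.0, v) for v in out]

# Hypothetical records of different lengths, keyed by a made-up record ID.
records = {
    "visit_a": ["4280", "4254", "5849"],
    "visit_b": ["4280", "42731"],
}
vocab = sorted({code for codes in records.values() for code in codes})

rng = random.Random(0)
embedding_dim = 4  # assumed size, chosen only for the demo
weights = [[rng.uniform(-1, 1) for _ in range(embedding_dim)] for _ in vocab]

# Step 1: every record becomes one row of a binary matrix.
binary_matrix = {hadm: multi_hot(codes, vocab) for hadm, codes in records.items()}
# Step 2: every row becomes an equal-length, non-negative vector.
vectors = {hadm: embed(row, weights) for hadm, row in binary_matrix.items()}
```

Both records end up with vectors of length `embedding_dim`, regardless of how many codes each one contains. In a trained model the weights would be learned rather than drawn at random.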
Fig. 5 Modified Care-LSTM: the input units marked in red are the parts that differ from the standard LSTM
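For orientation, here is a minimal sketch of one step of the standard LSTM cell that Fig. 5 takes as its baseline, using the gates listed in the table of notations (forget, input, candidate cell state, cell state, output, hidden state). It does not include the Care-LSTM modifications, which the caption does not spell out. The function names and the `params` layout are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    """Compute W @ v + b for a list-of-lists matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a standard LSTM cell; params maps gate name -> (W, b)."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(*params["f"], z)]      # forget gate
    i_t = [sigmoid(a) for a in affine(*params["i"], z)]      # input gate
    c_hat = [math.tanh(a) for a in affine(*params["c"], z)]  # candidate cell state
    o_t = [sigmoid(a) for a in affine(*params["o"], z)]      # output gate
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # New hidden state: gated, squashed cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Demo with all-zero weights so the result is easy to check by hand:
# every sigmoid gate equals 0.5 and the candidate state equals 0.
hidden, inputs = 2, 3
zero_W = [[0.0] * (hidden + inputs) for _ in range(hidden)]
zero_b = [0.0] * hidden
params = {gate: (zero_W, zero_b) for gate in ("f", "i", "c", "o")}

h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -1.0], params)
```

With zero weights the forget gate halves the previous cell state, so `c_t` is `[0.5, -0.5]` and each entry of `h_t` is `0.5 * tanh(c_t[k])`.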
EHR data description
| Item | Sub-item | Value |
|---|---|---|
| The number of patients | – | 7491 |
| The number of visits | – | 19,265 |
| Positive samples/negative samples ratio | – | 1.074:1 |
| Avg. number of visits per patient | – | 2.57 |
| Diagnoses code | The number of code groups | 931 |
| | Avg. number of codes per visit | 12.97 |
| | Max. number of codes per visit | 39 |
| DRGs code | The number of code groups | 1406 |
| | Avg. number of codes per visit | 2.23 |
| | Max. number of codes per visit | 3 |
Fig. 6 a The distribution of ICD-9 codes in a single record. The average value is 12.97. b The distribution of DRGs codes in a single record. The average value is 2.23. c The distribution of visit numbers of each patient. The average value is 2.57
Comparison of the characteristics of the baseline methods with DeepMPM
| Model | RNN architecture | Reverse time training | Attention mechanism | Visit-level attention | Variable-level attention |
|---|---|---|---|---|---|
| RNN | |||||
| Multi-task Learning | |||||
| LSTM-NN | |||||
| RETAIN | |||||
| Deepcare | |||||
| DeepMPM-w/o- | |||||
| DeepMPM | | | | | |
Fig. 7 Slanted triangle learning rate. The learning rate curve resembles a triangle, and its expression is shown in Eqs. 22–24, where T is the total number of training iterations, is the fraction of the total iterations spent in the rising segment, and ratio determines the lowest value of the learning rate. In the experiment, we set , ,
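A schedule of this shape can be sketched as below. The sketch assumes that Eqs. 22–24 follow the slanted triangular learning rate of Howard and Ruder (ULMFiT), which also has three equations and the same `T` and `ratio` parameters. The name `cut_frac` and the default values (0.1, 32, 0.01) are ULMFiT's, not the settings used in this article, which are not recoverable from the caption.

```python
import math

def slanted_triangular_lr(t, total_steps, cut_frac=0.1, ratio=32, lr_max=0.01):
    """Learning rate at iteration t under the ULMFiT slanted triangular schedule.

    cut_frac: fraction of iterations spent increasing the learning rate.
    ratio:    how much smaller the lowest learning rate is than lr_max.
    Defaults are ULMFiT's, used here only as placeholders.
    """
    cut = math.floor(total_steps * cut_frac)  # iteration where the peak is reached
    if cut < 1:
        raise ValueError("total_steps * cut_frac must be at least 1")
    if t < cut:
        p = t / cut  # linear warm-up from 0 to 1
    else:
        p = 1 - (t - cut) / (cut * (1 / cut_frac - 1))  # linear decay from 1 to 0
    return lr_max * (1 + p * (ratio - 1)) / ratio

# Learning rate at every iteration of a hypothetical 1000-step run.
schedule = [slanted_triangular_lr(t, total_steps=1000) for t in range(1001)]
```

The rate starts at `lr_max / ratio`, climbs linearly to `lr_max` after `cut_frac` of the iterations, then decays linearly back to `lr_max / ratio`, which gives the short rise and long fall of the triangle.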
The results of the performances of different models
| Model | AUC | Precision | Recall | F1-score |
|---|---|---|---|---|
| RNN | 0.8318 ± 0.0102 | 0.7392 ± 0.0340 | 0.7571 ± 0.0408 | 0.7505 ± 0.0139 |
| Multi-task learning | 0.6330 ± 0.0084 | 0.6130 ± 0.0245 | 0.5808 ± 0.1439 | 0.5868 ± 0.0674 |
| LSTM-NN | 0.8326 ± 0.0087 | 0.7562 ± 0.0289 | 0.7508 ± 0.0519 | 0.7621 ± 0.0148 |
| RETAIN | 0.8268 ± 0.0081 | 0.7592 ± 0.0103 | 0.7788 ± 0.0091 | 0.7687 ± 0.0089 |
| Deepcare | 0.7876 ± 0.0098 | 0.7707 ± 0.0458 | 0.7782 ± 0.0147 | |
| DeepMPM-w/o- | 0.8435 ± 0.0073 | 0.7685 ± 0.0210 | 0.7759 ± 0.0490 | 0.7710 ± 0.0177 |
| DeepMPM | 0.7700 ± 0.0306 |
The overall best result is given in bold font
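As a reminder of how the Precision, Recall, and F1-score columns relate to each other, the standard definitions can be sketched as follows. The counts are hypothetical and the helper names are illustrative. The `summarize` helper uses the sample standard deviation, which is an assumption, since the table does not say how the ± values were computed.

```python
from statistics import mean, stdev

def precision_recall_f1(tp, fp, fn):
    """Standard definitions from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

def summarize(values):
    """Format scores from repeated runs as 'mean ± sample standard deviation'."""
    return f"{mean(values):.4f} ± {stdev(values):.4f}"

# Hypothetical confusion counts, not taken from the article.
precision, recall, f1 = precision_recall_f1(tp=80, fp=20, fn=40)
```

With these counts precision is 0.8, recall is 2/3, and F1 is 8/11, so F1 sits between the two but closer to the lower value.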
The performances on the same test set of DeepMPM trained with different training sets
| Training set description | AUC | Precision | Recall | F1-score |
|---|---|---|---|---|
| Group I: all patients were diagnosed with CHF | 0.7593 ± 0.0473 | 0.7533 ± 0.0378 | 0.8419 ± 0.0219 | 0.7853 ± 0.0145 |
| Group II: includes patients who were not diagnosed with CHF | | | | |
| Group I: all patients were diagnosed with diabetes | 0.7468 ± 0.0255 | 0.7244 ± 0.0416 | 0.7184 ± 0.0214 | 0.6913 ± 0.0087 |
| Group II: includes patients who were not diagnosed with diabetes | | | | |
The overall best result is given in bold font
Fig. 8 Comparison of the distribution of the hard positive examples and other positive examples in the length of course of disease, the length of last stay in hospital, and the interval between the last admission and the last discharge. a–f Violin plots and histograms of the three factors in CHF records; g–l violin plots and histograms of the three factors in diabetes records
Fig. 9 The heatmap of the correlation matrix obtained by DeepMPM: a pairwise correlation between the diseases; b pairwise correlation between DRGs. The deeper the color of the pixel block, the stronger the correlation between the two diseases or DRGs codes represented by rows and columns
Diseases related to 4140, V458 and 1403 identified by DeepMPM
| Example | Related diseases | ICD-9 code |
|---|---|---|
| 4140 | Subendocardial infarction, episode of care unspecified | 4107 |
| | Congestive heart failure | 4280 |
| | Atrial fibrillation and flutter | 4273 |
| | Chronic airway obstruction | 496 |
| V458 | Benign neoplasm of cerebral meninges | 2252 |
| | Pure hypercholesterolemia | 2720 |
| | Atrial fibrillation and flutter | 4273 |
| 1403 | Acute ischemic stroke w use of thrombolytic agent w MCC | 61 |
| | Other respiratory system operating room procedures with complications | 76 |
| | Pulmonary edema and respiratory failure | 1333 |
Two-level attention weights of Case 1
| Visit ID | ICD-9 code and the disease it represents | Weight |
|---|---|---|
| Visit 1 | 1983 (Secondary malignant neoplasm of brain and spinal cord) | |
| 0.2736 | 3314 (Obstructive hydrocephalus) | 0.0403 |
| | 1977 (Malignant neoplasm of liver, secondary) | |
| | 1970 (Secondary malignant neoplasm of lung) | |
| | 1985 (Secondary malignant neoplasm of bone and bone marrow) | |
| | V1006 (Personal history of malignant neoplasm of rectosigmoid junction) | |
| Visit 2 | 1983 (Secondary malignant neoplasm of brain and spinal cord) | |
| | 431 (Intracerebral hemorrhage) | |
| | 78039 (Other convulsions) | 0.0278 |
| | 1970 (Secondary malignant neoplasm of lung) | |
| | 1977 (Malignant neoplasm of liver, secondary) | |
| | V452 (Presence of cerebrospinal fluid drainage device) | 0.0432 |
| | 7812 (Abnormality of gait) | 0.0358 |
| | V1006 (Personal history of malignant neoplasm of rectosigmoid junction) | |
| | 4019 (Unspecified essential hypertension) | |
| | V153 (Personal history of irradiation, presenting hazards to health) | 0.0510 |
| | 2518 (Other specified disorders of pancreatic internal secretion) | 0.0128 |
| | E9320 (Adrenal cortical steroids causing adverse effects in therapeutic use) | 0.0476 |
| Visit 3 | 1977 (Malignant neoplasm of liver, secondary) | |
| | 1983 (Secondary malignant neoplasm of brain and spinal cord) | |
| | 1970 (Secondary malignant neoplasm of lung) | |
| | 5770 (Acute pancreatitis) | |
| | 79902 (Hypoxemia) | 0.0860 |
| | V1006 (Personal history of malignant neoplasm of rectosigmoid junction) | |
| | V452 (Cerebrospinal fluid drainage device) | 0.0628 |
| | 99591 (Sepsis) | |
| | 4019 (Unspecified essential hypertension) | 0.0823 |
| | 25000 (Diabetes mellitus without mention of complication) | 0.0354 |
The visit-level attention weight is displayed under visit ID, while all variable-level attention weights are associated with the ICD-9 codes. Bold values under visit ID indicate that the visit has a relatively higher visit-level attention weight. In the Weight column, bold values indicate that the corresponding ICD-9 code was assigned a relatively higher variable-level attention weight
Two-level attention weights of Case 2
| Visit ID | ICD-9 code and the disease it represents | Weight |
|---|---|---|
| Visit 1 | 4280 (Congestive heart failure, unspecified) | |
| | 4254 (Other primary cardiomyopathies) | |
| | 5849 (Acute kidney failure, unspecified) | |
| | 2866 (Defibrination syndrome) | |
| | 2762 (Acidosis) | 0.0742 |
| | 42731 (Atrial fibrillation) | 0.0207 |
| | 1749 (Malignant neoplasm of breast (female), unspecified) | |
| Visit 2 | 5789 (Hemorrhage of gastrointestinal tract, unspecified) | |
| | 4240 (Mitral valve disorders) | 0.0907 |
| | 2851 (Acute posthemorrhagic anemia) | |
| | 40391 (Hypertensive chronic kidney disease, chronic kidney disease stage V) | |
| | 4254 (Other primary cardiomyopathies) | |
| | 4280 (Congestive heart failure, unspecified) | |
| | 4271 (Paroxysmal ventricular tachycardia) | 0.0186 |
| | 56982 (Ulceration of intestine) | 0.0081 |
| | 53190 (Gastric ulcer, without mention of hemorrhage or perforation) | 0.0138 |
| Visit 3 | 71536 (Osteoarthrosis, localized, not specified whether primary or secondary) | 0.0898 |
| 0.1976 | 4254 (Other primary cardiomyopathies) | |
| | 4280 (Congestive heart failure, unspecified) | |
| | 4240 (Mitral valve disorders) | |
| | 2809 (Iron deficiency anemia, unspecified) | 0.0245 |
| | V103 (Personal history of malignant neoplasm of breast) | |
The visit-level attention weight is displayed under visit ID, while all variable-level attention weights are associated with the ICD-9 codes. Bold values under visit ID indicate that the visit has a relatively higher visit-level attention weight. In the Weight column, bold values indicate that the corresponding ICD-9 code was assigned a relatively higher variable-level attention weight