Yejin Kim, Robert El-Kareh, Jimeng Sun, Hwanjo Yu, Xiaoqian Jiang.
Abstract
The adoption of Electronic Health Record (EHR) systems has led to the collection of massive amounts of healthcare data, which creates both opportunities and challenges for studying them. Computational phenotyping offers a promising way to convert sparse and complex data into meaningful concepts that healthcare providers can interpret and use. We propose a novel supervised nonnegative tensor factorization methodology that derives discriminative and distinct phenotypes. We represented the co-occurrence of diagnoses and prescriptions in EHRs as a third-order tensor and decomposed it using the CP algorithm. We evaluated the discriminative power of our models on an Intensive Care Unit database (MIMIC-III) and demonstrated performance superior to that of state-of-the-art ICU mortality calculators (e.g., APACHE II, SAPS II). Examples of the resulting phenotypes are sepsis with acute kidney injury, cardiac surgery, anemia, respiratory failure, heart failure, cardiac arrest, metastatic cancer (requiring ICU), end-stage dementia (requiring ICU and transitioned to comfort care), intraabdominal conditions, and alcohol abuse/withdrawal.
Year: 2017 PMID: 28442772 PMCID: PMC5430728 DOI: 10.1038/s41598-017-01139-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Workflow of our phenotyping method. We constructed a tensor from the number of co-occurrences between diagnoses and prescriptions of each patient in EHRs. We then decomposed the tensor using the proposed constrained tensor factorization, which incorporates regularizers for discriminative and distinct phenotypes. We defined a phenotype as a set of co-occurring diagnoses and prescriptions, which can be inferred from the decomposed tensors, and evaluated the phenotypes' discriminative and distinct power. We also selected the top 10 representative phenotypes and presented their meaning and usefulness.
Figure 2. Constructing a tensor from EHRs. We built a third-order tensor with co-occurrences of patients, diagnoses, and prescriptions from EHRs. In the example, Patient I is diagnosed with Alzheimer’s disease and is ordered morphine sulfate twice.
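The tensor construction described in Figure 2 can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_cooccurrence_tensor`, the input layout, and the counting rule (each distinct diagnosis paired once with every prescription order) are assumptions made for the example.

```python
from collections import Counter

def build_cooccurrence_tensor(records):
    """Count (patient, diagnosis, prescription) co-occurrences.

    `records` maps a patient id to (diagnoses, prescriptions); a drug that was
    ordered twice appears twice in the prescription list. The result is a
    sparse third-order tensor stored as a Counter keyed by index triples.
    """
    tensor = Counter()
    for patient, (diagnoses, prescriptions) in records.items():
        for dx in set(diagnoses):       # assumption: each diagnosis counts once
            for rx in prescriptions:    # assumption: each order adds one count
                tensor[(patient, dx, rx)] += 1
    return tensor

# Hypothetical toy records mirroring the Figure 2 example.
records = {
    "Patient I": (["Alzheimer's disease"], ["Morphine Sulfate", "Morphine Sulfate"]),
    "Patient II": (["Severe sepsis"], ["Vancomycin"]),
}
tensor = build_cooccurrence_tensor(records)
print(tensor[("Patient I", "Alzheimer's disease", "Morphine Sulfate")])  # 2
```

Storing only non-zero entries keeps the representation compact, since most patient, diagnosis, and prescription combinations never occur.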
Figure 3. Phenotyping by tensor factorization. Dark shade, light shade, and no shade represent high, low, and zero membership to the phenotype, respectively. In the example, patients who died have high membership to Phenotype 2 and Phenotype R.
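To make the factorization in Figure 3 concrete, here is a minimal sketch of plain nonnegative CP decomposition using multiplicative updates. It is a swapped-in textbook scheme, not the proposed method: the supervised and similarity-based regularizers are omitted, and the names `nonnegative_cp`, `update_factor`, and `reconstruct` are hypothetical.

```python
import random

def gram(M, R):
    """R x R Gram matrix M^T M of a factor matrix stored as a list of rows."""
    return [[sum(row[r] * row[s] for row in M) for s in range(R)] for r in range(R)]

def update_factor(entries, F, G, H, mode, R, eps=1e-9):
    """Multiplicative update of factor F (tensor index `mode`) given G and H.

    `entries` lists the non-zero tensor cells as ((i, j, k), value); zero
    cells contribute nothing to the numerator, so they can be skipped.
    """
    others = [m for m in range(3) if m != mode]
    num = [[0.0] * R for _ in F]
    for idx, val in entries:
        f, g, h = idx[mode], idx[others[0]], idx[others[1]]
        for r in range(R):
            num[f][r] += val * G[g][r] * H[h][r]
    GG, HH = gram(G, R), gram(H, R)
    for f, row in enumerate(F):
        den = [sum(row[s] * GG[s][r] * HH[s][r] for s in range(R)) for r in range(R)]
        F[f] = [row[r] * num[f][r] / (den[r] + eps) for r in range(R)]

def nonnegative_cp(entries, shape, R, iters=200, seed=0):
    """Return nonnegative factors A (patients), B (diagnoses), C (prescriptions)."""
    rng = random.Random(seed)
    A, B, C = ([[rng.random() + 0.1 for _ in range(R)] for _ in range(n)] for n in shape)
    for _ in range(iters):
        update_factor(entries, A, B, C, 0, R)
        update_factor(entries, B, A, C, 1, R)
        update_factor(entries, C, A, B, 2, R)
    return A, B, C

def reconstruct(A, B, C, i, j, k):
    """Model value for cell (i, j, k): sum over phenotypes of A_ir * B_jr * C_kr."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))

# Hypothetical toy tensor: two patient groups with different co-occurrences.
entries = [((0, 0, 0), 2.0), ((1, 0, 0), 2.0), ((2, 1, 1), 3.0), ((3, 1, 1), 3.0)]
A, B, C = nonnegative_cp(entries, (4, 2, 2), R=2)
for patient, row in enumerate(A):
    print(patient, [round(x, 3) for x in row])  # per-phenotype membership
```

Each column r of A, B, and C together describes one candidate phenotype: the rows of A give patient memberships, while B and C give the diagnoses and prescriptions that make it up.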
Examples of time-ordered EHRs sequences.
| Lorazepam → Acetaminophen → Piperacillin-Tazobactam → Ventricular fibrillation |
| Diltiazem → Pneumococcal Vac Polyvalent → Anemia → Chronic obst asthma |
| Pantoprazole Sodium → Acetaminophen |
| Oxycodone → Heparin Flush → Severe sepsis |
Each sequence consists of formulary drug codes (prescription) and ICD-9 codes (diagnosis), and is used in Word2Vec to derive pairwise similarities.
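As a rough illustration of deriving pairwise similarities from such sequences, the sketch below uses sliding-window co-occurrence counts and cosine similarity. This is explicitly a simplified stand-in, not Word2Vec, which learns dense embeddings; the window size and the function names are assumptions chosen for the example.

```python
import math
from collections import defaultdict

def cooccurrence_vectors(sequences, window=2):
    """Represent each code by counts of the codes seen within `window` positions."""
    vectors = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for pos, code in enumerate(seq):
            lo, hi = max(0, pos - window), min(len(seq), pos + window + 1)
            for ctx in seq[lo:pos] + seq[pos + 1:hi]:
                vectors[code][ctx] += 1.0
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# The four example sequences from the table above.
sequences = [
    ["Lorazepam", "Acetaminophen", "Piperacillin-Tazobactam", "Ventricular fibrillation"],
    ["Diltiazem", "Pneumococcal Vac Polyvalent", "Anemia", "Chronic obst asthma"],
    ["Pantoprazole Sodium", "Acetaminophen"],
    ["Oxycodone", "Heparin Flush", "Severe sepsis"],
]
vectors = cooccurrence_vectors(sequences)
print(round(cosine(vectors["Lorazepam"], vectors["Pantoprazole Sodium"]), 3))  # 0.707
```

Lorazepam and Pantoprazole Sodium score above zero here only because both appear next to Acetaminophen; with four short sequences the numbers carry no clinical meaning.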
Discriminative and distinction power comparison.
| Model | RMSE | AUC | Sensitivity | Specificity | Rel. length | Avg. overlap |
|---|---|---|---|---|---|---|
| APACHE II | — | 0.7364 | 0.6712 | 0.6728 | — | — |
| SAPS II | — | 0.8129 | 0.7970 | 0.6720 | — | — |
| OASIS | — | 0.7227 | 0.6253 | 0.7077 | — | — |
| APS III | — | 0.7419 | 0.6861 | 0.6994 | — | — |
| CP | 2.2153 (±0.0015) | 0.8469 (±0.0156) | 0.8375 (±0.0391) | 0.7342 (±0.0401) | 0.6807 (±0.0047) | 0.3777 (±0.0064) |
| Supervised | 2.2152 (±0.0016) | 0.8568 (±0.0106) | 0.8392 (±0.0377) | 0.7518 (±0.0393) | 0.6828 (±0.0019) | 0.3787 (±0.0059) |
| Rubik | 2.5025 (±0.0003) | 0.7779 (±0.0247) | 0.7310 (±0.0304) | 0.7242 (±0.0377) | 0.3934 (±0.0102) | 0.2806 (±0.0075) |
| Sim.-based | 2.5069 (±0.0130) | 0.7796 (±0.0204) | 0.7615 (±0.0378) | 0.7097 (±0.0473) | 0.0714 (±0.0406) | 0.0013 (±0.0014) |
| Supervised + Sim.-based | 2.3014 (±0.0060) | 0.8389 (±0.0199) | 0.8223 (±0.0387) | 0.7487 (±0.0409) | 0.3958 (±0.0137) | 0.1267 (±0.0100) |
RMSE, discrimination (AUC, sensitivity, specificity), and distinction (relative length, average overlap), with 95% confidence intervals, for the baselines and our proposed models when R = 50. CP = CP decomposition; Supervised = the supervised phenotyping for discriminative power; Sim.-based = the similarity-based phenotyping for distinct power; Supervised + Sim.-based = the final model, which incorporates both supervised and similarity-based phenotyping.
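For readers unfamiliar with the discrimination metrics in this table, the sketch below computes AUC, sensitivity, and specificity from predicted mortality scores. The function names, the 0.5 threshold, and the toy data are assumptions for illustration; the source does not state which threshold was used.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy data: 1 = died, 0 = survived.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(round(auc(labels, scores), 3))  # 0.889
sens, spec = sensitivity_specificity(labels, scores)
print(round(sens, 3), round(spec, 3))  # 0.667 0.667
```

AUC is threshold-free, whereas sensitivity and specificity depend on the chosen cut-off, which is why the table reports all three.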
Logistic regression coefficient from feature selection, p-value, and prevalence.
| Phenotype | Coefficient | p-value | λ | Prevalence (%) |
|---|---|---|---|---|
| Intercept | −0.19 | <0.001 | — | — |
| 1 | 28.47 | <0.001 | 749 | 94.53 |
| 3: Sepsis with acute kidney injury | 44.64 | <0.001 | 96 | 45.24 |
| 4: Cardiac surgery | −138.00 | <0.001 | 95 | 50.43 |
| 5: Anemia | −19.76 | <0.001 | 58 | 36.81 |
| 6: Respiratory failure | 88.87 | <0.001 | 56 | 30.98 |
| 10: Heart failure | 30.79 | <0.001 | 39 | 27.19 |
| 11 | 15.13 | <0.001 | 37 | 16.74 |
| 13 | −15.23 | <0.001 | 31 | 22.48 |
| 15 | −7.74 | 0.02 | 30 | 19.02 |
| 16 | 8.69 | <0.001 | 29 | 42.99 |
| 18: Cardiac arrest | 47.08 | <0.001 | 28 | 9.14 |
| 20 | −11.49 | <0.001 | 23 | 9.70 |
| 21 | −5.54 | 0.02 | 22 | 18.46 |
| 23: Metastatic cancer requiring ICU | 25.10 | <0.001 | 20 | 12.29 |
| 24: End-stage dementia requiring ICU | 34.46 | <0.001 | 20 | 12.72 |
| 25 | 12.81 | <0.001 | 18 | 15.08 |
| 28 | −9.00 | <0.001 | 17 | 10.23 |
| 29 | 10.78 | <0.001 | 16 | 18.06 |
| 31 | 10.42 | 0.01 | 16 | 6.13 |
| 32: Intraabdominal conditions | −19.21 | <0.001 | 15 | 4.84 |
| 33 | −6.41 | 0.04 | 14 | 5.12 |
| 34: Alcohol abuse/withdrawal | −22.82 | <0.001 | 13 | 12.57 |
| 41 | −19.89 | <0.001 | 10 | 16.23 |
| 46 | 13.54 | <0.001 | 8 | 7.20 |
| 47 | −9.78 | <0.001 | 6 | 7.96 |
The ten representative phenotypes are 3: Sepsis with acute kidney injury, 4: Cardiac surgery, 5: Anemia, 6: Respiratory failure, 10: Heart failure, 18: Cardiac arrest, 23: Metastatic cancer requiring ICU, 24: End-stage dementia requiring ICU for comfort care, 32: Intraabdominal conditions, and 34: Alcohol abuse/withdrawal. $\lambda_r = \lVert A_{:r} \rVert \, \lVert B_{:r} \rVert \, \lVert C_{:r} \rVert$ (used as the frequency of phenotype r). Prevalence = (number of patients whose membership to the phenotype is non-zero / total number of patients) × 100%.
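The two quantities defined in this footnote can be computed directly from the factor matrices. The sketch below is a minimal illustration that assumes the Euclidean norm for the column norms (the footnote does not name the norm) and uses hypothetical toy factors.

```python
import math

def column_norm(M, r):
    """Euclidean norm of column r of a matrix stored as a list of rows (assumed norm)."""
    return math.sqrt(sum(row[r] ** 2 for row in M))

def phenotype_weight(A, B, C, r):
    """lambda_r: product of the norms of column r in each factor matrix."""
    return column_norm(A, r) * column_norm(B, r) * column_norm(C, r)

def prevalence(A, r):
    """Percentage of patients with non-zero membership to phenotype r."""
    return 100.0 * sum(1 for row in A if row[r] > 0) / len(A)

# Hypothetical toy factors: 4 patients, 2 diagnoses, 2 prescriptions, R = 2.
A = [[3.0, 0.0], [4.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
B = [[1.0, 0.0], [0.0, 2.0]]
C = [[2.0, 0.0], [0.0, 1.0]]
for r in range(2):
    print(r, phenotype_weight(A, B, C, r), prevalence(A, r))
# 0 10.0 50.0
# 1 2.0 25.0
```

Sorting phenotypes by this weight in descending order would reproduce a frequency ordering like the one used for the table.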
Ten representative phenotypes. Listed in order of frequency.
| Phenotype | Diagnosis | Prescription |
|---|---|---|
| Sepsis with acute kidney injury | Acute kidney failure NOS, Acute kidny fail - tubr necr, Acute respiratry failure, Severe sepsis, Septic shock, Septicemia NOS | Vancomycin, Ciprofloxacin, Piperacillin-Tazobactam, CefePIME, Linezolid, Meropenem, Miconazole Powder, Nystatin Oral Suspension, Alteplase, Fluconazole, Loperamide HCl |
| Cardiac surgery (CABG/valve replacements) | Hypertension NOS, Crnry athrscl natve vssl, Hyperlipidemia NEC/NOS, Atrial fibrillation, DMII wo cmp nt st uncntr, Pure hypercholesterolem, Surg compl-heart, Aortic valve disorder | Phenylephrine HCl, Neostigmine, Aspirin EC, Ketorolac, Oxycodone-Acetaminophen, Ranitidine, Milk of Magnesia, Furosemide, Ibuprofen, TraMADOL (Ultram) |
| Anemia | Anemia NOS, Ac posthemorrhag anemia, Chr blood loss anemia, Iron defic anemia NOS | Insulin, Metformin |
| Respiratory failure | Acute respiratry failure, Pulmonary insufficiency following trauma and surgery, Other pulmonary insuff, Acute & chronc resp fail | Albumin, PHENYLEPHrine, Dextrose 50%, Chlorhexidine Gluconate, Milrinone, Epinephrine |
| Heart failure | CHF NOS | Morphine Sulfate, Nitroprusside Sodium, Nitroglycerin, Aspirin EC, Sucralfate |
| Cardiac arrest | Ventricular fibrillation, Cardiogenic shock, Parox ventric tachycard, Atriovent block complete, Cardiac arrest, AMI anterior wall - init | Acetaminophen IV, Fentanyl Citrate, Influenza Virus Vaccine, Morphine Sulfate, NORepinephrine, Glucagon, Readi-Cat 2, Midazolam, Omeprazole |
| Metastatic cancer requiring ICU | Secondary malig neo bone, Secondary malig neo brain/spine, Secondary malig neo lung, Secondary malig neo liver, Neurohypophysis dis NEC | Propofol, Midazolam, Fentanyl Citrate, Dexmedetomidine HCl, Vecuronium Bromide |
| End-stage dementia requiring ICU | Alzheimer’s disease, Paralysis agitans, Dementia w/o behav dist, Mental disor NEC oth dis | Morphine Sulfate, Scopolamine Patch |
| Intraabdominal conditions | Paralytic ileus, Digestive system complications not elsewhere classified, Acute pancreatitis, Cholangitis | Captopril, Metoprolol Tartrate |
| Alcohol abuse/withdrawal | Alcohol dep NEC/NOS-unspec, Alcohol withdrawal, Alcohol dep NEC/NOS-contin, Bipolar disorder NOS | Hydromorphone, Diphenhydramine HCl, Morphine Sulfate, Prochlorperazine |
Figure 4. Phenotype maps. Phenotypes are positioned according to frequency and mortality risk.
Patient’s mortality distribution.
| Phenotype \ Membership | [0, 0.1) | [0.1, 0.2) | [0.2, 0.3) | [0.3, 0.4) | [0.4, 0.5) | [0.5, 0.6) | [0.6, 0.7) | [0.7, 0.8) | [0.8, 0.9) | [0.9, 1) |
|---|---|---|---|---|---|---|---|---|---|---|
| Sepsis with acute kidney injury | 0.48 | 0.79 | 0.80 | 0.85 | 0.82 | 0.87 | 0.63 | 0.86 | — | — |
| Cardiac surgery | 0.58 | 0.39 | 0.25 | 0.18 | 0.08 | 0.05 | 0.04 | 0.04 | 0.04 | 0.05 |
| Anemia | 0.53 | 0.49 | 0.50 | 0.35 | 0.34 | 0.30 | 0.29 | 0.24 | 0.10 | 0.18 |
| Respiratory failure | 0.48 | 0.84 | 0.85 | 0.91 | 0.86 | 0.88 | 0.80 | 0.77 | 0.92 | 0.73 |
| Heart failure | 0.50 | 0.72 | 0.74 | 0.67 | 0.67 | 0.65 | 0.64 | 0.73 | 0.71 | 0.84 |
| Cardiac arrest | 0.51 | 0.83 | 0.76 | 0.84 | 0.85 | 0.91 | 1.00 | 0.83 | 0.88 | 1.00 |
| Metastatic cancer requiring ICU | 0.51 | 0.80 | 0.71 | 0.81 | 0.65 | 0.78 | 0.87 | 0.80 | 0.75 | 0.74 |
| End-stage dementia requiring ICU | 0.51 | 0.81 | 0.80 | 0.81 | 0.74 | 0.75 | 0.90 | 0.93 | 0.84 | 0.91 |
| Intraabdominal conditions | 0.52 | 0.52 | 0.39 | 0.45 | 0.38 | 0.33 | 0.17 | 0.27 | — | — |
| Alcohol abuse/withdrawal | 0.53 | 0.44 | 0.36 | 0.36 | 0.30 | 0.42 | 0.20 | 0.13 | 0.08 | 0.19 |
Each value is the number of patients who died divided by the total number of patients whose membership value falls in the range. Values are omitted (—) when the range contains fewer than 10 patients. Note that half of the patients in our dataset died and half survived.
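The binned mortality computation described in this note can be sketched as follows. The function name, the assumption that memberships are scaled to [0, 1], and the choice to place a membership of exactly 1.0 in the last bin are illustrative assumptions, not details given in the source.

```python
def mortality_by_membership(memberships, died, n_bins=10, min_patients=10):
    """Fraction of patients who died in each membership bin.

    Returns None for bins with fewer than `min_patients` patients, matching
    the omitted cells in the table.
    """
    counts = [0] * n_bins
    deaths = [0] * n_bins
    for m, d in zip(memberships, died):
        b = min(int(m * n_bins), n_bins - 1)  # assumption: 1.0 goes in the last bin
        counts[b] += 1
        deaths[b] += d
    return [deaths[b] / counts[b] if counts[b] >= min_patients else None
            for b in range(n_bins)]

# Hypothetical toy cohort: 12 low-membership, 10 mid-membership, 3 high-membership patients.
memberships = [0.05] * 12 + [0.55] * 10 + [0.95] * 3
died = [1] * 6 + [0] * 6 + [1] * 9 + [0] * 1 + [1] * 3
rates = mortality_by_membership(memberships, died)
print(rates[0], rates[5], rates[9])  # 0.5 0.9 None
```

In a balanced cohort like the one described, a rate near 0.5 in the lowest bin is the baseline, and phenotypes whose rates move away from 0.5 as membership grows are the ones that carry mortality signal.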
Algorithm listing (10 numbered steps). Only fragments survive in this record: step 1 begins "Randomly initialize" and step 4 begins "Update"; the symbols and the remaining steps are missing.