Tasha Nagamine, Brian Gillette, Alexey Pakhomov, John Kahoun, Hannah Mayer, Rolf Burghaus, Jörg Lippert, Mayur Saxena.
Abstract
As a leading cause of death and morbidity, heart failure (HF) is responsible for a large portion of healthcare and disability costs worldwide. Current approaches to define specific HF subpopulations may fail to account for the diversity of etiologies, comorbidities, and factors driving disease progression, and therefore have limited value for clinical decision making and development of novel therapies. Here we present a novel and data-driven approach to understand and characterize the real-world manifestation of HF by clustering disease and symptom-related clinical concepts (complaints) captured from unstructured electronic health record clinical notes. We used natural language processing to construct vectorized representations of patient complaints followed by clustering to group HF patients by similarity of complaint vectors. We then identified complaints that were significantly enriched within each cluster using statistical testing. Breaking the HF population into groups of similar patients revealed a clinically interpretable hierarchy of subgroups characterized by similar HF manifestation. Importantly, our methodology revealed well-known etiologies, risk factors, and comorbid conditions of HF (including ischemic heart disease, aortic valve disease, atrial fibrillation, congenital heart disease, various cardiomyopathies, obesity, hypertension, diabetes, and chronic kidney disease) and yielded additional insights into the details of each HF subgroup's clinical manifestation of HF. Our approach is entirely hypothesis-free and can therefore be readily applied for discovery of novel insights in alternative diseases or patient populations.
Year: 2020 PMID: 33288774 PMCID: PMC7721729 DOI: 10.1038/s41598-020-77286-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Workflow diagram depicting cohort definition and vectorization of EHRs. (A) The final cohort consists of 25,952 individuals with heart failure. An NER system was used to extract condition and symptom mentions in the clinical notes. These were aggregated over the entire timeline of each patient. The resultant “corpus” (medical concept counts for each patient) was then transformed using TF-IDF to obtain a vector space representation of patient EHRs. (B) Schematic of patient EHR clustering methodology. The patient-feature matrix derived from patients’ clinical notes in (A) was used to fit K-means clustering models for values of K in [2, 3, …, 30]. Examples of clustering results are shown at right for K in [2, 3], with cluster assignment highlighted with colored overlays.
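The vectorization and clustering steps described in this caption can be illustrated with a minimal sketch. It uses only the Python standard library so that each step is visible. The smoothed IDF, the L2 normalization, the farthest-point initialization, and the function names `tfidf_matrix` and `kmeans` are all assumptions made for illustration; the source does not state which TF-IDF variant or K-means implementation was used, and the toy patients are invented.

```python
import math
from collections import Counter

def tfidf_matrix(patient_counts):
    """Turn per-patient concept counts into L2-normalised TF-IDF rows."""
    vocab = sorted({c for counts in patient_counts for c in counts})
    n = len(patient_counts)
    # Document frequency: number of patients mentioning each concept.
    df = Counter(c for counts in patient_counts for c in counts)
    # Smoothed IDF (an assumed variant; the source does not specify one).
    idf = {c: math.log((1 + n) / (1 + df[c])) + 1.0 for c in vocab}
    rows = []
    for counts in patient_counts:
        row = [counts.get(c, 0) * idf[c] for c in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, n_iter=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(
            max(rows, key=lambda r: min(sq_dist(r, c) for c in centroids)))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy "corpus": concept counts aggregated over each patient's timeline.
patients = [
    {"Atrial Fibrillation": 5, "Dyspnea": 2},
    {"Atrial Fibrillation": 3, "Dyspnea": 1, "Palpitations": 1},
    {"Myocardial Infarction": 4, "Angina Pectoris": 2},
    {"Myocardial Infarction": 2, "Angina Pectoris": 3, "Stenosis": 1},
]
vocab, X = tfidf_matrix(patients)
# The source sweeps K over 2..30; the toy data only supports small K.
models = {k: kmeans(X, k) for k in range(2, 4)}
```

On real data a library implementation with several random restarts would be the more usual choice; the deterministic initialization here only keeps the toy example reproducible.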
Table 1. Baseline characteristics of the heart failure cohort.
| Characteristic | Value |
|---|---|
| Patients in cohort | 25,952 (100%) |
| Unique patients with Hypertensive heart disease with heart failure ICD-10 inclusion criteria (I11.0, I13.0, I13.2) | 3,527 (13.59%) |
| Unique patients with Cardiomyopathy ICD-10 inclusion criteria (I42) | 6,811 (26.24%) |
| Unique patients with Heart failure ICD-10 inclusion criteria (I50) | 20,534 (79.12%) |
| Total number of medical concept mentions | 12,490,330 |
| Number of unique medical concepts | 1,276 |
| Congestive heart failure | 25,870 (99.68%) |
| Cardiomyopathy | 9,041 (34.84%) |
| Hypertension | 21,933 (84.51%) |
| Ischemic heart disease | 21,358 (82.30%) |
| Cerebral ischemia | 14,633 (56.38%) |
| Cardiac valve disease | 19,960 (76.91%) |
| Atrial fibrillation and flutter | 8,872 (34.19%) |
| Chronic obstructive pulmonary disease | 5,855 (22.56%) |
| Obesity | 9,351 (36.03%) |
| Hyperlipidemia | 14,139 (54.48%) |
| Type 2 diabetes | 5,746 (22.14%) |
| Chronic kidney disease | 3,249 (12.52%) |
| N females | 13,220 (42.60%) |
| Age of males (years), median (25, 75 quartile) | 58 (48, 67) |
| Age of females (years), median (25, 75 quartile) | 63 (48, 72) |
| Age of males, 18+ (years), median (25, 75 quartile) | 60 (52, 68) |
| Age of females, 18+ (years), median (25, 75 quartile) | 65 (55, 73) |
| Timeline length (months), median (25, 75 quartile) | 4 (1, 28) |
| BMI, median (25, 75 quartile) | 27.34 (23.45, 31.23) |
| Median number of concepts/patient (25, 75 quartile) | 253 (103, 513) |
| Median number of concepts/patient (unique) (25, 75 quartile) | 72 (43, 107) |
Table 2. Cluster characteristics for K = 15. Top ten most significant concepts for each phenotype, ranked by p value (smallest to largest).
| Cluster name | Top ten significant concepts | Descriptive statistics | ICD-10 |
|---|---|---|---|
| Cardiac surgery (male) | Cardiac index (90.6%) Peristalsis (95.3%) Cerebrovascular Disorders (72.6%) Color of urine (98.2%) Postpericardiotomy Syndrome (58.6%) Central venous pressure finding (91.4%) Coronary Artery Disease (61.7%) Effusion (76.9%) Cardiac activity (79.3%) Pulmonary artery pressure (78.5%) | N = 2633 20.6% female Age: 63.3 years (57.3, 68.8) BMI: 28.0 (25.5, 31.1) Mortality: 1.21% | I50: Heart failure (98.3%) I20: Angina pectoris (97.0%) I25: Chronic ischemic heart disease (61.1%) I60–I69: Cerebrovascular diseases (45.9%) K20–K31: Diseases of esophagus, stomach and duodenum (23.0%) I70–I79: Diseases of arteries, arterioles and capillaries (21.8%) E08–E13: Diabetes mellitus (13.9%) I21: Acute myocardial infarction (7.40%) |
| History of myocardial infarction (male) | Hypertensive disease (97.3%) Angina Pectoris (98.3%) Stenosis (87.2%) Myocardial Infarction (78.2%) Coronary heart disease (99.8%) Myocardial Ischemia (96.2%) Systemic arterial pressure (74.3%) Heart failure (98.1%) Atherosclerosis (76.3%) Chronic gastritis (71.3%) | N = 2633 30.1% female Age: 62.5 years (56.1, 68.4) BMI: 28.9 (25.8, 32.2) Mortality: 0.30% | I50: Heart failure (96.8%) I20: Angina pectoris (91.8%) I25: Chronic ischemic heart disease (61.9%) K20–K31: Diseases of esophagus, stomach and duodenum (42.5%) I70–I79: Diseases of arteries, arterioles and capillaries (20.6%) E08–E13: Diabetes mellitus (16.6%) |
| Acute myocardial infarction (male) | Acute myocardial infarction (91.6%) Myocardial Infarction (99.1%) Acute Coronary Syndrome (82.1%) Coronary heart disease (99.0%) Myocardial Ischemia (96.9%) Infarction (71.8%) Sinus rhythm (95.4%) Akinesia (74.7%) Stenosis (87.1%) Systemic arterial pressure (83.5%) | N = 1227 26.6% female Age: 61.5 years (53.4, 69.3) BMI: 27.6 (24.8, 31.1) Mortality: 3.01% | I50: Heart failure (98.2%) I21: Acute myocardial infarction (91.0%) I25: Chronic ischemic heart disease (45.9%) I20: Angina pectoris (37.3%) K20–K31: Diseases of esophagus, stomach and duodenum (29.9%) I22: Subsequent ST elevation (STEMI) and non-ST elevation (NSTEMI) myocardial infarction (16.2%) |
| Unstable angina (male) | Unstable Angina (99.2%) Angina Pectoris (96.1%) Myocardial Ischemia (95.3%) Coronary heart disease (98.6%) Acute Coronary Syndrome (71.0%) Progressive Angina (49.8%) Sinus rhythm (91.3%) Hepatitis B (54.8%) Stenosis (80.1%) Pain (86.6%) | N = 1382 36.4% female Age: 64.5 years (57.2, 72.7) BMI: 28.4 (25.2, 32.3) Mortality: 0.50% | I20: Angina pectoris (98.6%) I50: Heart failure (96.4%) I25: Chronic ischemic heart disease (43.4%) K20–K31: Diseases of esophagus, stomach and duodenum (28.5%) I10–I16: Hypertensive diseases (16.4%) E08–E13: Diabetes mellitus (15.6%) I21: Acute myocardial infarction (14.5%) |
| Congenital heart defects | Congenital Heart Defects (89.0%) Congenital heart disease (99.6%) Congenital Abnormality (96.4%) Birth (89.6%) Pregnancy (85.4%) Air Embolism (65.3%) Atrial Septal Defects (66.2%) Respiration Disorders (56.5%) Childbirth (61.1%) Systolic Murmurs (66.2%) | N = 1599 54.2% female Age: 2.60 years (0.86, 8.56) BMI: 15.7 (14.4, 18.0) Mortality: 0.56% | Q20–Q28: Congenital malformations of the circulatory system (98.4%) I50: Heart failure (97.1%) G96: Other disorders of central nervous system (16.6%) Q90: Down syndrome (7.00%) K20–K31: Diseases of esophagus, stomach and duodenum (6.50%) E40–E46: Malnutrition (6.37%) |
| NICU | Diuresis (99.0%) Congenital heart disease (97.7%) Birth (98.6%) Congenital Abnormality (94.2%) Newborn (87.0%) Systolic Murmurs (92.0%) Childbirth (87.9%) Wheezing (92.2%) Surgical wound (76.7%) Pregnancy (91.2%) | N = 803 41.7% female Age: 0.20 years (0.09, 0.98) BMI: 13.4 (12.1, 15.2) Mortality: 16.6% | I50: Heart failure (98.3%) Q20–Q28: Congenital malformations of the circulatory system (98.2%) G96: Other disorders of central nervous system (32.2%) P50–P61: Hemorrhagic and hematological disorders of newborn (30.5%) P91: Other disturbances of cerebral status of newborn (29.8%) G93: Other disorders of brain (29.2%) |
| Atrial fibrillation | Atrial Fibrillation (95.7%) Atrial fibrillation and flutter (68.7%) Paroxysmal atrial fibrillation (60.4%) Premature ventricular contractions (87.9%) Atrial Flutter (53.1%) Cardiac Arrhythmia (85.2%) Premature Cardiac Complex (71.8%) Dyspnea (93.1%) Supraventricular arrhythmia (54.6%) Persistent atrial fibrillation (37.9%) | N = 1765 43.2% female Age: 67.1 years (59.3, 75.5) BMI: 29.3 (25.7, 33.4) Mortality: 0.39% | I48: Atrial fibrillation and flutter (68.9%) I50: Heart failure (67.9%) I25: Chronic ischemic heart disease (39.5%) I20: Angina pectoris (36.1%) I10–I16: Hypertensive diseases (31.0%) I42: Cardiomyopathy (23.1%) |
| Decompensated CHF (male) | Decompensation (63.3%) Pulmonary Hypertension (78.8%) Swelling (80.4%) Pulmonary Embolism (53.2%) Cardiac asthma (49.3%) Hydrothorax (52.4%) Diuresis (87.2%) Ascites (45.3%) Thromboembolism (43.9%) Pulmonary Thromboembolisms (44.2%) | N = 2158 31.1% female Age: 62.8 years (53.2, 71.0) BMI: 27.1 (23.5, 31.1) Mortality: 14.5% | I50: Heart failure (88.7%) I25: Chronic ischemic heart disease (41.8%) I20: Angina pectoris (33.0%) I42: Cardiomyopathy (26.2%) I47: Paroxysmal tachycardia (22.2%) I48: Atrial fibrillation and flutter (17.8%) |
| Dilated cardiomyopathy (male) | Dilated Cardiomyopathy (97.2%) Chronic heart failure (97.1%) Cardiomyopathies (81.4%) Dyspnea (90.6%) Hypokinesia (73.9%) Mitral Valve Insufficiency (82.8%) Tricuspid Valve Insufficiency (76.0%) Cardiomegaly (36.4%) Myocarditis (30.3%) Ventricular Tachycardia (37.7%) | N = 1597 21.6% female Age: 54.5 years (45.9, 62.3) BMI: 28.0 (25.1, 31.5) Mortality: 0.62% | I42: Cardiomyopathy (93.9%) I50: Heart failure (24.8%) I47: Paroxysmal tachycardia (21.7%) I25: Chronic ischemic heart disease (16.8%) I48: Atrial fibrillation and flutter (11.1%) I20: Angina pectoris (10.7%) |
| Aortic valve disease | Calcinosis (80.3%) Heart Neoplasm (76.1%) Aortic Valve Stenosis (74.6%) Aortic Valve Insufficiency (88.9%) Heart valve disease (76.7%) Color of urine (85.5%) Cardiac index (74.1%) Central venous pressure finding (76.1%) Cardiac activity (71.3%) Blood flow (96.4%) | N = 1857 49.5% female Age: 66.4 years (57.7, 74.3) BMI: 27.1 (24.2, 30.5) Mortality: 3.50% | I50: Heart failure (93.7%) I35: Nonrheumatic aortic valve disorders (64.0%) I60–I69: Cerebrovascular diseases (41.0%) I20: Angina pectoris (27.4%) I05–I09: Chronic rheumatic heart diseases (26.3%) K20–K31: Diseases of esophagus, stomach and duodenum (23.1%) |
| Hypertensive heart disease (female) | Heart Diseases (95.8%) Hypertensive disease (97.2%) Heart failure (98.8%) Hyperlipidemia (43.1%) Increase in blood pressure (35.9%) Obesity (46.4%) Menopause present (33.2%) Lipid Metabolism Disorders (25.8%) Gynecological history (24.3%) Vertebrobasilar Insufficiency (22.1%) | N = 1727 72.3% female Age: 63.5 years (55.2, 71.7) BMI: 30.1 (27.4, 35.1) Mortality: 0.05% | I10–I16: Hypertensive diseases (97.5%) I20: Angina pectoris (8.80%) I25: Chronic ischemic heart disease (5.90%) |
| Cerebrovascular disease | Encephalopathies (67.2%) Dysarthria (52.5%) Gagging (50.8%) Corneal Reflexes (40.0%) Nystagmus (34.7%) On examination—pupil reaction to light (34.7%) Dysphonia (32.6%) Deglutition Disorders (31.5%) Cataract (32.5%) Headache (45.6%) | N = 2348 58.2% female Age: 70.1 years (62.4, 77.6) BMI: 29.0 (25.5, 32.9) Mortality: 0.89% | I50: Heart failure (59.1%) I10–I16: Hypertensive diseases (55.5%) I20: Angina pectoris (51.5%) I60–I69: Cerebrovascular diseases (45.7%) I25: Chronic ischemic heart disease (44.2%) E08–E13: Diabetes mellitus (27.1%) |
| Hypertrophic cardiomyopathy | Hypertrophic Cardiomyopathy (100%) Left Ventricular Hypertrophy (73.2%) Hypertrophy (45.5%) Hypertrophic cardiomyopathy without obstruction (27.1%) Mitral Valve Insufficiency (77.1%) Diastolic dysfunction (55.3%) Pulmonary Valve Insufficiency (48.5%) Asymmetric hypertrophy (13.5%) Heart murmur (34.0%) Tricuspid Valve Insufficiency (67.9%) | N = 1159 53.9% female Age: 56.9 years (46.7, 65.8) BMI: 28.7 (26.3, 32.5) Mortality: 0.34% | I42: Cardiomyopathy (98.0%) I10–I16: Hypertensive diseases (13.4%) I20: Angina pectoris (13.2%) I50: Heart failure (8.54%) I25: Chronic ischemic heart disease (6.12%) I47: Paroxysmal tachycardia (5.78%) |
| Isolated cardiomyopathy (female) | Cardiomyopathies (94.5%) Osteochondrosis (34.0%) Palpitations (27.5%) Gynecological history (15.3%) Autoimmune thyroiditis (20.3%) Dystrophy (11.3%) Dystonia Disorders (8.92%) Vertebrobasilar Insufficiency (19.2%) Nodular Goiter (20.3%) Unspecified Abortion (17.7%) | N = 2319 69.3% female Age: 46.7 years (33.9, 55.8) BMI: 25.4 (21.9, 29.6) Mortality: 0.12% | I42: Cardiomyopathy (95.2%) E00–E07: Disorders of thyroid gland (10.6%) I10–I16: Hypertensive diseases (9.27%) I49: Other cardiac arrhythmias (6.12%) |
| Pediatric cardiomyopathy | Birth (86.1%) Cardiac Arrhythmia (94.7%) Pregnancy (85.2%) Childbirth (71.6%) Cardiomyopathies (89.1%) Myocarditis (69.6%) Endocarditis (57.3%) Pericarditis (57.5%) Myocardial dysfunction (54.7%) Viral respiratory infection (66.1%) | N = 745 50.2% female Age: 14.2 years (6.90, 20.5) BMI: 18.8 (15.7, 22.6) Mortality: 0.53% | I42: Cardiomyopathy (69.9%) I50: Heart failure (63.0%) I49: Other cardiac arrhythmias (29.2%) I47: Paroxysmal tachycardia (25.2%) Q20–Q28: Congenital malformations of the circulatory system (24.5%) O99: Other maternal diseases classifiable elsewhere but complicating pregnancy, childbirth and the puerperium (14.7%) |
Significance was determined using a one-sided (greater) t-test with Bonferroni correction, testing the null hypothesis that the TF-IDF feature values for a medical entity in cluster i are drawn from the same distribution as the values for the same entity in all other clusters. The right-hand columns show characteristics of the heart failure phenotypes, including number of patients, age, sex breakdown, and body mass index (BMI). The “Mortality” statistic denotes the percentage of patients in the cluster who died in the hospital, as recorded in their EHRs. The “ICD-10” column shows the six most frequent ICD-10 codes and/or groups of codes with more than 5% incidence within the cluster.
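The enrichment test described in this note can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it uses an unequal-variance (Welch-type) statistic and a normal approximation to the t distribution, which is reasonable only for large clusters, and it takes the Bonferroni family to be the number of concepts tested. The source specifies neither choice. The names `one_sided_p` and `enriched_concepts` and the toy data are hypothetical. With SciPy available, `scipy.stats.ttest_ind(inside, outside, alternative="greater")` would give a t-based p value instead.

```python
import math
from statistics import NormalDist, mean, variance

def one_sided_p(inside, outside):
    """Upper-tail p value for mean(inside) > mean(outside), unequal variances."""
    se = math.sqrt(variance(inside) / len(inside)
                   + variance(outside) / len(outside))
    if se == 0.0:
        return 0.0 if mean(inside) > mean(outside) else 1.0
    t = (mean(inside) - mean(outside)) / se
    # Normal approximation to the t distribution (assumes large clusters).
    return NormalDist().cdf(-t)

def enriched_concepts(X, labels, vocab, cluster, alpha=0.05, n_tests=None):
    """Concepts significantly enriched in `cluster`, smallest p value first."""
    n_tests = n_tests or len(vocab)  # assumed Bonferroni family size
    hits = []
    for col, concept in enumerate(vocab):
        inside = [row[col] for row, lab in zip(X, labels) if lab == cluster]
        outside = [row[col] for row, lab in zip(X, labels) if lab != cluster]
        # Bonferroni correction: scale the p value by the number of tests.
        p_adj = min(1.0, one_sided_p(inside, outside) * n_tests)
        if p_adj < alpha:
            hits.append((concept, p_adj))
    return sorted(hits, key=lambda item: item[1])

# Invented TF-IDF values for six patients and two concepts.
vocab = ["Atrial Fibrillation", "Dyspnea"]
X = [[0.90, 0.40], [0.80, 0.50], [0.95, 0.45],
     [0.10, 0.50], [0.00, 0.40], [0.05, 0.45]]
labels = [0, 0, 0, 1, 1, 1]
hits = enriched_concepts(X, labels, vocab, cluster=0)
```

In this toy example only "Atrial Fibrillation" is flagged for cluster 0, because "Dyspnea" has the same values inside and outside the cluster.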
Figure 2. Exemplary 2D visualization of the relative distances between all patient EHRs in the heart failure cohort using t-SNE. Colors show cluster assignment using K-means clustering (K = 15). Each cluster is shown with an interpretable name defining the heart failure phenotype.
Figure 3. A data-driven hierarchy of HF classification. (A) Dendrogram showing the hierarchical relationship between cluster phenotypes at different values of K. The dissimilarity metric used to construct distances between clusters at different levels of the hierarchy was $1 - J(C_i, C_j)$, where $J(C_i, C_j)$ is the Jaccard index between the cluster assignments for cluster i at K = K1 (e.g., 15) and cluster j at K = K2 (e.g., 8). Branches are labeled using a clinical interpretation of the hierarchical structure of the clusters (see discussion). (B) t-SNE plots showing cluster assignment for K in [2, 4, 8, 15], which are marked with black arrows in (A).
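The Jaccard-based dissimilarity in panel (A) can be made concrete with a short sketch. It treats each cluster as a set of patient indices and returns one minus the Jaccard index for every pair of clusters drawn from two clusterings of the same patients. The function name `jaccard_dissimilarity` and the toy label vectors are assumptions for illustration; how the authors built the dendrogram from these distances is not described here.

```python
def jaccard_dissimilarity(labels_k1, labels_k2):
    """Return {(i, j): 1 - J(C_i, C_j)} for clusters i at K1 and j at K2."""
    def members(labels):
        # Map each cluster label to the set of patient indices assigned to it.
        groups = {}
        for patient, lab in enumerate(labels):
            groups.setdefault(lab, set()).add(patient)
        return groups

    g1, g2 = members(labels_k1), members(labels_k2)
    return {(i, j): 1.0 - len(a & b) / len(a | b)
            for i, a in g1.items() for j, b in g2.items()}

# Six toy patients clustered at K = 3 and at K = 2.
labels_k3 = [0, 0, 1, 1, 2, 2]
labels_k2 = [0, 0, 0, 0, 1, 1]
d = jaccard_dissimilarity(labels_k3, labels_k2)
# d[(2, 1)] is 0.0: cluster 2 at K = 3 is identical to cluster 1 at K = 2.
# d[(0, 0)] is 0.5: cluster 0 at K = 3 is half of cluster 0 at K = 2.
```

A dissimilarity of 0 means the two clusters contain exactly the same patients, and 1 means they share none, so small values link a fine-grained cluster to its parent at a coarser K.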
Table 3. Exemplary association scores between pairs of medical concepts that co-occur within cluster phenotypes from Table 2.
| Cluster name | Term 1 | Term 2 | Association score: HF cohort | Association score: PubMed |
|---|---|---|---|---|
| Atrial fibrillation | Atrial fibrillation | Cardiac arrhythmia | 0.356738 | 0.238322 |
| Atrial fibrillation | Atrial fibrillation | Varicosity | 0.428386 | 0.000394 |
| Atrial fibrillation | Mitral valve insufficiency | Subclinical hypothyroidism | 0.130795 | 0 |
| Decompensated CHF | Pulmonary embolism | Thrombus | 0.301952 | 0.150213 |
| Decompensated CHF | Decompensation | Lymphadenopathy | 0.246885 | 0.000425 |
| Decompensated CHF | Lymphadenopathy | Mitral valve insufficiency | 0.165153 | 0.000082 |
| Hypertensive heart disease | Heart diseases | Heart failure | 0.292404 | 0.154376 |
| Hypertensive heart disease | Hyperlipidemia | Obesity | 0.226397 | 0.026089 |
| Hypertensive heart disease | Obesity | Vertebrobasilar insufficiency | 0.134699 | 0.000024 |
| Isolated cardiomyopathy | Cardiomyopathies | Myocarditis | 0.222379 | 0.154172 |
| Isolated cardiomyopathy | Autoimmune thyroiditis | Cardiomyopathies | 0.125604 | 0.000934 |
| Isolated cardiomyopathy | Cardiomyopathies | Osteochondrosis | 0.146162 | 0.000031 |
Comparable association scores within the HF cohort and the scientific literature (PubMed) indicate that co-occurrences are already known. Significantly higher association scores in the HF cohort indicate potentially novel associations.