Ahsan Huda, Adam Castaño, Anindita Niyogi, Jennifer Schumacher, Michelle Stewart, Marianna Bruno, Mo Hu, Faraz S Ahmad, Rahul C Deo, Sanjiv J Shah.
Abstract
Transthyretin amyloid cardiomyopathy, an often unrecognized cause of heart failure, is now treatable with a transthyretin stabilizer. It is therefore important to identify at-risk patients who can undergo targeted testing for earlier diagnosis and treatment, prior to the development of irreversible heart failure. Here we show that a random forest machine learning model can identify potential wild-type transthyretin amyloid cardiomyopathy using medical claims data. We derive a machine learning model in 1071 cases and 1071 non-amyloid heart failure controls and validate the model in three nationally representative cohorts (9412 cases, 9412 matched controls), and a large, single-center electronic health record-based cohort (261 cases, 39393 controls). We show that the machine learning model performs well in identifying patients with cardiac amyloidosis in the derivation cohort and all four validation cohorts, thereby providing a systematic framework to increase the suspicion of transthyretin cardiac amyloidosis in patients with heart failure.
Year: 2021 PMID: 33976166 PMCID: PMC8113237 DOI: 10.1038/s41467-021-22876-9
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 17.694
Demographic and clinical characteristics of the five cohorts included in the study.
| Characteristic | Cohort 1: IQVIA (ATTR-CM), ATTRwt-CM + HF | Cohort 1: IQVIA (ATTR-CM), non-amyloid HF | Cohort 2: Optum (ATTR-CM), ATTRwt-CM + HF | Cohort 2: Optum (ATTR-CM), non-amyloid HF | Cohort 3: IQVIA (cardiac amyloid), cardiac amyloid + HF | Cohort 3: IQVIA (cardiac amyloid), non-amyloid HF | Cohort 4: Optum (cardiac amyloid), cardiac amyloid + HF | Cohort 4: Optum (cardiac amyloid), non-amyloid HF | Cohort 5: NMEDW* (cardiac amyloid), cardiac amyloid + HF | Cohort 5: NMEDW* (cardiac amyloid), non-amyloid HF |
|---|---|---|---|---|---|---|---|---|---|---|
| Age, years | 77 ± 7 | 77 ± 7 | 77 ± 8 | 74 ± 11 | 78 ± 8 | 77 ± 8 | 78 ± 9 | 77 ± 11 | 74 ± 9 | 72 ± 11 |
| Male | 84% | 84% | 82% | 81% | 67% | 67% | 68% | 69% | 69% | 56% |
| Ethnicityᵃ | | | | | | | | | | |
| White | — | — | 66% | 80% | — | — | 72% | 80% | 54% | 71% |
| Black | — | — | 28% | 11% | — | — | 22% | 10% | 32% | 13% |
| Other | — | — | 6% | 9% | — | — | 6% | 10% | 14% | 16% |
| Location of final visit | | | | | | | | | | |
| Midwest | 28% | 20% | 47% | 48% | 24% | 21% | 49% | 49% | 100% | 100% |
| Northeast | 27% | 20% | 25% | 16% | 28% | 19% | 22% | 11% | 0% | 0% |
| South | 25% | 40% | 20% | 29% | 28% | 39% | 19% | 28% | 0% | 0% |
| West | 18% | 18% | 5% | 5% | 19% | 19% | 6% | 9% | 0% | 0% |
| Other/unknown | 1% | 2% | 3% | 2% | 1% | 1% | 3% | 4% | 0% | 0% |
| Comorbidities | | | | | | | | | | |
| Hypertension | 90% | 96% | 83% | 88% | 89% | 92% | 81% | 86% | 72% | 78% |
| Obesity | 42% | 48% | 48% | 47% | 37% | 42% | 34% | 39% | 21% | 25% |
| Diabetes | 42% | 62% | 35% | 52% | 45% | 55% | 39% | 50% | 38% | 36% |
| CAD | 64% | 73% | 54% | 68% | 60% | 65% | 59% | 65% | 56% | 54% |
| CKD | 61% | 44% | 58% | 47% | 52% | 42% | 56% | 43% | 47% | 29% |
| Atrial fibrillation | 72% | 52% | 65% | 52% | 64% | 50% | 61% | 50% | 56% | 41% |
| Diagnosis history duration, yearsᵇ | 9.6 (6.2–10.4) | 9.7 (6.3–10.5) | 6.2 (2.8–9.8) | 6.4 (3.0–9.5) | 8.7 (6.0–10.2) | 8.7 (6.0–10.2) | 5.2 (2.5–8.3) | 5.4 (2.7–8.2) | 6.2 (1.2–15.7) | 4.6 (0.9–10.1) |
| Number of visitsᵇ | 129 (82–210) | 131 (87–205) | 79 (34–187) | 76 (31–168) | 117 (63–193) | 112 (60–187) | 67 (25–141) | 66 (24–140) | 37 (9–99) | 21 (6–60) |
T-tests (for age), Wilcoxon rank-sum tests (for diagnosis history duration and number of visits), and χ²-tests (for categorical variables) were used to compare groups. P-values are two-sided. Adjustments were not made for multiple comparisons.
ATTRwt-CM wild-type amyloidogenic transthyretin cardiomyopathy, CAD coronary artery disease, CKD chronic kidney disease, HF heart failure, NMEDW Northwestern Medicine Enterprise Data Warehouse.
ᵃ Ethnicity data were not available in the IQVIA data.
ᵇ Median (25th–75th percentile).
*P < 0.001 for age, sex, ethnicity, CKD, AF, diagnosis history, and number of visits, and P = 0.015 for hypertension for differences between cardiac amyloid and non-amyloid HF in the NMEDW cohort.
Validation of the machine learning model in four cohorts derived from medical claims and electronic health records.
| Metric | Cohort 1: IQVIA holdout (ATTR-CM) | Cohort 2: Optum (ATTR-CM) | Cohort 3: IQVIA (cardiac amyloid) | Cohort 4: Optum (cardiac amyloid) |
|---|---|---|---|---|
| Sensitivity, % | 87 | 90 | 56 | 61 |
| Specificity, % | 87 | 79 | 83 | 81 |
| PPV, % | 88 | 81 | 76 | 76 |
| NPV, % | 86 | 89 | 65 | 67 |
| Accuracy, % | 87 | 84 | 69 | 71 |
| ROC AUC | 0.93 | 0.95 | 0.76 | 0.78 |
ATTR-CM amyloidogenic transthyretin cardiomyopathy, NPV negative predictive value, PPV positive predictive value, ROC AUC receiver operating characteristic area under the curve.
Fig. 1. Receiver operating characteristic curves for the Random Forest machine learning model in the four validation cohorts.
a Optum ATTR-CM validation cohort. b IQVIA cardiac amyloidosis validation cohort. c Optum cardiac amyloidosis cohort. d Northwestern Medicine Enterprise Data Warehouse validation cohort. AUROC, area under the receiver operating characteristic curve; ATTR-CM, amyloidogenic transthyretin cardiomyopathy; NMEDW, Northwestern Medicine Enterprise Data Warehouse.
Prediction of cardiac amyloidosis in the Northwestern Medicine Enterprise Data Warehouse Heart Failure Cohort using the wild-type ATTR-CM Random Forest prediction model.
| Metric, by probability cutoff for the diagnosis of ATTR-CM | >0.50 | >0.55 | >0.60 | >0.65 | >0.70 | >0.75 |
|---|---|---|---|---|---|---|
| Sensitivity, % | 69.7 | 64.0 | 52.5 | 36.8 | 22.2 | 11.1 |
| Specificity, % | 75.6 | 84.5 | 91.0 | 95.5 | 98.0 | 99.3 |
| PPV, % | 1.9 | 2.7 | 3.7 | 5.2 | 6.8 | 9.6 |
| NPV, % | 99.7 | 99.7 | 99.7 | 99.6 | 99.5 | 99.4 |
| Accuracy, % | 75.5 | 84.4 | 90.8 | 95.2 | 97.5 | 98.7 |
| LR+ | 2.86 | 4.12 | 5.85 | 8.24 | 11.07 | 15.97 |
| LR− | 0.40 | 0.43 | 0.52 | 0.66 | 0.79 | 0.90 |
ATTR-CM amyloidogenic transthyretin cardiomyopathy, LR+ positive likelihood ratio, LR− negative likelihood ratio, NPV negative predictive value, PPV positive predictive value.
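The likelihood ratios and predictive values in the table above follow from sensitivity, specificity, and the proportion of cases in the cohort. The sketch below is a minimal Python illustration, not the authors' code. It assumes the case proportion is the 261 cases among 261 + 39,393 patients stated in the abstract, and the function names are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a cohort with the given case prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumption: cohort composition taken from the abstract (261 cases, 39,393 controls).
prevalence = 261 / (261 + 39393)

# Sensitivity and specificity at the >0.50 cutoff, as reported in the table.
lr_pos, lr_neg = likelihood_ratios(0.697, 0.756)
ppv, npv = predictive_values(0.697, 0.756, prevalence)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"PPV = {100 * ppv:.1f}%, NPV = {100 * npv:.1f}%")
```

Because fewer than 1% of patients in this cohort are cases, the PPV stays low even when specificity is high, while the likelihood ratios do not depend on prevalence.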
Areas under the receiver operating characteristic curve for various prediction models in the Northwestern Medicine Enterprise Data Warehouse Heart Failure Cohort.
| Model | N | AUROC |
|---|---|---|
| ATTRwt-CM RF model | 39,654 | 0.80 |
| ATTRwt-CM RF model, age > 70 years | 23,570 | 0.82 |
| Age only | 39,624 | 0.54 |
| Age + sex | 39,618 | 0.62 |
| Age + sex + ethnicityᵃ | 39,203 | 0.70 |
| Age + sex + ethnicity + logBNPᵇ | 20,419 | 0.73 |
| Age + sex + ethnicity + logBNP + abnormal troponin-Iᶜ | 15,046 | 0.73 |
| ATTRwt-CM RF model + age + sex + ethnicity | 39,203 | 0.83 |
| ATTRwt-CM RF model + age + sex + ethnicity + total number of encounters | 38,337 | 0.83 |
ATTRwt-CM wild-type amyloidogenic transthyretin cardiomyopathy, AUROC area under the receiver operating characteristic curve, RF Random Forest, BNP B-type natriuretic peptide.
ᵃ Ethnicity categories: non-Hispanic White, non-Hispanic Black, Hispanic, Asian, others.
ᵇ Highest BNP value in the electronic health record, log-transformed.
ᶜ Based on the highest troponin-I in the electronic health record (abnormal defined as >0.04 ng/ml).
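The AUROC values above can be read as the probability that a randomly chosen case receives a higher model score than a randomly chosen control, with ties counted as one half. The sketch below is a minimal pure-Python illustration of that definition. The scores are made-up toy values, not study data, and the function name is hypothetical.

```python
def auroc(case_scores, control_scores):
    """AUROC as the fraction of case/control pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical model outputs, for illustration only.
toy_cases = [0.9, 0.8, 0.4]
toy_controls = [0.7, 0.3, 0.2]

toy_auc = auroc(toy_cases, toy_controls)
print(f"toy AUROC = {toy_auc:.3f}")
```

The pairwise loop is quadratic in cohort size, so a cohort the size of NMEDW would normally be handled with a rank-based implementation. The definition is the same.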
Post-test probabilities for the Random Forest ATTR-CM and cardiac amyloid Random Forest models based on model performance in the Northwestern Medicine Enterprise Data Warehouse Heart Failure Cohort.
| Model | Pre-test probability of ATTR-CMᵃ | Random Forest model output cutoff for the diagnosis of ATTR-CM | LR+ | LR− | Post-test probability, LR+ | Post-test probability, LR− |
|---|---|---|---|---|---|---|
| Random Forest ATTR-CM model | 4% | >0.50 | 2.86 | 0.40 | 10.7% | 1.7% |
| | 4% | >0.55 | 4.12 | 0.43 | 14.8% | 1.8% |
| | 4% | >0.60 | 5.85 | 0.52 | 19.7% | 2.1% |
| | 4% | >0.65 | 8.24 | 0.66 | 25.7% | 2.7% |
| | 4% | >0.70 | 11.07 | 0.79 | 31.7% | 3.2% |
| | 4% | >0.75 | 15.97 | 0.90 | 40.1% | 3.6% |
| Random Forest cardiac amyloid model | 4% | >0.50 | 4.38 | 0.43 | 15.5% | 1.8% |
| | 4% | >0.55 | 7.13 | 0.53 | 23.0% | 2.2% |
| | 4% | >0.60 | 12.37 | 0.66 | 34.2% | 2.7% |
| | 4% | >0.65 | 21.78 | 0.79 | 47.8% | 3.2% |
| | 4% | >0.70 | 39.37 | 0.89 | 62.3% | 3.6% |
| | 4% | >0.75 | 72.18 | 0.96 | 75.2% | 3.9% |
The random forest ATTR-CM model was derived using diagnosis codes specifically for wild-type ATTR-CM. The random forest cardiac amyloid model was derived using the more nonspecific umbrella diagnosis code for cardiac amyloidosis.
ATTR-CM amyloidogenic transthyretin cardiomyopathy, LR+ positive likelihood ratio, LR− negative likelihood ratio.
ᵃ Pre-test probability was estimated to be 4% based on a prior publication (Kazi et al.[20]) that modeled the estimated prevalence of ATTR-CM in heart failure patients.
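The post-test probabilities in the table follow from the standard odds form of Bayes' rule: convert the pre-test probability to odds, multiply by the likelihood ratio, and convert back to a probability. The sketch below is a minimal Python illustration with a hypothetical function name. Because it starts from the rounded likelihood ratios printed in the table, its results can differ from the tabulated values by a few tenths of a percentage point.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


PRE_TEST = 0.04  # 4% pre-test probability, as stated in the table footnote

# (cutoff, LR+, LR-) for the Random Forest ATTR-CM model, copied from the table.
attr_cm_rows = [
    (0.50, 2.86, 0.40),
    (0.55, 4.12, 0.43),
    (0.60, 5.85, 0.52),
    (0.65, 8.24, 0.66),
    (0.70, 11.07, 0.79),
    (0.75, 15.97, 0.90),
]

for cutoff, lr_pos, lr_neg in attr_cm_rows:
    after_positive = post_test_probability(PRE_TEST, lr_pos)
    after_negative = post_test_probability(PRE_TEST, lr_neg)
    print(f">{cutoff:.2f}: positive {100 * after_positive:.1f}%, "
          f"negative {100 * after_negative:.1f}%")
```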
Fig. 2. Odds ratio vs. prevalence for top clinical phenotypes predictive of wild-type ATTR cardiomyopathy.
a Cardiac phenotypes associated with wild-type ATTR cardiomyopathy. b Non-cardiac phenotypes associated with wild-type ATTR cardiomyopathy. All features associated with the diagnosis of ATTR cardiomyopathy at a significance level of P < 10⁻⁴, which had an odds ratio (OR) < 10, were included in the graphs. The three features that had an OR > 10 and met the P-value threshold were: hypertrophic cardiomyopathy (OR 15.8, prevalence 11%); localized adiposity (OR 26.6, prevalence 2%); and organ transplantation (OR 23.4, prevalence 4%). Some diagnoses that were associated with ATTR cardiomyopathy (e.g., hypertrophic cardiomyopathy, multiple myeloma) were likely initial misdiagnoses, as these diagnoses (similar to all diagnoses included here) preceded the ATTR cardiomyopathy diagnosis. Univariate logistic regression was used to calculate odds ratios. *Localized to the connective tissue or soft tissue. AV, atrioventricular; ECG, electrocardiogram; HFrEF, heart failure with reduced ejection fraction; HFpEF, heart failure with preserved ejection fraction; SVT, supraventricular tachycardia; VT, ventricular tachycardia.
Top 21 combinations of phenotypes based on ICD codes and their association with wild-type ATTR cardiomyopathy (IQVIA dataset, Cohort 1).
| Combination of phenotypesᵃ | Prevalence in ATTR-CM | Prevalence in controls | Number of phenotypes in addition to HF | Odds ratio (95% CI) | P-value |
|---|---|---|---|---|---|
| Combined systolic and diastolic HF, HFpEF | 52.1% | 17.3% | 2 | 5.2 (4.2–6.3) | 2.02 × 10⁻⁶⁵ |
| Carpal tunnel syndrome | 31.9% | 7.8% | 1 | 5.5 (4.2–7.1) | 4.62 × 10⁻⁴⁶ |
| AF, joint disorders, HFpEF | 29.7% | 7.0% | 3 | 5.6 (4.2–7.4) | 1.11 × 10⁻⁴³ |
| Heart block, cardiomegaly, HFpEF | 28.7% | 6.4% | 3 | 5.8 (4.4–7.7) | 1.54 × 10⁻⁴³ |
| Cardiomegaly, joint disorders, HFpEF | 28.7% | 6.5% | 3 | 5.7 (4.3–7.6) | 4.70 × 10⁻⁴³ |
| Heart block, CKD, HFpEF | 26.6% | 6.6% | 3 | 5.1 (3.8–6.8) | 9.97 × 10⁻³⁷ |
| AF, cardiomegaly, soft tissue disorders, HFpEF | 24.1% | 5.7% | 4 | 5.2 (3.9–7.1) | 2.95 × 10⁻³⁴ |
| Heart block, soft tissue disorders, HFpEF | 23.7% | 5.7% | 3 | 5.1 (3.8–7.0) | 3.14 × 10⁻³³ |
| AF, cardiomegaly, joint disorders, combined systolic and diastolic HF | 23.2% | 5.2% | 4 | 5.4 (4.0–7.5) | 2.99 × 10⁻³⁴ |
| Heart block, cardiomegaly, joint disorders | 22.0% | 5.1% | 3 | 5.2 (3.8–7.2) | 2.28 × 10⁻³¹ |
| Heart block, joint disorders, combined systolic and diastolic HF | 21.8% | 4.7% | 3 | 5.6 (4.1–7.9) | 5.88 × 10⁻³³ |
| Cardiomegaly, joint disorders, soft tissue disorders, combined systolic and diastolic HF | 21.5% | 5.0% | 4 | 5.1 (3.7–7.1) | 2.63 × 10⁻³⁰ |
| Joint disorders, osteoarthrosis, pleurisy or pleural effusion, combined systolic and diastolic HF | 18.6% | 4.2% | 4 | 5.2 (3.7–7.4) | 6.63 × 10⁻²⁷ |
| AF, joint disorders, pleurisy or pleural effusion, combined systolic and diastolic HF | 18.3% | 3.9% | 4 | 5.4 (3.8–7.9) | 1.42 × 10⁻²⁷ |
| Joint disorders, osteoarthrosis, pleurisy or pleural effusion, HFpEF | 18.1% | 3.6% | 4 | 5.8 (4.0–8.5) | 1.51 × 10⁻²⁸ |
| AF, CKD, pleurisy or pleural effusion, soft tissue disorders, combined systolic and diastolic HF | 16.0% | 3.5% | 5 | 5.1 (3.5–7.6) | 2.96 × 10⁻²³ |
| AF, heart block, CKD, pleurisy or pleural effusion, combined systolic and diastolic HF | 15.3% | 3.3% | 5 | 5.3 (3.6–8.0) | 5.96 × 10⁻²³ |
| AF, heart block, CKD, soft tissue disorders, combined systolic and diastolic HF | 15.1% | 3.4% | 5 | 5.1 (3.5–7.6) | 5.44 × 10⁻²² |
| AF, heart block, joint disorders, osteoarthrosis, soft tissue disorders | 14.5% | 3.2% | 5 | 5.1 (3.5–7.7) | 3.19 × 10⁻²¹ |
| AF, CKD, pleurisy or pleural effusion, soft tissue disorders, HFpEF | 14.4% | 3.2% | 5 | 5.1 (3.4–7.7) | 5.57 × 10⁻²¹ |
| Heart block, CKD, pleurisy or pleural effusion, soft tissue disorders, combined systolic and diastolic HF | 11.8% | 2.5% | 5 | 5.1 (3.3–8.1) | 1.62 × 10⁻¹⁷ |
Univariate logistic regression was used to calculate odds ratios. P-values are two-sided. Adjustments were not made for multiple comparisons.
AF atrial fibrillation (includes atrial flutter), CKD chronic kidney disease, HF heart failure, HFpEF heart failure with preserved ejection fraction.
ᵃ One or more of these combinations were present in 876 (82%) of the IQVIA Cohort 1 patients with wild-type ATTR-CM.
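For a single binary phenotype, the odds ratio from univariate logistic regression equals the cross-product ratio of the 2 × 2 table, and the Wald 95% confidence interval can be computed from the four cell counts. The sketch below is a minimal Python illustration for the carpal tunnel syndrome row. It makes two assumptions that are not stated in the table: each group has the 1071 patients given in the abstract, and the cell counts are reconstructed by rounding the reported prevalences. The function name is hypothetical.

```python
import math


def odds_ratio_wald_ci(exposed_cases, n_cases, exposed_controls, n_controls, z=1.96):
    """Odds ratio and Wald confidence interval from 2 x 2 counts."""
    a = exposed_cases                  # cases with the phenotype
    b = n_cases - exposed_cases        # cases without the phenotype
    c = exposed_controls               # controls with the phenotype
    d = n_controls - exposed_controls  # controls without the phenotype
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Assumption: 1071 cases and 1071 controls, as in the abstract's derivation cohort.
N_PER_GROUP = 1071
# Assumption: counts reconstructed from the reported prevalences (31.9% and 7.8%).
cts_cases = round(0.319 * N_PER_GROUP)
cts_controls = round(0.078 * N_PER_GROUP)

cts_or, cts_lower, cts_upper = odds_ratio_wald_ci(
    cts_cases, N_PER_GROUP, cts_controls, N_PER_GROUP
)
print(f"OR = {cts_or:.1f} (95% CI {cts_lower:.1f} to {cts_upper:.1f})")
```

Under these assumptions the result is close to the tabulated 5.5 (4.2–7.1). Small differences are expected because the exact cell counts are not given.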
Fig. 3. Time course of non-cardiac and cardiac phenotypes associated with wild-type ATTR cardiomyopathy vs. non-amyloid heart failure prior to the diagnosis of heart failure.
The proportion of patients at each time point (years before heart failure diagnosis) with a first diagnosis of an associated feature (phenotype). The cumulative proportion of patients with each particular phenotype is equal to the sum of the proportions from each of the years preceding the heart failure diagnosis. ATTR-CM, amyloidogenic transthyretin cardiomyopathy.
Fig. 4. Development and validation of a machine learning model of medical claims data for the systematic identification of wild-type transthyretin amyloid cardiomyopathy.
Nationally representative medical claims data were used to develop a cohort of ATTR-CM and non-amyloid HF controls. ICD codes were extracted and used as features to train a Random Forest machine learning model, which was then internally tested in the derivation cohort. The model was then validated in four external cohorts, one of which was a single health system, a setting similar to how the model would be used in clinical practice. The top features (ICD codes) based on variable importance in the Random Forest model were also used to generate phenotypes and phenotype combinations associated with the ATTR-CM diagnosis, which provide clinical insight and clues to the diagnosis. In the future, additional prospective clinical validation with blood tests, echocardiography (with speckle-tracking strain analysis), and bone scintigraphy can be used to verify the ATTR-CM diagnosis, with the ultimate goal of automating the identification of ATTR-CM and thereby leading to earlier diagnosis and intervention for these high-risk patients. ATTR-CM, amyloidogenic transthyretin cardiomyopathy.