Shannon Wongvibulsin, Katherine C Wu, Scott L Zeger.
Abstract
BACKGROUND: Despite the promise of machine learning (ML) to inform individualized medical care, the clinical utility of ML in medicine has been limited by the minimal interpretability and black box nature of these algorithms.
Keywords: clinical translation; interpretability; machine learning; prediction models; visualization
Year: 2020 PMID: 32515746 PMCID: PMC7312245 DOI: 10.2196/15791
Source DB: PubMed Journal: JMIR Med Inform
Patient characteristics in the Left Ventricular Structural Predictors of Sudden Cardiac Death Registry (N=382).
| Variables | No SCDa event (n=307) | Patients with SCD event (n=75) | P valueb |
| Age (years), mean (SD) | 57 (13) | 57 (12) | .75 |
| Male, n (%) | 211 (68.7) | 63 (84) | |
| Race, n (%) | | | |
| White | 200 (65.1) | 51 (68) | |
| African American | 99 (32) | 21 (28) | |
| Other | 8 (3) | 3 (4) | |
| Body surface area (m²), mean (SD) | 1.98 (0.28) | 2.05 (0.28) | .07 |
| Ischemic cardiomyopathy etiology, n (%) | 149 (48.5) | 44 (59) | .15 |
| Years from incident MIc or cardiomyopathy diagnosis, mean (SD) | 3.83 (5.18) | 5.43 (5.61) | |
| NYHAd class, n (%) | | | |
| I | 64 (21) | 20 (27) | |
| II | 137 (44.6) | 31 (41) | |
| III | 106 (34.5) | 24 (32) | |
| One or more heart failure hospitalizations, n (%) | 0 (0) | 19 (25.3) | |
| Hypertension, n (%) | 180 (58.6) | 44 (59) | >.99 |
| Hypercholesterolemia, n (%) | 180 (58.6) | 45 (60) | .93 |
| Diabetes, n (%) | 85 (28) | 19 (25) | .79 |
| Nicotine use, n (%) | 133 (43.3) | 44 (59) | |
| ACEe-inhibitor or ARBf, n (%) | 275 (89.6) | 66 (88) | .85 |
| Beta-blocker, n (%) | 288 (93.8) | 68 (91) | .48 |
| Lipid-lowering, n (%) | 199 (64.8) | 56 (75) | .14 |
| Antiarrhythmics (amiodarone), n (%) | 18 (6) | 8 (11) | .22 |
| Diuretics, n (%) | 173 (56.4) | 54 (72) | |
| Digoxin, n (%) | 50 (16) | 16 (21) | .39 |
| Aldosterone inhibitor, n (%) | 80 (26) | 21 (28) | .85 |
| Aspirin, n (%) | 215 (70.0) | 55 (73) | .67 |
| Prior atrial fibrillation, n (%) | 51 (17) | 14 (19) | .80 |
| Ventricular rate (bpm), mean (SD) | 73 (14) | 70 (14) | .06 |
| QRS duration (ms), mean (SD) | 118 (31) | 122 (27) | .30 |
| Presence of LBBBg, n (%) | 79 (26) | 14 (19) | .26 |
| Biventricular ICDh, n (%) | 90 (29) | 17 (23) | .31 |
| Sodium (mEq/L), mean (SD) | 139 (3) | 139 (3) | .73 |
| Potassium (mEq/L), mean (SD) | 4.26 (0.42) | 4.27 (0.39) | .87 |
| Creatinine (mEq/L), mean (SD) | 1.07 (0.59) | 1.09 (0.33) | .81 |
| eGFRi (mL/min/1.73 m²), mean (SD) | 81 (24) | 80 (21) | .80 |
| Blood urea nitrogen (mg/dL), mean (SD) | 19.62 (8.72) | 20.28 (8.33) | .55 |
| Glucose (mg/dL), mean (SD) | 120 (53) | 113 (34) | .23 |
| Hematocrit (%), mean (SD) | 40 (4) | 41 (5) | |
| hsCRPj (µg/mL), mean (SD) | 6.89 (12.87) | 9.10 (16.29) | .22 |
| NT-proBNPk (ng/L), mean (SD) | 2704 (6736) | 2519 (1902) | .82 |
| IL-6l (pg/mL), mean (SD) | 3.05 (5.36) | 4.32 (6.28) | .12 |
| IL-10m (pg/mL), mean (SD) | 10.74 (49.67) | 13.67 (59.94) | .70 |
| TNF-αRIIn (pg/mL), mean (SD) | 3425 (1700) | 3456 (1671) | .90 |
| cTnTo (ng/mL), mean (SD) | 0.03 (0.08) | 0.02 (0.05) | .62 |
| cTnIp (ng/mL), mean (SD) | 0.10 (0.28) | 0.10 (0.25) | .98 |
| CK-MBq (ng/mL), mean (SD) | 3.94 (5.77) | 3.87 (3.86) | .93 |
| Myoglobin (ng/mL), mean (SD) | 31.37 (30.80) | 37.13 (41.53) | .31 |
| Non-CMRs LVEFr (%), mean (SD) | 24.2 (7.6) | 23.0 (7.4) | .19 |
| LVEF (%), mean (SD) | 27.8 (10.3) | 25.1 (8.8) | |
| LVt end-diastolic volume index (mL/m²), mean (SD) | 122.3 (39.9) | 136.2 (48.4) | |
| LV end-systolic volume index (mL/m²), mean (SD) | 91.5 (39.1) | 104.3 (45.2) | |
| LV mass index (mL/m²), mean (SD) | 75.1 (24.4) | 80.3 (21.2) | |
| LGEu present, n (%) | 176 (66) | 56 (86) | |
| Gray zone (g), mean (SD) | 8.8 (11.6) | 13.8 (12.2) | |
| Core (g), mean (SD) | 12.4 (14.9) | 17.7 (15.1) | |
| Total scar (g), mean (SD) | 21.1 (25.4) | 31.3 (25.6) | |
aSCD: sudden cardiac death.
bP values <.05 are italicized.
cMI: myocardial infarction.
dNYHA: New York Heart Association.
eACE: angiotensin-converting enzyme.
fARB: angiotensin II receptor blocker.
gLBBB: left bundle branch block.
hICD: implantable cardioverter defibrillator.
ieGFR: estimated glomerular filtration rate.
jhsCRP: high-sensitivity C-reactive protein.
kNT-proBNP: N-terminal pro-b-type natriuretic peptide.
lIL-6: interleukin-6.
mIL-10: interleukin-10.
nTNF-αRII: tumor necrosis factor alpha R II.
ocTnT: cardiac troponin T.
pcTnI: cardiac troponin I.
qCK-MB: creatine kinase MB.
rLVEF: left ventricular ejection fraction.
sCMR: cardiac magnetic resonance.
tLV: left ventricular.
uLGE: late gadolinium enhancement.
Figure 1. Steps to present machine learning (ML) predictions in an interpretable manner: The black-box algorithm is applied to input data comprising outcomes (Y) and predictors (X) to obtain black-box predictions (P) of the input outcomes. The original X variables and the black-box predictions (P) are inputs to a simple model or algorithm, for example a single tree, whose predictions (S) are sufficiently close to P but more easily understood and explained.
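The steps in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `black_box` is a hypothetical stand-in for a fitted ML model (the article uses a random forest), the two predictors are simulated, and `fit_tree` is a deliberately small regression tree written for this example.

```python
import random

def black_box(x):
    # Hypothetical stand-in for a fitted ML model: maps predictors X to a risk P.
    return 0.05 + 0.30 * (x[0] > 0.5) + 0.10 * (x[1] > 0.7) * (x[0] > 0.5)

def sse(values):
    # Sum of squared deviations from the mean (0 for an empty list).
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(X, P, depth=2, min_leaf=5):
    """Fit a small regression tree to the black-box predictions P (not to Y)."""
    mean_p = sum(P) / len(P)
    if depth == 0 or len(P) < 2 * min_leaf:
        return {"leaf": mean_p}
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [p for x, p in zip(X, P) if x[j] <= t]
            right = [p for x, p in zip(X, P) if x[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    if best is None:
        return {"leaf": mean_p}
    _, j, t = best
    lo = [(x, p) for x, p in zip(X, P) if x[j] <= t]
    hi = [(x, p) for x, p in zip(X, P) if x[j] > t]
    return {
        "var": j,
        "thr": t,
        "left": fit_tree([x for x, _ in lo], [p for _, p in lo], depth - 1, min_leaf),
        "right": fit_tree([x for x, _ in hi], [p for _, p in hi], depth - 1, min_leaf),
    }

def predict(tree, x):
    # Walk from the root to a leaf and return the leaf's mean prediction.
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["var"]] <= tree["thr"] else tree["right"]
    return tree["leaf"]

random.seed(0)
X = [(random.random(), random.random()) for _ in range(200)]  # simulated predictors
P = [black_box(x) for x in X]                                 # black-box predictions
tree = fit_tree(X, P, depth=2)                                # simple summary model
S = [predict(tree, x) for x in X]                             # simplified predictions
```

The key design point is that the summary tree is trained on P rather than on the observed outcomes Y, so it describes what the black-box model predicts instead of competing with it.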
Figure 2. R² equation, where i=1 to n indexes the observations evaluated. S(i) denotes the prediction for the ith observation using the simplified model, P(i) denotes the prediction for the ith observation using the ML model, and Pavg denotes the average prediction from the ML model.
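The equation itself is not reproduced in this record. Assuming it takes the usual coefficient-of-determination form built from the three quantities named in the caption, the fidelity of the simplified model to the ML model could be computed as in the sketch below. The function name `fidelity_r2` and the example values are hypothetical.

```python
def fidelity_r2(S, P):
    """R^2 of simplified predictions S against ML predictions P.

    Assumed form: 1 - sum_i (P(i) - S(i))^2 / sum_i (P(i) - Pavg)^2
    """
    p_avg = sum(P) / len(P)
    ss_res = sum((p - s) ** 2 for s, p in zip(S, P))  # disagreement between S and P
    ss_tot = sum((p - p_avg) ** 2 for p in P)         # total variation in P
    return 1.0 - ss_res / ss_tot

# Hypothetical predicted risks for four observations.
P = [0.02, 0.05, 0.10, 0.30]   # ML model
S = [0.03, 0.03, 0.12, 0.28]   # simplified model
r2 = fidelity_r2(S, P)
```

Under this form, a value of 1 means the simplified model reproduces the ML predictions exactly, and a value of 0 means it does no better than always predicting Pavg.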
Figure 3. Global summary tree of random forest (RF) model for sudden cardiac death (SCD) prediction: Several risk factors (namely heart failure hospitalization, several cardiac magnetic resonance imaging indices, and interleukin-6 [IL-6], a marker of inflammation) discriminate between low-, intermediate-, and high-risk patients. Decision rules in the tree are shown in bold italics. The 1-year risks of SCD are shown in the boxes at the bottom of the decision tree. The boxes are colored according to the magnitude of the percent per year risk, with white corresponding to the lowest risk subgroup and dark red corresponding to the highest risk subgroup. Percentages in parentheses at the bottom of the boxes are the proportions of the total training data that belong to each of the risk subgroups. The global summary tree represents the RF model with an R² of 0.88. HF: heart failure; LV: left ventricular.
Figure 4. Visualization of predictor effects in random forest (RF) model for sudden cardiac death (SCD) prediction: Risk ratio point estimates and the 95% confidence intervals generated from 500 bootstrap replications are shown for the RF model for SCD risk prediction. The largest risk ratio is between individuals who never experienced a heart failure hospitalization and those who experienced one or more heart failure hospitalizations. The other risk ratio comparisons show the risk ratios between individuals grouped into different categories based upon inflammation or cardiac magnetic resonance (CMR) imaging variables indicating the structural and functional properties of the heart. HF: heart failure; IL-6: interleukin 6; LV: left ventricular.
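A percentile bootstrap for a risk ratio with 500 replications, as mentioned in the Figure 4 caption, might look like the sketch below. This is a simplified illustration under assumptions: it resamples already-predicted risks, whereas the authors' procedure may refit the model in each replication, and the function names and data are hypothetical.

```python
import random

def risk_ratio(risks, exposed):
    # Mean predicted risk in the exposed group divided by that in the unexposed group.
    hi = [r for r, e in zip(risks, exposed) if e]
    lo = [r for r, e in zip(risks, exposed) if not e]
    return (sum(hi) / len(hi)) / (sum(lo) / len(lo))

def bootstrap_ci(risks, exposed, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the risk ratio."""
    rng = random.Random(seed)
    n = len(risks)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        r = [risks[i] for i in idx]
        e = [exposed[i] for i in idx]
        if any(e) and not all(e):                   # both groups must be present
            stats.append(risk_ratio(r, e))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical predicted 1-year risks; exposed = one or more HF hospitalizations.
risks = [0.02, 0.03, 0.04, 0.03, 0.02, 0.05, 0.12, 0.15, 0.10, 0.18]
exposed = [False] * 6 + [True] * 4
point = risk_ratio(risks, exposed)
lower, upper = bootstrap_ci(risks, exposed)
```

An interval that excludes 1 would indicate that the predicted risk differs between the two groups at the chosen confidence level.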
This table summarizes the global summary tree (shown in Figure 3) with an analysis of the variation (deviance) in the predicted values (P) from the machine learning (ML) model explained by the predictors in the global summary tree. The number of splits contributed by each variable in the global summary tree is enumerated along with the deviance and the percentage of the deviance explained. The predictors' ranked importance (ordered from most to least important from left to right in the table) is determined from the percentage of the deviance explained.
| Split variable | HFa hospitalization history | LVb end-diastolic volume index | Total scar | Inflammation (IL-6c) | Gray zone | Tree total | MLd total |
| Number of splits | 1 | 2 | 2 | 1 | 1 | 7 | N/Ae |
| Deviance explained | 1.26 | 0.255 | 0.100 | 0.034 | 0.020 | 1.67 | 1.89 |
| Percentage of deviance explained | 66.6 | 13.5 | 5.2 | 1.7 | 1.1 | 0.88f | 100 |
aHF: heart failure.
bLV: left ventricular.
cIL-6: interleukin 6.
dML: machine learning.
eN/A: not applicable.
fThis corresponds to the R² value (0.88) obtained when using the equation shown in Figure 2 for the calculations.
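The arithmetic behind the table can be checked directly from its deviance row. The sketch below uses only the values printed in the table; because those values are rounded, the recomputed percentages may differ from the published ones in the last digit.

```python
# Deviance explained by each split variable, copied from the table above.
deviance = {
    "HF hospitalization history": 1.26,
    "LV end-diastolic volume index": 0.255,
    "Total scar": 0.100,
    "Inflammation (IL-6)": 0.034,
    "Gray zone": 0.020,
}
ml_total = 1.89  # total deviance of the ML predictions

# Percentage of the ML model's deviance explained by each predictor.
percent = {name: 100 * d / ml_total for name, d in deviance.items()}

# Importance ranking: most to least deviance explained.
ranked = sorted(percent, key=percent.get, reverse=True)

# The tree total divided by the ML total gives the R^2 of the summary tree.
tree_total = sum(deviance.values())
r2 = tree_total / ml_total

for name in ranked:
    print(f"{name}: {percent[name]:.1f}%")
print(f"Tree total: {tree_total:.2f}, R^2: {r2:.2f}")
```

The ratio of the tree total to the ML total reproduces the reported R² of 0.88, which is why the "Tree total" cell in the percentage row is shown as a proportion.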