
Machine Learning-Based Risk Assessment for Cancer Therapy-Related Cardiac Dysfunction in 4300 Longitudinal Oncology Patients.

Yadi Zhou¹, Yuan Hou¹, Muzna Hussain^2,3, Sherry-Ann Brown⁴, Thomas Budd⁵, W H Wilson Tang^2,6, Jame Abraham⁵, Bo Xu², Chirag Shah⁷, Rohit Moudgil², Zoran Popovic², Leslie Cho², Mohamed Kanj², Chris Watson³, Brian Griffin², Mina K Chung^2,6, Samir Kapadia², Lars Svensson⁸, Patrick Collier^2,6, Feixiong Cheng^1,5,9.

Abstract

Background The growing awareness of cardiovascular toxicity from cancer therapies has led to the emerging field of cardio-oncology, which centers on preventing, detecting, and treating patients with cardiac dysfunction before, during, or after cancer treatment. Early detection and prevention of cancer therapy-related cardiac dysfunction (CTRCD) play important roles in precision cardio-oncology. Methods and Results This retrospective study included 4309 cancer patients between 1997 and 2018 whose laboratory tests and cardiovascular echocardiographic variables were collected from the Cleveland Clinic institutional electronic medical record database (Epic Systems). Among these patients, 1560 (36%) were diagnosed with at least 1 type of CTRCD, and 838 (19%) developed CTRCD after cancer therapy (de novo). We posited that machine learning algorithms can be implemented to predict CTRCDs in cancer patients according to clinically relevant variables. Classification models were trained and evaluated for 6 types of cardiovascular outcomes, including coronary artery disease (area under the receiver operating characteristic curve [AUROC], 0.821; 95% CI, 0.815-0.826), atrial fibrillation (AUROC, 0.787; 95% CI, 0.782-0.792), heart failure (AUROC, 0.882; 95% CI, 0.878-0.887), stroke (AUROC, 0.660; 95% CI, 0.650-0.670), myocardial infarction (AUROC, 0.807; 95% CI, 0.799-0.816), and de novo CTRCD (AUROC, 0.802; 95% CI, 0.797-0.807). Model generalizability was further confirmed using time-split data. Model inspection revealed several clinically relevant variables significantly associated with CTRCDs, including age, hypertension, glucose levels, left ventricular ejection fraction, creatinine, and aspartate aminotransferase levels. Conclusions This study suggests that machine learning approaches offer powerful tools for cardiac risk stratification in oncology patients by utilizing large-scale, longitudinal patient data from healthcare systems.


Keywords: anthracycline therapy; cancer therapy–related cardiac dysfunction; cardiotoxicity; cardio‐oncology; echocardiography; machine learning

Year: 2020 PMID: 33241727 PMCID: PMC7763760 DOI: 10.1161/JAHA.120.019628

Source DB: PubMed Journal: J Am Heart Assoc ISSN: 2047-9980 Impact factor: 5.501

AUPR indicates area under the precision‐recall curve; AUROC, area under the receiver operating characteristic curve; CTRCD, cancer therapy–related cardiac dysfunction; GB, gradient tree boosting; LR, logistic regression; ML, machine learning; RF, random forest; SMOTE, Synthetic Minority Oversampling Technique; and SVM, support vector machine.

Clinical Perspective

What Is New?

This study presents the first, large‐scale machine learning–based approach to evaluate complications between cancer therapies and cardiovascular diseases using cardiovascular echocardiographic and laboratory test variables from over 4300 longitudinal cancer patients. We developed machine learning models with high performance and verified the generalizability using time‐split data to simulate real‐world scenarios and found that combining both laboratory test and echocardiographic variables resulted in the highest performance. We identified and validated multiple clinically relevant variables associated with cancer therapy–related cardiac dysfunction using learned weight analysis of the optimal machine learning models.

What Are the Clinical Implications?

We demonstrate the potential clinical implication of using a machine learning method to predict 6 types of cancer therapy–related cardiac dysfunction, including heart failure, atrial fibrillation, coronary artery disease, myocardial infarction, stroke, and de novo cancer therapy–related cardiac dysfunction. These machine learning models offer potential tools for risk assessment of cancer therapy–related cardiac dysfunction in cardio‐oncology clinical practices.

Cardiovascular disease (CVD) is the leading cause of death and the second leading cause of morbidity in cancer survivors after recurrent malignancy in the United States. Comorbidity between CVD and cancer suggests underlying shared disease pathogeneses, which can be both genetic and environmental. One critical issue regarding environmental factors is that CVD can be associated with various treatments for cancer itself. First recognized in the 1960s, cancer therapy–related cardiac dysfunction (CTRCD) has been increasingly diagnosed and investigated. For example, a growing number of cancer survivors (>5 million) are at risk for cardiotoxicity caused by anthracycline therapy years or even decades prior for various types of cancer.

Through the success of basic and translational research, cancer survivors have become one of the largest growing subsets of patients in the US healthcare system. Currently, there are over 16.9 million cancer survivors in the United States. This number is projected to reach more than 22.1 million by 2030. Increasing numbers of oncology patients are facing CTRCD risks as cancer survival improves. The growing awareness of cardiovascular toxicity by cancer treatment has led to the emerging field of cardio‐oncology, which centers on preventing, detecting, and treating patients with cardiovascular toxicity from cancer treatment. However, precise prediction and prevention of cardiovascular toxicity in individual cancer patients or survivors has proven elusive. Further, while basic and translational research studies continue, experimental assays in animal models are limited by significant functional disparities between animal and human cardiomyocytes. Development of novel methodologies or tools, such as computational approaches, would offer unique opportunities for cardio‐oncology by utilizing the accumulated longitudinal clinical data available from healthcare systems.

In recent years, machine learning (ML) has been increasingly used for cardiovascular studies, such as for the prediction of drug‐induced cardiovascular complications, cardiac resynchronization therapy response prediction, risk assessment of cardiovascular events after acute myocardial infarction (MI), and claims data–based mortality risk predictions. As more longitudinal clinical data are accumulated for oncology patients, ML presents a great opportunity to use these data to build predictive models in clinical practices. In this study, we hypothesized that supervised ML models could accurately predict the risk for developing several cardiovascular outcomes in cancer patients. Specifically, we applied ML models to the prediction of 6 types of cardiac outcomes, namely heart failure (HF), atrial fibrillation (AF), coronary artery disease (CAD), MI, stroke, and de novo CTRCD. We also determined several clinically relevant variables associated with these outcomes.

Methods

All data used in this study are available from the corresponding author on reasonable request and the approval of the institutional review board. The code can be found at https://github.com/ChengF-Lab/CO-ML.

Study Design

Figure 1 shows the overview of the study design. We integrated both cardiovascular echocardiographic and laboratory testing variables from over 4300 longitudinal cancer patients. We developed and evaluated ML models to assist in the risk assessment of CTRCDs. We systematically tested 5 classification methods: k‐nearest neighbors, logistic regression (LR), support vector machine (SVM), random forest (RF), and gradient tree boosting (GB). For the feature sets, we tested: (1) laboratory tests only, (2) echocardiography only, and (3) laboratory tests and echocardiography combined. The generalizability of these models was verified by time‐based data split. We also interrogated the final models to uncover clinically relevant variables associated with CTRCDs using learned weight analysis.

Figure 1

Overview of the study design.

We integrated both cardiovascular echocardiographic and laboratory testing variables from over 4300 longitudinal cancer patients for the prediction of 6 outcomes, including heart failure (HF), atrial fibrillation (AF), coronary artery disease (CAD), myocardial infarction (MI), stroke, and de novo cancer therapy–related cardiac dysfunction (CTRCD). We systematically tested 5 classification methods: k‐nearest neighbors (k‐NN), logistic regression (LR), support vector machine (SVM), random forest (RF), and gradient tree boosting (GB). For the feature sets, we tested laboratory test variables only, echocardiographic variables only, and laboratory test and echocardiographic variables combined.


Study Population and Data Preparation

This study was reviewed and approved by the institutional review board and the patients gave informed consent. We extracted the clinical data of over 4600 oncology patients receiving cancer therapies from our institutional electronic medical health record database. All adult patients with cancer referred to the cardio‐oncology service at the Cleveland Clinic from 1997 to 2018 were included. Five outcomes, including HF, AF, CAD, MI, and stroke, were extracted using International Classification of Diseases, Ninth and Tenth Revision (ICD‐9, ICD‐10), diagnosis codes and were manually checked for accuracy against patient charts in Epic (Epic Systems Corporation). Both inpatient and outpatient codes were included in this study. An additional outcome, de novo CTRCD, was also examined in this study. According to the diagnosis dates of these 5 cardiac events, we classified events diagnosed before cancer therapy as preexisting cardiac events and those diagnosed afterward as de novo CTRCD. All variables were collected per patient based on the entirety of all available data. All patients had 2 sets of clinical variables: laboratory tests and echocardiographic variables. Laboratory test results included variables such as estimated glomerular filtration rate, glycated hemoglobin, glucose, calcium, total protein, and many others. Echocardiographic data included variables such as left ventricular ejection fraction, left ventricular end‐systolic volume index, and left ventricular end‐diastolic volume index. Since available echocardiographic data were longitudinal, we extracted several features for each echocardiographic variable: maximum of all follow‐ups, minimum of all follow‐ups, slope of all follow‐ups, maximum increase within 3 months, and maximum decrease within 3 months (see Table S1 for a list of the variables). Finally, clinical variables were used as features to build ML models for the 6 types of cardiovascular outcomes. After removing patients with >6 missing variables, the final data set contained 4309 patients (see Table 1 for the characteristics of the cohort).
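The five longitudinal summaries described above (maximum, minimum, slope, and largest rise and fall within 3 months) can be sketched as follows. This is a minimal illustration and not the authors' code: the function name, the use of an ordinary least‐squares slope, the 90‐day reading of "3 months", and the example readings are all assumptions.

```python
from datetime import date


def longitudinal_features(measurements, window_days=90):
    """Summarize one echocardiographic variable measured at several follow-ups.

    measurements: list of (date, value) pairs for a single patient.
    window_days: assumed length of "3 months" (90 days is an assumption).
    """
    obs = sorted(measurements)
    days = [(d - obs[0][0]).days for d, _ in obs]
    values = [v for _, v in obs]
    n = len(obs)

    # Ordinary least-squares slope of value against time (units per day).
    mean_t = sum(days) / n
    mean_v = sum(values) / n
    var_t = sum((t - mean_t) ** 2 for t in days)
    if var_t > 0:
        slope = sum((t - mean_t) * (v - mean_v)
                    for t, v in zip(days, values)) / var_t
    else:
        slope = 0.0

    # Largest rise and largest fall between any two measurements taken
    # at most window_days apart (earlier measurement first).
    max_increase = 0.0
    max_decrease = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if days[j] - days[i] > window_days:
                break  # later measurements are even further away
            change = values[j] - values[i]
            max_increase = max(max_increase, change)
            max_decrease = max(max_decrease, -change)

    return {
        "max": max(values),
        "min": min(values),
        "slope": slope,
        "max_increase_3mo": max_increase,
        "max_decrease_3mo": max_decrease,
    }


# Hypothetical LVEF (%) readings for one patient.
lvef = [
    (date(2015, 1, 10), 60.0),
    (date(2015, 3, 1), 52.0),
    (date(2015, 9, 15), 55.0),
    (date(2016, 2, 1), 48.0),
]
features = longitudinal_features(lvef)
print(features)
```

In this hypothetical series only the first two readings fall within 90 days of each other, so the largest 3‐month decrease is 8 percentage points and no 3‐month increase is recorded.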

Table 1

Characteristics of the Entire Cardio‐Oncology Cohort

Variables	Cohort (N=4309)
Basic characteristics
Age, y	61.1±13.7*
Sex
Female	2552 (59)†
Male	1757 (41)
Body mass index, kg/m²	28.3±7.3
Tobacco use	2162 (50)
Alcohol use	1995 (48)
Family history	1548 (36)
Comorbidity characteristics
Hypertension	2450 (57)
Hyperlipidemia	1877 (44)
Diabetes mellitus	974 (23)
Chest pain	1724 (40)
Shortness of breath	1523 (35)
Fatigue	2202 (51)
Cardiac outcomes
CTRCD	1560 (36)
HF	596 (14)
AF	653 (15)
CAD	673 (16)
MI	193 (4)
Stroke	275 (6)
Preexisting CVD	722 (17)
de novo CTRCD	838 (19)
Cancer therapy
Chemotherapy	4011 (93)
Radiation	1969 (46)
Chemotherapy and radiation	1780 (41)
Anthracycline	1764 (41)
Cyclophosphamide	1567 (36)
Trastuzumab	822 (19)

AF indicates atrial fibrillation; CAD, coronary artery disease; CTRCD, cancer therapy–related cardiac dysfunction; CVD, cardiovascular disease; HF, heart failure; and MI, myocardial infarction.

*Continuous variables are reported as mean±SD.

†Categorical variables are reported as number (percentage).

Classifier Development and Evaluation

Our first goal was to identify the optimal classification method and feature set combination. To do this, we systematically tested all of the combinations of 5 classification methods and 3 feature sets. For each outcome, we adopted a training‐validation test procedure, repeated 100 times. In each iteration, all patients were randomly split into training set (81%), validation set (9%), or test set (10%). The training and validation sets were used in a grid search (Table S2) to identify the optimal hyperparameters for each classification method and feature set combination. Then, these 2 sets were merged and trained with the optimal hyperparameters to build the final model, which was evaluated using the test set. See Figure S1 for the detailed workflow of method and feature selection. All classification models were trained using the Python package scikit‐learn. We tested the effect of balancing the data sets using Synthetic Minority Oversampling Technique (SMOTE) implemented in the Python package imbalanced‐learn. To test the generalizability of our ML models, we adopted a time‐based data split strategy to simulate real‐world scenarios, in which models used to predict new patients (external validation set) are built on data from the past. Specifically, we selected January 1, 2017 (2017.1.1) as the cutoff time point, as it produced subsequent test sets with reasonable sizes. Patients who received cancer therapies before 2017.1.1 were used as the training set, and those who received cancer therapies after 2017.1.1 were used as the test set. The detailed workflow of this strategy is provided in Figure S2.
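The 81%/9%/10% proportions are what two nested 90/10 splits produce, which the following sketch illustrates. It is an assumption that the split was nested this way; the function name and seeds are hypothetical, and the grid search and model fitting that would run inside each iteration are omitted.

```python
import random


def split_indices(n_patients, seed):
    """Randomly split patient indices into training, validation, and test sets.

    A 10% test set is held out first; 10% of the remainder then becomes the
    validation set, giving roughly 81% / 9% / 10% overall.
    """
    rng = random.Random(seed)
    indices = list(range(n_patients))
    rng.shuffle(indices)

    n_test = round(0.10 * n_patients)
    test, rest = indices[:n_test], indices[n_test:]

    n_val = round(0.10 * len(rest))
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


# One independent split per iteration, 100 iterations as in the text.
splits = [split_indices(4309, seed) for seed in range(100)]
train, val, test = splits[0]
print(len(train), len(val), len(test))
```

In each iteration, hyperparameters would be chosen by fitting on `train` and scoring on `val`; the final model would then be refit on `train + val` and evaluated once on `test`.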

Model Criteria to Determine Predictive Variables

Next, we sought to understand which clinically relevant variables were significantly associated with CTRCD and further contributed to the high performance of ML models. We examined the weights of the 100 final LR models for each outcome. LR learns a weight for each feature, and the prediction is the sum of the products of each weight and feature pair, squashed using a sigmoid function. We identified the clinically relevant variables based on 2 criteria: (1) the absolute coefficient of variation (the ratio of SD and mean) was low to ensure small fluctuation of the weight in the 100 repeats; (2) the absolute associated weight compared with the extremum weight for that outcome was high (relative weight). We used 0.5 and 0.3 as the 2 cutoffs: where T denotes the feature set, w denotes the learned weight for feature i, and sgn is the sign function. To verify the clinically relevant variables uncovered by examining the LR weights, we tested the hazard ratios (95% CIs) of the clinically relevant variables for the de novo CTRCD. The Wald χ² test was used to evaluate the variables with statistically significant coefficients. In addition, the log‐rank test was used for global significance evaluation. The hazard analyses were performed with the survival (v2.44‐1.1) and survminer (v0.4.6) packages on R 3.6.1.
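The two selection criteria can be sketched in code under an assumed reading, since the exact formulas are not reproduced in this text. Here the coefficient of variation is taken as |SD(w_i) / mean(w_i)| with a cutoff of < 0.5, and the relative weight as mean(w_i) divided by the largest mean weight of the same sign within the feature set T, with a cutoff of > 0.3. The function name and the example weights are hypothetical.

```python
from statistics import mean, stdev


def stable_strong_features(weights, cv_cutoff=0.5, rw_cutoff=0.3):
    """Select features whose learned weights are stable and relatively large.

    weights: dict mapping feature name -> list of weights, one per repeat.
    The cutoffs follow the text; the exact formulas are an assumed reading.
    """
    means = {name: mean(w) for name, w in weights.items()}

    # Extremum mean weight on each side of zero (same-sign comparison).
    max_pos = max((m for m in means.values() if m > 0), default=0.0)
    max_neg = min((m for m in means.values() if m < 0), default=0.0)

    selected = {}
    for name, w in weights.items():
        m = means[name]
        if m == 0:
            continue  # coefficient of variation is undefined
        cv = abs(stdev(w) / m)
        extremum = max_pos if m > 0 else max_neg
        relative_weight = m / extremum  # same sign, so this lies in (0, 1]
        if cv < cv_cutoff and relative_weight > rw_cutoff:
            selected[name] = {"cv": cv, "relative_weight": relative_weight}
    return selected


# Hypothetical weights from 4 repeats (the study used 100).
weights = {
    "age": [0.80, 0.82, 0.78, 0.80],
    "hypertension": [0.40, 0.35, 0.45, 0.40],
    "noisy_lab": [0.30, -0.25, 0.10, -0.05],  # unstable sign, high CV
    "lvef": [-0.60, -0.62, -0.58, -0.60],     # stable protective weight
    "weak_lab": [0.05, 0.05, 0.06, 0.04],     # stable but small
}
selected = stable_strong_features(weights)
print(sorted(selected))
```

With these made‐up weights, the unstable feature fails the coefficient of variation criterion and the small one fails the relative weight criterion, leaving age, hypertension, and LVEF.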

Statistical Analysis

To evaluate the performance of ML models, we used 2 metrics: area under the receiver operating characteristic curve (AUROC) and area under the precision‐recall curve (AUPR). AUROC and AUPR were computed using the metrics.roc_auc_score and metrics.average_precision_score functions from the scikit‐learn Python package. For the comparison of the performances of the laboratory test and echocardiographic feature sets, we applied a 2‐sided paired sample t test using the AUROCs of the test sets from 100 iterations. P<0.05 was considered statistically significant. The t test was performed using the stats.ttest_rel function from the SciPy Python package. We applied the χ² test to the categorical variables to verify their associations with the outcomes. The Kolmogorov‐Smirnov test was used for the continuous variables. These 2 statistical analyses were performed with stats.chi2_contingency and stats.ks_2samp from the SciPy Python package.
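The calls named above can be exercised on toy inputs as in the sketch below. All numbers are invented for illustration and are not study data; only the function names come from the text.

```python
from scipy import stats
from sklearn import metrics

# Toy labels and predicted scores (not study data).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
auroc = metrics.roc_auc_score(y_true, y_score)
aupr = metrics.average_precision_score(y_true, y_score)

# Paired 2-sided t test on per-iteration AUROCs of two feature sets
# (hypothetical values; the study compared 100 paired iterations).
auroc_echo = [0.85, 0.86, 0.84, 0.87, 0.85]
auroc_lab = [0.73, 0.72, 0.74, 0.73, 0.71]
t_stat, p_paired = stats.ttest_rel(auroc_echo, auroc_lab)

# Chi-square test on a 2x2 table: rows = hypertension yes/no,
# columns = outcome yes/no (hypothetical counts).
table = [[30, 70], [10, 90]]
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# Kolmogorov-Smirnov test comparing a continuous variable between
# patients with and without the outcome (hypothetical ages).
age_with = [72, 68, 75, 80, 66, 71]
age_without = [55, 60, 48, 63, 58, 52]
ks_stat, p_ks = stats.ks_2samp(age_with, age_without)

print(auroc, aupr, p_paired, p_chi2, p_ks)
```

Pairing the t test by iteration matters here because both feature sets are evaluated on the same random test split in each iteration.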

Results

Overview of the Classifier Performance

In this study, we built a large, longitudinal cardio‐oncology cohort with 4309 oncology patients collected from our institutional electronic medical record database (Table 1). The median age was 61.1 years (interquartile range [IQR], 53.8–70.5 years) for the overall population. Six types of cardiac events, including HF (n=596), AF (n=653), CAD (n=673), MI (n=193), stroke (n=275), and de novo CTRCD (n=838) were evaluated. In total, 1560 (36%) patients had at least 1 type of diagnosed cardiac event, among whom 722 (17%) patients had preexisting cardiac events/disease before cancer therapy, while 838 (19%) patients developed de novo CTRCD afterward. Among all of the patients, 4011 (93%) were treated with chemotherapy and 1969 (46%) were treated with radiation. For chemotherapy, 1764 (41%) patients were treated with anthracycline drugs (including doxorubicin, idarubicin, daunorubicin, and epirubicin), 1567 (36%) were treated with cyclophosphamide, and 822 (19%) patients were treated with trastuzumab. A list of all therapies can be found in Table S3. Two sets of clinical variables were used to build the ML models: laboratory tests (such as estimated glomerular filtration rate, glycated hemoglobin, glucose, calcium, and total protein) and echocardiographic variables (such as left ventricular ejection fraction, left ventricular end‐diastolic volume index, and left ventricular end‐systolic volume index). Table S1 lists all of the variables used in this study. We conducted a systematic evaluation of 5 ML algorithms (k‐nearest neighbors, LR, SVM, RF, and GB) and 3 feature sets (laboratory tests only, echocardiography only, or both combined). The average performance and SD for each outcome based on the 100 iterations are listed in Table S4 (AUROC) and Table S5 (AUPR). LR, RF, and GB achieved the first‐tier performance, followed by SVM, then k‐nearest neighbors. Although LR, RF, and GB performed similarly, LR achieved the highest AUROCs for 5 outcomes and an AUROC for HF comparable to that of GB, which achieved the highest AUROC for that outcome. LR was therefore selected as the optimal classification method for all further analyses. Figure 2 shows the overall performance for LR models. The AUROCs were 0.882 (95% CI, 0.878–0.887) for HF, 0.787 (95% CI, 0.782–0.792) for AF, 0.821 (95% CI, 0.815–0.826) for CAD, 0.807 (95% CI, 0.799–0.816) for MI, 0.660 (95% CI, 0.650–0.670) for stroke, and 0.802 (95% CI, 0.797–0.807) for de novo CTRCD. All AUPRs were at least 2‐fold of their respective baselines of random classifiers. Precision‐recall curves show the trade‐off between precision and recall, which, in this case, are the fraction of patients who actually developed the disease among those predicted to have it (precision) and the fraction of patients predicted to have the disease among all those who developed it (recall). A random classifier has the same precision at every recall level (a horizontal line in the precision‐recall plot), leading to a baseline AUPR equal to the percentage of patients with the outcome in the cohort. The AUPRs compared with their respective baselines were 0.651 (95% CI, 0.641–0.661) versus 0.138 for HF, 0.401 (95% CI, 0.392–0.411) versus 0.151 for AF, 0.481 (95% CI, 0.469–0.492) versus 0.156 for CAD, 0.220 (95% CI, 0.206–0.234) versus 0.045 for MI, 0.138 (95% CI, 0.131–0.146) versus 0.064 for stroke, and 0.592 (95% CI, 0.583–0.601) versus 0.234 for de novo CTRCD.
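As a worked check of the baseline argument, the prevalence of each outcome can be recomputed from the Table 1 counts and compared with the reported AUPRs. This is only an arithmetic sketch: cohort‐level prevalence may differ in the third decimal from the reported baselines, and de novo CTRCD is left out because the denominator behind its reported baseline (0.234) is not stated in this text.

```python
cohort_size = 4309

# Outcome counts from Table 1 and the reported mean AUPRs.
outcome_counts = {"HF": 596, "AF": 653, "CAD": 673, "MI": 193, "Stroke": 275}
reported_aupr = {"HF": 0.651, "AF": 0.401, "CAD": 0.481,
                 "MI": 0.220, "Stroke": 0.138}

# A random classifier's AUPR baseline is the outcome prevalence.
baselines = {name: count / cohort_size
             for name, count in outcome_counts.items()}

# Fold improvement of each model over its random baseline.
fold = {name: reported_aupr[name] / baselines[name]
        for name in outcome_counts}

for name in outcome_counts:
    print(f"{name}: baseline {baselines[name]:.3f}, "
          f"AUPR {reported_aupr[name]:.3f}, fold {fold[name]:.1f}")
```

The fold values make the "at least 2‐fold" statement concrete: stroke has the smallest margin over its baseline, while HF and MI have the largest.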

Figure 2

Performances for the 6 outcomes in receiver operating characteristic (A through F) and precision‐recall (G through L) curves using logistic regression and the combined feature set.

For each subplot, light‐colored lines correspond to the 100 iterations; the saturated‐colored line is the average of the 100 iterations; background indicates mean±SD; the grey dotted line indicates the baseline of a random classifier. The area under the receiver operating characteristic curves (AUROCs) and area under the precision‐recall curves (AUPRs) shown are the averages. AF indicates atrial fibrillation; CAD, coronary artery disease; CTRCD, cancer therapy–related cardiac dysfunction; HF, heart failure; and MI, myocardial infarction.


Combining Echocardiographic and Laboratory Test Variables Showed the Best Performance

Next, we examined the complementary effect of different feature sets on the model performance. Based on the 100 iterations, we found that while echocardiographic or laboratory test variables alone were predictive, inclusion of both types of data synergistically improved performance of the models (Figure 3 and Figure S3). Moreover, we showed that laboratory test and echocardiographic features performed differently among the outcomes (2‐sided paired t test). Echocardiographic features outperformed laboratory tests for HF (0.854 versus 0.729, P<0.001), MI (0.766 versus 0.746, P=0.003), and de novo CTRCD (0.742 versus 0.733, P=0.04). Laboratory tests outperformed echocardiographic features for AF (0.760 versus 0.700, P<0.001), CAD (0.797 versus 0.702, P<0.001), and stroke (0.656 versus 0.617, P<0.001). In summary, combining both echocardiographic and laboratory test variables showed the best performance.

Figure 3

Comparison of the performances of laboratory test and echocardiographic feature sets.

A through F, When using the combined feature set, the models outperformed those that used either feature set individually. A, D, and F, Echocardiographic features showed significantly better performances for heart failure (HF), myocardial infarction (MI), and de novo cancer therapy–related cardiac dysfunction (CTRCD) than laboratory test. B, C, and E, Laboratory test features significantly outperformed echocardiographic features for atrial fibrillation (AF), coronary artery disease (CAD), and stroke. P values were calculated using 2‐sided paired sample t test. AUROC indicates area under the receiver operating characteristic curve; and AUPR, area under the precision‐recall curve.


Generalizability of the Models

An important aspect of ML models is real‐world generalizability. The patients were further split by dates—those with cancer therapy start dates before 2017.1.1 (see Methods) as the training set and those with start dates after 2017.1.1 as the test set. The results show that for all 6 outcomes, the AUROCs ranged from 0.913 for HF to 0.656 for MI (Figure 4 and Table S6). All AUPRs were higher than their corresponding baselines as well (Figure S4), indicating high generalizability of ML models in the prediction of CTRCD for new patients in real‐world clinical practices.
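The time‐based split can be sketched as below. The record layout, the field name `therapy_start`, and the example patients are hypothetical, and assigning a patient who started exactly on the cutoff date to the test set is an assumption, since the text only speaks of "before" and "after".

```python
from datetime import date

CUTOFF = date(2017, 1, 1)


def time_split(patients, cutoff=CUTOFF):
    """Split patients by cancer therapy start date.

    Patients starting before the cutoff form the training set; those
    starting on or after it form the test set (the "on" case is assumed).
    """
    train = [p for p in patients if p["therapy_start"] < cutoff]
    test = [p for p in patients if p["therapy_start"] >= cutoff]
    return train, test


# Hypothetical patient records.
patients = [
    {"id": 1, "therapy_start": date(2009, 5, 20)},
    {"id": 2, "therapy_start": date(2016, 12, 31)},
    {"id": 3, "therapy_start": date(2017, 1, 1)},
    {"id": 4, "therapy_start": date(2018, 3, 14)},
]
train, test = time_split(patients)
print([p["id"] for p in train], [p["id"] for p in test])
```

Unlike the random splits, this design never lets the model see patients treated after the cutoff, which is what makes it a closer stand‐in for prospective use.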

Figure 4

Evaluation of the model generalizability using time‐split data.

The receiver operating characteristic curve for each outcome is shown. Dotted line indicates the theoretical baseline performance of a random classifier. Patients were split by the date January 1, 2017. Patients who received cancer therapies before this date were used for model training, and patients who received cancer therapies after this date comprised the test sets. Logistic regression and the combined feature set were used. All models achieved moderate to high performances, suggesting a high generalizability of the models. AF indicates atrial fibrillation; AUROC, area under the receiver operating characteristic curve; CAD, coronary artery disease; CTRCD, cancer therapy–related cardiac dysfunction; HF, heart failure; and MI, myocardial infarction.


Clinical Interpretability of the Models

We next interrogated what the LR models learned from the data to determine associations between clinical variables and the CTRCD outcomes. We examined the model weights of the 100 final models for each outcome. Using the mean and SD of the weight, we derived 2 metrics, coefficient of variation and relative weight (see Methods), to identify the features that have stable and relatively large absolute weights throughout the 100 iterations. Figure 5A shows the 23 variables that were predictive of at least 1 cardiovascular outcome; the actual values of the weights in the LR models can be found in Table S7. Age was most predictive for all 6 outcomes, followed by hypertension and left ventricular ejection fraction, which were also predictive for the 6 outcomes. The predictive variables for each outcome can be found in Table S8. Using Cox proportional hazards model analysis for de novo CTRCD, left ventricular ejection fraction, hazard ratio, and risk factors such as sex, age, and hypertension, were verified as predictive (Figure 5B). The distributions of the 23 variables among the patients further illustrated the clinical relevancy of the variables uncovered by LR model weight analysis (Figure 5C and 5D, Figures S5 and S6).

Figure 5

Clinically relevant variables uncovered by weight examination of the final logistic regression models.

A, Twenty‐three predictive variables for at least 1 outcome (marked by an “X” in the grid). Color gradient indicates that, as the value of the variable increases, the risk for the outcome increases (red) or decreases (green). B, Cox proportional hazards model analysis was performed for de novo cancer therapy–related cardiac dysfunction (CTRCD), which verified the clinically relevant variables using the machine learning method. C, Distributions of 6 continuous variables by the outcomes (P values were computed by Kolmogorov‐Smirnov test). D, Distributions of 5 categorical variables (P values were computed by χ² test). +/− indicates whether the patients have the symptoms (row) or the outcomes (column). AF indicates atrial fibrillation; AST, aspartate aminotransferase; CAD, coronary artery disease; HF, heart failure; LVEF, left ventricular ejection fraction; LVESVi, left ventricular end‐systolic volume index; and MI, myocardial infarction.


Impact of Cancer Treatment Types on the Models

We next examined whether cancer treatment information can affect the model performances by conducting 2 separate experiments. In the first experiment, we sought to determine whether our models could be applied to patients with specific types of cancer treatments. We generated 5 subpopulations (Table 1) based on whether the patients were treated with the following cancer therapies, respectively: (1) chemotherapy, (2) radiation therapy, (3) chemotherapy and radiation therapy, (4) anthracycline, and (5) trastuzumab. We found high AUROCs in the prediction of de novo CTRCD among different types of cancer therapies as well (Figure 6). Specifically, the AUROCs were 0.779 (95% CI, 0.771–0.787) for anthracycline and 0.764 (95% CI, 0.746–0.783) for trastuzumab.

Figure 6

Performances for de novo cancer therapy–related cardiac dysfunction for patients with different cancer therapies.


A through F, Receiver operating characteristic curves. G through L, Precision‐recall curves. F and L, The model performances using all of the patients with de novo cancer therapy–related cardiac dysfunction (CTRCD) as comparison. For each subplot, light‐colored lines correspond to the 100 iterations; saturated‐colored line is the average of the 100 iterations; background indicates mean±SD; grey dotted line indicates the baseline of a random classifier. The area under the receiver operating characteristic curves (AUROCs) and area under the precision‐recall curves (AUPRs) shown are the averages.

In the second experiment, we examined whether cancer therapy information used as features can improve model performances. We included 4 additional categorical features: the usage of chemotherapy, radiation, anthracycline, or trastuzumab. We found that incorporating treatment information yielded only a marginal improvement in model performance (AUROC: 0.805 versus 0.802; P>0.1, t test) (Figure S7).

Discussion

In this study, we built predictive ML models for cardiac risk assessment among 6 types of cardiovascular outcomes, including HF, AF, CAD, MI, stroke, and de novo CTRCD. Based on 100 model iterations, all outcomes received relatively high or high AUROC, ranging from 0.882 for HF to 0.660 for stroke (Figure 2). In addition, models built using time‐split data demonstrated a high generalizability of our models for potential clinical implementation (Figure 4). By comparing the model performances using different feature sets, we found that both laboratory test variables and echocardiographic variables contributed to the overall high performance. When laboratory test data were used alone, all outcomes still achieved moderate to high AUROCs (Figure 3), with 5 of the AUROCs >0.7 and 1 at 0.66. In addition, by comparing the performances of laboratory test and echocardiographic feature sets, we found that for HF, MI, and de novo CTRCD, echocardiographic features significantly outperformed the laboratory test. For AF, CAD, and stroke, laboratory test performed better than echocardiographic features (Figure 3 and Figure S3). These highly predictive models offer potential approaches for cardio‐oncology clinical practice. Oncologists referred these patients to the cardio‐oncology services based on professional assessment of clinical factors such as cardiac symptoms, preexisting cardiac diseases, or cardiovascular risk factors. The models trained on laboratory test data could assist in the decision of referring, with or without incorporation of echocardiographic data. To understand which specific variables contributed to model performance, we examined the learned weights for the features (Figure 5, Figures S5 and S6). We found that increased creatinine level was associated with high risk of cancer treatment–associated HF. In the general population, creatinine elevation in patients with HF is associated with increased mortality. 
Creatinine is the metabolic product of creatine that is excreted in the urine. An elevated glucose level is commonly found in patients with acute MI. Studies have also shown that high glucose level is associated with high mortality risk in patients with MI. Our results showed that a higher glucose level was associated with higher risks of cancer treatment–associated MI. Other risk factors, such as sex, hypertension, and age, were also verified. Men have a higher risk of heart disease than women. , , , , Age is a well‐known CVD risk factor, , , and it was identified for all 6 outcomes. Hypertension is another strong risk factor for many types of CVDs. , , To summarize, by looking at the learned weights of the LR models, we uncovered the clinically relevant variables that were strong predictors for the CTRCD outcomes in the oncology cohort. The skewness of the cardiovascular events in the data sets, especially in MI and stroke, could negatively affect the performances. Therefore, we tested this issue using SMOTE. As shown in Figure S8, LR did not benefit from the resampling. The resampling marginally improved the performance of other methods for certain outcomes, such as k‐nearest neighbors for MI, SVM for MI, and SVM for AF. However, the improved models still do not outperform LR. We also experimented with stacking the output of these models. We found that stacking LR, RF, and GB achieved a marginal improvement compared with using LR alone (Figure S9; HF, AF, and stroke). In summary, these observations suggest a low risk of data skewness in our current models, especially for LR models; yet, potential further improvements by combining the techniques such as stacking and resampling, and perhaps by a meta‐classifier trained using the output of the models, are achievable in the future. Our future work includes several directions. 
First, we will continue to improve the models as more data are gathered, because we observed marginally increased model performance as the training size increased (Figure S10), suggesting the importance of large‐scale cohorts for ML studies. Our models may also be improved with a more model‐specific variable selection procedure to further reduce the risk of overfitting. When we tested the effect of limiting variables to a certain period (ie, variables collected within 1, 5, or 10 years of the first diagnosis for the outcome), we found that the models performed similarly, although performance for certain outcomes improved slightly (Figure S11). Second, we are actively incorporating imaging data directly, using convolutional neural networks, to further improve model performance. Third, we plan to integrate ML‐based risk assessment with online tools for use in clinical practice.
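
The AUROC summaries discussed above (one value per outcome, aggregated over 100 model iterations) can be illustrated with a minimal sketch. It assumes the AUROC is computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative case, and that the interval is a normal‐approximation 95% CI of the mean across iterations. This section does not state how the intervals were derived, so the CI method, the function names, and the synthetic scores below are illustrative assumptions, not the study's pipeline.

```python
import random
import statistics


def auroc(y_true, y_score):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_with_ci(values, z=1.96):
    """Mean and normal-approximation 95% CI of the mean (assumed CI method)."""
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / len(values) ** 0.5
    return mean, (mean - half_width, mean + half_width)


def simulate_iteration(rng, n_patients=300, event_rate=0.2):
    """Hypothetical stand-in for one train/test iteration on synthetic data."""
    labels = [1 if rng.random() < event_rate else 0 for _ in range(n_patients)]
    # Patients with events get scores shifted upward, mimicking an informative model.
    scores = [rng.gauss(1.0 if y else 0.0, 1.0) for y in labels]
    return auroc(labels, scores)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
aurocs = [simulate_iteration(rng) for _ in range(100)]
mean, (low, high) = mean_with_ci(aurocs)
print(f"AUROC {mean:.3f} (95% CI {low:.3f}-{high:.3f})")
```

Because the interval describes the mean over many iterations, it narrows as the number of iterations grows; it should not be read as the uncertainty of a single model evaluated on new patients.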

Limitations

We acknowledge several potential limitations in the current study. First, because of the retrospective nature of this study and the potential risk of patient selection bias, model performance may be overestimated for real‐world use, even though model generalizability was evaluated with time‐split data as the external validation set. Although each ICD‐9/10 diagnosis code was manually reviewed by a physician for accuracy, potential errors in ICD‐9/10 codes may influence the performance of the ML models. In addition, while our models can output a probability for each outcome, they have not been explicitly programmed to predict risk levels. This could be considered in the next iteration of the models, in which a system of risk‐based tertiles or quartiles could potentially be implemented based on our data.

We did not include feature interactions as additional features for the LR modeling. Some risk factors are known to interact with others, such as sex and diabetes mellitus. A potential improvement would be to include these interactions as features. However, doing so could introduce a large number of features and potentially increase the risk of model overfitting. In addition, some of the classification methods we evaluated are able to capture such interactions, but they did not outperform LR.

We were able to identify several clinically relevant variables that were stable, strong predictors of the outcomes. However, this method could not reveal all of the relevant factors. When 2 features are linearly related (multicollinearity), their final learned weights may fluctuate and will depend on the initial randomization of the weights. These features will have high absolute coefficients of variation, and their contributions to the observed outcomes cannot easily be inferred using this method.

Last, although we applied L2 regularization when training the LR models, the models could still potentially overfit.
To overcome this, we could filter the features to remove irrelevant ones, for example through variance analysis, mutual information, or L1 regularization.
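
Two of the filtering ideas named above, variance analysis and mutual information, can be sketched in a few lines. This is a minimal illustration under stated assumptions: features are already discrete (binary here), the feature names and toy values are hypothetical, and the thresholds are arbitrary. L1 regularization is not shown because it requires fitting a model.

```python
import math
import statistics
from collections import Counter


def low_variance_features(columns, threshold=0.0):
    """Return names of features whose population variance is <= threshold."""
    return [name for name, values in columns.items()
            if statistics.pvariance(values) <= threshold]


def mutual_information(feature, outcome):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(feature)
    count_x, count_y = Counter(feature), Counter(outcome)
    count_xy = Counter(zip(feature, outcome))
    # Sum over observed (x, y) pairs of p(x, y) * log2(p(x, y) / (p(x) * p(y))).
    return sum((c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
               for (x, y), c in count_xy.items())


# Hypothetical toy cohort: binary features and a binary outcome.
columns = {
    "hypertension": [1, 1, 1, 0, 0, 0, 1, 0],
    "site_flag": [1, 1, 1, 1, 1, 1, 1, 1],    # constant, carries no information
    "random_flag": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the outcome
}
outcome = [1, 1, 1, 0, 0, 0, 0, 1]

# Step 1: drop features that never vary.
dropped = set(low_variance_features(columns))

# Step 2: rank the remaining features by mutual information with the outcome.
scores = {name: mutual_information(values, outcome)
          for name, values in columns.items() if name not in dropped}
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f} bits")
```

Continuous laboratory values would need to be binned (or handled with a dedicated estimator) before a count‐based mutual information like this one applies.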

Conclusions

ML models were built for each of 6 CTRCD outcomes for the oncology population based on a systematic evaluation of 5 classification methods and 3 feature sets. These models showed moderate to high performance and generalized well on time‐split data. We found that laboratory test variables and echocardiographic variables were each more predictive for different outcomes. We uncovered several clinically relevant variables associated with CTRCD, offering potential predictive factors and biomarkers for cardio‐oncology clinical practice. Future versions of our models can include risk stratification in tertiles or quartiles to support clinical decision‐making and improve patient outcomes. To this end, we are currently developing free online outcomes and risk calculators that integrate our models for shared decision‐making. Our findings suggest that ML tools hold promise for cardiac risk assessment for patients before, during, or after cancer treatment by integrating large‐scale, longitudinal patient data from healthcare systems.
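
The tertile or quartile stratification mentioned above could take roughly the following form. This is a sketch under assumptions, not the planned implementation: the function name, the use of inclusive quantile cut points, and the example probabilities are all hypothetical.

```python
import bisect
import statistics


def assign_risk_groups(probabilities, n_groups=3):
    """Split predicted probabilities into n_groups quantile-based groups.

    Returns (groups, cut_points); group 0 is the lowest-risk group.
    """
    cut_points = statistics.quantiles(probabilities, n=n_groups, method="inclusive")
    groups = [bisect.bisect_right(cut_points, p) for p in probabilities]
    return groups, cut_points


# Hypothetical predicted probabilities for six patients.
predicted = [0.05, 0.10, 0.20, 0.40, 0.60, 0.90]
groups, cut_points = assign_risk_groups(predicted)  # n_groups=4 would give quartiles

labels = ["low", "intermediate", "high"]
for p, g in zip(predicted, groups):
    print(f"p={p:.2f} -> {labels[g]} risk")
```

In practice the cut points would more sensibly be derived once from a reference cohort and then applied unchanged to new patients, rather than recomputed from each batch of predictions.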

Sources of Funding

This work was supported by the National Heart, Lung, and Blood Institute of the National Institutes of Health under award numbers K99 HL138272 and R00 HL138272 to F.C.

Disclosures

None.

Supplementary material: Tables S1–S8 and Figures S1–S11.

10. Machine Learning-Based Risk Assessment for Cancer Therapy-Related Cardiac Dysfunction in 4300 Longitudinal Oncology Patients.

Authors: Yadi Zhou; Yuan Hou; Muzna Hussain; Sherry-Ann Brown; Thomas Budd; W H Wilson Tang; Jame Abraham; Bo Xu; Chirag Shah; Rohit Moudgil; Zoran Popovic; Leslie Cho; Mohamed Kanj; Chris Watson; Brian Griffin; Mina K Chung; Samir Kapadia; Lars Svensson; Patrick Collier; Feixiong Cheng
Journal: J Am Heart Assoc Date: 2020-11-26 Impact factor: 5.501