Development of heart failure risk prediction models based on a multi-marker approach using random forest algorithms.

Hui Yuan¹,², Xue-Song Fan², Yang Jin², Jian-Xun He², Yuan Gui², Li-Ying Song², Yang Song², Qi Sun², Wei Chen¹.

Abstract

BACKGROUND: The early identification of heart failure (HF) risk may favorably affect outcomes, and the combination of multiple biomarkers may provide a more comprehensive and valuable means of improving risk stratification. This study was conducted to assess the importance of the individual cardiac biomarkers creatine kinase MB isoenzyme (CK-MB), B-type natriuretic peptide (BNP), galectin-3 (Gal-3), and soluble suppression of tumorigenicity-2 (sST2) for HF diagnosis, and to analyze the predictive performance of the combination of these four biomarkers using random forest algorithms.
METHODS: A total of 193 participants (80 patients with HF and 113 age- and gender-matched healthy controls) were included from June 2017 to December 2017. Correlation and regression analyses were conducted between cardiac biomarkers and echocardiographic parameters. The accuracy and importance of these predictor variables were assessed using random forest algorithms.
RESULTS: Patients with HF exhibited significantly higher levels of CK-MB, BNP, Gal-3, and sST2. BNP exhibited a good independent predictive capacity for HF (AUC 0.956). However, CK-MB, sST2, and Gal-3 exhibited only modest diagnostic performance for HF, with AUCs of 0.709, 0.711, and 0.777, respectively. BNP was the most important variable, with a markedly higher mean decrease in accuracy and in Gini. Furthermore, there was a general increase in predictive performance using the multi-marker model, with a sensitivity and specificity of 91.5% and 96.7%, respectively.
CONCLUSION: The random forest algorithm provides a robust method to assess the accuracy and importance of predictor variables. The combination of CK-MB, BNP, Gal-3, and sST2 achieves improvement in prediction accuracy for HF.

Year: 2019 PMID: 30829708 PMCID: PMC6595865 DOI: 10.1097/CM9.0000000000000149

Source DB: PubMed Journal: Chin Med J (Engl) ISSN: 0366-6999 Impact factor: 2.628

Introduction

Heart failure (HF) is a growing public epidemic characterized by ventricular remodeling and variable degrees of myocardial fibrosis, and is associated with significant mortality, morbidity, and healthcare expenditures. Despite the remarkable therapeutic advances of recent decades, the prognosis of patients with HF remains poor. The early identification of patients at high risk of HF may favorably affect outcomes. Recently, several biomarkers have gained mounting interest. B-type natriuretic peptide (BNP) and its amino-terminal fragment (NT-proBNP) are gold standard biomarkers that are widely used for evaluating patients with HF. Creatine kinase MB isoenzyme (CK-MB) is a classical indicator of myocardial injury, with excellent sensitivity and specificity. Suppression of tumorigenicity 2 (ST2) is a receptor expressed in cardiomyocytes and the vascular endocardium, and it exists in two forms: soluble (sST2) and transmembrane. sST2 is considered to function as a “decoy” receptor for interleukin (IL)-33. Elevated serum concentrations of sST2 reflect disease severity, myocardial stretch, and inflammation, as well as an unfavorable prognosis. Galectin-3 (Gal-3) is a soluble β-galactoside-binding glycoprotein that is secreted by cardiac macrophages. It induces fibroblast proliferation and collagen deposition in the myocardium, and causes ventricular dysfunction as a consequence. An increase in Gal-3 concentration has been found to be associated with an increased risk of mortality in patients with chronic HF, regardless of etiology. The dynamic progression of HF is complex, and is driven by cardiac dysfunction and maladaptive compensatory processes. Given the complexity of this syndrome, individual biomarkers are unlikely to be sufficient for assessing multi-system disorders, which limits the prediction accuracy for HF.
Thus, the combination of multiple biomarkers may provide a more comprehensive and valuable means of improving risk stratification. The random forest, which was first proposed by Leo Breiman, is a commonly used ensemble machine learning algorithm for classification, regression, and prediction, which operates by building a collection of decision trees at training time. Each decision tree works by learning simple decision rules extracted from the data features. The output considers the prediction of each decision tree to make a final decision. Random forests can help identify potential predictors and uncover complex interactions, even in high-dimensional settings. This method also simultaneously considers the impact of each individual variable and its multivariate interactions with other features. Given its superior performance in multi-class classification, the random forest method has been widely applied in various scientific fields, for example to metabolomic or microarray data. However, it has been less often employed on cardiac data for the prediction of HF. In the present study, a prediction model was constructed by combining four cardiac biomarkers, namely, CK-MB, BNP, Gal-3, and sST2, using the random forest algorithm. The importance of individual cardiac biomarkers for HF prediction was identified, and the predictive performance of the combination of the four biomarkers was analyzed. The present results suggest that the integration of CK-MB, BNP, Gal-3, and sST2 yielded a considerable improvement in assessing the risk of HF, when compared with individual biomarkers.
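To make the mechanism described above concrete, the following is a minimal Python sketch, not the authors' implementation (the study used the randomForest package in R). It assumes one-level trees (stumps) instead of full decision trees, assumes that a higher marker value points toward HF, and uses invented toy values rather than study data. The names fit_stump, fit_forest, predict, toy_rows, and toy_labels are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, features):
    """Pick the (feature, threshold) pair with the fewest training errors.

    A row is predicted as class 1 when row[feature] >= threshold.
    """
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            errors = sum(int(r[f] >= t) != y for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, f, t)
    return best[1], best[2]


def fit_forest(rows, labels, n_trees=500, n_candidates=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        features = rng.sample(range(n_features), n_candidates)
        forest.append(
            fit_stump([rows[i] for i in idx], [labels[i] for i in idx], features)
        )
    return forest


def predict(forest, row):
    """Majority vote over the predictions of all stumps."""
    votes = Counter(int(row[f] >= t) for f, t in forest)
    return votes.most_common(1)[0][0]


# Invented toy values: column 0 is a BNP-like marker, column 1 a Gal-3-like marker.
toy_rows = [[20, 12], [35, 15], [50, 11], [15, 18],
            [400, 22], [650, 17], [900, 25], [300, 14]]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 0 = control, 1 = HF

forest = fit_forest(toy_rows, toy_labels, n_trees=500, n_candidates=1)
print(predict(forest, [1000, 30]))
print(predict(forest, [5, 5]))
```

Bootstrap sampling and the random choice of candidate features make the individual stumps differ from one another, and the vote then averages out their individual errors.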

Methods

Ethical approval

The Ethics Committee of Beijing Anzhen Hospital approved the study protocol, and all patients provided a written informed consent prior to participation.

Study design and participants

This study was conducted in China from June 2017 to December 2017. A total of 100 consecutive patients with HF (71 male and 29 female) who consulted the Cardiology Department of Anzhen Hospital were enrolled in the study. Exclusion criteria included acute myocardial infarction, atrial fibrillation, pulmonary hypertension, acute pulmonary embolism, interstitial lung diseases, metabolic disorders, connective tissue diseases, malignancies, sepsis, blood system diseases, and use of medication within 6 months before enrollment. All the HF patients were diagnosed and treated in accordance with the criteria of the 2016 European Society of Cardiology (ESC) Guidelines for the diagnosis and treatment of acute and chronic heart failure. The New York Heart Association (NYHA) classification was used to grade the symptoms of patients with HF. Among these patients, 20 were classified as class I to II, while 80 were classified as class III to IV. In addition, 113 age- and gender-matched healthy individuals were included as controls.

Data collection

Data of the patients were collected from medical charts and clinical examinations. Demographic information (ie, age and gender) and health behaviors (ie, smoking status and alcohol consumption) were obtained by self-reporting during a face-to-face interview. A physical examination, including anthropometric measurements (height and body weight) and blood pressure measurements, was conducted by specifically trained staff. Body weight and height were assessed while the subjects stood barefoot in light clothing.

Laboratory examination

Blood samples were collected after overnight fasting, and immediately centrifuged for examination or stored at −70°C until assayed. Blood biochemical indicators, including fasting plasma glucose (FPG), triglyceride (TG), total cholesterol (TC), low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), C-reactive protein (CRP), serum creatinine (SCr), alanine aminotransferase (ALT), and aspartate aminotransferase (AST), were measured. Plasma BNP, Gal-3, and CK-MB were measured using a standard electrochemiluminescence immunoassay on an ARCHITECT I2000SR analyzer (Abbott Laboratories, Abbott Park, IL, USA). sST2 was assayed via a high-sensitivity sandwich monoclonal immunoassay (Presage™ ST2 assay; Critical Diagnostics, NY, USA).

Echocardiographic measurements

Standard echocardiography with Doppler was performed by the same ultrasound technician. All study subjects underwent echocardiography after admission. Left ventricular ejection fraction (LVEF) was quantitatively measured using the Simpson two-dimensional methodology. Left ventricular end diastolic dimension (LVEDD) and left ventricular end systolic dimension (LVESD) were measured according to the American Society of Echocardiography guidelines.

Definition of variables

Smoking was defined as a current cigarette consumption of an average of at least one cigarette daily for at least 1 year. Drinking was defined as a current consumption of an average of at least 50 g of alcohol daily for at least 1 year. BMI was calculated as weight in kilograms divided by height in meters squared (kg/m²). The estimated glomerular filtration rate (eGFR) value was calculated using the Chinese-modified Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation.

Statistical analysis

Statistical analyses were performed using SAS (version 9.4; SAS Institute, Inc., Cary, NC, USA) and the R statistical package. Continuous variables were described as mean ± standard deviation (SD), or as medians with percentiles (25th percentile and 75th percentile). Comparisons of continuous variables between groups were performed using the t test for normally distributed data and the rank sum test for data that were not normally distributed. Discrete variables were reported as numbers (n) and percentages (%), and compared using the Chi-squared test. The Satterthwaite method was used for unequal variances. Spearman rank correlation analysis was conducted between cardiac biomarkers and echocardiographic parameters. A correlation coefficient >0.7, within 0.4 to 0.7, and <0.4 indicated a strong, moderate, and weak correlation, respectively. Linear regression analysis was performed to assess the relationship of the echocardiographic findings with the serum cardiac biomarkers. The receiver operating characteristic (ROC) curve was drawn to determine the diagnostic power of the cardiac biomarkers for the prediction of HF, in terms of sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The area under the curve (AUC) was calculated to determine the optimal cut-off value of an individual predictor of HF, at the point when the AUC value was maximized. Random forests of CK-MB, BNP, sST2, and Gal-3 were established for prediction via the randomForest package in R software, where the number of trees was specified as 500 to obtain stable results. The importance of individual variables was measured by the mean decrease in accuracy and Gini measures. Variables with a higher mean decrease in accuracy or Gini value were considered to be more important than those with lower values.
Statistical significance was accepted as a two-sided test with a P value <0.05.
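
As an illustration of the Gini measure named above, the following minimal Python sketch computes the Gini impurity of a set of labels and the decrease in impurity produced by one threshold split. It is a sketch under stated assumptions, not the randomForest package's internal code: the function names gini and gini_decrease, the threshold of 100, and the marker values are hypothetical and are not study data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(values, labels, threshold):
    """Impurity of the parent node minus the size-weighted impurity of its two children."""
    left = [y for v, y in zip(values, labels) if v < threshold]
    right = [y for v, y in zip(values, labels) if v >= threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Invented marker values (0 = control, 1 = HF); the split at 100 separates them perfectly.
marker = [20, 35, 50, 400, 650]
status = [0, 0, 0, 1, 1]
print(gini(status))
print(gini_decrease(marker, status, threshold=100))
```

In a forest, decreases of this kind are accumulated for each variable over every split that uses it and averaged over the trees, so a variable that often produces clean splits receives a high mean decrease in Gini.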

Results

Characteristics of participants

A total of 193 participants were included in the present study. Among these participants, 80 were patients with HF, while 113 were control subjects. The characteristics of these participants are presented in Table 1. Overall, the mean age of the participants was 53.7 ± 14.1 years, and 61.7% of the participants were male. Patients with HF tended to be male. Furthermore, differences in age, height, weight, and BMI between HF patients and controls were not statistically significant. The presence of HF was associated with a higher incidence of current smoking and drinking. Systolic blood pressure (SBP) was higher in patients with HF than in control subjects. When compared with control subjects, patients with HF were more likely to have higher values of ALT, AST, SCr, and CRP, and lower values of eGFR, TC, LDL-C, and HDL-C.

Table 1

Baseline characteristics of 80 patients with HF and 113 age- and gender-matched healthy controls.

Patients with HF had poor cardiac function and abnormal ventricular dimensions, as evidenced by a lower LVEF and higher LVEDD and LVESD in comparison with control subjects. For cardiac biomarkers, patients with HF exhibited significantly higher levels of CK-MB, BNP, Gal-3, and sST2, compared with control subjects (all, P < 0.05).

Correlation and regression analysis

Spearman rank correlation analysis revealed that BNP was moderately correlated with NYHA class (R = 0.4047, P = 0.0004). However, BNP was poorly correlated with echocardiographic measures (LVEF, LVEDD, and LVESD; all, R < 0.4) [Table 2]. The other three cardiac biomarkers, CK-MB, sST2, and Gal-3, were also poorly correlated with NYHA class, as well as with echocardiographic parameters (all, R < 0.4), except for a weak but significant correlation between Gal-3 and NYHA class (R = 0.3617, P = 0.0011).
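
For readers who want to see how such coefficients are obtained and graded, the following minimal Python sketch computes a Spearman rank correlation (the Pearson correlation of average ranks) and applies the strength thresholds stated in the statistical analysis section. The function names are hypothetical, the sketch does not compute P values, and it assumes that neither input is constant.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def strength(r):
    """Grade a coefficient with the thresholds used in this study."""
    magnitude = abs(r)
    if magnitude > 0.7:
        return "strong"
    if magnitude >= 0.4:
        return "moderate"
    return "weak"


print(strength(0.4047), strength(0.3617))
```

Applied to the coefficients reported above, the grading gives "moderate" for BNP versus NYHA class and "weak" for Gal-3 versus NYHA class.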

Table 2

Result of Spearman rank correlation analysis between cardiac biomarkers and echocardiographic parameters.

The linear regression analysis revealed no significant relationship between any of the cardiac biomarkers and the echocardiographic parameters measured [Table 3].

Table 3

Result of linear regression analysis between cardiac biomarkers and imaging parameters.

Predictive performance of individual biomarkers for HF

BNP exhibited a good independent predictive capacity for HF, with a sensitivity, specificity, and AUC of 94.4%, 90.3%, and 0.956, respectively, at a cut-off value of 60 pg/mL. At a cut-off of 100 pg/mL, the specificity of BNP for predicting HF increased (95.6%), but its sensitivity decreased (86.1%) [Figure 1].
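
The trade-off between sensitivity and specificity at different cut-offs can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's analysis code: the marker values are invented, the function names are hypothetical, and the cut-off is selected with Youden's index (sensitivity + specificity - 1), a common convention that is not necessarily the criterion the authors applied.

```python
def auc_rank(cases, controls):
    """AUC as the share of case-control pairs in which the case has the higher value (ties count 0.5)."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def metrics_at_cutoff(cases, controls, cutoff):
    """Sensitivity, specificity, PPV, and NPV when value >= cutoff is called positive."""
    tp = sum(v >= cutoff for v in cases)
    fn = len(cases) - tp
    fp = sum(v >= cutoff for v in controls)
    tn = len(controls) - fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


def best_cutoff_youden(cases, controls):
    """Scan every observed value and keep the cut-off with the largest Youden index."""
    def youden(cutoff):
        m = metrics_at_cutoff(cases, controls, cutoff)
        return m["sensitivity"] + m["specificity"] - 1.0
    return max(sorted(set(cases) | set(controls)), key=youden)


# Invented marker values, not study data.
hf_values = [45, 120, 300, 80, 650]
control_values = [10, 25, 60, 15, 70, 30]

print(auc_rank(hf_values, control_values))
print(metrics_at_cutoff(hf_values, control_values, cutoff=60))
print(best_cutoff_youden(hf_values, control_values))
```

Raising the cut-off can only lower sensitivity or leave it unchanged, while specificity can only rise or stay the same, which matches the pattern reported for BNP at 60 pg/mL and 100 pg/mL.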

Figure 1

Predictive performance of individual biomarkers for heart failure. BNP exhibited a good independent predictive capacity for HF, with an AUC of 0.956. CK-MB, sST2, and Gal-3 exhibited a modest diagnostic performance for HF, with AUCs of 0.709, 0.711, and 0.777, respectively.

However, CK-MB, sST2, and Gal-3 exhibited only a modest diagnostic performance for HF, with AUCs of 0.709, 0.711, and 0.777, respectively [Table 4].

Table 4

Prediction properties of single cardiac biomarkers.

Random forests

A multi-marker model was constructed by combining four cardiac biomarkers (CK-MB, BNP, Gal-3, and sST2) using the random forest method. In order to minimize the possible variance of the present results, the random forest algorithm was run multiple times to obtain the average of the predictions. The results of the random forests are presented in Figure 2.

Figure 2

The importance of variables in the random forest algorithm. A multi-marker model was constructed by combining four cardiac biomarkers (CK-MB, BNP, Gal-3, and sST2) using the random forest algorithm. BNP was the most important variable, as evidenced by its significantly higher mean decrease in accuracy (A) and mean decrease in Gini (B), when compared with sST2, Gal-3, and CK-MB.

BNP was the most important variable, as evidenced by a significantly higher mean decrease in accuracy and Gini, when compared with sST2, Gal-3, and CK-MB. Overall, there was a general increase in predictive performance using the multi-marker model, when compared with individual cardiac biomarkers, with a sensitivity, specificity, PPV, and NPV of 91.5%, 96.7%, 97.0%, and 90.8%, respectively.

Discussion

The present study analyzed a multi-marker approach via the random forest algorithm for the prediction of HF. The random forests, which were trained with 500 trees, revealed that BNP was the most important cardiac biomarker for HF prediction, as assessed by the mean decrease in accuracy and Gini measures. This was consistent with the ROC results for individual cardiac predictors, which showed good sensitivity and specificity for BNP. In addition, the random forest algorithm revealed that the combination of the four cardiac biomarkers (CK-MB, BNP, Gal-3, and sST2) achieved a general increase in prediction accuracy for HF, suggesting its potential application for the clinical assessment of patients with HF. BNP is a neurohormone synthesized in the cardiac ventricles, and its release is directly proportional to ventricular expansion and pressure overload. Furthermore, BNP concentrations are higher in patients with more advanced NYHA classes, demonstrating a correlation between plasma BNP concentration and the NYHA classification. Consistently, it was found that among the four cardiac biomarkers, only plasma BNP concentration correlated moderately with the NYHA class. The combined assessment of cardiac biomarkers together with echocardiography provides a more powerful risk stratification for patients with HF across all stages. The relationship between BNP and echocardiographic measurements remains uncertain, with mixed results depending upon the study population. In some studies, plasma BNP concentration correlates moderately or strongly with echocardiographic indexes. However, other studies have revealed that echocardiographic indexes and circulating cardiac indicators (BNP and Gal-3) were poorly correlated, or not correlated at all, in patients with HF, which is consistent with the present results. In addition, linear regression analysis indicated no relationship between any of the cardiac biomarkers assessed and the echocardiographic parameters measured.
The results for single plasma cardiac biomarkers, in terms of predictive capacity, have been conflicting. The Breathing Not Properly Study measured plasma BNP concentrations in 1586 patients with acute dyspnea, and found that BNP was the best single predictor of the final diagnosis of HF among demographic indicators, imaging parameters, and laboratory variables. Furthermore, a study revealed the superior biochemical diagnostic capacity of BNP over Gal-3 and sST2 for HF. Consistently, a good predictive performance of BNP for HF diagnosis was found, with a sensitivity, specificity, and AUC of 94.4%, 90.3%, and 0.956, respectively, at the cut-off of 60 pg/mL. However, CK-MB, sST2, and Gal-3 exhibited only modest discriminative ability for HF patients. Despite acceptable sensitivity, sST2 has been considered to lack specificity for the prediction of HF, limiting its application as a stand-alone diagnostic tool. In a study of 599 patients with acute dyspnea, Gal-3 exhibited a moderate diagnostic performance for acute HF with an AUC of 0.72 vs. an AUC of 0.94 for BNP. In contrast to BNP, the usefulness of both galectin-3 and sST2 as aids for the diagnosis of HF has been questioned. Interestingly, a head-to-head comparison of sST2 vs. Gal-3 in chronic HF demonstrated the superiority of ST2 over Gal-3 in risk stratification. The present findings demonstrated the opposite result, in which the AUC of Gal-3 was slightly greater than that of sST2 for predicting HF. The random forest method is a useful tool for identifying the most important predictors from a collection of variables by calculating several measures. One of these measures is the mean decrease in accuracy, which ranks the importance of a predictor based on the decrease in prediction accuracy when the values of the variable are randomly permuted.
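
The permutation idea behind the mean decrease in accuracy can be sketched as follows in Python. This is a simplified illustration, not the randomForest package's procedure: random forest software typically performs the permutation tree by tree on out-of-bag samples, whereas this sketch permutes one column of a fixed dataset and re-scores a fixed classifier. The rule-based classifier, the function names, and the toy values are hypothetical.

```python
import random


def accuracy(predict, rows, labels):
    """Share of rows whose predicted label matches the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, feature, repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(predict, shuffled, labels))
    return sum(drops) / repeats


def toy_classifier(row):
    """Hypothetical fixed rule: call HF when the first marker is at least 100."""
    return int(row[0] >= 100)


# Invented values: column 0 drives the label, column 1 is unrelated to it.
data = [[10, 5], [20, 1], [30, 4], [40, 2], [110, 3], [120, 5], [130, 1], [140, 4]]
truth = [0, 0, 0, 0, 1, 1, 1, 1]

print(permutation_importance(toy_classifier, data, truth, feature=0))
print(permutation_importance(toy_classifier, data, truth, feature=1))
```

Shuffling the column that the classifier relies on breaks its link with the outcome and lowers accuracy, whereas shuffling a column the classifier ignores leaves accuracy unchanged, so the first feature receives the larger importance.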
The Gini index is another measure of the predictive power of variables, computed as the sum of all decreases in Gini impurity, which measures how often an element would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset. Thus, a greater mean decrease in accuracy or Gini indicates that a predictor plays a more important role in partitioning the data at the nodes of the decision trees. Ward et al used random forests to identify and validate important predictors of mortality among patients with systemic lupus erythematosus, achieving an AUC of 0.94. In the present study, the random forest analysis was performed for four predictors, and BNP displayed a greater mean decrease in accuracy and Gini, compared with sST2, Gal-3, and CK-MB, highlighting it as the most important individual predictor of HF among the variables examined. Since no single factor reliably predicts HF, attempts have been made to improve risk prediction by combining a diverse panel of biomarkers, allowing for the integration of various pathophysiological aspects of the disease process, such as myocardial injury, vascular load, neurohormonal activation, and inflammation. The combination of ST2 and BNP offers a modest improvement in the risk stratification of chronic HF patients. Combining plasma Gal-3 and BNP increased the diagnostic and prognostic value over either biomarker alone. In a multi-center cohort of 1513 ambulatory HF patients, a multi-marker panel comprising seven circulating biomarkers reflective of diverse biological pathways proved to be a strong predictor that outperformed the Seattle Heart Failure Model (SHFM) score and substantially improved the prediction of chronic HF. These findings support the potential use of the multi-marker approach for improving the predictive power for HF.
In the present study, although BNP had a good predictive performance for HF, an improvement in accuracy was observed when CK-MB, Gal-3, and sST2 were added to BNP for the biochemical diagnosis of HF, supporting a multi-marker strategy for HF diagnosis. Although the concept of a multi-marker tool has been endorsed, identifying an optimal panel of biomarkers for assessing HF is a formidable task. In order to classify patients and identify predictors, random forests were employed, which is a novel method that develops numerous decision trees, by which the accuracy of the predictors was tested. The random forest classifier operates by building a set of decision trees, and at each node in the trees, a random subset of the predictor variables is selected and considered as split candidates. The dataset is repeatedly divided into subtrees, and predictor variables are assessed for importance on the basis of the change in classification error caused by their presence or absence in the subset. Furthermore, random forests combine the predictions of multiple decision trees, improving the power of the algorithm. In the present study, the implementation of a random forest algorithm revealed that the combination of BNP, CK-MB, Gal-3, and sST2 achieved the greatest accuracy for HF prediction, with the sensitivity, specificity, PPV, and NPV all exceeding 90%. This outperformed the models for individual cardiac biomarker assessment, suggesting an improvement in overall accuracy via a multi-marker strategy. The present study has several limitations. First, this was a preliminary observational study with a relatively small sample size that was performed in a single center, and the relevance of these results to patients in other parts of the world was not validated. Second, given the potential for confounding by indication and selection bias, these data should be interpreted cautiously.
Third, the present study focused on four cardiac biomarkers (BNP, CK-MB, sST2, and Gal-3), and other biomarkers reflective of other aspects of the underlying pathophysiology of HF, such as renal salt and water retention and oxidative stress, were not included. These limitations could be overcome through future large-scale, prospective studies that combine diverse panels of biomarkers. Also, the prognostic value (eg, for hospitalization and mortality) of the four biomarkers in patients with HF, especially galectin-3 and ST2, which are mentioned in the 2013 American College of Cardiology/American Heart Association guideline for the management of heart failure, deserves further investigation. In conclusion, the present study provided a framework for the exploration of the random forest algorithm in the prediction of HF. The present findings support that BNP has a higher accuracy for the diagnosis of HF, compared with CK-MB, galectin-3, and sST2. Moreover, adding CK-MB, Gal-3, and sST2 to BNP led to a substantial improvement in the biochemical diagnosis of HF. The random forest algorithm provides a robust method to assess the accuracy and importance of predictor variables. Its potential to validate the usefulness of multiple biomarkers for HF diagnosis needs further investigation.

Funding

This work was supported by grants from the National Natural Science Foundation of China (No. 81770353) and the Abbott China research fund (ADD-2017).

Conflicts of interest

This study received funding from the Abbott China research fund.
