Liye Zhou, Zhifei Guo, Bijue Wang, Yongqing Wu, Zhi Li, Hongmei Yao, Ruiling Fang, Haitao Yang, Hongyan Cao, Yuehua Cui.
Abstract
Heart failure with preserved ejection fraction (HFpEF) has become a major health issue because of its high mortality, high heterogeneity, and poor prognosis. Using genomic data to classify patients into different risk groups is a promising method to facilitate the identification of high-risk groups for further precision treatment. Here, we applied six machine learning models, namely kernel partial least squares with the genetic algorithm (GA-KPLS), the least absolute shrinkage and selection operator (LASSO), random forest, ridge regression, support vector machine, and the conventional logistic regression model, to predict HFpEF risk and to identify subgroups at high risk of death based on gene expression data. The model performance was evaluated using various criteria. Our analysis was focused on 149 HFpEF patients from the Framingham Heart Study cohort who were classified into good-outcome and poor-outcome groups based on their 3-year survival outcome. The results showed that the GA-KPLS model exhibited the best performance in predicting patient risk. We further identified 116 differentially expressed genes (DEGs) between the two groups, thus providing novel therapeutic targets for HFpEF. Additionally, the DEGs were enriched in Gene Ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways related to HFpEF. The GA-KPLS-based HFpEF model is a powerful method for risk stratification of 3-year mortality in HFpEF patients.
Keywords: genetic algorithm; heart failure with preserved ejection fraction; kernel partial least squares; machine learning; risk prediction
Year: 2021 PMID: 33828587 PMCID: PMC8019773 DOI: 10.3389/fgene.2021.652315
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Clinical characteristics of the study population (N = 149).
| Characteristic | Good-outcome group (n = 107) | Poor-outcome group (n = 42) | Test statistic | p-value |
|---|---|---|---|---|
| Age, years | 74.44 ± 8.23 | 76.50 ± 7.46 | 0.572 | 0.568 |
| Female, n (%) | 40 (37.4) | 18 (42.9) | 0.380 | 0.538 |
| Comorbidities, n (%) | | | | |
| Hypertension | 84 (78.5) | 33 (78.6) | <0.001 | 0.993 |
| Hyperlipidemia | 70 (65.4) | 26 (61.9) | 0.163 | 0.687 |
| Diabetes | 27 (25.2) | 11 (26.2) | 0.015 | 0.904 |
| Vital signs and laboratory data | | | | |
| Systolic blood pressure, mmHg | 127.74 ± 18.44 | 138.88 ± 22.71 | −3.102 | 0.002* |
| Diastolic blood pressure, mmHg | 65.64 ± 11.58 | 67.83 ± 9.55 | −1.08 | 0.279 |
| Body mass index, kg/m² | 29.84 ± 5.47 | 29.21 ± 5.68 | 0.633 | 0.528 |
| Serum creatinine, mg/dl | 1.24 ± 0.86 | 1.29 ± 0.88 | 0.288 | 0.774 |
| Total cholesterol, mg/dl | 162.12 ± 36.70 | 167.74 ± 41.31 | −0.811 | 0.419 |
| Heart rate, bpm | 62.50 ± 10.90 | 64.45 ± 12.97 | −0.929 | 0.354 |

*Statistically significant at the α = 0.05 level.
Model performance.
| Model | Se | Sp | AUC | ACC | Youden | F-measure | MCC | G-means |
|---|---|---|---|---|---|---|---|---|
| GA-KPLS | 0.925 | 0.984 | 0.955 | 0.968 | 0.909 | 0.939 | 0.921 | 0.953 |
| RF | 0.319 | 0.974 | 0.646 | 0.793 | 0.293 | 0.445 | 0.427 | 0.535 |
| LASSO | 0.605 | 0.943 | 0.774 | 0.850 | 0.548 | 0.678 | 0.608 | 0.745 |
| RR | 0.469 | 1.000 | 0.734 | 0.853 | 0.469 | 0.618 | 0.620 | 0.669 |
| Logit | 0.549 | 0.574 | 0.591 | 0.567 | 0.122 | 0.410 | 0.112 | 0.548 |
| SVM | 0.870 | 0.989 | 0.929 | 0.956 | 0.859 | 0.913 | 0.891 | 0.926 |
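
As a reading aid for the table columns, the sketch below shows how the confusion-matrix-based criteria (Se, Sp, ACC, Youden, F-measure, MCC, G-means) are conventionally defined. It is a minimal illustration, not the authors' evaluation code: the function name `binary_metrics`, the choice of the poor-outcome group as the positive class, and the example counts are all assumptions made here. AUC is omitted because it requires continuous prediction scores rather than counts.

```python
import math


def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification criteria from confusion-matrix counts.

    Assumption: the positive class is the poor-outcome group.
    """
    se = _ratio(tp, tp + fn)                  # sensitivity (true positive rate)
    sp = _ratio(tn, tn + fp)                  # specificity (true negative rate)
    acc = _ratio(tp + tn, tp + fn + tn + fp)  # overall accuracy
    precision = _ratio(tp, tp + fp)
    youden = se + sp - 1                      # Youden's index
    f_measure = _ratio(2 * precision * se, precision + se)
    mcc = _ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )                                         # Matthews correlation coefficient
    g_mean = math.sqrt(se * sp)               # geometric mean of Se and Sp
    return {
        "Se": se, "Sp": sp, "ACC": acc, "Youden": youden,
        "F-measure": f_measure, "MCC": mcc, "G-means": g_mean,
    }


# Hypothetical counts for a 29-patient test split (not taken from the paper).
metrics = binary_metrics(tp=8, fn=2, tn=18, fp=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions Youden's index equals Se + Sp − 1, which agrees with the tabulated values (for example, 0.925 + 0.984 − 1 = 0.909 for GA-KPLS).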
Figure 1. Boxplot of the area under the curve (AUC) values for the six different models (based on 1,000 random splits). The y-axis represents the AUC value. The p-values were obtained using Dunnett's multiple-comparison test.
Figure 2. Kaplan-Meier survival curves of the good-outcome and poor-outcome groups. (A) The survival curve including the original 29 patients in the testing cohort and (B) the survival curve based on the predicted survival outcomes using the GA-KPLS method.
Figure 3. The heatmap of DEGs between the good-outcome and poor-outcome groups. Each column represents a patient, and each row represents a gene. Patients labeled with the black bar are poor-outcome samples, and those with the gray bar are good-outcome samples.
Figure 4. Gene Ontology (GO) enrichment analysis of DEGs. The x-axis shows the number of genes, and the y-axis indicates the GO terms. Bars with different colors correspond to different GO categories, with green representing biological process, orange representing cellular component, and blue representing molecular function.