
Subtypes identification on heart failure with preserved ejection fraction via network enhancement fusion using multi-omics data.

Yongqing Wu¹, Huihui Wang¹, Zhi Li², Jinfang Cheng³, Ruiling Fang¹, Hongyan Cao¹,⁴, Yuehua Cui⁵.

Abstract

Heart failure with preserved ejection fraction (HFpEF) is associated with multiple etiologic and pathophysiologic factors and leads to significant cardiovascular morbidity and mortality. Effective therapeutic interventions for HFpEF have been difficult to identify, primarily because its clinical heterogeneity makes it hard to determine the physiologic and prognostic implications of this syndrome. Identifying clinical subtypes using multi-omics data therefore has great implications for the efficient treatment and prognosis of HFpEF patients. Here we proposed to integrate mRNA, DNA methylation and microRNA (miRNA) expression data of HFpEF patients using a similarity network fusion (SNF) method followed by a network enhancement denoising step (ne-SNF) to form a fused network. A spectral clustering method was then used to obtain clusters of patient subtypes. Experiments on HFpEF datasets demonstrated that ne-SNF significantly outperforms single-data-type subtype analysis and other integration methods. The identified subgroups showed statistically significant differences in survival. Two HFpEF subtypes were defined: a high-risk group (16.8%) and a low-risk group (83.2%). The 5-year mortality rates were 63.3% and 33.0% for the high- and low-risk groups, respectively. After adjusting for clinical covariates, HFpEF patients in the high-risk group had a 2.43 times higher hazard of death than those in the low-risk group. A total of 157 differentially expressed (DE) mRNAs, 2199 abnormal methylations and 121 DE miRNAs were identified between the two subtypes, and these features were enriched in many HFpEF-related biological processes and pathways. The ne-SNF method provides a novel pipeline for subtype identification in integrated analysis of multi-omics data.


Keywords: Biomarkers; HFpEF; Multi-omics data integration; Subtypes identification; ne-SNF

Year: 2021 PMID: 33868594 PMCID: PMC8039555 DOI: 10.1016/j.csbj.2021.03.010

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271

Introduction

Heart failure (HF) is the leading cause of hospitalization and death around the world. Approximately half of HF patients have a preserved left ventricular ejection fraction (HFpEF), and its prevalence relative to HF with reduced ejection fraction (HFrEF) is rising at an alarming rate of 1% per year [1], [2]. HFpEF patients are highly heterogeneous and have a poor prognosis, and no effective treatment for HFpEF has yet been found [3]. Compared with HFrEF patients, HFpEF patients are more likely to be older, female, hypertensive and anemic, and to have atrial fibrillation. The annual mortality rate of HFpEF patients is close to 8%, and among patients over the age of 70 the prevalence and 5-year mortality rate of HFpEF are close to 50% [4]. With a worldwide aging population, the growing challenge of HFpEF urgently requires faster disease characterization and the identification of patient subtypes for more targeted clinical diagnosis and treatment. To date, most approaches to HFpEF subtyping have focused on phenomapping [5] using latent variable models and hierarchical clustering based on phenotypic features such as demographic, physical and clinical characteristics and laboratory, electrocardiographic and echocardiographic data. Although targeted therapy for the identified HFpEF phenogroups seems promising, current therapy is limited to diuretics and the treatment of comorbidities [6]. Subtyping based on phenotypic features remains insufficient to identify HFpEF subgroups with unique underlying pathophysiology or to assess the response to personalized therapy [7]. Because the heritability of cardiovascular diseases (CVD) is 40–50% in the general population [8], several molecular subtyping studies have been developed for CVD [9]. Subtypes based on shared molecular features can lead to novel classifications of HFpEF and shed light on the mechanisms and plasma biomarkers involved in HFpEF [6].
With rapid advances in sequencing and computing technology, high-throughput multi-omics data can now be generated at scale, and integrative genomic studies are increasingly emphasized [10]. Multi-omics data integration promises to capture associations and potential causal relationships between different data types [11] and thereby build a comprehensive understanding of the underlying biological processes [10]. Rappoport and Shamir [12] comprehensively reviewed multi-omics data integration methods and divided the joint clustering of multiple data types into four categories: 1) early integration, which simply concatenates multiple omics data matrices into a single matrix; it is the simplest to apply but has several practical limitations described in that review; 2) late integration, in which each omics data type is clustered separately with a single-omic algorithm and the resulting clusters are then integrated; a noticeable drawback is that signals that are weak in each individual data type can be lost; 3) methods based on statistical modeling (e.g., iCluster [13]), which assume a probabilistic distribution for the data and are powerful, but run slower and are sensitive to feature selection; and 4) similarity-based methods (e.g., similarity network fusion (SNF) [14]), which construct a similarity matrix for each omics data type separately and then integrate these similarities, with the advantage of readily incorporating diverse omics data types. SNF uses message passing to propagate information through interactions between samples, thereby effectively reducing noise and strengthening similarities present in one or more networks.
However, in the fused network, unavoidable noise arising from the limitations of measurement technology and from inherent natural variation may dilute clustering signals and produce spurious associations between samples [15]. To alleviate this problem, it is essential to further denoise the fused network to achieve efficient and precise subtyping. Recently, Wang et al. [16] proposed a network enhancement (NE) approach to denoise weighted biological networks. NE uses a doubly stochastic matrix operator to induce sparsity; it can therefore remove weak network edges and enhance true connectivity, which improves performance in downstream analyses. Given the promise of NE, in this work we incorporated the NE strategy to denoise the SNF-fused network (ne-SNF), thereby strengthening strong similarities between patients and removing weak edges mainly caused by noise. The proposed ne-SNF improved the performance of subtype identification compared with SNF and other current state-of-the-art integration methods. Specifically, using multi-omics data collected by the Framingham Heart Study (FHS), we aimed to discover molecular subtypes of HFpEF patients and to identify biomarkers and key pathways. Subsequent biological analysis showed that the key molecular features and pathways may hold prognostic value and biological significance.

Materials and methods

Data

Ethics statement

The Framingham Heart Study (FHS) data set used in this work contains data from fully deidentified individuals and was obtained through dbGaP (http://dbgap.ncbi.nlm.nih.gov). This is a secondary data analysis, and the authors had no role in collecting the patient data. IRB approval was obtained before accessing the data set.

Diagnosis of HFpEF patients in Framingham Heart study

The FHS data in this study included clinical data, survival data and multi-omics data, downloaded from dbGaP (study accession: phs000007, http://dbgap.ncbi.nlm.nih.gov). Since 1948, the original cohort has enrolled participants from Framingham, MA, who undergo biennial examinations to investigate CVD and its risk factors [17]. Offspring (and their spouses) and adult grandchildren of the original cohort participants were recruited into the second- and third-generation cohorts in 1971 and 2002, respectively [18]. Following the "2016 ESC guidelines for the diagnosis and treatment of acute and chronic heart failure", we selected HFpEF participants who met the following criteria: (1) signs and symptoms of HF; (2) left ventricular ejection fraction >50%; (3) B-type natriuretic peptide (BNP) >35 pg/ml and/or N-terminal pro-B-type natriuretic peptide (NT-proBNP) >125 pg/ml; and (4) relevant structural heart disease (left ventricular hypertrophy/left atrial enlargement) and/or diastolic dysfunction. Patients with valvular heart disease or hypertrophic cardiomyopathy were excluded. Of the 175 HFpEF patients identified, 125 participants from the Offspring cohort (examinations between 1971 and 2011) who had mRNA expression, DNA methylation and miRNA expression profiles at examination 8 were included in this study. Survival time was measured from admission with an HFpEF diagnosis to the last follow-up (2011) or death from cardiovascular disease. The outcome was death from cardiovascular disease versus survival. Survival time ranged from 0 to 29.6 years.
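The four inclusion criteria and two exclusions above amount to a simple Boolean rule. The sketch below encodes that rule for one participant; the `Participant` record and its field names are hypothetical (the text does not give the FHS/dbGaP variable names), and only the thresholds come from the criteria listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    # Hypothetical fields; real FHS/dbGaP variable names are not given in the text.
    hf_signs_symptoms: bool
    lvef: float                    # left ventricular ejection fraction, %
    bnp: Optional[float]           # B-type natriuretic peptide, pg/ml (None if not measured)
    nt_probnp: Optional[float]     # NT-proBNP, pg/ml (None if not measured)
    structural_disease: bool       # LV hypertrophy and/or left atrial enlargement
    diastolic_dysfunction: bool
    valvular_disease: bool
    hypertrophic_cm: bool


def is_hfpef(p: Participant) -> bool:
    """Apply the four inclusion criteria and the two exclusions described above."""
    natriuretic_ok = (p.bnp is not None and p.bnp > 35) or (
        p.nt_probnp is not None and p.nt_probnp > 125
    )
    included = (
        p.hf_signs_symptoms
        and p.lvef > 50
        and natriuretic_ok
        and (p.structural_disease or p.diastolic_dysfunction)
    )
    excluded = p.valvular_disease or p.hypertrophic_cm
    return included and not excluded
```

The "and/or" in criteria (3) and (4) is written as a logical OR, so either natriuretic peptide (or either structural finding) suffices.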

Omics data preprocessing

Three types of omics data were obtained: 17,874 mRNA expression probes, 443,207 methylation CpG sites and 416 miRNAs. Following the annotation of the Affymetrix GeneChip Human Exon 1.0 ST platform, we converted the 17,874 mRNA expression probes to 17,358 gene-level expression values by taking the mean of multiple probes mapping to the same gene. Using the annotation of the Illumina HumanMethylation450 BeadChip platform, we mapped the 443,207 CpG sites to 27,604 genes and took the mean beta value of multiple CpG sites in a gene as the gene-level methylation signal. For miRNA, we filtered out features with >30% missing values across patients, leaving 212 miRNA features. There were no missing data in mRNA expression. Missing values in both the methylation and miRNA data were imputed with the K-nearest neighbor (KNN) imputation method [19].
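The three preprocessing operations (probe/CpG-to-gene averaging, the 30% missingness filter, and KNN imputation) can be sketched in NumPy as below. This is a minimal illustration, not the authors' code: the function names are hypothetical, the row-wise KNN variant (neighbours are other features, in the spirit of the method cited as [19]) and the default k = 10 are assumptions, and features are assumed to be rows with samples as columns.

```python
import numpy as np


def collapse_to_genes(values, feature_genes):
    """Average rows (probes or CpG sites) annotated to the same gene.

    values: (n_features, n_samples) array; feature_genes: one gene label per row.
    Missing entries (NaN) are ignored when averaging.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(feature_genes)
    genes = sorted(set(feature_genes))
    collapsed = np.vstack([np.nanmean(values[labels == g], axis=0) for g in genes])
    return genes, collapsed


def filter_missing(values, max_missing=0.30):
    """Drop features (rows) whose fraction of missing values exceeds max_missing."""
    values = np.asarray(values, dtype=float)
    keep = np.isnan(values).mean(axis=1) <= max_missing
    return values[keep]


def knn_impute(values, k=10):
    """Fill each missing entry with the mean of the k most similar features (rows)."""
    x = np.asarray(values, dtype=float)
    filled = x.copy()
    for i in np.where(np.isnan(x).any(axis=1))[0]:
        miss = np.isnan(x[i])
        candidates = []
        for j in range(x.shape[0]):
            shared = ~miss & ~np.isnan(x[j])          # samples observed in both rows
            if j != i and shared.any():
                d = np.mean((x[i, shared] - x[j, shared]) ** 2)
                candidates.append((d, j))
        neighbours = [j for _, j in sorted(candidates)[:k]]
        for c in np.where(miss)[0]:
            vals = [x[j, c] for j in neighbours if not np.isnan(x[j, c])]
            # Fall back to the row's own observed mean if no neighbour has a value.
            filled[i, c] = np.mean(vals) if vals else np.nanmean(x[i])
    return filled
```

In this ordering, gene-level averaging happens before imputation, matching the sequence described above; the imputation neighbours are computed from the original (unimputed) matrix.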

Statistical method

Similarity network fusion (SNF)

SNF [14] is a computational method that integrates multi-omics data by constructing and fusing patient similarity networks. To keep this work self-contained, we briefly introduce the SNF algorithm, which is implemented in three steps.

Step 1 generates a similarity network from each omics data type separately. Suppose we have $n$ samples and $m$ data sources. A patient similarity network is represented as a graph $G = (V, E)$, where the vertices $V$ correspond to patients and the edges $E$ represent the similarities between pairs of nodes (i.e., patients). Edge weights are represented by an $n \times n$ similarity matrix $W$, where $W(i,j)$ indicates the similarity between patients $x_i$ and $x_j$ and is computed as

$$W(i,j) = \exp\left(-\frac{\rho^2(x_i, x_j)}{\mu\,\varepsilon_{i,j}}\right),$$

where $\rho(x_i, x_j)$ is the Euclidean distance between patients $x_i$ and $x_j$, and $\mu$ is a hyperparameter that can be set empirically (we set $\mu = 0.5$). The scaling term is $\varepsilon_{i,j} = \frac{\operatorname{mean}(\rho(x_i, N_i)) + \operatorname{mean}(\rho(x_j, N_j)) + \rho(x_i, x_j)}{3}$, where $\operatorname{mean}(\rho(x_i, N_i))$ is the average distance between $x_i$ and each of its neighbors $N_i$. To compute the fused similarity matrix from multiple data sources, Wang et al. [14] defined a full kernel $P$ as

$$P(i,j) = \begin{cases} \dfrac{W(i,j)}{2\sum_{k \neq i} W(i,k)}, & j \neq i,\\[4pt] \dfrac{1}{2}, & j = i, \end{cases}$$

so that $\sum_{j} P(i,j) = 1$ holds.

Step 2 introduces a sparse local kernel $S$ based on the $K$ nearest neighbors (KNN) to further reduce noise between patients: $S$ retains only the similarities between each sample and its $K$ nearest neighbors $N_i$, while similarities between distant samples are discarded (set to 0), i.e.,

$$S(i,j) = \begin{cases} \dfrac{W(i,j)}{\sum_{k \in N_i} W(i,k)}, & j \in N_i,\\[4pt] 0, & \text{otherwise}. \end{cases}$$

Step 3 iteratively updates the similarity networks. Specifically, let

$$P^{(v)}_{t+1} = S^{(v)} \times \left(\frac{\sum_{k \neq v} P^{(k)}_{t}}{m-1}\right) \times \left(S^{(v)}\right)^{T}, \quad v = 1, \ldots, m,$$

where $P^{(v)}$ represents the similarity matrix derived from the $v$th data type; $S^{(v)}$ represents the local affinity, which contains the nearest neighbors' information; $P^{(c)} = \frac{1}{m}\sum_{v=1}^{m} P^{(v)}$ represents the fused network after convergence; and $m$ is the number of data types. As the iterations proceed, the similarity networks from different sources become increasingly similar, which achieves the network fusion.
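The three SNF steps can be sketched in NumPy as follows, assuming each data type is a samples-by-features matrix for the same patients in the same order. The function names, the neighbourhood size k = 20 and the iteration count t = 20 are illustrative assumptions (only μ = 0.5 comes from the text), at least two data types are required, and the final symmetrisation is an added convenience rather than part of the update rule. Published SNF software may apply additional normalisation steps that this sketch omits.

```python
import numpy as np


def affinity_matrix(x, k=20, mu=0.5):
    """Step 1: scaled exponential similarity W(i, j) from a samples-by-features matrix."""
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))             # Euclidean distance rho(x_i, x_j)
    mean_knn = np.sort(dist, axis=1)[:, 1:k + 1].mean(axis=1)  # mean distance to k neighbours
    eps = (mean_knn[:, None] + mean_knn[None, :] + dist) / 3.0
    return np.exp(-dist ** 2 / (mu * eps))


def full_kernel(w):
    """P: off-diagonal entries of each row sum to 1/2, diagonal fixed at 1/2."""
    off = w - np.diag(np.diag(w))
    p = off / (2.0 * off.sum(axis=1, keepdims=True))
    np.fill_diagonal(p, 0.5)
    return p


def sparse_kernel(w, k=20):
    """Step 2, S: keep each row's k largest similarities (self included), renormalised."""
    s = np.zeros_like(w)
    for i in range(w.shape[0]):
        nbrs = np.argsort(-w[i])[:k]
        s[i, nbrs] = w[i, nbrs] / w[i, nbrs].sum()
    return s


def snf(datasets, k=20, t=20, mu=0.5):
    """Step 3: cross-diffuse the full kernels through each data type's sparse kernel."""
    ws = [affinity_matrix(x, k, mu) for x in datasets]
    ps = [full_kernel(w) for w in ws]
    ss = [sparse_kernel(w, k) for w in ws]
    m = len(ps)
    for _ in range(t):
        ps = [
            ss[v] @ (sum(ps[u] for u in range(m) if u != v) / (m - 1)) @ ss[v].T
            for v in range(m)
        ]
    fused = sum(ps) / m
    return (fused + fused.T) / 2.0                         # symmetrise (added convenience)
```

Each iteration replaces every network with its own sparse kernel applied to the average of the other networks, which is why the networks drift toward each other.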

Denoising SNF with network enhancement (ne-SNF)

To improve the signal-to-noise ratio of the SNF-fused network, we incorporated the NE strategy [16] to denoise it, an approach we term ne-SNF. Given the fused similarity network $W$ initially obtained with SNF, NE constructs a symmetric doubly stochastic matrix (DSM) in the following two steps:

$$P_{ij} \leftarrow \frac{\mathbb{I}\{j \in N_i\}\, W_{ij}}{\sum_{k \in N_i} W_{ik}}, \qquad T_{ij} \leftarrow \sum_{k=1}^{n} \frac{P_{ik}\, P_{jk}}{\sum_{v=1}^{n} P_{vk}},$$

where $\mathbb{I}\{\cdot\}$ is an indicator function, $N_i$ represents the set of $i$'s $K$ nearest neighbors (including $i$ itself) in $W$, and $T$ is a symmetric DSM that encodes the local structure of the original network. The subsequent diffusion process using $T$ is defined as [16]

$$W_{t+1} = \alpha\, T \times W_t \times T + (1 - \alpha)\, T,$$

where $t$ is the iteration step and $\alpha$ is a regularization parameter. The initial value $W_0$ can be set to the SNF-fused similarity matrix. $W_t$ remains a symmetric DSM at each iteration $t$ and converges to a non-trivial equilibrium symmetric DSM network. NE thus defines a diffusion process that uses random walks of length three or less, together with a form of regularized information flow, to denoise the input network [16]. Following Wang et al. [16], NE has three properties: 1) if the original eigenvalues are either 0 or 1, the process preserves them; 2) letting $(\lambda_0, v_0)$ denote an eigen-pair of the initial DSM $T$, the final converged graph has eigen-pair $(f(\lambda_0), v_0)$, where $f(\lambda_0) = \frac{(1-\alpha)\lambda_0}{1-\alpha\lambda_0^2}$; because $\frac{1-\alpha}{1-\alpha\lambda_0^2} \le 1$ for $0 \le \lambda_0 \le 1$, NE always decreases the eigenvalues; and 3) although all eigenvalues are reduced, $f$ denoises the input by down-weighting smaller eigenvalues more aggressively than larger ones. Thus, the resulting undirected network strengthens the similarity between related nodes, leading to better subtyping performance.
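The two NE steps and the diffusion update can be sketched in NumPy as below. This covers only the core operator described above; the function names, the defaults k = 20, alpha = 0.9 and n_iter = 200, and the convention that N_i is node i plus its k − 1 most similar other nodes are assumptions, and the published NE code may include extra pre- or post-processing not shown here. The input is assumed to be a nonnegative, symmetric similarity matrix such as the SNF output.

```python
import numpy as np


def transition_dsm(w, k=20):
    """Build the symmetric doubly stochastic matrix T from a similarity matrix w."""
    w = np.asarray(w, dtype=float)
    n = w.shape[0]
    p = np.zeros_like(w)
    for i in range(n):
        # N_i: node i itself plus its k-1 most similar other nodes (assumed convention).
        others = [j for j in np.argsort(-w[i]) if j != i][: k - 1]
        nbrs = np.array(others + [i])
        p[i, nbrs] = w[i, nbrs] / w[i, nbrs].sum()
    col = p.sum(axis=0)
    col = np.where(col > 0, col, 1.0)                 # guard against empty columns
    return (p / col) @ p.T                            # T_ij = sum_k P_ik P_jk / sum_v P_vk


def network_enhancement(w, k=20, alpha=0.9, n_iter=200):
    """Iterate W_{t+1} = alpha * T W_t T + (1 - alpha) * T, starting from W_0 = w."""
    t = transition_dsm(w, k)
    w_t = np.asarray(w, dtype=float)
    for _ in range(n_iter):
        w_t = alpha * t @ w_t @ t + (1 - alpha) * t
    return w_t
```

Because T has spectral radius 1, the term alpha * T W_t T shrinks by at least a factor alpha per step, so the iteration converges to (1 − alpha) T (I − alpha T²)⁻¹, whose eigenvalues are exactly f(λ₀) from the eigenvalue property above.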

Spectral clustering

Given the ne-SNF denoised similarity network $W$, we used spectral clustering [20] to obtain network clusters. Spectral clustering can effectively capture the global structure of a graph [21] and proceeds as follows. Each sample $x_i$ is associated with a label indicator $y_i \in \{0,1\}^{c}$, where $y_{ik} = 1$ if patient $x_i$ belongs to the $k$th cluster and $y_{ik} = 0$ otherwise. A partition matrix $Y = [y_1, \ldots, y_n]^{T} \in \{0,1\}^{n \times c}$ then represents a clustering scheme. Spectral clustering based on the similarity matrix $W$ minimizes the objective function

$$\min_{F}\ \operatorname{tr}\left(F^{T} L F\right) \quad \text{s.t.} \quad F^{T}F = I,$$

where $F = Y\left(Y^{T}Y\right)^{-1/2}$ and $L = D - W$ is the graph Laplacian, in which $D$ is a diagonal matrix whose diagonal elements are the row sums of $W$.
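A minimal sketch of the procedure, assuming the unnormalised Laplacian L = D − W named above: relax the discrete indicator matrix to the c eigenvectors of L with the smallest eigenvalues, then discretise their rows with k-means. The function name, the farthest-point initialisation and the plain Lloyd iterations are illustrative choices; the software the authors used may use a normalised Laplacian or a different discretisation.

```python
import numpy as np


def spectral_clustering(w, c, n_iter=100):
    """Cluster n samples into c groups from a symmetric similarity matrix w (n x n)."""
    w = np.asarray(w, dtype=float)
    lap = np.diag(w.sum(axis=1)) - w              # L = D - W (unnormalised Laplacian)
    _, vecs = np.linalg.eigh(lap)                 # eigenvalues in ascending order
    f = vecs[:, :c]                               # relaxed minimiser of tr(F^T L F), F^T F = I

    # Discretise the rows of F with Lloyd's k-means, seeded by farthest-point initialisation.
    centres = [f[0]]
    for _ in range(1, c):
        d2 = np.min([((f - ctr) ** 2).sum(axis=1) for ctr in centres], axis=0)
        centres.append(f[d2.argmax()])
    centres = np.array(centres)

    labels = np.zeros(len(f), dtype=int)
    for _ in range(n_iter):
        dist = ((f[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        new_centres = np.array([
            f[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(c)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels
```

For example, a 6 × 6 matrix with two dense 3 × 3 blocks and weak (0.01) cross-block similarity yields one label for the first three samples and another for the last three.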

Evaluation of prognosis of HFpEF patients

We used survival analysis to evaluate the subtyping results and to assess whether the identified subgroups are clinically meaningful in terms of survival. The Kaplan-Meier survival curve describes how the cumulative survival rate changes over time and is monotonically non-increasing [22]. The abscissa represents survival time, and the ordinate represents cumulative survival probability. At $t = 0$ the survival probability is 1, and it gradually decreases as time increases. The steepness of the decline reflects the death rate, so the survival curves give an intuitive comparison of survival risk across subgroups. In addition, the log-rank test was used to test for differences between survival curves. Controlling for age, gender, smoking history, drinking, hypertension, hyperlipidemia and body mass index, we fitted Cox regression models to explore the association between subtypes and HFpEF survival outcomes and to identify patients at high risk of poor prognosis. Demographic and clinical features were compared between subtypes using the Chi-squared test, two-sample t-test or Mann-Whitney U test. We further identified significantly differentially expressed (DE) mRNAs (DEmRNAs) and DE miRNAs (DEmiRNAs) between subtypes using the significance analysis of microarrays (SAM) tool [23], and used the R limma package [24] to identify aberrantly methylated genes. We presented the top 15 mRNA, DNA methylation and miRNA features most relevant to the final clustering results and performed a series of functional analyses on these features using DAVID [25], including GO [26] enrichment analysis and KEGG [27] pathway analysis. miRTarBase (http://mirtarbase.cuhk.edu.cn/php/search.php) [28] was used to retrieve the target genes of the DEmiRNAs.
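The Kaplan-Meier estimator and the two-group log-rank test described above are short enough to write out directly. The sketch below is a minimal pure-Python illustration (function names are hypothetical; the Cox model is not reproduced here): events are coded 1 for death and 0 for censoring, and the log-rank p-value uses the chi-square distribution with 1 degree of freedom.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time.

    events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data[i:] if tt == t]   # all records at time t
        deaths = sum(tied)
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= len(tied)
        i += len(tied)
    return curve


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})
    obs_minus_exp = 0.0
    var = 0.0
    for t in event_times:
        n = sum(1 for tt, _, _ in pooled if tt >= t)                 # at risk overall
        n_a = sum(1 for tt, _, g in pooled if tt >= t and g == 0)    # at risk in group A
        d = sum(1 for tt, e, _ in pooled if tt == t and e == 1)      # deaths overall
        d_a = sum(1 for tt, e, g in pooled if tt == t and e == 1 and g == 0)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of chi-square with 1 df
    return chi2, p
```

For instance, with times [1, 2, 2, 3] and events [1, 0, 1, 1], the curve steps to 0.75, 0.5 and 0; the censored patient at t = 2 leaves the risk set without counting as a death.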

Results

Characteristics of HFpEF in FHS

In this study, we used data from all 125 HFpEF patients, aged 51 to 88 years (mean age 74.8 years). By the last follow-up, 57 patients had survived and 68 had died; survival time ranged from 0 to 29.6 years. Their baseline characteristics are shown in Table 1.

Table 1

Baseline characteristics of the study population (N = 125).

Item	Classification	n (%)/mean ± SD
Age, years		74.81 ± 8.35
Gender	Female	48(38.4)
	Male	77(61.6)
Smoking history	Yes	18(14.4)
	No	107(85.6)
Drinking	Yes	63(50.4)
	No	62(49.6)
Comorbidities
Hypertension	Yes	91(72.8)
	No	34(27.2)
Hyperlipidemia	Yes	78(62.4)
	No	47(37.6)
Diabetes	Yes	33(26.4)
	No	92(73.6)
Chronic kidney disease	Yes	19(15.2)
	No	106(84.8)
Vital signs and laboratory data
Systolic blood pressure, mmHg		132.86 ± 20.37
Diastolic blood pressure, mmHg		66.75 ± 11.35
Body mass index, kg/m²		30.36 ± 5.52
Serum creatinine, mg/dl		1.25 ± 0.91
Heart rate, bpm		63.33 ± 11.77
BNP, median (IQR), pg/ml		1036.5(947.4)
LVEF, %		59.82 ± 12.61


Subtyping of HFpEF using ne-SNF

We applied ne-SNF to the 125 HFpEF patients using mRNA expression (17,358 genes), miRNA expression (212 miRNAs) and DNA methylation (27,604 genes) data. Fig. 1 shows the flowchart of the ne-SNF analysis.

Fig. 1

Schematic representation of the pipeline for the proposed ne-SNF method.

We compared the ne-SNF results with those obtained using single data types and two other integration strategies, unsupervised multiple kernel learning (UMKL) [29] and SNF (Fig. 2). As expected, networks built from individual data types yielded quite different patterns of patient similarity, whereas the fused networks clustered tightly and showed broadly similar patterns. After the ne-SNF denoising step, the clustering result clearly shows three distinct sub-clusters.

Fig. 2

Heatmaps of the similarity matrices derived from single data types (mRNA, DNA methylation and miRNA) and from different integration methods (UMKL, SNF and ne-SNF). In each similarity matrix, the color shade represents the degree of patient-patient similarity: the darker the color, the higher the similarity between two individuals. Patients with high mutual similarity define a subtype group, called a subgroup.

We further report log-rank p-values to evaluate the significance of differences in survival profiles between subgroups (Fig. 3). Analyses based on a single data type did not yield significantly different survival profiles, except for the analysis using miRNA expression, whereas the fused networks yielded significant survival differences between subtypes in most cases. Although the smallest p-value was observed with 3 clusters when using the miRNA data alone, the corresponding heatmap does not show a clear three-cluster pattern (see Fig. 2). Among the three fusion methods (UMKL, SNF and ne-SNF), ne-SNF achieved the smallest p-value (P = 0.0059), with 3 clusters.

Fig. 3

Comparison of −log10(p-value) based on single data type, UMKL, SNF and ne-SNF under different number of clusters (2,3,4 and 5).

We further explored the association between HFpEF prognosis and the three subgroups identified by ne-SNF. Group 1 comprised 75 patients (60%) with a 5-year mortality rate of 34.5%, group 2 comprised 29 patients (23.2%) with a 5-year mortality rate of 29.6%, and group 3 comprised 21 patients (16.8%) with a 5-year mortality rate of 63.3%. Survival did not differ significantly between group 1 and group 2 (χ² = 0.499, P = 0.48). We therefore combined groups 1 and 2 into a low-risk group and defined group 3 as the high-risk group. Table 2 shows the 5-year mortality of the two groups. The high-risk group comprised fewer patients (21 patients, 16.8%) but had a higher 5-year mortality rate (63.3%), whereas the low-risk group had a 5-year mortality rate of 33.0%. The survival curves and the first three principal components (PCs) of the two subtypes are shown in Fig. 4. The survival curves (left) show that the two ne-SNF subtypes differed significantly in clinical prognosis: survival probability in the high-risk group was significantly lower than in the low-risk group (P = 0.0017). In addition, the 3D scatter plot of the top three PCs (right) separates the two subtypes (shown in different colors).

Table 2

The 5-year mortality of low-risk group and high-risk group.

Item	Low-risk group		High-risk group
Total, n (%)	104 (83.2)		21 (16.8)
5-year mortality (%)	33.0		63.3
χ²		9.835
P-value		0.002

Fig. 4

Kaplan-Meier survival curves for the low- and high-risk groups (left), and 3D scatter plots of the first three PCs (right).


Association of prognosis with identified molecular subtypes

Controlling for age, gender, smoking history, drinking, hypertension, hyperlipidemia and body mass index, we fitted a Cox regression model to explore the association between the two subtypes and HFpEF survival outcomes. The results are shown in Table 3. Patients in the high-risk group had a 2.43-fold higher hazard of death than those in the low-risk group (HR = 2.432, 95% CI (1.400, 4.227)). Drinking was the only significant covariate (P = 0.04), and it was associated with a lower hazard of death (HR = 0.570).
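The hazard ratio, Wald statistic and confidence interval in Table 3 all follow from a Cox coefficient and its standard error: HR = exp(β), Wald z = β/SE, and 95% CI = exp(β ± 1.96·SE). The short sketch below (a hypothetical helper, not the authors' code) reproduces the subtype row from its reported coefficient; small differences in the third decimal come from rounding of the published β and SE.

```python
import math


def cox_summary(coef, se, z_crit=1.96):
    """Hazard ratio, Wald z, two-sided p-value and 95% CI from a Cox coefficient and its SE."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal p-value
    hr = math.exp(coef)
    ci = (math.exp(coef - z_crit * se), math.exp(coef + z_crit * se))
    return hr, z, p, ci


# Subtype row of Table 3: coefficient 0.890, SE 0.282.
hr, z, p, ci = cox_summary(0.890, 0.282)
```

This gives HR ≈ 2.44, z ≈ 3.16, p ≈ 0.0016 and a CI of roughly (1.40, 4.23), consistent with the reported values.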

Table 3

List of Cox regression results of 125 HFpEF patients.

Item	Coefficient (SE)	Wald	P	HR	95% CI
Subtypes*	0.890 (0.282)	3.153	0.002	2.432	(1.400,4.227)
Age	0.021 (0.016)	1.313	0.189	1.021	(0.990,1.053)
Gender	−0.239 (0.280)	−0.856	0.392	0.787	(0.455,1.361)
Smoking history	0.308 (0.331)	0.933	0.351	1.361	(0.712,2.602)
Drinking*	−0.562 (0.286)	−1.966	0.040	0.570	(0.325,0.998)
Hypertension	0.189 (0.394)	0.481	0.630	1.209	(0.599,2.614)
Hyperlipidemia	−0.108 (0.331)	−0.326	0.745	0.898	(0.469,1.718)
Body mass index	−0.037 (0.025)	−1.489	0.136	0.964	(0.919,1.012)

*Showing statistical significance at the 0.05 significance level.

Clinical characteristics also differed between the low- and high-risk groups. As shown in Table 4, hypertension, hyperlipidemia and chronic kidney disease were reported more frequently in the high-risk group.

Table 4

Clinical and laboratory characteristics stratified by subtypes.

Characteristic	Low-risk group	High-risk group	χ²/t	p-value
Age, years	75.08 ± 8.59	73.52 ± 7.37	0.773	0.441
Female, n (%)	42(40.4)	6(28.6)	1.031	0.310
Comorbidities, n (%)
Hypertension*	72(68.9)	19(90.9)	4.422	0.035
Hyperlipidemia*	61(58.3)	17(81.8)	4.291	0.038
Diabetes	28(26.9)	5(23.8)	0.087	0.768
Chronic kidney disease*	12(11.7)	7(31.8)	4.263	0.039
Vital signs and laboratory data
Systolic blood pressure, mmHg	133.90 ± 20.35	127.6 ± 20.65	1.278	0.204
Diastolic blood pressure, mmHg	66.59 ± 11.71	67.57 ± 9.92	−0.360	0.720
Body mass index, kg/m²	30.43 ± 5.49	30.05 ± 5.91	0.282	0.778
Serum creatinine, mg/dl	1.26 ± 0.98	1.19 ± 0.41	0.313	0.755
Heart rate, bpm	63.68 ± 11.78	61.57 ± 12.13	0.746	0.457
BNP, median (IQR), pg/ml	1003.5(979.9)	1074.99(965.3)	−0.746	0.456
LVEF, %	59.79 ± 12.62	59.96 ± 12.87	−0.058	0.954

Categorical variables are presented as counts and percentages, continuous variables are presented as mean ± SD.

*Showing statistical significance at the 0.05 significance level.


Functional annotation of important features between HFpEF subtypes

Focusing on the low- and high-risk groups, we conducted a DE analysis for each omics data type, using the low-risk group as the control group. At an FDR q-value < 0.05, a total of 157 DEmRNAs were identified (95 up-regulated and 62 down-regulated), along with 121 DEmiRNAs (10 up-regulated and 111 down-regulated). We also identified 2199 aberrantly methylated genes (1627 hypermethylated and 572 hypomethylated) using the criteria of Bonferroni-adjusted P < 0.05 and > 2. A heatmap of the differential omics features between the low- and high-risk groups is shown in Fig. 5.
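The two multiple-testing corrections mentioned above can be illustrated as follows. Note that SAM estimates its FDR by permutation; the sketch swaps in the simpler Benjamini-Hochberg step-up procedure purely to show how q-values relate to raw p-values, alongside the Bonferroni adjustment used for the methylation data. Function names are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values), returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def bonferroni(pvalues):
    """Bonferroni-adjusted p-values, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

For p-values [0.01, 0.04, 0.03, 0.005], BH gives q-values [0.02, 0.04, 0.04, 0.02], while Bonferroni gives [0.04, 0.16, 0.12, 0.02]; Bonferroni is the stricter of the two.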

Fig. 5

The heatmap of DEmRNAs, abnormal methylations and DEmiRNAs between the low- and high-risk groups. Each row represents an individual feature and each column represents a patient. Red and blue represent relatively high and low expression, respectively, with color intensity representing the magnitude of high/low expression. The heatmap indicates that the high- and low-risk groups of HFpEF patients are highly heterogeneous across the three data types.

To further demonstrate the biological significance of the features associated with the two subtypes, we selected the top 15 significant features among the mRNAs, methylation genes and miRNAs for further analysis. If these features differ between the two subgroups, their biological roles can help confirm that the subtypes are biologically, and not only clinically, meaningful. First, we constructed a heatmap of the 45 selected features across the two subtypes (see Fig. 6). For mRNA expression, SLC25A24, MTERF1, SCLT1, USP1, ARHGAP18, SH3BGRL2, MTURN and NEIL3 showed low expression in the low-risk group and the opposite pattern (high expression) in the high-risk group. In the methylation data, the low-risk group showed hypermethylation of IMPG2 and SPATA5L1, while the high-risk group showed hypermethylation of the other 13 genes. The miRNA heatmap showed decreased expression of hsa-miR-223-3p, hsa-miR-126-5p, hsa-miR-454-3p, hsa-miR-590-5p, hsa-miR-186-5p-a1, hsa-miR-186-5p-a2, hsa-miR-19a-3p, hsa-miR-222-3p, hsa-miR-374b-5p, hsa-miR-144-5p, hsa-miR-16-5p, hsa-miR-106a-5p and hsa-miR-17-5p, which correlated with decreased survival.

Fig. 6

The heatmap of the selected top15 mRNAs, methylation genes and miRNAs that are associated with the two risk subgroups.

Second, focusing on each feature individually, we performed a log-rank survival analysis and found that more than one-third of the features (16 of 45) separated patients into groups with significantly different survival (P < 0.05), including three mRNAs (EIF4A1, SH3BGRL2, MTERF1), seven methylation genes (ZNF689, IMPG2, SPATA5L1, LRRIQ1, ATP6V1G2, PYDC2, RAC3) and six miRNAs (hsa-miR-186-5p-a2, hsa-miR-186-5p-a1, hsa-miR-19a-3p, hsa-miR-17-5p, hsa-miR-106a-5p, hsa-miR-374b-5p). Fig. 7 shows the Kaplan-Meier survival curves of the top 3 most significant features among the mRNAs, methylation genes and miRNAs. These results suggest that the identified biomarkers may have direct prognostic value.

Fig. 7

Survival curves of the top 3 most significant features among mRNAs, methylations and miRNAs: the top 3 mRNAs, EIF4A1, SH3BGRL2 and MTERF1 (A-C); the top 3 methylation genes, IMPG2, PYDC2 and ATP6V1G2 (D-F); and the top 3 miRNAs, hsa-miR-19a-3p, hsa-miR-186-5p-a1 and hsa-miR-186-5p-a2 (G-I).

To determine the functional relevance of the selected features, we merged the identified mRNAs, methylation genes and miRNA target genes into a core gene set and performed GO enrichment and KEGG pathway analyses. The miRNA target genes were obtained from miRTarBase, a database of experimentally validated miRNA-target interactions. The final core gene set contained 241 genes: 15 mRNAs, 15 methylation genes and 211 miRNA target genes. The core gene set was enriched in 17 GO biological processes (Bonferroni-adjusted p-value < 0.05), and Fig. 8 depicts all significant GO terms. In the KEGG pathway analysis, 45 pathways were statistically significant (Bonferroni-adjusted p-value < 0.05), and Fig. 9 shows the 17 most significantly enriched KEGG pathways.
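Over-representation of a GO term or KEGG pathway in a gene list is commonly assessed with a one-sided hypergeometric (Fisher exact) test, followed here by a Bonferroni correction across terms. The sketch below shows that calculation; it is an illustration rather than DAVID's implementation (DAVID reports a modified Fisher exact "EASE" score), and the function names and any background sizes you pass in are assumptions.

```python
from math import comb


def enrichment_p(k, n_list, n_term, n_background):
    """One-sided hypergeometric p-value P(X >= k) for over-representation of a term.

    k: list genes annotated to the term; n_list: size of the gene list (e.g. 241 core genes);
    n_term: background genes annotated to the term; n_background: background size.
    """
    total = comb(n_background, n_list)
    upper = min(n_list, n_term)
    hits = sum(
        comb(n_term, x) * comb(n_background - n_term, n_list - x)
        for x in range(k, upper + 1)
    )
    return hits / total


def bonferroni_adjust(p, n_tests):
    """Bonferroni-adjusted p-value for one of n_tests terms, capped at 1."""
    return min(1.0, p * n_tests)
```

Python's arbitrary-precision integers keep the binomial coefficients exact even for genome-sized backgrounds, so only the final division is done in floating point.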

Fig. 8

GO biological process enrichment analysis of 241 core genes.

Fig. 9

KEGG enrichment analysis of 241 core genes.


Discussion

Classification of HFpEF based on ne-SNF

HFpEF is a highly heterogeneous disease with no effective treatment so far. Identifying a high-risk group with the aid of molecular data can provide meaningful clinical support for effective therapeutic treatment of HFpEF patients. Shah [30] comprehensively reviewed precision medicine for HFpEF. While the reviewed works mostly focus on clinical data from a deep-phenotyping perspective, precision medicine has shifted substantially toward omics-based, data-driven studies. In this work, we identified subtypes of HFpEF patients using multi-omics data and proposed the ne-SNF method to improve SNF for subtype identification. HFpEF patients were further classified into two groups (low-risk and high-risk) with markedly different survival. The high-risk group made up a smaller proportion of the study population (16.8%) but had a higher 5-year mortality rate (63.3%) than the low-risk group (33.0%). Furthermore, after adjusting for covariates, HFpEF patients in the high-risk group had a 2.43-fold higher hazard of death than those in the low-risk group. The high-risk population and molecular markers identified in this analysis may help guide the development of novel targeted therapies. Our ne-SNF method combines the advantages of SNF and of network denoising with the network enhancement technique [16]. Given the SNF-fused similarity network, we applied NE as a preprocessing step for spectral clustering, thereby denoising the fused similarity network and improving clustering performance. Compared with single data types and other integration methods, ne-SNF generated more meaningful HFpEF subgroups; it thus provides a novel pipeline for disease subtype identification in integrated analysis of multi-omics data.

HFpEF was associated with chronic comorbidities

The subtypes obtained with ne-SNF demonstrated both biological and clinical relevance for HFpEF patients. The identified high-risk group had high 5-year mortality and was commonly associated with hypertension, hyperlipidemia and chronic kidney disease. This result suggests that HFpEF combined with chronic comorbidities may be a major contributor to patient death. Individuals with hypertension are at high risk of HFpEF, and multimorbidity is common in HFpEF [31]. HFpEF patients with high rates of hyperlipidemia and chronic kidney disease have been reported to have a high rate of adverse outcomes [5]. This implies that more effort should be devoted to treating HFpEF patients with chronic comorbidities.

Role of molecular biomarkers in the pathophysiology of HFpEF

Our study discovered 157 DEmRNAs, 121 DEmiRNAs and 2199 abnormal methylations for HFpEF. Among the DEmRNAs, 95 were up-regulated and 62 were down-regulated. Among the DEmiRNAs, 10 were up-regulated and 111 were down-regulated. Among the 2199 abnormal methylations, 1627 were hypermethylated and 572 were hypomethylated. Among the DEmRNAs, NEIL3 is of note: heart failure patients have elevated myocardial NEIL3 expression, and NEIL3 has been reported to regulate LDL-C and HDL-C levels and to be associated with traditional cardiovascular risk factors [32], [33]. The expression of MMP13 was significantly increased, while that of VEGF was significantly decreased, in the high-risk group; significant expression changes in these two HF-associated genes were recently reported in a study of Dahl salt-sensitive rats [34]. In addition, LUM, which encodes lumican, was up-regulated in the high-risk group. Lumican is an extracellular-matrix proteoglycan that binds collagen and is associated with inflammatory conditions [35]. Regarding the differentially methylated genes, previous studies have shown that DNA methylation can regulate the expression of genes related to cardiovascular disease and thereby affect its onset and progression. In our study, MYBPC3 showed higher methylation levels in the high-risk group, which may silence MYBPC3; MYBPC3 ablation causes defective diastolic relaxation and may affect an individual's susceptibility to HFpEF [36]. miRNAs have been shown to be attractive therapeutic targets: inhibiting an miRNA or stimulating its activity can influence the expression of many genes, which could produce substantial therapeutic effects compared with standard drug treatments [37]. In our analysis, hsa-miR-106a-5p, hsa-miR-193b-3p, hsa-miR-193b-5p, hsa-miR-191-5p and hsa-miR-660-5p were significantly differentially expressed between the low- and high-risk groups.
Existing literature revealed that these miRNAs were associated with several mechanisms of potential HF [38], [39]. In addition, some HFpEF-related genes identified in this study were found to be tumor-related. For example, gene ARHGAP18 was shown to be overexpressed in highly migratory triple-negative breast cancer cells; gene CLIC4 was shown to be a diagnostic biomarker for different types of epithelial ovarian cancer [40], [41]. Low-grade inflammation and increased oxidative stress were the main cause of cardiovascular disease, including coronary artery disease and heart failure [42]. While inflammatory responses accompanied by oxidative stress may cause cancer [43], the cross phenomenon between HF and cancer may share common pathological mechanism, which needs to be further studied. Some of the DE genes were enriched in 17 GO biological process terms. These terms have been shown to be correlated with HF. For example, intrinsic apoptotic signaling pathway responds to DNA damage and shows positive regulation of apoptotic process and replicative senescence participated in the full life-cycle of HF [44], [45]. In addition, transmembrane receptor protein tyrosine kinase is involved in myocardial damage in HF [46]. For KEGG analysis, a total of 45 pathways were identified. In particular, MAPK signaling pathway showed statistical significance. MAPK signaling pathway induces proliferation response and apoptotic response which contribute to the augmented activity in HF, consequently promoting inflammation and renin-angiotensin system activity in regions with key cardiovascular regulations [47]. HIF-1 signaling pathway was shown to participate in the process of myocardial fibrosis, thereby impairing cardiac diastolic function; Ras signaling pathway played a role in the pathogenesis of diastolic dysfunction in HFpEF [48], [49]. In addition, Zhang et al. 
[50] revealed that genetic risk factors are involved in p53 signaling pathway of HFpEF based on gene set enrichment analysis of multi-omics data.
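GO and KEGG enrichment results like the 17 biological process terms and 45 pathways above are commonly assessed with an over-representation test. With N background genes, K of them annotated to a term, and n DE genes of which x fall in that term, the p-value is the hypergeometric upper tail P(X >= x) = sum over i from x to min(K, n) of C(K, i)·C(N - K, n - i) / C(N, n), followed by a multiple-testing correction across terms. This section does not name the enrichment tool used, so the pure-Python sketch below (including the Benjamini-Hochberg step, the function names and the toy counts) is an illustrative assumption.

```python
from math import comb


def hypergeom_upper_tail(x: int, N: int, K: int, n: int) -> float:
    """P(X >= x) for X ~ Hypergeometric(N background genes, K in term, n DE genes)."""
    hits = range(max(x, 0), min(K, n) + 1)
    # Exact integer arithmetic: Python ints do not overflow, division gives a float.
    return sum(comb(K, i) * comb(N - K, n - i) for i in hits) / comb(N, n)


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values (FDR), returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical example: 20,000 background genes and 157 DE genes; (K, x) per term.
terms = {"term_A": (200, 10), "term_B": (500, 5), "term_C": (50, 1)}
pvals = [hypergeom_upper_tail(x, 20_000, K, 157) for K, x in terms.values()]
for name, p, q in zip(terms, pvals, benjamini_hochberg(pvals)):
    print(f"{name}: p = {p:.3g}, BH-adjusted = {q:.3g}")
```

The upper tail is used because enrichment asks whether a term contains more DE genes than expected by chance; a term with x = 0 therefore always gets p = 1.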

Can molecular biomarkers help direct future diagnosis and therapy for HFpEF?

Many studies have used clinical data to phenotype HFpEF subgroups with machine learning and cluster-mapping approaches, finding that high-risk HFpEF patients have hemodynamic overload, higher NT-proBNP, and renal dysfunction [51], [52], [53]. One limiting factor is that patients with HFpEF often share many overlapping characteristics, making it difficult to assign them to a predesignated clinical subgroup. We therefore took a different approach: deriving subclusters from multi-omics data and then mapping the clusters back to clinical features and biological pathways. The recent emergence of genome-editing technologies has enabled a new paradigm in which the human genome sequence can be modified to achieve therapeutic effects, including adding therapeutic genes at specific genomic sites and removing deleterious genes or genome sequences [54]. The biomarkers identified in this work may provide a basis for such gene therapy. In addition, the HFpEF regulatory pathways found in this study could serve as drug targets in clinical practice, offering new avenues for the treatment of HFpEF. For example, nintedanib, a tyrosine kinase inhibitor, has a wide range of targets, including the vascular endothelial growth factor receptor, the platelet-derived growth factor receptor, and the fibroblast growth factor receptor [55]. Its effect on fibroblasts prevents extracellular matrix (ECM) production and myofibroblast activation [56], [57]; however, it has been little studied in HFpEF. Alternative angiotensin inhibitors appeared safe in phase I studies and could represent a promising future add-on therapy targeting the angiotensin pathway [58]. It should be noted that the molecular features identified in this work need to be biologically validated before clinical application.

Evidence from multiple clinical studies shows that BNP/NT-proBNP are the most reliable indicators for the diagnosis and treatment of HFpEF [59], [60]. However, the role of clinical biomarkers in the diagnosis of HFpEF is currently limited. For example, the HF biomarker NT-proBNP is only moderately increased in HFpEF and has poor predictive value for it [39]. One study showed that microRNA biomarker panels had higher discriminative power for distinguishing HFpEF than plasma NT-proBNP levels alone [38]. Additionally, HFpEF research from the 1980s to the present has shifted from a mechanistic perspective to a multi-organ view of the disease [61], covering pulmonary arterial hypertension, obesity and metabolic disease, vascular stiffness, chronotropic incompetence, skeletal muscle disease, and renal insufficiency [62]. This multi-organ heterogeneity has been blamed for the failure of clinical treatment trials [51]. At present, HFpEF therapy focuses on improving symptoms and treating comorbidities. The identification of novel molecular biomarkers may improve understanding of the pathophysiology of HFpEF, help stratify HFpEF subgroups, and open a new arena for individualized therapy.

Like many other studies, this work has some limitations. First, we were unable to find a second HFpEF multi-omics data set to replicate the biological findings, so reproducibility could not be assessed. Second, our results should be interpreted with caution: we only identified biomarkers associated with HFpEF and could not establish causality, so further experiments are needed to validate the findings functionally. Nevertheless, our proposed network-enhanced similarity network fusion strategy (ne-SNF) for multi-omics data integration shows promising results in denoising the fused network and improving the performance of subtype identification. Given the heterogeneous nature of HFpEF, the identified subtypes and biomarkers could help improve the design of clinical treatment and lead to the development of targeted molecular therapies.
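The ne-SNF strategy can be illustrated with a compact NumPy sketch: build one similarity network per omics layer, fuse the networks by similarity network fusion (SNF), denoise the fused network with a network-enhancement (NE) diffusion, and split patients with spectral clustering. This is not the authors' implementation, and several details are assumptions made here: the locally scaled Gaussian kernel and its constants, the neighbourhood size `k`, the NE parameter `alpha`, the re-normalization after every fusion step, and all function names. The NE step keeps only the core diffusion W <- alpha*T*W*T + (1 - alpha)*T, solved in closed form through the eigenvalues of T, so it is a simplification of the published NE procedure. The clustering step is Ng-Jordan-Weiss spectral clustering with a small deterministic k-means. The toy data are simulated, not the HFpEF cohort.

```python
import numpy as np


def affinity(X: np.ndarray, k: int = 10, mu: float = 0.5) -> np.ndarray:
    """Locally scaled Gaussian similarity between samples (rows of X)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    knn_mean = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)   # skip the zero self-distance
    eps = (knn_mean[:, None] + knn_mean[None, :] + D) / 3.0  # SNF-style local scale
    return np.exp(-D ** 2 / (2.0 * (mu * eps) ** 2))


def normalize(W: np.ndarray) -> np.ndarray:
    """Full kernel: off-diagonal entries of each row sum to 1/2, diagonal set to 1/2."""
    P = W.copy()
    np.fill_diagonal(P, 0.0)
    P /= 2.0 * P.sum(axis=1, keepdims=True)
    np.fill_diagonal(P, 0.5)
    return (P + P.T) / 2.0  # symmetrize


def knn_kernel(W: np.ndarray, k: int) -> np.ndarray:
    """Sparse local kernel: keep each row's k largest entries (self included), rows sum to 1."""
    S = np.zeros_like(W)
    for i in range(W.shape[0]):
        nn = np.argsort(-W[i])[:k]
        S[i, nn] = W[i, nn] / W[i, nn].sum()
    return S


def snf(Ws: list, k: int = 10, t: int = 20) -> np.ndarray:
    """Similarity network fusion: diffuse each view through the average of the others."""
    Ps = [normalize(W) for W in Ws]
    Ss = [knn_kernel(P, k) for P in Ps]
    m = len(Ps)
    for _ in range(t):
        # All views are updated from the previous iteration's networks (synchronous update).
        Ps = [normalize(Ss[v] @ (sum(Ps[u] for u in range(m) if u != v) / (m - 1)) @ Ss[v].T)
              for v in range(m)]
    return normalize(sum(Ps) / m)


def network_enhancement(W: np.ndarray, k: int = 10, alpha: float = 0.9) -> np.ndarray:
    """Simplified NE: closed-form limit of W <- alpha*T@W@T + (1 - alpha)*T."""
    P = knn_kernel(W, k)
    T = (P / P.sum(axis=0)) @ P.T                 # T = P D^-1 P^T, symmetric, doubly stochastic
    lam, U = np.linalg.eigh((T + T.T) / 2.0)      # eigenvalues lie in [0, 1]
    lam = (1.0 - alpha) * lam / (1.0 - alpha * lam ** 2)
    return np.clip(U @ np.diag(lam) @ U.T, 0.0, None)  # clip round-off negatives


def spectral_clusters(W: np.ndarray, n_clusters: int = 2, n_iter: int = 50) -> np.ndarray:
    """Ng-Jordan-Weiss spectral clustering with a small deterministic k-means."""
    d = W.sum(axis=1)
    M = W / np.sqrt(np.outer(d, d))                          # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh((M + M.T) / 2.0)
    Y = vecs[:, -n_clusters:]                                # leading eigenvectors
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    centers = [Y[0]]                                         # farthest-point initialisation
    for _ in range(1, n_clusters):
        gap = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(gap)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None]) ** 2).sum(axis=-1), axis=1)
        centers = np.array([Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                            for c in range(n_clusters)])
    return labels


# Toy two-omics data (hypothetical): 40 patients drawn from two latent subtypes.
rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 20)
mrna = rng.normal(size=(40, 50)) + 2.0 * truth[:, None]
methylation = rng.normal(size=(40, 30)) + 2.0 * truth[:, None]
fused = network_enhancement(snf([affinity(mrna), affinity(methylation)]))
labels = spectral_clusters(fused)
print(labels)
```

Solving the NE diffusion in closed form avoids iterating to convergence: in the eigenbasis of T, each eigenvalue λ is mapped to (1 - alpha)λ / (1 - alpha·λ²), which leaves λ = 1 unchanged and shrinks smaller eigenvalues much more strongly, making the cluster structure of the fused network more pronounced before the spectral step.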

CRediT authorship contribution statement

Yongqing Wu: Formal analysis, Methodology, Software, Writing - original draft. Huihui Wang: Formal analysis. Zhi Li: Visualization. Jinfang Cheng: Conceptualization. Ruiling Fang: Visualization. Hongyan Cao: Conceptualization, Supervision, Writing - review & editing. Yuehua Cui: Conceptualization, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

1. Missing value estimation methods for DNA microarrays.

Authors: O Troyanskaya; M Cantor; G Sherlock; P Brown; T Hastie; R Tibshirani; D Botstein; R B Altman
Journal: Bioinformatics Date: 2001-06

2. Phenotype-Specific Treatment of Heart Failure With Preserved Ejection Fraction: A Multiorgan Roadmap.

Authors: Sanjiv J Shah; Dalane W Kitzman; Barry A Borlaug; Loek van Heerebeek; Michael R Zile; David A Kass; Walter J Paulus
Journal: Circulation Date: 2016-07-05

3. Phenotypic spectrum of heart failure with preserved ejection fraction.

Authors: Sanjiv J Shah; Daniel H Katz; Rahul C Deo
Journal: Heart Fail Clin Date: 2014-05-22

4. Heart Failure with Preserved Ejection Fraction.

Authors: Margaret M Redfield
Journal: N Engl J Med Date: 2016-11-10

5. miRTarBase update 2018: a resource for experimentally validated microRNA-target interactions.

Authors: Chih-Hung Chou; Sirjana Shrestha; Chi-Dung Yang; Nai-Wen Chang; Yu-Ling Lin; Kuang-Wen Liao; Wei-Chi Huang; Ting-Hsuan Sun; Siang-Jyun Tu; Wei-Hsiang Lee; Men-Yee Chiew; Chun-San Tai; Ting-Yen Wei; Tzi-Ren Tsai; Hsin-Tzu Huang; Chung-Yu Wang; Hsin-Yi Wu; Shu-Yi Ho; Pin-Rong Chen; Cheng-Hsun Chuang; Pei-Jung Hsieh; Yi-Shin Wu; Wen-Liang Chen; Meng-Ju Li; Yu-Chun Wu; Xin-Yi Huang; Fung Ling Ng; Waradee Buddhakosai; Pei-Chun Huang; Kuan-Chun Lan; Chia-Yen Huang; Shun-Long Weng; Yeong-Nan Cheng; Chao Liang; Wen-Lian Hsu; Hsien-Da Huang
Journal: Nucleic Acids Res Date: 2018-01-04

6. BNP-guided vs symptom-guided heart failure therapy: the Trial of Intensified vs Standard Medical Therapy in Elderly Patients With Congestive Heart Failure (TIME-CHF) randomized trial.

Authors: Matthias Pfisterer; Peter Buser; Hans Rickli; Marc Gutmann; Paul Erne; Peter Rickenbacher; André Vuillomenet; Urs Jeker; Paul Dubach; Hansjürg Beer; Se-Il Yoon; Thomas Suter; Hans H Osterhues; Michael M Schieber; Patrick Hilti; Ruth Schindler; Hans-Peter Brunner-La Rocca
Journal: JAMA Date: 2009-01-28

7. Poly(ADP-ribose) Polymerase (PARP) and PARP Inhibitors: Mechanisms of Action and Role in Cardiovascular Disorders.

Authors: Robert J Henning; Marie Bourgeois; Raymond D Harbison
Journal: Cardiovasc Toxicol Date: 2018-12

8. Transcriptomics of cardiac biopsies reveals differences in patients with or without diagnostic parameters for heart failure with preserved ejection fraction.

Authors: Sarbashis Das; Christoffer Frisk; Maria J Eriksson; Anna Walentinsson; Matthias Corbascio; Camilla Hage; Chanchal Kumar; Michaela Asp; Joakim Lundeberg; Eva Maret; Hans Persson; Cecilia Linde; Bengt Persson
Journal: Sci Rep Date: 2019-02-28

9. Disproportionate Contributions of Select Genomic Compartments and Cell Types to Genetic Risk for Coronary Artery Disease.

Authors: Hong-Hee Won; Pradeep Natarajan; Amanda Dobbyn; Daniel M Jordan; Panos Roussos; Kasper Lage; Soumya Raychaudhuri; Eli Stahl; Ron Do
Journal: PLoS Genet Date: 2015-10-28

10. Relevance of Multi-Omics Studies in Cardiovascular Diseases.

Authors: Paola Leon-Mimila; Jessica Wang; Adriana Huertas-Vazquez
Journal: Front Cardiovasc Med Date: 2019-07-17
