
Establishment of a prediction model of changing trends in cardiac hypertrophy disease based on microarray data screening.

Caiyan Ma¹, Yongjun Ying¹, Tianjie Zhang¹, Wei Zhang¹, Hui Peng¹, Xufeng Cheng¹, Lin Xu¹, Hong Tong¹.

Abstract

The aim of the present study was to construct a mathematical model to predict the changing trends of cardiac hypertrophy at the gene level. Microarray data were downloaded from the Gene Expression Omnibus database (accession, GSE21600), which included 35 samples harvested from the hearts of Wistar rats on postoperative days 1 (D1 group), 6 (D6 group) and 42 (D42 group) following aorta ligation or sham operation. Each group contained six samples, with the exception of the aorta-ligated group at day 6, where n=5. Differentially expressed genes (DEGs) were identified using the limma package in R. Hierarchical clustering analysis was performed on the common DEGs, and a linear equation between the D1 and D42 groups was constructed using linear discriminant analysis. Subsequent verification was performed using a receiver operating characteristic (ROC) curve and the measurement data at day 42. A total of 319, 44 and 57 DEGs were detected in the D1, D6 and D42 sample groups, respectively. AKIP1, ANKRD23, LTBP2, TGF-β2 and TNFRSF12A were identified as common DEGs in all groups. The predicted linear equation between the D1 and D42 groups was calculated to be y=1.526x−186.671. Assessment of the ROC curve demonstrated that the area under the curve was 0.831, with a specificity and sensitivity of 0.8 each. When the predicted data were compared with the measurement data at day 42, the consistency of the two sets of data was 76.5%. In conclusion, the present model may contribute to the early prediction of changing trends in cardiac hypertrophy disease at the gene level.


Keywords: cardiac hypertrophy; hierarchical clustering analysis; linear discriminant analysis; mathematical model; receiver operating characteristic curve

Year: 2016 PMID: 27168795 PMCID: PMC4840528 DOI: 10.3892/etm.2016.3105

Source DB: PubMed Journal: Exp Ther Med ISSN: 1792-0981 Impact factor: 2.447

Introduction

Cardiac hypertrophy is associated with the thickening of the heart muscle (1) and the risk factors of cardiac hypertrophy include hypertension, obesity, muscular dystrophy, cardiomyopathy or heart failure (2). Furthermore, it has been demonstrated that genetic factors and signaling pathways may participate in the pathogenesis of cardiac hypertrophy, which may be associated with an enhanced risk of sudden cardiac death and cardiovascular mortality (3,4). As the early symptoms of this disease are difficult to detect, it is crucial that novel molecular markers for the early therapy of cardiac hypertrophy are identified. Molecular markers of cardiac hypertrophy have been identified (5). In particular, Kontaraki et al (6) identified GATA4, myocardin and β-myosin heavy chain as early cardiac marker genes. Furthermore, smooth muscle α-actin has been demonstrated to be a molecular marker for pressure-overload hypertrophy (7). Using mouse models, Qing et al (8) have previously reported that miR-22 serves a crucial function in the regulation of cardiac hypertrophy and cardiac remodeling. Fibroblast growth factor 21, which is an endocrine factor, has a protective role in cardiac cells (9). As an increasing number of molecular markers are identified, mathematical models can be constructed to predict the risk of cancer (10). Various types of mathematical models have contributed to the prediction of diseases. Flux balance models of cellular metabolism have been used to analyze and predict transcriptional regulation under certain conditions, including catabolite repression and amino acid biosynthesis pathway repression (11). 
Furthermore, various genes and pathways associated with differentiation, including the MAOA and ADH1B metabolic genes in human pulmonary type II cells (12) and the nuclear factor-kappaB pathway in a mouse model of genitourinary inflammation (13), have been identified via mathematical cluster analysis using GENECLUSTER, which is a publicly available computer package that contributed to the establishment of an effective treatment for acute promyelocytic leukemia (14). According to a previous study conducted by Kondo and Miura (15), the reaction-diffusion model is effective in describing biological pattern formation. Thus, these previous studies suggest that mathematical modeling is a useful tool for the prediction of disease. Using microarray data downloaded from the Gene Expression Omnibus (GEO) database (accession, GSE21600), which included 35 heart samples harvested from Wistar rats on postoperative days 1, 6 and 42 following aorta ligation or sham operation, Hellman et al (16) demonstrated a correlation between hyaluronan concentration and specific gene expression levels using SPSS software. Analysis of the correlation matrix was performed according to the principal components method (17), and orthogonal partial least squares-discriminant analysis was used to analyze the GSE21600 datasets, in which the previous clusters, including extracellular matrix and adhesion molecules, were confirmed, and fatty acid metabolism, glucose metabolism, mitochondria and atherosclerosis were detected as new clusters (18). However, these two previous studies did not predict the changing trends of genes in this disease. Hence, the present study aimed to reanalyze the expression profiles of GSE21600 in order to construct a predictive model of cardiac hypertrophy using the linear discriminant analysis (LDA) method. GSE21600 microarray data were used to identify differentially expressed genes (DEGs) using the limma package in R (version 3.26.5), which fits linear models to microarray data. Common DEGs were used to construct a mathematical model in order to predict the expression levels of genes in the cardiac hypertrophy samples. The mathematical model was verified using a receiver operating characteristic (ROC) curve and the consistency of the predicted and measured data. The present study may be useful for the early prediction of changing trends in cardiac hypertrophy disease at the gene level.

Materials and methods

Data preprocessing and DEGs screening

GSE21600 microarray data were downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/) (16). GSE21600 included data from 35 heart samples harvested from 36 Wistar rats, which were excised on postoperative days 1, 6 and 42 following aorta ligation or sham operation. Each group contained six samples at each time point, with the exception of the aorta-ligated group at day 6, where n=5. The microarray platform of GSE21600 was the Illumina GPL6101 RatRef-12 expression BeadChip (version 1.0; Illumina, Inc., San Diego, CA, USA). Samples were divided into three groups: Day 1 (D1), day 6 (D6) and day 42 (D42). DEGs between the aorta-ligated and sham-operated samples were identified in each of these three groups. Firstly, normalization of the microarray data was performed in R (19,20), and DEGs were subsequently identified using the limma package in R (21). The false discovery rate (FDR) was used to adjust the P-value, according to the method outlined by Benjamini and Hochberg (22). FDR<0.05 and |log2 fold change (FC)|>1 were chosen as the cut-off criteria.
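The screening criteria above can be illustrated with a minimal Python sketch. It is not the limma workflow used in the study: it only shows the Benjamini-Hochberg adjustment and the two cut-offs, and it assumes that the fold-change cut-off applies to the absolute value of log2FC. The function names and the example values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(p_values)
    # Visit raw P-values from largest to smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for offset, idx in enumerate(order):
        rank = m - offset  # 1-based rank in ascending order
        # Enforce monotonicity: an adjusted value never exceeds the next larger one.
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, p_values, log2_fc, fdr_cutoff=0.05, fc_cutoff=1.0):
    """Keep genes with FDR < fdr_cutoff and |log2FC| > fc_cutoff (assumed reading)."""
    fdr = benjamini_hochberg(p_values)
    return [g for g, q, fc in zip(genes, fdr, log2_fc)
            if q < fdr_cutoff and abs(fc) > fc_cutoff]


# Hypothetical values for illustration only, not data from GSE21600.
genes = ["geneA", "geneB", "geneC", "geneD"]
p_values = [0.001, 0.02, 0.04, 0.5]
log2_fc = [-2.1, 0.4, 1.3, -3.0]
print(select_degs(genes, p_values, log2_fc))
```

In this toy input only geneA passes both cut-offs; geneC has a large fold change but an adjusted P-value just above 0.05.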

Specific gene screening

In order to screen for genes specifically expressed at each time point, the DEGs were compared between each pair of groups. Subsequently, hierarchical clustering analysis (23) was performed on the DEGs common to all three groups.
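The comparison of DEG lists between groups amounts to set intersection. The following Python sketch shows one way to count the overlaps; the gene identifiers are hypothetical placeholders and the function names are illustrative, not taken from the study.

```python
from itertools import combinations


def common_degs(deg_sets):
    """Return the genes present in every DEG list, sorted by name."""
    return sorted(set.intersection(*(set(genes) for genes in deg_sets.values())))


def pairwise_overlap(deg_sets):
    """Count the DEGs shared by each pair of groups."""
    return {
        (a, b): len(set(deg_sets[a]) & set(deg_sets[b]))
        for a, b in combinations(deg_sets, 2)
    }


# Hypothetical placeholder identifiers, not data from the study.
example = {
    "D1": ["g1", "g2", "g3", "g4"],
    "D6": ["g2", "g3", "g5"],
    "D42": ["g2", "g3", "g4"],
}
print(common_degs(example))
print(pairwise_overlap(example))
```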

Sorting algorithm and construction of the mathematical model

Linear discriminant analysis (LDA) is a method that is widely used in microarray classification to obtain a discrimination function. LDA can be performed when there are ≥2 groups and each group contains >2 variables (24,25). In this method, a linear equation based on the variations in the two groups is established: Y = a + b1X1 + b2X2 + … + bnXn, where ‘a’ represents a constant and ‘b1, b2, … and bn’ represent the regression coefficients. In the present study, the cardiac hypertrophy samples were defined as ‘1’ and the control samples were defined as ‘-1’. Based on the dynamic expression changes of the common DEGs detected in the D1 group, the expression pattern in the D42 group was predicted via the mathematical model constructed using the LDA method (26).
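As an illustration of the discriminant function, the Python sketch below fits a two-class LDA for a single predictor, so that the equation reduces to Y = a + bX. It assumes equal prior probabilities and a shared within-class variance, and it uses toy numbers; it is not the model fitted in the study.

```python
from statistics import mean


def fit_lda_1d(x, labels):
    """Fit Y = a + b*X for two classes labelled 1 and -1 (equal priors assumed)."""
    pos = [v for v, y in zip(x, labels) if y == 1]
    neg = [v for v, y in zip(x, labels) if y == -1]
    mu_pos, mu_neg = mean(pos), mean(neg)
    # Pooled within-class variance, shared by both classes in LDA.
    ss = sum((v - mu_pos) ** 2 for v in pos) + sum((v - mu_neg) ** 2 for v in neg)
    pooled_var = ss / (len(pos) + len(neg) - 2)
    b = (mu_pos - mu_neg) / pooled_var
    # The score is zero at the midpoint between the two class means.
    a = -b * (mu_pos + mu_neg) / 2.0
    return a, b


def classify(a, b, value):
    """Return 1 (hypertrophy) when the score is positive, otherwise -1 (control)."""
    return 1 if a + b * value > 0 else -1


# Toy expression values for illustration only.
x = [8.0, 9.0, 10.0, 2.0, 3.0, 4.0]
labels = [1, 1, 1, -1, -1, -1]
a, b = fit_lda_1d(x, labels)
print(a, b, classify(a, b, 7.0), classify(a, b, 5.0))
```

With several predictors the same idea applies, but the pooled variance becomes a covariance matrix and the coefficients b1 … bn are obtained by solving a linear system.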

Verification of the mathematical model

Disease classification models are typically evaluated using multivariate regression analysis (27,28), ROC curves (29–32) or prospective validation (33). A ROC curve was used in the present study to evaluate the discriminant effect of the mathematical model and to directly observe the accuracy of the present analysis method. Specificity and sensitivity were calculated in order to estimate the predictive ability of LDA, and the area under the curve (AUC) of the ROC curve was calculated to estimate accuracy. In the present study, AUC was used to distinguish non-accuracy (AUC≤0.5), low accuracy (0.5
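The verification indices can be sketched in Python as follows. The AUC is computed here as the fraction of (hypertrophy, control) sample pairs that the score ranks correctly, which is equivalent to the area under the empirical ROC curve; the threshold of 0 and the toy scores are assumptions made for illustration only.

```python
def roc_auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold=0.0):
    """Sensitivity and specificity when scores above the threshold are called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == -1 and s <= threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == -1)
    return tp / n_pos, tn / n_neg


# Toy discriminant scores for illustration only.
scores = [2.0, 1.0, -0.5, 0.5, -1.0, -2.0]
labels = [1, 1, 1, -1, -1, -1]
print(roc_auc(scores, labels))
print(sensitivity_specificity(scores, labels))
```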

Results

Identification, comparison and feature selection of DEGs

Normalization of the microarray data is presented in Fig. 1. DEGs were identified, and the genes with FDR<0.05 and |log2FC|>1 were considered to be differentially expressed between the ligated samples and sham-operated samples. A total of 319, 44 and 57 DEGs were identified in the D1, D6 and D42 groups, respectively.

Figure 1.

Microarray data normalization. Samples were divided into three groups: days 1, 6 and 42. White, aorta ligated operation samples. Blue, sham operated samples.

A total of 23 DEGs were shared between the D1 and D6 groups, 14 DEGs were shared between the D1 and D42 groups, and five DEGs were shared between the D6 and D42 groups. Five common DEGs, namely A kinase interacting protein 1 (AKIP1), ankyrin repeat domain 23 (ANKRD23), latent transforming growth factor beta binding protein 2 (LTBP2), transforming growth factor (TGF)-β2 and tumor necrosis factor receptor superfamily member 12a (TNFRSF12A), were identified among the three groups (Fig. 2).

Figure 2.

Identification of specific differentially expressed genes. Yellow, day 1 (D1) group; green, day 6 (D6) group; purple, day 42 (D42) group.

Clustering analysis of the five common DEGs demonstrated that the sham-operated samples and the ligated samples each clustered together; however, three ligated samples (16.67%; 3/18) were mixed into the sham-operated group and two sham-operated samples (11.76%; 2/17) were mixed into the ligated group (Fig. 3). These five common DEGs were identified as downregulated genes (Table I).

Figure 3.

Hierarchical clustering analysis of the five common differentially expressed genes at days 1 (D1), 6 (D6) and 42 (D42). Red labels represent the samples that were clustered into the wrong group.

Table I.

Expression levels of the five common differentially expressed genes in the aorta-ligated operation group, as compared with the sham-operated group.

Gene	Day 1	Day 6	Day 42
AKIP1	−1.24914	−1.36699	−1.80092
ANKRD23	−2.90253	−3.69624	−2.85077
LTBP2	−3.68846	−4.20566	−2.02513
TGFB2	−2.15313	−2.11814	−1.75841
TNFRSF12A	−1.99987	−2.08827	−1.54923

Construction and verification of the mathematical model

Based on the expression levels and dynamic changes detected in the five common DEGs, a linear equation between the D1 and D42 groups was calculated as follows: y=1.526x−186.671, where ‘y’ and ‘x’ represent the expression levels in the D42 and D1 groups, respectively. Assessment of the ROC curve demonstrated that the AUC was 0.831, which indicated that the predictive accuracy was 83.1%, and the specificity and sensitivity were both 0.8 (Fig. 4A). By comparing the predictive and measurement data at 42 days (Table II), the consistency of these two datasets was calculated to be 76.5% (Fig. 4B).
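Applying the reported equation is a single multiplication and subtraction. The Python sketch below reads the equation as y = 1.526x − 186.671 (taking the ‘×’ in the printed form to be the variable x); the function name and the input value are hypothetical and are not taken from Table II.

```python
SLOPE = 1.526        # coefficient of the day 1 expression level, as reported
INTERCEPT = -186.671  # constant term, as reported


def predict_d42(d1_expression):
    """Predict the day 42 expression level from the day 1 expression level."""
    return SLOPE * d1_expression + INTERCEPT


# Hypothetical day 1 expression value for illustration only.
print(predict_d42(200.0))
```

For a day 1 value of 200, the equation gives 1.526 × 200 − 186.671 = 118.529. Because the intercept is negative, day 1 values below approximately 122.3 yield negative predictions.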

Figure 4.

Verification of the prediction model. The model was verified by (A) the receiver operating characteristic (ROC) curve and (B) the consistency of the predictive and measurement data at day 42. The area under the curve (AUC) of the ROC curve was used to assess the accuracy of the data. AUC value 0.7

Table II.

Predicted data at day 42 using a linear equation of the gene expression levels of cardiac hypertrophy.

Sample accession	State	Expression on day 1	Expression on day 42	Predicted expression on day 42
GSM539275	1	332.1987	337.3279	326.1898781
GSM539276	1	272.2375	126.1764	235.327208
GSM539277	1	485.7471	792.9784	558.8706386
GSM539278	1	778.9512	344.6311	1,003.179749
GSM539279	1	320.8331	108.7458	308.9669279
GSM539280	1	716.3563	479.7876	908.3260809
GSM539281	−1	85.13754	66.26252	−48.1961695
GSM539282	−1	71.55708	13.26508	−68.775425
GSM539283	−1	50.69723	41.25237	−100.385561
GSM539284	−1	23.54682	75.99313	−141.528145
GSM539285	−1	124.7012	29.73599	11.75692997
GSM539286	−1	49.61586	52.55618	−102.024223
GSM539275	1	4,201.869	6,096.354	6,190.124821
GSM539276	1	1,882.365	5,415.158	2,675.24642
GSM539277	1	3,337.275	9,621.91	4,879.955589
GSM539278	1	3,016.572	4,261.265	4,393.975807
GSM539279	1	2,658.368	3,865.638	3,851.168593
GSM539280	1	1,956.894	8,021.108	2,788.184519
GSM539281	−1	1,219.844	959.4762	1,671.290077
GSM539282	−1	1,070.036	1,546.261	1,444.277361
GSM539283	−1	1,431.854	1,145.456	1,992.561078
GSM539284	−1	1,024.116	3,023.837	1,374.692133
GSM539285	−1	988.543	1,751.745	1,320.786311
GSM539286	−1	1,213.691	2,605.091	1,661.966081
GSM539275	1	880.5447	147.3087	1,157.130248
GSM539276	1	126.5936	169.5375	14.62459301
GSM539277	1	1,011.612	281.1071	1,355.744099
GSM539278	1	1,073.774	185.5347	1,449.941769
GSM539279	1	340.023	62.10585	338.0464919
GSM539280	1	122.0065	237.4351	7.673495398
GSM539281	−1	36.33411	32.24878	−122.150826
GSM539282	−1	50.67635	24.24548	−100.417201
GSM539283	−1	36.68185	45.16885	−121.623876
GSM539284	−1	30.85578	71.55927	−130.452456
GSM539285	−1	15.40947	32.20256	−153.859142
GSM539286	−1	34.06184	51.90232	−125.594128
GSM539275	1	1,915.488	1,621.039	2,725.439616
GSM539276	1	719.9728	1,732.95	913.8063723
GSM539277	1	1,491.145	1,375.875	2,082.408155
GSM539278	1	2,425.283	3,341.205	3,497.961428
GSM539279	1	1,208.035	885.0079	1,653.395218
GSM539280	1	1,999.254	1,564.762	2,852.375074
GSM539281	−1	391.4185	495.3794	415.929062
GSM539282	−1	355.6202	400.3427	361.6818301
GSM539283	−1	437.2545	578.2272	485.3870006
GSM539284	−1	215.4102	719.1659	149.2135176
GSM539285	−1	464.529	402.5466	526.717626
GSM539286	−1	483.1193	857.3302	554.8885815
GSM539275	1	1,776.678	768.9708	2,515.092804
GSM539276	1	998.7648	732.2133	1,336.275995
GSM539277	1	2,373.809	1,362.486	3,419.959903
GSM539278	1	3,322.548	1,513.086	4,857.638915
GSM539279	1	879.2261	513.2602	1,155.132097
GSM539280	1	1,201.621	1,250.521	1,643.675713
GSM539281	−1	411.144	251.4373	445.8202516
GSM539282	−1	375.7809	208.7139	392.2325034
GSM539283	−1	406.536	168.1061	438.837483
GSM539284	−1	297.8341	399.4494	274.1152146
GSM539285	−1	322.7278	352.2844	311.8380763
GSM539286	−1	316.283	400.1669	302.0718985

1, the aorta ligated operation group; −1, the sham operated group.

Discussion

In the present study, the expression profiles of sham-operated and ligated heart samples harvested from Wistar rats were analyzed, and 319, 44 and 57 DEGs were subsequently identified in the D1, D6 and D42 groups, respectively. AKIP1, ANKRD23, LTBP2, TGF-β2 and TNFRSF12A were identified as common DEGs among the three groups, and their association with cardiac hypertrophy has previously been reported (34–37). AKIP1 was identified as a key regulator of heart function via the cAMP-dependent protein kinase signaling pathway (38). During periods of oxidant stress, the expression of AKIP1 is capable of protecting cardiac myocytes from ischemic injury via enhanced mitochondrial integrity (38). Furthermore, the expression of AKIP1 may also protect the heart via mitochondrial stress adaptation (39), and it has been demonstrated that mitochondrial DNA damage may contribute to the development of cardiac hypertrophy and heart failure (40). These results suggest that AKIP1 may serve a crucial function in the development of cardiac hypertrophy via mitochondrial stress adaptation mechanisms. Hellman et al (16) have previously demonstrated that LTBP2 and TGF-β2 are associated with the development of cardiac hypertrophy. LTBP2, which belongs to the fibrillin superfamily, regulates the release of TGF-β1 (41,42). Previous studies have demonstrated that the TGF-β isoforms, including TGF-β1, TGF-β2 and TGF-β3, have an important role in the pathogenesis of cardiac hypertrophy by stimulating the proliferation of cardiomyocytes (43,44). These results indicate that LTBP2 and TGF-β2 are associated with the regulation of cardiac hypertrophy. However, the roles of ANKRD23 and TNFRSF12A in the development of cardiac hypertrophy are yet to be elucidated. As these genes were detected as common DEGs in the three groups in the present study, we hypothesize that AKIP1, ANKRD23, LTBP2, TGF-β2 and TNFRSF12A may contribute to the development of cardiac hypertrophy. 
Numerous mathematical techniques have been developed in order to analyze large datasets, and mathematical modeling is a useful and powerful tool for the analysis of gene expression patterns (14). LDA is a well-known multivariate technique that is used for dimension reduction and classification (45). A 3-gene model comprising TNFRSF8, BATF3 and TMOD1, which was obtained by LDA and leave-one-out cross-validation, was previously used to separate ALK(−) anaplastic large-cell lymphoma from peripheral T-cell lymphoma, and the accuracy of the model was ~97% (46). Furthermore, a class-prediction model for patients with graft-vs-host disease was previously constructed using LDA, and the accuracy was 63–80%, as estimated by reverse transcription-quantitative polymerase chain reaction (47). The ROC curve, which directly displays the relationship between specificity and sensitivity, can be used to assess the accuracy of diagnostic tests (48). In a previous study conducted by Barretina et al (49), predictive models built on the Cancer Cell Line Encyclopedia were cross-validated using the specificity and sensitivity of the ROC curve and used to predict drug response from gene expression, including the response to topoisomerase inhibitors, which was associated with Schlafen family member 11. Similarly, a prediction model has previously been constructed for dementia using LDA and verified by ROC curve, and the accuracy of the model was 66%, whereas the specificity and sensitivity were 73% and 64%, respectively (50). In the present study, a prediction model of cardiac hypertrophy was constructed. The assessment of the ROC curve demonstrated that the predictive accuracy of the model was ~83.1% and the specificity and sensitivity were both 0.8. By comparing the predictive and measurement data at 42 days, the consistency of these two datasets was calculated to be 76.5%. 
These results suggest that the present prediction model provides improved predictive ability, which may contribute to the early prediction of the changing trends in gene expression exhibited in patients with cardiac hypertrophy disease. However, to improve the discrimination ability of the model, further studies with an increased number of samples and more suitable machine learning algorithms are required. In the present study, 319, 44 and 57 DEGs were detected in the D1, D6 and D42 groups, respectively. AKIP1, ANKRD23, LTBP2, TGF-β2 and TNFRSF12A were identified as common DEGs. A linear equation was calculated between the D1 and D42 groups, as follows: y=1.526x−186.671. This linear equation, which acted as a prediction model of gene expression levels, may contribute to the early prediction of the changing trends in cardiac hypertrophy disease.

42 in total

1. Regulation of gene expression in flux balance models of metabolism.

Authors: M W Covert; C H Schilling; B Palsson
Journal: J Theor Biol Date: 2001-11-07 Impact factor: 2.691

2. Temporal correlation between transcriptional changes and increased synthesis of hyaluronan in experimental cardiac hypertrophy.

Authors: Urban Hellman; Stellan Mörner; Anna Engström-Laurent; Jane-Lise Samuel; Anders Waldenström
Journal: Genomics Date: 2010-04-21 Impact factor: 5.736

3. Mitochondrial oxidative stress mediates angiotensin II-induced cardiac hypertrophy and Galphaq overexpression-induced heart failure.

Authors: Dao-Fu Dai; Simon C Johnson; Jason J Villarin; Michael T Chin; Madeline Nieves-Cintrón; Tony Chen; David J Marcinek; Gerald W Dorn; Y James Kang; Tomas A Prolla; Luis F Santana; Peter S Rabinovitch
Journal: Circ Res Date: 2011-02-10 Impact factor: 17.367

4. Time course of LPS-induced gene expression in a mouse model of genitourinary inflammation.

Authors: M R Saban; H Hellmich; N B Nguyen; J Winston; T G Hammond; R Saban
Journal: Physiol Genomics Date: 2001-04-02 Impact factor: 3.107

5. Cluster analysis and display of genome-wide expression patterns.

Authors: M B Eisen; P T Spellman; P O Brown; D Botstein
Journal: Proc Natl Acad Sci U S A Date: 1998-12-08 Impact factor: 11.205

Review 6. Left ventricular hypertrophy in hypertension.

Authors: F G Dunn; J M Burns; R S Hornung
Journal: Am Heart J Date: 1991-07 Impact factor: 4.749

7. Obesity as an independent risk factor for cardiovascular disease: a 26-year follow-up of participants in the Framingham Heart Study.

Authors: H B Hubert; M Feinleib; P M McNamara; W P Castelli
Journal: Circulation Date: 1983-05 Impact factor: 29.690

8. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests.

Authors: João Maroco; Dina Silva; Ana Rodrigues; Manuela Guerreiro; Isabel Santana; Alexandre de Mendonça
Journal: BMC Res Notes Date: 2011-08-17

9. Prediction of graft-versus-host disease in humans by donor gene-expression profiling.

Authors: Chantal Baron; Roland Somogyi; Larry D Greller; Vincent Rineau; Peter Wilkinson; Carolyn R Cho; Mark J Cameron; David J Kelvin; Pierre Chagnon; Denis-Claude Roy; Lambert Busque; Rafick-Pierre Sékaly; Claude Perreault
Journal: PLoS Med Date: 2007-01 Impact factor: 11.069

10. Evaluating different methods of microarray data normalization.

Authors: André Fujita; João Ricardo Sato; Leonardo de Oliveira Rodrigues; Carlos Eduardo Ferreira; Mari Cleide Sogayar
Journal: BMC Bioinformatics Date: 2006-10-23 Impact factor: 3.169
