
Identification of risk genes associated with myocardial infarction based on the recursive feature elimination algorithm and support vector machine classifier.

Abstract

The aim of the present study was to identify risk genes in myocardial infarction. The GSE34198 microarray dataset, containing peripheral blood data from 49 myocardial infarction samples and 48 corresponding control samples, was downloaded from the Gene Expression Omnibus database to screen for differentially expressed genes (DEGs). The DEGs were used to construct a protein-protein interaction (PPI) network of patient samples, from which the feature genes were identified using the neighborhood scoring method. The recursive feature elimination (RFE) algorithm was employed to select the risk genes among the feature genes, which were subsequently used to construct a support vector machine (SVM) classifier to identify the specific signature of myocardial infarction samples. Another dataset, GSE61144, was also downloaded to verify the efficacy of the classifier. A total of 724 downregulated and 483 upregulated DEGs were screened in patient samples compared with control samples in the GSE34198 dataset. The PPI network of myocardial infarction comprised 1,083 nodes (genes) and 46,363 lines (connections). Using the neighborhood scoring method, the top 100 feature genes in myocardial infarction samples were identified as the disease feature genes, which distinguish the myocardial infarction samples from the control samples. The RFE algorithm screened 15 risk genes, which were employed to construct an SVM classifier with an average precision of 88% for the patient samples, as visualized by a confusion matrix. The predictive precision of the classifier on another microarray dataset, GSE61144, was 0.92, with an average true positive rate of 0.9278 and an average false positive rate of 0.2361. A-kinase-anchoring protein 12 (AKAP12) and glycine receptor α2 (GLRA2) were two risk genes in the SVM classifier.
Therefore, AKAP12 and GLRA2 exert potential roles in the development of myocardial infarction, potentially by influencing cardiac contractility and protecting against ischemia‑reperfusion injury, which may provide clues in developing potential diagnostic biomarkers or therapeutic targets for myocardial infarction.

Substances：
Biomarkers

Year: 2017 PMID: 29138828 PMCID: PMC5780094 DOI: 10.3892/mmr.2017.8044

Source DB: PubMed Journal: Mol Med Rep ISSN: 1791-2997 Impact factor: 2.952

Introduction

Myocardial infarction is a result of interrupted blood flow to a certain area of the heart, which subsequently damages heart muscle. Among the various symptoms, chest pain or discomfort that may travel to the shoulder, arm, neck, back or jaw is the most common (1). Shortness of breath, feeling faint, nausea and cold sweats may also be experienced by patients suffering a myocardial infarction. Myocardial infarction may trigger heart failure, cardiac arrest, an irregular heartbeat or cardiogenic shock (2). As a life-threatening disease that may lead to severe hemodynamic instability or sudden death, it is one of the major causes of mortality worldwide (3). According to an estimation by the World Bank, the number of individuals experiencing myocardial infarction in China may reach >23 million by 2030 (4). Globally, the mortality associated with acute myocardial infarction has decreased in the past few decades; however, as a result, the incidence of heart failure has increased (5). Heart failure following myocardial infarction is associated with cardiac remodeling, which leads to ventricular dysfunction and chamber dilation (6). Clinically, the occurrence of myocardial infarction is often unexpected and sudden, which makes it difficult to prevent and diagnose. Cardiovascular risk factors for heart disease include circulating blood lipid levels (7), smoking (8), heavy drinking (9), oral contraceptives (10), low intake of anthocyanins (11), human immunodeficiency virus infection (12) and a family history or genetic alterations. A positive family history is among the strongest cardiovascular risk factors for heart disease; therefore, numerous studies have aimed to determine the genetic factors associated with myocardial infarction.
For example, Helgadottir et al (13) reported that arachidonate 5-lipoxygenase-activating protein variants are involved in the pathogenesis of myocardial infarction by increasing the inflammation in the arterial wall and the production of leukotrienes. In addition, Do et al (14) identified that multiple rare alleles of the low-density lipoprotein receptor and apolipoprotein A5 confer risk for early-onset myocardial infarction, and a meta-analysis demonstrated that the rs671 aldehyde dehydrogenase 2 family (mitochondrial) polymorphism increases the risk of myocardial infarction (15). Despite these findings, reliable molecular predictors for the diagnosis and prevention of myocardial infarction remain to be discovered. In the present study, feature genes were selected from the differentially expressed genes (DEGs) in patients with myocardial infarction compared with controls, and were used to screen risk genes and construct a support vector machine (SVM) classifier. These risk genes allow patient samples to be distinguished from normal controls.

Materials and methods

Microarray data

The GSE34198 microarray dataset (16) was downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo) and included 49 samples from the peripheral blood of patients with myocardial infarction and 48 control samples. The platform for GSE34198 was the Illumina human-6 v2.0 expression BeadChip. The RMA method of the affy package in R version 3.3.1 (17) (http://bioconductor.org/packages/release/bioc/html/affy.html) was utilized to convert the array data in GSE34198 into expression data, which were subsequently normalized by the Z-score method (18).
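As an illustration of the Z-score step only, the following minimal Python sketch standardizes each gene across samples. The source does not state whether the scores were computed per gene or per sample, nor which standard deviation estimator was used; the per-gene orientation, the population standard deviation and the function name `zscore_rows` are assumptions made here, and the original analysis was carried out in R.

```python
from statistics import mean, pstdev


def zscore_rows(matrix):
    """Standardize each row (assumed: one gene) to mean 0 and SD 1."""
    out = []
    for row in matrix:
        mu = mean(row)
        sd = pstdev(row)  # population SD; an assumption of this sketch
        # A constant row has no spread; map it to zeros instead of dividing by 0.
        out.append([0.0 if sd == 0 else (v - mu) / sd for v in row])
    return out


# Hypothetical expression values: rows are genes, columns are samples.
expr = [
    [2.0, 4.0, 6.0],
    [5.0, 5.0, 5.0],
]
normalized = zscore_rows(expr)
```

After this transformation every non-constant gene has mean 0 and unit variance across samples, which puts genes measured on different intensity scales on a comparable footing.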

DEG identification

The DEGs between patients with myocardial infarction and control subjects were identified using Limma version 3.32.8 (http://bioconductor.org/packages/release/bioc/html/limma.html) (19) with a threshold of P<0.05 and log|fold change (FC)|>1.

Protein-protein interaction (PPI) network construction

All screened DEGs were mapped to the human protein interaction network of the Human Protein Reference Database (20) (http://www.hprd.org/) to identify their interactions. Subsequently, the interactions were visualized using Cytoscape 3.4 software (http://www.cytoscape.org/) as the PPI network of DEGs in myocardial infarction.

Feature gene selection

Usually, significant expression connections exist between disease feature genes and their connected genes. The neighborhood score (21) was therefore employed to identify the feature genes of myocardial infarction in the PPI network. In the formula for calculating the score, i represents the node in the network, FC represents the fold change value for the expression level of the node, N(i) represents the number of nodes connected to the selected node and score(i) represents the correlation between node i and the disease. The neighborhood scoring algorithm infers the degree of change of each node under disease conditions, along with its influence on the connected genes. If the score is >0, the node and its connected nodes are all highly expressed, and if the score is <0, the expression of the nodes is low. The nodes (DEGs) in the PPI network with the top 100 |score| values were considered to be the feature genes in myocardial infarction.
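The score formula itself does not appear in this copy of the text. A form of neighborhood scoring that is commonly used and is consistent with the symbols defined above is score(i) = FC(i)/2 + (1/2) × [sum of FC(n) over the neighbors n of i] / N(i). The Python sketch below implements that assumed form; the equal weighting, the use of signed log fold changes, the gene names and the function names are assumptions, not details confirmed by the source.

```python
def neighborhood_scores(log_fc, edges):
    """Assumed form: half the node's own FC plus half the mean FC of its neighbors."""
    neighbors = {gene: set() for gene in log_fc}
    for a, b in edges:
        if a in neighbors and b in neighbors and a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    scores = {}
    for gene, fc in log_fc.items():
        nbrs = neighbors[gene]
        # Isolated nodes have no neighbors; their neighbor term is taken as 0.
        nbr_mean = sum(log_fc[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
        scores[gene] = 0.5 * fc + 0.5 * nbr_mean
    return scores


def top_features(scores, k):
    """Rank genes by |score| and keep the k highest."""
    return sorted(scores, key=lambda g: abs(scores[g]), reverse=True)[:k]


# Hypothetical log fold changes and interactions.
log_fc = {"G1": 1.0, "G2": 0.8, "G3": -1.2, "G4": -0.6}
edges = [("G1", "G2"), ("G3", "G4"), ("G2", "G3")]
scores = neighborhood_scores(log_fc, edges)
top2 = top_features(scores, 2)
```

In this toy network G1 and G4 rank highest because each shares the direction of change of its only neighbor, whereas G2 and G3 have neighbors that change in opposite directions and partly cancel out.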

Hierarchical cluster analysis

Hierarchical cluster analysis was conducted to determine whether the feature genes were differentially expressed between patient and control samples using Pearson's correlation coefficient (22) and average linkage (23). The clustering results were visualized using heatmaps in R version 3.2.1 (24).
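Clustering with Pearson's correlation coefficient and average linkage can be sketched as an agglomerative procedure in which the distance between two profiles is taken as 1 minus their correlation. The Python code below is a minimal illustration under that assumption; the sample names, the expression values and the choice to stop at a fixed number of clusters are hypothetical, and the original analysis was performed in R.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering; distance = 1 - Pearson r, average linkage."""
    names = list(profiles)
    dist = {(a, b): 1.0 - pearson(profiles[a], profiles[b])
            for a in names for b in names if a != b}
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean of all pairwise member distances.
                pairs = [dist[(a, b)] for a in clusters[i] for b in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical samples, each described by four feature-gene values.
profiles = {
    "P1": [2.0, 1.8, -1.0, -1.2],
    "P2": [1.9, 2.1, -0.8, -1.1],
    "C1": [-1.0, -1.1, 1.5, 1.7],
    "C2": [-1.2, -0.9, 1.6, 1.4],
}
groups = average_linkage(profiles, 2)
```

Here the two hypothetical patient profiles merge first and the two control profiles merge next, which is the kind of separation a heatmap dendrogram would display.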

Risk gene identification

To further identify the most significant feature genes that distinguish patients with myocardial infarction from controls, the recursive feature elimination (RFE) algorithm was utilized (25). In this algorithm, the optimal feature gene combination was selected as the risk genes in myocardial infarction.
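The general idea of RFE is to fit a classifier, discard the feature that contributes least, and repeat. The Python sketch below pairs this loop with a small linear SVM trained by subgradient descent. It is an illustration of the technique rather than the authors' implementation: the trainer, its hyperparameters, the gene names and the fixed target size `n_keep` are assumptions, whereas the study chose the subset size from the prediction precision of each candidate combination.

```python
def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Full-batch subgradient descent on the L2-regularized hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # this sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def rfe(X, y, names, n_keep):
    """Refit repeatedly, each time dropping the feature with the smallest |weight|."""
    active = list(range(len(names)))
    eliminated = []
    while len(active) > n_keep:
        Xa = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(Xa, y)
        worst = min(range(len(active)), key=lambda k: abs(w[k]))
        eliminated.append(names[active.pop(worst)])
    return [names[j] for j in active], eliminated


# Hypothetical data: labels are +1 (patient) and -1 (control).
names = ["GENE_A", "GENE_B", "GENE_C"]
X = [
    [2.0, 0.0, 0.1],
    [1.5, 0.0, 0.2],
    [-1.8, 0.0, -0.1],
    [-2.2, 0.0, -0.2],
]
y = [1, 1, -1, -1]
kept, dropped = rfe(X, y, names, n_keep=1)
```

In this toy example the uninformative GENE_B is removed first, then the weakly informative GENE_C, leaving GENE_A. In practice the loop would be wrapped in cross-validation so that each subset size can be scored.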

SVM classifier construction

SVM is a supervised classification algorithm that assigns samples to classes by distinguishing and predicting them on the basis of the feature values in each sample (26). An SVM classifier was constructed from the selected risk genes, using 4 samples as the training dataset and 1 sample as the testing dataset. The receiver operating characteristic (ROC) curve was drawn to evaluate the precision and robustness of the SVM classifier. A confusion matrix in R version 3.2.1 (https://cran.r-project.org/web/packages/ROCR/index.html) was also employed to visualize the classification results of the classifier.
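To make the ROC evaluation concrete, the following Python sketch computes the points of a ROC curve and the area under it from classifier decision scores. The scores, labels and function names are hypothetical, and the trapezoidal area calculation is a standard choice assumed here rather than a detail given in the source.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    pos = sum(1 for label in labels if label == 1)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical decision scores; label 1 = patient, 0 = control.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(scores, labels)
area = auc(curve)
```

For these six hypothetical samples, 8 of the 9 patient-control pairs are ranked correctly, so the area under the curve is 8/9.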

Verification of the SVM classifier

An additional dataset, GSE61144 (27), which is based on the GPL6106 Sentrix Human-6 v2 Expression BeadChip platform, was downloaded from the GEO database. This dataset consists of 7 samples from patients prior to percutaneous coronary intervention (PCI), 7 from patients following PCI and 10 normal controls. These 24 samples were used to verify the classification effect of the SVM classifier on myocardial infarction patient samples using the e1071 package version 1.6-8 (https://cran.r-project.org/web/packages/e1071/index.html) in R version 3.2.1.

Results

Identification of DEGs

A total of 1,207 DEGs were screened from myocardial infarction samples compared with normal controls, including 724 downregulated and 483 upregulated genes.

PPI network in myocardial infarction samples

The PPI network comprised 1,083 nodes (genes) and 46,363 lines (connections). The degrees of the nodes in the network were calculated, and their distributions are presented in Fig. 1. The degree of a node serves as a reference index of how extensively a gene interacts with others in influencing the development and progression of myocardial infarction. A-kinase-anchoring protein (AKAP)12 and glycine receptor α (GLRA)2 were two DEGs with a degree of 1, meaning that each has a single interacting gene in the PPI network.

Figure 1.

Distributions of node degrees in the protein-protein interaction network. The x-axis represents the log (degree) value; the y-axis indicates the number of corresponding nodes in each of the log (degree) ranges.
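The degree of a node and the binned log (degree) distribution can be computed from a list of interactions, as in the minimal Python sketch below. The base of the logarithm and the bin width are not given in the source; base 10, a bin width of 0.5 and the gene names are assumptions.

```python
from collections import Counter
from math import floor, log10


def node_degrees(edges):
    """Degree = number of distinct interaction partners of each node."""
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {gene: len(nbrs) for gene, nbrs in neighbors.items()}


def log_degree_histogram(degrees, bin_width=0.5):
    """Count nodes per log10(degree) bin; keys are the lower bin edges."""
    bins = Counter()
    for d in degrees.values():
        bins[floor(log10(d) / bin_width) * bin_width] += 1
    return dict(sorted(bins.items()))


# Hypothetical network: one hub with ten partners plus one extra interaction.
edges = [("HUB", f"G{i}") for i in range(1, 11)] + [("G1", "G2")]
degrees = node_degrees(edges)
histogram = log_degree_histogram(degrees)
```

In this toy network G3 has a degree of 1, the situation reported for AKAP12 and GLRA2, while the hub falls into a separate, higher log (degree) bin.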

Feature genes and clustering analysis

The neighborhood scoring method was employed for the selection of the top 100 feature genes in myocardial infarction samples. The feature genes with a high neighbor score exhibited high expression in the patient samples. The top 10 feature genes are listed in Table I, and included EH domain-binding protein 1, exocyst complex component 6B, growth factor receptor-bound protein 10, AKAP12, SRY-box 4, GLRA3, GLRA2, protein phosphatase 1 regulatory subunit 3A, fatty acid-binding protein (FABP)4 and mediator complex subunit 13-like. Clustering analysis was performed on the top 100 feature genes (Fig. 2), which may allow the classification of myocardial infarction samples to distinguish them from the control samples.

Table I.

Feature genes with top 10 neighbor scores.

Node	NS_score	Log (fold change)	P-value
EHBP1	0.96	1.0153	0.0004
EXOC6B	0.96	0.9025	0.0016
GRB10	0.92	0.9488	0.0009
AKAP12	0.91	0.9764	0.0007
SOX4	0.91	0.8647	0.0026
GLRA3	0.91	−0.8335	0.0036
GLRA2	0.91	−0.9855	0.0006
PPP1R3A	0.90	−1.0402	0.0003
FABP4	0.90	1.0953	0.0001
MED13L	0.90	0.7106	0.0132

NS, neighbor score; EHBP1, EH domain-binding protein 1; EXOC6B, exocyst complex component 6B; GRB10, growth factor receptor-bound protein 10; AKAP12, A-kinase-anchoring protein 12; SOX4, SRY-box 4; GLRA, glycine receptor α; PPP1R3A, protein phosphatase 1 regulatory subunit 3A; FABP4, fatty acid-binding protein 4; MED13L, mediator complex subunit 13-like.

Figure 2.

Clustering analysis results for the top 100 feature genes. The x-axis represents the samples, with control samples marked in green and patient samples marked in red.

Risk genes and SVM classifier

Using the RFE algorithm, a 15-gene combination with a precision of 85% was obtained (Fig. 3) and these genes were recognized as risk genes in myocardial infarction. The expression significance of these risk genes is presented in Table II. The risk genes included hes family bHLH transcription factor 5, zinc-finger protein 417, GLRA2, olfactory receptor (OR) family 8 subfamily D member 2 (gene/pseudogene), homeobox A7, FABP6, muscle-associated receptor tyrosine kinase, 5-hydroxytryptamine receptor 6, glutamate receptor-interacting protein 2, OR family 51 subfamily M member 1, OR family 1 subfamily C member 1, killer cell lectin-like receptor K1, vascular endothelial growth factor A, AKAP12 and Ras homolog mTORC1-binding.

Figure 3.

Feature elimination of the top 100 feature genes. The x-axis is the feature gene number and the y-axis indicates the corresponding prediction precision. The gene combination with the highest precision is marked in red, which was a 15-gene combination.

Table II.

Risk genes in myocardial infarction samples.

Gene	Log (fold change)	P-value
HES5	−0.8925	0.0018
ZNF417	−0.8260	0.0040
GLRA2	−0.9855	0.0006
OR8D2	−0.8135	0.0045
HOXA7	0.7150	0.0126
FABP6	0.9234	0.0013
MUSK	−0.7975	0.0054
HTR6	−0.7651	0.0076
GRIP2	−0.9973	0.0005
OR51M1	−0.8125	0.0046
OR1C1	−0.7755	0.0068
KLRK1	−0.9248	0.0013
VEGFA	0.8442	0.0032
AKAP12	0.9764	0.0007
RHEB	0.9288	0.0012

HES5, hes family bHLH transcription factor 5; ZNF417, zinc-finger protein 417; GLRA2, glycine receptor α2; OR, olfactory receptor; OR8D2, OR family 8 subfamily D member 2 (gene/pseudogene); HOXA7, homeobox A7; FABP6, fatty acid-binding protein 6; MUSK, muscle-associated receptor tyrosine kinase; HTR6, 5-hydroxytryptamine receptor 6; GRIP2, glutamate receptor-interacting protein 2; OR51M1, OR family 51 subfamily M member 1; OR1C1, OR family 1 subfamily C member 1; KLRK1, killer cell lectin-like receptor K1; VEGFA, vascular endothelial growth factor A; AKAP12, A-kinase-anchoring protein 12; RHEB, Ras homolog mTORC1-binding.

The average precision of the SVM classifier was 86%, as indicated by the ROC curve (Fig. 4), and was 88% for the patient samples, as visualized by the confusion matrix (Fig. 5). The classification effect was also verified using the independent microarray dataset GSE61144, and the ROC curve is presented in Fig. 6. The predictive precision was 0.92, the average true positive rate was 0.9278 and the average false positive rate was 0.2361.
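The quantities reported here can all be derived from the four cells of a confusion matrix. The Python sketch below uses the standard definitions; the source does not define how its "precision" was calculated, so both accuracy and precision are returned, and the labels and predictions are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary classifier."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def rates(tp, fp, tn, fn):
    """Standard summary rates derived from the confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),            # correct among predicted patients
        "true_positive_rate": tp / (tp + fn),   # patients correctly detected
        "false_positive_rate": fp / (fp + tn),  # controls wrongly flagged
    }


# Hypothetical labels: 1 = patient, 0 = control.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
summary = rates(*confusion_counts(y_true, y_pred))
```

With one missed patient and one misclassified control among eight hypothetical samples, the accuracy, precision and true positive rate are each 0.75 and the false positive rate is 0.25.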

Figure 4.

ROC curve of the support vector machine classifier. The x-axis represents the false positive rate and the y-axis indicates the true positive rate. The simulation results (mean ROC) are marked by the dotted line. ROC, receiver operating characteristic.

Figure 5.

Confusion matrix of the support vector machine classifier.

Figure 6.

ROC curve of the support vector machine classifier verified by the GSE61144 dataset. The x-axis represents the false positive rate and the y-axis indicates the true positive rate. ROC, receiver operating characteristic; FPR, false positive rate; TPR, true positive rate; AUC, area under the curve.

Discussion

To identify the risk genes in myocardial infarction, the GSE34198 microarray dataset was downloaded from the GEO database, and 724 downregulated and 483 upregulated DEGs were screened in patient samples compared with control samples. The PPI network of myocardial infarction comprised 1,083 nodes (genes) and 46,363 lines (connections). Using the neighborhood scoring method, the top 100 feature genes in myocardial infarction samples were identified as the disease feature genes, which allow myocardial infarction samples to be distinguished from the control samples. The RFE algorithm screened 15 risk genes, which were utilized to construct an SVM classifier with an average precision of 88% for the patient samples, as visualized by a confusion matrix. The predictive precision of the classifier on another microarray dataset, GSE61144, was 0.92, with an average true positive rate of 0.9278 and an average false positive rate of 0.2361. AKAP12 and GLRA2 were two of the risk genes identified. AKAPs are scaffolding proteins that regulate the cellular cyclic AMP response. Several AKAPs are reported to be expressed in the heart, including AKAP18, AKAP79, AKAP6 and AKAP220 (28,29). AKAPs participate in cardiovascular functions by various mechanisms. For example, AKAPs were reported to anchor protein kinase A (PKA) in the sarcomere for the phosphorylation of myofibril proteins in contractile responses (30). In addition, AKAPs docked PKA in proximity to sarcomeric substrates to enhance cardiac contractility (31). AKAPs mediate certain phosphorylation events in the heart, and AKAP6 complex disruption resulted in aberrant Ca2+ cycling, which was associated with arrhythmia (32). Loss of AKAP150 promoted pathological remodeling and heart failure propensity by disrupting Ca2+ cycling and contractile reserve (33).
Furthermore, when voltage-gated K+ currents were reduced in ventricular myocytes following myocardial infarction, AKAP150 was reported to be involved in the activation of calcineurin/nuclear factor of activated T-cells (34). PKA is involved in the progression of heart failure (35); therefore, AKAPs, which regulate the activity of PKA, are also risk factors in heart failure. AKAP12 has been associated with various cellular functions, including cytoskeletal architecture and cell cycle regulation (36,37). Activated AKAP12 has been observed in the plasma membrane, cell periphery and perinuclear regions in the cytoplasm (38). Although no associations between AKAP12 and heart disease have been previously reported, its potential role can be inferred based on the functions of other AKAPs. Glycine is a simple physiological compound whose function in cardiovascular disease is receiving increased attention in research. Glycine was reported to protect against ischemia-reperfusion injury in cells and isolated perfused organs by inhibiting neuronal apoptosis in mice (39,40). Glycine receptors have been identified in the myocardial cell membrane, which aid the cytoprotective effects of glycine in myocardial cells (41). Furthermore, it was reported that the cytoprotective effect of glycine against ATP depletion-induced injury may be mediated by the glycine receptor in renal cells (42). GLRA2 is one type of glycine receptor and, currently, no direct evidence has revealed its role in cardiovascular disease. However, the bioinformatics analysis performed in the present study indicated that GLRA2 was a risk gene in myocardial infarction. Although this result is important, it is based on bioinformatics analysis alone and requires confirmation by functional studies, and the role of the AKAP12 and GLRA2 genes in myocardial infarction requires further investigation.
In conclusion, the results of the present study indicate that AKAP12 and GLRA2 exert potential roles in the development of myocardial infarction, potentially by influencing cardiac contractility and protecting against ischemia-reperfusion injury.

40 in total

1. Myocardial ischemia and the pains of the heart.

Authors: Julio A Panza
Journal: N Engl J Med Date: 2002-06-20 Impact factor: 91.245

Review 2. AKAP phosphatase complexes in the heart.

Authors: John M Redden; Kimberly L Dodge-Kafka
Journal: J Cardiovasc Pharmacol Date: 2011-10 Impact factor: 3.105

Review 3. Correlation Between Posttraumatic Growth and Posttraumatic Stress Disorder Symptoms Based on Pearson Correlation Coefficient: A Meta-Analysis.

Authors: An-Nuo Liu; Lu-Lu Wang; Hui-Ping Li; Juan Gong; Xiao-Hong Liu
Journal: J Nerv Ment Dis Date: 2017-05 Impact factor: 2.254

4. AKAP150 participates in calcineurin/NFAT activation during the down-regulation of voltage-gated K(+) currents in ventricular myocytes following myocardial infarction.

Authors: Madeline Nieves-Cintrón; Dinesh Hirenallur-Shanthappa; Patrick J Nygren; Simon A Hinke; Mark L Dell'Acqua; Lorene K Langeberg; Manuel Navedo; Luis F Santana; John D Scott
Journal: Cell Signal Date: 2015-12-24 Impact factor: 4.315

5. HIV infection and the risk of acute myocardial infarction.

Authors: Matthew S Freiberg; Chung-Chou H Chang; Lewis H Kuller; Melissa Skanderson; Elliott Lowy; Kevin L Kraemer; Adeel A Butt; Matthew Bidwell Goetz; David Leaf; Kris Ann Oursler; David Rimland; Maria Rodriguez Barradas; Sheldon Brown; Cynthia Gibert; Kathy McGinnis; Kristina Crothers; Jason Sico; Heidi Crane; Alberta Warner; Stephen Gottlieb; John Gottdiener; Russell P Tracy; Matthew Budoff; Courtney Watson; Kaku A Armah; Donna Doebler; Kendall Bryant; Amy C Justice
Journal: JAMA Intern Med Date: 2013-04-22 Impact factor: 21.873

6. High anthocyanin intake is associated with a reduced risk of myocardial infarction in young and middle-aged women.

Authors: Aedín Cassidy; Kenneth J Mukamal; Lydia Liu; Mary Franz; A Heather Eliassen; Eric B Rimm
Journal: Circulation Date: 2013-01-15 Impact factor: 29.690

7. Association of genetic polymorphisms in ADH and ALDH2 with risk of coronary artery disease and myocardial infarction: a meta-analysis.

Authors: Hongguang Han; Huishan Wang; Zongtao Yin; Hui Jiang; Minhua Fang; Jingsong Han
Journal: Gene Date: 2013-05-16 Impact factor: 3.688

8. Glycine receptors contribute to cytoprotection of glycine in myocardial cells.

Authors: Ren-bin Qi; Jun-yan Zhang; Da-xiang Lu; Hua-dong Wang; Hai-hua Wang; Chu-Jie Li
Journal: Chin Med J (Engl) Date: 2007-05-20 Impact factor: 2.628

9. Involvement of the protein kinase C substrate, SSeCKS, in the actin-based stellate morphology of mesangial cells.

Authors: P J Nelson; K Moissoglu; J Vargas; P E Klotman; I H Gelman
Journal: J Cell Sci Date: 1999-02 Impact factor: 5.285

10. A mitotic kinase scaffold depleted in testicular seminomas impacts spindle orientation in germ line stem cells.

Authors: Heidi Hehnly; David Canton; Paula Bucko; Lorene K Langeberg; Leah Ogier; Irwin Gelman; L Fernando Santana; Linda Wordeman; John D Scott
Journal: Elife Date: 2015-09-25 Impact factor: 8.140
