Network motif-based method for identifying coronary artery disease.

Abstract

The present study aimed to develop a more efficient method for identifying coronary artery disease (CAD) than the conventional method based on individual differentially expressed genes (DEGs). The GSE42148 gene microarray dataset was downloaded, preprocessed and screened for DEGs. Transcriptional regulation data from the ENCODE database and protein-protein interaction data from the HPRD were integrated, the genes common to both sources were compared with the genes annotated on the microarray, and the shared genes were used to construct an integrated regulation network. FANMOD was then used to detect significant three-gene network motifs. Subsequently, GlobalAncova was used to screen for network motifs that differed between the CAD group and the normal control group of GSE42148. Genes involved in the differential network motifs were then subjected to functional annotation and pathway enrichment analysis. Finally, the CAD and control samples were clustered based on either the individual DEGs or the top 20 network motifs. In total, 9,008 significant three-node network motifs, each containing at least one transcription factor, were detected in the integrated regulation network and categorized into 22 interaction modes. Subsequently, 1,132 differential network motifs involving 697 genes were screened between the CAD and control groups. The 697 genes were enriched in 154 Gene Ontology terms, including 119 biological processes, and in 14 KEGG pathways. Identifying patients with CAD based on the top 20 network motifs was more accurate than the conventional method based on individual DEGs. These results indicate that the network motif-based method is more efficient and accurate for identifying patients with CAD than the conventional method based on individual DEGs.

Keywords: coronary artery disease; differentially-expressed genes; identification methods; three-node network motifs

Year: 2016 PMID: 27347046 PMCID: PMC4907106 DOI: 10.3892/etm.2016.3299

Source DB: PubMed Journal: Exp Ther Med ISSN: 1792-0981 Impact factor: 2.447

Introduction

Coronary artery disease (CAD) is a leading cause of mortality in developed countries and caused 7.3 million deaths worldwide in 2001 (1). It can be attributed to disturbances of the coronary circulation, which supplies oxygen and nutrients to the myocardium, and may involve dysfunction of the microcirculation in addition to the coronary arteries (2). Atherosclerotic plaques that develop along the inner walls of the coronary arteries are a direct cause of the disease, narrowing the arteries and reducing blood flow to the heart. Currently, single-photon emission computed tomography, cardiac magnetic resonance and positron emission tomography perfusion imaging are the three most commonly used diagnostic techniques (3); however, only 20% of CAD cases are diagnosed prior to a heart attack (4). Thus, there is an urgent need for novel, more efficient and reliable diagnostic methods. Biological networks contain small, recurring and conserved network motifs that appear at significantly higher frequencies than in randomized networks (5). Doncic and Skotheim (6) reported that a three-gene motif within a complex network is capable of explaining yeast cellular state decisions in response to mating pheromones, suggesting that it may not be necessary to model the full complexity of biological networks to capture the molecular determinants of cellular behavior. Furthermore, network motifs have proven useful for predicting protein-protein interactions (7), decomposing hierarchical networks (8) and analyzing temporal gene expression patterns (9). Thus, the network motif-based method is a potential approach for CAD research. Microarray analysis is used to detect gene expression changes in patients with CAD (10,11); however, the reproducibility of single-gene methods is often poor (12).
In the present study, a novel network motif-based approach, which is expected to be more reproducible and interpretable than single-gene analysis, was employed to select motifs associated with CAD occurrence. By comparing this method with a conventional method based on individual differentially expressed genes (DEGs) for the classification of CAD and normal control samples, the motif-based method was demonstrated to be efficient and reliable.

Materials and methods

Extraction and preprocessing of gene microarray data

Gene expression profile dataset GSE42148 was downloaded from the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) database. The annotation platform was GPL13607 (Agilent-028004 SurePrint G3 Human GE 8×60K Microarray). The data were collected from whole blood samples of 13 patients with CAD and 11 population-based asymptomatic controls. The extracted microarray data were first log2-transformed and then quantile normalized (13).
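The preprocessing step can be illustrated with a minimal pure-Python sketch, assuming the expression data are held as a genes × samples list of lists. The source does not say which software performed the normalization (it is commonly done with an R/Bioconductor routine), so the function names below are hypothetical and the toy intensities are invented for illustration.

```python
import math


def log2_transform(matrix):
    """log2-transform a genes x samples matrix (list of rows, one row per gene)."""
    return [[math.log2(v) for v in row] for row in matrix]


def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of a genes x samples matrix.

    Every column is sorted, the mean is taken across columns at each rank, and
    each value is replaced by the mean for its rank, so all samples end up with
    the same distribution. Ties are ranked by row order (a simplification).
    """
    n_genes, n_samples = len(matrix), len(matrix[0])
    # orders[j] lists gene indices from smallest to largest value in sample j.
    orders = [sorted(range(n_genes), key=lambda i: matrix[i][j]) for j in range(n_samples)]
    # Mean of the k-th smallest value across all samples.
    rank_means = [
        sum(matrix[orders[j][k]][j] for j in range(n_samples)) / n_samples
        for k in range(n_genes)
    ]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, order in enumerate(orders):
        for k, i in enumerate(order):
            normalized[i][j] = rank_means[k]
    return normalized


raw = [[2, 4, 8], [16, 2, 4], [4, 8, 2]]  # toy intensities: 3 genes x 3 samples
normalized = quantile_normalize(log2_transform(raw))
```

After this step every sample has exactly the same set of values, which removes array-wide intensity differences before genes are compared.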

Screening of common genes and construction of an integrated regulation network

Transcriptional regulation data comprising transcription factors and their target genes were downloaded from the Encyclopedia of DNA Elements (ENCODE) (14), and protein-protein interaction pairs were downloaded from the Human Protein Reference Database (HPRD) (15). The two datasets were integrated in R (version 3.2.2; https://www.r-project.org/) to screen for common genes: any gene present in only the regulation network or only the protein-protein interaction network was removed. The retained common genes were then compared with the genes annotated on the microarray, and only genes present in both were kept. These final common genes and their associated transcription factors were used to construct an integrated regulation network.
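The gene-screening logic amounts to set intersections followed by edge filtering. The sketch below is one reading of the text, not the authors' R code: the source does not say exactly when a transcription factor is retained, so it assumes a regulation edge is kept when its target is a common gene and its TF is measured on the array. The function name and the TF1/G1-style identifiers are hypothetical.

```python
def build_integrated_network(tf_target_pairs, ppi_pairs, array_genes):
    """Restrict ENCODE-style regulation and HPRD-style PPI pairs to shared genes.

    Returns (nodes, tfs, regulation_edges, ppi_edges).
    """
    array_genes = set(array_genes)
    regulation_genes = {gene for pair in tf_target_pairs for gene in pair}
    ppi_genes = {gene for pair in ppi_pairs for gene in pair}
    # Genes found in both source networks and on the microarray.
    common = regulation_genes & ppi_genes & array_genes
    # Assumption: keep a regulation edge if its target is a common gene and its TF is on the array.
    regulation_edges = {
        (tf, target) for tf, target in tf_target_pairs
        if target in common and tf in array_genes
    }
    # PPI edges are undirected: store each as a sorted tuple, keeping only common-gene pairs.
    ppi_edges = {
        tuple(sorted((a, b))) for a, b in ppi_pairs if a in common and b in common
    }
    tfs = {tf for tf, _ in regulation_edges}
    return common | tfs, tfs, regulation_edges, ppi_edges


# Toy input; TF1/TF2 and G1-G9 are hypothetical identifiers.
nodes, tfs, reg_edges, ppi_edges = build_integrated_network(
    tf_target_pairs=[("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3"), ("TF2", "G9")],
    ppi_pairs=[("G2", "G1"), ("G3", "G4"), ("G2", "G5")],
    array_genes={"TF1", "TF2", "G1", "G2", "G3", "G4", "G5"},
)
```

Here G9 is dropped because it has no interaction partner, and G4 and G5 are dropped because they are not regulated by any TF, so only the TF1 and TF2 edges onto G1 to G3 survive.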

Detection of significant three-node network motifs

Three-node network motifs that occurred significantly more frequently in the integrated network than in random networks (P<0.05) were detected using Fast Network Motif Detection (FANMOD; version 2.2; http://theinf1.informatik.uni-jena.de/motifs/) (16). Significance was tested against 1,000 randomized networks, and a motif that contained at least one transcription factor and had P<0.05 was considered significant.
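The idea behind the randomization test can be sketched for a single motif type, here a transcription factor regulating two targets that are linked by an interaction edge (the pattern the Results describe for mode 100110222). This is not FANMOD's algorithm: FANMOD enumerates all three-node subgraphs and uses its own randomization scheme. The sketch assumes degree-preserving swaps of regulation targets and an add-one empirical P-value; all function names are hypothetical.

```python
import random
from itertools import combinations


def count_coregulated_pairs(reg_edges, ppi_edges):
    """Count triads where one TF regulates two targets that share an interaction edge."""
    targets_of = {}
    for tf, target in reg_edges:
        targets_of.setdefault(tf, set()).add(target)
    interactions = {frozenset(edge) for edge in ppi_edges}
    count = 0
    for targets in targets_of.values():
        for a, b in combinations(sorted(targets), 2):
            if frozenset((a, b)) in interactions:
                count += 1
    return count


def rewire_regulation(reg_edges, n_swaps, rng):
    """Degree-preserving randomization: swap the targets of two regulation edges.

    Each TF keeps its out-degree and each target keeps its in-degree; swaps that
    would create a self-loop or a duplicate edge are skipped.
    """
    edges = sorted(reg_edges)  # sorted so results are reproducible for a fixed seed
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (tf1, t1), (tf2, t2) = edges[i], edges[j]
        new1, new2 = (tf1, t2), (tf2, t1)
        if tf1 == t2 or tf2 == t1 or new1 in present or new2 in present:
            continue
        present.difference_update((edges[i], edges[j]))
        present.update((new1, new2))
        edges[i], edges[j] = new1, new2
    return edges


def motif_significance(reg_edges, ppi_edges, n_random=1000, seed=0):
    """Return (observed count, empirical P) against n_random rewired networks."""
    rng = random.Random(seed)
    observed = count_coregulated_pairs(reg_edges, ppi_edges)
    hits = 0
    for _ in range(n_random):
        rewired = rewire_regulation(reg_edges, 10 * len(reg_edges), rng)
        if count_coregulated_pairs(rewired, ppi_edges) >= observed:
            hits += 1
    # Add-one correction so the empirical P-value is never exactly zero.
    return observed, (hits + 1) / (n_random + 1)
```

For example, if one TF regulates five genes that all interact with one another, while a second TF regulates five unconnected genes, the observed count is 10 and very few rewired networks reach it, so the motif comes out significant.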

Screening of differential network motifs

The Bioconductor R package GlobalAncova (http://www.bioconductor.org) was used to screen for network motifs with a significant differential score between the CAD group and the normal control group (P<0.05), based on the microarray gene expression data (17). The significance of each network motif was assessed by comparing its score with random differential scores obtained from 1,000 random permutations.
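A simplified analogue of the motif-level test is sketched below: the between-group and within-group sums of squares are pooled over the three genes of a motif into one F-statistic, and its significance is estimated by permuting the sample labels. This is not GlobalAncova's implementation (which fits linear models and supports covariates); it assumes the 1,000 random disturbances are label permutations, and the function names and toy values are hypothetical.

```python
import random


def pooled_f_statistic(expr, labels):
    """Pooled one-way ANOVA F-statistic over all genes of a motif.

    expr: dict mapping gene -> list of expression values (one per sample).
    labels: group label of each sample, in the same sample order.
    """
    groups = sorted(set(labels))
    n = len(labels)
    ss_between = 0.0
    ss_within = 0.0
    for values in expr.values():
        grand_mean = sum(values) / n
        for group in groups:
            members = [v for v, lab in zip(values, labels) if lab == group]
            group_mean = sum(members) / len(members)
            ss_between += len(members) * (group_mean - grand_mean) ** 2
            ss_within += sum((v - group_mean) ** 2 for v in members)
    df_between = (len(groups) - 1) * len(expr)
    df_within = (n - len(groups)) * len(expr)
    return (ss_between / df_between) / (ss_within / df_within)


def permutation_test(expr, labels, n_perm=1000, seed=0):
    """Return (observed F, empirical P) by permuting the sample labels."""
    rng = random.Random(seed)
    observed = pooled_f_statistic(expr, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pooled_f_statistic(expr, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Pooling over the motif's genes means a motif can reach significance through a coordinated shift of its members even when no single gene would pass a DEG threshold on its own.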

Functional analysis of genes involved in the differential network motifs

Genes involved in the differential network motifs were subjected to pathway enrichment analysis based on the Kyoto Encyclopedia of Genes and Genomes (KEGG; www.genome.jp/kegg) (18) and to functional annotation analysis based on the Gene Ontology (GO) database (19), using the Database for Annotation, Visualization and Integrated Discovery (DAVID; version 6.7; https://david.ncifcrf.gov/) (20). P<0.05 was used as the cut-off.
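Term enrichment of this kind rests on an over-representation test. The sketch below shows a plain one-sided hypergeometric test with the P<0.05 cut-off; DAVID itself reports the EASE score, a more conservative modification of Fisher's exact test, so P-values from this sketch will not match DAVID's exactly. The function names and toy gene sets are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric P(X >= k).

    N: background genes, K: background genes annotated to the term,
    n: genes in the query list, k: query genes annotated to the term.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def enriched_terms(term_genes, query, background, cutoff=0.05):
    """Return {term: P} for terms with P < cutoff (no multiple-testing correction)."""
    background = set(background)
    query = set(query) & background
    results = {}
    for term, genes in term_genes.items():
        annotated = set(genes) & background
        k = len(query & annotated)
        if k == 0:
            continue
        p = enrichment_p_value(k, len(query), len(annotated), len(background))
        if p < cutoff:
            results[term] = p
    return results


background = {f"G{i}" for i in range(10)}
terms = {"termX": ["G0", "G1", "G2"], "termY": ["G3", "G4", "G5", "G6", "G7"]}
hits = enriched_terms(terms, ["G0", "G1", "G2"], background)
```

In this toy case all three query genes fall in termX, which has probability 1/120 by chance, so only termX passes the cut-off.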

Comparison of the two classification methods

Differential expression analysis of 13 patients with CAD and 11 asymptomatic controls was performed using the significance analysis of microarrays (SAM) method (21), and genes with a false discovery rate <0.05 were considered DEGs. The patient and control samples were then classified based on the resulting DEGs using hierarchical clustering. The samples were also clustered based on the expression values of the top three-node network motifs detected in the present study, where the expression value of a motif was defined as the mean of the expression values of its three genes. The clustering results of the two methods were then compared.
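The motif-based classification can be sketched as follows: each motif is reduced to one value per sample (the mean of its three genes, as defined in the text), and samples are then grouped by agglomerative clustering. The source does not name the distance metric or linkage rule, so Euclidean distance and average linkage are assumptions here; the function names, gene names and expression values are hypothetical toy data.

```python
def motif_expression(expr, motif):
    """Per-sample motif value: mean expression of the motif's three genes."""
    g1, g2, g3 = motif
    return [(a + b + c) / 3 for a, b, c in zip(expr[g1], expr[g2], expr[g3])]


def sample_profiles(expr, motifs):
    """One feature vector per sample, with one entry per motif."""
    columns = [motif_expression(expr, m) for m in motifs]
    return [list(values) for values in zip(*columns)]


def euclidean(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5


def average_linkage(profiles, n_clusters=2):
    """Agglomerative clustering with average linkage; returns sets of sample indices."""
    clusters = [{i} for i in range(len(profiles))]

    def linkage(c1, c2):
        total = sum(euclidean(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average inter-sample distance.
        a, b = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters


# Toy data: 6 genes x 4 samples; samples 0-1 and 2-3 have opposite patterns.
expr = {"G1": [5, 5, 1, 1], "G2": [6, 6, 2, 2], "G3": [7, 7, 3, 3],
        "G4": [1, 1, 5, 5], "G5": [2, 2, 6, 6], "G6": [3, 3, 7, 7]}
motifs = [("G1", "G2", "G3"), ("G4", "G5", "G6")]
clusters = average_linkage(sample_profiles(expr, motifs), n_clusters=2)
```

Averaging within a motif reduces each group of three genes to a single feature, which is why the motif-based clustering needs far fewer inputs than the DEG-based one.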

Results

Genes annotated from gene microarrays and detection of three-node network motifs

A total of 27,531 genes were annotated from the gene microarray data GSE42148. An integrated regulation network was constructed using the final common genes and their associated transcription factors. The network consisted of 13,133 genes, including 76 transcription factors, and 60,709 interaction pairs, comprising 24,573 transcriptional regulation pairs and 36,136 protein-protein interaction pairs. Finally, 9,008 three-node network motifs involving a total of 2,774 genes were detected, each containing at least one transcription factor. These network motifs may be categorized into 22 modes according to their internal pattern of molecular interactions (Fig. 1).

Figure 1.

Interaction patterns (n=22) involved in the three-node network motifs. A red node represents a transcription factor and a green node represents a non-transcription factor gene; a red edge represents transcriptional regulation and a green edge represents co-expression between two nodes.

A total of 1,132 differentially expressed network motifs involving 697 genes were screened between the CAD group and the control group using the R package GlobalAncova. Of these 1,132 differential network motifs, 304 (26.86%) shared the interaction pattern 100110222, in which one transcription factor regulates two co-expressed target genes. The motif consisting of structural maintenance of chromosomes 3 (SMC3), CCAAT/enhancer binding protein beta (CEBPB) and tribbles pseudokinase 1 (TRIB1), and that consisting of IQCG, Myc associated factor X (MAX) and BAT3 [also known as BCL2 associated athanogene 6 (BAG6)], were of this type. Among the top 20 network motifs, 10 shared this mode. Furthermore, 228/1,132 (20.14%) differential network motifs shared the interaction pattern 100220212, in which two transcription factors regulate the same target gene. Among the top 20 network motifs, six shared this interaction mode, including the motif consisting of USF1, USF2 and zinc finger protein 507 (ZNF507; Table I).

Table I.

Top 20 three-gene network motifs.

Interaction mode ID	Gene 1	Gene 2	Gene 3	F-value	P-value[a]
100220212	USF2	ZNF507	USF1	9.2369	<0.001
100220212	SMARCA4	SMARCB1	H3F3B	7.4731	<0.001
100110222	CTCF	MAP2K1	TRIB1	12.1154	<0.001
100220212	SMARCA4	SMARCB1	HIST1H2AE	8.9551	<0.001
100220212	TBP	H3F3B	NFYB	11.2600	<0.001
100220212	TBP	C1orf55	GTF2B	9.6064	<0.001
100110222	RAD23B	UBB	CEBPB	7.4253	<0.001
100110222	RAD23B	TAF1	ZFAND5	6.9660	<0.001
100220222	SMC3	CEBPB	TRIB1	13.4760	<0.001
100110222	IQCG	MAX	BAT3	10.7233	<0.001
100110222	TCF12	FADD	CFLAR	6.8301	<0.001
101110022	SGK1	SMARCA4	CREB1	8.1492	<0.001
100210212	ETS1	SP1	HIST1H2AE	6.8723	<0.001
100220222	CTCF	CEBPB	TRIB1	12.1933	<0.001
100110222	TAF1	EIF3D	EIF5	15.9259	<0.001
100110222	TAF1	CSNK2B	EIF5	17.2414	<0.001
100110222	TAF1	EIF4G2	EIF5	17.0751	<0.001
100210212	NR3C1	CEBPB	TRIB1	12.7046	<0.001
100220222	TRIB1	CEBPB	RAD21	14.0708	<0.001
100110222	GATA2	FOXO3	SGK1	6.8563	0.001

[a]P-values were determined by permutation testing.

The 697 genes involved in the 1,132 differential network motifs were enriched in 154 GO terms, including 21 molecular function terms, 14 cellular component terms and 119 biological process (BP) terms, and in 14 KEGG pathways, including cancer pathways such as small cell lung cancer and the mitogen-activated protein kinase (MAPK) signaling pathway. Numerous genes, including FADD and MYC, were enriched in apoptosis-related BP terms. Furthermore, 43 genes, including FADD and MAX, were enriched in the KEGG signaling pathway.

Comparison of the two classification methods

A total of 336 DEGs were screened using the SAM method. In the hierarchical clustering based on these 336 DEGs, one control sample clustered in the patient group and three patient samples clustered in the control group (Fig. 2). By contrast, in the clustering based on the top 20 three-node network motifs, two patient samples clustered in the control group and no control samples clustered in the patient group (Fig. 3). Therefore, clustering based on the screened network motifs was more accurate, misplacing 2 rather than 4 of the 24 samples. Furthermore, only the 38 genes in the top 20 network motifs were used, compared with the 336 DEGs used for clustering based on individual genes.
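The figures in this comparison can be checked directly. The short sketch below copies the gene triples from Table I, confirms that the top 20 motifs contain 38 distinct genes, and converts the misplaced-sample counts into an accuracy, assuming a sample counts as misplaced when it falls in the cluster dominated by the other group. The helper name `accuracy` is hypothetical.

```python
# Gene triples of the top 20 motifs, transcribed from Table I.
TOP20_MOTIFS = [
    ("USF2", "ZNF507", "USF1"), ("SMARCA4", "SMARCB1", "H3F3B"),
    ("CTCF", "MAP2K1", "TRIB1"), ("SMARCA4", "SMARCB1", "HIST1H2AE"),
    ("TBP", "H3F3B", "NFYB"), ("TBP", "C1orf55", "GTF2B"),
    ("RAD23B", "UBB", "CEBPB"), ("RAD23B", "TAF1", "ZFAND5"),
    ("SMC3", "CEBPB", "TRIB1"), ("IQCG", "MAX", "BAT3"),
    ("TCF12", "FADD", "CFLAR"), ("SGK1", "SMARCA4", "CREB1"),
    ("ETS1", "SP1", "HIST1H2AE"), ("CTCF", "CEBPB", "TRIB1"),
    ("TAF1", "EIF3D", "EIF5"), ("TAF1", "CSNK2B", "EIF5"),
    ("TAF1", "EIF4G2", "EIF5"), ("NR3C1", "CEBPB", "TRIB1"),
    ("TRIB1", "CEBPB", "RAD21"), ("GATA2", "FOXO3", "SGK1"),
]


def accuracy(n_samples, n_misplaced):
    """Fraction of samples that fall in the cluster dominated by their own group."""
    return (n_samples - n_misplaced) / n_samples


n_samples = 13 + 11  # CAD patients + controls
unique_genes = {gene for motif in TOP20_MOTIFS for gene in motif}
print(len(unique_genes))                     # 38
print(round(accuracy(n_samples, 1 + 3), 3))  # DEG-based clustering: 0.833
print(round(accuracy(n_samples, 2 + 0), 3))  # motif-based clustering: 0.917
```

The 20 motifs contribute 60 gene slots but only 38 distinct genes, because hubs such as CEBPB, TRIB1 and TAF1 recur across several motifs.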

Figure 2.

Clustering dendrogram based on individual differentially expressed genes.

Figure 3.

Clustering dendrogram based on the top 20 three-node network motifs.

Discussion

In the present study, 1,132 of the 9,008 network motifs differed significantly between the CAD group and the normal control group, indicating that these motifs may be associated with the occurrence of CAD. Several studies have identified apoptosis in atherosclerosis, which is a typical pathological feature in patients with CAD (22–24). In the present study, various genes, including FADD, were enriched in BP terms related to the regulation of apoptosis. FADD has previously been observed to have a role in cancer pathways (25). Here, FADD and CASP8 and FADD-like apoptosis regulator (CFLAR) were both regulated by transcription factor 12 (TCF12) in one network motif. FADD mediates a death signaling pathway by binding to the death domain of the Fas receptor (26) and subsequently recruiting caspase-8 (CASP8). CFLAR encodes c-FLIP, a regulator of apoptosis that is structurally similar to caspase-8 but lacks caspase activity; c-FLIP is able to interfere with death receptor signaling by binding to FADD (27). c-FLIP expression has been reported in the smooth muscle cells of normal human coronary arteries, and c-FLIP downregulation has been observed in human atherosclerotic atheroma (28). As these two co-expressed genes are both regulated by TCF12, which encodes a member of the basic helix-loop-helix (bHLH) E-protein family that recognizes the E-box consensus binding site CANNTG, this transcription factor may be associated with the occurrence of CAD. BAT3 (also known as BAG6), a member of the BCL2 associated athanogene family of proteins, has also been reported to regulate apoptosis by modulating ubiquitin-mediated proteolysis of the Xenopus laevis elongation factor 1α oocyte form in Xenopus embryos (29). However, to the best of our knowledge, its role in atherosclerosis has yet to be reported.
In the motif consisting of IQCG, MAX and BAT3, IQCG and BAT3 are regulated by MAX. MAX encodes a transcription factor of the bHLH leucine zipper family that is a binding partner of c-Myc (30). c-Myc/MAX expression has been observed to be elevated by oxidized low-density lipoprotein, which is able to promote atherogenesis (31). In the present study, MAX was also implicated in CAD through cancer pathways, including the small cell lung cancer and MAPK signaling pathways. IQCG was the second gene regulated by MAX in this motif; as it is co-expressed with BAT3, it may also have a role in apoptosis. Furthermore, dyslipidemia is recognized as an important risk factor for CAD in the general population (32). The upstream stimulatory factors USF1 and USF2 form a major transcription factor family that binds target genes via the E-box element (33). USF1 regulates numerous genes involved in lipid and glucose metabolism (34) and has been confirmed to contribute to aortic atherosclerosis (35). However, the role of USF2 in atherosclerosis has seldom been reported. ZNF507 expression was significantly altered in response to elevated homocysteine levels (36), which are considered to increase the risk of atherosclerosis (37). As the two transcription factors were co-expressed in the network motif, it may be inferred that USF2 also has a role in atherosclerosis. In conclusion, at least one gene in each of the network motifs discussed above was associated with pathological characteristics of CAD, suggesting that these network motifs may be useful as markers for identifying CAD. This was further supported by the clustering results, which revealed that the network motif-based method was more accurate and efficient than the conventional method based on individual DEGs.
Therefore, the network motif-based method in combination with gene expression data may be a promising method for the diagnosis of CAD.

References (34 in total)

1. Role of oxidized low density lipoprotein in atherogenesis.

Authors: J L Witztum; D Steinberg
Journal: J Clin Invest Date: 1991-12 Impact factor: 14.808

2. FADD, a novel death domain-containing protein, interacts with the death domain of Fas and initiates apoptosis.

Authors: A M Chinnaiyan; K O'Rourke; M Tewari; V M Dixit
Journal: Cell Date: 1995-05-19 Impact factor: 41.582

3. Apoptosis of human vascular smooth muscle cells derived from normal vessels and coronary atherosclerotic plaques.

Authors: M R Bennett; G I Evan; S M Schwartz
Journal: J Clin Invest Date: 1995-05 Impact factor: 14.808

4. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors: Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal: Proc Natl Acad Sci U S A Date: 2005-09-30 Impact factor: 11.205

5. TRAIL receptors 1 (DR4) and 2 (DR5) signal FADD-dependent apoptosis and activate NF-kappaB.

Authors: P Schneider; M Thome; K Burns; J L Bodmer; K Hofmann; T Kataoka; N Holler; J Tschopp
Journal: Immunity Date: 1997-12 Impact factor: 31.745

6. Feedforward regulation ensures stability and rapid reversibility of a cellular state.

Authors: Andreas Doncic; Jan M Skotheim
Journal: Mol Cell Date: 2013-05-16 Impact factor: 17.970

7. Genetic association and interaction analysis of USF1 and APOA5 on lipid levels and atherosclerosis.

Authors: Pirkka-Pekka Laurila; Jussi Naukkarinen; Kati Kristiansson; Samuli Ripatti; Tuuli Kauttu; Kaisa Silander; Veikko Salomaa; Markus Perola; Pekka J Karhunen; Philip J Barter; Christian Ehnholm; Leena Peltonen
Journal: Arterioscler Thromb Vasc Biol Date: 2009-11-12 Impact factor: 8.311

8. Architecture of the human regulatory network derived from ENCODE data.

Authors: Mark B Gerstein; Anshul Kundaje; Manoj Hariharan; Stephen G Landt; Koon-Kiu Yan; Chao Cheng; Xinmeng Jasmine Mu; Ekta Khurana; Joel Rozowsky; Roger Alexander; Renqiang Min; Pedro Alves; Alexej Abyzov; Nick Addleman; Nitin Bhardwaj; Alan P Boyle; Philip Cayting; Alexandra Charos; David Z Chen; Yong Cheng; Declan Clarke; Catharine Eastman; Ghia Euskirchen; Seth Frietze; Yao Fu; Jason Gertz; Fabian Grubert; Arif Harmanci; Preti Jain; Maya Kasowski; Phil Lacroute; Jing Jane Leng; Jin Lian; Hannah Monahan; Henriette O'Geen; Zhengqing Ouyang; E Christopher Partridge; Dorrelyn Patacsil; Florencia Pauli; Debasish Raha; Lucia Ramirez; Timothy E Reddy; Brian Reed; Minyi Shi; Teri Slifer; Jing Wang; Linfeng Wu; Xinqiong Yang; Kevin Y Yip; Gili Zilberman-Schapira; Serafim Batzoglou; Arend Sidow; Peggy J Farnham; Richard M Myers; Sherman M Weissman; Michael Snyder
Journal: Nature Date: 2012-09-06 Impact factor: 49.962

9. Growing epidemic of coronary heart disease in low- and middle-income countries.

Authors: Thomas A Gaziano; Asaf Bitton; Shuchi Anand; Shafika Abrahams-Gessel; Adrianna Murphy
Journal: Curr Probl Cardiol Date: 2010-02 Impact factor: 5.200

10. Considerations when using the significance analysis of microarrays (SAM) algorithm.

Authors: Ola Larsson; Claes Wahlestedt; James A Timmons
Journal: BMC Bioinformatics Date: 2005-05-29 Impact factor: 3.169