Prognostic genes of breast cancer revealed by gene co-expression network analysis.

Huijie Shi¹, Lei Zhang², Yanjun Qu¹, Lifang Hou¹, Ling Wang¹, Min Zheng¹.

Abstract

The aim of the present study was to identify genes that may serve as markers for breast cancer prognosis by constructing a gene co-expression network and mining modules associated with survival. Two gene expression datasets of breast cancer were downloaded from ArrayExpress, and genes from these datasets with a coefficient of variation >0.5 were selected and subjected to functional enrichment analysis with the Database for Annotation, Visualization and Integrated Discovery. Gene co-expression networks were constructed with the WGCNA package in R. Modules were identified from the network via cluster analysis. Cox regression was conducted to analyze survival rates. A total of 2,669 genes were selected, and functional enrichment analysis revealed that they were mainly associated with the immune response, cell proliferation, cell differentiation and cell adhesion. Seven modules were identified from the gene co-expression network, one of which was found to be significantly associated with patient survival time. The expression status of 144 genes from this module was used to cluster patient samples into two groups, and a significant difference in survival time was observed between these groups. These genes were involved in the cell cycle and the tumor protein p53 signaling pathway. The top 10 hub genes were identified in the module. The findings of the present study could advance the understanding of the molecular pathogenesis of breast cancer.

Keywords: breast cancer; functional enrichment analysis; gene co-expression network; hub genes; survival analysis

Year: 2017 PMID: 29085450 PMCID: PMC5649579 DOI: 10.3892/ol.2017.6779

Source DB: PubMed Journal: Oncol Lett ISSN: 1792-1074 Impact factor: 2.967

Introduction

Breast cancer is the most common type of cancer in women, accounting for 25% of all cases (1). Risk factors include lifestyle (such as smoking or diet), genetics and medical conditions. A number of treatment methods are now available for breast cancer, including surgery, radiotherapy, chemotherapy, hormone therapy and targeted therapy. However, certain patients have a poor prognosis and the molecular mechanisms underlying this remain unclear. Prognostic factors include disease stage and grade, recurrence of the disease, and the age and health of the patient. With advances in technology and the accumulation of research results, certain molecular markers associated with breast cancer have been well studied. Tumor protein p53 mutations are poor prognostic factors in breast cancer (2). MYC proto-oncogene and bHLH transcription factor-driven accumulation of 2-hydroxyglutarate are associated with poor breast cancer prognosis (3). Prostaglandin-endoperoxide synthase 2 expression predicts worse breast cancer prognosis (4). Ki-67 has been associated with disease-free survival, but its prognostic value remains to be validated (5). Matrix metalloproteinase-8 gene variation may influence breast cancer prognosis and can have an inhibitory effect on cancer metastasis (6). A gene signature involved in tumor-immune interactions may provide a more accurate prognostic tool (7). Zhang et al (8) performed a meta-analysis and demonstrated that overexpression of C-X-C motif chemokine receptor 4 was significantly associated with lymph node status and distant metastasis, indicating poor overall and disease-free survival. SRY-box 4 overexpression is a biomarker for malignant status and poor prognosis in breast cancer patients (9). A number of other novel biomarkers have also been identified, including chromobox homolog 1 (10), HOX transcript antisense intergenic RNA (9) and anterior gradient 3 (11).
Nevertheless, more prognostic genes are required to further improve treatment decisions and thus the quality of life of patients with breast cancer. Microarray technology has been widely used to identify biomarkers of breast cancer (12,13), allowing for the large-scale screening of molecular markers. In the present study, two gene expression datasets were obtained to reveal prognostic genes (14,15). One dataset was originally generated to identify genes associated with the distant metastasis of lymph-node-negative primary breast cancer (14); the other was generated to identify genes involved in response and survival following taxane-anthracycline chemotherapy in breast cancer (15). Both datasets were used to construct gene co-expression networks and analyze survival time in order to identify novel biomarkers associated with breast cancer prognosis.

Materials and methods

Raw data and pre-treatment

Two gene expression datasets, GSE2034 (14) and GSE25066 (15), were downloaded from ArrayExpress (https://www.ebi.ac.uk/arrayexpress/). Dataset GSE2034 included 286 breast cancer samples and dataset GSE25066 included 508 breast cancer samples. Both datasets were generated on the Affymetrix GPL96 platform. Normalization was performed with the rma function from the affy package (16) in R (R 3.2.0; https://www.r-project.org/), followed by log2 conversion. Probes were mapped onto genes according to the annotation files. The expression values of probes mapping to the same gene were averaged to give the expression level of that gene.
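The probe-to-gene averaging step can be illustrated with a minimal Python sketch. This is not the authors' R code; the probe identifiers, gene symbols and expression values below are hypothetical toy data, and the choice to skip probes without an annotation is an assumption.

```python
from collections import defaultdict


def average_probes_per_gene(probe_expr, probe_to_gene):
    """Collapse probe-level log2 expression to gene level by averaging.

    probe_expr:    {probe_id: [value per sample]}
    probe_to_gene: {probe_id: gene_symbol}
    Probes without an annotation are skipped (an assumption of this sketch).
    """
    grouped = defaultdict(list)
    for probe_id, values in probe_expr.items():
        gene = probe_to_gene.get(probe_id)
        if gene is not None:
            grouped[gene].append(values)
    # Average sample by sample across all probes of the same gene.
    return {
        gene: [sum(column) / len(column) for column in zip(*rows)]
        for gene, rows in grouped.items()
    }


# Hypothetical toy data: four probes measured in two samples.
probe_expr = {
    "p1": [8.0, 10.0],
    "p2": [6.0, 12.0],
    "p3": [5.0, 5.5],
    "p4": [1.0, 1.0],  # no annotation, so it is dropped
}
probe_to_gene = {"p1": "GENE_A", "p2": "GENE_A", "p3": "GENE_B"}
gene_expr = average_probes_per_gene(probe_expr, probe_to_gene)
```

Here the two probes of GENE_A are averaged within each sample, whereas GENE_B keeps the values of its single probe.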

Functional enrichment analysis

Gene Ontology (GO) annotation and pathway enrichment analysis were performed with DAVID (Database for Annotation, Visualization and Integrated Discovery; http://david.abcc.ncifcrf.gov/) (17).

Gene co-expression network and modules

The gene co-expression network was constructed with the WGCNA package (18) in R. Following the standard WGCNA definitions (18), the adjacency coefficient $a_{ij}$ was calculated as follows:

$$S_{ij} = \left|\mathrm{cor}(x_i, x_j)\right|, \qquad a_{ij} = S_{ij}^{\beta}$$

where $x_i$ and $x_j$ are the vectors of expression values for genes $i$ and $j$; cor represents the Pearson's correlation coefficient of the two vectors; and $a_{ij}$ is the adjacency coefficient, acquired via an exponential (power) transform of $S_{ij}$ with exponent $\beta$.

The WGCNA method takes topological properties into consideration to identify modules from gene co-expression networks. Therefore, this method not only considers the association between the two connected nodes, but also takes the genes associated with them into account. It calculates the weighting coefficient $W_{ij}$ from $a_{ij}$ as follows:

$$W_{ij} = \frac{l_{ij} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}, \qquad l_{ij} = \sum_{u} a_{iu} a_{uj}, \qquad k_i = \sum_{u} a_{iu}$$

$W_{ij}$ considers the overlap between the neighbor genes of genes $i$ and $j$. Modules were identified via hierarchical clustering of the weighting coefficient matrix, W.
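The two coefficients can be sketched in plain Python. The sketch assumes the standard unsigned WGCNA definitions, namely a_ij = |cor(x_i, x_j)|^β and W_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with l_ij the sum of a_iu·a_uj over all genes u and k_i the sum of a_iu. The value β = 2 and the three toy expression profiles are hypothetical, and the code does not reproduce the WGCNA package itself.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency_matrix(expr, beta):
    """a_ij = |cor(x_i, x_j)| ** beta, with zeros on the diagonal."""
    n = len(expr)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = a[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return a


def topological_overlap(a):
    """W_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(a)
    k = [sum(row) for row in a]  # connectivity (degree) of each gene
    w = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Shared-neighbor term; the zero diagonal excludes u = i and u = j.
            l_ij = sum(a[i][u] * a[u][j] for u in range(n))
            w[i][j] = w[j][i] = (l_ij + a[i][j]) / (min(k[i], k[j]) + 1.0 - a[i][j])
    return w


# Hypothetical toy data: three genes (rows) measured in four samples.
expr = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 3.0, 2.0, 4.0],
]
a = adjacency_matrix(expr, beta=2)  # beta = 2 is an illustrative choice
w = topological_overlap(a)
```

A dissimilarity of 1 - W_ij would then serve as the input to hierarchical clustering for module detection.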

Survival analysis

Cox regression was performed with hub genes from the modules to identify survival-associated genes, and Kaplan-Meier survival analysis was used to compare the survival time of different groups. Both analyses were performed with the survival package in R (https://cran.r-project.org/web/views/Survival.html). P<0.05 was considered to indicate a statistically significant difference. Pearson's correlation was calculated with the cor function in R (19).
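For illustration, the Kaplan-Meier estimate of the survival function can be written in a few lines of Python. This sketch only shows the product-limit calculation for a single group; it does not reproduce the survival package in R, and the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed event.

    times:  follow-up time per patient
    events: 1 if the event was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        observed = 0  # events at time t
        removed = 0   # events plus censored patients leaving the risk set at t
        while i < len(data) and data[i][0] == t:
            observed += data[i][1]
            removed += 1
            i += 1
        if observed:
            survival *= 1.0 - observed / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical toy data: five patients, two of them censored.
times = [2, 3, 3, 5, 8]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
```

Running the function once per patient group gives the curves that a log-rank test or Cox model would then compare.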

Results

Gene expression data

A total of 13,191 genes were identified in the GSE2034 and GSE25066 datasets, for which box plots are presented in Fig. 1. According to the box plots, the average total mRNA expression level was consistent across samples, indicating that normalization performed well for both datasets.

Figure 1.

Box plots of normalized gene expression data of two datasets. (A) GSE2034 (286 samples) and (B) GSE25066 (200 samples randomly selected from the total 508 samples). The average total mRNA expression level in each sample was consistent, indicating that a good performance of normalization was achieved. The x-axis represents the gene expression level; the y-axis represents the samples.

A total of 2,669 genes with coefficient of variation (CV) >0.5 were selected. Functional enrichment analysis revealed that they were associated primarily with immune response, cell proliferation, cell differentiation and cell adhesion (Table I).
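The CV filter can be sketched as follows in Python. The CV is taken here as the sample standard deviation divided by the mean; whether the original analysis used the sample or the population standard deviation is not stated, so that choice is an assumption, as are the gene names and expression values.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumes a positive mean)."""
    return statistics.stdev(values) / statistics.mean(values)


def select_variable_genes(gene_expr, threshold=0.5):
    """Keep genes whose CV across samples exceeds the threshold."""
    return [
        gene
        for gene, values in gene_expr.items()
        if coefficient_of_variation(values) > threshold
    ]


# Hypothetical toy data: two genes measured in four samples.
gene_expr = {
    "GENE_A": [2.0, 4.0, 6.0, 12.0],   # CV of about 0.72, retained
    "GENE_B": [9.0, 10.0, 11.0, 10.0], # CV of about 0.08, discarded
}
selected = select_variable_genes(gene_expr, threshold=0.5)
```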

Table I.

Top 15 significantly over-represented biological pathways.

ID	Description	P-value	Adjusted P-value
GO:0006955	Immune response	2.19×10⁻⁶³	3.27×10⁻⁶¹
GO:0006952	Defense response	2.15×10⁻⁵⁷	2.80×10⁻⁵⁵
GO:0006950	Response to stress	1.57×10⁻⁵⁶	1.96×10⁻⁵⁴
GO:0007166	Cell-surface receptor signaling pathway	1.20×10⁻⁵⁵	1.43×10⁻⁵³
GO:0008283	Cell proliferation	1.06×10⁻⁴⁹	1.09×10⁻⁴⁷
GO:0002682	Regulation of immune system process	6.12×10⁻⁴²	4.82×10⁻⁴⁰
GO:0016477	Cell migration	7.58×10⁻⁴⁰	5.66×10⁻³⁸
GO:0045321	Leukocyte activation	1.90×10⁻³⁹	1.32×10⁻³⁷
GO:0006954	Inflammatory response	3.92×10⁻³⁸	2.66×10⁻³⁶
GO:0048584	Positive regulation of response to stimulus	6.10×10⁻³⁸	4.05×10⁻³⁶
GO:0042127	Regulation of cell proliferation	1.72×10⁻³⁷	1.10×10⁻³⁵
GO:0030154	Cell differentiation	3.01×10⁻³⁴	1.70×10⁻³²
GO:0048869	Cellular developmental process	2.06×10⁻³³	1.14×10⁻³¹
GO:0007155	Cell adhesion	7.77×10⁻³³	4.22×10⁻³¹
GO:0022610	Biological adhesion	1.11×10⁻³²	5.90×10⁻³¹

Adjusted P-value: Using multiple comparisons in a general linear model analysis of variance, the adjusted P-value indicates which factor level comparisons within a family of comparisons (hypothesis tests) are significantly different.

Prognostic genes

Two gene co-expression networks were constructed for the two datasets by WGCNA (Fig. 2). Seven modules were identified from the network of GSE2034 via hierarchical clustering of the weighting coefficient matrix, W (Fig. 3). The modules were termed the red, blue, green, black, brown, yellow and turquoise modules.

Figure 2.

Gene co-expression networks for datasets GSE2034 (left) and GSE25066 (right). The x-axis represents the degree of the node, k, while the y-axis represents proportion of genes with degree of k, p (k).

Figure 3.

Seven modules identified from the gene co-expression network. Cluster analysis result is shown above and module identification shown below.

The degree, k, for each gene in the module was calculated and the P-value of Cox regression between each gene and survival was also determined. Next, the correlation between k and -log10 (P) was calculated. The yellow module exhibited a significant correlation with survival time in dataset GSE2034 (P=9.3×10⁻¹³) (Fig. 4A), which was also observed in dataset GSE25066 (P=9.3×10⁻⁶) (Fig. 4B). In addition, survival-associated genes (P<0.05 in Cox regression) were significantly over-represented in the yellow module in both datasets (Fig. 5). Therefore, the yellow module was considered to be significantly associated with breast cancer patient survival and warrants further investigation to understand the association between survival time and critical gene expression.
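The correlation between degree and survival significance within a module can be sketched in Python as below. The degrees and Cox regression P-values are hypothetical toy values, chosen so that -log10 (P) rises linearly with k; the sketch is not the authors' R code.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def degree_significance_correlation(degrees, p_values):
    """Correlate gene degree k with -log10 of the Cox regression P-value."""
    neg_log_p = [-math.log10(p) for p in p_values]
    return pearson(degrees, neg_log_p)


# Hypothetical toy module of four genes.
degrees = [2.0, 4.0, 6.0, 8.0]
cox_p_values = [0.1, 0.01, 0.001, 0.0001]
r = degree_significance_correlation(degrees, cox_p_values)
```

A correlation close to 1 indicates that the more highly connected genes of the module tend to be the ones most strongly associated with survival.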

Figure 4.

Scatter plots of the degree and P-value of Cox regression in datasets (A) GSE2034 and (B) GSE25066. The x-axis indicates the degree of regression, the y-axis indicates the P-value. Each circle represents a gene.

Figure 5.

Survival-associated genes in each module. The x-axis indicates the module, the y-axis indicates the significance of over-representation.
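The test used to assess over-representation is not specified in the text. One common choice, assumed here purely for illustration, is the upper tail of the hypergeometric distribution, which can be sketched in Python as follows; all counts in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for a module of n genes drawn from N genes in total,
    of which K are survival-associated and k of those fall in the module."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical counts: 10 genes in total, 4 survival-associated,
# and a module of 3 genes containing 2 survival-associated genes.
p_value = hypergeom_upper_tail(k=2, K=4, n=3, N=10)
```

A small P-value would indicate that the module contains more survival-associated genes than expected by chance.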

The 144 genes from the yellow module were used in the cluster analysis of samples from dataset GSE2034, which separated the patient samples into two groups based on the expression of these genes (Fig. 6). A significant difference in survival time was observed between the two groups (P=0.008; Fig. 7). Functional enrichment analysis indicated that the 144 genes from the yellow module were involved in cell cycle, oocyte meiosis, the tumor protein p53 signaling pathway and progesterone-mediated oocyte maturation (Table II).

Figure 6.

Cluster analysis using the degree of expression of 144 survival-associated genes for the samples in the GSE2034 dataset.

Figure 7.

Survival curves for the two groups of breast cancer patient samples clustered according to expression of the 144 genes.

Table II.

KEGG pathways enriched in the 144 genes of the yellow module.

ID	Description	P-value	Adjusted P-value
hsa04110	Cell cycle	5.22×10⁻¹⁸	3.13×10⁻¹⁷
hsa04114	Oocyte meiosis	2.17×10⁻⁹	6.50×10⁻⁹
hsa04115	p53 signaling pathway	2.46×10⁻⁵	4.91×10⁻⁵
hsa04914	Progesterone-mediated oocyte maturation	9.19×10⁻⁵	1.38×10⁻⁴

p53, tumor protein p53. Adjusted P-value: Using multiple comparisons in a general linear model analysis of variance, the adjusted P-value indicates which factor level comparisons within a family of comparisons (hypothesis tests) are significantly different.

The top 10 hub genes from the yellow module were selected (Table III) and included cyclin B2 (CCNB2), ubiquitin-conjugating enzyme E2C (UBE2C), protein regulator of cytokinesis 1 (PRC1), cell division cycle 20 (CDC20), abnormal spindle microtubule assembly (ASPM), forkhead box M1 (FOXM1), kinesin family member 4A (KIF4A), nucleolar and spindle associated protein 1 (NUSAP1), pituitary tumor-transforming 1 (PTTG1) and centrosomal protein 55 kDa (CEP55). All of these genes were significantly associated with survival time in the two datasets.

Table III.

Top 10 hub genes in the yellow module.

Dataset	Gene name	Coefficient	P-value	k_Total	k_Within
GSE2034	CCNB2	0.3640	0.0003	14.7998	12.4392
	PRC1	0.3868	0.0005	12.9603	11.3677
	UBE2C	0.4281	0.0006	14.1236	11.2433
	ASPM	0.3442	0.0002	12.9467	10.9328
	CDC20	0.2339	0.0065	14.6847	10.7527
	FOXM1	0.1988	0.0168	13.7352	10.7131
	CEP55	0.3691	0.0004	12.6988	10.6131
	KIF4A	0.2648	0.0217	12.1095	10.3165
	NUSAP1	0.3931	0.0012	11.7988	10.2885
	PTTG1	0.4027	0.0019	12.4981	10.2449
GSE25066	CCNB2	0.3239	0.0006		9.4109
	PRC1	0.2760	0.0023		6.6109
	UBE2C	0.3819	0.0003		6.1036
	ASPM	0.2079	0.0031		4.9210
	CDC20	0.3290	0.0000		8.5936
	FOXM1	0.1710	0.0091		5.9345
	CEP55	0.3044	0.0002		6.3694
	KIF4A	0.5682	0.0001		3.1945
	NUSAP1	0.2700	0.0061		6.7332
	PTTG1	0.7918	0.0000		4.0029

Discussion

Two gene expression datasets of breast cancer were obtained and the 2,669 differentially expressed genes with a CV >0.5 were selected. These genes were implicated in the immune response, cell proliferation and cell migration. These functions were closely associated with the development and metastasis of cancer. A breast-cancer-specific gene co-expression network was constructed for dataset GSE2034, from which 7 modules were identified. The yellow module was closely associated with survival time and, as such, the 144 genes from yellow module were investigated further. These genes were primarily involved in the cell cycle and tumor protein p53 signaling pathway. The top 10 hub genes were identified in the yellow module, all of which were associated with poor patient prognosis. The majority of the 10 critical genes in the yellow module are associated with the cell cycle. CCNB2 is an essential component of the cell-cycle regulatory machinery (20). Elevated CCNB2 expression in invasive breast cancer is associated with unfavorable clinical outcomes (21). UBE2C is required for the degradation of mitotic cyclins and for cell-cycle progression, and is involved in cancer progression. UBE2C is highly expressed in breast microcalcification lesions (22). The prognostic value of UBE2C has been validated in several studies (23–25). microRNA-196a post-transcriptionally upregulates UBE2C and promotes cell proliferation in breast cancer (26). Inhibition of UBE2C reduces proliferation and sensitizes breast cancer cells to radiotherapy and chemotherapy (27), suggesting that it could serve as a potential therapeutic target. CDC20 is a regulatory protein in the cell cycle. Overexpression of CDC20 predicts short-term breast cancer survival (22). ASPM is essential for normal mitotic spindle function and is a marker for vascular invasion, early recurrence and poor prognosis of hepatocellular carcinoma (28). 
Increased ASPM expression is also associated with enhanced tumor grade and lower survival rates of epithelial ovarian cancer (29). A significant correlation between the expression of the CCNB2 and ASPM proteins is reported (21), which may serve a role in the development of breast cancer. FOXM1 is a transcriptional activator involved in cell proliferation, which is a downstream target and marker of HER2 overexpression in breast cancer (30). FOXM1 is implicated in the proliferation, migration and invasion of breast cancer cells (31,32) and serves a role in chemotherapy resistance (33,34). KIF4A is an ATP-dependent microtubule-based motor protein that is involved in the intracellular transport of membranous organelles. KIF4A is implicated in doxorubicin-induced apoptosis in breast cancer cells (35). NUSAP1 may be involved in tumorigenesis and in the processes of invasion and progression of breast cancer (36); it influences the DNA damage response by controlling the protein levels of BRCA1 (37). PTTG1 exhibits tumorigenic activity in vivo and is highly expressed in various tumors; it is associated with endocrine therapy resistance in breast cancer (38). PTTG1 may promote tumor malignancy via the epithelial-to-mesenchymal transition and the expansion of the cancer stem cell population (39). CEP55 is also involved in breast cancer progression (40), possibly exerting an oncogenic function via regulation of the phosphoinositide-3 kinase/protein kinase B pathway and midbody fate (41). PRC1 encodes a protein involved in cytokinesis, specifically the polarization of parallel microtubules, whose expression level changes markedly in the different phases of the cell cycle. PRC1 has been demonstrated to be a substrate of several cyclin-dependent kinases (CDKs); its alternative splicing results in multiple transcript variants (42,43). Although PRC1 serves an important role in the cell cycle, its role in breast cancer remains unclear. 
The results of the present study indicate that the role of PRC1 in the pathogenesis of breast cancer warrants further study. Gene co-expression network analysis revealed several genes of prognostic significance in breast cancer. The majority of these genes have been validated by previous studies; however, the function of certain critical genes identified by gene co-expression network analysis in breast cancer remains unclear, thus providing targets for further studies. Such prospective studies may disclose novel biomarkers or provide targets for breast cancer therapies.

41 in total

Review 1. Viewpoint: putting the cell cycle in order.

Authors: K Nasmyth
Journal: Science Date: 1996-12-06 Impact factor: 47.728

2. NUSAP1 influences the DNA damage response by controlling BRCA1 protein levels.

Authors: Shweta Kotian; Tapahsama Banerjee; Ainsley Lockhart; Kun Huang; Umit V Catalyurek; Jeffrey D Parvin
Journal: Cancer Biol Ther Date: 2014-02-12 Impact factor: 4.742

3. Prognostic significance of UBE2C mRNA expression in high-risk early breast cancer. A Hellenic Cooperative Oncology Group (HeCOG) Study.

Authors: A Psyrri; K T Kalogeras; R Kronenwett; R M Wirtz; A Batistatou; E Bournakis; E Timotheadou; H Gogas; G Aravantinos; C Christodoulou; T Makatsoris; H Linardou; D Pectasides; N Pavlidis; T Economopoulos; G Fountzilas
Journal: Ann Oncol Date: 2011-11-04 Impact factor: 32.976

4. PTTG1 oncogene promotes tumor malignancy via epithelial to mesenchymal transition and expansion of cancer stem cell population.

Authors: Chang-Hwan Yoon; Min-Jung Kim; Hyejin Lee; Rae-Kwon Kim; Eun-Jung Lim; Ki-Chun Yoo; Ga-Haeng Lee; Yan-Hong Cui; Yeong Seok Oh; Myung Chan Gye; Young Yiul Lee; In-Chul Park; Sungkwan An; Sang-Gu Hwang; Myung-Jin Park; Yongjoon Suh; Su-Jae Lee
Journal: J Biol Chem Date: 2012-04-16 Impact factor: 5.157

5. FoxM1 is a downstream target and marker of HER2 overexpression in breast cancer.

Authors: Richard E Francis; Stephen S Myatt; Janna Krol; Johan Hartman; Barrie Peck; Ursula B McGovern; Jun Wang; Stephanie K Guest; Aleksandra Filipovic; Ondrej Gojis; Carlo Palmieri; David Peston; Sami Shousha; Qunyan Yu; Piotr Sicinski; R Charles Coombes; Eric W-F Lam
Journal: Int J Oncol Date: 2009-07 Impact factor: 5.650

6. Identification of TACC1, NOV, and PTTG1 as new candidate genes associated with endocrine therapy resistance in breast cancer.

Authors: Sandra E Ghayad; Julie A Vendrell; Ivan Bieche; Frédérique Spyratos; Charles Dumontet; Isabelle Treilleux; Rosette Lidereau; Pascale A Cohen
Journal: J Mol Endocrinol Date: 2008-11-04 Impact factor: 5.098

7. FoxM1 down-regulation leads to inhibition of proliferation, migration and invasion of breast cancer cells through the modulation of extra-cellular matrix degrading factors.

Authors: Aamir Ahmad; Zhiwei Wang; Dejuan Kong; Shadan Ali; Yiwei Li; Sanjeev Banerjee; Raza Ali; Fazlul H Sarkar
Journal: Breast Cancer Res Treat Date: 2009-10-08 Impact factor: 4.872

8. Comparative survival analysis of breast cancer microarray studies identifies important prognostic genetic pathways.

Authors: Jeffrey C Miecznikowski; Dan Wang; Song Liu; Lara Sucheston; David Gold
Journal: BMC Cancer Date: 2010-10-21 Impact factor: 4.430

9. PRC1 controls spindle polarization and recruitment of cytokinetic factors during monopolar cytokinesis.

Authors: Sanjay Shrestha; Lori Jo Wilmeth; Jarrett Eyer; Charles B Shuster
Journal: Mol Biol Cell Date: 2012-02-09 Impact factor: 4.138

10. Validation of UBE2C protein as a prognostic marker in node-positive breast cancer.

Authors: D Loussouarn; L Campion; F Leclair; M Campone; C Charbonnel; G Ricolleau; W Gouraud; R Bataille; P Jézéquel
Journal: Br J Cancer Date: 2009-06-09 Impact factor: 7.640
