
Prognostic genes of breast cancer revealed by gene co-expression network analysis.

Huijie Shi, Lei Zhang, Yanjun Qu, Lifang Hou, Ling Wang, Min Zheng.

Abstract

The aim of the present study was to identify genes that may serve as markers of breast cancer prognosis by constructing a gene co-expression network and mining modules associated with survival. Two breast cancer gene expression datasets were downloaded from ArrayExpress. Genes with a coefficient of variation >0.5 were selected and subjected to functional enrichment analysis with the Database for Annotation, Visualization and Integrated Discovery. Gene co-expression networks were constructed with the WGCNA package in R, and modules were identified from the network via cluster analysis. Cox regression was used to analyze survival. A total of 2,669 genes were selected, and functional enrichment analysis revealed that they were mainly associated with the immune response, cell proliferation, cell differentiation and cell adhesion. Seven modules were identified from the gene co-expression network, one of which was significantly associated with patient survival time. The expression of the 144 genes in this module was used to cluster patient samples into two groups, which differed significantly in survival time. These genes were involved in the cell cycle and the tumor protein p53 signaling pathway. The top 10 hub genes in the module were identified. The findings of the present study may advance the understanding of the molecular pathogenesis of breast cancer.


Keywords:  breast cancer; functional enrichment analysis; gene co-expression network; hub genes; survival analysis

Year:  2017        PMID: 29085450      PMCID: PMC5649579          DOI: 10.3892/ol.2017.6779

Source DB:  PubMed          Journal:  Oncol Lett        ISSN: 1792-1074            Impact factor:   2.967


Introduction

Breast cancer is the most common type of cancer in women, accounting for 25% of all cancer cases in women (1). Risk factors include lifestyle factors (such as smoking and diet), genetics and medical conditions. A number of treatment methods are now available for breast cancer, including surgery, radiotherapy, chemotherapy, hormone therapy and targeted therapy. However, certain patients have a poor prognosis, and the molecular mechanisms underlying this remain unclear. Prognostic factors include disease stage and grade, disease recurrence, and the age and health of the patient. With advances in technology and the accumulation of research results, certain molecular markers associated with breast cancer have been well studied. Tumor protein p53 mutations are poor prognostic factors in breast cancer (2). Accumulation of 2-hydroxyglutarate driven by MYC proto-oncogene, bHLH transcription factor (MYC) is associated with poor breast cancer prognosis (3). Prostaglandin-endoperoxide synthase 2 expression predicts worse breast cancer prognosis (4). Ki-67 has been associated with disease-free survival, but its prognostic value remains to be validated (5). Matrix metalloproteinase-8 gene variation may influence breast cancer prognosis and may have an inhibitory effect on cancer metastasis (6). A gene signature involved in tumor-immune interactions may provide a more accurate prognostic tool (7). Zhang et al (8) performed a meta-analysis and demonstrated that overexpression of C-X-C motif chemokine receptor 4 was significantly associated with lymph node status and distant metastasis, indicating poor overall and disease-free survival. SRY-box 4 overexpression is a biomarker of malignant status and poor prognosis in patients with breast cancer (9). A number of other novel biomarkers have also been identified, including chromobox homolog 1 (10), HOX transcript antisense intergenic RNA (9) and anterior gradient 3 (11).
Nevertheless, more prognostic genes are required to further improve treatment decisions and thus the quality of life of patients with breast cancer. Microarray technology has been widely used to identify biomarkers of breast cancer (12,13), allowing large-scale screening of molecular markers. In the present study, two gene expression datasets were obtained to reveal prognostic genes (14,15). One dataset was originally used to identify genes associated with distant metastasis of lymph-node-negative primary breast cancer (14); the other was used to identify genes involved in response and survival following taxane-anthracycline chemotherapy in breast cancer (15). The two datasets were combined to construct a gene co-expression network and analyze survival time in order to identify novel biomarkers associated with breast cancer prognosis.

Materials and methods

Raw data and pre-treatment

Two gene expression datasets, GSE2034 (14) and GSE25066 (15), were downloaded from ArrayExpress (https://www.ebi.ac.uk/arrayexpress/). Dataset GSE2034 included 286 breast cancer samples and dataset GSE25066 included 508 breast cancer samples. Both datasets were generated on the Affymetrix GPL96 platform. Data were normalized with the rma function of the affy package (16) in R (R 3.2.0; https://www.r-project.org/), and log2 transformation was applied. Probes were mapped to genes according to the platform annotation files; where several probes mapped to the same gene, their mean was taken as the expression level of that gene.
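The probe-to-gene collapsing step can be sketched as follows. This is a minimal Python illustration, not the authors' R code: it assumes the values are already RMA-normalized and on the log2 scale, and the probe IDs and gene symbols in the usage example are hypothetical placeholders rather than real GPL96 annotations.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_expr, probe_to_gene):
    """Average all probes that map to the same gene, sample by sample.

    probe_expr: dict mapping probe ID -> list of per-sample expression values
    probe_to_gene: dict mapping probe ID -> gene symbol; unannotated probes are skipped
    Returns a dict mapping gene symbol -> list of per-sample mean values.
    """
    grouped = defaultdict(list)
    for probe, values in probe_expr.items():
        gene = probe_to_gene.get(probe)
        if gene:  # drop probes with no gene annotation
            grouped[gene].append(values)
    # zip(*rows) walks the samples column by column across the gene's probes
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


if __name__ == "__main__":
    # Hypothetical probes and genes, two samples each
    expr = {
        "probe_1": [7.0, 8.0],
        "probe_2": [9.0, 10.0],
        "probe_3": [5.0, 5.5],
        "probe_4": [3.0, 3.0],
    }
    annotation = {"probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B"}
    print(collapse_probes(expr, annotation))
```

In this example probe_4 has no annotation and is dropped, and GENE_A becomes the per-sample mean of probe_1 and probe_2, i.e. [8.0, 9.0].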

Functional enrichment analysis

Gene Ontology (GO) annotation and pathway enrichment analysis were performed with DAVID (Database for Annotation, Visualization and Integrated Discovery; http://david.abcc.ncifcrf.gov/) (17).
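DAVID's over-representation statistic is based on Fisher's exact test. The sketch below computes the plain one-sided hypergeometric tail probability for a single annotation term in Python; it is a simplified stand-in, not DAVID's implementation. The optional `ease` flag imitates DAVID's more conservative EASE score, which, as we understand it, removes one gene from the query-term overlap before testing; treat that detail as an assumption. The counts in the usage example are made up.

```python
from math import comb


def enrichment_p(k, n, K, N, ease=False):
    """One-sided over-representation P-value, P(X >= k), for X ~ Hypergeometric.

    N: genes in the background (population)
    K: background genes annotated to the term
    n: genes in the query list
    k: query genes annotated to the term
    ease: if True, subtract one from k first (EASE-style penalty, an assumption)
    """
    if ease:
        k = max(k - 1, 0)
    total = comb(N, n)
    # Sum the probabilities of observing k, k+1, ..., min(n, K) annotated query genes;
    # math.comb returns 0 when asked for more items than exist, so impossible terms vanish
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 4 of 10 query genes hit a term annotating 20 of 1,000 genes
    print(enrichment_p(4, 10, 20, 1000))
    print(enrichment_p(4, 10, 20, 1000, ease=True))
```

Exact integer arithmetic via `math.comb` avoids underflow in the intermediate binomial coefficients, which matters for the very small P-values reported in Table I.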

Gene co-expression network and modules

The gene co-expression network was constructed with the WGCNA package (18) in R. The adjacency coefficient a_ij was calculated as follows:

$$S_{ij} = \left|\operatorname{cor}(x_i, x_j)\right|, \qquad a_{ij} = S_{ij}^{\beta}$$

where x_i and x_j are the vectors of expression values for genes i and j; cor denotes Pearson's correlation coefficient of the two vectors; and the adjacency coefficient a_ij is obtained by raising S_ij to the soft-thresholding power β. The WGCNA method takes network topology into account when identifying modules from gene co-expression networks: it considers not only the association between two connected nodes, but also the genes associated with both of them. It calculates the weighting coefficient W_ij from a as follows:

$$W_{ij} = \frac{l_{ij} + a_{ij}}{\min(k_i, k_j) + 1 - a_{ij}}, \qquad l_{ij} = \sum_{u \neq i, j} a_{iu} a_{uj}, \qquad k_i = \sum_{u \neq i} a_{iu}$$

so that W_ij reflects the overlap between the neighbors of genes i and j. Modules were identified via hierarchical clustering of the weighting coefficient matrix, W.
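The adjacency and weighting coefficients described above can be made concrete with a small Python sketch of WGCNA's unsigned soft-threshold adjacency, a_ij = |cor(x_i, x_j)|^β, and the topological overlap W_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with l_ij = Σ_u a_iu a_uj and k_i = Σ_u a_iu (u ≠ i, j). It is an illustration rather than the WGCNA package's code: the unsigned network type and the power β = 6 in the example are assumptions (the text does not report β), and the four expression profiles are invented.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def adjacency_matrix(profiles, beta):
    """Unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)| ** beta, with a_ii = 0."""
    n = len(profiles)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = a[j][i] = abs(pearson(profiles[i], profiles[j])) ** beta
    return a


def tom_matrix(a):
    """Topological overlap W_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), W_ii = 1."""
    n = len(a)
    # Connectivity k_i; the zero diagonal means sum(row) already excludes u == i
    k = [sum(row) for row in a]
    w = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Shared-neighbour term l_ij; a_ii = a_jj = 0 drops u == i and u == j
                l_ij = sum(a[i][u] * a[u][j] for u in range(n))
                w[i][j] = (l_ij + a[i][j]) / (min(k[i], k[j]) + 1 - a[i][j])
    return w


if __name__ == "__main__":
    genes = [
        [1.0, 2.0, 3.0, 4.0, 5.0],
        [2.1, 3.9, 6.2, 8.0, 9.8],
        [5.0, 4.1, 2.9, 2.2, 1.0],
        [1.0, 3.0, 1.0, 3.0, 1.0],
    ]
    adj = adjacency_matrix(genes, beta=6)  # beta = 6 is an illustrative choice
    for row in tom_matrix(adj):
        print([round(v, 3) for v in row])
```

Because every a_ij lies in [0, 1] and min(k_i, k_j) ≥ a_ij, the denominator is always at least 1, so W stays in [0, 1]; strongly anti-correlated genes still get high overlap because the unsigned adjacency uses the absolute correlation.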

Survival analysis

Cox regression was performed on the hub genes of the modules to identify survival-associated genes, and Kaplan-Meier analysis was used to compare survival time between groups; both analyses were performed with the survival package in R (https://cran.r-project.org/web/views/Survival.html). P<0.05 was considered to indicate a statistically significant difference. Pearson's correlation coefficients were calculated with the cor function in R (19).
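The Kaplan-Meier step can be illustrated with a small product-limit estimator. This Python sketch is only a stand-in for the R survival package the authors used (where `survfit` fills this role), and the follow-up times and event flags in the example are invented.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times: follow-up time per patient
    events: 1 if the event (e.g. death or metastasis) was observed, 0 if censored
    Returns a list of (time, S(time)) pairs, one per distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool all patients sharing this time (events and censorings alike)
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone observed at time t, censored or not, leaves the risk set afterwards
        at_risk -= removed
    return curve


if __name__ == "__main__":
    print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

For these five hypothetical patients, S(t) steps down to about 0.8, 0.6 and 0.3 at t = 1, 2 and 3; the patient censored at t = 4 does not change the curve.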

Results

Gene expression data

A total of 13,191 genes were identified in the GSE2034 and GSE25066 datasets; box plots of the normalized data are presented in Fig. 1. The overall expression level was consistent across samples, indicating that normalization performed well for both datasets.
Figure 1.

Box plots of the normalized gene expression data of the two datasets. (A) GSE2034 (286 samples) and (B) GSE25066 (200 samples randomly selected from the total of 508). The overall expression level was consistent across samples, indicating that normalization performed well. The x-axis represents the gene expression level; the y-axis represents the samples.

A total of 2,669 genes with a coefficient of variation (CV) >0.5 were selected. Functional enrichment analysis revealed that they were primarily associated with the immune response, cell proliferation, cell differentiation and cell adhesion (Table I).
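The CV filter can be sketched in Python as below. The text does not say whether the CV was computed on log2-transformed or linear-scale values, or whether the sample or population standard deviation was used; this sketch assumes the sample standard deviation on whatever scale is passed in, and the gene names are hypothetical.

```python
from statistics import mean, stdev


def high_cv_genes(expr, threshold=0.5):
    """Return genes whose coefficient of variation (SD / |mean|) exceeds threshold.

    expr: dict mapping gene symbol -> list of expression values across samples
    Genes with a zero mean are skipped because their CV is undefined.
    """
    selected = []
    for gene, values in expr.items():
        m = mean(values)
        if m != 0 and stdev(values) / abs(m) > threshold:  # stdev = sample SD (n - 1)
            selected.append(gene)
    return selected


if __name__ == "__main__":
    expr = {
        "GENE_A": [1.0, 1.0, 10.0, 10.0],  # CV about 0.94, kept
        "GENE_B": [5.0, 5.0, 6.0, 6.0],  # CV about 0.10, dropped
    }
    print(high_cv_genes(expr))
```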
Table I.

Top 15 significantly over-represented Gene Ontology biological process terms.

ID | Description | P-value | Adjusted P-value
GO:0006955 | Immune response | 2.19×10^-63 | 3.27×10^-61
GO:0006952 | Defense response | 2.15×10^-57 | 2.80×10^-55
GO:0006950 | Response to stress | 1.57×10^-56 | 1.96×10^-54
GO:0007166 | Cell-surface receptor signaling pathway | 1.20×10^-55 | 1.43×10^-53
GO:0008283 | Cell proliferation | 1.06×10^-49 | 1.09×10^-47
GO:0002682 | Regulation of immune system process | 6.12×10^-42 | 4.82×10^-40
GO:0016477 | Cell migration | 7.58×10^-40 | 5.66×10^-38
GO:0045321 | Leukocyte activation | 1.90×10^-39 | 1.32×10^-37
GO:0006954 | Inflammatory response | 3.92×10^-38 | 2.66×10^-36
GO:0048584 | Positive regulation of response to stimulus | 6.10×10^-38 | 4.05×10^-36
GO:0042127 | Regulation of cell proliferation | 1.72×10^-37 | 1.10×10^-35
GO:0030154 | Cell differentiation | 3.01×10^-34 | 1.70×10^-32
GO:0048869 | Cellular developmental process | 2.06×10^-33 | 1.14×10^-31
GO:0007155 | Cell adhesion | 7.77×10^-33 | 4.22×10^-31
GO:0022610 | Biological adhesion | 1.11×10^-32 | 5.90×10^-31

Adjusted P-value: using multiple comparisons in a general linear model analysis of variance, the adjusted P-value indicates which factor-level comparisons within a family of comparisons (hypothesis tests) are significantly different.

Prognostic genes

Two gene co-expression networks were constructed for the two datasets by WGCNA (Fig. 2). Seven modules were identified from the network of GSE2034 via hierarchical clustering of the weighting coefficient matrix, W (Fig. 3). The modules were termed the red, blue, green, black, brown, yellow and turquoise modules.
Figure 2.

Gene co-expression networks for datasets GSE2034 (left) and GSE25066 (right). The x-axis represents the degree of the node, k, while the y-axis represents the proportion of genes with degree k, p(k).
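Degree-distribution plots such as Fig. 2 are typically used in WGCNA analyses to check that the soft-thresholded network is approximately scale-free, i.e. that log10 p(k) falls roughly on a straight line against log10 k. The Python sketch below computes the slope and R² of that line after binning the connectivities. It is a simplified approximation of the idea behind WGCNA's scale-free topology criterion, not a reimplementation of the package's `scaleFreeFitIndex`; the bin count is an assumption and the degrees in the example are invented.

```python
from math import log10


def scale_free_fit(k, nbins=10):
    """Fit log10 p(k) ~ log10 k on binned connectivities.

    k: connectivity (degree) of each gene
    nbins: number of equal-width bins (an illustrative default)
    Returns (slope, r_squared); a good scale-free fit has a negative slope
    and an R^2 close to 1. Needs at least two non-empty bins with k > 0.
    """
    lo, hi = min(k), max(k)
    width = (hi - lo) / nbins or 1.0  # guard against all-equal connectivities
    bins = [[] for _ in range(nbins)]
    for v in k:
        bins[min(int((v - lo) / width), nbins - 1)].append(v)
    xs, ys = [], []
    for b in bins:
        if b and sum(b) > 0:
            xs.append(log10(sum(b) / len(b)))  # mean connectivity in the bin
            ys.append(log10(len(b) / len(k)))  # fraction of genes in the bin
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx, sxy ** 2 / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical degrees following p(k) ~ k^-2: 64 genes with k=1, 16 with k=2, ...
    degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
    print(scale_free_fit(degrees))
```

For this idealized power law the fit returns a slope of -2 and an R² of 1; real networks give noisier values, and the soft-thresholding power is usually chosen so that R² is high.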

Figure 3.

Seven modules identified from the gene co-expression network. The cluster analysis result is shown in the upper panel and the module assignment in the lower panel.
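The module-detection step can be illustrated with plain average-linkage hierarchical clustering on the dissimilarity 1 - W, cut at a fixed height. This is a deliberately simplified Python sketch: WGCNA workflows commonly use dynamic tree cutting rather than a static cut, the text does not report the linkage method or cut height used, and the cut height and dissimilarities in the example are arbitrary illustrative values.

```python
def average_linkage_modules(dissim, cut_height):
    """Average-linkage agglomerative clustering with a static height cut.

    dissim: symmetric dissimilarity matrix (list of lists), e.g. 1 - TOM
    cut_height: clusters are merged only while their average dissimilarity
        is below this value
    Returns one integer module label per gene.
    """
    clusters = [[i] for i in range(len(dissim))]

    def average(c1, c2):
        return sum(dissim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters under average linkage
        d, p, q = min(
            (average(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d >= cut_height:
            break  # every remaining merge would sit above the cut
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected

    labels = [0] * len(dissim)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


if __name__ == "__main__":
    # Hypothetical 1 - TOM values for four genes forming two tight pairs
    dist = [
        [0.0, 0.1, 0.9, 0.9],
        [0.1, 0.0, 0.9, 0.9],
        [0.9, 0.9, 0.0, 0.2],
        [0.9, 0.9, 0.2, 0.0],
    ]
    print(average_linkage_modules(dist, cut_height=0.5))
```

Because average-linkage merge heights never decrease, stopping at the first merge above the cut gives the same partition as cutting the full dendrogram at that height; here the two tight pairs become modules 0 and 1.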

The degree, k, of each gene in the module was calculated, and the P-value of Cox regression between each gene and survival was also determined. The correlation between k and -log10(P) was then calculated. The yellow module exhibited a significant correlation of this kind, and thus a significant association with survival time, in dataset GSE2034 (P=9.3×10^-13; Fig. 4A); this was also observed in dataset GSE25066 (P=9.3×10^-6; Fig. 4B). In addition, survival-associated genes (P<0.05 in Cox regression) were significantly over-represented in the yellow module in both datasets (Fig. 5). Therefore, the yellow module was considered to be significantly associated with breast cancer patient survival and was investigated further to understand the association between survival time and the expression of its critical genes.
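The per-module statistic described above, the correlation between each gene's degree k and -log10(P) from its Cox regression, can be sketched in Python as follows. The text does not state which correlation test produced the reported P-values; this sketch assumes Pearson's correlation and uses the Fisher z approximation for a two-sided P-value (R's `cor.test` uses a t-distribution instead), and the degrees and P-values in the example are invented.

```python
from math import atanh, erfc, log10, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def degree_survival_correlation(degrees, cox_pvalues):
    """Correlate module degree k with -log10(Cox P); return (r, approximate P).

    Needs at least four genes for the Fisher z approximation.
    """
    y = [-log10(p) for p in cox_pvalues]
    r = pearson(degrees, y)
    if abs(r) >= 1.0:
        return r, 0.0  # perfect correlation: the z statistic is unbounded
    z = atanh(r) * sqrt(len(degrees) - 3)
    return r, erfc(abs(z) / sqrt(2))  # two-sided standard normal tail probability


if __name__ == "__main__":
    k = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    p = [0.5, 0.1, 0.05, 0.01, 0.005, 0.001]
    print(degree_survival_correlation(k, p))
```

A strongly positive r means that the most highly connected genes in a module also tend to have the smallest Cox P-values, which is the pattern used here to flag the yellow module.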
Figure 4.

Scatter plots of the degree and Cox regression P-value of each gene in datasets (A) GSE2034 and (B) GSE25066. The x-axis indicates the degree, k; the y-axis indicates the P-value. Each circle represents a gene.

Figure 5.

Survival-associated genes in each module. The x-axis indicates the module, the y-axis indicates the significance of over-representation.

The 144 genes from the yellow module were used in a cluster analysis of the samples from dataset GSE2034, which separated the patient samples into two groups based on the expression of these genes (Fig. 6). A significant difference in survival time was observed between the two groups (P=0.008; Fig. 7). Functional enrichment analysis indicated that the 144 genes from the yellow module were involved in the cell cycle, oocyte meiosis, the tumor protein p53 signaling pathway and progesterone-mediated oocyte maturation (Table II).
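The text reports P=0.008 for the survival difference between the two sample clusters but does not name the test. A log-rank test is the usual companion to Kaplan-Meier curves (in R, `survdiff` in the survival package), so the Python sketch below assumes that test; the group data in the example are invented.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    times_*: follow-up times; events_*: 1 = event observed, 0 = censored
    Returns (chi-square statistic with 1 df, P-value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)  # group A at risk
        n = sum(1 for tt, _, _ in data if tt >= t)  # both groups at risk
        d_a = sum(1 for tt, e, g in data if tt == t and e and g == 0)
        d = sum(1 for tt, e, _ in data if tt == t and e)
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:  # hypergeometric variance of d_a at this event time
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2))
    return chi2, erfc(sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical groups: all events in group A occur earlier than in group B
    print(logrank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1]))
```

The statistic compares, at each event time, the deaths observed in one group with those expected if both groups shared the same hazard; identical groups give a statistic of 0 and P=1.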
Figure 6.

Cluster analysis of the samples in the GSE2034 dataset based on the expression levels of the 144 survival-associated genes.

Figure 7.

Survival curves for the two groups of breast cancer patient samples clustered according to expression of the 144 genes.

Table II.

KEGG pathways enriched in the 144 genes of the yellow module.

ID | Description | P-value | Adjusted P-value
hsa04110 | Cell cycle | 5.22×10^-18 | 3.13×10^-17
hsa04114 | Oocyte meiosis | 2.17×10^-9 | 6.50×10^-9
hsa04115 | p53 signaling pathway | 2.46×10^-5 | 4.91×10^-5
hsa04914 | Progesterone-mediated oocyte maturation | 9.19×10^-5 | 1.38×10^-4

p53, tumor protein p53. Adjusted P-value: Using multiple comparisons in a general linear model analysis of variance, the adjusted P-value indicates which factor level comparisons within a family of comparisons (hypothesis tests) are significantly different.
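As an arithmetic cross-check, the adjusted values in Table II can be reproduced to within rounding by a Benjamini-Hochberg correction if six KEGG pathways are assumed to have been tested in total. The number six is an inference from the published figures, not something the text states, and the Python sketch below is illustrative rather than a description of the authors' procedure.

```python
def benjamini_hochberg(pvalues, m=None):
    """Benjamini-Hochberg step-up adjustment.

    pvalues: raw P-values (any order)
    m: total number of tests; defaults to len(pvalues). Passing a larger m
       covers the case where only the smallest P-values are reported, although
       unreported larger P-values could in principle lower the adjusted values.
    """
    m = len(pvalues) if m is None else m
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone adjusted values
    for rank in range(len(order), 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    table_ii = [5.22e-18, 2.17e-9, 2.46e-5, 9.19e-5]  # raw P-values from Table II
    print(benjamini_hochberg(table_ii, m=6))
```

With m=6 this gives approximately 3.13×10^-17, 6.51×10^-9, 4.92×10^-5 and 1.38×10^-4; the small differences from the table come from rounding of the published raw P-values.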

The top 10 hub genes from the yellow module were selected (Table III) and included cyclin B2 (CCNB2), ubiquitin-conjugating enzyme E2C (UBE2C), protein regulator of cytokinesis 1 (PRC1), cell division cycle 20 (CDC20), abnormal spindle microtubule assembly (ASPM), forkhead box M1 (FOXM1), kinesin family member 4A (KIF4A), nucleolar and spindle associated protein 1 (NUSAP1), pituitary tumor-transforming 1 (PTTG1) and centrosomal protein 55 kDa (CEP55). All of these genes were significantly associated with survival time in the two datasets.
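Table III lists the ten genes in descending order of kWithin for GSE2034, which suggests that hub genes were ranked by intramodular connectivity. The Python sketch below computes kTotal (connectivity to all other genes) and kWithin (connectivity to other genes in the same module) from an adjacency matrix and returns the top-ranked genes of one module. Ranking by kWithin is an assumption drawn from that ordering, and the gene names and adjacency values in the example are hypothetical. WGCNA's `intramodularConnectivity` function reports comparable kTotal and kWithin columns.

```python
def connectivity(adj, labels):
    """Return (kTotal, kWithin) for every gene.

    adj: symmetric adjacency matrix (list of lists); the diagonal is ignored
    labels: module label of each gene
    """
    n = len(adj)
    result = []
    for i in range(n):
        k_total = sum(adj[i][j] for j in range(n) if j != i)
        k_within = sum(adj[i][j] for j in range(n) if j != i and labels[j] == labels[i])
        result.append((k_total, k_within))
    return result


def top_hub_genes(names, adj, labels, module, top=10):
    """Rank the genes of one module by kWithin and return (name, kTotal, kWithin)."""
    conn = connectivity(adj, labels)
    members = [i for i in range(len(names)) if labels[i] == module]
    members.sort(key=lambda i: conn[i][1], reverse=True)
    return [(names[i], conn[i][0], conn[i][1]) for i in members[:top]]


if __name__ == "__main__":
    names = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
    labels = ["yellow", "yellow", "yellow", "blue"]
    adj = [
        [0.0, 0.9, 0.8, 0.1],
        [0.9, 0.0, 0.2, 0.3],
        [0.8, 0.2, 0.0, 0.5],
        [0.1, 0.3, 0.5, 0.0],
    ]
    for hub in top_hub_genes(names, adj, labels, "yellow", top=2):
        print(hub)
```

kTotal is always at least kWithin, as in the GSE2034 rows of Table III; a gene with high kTotal but comparatively low kWithin is more strongly tied to other modules than to its own.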
Table III.

Top 10 hub genes in the yellow module.

Dataset GSE2034
Gene name | Coefficient | P-value | kTotal | kWithin
CCNB2 | 0.3640 | 0.0003 | 14.7998 | 12.4392
PRC1 | 0.3868 | 0.0005 | 12.9603 | 11.3677
UBE2C | 0.4281 | 0.0006 | 14.1236 | 11.2433
ASPM | 0.3442 | 0.0002 | 12.9467 | 10.9328
CDC20 | 0.2339 | 0.0065 | 14.6847 | 10.7527
FOXM1 | 0.1988 | 0.0168 | 13.7352 | 10.7131
CEP55 | 0.3691 | 0.0004 | 12.6988 | 10.6131
KIF4A | 0.2648 | 0.0217 | 12.1095 | 10.3165
NUSAP1 | 0.3931 | 0.0012 | 11.7988 | 10.2885
PTTG1 | 0.4027 | 0.0019 | 12.4981 | 10.2449

Dataset GSE25066
Gene name | Coefficient | P-value | k
CCNB2 | 0.3239 | 0.0006 | 9.4109
PRC1 | 0.2760 | 0.0023 | 6.6109
UBE2C | 0.3819 | 0.0003 | 6.1036
ASPM | 0.2079 | 0.0031 | 4.9210
CDC20 | 0.3290 | 0.0000 | 8.5936
FOXM1 | 0.1710 | 0.0091 | 5.9345
CEP55 | 0.3044 | 0.0002 | 6.3694
KIF4A | 0.5682 | 0.0001 | 3.1945
NUSAP1 | 0.2700 | 0.0061 | 6.7332
PTTG1 | 0.7918 | 0.0000 | 4.0029

Discussion

Two breast cancer gene expression datasets were obtained, and the 2,669 highly variable genes (CV >0.5) were selected. These genes were implicated in the immune response, cell proliferation and cell migration, functions that are closely associated with the development and metastasis of cancer. A breast cancer-specific gene co-expression network was constructed for dataset GSE2034, from which seven modules were identified. The yellow module was closely associated with survival time; as such, its 144 genes were investigated further. These genes were primarily involved in the cell cycle and the tumor protein p53 signaling pathway. The top 10 hub genes in the yellow module were identified, all of which were associated with poor patient prognosis. The majority of these 10 critical genes are associated with the cell cycle. CCNB2 is an essential component of the cell-cycle regulatory machinery (20). Elevated CCNB2 expression in invasive breast cancer is associated with unfavorable clinical outcomes (21). UBE2C is required for the degradation of mitotic cyclins and for cell-cycle progression, and is involved in cancer progression. UBE2C is highly expressed in breast microcalcification lesions (22), and its prognostic value has been validated in several studies (23–25). MicroRNA-196a post-transcriptionally upregulates UBE2C and promotes cell proliferation in breast cancer (26). Inhibition of UBE2C reduces proliferation and sensitizes breast cancer cells to radiotherapy and chemotherapy (27), suggesting that it could serve as a therapeutic target. CDC20 is a cell-cycle regulatory protein, and its overexpression predicts short-term breast cancer survival (22). ASPM is essential for normal mitotic spindle function and is a marker of vascular invasion, early recurrence and poor prognosis in hepatocellular carcinoma (28).
Increased ASPM expression is also associated with higher tumor grade and lower survival rates in epithelial ovarian cancer (29). A significant correlation between the expression of the CCNB2 and ASPM proteins has been reported (21), which may serve a role in the development of breast cancer. FOXM1, a transcriptional activator involved in cell proliferation, is a downstream target and marker of HER2 overexpression in breast cancer (30). FOXM1 is implicated in the proliferation, migration and invasion of breast cancer cells (31,32) and serves a role in chemotherapy resistance (33,34). KIF4A is an ATP-dependent, microtubule-based motor protein involved in the intracellular transport of membranous organelles, and it is implicated in doxorubicin-induced apoptosis in breast cancer cells (35). NUSAP1 may be involved in tumorigenesis and in the invasion and progression of breast cancer (36); it influences the DNA damage response by controlling BRCA1 protein levels (37). PTTG1 exhibits tumorigenic activity in vivo, is highly expressed in various tumors and is associated with endocrine therapy resistance in breast cancer (38). PTTG1 may promote tumor malignancy via the epithelial-to-mesenchymal transition and expansion of the cancer stem cell population (39). CEP55 is also involved in breast cancer progression (40), possibly exerting an oncogenic function via regulation of the phosphoinositide-3 kinase/protein kinase B pathway and midbody fate (41). PRC1 encodes a protein involved in cytokinesis, specifically the polarization of parallel microtubules, and its expression level changes markedly across the phases of the cell cycle. PRC1 has been demonstrated to be a substrate of several cyclin-dependent kinases (CDKs), and alternative splicing of PRC1 results in multiple transcript variants (42,43). Although PRC1 serves an important role in the cell cycle, its role in breast cancer remains unclear.
The results of the present study indicate that the role of PRC1 in the pathogenesis of breast cancer warrants further study. Gene co-expression network analysis revealed several genes of prognostic significance in breast cancer. The majority of these genes have been validated by previous studies; however, the function in breast cancer of certain critical genes identified here remains unclear, providing targets for further study. Such studies may reveal novel biomarkers or therapeutic targets for breast cancer.
References

1.  Viewpoint: putting the cell cycle in order.

Authors:  K Nasmyth
Journal:  Science       Date:  1996-12-06       Impact factor: 47.728

2.  NUSAP1 influences the DNA damage response by controlling BRCA1 protein levels.

Authors:  Shweta Kotian; Tapahsama Banerjee; Ainsley Lockhart; Kun Huang; Umit V Catalyurek; Jeffrey D Parvin
Journal:  Cancer Biol Ther       Date:  2014-02-12       Impact factor: 4.742

3.  Prognostic significance of UBE2C mRNA expression in high-risk early breast cancer. A Hellenic Cooperative Oncology Group (HeCOG) Study.

Authors:  A Psyrri; K T Kalogeras; R Kronenwett; R M Wirtz; A Batistatou; E Bournakis; E Timotheadou; H Gogas; G Aravantinos; C Christodoulou; T Makatsoris; H Linardou; D Pectasides; N Pavlidis; T Economopoulos; G Fountzilas
Journal:  Ann Oncol       Date:  2011-11-04       Impact factor: 32.976

4.  PTTG1 oncogene promotes tumor malignancy via epithelial to mesenchymal transition and expansion of cancer stem cell population.

Authors:  Chang-Hwan Yoon; Min-Jung Kim; Hyejin Lee; Rae-Kwon Kim; Eun-Jung Lim; Ki-Chun Yoo; Ga-Haeng Lee; Yan-Hong Cui; Yeong Seok Oh; Myung Chan Gye; Young Yiul Lee; In-Chul Park; Sungkwan An; Sang-Gu Hwang; Myung-Jin Park; Yongjoon Suh; Su-Jae Lee
Journal:  J Biol Chem       Date:  2012-04-16       Impact factor: 5.157

5.  FoxM1 is a downstream target and marker of HER2 overexpression in breast cancer.

Authors:  Richard E Francis; Stephen S Myatt; Janna Krol; Johan Hartman; Barrie Peck; Ursula B McGovern; Jun Wang; Stephanie K Guest; Aleksandra Filipovic; Ondrej Gojis; Carlo Palmieri; David Peston; Sami Shousha; Qunyan Yu; Piotr Sicinski; R Charles Coombes; Eric W-F Lam
Journal:  Int J Oncol       Date:  2009-07       Impact factor: 5.650

6.  Identification of TACC1, NOV, and PTTG1 as new candidate genes associated with endocrine therapy resistance in breast cancer.

Authors:  Sandra E Ghayad; Julie A Vendrell; Ivan Bieche; Frédérique Spyratos; Charles Dumontet; Isabelle Treilleux; Rosette Lidereau; Pascale A Cohen
Journal:  J Mol Endocrinol       Date:  2008-11-04       Impact factor: 5.098

7.  FoxM1 down-regulation leads to inhibition of proliferation, migration and invasion of breast cancer cells through the modulation of extra-cellular matrix degrading factors.

Authors:  Aamir Ahmad; Zhiwei Wang; Dejuan Kong; Shadan Ali; Yiwei Li; Sanjeev Banerjee; Raza Ali; Fazlul H Sarkar
Journal:  Breast Cancer Res Treat       Date:  2009-10-08       Impact factor: 4.872

8.  Comparative survival analysis of breast cancer microarray studies identifies important prognostic genetic pathways.

Authors:  Jeffrey C Miecznikowski; Dan Wang; Song Liu; Lara Sucheston; David Gold
Journal:  BMC Cancer       Date:  2010-10-21       Impact factor: 4.430

9.  PRC1 controls spindle polarization and recruitment of cytokinetic factors during monopolar cytokinesis.

Authors:  Sanjay Shrestha; Lori Jo Wilmeth; Jarrett Eyer; Charles B Shuster
Journal:  Mol Biol Cell       Date:  2012-02-09       Impact factor: 4.138

10.  Validation of UBE2C protein as a prognostic marker in node-positive breast cancer.

Authors:  D Loussouarn; L Campion; F Leclair; M Campone; C Charbonnel; G Ricolleau; W Gouraud; R Bataille; P Jézéquel
Journal:  Br J Cancer       Date:  2009-06-09       Impact factor: 7.640
