Huijie Shi, Lei Zhang, Yanjun Qu, Lifang Hou, Ling Wang, Min Zheng.
Abstract
The aim of the present study was to identify genes that may serve as markers for breast cancer prognosis by constructing a gene co-expression network and mining modules associated with survival. Two breast cancer gene expression datasets were downloaded from ArrayExpress, and genes from these datasets with a coefficient of variation >0.5 were selected and subjected to functional enrichment analysis with the Database for Annotation, Visualization and Integrated Discovery. Gene co-expression networks were constructed with the WGCNA package in R. Modules were identified from the network via cluster analysis. Cox regression was conducted to analyze survival rates. A total of 2,669 genes were selected, and functional enrichment analysis revealed that they were mainly associated with the immune response, cell proliferation, cell differentiation and cell adhesion. Seven modules were identified from the gene co-expression network, one of which was found to be significantly associated with patient survival time. The expression status of 144 genes from this module was used to cluster patient samples into two groups, and survival time differed significantly between these groups. These genes were involved in the cell cycle and the tumor protein p53 signaling pathway. The top 10 hub genes in the module were identified. The findings of the present study could advance the understanding of the molecular pathogenesis of breast cancer.
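The gene-selection step (coefficient of variation >0.5) can be illustrated with a minimal sketch. It assumes a genes-by-samples matrix and defines the coefficient of variation as the sample standard deviation divided by the absolute mean. The abstract does not say whether the filter was applied to raw or log-scale values, so the function name `filter_by_cv` and the toy matrix are purely illustrative.

```python
import numpy as np

def filter_by_cv(expr, threshold=0.5):
    """Return a boolean mask of genes (rows) whose coefficient of variation
    (sample SD / |mean|) exceeds the threshold."""
    expr = np.asarray(expr, dtype=float)
    mean = expr.mean(axis=1)
    sd = expr.std(axis=1, ddof=1)  # sample standard deviation per gene
    # Genes with a zero mean get CV = 0 here to avoid division by zero.
    cv = np.divide(sd, np.abs(mean), out=np.zeros_like(sd), where=mean != 0)
    return cv > threshold

# Hypothetical toy matrix: 3 genes x 4 samples (not data from the study).
toy = np.array([
    [1.0, 5.0, 9.0, 1.0],  # strongly variable gene
    [5.0, 5.1, 4.9, 5.0],  # nearly constant gene
    [2.0, 6.0, 2.0, 6.0],  # moderately variable gene
])
mask = filter_by_cv(toy)
selected = toy[mask]  # rows that pass the CV > 0.5 filter
```

In this toy example the first and third genes pass the filter and the second does not.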
Keywords: breast cancer; functional enrichment analysis; gene co-expression network; hub genes; survival analysis
Year: 2017 PMID: 29085450 PMCID: PMC5649579 DOI: 10.3892/ol.2017.6779
Source DB: PubMed Journal: Oncol Lett ISSN: 1792-1074 Impact factor: 2.967
Figure 1. Box plots of normalized gene expression data from the two datasets. (A) GSE2034 (286 samples) and (B) GSE25066 (200 samples randomly selected from the total of 508 samples). The average total mRNA expression level was consistent across samples, indicating that normalization performed well. The x-axis represents the gene expression level; the y-axis represents the samples.
Top 15 significantly over-represented biological pathways.
| ID | Description | P-value | Adjusted P-value |
|---|---|---|---|
| GO:0006955 | Immune response | 2.19×10^−63 | 3.27×10^−61 |
| GO:0006952 | Defense response | 2.15×10^−57 | 2.80×10^−55 |
| GO:0006950 | Response to stress | 1.57×10^−56 | 1.96×10^−54 |
| GO:0007166 | Cell-surface receptor signaling pathway | 1.20×10^−55 | 1.43×10^−53 |
| GO:0008283 | Cell proliferation | 1.06×10^−49 | 1.09×10^−47 |
| GO:0002682 | Regulation of immune system process | 6.12×10^−42 | 4.82×10^−40 |
| GO:0016477 | Cell migration | 7.58×10^−40 | 5.66×10^−38 |
| GO:0045321 | Leukocyte activation | 1.90×10^−39 | 1.32×10^−37 |
| GO:0006954 | Inflammatory response | 3.92×10^−38 | 2.66×10^−36 |
| GO:0048584 | Positive regulation of response to stimulus | 6.10×10^−38 | 4.05×10^−36 |
| GO:0042127 | Regulation of cell proliferation | 1.72×10^−37 | 1.10×10^−35 |
| GO:0030154 | Cell differentiation | 3.01×10^−34 | 1.70×10^−32 |
| GO:0048869 | Cellular developmental process | 2.06×10^−33 | 1.14×10^−31 |
| GO:0007155 | Cell adhesion | 7.77×10^−33 | 4.22×10^−31 |
| GO:0022610 | Biological adhesion | 1.11×10^−32 | 5.90×10^−31 |
Adjusted P-value: Using multiple comparisons in a general linear model analysis of variance, the adjusted P-value indicates which factor level comparisons within a family of comparisons (hypothesis tests) are significantly different.
Figure 2. Gene co-expression networks for datasets GSE2034 (left) and GSE25066 (right). The x-axis represents the node degree, k, and the y-axis represents the proportion of genes with degree k, p(k).
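The quantities plotted in Figure 2, the degree k and its distribution p(k), can be sketched for a weighted co-expression network as follows. The sketch uses the unsigned soft-threshold adjacency |cor|^β, which is the form WGCNA uses for unsigned networks. The power β = 6, the bin count and the random toy data are assumptions, since the source does not report the settings used.

```python
import numpy as np

def connectivity(expr, beta=6):
    """Weighted degree k of each gene (rows = genes, columns = samples).
    beta is an assumed soft-threshold power, not a value from the source."""
    cor = np.corrcoef(expr)        # Pearson correlation between genes
    adj = np.abs(cor) ** beta      # unsigned soft-threshold adjacency
    np.fill_diagonal(adj, 0.0)     # ignore self-connections
    return adj.sum(axis=1)

def degree_distribution(k, bins=10):
    """Bin the degrees and return (bin edges, proportion of genes per bin)."""
    counts, edges = np.histogram(k, bins=bins)
    return edges, counts / counts.sum()

# Hypothetical toy data: 50 genes x 20 samples of random noise.
rng = np.random.default_rng(0)
toy_expr = rng.normal(size=(50, 20))
k = connectivity(toy_expr)
edges, p_k = degree_distribution(k)
```

In a scale-free network, log p(k) falls roughly linearly with log k. Random noise such as this toy matrix is not expected to show that pattern.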
Figure 3. Seven modules identified from the gene co-expression network. The cluster analysis result is shown above and the module identification below.
Figure 4. Scatter plots of the degree and P-value of Cox regression in datasets (A) GSE2034 and (B) GSE25066. The x-axis indicates the degree of regression, the y-axis indicates the P-value. Each circle represents a gene.
Figure 5. Survival-associated genes in each module. The x-axis indicates the module, the y-axis indicates the significance of over-representation.
Figure 6. Cluster analysis using the degree of expression of 144 survival-associated genes for the samples in the GSE2034 dataset.
Figure 7. Survival curves for the two groups of breast cancer patient samples clustered according to expression of the 144 genes.
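Survival curves such as those in Figure 7 are commonly drawn with the Kaplan-Meier estimator. Whether the authors used it is not stated here, so the following is only an illustrative sketch; the function name `kaplan_meier` and the toy follow-up data are hypothetical.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier estimate.
    times:  follow-up time per patient
    events: 1 if the event was observed, 0 if the patient was censored
    Returns the distinct event times and the survival probability
    immediately after each of them."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)            # still under observation at t
        deaths = np.sum((times == t) & events)  # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Hypothetical toy group of five patients (not data from the study).
t, s = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
# s is approximately [0.8, 0.6, 0.3] at event times [2, 3, 5]
```

Running the function once per cluster would give one curve per patient group, which a log-rank test or a Cox model could then compare.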
KEGG pathways enriched in the 144 genes of the yellow module.
| ID | Description | P-value | Adjusted P-value |
|---|---|---|---|
| hsa04110 | Cell cycle | 5.22×10^−18 | 3.13×10^−17 |
| hsa04114 | Oocyte meiosis | 2.17×10^−9 | 6.50×10^−9 |
| hsa04115 | p53 signaling pathway | 2.46×10^−5 | 4.91×10^−5 |
| hsa04914 | Progesterone-mediated oocyte maturation | 9.19×10^−5 | 1.38×10^−4 |
p53, tumor protein p53. Adjusted P-value: Using multiple comparisons in a general linear model analysis of variance, the adjusted P-value indicates which factor level comparisons within a family of comparisons (hypothesis tests) are significantly different.
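As a worked check on the KEGG table, the ratios of adjusted to raw P-values in its four rows are approximately 6, 3, 2 and 1.5. That is the pattern produced by Benjamini-Hochberg scaling, p × m / rank, with m = 6 tested pathways. The sketch below reproduces the reported values under that assumption; both the choice of method and m = 6 are inferred from the numbers and are not stated in the source.

```python
import numpy as np

def bh_adjust(pvalues, m):
    """Benjamini-Hochberg adjustment of the given P-values, assuming they are
    the smallest of m tests in total (m is an assumption of this sketch)."""
    p = np.asarray(pvalues, dtype=float)
    order = np.argsort(p)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(1, len(p) + 1)   # rank 1 = smallest P-value
    scaled = p * m / ranks
    # Enforce monotonicity: an adjusted value never exceeds the next larger one.
    adj_sorted = np.minimum.accumulate(scaled[order][::-1])[::-1]
    adj = np.empty_like(p)
    adj[order] = np.minimum(adj_sorted, 1.0)
    return adj

raw = [5.22e-18, 2.17e-9, 2.46e-5, 9.19e-5]       # P-value column
reported = [3.13e-17, 6.50e-9, 4.91e-5, 1.38e-4]  # Adjusted P-value column
adjusted = bh_adjust(raw, m=6)
matches = np.allclose(adjusted, reported, rtol=0.01, atol=0.0)
```

Only the four listed pathways are used, so the check ignores any effect the two unlisted tests could have on the monotonicity step.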
Top 10 hub genes in the yellow module.
| Dataset | Gene name | Coefficient | P-value | | |
|---|---|---|---|---|---|
| GSE2034 | CCNB2 | 0.3640 | 0.0003 | 14.7998 | 12.4392 |
| GSE2034 | PRC1 | 0.3868 | 0.0005 | 12.9603 | 11.3677 |
| GSE2034 | UBE2C | 0.4281 | 0.0006 | 14.1236 | 11.2433 |
| GSE2034 | ASPM | 0.3442 | 0.0002 | 12.9467 | 10.9328 |
| GSE2034 | CDC20 | 0.2339 | 0.0065 | 14.6847 | 10.7527 |
| GSE2034 | FOXM1 | 0.1988 | 0.0168 | 13.7352 | 10.7131 |
| GSE2034 | CEP55 | 0.3691 | 0.0004 | 12.6988 | 10.6131 |
| GSE2034 | KIF4A | 0.2648 | 0.0217 | 12.1095 | 10.3165 |
| GSE2034 | NUSAP1 | 0.3931 | 0.0012 | 11.7988 | 10.2885 |
| GSE2034 | PTTG1 | 0.4027 | 0.0019 | 12.4981 | 10.2449 |
| GSE25066 | CCNB2 | 0.323932 | 0.3239 | 0.0006 | 9.4109 |
| GSE25066 | PRC1 | 0.276034 | 0.2760 | 0.0023 | 6.6109 |
| GSE25066 | UBE2C | 0.381925 | 0.3819 | 0.0003 | 6.1036 |
| GSE25066 | ASPM | 0.207911 | 0.2079 | 0.0031 | 4.9210 |
| GSE25066 | CDC20 | 0.329027 | 0.3290 | 0.0000 | 8.5936 |
| GSE25066 | FOXM1 | 0.170967 | 0.1710 | 0.0091 | 5.9345 |
| GSE25066 | CEP55 | 0.304415 | 0.3044 | 0.0002 | 6.3694 |
| GSE25066 | KIF4A | 0.568168 | 0.5682 | 0.0001 | 3.1945 |
| GSE25066 | NUSAP1 | 0.270014 | 0.2700 | 0.0061 | 6.7332 |
| GSE25066 | PTTG1 | 0.791755 | 0.7918 | 0.0000 | 4.0029 |