Li Chen1, Xueying Ke2. 1. Department of Colorectal Surgery. 2. Department of General Surgery, Sir Run Run Shaw Hospital, School of Medicine, Zhejiang University, Hangzhou, China.
Abstract
Colon cancer is one of the most common cancers in the world. To identify candidate genes in the carcinogenesis and progression of colon cancer, the microarray datasets GSE10950, GSE44861 and GSE74602 were downloaded from the Gene Expression Omnibus (GEO) database. The differentially expressed genes (DEGs) were identified, and functional enrichment analyses were performed. A total of 176 DEGs were identified, consisting of 55 genes upregulated and 121 genes downregulated in colon cancer tissues compared to non-cancerous tissues. The DEGs were mainly enriched in mineral absorption, nitrogen metabolism and complement and coagulation cascades. Using the STRING database, we constructed a coexpression network of the DEGs composed of 140 nodes and 280 edges, retaining interactions with a combined score >0.4. Thirteen hub genes were identified; among them, only high expression of Matrix Metallopeptidase 7 (MMP7) was associated with poor overall survival (OS) of patients, suggesting that MMP7 may be involved in the carcinogenesis, invasion or recurrence of colon cancer. In conclusion, we propose that the DEGs and hub genes identified in the present study may be regarded as diagnostic biomarkers for colon cancer. Moreover, the overexpression of MMP7 may correlate with poor prognosis.
Colon cancer is a digestive tract cancer with high morbidity and mortality. In recent years, although the survival time of patients has increased with advances in science and technology and improvements in surgical skills, many patients still suffer from poor prognosis and distant metastasis, which seriously affect their quality of life. The prognosis of colon cancer is closely related to early diagnosis and treatment. Because effective early diagnostic methods are lacking, many patients are diagnosed at late stages, and most of them cannot undergo radical surgery; without surgery, their prognosis is worse. Although the pathogenesis and treatment of colon cancer have been studied extensively in recent years, treatment outcomes remain unsatisfactory, and effective methods to decrease the mortality rate are still lacking. Therefore, it is necessary to find more efficient early diagnostic molecules and new therapeutic targets to establish an effective system for prevention, early diagnosis and treatment.
At present, research on genes and genomics is developing rapidly and is very helpful for understanding the underlying mechanisms of certain diseases. For example, microarray technology and bioinformatics analysis have been widely used in research on cancer genetics.
In this study, we downloaded and analyzed three mRNA microarray datasets from the Gene Expression Omnibus (GEO) to obtain differentially expressed genes (DEGs) between colon cancer tissues and normal control tissues. Subsequently, we conducted Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses and protein-protein interaction (PPI) network analysis to help us understand the molecular mechanisms of carcinogenesis and progression. In total, 176 DEGs and 13 hub genes were identified, which may be potential biomarkers of colon cancer.
Materials and methods
Microarray data
GEO (http://www.ncbi.nlm.nih.gov/geo) is a public functional genomics data repository for gene expression data, chips and microarrays. Three gene expression datasets (GSE10950, GSE44861 and GSE74602) were downloaded from GEO. These datasets were generated on the GPL6104 (Illumina humanRef-8 v2.0 expression beadchip) and GPL3921 (Affymetrix HT Human Genome U133A Array) platforms. According to the annotation information of each platform, the probes were converted into the corresponding gene symbols. The GSE10950 dataset contained 24 colon cancer tissue samples and 24 non-cancerous tissue samples. GSE44861 contained 56 colon cancer tissue samples and 55 non-cancerous tissue samples. GSE74602 contained 30 colon cancer tissue samples and 30 non-cancerous tissue samples. This study was approved by the ethics committee of Sir Run Run Shaw Hospital, Zhejiang University College of Medicine.
Identification of DEGs
DEGs between colon cancer and non-cancerous tissues were screened with GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r). GEO2R is an interactive web tool that allows users to compare two or more groups of samples in a GEO Series to identify DEGs across experimental conditions. Genes with |log FC| >1 and adj. P-value <.01 were considered differentially expressed.
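The thresholding step can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually run in GEO2R: the function name `filter_degs` is hypothetical, the column keys `Gene.symbol`, `logFC` and `adj.P.Val` are assumed to mirror a GEO2R export, and the example rows are invented. The sign of log FC depends on how the two sample groups were ordered, so "up" here assumes tumor versus non-cancerous tissue.

```python
from typing import Dict, List


def filter_degs(rows: List[dict], lfc_cutoff: float = 1.0,
                padj_cutoff: float = 0.01) -> Dict[str, List[str]]:
    """Split genes into up- and downregulated lists using the stated cutoffs.

    A gene is kept when |logFC| > lfc_cutoff and adj.P.Val < padj_cutoff.
    """
    up, down = [], []
    for row in rows:
        if row["adj.P.Val"] >= padj_cutoff:
            continue  # not significant after multiple-testing adjustment
        if row["logFC"] > lfc_cutoff:
            up.append(row["Gene.symbol"])
        elif row["logFC"] < -lfc_cutoff:
            down.append(row["Gene.symbol"])
    return {"up": up, "down": down}


# Invented example rows (values are illustrative only, not study results).
example_rows = [
    {"Gene.symbol": "GENE_A", "logFC": 2.3, "adj.P.Val": 0.0004},
    {"Gene.symbol": "GENE_B", "logFC": -1.7, "adj.P.Val": 0.002},
    {"Gene.symbol": "GENE_C", "logFC": 0.4, "adj.P.Val": 0.0001},  # small effect
    {"Gene.symbol": "GENE_D", "logFC": 3.1, "adj.P.Val": 0.2},     # not significant
]
degs = filter_degs(example_rows)
print(degs)
```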
Enrichment analysis of DEGs
The Database for Annotation, Visualization, and Integrated Discovery (DAVID) is a full-featured functional annotation tool that has been used for systematic and comprehensive analysis of large gene lists. Gene Ontology (GO) term and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses of the DEGs were performed with DAVID, and P < .05 was considered statistically significant.
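To make the idea of an enrichment P-value concrete, the sketch below computes a plain one-sided hypergeometric test: the probability of seeing at least k annotated genes in a list of n genes when K of N background genes carry the annotation. This is an illustration under stated assumptions rather than DAVID's own computation; DAVID reports a more conservative modified Fisher exact P-value (the EASE score), so its values will differ. The function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    k: annotated genes in the list, n: list size,
    K: annotated genes in the background, N: background size.
    """
    total = comb(N, n)
    upper = min(n, K)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Invented example: 5 of 50 list genes annotated, 100 of 10000 in the background.
p_value = hypergeom_enrichment_p(k=5, n=50, K=100, N=10000)
print(f"{p_value:.3e}")
```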
PPI network construction and module analysis
The PPI network of the DEGs was predicted using the online STRING database. Interactions with a combined score >0.4 were selected to construct the PPI network, which was visualized using Cytoscape. The Molecular Complex Detection (MCODE) plugin in Cytoscape was used to identify the important modules of the PPI network. The selection criteria were as follows: MCODE score >5, degree cut-off = 2, node score cut-off = 0.2, max depth = 100 and k-core = 2. The DAVID database was used for function and pathway enrichment analysis of the DEGs in each module.
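The score filter can be pictured as follows. This is a minimal sketch, not the STRING or Cytoscape workflow itself: `build_network` is a hypothetical helper, the edge list and its scores are invented, and scores are assumed to be on a 0 to 1 scale (STRING download files express the same score on a 0 to 1000 scale, in which case the cutoff would be 400).

```python
from typing import Iterable, Set, Tuple


def build_network(edges: Iterable[Tuple[str, str, float]],
                  min_score: float = 0.4) -> Tuple[Set[str], Set[Tuple[str, str]]]:
    """Keep undirected edges whose combined score exceeds min_score.

    Returns the set of nodes and the set of de-duplicated edges.
    """
    kept = set()
    for gene_a, gene_b, score in edges:
        if gene_a != gene_b and score > min_score:
            # Sort the pair so (A, B) and (B, A) count as one edge.
            kept.add(tuple(sorted((gene_a, gene_b))))
    nodes = {gene for edge in kept for gene in edge}
    return nodes, kept


# Invented interaction scores, for illustration only.
example_edges = [
    ("MMP7", "MMP3", 0.92),
    ("MMP3", "MMP7", 0.92),   # duplicate in reverse order
    ("CXCL8", "CXCL1", 0.85),
    ("MYC", "PYY", 0.15),     # below the cutoff, dropped
]
nodes, kept_edges = build_network(example_edges)
print(len(nodes), "nodes,", len(kept_edges), "edges")
```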
Hub genes selection and analysis
Genes with degrees ≥10 were selected as hub genes. The online tool cBioPortal (http://www.cbioportal.org) was used to analyze the network of hub genes and their co-expressed genes. The biological process analysis of the hub genes was carried out and visualized using the Biological Networks Gene Ontology tool (BiNGO) plugin for Cytoscape. The UCSC Cancer Genomics Browser (http://genome-cancer.ucsc.edu) was used to construct the hierarchical clustering of the hub genes. The Kaplan-Meier method was used to perform survival analyses of the hub genes using data from the TCGA database. A meta-analysis of gene expression across different datasets was carried out using the online database Oncomine (http://www.oncomine.com).
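The degree criterion amounts to counting, for each gene, how many interaction partners it has in the filtered network. The sketch below shows that count on a toy edge list; `hub_genes` is a hypothetical helper and the node names are placeholders, so it should be read as an illustration of the rule rather than the exact Cytoscape procedure.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def hub_genes(edges: Iterable[Tuple[str, str]], min_degree: int = 10) -> List[str]:
    """Return genes whose degree (number of distinct neighbors) is >= min_degree."""
    degree = Counter()
    for gene_a, gene_b in set(tuple(sorted(e)) for e in edges):
        degree[gene_a] += 1
        degree[gene_b] += 1
    return sorted(gene for gene, d in degree.items() if d >= min_degree)


# Toy network: one central node linked to ten partners, plus one extra edge.
toy_edges = [("CENTER", f"PARTNER_{i}") for i in range(10)]
toy_edges.append(("PARTNER_0", "PARTNER_1"))
print(hub_genes(toy_edges))
```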
Results
After preprocessing of the microarray results, DEGs (3978 in GSE10950, 412 in GSE44861 and 1673 in GSE74602) were identified. The overlap among the three datasets contained 176 genes. As shown in the Venn diagram (Fig. 1), 55 genes were upregulated and 121 genes were downregulated in colon cancer tissues compared to non-cancerous tissues.
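The overlap behind a Venn diagram of this kind is a set intersection. The sketch below additionally keeps only genes whose direction of change agrees across all datasets; that consistency requirement is an assumption made for illustration and is not stated in the text. The function name and the gene labels are hypothetical.

```python
from typing import Dict, List


def consistent_overlap(datasets: List[Dict[str, str]]) -> Dict[str, str]:
    """Intersect per-dataset DEG maps (gene -> 'up' or 'down').

    A gene is returned only if it appears in every dataset with the same direction.
    """
    if not datasets:
        return {}
    common = set(datasets[0])
    for dataset in datasets[1:]:
        common &= set(dataset)
    first = datasets[0]
    return {
        gene: first[gene]
        for gene in sorted(common)
        if all(dataset[gene] == first[gene] for dataset in datasets)
    }


# Invented DEG maps standing in for the three GEO series.
set_a = {"GENE_1": "up", "GENE_2": "down", "GENE_3": "up"}
set_b = {"GENE_1": "up", "GENE_2": "down", "GENE_4": "down"}
set_c = {"GENE_1": "up", "GENE_2": "up", "GENE_3": "up"}
shared = consistent_overlap([set_a, set_b, set_c])
print(shared)
```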
Figure 1
Venn diagram of DEGs. DEGs = differentially expressed genes.
We uploaded all 176 genes to the online tool DAVID to determine overrepresented GO categories and KEGG pathways. In the biological process (BP) category, the DEGs were significantly enriched in cellular response to zinc ion, negative regulation of growth and bicarbonate transport. In the cellular component (CC) category, the DEGs were mainly enriched in the extracellular space, extracellular exosome and extracellular region. In the molecular function (MF) category, the DEGs were mainly enriched in metalloendopeptidase activity, zinc ion binding and hormone activity. KEGG pathway analysis showed that the DEGs were mainly enriched in mineral absorption, nitrogen metabolism and complement and coagulation cascades (Table 1).
Table 1
GO and KEGG pathway enrichment analysis of DEGs in colon cancer.
| ID | Term | Count | P-value |
|---|---|---|---|
| GO function | | | |
| GO_BP:0071294 | Cellular response to zinc ion | 7 | 1.80E-08 |
| GO_BP:0045926 | Negative regulation of growth | 7 | 1.80E-08 |
| GO_BP:0015701 | Bicarbonate transport | 7 | 3.83E-06 |
| GO_BP:0006508 | Proteolysis | 18 | 6.95E-06 |
| GO_BP:0071276 | Cellular response to cadmium ion | 5 | 1.80E-05 |
| GO_CC:0005615 | Extracellular space | 34 | 3.45E-07 |
| GO_CC:0070062 | Extracellular exosome | 53 | 6.61E-07 |
| GO_CC:0005576 | Extracellular region | 33 | 4.13E-05 |
| GO_CC:0005578 | Proteinaceous extracellular matrix | 10 | 0.001026 |
| GO_CC:0016324 | Apical plasma membrane | 10 | 0.001815 |
| GO_MF:0004222 | Metalloendopeptidase activity | 9 | 1.20E-05 |
| GO_MF:0008270 | Zinc ion binding | 26 | 1.05E-04 |
| GO_MF:0005179 | Hormone activity | 7 | 2.70E-04 |
| GO_MF:0004089 | Carbonate dehydratase activity | 4 | 2.82E-04 |
| GO_MF:0005254 | Chloride channel activity | 5 | 0.0017 |
| KEGG pathway | | | |
| hsa04978 | Mineral absorption | 8 | 1.15E-06 |
| hsa00910 | Nitrogen metabolism | 4 | 0.001208 |
| hsa04610 | Complement and coagulation cascades | 5 | 0.011494 |
| hsa04960 | Aldosterone-regulated sodium reabsorption | 4 | 0.013266 |
| hsa05205 | Proteoglycans in cancer | 8 | 0.013765 |
Using the STRING database, we constructed a PPI network of the DEGs comprising 140 nodes and 280 edges; only interactions with a combined score >0.4, which we treated as significant, were retained (Fig. 2). Using the MCODE plugin in Cytoscape, we further identified distinct modules among the 140 DEGs and their interacting genes. Of these modules, the two subnetworks with MCODE scores >5 were selected as significant (Fig. 3A & B). DAVID was used to analyze the function of the genes in module 1. The results showed that these genes were mainly enriched in negative regulation of growth, perinuclear region of cytoplasm and mineral absorption (Table 2).
Figure 2
The PPI network of DEGs. Notes: Blue represents downregulated DEGs; red represents upregulated DEGs. DEGs = differentially expressed genes, PPI = protein–protein interaction.
Figure 3
Functional modules in the PPI network. Notes: We clustered two functional modules, using MCODE: module 1(A) and module 2(B). Blue represents downregulated DEGs; red represents upregulated DEGs. DEGs = differentially expressed genes, MCODE = Molecular Complex Detection, PPI = protein–protein interaction.
Table 2
GO and KEGG pathway enrichment analysis of DEGs in module 1.
| ID | Term | Count | P-value |
|---|---|---|---|
| GO function | | | |
| GO_BP:0045926 | Negative regulation of growth | 7 | 6.10E-18 |
| GO_BP:0071294 | Cellular response to zinc ion | 7 | 6.10E-18 |
| GO_BP:0071276 | Cellular response to cadmium ion | 5 | 2.51E-11 |
| GO_BP:0036018 | Cellular response to erythropoietin | 2 | 8.34E-04 |
| GO_BP:0010038 | Response to metal ion | 2 | 0.004162 |
| GO_CC:0048471 | Perinuclear region of cytoplasm | 7 | 1.04E-08 |
| GO_CC:0005737 | Cytoplasm | 7 | 0.002918 |
| GO_CC:0005634 | Nucleus | 7 | 0.003585 |
| GO_MF:0008270 | Zinc ion binding | 7 | 7.18E-07 |
| GO_MF:0046872 | Metal ion binding | 7 | 2.11E-05 |
| KEGG pathway | | | |
| hsa04978 | Mineral absorption | 7 | 4.81E-14 |
Hub gene selection and analysis
A total of 13 genes (CXCL8, CCND1, MYC, CXCL1, CXCL12, PLAU, CXCL2, CD19, SLC30A10, MMP3, MMP7, SLC26A3 and PYY) were identified as hub genes with degrees ≥10. A network of the hub genes and their co-expressed genes was analyzed using the cBioPortal online platform (Fig. 4A). The biological process analysis of the hub genes is shown in Figure 4B. Hierarchical clustering showed that the hub genes could largely differentiate the colon cancer samples from the non-cancerous samples (Fig. 5). Meta-analysis showed that MMP7 was significantly overexpressed in colon cancer samples across the different datasets (Fig. 6). Subsequently, we downloaded clinical data and mRNA expression data of 308 colon cancer patients in the TCGA database from the Firebrowse website (http://firebrowse.org/api-docs/) for Cox survival regression analysis. In univariate analysis, high pathologic stage and high mRNA expression of MMP7 were associated with shorter OS of colon cancer patients (Table 3). Multivariate analysis showed that high mRNA expression of MMP7 and high pathologic stage were independently associated with significantly shorter OS (Table 4). These results indicate that MMP7 mRNA expression was an independent prognostic factor for OS of colon cancer patients. In addition, among the hub genes, only high expression of MMP7 was associated with poor OS in the TCGA COAD cohort (Fig. 7).
Figure 4
Interaction network and biological process analysis of the hub genes. Notes: (A) Hub genes and their co-expression genes were analyzed using cBioPortal. Nodes with bold black outline represent hub genes. Nodes with thin black outline represent the co-expression genes. (B) The biological process analysis of hub genes was constructed using BiNGO. P < .01 was considered statistically significant.
Figure 5
Hierarchical clustering of hub genes was constructed using UCSC. Notes: Upregulation of genes is marked in red; downregulation of genes is marked in blue.
Figure 6
Oncomine analysis of cancer vs. normal tissue of MMP7.
Table 3
Univariate analysis of overall survival in 308 TCGA-COAD specimens.
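| Variables | Hazard ratio | 95% CI | P-value |
|---|---|---|---|
| Pathologic stage | | | |
| I | 1 | | <.001 |
| II | 2.357 | 0.672–8.275 | |
| III | 6.771 | 2.048–22.383 | |
| IV | 14.665 | 4.493–47.862 | |
| MMP7 expression | | | |
| low | 1 | | .007 |
| high | 2.002 | 1.205–3.326 | |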
Table 4
Multivariate analysis of overall survival in 308 TCGA-COAD specimens.
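| Variables | Hazard ratio | 95% CI | P-value |
|---|---|---|---|
| Pathologic stage | | | |
| I | 1 | | <.001 |
| II | 2.458 | 0.700–8.636 | |
| III | 6.842 | 2.069–22.625 | |
| IV | 14.358 | 4.398–46.867 | |
| MMP7 expression | | | |
| low | 1 | | .034 |
| high | 1.741 | 1.044–2.903 | |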
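As a reading aid for the hazard ratios and 95% CIs in Tables 3 and 4, the sketch below shows the usual Wald construction, in which the interval is symmetric around the log hazard ratio: CI = exp(log HR ± 1.96 × SE). It assumes the reported intervals were computed this way, which the text does not state; the helper names are hypothetical, and the standard error is back-calculated from the reported bounds rather than taken from the source.

```python
from math import exp, log
from typing import Tuple

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def se_from_ci(lower: float, upper: float, z: float = Z_95) -> float:
    """Back-calculate the standard error of log(HR) from a Wald interval."""
    return (log(upper) - log(lower)) / (2 * z)


def wald_ci(hazard_ratio: float, se: float, z: float = Z_95) -> Tuple[float, float]:
    """Return the Wald confidence interval for a hazard ratio."""
    center = log(hazard_ratio)
    return exp(center - z * se), exp(center + z * se)


# Multivariate estimate for high vs low MMP7 expression (Table 4).
se_mmp7 = se_from_ci(1.044, 2.903)
ci_low, ci_high = wald_ci(1.741, se_mmp7)
print(f"SE(log HR) = {se_mmp7:.3f}, 95% CI = {ci_low:.3f} to {ci_high:.3f}")
```

Under this assumption, the geometric mean of the two bounds should land close to the reported hazard ratio, which gives a quick consistency check on any row of the tables.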
Figure 7
Survival analysis of MMP7 was performed using TCGA database.
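For readers unfamiliar with how a Kaplan-Meier curve is built, the following minimal sketch computes the product-limit estimate from follow-up times and event indicators. It is a generic illustration with invented data, not the analysis pipeline used on the TCGA cohort; `kaplan_meier` is a hypothetical helper, and an event value of 1 is assumed to mean death while 0 means censoring.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == current_time:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((current_time, survival))
        at_risk -= leaving  # deaths and censored subjects both leave the risk set
    return curve


# Invented follow-up data: times in months, 1 = death, 0 = censored.
km_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
print(km_curve)
```

Comparing two such curves (for example, high versus low expression groups) is typically done with a log-rank test, which this sketch does not include.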
Discussion
Colorectal cancer is a common malignant tumor of the digestive tract, with high morbidity and mortality. The prognosis of colorectal cancer is associated with early diagnosis and treatment. It has been reported that the 5-year survival rate of early-stage patients after radical surgery can be as high as 90%, while the 5-year survival rate of patients with metastatic colorectal cancer is lower. Thus, potential biomarkers for diagnosis and highly effective treatments are urgently needed.
In 1986, Herrera et al. found that the APC gene was closely related to familial adenomatous polyposis (FAP) and sporadic colorectal cancer. In 1990, Fearon et al. proposed a classical molecular model of gradual changes in oncogenes and tumor suppressor genes during the histogenesis of colorectal cancer from normal mucosal epithelium to adenoma to adenocarcinoma. The accumulation of these genetic and epigenetic changes leads to the dysfunction of important signaling pathways, which is a key factor in the development of colorectal cancer. In particular, mutations of APC, KRAS, SMAD4, DCC, TP53 and DNA mismatch repair genes are the most common and characteristic molecular events. With the spread of single-cell sequencing and of gene and genomics research, a growing number of new tumor markers have been put to use, especially in colorectal cancer. By transcriptomic analysis, Yang et al. found seven differentially expressed mRNAs associated with prognosis in colon cancer patients (SGCG, CLDN23, SLC4A4, CCDC78, SLC17A7, OTOP3 and SMPDL3A). Nfonsam et al. found that secreted frizzled-related protein 4 (SFRP4) was overexpressed in early-onset colon cancer and could be targeted for diagnosis and therapy. Although many molecular markers are still at the basic research stage, microsatellite instability (MSI) status and RAS and RAF gene status, among others, already play a key role in the treatment and prognosis of colorectal cancer.
With the development of genomics, microarray technology has enabled us to explore genetic alterations in colorectal cancer. In addition, bioinformatics analysis allows us to discover molecules and pathways that play an important role in tumorigenesis, some of which have been proven to be potential tumor markers or therapeutic targets.
In this study, three mRNA microarray datasets were analyzed to identify DEGs between colon cancer tissues and non-cancerous tissues. A total of 176 DEGs were screened, consisting of 55 upregulated genes and 121 downregulated genes. Moreover, we selected two significant modules with several key DEGs in the colon carcinoma regulatory network.
We selected 13 DEGs with degrees ≥10 as hub genes. Among these hub genes, CXCL8 and CCND1 showed the highest node degrees. The data showed that CXCL8 is enriched in bladder cancer, NOD-like receptor signaling, transcriptional misregulation in cancer and NF-kappa B signaling, pathways that are significantly associated with the progression of cancer. CXCL8 is a protein-coding gene. The protein encoded by this gene is a member of the CXC chemokine family and is a major mediator of the inflammatory response. Previously, it was reported that the overexpression of CXCL8 induced cell proliferation, migration and invasion of colon cancer LoVo cells. The authors found that CXCL8 may act by inducing EMT via the PI3K/AKT/NF-κB signaling axis. In addition, Signs et al. found that in stromal fibroblasts, miR-20a modulates CXCL8 function, thereby influencing tumor latency. The protein encoded by this gene is also a potent angiogenic factor, which may play an important role in tumor metastasis. The data showed that CCND1 (cyclin D1) is enriched in proteoglycans in cancer, bladder cancer and PI3K-Akt signaling. The protein encoded by this gene, which belongs to the highly conserved cyclin family, is a regulatory protein involved in mitosis.
Cell proliferation, differentiation, senescence and apoptosis of both normal cells and cancer cells are cell cycle-dependent. Previously, cyclin D was shown to be misregulated in many cancer types. Although survival analysis showed no statistically significant difference for these two genes, possibly because of small sample sizes and incomplete data, we speculate that the overexpression of CXCL8 and CCND1 may contribute to colon cancer progression and metastasis.
Finally, the survival analysis of the hub genes revealed that MMP7 was the overexpressed gene significantly correlated with poor OS of patients in the TCGA COAD cohort. This gene encodes a member of the peptidase M10 family of matrix metalloproteinases (MMPs). Proteins in this family are involved in the breakdown of extracellular matrix in normal physiological processes, such as embryonic development, reproduction and tissue remodeling, as well as in disease processes, such as arthritis and metastasis. Klupp et al. found that the expression of MMP7 in the serum of patients with colon cancer differed from that of healthy individuals. In addition, the overexpression of MMP7 in the serum of patients with colon cancer was associated with poor prognosis. Similarly, UALCAN analysis showed that the expression of MMP7 increased with advancing stage of colon cancer. Moreover, Fan et al. found that MMP7 was of great significance in the treatment of colon cancer with chemotherapy. Kobayashi et al. revealed that MMP played an important role in the epithelial-mesenchymal transition and invasion of colon cancer. MMP7 plays an important role in the occurrence and development of colorectal cancer, including the transformation from early colorectal adenoma to invasive carcinoma and distant metastasis. MMP7 expression is a predictor of poor prognosis in colorectal cancer. According to our study and previous studies, it may be a potential target for tumor therapy.
The present study is still at an initial stage, and further large-scale clinical studies are needed to evaluate the therapeutic effect of MMP7 inhibitors. This result might be affected by the gene expression of non-neoplastic cells in the tissue, but we consider this influence to be small. Taken together, we speculate that the overexpression of MMP7 may contribute to colon cancer progression and invasion and correlate with poor prognosis.
In conclusion, the present study was designed to identify DEGs that may be involved in the carcinogenesis or progression of colon cancer. A total of 176 DEGs and 13 hub genes were identified and may be regarded as diagnostic biomarkers for colon cancer. Moreover, the overexpression of MMP7 may correlate with poor prognosis. However, further experimental studies are still required to confirm our findings and determine the potential clinical value of these genes as biomarkers.
Acknowledgments
This work was supported by the Zhejiang University College of Medicine. Our special thanks are due to Prof. He Chao for his helpful discussion during the preparation of the manuscript.
Author contributions
Conceptualization: Li Chen.
Data curation: Li Chen.
Formal analysis: Li Chen.
Methodology: Li Chen.
Resources: Xueying Ke.
Software: Xueying Ke.
Writing – original draft: Li Chen, Xueying Ke.
Writing – review & editing: Li Chen, Xueying Ke.