Identification of Hub Genes for Early Diagnosis and Predicting Prognosis in Colon Adenocarcinoma.

Shuo Xu, Dingsheng Liu, Mingming Cui, Yao Zhang, Yu Zhang, Shiqi Guo, Hong Zhang.

Abstract

Colon adenocarcinoma (COAD) is among the most common digestive system malignancies worldwide, and its pathogenesis and gene signatures remain unclear. This study explored the genetic characteristics and molecular mechanisms underlying colon cancer development. Three gene expression data sets were obtained from the Gene Expression Omnibus (GEO) database. GEO2R was used to determine differentially expressed genes (DEGs) between COAD and normal tissues. Then, the intersection of the data sets was obtained. Metascape was used to perform the functional enrichment analyses. Next, STRING was used to build protein-protein interaction (PPI) networks. Hub genes were identified and analysed using Cytoscape. Next, survival analysis and expression analysis of the hub genes were performed. ROC curve analysis was performed for further test of the diagnostic efficacy. Finally, alterations in the hub genes were predicted and analysed by cBioPortal. Altogether, 436 DEGs were detected. The DEGs were mainly enriched in cell cycle phase transition, nuclear division, meiotic nuclear division, and cytokinesis. Based on PPI networks, 20 hub genes were selected. Among them, 6 hub genes (CCNB1, CCNA2, AURKA, NCAPG, DLGAP5, and CENPE) showed significant prognostic value in colon cancer (P < 0.05), while 5 hub genes (CDK1, CCNB1, CCNA2, MAD2L1, and DLGAP5) were associated with early colon cancer diagnosis and ROC curve analysis showed good diagnostic accuracy. In conclusion, integrated bioinformatics analysis was used to identify hub genes that reveal the potential mechanism of carcinogenesis and progression of colon cancer. The hub genes might be novel biomarkers for early diagnosis, treatment, and prognosis of colon cancer.
Copyright © 2022 Shuo Xu et al.

Year:  2022        PMID: 35774271      PMCID: PMC9239823          DOI: 10.1155/2022/1893351

Source DB:  PubMed          Journal:  Biomed Res Int            Impact factor:   3.246


1. Introduction

Colon adenocarcinoma (COAD) is among the most common digestive system malignancies worldwide. There were 1,096,601 new colon cancer cases and 551,269 deaths worldwide in 2018 [1]. In the last decade, both the incidence and mortality of colon cancer increased in rapidly transitioning countries including the Baltic countries, Russia, China, and Brazil [2]. As previously reported, the 5-year survival rate was more than 90% for patients diagnosed at stage I, but only 12% for patients diagnosed at stage IV [3]. Thus, early diagnosis and surgical resection of colon cancer can greatly improve disease prognosis. Current early screening tests include noninvasive stool- and blood-based tests, radiologic tests, and invasive tests such as colonoscopy. However, participation in and adherence to screening are low, mainly because of the unreliable accuracy of noninvasive tests, the low acceptance of invasive tests, and high cost [4]. Computed tomographic colonography (CTC) with bowel preparation was reported to have a diagnostic sensitivity of 68.5% and specificity of 88.8% for adenoma ≥ 6 mm, while overall sensitivity (55.3%) and specificity (34.1%) were much lower for adenomas of all sizes [5]. Another study reported that the sensitivity of the faecal immunochemical test (FIT) in detecting adenoma, advanced neoplasm, and cancer was 9.5%, 35.1%, and 25.0%, respectively, indicating low diagnostic accuracy [6]. As a result, only 39% of tumours were diagnosed at an early stage, and colon cancer remains a serious health burden worldwide [7]. Thus, it is essential to uncover the molecular mechanism and to explore novel biomarkers for early colon cancer diagnosis.
At present, molecular biomarkers are mainly divided into three categories [8]: prognostic biomarkers such as tumour suppressor p53, vascular endothelial growth factor (VEGF), and epidermal growth factor receptor (EGFR); diagnostic biomarkers such as telomerase and pyruvate kinase M2 (PKM2); and predictive biomarkers such as KRAS and B-Raf V600E. Some molecular markers have already been applied in clinical practice. One study confirmed prostaglandin E receptor 4 (PTGER4)/short stature homeobox 2 (SHOX2) DNA methylation as a biomarker for early detection of lung cancer [9]. The panel of trefoil factor (TFF) 1, TFF2, and TFF3 may provide potential biomarkers for early screening of breast cancer [10]. However, the accuracy and reliability of many markers are not satisfactory [8, 11]. Therefore, there is an urgent need to identify one or more accurate and effective markers for early diagnosis and better individualized treatment of colon cancer [12]. RNA sequencing and gene expression microarrays are widely applied in cancer studies. Bioinformatics analysis of these data can be used to identify significant biomarkers which may improve early cancer diagnosis, predict prognosis, and inform therapeutic responses [13, 14]. Although gene expression in colon cancer has been studied previously, few studies have combined multiple gene expression data sets or focused on early diagnosis of the disease. Hence, we performed this study to deepen the understanding of the underlying mechanism and to provide novel biomarkers for early diagnosis and prognosis of the disease.

2. Materials and Methods

2.1. Microarray Data

We first searched the GEO database [15] and identified three microarray datasets (GSE110224, GSE44076, and GSE47063) [16-18] describing gene expression differences between COAD and normal colon tissue. GSE110224 is based on platform GPL570 ([HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array), GSE44076 is based on platform GPL13667 ([HG-U219] Affymetrix Human Genome U219 Array), and GSE47063 is based on platform GPL6102 (Illumina human-6 v2.0 expression beadchip). All data are freely available online.

2.2. DEG Identification

GEO2R is commonly used to process sample information from GEO series and to identify DEGs among user-defined groups. After screening the sample information in the three data sets, only the COAD samples and the corresponding normal tissues were included. After GEO2R analysis, DEGs were obtained by intersecting genes with an adjusted P < 0.05 and |logFC| ≥ 1 in each data set using a Venn diagram.
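The filtering and intersection step described above can be sketched in a few lines of Python. This is a minimal illustration rather than the GEO2R workflow itself: the row layout (gene symbol, logFC, adjusted P value), the function names, and the toy values are all assumptions made for the example, and the intersection is taken separately for upregulated and downregulated genes to mirror the two Venn diagrams.

```python
from typing import Dict, Iterable, List, Set, Tuple

# One result row: (gene symbol, log fold change, adjusted P value).
# This row layout is an assumption for illustration, not GEO2R's export format.
Row = Tuple[str, float, float]


def filter_degs(rows: Iterable[Row], p_cutoff: float = 0.05,
                lfc_cutoff: float = 1.0) -> Dict[str, Set[str]]:
    """Keep genes with adjusted P < p_cutoff and |logFC| >= lfc_cutoff, split by direction."""
    up: Set[str] = set()
    down: Set[str] = set()
    for gene, log_fc, adj_p in rows:
        if adj_p < p_cutoff and abs(log_fc) >= lfc_cutoff:
            (up if log_fc > 0 else down).add(gene)
    return {"up": up, "down": down}


def intersect_degs(per_dataset: List[Dict[str, Set[str]]]) -> Dict[str, Set[str]]:
    """Keep genes that pass the filter, in the same direction, in every data set."""
    return {
        direction: set.intersection(*(d[direction] for d in per_dataset))
        for direction in ("up", "down")
    }


# Toy rows with made-up gene names and values; not taken from the GEO series.
dataset_a = [("GENE_UP", 2.1, 0.001), ("GENE_DOWN", -3.0, 0.0004), ("GENE_X", 0.4, 0.01)]
dataset_b = [("GENE_UP", 1.5, 0.02), ("GENE_DOWN", -2.2, 0.003), ("GENE_Y", 1.8, 0.2)]
dataset_c = [("GENE_UP", 1.0, 0.04), ("GENE_DOWN", -1.1, 0.01), ("GENE_Y", 2.0, 0.001)]

shared = intersect_degs([filter_degs(d) for d in (dataset_a, dataset_b, dataset_c)])
print(shared)
```

In this toy input, GENE_X fails the fold-change cutoff and GENE_Y fails the adjusted P cutoff in one data set, so only GENE_UP and GENE_DOWN survive the intersection.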

2.3. Gene Ontology and Pathway Enrichment Analysis of DEGs

Metascape [19] is an open access online tool for comprehensive gene list annotation and analysis. In this study, DEG pathway and process enrichment analyses were performed using Metascape. The parameters were set as follows: 3 for min overlap, 1.5 for min enrichment, and P value cutoff of 0.05. The enrichment results were presented as bar charts. Corresponding network graph nodes with similarity degree more than 0.3 were connected with curved edges. Edge thickness was positively correlated with the degree of similarity.
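To make the three thresholds concrete, the sketch below applies them to a single term. It assumes a cumulative hypergeometric test and defines enrichment as observed overlap divided by the overlap expected by chance, which is a common formulation of over-representation analysis; Metascape's internal implementation may differ. The function names and all numbers in the example are hypothetical.

```python
from math import comb
from typing import Tuple


def hypergeom_sf(k: int, background: int, term_size: int, list_size: int) -> float:
    """P(X >= k) when list_size genes are drawn from background, term_size of which carry the term."""
    upper = min(term_size, list_size)
    total = comb(background, list_size)
    tail = sum(
        comb(term_size, i) * comb(background - term_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def is_enriched(overlap: int, background: int, term_size: int, list_size: int,
                min_overlap: int = 3, min_enrichment: float = 1.5,
                p_cutoff: float = 0.05) -> Tuple[bool, float, float]:
    """Apply the min overlap, min enrichment, and P value cutoffs to one term."""
    expected = term_size * list_size / background  # overlap expected by chance
    enrichment = overlap / expected
    p_value = hypergeom_sf(overlap, background, term_size, list_size)
    passed = overlap >= min_overlap and enrichment >= min_enrichment and p_value < p_cutoff
    return passed, enrichment, p_value


# Hypothetical numbers: 10,000 background genes, a 200-gene term,
# a 400-gene input list, and 30 genes shared between list and term.
passed, enrichment, p_value = is_enriched(30, 10_000, 200, 400)
print(passed, round(enrichment, 2), p_value)
```

Here the expected overlap is 8 genes, so an observed overlap of 30 gives an enrichment of 3.75 and passes all three cutoffs.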

2.4. PPI Network Construction and Module Analysis

The Search Tool for the Retrieval of Interacting Genes (STRING) database [20] was used to construct the PPI network with an interaction score > 0.4. Then, Cytoscape (Version 3.7.2) [21] software was used to visualise and analyse PPI networks. Molecular Complex Detection (MCODE) (Version 1.6) [22], a Cytoscape plugin, was used to identify the most significant gene module in colon cancer. Then, we annotated the function of the module genes using Metascape.

2.5. Hub Gene Selection and Analysis

CytoHubba (Version 0.1) [23], a Cytoscape plugin, was used to identify the network hub genes. We ranked nodes by degree and defined hub genes as those with a degree of at least 67. ClueGO [24] is another Cytoscape plugin that creates and visualises functionally grouped networks of biological terms and pathways. The CluePedia [25] Cytoscape plugin is a functional extension of ClueGO and a search tool for new markers potentially associated with pathways. In our study, ClueGO (Version 2.5.6) and CluePedia (Version 1.5.6) were used to analyse the biological processes and pathway enrichment of hub genes.
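Degree ranking itself is simple to reproduce outside Cytoscape. The sketch below is a minimal illustration, not the CytoHubba implementation: it assumes the PPI network is available as an edge list of (protein, protein, score) tuples with scores on a 0 to 1 scale, keeps edges above the interaction score cutoff, counts each undirected edge once, and returns nodes whose degree meets a threshold. The node names and the threshold of 2 are hypothetical; the study used a threshold of 67 on the full network.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (protein A, protein B, interaction score); assumed layout


def node_degrees(edges: Iterable[Edge], min_score: float = 0.4) -> Counter:
    """Count neighbours per node, using only edges with score > min_score."""
    degree: Counter = Counter()
    seen = set()
    for a, b, score in edges:
        pair = frozenset((a, b))
        # Skip self-loops, weak edges, and edges already counted in either orientation.
        if a == b or score <= min_score or pair in seen:
            continue
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    return degree


def hub_genes(degree: Counter, min_degree: int) -> List[str]:
    """Return nodes with degree >= min_degree, highest degree first, ties by name."""
    hubs = [gene for gene, d in degree.items() if d >= min_degree]
    return sorted(hubs, key=lambda gene: (-degree[gene], gene))


# Toy network with hypothetical node names and scores.
edges = [
    ("G1", "G2", 0.99), ("G1", "G3", 0.98), ("G1", "G4", 0.90),
    ("G2", "G3", 0.95), ("G4", "G5", 0.30), ("G2", "G1", 0.99),
]
degree = node_degrees(edges)
hubs = hub_genes(degree, min_degree=2)
print(hubs)
```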

2.6. Analysis of Prognostic Value of Hub Genes

GEPIA [26] is an integrated bioinformatics analysis tool which was designed for transforming genomic big data into intuitive graphics. In this study, GEPIA was used to perform survival analysis based on gene expression. P < 0.05 was considered statistically significant.

2.7. Hub Gene Expression Analysis and ROC Curve Analysis

UALCAN [27] is a comprehensive interactive online resource which contains clinical data from 31 cancer types from the TCGA database. We used UALCAN to perform differential expression analysis of the hub genes and to assess their association with clinicopathological parameters of COAD patients. Moreover, the Human Protein Atlas (HPA) [28] is a freely accessible website for exploration of the human proteome, which contains transcriptome data from 17 main cancer types using data from nearly 8000 patients. In this study, histopathological data of the hub genes were downloaded and used for direct comparison of protein expression. We selected an additional dataset for ROC curve analysis of the diagnostic accuracy of the hub genes. GSE87211 [29] is based on platform GPL13497 (Agilent-026652 Whole Human Genome Microarray 4x44K v2). All data are freely available online.

2.8. Analysis of Alterations of Hub Genes

cBioPortal [30] is a free web server for interactively exploring cancer genomics datasets. In this study, cBioPortal was utilised to predict the genetic alterations of eight hub genes in 378 COAD samples (TCGA, PanCancer Atlas) which contained mutations and putative copy-number alterations from GISTIC and mRNA expression z-scores (RNASeq V2 RSEM) with a z-score threshold ±2.0.
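The role of the ±2.0 threshold can be illustrated with a short sketch. It takes precomputed expression z-scores per gene and sample, labels a value as high or low when it lies beyond the threshold, and reports the fraction of samples with at least one altered gene. This is only an approximation of the idea: treating the threshold as a strict inequality is an assumption, the function names and toy z-scores are hypothetical, and the sketch ignores the mutations and copy-number alterations that the cBioPortal query also counts.

```python
from typing import Dict, List


def classify_expression(z: float, threshold: float = 2.0) -> str:
    """Label one z-score as 'high', 'low', or 'unaltered' (strict inequality assumed)."""
    if z > threshold:
        return "high"
    if z < -threshold:
        return "low"
    return "unaltered"


def altered_fraction(z_by_gene: Dict[str, List[float]], threshold: float = 2.0) -> float:
    """Fraction of samples in which at least one gene is beyond the threshold."""
    n_samples = len(next(iter(z_by_gene.values())))
    altered = [False] * n_samples
    for z_scores in z_by_gene.values():
        for i, z in enumerate(z_scores):
            if classify_expression(z, threshold) != "unaltered":
                altered[i] = True
    return sum(altered) / n_samples


# Toy z-scores for four samples and two hypothetical genes.
z_by_gene = {
    "GENE_A": [2.5, 0.1, -0.3, 1.9],
    "GENE_B": [0.0, -2.2, 0.5, 2.0],
}
fraction = altered_fraction(z_by_gene)
print(fraction)
```

In the toy data, the first sample is altered through GENE_A and the second through GENE_B, while a z-score of exactly 2.0 in the fourth sample does not count, giving a fraction of 0.5.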

2.9. Statistical Analysis

Microarray data analysis was performed by using GEO2R. GEOquery R package was used to transform the original data into R data structure, and then, the statistical test of limma (linear models for microarray analysis) R package was used to identify DEGs. Survival analysis was performed by using GEPIA and log-rank test. The transcripts per million (TPM) expression value and t-test were used for analysis of the relationship between hub genes expression and clinicopathological parameters. SPSS 26.0 was used for ROC curve analysis. P < 0.05 was considered statistically significant.
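For readers unfamiliar with the log-rank test named above, the following sketch implements the standard two-group version using only the Python standard library. It is an illustration of the statistic, not GEPIA's code: how patients are split into high- and low-expression groups is left to the caller, the follow-up times are hypothetical, and the function raises a division error if the variance is zero (for example, when one group is empty).

```python
from math import erfc, sqrt
from typing import Sequence, Tuple


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; events are 1 for an observed event, 0 for censoring."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_minus_expected = 0.0
    variance = 0.0
    # Walk through each distinct time at which at least one event occurred.
    for t in sorted({ti for ti, e, _ in data if e}):
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        at_risk_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        deaths_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        deaths = sum(1 for ti, e, _ in data if ti == t and e)
        n = at_risk_a + at_risk_b
        observed_minus_expected += deaths_a - deaths * at_risk_a / n
        if n > 1:
            variance += deaths * (at_risk_a / n) * (at_risk_b / n) * (n - deaths) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance
    p_value = erfc(sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 degree of freedom
    return chi2, p_value


# Hypothetical follow-up times (months); every patient has an observed event.
chi2, p_value = logrank_test([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
print(round(chi2, 2), p_value < 0.05)
```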

3. Results

3.1. DEGs in Colon Cancer

Among the three datasets (GSE110224, GSE44076, and GSE47063), there were 127 COAD tissues and 117 normal tissues. After GEO2R analysis, we screened 1617 DEGs (745 upregulated and 872 downregulated) from GSE110224, 4450 DEGs (2095 upregulated and 2355 downregulated) from GSE44076, and 2259 DEGs (1056 upregulated and 1203 downregulated) from GSE47063. Then, 436 DEGs were obtained by overlapping the three dataset results, including 267 downregulated genes (Figure 1(a)) and 169 upregulated genes (Figure 1(b)).
Figure 1

Venn diagram of DEGs from three datasets. (a) 267 downregulated DEGs. (b) 169 upregulated DEGs. Abbreviations: DEGs: differentially expressed genes.

3.2. DEG Gene Ontology (GO) and Pathway Enrichment in Colon Cancer

The top 20 GO items were divided into 3 categories: biological processes (14 items), cellular components (4 items), and molecular functions (2 items; Table 1 and Figures 2(a) and 2(b)). The DEGs were mainly enriched in cell cycle, transcriptional regulation, and ion transport. Enriched biological processes included cell cycle phase transition, nuclear division, meiotic nuclear division, cytokinesis, DNA replication, negative regulation of cell proliferation, regulation of reproductive process, regulation of MAPK cascade, positive regulation of transferase activity, bicarbonate transport, inorganic ion homeostasis, cellular response to organic cyclic compound, cellular response to nitrogen compound, and mesenchymal cell differentiation. Cellular component analysis showed that the DEGs were significantly enriched in the apical part of the cell, spindle, microvillus, and basolateral plasma membrane. Molecular functions of these genes were histone kinase activity and activity of hydrolase acting on ester bond.
Table 1

Gene ontology (GO) annotation of DEGs in COAD.

| GO | Category | Description | Count | % | Log10(P) | Log10(q) |
|---|---|---|---|---|---|---|
| GO:0044770 | GO biological processes | Cell cycle phase transition | 45 | 10.32 | -14.12 | -10.06 |
| GO:0000280 | GO biological processes | Nuclear division | 36 | 8.26 | -14.06 | -10.06 |
| GO:0051347 | GO biological processes | Positive regulation of transferase activity | 37 | 8.49 | -8.32 | -5.38 |
| GO:0140013 | GO biological processes | Meiotic nuclear division | 17 | 3.90 | -7.60 | -4.94 |
| GO:0006260 | GO biological processes | DNA replication | 19 | 4.36 | -6.27 | -3.79 |
| GO:2000241 | GO biological processes | Regulation of reproductive process | 14 | 3.21 | -6.22 | -3.75 |
| GO:0015701 | GO biological processes | Bicarbonate transport | 8 | 1.83 | -6.06 | -3.60 |
| GO:0008285 | GO biological processes | Negative regulation of cell proliferation | 34 | 7.80 | -5.72 | -3.29 |
| GO:0048762 | GO biological processes | Mesenchymal cell differentiation | 16 | 3.67 | -5.61 | -3.19 |
| GO:0071407 | GO biological processes | Cellular response to organic cyclic compound | 27 | 6.19 | -5.60 | -3.19 |
| GO:0043408 | GO biological processes | Regulation of MAPK cascade | 33 | 7.57 | -5.52 | -3.13 |
| GO:0098771 | GO biological processes | Inorganic ion homeostasis | 33 | 7.57 | -5.46 | -3.08 |
| GO:1901699 | GO biological processes | Cellular response to nitrogen compound | 30 | 6.88 | -5.39 | -3.02 |
| GO:0000910 | GO biological processes | Cytokinesis | 14 | 3.21 | -5.33 | -2.97 |
| GO:0045177 | GO cellular components | Apical part of cell | 34 | 7.80 | -12.92 | -9.27 |
| GO:0005819 | GO cellular components | Spindle | 27 | 6.19 | -9.32 | -6.23 |
| GO:0005902 | GO cellular components | Microvillus | 13 | 2.98 | -8.20 | -5.35 |
| GO:0016323 | GO cellular components | Basolateral plasma membrane | 16 | 3.67 | -5.41 | -3.04 |
| GO:0035173 | GO molecular functions | Histone kinase activity | 6 | 1.38 | -6.46 | -3.97 |
| GO:0016788 | GO molecular functions | Hydrolase activity, acting on ester bonds | 33 | 7.57 | -5.61 | -3.19 |

Abbreviations: DEGs: differentially expressed genes; COAD: colon adenocarcinoma.

Figure 2

DEG and neighbouring gene enrichment analysis in COAD using Metascape. (a) Heatmap of GO enriched terms coloured by P value. (b) Network of GO enriched terms coloured by P value. Each node represents an enriched term. Dark colours indicate increased statistical significance. (c) Heatmap of KEGG and Reactome enriched terms coloured by P value. (d) Network of KEGG and Reactome enriched terms coloured by P value. Each node represents an enriched term. Darker colour indicates more statistical significance. Abbreviations: DEGs: differentially expressed genes; COAD: colon adenocarcinoma; GO: Gene Ontology; KEGG: Kyoto Encyclopedia of Genes and Genomes.

The top 20 Kyoto Encyclopedia of Genes and Genomes (KEGG) and Reactome pathways were shown in Table 2 and Figures 2(c) and 2(d). DEGs were mainly enriched for terms associated with the cell cycle, reversible hydration of carbon dioxide, proximal tubule bicarbonate reclamation, transport of small molecules, cyclin A/B1/B2 associated events during G2/M transition, and regulation of TP53 activity through phosphorylation pathway.
Table 2

KEGG and Reactome annotation of DEGs in COAD.

| GO | Category | Description | Count | % | Log10(P) | Log10(q) |
|---|---|---|---|---|---|---|
| hsa04110 | KEGG pathway | Cell cycle | 16 | 3.67 | -9.05 | -6.08 |
| hsa04964 | KEGG pathway | Proximal tubule bicarbonate reclamation | 5 | 1.15 | -4.32 | -2.26 |
| hsa04923 | KEGG pathway | Regulation of lipolysis in adipocytes | 6 | 1.38 | -3.38 | -1.5 |
| hsa04960 | KEGG pathway | Aldosterone-regulated sodium reabsorption | 5 | 1.15 | -3.3 | -1.48 |
| hsa04978 | KEGG pathway | Mineral absorption | 5 | 1.15 | -2.65 | -1.03 |
| R-HSA-1640170 | Reactome gene sets | Cell cycle | 41 | 9.4 | -10.9 | -7.44 |
| R-HSA-1475029 | Reactome gene sets | Reversible hydration of carbon dioxide | 5 | 1.15 | -5.87 | -3.47 |
| R-HSA-382551 | Reactome gene sets | Transport of small molecules | 29 | 6.65 | -4.18 | -2.14 |
| R-HSA-69273 | Reactome gene sets | Cyclin A/B1/B2 associated events during G2/M transition | 5 | 1.15 | -4.13 | -2.11 |
| R-HSA-6804756 | Reactome gene sets | Regulation of TP53 activity through phosphorylation | 8 | 1.83 | -3.58 | -1.64 |
| R-HSA-109582 | Reactome gene sets | Haemostasis | 24 | 5.5 | -3.35 | -1.5 |
| R-HSA-8979227 | Reactome gene sets | Triglyceride metabolism | 5 | 1.15 | -3.3 | -1.48 |
| R-HSA-418594 | Reactome gene sets | G alpha (i) signalling events | 17 | 3.9 | -2.89 | -1.16 |
| R-HSA-420029 | Reactome gene sets | Tight junction interactions | 4 | 0.92 | -2.76 | -1.09 |
| R-HSA-200425 | Reactome gene sets | Carnitine metabolism | 3 | 0.69 | -2.74 | -1.09 |
| R-HSA-420092 | Reactome gene sets | Glucagon-type ligand receptors | 4 | 0.92 | -2.55 | -0.95 |
| R-HSA-211945 | Reactome gene sets | Phase I - functionalization of compounds | 7 | 1.61 | -2.5 | -0.91 |
| R-HSA-6785807 | Reactome gene sets | Interleukin-4 and interleukin-13 signalling | 7 | 1.61 | -2.46 | -0.88 |
| R-HSA-983189 | Reactome gene sets | Kinesins | 5 | 1.15 | -2.38 | -0.82 |
| R-HSA-422085 | Reactome gene sets | Synthesis, secretion, and deacylation of ghrelin | 3 | 0.69 | -2.34 | -0.8 |

Abbreviations: KEGG: Kyoto Encyclopedia of Genes and Genomes; DEGs: differentially expressed genes; COAD: colon adenocarcinoma.

3.3. DEG PPI Network and Modules

A PPI network composed of 369 nodes and 2708 edges was constructed (Figure 3). Then, MCODE was used to isolate the significant network modules. We selected the most significant module with the highest degree (Figure 4(a)) and functionally annotated the involved genes (Table 3). GO enrichment analysis showed that the genes were mainly enriched in biological processes, including chromosome segregation, cell cycle phase transition, positive regulation of cell cycle, DNA replication, meiotic cell cycle, attachment of spindle microtubules to kinetochore, DNA conformation change, signal transduction by p53 class mediator, positive regulation of transferase activity, sister chromatid cohesion, cytokinetic process, and protein localisation to cytoskeleton. Cellular component analysis showed that these genes were mainly enriched in the spindle, midbody, kinesin complex, and intercellular bridge. Molecular function analysis showed that these genes were mainly enriched in catalytic activity, acting on DNA, and chromatin binding. Pathway analysis revealed that these genes were mainly enriched in cyclin A/B1/B2-associated events during G2/M transition and APC-Cdc20-mediated degradation of Nek2A.
Figure 3

PPI network of DEGs, containing 369 nodes and 2708 edges. Red represents upregulated genes. Blue represents downregulated genes. Abbreviations: PPI: protein-protein interaction; DEGs: differentially expressed genes.

Figure 4

The most significant module gene network and hub genes analysis. (a) The most significant module in the PPI network contains 62 nodes and 1708 edges. (b) Network of 20 hub genes. Darker colours represent higher scores. (c) Biological process annotation of hub genes using ClueGO and CluePedia. P < 0.01 was considered statistically significant. (d) KEGG annotation of hub genes using ClueGO and CluePedia. P < 0.01 was considered statistically significant. (e) Heatmap of the top 20 hub genes was constructed using the UALCAN database. Abbreviations: PPI: protein-protein interaction; KEGG: Kyoto Encyclopedia of Genes and Genomes.

Table 3

Functional annotation of the genes involved in the most significant module.

| GO | Category | Description | Count | % | Log10(P) | Log10(q) |
|---|---|---|---|---|---|---|
| GO:0007059 | GO biological processes | Chromosome segregation | 29 | 46.77 | -37.06 | -32.66 |
| GO:0044770 | GO biological processes | Cell cycle phase transition | 33 | 53.23 | -35.09 | -31.16 |
| GO:0045787 | GO biological processes | Positive regulation of cell cycle | 17 | 27.42 | -15.9 | -13.21 |
| GO:0006260 | GO biological processes | DNA replication | 14 | 22.58 | -14.21 | -11.7 |
| GO:0051321 | GO biological processes | Meiotic cell cycle | 12 | 19.35 | -11.78 | -9.35 |
| GO:0008608 | GO biological processes | Attachment of spindle microtubules to kinetochore | 7 | 11.29 | -11.68 | -9.27 |
| GO:0071103 | GO biological processes | DNA conformation change | 12 | 19.35 | -10.45 | -8.08 |
| GO:0072331 | GO biological processes | Signal transduction by p53 class mediator | 10 | 16.13 | -8.81 | -6.51 |
| GO:0051347 | GO biological processes | Positive regulation of transferase activity | 14 | 22.58 | -8.75 | -6.45 |
| GO:0007062 | GO biological processes | Sister chromatid cohesion | 6 | 9.68 | -7.87 | -5.62 |
| GO:0032506 | GO biological processes | Cytokinetic process | 5 | 8.06 | -7.29 | -5.1 |
| GO:0044380 | GO biological processes | Protein localisation to cytoskeleton | 5 | 8.06 | -6.41 | -4.27 |
| GO:0005819 | GO cellular components | Spindle | 23 | 37.1 | -25.83 | -22.66 |
| GO:0030496 | GO cellular components | Midbody | 14 | 22.58 | -16.65 | -13.9 |
| GO:0005871 | GO cellular components | Kinesin complex | 5 | 8.06 | -6.53 | -4.38 |
| GO:0045171 | GO cellular components | Intercellular bridge | 5 | 8.06 | -5.97 | -3.86 |
| GO:0140097 | GO molecular functions | Catalytic activity, acting on DNA | 7 | 11.29 | -5.87 | -3.78 |
| GO:0003682 | GO molecular functions | Chromatin binding | 10 | 16.13 | -5.86 | -3.76 |
| R-HSA-69273 | Reactome gene sets | Cyclin A/B1/B2 associated events during G2/M transition | 4 | 6.45 | -6.32 | -4.18 |
| R-HSA-179409 | Reactome gene sets | APC-Cdc20 mediated degradation of Nek2A | 4 | 6.45 | -6.32 | -4.18 |

Abbreviations: GO: Gene Ontology.

3.4. Hub Genes

According to the node degree calculated by CytoHubba, 20 hub genes were screened out, and they were all upregulated (Figure 4(b)). The gene symbols and corresponding degree were shown in Table 4. Functional annotation of the 20 hub genes was shown in Figures 4(c) and 4(d). Heat map visualisation showed that the expression of these 20 hub genes in COAD tissues was higher than in normal tissues (Figure 4(e)).
Table 4

Top 20 hub genes and corresponding degree.

| Gene symbol | Gene description | Score |
|---|---|---|
| CDK1 | Cyclin dependent kinase 1 | 78 |
| CCNB1 | Cyclin B1 | 76 |
| CCNA2 | Cyclin A2 | 75 |
| AURKA | Aurora kinase A | 75 |
| CDC20 | Cell division cycle 20 | 74 |
| AURKB | Aurora kinase B | 72 |
| TPX2 | TPX2 microtubule nucleation factor | 71 |
| BUB1 | BUB1 mitotic checkpoint serine/threonine kinase | 70 |
| CDC45 | Cell division cycle 45 | 70 |
| MAD2L1 | Mitotic arrest deficient 2 like 1 | 69 |
| KIF2C | Kinesin family member 2C | 69 |
| NCAPG | Non-SMC condensin I complex subunit G | 69 |
| DLGAP5 | DLG associated protein 5 | 69 |
| FOXM1 | Forkhead box M1 | 69 |
| CENPF | Centromere protein F | 68 |
| CENPE | Centromere protein E | 68 |
| BUB1B | BUB1 mitotic checkpoint serine/threonine kinase B | 68 |
| TTK | TTK protein kinase | 68 |
| ASPM | Abnormal spindle microtubule assembly | 68 |
| KIF20A | Kinesin family member 20A | 67 |

3.5. Survival Based on Hub Gene Expression

Because several hub genes were closely related to the cell cycle, we further analysed their survival curves using the GEPIA database. Expression of six hub genes (CCNB1, CCNA2, AURKA, NCAPG, DLGAP5, and CENPE) was significantly associated with COAD prognosis. Overexpression of each of the six genes was associated with favourable overall survival (OS) in colon cancer patients (Figures 5(a)–5(f)). Additionally, overexpression of AURKA and CENPE was associated with favourable disease-free survival (DFS) in COAD patients (Figures 5(g) and 5(h)).
Figure 5

Overall survival of the hub genes in COAD patients. (a)–(f) CCNB1, CCNA2, AURKA, NCAPG, DLGAP5, and CENPE showed a significant difference in overall survival (OS). High expression of the 6 genes indicated favourable OS in COAD (P < 0.05). (g, h) AURKA and CENPE showed statistically significant association with disease-free survival (DFS) and indicated favourable disease-free survival in COAD (P < 0.05). Abbreviations: COAD: colon adenocarcinoma.

3.6. Differential Expression of Hub Genes

UALCAN was used to analyse mRNA expression of the identified hub genes. We found 5 hub genes were related to clinicopathological parameters, including CDK1, CCNB1, CCNA2, MAD2L1, and DLGAP5. Additionally, we observed that these five genes were significantly overexpressed in tumour tissues (Figures 6(a), 6(d), 6(g), 6(j), and 6(m)). Then, we analysed their mRNA expression under different clinicopathological parameters. Our results revealed that the mRNA expression of the five genes was significantly correlated with the clinical stage, and that the highest mRNA expression appeared in the first tumour stage (Figures 6(b), 6(e), 6(h), 6(k), and 6(n)). Moreover, the mRNA expression of the five genes showed a significant correlation with lymph node metastasis, and the highest mRNA expression appeared at the N0 phase (Figures 6(c), 6(f), 6(i), 6(l), and 6(o)).
Figure 6

Differential expression analysis of the 5 hub genes was performed by UALCAN. (a, d, g, j, and m) The five genes were overexpressed at the mRNA level in colon cancer compared with normal colon tissues. (b, e, h, k, and n) mRNA expression of the five genes was significantly related to individual cancer stage, with the highest expression tending to appear at stage 1. (c, f, i, l, and o) mRNA expression of the five genes was significantly related to nodal metastasis status, and the highest mRNA expression tended to appear at the N0 phase. ∗p < 0.05, ∗∗p < 0.01, ∗∗∗p < 0.001.

Moreover, we analysed the protein expressions of hub genes using histopathological images from HPA. Our results showed that CDK1 staining was low in normal tissues and moderate in COAD tissues (Figure 7(a)). CCNB1 and CCNA2 staining were moderate in normal colon tissues, whereas high staining was observed in COAD tissues (Figures 7(b) and 7(c)). DLGAP5 staining was not detected in normal tissues, while moderate staining was observed in COAD tissues (Figure 7(d)). MAD2L1 was moderately stained in both tumour and normal tissues (Figure 7(e)).
Figure 7

Protein expression analysis of the 5 hub genes was performed using the HPA database. Except for MAD2L1, the other 4 proteins showed a higher degree of staining in tumour tissue compared to normal tissues.

In order to further test the diagnostic efficacy of these hub genes for colon cancer, ROC curve analysis was performed on these five genes (Figure 8). We used gene expression data from GSE87211 for analysis. The dataset contained 363 cases (203 colon tumours and 160 healthy mucosa). AUCs were used to assess the diagnostic accuracy. ROC analysis showed that AUCs for CDK1, CCNB1, CCNA2, MAD2L1, and DLGAP5 were 0.928 (95% CI: 0.901-0.956), 0.931 (95% CI: 0.905-0.956), 0.904 (95% CI: 0.847-0.934), 0.917 (95% CI: 0.887-0.947), and 0.911 (95% CI: 0.881-0.940), respectively.
Figure 8

ROC curve analysis of the five hub genes, CDK1, CCNB1, CCNA2, MAD2L1, and DLGAP5. AUCs were used to assess the five hub genes, and the results showed high diagnostic accuracy.
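As a reminder of what an AUC measures, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen tumour sample has a higher expression value than a randomly chosen normal sample, with ties counted as one half. This is a minimal illustration and not the SPSS procedure used in the study; the function name and expression values are hypothetical, and confidence intervals are not computed.

```python
from typing import Sequence


def auc(tumour_values: Sequence[float], normal_values: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = 0.0
    for t in tumour_values:
        for n in normal_values:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumour_values) * len(normal_values))


# Hypothetical expression values for one gene.
tumour = [7.1, 8.3, 6.0, 9.2]
normal = [5.5, 6.0, 6.4]
result = auc(tumour, normal)
print(result)  # 10.5 winning pairs out of 12 -> 0.875
```

An AUC of 0.5 corresponds to no discrimination between groups, and 1.0 to perfect separation.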

3.7. Alteration of Hub Genes

We also analysed alterations of the six prognostic hub genes CCNB1, CCNA2, AURKA, NCAPG, DLGAP5, and CENPE together with the five hub genes which were associated with clinicopathological parameters: CDK1, CCNB1, CCNA2, MAD2L1, and DLGAP5. Eight hub genes including CDK1, CCNB1, CCNA2, AURKA, MAD2L1, NCAPG, DLGAP5, and CENPE were detected by cBioPortal. Altogether, 378 samples of COAD were included, and our analysis revealed that the hub genes were altered in 42.86% of the 378 samples. AURKA (28%) was the most frequently altered gene of the eight hub genes (Figure 9).
Figure 9

Alterations of the eight hub genes analysed by cBioPortal. (a) OncoPrint of genetic alterations in 378 COAD cases. (b) Alteration frequency of eight hub genes. Gene expression was altered in 42.86% of 378 cases. Abbreviations: COAD: colon adenocarcinoma.

4. Discussion

Colon cancer was the fourth most commonly diagnosed malignant tumour worldwide in 2018, with increasing incidence in countries undergoing major developmental transition [31]. Due to a lack of specific symptoms for early detection, patients are usually diagnosed at an advanced stage which leads to a poor prognosis [32]. Therefore, it is crucial to uncover the underlying molecular mechanism and to explore key biomarkers for early colon cancer diagnosis. In this study, we analysed three microarray datasets that included 127 tumours and 117 normal samples. A total of 436 DEGs were screened. Functional annotation showed that the DEGs were mainly enriched in biological processes associated with cell cycle phase transition, nuclear division, positive regulation of transferase activity, meiotic nuclear division, and DNA replication. These results suggested that these genes were closely related to the cell cycle. Many studies indicated that dysregulation of cell cycle progression was closely related to cancer progression [33, 34]. Finetti et al. [35] found that several genes participated in regulating the cell cycle, like CDK1 and AURKA. Moreover, their expressions were correlated with breast cancer prognosis. In our colon cancer study, we obtained many DEGs involved in cell cycle progression, including CCND1, BLM, BUB1, BUB1B, CCNA2, CCNB1, CDK1, and CDC20. Some genes were closely related to the transformation of cancer. For example, CCND1 belonged to the cyclin family whose members were characterised by dramatic periodicity in protein abundance throughout the cell cycle. Deregulation of CCND1 was observed frequently in numerous human cancers, including pancreatic cancer, head and neck squamous cell carcinoma, breast cancer, and colorectal carcinoma [36, 37]. Accumulation of CCND1 in the nucleus caused uncontrolled cell cycle progression and acted as a tumour-initiating event [38]. 
Overexpression of cyclin D1 (T286A), an oncogenic mutant allele of CCND1, promoted stabilization and overexpression of the DNA replication licensing factor, Cdt1, by inhibiting its proteolysis. This caused DNA rereplication and damage and resulted in cellular aneuploidy, genomic instability, and further neoplastic growth [39]. Cyclin dependent kinases (CDKs) are the necessary functional partner kinases of cyclin D1. Thus, CDK inhibitors could be effective drugs for targeting malignant tumours [40]. However, given the development of resistance and the side effects of CDK inhibitors, further research is warranted [36]. Pathway analysis also revealed that DEGs were mainly enriched for terms associated with the cell cycle pathway. The “Cyclin A/B1/B2 associated events during G2/M transition” and “Regulation of TP53 activity through phosphorylation” pathways were closely related to tumourigenesis. Like the cyclin D1 mentioned above, cyclins A/B1/B2 are cyclin family members that bind to CDKs and regulate the cell cycle. Abundant evidence shows that G2/M phase arrest is closely related to the inhibition of tumour cell proliferation [41, 42]. Additional studies focusing on cyclins aim to identify novel therapeutic strategies for cancer treatment. Ma [43] revealed that the microRNA miR-219-5p downregulated CCNA2 expression and induced G2/M phase arrest to inhibit tumour formation in oesophageal cancer. Tu et al. [44] found that CCNA2 was downregulated by the small molecule FH535 in colorectal cancer, which caused G2/M phase arrest and inhibited tumour proliferation. Thus, inhibiting CCNA2 and CCNB1 may contribute to the development of novel anticancer drugs. The p53 signalling pathway contributes significantly to cell cycle regulation, tumour suppression, metabolism, aging, development, and reproduction [45].
Phosphorylation of p53 protein stabilized the protein and extended its half-life, thereby causing cell cycle arrest and apoptosis and inhibiting tumour cell proliferation [46]. A study of natural polyphenols as anticancer agents revealed that polyphenols could induce apoptosis by stabilizing p53 protein through phosphorylation, with remarkable effects in human gastric carcinoma cells [47]. We also identified some pathways associated with metabolism, including triglyceride metabolism, carnitine metabolism, regulation of lipolysis in adipocytes, and phase I functionalization of compounds. Among the genes in these pathways, FABP4, which encodes a fatty acid binding protein, is involved in fatty acid uptake, transport, and metabolism and is related to tumour metastasis. Gharpure et al. [48] observed that overexpression of FABP4 played a key role in aggressive metastasis of ovarian cancer via various metabolites and protein pathways. Likewise, FABP4 had crucial effects on adipocyte-induced cholangiocarcinoma metastasis [49]. Collectively, metabolic disorder was among the leading causes of tumour development. Thus, the study of tumour metabolism may provide new targets for tumour treatment. The PPI network was built using STRING. Twenty hub genes were screened, and their functional annotations were most closely related to the cell cycle. Survival analysis showed that higher mRNA expression of six hub genes (CCNB1, CCNA2, AURKA, NCAPG, DLGAP5, and CENPE) was significantly related to longer OS in colon cancer patients. Moreover, AURKA and CENPE exhibited favourable effects on both OS and DFS. Studies showed that CCNB1 was highly expressed in colorectal cancer tissues and was negatively correlated with tumour invasion and distant metastasis, possibly through regulation of E-cadherin expression [50]. This was consistent with our findings.
A murine colorectal cancer model showed that CCNA2 deletion in colonic epithelial cells promoted the development of dysplasia and adenocarcinomas [51]. Analysis of clinical samples revealed higher CCNA2 expression in tumours of stage 1 or 2 colon cancer patients than in those of stage 3 or 4 patients [51], which is also consistent with our results. However, previous studies have shown that CCNA2 is tumour-promoting and associated with advanced tumour stage and tumour development [52, 53]. This inconsistency with our results may be due to sample heterogeneity. In addition, high expression of DLGAP5 was associated with poor prognosis in well-differentiated colon cancer, whereas the prognosis was better in some molecular subtypes of colon cancer, such as those with a stem cell gene signature [54] and Budinska subtype A (surface crypt-like) [55].

In our study, AURKA exhibited favourable prognostic effects. Interestingly, AURKA was upregulated across cancer types but was positively associated with prognosis only in colon cancer patients [56]. Current studies suggest that AURKA is associated with the development of colorectal cancer by causing genomic instability [57]. However, high expression of AURKA in colon cancer enhanced sensitivity to platinum-based chemotherapy by inhibiting the expression of TP53-regulated DNA damage response genes, which may explain the corresponding better prognosis [56]. It has also been reported that high expression of AURKA is associated with poor prognosis in colon cancer patients with liver metastasis [58]. Therefore, the prognostic role of AURKA remains controversial, and further exploration is needed. NCAPG and CENPE have also been reported to play a role in various types of cancer [59, 60], but the underlying mechanisms behind the observed changes in prognosis remain unknown. 
In summary, these 6 hub genes were significantly associated with the prognosis of colon cancer and may serve as potential prognostic markers as well as therapeutic targets, but further studies are needed to explain and verify their underlying mechanisms.

For early COAD diagnosis, we identified CDK1, CCNB1, CCNA2, MAD2L1, and DLGAP5, which were closely related to clinicopathological parameters. CDK1 plays a key role in regulating the eukaryotic cell cycle and is essential for the G1/S and G2/M transitions [61]. Many biological experiments have demonstrated that CDK1 is highly expressed in colon cancer cells [62, 63] and participates in apoptosis. CDK1 may act as a potential diagnostic and therapeutic target in view of its extensive involvement in the regulation of colorectal cancer development and progression [62]. CCNB1 and CCNA2 are closely related to mitosis. In addition to colon cancer, they have also been found to be highly expressed in pancreatic cancer [64], breast cancer [65], lung cancer [66], and many other cancers, suggesting their potential diagnostic value. MAD2L1 was highly expressed in actively proliferating colon cancer cells, and its expression level gradually increased with the stage of colon cancer [67]. DLGAP5, which is highly expressed in colon cancer cells, is involved in cell proliferation (ClueGO analysis: mitotic chromosome movement towards spindle pole) [54, 68]. One study showed that overexpression of DLGAP5 in 293T cells resulted in excessive cell proliferation, suggesting a potential role in carcinogenesis [69]. In summary, our results showed that both the mRNA and protein expression of these five hub genes were higher in tumour tissue than in normal tissue, which indicates that these hub genes may be closely related to COAD progression and supports their potential as biomarkers for the diagnosis of CRC. 
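How well the expression of a single gene separates tumour from normal tissue is commonly summarised by the area under the ROC curve (AUC). The following is a minimal sketch of that calculation, not the procedure used in this study: it uses the rank-based (Mann-Whitney) formulation of the AUC, and the function name `auc_from_scores` and the expression values are hypothetical and chosen only for illustration.

```python
def auc_from_scores(tumour, normal):
    """Return the AUC for one gene.

    The AUC equals the probability that a randomly chosen tumour sample
    has higher expression than a randomly chosen normal sample, with
    ties counted as one half (Mann-Whitney formulation).
    """
    if not tumour or not normal:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for t in tumour:
        for n in normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumour) * len(normal))


# Hypothetical log2 expression values for one gene (not study data).
tumour_expr = [8.1, 7.9, 8.6, 7.2, 9.0]
normal_expr = [6.5, 7.0, 7.4, 6.8]

auc = auc_from_scores(tumour_expr, normal_expr)
print(f"AUC = {auc:.2f}")  # 19 of 20 tumour/normal pairs are ordered correctly
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation. The pairwise loop is quadratic in the number of samples, which is acceptable for a sketch but would normally be replaced by a rank-based routine on large cohorts.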
Previous studies observed that the expression of these genes was correlated with tumour size and stage [52, 54, 70]. In our study, mRNA expression of the five hub genes was significantly related to milder clinicopathological parameters, so these genes may play an important role in the early diagnosis of colon cancer. In addition, the AUCs of these five genes were all greater than 0.9 in ROC curve analysis, which further supports their favourable diagnostic accuracy. The relationship between these genes and COAD has not yet been fully determined, but our data indicate that their increased expression in early COAD stages may provide an indicator for early diagnosis. At present, machine learning and deep learning are widely used in disease diagnosis [71, 72]. Deep learning, with its ability to process large-scale data, is a powerful solution for tissue classification and segmentation of histopathological images of colon cancer and other diseases [73, 74].

Finally, we performed alteration analysis of the eight hub genes that showed significant prognostic or diagnostic value: CDK1, CCNB1, CCNA2, AURKA, MAD2L1, NCAPG, DLGAP5, and CENPE. The results showed that more than 40% of the patient tumours analysed had at least one hub gene alteration. AURKA was the most frequently altered (28%) of the 8 hub genes. The protein encoded by this gene is a cell cycle-regulated kinase that appears to be involved in spindle assembly, cytokinesis, and centrosome maturation and separation [75]. In our study, AURKA exhibited favourable effects on both OS and DFS. Previous studies showed that AURKA was frequently upregulated and correlated with prognosis in several types of cancer, suggesting an important role in human cancer [76, 77].

This study has some limitations. First, all the data analysed in our study were retrieved from online databases. 
Thus, further studies with larger sample sizes and biological experiments are required to validate our findings. Our future research will focus on experimental verification of these results. Second, we did not explore the underlying mechanisms of the hub genes in COAD. Future studies should investigate the detailed mechanisms linking the hub genes to COAD.

In conclusion, our study identified and analysed DEGs and 20 hub genes associated with COAD, which may deepen the understanding of carcinogenesis and provide indicators for prognosis and early diagnosis of the disease.
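The alteration summary reported in the discussion (the share of tumours with at least one altered hub gene, and the per-gene alteration frequency) can be sketched as follows. This is an illustrative calculation under stated assumptions, not the cBioPortal implementation: the function name `alteration_summary`, the sample identifiers, and the alteration calls are all invented and do not reproduce the study's data.

```python
def alteration_summary(alterations, n_samples):
    """Summarise gene alterations across a cohort.

    alterations: dict mapping gene symbol -> set of altered sample IDs.
    n_samples:   total number of profiled tumour samples.
    Returns (fraction of samples with >= 1 altered gene,
             dict of per-gene alteration frequency).
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    # A sample counts once, however many genes are altered in it.
    altered_any = set().union(*alterations.values())
    per_gene = {gene: len(samples) / n_samples
                for gene, samples in alterations.items()}
    return len(altered_any) / n_samples, per_gene


# Hypothetical alteration calls for a cohort of 10 tumours (not study data).
calls = {
    "AURKA": {"s1", "s2", "s3"},
    "CDK1": {"s3", "s4"},
    "CENPE": set(),
}

overall, per_gene = alteration_summary(calls, n_samples=10)
most_altered = max(per_gene, key=per_gene.get)
print(f"{overall:.0%} of tumours altered; most frequent: {most_altered}")
```

Because one tumour can carry alterations in several genes, the overall fraction is computed from the union of altered samples rather than by summing the per-gene frequencies.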
  76 in total

1.  Proteomics. Tissue-based map of the human proteome.

Authors:  Mathias Uhlén; Linn Fagerberg; Björn M Hallström; Cecilia Lindskog; Per Oksvold; Adil Mardinoglu; Åsa Sivertsson; Caroline Kampf; Evelina Sjöstedt; Anna Asplund; IngMarie Olsson; Karolina Edlund; Emma Lundberg; Sanjay Navani; Cristina Al-Khalili Szigyarto; Jacob Odeberg; Dijana Djureinovic; Jenny Ottosson Takanen; Sophia Hober; Tove Alm; Per-Henrik Edqvist; Holger Berling; Hanna Tegel; Jan Mulder; Johan Rockberg; Peter Nilsson; Jochen M Schwenk; Marica Hamsten; Kalle von Feilitzen; Mattias Forsberg; Lukas Persson; Fredric Johansson; Martin Zwahlen; Gunnar von Heijne; Jens Nielsen; Fredrik Pontén
Journal:  Science       Date:  2015-01-23       Impact factor: 47.728

Review 2.  p53 in survival, death and metabolic health: a lifeguard with a licence to kill.

Authors:  Flore Kruiswijk; Christiaan F Labuschagne; Karen H Vousden
Journal:  Nat Rev Mol Cell Biol       Date:  2015-07       Impact factor: 94.444

3.  Involvement of Bcl-2 family members, phosphatidylinositol 3'-kinase/AKT and mitochondrial p53 in curcumin (diferulolylmethane)-induced apoptosis in prostate cancer.

Authors:  Sharmila Shankar; Rakesh K Srivastava
Journal:  Int J Oncol       Date:  2007-04       Impact factor: 5.650

4.  Phosphorylation-dependent regulation of cyclin D1 nuclear export and cyclin D1-dependent cellular transformation.

Authors:  J R Alt; J L Cleveland; M Hannink; J A Diehl
Journal:  Genes Dev       Date:  2000-12-15       Impact factor: 11.361

5.  The efficacy of intravenous contrast-enhanced 16-raw multidetector CT colonography for detecting patients with colorectal polyps in an asymptomatic population in Korea.

Authors:  Young Sun Kim; Nayoung Kim; Se Hyung Kim; Min Jung Park; Seon Hee Lim; Jeong Yoon Yim; Kyung Ran Cho; Sun Sin Kim; Dong Hee Kim; Hyo Won Eun; Kyoung Soo Cho; Jeong Hoon Kim; Byung Inhn Choi; Hyun Chae Jung; In Sung Song; Chan Soo Shin; Sang-Heon Cho; Byung-Hee Oh
Journal:  J Clin Gastroenterol       Date:  2008-08       Impact factor: 3.062

6.  Sixteen-kinase gene expression identifies luminal breast cancers with poor prognosis.

Authors:  Pascal Finetti; Nathalie Cervera; Emmanuelle Charafe-Jauffret; Christian Chabannon; Colette Charpin; Max Chaffanet; Jocelyne Jacquemier; Patrice Viens; Daniel Birnbaum; François Bertucci
Journal:  Cancer Res       Date:  2008-02-01       Impact factor: 12.701

7.  Expression of CENPE and its Prognostic Role in Non-small Cell Lung Cancer.

Authors:  Xuezhi Hao; Tao Qu
Journal:  Open Med (Wars)       Date:  2019-06-17

8.  STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets.

Authors:  Damian Szklarczyk; Annika L Gable; David Lyon; Alexander Junge; Stefan Wyder; Jaime Huerta-Cepas; Milan Simonovic; Nadezhda T Doncheva; John H Morris; Peer Bork; Lars J Jensen; Christian von Mering
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971

9.  CluePedia Cytoscape plugin: pathway insights using integrated experimental and in silico data.

Authors:  Gabriela Bindea; Jérôme Galon; Bernhard Mlecnik
Journal:  Bioinformatics       Date:  2013-01-16       Impact factor: 6.937

10.  Aurora kinase A (AURKA) expression in colorectal cancer liver metastasis is associated with poor prognosis.

Authors:  J A C M Goos; V M H Coupe; B Diosdado; P M Delis-Van Diemen; C Karga; J A M Beliën; B Carvalho; M P van den Tol; H M W Verheul; A A Geldof; G A Meijer; O S Hoekstra; R J A Fijneman
Journal:  Br J Cancer       Date:  2013-10-08       Impact factor: 7.640

