OBJECTIVE: The objective was to identify potential hub genes associated with the pathogenesis and prognosis of hepatocellular carcinoma (HCC). METHODS: Gene expression profile datasets were downloaded from the Gene Expression Omnibus database. Differentially expressed genes (DEGs) between HCC and normal samples were identified via an integrated analysis. A protein-protein interaction network was constructed and analyzed using the STRING database and Cytoscape software, and enrichment analyses were carried out through DAVID. Gene Expression Profiling Interactive Analysis and Kaplan-Meier plotter were used to determine expression and prognostic values of hub genes. RESULTS: We identified 11 hub genes (CDK1, CCNB2, CDC20, CCNB1, TOP2A, CCNA2, MELK, PBK, TPX2, KIF20A, and AURKA) that might be closely related to the pathogenesis and prognosis of HCC. Enrichment analyses indicated that the DEGs were significantly enriched in metabolism-associated pathways, and hub genes and module 1 were highly associated with cell cycle pathway. CONCLUSIONS: In this study, we identified key genes of HCC, which indicated directions for further research into diagnostic and prognostic biomarkers that could facilitate targeted molecular therapy for HCC.
On a global scale, cancer is a major public health problem and liver cancer is a major contributor to both cancer morbidity and mortality.[1] Liver cancer is the sixth most common cancer and the fourth highest cause of cancer-related mortality worldwide.[2] There were expected to be 42,030 newly diagnosed cases of and 31,780 deaths from liver cancer in the United States during 2019.[3] Hepatocellular carcinoma (HCC) is the most common form of primary liver cancer, comprising 75% to 85% of cases.[2] The well-recognized risk factors for HCC include chronic infection with hepatitis B (HBV) or hepatitis C virus, exposure to dietary aflatoxin, alcohol-induced cirrhosis, smoking, obesity, and type 2 diabetes.[2,4] In Asia (especially China), chronic HBV infection is the leading etiologic factor of HCC.[5] Most HCC patients are diagnosed at an advanced stage, and locoregional treatments (chemoembolization) and surgical treatments are relatively disappointing in terms of overall survival (OS) of patients with advanced disease.[6] In addition, traditional chemotherapies have not shown promising outcomes in treatment of HCC and have significant toxicity.[6,7] Meanwhile, the lack of diagnostic markers for early detection and the limited treatment strategies increase the risk of poor prognosis and death.[8] Therefore, there is a pressing need to develop robust diagnostic strategies and effective therapies for HCC patients.[9]
Over the past decades, microarray technology and bioinformatics have been extensively applied to identify the molecular mechanisms of HCC, which provide strong research support for the diagnosis, treatment, and prognosis of HCC.[10] Because of the ability to process a large number of datasets quickly, integrated bioinformatics analysis and microarray technology have allowed researchers to comprehensively identify the functions of numerous differentially expressed genes (DEGs) in HCC, and they help researchers explore the complicated process of HCC occurrence and development.[10,11] He et al.[12] identified four hub genes and two important pathways in the development of HCC from cirrhosis from one Gene Expression Omnibus (GEO) dataset using a bioinformatics method, including DEG screening, enrichment analyses, and construction of a protein–protein interaction (PPI) network. Zhang et al.[13] screened hub genes and pathways correlated with the occurrence and progression of HCC via a series of bioinformatics analyses incorporating DEG identification, functional enrichment analyses, PPI network and module analysis, and weighted correlation network analysis. Zhou et al.[11] identified the pivotal genes and microRNAs in HCC using a bioinformatics approach, including analysis of raw data via GEO2R, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses, and construction of a PPI network. However, to improve the diagnosis and treatment of HCC, novel diagnostic and prognostic biomarkers for HCC are needed. The flowchart of the study approach is shown in Figure 1.
Figure 1.
Flowchart for identification of core genes and pathways for hepatocellular carcinoma (HCC). GEO, Gene Expression Omnibus; DEG, differentially expressed gene; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; PPI, protein–protein interaction; GEPIA, Gene Expression Profiling Interactive Analysis.
Materials and methods
Ethical approval
Ethical approval was not required in this study because we analyzed only published data from the GEO database.
Gene expression profile data
Gene expression profile data (GSE36376,[14] GSE39791,[15] GSE41804,[16] GSE54236,[17,18] GSE57957,[19] GSE62232,[20] GSE64041,[21] GSE69715,[22] GSE76427,[23] GSE84005, GSE87630,[24] GSE112790,[25] and GSE121248[26]) were downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/),[27] a public data repository, including high-throughput gene expression and other functional genome datasets. The selection criteria for the included datasets were as follows: (1) tissue samples collected from human HCC and corresponding adjacent or normal tissues; and (2) including at least 40 samples.
Integrated analysis of microarray datasets
The matrix data of each GEO dataset were normalized and log2 transformed using the R software package limma,[28] and the DEGs between HCC and corresponding adjacent or normal tissues were also identified using limma. The DEGs identified from the 13 datasets were integrated using the RobustRankAggreg package[29] in R. DEGs with |log2 fold change (FC)| ≥1 and adjusted P-value < 0.05 were considered significant.
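The significance rule above can be illustrated with a minimal Python sketch. It is not the limma or RobustRankAggreg code used in the study; the class `GeneStat`, the function `classify_deg`, the gene names, and all numeric values are hypothetical and serve only to show how the two cutoffs combine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneStat:
    """Summary statistics for one gene (hypothetical container)."""
    name: str
    log2_fc: float   # log2 fold change, HCC versus normal
    adj_p: float     # multiple-testing adjusted P-value


def classify_deg(stat: GeneStat, fc_cutoff: float = 1.0,
                 p_cutoff: float = 0.05) -> Optional[str]:
    """Return 'up' or 'down' if |log2FC| >= fc_cutoff and adj P < p_cutoff, else None."""
    if stat.adj_p >= p_cutoff or abs(stat.log2_fc) < fc_cutoff:
        return None
    return "up" if stat.log2_fc > 0 else "down"


# Made-up values chosen to exercise each branch of the rule.
examples = [
    GeneStat("GENE_A", 2.2, 0.001),    # passes both cutoffs, upregulated
    GeneStat("GENE_B", -3.3, 0.0004),  # passes both cutoffs, downregulated
    GeneStat("GENE_C", 0.6, 0.001),    # fold change too small
    GeneStat("GENE_D", 1.5, 0.20),     # not significant after adjustment
]
labels = {g.name: classify_deg(g) for g in examples}
print(labels)
```

Both conditions must hold, so a gene with a large fold change but a non-significant adjusted P-value is not counted as a DEG.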
Enrichment analyses of DEGs
The Database for Annotation, Visualization and Integrated Discovery (DAVID; https://david.ncifcrf.gov/, version 6.8)[30] is a comprehensive functional annotation tool for extracting biological significance from large gene/protein datasets. In this study, GO and KEGG pathway enrichment analyses of the DEGs were conducted using DAVID. The enrichment results were visualized using the ggplot2[31] and GOplot[32] packages in R.
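The idea behind term enrichment can be sketched as a one-sided hypergeometric test, shown below in Python. This is an illustration under stated assumptions, not DAVID's implementation: DAVID reports an EASE score, a more conservative variant of Fisher's exact test, so its values will differ. The function name `hypergeom_enrichment_p` and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Probability of seeing at least k annotated genes in a list of n genes.

    N: genes in the background
    K: background genes annotated to the term
    n: genes in the query list (for example, the DEGs)
    k: query genes annotated to the term
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k + 1, ..., upper.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Toy numbers: 10 background genes, 3 annotated, 3 drawn, all 3 annotated.
p_small = hypergeom_enrichment_p(k=3, n=3, K=3, N=10)
print(p_small)  # 1/120, about 0.0083
```

In a real analysis one P-value is computed per term and the set of P-values is then corrected for multiple testing, which is where an FDR cutoff applies.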
PPI network and module analysis
The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING; https://string-db.org/)[33] is a database of known and predicted protein interactions, covering both direct and indirect interactions among proteins. This database was used to obtain potential interactions among the DEGs. PPIs with a confidence score ≥0.7 were retained and imported into Cytoscape software[34] to construct the PPI network. Furthermore, the clustering modules in this PPI network were analyzed using the MCODE (Molecular Complex Detection) plugin in Cytoscape.[35] Pathway enrichment analyses for important modules were also carried out. The enrichment results were visualized using the imageGP platform (http://www.ehbio.com/ImageGP/index.php/Home/Index/GOenrichmentplot.html).
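The edge filtering and degree counting described above can be sketched in a few lines of Python. This is only an illustration, not the STRING or Cytoscape workflow itself: the function `build_degree_table`, the protein labels, and the scores are hypothetical, and scores are assumed to lie between 0 and 1 (exports that use another scale, such as 0 to 1000, would need rescaling first).

```python
from collections import defaultdict


def build_degree_table(edges, min_score=0.7):
    """Count each node's degree using only edges with score >= min_score.

    edges: iterable of (protein_a, protein_b, score) tuples, treated as undirected.
    Self-loops and repeated pairs are ignored so each interaction counts once.
    """
    degree = defaultdict(int)
    seen = set()
    for a, b, score in edges:
        if a == b or score < min_score:
            continue
        pair = frozenset((a, b))
        if pair in seen:
            continue
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


# Made-up interactions for illustration.
edges = [
    ("P1", "P2", 0.95),
    ("P1", "P3", 0.72),
    ("P2", "P3", 0.40),  # below the confidence cutoff
    ("P3", "P1", 0.72),  # same pair as P1-P3, listed in reverse
    ("P4", "P4", 0.99),  # self-loop
]
degrees = build_degree_table(edges)
print(degrees)
```

Nodes whose edges all fall below the cutoff do not appear in the result, which mirrors how low-confidence interactions drop out of the network.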
Survival analysis of hub genes
Kaplan–Meier plotter (KM plotter; http://kmplot.com/analysis/) is a database containing clinical data and gene expression data.[36] This database is used to further the understanding of the molecular basis of disease and to identify biomarkers associated with survival.[37] The recurrence-free survival and OS information were based on GEO, the European Genome-phenome Archive (EGA), and The Cancer Genome Atlas (TCGA) databases. Hazard ratios (HR) with 95% confidence intervals and log rank P-values were calculated to assess the association of gene expression with survival and are shown in the plots.[38]
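For readers unfamiliar with the survival curves that KM plotter draws, the following Python sketch shows the Kaplan–Meier product-limit estimate. It is a minimal illustration with made-up follow-up data, not the tool's code, and it does not compute the hazard ratio or log rank P-value; the function name `kaplan_meier` is hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time with at least one event.

    times:  follow-up time for each patient
    events: 1 if the event (death) was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Five hypothetical patients; 0 marks censoring.
curve = kaplan_meier(times=[5, 8, 8, 12, 20], events=[1, 1, 0, 1, 0])
for t, s in curve:
    print(t, round(s, 3))  # 5 0.8, then 8 0.6, then 12 0.3
```

Censored patients leave the risk set without lowering the curve, which is why the estimate steps down only at the observed event times.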
Expression level analysis and correlation analysis of hub genes
The Gene Expression Profiling Interactive Analysis (GEPIA; http://gepia.cancer-pku.cn/index.html)[39] is a newly developed web-based tool that applies a standard processing pipeline to analyze gene expression data between tumor and normal tissues. Expression of the hub genes in HCC and normal tissues was visualized using boxplots.[38] In addition, correlation analysis was performed in GEPIA to assess the correlation in expression between pairs of genes.[39]
Results
Identification of DEGs
In the present study, 13 datasets were downloaded from GEO that included 1100 cancer tissues and 717 corresponding adjacent or normal tissues (Table 1). After integrated analysis, 380 DEGs (293 downregulated and 87 upregulated) were identified (Figure 2a-m and Appendix). Figure 2n shows the top 20 down- and upregulated genes.
Table 1.
Information for the 13 Gene Expression Omnibus datasets included in the current study.
GPL570-[HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array    198 (183/15)
GSE121248    GPL570-[HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array    107 (70/37)
Figure 2.
Identification of DEGs. Volcano plots of Gene Expression Omnibus datasets (a) GSE36376, (b) GSE39791, (c) GSE41804, (d) GSE54236, (e) GSE57957, (f) GSE62232, (g) GSE64041, (h) GSE69715, (i) GSE76427, (j) GSE84005, (k) GSE87630, (l) GSE112790, and (m) GSE121248; (n) heat map of DEGs. Blue indicates lower expression levels, red indicates higher expression levels, and white indicates no differential expression among the genes. Each column represents one dataset and each row represents one gene. The number in each rectangle represents the normalized gene expression level. The color gradient from blue to red represents the change from downregulation to upregulation. DEG, differentially expressed gene.
GO and KEGG pathway enrichment analyses of DEGs
To deepen our understanding of the DEGs, we performed GO and KEGG pathway enrichment analyses. Thirty-one significantly enriched GO terms were selected based on a false discovery rate (FDR) < 0.05 (Figure 3a and Appendix). These comprised 13 biological process terms, mainly related to metabolic process, P450 pathway, and oxidation-reduction process; 12 molecular function terms, mainly involving multiple enzyme activities, heme binding, iron ion binding, and oxygen binding; and 6 cellular component terms, associated with organelle membrane, extracellular exosome, extracellular region, extracellular space, blood microparticle, and membrane attack complex.
Figure 3.
Enrichment analysis of DEGs. (a) GO enrichment analysis of DEGs, (b) top 5 terms of GO enrichment, and (c) KEGG pathway analysis of DEGs. DEG, differentially expressed gene; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; FDR, false discovery rate.
In the KEGG pathway enrichment analyses, we screened nine pathways according to FDR < 0.05. Figure 3c shows the results of the KEGG analysis; the DEGs primarily participated in diverse metabolism-associated pathways, including metabolic pathways, retinol metabolism, and drug metabolism-cytochrome P450.
PPI network establishment and module analysis
The PPI network of DEGs comprised 242 nodes and 1267 interactions (Figure 4a); degree was calculated to identify candidate key nodes. Finally, 11 potential key nodes were identified, the degrees of which were all more than four times the corresponding average values: CDK1, CCNB2, CDC20, CCNB1, TOP2A, CCNA2, MELK, PBK, TPX2, KIF20A, and AURKA (Table 2). Moreover, to determine important clustering modules in the PPI network, module analysis was performed using MCODE, and the two modules with the highest scores (score >10) were obtained (Figure 4b, 4c). The enrichment pathways of module 1 and module 2 are shown in Figure 5. Module 1 was highly associated with cell cycle and oocyte meiosis; module 2 was closely connected to drug metabolism-cytochrome P450, linoleic acid metabolism, chemical carcinogenesis, arachidonic acid metabolism, retinol metabolism, metabolism of xenobiotics by cytochrome P450, and metabolic pathways.
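The hub-gene criterion can be checked with simple arithmetic. The Python sketch below assumes that the "average value" is the mean node degree of an undirected network, 2E/N; this definition is an assumption, as is the variable naming. The node and edge counts and the hub degrees are the figures reported in this section and in Table 2.

```python
# Figures reported for the PPI network.
nodes = 242
edges = 1267

# Assumption: average degree of an undirected network is 2 * edges / nodes.
avg_degree = 2 * edges / nodes
threshold = 4 * avg_degree

# Degrees of the 11 hub genes as listed in Table 2.
hub_degrees = {
    "CDK1": 47, "CCNB2": 46, "CDC20": 45, "CCNB1": 45,
    "TOP2A": 44, "CCNA2": 44, "MELK": 43, "PBK": 43,
    "TPX2": 43, "KIF20A": 43, "AURKA": 43,
}

print(round(avg_degree, 2))  # 10.47
print(round(threshold, 2))   # 41.88
print(all(d > threshold for d in hub_degrees.values()))  # True
```

Under this assumption the lowest hub degree (43) sits just above four times the mean degree, which is consistent with the stated selection rule.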
Figure 4.
PPI network and hub clustering modules. (a) The PPI network of DEGs, (b) module 1 (MCODE score = 38.769), and (c) module 2 (MCODE score = 10.364). Blue circles represent downregulated genes and red circles represent upregulated genes. PPI, protein–protein interaction; DEG, differentially expressed gene; MCODE, Molecular Complex Detection.
Table 2.
Upregulated hub genes with high degrees.
Gene      Degree    Type    MCODE Cluster
CDK1      47        up      Module 1
CCNB2     46        up      Module 1
CDC20     45        up      Module 1
CCNB1     45        up      Module 1
TOP2A     44        up      Module 1
CCNA2     44        up      Module 1
MELK      43        up      Module 1
PBK       43        up      Module 1
TPX2      43        up      Module 1
KIF20A    43        up      Module 1
AURKA     43        up      Module 1
Figure 5.
Pathway analysis of the two modules with the highest scores. The y-axis shows significantly enriched KEGG pathways, and x-axis shows the two modules. KEGG, Kyoto Encyclopedia of Genes and Genomes; FDR, false discovery rate.
Survival analysis, expression, and correlation analysis of hub genes
Survival analysis of the 11 hub genes was performed using the KM plotter. The results showed that high expression of CDK1 (HR = 2.15, 95% CI: 1.52–3.06; P = 1.1e−05), CCNB2 (HR = 1.91, 95% CI: 1.28–2.87; P = 0.0013), CDC20 (HR = 2.49, 95% CI: 1.72–3.59; P = 5.1e−07), CCNB1 (HR = 2.34, 95% CI: 1.55–3.54; P = 3.4e−05), TOP2A (HR = 1.99, 95% CI: 1.39–2.86; P = 0.00012), CCNA2 (HR = 1.92, 95% CI: 1.36–2.72; P = 0.00018), MELK (HR = 2.22, 95% CI: 1.5–3.27; P = 3.7e−05), PBK (HR = 2.24, 95% CI: 1.5–3.34; P = 4.8e−05), TPX2 (HR = 2.29, 95% CI: 1.62–3.24; P = 1.4e−06), KIF20A (HR = 2.33, 95% CI: 1.63–3.32; P = 1.8e−06), and AURKA (HR = 1.77, 95% CI: 1.25–2.5; P = 0.0011) was related to unfavorable OS for HCC patients (Figure 6). Furthermore, GEPIA was used to compare the expression levels of the hub genes between HCC and normal tissues, and the 11 hub genes were confirmed to be highly expressed in HCC tissues (Figure 7). The correlations between hub genes were also analyzed using GEPIA, which showed that the 11 hub genes were significantly correlated with each other. Figure 8 shows that increased expression of CDK1 was strongly associated with increased expression of the other 10 hub genes.
Figure 6.
Prognostic roles of 11 hub genes in patients with HCC shown as survival curves. (a) CDK1, (b) CCNB2, (c) CDC20, (d) CCNB1, (e) TOP2A, (f) CCNA2, (g) MELK, (h) PBK, (i) TPX2, (j) KIF20A, and (k) AURKA. HCC, hepatocellular carcinoma; HR, hazard ratio.
Figure 7.
Analysis of expression levels of 11 hub genes in human HCC. The red and gray boxes represent cancer and normal tissues, respectively. (a) CDK1, (b) CCNB2, (c) CDC20, (d) CCNB1, (e) TOP2A, (f) CCNA2, (g) MELK, (h) PBK, (i) TPX2, (j) KIF20A, and (k) AURKA. HCC, hepatocellular carcinoma; LIHC, liver hepatocellular carcinoma.
Figure 8.
Correlation analysis of 10 hub genes in HCC with CDK1: (a) CCNB2, (b) CDC20, (c) CCNB1, (d) TOP2A, (e) CCNA2, (f) MELK, (g) PBK, (h) TPX2, (i) KIF20A, and (j) AURKA. HCC, hepatocellular carcinoma.
Discussion
HCC is the most common type of primary liver cancer and one of the leading causes of cancer-related mortality worldwide.[40,41] Although much research has been done on HCC, its early diagnosis and treatment remain difficult because of a lack of understanding of the molecular mechanisms associated with HCC occurrence and development.[41] Therefore, in-depth studies of the etiological factors and molecular mechanisms of HCC are of critical importance for HCC diagnosis and treatment.[11] Currently, bioinformatics analysis and microarray technology are developing rapidly and this approach can be used to identify potential targets for the diagnosis, therapy, and prognosis of a variety of neoplasms.[42] In this research, we identified 380 DEGs, including 293 downregulated and 87 upregulated genes, between HCC and corresponding adjacent or normal tissues. Enrichment analyses indicated that the DEGs were mostly associated with metabolic processes, such as metabolism of retinol, drugs, xenobiotics, tyrosine, tryptophan, and histidine, as well as fatty acid degradation. This indicated that metabolic dysregulation is closely related to HCC. In addition, we obtained 11 hub genes (CDK1, CCNB2, CDC20, CCNB1, TOP2A, CCNA2, MELK, PBK, TPX2, KIF20A, and AURKA) in the PPI network, which were upregulated in HCC tissues compared with normal tissues; expression of the first hub gene, CDK1, was strongly correlated with that of the other hub genes. Overexpression of the 11 hub genes was correlated with worse OS.
Recent evidence implies that tumor cells need specific interphase cyclin-dependent kinases (CDKs) to proliferate.[43] Cyclin-dependent kinase 1 (CDK1) belongs to the CDK family of serine/threonine protein kinases, and it is crucial for the cell cycle phase transitions G1/S and G2/M.[44,45] CDK1 is required for cell proliferation because it is the only CDK that can initiate mitosis.[46] The deregulation of CDK1 is likely related to HCC tumorigenesis.[47] Research has found that high expression of CDK1 is correlated with poor OS of HCC.[45] Cyclins act as the regulatory subunits of the CDKs, regulating temporal transitions among various stages of the cell cycle via CDK activation.[48] Cyclin-A2 (CCNA2), cyclin-B1 (CCNB1), and cyclin-B2 (CCNB2), encoded by the CCNA2, CCNB1, and CCNB2 genes, respectively, all belong to the cyclin family. CCNA2 activates CDK1 at the end of interphase to facilitate the onset of mitosis, and CCNA2 overexpression has been reported in numerous types of cancers.[49] A previous study indicated that CCNA2 amplification and overexpression is associated with carcinogenesis of transgenic mouse liver tumors.[50] Moreover, research has demonstrated that inhibition of CCNA2 can arrest HCC cell proliferation and tumorigenesis.[51] High expression of CCNA2 is associated with reduced survival in patients with breast cancer and HCC.[52,53] CCNB1 and CCNB2 are the principal activators of CDK1 and, together with CDK1, they promote the G2/M transition.[54,55] CCNB1 expression changes periodically throughout the cell cycle, and CCNB1 is a crucial initiator of mitosis.[56] Decreased CCNB1 expression is related to inhibition of HCC occurrence and development, and activation of CCNB1 expression can promote proliferation in human HCC cells.[56,57] Furthermore, previous research has shown that CCNB1 is closely connected to prognosis of HCC patients.[56,58] The dimerization of CCNB2 with CDK1 is an essential component of the cell cycle regulatory machinery, and an increase in expression of CCNB2/CDK1 could promote tumor cell proliferation.[55] Furthermore, CCNB2 is highly expressed in several malignant tumors and overexpression of CCNB2 is related to poor prognosis in HCC.[59] Cell division cycle protein 20 (encoded by CDC20) is a regulator of cell cycle checkpoints, which plays a crucial role in anaphase initiation and exit from mitosis.[60,61] It degrades several important substrates, including cyclin A and CCNB1, to regulate cell cycle progression.[62,63] CDC20 overexpression is related to progression and poor prognosis in various malignant tumors.[64-67] Thus, it is a potential target in multiple cancer treatments.[68] A recent study found that increased expression of CDC20 is related to HCC development and progression.[67] Additionally, research has indicated that silencing expression of CDC20 and heparanase can activate cell apoptosis; thus, targeted inhibition of both CDC20 and heparanase expression may be an effective approach for the treatment of HCC.[69]
Aurora kinase A (encoded by AURKA) is involved in centrosome duplication, spindle formation, chromosomal amplification and segregation, and cytokinesis, and it plays a significant role in centrosome maturation and mitotic commitment in the late G2 phase.[70,71] Abnormal activity of AURKA promotes tumorigenic progression and transformation via defective control at the mitotic spindle checkpoint.[72] Meanwhile, AURKA is highly expressed in a variety of human cancers, including breast cancer,[73] lung cancer,[74] gastrointestinal cancer,[75] bladder cancer,[76] and oral cancer.[77] A study demonstrated that genetic variations in AURKA might be a reliable predictor of early-stage HCC and a crucial biomarker for HCC development.[78] Moreover, other research has indicated that AURKA contributes to metastasis and invasiveness of HCC.[79] Therefore, AURKA might represent a new therapeutic target for HCC. Topoisomerase II alpha (TOP2A), a potential biomarker for cancer therapy, has been detected in various types of cancer.[80-82] It participates in chromosome condensation and chromatid separation.[80] TOP2A encodes topoisomerase II alpha[81] and is reported to be overexpressed in HCC tissues.[83] Furthermore, a study has shown that TOP2A has prognostic value in HCC and that agents targeting it can be used in HCC therapy.[84] Maternal embryonic leucine zipper kinase (encoded by MELK) is a member of the AMP protein kinase family of serine/threonine kinases, which affect many stages of tumorigenesis.[85] Several studies have shown that MELK is an oncogenic kinase essential for early HCC recurrence, and its expression is upregulated in HCC.[86-88] Furthermore, MELK inhibition is associated with suppression of tumor growth, indicating that MELK is a potential therapeutic target for HCC.[89] PDZ-binding kinase (encoded by PBK) can regulate cell cycle processes.[90] Although PBK is barely detectable in normal somatic tissues, it is often elevated in various tumor tissues and is therefore an important target for cancer screening and targeted therapy.[91,92] Recent research has shown that PBK overexpression promotes migration and invasion of HCC, and it could be a therapeutic target for HCC metastasis.[93] Expression of targeting protein for Xklp2 (TPX2) is modulated by the cell cycle; it is detected in G1/S phase and disappears after cytokinesis.[94,95] Several studies have indicated that TPX2 is highly expressed in different types of cancers.[96,97] Additionally, expression of TPX2 is related to proliferation and apoptosis in HCC.[98] TPX2 overexpression promotes the growth and metastasis of HCC.[99] Kinesin family member 20A (KIF20A) is required during mitosis for the final step of cytokinesis.[100,101] Studies have found that high expression of KIF20A is correlated with progression or poor prognosis of many types of cancers.[102-104] Furthermore, KIF20A is aberrantly expressed in HCC tissues and its expression may be associated with poor OS.[105]
According to enrichment analyses of the two modules, we found that module 1 was mostly associated with cell cycle and module 2 was closely related to metabolic pathways. Furthermore, all 11 hub genes belonged to module 1 and most are associated with cell cycle and enriched in the “cell cycle” pathway. A number of studies have shown that cell cycle disorders are closely related to human cancer.[43] Therefore, the carcinogenesis and progression of HCC may be associated with the cell cycle pathway, and we might be able to suppress HCC cell cycle progression, inhibit HCC cell proliferation, and reduce HCC malignancy by downregulating expression of the 11 hub genes identified herein.
Compared with previous studies, this work has several advantages, as follows. First, the current integrated microarray analysis used a relatively large sample size from several GEO datasets (GSE36376,[14] GSE39791,[15] GSE41804,[16] GSE54236,[17,18] GSE57957,[19] GSE62232,[20] GSE64041,[21] GSE69715,[22] GSE76427,[23] GSE84005, GSE87630,[24] GSE112790,[25] and GSE121248[26]). Second, functional enrichment analyses were performed to identify the main biological functions and pathways regulated by DEGs. Third, we established PPI networks, conducted module analysis, discovered potential biomarkers for diagnosis and prognosis of HCC, and performed correlation analysis of hub genes.
The limitations of this work were as follows. First, our results need to be verified by corresponding experimental studies. Second, we obtained data from the GEO database, and data quality cannot be verified. Finally, our study focused on genes that consistently showed significant changes across diverse datasets, without regard to sex, age, or grading and staging of the tumors from which the samples were derived.
Conclusion
In the present work, we identified 11 hub genes (CDK1, CCNB2, CDC20, CCNB1, TOP2A, CCNA2, MELK, PBK, TPX2, KIF20A, and AURKA) associated with the development and poor prognosis of HCC by integrated bioinformatics analysis. However, because our study was based on data analysis only, further experiments are required to confirm the results. Our findings will provide evidence and new insights to enhance approaches for the early diagnosis, prognosis, and treatment of HCC.
Name
logFC
Type
Name
logFC
Type
Name
logFC
Type
CLEC1B
−3.33713
down
IL13RA2
−1.41685
down
CSRNP1
−1.20759
down
C9
−2.93972
down
PAMR1
−1.30729
down
ZGPAT
−1.283655
down
FCN3
−3.32589
down
CYP26A1
−1.82557
down
FAM150B
−1.096361
down
CYP1A2
−3.61576
down
JCHAIN
−1.90133
down
LPA
−1.568535
down
HAMP
−3.72675
down
ADIRF
−1.34189
down
ALPL
−1.135143
down
SLCO1B3
−2.84405
down
NNMT
−1.65555
down
S100A8
−1.149369
down
SPP2
−2.19217
down
TAT
−1.77239
down
GPM6A
−1.287388
down
APOF
−2.7681
down
MS4A6A
−1.02381
down
RCL1
−1.112209
down
NAT2
−2.42415
down
VNN1
−1.43431
down
CYP2B7P
−1.31568
down
CLRN3
−2.35658
down
HSD17B2
−1.27883
down
CCBE1
−1.131678
down
RDH16
−2.05491
down
FAM134B
−1.27241
down
LINC01093
−1.711116
down
SLC25A47
−2.3928
down
CTH
−1.2995
down
ST3GAL6
−1.008844
down
SLC22A1
−2.49578
down
ACAA1
−1.06823
down
TBX15
−1.105089
down
THRSP
−2.37999
down
OTC
−1.12724
down
BCO2
−1.572843
down
CLEC4G
−2.8104
down
CYP2A7
−1.7189
down
LUM
−1.123456
down
GBA3
−2.26827
down
C6
−1.48624
down
ESR1
−1.022446
down
DNASE1L3
−2.22313
down
GREM2
−1.17719
down
CYR61
−1.101151
down
SHBG
−1.96811
down
HPD
−1.56635
down
HBA2
−1.227362
down
LY6E
−2.01561
down
KBTBD11
−1.69651
down
KDM8
−1.06201
down
CDHR2
−2.02873
down
CA2
−1.30707
down
GADD45G
−1.126764
down
TMEM27
−2.33949
down
AKR7A3
−1.25278
down
ASPG
−1.055061
down
C7
−2.2597
down
RNF125
−1.03098
down
FCGR2B
−1.141195
down
FBP1
−1.79884
down
TTC36
−1.69649
down
ASPA
−1.025006
down
SRD5A2
−1.89056
down
PROM1
−1.44661
down
PBLD
−1.006234
down
MT1M
−3.02758
down
ADH6
−1.22168
down
HHIP
−1.37843
down
BBOX1
−2.04999
down
ETNPPL
−1.15368
down
CRP
−1.053533
down
APOA5
−1.774
down
HSD17B13
−1.50866
down
FREM2
−1.522232
down
IGFBP3
−1.70456
down
ANXA10
−1.62516
down
ADRA1A
−1.161964
down
ADH4
−2.15911
down
FXYD1
−1.41243
down
CNTN3
−1.176196
down
KMO
−1.91086
down
OGDHL
−1.30838
down
ITLN1
−1.034492
down
CYP8B1
−1.76864
down
PON1
−1.17061
down
UGT2B10
−1.031179
down
CXCL14
−2.31161
down
ACSM3
−1.52866
down
DIRAS3
−1.123875
down
GHR
−2.12511
down
SLC27A5
−1.33347
down
STEAP4
−1.061309
down
ADGRG7
−1.85853
down
LIFR
−1.47372
down
CYP4A22
−1.074568
down
MARCO
−2.25079
down
HABP2
−1.06311
down
TFPI2
−1.00071
down
MT1F
−2.59948
down
GRAMD1C
−1.07675
down
MT1A
−1.093671
down
CYP39A1
−1.86139
down
TKFC
−1.07859
down
RAB25
−1.081375
down
OIT3
−2.4803
down
STEAP3
−1.09586
down
RDH5
−1.006888
down
MBL2
−1.62953
down
IL1RAP
−1.21549
down
EPCAM
−1.336797
down
VIPR1
−1.89347
down
GCDH
−1.02343
down
SPINK1 | 3.633978 | up
TDO2 | −1.44452 | down
HAL | −1.262 | down
GPC3 | 2.807155 | up
BHMT | −1.68706 | down
GABARAPL1 | −1.07919 | down
AKR1B10 | 2.588879 | up
PCK1 | −1.85362 | down
ID1 | −1.32236 | down
ASPM | 1.804629 | up
MT1H | −2.20509 | down
INMT | −1.65209 | down
CAP2 | 2.086341 | up
AFM | −1.90272 | down
SKAP1 | −1.06342 | down
TOP2A | 2.232845 | up
HGFAC | −2.18902 | down
FETUB | −1.31249 | down
PRC1 | 1.923672 | up
MT1G | −2.64319 | down
CFHR4 | −1.07478 | down
CDKN3 | 1.778794 | up
CYP2A6 | −2.05548 | down
HSD11B1 | −1.27605 | down
CDC20 | 1.910919 | up
CETP | −1.77384 | down
G6PC | −1.00804 | down
PTTG1 | 1.451774 | up
SMIM24 | −1.81333 | down
MFAP4 | −1.53268 | down
NCAPG | 1.551838 | up
FCN2 | −1.90705 | down
ABCA8 | −1.10284 | down
LCN2 | 1.551605 | up
FOSB | −2.12211 | down
CYP2J2 | −1.03103 | down
CCL20 | 1.667526 | up
ECM1 | −1.72876 | down
AKR1D1 | −1.77452 | down
FAM83D | 1.570755 | up
MT1X | −2.07498 | down
GPD1 | −1.01057 | down
KIF20A | 1.644679 | up
SLC10A1 | −1.70131 | down
HAO1 | −1.0889 | down
PBK | 1.6372 | up
CRHBP | −2.55698 | down
TACSTD2 | −1.09909 | down
AURKA | 1.321582 | up
F9 | −1.86997 | down
GCGR | −1.51767 | down
UBE2T | 1.429052 | up
SRPX | −1.99247 | down
C8orf4 | −1.53773 | down
NUSAP1 | 1.447842 | up
CYP2C9 | −1.7781 | down
DMGDH | −1.11277 | down
AKR1C3 | 1.315793 | up
GNMT | −1.80416 | down
PON3 | −1.07722 | down
MELK | 1.397481 | up
CYP2C8 | −1.84304 | down
MAT1A | −1.15605 | down
SRXN1 | 1.101781 | up
PGLYRP2 | −1.57039 | down
AADAT | −1.45288 | down
HMMR | 1.429779 | up
LECT2 | −1.71324 | down
HPX | −1.1201 | down
COL15A1 | 1.679907 | up
HAO2 | −2.05962 | down
KCNN2 | −1.76035 | down
UBD | 1.793116 | up
FOS | −2.10062 | down
ACADL | −1.16219 | down
PLVAP | 1.303945 | up
ANGPTL6 | −1.40198 | down
SLC13A5 | −1.18455 | down
HSPB1 | 1.057592 | up
CNDP1 | −2.19859 | down
ASS1 | −1.22714 | down
SPP1 | 1.372928 | up
CXCL12 | −1.91941 | down
PRSS8 | −1.15745 | down
CENPF | 1.339564 | up
AGXT2 | −1.39193 | down
CPED1 | −1.24941 | down
SQLE | 1.28364 | up
ACOT12 | −1.27878 | down
FTCD | −1.25547 | down
CEP55 | 1.130246 | up
RSPO3 | −1.62341 | down
TMEM45A | −1.37559 | down
KIF4A | 1.431933 | up
PZP | −1.76877 | down
ALDH6A1 | −1.08996 | down
TRIP13 | 1.223148 | up
COLEC10 | −1.85319 | down
SLC27A2 | −1.02491 | down
S100P | 1.428178 | up
HOGA1 | −1.43807 | down
ETFDH | −1.15312 | down
DLGAP5 | 1.462148 | up
MT1E | −1.80442 | down
GCKR | −1.00475 | down
ALDH3A1 | 1.048498 | up
CYP3A4 | −2.39818 | down
OAT | −1.35234 | down
CDCA5 | 1.222277 | up
SLC39A5 | −1.47867 | down
SFRP5 | −1.04433 | down
SFN | 1.002947 | up
KLKB1 | −1.57229 | down
CYP3A43 | −1.2044 | down
ESM1 | 1.15394 | up
LCAT | −1.87391 | down
SLC6A12 | −1.11241 | down
TTK | 1.378481 | up
IGFALS | −1.94508 | down
SOCS2 | −1.38986 | down
TPX2 | 1.091732 | up
GLYAT | −1.72131 | down
CYP4F2 | −1.0376 | down
PAGE4 | 1.240802 | up
ADH1C | −1.64914 | down
PHYHD1 | −1.0017 | down
COL4A1 | 1.236208 | up
PROZ | −1.52487 | down
SLC7A2 | −1.05182 | down
HJURP | 1.034534 | up
CYP2E1 | −2.04247 | down
C1RL | −1.01827 | down
RACGAP1 | 1.407851 | up
GSTZ1 | −1.39923 | down
PLG | −1.09969 | down
IGF2BP3 | 1.019851 | up
CHST4 | −1.72521 | down
CPS1 | −1.29626 | down
ANLN | 1.53779 | up
MFSD2A | −1.51912 | down
ADAMTSL2 | −1.24169 | down
MCM2 | 1.109517 | up
IDO2 | −1.83679 | down
MTTP | −1.02368 | down
UBE2C | 1.0809 | up
SDS | −1.75694 | down
CXCL2 | −1.43349 | down
NQO1 | 1.365462 | up
ENO3 | −1.37195 | down
HRG | −1.00696 | down
CCNB2 | 1.303069 | up
GLS2 | −1.75439 | down
ACSL1 | −1.14524 | down
CCNA2 | 1.185444 | up
DCN | −1.94676 | down
MAN1C1 | −1.18965 | down
MUC13 | 1.14796 | up
PLAC8 | −1.80012 | down
PCOLCE | −1.00609 | down
MCM6 | 1.016314 | up
SERPINA4 | −1.2352 | down
MT2A | −1.54319 | down
CENPW | 1.083208 | up
ZG16 | −1.56869 | down
CD1D | −1.02692 | down
TGM3 | 1.050965 | up
BCHE | −1.77407 | down
XDH | −1.11927 | down
RAD51AP1 | 1.049223 | up
CFP | −1.47416 | down
PPP1R1A | −1.10299 | down
THY1 | 1.046852 | up
SLC38A4 | −1.32606 | down
HBB | −1.31952 | down
NUF2 | 1.25884 | up
ADH1A | −1.27277 | down
RBP5 | −1.04885 | down
CKAP2L | 1.054397 | up
CLEC4M | −2.35545 | down
CFHR3 | −1.10107 | down
MAGEA1 | 1.282995 | up
CYP4A11 | −1.5036 | down
RELN | −1.02856 | down
ECT2 | 1.065576 | up
GYS2 | −1.66608 | down
NPY1R | −1.34248 | down
ACSL4 | 1.16679 | up
PHGDH | −1.40019 | down
CLDN10 | −1.34641 | down
MDK | 1.076885 | up
BGN | −1.2236 | down
ATF5 | −1.11652 | down
PEG10 | 1.104051 | up
CIDEB | −1.27052 | down
GNE | −1.04957 | down
COX7B2 | 1.333566 | up
CYP2C19 | −1.55814 | down
CYP4V2 | −1.05634 | down
CCNB1 | 1.362239 | up
IYD | −1.22582 | down
CD5L | −1.49237 | down
RRM2 | 1.542665 | up
C8A | −1.49471 | down
TIMD4 | −1.24178 | down
REG3A | 1.140254 | up
STAB2 | −1.82665 | down
EGR1 | −1.41173 | down
CDK1 | 1.236442 | up
CDA | −1.14527 | down
GADD45B | −1.21416 | down
KIF14 | 1.054151 | up
HPGD | −1.37821 | down
GPT2 | −1.15763 | down
ZIC2 | 1.320155 | up
OLFML3 | −1.38115 | down
ACMSD | −1.02364 | down
BUB1B | 1.118801 | up
PTH1R | −1.35746 | down
CCL19 | −1.32425 | down
NDC80 | 1.234218 | up
EPHX2 | −1.29488 | down
RBP1 | −1.15142 | down
NEK2 | 1.144213 | up
COLEC11 | −1.34767 | down
ACADS | −1.05741 | down
RBM24 | 1.220962 | up
CYP2C18 | −1.21134 | down
MYOM2 | −1.03989 | down
NMRAL1P1 | 1.314053 | up
AMDHD1 | −1.14346 | down
DCXR | −1.01852 | down
DTL | 1.283296 | up
LYVE1 | −1.69466 | down
PLGLB1 | −1.07364 | down
SULT1C2 | 1.181554 | up
GSPT2 | −1.16851 | down
CYP2B6 | −1.37318 | down
ROBO1 | 1.247873 | up
C8B | −1.16715 | down
UROC1 | −1.06129 | down
SSX1 | 1.001365 | up
ADH1B | −1.77846 | down
PDK4 | −1.08546 | down
FLVCR1 | 1.006476 | up
DPT | −1.68413 | down
PPARGC1A | −1.08395 | down
CTHRC1 | 1.120384 | up
AZGP1 | −1.23501 | down
NDRG2 | −1.01145 | down
ZWINT | 1.066653 | up
ALDH8A1 | −1.37768 | down
IGF1 | −1.14785 | down
GINS1 | 1.03249 | up
RND3 | −1.62821 | down
ASPDH | −1.15589 | down
SMPX | 1.089408 | up
SLC19A3 | −1.18742 | down
DBH | −1.50296 | down
GPR158 | 1.061576 | up
WDR72 | −1.27875 | down
PRG4 | −1.13337 | down
FC, fold change.
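For readers who want to work with these rows programmatically, the following is a minimal sketch and not the authors' pipeline. It assumes each row consists of a gene symbol, a log FC value, and a direction label. The names `DegRow`, `parse_deg_row`, and `CUTOFF` are hypothetical, and the cutoff of |log FC| ≥ 1 is an assumption for illustration (the values listed here happen to satisfy it, but this table does not state the threshold). The values in the table use the Unicode minus sign (U+2212), which Python's `float` does not accept, so it is replaced before parsing.

```python
from typing import NamedTuple


class DegRow(NamedTuple):
    gene: str
    log_fc: float
    direction: str


def parse_deg_row(gene: str, log_fc_text: str, direction: str) -> DegRow:
    """Parse one table row and check that the label matches the sign of log FC."""
    # The table typesets negative values with U+2212; float() needs ASCII "-".
    log_fc = float(log_fc_text.replace("\u2212", "-"))
    expected = "up" if log_fc > 0 else "down"
    if direction != expected:
        raise ValueError(f"{gene}: label {direction!r} does not match log FC {log_fc}")
    return DegRow(gene, log_fc, direction)


# Three rows copied from the table above.
rows = [
    parse_deg_row("SPINK1", "3.633978", "up"),
    parse_deg_row("MT1M", "−3.02758", "down"),
    parse_deg_row("TOP2A", "2.232845", "up"),
]

CUTOFF = 1.0  # assumed threshold, not taken from the source
passing = [row.gene for row in rows if abs(row.log_fc) >= CUTOFF]
print(passing)
```

Checking the label against the sign in the parser is a design choice that catches rows misaligned during extraction.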
Category | ID | Term | −log10(FDR) | Count
--- | --- | --- | --- | ---
BP | GO:0055114 | Oxidation-reduction process | 16.45646128 | 56
BP | GO:0019373 | Epoxygenase P450 pathway | 12.72414085 | 13
BP | GO:0006805 | Xenobiotic metabolic process | 6.801196269 | 16
BP | GO:0017144 | Drug metabolic process | 6.713310124 | 11
BP | GO:0045926 | Negative regulation of growth | 5.354060264 | 9
BP | GO:0071276 | Cellular response to cadmium ion | 4.258416753 | 8
BP | GO:0042738 | Exogenous drug catabolic process | 3.873727759 | 7
BP | GO:0071294 | Cellular response to zinc ion | 3.86110044 | 8
BP | GO:0008202 | Steroid metabolic process | 3.349012692 | 10
BP | GO:0097267 | Omega-hydroxylase P450 pathway | 3.048831706 | 6
BP | GO:0016098 | Monoterpenoid metabolic process | 2.284734835 | 5
BP | GO:0007067 | Mitotic nuclear division | 1.901221899 | 19
BP | GO:0006569 | Tryptophan catabolic process | 1.382839511 | 5
CC | GO:0031090 | Organelle membrane | 12.13504583 | 21
CC | GO:0070062 | Extracellular exosome | 10.96203625 | 117
CC | GO:0005576 | Extracellular region | 8.944226201 | 78
CC | GO:0005615 | Extracellular space | 8.079401711 | 68
CC | GO:0072562 | Blood microparticle | 3.941029653 | 17
CC | GO:0005579 | Membrane attack complex | 2.131478756 | 5
MF | GO:0016705 | Oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen | 12.77849851 | 19
MF | GO:0020037 | Heme binding | 11.82105086 | 25
MF | GO:0004497 | Monooxygenase activity | 11.5463498 | 18
MF | GO:0005506 | Iron ion binding | 10.69763162 | 25
MF | GO:0008392 | Arachidonic acid epoxygenase activity | 10.22404973 | 11
MF | GO:0019825 | Oxygen binding | 9.168975245 | 15
MF | GO:0016491 | Oxidoreductase activity | 5.664542324 | 22
MF | GO:0008395 | Steroid hydroxylase activity | 5.613513145 | 10
MF | GO:0070330 | Aromatase activity | 2.805232257 | 8
MF | GO:0004024 | Alcohol dehydrogenase activity, zinc-dependent | 2.38141982 | 5
MF | GO:0016712 | Oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced flavin or flavoprotein as one donor, and incorporation of one atom of oxygen |  | 
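
The −log10(FDR) column can be converted back to a plain FDR as FDR = 10^(−x). The sketch below is illustrative only. The function name `fdr_from_neg_log10` is hypothetical, and the 0.05 significance level is a conventional assumption rather than a threshold stated in this table. The two input values are taken from the first and last BP rows.

```python
def fdr_from_neg_log10(value: float) -> float:
    """Invert the -log10 transform used in the enrichment table."""
    return 10.0 ** (-value)


ALPHA = 0.05  # assumed significance level, not stated in the table

# GO:0055114 (oxidation-reduction process) and GO:0006569 (tryptophan catabolic process)
strongest = fdr_from_neg_log10(16.45646128)
weakest_bp = fdr_from_neg_log10(1.382839511)

print(f"{strongest:.2e}", strongest < ALPHA)
print(f"{weakest_bp:.4f}", weakest_bp < ALPHA)
```

Under this assumption, any term with −log10(FDR) above about 1.301 corresponds to FDR < 0.05.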