Identification of potential hub genes associated with the pathogenesis and prognosis of hepatocellular carcinoma via integrated bioinformatics analysis.

Ziqi Meng¹, Jiarui Wu¹, Xinkui Liu¹, Wei Zhou¹, Mengwei Ni¹, Shuyu Liu¹, Siyu Guo¹, Shanshan Jia¹, Jingyuan Zhang¹.

Abstract

OBJECTIVE: The objective was to identify potential hub genes associated with the pathogenesis and prognosis of hepatocellular carcinoma (HCC).
METHODS: Gene expression profile datasets were downloaded from the Gene Expression Omnibus database. Differentially expressed genes (DEGs) between HCC and normal samples were identified via an integrated analysis. A protein-protein interaction network was constructed and analyzed using the STRING database and Cytoscape software, and enrichment analyses were carried out through DAVID. Gene Expression Profiling Interactive Analysis and Kaplan-Meier plotter were used to determine expression and prognostic values of hub genes.
RESULTS: We identified 11 hub genes (CDK1, CCNB2, CDC20, CCNB1, TOP2A, CCNA2, MELK, PBK, TPX2, KIF20A, and AURKA) that might be closely related to the pathogenesis and prognosis of HCC. Enrichment analyses indicated that the DEGs were significantly enriched in metabolism-associated pathways, and that the hub genes and module 1 were highly associated with the cell cycle pathway.
CONCLUSIONS: In this study, we identified key genes of HCC, which indicated directions for further research into diagnostic and prognostic biomarkers that could facilitate targeted molecular therapy for HCC.

Keywords: Gene Expression Omnibus; Hepatocellular carcinoma; bioinformatics analysis; differentially expressed genes; hub genes; survival

Year: 2020 PMID: 32722976 PMCID: PMC7391448 DOI: 10.1177/0300060520910019

Source DB: PubMed Journal: J Int Med Res ISSN: 0300-0605 Impact factor: 1.671

Introduction

On a global scale, cancer is a major public health problem, and liver cancer is a major contributor to both cancer morbidity and mortality.[1] Liver cancer is the sixth most common cancer and the fourth leading cause of cancer-related mortality worldwide.[2] An estimated 42,030 newly diagnosed cases of liver cancer and 31,780 deaths from the disease were expected in the United States during 2019.[3] Hepatocellular carcinoma (HCC) is the most common form of primary liver cancer, comprising 75% to 85% of cases.[2] The well-recognized risk factors for HCC include chronic infection with hepatitis B virus (HBV) or hepatitis C virus, exposure to dietary aflatoxin, alcohol-induced cirrhosis, smoking, obesity, and type 2 diabetes.[2,4] In Asia (especially China), chronic HBV infection is the leading etiologic factor of HCC.[5] Most HCC patients are diagnosed at an advanced stage, and locoregional treatments (chemoembolization) and surgical treatments are relatively disappointing in terms of overall survival (OS) of patients with advanced disease.[6] In addition, traditional chemotherapies have not shown promising outcomes in treatment of HCC and have significant toxicity.[6,7] Meanwhile, the lack of diagnostic markers for early detection and the limited treatment strategies increase the risk of poor prognosis and death.[8] Therefore, there is a pressing need to develop robust diagnostic strategies and effective therapies for HCC patients.[9]

Over the past decades, microarray technology and bioinformatics have been extensively applied to investigate the molecular mechanisms of HCC, providing strong research support for the diagnosis, treatment, and prognosis of HCC.[10] Because they can process a large number of datasets quickly, integrated bioinformatics analysis and microarray technology have allowed researchers to comprehensively identify the functions of numerous differentially expressed genes (DEGs) in HCC and to explore the complicated process of HCC occurrence and development.[10,11] He et al.[12] identified four hub genes and two important pathways in the development of HCC from cirrhosis using one Gene Expression Omnibus (GEO) dataset and a bioinformatics method that included DEG screening, enrichment analyses, and construction of a protein–protein interaction (PPI) network. Zhang et al.[13] screened hub genes and pathways correlated with the occurrence and progression of HCC via a series of bioinformatics analyses incorporating DEG identification, functional enrichment analyses, PPI network and module analysis, and weighted correlation network analysis. Zhou et al.[11] identified the pivotal genes and microRNAs in HCC using a bioinformatics approach, including analysis of raw data via GEO2R, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses, and construction of a PPI network. However, to improve the diagnosis and treatment of HCC, novel diagnostic and prognostic biomarkers for HCC are needed. The flowchart of the study approach is shown in Figure 1.

Figure 1.

Flowchart for identification of core genes and pathways for hepatocellular carcinoma (HCC). GEO, Gene Expression Omnibus; DEG, differentially expressed gene; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; PPI, protein–protein interaction; GEPIA, Gene Expression Profiling Interactive Analysis.

Materials and methods

Ethical approval

Ethical approval was not required in this study because we analyzed only published data from the GEO database.

Gene expression profile data

Gene expression profile data (GSE36376,[14] GSE39791,[15] GSE41804,[16] GSE54236,[17,18] GSE57957,[19] GSE62232,[20] GSE64041,[21] GSE69715,[22] GSE76427,[23] GSE84005, GSE87630,[24] GSE112790,[25] and GSE121248[26]) were downloaded from the GEO database (http://www.ncbi.nlm.nih.gov/geo/),[27] a public repository of high-throughput gene expression and other functional genomics datasets. The selection criteria for the included datasets were as follows: (1) tissue samples were collected from human HCC and corresponding adjacent or normal tissues; and (2) the dataset included at least 40 samples.

Integrated analysis of microarray datasets

The matrix data of each GEO dataset were normalized and log2 transformed using the R software package limma,[28] and the DEGs between HCC and corresponding adjacent or normal tissues were also identified using limma. The DEGs identified from the 13 datasets were integrated using the RobustRankAggreg package[29] in R. Genes with |log2 fold change (FC)| ≥ 1 and adjusted P-value < 0.05 were considered significant DEGs.
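The significance rule above can be illustrated with a minimal Python sketch. This is not the limma or RobustRankAggreg workflow used in the study; it only applies the two stated cutoffs to precomputed statistics. The `GeneResult` type, the `classify_deg` helper, and the gene names and values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    """Per-gene statistics, assumed to come from an upstream analysis."""
    name: str
    log2_fc: float
    adj_p: float


def classify_deg(result, fc_cutoff=1.0, p_cutoff=0.05):
    """Return 'up', 'down', or None using |log2 FC| >= 1 and adjusted P < 0.05."""
    if result.adj_p >= p_cutoff or abs(result.log2_fc) < fc_cutoff:
        return None
    return "up" if result.log2_fc > 0 else "down"


# Hypothetical values chosen only to exercise each branch of the rule.
examples = [
    GeneResult("GENE_A", 2.3, 0.001),   # passes both cutoffs, upregulated
    GeneResult("GENE_B", -1.0, 0.04),   # exactly at the fold-change cutoff
    GeneResult("GENE_C", 0.8, 0.0001),  # significant P but small fold change
    GeneResult("GENE_D", -3.1, 0.20),   # large fold change but not significant
]
labels = {g.name: classify_deg(g) for g in examples}
print(labels)
```

Note that the fold-change cutoff is inclusive (≥ 1) while the P-value cutoff is strict (< 0.05), matching the wording of the criteria.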

Enrichment analyses of DEGs

The Database for Annotation, Visualization and Integrated Discovery (DAVID; https://david.ncifcrf.gov/, version 6.8)[30] is a comprehensive functional annotation tool for extracting biological significance from large gene/protein datasets. In this study, the GO and KEGG pathway enrichment analyses for the DEGs were conducted via DAVID. The enrichment results were visualized using the ggplot2[31] and GOplot[32] packages in R.

PPI network and module analysis

The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING; https://string-db.org/)[33] is a database of known and predicted protein interactions, covering both direct and indirect interactions among proteins. This database was used to obtain potential interactions among the DEGs. PPIs with a confidence score ≥0.7 were retained and imported into Cytoscape software[34] to construct the PPI network. Furthermore, the clustering modules in this PPI network were analyzed using the MCODE (Molecular Complex Detection) plugin in Cytoscape.[35] Pathway enrichment analyses for important modules were also carried out. The enrichment results were visualized using the imageGP platform (http://www.ehbio.com/ImageGP/index.php/Home/Index/GOenrichmentplot.html).
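As a rough illustration of the filtering step, the sketch below keeps interactions with a confidence score of at least 0.7 and counts how many retained partners each protein has (its degree). It does not call STRING or Cytoscape; the `build_degree_table` helper and the edge scores are hypothetical, and scores are assumed to be on a 0 to 1 scale.

```python
from collections import Counter


def build_degree_table(edges, min_score=0.7):
    """Keep edges with score >= min_score and count each node's degree.

    Edges are treated as undirected; duplicates and self-loops are ignored.
    """
    degree = Counter()
    seen = set()
    for a, b, score in edges:
        pair = frozenset((a, b))
        if score < min_score or a == b or pair in seen:
            continue
        seen.add(pair)
        degree[a] += 1
        degree[b] += 1
    return degree


# Hypothetical confidence scores, for illustration only.
edges = [
    ("CDK1", "CCNB1", 0.999),
    ("CDK1", "CCNB2", 0.95),
    ("CCNB1", "CDK1", 0.999),   # same pair in reverse order, counted once
    ("CDK1", "CYP3A4", 0.40),   # below the cutoff, dropped
    ("CCNB1", "CCNB2", 0.70),   # exactly at the cutoff, kept
]
degree = build_degree_table(edges)
print(degree.most_common())
```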

Survival analysis of hub genes

Kaplan–Meier plotter (KM plotter; http://kmplot.com/analysis/) is a database containing clinical data and gene expression data.[36] This database is used to further the understanding of the molecular basis of disease and to identify biomarkers associated with survival.[37] The recurrence-free survival and OS information were based on GEO, the European Genome-phenome Archive (EGA), and The Cancer Genome Atlas (TCGA) databases. Hazard ratios (HR) with 95% confidence intervals and log-rank P-values were calculated to assess the association of gene expression with survival and are shown in the plots.[38]
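For readers unfamiliar with the survival curves produced by such tools, the following is a minimal sketch of the Kaplan–Meier estimator itself. It is not the KM plotter implementation and does not compute hazard ratios or log-rank P-values; the `kaplan_meier` function and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    times: follow-up durations; events: 1 = event observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical follow-up times (months) and event indicators.
times = [5, 8, 8, 12, 15, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

Censored subjects leave the risk set without lowering the survival estimate, which is why the curve only steps down at times where an event was observed.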

Expression level analysis and correlation analysis of hub genes

Gene Expression Profiling Interactive Analysis (GEPIA; http://gepia.cancer-pku.cn/index.html)[39] is a web-based tool that applies a standard processing pipeline to analyze gene expression data from tumor and normal tissues. The expression of hub genes in HCC and normal tissues was visualized by boxplot.[38] In addition, correlation analysis was performed in GEPIA to examine the relationship between the expression levels of two genes.[39]
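The pairwise correlation analysis can be sketched as follows. The source does not state which correlation coefficient was used, so the Pearson coefficient here is an assumption, and the `pearson_r` helper and the expression values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical log-scale expression values for two genes across five samples.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.1, 3.9, 6.2, 8.1, 9.8]
r = pearson_r(gene_a, gene_b)
print(round(r, 3))
```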

Results

Identification of DEGs

In the present study, 13 datasets were downloaded from GEO that included 1100 cancer tissues and 717 corresponding adjacent or normal tissues (Table 1). After integrated analysis, 380 DEGs (293 downregulated and 87 upregulated) were identified (Figure 2a-m and Appendix). Figure 2n shows the top 20 down- and upregulated genes.

Table 1.

Information for the 13 Gene Expression Omnibus datasets included in the current study.

Dataset	Platform	Number of samples (tumor/control)
GSE36376	GPL10558-Illumina HumanHT-12 V4.0 expression beadchip	433 (240/193)
GSE39791	GPL10558-Illumina HumanHT-12 V4.0 expression beadchip	144 (72/72)
GSE41804	GPL570-[HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array	40 (20/20)
GSE54236	GPL6480-Agilent-014850 Whole Human Genome Microarray 4x44K G4112F (Probe Name version)	161 (81/80)
GSE57957	GPL10558-Illumina HumanHT-12 V4.0 expression beadchip	78 (39/39)
GSE62232	GPL570-[HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array	91 (81/10)
GSE64041	GPL6244-[HuGene-1_0-st] Affymetrix Human Gene 1.0 ST Array [transcript (gene) version]	125 (60/65)
GSE69715	GPL570-[HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array	103 (37/66)
GSE76427	GPL10558-Illumina HumanHT-12 V4.0 expression beadchip	167 (115/52)
GSE84005	GPL5175-[HuEx-1_0-st] Affymetrix Human Exon 1.0 ST Array [transcript (gene) version]	76 (38/38)
GSE87630	GPL6947-Illumina HumanHT-12 V3.0 expression beadchip	94 (64/30)
GSE112790	GPL570-[HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array	198 (183/15)
GSE121248	GPL570-[HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array	107 (70/37)
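The sample counts in Table 1 can be cross-checked against the totals stated in the text (1100 cancer tissues and 717 corresponding adjacent or normal tissues) and against the inclusion criterion of at least 40 samples per dataset. The short script below is only an arithmetic check on the values transcribed from the table.

```python
# (total, tumor, control) per dataset, transcribed from Table 1.
table1 = {
    "GSE36376": (433, 240, 193),
    "GSE39791": (144, 72, 72),
    "GSE41804": (40, 20, 20),
    "GSE54236": (161, 81, 80),
    "GSE57957": (78, 39, 39),
    "GSE62232": (91, 81, 10),
    "GSE64041": (125, 60, 65),
    "GSE69715": (103, 37, 66),
    "GSE76427": (167, 115, 52),
    "GSE84005": (76, 38, 38),
    "GSE87630": (94, 64, 30),
    "GSE112790": (198, 183, 15),
    "GSE121248": (107, 70, 37),
}

# Each row total should equal tumor + control.
rows_consistent = all(
    total == tumor + control for total, tumor, control in table1.values()
)
# Inclusion criterion: at least 40 samples per dataset.
meets_minimum = all(total >= 40 for total, _, _ in table1.values())

tumor_total = sum(tumor for _, tumor, _ in table1.values())
control_total = sum(control for _, _, control in table1.values())
print(len(table1), tumor_total, control_total, rows_consistent, meets_minimum)
```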

Figure 2.

Identification of DEGs. Volcano plots of Gene Expression Omnibus datasets (a) GSE36376, (b) GSE39791, (c) GSE41804, (d) GSE54236, (e) GSE57957, (f) GSE62232, (g) GSE64041, (h) GSE69715, (i) GSE76427, (j) GSE84005, (k) GSE87630, (l) GSE112790, and (m) GSE121248; (n) heat map of DEGs. Blue indicates lower expression levels, red indicates higher expression levels, and white indicates no differential expression among the genes. Each column represents one dataset and each row represents one gene. The number in each rectangle represents the normalized gene expression level. The color gradient from blue to red represents the change from downregulation to upregulation. DEG, differentially expressed gene.

GO and KEGG pathway enrichment analyses of DEGs

To deepen our understanding of the DEGs, we performed GO and KEGG pathway enrichment analyses. Thirty-one significantly enriched GO terms were selected based on a false discovery rate (FDR) < 0.05 (Figure 3a and Appendix). These comprised 13 biological process terms, mainly related to metabolic process, P450 pathway, and oxidation-reduction process; 12 molecular function terms, largely involving multiple enzyme activities, heme binding, iron ion binding, and oxygen binding; and 6 cellular component terms, associated with organelle membrane, extracellular exosome, extracellular region, extracellular space, blood microparticle, and membrane attack complex.
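To show what an FDR cutoff of 0.05 means in practice, the sketch below applies the standard Benjamini-Hochberg adjustment to a list of raw P-values. This is an assumption made for illustration: the source does not describe how DAVID derives its FDR values, and the `benjamini_hochberg` helper and the P-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P-values for five enrichment terms.
p_values = [0.001, 0.008, 0.039, 0.041, 0.20]
adjusted = benjamini_hochberg(p_values)
significant = [a < 0.05 for a in adjusted]
print([round(a, 5) for a in adjusted], significant)
```

In this toy input, two terms with raw P-values below 0.05 no longer pass after adjustment, which is the purpose of controlling the false discovery rate across many tested terms.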

Figure 3.

Enrichment analysis of DEGs. (a) GO enrichment analysis of DEGs, (b) top 5 terms of GO enrichment, and (c) KEGG pathway analysis of DEGs. DEG, differentially expressed gene; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; FDR, false discovery rate.

In the KEGG pathway enrichment analyses, we screened nine pathways according to FDR < 0.05. Figure 3c shows the results of the KEGG analysis; the DEGs primarily participated in diverse metabolism-associated signaling pathways, such as metabolic pathways, retinol metabolism, and drug metabolism-cytochrome P450.

PPI network establishment and module analysis

The PPI network of DEGs comprised 242 nodes and 1267 interactions (Figure 4a); degree was calculated to identify candidate key nodes. Finally, 11 potential key nodes were identified, the degrees of which were all more than four times the corresponding average values: CDK1, CCNB2, CDC20, CCNB1, TOP2A, CCNA2, MELK, PBK, TPX2, KIF20A, and AURKA (Table 2). Moreover, to determine important clustering modules in the PPI network, module analysis was performed using MCODE, and the two modules with the highest scores (score >10) were obtained (Figure 4b, 4c). The enrichment pathways of module 1 and module 2 are shown in Figure 5. Module 1 was highly associated with cell cycle and oocyte meiosis; module 2 was closely connected to drug metabolism-cytochrome P450, linoleic acid metabolism, chemical carcinogenesis, arachidonic acid metabolism, retinol metabolism, metabolism of xenobiotics by cytochrome P450, and metabolic pathways.
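The hub-gene criterion can be checked with simple arithmetic, using the network size given above and the degrees reported in Table 2. The sketch assumes that the "corresponding average values" refer to the mean node degree of the network and that the 1267 interactions are undirected edges; neither is stated explicitly in the source.

```python
nodes = 242
edges = 1267

# In an undirected network each edge contributes to the degree of two nodes.
average_degree = 2 * edges / nodes
threshold = 4 * average_degree

# Degrees of the 11 hub genes as reported in Table 2.
hub_degrees = {
    "CDK1": 47, "CCNB2": 46, "CDC20": 45, "CCNB1": 45,
    "TOP2A": 44, "CCNA2": 44, "MELK": 43, "PBK": 43,
    "TPX2": 43, "KIF20A": 43, "AURKA": 43,
}
all_above = all(d > threshold for d in hub_degrees.values())
print(round(average_degree, 2), round(threshold, 2), all_above)
```

Under these assumptions the mean degree is about 10.47, so the cutoff is about 41.88, and the lowest reported hub degree (43) lies above it.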

Figure 4.

PPI network and hub clustering modules. (a) The PPI network of DEGs, (b) module 1 (MCODE score = 38.769), and (c) module 2 (MCODE score = 10.364). Blue circles represent downregulated genes and red circles represent upregulated genes. PPI, protein–protein interaction; DEG, differentially expressed gene; MCODE, Molecular Complex Detection.

Table 2.

Upregulated hub genes with high degrees.

Gene	Degree	Type	MCODE Cluster
CDK1	47	up	Module 1
CCNB2	46	up	Module 1
CDC20	45	up	Module 1
CCNB1	45	up	Module 1
TOP2A	44	up	Module 1
CCNA2	44	up	Module 1
MELK	43	up	Module 1
PBK	43	up	Module 1
TPX2	43	up	Module 1
KIF20A	43	up	Module 1
AURKA	43	up	Module 1

Figure 5.

Pathway analysis of the two modules with the highest scores. The y-axis shows significantly enriched KEGG pathways, and the x-axis shows the two modules. KEGG, Kyoto Encyclopedia of Genes and Genomes; FDR, false discovery rate.

Survival analysis, expression, and correlation analysis of hub genes

Survival analysis of the 11 hub genes was performed using the KM plotter. The results showed that high expression of CDK1 (HR = 2.15, 95% CI: 1.52–3.06; P = 1.1e−05), CCNB2 (HR = 1.91, 95% CI: 1.28–2.87; P = 0.0013), CDC20 (HR = 2.49, 95% CI: 1.72–3.59; P = 5.1e−07), CCNB1 (HR = 2.34, 95% CI: 1.55–3.54; P = 3.4e−05), TOP2A (HR = 1.99, 95% CI: 1.39–2.86; P = 0.00012), CCNA2 (HR = 1.92, 95% CI: 1.36–2.72; P = 0.00018), MELK (HR = 2.22, 95% CI: 1.5–3.27; P = 3.7e−05), PBK (HR = 2.24, 95% CI: 1.5–3.34; P = 4.8e−05), TPX2 (HR = 2.29, 95% CI: 1.62–3.24; P = 1.4e−06), KIF20A (HR = 2.33, 95% CI: 1.63–3.32; P = 1.8e−06), and AURKA (HR = 1.77, 95% CI: 1.25–2.5; P = 0.0011) was related to unfavorable OS for HCC patients (Figure 6). Furthermore, GEPIA was used to compare the expression levels of the hub genes in HCC and normal tissues, and the 11 hub genes were confirmed to be highly expressed in HCC tissues (Figure 7). The correlations between hub genes were also analyzed by GEPIA, which showed that the 11 hub genes were significantly correlated with each other. Figure 8 shows that increased expression of CDK1 was strongly associated with increased expression of the other 10 hub genes.
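The reported hazard ratios and confidence intervals can be given a quick internal consistency check. The sketch assumes the intervals are Wald-type intervals that are symmetric on the log scale, as is common for Cox regression output; the source does not state how they were computed. Under that assumption, the geometric mean of the two limits should approximately recover the reported HR, and the interval width gives an approximate standard error of log(HR). The `log_scale_summary` helper is hypothetical.

```python
import math

# (HR, lower 95% limit, upper 95% limit) as reported for overall survival.
reported = {
    "CDK1": (2.15, 1.52, 3.06),
    "CCNB2": (1.91, 1.28, 2.87),
    "CDC20": (2.49, 1.72, 3.59),
    "CCNB1": (2.34, 1.55, 3.54),
    "TOP2A": (1.99, 1.39, 2.86),
    "CCNA2": (1.92, 1.36, 2.72),
    "MELK": (2.22, 1.5, 3.27),
    "PBK": (2.24, 1.5, 3.34),
    "TPX2": (2.29, 1.62, 3.24),
    "KIF20A": (2.33, 1.63, 3.32),
    "AURKA": (1.77, 1.25, 2.5),
}


def log_scale_summary(hr, lower, upper):
    """Return (geometric midpoint of the CI, approximate SE of log HR)."""
    midpoint = math.sqrt(lower * upper)
    se_log_hr = (math.log(upper) - math.log(lower)) / (2 * 1.96)
    return midpoint, se_log_hr


summary = {gene: log_scale_summary(*vals) for gene, vals in reported.items()}
max_gap = max(abs(summary[g][0] - reported[g][0]) for g in reported)
all_exclude_one = all(lower > 1 for _, lower, _ in reported.values())
print(round(max_gap, 3), all_exclude_one)
```

Every lower limit is above 1, which agrees with the reported P-values all being below 0.05.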

Figure 6.

Prognostic roles of 11 hub genes in patients with HCC shown as survival curves. (a) CDK1, (b) CCNB2, (c) CDC20, (d) CCNB1, (e) TOP2A, (f) CCNA2, (g) MELK, (h) PBK, (i) TPX2, (j) KIF20A, and (k) AURKA. HCC, hepatocellular carcinoma; HR, hazard ratio.

Figure 7.

Analysis of expression levels of 11 hub genes in human HCC. The red and gray boxes represent cancer and normal tissues, respectively. (a) CDK1, (b) CCNB2, (c) CDC20, (d) CCNB1, (e) TOP2A, (f) CCNA2, (g) MELK, (h) PBK, (i) TPX2, (j) KIF20A, and (k) AURKA. HCC, hepatocellular carcinoma; LIHC, liver hepatocellular carcinoma.

Figure 8.

Correlation analysis of 10 hub genes in HCC with CDK1: (a) CCNB2, (b) CDC20, (c) CCNB1, (d) TOP2A, (e) CCNA2, (f) MELK, (g) PBK, (h) TPX2, (i) KIF20A, and (j) AURKA. HCC, hepatocellular carcinoma.

Discussion

HCC is the most common type of primary liver malignancy and one of the leading causes of cancer-related mortality worldwide.[40,41] Although much research has been done on HCC, its early diagnosis and treatment remain difficult because of a lack of understanding of the molecular mechanisms associated with HCC occurrence and development.[41] Therefore, in-depth studies of the etiological factors and molecular mechanisms of HCC are of critical importance for HCC diagnosis and treatment.[11] Currently, bioinformatics analysis and microarray technology are developing rapidly, and these approaches can be used to identify candidate targets for the diagnosis, therapy, and prognosis of a variety of neoplasms.[42] In this research, we identified 380 DEGs, including 293 downregulated and 87 upregulated genes, between HCC and corresponding adjacent or normal tissues. Enrichment analyses indicated that the DEGs were mostly associated with metabolic processes, such as metabolism of retinol, drugs, xenobiotics, tyrosine, tryptophan, and histidine, as well as fatty acid degradation. This indicated that metabolic dysregulation is closely related to HCC. In addition, we obtained 11 hub genes (CDK1, CCNB2, CDC20, CCNB1, TOP2A, CCNA2, MELK, PBK, TPX2, KIF20A, and AURKA) in the PPI network, which were upregulated in HCC tissues compared with normal tissues; expression of the top-ranked hub gene, CDK1, was strongly correlated with that of the other hub genes. Overexpression of the 11 hub genes was correlated with worse OS.

Recent evidence implies that tumor cells need specific interphase cyclin-dependent kinases (CDKs) to proliferate.[43] Cyclin-dependent kinase 1 (CDK1) belongs to the CDK family of serine/threonine protein kinases, and it is crucial for the G1/S and G2/M cell cycle phase transitions.[44,45] CDK1 is required for cell proliferation because it is the only CDK that can initiate mitosis.[46] The deregulation of CDK1 is likely related to HCC tumorigenesis.[47] Research has found that high expression of CDK1 is correlated with poor OS in HCC.[45]

Cyclins act as the regulatory subunits of the CDKs, regulating temporal transitions among various stages of the cell cycle via CDK activation.[48] Cyclin-A2 (CCNA2), cyclin-B1 (CCNB1), and cyclin-B2 (CCNB2), encoded by the CCNA2, CCNB1, and CCNB2 genes, respectively, all belong to the cyclin family. CCNA2 activates CDK1 at the end of interphase to facilitate the onset of mitosis, and CCNA2 overexpression has been reported in numerous types of cancers.[49] A previous study indicated that CCNA2 amplification and overexpression are associated with carcinogenesis of transgenic mouse liver tumors.[50] Moreover, research has demonstrated that inhibition of CCNA2 can arrest HCC cell proliferation and tumorigenesis.[51] High expression of CCNA2 is associated with reduced survival in patients with breast cancer and HCC.[52,53] CCNB1 and CCNB2 are the principal activators of CDK1 and, together with CDK1, they promote the G2/M transition.[54,55] CCNB1 expression changes periodically throughout the cell cycle, and CCNB1 is a crucial initiator of mitosis.[56] Decreased CCNB1 expression is related to inhibition of HCC occurrence and development, and activation of CCNB1 expression can promote proliferation in human HCC cells.[56,57] Furthermore, previous research has shown that CCNB1 is closely connected to the prognosis of HCC patients.[56,58] The dimerization of CCNB2 with CDK1 is an essential component of the cell cycle regulatory machinery, and an increase in expression of CCNB2/CDK1 could promote tumor cell proliferation.[55] Furthermore, CCNB2 is highly expressed in several malignant tumors, and overexpression of CCNB2 is related to poor prognosis in HCC.[59]

Cell division cycle protein 20 (encoded by CDC20) is a regulator of cell cycle checkpoints that plays a crucial role in anaphase initiation and exit from mitosis.[60,61] It degrades several important substrates, including cyclin A and CCNB1, to regulate cell cycle progression.[62,63] CDC20 overexpression is related to progression and poor prognosis in various malignant tumors.[64-67] Thus, it is a potential target in multiple cancer treatments.[68] A recent study found that increased expression of CDC20 is related to HCC development and progression.[67] Additionally, research has indicated that silencing expression of CDC20 and heparanase can activate cell apoptosis; thus, targeted inhibition of both CDC20 and heparanase expression is an ideal approach for the treatment of HCC.[69]

Aurora kinase A (encoded by AURKA) is involved in centrosome duplication, spindle formation, chromosomal amplification and segregation, and cytokinesis, and it plays a significant role in centrosome maturation and mitotic commitment in the late G2 phase.[70,71] Abnormal activity of AURKA promotes tumorigenic progression and transformation via defective control of the mitotic spindle checkpoint.[72] Meanwhile, AURKA is highly expressed in a variety of human cancers, including breast cancer,[73] lung cancer,[74] gastrointestinal cancer,[75] bladder cancer,[76] and oral cancer.[77] A study demonstrated that genetic variations in AURKA might be a reliable predictor of early-stage HCC and a crucial biomarker for HCC development.[78] Moreover, other research has indicated that AURKA contributes to the metastasis and invasiveness of HCC.[79] Therefore, AURKA might represent a new therapeutic target for HCC.

Topoisomerase II alpha (TOP2A), a potential biomarker for cancer therapy, has been detected in various types of cancer.[80-82] It participates in chromosome condensation and chromatid separation.[80] TOP2A encodes topoisomerase II alpha[81] and is reported to be overexpressed in HCC tissues.[83] Furthermore, a study has shown that TOP2A has prognostic value in HCC and that TOP2A-targeting agents can be used in HCC therapy.[84]

Maternal embryonic leucine zipper kinase (encoded by MELK) is a member of the AMP protein kinase family of serine/threonine kinases, which affect many stages of tumorigenesis.[85] Several studies have shown that MELK is an oncogenic kinase essential for early HCC recurrence, and its expression is upregulated in HCC.[86-88] Furthermore, MELK inhibition is associated with suppression of tumor growth, indicating that MELK is a potential therapeutic target for HCC.[89]

PDZ-binding kinase (encoded by PBK) can regulate cell cycle processes.[90] Although PBK is barely detectable in normal somatic tissues, it is often elevated in various tumor tissues and is therefore an important target for cancer screening and targeted therapy.[91,92] Recent research has shown that PBK overexpression promotes migration and invasion of HCC, and it could be a therapeutic target for HCC metastasis.[93]

Expression of targeting protein for Xklp2 (TPX2) is modulated by the cell cycle; it is detected in G1/S phase and disappears after cytokinesis.[94,95] Several studies have indicated that TPX2 is highly expressed in different types of cancers.[96,97] Additionally, expression of TPX2 is related to proliferation and apoptosis in HCC.[98] TPX2 overexpression promotes the growth and metastasis of HCC.[99]

Kinesin family member 20A (KIF20A) is required during mitosis for the final step of cytokinesis.[100,101] Studies have found that high expression of KIF20A is correlated with progression or poor prognosis of many types of cancers.[102-104] Furthermore, KIF20A is aberrantly expressed in HCC tissues and its expression may be associated with poor OS.[105]

According to the enrichment analyses of the two modules, module 1 was mostly associated with the cell cycle and module 2 was closely related to metabolic pathways. Furthermore, all 11 hub genes belonged to module 1, and most are associated with the cell cycle and enriched in the "cell cycle" pathway. A number of studies have elucidated that cell cycle disorders are closely related to human cancer.[43] Therefore, the carcinogenesis and progression of HCC may be associated with the cell cycle pathway, and we might be able to suppress HCC cell cycle progression, inhibit HCC cell proliferation, and reduce HCC malignancy by downregulating expression of the 11 hub genes identified herein.

Compared with previous studies, this work has several advantages. First, the integrated microarray analysis used a relatively large sample size drawn from several GEO datasets (GSE36376,[14] GSE39791,[15] GSE41804,[16] GSE54236,[17,18] GSE57957,[19] GSE62232,[20] GSE64041,[21] GSE69715,[22] GSE76427,[23] GSE84005, GSE87630,[24] GSE112790,[25] and GSE121248[26]). Second, functional enrichment analyses were performed to identify the main biological functions and pathways regulated by DEGs. Third, we established PPI networks, conducted module analysis, discovered potential biomarkers for diagnosis and prognosis of HCC, and performed correlation analysis of hub genes.

The limitations of this work were as follows. First, our results need to be verified by corresponding experimental studies. Second, we obtained data from the GEO database, and data quality cannot be verified. Finally, our study focused on genes that consistently showed significant changes across diverse datasets, without regard to sex, age, or grading and staging of the tumors from which the samples were derived.

Conclusion

In the present work, we identified 11 hub genes (CDK1, CCNB2, CDC20, CCNB1, TOP2A, CCNA2, MELK, PBK, TPX2, KIF20A, and AURKA) associated with the development and poor prognosis of HCC by integrated bioinformatics analysis. However, because our study was based on data analysis only, further experiments are required to confirm the results. Our findings will provide evidence and new insights to enhance approaches for the early diagnosis, prognosis, and treatment of HCC.

Appendix

Name	logFC	Type	Name	logFC	Type	Name	logFC	Type
CLEC1B	−3.33713	down	IL13RA2	−1.41685	down	CSRNP1	−1.20759	down
C9	−2.93972	down	PAMR1	−1.30729	down	ZGPAT	−1.283655	down
FCN3	−3.32589	down	CYP26A1	−1.82557	down	FAM150B	−1.096361	down
CYP1A2	−3.61576	down	JCHAIN	−1.90133	down	LPA	−1.568535	down
HAMP	−3.72675	down	ADIRF	−1.34189	down	ALPL	−1.135143	down
SLCO1B3	−2.84405	down	NNMT	−1.65555	down	S100A8	−1.149369	down
SPP2	−2.19217	down	TAT	−1.77239	down	GPM6A	−1.287388	down
APOF	−2.7681	down	MS4A6A	−1.02381	down	RCL1	−1.112209	down
NAT2	−2.42415	down	VNN1	−1.43431	down	CYP2B7P	−1.31568	down
CLRN3	−2.35658	down	HSD17B2	−1.27883	down	CCBE1	−1.131678	down
RDH16	−2.05491	down	FAM134B	−1.27241	down	LINC01093	−1.711116	down
SLC25A47	−2.3928	down	CTH	−1.2995	down	ST3GAL6	−1.008844	down
SLC22A1	−2.49578	down	ACAA1	−1.06823	down	TBX15	−1.105089	down
THRSP	−2.37999	down	OTC	−1.12724	down	BCO2	−1.572843	down
CLEC4G	−2.8104	down	CYP2A7	−1.7189	down	LUM	−1.123456	down
GBA3	−2.26827	down	C6	−1.48624	down	ESR1	−1.022446	down
DNASE1L3	−2.22313	down	GREM2	−1.17719	down	CYR61	−1.101151	down
SHBG	−1.96811	down	HPD	−1.56635	down	HBA2	−1.227362	down
LY6E	−2.01561	down	KBTBD11	−1.69651	down	KDM8	−1.06201	down
CDHR2	−2.02873	down	CA2	−1.30707	down	GADD45G	−1.126764	down
TMEM27	−2.33949	down	AKR7A3	−1.25278	down	ASPG	−1.055061	down
C7	−2.2597	down	RNF125	−1.03098	down	FCGR2B	−1.141195	down
FBP1	−1.79884	down	TTC36	−1.69649	down	ASPA	−1.025006	down
SRD5A2	−1.89056	down	PROM1	−1.44661	down	PBLD	−1.006234	down
MT1M	−3.02758	down	ADH6	−1.22168	down	HHIP	−1.37843	down
BBOX1	−2.04999	down	ETNPPL	−1.15368	down	CRP	−1.053533	down
APOA5	−1.774	down	HSD17B13	−1.50866	down	FREM2	−1.522232	down
IGFBP3	−1.70456	down	ANXA10	−1.62516	down	ADRA1A	−1.161964	down
ADH4	−2.15911	down	FXYD1	−1.41243	down	CNTN3	−1.176196	down
KMO	−1.91086	down	OGDHL	−1.30838	down	ITLN1	−1.034492	down
CYP8B1	−1.76864	down	PON1	−1.17061	down	UGT2B10	−1.031179	down
CXCL14	−2.31161	down	ACSM3	−1.52866	down	DIRAS3	−1.123875	down
GHR	−2.12511	down	SLC27A5	−1.33347	down	STEAP4	−1.061309	down
ADGRG7	−1.85853	down	LIFR	−1.47372	down	CYP4A22	−1.074568	down
MARCO	−2.25079	down	HABP2	−1.06311	down	TFPI2	−1.00071	down
MT1F	−2.59948	down	GRAMD1C	−1.07675	down	MT1A	−1.093671	down
CYP39A1	−1.86139	down	TKFC	−1.07859	down	RAB25	−1.081375	down
OIT3	−2.4803	down	STEAP3	−1.09586	down	RDH5	−1.006888	down
MBL2	−1.62953	down	IL1RAP	−1.21549	down	EPCAM	−1.336797	down
VIPR1	−1.89347	down	GCDH	−1.02343	down	SPINK1	3.633978	up
TDO2	−1.44452	down	HAL	−1.262	down	GPC3	2.807155	up
BHMT	−1.68706	down	GABARAPL1	−1.07919	down	AKR1B10	2.588879	up
PCK1	−1.85362	down	ID1	−1.32236	down	ASPM	1.804629	up
MT1H	−2.20509	down	INMT	−1.65209	down	CAP2	2.086341	up
AFM	−1.90272	down	SKAP1	−1.06342	down	TOP2A	2.232845	up
HGFAC	−2.18902	down	FETUB	−1.31249	down	PRC1	1.923672	up
MT1G	−2.64319	down	CFHR4	−1.07478	down	CDKN3	1.778794	up
CYP2A6	−2.05548	down	HSD11B1	−1.27605	down	CDC20	1.910919	up
CETP	−1.77384	down	G6PC	−1.00804	down	PTTG1	1.451774	up
SMIM24	−1.81333	down	MFAP4	−1.53268	down	NCAPG	1.551838	up
FCN2	−1.90705	down	ABCA8	−1.10284	down	LCN2	1.551605	up
FOSB	−2.12211	down	CYP2J2	−1.03103	down	CCL20	1.667526	up
ECM1	−1.72876	down	AKR1D1	−1.77452	down	FAM83D	1.570755	up
MT1X	−2.07498	down	GPD1	−1.01057	down	KIF20A	1.644679	up
SLC10A1	−1.70131	down	HAO1	−1.0889	down	PBK	1.6372	up
CRHBP	−2.55698	down	TACSTD2	−1.09909	down	AURKA	1.321582	up
F9	−1.86997	down	GCGR	−1.51767	down	UBE2T	1.429052	up
SRPX	−1.99247	down	C8orf4	−1.53773	down	NUSAP1	1.447842	up
CYP2C9	−1.7781	down	DMGDH	−1.11277	down	AKR1C3	1.315793	up
GNMT	−1.80416	down	PON3	−1.07722	down	MELK	1.397481	up
CYP2C8	−1.84304	down	MAT1A	−1.15605	down	SRXN1	1.101781	up
PGLYRP2	−1.57039	down	AADAT	−1.45288	down	HMMR	1.429779	up
LECT2	−1.71324	down	HPX	−1.1201	down	COL15A1	1.679907	up
HAO2	−2.05962	down	KCNN2	−1.76035	down	UBD	1.793116	up
FOS	−2.10062	down	ACADL	−1.16219	down	PLVAP	1.303945	up
ANGPTL6	−1.40198	down	SLC13A5	−1.18455	down	HSPB1	1.057592	up
CNDP1	−2.19859	down	ASS1	−1.22714	down	SPP1	1.372928	up
CXCL12	−1.91941	down	PRSS8	−1.15745	down	CENPF	1.339564	up
AGXT2	−1.39193	down	CPED1	−1.24941	down	SQLE	1.28364	up
ACOT12	−1.27878	down	FTCD	−1.25547	down	CEP55	1.130246	up
RSPO3	−1.62341	down	TMEM45A	−1.37559	down	KIF4A	1.431933	up
PZP	−1.76877	down	ALDH6A1	−1.08996	down	TRIP13	1.223148	up
COLEC10	−1.85319	down	SLC27A2	−1.02491	down	S100P	1.428178	up
HOGA1	−1.43807	down	ETFDH	−1.15312	down	DLGAP5	1.462148	up
MT1E	−1.80442	down	GCKR	−1.00475	down	ALDH3A1	1.048498	up
CYP3A4	−2.39818	down	OAT	−1.35234	down	CDCA5	1.222277	up
SLC39A5	−1.47867	down	SFRP5	−1.04433	down	SFN	1.002947	up
KLKB1	−1.57229	down	CYP3A43	−1.2044	down	ESM1	1.15394	up
LCAT	−1.87391	down	SLC6A12	−1.11241	down	TTK	1.378481	up
IGFALS	−1.94508	down	SOCS2	−1.38986	down	TPX2	1.091732	up
GLYAT	−1.72131	down	CYP4F2	−1.0376	down	PAGE4	1.240802	up
ADH1C	−1.64914	down	PHYHD1	−1.0017	down	COL4A1	1.236208	up
PROZ	−1.52487	down	SLC7A2	−1.05182	down	HJURP	1.034534	up
CYP2E1	−2.04247	down	C1RL	−1.01827	down	RACGAP1	1.407851	up
GSTZ1	−1.39923	down	PLG	−1.09969	down	IGF2BP3	1.019851	up
CHST4	−1.72521	down	CPS1	−1.29626	down	ANLN	1.53779	up
MFSD2A	−1.51912	down	ADAMTSL2	−1.24169	down	MCM2	1.109517	up
IDO2	−1.83679	down	MTTP	−1.02368	down	UBE2C	1.0809	up
SDS	−1.75694	down	CXCL2	−1.43349	down	NQO1	1.365462	up
ENO3	−1.37195	down	HRG	−1.00696	down	CCNB2	1.303069	up
GLS2	−1.75439	down	ACSL1	−1.14524	down	CCNA2	1.185444	up
DCN	−1.94676	down	MAN1C1	−1.18965	down	MUC13	1.14796	up
PLAC8	−1.80012	down	PCOLCE	−1.00609	down	MCM6	1.016314	up
SERPINA4	−1.2352	down	MT2A	−1.54319	down	CENPW	1.083208	up
ZG16	−1.56869	down	CD1D	−1.02692	down	TGM3	1.050965	up
BCHE	−1.77407	down	XDH	−1.11927	down	RAD51AP1	1.049223	up
CFP	−1.47416	down	PPP1R1A	−1.10299	down	THY1	1.046852	up
SLC38A4	−1.32606	down	HBB	−1.31952	down	NUF2	1.25884	up
ADH1A	−1.27277	down	RBP5	−1.04885	down	CKAP2L	1.054397	up
CLEC4M	−2.35545	down	CFHR3	−1.10107	down	MAGEA1	1.282995	up
CYP4A11	−1.5036	down	RELN	−1.02856	down	ECT2	1.065576	up
GYS2	−1.66608	down	NPY1R	−1.34248	down	ACSL4	1.16679	up
PHGDH	−1.40019	down	CLDN10	−1.34641	down	MDK	1.076885	up
BGN	−1.2236	down	ATF5	−1.11652	down	PEG10	1.104051	up
CIDEB	−1.27052	down	GNE	−1.04957	down	COX7B2	1.333566	up
CYP2C19	−1.55814	down	CYP4V2	−1.05634	down	CCNB1	1.362239	up
IYD	−1.22582	down	CD5L	−1.49237	down	RRM2	1.542665	up
C8A	−1.49471	down	TIMD4	−1.24178	down	REG3A	1.140254	up
STAB2	−1.82665	down	EGR1	−1.41173	down	CDK1	1.236442	up
CDA	−1.14527	down	GADD45B	−1.21416	down	KIF14	1.054151	up
HPGD	−1.37821	down	GPT2	−1.15763	down	ZIC2	1.320155	up
OLFML3	−1.38115	down	ACMSD	−1.02364	down	BUB1B	1.118801	up
PTH1R	−1.35746	down	CCL19	−1.32425	down	NDC80	1.234218	up
EPHX2	−1.29488	down	RBP1	−1.15142	down	NEK2	1.144213	up
COLEC11	−1.34767	down	ACADS	−1.05741	down	RBM24	1.220962	up
CYP2C18	−1.21134	down	MYOM2	−1.03989	down	NMRAL1P1	1.314053	up
AMDHD1	−1.14346	down	DCXR	−1.01852	down	DTL	1.283296	up
LYVE1	−1.69466	down	PLGLB1	−1.07364	down	SULT1C2	1.181554	up
GSPT2	−1.16851	down	CYP2B6	−1.37318	down	ROBO1	1.247873	up
C8B	−1.16715	down	UROC1	−1.06129	down	SSX1	1.001365	up
ADH1B	−1.77846	down	PDK4	−1.08546	down	FLVCR1	1.006476	up
DPT	−1.68413	down	PPARGC1A	−1.08395	down	CTHRC1	1.120384	up
AZGP1	−1.23501	down	NDRG2	−1.01145	down	ZWINT	1.066653	up
ALDH8A1	−1.37768	down	IGF1	−1.14785	down	GINS1	1.03249	up
RND3	−1.62821	down	ASPDH	−1.15589	down	SMPX	1.089408	up
SLC19A3	−1.18742	down	DBH	−1.50296	down	GPR158	1.061576	up
WDR72	−1.27875	down	PRG4	−1.13337	down

FC, fold change.
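The direction label in each row of the table above follows the sign of the numeric column. A minimal sketch of reading such rows programmatically is given here, assuming the numeric column is a log-scale fold change (its header is not shown in this part of the table) and noting that negative values are typeset with the Unicode minus sign (U+2212), which Python's `float()` does not accept. The helper names `parse_log_fc` and `direction` are hypothetical, chosen for illustration only.

```python
def parse_log_fc(text: str) -> float:
    """Parse a table value, normalising the Unicode minus sign (U+2212) to ASCII '-'."""
    return float(text.replace("\u2212", "-"))


def direction(log_fc: float) -> str:
    """Label a gene by the sign of its (assumed) log fold change."""
    return "up" if log_fc > 0 else "down"


# Three rows copied from the table; the first uses the Unicode minus sign.
rows = [("KMO", "\u22121.91086"), ("SPINK1", "3.633978"), ("TOP2A", "2.232845")]
labels = {gene: direction(parse_log_fc(value)) for gene, value in rows}
```

Normalising the minus sign first keeps the parsing step independent of how the table was typeset.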

Category	ID	Term	−log₁₀(FDR)	Count
BP	GO:0055114	Oxidation-reduction process	16.45646128	56
BP	GO:0019373	Epoxygenase P450 pathway	12.72414085	13
BP	GO:0006805	Xenobiotic metabolic process	6.801196269	16
BP	GO:0017144	Drug metabolic process	6.713310124	11
BP	GO:0045926	Negative regulation of growth	5.354060264	9
BP	GO:0071276	Cellular response to cadmium ion	4.258416753	8
BP	GO:0042738	Exogenous drug catabolic process	3.873727759	7
BP	GO:0071294	Cellular response to zinc ion	3.86110044	8
BP	GO:0008202	Steroid metabolic process	3.349012692	10
BP	GO:0097267	Omega-hydroxylase P450 pathway	3.048831706	6
BP	GO:0016098	Monoterpenoid metabolic process	2.284734835	5
BP	GO:0007067	Mitotic nuclear division	1.901221899	19
BP	GO:0006569	Tryptophan catabolic process	1.382839511	5
CC	GO:0031090	Organelle membrane	12.13504583	21
CC	GO:0070062	Extracellular exosome	10.96203625	117
CC	GO:0005576	Extracellular region	8.944226201	78
CC	GO:0005615	Extracellular space	8.079401711	68
CC	GO:0072562	Blood microparticle	3.941029653	17
CC	GO:0005579	Membrane attack complex	2.131478756	5
MF	GO:0016705	Oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen	12.77849851	19
MF	GO:0020037	Heme binding	11.82105086	25
MF	GO:0004497	Monooxygenase activity	11.5463498	18
MF	GO:0005506	Iron ion binding	10.69763162	25
MF	GO:0008392	Arachidonic acid epoxygenase activity	10.22404973	11
MF	GO:0019825	Oxygen binding	9.168975245	15
MF	GO:0016491	Oxidoreductase activity	5.664542324	22
MF	GO:0008395	Steroid hydroxylase activity	5.613513145	10
MF	GO:0070330	Aromatase activity	2.805232257	8
MF	GO:0004024	Alcohol dehydrogenase activity, zinc-dependent	2.38141982	5
MF	GO:0016712	Oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, reduced flavin or flavoprotein as one donor, and incorporation of one atom of oxygen	1.824019096	6
MF	GO:0004745	Retinol dehydrogenase activity	1.391280368	6

FDR, false discovery rate.
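The enrichment table reports significance as -log10(FDR), so larger values mean smaller FDR. As a reading aid, the short sketch below converts a tabulated value back to an FDR; the function name `fdr_from_neg_log10` is hypothetical and the two example values are copied from the table. Every value listed in the table is above about 1.30, which corresponds to an FDR below 0.05.

```python
def fdr_from_neg_log10(score: float) -> float:
    """Invert the -log10 transform: FDR = 10 ** (-score)."""
    return 10.0 ** (-score)


# Two entries copied from the table: the largest and the smallest BP value.
scores = {"GO:0055114": 16.45646128, "GO:0006569": 1.382839511}
fdr = {term: fdr_from_neg_log10(score) for term, score in scores.items()}
```

For example, the tryptophan catabolic process term (GO:0006569) maps to an FDR of roughly 0.041.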
