Bioinformatics Analysis based on Multiple Databases Identifies Hub Genes Associated with Hepatocellular Carcinoma.

Lu Zeng¹,², Xiude Fan¹, Xiaoyun Wang¹, Huan Deng¹, Kun Zhang¹, Xiaoge Zhang¹, Shan He¹,², Na Li¹, Qunying Han¹, Zhengwen Liu¹.

Abstract

BACKGROUND: Hepatocellular carcinoma (HCC) is the most common liver cancer, yet the mechanisms of hepatocarcinogenesis remain elusive.
OBJECTIVE: This study aims to mine hub genes associated with HCC using multiple databases.
METHODS: Datasets GSE45267, GSE60502 and GSE74656 were downloaded from the GEO database. Differentially expressed genes (DEGs) between HCC and control in each dataset were identified with the limma package. The GO term and KEGG pathway enrichment of the DEGs aggregated across the datasets (aggregated DEGs) was analyzed using the DAVID and KOBAS 3.0 databases. A protein-protein interaction (PPI) network of the aggregated DEGs was constructed using the STRING database. GSEA software was used to verify the biological processes. The association between hub genes and HCC prognosis was analyzed using patient information from the TCGA database with the survminer R package.
RESULTS: From GSE45267, GSE60502 and GSE74656, 7583, 2349, and 553 DEGs were identified, respectively. A total of 221 aggregated DEGs were identified, which were mainly enriched in 109 GO terms and 29 KEGG pathways. Cell cycle phase, mitotic cell cycle, cell division, nuclear division and mitosis were the most significant GO terms. Metabolic pathways, cell cycle, chemical carcinogenesis, retinol metabolism and fatty acid degradation were the main KEGG pathways. Nine hub genes (TOP2A, NDC80, CDK1, CCNB1, KIF11, BUB1, CCNB2, CCNA2 and TTK) were selected from the PPI network, and all of them were associated with the prognosis of HCC patients.
CONCLUSION: TOP2A, NDC80, CDK1, CCNB1, KIF11, BUB1, CCNB2, CCNA2 and TTK were hub genes in HCC and may be potential biomarkers of HCC and targets of HCC therapy.

Keywords: Hepatocellular carcinoma; bioinformatics; database; differentially expressed gene; hub gene; mRNA

Year: 2019 PMID: 32476992 PMCID: PMC7235396 DOI: 10.2174/1389202920666191011092410

Source DB: PubMed Journal: Curr Genomics ISSN: 1389-2029 Impact factor: 2.236

INTRODUCTION

Hepatocellular carcinoma (HCC), the most common type of primary liver cancer, is one of the most common malignant tumors of the digestive system [1]. HCC has many causes, including viral infection [2], alcohol intake [3], obesity [4], and environmental pollution [5]. Despite considerable effort, the molecular mechanisms of hepatocarcinogenesis remain elusive because of their complexity. With the advancement of precision cancer treatment, HCC research has turned toward gene mutation and genome-wide studies. A large number of genes have been reported to play roles in the carcinogenesis of HCC, which may be useful for the development of effective prevention and treatment regimens. However, identifying hub genes, and the potential mechanisms underlying their associations with HCC, from the information scattered across various studies remains a challenge. Bioinformatics, a discipline concerned with the collection, analysis and dissemination of genetic data to the research community, combines medicine, statistics and mathematics and involves database-like activities, in which persistent sets of data are maintained in a consistent state over essentially indefinite periods of time. It has been rapidly adopted in oncogene research. Through bioinformatics, the resources of public databases can be fully used to screen useful information from large amounts of data, improving the efficiency and accuracy of research [6]. The Cancer Genome Atlas (TCGA), Gene Expression Omnibus (GEO) and Oncomine are commonly used databases for cancer research. In this study, we used integrated bioinformatics to explore genes, especially hub genes, related to the pathogenesis of HCC and the mechanisms underlying the associations of the hub genes with HCC. The identification of hub genes and of potential mechanisms, such as signaling pathways associated with HCC, may shed light on HCC pathogenesis and thus provide new information for the precise management of HCC.

METHODS

Gene mRNA Expression Data

The GEO (Gene Expression Omnibus) database [7] (https://www.ncbi.nlm.nih.gov/geo/) is currently the largest and most comprehensive public database of gene mRNA expression data. We searched the GEO database for datasets from the most recent 5 years (2014 to 2018) by setting the keyword to “hepatocellular carcinoma”, the organism to “Homo sapiens”, and the study type to “expression profiling by array”. Eligible datasets had to include clinical samples of both HCC and non-HCC liver tissues. Finally, the datasets GSE45267 [8], GSE60502 [9], and GSE74656 (Yin HY, unpublished data, 2015), which provided data for both HCC and non-HCC samples, were selected. GSE45267 contained 46 HCC tissue samples and 41 non-HCC tissue samples, and its platform was GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array. A total of 18 HCC tissue samples and 18 non-HCC tissue samples were selected from GSE60502, whose platform was GPL96 [HG-U133A] Affymetrix Human Genome U133A Array. Five HCC tissue samples and 5 non-HCC tissue samples were chosen from GSE74656, whose platform was GPL16043 GeneChip® PrimeView™ Human Gene Expression Array (with External spike-in RNAs). HCC tissue samples formed the HCC group, and non-HCC tissue samples formed the control group. We downloaded the raw data from the website and processed them in R. Subsequently, the differentially expressed genes (DEGs) in the HCC group compared with the control group were identified in each of the 3 datasets using the “limma” package [10]. Genes with an adjusted p value (adj_pval) < 0.01 and |logFC| (absolute log fold change) > 0 were considered significant. Genes with logFC > 0 were regarded as up-regulated, and those with logFC < 0 as down-regulated. In this way, we obtained the DEGs of each dataset. The DEGs aggregated across the three datasets (aggregated DEGs) were identified with the “Robust Rank Aggregation (RRA)” package [11].
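The selection rule above can be illustrated with a minimal Python sketch. It is not the limma pipeline used in the study: the function names and toy gene labels are hypothetical, and the Benjamini-Hochberg adjustment is an assumption made for illustration, since the text does not state which adjustment method was applied.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def classify_degs(genes, log_fc, pvalues, alpha=0.01):
    """Split genes into up- and down-regulated DEGs.

    A gene is kept when its adjusted p value is below alpha and its
    logFC is non-zero; the sign of logFC decides the direction.
    """
    adjusted = benjamini_hochberg(pvalues)
    up, down = [], []
    for gene, lfc, adj_p in zip(genes, log_fc, adjusted):
        if adj_p >= alpha or lfc == 0:
            continue
        (up if lfc > 0 else down).append(gene)
    return up, down


# Toy input (hypothetical values, not data from the study).
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log_fc = [2.0, -1.5, 3.0, 0.0]
pvalues = [0.001, 0.002, 0.5, 0.003]
up, down = classify_degs(genes, log_fc, pvalues)
print("up-regulated:", up)
print("down-regulated:", down)
```

Because the fold-change cut-off is zero, the adjusted p value is the only effective filter in this rule; the sign of logFC only assigns the direction.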

Functional Enrichment Analysis

The aggregated DEGs were then subjected to functional enrichment analysis. GO (Gene Ontology) terms were analyzed with the online tool DAVID 6.7 [12] (https://david-d.ncifcrf.gov/), and terms with adj_pval < 0.01 were retained. KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways of the aggregated DEGs were analyzed on the KOBAS 3.0 website [13] (http://kobas.cbi.pku.edu.cn/), and pathways with p < 0.01 (unadjusted p value) were selected.
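Over-representation analyses of this kind are commonly based on a hypergeometric (one-sided Fisher) test. The sketch below shows that generic calculation in Python. It is an assumption for illustration and does not reproduce the exact statistics computed by DAVID or KOBAS; the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Probability of seeing at least `overlap` annotated genes.

    population: number of background genes
    annotated:  background genes annotated to the term or pathway
    sample:     number of genes in the query list (e.g. aggregated DEGs)
    overlap:    query genes annotated to the term or pathway
    """
    max_overlap = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Toy numbers (hypothetical): 10 background genes, 4 annotated,
# 3 genes in the query list, 2 of them annotated.
print(hypergeom_enrichment_p(10, 4, 3, 2))
```

A small p value indicates that the term contains more query genes than expected by chance; correction for testing many terms would still be applied afterwards.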

Protein-protein Interaction (PPI) Network

The protein-protein interaction (PPI) network of the aggregated DEGs was constructed using STRING (version 10.5) [14] (https://string-db.org/) and visualized using Cytoscape 3.6.1 software. Nodes with degree ≥ 1 were retained in the PPI network. Genes whose encoded proteins had degree ≥ 50 were considered hub genes. The proteins encoded by these hub genes may interact with many other proteins and occupy central positions in the PPI network.
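The degree-based rule can be sketched in a few lines of Python. This is only an illustration of counting interaction partners in an edge list; it is not the STRING or Cytoscape workflow used in the study, and the function names and toy edges are hypothetical.

```python
from collections import defaultdict


def node_degrees(edges):
    """Count distinct interaction partners for every node."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


def hub_genes(edges, min_degree=50):
    """Return (gene, degree) pairs with degree >= min_degree, highest first."""
    degrees = node_degrees(edges)
    hubs = [(gene, deg) for gene, deg in degrees.items() if deg >= min_degree]
    return sorted(hubs, key=lambda item: (-item[1], item[0]))


# Toy network (hypothetical edges); a low threshold is used because
# the example has only four nodes.
toy_edges = [
    ("TOP2A", "CDK1"),
    ("TOP2A", "CCNB1"),
    ("CDK1", "CCNB1"),
    ("TOP2A", "TTK"),
    ("CDK1", "TOP2A"),  # duplicate interaction, counted once
]
print(hub_genes(toy_edges, min_degree=2))
```

Storing partners in sets means duplicated or reversed edges do not inflate the degree.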

Gene Set Enrichment Analysis (GSEA)

GSEA software (version 2.2.4) [15] was used to perform GSEA. The HCC and control samples from GSE45267 were chosen. The Hallmark gene sets in MSigDB (Molecular Signatures Database) were selected as the reference gene sets. The number of permutations was 1,000. Gene sets with a normalized enrichment score (NES) > 1, a nominal p value (NOM p-val) < 0.05 and a false discovery rate (FDR) < 0.25 were regarded as significantly enriched.
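The three cut-offs can be expressed as a simple filter. The Python sketch below applies them to the NES, NOM p-val and FDR values reported in Table 4, plus one clearly hypothetical gene set that fails the thresholds; the function and variable names are assumptions for illustration and are not part of the GSEA software.

```python
def significant_gene_sets(results, min_nes=1.0, max_nom_p=0.05, max_fdr=0.25):
    """Return names of gene sets that pass all three thresholds."""
    return [
        r["name"]
        for r in results
        if r["nes"] > min_nes and r["nom_p"] < max_nom_p and r["fdr"] < max_fdr
    ]


# Values for the first five rows are taken from Table 4.
GSEA_RESULTS = [
    {"name": "MITOTIC_SPINDLE", "nes": 1.611, "nom_p": 0.002, "fdr": 0.214},
    {"name": "G2M_CHECKPOINT", "nes": 1.553, "nom_p": 0.004, "fdr": 0.125},
    {"name": "E2F_TARGETS", "nes": 1.540, "nom_p": 0.004, "fdr": 0.070},
    {"name": "SPERMATOGENESIS", "nes": 1.550, "nom_p": 0.020, "fdr": 0.097},
    {"name": "DNA_REPAIR", "nes": 1.578, "nom_p": 0.032, "fdr": 0.151},
    # Hypothetical row, added only to show a gene set being rejected.
    {"name": "HYPOTHETICAL_SET", "nes": 1.200, "nom_p": 0.200, "fdr": 0.400},
]
print(significant_gene_sets(GSEA_RESULTS))
```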

Overall Survival Analysis

We downloaded the clinical information of HCC patients from the TCGA (The Cancer Genome Atlas) database [16] (https://cancergenome.nih.gov/). These data included the time of diagnosis, the time of death, and the gene mRNA expression levels of the patients. We processed the data with Perl for further analysis. The survival time and survival status of all HCC patients and the mRNA expression data of each hub gene were extracted using the “hash” package [17]. The association between each hub gene and the prognosis of HCC patients was analyzed using the “survminer” package [18], and p < 0.05 (unadjusted p value) was considered significant.
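Survival curves of this kind are commonly Kaplan-Meier estimates. The sketch below shows how such a curve is computed from survival times and survival status. It assumes a Kaplan-Meier estimator, which the text does not name explicitly, and it is a plain Python illustration rather than the survminer workflow used in the study; the function name and toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times:  follow-up time of each patient
    events: 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Toy cohort (hypothetical): the patient at time 2 is censored.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Censored patients reduce the number at risk without lowering the curve, which is why the time of diagnosis, the time of death and the survival status are all needed.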

RESULTS

Screening of DEGs Aggregated in the Datasets

Datasets GSE45267, GSE60502, and GSE74656, which were submitted within 5 years (from 2014 to 2018) and contain data for both HCC and non-HCC samples, were chosen for study. Boxplots of the three datasets are shown in Supplementary Fig. () and confirm that the expression values in the three datasets had small deviations in distribution. Comparison of the HCC group with the control group identified 7,583 (4,020 up-regulated and 3,563 down-regulated), 2,349 (1,247 up-regulated and 1,102 down-regulated) and 553 (297 up-regulated and 256 down-regulated) DEGs in GSE45267, GSE60502 and GSE74656, respectively. The distributions of the DEGs in the three datasets are shown in Fig. , and the top 20 up-regulated and down-regulated DEGs of GSE45267, GSE60502 and GSE74656 are shown in Supplementary Tables , respectively. A total of 221 aggregated DEGs were obtained by further analysis of the DEGs. The top 20 up- and down-regulated aggregated DEGs are displayed in Table 1, and the data are also shown as a heatmap (Fig. ). The trend of mRNA expression of the aggregated DEGs screened by the RRA method was consistent across all three datasets, indicating that the selected aggregated DEGs were representative.

GO Analysis

We performed GO analysis on the aggregated DEGs and obtained a total of 109 significant GO terms (adj_pval < 0.01). The GO terms covered the BP (Biological Process), MF (Molecular Function), and CC (Cellular Component) categories. The numbers of genes enriched in the top 5 terms of each of the three GO categories are shown in Fig. (3A), and detailed information is given in Table 2. A circle diagram shows the gene distribution of the main enriched GO terms (Fig. 3B). Cell cycle phase (adj_pval = 5.57E-14), mitotic cell cycle (adj_pval = 1.45E-13), cell division (adj_pval = 2.62E-13), nuclear division (adj_pval = 2.76E-13), and mitosis (adj_pval = 2.76E-13) were the top 5 BP terms. Electron carrier activity (adj_pval = 1.35E-11), oxygen binding (adj_pval = 1.78E-10), heme binding (adj_pval = 7.11E-08), cadmium ion binding (adj_pval = 9.47E-08), and tetrapyrrole binding (adj_pval = 1.45E-07) were the top 5 MF terms. Spindle (adj_pval = 1.40E-11), microsome (adj_pval = 2.37E-08), vesicular fraction (adj_pval = 3.64E-08), condensed chromosome kinetochore (adj_pval = 7.84E-08), and condensed chromosome, centromeric region (adj_pval = 2.48E-07) were the top 5 CC terms. Each of these GO terms was enriched with many of the selected DEGs. Cell cycle phase, mitotic cell cycle and cell division were the most significantly enriched GO terms.

Fig. (3)

The number of genes enriched in the top five biological process (BP), molecular function (MF) and cellular component (CC) terms (A) and the circle diagram of the top five GO enriched biological process (BP), molecular function (MF) and cellular component (CC) terms (B).

Enriched KEGG Pathways

We converted the gene symbols of all the aggregated DEGs to Ensembl IDs with the DAVID database, and then analyzed the KEGG pathways with KOBAS 3.0. We obtained 29 significantly enriched KEGG pathways (p < 0.01, Table 3). Metabolic pathways (hsa01100, p = 8.71E-12), cell cycle (hsa04110, p = 1.48E-10), chemical carcinogenesis (hsa05204, p = 5.45E-09), retinol metabolism (hsa00830, p = 1.18E-08), and fatty acid degradation (hsa00071, p = 1.36E-08) were the most significantly enriched pathways. Using Cytoscape for visualization, we showed the relationship between the enriched KEGG pathways and the aggregated DEGs (Fig. ). The down-regulated DEGs were enriched in most of the KEGG pathways, mainly metabolic pathways (hsa01100), chemical carcinogenesis (hsa05204), metabolism of xenobiotics by cytochrome P450 (hsa00980), retinol metabolism (hsa00830), and drug metabolism - cytochrome P450 (hsa00982). The up-regulated DEGs were mainly concentrated in cell cycle (hsa04110), oocyte meiosis (hsa04114), p53 signaling pathway (hsa04115), and progesterone-mediated oocyte maturation (hsa04914). CYP3A4, CYP1A2, ADH1C, ADH6, and CYP2E1 were the genes that participated in the largest number of KEGG pathways, and all of them were down-regulated in HCC.

Protein-protein Interaction Network Construction

The STRING website was used to establish the PPI network of all the aggregated DEGs. A total of 161 proteins were identified as interacting with other proteins. Of these 161 proteins, 87 were up-regulated and 74 were down-regulated. We observed close connections among the up-regulated proteins (Fig. ). Genes whose proteins had degree ≥ 50 were considered hub genes. TOP2A, NDC80, CDK1, CCNB1, KIF11, BUB1, CCNB2, CCNA2, and TTK were selected as hub genes, with degrees of 65, 54, 53, 52, 51, 50, 50, 50, and 50, respectively. All of them were up-regulated genes.

GSEA Analysis

GSE45267 was chosen as a validation dataset to verify the biological processes associated with HCC. Comparing HCC with control, we found that the main enriched gene sets associated with HCC were mitotic spindle (NOM p-val = 0.002, FDR = 0.214, Fig. ), G2M checkpoint (NOM p-val = 0.004, FDR = 0.125, Fig. ), E2F targets (NOM p-val = 0.004, FDR = 0.070, Fig. ), spermatogenesis (NOM p-val = 0.020, FDR = 0.097, Fig. ) and DNA repair (NOM p-val = 0.032, FDR = 0.151, Fig. ). The top ten core enriched genes in each enriched gene set are presented in Table 4.

Overall Survival Analysis

Clinical information for a total of 373 HCC patients was downloaded from the TCGA database. For each hub gene, a high expression group (n = 187) and a low expression group (n = 186) were defined based on the median mRNA expression value of that gene in all HCC samples. The median mRNA expression values of the hub genes were 1290.13 (TOP2A), 286.75 (NDC80), 529.68 (CDK1), 739.87 (CCNB1), 285.39 (KIF11), 295.75 (BUB1), 341.56 (CCNB2), 370.13 (CCNA2) and 154.84 (TTK). Patients with mRNA expression greater than or equal to the median value were assigned to the high expression group, and those with mRNA expression below the median value were assigned to the low expression group. Subsequently, overall survival analyses of HCC patients with different mRNA expression levels of the selected hub genes were performed. The results are shown in Fig. (6). The p values of the survival curves of the nine hub genes were 0.0042 for TOP2A (Fig. 6A), 0.0012 for NDC80 (Fig. 6B), 0.0012 for CDK1 (Fig. 6C), 0.00024 for CCNB1 (Fig. 6D), 0.00044 for KIF11 (Fig. 6E), 0.00015 for BUB1 (Fig. 6F), 0.049 for CCNB2 (Fig. 6G), 0.0065 for CCNA2 (Fig. 6H) and 0.0016 for TTK (Fig. 6I). Among these genes, BUB1, CCNB1 and KIF11 were the most significant.
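The median split described above can be illustrated with a short Python sketch. The function name and the toy expression values are hypothetical; only the grouping rule (values at or above the median go to the high expression group) follows the text.

```python
from statistics import median


def split_by_median(expression):
    """Split patients into high and low expression groups.

    expression: dict mapping a patient identifier to an expression value.
    Patients at or above the median are 'high'; the rest are 'low'.
    """
    cutoff = median(expression.values())
    high = sorted(p for p, value in expression.items() if value >= cutoff)
    low = sorted(p for p, value in expression.items() if value < cutoff)
    return cutoff, high, low


# Toy cohort of 373 patients with distinct, made-up expression values.
toy_expression = {f"P{i:03d}": float(i) for i in range(373)}
cutoff, high, low = split_by_median(toy_expression)
print(cutoff, len(high), len(low))
```

With an odd number of patients and distinct values, the patient at the median falls into the high group, which is consistent with the group sizes of 187 and 186 reported for 373 patients.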

Fig. (6)

Associations of the mRNA expression levels of hub genes with the overall survival of HCC patients. (A) TOP2A; (B) NDC80; (C) CDK1; (D) CCNB1; (E) KIF11; (F) BUB1; (G) CCNB2; (H) CCNA2; (I) TTK. The area around each survival curve represents the confidence interval of the survival curve.

DISCUSSION

Alterations in gene mRNA expression and gene mutations play important roles in the occurrence and progression of HCC. Clarifying the changes of genes in HCC and their underlying mechanisms of action will provide a basis for biomarkers of HCC diagnosis and prognosis and for molecular targets of anti-tumor therapy. Gene chip detection provides a method for screening HCC-specific genes and for investigating the pathogenesis and treatment of HCC. This study screened genes with significant differences in mRNA expression between HCC and normal tissues by examining three chip datasets from the GEO database, and explored their biological roles in HCC by analyzing multiple datasets.

We first obtained the DEGs of each selected dataset and then, to reduce the influence of differences in measurement platforms and laboratory conditions among the datasets, used the “RRA” package to obtain the DEGs aggregated across the datasets (aggregated DEGs). We then analyzed the biological processes in which these genes participate. A total of 109 GO terms were significantly associated with these DEGs. The five most significant BP terms were cell cycle phase, mitotic cell cycle, cell division, nuclear division and mitosis. The five most significant MF terms were electron carrier activity, oxygen binding, heme binding, cadmium ion binding and tetrapyrrole binding. The five most significant CC terms were spindle, microsome, vesicular fraction, condensed chromosome kinetochore and condensed chromosome, centromeric region. All these GO terms play important roles in maintaining the normal growth and metabolism of the organism. Through the KOBAS website, we obtained 29 significantly enriched KEGG pathways. The 10 most significant pathways were metabolic pathways, cell cycle, chemical carcinogenesis, retinol metabolism, fatty acid degradation, metabolism of xenobiotics by cytochrome P450, drug metabolism - cytochrome P450, tyrosine metabolism, p53 signaling pathway and mineral absorption.

Using the STRING database, we constructed the PPI network of the aggregated DEGs and found that TOP2A, NDC80, CDK1, CCNB1, KIF11, BUB1, CCNB2, CCNA2 and TTK were the hub genes of the network. All the hub genes were up-regulated. To further verify that the aggregated DEGs were involved in important biological processes in HCC pathogenesis, we chose GSE45267 as a validation dataset for GSEA. The results showed that mitotic spindle, G2M checkpoint, E2F targets, spermatogenesis, and DNA repair were the most significantly enriched gene sets potentially associated with HCC. These findings, together with the results of the GO and KEGG analyses, indicate that important biological processes are implicated in the development of HCC. The core genes of each enriched gene set in GSEA were obtained. Some important core genes, such as TTK, NDC80, TOP2A, KIF11, and CDK1, were also among the hub genes selected from the PPI network, further supporting the key roles of the hub genes selected from the three datasets. Moreover, survival analyses showed that the expression of all the hub genes (TOP2A, NDC80, CDK1, CCNB1, KIF11, BUB1, CCNB2, CCNA2, and TTK) was associated with the prognosis of HCC patients.

TOP2A (DNA Topoisomerase II Alpha) plays a role in DNA transcription. Abnormal mRNA expression of TOP2A has been reported in many tumors, mainly breast cancer [19] and malignant peripheral nerve sheath tumors [20]. TOP2A mRNA expression was shown to differ between HCC tumors and adjacent non-tumoral liver, and microarray analysis of 172 HCC tissues found that increased mRNA expression of TOP2A was related to advanced histological grading, microvascular invasion, early age of onset of the malignancy and chemoresistance [21]. In this study, TOP2A had the highest degree in the PPI network, although it was not enriched in the main GO terms and KEGG pathways.

NDC80 (Nuclear division cycle 80), also called Hec1, is a newly discovered gene that plays a role in tumorigenesis. Studies have confirmed its association with HCC progression, mainly by reducing apoptosis and cell cycle arrest at S-phase [22, 23]. Our analysis also showed that NDC80 was highly expressed in HCC, ranked second among all hub genes, and participated in many GO terms. CDK1 (Cyclin dependent kinase 1) ranked third among the hub genes in our study and was also enriched in multiple GO terms and KEGG pathways. A previous study demonstrated that the cell cycle is the main biological pathway in which CDK1 participates [24]. CDK1 was found to play a particularly important role in KRAS mutant tumours [25] and to be an important partner in the regulation of apoptin-induced apoptosis [26]. In the development of HCC, CDK1 was also found to be closely related to miR-378 [27] and miR-582-5p [28].

CCNB1 (Cyclin B1), CCNB2 (Cyclin B2), and CCNA2 (Cyclin A2) were also hub genes in the PPI network, with degrees of 52, 50, and 50, respectively. They are all members of the cyclin family. Cyclin expression and degradation play an important role in cell mitosis [29], mainly by regulating CDK kinase [30]. All three genes, CCNB1 [31], CCNB2 [32] and CCNA2 [33], have been shown to play roles in HCC. The mechanism of action of the cyclin family in HCC deserves further exploration. KIF11 (Kinesin family member 11) has also been shown to play a role in the pathogenesis of cancer [34], but few studies have addressed its role in HCC. BUB1 [35] and TTK [36] were shown to be elevated in HCC tissues and may promote the progression and metastasis of HCC, but relevant studies are still scarce. Overall survival analysis of the hub genes in our study showed that their expression was associated in every case with the prognosis of HCC patients, most significantly for BUB1, CCNB1 and KIF11. All of these results provide clues and directions for the further study of these genes in HCC.

The present study chose the GSE45267, GSE60502, and GSE74656 datasets for analysis. The numbers of samples differed among the three datasets. However, the boxplots of the three datasets indicated high data quality. In the screening of DEGs in each dataset, adj_pval < 0.01 and |logFC| > 0 were considered significant. The logFC threshold was set at a low level so that genes with small differences in mRNA expression could also be included. This may avoid overlooking genes that have an important biological role but only a modest difference in expression. The “RRA” package was used to identify the genes whose mRNA expression differed between HCC tissues and normal liver tissues across the three datasets. We use the term “aggregated DEGs” because the DEGs identified by the “RRA” method differ from the “overlapping DEGs” of previous studies, and “aggregated DEGs” better indicates the meaning of the DEGs we screened. “RRA” is a method that uses a probabilistic model to aggregate ranked lists [11, 37]. It can properly aggregate gene sets from different microarray platforms, and the final results represent the ranking after aggregation rather than the ranking in each dataset.

In this study, a total of 221 aggregated DEGs were obtained. Among the top 20 up- and down-regulated aggregated DEGs (Table 1), 5 hub genes (CDK1, CCNB1, KIF11, BUB1 and CCNB2) obtained from STRING were not listed at the top of the aggregated DEG ranking. The hub genes encode key proteins that interact more strongly with other proteins in our queried gene list but do not necessarily show the most significant differences in mRNA expression between tumor and normal tissues. In the GO term and KEGG pathway analyses, TOP2A, the top hub gene with the highest degree in our study, was not enriched in the most significant GO terms and KEGG pathways, and the other hub genes were enriched in only a few KEGG pathways, whereas GSEA confirmed the involvement of TOP2A and other hub genes in important pathways. This inconsistency may be caused by the different principles of GO/KEGG enrichment analysis and GSEA. GO/KEGG enrichment analysis requires a threshold to be set before analysis and focuses only on genes with obvious expression differences. It may therefore omit genes that have important biological functions but are not significantly different in expression, and regulatory networks between genes may be overlooked. In contrast, GSEA considers gene sets, emphasizes the identification of synergistic gene sets among all target genes, and can include genes that were left out by GO/KEGG enrichment analysis. In the present study, p < 0.01 was set as the threshold of the KEGG enrichment analysis, which could lead to the overlooking of some marginally significant biological pathways in which the hub genes are involved. This might also be the reason why some of the selected hub genes, such as TOP2A, were not enriched in the most significant KEGG pathways but appeared in the results of GSEA.

From the findings of this study, we speculate that there could be important regulatory networks between TOP2A and other proteins that have potentially important roles in HCC. Indeed, our GSEA showed that TOP2A was among the core enriched genes of the main enriched gene sets associated with HCC, including mitotic spindle, G2M checkpoint and E2F targets. Survival analysis showed that the expression level of TOP2A was associated with the overall survival of HCC patients. Previous studies have also shown overexpression of TOP2A in HCC [21, 38]. Therefore, TOP2A may play an important role in HCC through interaction with other proteins, although further study is needed to confirm this hypothesis.

In this study, we chose the most recent datasets for analysis. Boxplots of the datasets suggested high data quality. Data analysis was performed using the RRA method, which is considered highly appropriate for analyzing datasets from multiple databases [11, 37]. We also validated the findings using a validation dataset. All of these may reflect the reliability and comprehensiveness of the hub genes identified. Among the hub genes identified in the present study, TOP2A [39-45], CDK1 [39, 42, 43, 45-48], CCNB1 [41, 43, 45, 46], BUB1 [39-41, 46, 49], CCNB2 [40, 41, 43, 44], CCNA2 [39, 43], and TTK [45] have been reported as hub genes that participate in HCC in previous studies. The present study confirmed the core roles of these genes in HCC. However, NDC80 and KIF11 have not been described as hub genes involved in HCC in previous studies. Therefore, the findings of the present study enrich the information on hub genes associated with HCC.

The present study was based on bioinformatics methods. Some studies have confirmed the roles in HCC of certain hub genes mined here, but the specific mechanisms by which these hub genes are involved in HCC remain mostly unexplored. Therefore, laboratory and clinical studies will be required to verify the reliability of our results and to identify the genes most closely related to the pathogenesis of HCC, in order to provide new and precise information for the prevention and control of HCC.

CONCLUSION

Our study identified 221 aggregated DEGs and explored the main GO terms and KEGG pathways involved in HCC. We selected nine hub genes (TOP2A, NDC80, CDK1, CCNB1, KIF11, BUB1, CCNB2, CCNA2, and TTK) and confirmed that their mRNA expression levels were all related to the prognosis of HCC. These findings may shed light on future investigations pertaining to the diagnosis and therapy of HCC.

Table 1

The top 20 up-regulated and down-regulated DEGs aggregated in GSE45267, GSE60502 and GSE74656.

Up-regulated Genes	Score	Down-regulated Genes	Score
NDC80	3.81E-08	FCN3	1.16E-08
SPINK1	4.89E-07	HAMP	1.47E-07
RACGAP1	7.81E-06	APOF	2.20E-07
ACSL4	7.81E-06	CYP1A2	2.64E-07
GINS1	1.09E-05	SLC22A1	4.30E-07
PLVAP	2.10E-05	CRHBP	9.44E-07
EZH2	4.16E-05	GYS2	9.44E-07
GPC3	5.91E-05	HGFAC	5.23E-06
CEP55	6.25E-05	AKR1D1	6.32E-06
DLGAP5	7.43E-05	CLEC4M	6.32E-06
CENPE	8.95E-05	MT1F	1.05E-05
CDC20	0.00011	CFHR4	1.77E-05
TOP2A	0.000125	LPA	2.09E-05
TTK	0.000141	DNASE1L3	2.65E-05
KNTC1	0.000187	MT1E	2.75E-05
ZWILCH	0.000194	GNMT	3.54E-05
NUSAP1	0.000258	KCNN2	3.66E-05
CTHRC1	0.000258	CFP	3.79E-05
SHCBP1	0.000278	CYP2B6	4.32E-05
NCAPG	0.000281	CYP2E1	4.98E-05

Abbreviation: DEGs, Differentially Expressed Genes.

Table 2

The main GO terms of the DEGs aggregated in GSE45267, GSE60502 and GSE74656.

Category	ID	Term	Count	adj_pval	Genes
BP	GO:0022403	cell cycle phase	31	5.57E-14	PRC1, NEK2, DBF4, KNTC1, TTK, CEP55, KIF2C, SAC3D1, NCAPG, BUB1, ZWILCH, CCNA2, ASPM, CDCA3, CDC7, CDK1, KIF11, DLGAP5, TPX2, NUSAP1, CENPE, NDC80, CDC20, PBK, TACC3, CDK4, CDKN3, CCNB1, CCNB2, CKS2, BUB1B
BP	GO:0000278	mitotic cell cycle	29	1.45E-13	PRC1, NEK2, DBF4, KNTC1, TTK, CEP55, KIF2C, SAC3D1, NCAPG, BUB1, ZWILCH, CCNA2, ASPM, CDCA3, CDC7, CDK1, KIF11, DLGAP5, TPX2, NUSAP1, CENPE, NDC80, CDC20, PBK, CDK4, CDKN3, CCNB1, CCNB2, BUB1B
BP	GO:0051301	cell division	26	2.62E-13	PRC1, NEK2, KNTC1, CEP55, SAC3D1, NCAPG, BUB1, ZWILCH, CCNA2, ASPM, CDCA3, CDC7, CDK1, KIF11, NUSAP1, CENPE, NDC80, CDC20, CDK4, RACGAP1, MCM5, CCNB1, CCNB2, CKS2, BUB1B, CDCA7L
BP	GO:0000280	nuclear division	23	2.76E-13	CDK1, KIF11, NEK2, DLGAP5, TPX2, KNTC1, NUSAP1, NDC80, CENPE, CDC20, PBK, CEP55, CCNB1, KIF2C, CCNB2, SAC3D1, NCAPG, BUB1, BUB1B, ZWILCH, CCNA2, ASPM, CDCA3
BP	GO:0007067	mitosis	23	2.76E-13	CDK1, KIF11, NEK2, DLGAP5, TPX2, KNTC1, NUSAP1, NDC80, CENPE, CDC20, PBK, CEP55, CCNB1, KIF2C, CCNB2, SAC3D1, NCAPG, BUB1, BUB1B, ZWILCH, CCNA2, ASPM, CDCA3
MF	GO:0009055	electron carrier activity	21	1.35E-11	CYP3A4, STEAP3, GCDH, CYP2C19, CYP2B6, ADH6, CYP26A1, KMO, CYP4F12, CYP2E1, CYP1A2, CYP4A11, CYP39A1, HAAO, CYP2A6, AKR7A3, CYP2A7, CYP4F2, RDH16
MF	GO:0019825	oxygen binding	11	1.78E-10	CYP3A4, CYP4A11, CYP2C19, CYP2B6, HAAO, CYP26A1, CYP2A7, CYP4F12, CYP2E1, CYP1A2, CYP4F2
MF	GO:0020037	heme binding	13	7.11E-08	CYP3A4, CYP4A11, CYP39A1, CYP27A1, CYP2C19, CYP2B6, CYP26A1, CYP2A6, CYP2A7, CYP4F12, CYP2E1, CYP1A2, CYP4F2
MF	GO:0046870	cadmium ion binding	6	9.47E-08	MT1M, MT1E, MT1H, MT1G, MT1X, MT1F
MF	GO:0046906	tetrapyrrole binding	13	1.45E-07	CYP3A4, CYP4A11, CYP39A1, CYP27A1, CYP2C19, CYP2B6, CYP26A1, CYP2A6, CYP2A7, CYP4F12, CYP2E1, CYP1A2, CYP4F2
CC	GO:0005819	spindle	18	1.40E-11	CDK1, KIF4A, KIF11, PRC1, NEK2, DLGAP5, KNTC1, TPX2, NUSAP1, TTK, CENPE, CDC20, CBX1, RACGAP1, SAC3D1, BUB1, BUB1B, ASPM
CC	GO:0005792	microsome	18	2.37E-08	CYP3A4, AQP9, CYP2C19, CYP2B6, IGFALS, CYP26A1, CYP4F12, CYP2E1, CYP1A2, CYP4A11, CYP39A1, CYP2A6, SORT1, CYP2A7, SRD5A2, CYP4F2, ACSL4, RDH16
CC	GO:0042598	vesicular fraction	18	3.64E-08	CYP3A4, AQP9, CYP2C19, CYP2B6, IGFALS, CYP26A1, CYP4F12, CYP2E1, CYP1A2, CYP4A11, CYP39A1, CYP2A6, SORT1, CYP2A7, SRD5A2, CYP4F2, ACSL4, RDH16
CC	GO:0000777	condensed chromosome kinetochore	10	7.84E-08	KIF2C, CENPM, HJURP, KNTC1, BUB1, BUB1B, CENPE, NDC80, CENPK, ZWILCH
CC	GO:0000779	condensed chromosome, centromeric region	10	2.48E-07	KIF2C, CENPM, HJURP, KNTC1, BUB1, BUB1B, CENPE, NDC80, CENPK, ZWILCH

Abbreviations: DEGs, Differentially Expressed Genes; BP, Biological Process; MF, Molecular Function; CC, Cellular Component.

Table 3

The KEGG pathways of the DEGs aggregated in GSE45267, GSE60502 and GSE74656.

S. No.	Term	ID	Input Number	p
1	Metabolic pathways	hsa01100	38	8.71E-12
2	Cell cycle	hsa04110	13	1.48E-10
3	Chemical carcinogenesis	hsa05204	10	5.45E-09
4	Retinol metabolism	hsa00830	9	1.18E-08
5	Fatty acid degradation	hsa00071	8	1.36E-08
6	Metabolism of xenobiotics by cytochrome P450	hsa00980	9	2.65E-08
7	Drug metabolism - cytochrome P450	hsa00982	8	2.46E-07
8	Tyrosine metabolism	hsa00350	6	1.13E-06
9	p53 signaling pathway	hsa04115	7	3.51E-06
10	Mineral absorption	hsa04978	6	9.06E-06
11	Progesterone-mediated oocyte maturation	hsa04914	7	2.80E-05
12	Caffeine metabolism	hsa00232	3	3.15E-05
13	Glycolysis / Gluconeogenesis	hsa00010	6	3.43E-05
14	Tryptophan metabolism	hsa00380	5	3.66E-05
15	Oocyte meiosis	hsa04114	7	0.0001
16	Linoleic acid metabolism	hsa00591	4	0.000182
17	Steroid hormone biosynthesis	hsa00140	5	0.000216
18	Arachidonic acid metabolism	hsa00590	5	0.000286
19	Prion diseases	hsa05020	4	0.000313
20	Bile secretion	hsa04976	5	0.000448
21	PPAR signaling pathway	hsa03320	5	0.000505
22	Primary bile acid biosynthesis	hsa00120	3	0.000594
23	Complement and coagulation cascades	hsa04610	5	0.00071
24	Drug metabolism - other enzymes	hsa00983	4	0.000815
25	Melanoma	hsa05218	4	0.004018
26	AMPK signaling pathway	hsa04152	5	0.004832
27	FoxO signaling pathway	hsa04068	5	0.006584
28	HTLV-I infection	hsa05166	7	0.007323
29	Valine, leucine and isoleucine degradation	hsa00280	3	0.008979
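
The "Input Number" and p columns of an enrichment table like this one typically come from an over-representation test. As a rough illustration only, the sketch below computes a one-sided hypergeometric p value for a single pathway. It assumes a hypergeometric test, which may differ from the statistic and multiple-testing correction actually applied by KOBAS 3.0 in this study. The pathway size and background size in the usage lines are hypothetical placeholders, not values from the paper, so the result is not expected to reproduce the tabulated p values.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided over-representation p value, P(X >= k).

    k: input genes that fall in the pathway ("Input Number")
    n: size of the input gene list
    K: genes annotated to the pathway in the background
    N: size of the background gene set
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper hits.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Usage: 13 of the 221 aggregated DEGs map to "Cell cycle" (from the table).
# The pathway size (124) and background size (20000) are HYPOTHETICAL.
p_cell_cycle = hypergeom_enrichment_p(k=13, n=221, K=124, N=20000)
print(f"{p_cell_cycle:.3e}")
```

Exact integer arithmetic via `math.comb` avoids the overflow and underflow that a naive floating-point factorial would hit for gene lists of this size.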

Table 4

Gene sets enriched in HCC in GSE45267, identified by GSEA.

Enrichment Gene Set	NES	NOM p-val	FDR	Top Ten Core Enriched Genes
MITOTIC_SPINDLE	1.611	0.002	0.214	TTK, NDC80, RACGAP1, KIF4A, PRC1, TOP2A, ANLN, DLGAP5, KIF11, NUSAP1
G2M_CHECKPOINT	1.553	0.004	0.125	TTK, NDC80, CDKN3, RACGAP1, KIF4A, PRC1, TOP2A, PBK, CDC20, HMMR
E2F_TARGETS	1.540	0.004	0.070	GINS1, CDKN3, RACGAP1, KIF4A, TOP2A, BUB1B, CDC20, HMMR, EZH2, DLGAP5
SPERMATOGENESIS	1.550	0.020	0.097	TTK, CDKN3, EZH2, NEK2, AURKA, CDK1, RFC4, CLGN, KIF2C, RPL39L
DNA_REPAIR	1.578	0.032	0.151	ZWINT, RFC4, PRIM1, SAC3D1, FEN1, POLA1, TYMS, RFC5, PCNA, POLR3C

Abbreviations: NES, Normalized Enrichment Score; NOM p-val, Nominal p value; FDR, False Discovery Rate.
