Min Wu1, Zhaobo Liu1, Aiying Zhang2, Ning Li1,2. 1. Department of General Surgery. 2. Beijing Institute of Hepatology, Beijing Youan Hospital, Capital Medical University, Beijing, China.
Abstract
BACKGROUND: Hepatocellular carcinoma (HCC) is one of the most prevalent cancers worldwide. However, the precise mechanisms of the development and progression of HCC remain unclear. The present study attempted to identify and functionally analyze the differentially expressed genes between HCC and cirrhotic tissues by using comprehensive bioinformatics analyses. METHODS: The GSE63898 gene expression profile was downloaded from the Gene Expression Omnibus (GEO) and analyzed using the online tool GEO2R to identify differentially expressed genes (DEGs). Gene ontology (GO) functional analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis of the DEGs were performed in DAVID. The STRING database was used to evaluate the interactions of DEGs and to construct a protein-protein interaction (PPI) network using Cytoscape software. Hub genes were selected using the cytoHubba plugin and were validated with the cBioPortal database. RESULTS: A total of 301 DEGs were identified between HCC and cirrhotic tissues. The GO analysis results showed that these DEGs were significantly enriched in certain biological processes including negative regulation of growth and cell chemotaxis. Several significant pathways, including the p53 signaling pathway, were identified as being closely associated with these DEGs. The top 12 hub genes were screened and included TTK, NCAPG, TOP2A, CCNB1, CDK1, PRC1, RRM2, UBE2C, ZWINT, CDKN3, AURKA, and RACGAP1. The cBioPortal analysis found that alterations in hub genes could result in significantly reduced disease-free survival in HCC. CONCLUSION: The present study identified a series of key genes and pathways that may be involved in the tumorigenicity and progression of HCC, providing a new understanding of the underlying molecular mechanisms of carcinogenesis in HCC.
Hepatocellular carcinoma (HCC) is one of the most prevalent cancers worldwide, especially in developing areas such as China and Southeast Asia. It has been estimated that patients in China account for over 50% of all HCC cases worldwide each year. In Europe and North America, however, HCC incidence rates over the last few decades have not been high. The incidence of HCC in Romania has been reported as 5.3/100,000 for both genders. The major risk factors for HCC include chronic hepatitis B or C virus infection, alcoholic liver disease and liver cirrhosis.
Accumulating evidence has revealed that the molecular pathogenesis of HCC may be closely associated with environmental influences and genetic factors, such as tumor suppressor gene inactivation, oncogene activation and gene mutation. However, the precise mechanisms of the development and progression of HCC remain unclear. HCC is difficult to diagnose early and difficult to treat, and it has a poor prognosis and a high recurrence rate. It is therefore urgent to identify molecular biomarkers that can predict HCC onset and progression and to find key therapeutic targets for curative treatment.
The rapid development of high-throughput DNA microarray analysis has improved our understanding of the underlying mechanisms and general genetic alterations involved in cancer initiation and metastasis. DNA microarrays have been extensively applied to investigate HCC carcinogenesis through the profiling of gene expression and the identification of altered genes.
The aim of this study was to use bioinformatics methods to identify hub genes and pathways that distinguish HCC from liver cirrhosis and then to investigate the potential molecular mechanisms of hepatocarcinogenesis.
Methods
Microarray data from HCC and cirrhotic tissues
To explore the gene expression differences between HCC and cirrhotic tissues, the GSE63898 gene expression profile was downloaded from the Gene Expression Omnibus (GEO), a public functional genomics data repository. Ethical approval was not necessary for this study because public datasets were analyzed. The GSE63898 dataset was submitted by Augusto et al and was designed to analyze genome-wide expression in 228 primary HCC and 168 non-tumor cirrhotic samples from patients treated with surgical resection. In particular, the 228 primary HCC tissues included 19 BCLC stage 0 and 178 BCLC stage A HCC tissues, which makes the dataset helpful for investigating the mechanism of hepatocarcinogenesis.
Identification of differentially expressed genes
The identification of differentially expressed genes (DEGs) between HCC and cirrhotic tissues was performed in GEO2R (https://www.ncbi.nlm.nih.gov/geo/geo2r/), an online tool designed to compare different groups of samples. The P values were adjusted to correct for the occurrence of false positive results by using the Benjamini and Hochberg False Discovery Rate method. An adjusted P < .05 and a |log2FC| > 1.5 were used as the cutoff values for identifying DEGs.
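The two cutoffs described above can be illustrated with a short sketch. It is not the GEO2R implementation (GEO2R runs its analysis in R on the server side); it only shows, under stated assumptions, how Benjamini-Hochberg adjustment and the adjusted P < .05 and |log2FC| > 1.5 filter interact. The `ProbeResult` type, the probe names and all numbers are hypothetical.

```python
from typing import List, NamedTuple


class ProbeResult(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    probe_id: str
    log2_fc: float
    p_value: float


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that the
    # adjusted values never increase as the rank decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(results: List[ProbeResult],
                adj_p_cutoff: float = 0.05,
                lfc_cutoff: float = 1.5) -> List[str]:
    """Keep probes with adjusted P < cutoff and |log2FC| > cutoff."""
    adjusted = benjamini_hochberg([r.p_value for r in results])
    return [r.probe_id for r, q in zip(results, adjusted)
            if q < adj_p_cutoff and abs(r.log2_fc) > lfc_cutoff]


# Made-up example rows, not values from GSE63898.
example_results = [
    ProbeResult("probe_a", 2.1, 0.0001),   # significant, large change
    ProbeResult("probe_b", -1.8, 0.004),   # significant, large decrease
    ProbeResult("probe_c", 1.2, 0.001),    # significant, change too small
    ProbeResult("probe_d", 1.9, 0.2),      # large change, not significant
    ProbeResult("probe_e", -0.3, 0.6),     # neither
]
print(select_degs(example_results))  # -> ['probe_a', 'probe_b']
```

Both conditions must hold, so a probe with a strong fold change but a weak adjusted P value (`probe_d`) is dropped just like a highly significant probe with a modest fold change (`probe_c`).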
Functional and pathway analysis of DEGs
The Database for Annotation, Visualization, and Integrated Discovery (DAVID, https://david.ncifcrf.gov/), which is a useful online database platform for high-throughput gene functional analysis, was used to analyze the functions and signaling pathways of the DEGs identified in this study. The analyses performed on the DEGs in DAVID included gene ontology (GO) function analysis (categories: biological processes (BP), cellular components (CC) and molecular functions (MF)) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. The cutoff criterion was a P value < .05.
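The idea behind a term-level enrichment P value can be sketched as a one-sided hypergeometric test: given a background of annotated genes, how likely is it to see at least the observed number of DEGs in a term by chance? DAVID itself reports a modified, more conservative Fisher exact P value (the EASE score), so the plain hypergeometric tail below is an illustrative stand-in rather than DAVID's exact calculation. The function name and every count in the usage example are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(list_hits: int, list_size: int,
                           term_size: int, background_size: int) -> float:
    """P(X >= list_hits) when list_size genes are drawn without
    replacement from a background in which term_size genes carry the term."""
    max_hits = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(list_hits, max_hits + 1)
    )
    # Integer division into a float stays accurate even for huge counts.
    return tail / total


# Hypothetical counts: 8 of 301 DEGs fall in a term that annotates
# 69 genes out of a 20000-gene background.
p = hypergeom_enrichment_p(list_hits=8, list_size=301,
                           term_size=69, background_size=20000)
print(p < 0.05)
```

A term would pass the cutoff used in this study when the resulting P value is below .05.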
PPI network construction and hub gene selection
The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, https://string-db.org/) is a database of known and predicted protein-protein interactions, including direct (physical) and indirect (functional) associations. To better illustrate the interactions and functions of the DEGs, the STRING database was used in this study to evaluate their functional associations and construct a protein-protein interaction (PPI) network. All of the default parameters were used. The PPI network was then visualized with Cytoscape 3.6.1, an open-access tool for creating integrated models of bio-molecular interaction networks. Hub proteins or genes in the PPI network were ranked with the maximal clique centrality (MCC) algorithm in cytoHubba, a Cytoscape plugin, and the top 12 DEGs were selected as hub genes.
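To make the hub ranking concrete, the following sketch scores nodes by maximal clique centrality on a toy undirected graph. It assumes the definition MCC(v) = sum of (|C| - 1)! over all maximal cliques C that contain v, which reduces to the node degree when none of a node's neighbors are connected to each other. This is not the cytoHubba code; the function names, node labels and edges are hypothetical.

```python
from math import factorial
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; self-loops are ignored."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting)."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}   # v has been fully explored
            x = x | {v}   # exclude v from later cliques at this level

    expand(set(), set(adj), set())
    return cliques


def mcc_scores(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Sum (clique size - 1)! over the maximal cliques containing each node."""
    adj = build_adjacency(edges)
    scores = {node: 0 for node in adj}
    for clique in maximal_cliques(adj):
        weight = factorial(len(clique) - 1)
        for node in clique:
            scores[node] += weight
    return scores


def top_hubs(scores: Dict[str, int], k: int) -> List[str]:
    """Top-k nodes by score, ties broken alphabetically."""
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Toy network: a 4-node clique (A, B, C, D) plus one pendant edge D-E.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"),
             ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
toy_scores = mcc_scores(toy_edges)
print(toy_scores)               # -> {'A': 6, 'B': 6, 'C': 6, 'D': 7, 'E': 1}
print(top_hubs(toy_scores, 2))  # -> ['D', 'A']
```

Because the weight grows factorially with clique size, nodes inside large, densely connected modules dominate the ranking. Plain recursion is adequate for a network of a few hundred nodes, although a pivoting variant of Bron-Kerbosch would scale better.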
Analysis of hub genes using the cBioPortal for cancer genomics
To analyze the relationships between the hub genes and clinical characteristics in HCC, the cBioPortal for Cancer Genomics (http://www.cbioportal.org/) was used. cBioPortal is an open-access resource for exploring and analyzing genetic alterations across samples from multidimensional studies. Analyses of genomic mutations and survival prognosis in the selected TCGA datasets can be performed in cBioPortal according to its instructions. Patients with HCC in the liver hepatocellular carcinoma dataset (TCGA, Provisional) were selected for analysis, excluding those with intrahepatic cholangiocarcinoma or fibrolamellar liver cancer.
Results
Identification of DEGs
A total of 434 probe set IDs were found to be differentially expressed between HCC and cirrhotic tissues with thresholds of adjusted P < .05 and |log2FC| > 1.5. The probes were matched to gene symbols using the Affymetrix database. Finally, 301 DEGs between HCC and liver cirrhosis were screened and used for further analysis.
GO and KEGG pathway enrichment analyses
To further investigate the functions and mechanisms of the DEGs, GO and KEGG enrichment analyses were performed in DAVID. In the biological processes category, the GO analysis showed that the DEGs were significantly enriched for the negative regulation of growth, cell chemotaxis, cell adhesion, cellular response to cadmium ion, and inflammatory response terms. In the cellular component analysis, the DEGs were predominantly located in the extracellular region, extracellular space, extracellular matrix, extracellular exosomes and blood microparticles. In the molecular function analysis, the DEGs were mainly associated with heparin binding, chemokine activity, immunoglobulin receptor binding, heme binding and oxygen binding. The KEGG pathway enrichment analysis indicated that the DEGs were mainly involved in mineral absorption (hsa04978), the p53 signaling pathway (hsa04115), caffeine metabolism (hsa00232), tryptophan metabolism (hsa00380) and arachidonic acid metabolism (hsa00590); the KEGG results are summarized in Table 1. The results of the GO and KEGG enrichment analyses are shown in Figure 1.
Table 1
The altered Kyoto encyclopedia of genes and genomes (KEGG) pathways in HCC.
Figure 1
Results of GO analysis and KEGG pathway enrichment analysis of the differentially expressed genes (DEGs). The x-axis indicates the functional annotations or significantly enriched KEGG pathways. The left y-axis indicates the –log10 (P value). The right y-axis indicates the number of enriched genes. GO = Gene ontology. KEGG = Kyoto encyclopedia of genes and genomes.
PPI network construction and analysis
Based on the STRING database, a PPI network of DEGs was constructed and visualized, as shown in Figure 2. A total of 269 nodes and 851 edges were mapped in the PPI network, with a local clustering coefficient of 0.48 and a PPI enrichment P value < 1.0e–16. The hub genes selected from the PPI network using the maximal clique centrality (MCC) algorithm and cytoHubba plugin are shown in Figure 3. The top 12 hub genes were TTK protein kinase (TTK), non-SMC condensin I complex, subunit G (NCAPG), topoisomerase (DNA) II alpha (TOP2A), Cyclin B1 (CCNB1), Cyclin-dependent kinase 1 (CDK1), Protein regulator of cytokinesis 1 (PRC1), Ribonucleotide reductase M2 (RRM2), Ubiquitin-conjugating enzyme E2C (UBE2C), ZW10 interactor (ZWINT), Cyclin-dependent kinase inhibitor 3 (CDKN3), Aurora kinase A (AURKA) and Rac GTPase activating protein 1 (RACGAP1).
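For readers unfamiliar with the network statistics quoted above, the sketch below computes an average local clustering coefficient on a toy graph. It assumes the common definition (for each node, the fraction of pairs of neighbors that are themselves connected, averaged over nodes, with nodes of degree below 2 counted as 0); whether STRING handles low-degree or isolated nodes in exactly this way is an assumption. The function name and toy edges are hypothetical. As a separate piece of arithmetic, 269 nodes and 851 edges correspond to an average degree of 2 × 851 / 269, or about 6.3 interactions per node.

```python
from typing import Dict, Iterable, Set, Tuple


def average_local_clustering(edges: Iterable[Tuple[str, str]]) -> float:
    """Mean over nodes of (connected neighbor pairs) / (possible pairs)."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    total = 0.0
    for neighbors in adj.values():
        k = len(neighbors)
        if k < 2:
            continue  # a node with fewer than 2 neighbors contributes 0
        # Count each connected pair of neighbors once (u < w).
        links = sum(1 for u in neighbors for w in neighbors
                    if u < w and w in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Toy network: a 4-node clique (A, B, C, D) plus one pendant edge D-E.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"),
             ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
print(average_local_clustering(toy_edges))

# Average degree implied by the reported node and edge counts.
print(2 * 851 / 269)
```

In the toy graph, A, B and C each score 1, D scores 0.5 because its pendant neighbor E is connected to none of the others, and E scores 0, giving an average of 0.7.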
Figure 2
Visualization of the protein-protein interaction (PPI) network of the identified differentially expressed genes (DEGs). Blue nodes represent the protein products of the DEGs. Edges represent protein-protein associations.
Figure 3
Visualization of the hub genes selected from the PPI network using the maximal clique centrality algorithm and the cytoHubba plugin. Edges represent protein-protein associations. Red octagons represent differentially expressed genes (DEGs) with high PPI scores. Yellow octagons represent DEGs with low PPI scores. PPI = Protein-protein interaction.
Results of hub gene analysis using the cBioPortal for cancer genomics
The hub genes were further examined in an independent TCGA cohort of HCC (N = 429, excluding intrahepatic cholangiocarcinomas and fibrolamellar liver cancer) using cBioPortal. The OncoPrint of genomic alterations in these genes is shown in Fig. 4A. Although alterations in the top 12 hub genes showed no significant impact on overall survival (OS) (P = .145) in HCC patients, they were significantly associated with reduced disease-free survival (DFS) (P = .0154) in these patients (Fig. 4B and C), supporting their importance in hepatocarcinogenesis.
Figure 4
Analysis of genomic alterations in the identified hub genes and their correlations with survival prognosis in HCC using cBioPortal. (A) The genomic alterations of the hub genes in the selected TCGA dataset of HCC (N = 429). Each column represents a patient. (B) Analysis of overall survival (OS) in patients (N = 365) with or without hub gene alterations using the TCGA dataset. Total#, total number of cases. Dec#, number of deceased cases. (C) Analysis of disease-free survival (DFS) in patients (N = 315) with or without hub gene alterations using the TCGA dataset. Relap#, number of relapsed cases. HCC = hepatocellular carcinoma, MMS = median months of survival, MMFS = median months of DFS.
Discussion
Although numerous studies have been conducted to investigate HCC, the early diagnosis and timely treatment of HCC remain difficult because of the complicated underlying mechanisms of HCC initiation and progression. Thus, it is vital to elucidate the detailed molecular mechanisms of HCC development for further prevention and treatment of HCC. With the development of high-throughput sequencing technology and DNA microarrays, the profiling of differentially expressed genes closely associated with HCC has become easier and more common worldwide, providing a novel and effective way to explore promising targets in preventing and treating HCC.
In this study, the GSE63898 dataset was extracted from GEO, and a total of 301 DEGs between HCC and cirrhotic tissues were screened. Functional analysis showed that these DEGs were robustly associated with various biological processes, such as cell adhesion, inflammatory responses, cell chemotaxis and the negative regulation of growth, most of which are closely related to the genesis and progression of cancer. In addition, the enriched KEGG pathways of the DEGs were mainly involved in p53 signaling, mineral absorption, cell cycle progression, metabolism, pathways related to proteoglycans in cancer, and cytokine-cytokine receptor interactions. Moreover, a PPI network of the DEGs was constructed, and 12 hub genes, namely TTK, NCAPG, TOP2A, CCNB1, CDK1, PRC1, RRM2, UBE2C, ZWINT, CDKN3, AURKA and RACGAP1, were identified as the key genes in HCC.
TTK encodes a dual specificity protein kinase with the ability to phosphorylate serine, threonine, and tyrosine. Studies have established that the TTK protein kinase is most likely associated with cell proliferation and is essential for the accurate segregation of chromosomes in mitosis. Alteration of this protein may result in aberrant mitotic spindles, which may in turn lead to tumorigenesis. TTK has been found to be overexpressed in various tumors, such as pancreatic cancer, breast cancer and HCC. Thus, TTK may be a promising prognostic biomarker and a therapeutic target in cancer. NCAPG encodes the G subunit of the non-SMC condensin I complex, which is responsible for the conversion of interphase chromatin into mitotic-like condensed chromosomes during cell division. NCAPG has been found to be overexpressed in HCC compared with adjacent normal tissue and is closely associated with poor overall survival in HCC.
TOP2A encodes DNA topoisomerase II alpha, a nuclear enzyme that is involved in altering the topologic states of DNA during transcription and replication. It has been reported that overexpression of TOP2A is significantly associated with poor survival and tumor metastasis in many types of tumors. Cyclin B1 (CCNB1), a regulatory protein, plays an important role in controlling the G2/M transition of the cell cycle. CCNB1 upregulation has been reported to be a significant prognostic marker for poor outcome in HCC.
Cyclin-dependent kinase 1 (CDK1), a member of the Ser/Thr protein kinase family, plays an essential role in the control of the eukaryotic cell cycle by modulating the centrosome cycle. CDK1 has been extensively investigated in ovarian cancer and colorectal cancer. However, little is known about the role of CDK1 in HCC carcinogenesis. Protein regulator of cytokinesis 1 (PRC1), a microtubule-associated protein, has been shown to be involved in apoptosis inhibition and carcinogenic progression and to promote early recurrence in HCC. Ribonucleotide reductase regulatory subunit M2 (RRM2) plays a role in catalyzing the biosynthesis of deoxyribonucleotides from ribonucleotides. RRM2 has been shown to be a prognostic biomarker and therapeutic target for breast cancer, prostate cancer and gastric adenocarcinoma. However, RRM2 has rarely been studied in HCC.
Ubiquitin-conjugating enzyme E2C (UBE2C), a member of the E2 ubiquitin-conjugating enzyme family, has been reported to be involved in processes such as the destruction of mitotic cyclins, cell cycle progression and even cancer progression. UBE2C has been shown to promote cell proliferation and cancer invasion in prostate carcinoma and breast cancer. UBE2C is also overexpressed in hepatocellular carcinoma. Thus, UBE2C may be a potential biomarker for cancer diagnosis and may be used to predict cancer prognosis and sensitivity to cancer treatment. ZW10 interacting kinetochore protein (ZWINT) is involved in kinetochore formation and spindle checkpoint activity through interactions with ZW10. It has been demonstrated that ZWINT is notably overexpressed in different human tumors, such as gastric, colon, breast and liver cancers. Thus, ZWINT might contribute to the tumorigenicity of hepatocellular carcinoma by upregulating cell proliferation.
Cyclin-dependent kinase inhibitor 3 (CDKN3), a member of the dual-specificity protein phosphatase family, plays a vital role in cell cycle regulation by dephosphorylating CDK2 kinase. CDKN3 has been demonstrated to be overexpressed and mutated in several types of cancers, such as gastric cancer, breast cancer and hepatocellular carcinoma. Knockdown of CDKN3 may inhibit migration and proliferation in human cancer and promote apoptosis of cancer cells. AURKA encodes a cell cycle-regulated kinase that plays a role in various mitotic events during chromosome segregation. It has been reported that AURKA is a target gene of beta-catenin in several tumors, such as gastric cancer, multiple myeloma, hepatocellular carcinoma and esophageal squamous cell carcinoma. Moreover, it has been demonstrated that AURKA might further participate in tumor development and metastasis by directly phosphorylating beta-catenin in esophageal squamous cell carcinoma. Beta-catenin is a multifunctional protein that plays a critical role in the Wnt signaling pathway and in E-cadherin-mediated intercellular adhesion. Nuclear beta-catenin may also activate the transcription of target genes, such as CD44, cyclin D1 and c-myc. Aberrant localization of beta-catenin has been found to correlate with the overexpression of special AT-rich sequence-binding protein 1 (SATB1) and with APC gene alteration in colorectal cancer. Activated AKT may also promote the upregulation of beta-catenin in human epidermoid carcinoma cells. Rac GTPase activating protein 1 (RACGAP1), a component of the centralspindlin complex, plays a key role in controlling cell growth, differentiation and cytokinesis. RACGAP1 has been reported to serve as a metastatic driver in uterine carcinosarcoma and to increase malignant potential in colorectal cancer. RACGAP1 may be a useful biomarker to monitor the early recurrence of HCC.
To further explore the relationships between the hub genes and HCC, a provisional TCGA cohort of HCC was used to evaluate the correlations between the hub genes and survival prognosis. The cBioPortal analysis found that mutations or alterations in the hub genes could result in significantly reduced disease-free survival and poorer clinical outcomes in HCC.
Conclusion
The present study attempted to identify and functionally analyze the differentially expressed genes between HCC and cirrhotic tissues by using comprehensive bioinformatics analyses and unveiled a series of key genes and pathways that may be involved in the tumorigenicity and progression of hepatocellular carcinoma. The cBioPortal analysis also demonstrated that mutations in the screened key genes could result in reduced disease-free survival in hepatocellular carcinoma. However, more experimental studies need to be carried out in the future to further validate these bioinformatics analysis results for hepatocellular carcinoma.
Acknowledgments
We appreciate all contributors to the GEO, cBioPortal, DAVID and STRING databases for providing open-access resources for exploring cancer genomics data.
Author contributions
Conceptualization: Zhaobo Liu, Ning Li.
Formal analysis: Min Wu.
Project administration: Ning Li.
Writing – original draft: Aiying Zhang.