Jingcheng Wang1, Yang Tian1, Hui Chen1, Hui Li1, Shusen Zheng1. 1. Division of Hepatobiliary and Pancreatic Surgery, Department of Surgery, First Affiliated Hospital, School of Medicine, Zhejiang University, Key Laboratory of Combined Multi‑Organ Transplantation, Ministry of Public Health Key Laboratory of Organ Transplantation, Collaborative Innovation Center for Diagnosis Treatment of Infectious Diseases, Hangzhou, Zhejiang 310003, P.R. China.
Abstract
The purpose of the present study was to investigate the underlying molecular mechanism of hepatocellular carcinoma (HCC) using bioinformatics approaches. The microarray dataset GSE64041 was downloaded from the Gene Expression Omnibus database, which included 60 tumor liver samples and 60 matched control samples. Differentially expressed genes (DEGs) between HCC and control groups were identified. Then functional enrichment analyses, protein‑protein interaction (PPI) network, sub‑network and integrated transcription factor (TF)‑microRNA (miRNA)‑target network analyses were performed for these DEGs. A total of 378 DEGs were obtained, including 101 upregulated and 277 downregulated DEGs. In addition, functional enrichment analysis for DEGs in the sub‑network revealed 'cell division' and 'cell cycle' as key Gene Ontology (GO) terms and pathways. Topoisomerase (DNA) IIα (TOP2A) and integrin subunit α2 (ITGA2) were hub nodes in the PPI network. TOP2A, cyclin dependent kinase 1 (CDK1) and polo like kinase 1 (PLK1) were revealed to be hub nodes in the sub‑network. Finally, 4 TFs including forkhead box M1 (FOXM1), E2F transcription factor 4 (E2F4), SIN3 transcription regulator family member A (SIN3A) and transcription factor 7 like 1 (TCF7L1) were obtained through integrated network analysis. TOP2A, ITGA2, PLK1 and CDK1 may be key genes involved in HCC development. 'Cell division' and 'cell cycle' were indicated to act as key GO terms and Kyoto Encyclopedia of Genes and Genomes pathways in HCC. In addition, FOXM1, TCF7L1, E2F4 and SIN3A were revealed to be key TFs associated with HCC.
The purpose of the present study was to investigate the underlying molecular mechanism of hepatocellular carcinoma (HCC) using bioinformatics approaches. The microarray dataset GSE64041 was downloaded from the Gene Expression Omnibus database, which included 60 tumor liver samples and 60 matched control samples. Differentially expressed genes (DEGs) between HCC and control groups were identified. Then functional enrichment analyses, protein‑protein interaction (PPI) network, sub‑network and integrated transcription factor (TF)‑microRNA (miRNA)‑target network analyses were performed for these DEGs. A total of 378 DEGs were obtained, including 101 upregulated and 277 downregulated DEGs. In addition, functional enrichment analysis for DEGs in the sub‑network revealed 'cell division' and 'cell cycle' as key Gene Ontology (GO) terms and pathways. Topoisomerase (DNA) IIα (TOP2A) and integrin subunit α2 (ITGA2) were hub nodes in the PPI network. TOP2A, cyclin dependent kinase 1 (CDK1) and polo like kinase 1 (PLK1) were revealed to be hub nodes in the sub‑network. Finally, 4 TFs including forkhead box M1 (FOXM1), E2F transcription factor 4 (E2F4), SIN3 transcription regulator family member A (SIN3A) and transcription factor 7 like 1 (TCF7L1) were obtained through integrated network analysis. TOP2A, ITGA2, PLK1 and CDK1 may be key genes involved in HCC development. 'Cell division' and 'cell cycle' were indicated to act as key GO terms and Kyoto Encyclopedia of Genes and Genomes pathways in HCC. In addition, FOXM1, TCF7L1, E2F4 and SIN3A were revealed to be key TFs associated with HCC.
Hepatocellular carcinoma (HCC) is one of the most prevalent cancers worldwide, causing the third most death of cancers (1). Patients with HCC have a high risk in liver cirrhosis and other symptoms such as pain, fatigue, weight loss, and obstructive syndromes including ascites and jaundice (2). As not all of HCCpatients qualifying surgical treatments including tumor resection and liver transplantation, the prognosis of these patients is poor (3). Although great improvements in HCC treatments have been achieved, further investigation of safe and accurate diagnosis methods and effective therapies for HCC should be taken into consideration.In the past decades, efforts on molecular mechanism researches of HCC have thrown light on molecular diagnosis and molecular targeted therapies of the disease. Numerous genes and transcription factors (TFs) associated with HCC development have been revealed. Like all the other cancers, cells of HCC lost control of cell cycle (4). Genes such as cyclin-dependent kinases and TFs such as E2F transcription factors (E2Fs) are involved in cell cycle (5,6). Another hallmark of cancer cells is metastasis. Genes like matrix metalloproteinase (MMP)2 and MMP9, and TFs like hypoxia inducible TFs are found to play important role in the invasion and metastasis of HCC cells (7–9). What's more, signaling pathways such as retinoblastoma pathways, Ras/MAPK pathway and Wnt/β-catenin pathway are altered in HCC cells (4). Based on achieved improvements in the molecular mechanism underlying HCC, many efforts have been made to investigate promising biomarkers and therapeutic targets for the diagnosis and treatment of HCC. Quantitative detection of methylated GSH-sulphur-transferase P1 in serum can be used for the early diagnosis of HCC (10). Besides, methylation status of plasma P16 gene has been indicated to have the value for the detection of primary HCC (11). In addition, it is revealed that the expression level of glypican 3 (GPC-3) in liver tissues can be used as an indicator of HCC (12) and silencing of GPC-3 can inhibit the proliferation of HCC cells (13). Furthermore, DNA methyltransferase 1 (DNMT1) knockdown can inhibit HCC cell proliferation and increase apoptosis, suggesting that DNMT1 may serve as a targets for HCC therapy (14). Dysruption of pathways involved in HCC have been proven to be efficient for HCC treatment. The candidate pathways include pathways involved in signal transduction such as growth factor receptors, pathways involved in apoptosis such as intrinsic pathway and pathways participating in cell cycle (15). Despite these great advances, the key mechanism underlying HCC remains to be further elucidated to screen promising biomarkers and potential targets for the diagnosis and treatment of HCC.Bioinformatics approaches are effective for mechanism research. Makowska et al (16) developed the gene expression file GSE64041 using HCC samples and matched control samples to establish a molecular classification of humanHCC. Based on differential gene expression analysis, subclass prediction, and pathway analysis, they found that three subgroups of HCCpatients were identified based on the gene expression data and different subgroups of patients with different prognosis (16). In comparison with this study, we downloaded this gene expression file (GSE64041) to further investigate the potential mechanism underlying HCC by a comprehensive bioinformatics analysis. In addition to differential gene expression and pathway analyses, protein-protein interaction (PPI) network, sub-network and integrated TF-miRNA-target network were also constructed and analyzed to further elucidate the key genes, TFs and pathways associated with HCC. Our results may be helpful for better understanding of the molecular mechanisms underlying HCC and provide valuable information for the diagnosis and treatment of this disease.
Materials and methods
Microarray data
Gene expression profile data GSE64041 which was deposited by Makowska et al (16), was downloaded from Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/) database (17). The datasets were sequenced on the platform Affymetrix Human Gene 1.0 ST Array (Affymetrix; Thermo Fisher Scientific, Inc., Waltham, MA, USA). A total of 60 biopsy pairs from patients with HCC were obtained, including 60 HCC samples and 60 matched non-tumor liver samples.
Data preprocessing and differentially expressed genes (DEGs) analysis
The downloaded data were preprocessed using limma package (18) (version 3.30.11, http://www.bioconductor.org/packages/release/bioc/html/limma.html) in R language. Expression calculation was performed after background correction and normalization. Then the probes were corresponded to gene symbols according to the downloaded NCBI gene data. If different probes were corresponding to the same symbol, the average expression values were taken. DEGs between HCC and non-tumor control groups were identified using limma package (18). P-values were calculated using bayesian t-test method. The cut-off thresholds were P<0.05 and fold-change ≥1.
Functional enrichment analysis
Gene Ontology (GO) (19) terms including molecular function (MF), cellular component (CC) and biological process (BP), and Kyoto Encyclopedia of Genes and Genomes (KEGG) (20) pathways were enriched for the upregulated and downregulated DEGs using the database for annotation, visualization and integrated discovery (DAVID) (version 6.8, https://david-d.ncifcrf.gov/), respectively (21). P<0.05 was considered to indicate a statistically significant difference.
PPI network analysis
Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, version 10.0; http://www.string-db.org/) (22) is an online database proving assessment and integration of PPIs. In this study, PPIs were identified based on the information of STRING, followed by PPI network construction using Cytoscape software (23) (version 3.4.0; http://www.cytoscape.org/). The topology analysis for the PPI network was performed using CytoNCA plugin (24) (version 2.1.6; http://apps.cytoscape.org/apps/cytonca). The parameter setting was network without weight. Results arranged in descending order included scores of degree centrality, betweenness centrality and closeness centrality. Nodes with top 10 highest centralities were regarded as hub genes. Besides, we performed a sub-network analysis to identify sub-networks with great importance from PPI network using MCODE plugin (25) (version 1.4.2; http://apps.cytoscape.org/apps/mcode) in cytoscape software. Additionally, functional enrichment analyses for genes in the screened sub-networks were carried out.
TF-miRNA-target regulatory network analysis
HCC relating miRNAs and experimentally validated target genes were extracted from Mirwalk2 database (26) (http://zmf.umm.uni-heidelberg.de/apps/zmf/mirwalk2/). HCC relating miRNA-target pairs were identified through comparing the DEGs with the downloaded miRNA-target pairs. Then the miRNA-target regulatory network was constructed using Cytoscape software. Besides, TF-target pairs in the miRNA-target network were predicted using iRegulon plugin (27) (version 1.3; http://apps.cytoscape.org/apps/iregulon) in Cytoscape with normalized enrichment score (NES) >5 as threshold. Finally, TF-miRNA-target regulatory network construction was performed using Cytoscape software.
Results
Analysis of DEGs
A total of 32,321 probes and 18,710 genes were obtained after data preprocessing. Among these obtained genes, a total of 378 genes were differentially expressed in HCC group compared with control group, including 101 upregulated and 277 downregulated DEGs. The heatmap of DEGs was shown in Fig. 1.
Figure 1.
Heat map of DEGs. GSM2052460, GSM2052462 and GSM2052578 are control samples; GSM2052461, GSM2052463 and GSM205579 are disease samples. The bottom axis represents the samples, the top axis represents sample clustering, the right-hand axis represents DEGs and the left-hand axis represents the clustering of DEGs. Red indicates upregulated genes and green indicates downregulated genes. DEG, differentially expressed gene.
The KEGG pathways of the upregulated and downregulated DEGs were obtained. The upregulated DEGs were enriched in nine KEGG pathways such as cell cycle, pathways in cancer and PI3K-Akt signaling pathway; the downregulated DEGs were enriched in ten KEGG pathways such as metabolic pathways, biosynthesis of antibiotics and chemical carcinogenesis. In addition, significant GO terms enriched by DEGs were obtained. The upregulated DEGs were enriched in GO terms such as cell cycle, microtubule cytoskeleton and enzyme binding (Fig. 2A); and the downregulated DEGs were enriched in GO terms such as extracellular region, organic acid metabolic process and cofactor binding (Fig. 2B).
Figure 2.
GO terms of DEGs. (A) GO terms of upregulated DEGs, and (B) GO terms of downregulated DEGs. GO, Gene Ontology; MF, molecular function; CC, cellular component; BP, biological process; DEG, differentially expressed gene.
PPI network analysis for the DEGs was performed, containing 211 nodes and 618 PPI pairs (Fig. 3). Nodes with top ten highest score in three algorithms were listed in Table I, such as topoisomerase (DNA) IIα (TOP2A), cyclin dependent kinase 1 (CDK1) and integrin subunit α2 (ITGA2) (degree centrality); estrogen receptor 1 (ESR1), TOP2A and ITGA2 (betweenness centrality); and ESR1, TOP2A and ITGA2 (closeness centrality). In addition, a sub-network including the most genes was identified via sub-network analysis, including 18 nodes and 123 PPI pairs (Fig. 4). In the identified sub-network, TOP2A (degree=35), CDK1 (degree=31) and polo like kinase 1 (PLK1) (degree=27) were the nodes with most PPI pairs. Functional enrichment analysis for the genes in this sub-network revealed that these genes were significantly enriched in one GO term (cell division) and three KEGG pathways including cell cycle, progesterone-mediated oocyte maturation and oocyte meiosis (Table II). Additionally, CDK1, PLK1 and mitotic checkpoint serine/threonine kinase (BUB1) were the common genes in the GO term and KEGG pathways mentioned above (Table II).
Figure 3.
Protein-protein interaction network of DEGs. Red circles represent upregulated DEGs and green circles represent downregulated DEGs. DEG, differentially expressed gene.
Table I.
List of top 10 highest scoring nodes in the protein-protein interaction network.
Rank
Top 10 genes
Degree
Top 10 genes
Betweenness
Top 10 genes
Closeness
1
TOP2A
35
ESR1
9275.706
ESR1
0.05781939
2
CDK1
31
ITGA2
7831.359
ITGA2
0.05731441
3
ITGA2
29
TOP2A
6589.938
TOP2A
0.05729877
4
PLK1
27
APOA1
4281.257
CDK1
0.05675676
5
ESR1
26
DCN
4087.898
IGF1
0.05663430
6
CCNB2
26
CDK1
3732.873
TXNRD1
0.05658852
7
AURKA
25
TXNRD1
3077.681
CFTR
0.05657328
8
BUB1
25
CXCL12
2707.838
DCN
0.05652759
9
CCNA2
24
IGF1
2424.979
FOXM1
0.05651238
10
BUB1B
20
TAT
2213.226
CYP2B6
0.05648198
Figure 4.
Sub-network with the greatest number of DEGs. Red circles represent upregulated DEGs. DEG, differentially expressed gene.
Table II.
Functional enrichment analysis of differentially expressed genes in the module.
GO, Gene Ontology; BP, biological process; KEGG, Kyoto Encyclopedia of Genes and Genomes.
The miRNA-target network was constructed, including 107 miRNAs and 40 target genes. TF prediction for the DEGs in the miRNA-target network revealed 4 TFs including forkhead box M1 (FOXM1), E2F4, SIN3 transcription regulator family member A (SIN3A) and transcription factor 7 like 1 (TCF7L1). As shown in the integrated TF-miRNA-target network (Fig. 5), FOXM1 (NES=8.642) targeted 32 genes, followed by TCF7L1 (NES=5.449) targeting 17 genes, E2F4 (NES=5.398) targeting 12 genes and SIN3A (NES=5.095) targeting nine genes.
Figure 5.
TF-miRNA-target network of DEGs. Purple circles represent DEGs, white triangles represent miRNAs and green octagons represent TFs. TF, transcription factor; miRNA, microRNA; DEG, differentially expressed gene.
Discussion
HCC is among the most deadliest cancers worldwide. Identification of promising biomarker or targets for the diagnosis and treatment of HCC is needed. In our study, a comprehensive microarray analysis of liver tumor and normal samples from patients with HCC was performed. The results of our analyses revealed that key upregulated DEGs such as TOP2A, ITGA2, PLK1 and CDK1 were associated with HCC. Besides, cell division and cell cycle might play key roles in the development of HCC. Furthermore, crucial TFs including FOXM1, TCF7L1, E2F4 and SIN3A were revealed to be key TFs related to HCC.TOP2A and ITGA2 were identified as the key nodes in the PPI network. Moreover, TOP2A, PLK1 and CDK1 were the genes with the highest degrees in the sub-network. It is reported that TOP2A is overexpressed in HCC tissues (28), which is in accordance to our results. Furthermore, it is revealed that the expression of TOP2A is related to the onset of malignancy and chemoresistance, shortening the survival time of patients (29). Silencing of ITGA2 can promote the migration of breast cancer cells (30). What's more, inhibition of ITGA2 by miR-128 can significantly decrease the metastasis of HCC cells (31). It is indicated that the positive expression rate of PLK1 in HCC is higher than that in healthy controls (32). Additionally, it is reported that knockdown of PLK1 in HCC tumor-derived endothelial cells can inhibit the migration of these cells (33). CDK1 is reported to have the capacity of apoptosis modulation in HCC by co-acting with apoptin (34). Moreover, it is demonstrated that miR-582-5p can inhibit the proliferation of HCC cells through targeting CDK1 and another gene AKT serine/threonine kinase 3 directly (35). Based on our results, we speculate that TOP2A, ITGA2, PLK1 and CDK1 may be the key genes involved in HCC.Functional enrichment analysis for genes in the sub-network revealed that PLK1 and CDK1 were the common genes involved in both cell division and cell cycle. HCC cells lost control of the cell cycle, which is a common feature occurring in all tumorigenic cells (4). Inhibition of PLK1 activity can decrease the polyploid cell population during cell division (36). Moreover, it is revealed that the downregulation of PLK1 results in the promotion of cell cycle arrest and apoptosis (37). It is confirmed that CDK1 is essential for cell division through knockdown of CDK1 in mice (38). Besides, CDK1 has been considered to be the only essential cell cycle CDK (39). Thus, we suspect that PLK1 and CDK1 may play roles in HCC by participating in cell cycle and cell division.The result of TF-miRNA-target network analysis showed that FOXM1, TCF7L1, E2F4 and SIN3A were TFs regulating the most DEGs. It is reported that FOXM1 overexpression can promote the metastasis of HCC (40). Meanwhile, inhibition of FOXM1 is shown to have the capacity of promoting the senescence of HCC cells (41). TCF7L1, a member of the T cell factor/lymphoid enhancer factor family, is reported to modulate colorectal cancer growth via inhibiting the tumor suppressor EPH Receptor B3 (42). As another member of the family, TCF7L2 is revealed to be associated with the susceptibility of HCC in patients with liver cirrhosis (43). As a member of the E2F family, E2F4 acts as an anti-proliferative TF, playing crucial role in cell cycle (44). Besides, comparing with the wild-type E2F4, E2F4 mutants can increase the growth of colorectal cancer cells (45). It is reported that the upregulation of miR-210 can inhibit proliferation of HCC cells (46). In addition, miR-210 upregulation is revealed to inhibit proliferation and induce apoptosis in glioma cells via targeting SIN3A, a member of the SIN3 transcription regulator family (47). Based on these data, we speculate that FOXM1, TCF7L1, E2F4 and SIN3A may be key TFs participating in the development of HCC.Notably, Maswska et al (16) reported that three subgroups of HCCpatients were identified based on the gene expression data of GSE64041 and the subgroups of patients have different prognosis, suggesting that the different expression pattern of genes in different subgroups may be an important factor to influence HCC progression and patients' prognosis. However, we did not analyze the possible mechanism underlying HCC in the different subgroups due to lack of the related expression data of different subgroups, let alone the gene expression and the patients' prognosis. Further analyses should be performed to further elucidate the key mechanism associated with the patients' prognosis in different subgroups. Moreover, some experiments, such as expression validation or knockdown assays, were not performed to determine the role of these key genes and TFs in HCC development. Taken together, further investigations remain to be done to confirm the results.In conclusion, the results of the present study reveal that TOP2A, ITGA2, PLK1 and CDK1 may act as key genes associated with HCC. Moreover, cell cycle and cell division may function as key pathways in HCC through the regulation of key genes PLK1 and CDK1. Additionally, FOXM1, TCF7L1, E2F4 and SIN3A are revealed to be key TFs involved in HCC. Our results might provide valuable data for selection of biomarker and therapeutic targets of the disease.
Authors: Lee Jin Lim; Yu Jin; Henry Yang; Alexander Y F Chung; Brian K P Goh; Pierce K H Chow; Chung Yip Chan; William K Blanks; Peng Chung Cheow; Ser Yee Lee; Tony K H Lim; Samuel S Chong; London L P J Ooi; Caroline G Lee Journal: Sci Rep Date: 2020-07-07 Impact factor: 4.379
Authors: Isabel Graupera; Laura Isus; Mar Coll; Elisa Pose; Alba Díaz; Julia Vallverdú; Teresa Rubio-Tomás; Celia Martínez-Sánchez; Patricia Huelin; Marta Llopis; Cristina Solé; Constantino Fondevila; Juan José Lozano; Pau Sancho-Bru; Pere Ginès; Patrick Aloy Journal: JHEP Rep Date: 2022-04-04