Xiaocong Jiang1, Ting Song2, Xiuhua Pan1, Xinyu Zhang1, Yuhong Lan1, Li Bai1. 1. Department of Radiotherapy Oncology, Huizhou Central People's Hospital, Huizhou, 516001, Guangdong, People's Republic of China. 2. Department of Hepatology, The Sixth People's Hospital of Qingdao, Qingdao, 266033, Shandong, People's Republic of China.
Abstract
OBJECTIVE: The occurrence and development of hepatocellular carcinoma (HCC) remain unclear. This study aimed to investigate potential diagnostic or prognostic markers for early HCC by applying bioinformatic analysis. METHODS: The gene expression profiles of early HCC and normal tissues from a TCGA dataset were used to identify differentially expressed genes (DEGs) and then analysed by weighted gene coexpression network analysis. The integrated genes were selected to construct the protein-protein interaction (PPI) network and determine the hub genes. The prognostic impact of the hub genes was then analysed. RESULTS: A total of 508 integrated genes were selected from the 615 DEGs and 8956 genes in the turquoise module. A PPI network was constructed, and the top 20 hub genes, including apolipoprotein A-IV (APOA4), fibrinogen gamma chain (FGG), vitamin K-dependent protein Z (PROZ), secreted phosphoprotein 24 (SPP2) and fetuin-B (FETUB), were identified. Only PROZ was significantly associated with the prognosis of early HCC. CONCLUSION: In this study, we demonstrated that the expression of PROZ was decreased in early HCC compared with normal liver controls, and low PROZ expression might result in poor overall survival of early HCC.
Liver cancer is one of the most common tumours worldwide, especially in areas where hepatitis viruses are endemic, such as Japan, South Korea, China and Thailand.1–3 Hepatocellular carcinoma (HCC) is the most common type of liver cancer. Although factors including alcohol abuse, fatty liver and aflatoxin can induce HCC carcinogenesis,4 hepatitis B virus infection is the leading aetiology of HCC in these Asian countries.5,6 The number of patients with HCC is still increasing, and the prognosis of HCC remains poor due to a lack of effective treatment.7 It has been reported that the five-year survival rate of advanced-stage HCC is less than 15%,8 while the five-year survival rate of early-stage HCC is more than 75%.9 Thus, HCC imposes a heavy medical burden, and early diagnosis of HCC is extremely important.
The current methods for diagnosing HCC mainly include the detection of tumour biomarkers, imaging examination and liver biopsy.10 Among the tumour biomarkers, alpha-fetoprotein (AFP) is the most frequently used biomarker for HCC diagnosis and surveillance.11 Other biomarkers, such as glypican-3 (GPC3),12 Golgi protein 73 (GP73),13 and des-γ-carboxyprothrombin (DCP),14 are also believed to be potentially useful in diagnosing HCC. However, the above biomarkers are not effective for diagnosing early HCC. B-mode ultrasonography is the most widely applied imaging examination for HCC and is often combined with AFP for monitoring HCC development. Liver biopsy is the most useful method for diagnosing HCC. However, in patients with a background of liver cirrhosis, biopsy does not always work.
Thus, exploring novel biomarkers is extremely important for improving the long-term survival of patients with HCC. Protein Z, a vitamin K-dependent plasma glycoprotein (PROZ), is synthesized in the liver and secreted into the plasma and has been found to be a novel biomarker for diagnosing early-stage pancreatic cancer.15 However, the value of PROZ for detecting early HCC or predicting the prognosis of early HCC has not been reported.
The poor prognosis of patients with HCC is mainly related to the late clinical stage at which HCC is diagnosed. Thus, exploring early and even very early diagnostic methods for HCC is the most effective approach for improving the prognosis of these patients. This study aimed to investigate potential diagnostic or prognostic markers for HCC by applying bioinformatic analysis.
Materials and Methods
Identification of Differentially Expressed Genes (DEGs)
The gene expression profiles of 100 early HCCs and 40 normal controls, expressed as fragments per kilobase per million (FPKM) values, were downloaded from The Cancer Genome Atlas (TCGA) dataset. The clinical data of the early HCCs were also obtained from the TCGA dataset. Early HCC was defined as stage I according to the American Joint Committee on Cancer TNM Staging System (AJCC TNM, 2018 Edition). Only early HCC patients with survival data were included in this study. The gene expression data were then analysed in R software by using the Limma package to identify the DEGs. The cut-off criteria for DEGs were an adjusted P<0.05 and |log2FC|>1. Written consent was not needed for the public data used in this study.
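The cut-off step described above can be illustrated with a minimal Python sketch. It is not the Limma workflow used in this study: it assumes that a log2 fold change and a raw P value are already available for each gene, and it assumes Benjamini-Hochberg adjustment, since the adjustment method is not stated in the text. The gene names, values and function names are hypothetical.

```python
from typing import Dict, List, Tuple


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(
    stats: Dict[str, Tuple[float, float]],
    p_cut: float = 0.05,
    lfc_cut: float = 1.0,
) -> Dict[str, str]:
    """Keep genes with adjusted P < p_cut and |log2FC| > lfc_cut.

    `stats` maps gene -> (log2 fold change, raw P value).
    """
    genes = list(stats)
    adj = benjamini_hochberg([stats[g][1] for g in genes])
    degs = {}
    for gene, q in zip(genes, adj):
        lfc = stats[gene][0]
        if q < p_cut and abs(lfc) > lfc_cut:
            degs[gene] = "up" if lfc > 0 else "down"
    return degs


# Hypothetical per-gene statistics (tumour vs normal), for illustration only.
toy_stats = {
    "GENE_A": (-2.3, 0.0001),  # strongly downregulated
    "GENE_B": (1.6, 0.004),    # upregulated
    "GENE_C": (0.4, 0.0002),   # significant but small effect
    "GENE_D": (-1.8, 0.30),    # large effect but not significant
}
print(select_degs(toy_stats))
```

Both criteria must hold at once, so GENE_C (small fold change) and GENE_D (non-significant) are dropped in this toy example.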
The TCGA gene expression dataset of early HCC and controls was further explored by applying weighted gene coexpression network analysis (WGCNA). The genes of the most significant WGCNA module were selected and then intersected with the DEGs identified above to obtain the integrated genes.
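The intersection step amounts to a set operation. The sketch below assumes the module genes and the DEGs are available as plain lists of gene symbols; the symbols and the function name are hypothetical.

```python
from typing import Iterable, List


def integrate_genes(module_genes: Iterable[str], degs: Iterable[str]) -> List[str]:
    """Return genes present in both the WGCNA module and the DEG list."""
    return sorted(set(module_genes) & set(degs))


# Hypothetical gene symbols, for illustration only.
module = ["GENE_A", "GENE_B", "GENE_C", "GENE_E"]
deg_list = ["GENE_B", "GENE_C", "GENE_D"]
print(integrate_genes(module, deg_list))  # ['GENE_B', 'GENE_C']
```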
Functional Enrichment Analysis
Gene ontology (GO) analysis of the integrated genes, including categories of biological processes (BP), cellular components (CC) and molecular functions (MF), was performed in R by employing the Database for Annotation, Visualization and Integrated Discovery (DAVID). Moreover, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of the above genes was performed in R.
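Enrichment tools of this kind typically rest on an over-representation test. The sketch below shows the generic one-sided hypergeometric test, under the assumption that a background of N annotated genes, a term with K member genes, and an input list of n genes with k hits are known. It is not the exact statistic reported by DAVID or by any particular R package, and all numbers are hypothetical.

```python
from math import comb


def enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric P value, P(X >= k).

    N: background genes, K: background genes annotated to the term,
    n: genes in the input list, k: input genes annotated to the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of observing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 2 of 4 input genes hit a term covering 5 of 20 genes.
print(enrichment_p(k=2, n=4, K=5, N=20))
```

In practice the resulting P values would still need multiple-testing adjustment across all tested terms.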
Protein-Protein Interaction (PPI) Network Analysis and Selection of Hub Genes
The PPI network of the integrated genes was built by using the STRING database (Search Tool for the Retrieval of Interacting Genes/Proteins), a public resource for exploring protein-protein interactions. Then, the hub genes were selected from the PPI network by applying the cytoHubba plugin. The maximal clique centrality (MCC) algorithm in cytoHubba was chosen to identify the top 20 hub genes.
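Maximal clique centrality can be sketched in a few lines. The example below assumes the usual definition, in which the score of a node is the sum of (|C| - 1)! over all maximal cliques C that contain it, and it uses a plain Bron-Kerbosch enumeration that is only practical for small graphs. It is an illustration, not the cytoHubba implementation; the node names and helper functions are hypothetical.

```python
from math import factorial
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map, ignoring self-loops."""
    graph: Graph = {}
    for a, b in edges:
        if a == b:
            continue
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def maximal_cliques(graph: Graph) -> List[Set[str]]:
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


def mcc_scores(graph: Graph) -> Dict[str, int]:
    """Sum (clique size - 1)! over the maximal cliques containing each node."""
    scores = {v: 0 for v in graph}
    for clique in maximal_cliques(graph):
        weight = factorial(len(clique) - 1)
        for v in clique:
            scores[v] += weight
    return scores


def top_hubs(graph: Graph, n: int) -> List[str]:
    """Return the n highest-scoring nodes, ties broken alphabetically."""
    scores = mcc_scores(graph)
    return sorted(scores, key=lambda v: (-scores[v], v))[:n]


# Toy network: a triangle A-B-C plus a pendant edge C-D.
toy = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")])
print(mcc_scores(toy))
print(top_hubs(toy, 1))
```

Because larger cliques contribute factorially more, nodes embedded in dense clusters outrank nodes that merely have many unconnected neighbours.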
Overall Survival Analysis and Expression Validation of the Hub Gene
The clinical data of the included early HCCs were downloaded from TCGA. The overall survival (OS) and disease-free survival (DFS) analyses of the hub genes were performed in R by using the survival package and the clinical data. Then, the gene expression of the prognosis-associated hub genes was validated by using the GSE12443 gene expression profile. Gene expression data of 24 cirrhotic nodules (CNs) and 10 early HCCs in GSE12443 were used in this study.
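For readers unfamiliar with the survival estimate behind such analyses, the following is a minimal Kaplan-Meier sketch in Python. It is not the R survival package used in this study; it assumes follow-up times and event indicators (1 for an event, 0 for censoring) for a single group, and the data are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= leaving
    return curve


# Hypothetical follow-up (months) for one expression group.
toy_times = [5, 8, 8, 12, 20]
toy_events = [1, 1, 0, 1, 0]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

Running the function separately for each expression group gives the curves that a log-rank test would then compare.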
Statistical Analysis
The data were analysed in SPSS version 22.0 for Windows (IBM, Armonk, New York, USA), GraphPad Software version 7 (GraphPad Software, San Diego, California) and R software version 4.0.2 (The R Foundation for Statistical Computing, USA). The Wilcoxon test was used to identify the DEGs, with cut-offs of |log2FC|>1 and an adjusted P value of <0.05. Pearson’s χ2 test was applied to compare the clinicopathological characteristics. The Kaplan-Meier method was used for survival analysis. Student’s t-test was used for normally distributed data and the Mann-Whitney test for non-normally distributed data to identify differences between groups. Statistical significance was defined as a P value of <0.05.
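As an illustration of Pearson's χ2 test for a contingency table, the sketch below computes the statistic and degrees of freedom for any r × c table and, for the 2 × 2 case only, a P value through the complementary error function. No continuity correction is applied, the counts are hypothetical and are not taken from this study, and the function names are assumptions.

```python
from math import erfc, sqrt
from typing import List, Sequence, Tuple


def pearson_chi2(table: Sequence[Sequence[float]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


def p_value_df1(stat: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(stat / 2))


# Hypothetical 2 x 2 table: rows are two patient groups, columns high/low expression.
toy_table: List[List[int]] = [[30, 20], [10, 40]]
stat, dof = pearson_chi2(toy_table)
print(round(stat, 3), dof, p_value_df1(stat))
```

For tables with more than one degree of freedom, the P value would need the general chi-square distribution, which a statistics package provides.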
Results
Integrated Genes Identified in Early HCC
A total of 615 DEGs, including 67 upregulated and 548 downregulated DEGs, were identified by comparing 100 early HCCs and 40 normal controls. The ethnicities of the early HCCs were mainly non-Hispanic or non-Latino. The clinical patterns of the 100 early HCCs are summarized in Table 1. Figures 1 and 2 present the heat map and volcano plots of the DEGs. The genes were divided into eight modules by WGCNA (Figure 3A). Among these modules, the turquoise module, which contained 8956 genes, had the highest significance (Figure 3B). Finally, 508 integrated genes were selected by intersecting the genes in the turquoise module and DEGs.
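Table 1
Clinical Patterns of the Early HCCs Included in This Study

| Terms | Cases | PROZ high | PROZ low | P value |
|---|---|---|---|---|
| Age (years) | | | | <0.01 |
| >50 | 74 | 45 | 29 | |
| ≤50 | 26 | 12 | 14 | |
| Sex | | | | <0.01 |
| Male | 78 | 40 | 38 | |
| Female | 22 | 10 | 12 | |
| Risk factor | | | | <0.01 |
| Hepatitis B | 57 | 23 | 34 | |
| Hepatitis C | 13 | 10 | 3 | |
| Alcohol consumption | 13 | 8 | 5 | |
| Non-alcoholic fatty liver disease | 3 | 2 | 1 | |
| Others | 14 | 6 | 8 | |
| Histologic grade | | | | 0.525 |
| G1 | 13 | 8 | 5 | |
| G2 | 40 | 22 | 18 | |
| G3 | 39 | 17 | 22 | |
| G4 | 8 | 3 | 5 | |
| Child-Pugh grade | | | | 0.317 |
| A | 89 | 44 | 45 | |
| B | 4 | 1 | 3 | |
| NA | 7 | 5 | 2 | |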
Note: Clinical characteristics were compared with Pearson’s χ2 test.
Abbreviations: HCC, hepatocellular carcinoma; R, resection.
Figure 1
Heatmap of the differentially expressed genes in early hepatocellular carcinoma. Red, upregulated. Green, downregulated. T, tumour. N, normal. The colour scale indicates the level of gene expression.
Figure 2
Volcano plot of the differentially expressed genes in early hepatocellular carcinoma. Red, upregulated. Green, downregulated.
Figure 3
Weighted gene coexpression network analysis of gene expression profiles between early hepatocellular carcinoma and normal controls. (A) Gene dendrogram. (B) Module-trait relationships. Genes in modules are marked with different colours (brown, red, pink, green, black, magenta, blue and turquoise).
GO and KEGG Analysis of the Integrated Genes
The GO functional analysis results (Figure 4) showed that the integrated genes were mostly enriched in the BP category: small molecule catabolic process, organic acid catabolic process, carboxylic acid catabolic process and fatty acid metabolic process; CC category: mitochondrial matrix, blood microparticle, cytoplasmic vesicle lumen; MF category: coenzyme binding, oxidoreductase activity and monooxygenase activity.
Figure 4
Gene ontology functional enrichment analysis of the integrated genes. The left y-axis represents the P-value. The x-axis represents the ratio of enriched genes.
The KEGG pathway enrichment analysis results (Figure 5) showed that the integrated genes were mainly enriched in pathways such as chemical carcinogenesis, retinol metabolism and metabolism of xenobiotics by cytochrome P450.
Figure 5
Kyoto encyclopedia of genes and genomes pathway enrichment analysis of the integrated genes. The left y-axis represents the P-value. The x-axis represents the ratio of enriched genes.
PPI Network Construction and Identification of Hub Genes
A PPI network with 477 nodes and 3752 edges was constructed for the integrated genes to evaluate the protein-protein interactions (Figure 6). The average local clustering coefficient was 0.429, and the PPI enrichment P value was <1.0e-16. The top 20 hub genes were identified (Figure 6), namely, apolipoprotein A-IV (APOA4), fibrinogen gamma chain (FGG), protein Z, vitamin K-dependent plasma glycoprotein (PROZ), secreted phosphoprotein 24 (SPP2), fetuin-B (FETUB), complement C8 alpha chain (C8A), fibrinogen alpha chain (FGA), coagulation factor XI (F11), fibrinogen beta chain (FGB), mannose-binding lectin 2 (MBL2), kallikrein B1 (KLKB1), histidine-rich glycoprotein (HRG), coagulation factor XII (F12), HGF activator (HGFAC), apolipoprotein A-I (APOA1), coagulation factor IX (F9), complement C6 (C6), GC vitamin D binding protein (GC), formimidoyltransferase cyclodeaminase (FTCD) and angiopoietin-like 3 (ANGPTL3). All these hub genes were downregulated in early HCCs.
Figure 6
Protein-protein interaction network and the selected hub genes from the integrated genes. Blue nodes represent genes. Edges represent the associations. Red and yellow nodes represent hub genes.
Overall Survival Analysis and Expression Validation of the Hub Genes
OS and DFS analyses of the identified hub genes were performed by using TCGA clinical data of the early HCCs. The results showed that low expression of PROZ was significantly associated with poor OS in early HCC (P=0.019) (Figure 7). None of the other hub genes was significantly associated with the prognosis of early HCC. However, the DFS analysis showed no significant difference according to the expression level of PROZ (P=0.444) (Figure 7). The GSE12443 data analysis validated that PROZ was significantly downregulated in early HCC compared with cirrhotic nodules (P=0.012) (Figure 8).
Figure 7
Prognostic analysis of PROZ in early HCC. (A) Disease-free survival, (B) overall survival, 1, low expression of PROZ, 2, relatively low expression of PROZ, 3 relatively high expression of PROZ, 4, high expression of PROZ.
Figure 8
Validation of the downregulation of PROZ in early HCC and controls.
Discussion
HCC is a leading cause of cancer-related death worldwide, especially in Asian areas where chronic hepatitis virus infection is prevalent. The poor prognosis of HCC is mainly associated with the low rate of early diagnosis. Thus, exploring the underlying mechanism of the development of HCC is important for identifying early diagnostic biomarkers and for improving the prognosis of HCC. Although various studies have attempted to discover new biomarkers for predicting the prognosis of HCC or detecting early HCC,16,17 these biomarkers lack effectiveness for HCC, and most studies have focused on HCC of all stages rather than on early HCC.
In this study, we obtained 508 integrated genes by analysing the expression dataset of early HCC and normal controls with differential expression analysis and WGCNA. These genes were mainly involved in GO terms such as small molecule catabolic process, organic acid catabolic process, oxidoreductase activity and monooxygenase activity, and in the pathways of chemical carcinogenesis, retinol metabolism and metabolism of xenobiotics by cytochrome P450. Then, we identified 20 hub genes from the integrated genes, but only the PROZ gene was significantly associated with the prognosis of early HCC.
PROZ encodes protein Z (PZ), a glycoprotein that, together with the protein Z-dependent protease inhibitor (ZPI), plays a critical role in blood coagulation under physiological conditions. PROZ is mainly synthesised in the liver and kidney tissues and then secreted into the blood. Although the plasma level of PZ has been reported to be decreased in cancers such as acute leukaemia and acute lymphoblastic leukaemia,18,19 the PROZ expression level was found to be increased in most cancer types, especially in advanced cancer stages.18 The function of PROZ in tumorigenesis is still unknown.
PROZ might promote cancer progression by limiting the activation of blood coagulation.20 PROZ was found to be highly expressed in pancreatic cancer compared with healthy controls and benign pancreatic controls, which indicated that PROZ might be a novel biomarker for the early diagnosis of pancreatic cancer.15 This suggested that PROZ might also be of value in the early diagnosis of HCC.
In breast cancer, the PZ protein has been reported to be strongly expressed in cancer cells compared to normal breast tissue.21,22 Similar results were also observed for endometrial cancer.23 Moreover, in gastric cancer, the mRNA expression of PROZ was detected in gastric cancer cells and not in normal tissues.24 Factor X and PZ have been demonstrated to be colocalized in gastric cancer tissues.24 The above phenomenon was also observed in colon cancer and non-small cell lung cancer.25,26 Additionally, the mRNA and protein expression levels of PROZ were elevated in lung adenocarcinoma cells compared to normal healthy lung tissues, and PROZ might serve as a prognostic biomarker for lung cancer.27
However, PROZ was found to be decreased in HCC tissues compared with control tissues and was significantly associated with overall survival in HCC.28 This was consistent with the results of this study. In our study, however, the difference in PROZ expression was found specifically by comparing early HCC tissues with normal liver tissues, and low PROZ expression was associated with poor prognosis of early HCC. Furthermore, PROZ has been shown to be hypermethylated in HCC compared to normal liver controls.29 Although hypermethylated PROZ might increase cell clonogenicity and viability in HCC cells and promote their invasion and metastasis, PROZ cannot be completely considered a tumour suppressor gene.29,30
Although we preliminarily verified PROZ expression by using the GSE12443 dataset, our study had some limitations, such as the relatively low number of early HCCs and the lack of experimental results validating PROZ. Thus, further experiments and larger sample sizes are needed to validate our results.