
Prediction and analysis of weighted genes in hepatocellular carcinoma using bioinformatics analysis.

Qifan Zhang1, Shibo Sun1, Chen Zhu2, Yujian Zheng3, Qing Cai3, Xiaolu Liang1, Haorong Xie1, Jie Zhou1.   

Abstract

The aim of the present study was to identify the differentially expressed genes (DEGs) between primary tumor tissue and adjacent non‑tumor tissue of hepatocellular carcinoma (HCC) samples in order to investigate the mechanisms of HCC. The microarray data of the datasets GSE76427, GSE84005 and GSE57957 were downloaded from the Gene Expression Omnibus database. DEGs were identified using the limma package in the R programming language. After the DEGs screened from the three datasets were intersected, 218 genes were selected for further study. A protein‑protein interaction (PPI) network was constructed using the Search Tool for the Retrieval of Interacting Genes database. Modules were constructed and analyzed using Cytoscape, and the module with the highest score was selected for further analysis. Gene Ontology enrichment analysis and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis were conducted for the genes in the PPI network and in the selected subnetwork. The network of the enriched pathways and their associated genes was constructed using Cytoscape. For the genes in the global PPI network, metabolism‑associated pathways were significantly enriched, whereas for the genes in the subnetwork, the 'cell cycle', 'oocyte meiosis' and 'DNA replication' pathways were significantly enriched. To validate the prognostic value of the weighted genes in an independent cohort, data from The Cancer Genome Atlas were obtained and Kaplan‑Meier survival analysis was conducted. 
Evidence is presented that the expression levels of aldehyde dehydrogenase 2 family member (ALDH2), cytochrome P450 family 2 subfamily C member 8 (CYP2C8), alcohol dehydrogenase 4 (class II), pi polypeptide (ADH4), alcohol dehydrogenase 1B (class I), β polypeptide (ADH1B) and cytochrome P450 family 2 subfamily C member 9 (CYP2C9) were positively associated with the overall survival of patients with HCC, whereas the expression levels of pituitary tumor‑transforming 1 (PTTG1), cell division cycle 20 (CDC20), DNA topoisomerase II α (TOP2A) and cyclin B2 (CCNB2) were negatively associated with the overall survival of patients with HCC. In conclusion, nine weighted genes involved in the development and progression of HCC were identified using bioinformatics and survival analyses.


Year: 2019; PMID: 30720105; PMCID: PMC6423588; DOI: 10.3892/mmr.2019.9929

Source DB: PubMed; Journal: Mol Med Rep; ISSN: 1791-2997; Impact factor: 2.952


Introduction

Cancer is a major public health problem worldwide, and hepatocellular carcinoma (HCC) is one of the most frequently diagnosed types of cancer. HCC is the most common form of liver cancer. It was estimated that there would be 42,220 newly diagnosed cases of liver and intrahepatic bile duct cancer in the United States in 2018 (1). HCC is a complex and heterogeneous malignancy that arises in the context of progressive underlying liver dysfunction. Hepatitis B and C viruses are the primary risk factors for HCC, and 80–90% of HCC cases are associated with chronic hepatitis B or C virus infection (2,3). Recurrence is the principal cause of HCC-associated death. Five-year recurrence rates >70% have been reported despite the use of surgical or locoregional therapies at earlier stages (4). In addition, the prognosis of patients with advanced-stage HCC is poor, with an overall survival rate <5% (5). Given the serious threat that HCC poses to human health, novel diagnostic and therapeutic methods are required for early cancer detection and effective treatment.

In recent years, a large number of genomic and proteomic studies have examined the molecular mechanisms underlying the development and progression of HCC, and this characterization has provided valuable information about this complex disease. Advances in high-throughput microarray technology have attracted considerable attention and have enabled substantial progress in reconstructing the gene regulatory networks involved in medical biology (6). Microarray analysis has revealed significant differences in gene expression levels between normal and diseased tissues. However, owing to the inherent limitations of microarray studies, including small sample sizes, measurement error and insufficient information, elucidating the disease mechanism remains a principal challenge in HCC research (7). 
Therefore, Gene Ontology (GO) analysis, pathway information, network-based approaches and machine learning algorithms have been employed to identify the mechanisms underlying the development of HCC (7). In the present study, microarray data were obtained from the Gene Expression Omnibus (GEO) database, and the differentially expressed genes (DEGs) between primary tumor (PT) tissue and adjacent non-tumor tissue (ANTT) of HCC samples were identified. In total, nine significant target genes for the diagnosis of HCC were identified on the basis of GO processes, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, protein-protein interaction (PPI) networks and survival analysis of the clinical information from The Cancer Genome Atlas (TCGA) database.

Materials and methods

Datasets

All of the datasets included in the present study were obtained from the GEO database of the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov/gds/). The original mRNA expression profiles were obtained from the GSE76427 (8), GSE84005 (unpublished) and GSE57957 (9) datasets. The GSE76427 dataset comprised 167 total RNA samples, consisting of 115 PT tissue samples and 52 ANTT samples derived from 115 patients with HCC, and was generated on the GPL10558 Illumina HumanHT-12 V4.0 Expression BeadChip platform. In the GSE84005 dataset, PT tissue samples and ANTT samples were obtained from 38 patients with primary HCC, and the platform used was the GPL5175 Affymetrix Human Exon 1.0 ST Array. In the GSE57957 dataset, total RNA was obtained from 39 PT tissue samples and ANTT samples, and the platform used was the GPL10558 Illumina HumanHT-12 V4.0 Expression BeadChip. Clinical information on the patients was included in all three datasets (data not shown).

Identification of DEGs

Background correction and quantile normalization of the downloaded data were performed using the robust multi-array average algorithm (10). Probes without a corresponding gene symbol were then removed, and for gene symbols matched by multiple probes, the average expression value was used. Student's t-test and fold change (FC) filtering were conducted to screen the DEGs between the two groups using the limma package (version 3.32.3) in R software (version 3.4.2; www.R-project.org/) (11,12). DEGs with statistical significance between PT tissue samples and ANTT samples were identified with thresholds of P<0.05 and |log2 FC| >1 (i.e., an FC >2 in either direction), and volcano plots were drawn using the R ggplot2 package. The DEGs from the three datasets were then intersected, and genes with inconsistent directions of change across the datasets were eliminated.
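The probe collapsing, threshold filtering and direction-consistent intersection described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' R/limma pipeline: it does not reproduce limma's statistics and instead assumes each dataset has already been reduced to a per-gene (log2 FC, P-value) pair, it interprets the FC >2 threshold as |log2 FC| >1, and the function names (collapse_probes, call_degs, consistent_intersection) and gene names are hypothetical.

```python
from statistics import mean

# Assumed thresholds: P < 0.05 and |log2 FC| > 1 (an FC > 2 in either direction).
P_CUTOFF = 0.05
LOG2FC_CUTOFF = 1.0


def collapse_probes(probe_rows, probe_to_symbol):
    """Drop probes without a gene symbol and average probes that share a symbol.

    probe_rows: {probe_id: [value in sample 1, value in sample 2, ...]}
    probe_to_symbol: {probe_id: gene symbol}; a missing or empty symbol means unannotated.
    """
    grouped = {}
    for probe, values in probe_rows.items():
        symbol = probe_to_symbol.get(probe)
        if not symbol:
            continue  # no corresponding gene symbol: probe is filtered out
        grouped.setdefault(symbol, []).append(values)
    # Sample-by-sample mean over all probes mapped to the same gene symbol.
    return {sym: [mean(col) for col in zip(*rows)] for sym, rows in grouped.items()}


def call_degs(stats):
    """Map {gene: (log2_fc, p_value)} to {gene: 'up' or 'down'} for genes passing both cutoffs."""
    calls = {}
    for gene, (log2_fc, p_value) in stats.items():
        if p_value < P_CUTOFF and abs(log2_fc) > LOG2FC_CUTOFF:
            calls[gene] = "up" if log2_fc > 0 else "down"
    return calls


def consistent_intersection(*dataset_calls):
    """Keep genes called as DEGs in every dataset with the same direction of change."""
    shared = set.intersection(*(set(calls) for calls in dataset_calls))
    result = {}
    for gene in shared:
        directions = {calls[gene] for calls in dataset_calls}
        if len(directions) == 1:  # discard genes whose direction differs between datasets
            result[gene] = directions.pop()
    return result


# Toy input with hypothetical genes and values, not data from the study.
dataset_a = call_degs({"GENE_A": (2.1, 0.001), "GENE_B": (-1.8, 0.01), "GENE_C": (0.4, 0.01)})
dataset_b = call_degs({"GENE_A": (1.6, 0.02), "GENE_B": (1.3, 0.03), "GENE_D": (-2.5, 0.2)})
print(consistent_intersection(dataset_a, dataset_b))  # → {'GENE_A': 'up'}
```

GENE_B passes both cutoffs in each toy dataset but changes direction between them, so it is eliminated, mirroring the removal of DEGs with inconsistent expression tendencies.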

PPI and the module analysis

The Search Tool for the Retrieval of Interacting Genes (STRING) database (string-db.org) provides comprehensive coverage of, and easy access to, both experimental and predicted interaction information. To better understand the interactions among the DEGs, a PPI network was built based on information from the STRING database, with a combined score >0.4 set as the reliability threshold. Cytoscape is a useful tool for the integrated analysis and visualization of biological networks; Cytoscape software (version 3.4) was used to visualize the PPI network (13). The network topology was analyzed using the CentiScaPe app (14), and module analysis was conducted using the MCODE app (15).
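A minimal sketch of how the combined-score threshold, the removal of isolated nodes and the degree ranking reported in the Results fit together. It assumes the interactions have already been exported from STRING as (protein A, protein B, combined score) triples with scores scaled to 0-1; the helper names and the toy interactions are hypothetical, and this is not how CentiScaPe computes its metrics internally.

```python
from collections import defaultdict


def build_ppi(interactions, genes, min_score=0.4):
    """Build an undirected PPI network restricted to the query genes.

    interactions: iterable of (protein_a, protein_b, combined_score), score in [0, 1].
    Only edges with combined_score > min_score between two query genes are kept.
    Genes without any retained edge (isolated nodes) never appear in the result.
    """
    genes = set(genes)
    adjacency = defaultdict(set)
    for a, b, score in interactions:
        if a == b or score <= min_score:
            continue
        if a in genes and b in genes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return dict(adjacency)


def degree_ranking(adjacency, top=5):
    """Return the `top` nodes by degree (number of neighbours); ties broken alphabetically."""
    degrees = {node: len(neighbours) for node, neighbours in adjacency.items()}
    return sorted(degrees.items(), key=lambda item: (-item[1], item[0]))[:top]


# Hypothetical interactions, not taken from STRING.
edges = [("A", "B", 0.9), ("A", "C", 0.95), ("B", "C", 0.99), ("A", "D", 0.3), ("D", "E", 0.8)]
network = build_ppi(edges, genes={"A", "B", "C", "D", "E", "F"})
n_edges = sum(len(nbrs) for nbrs in network.values()) // 2
print(len(network), n_edges, degree_ranking(network, top=3))
```

Here the A-D interaction falls below the 0.4 threshold and gene F has no partner, so the toy network keeps five nodes and four edges.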

Gene function analysis

GO enrichment analysis of the DEGs was performed using DAVID (http://david.abcc.ncifcrf.gov/). GO terms in the ‘biological process’, ‘cellular component’ and ‘molecular function’ categories with P<0.05 were considered significantly enriched by the DEGs. KEGG (http://www.genome.jp/kegg/) is a database resource for understanding the high-level functions and utilities of biological systems. DAVID was also used to test the statistical enrichment of differentially expressed genes or target genes of microRNA in KEGG pathways. The networks of the pathways (P<0.05) and pathway-associated genes were constructed using the Cytoscape (version 3.4.0) plugin ClueGO (16) together with the CluePedia app (17). The network topology was analyzed using the CentiScaPe app (14). Genes associated with at least three pathways (degree ≥3) were defined as cross-talk genes.
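The cross-talk definition amounts to counting, for each gene, how many significantly enriched pathways it is linked to in the pathway-gene network. A minimal Python sketch, assuming the significant pathways and their member genes are available as a dictionary; the pathway and gene names below are placeholders, not KEGG annotations, and the helper names are hypothetical.

```python
from collections import Counter


def pathway_degrees(pathway_genes):
    """Degree of each gene in the pathway-gene network (number of pathways it belongs to)."""
    return Counter(gene for genes in pathway_genes.values() for gene in set(genes))


def cross_talk_genes(pathway_genes, min_pathways=3):
    """Genes associated with at least `min_pathways` significant pathways (degree >= 3 by default)."""
    degrees = pathway_degrees(pathway_genes)
    return sorted(gene for gene, degree in degrees.items() if degree >= min_pathways)


significant = {  # hypothetical pathway -> genes mapping (P < 0.05 pathways only)
    "pathway_1": {"GENE_A", "GENE_B", "GENE_C"},
    "pathway_2": {"GENE_A", "GENE_B"},
    "pathway_3": {"GENE_A", "GENE_C"},
    "pathway_4": {"GENE_A", "GENE_B", "GENE_D"},
}
print(cross_talk_genes(significant))  # → ['GENE_A', 'GENE_B']
```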

TCGA datasets analysis

TCGA is a platform from which researchers can download and analyze freely available public datasets. In the present study, the prognostic value of the weighted genes was assessed by Kaplan-Meier survival analysis based on the clinical information from TCGA datasets, using OncoLnc (18). For the statistical analysis, the patients were divided into two groups according to gene expression: patients with expression values greater than the median were classified into the high expression group, and the remaining patients were classified into the low expression group. Differences in overall survival between the groups were evaluated using the log-rank test.
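The median split and the two-group comparison can be illustrated with a self-contained Kaplan-Meier estimator and log-rank test. This is a sketch for clarity, assuming each patient is represented as a (survival time, event flag, expression value) tuple; it is not OncoLnc's implementation, and the function names and toy records are invented.

```python
import math
from statistics import median


def median_split(patients):
    """patients: list of (time, event, expression); event = 1 for death, 0 for censored.
    Expression above the median -> high group; all remaining patients -> low group."""
    cutoff = median(expr for _, _, expr in patients)
    high = [(t, e) for t, e, expr in patients if expr > cutoff]
    low = [(t, e) for t, e, expr in patients if expr <= cutoff]
    return high, low


def kaplan_meier(group):
    """Return [(event time, estimated survival probability just after that time)]."""
    survival, curve = 1.0, []
    for t in sorted({t for t, e in group if e}):
        at_risk = sum(1 for ti, _ in group if ti >= t)
        deaths = sum(1 for ti, e in group if ti == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


def log_rank(group1, group2):
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 degree of freedom)."""
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({t for t, e in group1 + group2 if e}):
        n1 = sum(1 for ti, _ in group1 if ti >= t)
        n2 = sum(1 for ti, _ in group2 if ti >= t)
        d1 = sum(1 for ti, e in group1 if ti == t and e)
        d = d1 + sum(1 for ti, e in group2 if ti == t and e)
        n = n1 + n2
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            variance += d * (n1 / n) * (n2 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Upper tail of a chi-square with 1 df equals P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Invented records: (months, death flag, expression value).
patients = [(5, 1, 9.1), (8, 1, 8.7), (12, 0, 7.9), (20, 1, 3.2), (30, 0, 2.5), (36, 0, 1.8)]
high, low = median_split(patients)
print(kaplan_meier(high))
print(log_rank(high, low))
```

Patients censored at an event time are counted as still at risk at that time, which is the usual Kaplan-Meier convention.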

Results

Screening of DEGs

With thresholds of P<0.05 and |log2 FC| >1, DEGs were identified in each of the three datasets. A total of 494 DEGs were screened from the GSE76427 dataset, comprising 92 upregulated and 402 downregulated genes. A total of 1,005 DEGs were screened from the GSE84005 dataset, comprising 478 upregulated and 527 downregulated genes, and 417 DEGs were screened from the GSE57957 dataset, comprising 109 upregulated and 308 downregulated genes. Volcano plots were used to visualize the differential expression of genes between the tumor and non-tumor groups (Fig. 1A-C). After the DEGs from the three datasets were intersected, 218 DEGs were selected for further analysis (Fig. 1D).
Figure 1.

Identification of DEGs. Volcano plots of the gene expression data from the (A) GSE76427, (B) GSE84005 and (C) GSE57957 datasets. The horizontal axis represents the log2 (fold change) and the vertical axis represents the -log10 (P-value). Blue points represent the selected DEGs. (D) Intersection of the DEGs from the three datasets, from which 218 DEGs were selected for further analysis. DEGs, differentially expressed genes.

Construction of the PPI network and module analysis

After the DEGs with inconsistent directions of change and the isolated nodes were removed, the PPI network comprised 170 nodes and 666 edges (Fig. 2A). The five DEGs with the highest degrees were DNA topoisomerase II α (TOP2A; degree=42), cytochrome P450 family 2 subfamily E member 1 (CYP2E1; degree=27), cell division cycle 20 (CDC20; degree=22), cytochrome P450 family 2 subfamily C member 9 (CYP2C9; degree=22) and kynurenine 3-monooxygenase (KMO; degree=22). In addition, a total of 12 modules were identified, and the module with the highest MCODE score (17.765) was selected for further analysis (Fig. 2B). This module contained 18 DEGs, including TOP2A, all of which were upregulated.
Figure 2.

Protein-protein interaction network of the DEGs. (A) The global network and (B) the module with the highest score obtained from the global network. The red nodes represent the downregulated DEGs and the blue nodes indicate the upregulated DEGs. The molecular interactions between the DEGs are indicated by edges, and darker edges indicate higher combined scores. DEGs, differentially expressed genes.
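The MCODE score used to rank the modules can be illustrated with a short function. In the MCODE algorithm (15), a cluster's score is its density multiplied by its number of nodes; the sketch below assumes density is computed without self-loops (edges divided by n(n-1)/2) and covers only this final scoring step, not MCODE's vertex weighting or cluster-finding stages. The function name is hypothetical.

```python
from itertools import combinations


def mcode_cluster_score(nodes, edges):
    """Cluster score = density x number of nodes, with density = |E| / (|V|(|V|-1)/2).

    Self-loops and edges leaving the cluster are ignored; duplicate edges count once.
    """
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = {frozenset((a, b)) for a, b in edges if a != b and a in nodes and b in nodes}
    density = len(internal) / (n * (n - 1) / 2)
    return density * n


# A complete 4-node cluster has density 1, so its score equals its size.
k4 = list(combinations("ABCD", 2))
print(mcode_cluster_score("ABCD", k4))  # → 4.0
```

Under this definition the score simplifies to 2|E|/(|V|-1), so for an 18-node cluster a score of 17.765 would correspond to about 151 of the 153 possible edges. This back-calculation is arithmetic on the reported values under the no-loop assumption, not an edge count reported in the study.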

GO and KEGG enrichment analysis of the selected DEGs

GO and KEGG enrichment analyses were performed for the 218 selected mRNAs to investigate the biological functions of these genes. In the GO analysis, all of the results were ranked by enrichment score [-log (P-value)], and the top 10 results of each category are presented in Fig. 3. In the ‘biological process’ category, ‘oxidation-reduction process’, ‘drug metabolic process’ and ‘epoxygenase P450 pathway’ were the top three enriched terms. In the ‘cellular component’ category, ‘organelle membrane’, ‘extracellular region’ and ‘extracellular space’ were the top three enriched terms. In the ‘molecular function’ category, ‘heme binding’, ‘oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen’ and ‘monooxygenase activity’ were the top three enriched terms. The results of the KEGG pathway analysis were also ranked by enrichment score, and the pathways with P<0.05 are shown in Fig. 4A. The top three enriched pathways were ‘metabolic pathways’, ‘retinol metabolism’ and ‘chemical carcinogenesis’. The KEGG pathway network composed of the significantly enriched pathways (Fig. 4B) revealed that a number of metabolism-associated pathways were enriched. The network constructed with the pathways and their associated genes (Fig. 4C) revealed 12 cross-talk genes associated with at least three pathways: cytochrome P450 family 1 subfamily A member 2 (CYP1A2); alcohol dehydrogenase 1B (class I), β polypeptide (ADH1B); alcohol dehydrogenase 4 (class II), pi polypeptide (ADH4); cytochrome P450 family 3 subfamily A member 4 (CYP3A4); cytochrome P450 family 2 subfamily A member 6 (CYP2A6); CYP2C9; CYP2E1; cytochrome P450 family 2 subfamily C member 8 (CYP2C8); alanine-glyoxylate aminotransferase 2 (AGXT2); aldehyde dehydrogenase 2 family member (ALDH2); N-acetyltransferase 2 (NAT2); and UDP glucuronosyltransferase family 2 member B10 (UGT2B10). 
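To make the enrichment score concrete: a common way to obtain the P-value for a term is a one-sided hypergeometric (Fisher's exact) test of the overlap between the gene list and the genes annotated to the term, with the enrichment score taken as -log10(P). DAVID reports a conservatively modified Fisher's exact P-value (the EASE score), so P-values from this sketch would be somewhat smaller than DAVID's. The log base, the function names and the top-10 ranking helper are illustrative assumptions.

```python
import math


def hypergeom_p(k, list_size, term_size, population):
    """One-sided P(X >= k): chance of at least k term genes in a random list of `list_size`
    genes drawn from `population` genes, of which `term_size` are annotated to the term."""
    upper = min(term_size, list_size)
    hits = sum(
        math.comb(term_size, i) * math.comb(population - term_size, list_size - i)
        for i in range(k, upper + 1)
    )
    return hits / math.comb(population, list_size)


def enrichment_score(p_value):
    """Enrichment score as -log10(P); larger means more significant."""
    return -math.log10(p_value)


def top_terms(term_counts, list_size, population, n=10):
    """term_counts: {term: (genes from the list in the term, genes in the term overall)}.
    Returns the n terms with the highest enrichment scores."""
    scored = {
        term: enrichment_score(hypergeom_p(k, list_size, size, population))
        for term, (k, size) in term_counts.items()
    }
    return sorted(scored, key=scored.get, reverse=True)[:n]


# Toy counts (hypothetical terms): a 5-gene list drawn from a 10-gene population.
toy = {"term_1": (5, 5), "term_2": (1, 5), "term_3": (3, 5)}
print(top_terms(toy, list_size=5, population=10, n=2))  # → ['term_1', 'term_3']
```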
KEGG pathway analysis was also conducted for the 18 DEGs in the subnetwork, which revealed that the ‘cell cycle’, ‘oocyte meiosis’ and ‘DNA replication’ pathways were significantly enriched (Fig. 5). In the KEGG pathway network, CDC20, cyclin B2 (CCNB2) and pituitary tumor-transforming 1 (PTTG1) were associated with ‘cell cycle’ and ‘oocyte meiosis’, and minichromosome maintenance complex component 2 (MCM2) was associated with ‘cell cycle’ and ‘DNA replication’.
Figure 3.

Top 10 enrichment scores in the Gene Ontology enrichment analysis of the selected mRNAs. Red bars represent ‘biological process’ terms, green bars represent ‘cellular component’ terms and blue bars represent ‘molecular function’ terms.

Figure 4.

KEGG pathway enrichment analysis of the selected mRNAs. (A) The significantly enriched pathways (P<0.05) in the KEGG pathway analysis of the selected mRNAs. (B) The network of significantly enriched pathways (P<0.05) without associated genes. (C) The network of significantly enriched pathways (P<0.05) with associated genes. In (B) and (C), larger nodes represent larger enrichment scores and different colors represent different enrichment modules. KEGG, Kyoto Encyclopedia of Genes and Genomes.

Figure 5.

KEGG pathway enrichment analysis of the genes involved in the subnetwork. The network of the significantly enriched pathways (P<0.05) with their associated genes. Larger nodes represent larger enrichment scores and the different colors represent different enrichment modules. KEGG, Kyoto Encyclopedia of Genes and Genomes; AURKA, aurora kinase A; CDC20, cell division cycle 20; CCNB2, cyclin B2; PTTG1, pituitary tumor-transforming 1; RFC4, replication factor C subunit 4; MCM2, minichromosome maintenance complex component 2.

To validate the prognostic value of the weighted genes in an independent cohort, data from TCGA were used and Kaplan-Meier survival analysis was performed. The log-rank test showed that high expression levels of PTTG1, CDC20, TOP2A and CCNB2 and low expression levels of ALDH2, CYP2C8, ADH4, ADH1B and CYP2C9 were associated with shorter overall survival in patients with HCC (Fig. 6).
Figure 6.

Kaplan-Meier survival analysis of the expression levels of the weighted genes and the overall survival of patients with HCC. Expression levels of (A) ALDH2, (B) PTTG1, (C) CYP2C8, (D) ADH4, (E) ADH1B, (F) CYP2C9, (G) CDC20, (H) TOP2A and (I) CCNB2 are presented. HCC, hepatocellular carcinoma; ALDH2, aldehyde dehydrogenase 2 family member; CYP2C8, cytochrome P450 family 2 subfamily C member 8; ADH4, alcohol dehydrogenase 4 (class II), pi polypeptide; ADH1B, alcohol dehydrogenase 1B (class I), β polypeptide; CYP2C9, cytochrome P450 family 2 subfamily C member 9; PTTG1, pituitary tumor-transforming 1; CDC20, cell division cycle 20; TOP2A, DNA topoisomerase II α; CCNB2, cyclin B2.

Discussion

HCC is one of the leading causes of cancer-associated mortality worldwide, owing to limited insight into its pathogenesis and the unsatisfactory efficacy of current therapies. Even after curative surgical treatment, recurrence is common. Sorafenib and regorafenib, two oral multi-kinase inhibitors, are the only therapeutic agents that have been demonstrated to be effective in the treatment of advanced HCC (19,20); thus, novel curative approaches are urgently required. The human cytochrome P450 (CYP) enzymes form a superfamily comprising >50 different genes categorized into 18 families, which share ~40% sequence homology (21). CYPs are mainly expressed in the liver, where their primary role is the metabolism of xenobiotics, protecting the organism from foreign compounds and environmental toxins (22). Owing to the important role of CYPs in drug metabolism, alterations in CYP activity caused by HCC may influence the pharmacokinetics of drugs used to treat HCC. A number of studies have reported that dysregulation of CYPs, including CYP2A6, CYP2C9, CYP2E1 and CYP3A5, serves an important role in the development of HCC (23–25). In the present study, low expression of CYP2C8 and CYP2C9 was associated with poor prognosis, and all of the CYPs in the PPI network were downregulated. The expression levels of four of the genes screened from the subnetwork, PTTG1, CDC20, TOP2A and CCNB2, were negatively associated with the overall survival of patients with HCC. The prognostic value of TOP2A in HCC has been highlighted previously; Wong et al (26) found that overexpression of TOP2A in HCC was associated with early-age onset, shorter survival and chemoresistance. The results of the KEGG pathway analysis in the present study demonstrated that CDC20, CCNB2 and PTTG1 were associated with ‘cell cycle’ and ‘oocyte meiosis’, suggesting that overexpression of these genes may contribute to the proliferation of tumor cells. 
Previous studies have reported that increased expression levels of CDC20 and PTTG1 are associated with the development and progression of HCC (27,28); however, the association between CCNB2 and HCC remains largely unknown. The prognostic value of PTTG1, CDC20 and CCNB2 requires further evaluation, and the interactions among the genes in the subnetwork require further investigation. The systematic bioinformatics analysis and the survival analysis of the clinical information from TCGA identified three additional key genes involved in HCC: ALDH2, ADH4 and ADH1B. ADH4 is an important member of the ADH family that metabolizes a wide variety of substrates, including ethanol and retinol (29). Previous studies have demonstrated that ADH4 is involved in cancer, including HCC (30,31). In the liver, mitochondrial ALDH2 removes toxic aldehydes, including acetaldehyde, an intermediate of ethanol metabolism, and ALDH2 mutation increases protein turnover and promotes HCC in mice (32). To the best of the authors' knowledge, no study has examined the association between ADH1B and HCC. In conclusion, nine weighted genes involved in the development and progression of HCC were identified using bioinformatics analysis and survival analysis. However, further experimental verification is required to confirm the potential roles of the weighted genes in HCC.

1.  Identification of ADH4 as a novel and potential prognostic marker in hepatocellular carcinoma.

Authors:  Rong-Rong Wei; Mei-Yin Zhang; Hui-Lan Rao; Heng-Ying Pu; Hui-Zhong Zhang; Hui-Yun Wang
Journal:  Med Oncol       Date:  2011-12-07       Impact factor: 3.064

2.  Insight into hepatocellular carcinogenesis at transcriptome level by comparing gene expression profiles of hepatocellular carcinoma with those of corresponding noncancerous liver.

Authors:  X R Xu; J Huang; Z G Xu; B Z Qian; Z D Zhu; Q Yan; T Cai; X Zhang; H S Xiao; J Qu; F Liu; Q H Huang; Z H Cheng; N G Li; J J Du; W Hu; K T Shen; G Lu; G Fu; M Zhong; S H Xu; W Y Gu; W Huang; X T Zhao; G X Hu; J R Gu; Z Chen; Z G Han
Journal:  Proc Natl Acad Sci U S A       Date:  2001-12-18       Impact factor: 11.205

Review 3.  Hepatocellular carcinoma--epidemiological trends and risk factors.

Authors:  Kerstin Schütte; Jan Bornschein; Peter Malfertheiner
Journal:  Dig Dis       Date:  2009-06-22       Impact factor: 2.404

Review 4.  Sorafenib for HCC: a pragmatic perspective.

Authors:  Nataliya Razumilava; Gregory J Gores
Journal:  Oncology (Williston Park)       Date:  2011-03       Impact factor: 2.990

5.  Patterns of expression of cytochrome P450 genes in progression of hepatitis C virus-associated hepatocellular carcinoma.

Authors:  Ryouichi Tsunedomi; Norio Iizuka; Yoshihiko Hamamoto; Shunji Uchimura; Takanobu Miyamoto; Takao Tamesa; Toshimasa Okada; Norikazu Takemoto; Motonari Takashima; Kazuhiko Sakamoto; Kenji Hamada; Hisafumi Yamada-Okabe; Masaaki Oka
Journal:  Int J Oncol       Date:  2005-09       Impact factor: 5.650

6.  Alcohol dehydrogenase genes: restriction fragment length polymorphisms for ADH4 (pi-ADH) and ADH5 (chi-ADH) and construction of haplotypes among different ADH classes.

Authors:  K Edman; W Maret
Journal:  Hum Genet       Date:  1992-12       Impact factor: 4.132

7.  Tissue microarrays for high-throughput molecular profiling of tumor specimens.

Authors:  J Kononen; L Bubendorf; A Kallioniemi; M Bärlund; P Schraml; S Leighton; J Torhorst; M J Mihatsch; G Sauter; O P Kallioniemi
Journal:  Nat Med       Date:  1998-07       Impact factor: 53.440

8.  TOP2A overexpression in hepatocellular carcinoma correlates with early age onset, shorter patients survival and chemoresistance.

Authors:  Nathalie Wong; Winnie Yeo; Wai-Lap Wong; Navy L-Y Wong; Kathy Y-Y Chan; Frankie K-F Mo; Jane Koh; Stephan Lam Chan; Anthony T-C Chan; Paul B-S Lai; Arthur K-K Ching; Joanna H-M Tong; Ho-Keung Ng; Philip J Johnson; Ka-Fai To
Journal:  Int J Cancer       Date:  2009-02-01       Impact factor: 7.396

9.  Methylation profiles reveal distinct subgroup of hepatocellular carcinoma patients with poor prognosis.

Authors:  Way-Champ Mah; Thomas Thurnherr; Pierce K H Chow; Alexander Y F Chung; London L P J Ooi; Han Chong Toh; Bin Tean Teh; Yogen Saunthararajah; Caroline G L Lee
Journal:  PLoS One       Date:  2014-08-05       Impact factor: 3.240

10.  Tumor-adjacent tissue co-expression profile analysis reveals pro-oncogenic ribosomal gene signature for prognosis of resectable hepatocellular carcinoma.

Authors:  Oleg V Grinchuk; Surya P Yenamandra; Ramakrishnan Iyer; Malay Singh; Hwee Kuan Lee; Kiat Hon Lim; Pierce Kah-Hoe Chow; Vladamir A Kuznetsov
Journal:  Mol Oncol       Date:  2017-12-12       Impact factor: 6.603
