Literature DB >> 34760691

Bioinformatics Analysis Using ATAC-seq and RNA-seq for the Identification of 15 Gene Signatures Associated With the Prediction of Prognosis in Hepatocellular Carcinoma.

Hui Yang1, Gang Li1, Guangping Qiu1.   

Abstract

BACKGROUND: Gene expression (RNA-seq) and overall survival (OS) in TCGA were combined using chromosome accessibility (ATAC-seq) to search for key molecules affecting liver cancer prognosis.
METHODS: We used the assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) to analyse chromatin accessibility in the promoter regions of whole genes in liver hepatocellular carcinoma (LIHC) and then screened differentially expressed genes (DEGs) at the mRNA level by transcriptome sequencing technology (RNA-seq). We obtained genes significantly associated with overall survival (OS) by a one-way Cox analysis. The three were screened by taking intersection and further using a Kaplan-Meier (KM) for validation. A prognostic model was constructed using the obtained genes by LASSO regression analysis.The expression of these genes in hepatocellular carcinomas was then analysed. The protein expression of these genes was verified using the Human Protein Atlas(HPA) online datasets and immunohistochemistry.
RESULTS: ATAC-seq, RNA-seq and survival analysis, combined with a LASSO prediction model, identified signatures of 15 genes (PRDX6, GCLM, HTATIP2, SEMA3F, UCK2, NOL10, KIF18A, RAP2A, BOD1, GDI2, ZIC2, GTF3C6 SLC1A5, ERI3 and SAC3D1), all of which were highly expressed in hepatocellular carcinoma. The LASSO prognostic model showed that this risk score had high predictive accuracy for the survival prognosis at 1, 3 and 5 years. A KM curve analysis showed that high expression of all 15 gene signatures was significantly associated with a poor prognosis in LIHC patients. HPA analysis of protein expression showed that PRDX6, GCLM, HTATIP2, NOL10, KIF18A, RAP2A and GDI2 were highly expressed in the hepatocellular carcinoma tissues compared with normal control tissues.
CONCLUSIONS: PRDX6, GCLM, HTATIP2, SEMA3F, UCK2, NOL10, KIF18A, RAP2A, BOD1, GDI2, ZIC2, GTF3C6, SLC1A5, ERI3 and SAC3D1 may affect the prognosis of LIHC.
Copyright © 2021 Yang, Li and Qiu.

Entities:  

Keywords:  ATAC-seq; LASSO model; chromatin accessibility; hepatocellular liver cancer; prognosis

Year:  2021        PMID: 34760691      PMCID: PMC8573251          DOI: 10.3389/fonc.2021.726551

Source DB:  PubMed          Journal:  Front Oncol        ISSN: 2234-943X            Impact factor:   6.244


Introduction

According to Global Cancer Statistics 2020, hepatocellular liver cancer has the seventh-highest incidence rate but the second-highest mortality rate after lung cancer. As a common malignancy, with potentially fatal consequences, hepatocellular carcinomas have been widely studied (1). Although much research has focused on understanding hepatocellular cancer at the molecular level and therapeutic strategies have been developed, the biological mechanisms of hepatocarcinogenesis remain unclear. Due to slow progress in liver cancer research (2, 3), the patient survival rate remains low (i.e. < 8 months) (4, 5). Liver cancer is a more complex disease than other cancers, as its progression includes genetic modification processes, including gene mutations, gene deletions, translocations and DNA methylation (6). Therefore, early diagnosis and treatment are essential for improving the prognosis of liver cancer. To date, the only effective diagnostic method is detection of the serum tumour marker alpha-fetoprotein, which has an upper limit of normal value of 20 ng/mL (7). However, alpha-fetoprotein is nonspecific and has little statistical significance when detected in patients with different types of liver cancer (8). It is also ineffective for the diagnosis of early-stage liver cancer (9). In eukaryotic cells, nuclear DNA and proteins combine to form chromatin, which then undergoes complex and orderly folding to form chromosomes (10). For genes to be expressed, chromatin must be in an open conformation. Open chromatin allows regulatory proteins to bind to DNA and regulate DNA function (11). The assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) enables high-throughput sequencing of open chromatin regions with the help of transposases. This simple method, which is very similar to ChIP-seq, requires only a small number of samples to obtain clear and reproducible sequencing results (12). ATAC-seq detects chromatin accessibility of related genes and indicates their regulatory mechanisms. Thus, genes with chromatin accessibility in promoters are more likely to be differentially expressed at the mRNA level and regulated by transcription factors (13). Due to the complexity of gene expression regulatory mechanisms, it is crucial to be able to probe biological questions at different levels. Therefore, the integration and analysis of multi-omics is increasingly important. Differentially expressed genes (DEGs) can be analysed using RNA-seq and Chip-seq (14). Using Chip-seq, the regulatory role of specific transcription factors can be studied (15). ATAC-seq can shed light on the dynamics of chromatin accessibility. As chromatin accessibility is closely related to the binding of regulatory elements or transcription factors, it plays an important role in transcriptional regulation (12). Therefore, integration analysis can further explore the key factors of a biological process, as well as the target genes of a transcription factor. Currently, integration analysis studies combining ATAC-seq and RNA-seq are uncommon, with no such studies conducted on hepatocellular carcinomas. Therefore, in this study, we constructed a 15 gene signatures for predicting the prognosis in hepatocellular carcinoma patients by analysing and integrating ATAC-seq and RNA-seq.

Methods

Data Sources

ATAC-seq data on LIHC were obtained from the database of the University of California Santa Cruz (UCSC) (https://xenabrowser.net/datapages/). In total, ATAC-seq data of 404 LIHC samples were obtained, 17 of which were from TCGA database. The data were downloaded in promoter peak data format, with normalized correction. The calibration process included count conversion to CPM, after a base 2 logarithmic transformation. RNA-seq data on LIHC were downloaded from the TCGA database, with 371 tumour samples and 50 para cancer samples.

Chromatin Accessibility Analysis Using ATAC-seq

To explore the accessibility of chromatin, we first used the R package chromosome locator to show the peak regions on chromosomes. Peaks that could be mapped to TSS regions were aligned using the R-packaged ChIPseeker to construct a marker matrix. The nearest TSS region was selected for peak annotation. The annotation information was obtained from the R software. The relationship between open chromatin and promoter regions was revealed by UpSet plots.

Analysis of DEGs Using RNA-seq

To assets differential expression of mRNA, the Limma package of R software (version: 3.40.2) was used. Adjusted P values in the TCGA dataset were analysed to correct for false-positive results. DEGs were obtained by screening with |log2(FC)| > 1 (P < 0.05). Heat and volcano plots were plotted using the R package ggplot2.

Gene Oncology and Kyoto Encyclopedia of Genes and Genomes Enrichment Analysis

Peaks-associated genes were analysed by functional enrichment analysis. The ClusterProfiler package in R was used to analyse the GO/KEGG enrichment pathway of potential peaks-associated genes.

Survival Analysis

The R package Survival was used for the survival analysis. The correlation between the expression levels of all known genes in LIHC and the overall survival (OS) of patients with hepatocellular liver cancer was analysed by a one-way Cox regression analysis, reporting hazard ratios (HRs) and their 95% confidence intervals (CIs). A Kaplan–Meier (KM) test was performed to analyse the difference between the survival of patients with high and low gene expression.

LASSO Model

To compare survival differences between multiple groups, the log-rank test was used to test the KM survival analysis, and ROC analysis was used to compare gene prediction accuracy and risk scores. A LASSO regression was used for feature selection, with 10-fold cross-validation. The R package glmnet was used for the above analyses.

Immune Cell Infiltration

The TIMER database (http://timer.comp-genomics.org/) was used to analyse the correlation of gene expression in LIHC with the level of immune cell infiltration.

Protein Expression Validation

Immunohistochemical staining maps of gene signatures for protein expression in both liver cancer tissue and normal tissue were downloaded from The Human Protein Atlas (HPA) database.

Results

Identification of Chromatin Open Regions by ATAC-seq

We mapped the genomic coordinates from the Peak data to the 23 chromosomes of the human genome ( ). As can be seen, most regions of each chromosome are covered, with some chromosomes, such as chr13, chr14, chr21 and chr22, having less coverage of the short arms. shows that most of the peaks are concentrated at a distance of 10–100 kb from the TSS. Among these, the binding site tend to be distributed more at the 3’ end of the TSS. shows a large proportion of ATAC-peaks are located close to TSS, which means that the TSS tends to bind to transcription factors.
Figure 1

Identification of chromatin open regions by ATAC-seq method. [(A) genome coverage; (B) distribution map of transcription factors and TSS; (C) enrichment of Peaks in the TSS region].

Identification of chromatin open regions by ATAC-seq method. [(A) genome coverage; (B) distribution map of transcription factors and TSS; (C) enrichment of Peaks in the TSS region].

Genomic Characterization and Enrichment Analysis

Using the annotation file, we annotated the genomic coordinates corresponding to Peaks. shows the proportion of different components. As can be seen, 44% of the binding sites are in the distal intergenic region, with only 10% bound within the 3 kb region upstream and downstream of the TSS, mainly because the TSS region constitutes a small proportion of the whole genome compared to other regions. summarizes the relative enriched proportions of coding regions, intergenic regions, introns, exons and upstream and downstream regions. As shown in the figure, the downstream and distal regions have the highest proportions. show the GO function enrichment analysis and KEGG pathway enrichment analysis of the genes corresponding to the TSS binding sites, respectively. These show that most of the TSS binding sites are located in genes associated with regulation of cell morphology, intracellular transportation and kinase activity involved in the regulation of infection, cancer, stem cell pluripotency, the cell cycle and other related functional pathways. These functional pathways have been shown to be involved in cancer development.
Figure 2

Genome characterization and enrichment analysis. [(A) location distribution of Peaks on the genome; (B) relative proportions of gene coding regions, intergenic regions, introns, exons, and upstream and downstream regions; (C) KEGG pathway enrichment analysis; (D) GO functional annotation].

Genome characterization and enrichment analysis. [(A) location distribution of Peaks on the genome; (B) relative proportions of gene coding regions, intergenic regions, introns, exons, and upstream and downstream regions; (C) KEGG pathway enrichment analysis; (D) GO functional annotation].

DEG Screening

Differential expressed analysis of RNA-seq was performed on 371 LIHC tumour samples and 50 para tumour samples from TCGA database. Heat maps of the expression of each gene in each sample were drawn ( ). Volcano maps show the upregulated genes (N = 2,371) and downregulated genes (N = 544) obtained from the screening ( ). A one-way Cox analysis was used to derive 4,785 genes significantly associated with the prognosis of LIHC. The P values, risk factor HRs and CI column line table for the top 20 genes expressed, in addition to prognosis-related characteristics, are shown in . Finally, 190 overlapping genes were obtained by screening reproducible genes in ATAC-seq, DEGs in RNA-seq and prognosis-related genes ( ).
Figure 3

Differentially expressed gene screening. [(A) volcano map of differentially expressed genes; (B) heat map of differentially expressed genes; (C) DEGs associated with OS; (D) Venn diagram showing overlapping genes for ATAC-seq, RNA-seq, and survival-associated DEGs].

Differentially expressed gene screening. [(A) volcano map of differentially expressed genes; (B) heat map of differentially expressed genes; (C) DEGs associated with OS; (D) Venn diagram showing overlapping genes for ATAC-seq, RNA-seq, and survival-associated DEGs].

KM Analysis of Overlapping Genes

Validation of 190 overlapping genes using the KM method yielded 126 genes that were significantly associated with OS in LIHC. Hazard coefficient HRs, with their 95% CIs and P values for 126 genes were derived by a log-rank test and univariate Cox proportional hazards regression ( ).
Table 1

126 differential expressed genes about overall survival using Kaplan-Meier.

GenesPHRLow 95%CIHigh 95%CIGenesPHRLow 95%CIHigh 95%CIGenesPHRLow 95%CIHigh 95%CI
NIF3L1 0.00991.57961.11582.2362 DAD1 0.00351.67971.18562.3796 EHMT2 0.00131.77001.24852.5095
SOX4 0.00991.58341.11662.2452 RPP40 0.00351.69741.18992.4212 HTATIP2 0.00131.78141.25162.5355
RSU1 0.00971.58471.11802.2463 PTGES3 0.00341.68891.18932.3985 VMA21 0.00131.77131.24942.5110
ATAD3B 0.00961.58701.11912.2506 FANCE 0.00331.69491.19272.4085 VEGFA 0.00131.78291.25262.5376
FANCI 0.00951.58431.11912.2429 UTP18 0.00311.69281.19462.3988 CDCA2 0.00131.77541.25162.5184
TSEN15 0.00851.59921.12722.2688 CDC25A 0.00291.69781.19772.4068 MELK 0.00101.79441.26492.5456
NEU1 0.00851.59361.12622.2548 VPS35 0.00291.69671.19782.4034 MGME1 0.00101.81021.26962.5809
EPS8L3 0.00831.60131.12912.2712 PYGO2 0.00281.70471.20092.4199 TOP2A 0.00101.80081.26902.5555
TOMM34 0.00811.60041.12982.2670 INCENP 0.00281.69931.20012.4061 TMEM14C 0.00101.81321.27282.5829
TBC1D16 0.00811.60501.13102.2778 IMPDH2 0.00271.70821.20322.4252 PDCL3 0.00091.80521.27182.5623
AIFM2 0.00801.60411.13102.2753 ZIC2 0.00271.71281.20512.4344 LYRM4 0.00091.82871.28022.6123
COA6 0.00801.60221.13122.2694 ECT2 0.00271.70931.20472.4253 LCMT1 0.00091.82321.28032.5961
SEMA3F 0.00771.60801.13382.2806 SLC39A10 0.00261.71381.20632.4348 GDI2 0.00091.81751.27932.5820
MRPS23 0.00771.60321.13322.2681 NAP1L1 0.00261.71221.20622.4305 HSPB11 0.00081.81611.28072.5754
ZFP64 0.00761.61111.13492.2871 PARL 0.00241.71761.21092.4364 BOD1 0.00081.82791.28492.6004
MKI67 0.00721.61231.13772.2849 CDCA4 0.00231.72241.21462.4425 CLIC1 0.00081.82481.28482.5918
LSM2 0.00721.61761.13932.2966 NOL7 0.00221.73111.21752.4614 MID1IP1 0.00071.83761.29202.6135
SLC35B2 0.00711.61571.13962.2908 JAGN1 0.00221.72481.21612.4463 HDGF 0.00061.86071.30462.6538
XPR1 0.00701.61621.13992.2914 PRCC 0.00221.72911.21792.4547 SF3B4 0.00051.86081.31042.6424
PUF60 0.00671.62301.14352.3034 MZT1 0.00221.72741.21782.4503 TXNL1 0.00051.87431.31482.6720
PRIM2 0.00651.62021.14422.2941 GTF3C6 0.00221.72711.21782.4495 HOMER3 0.00051.88711.32262.6926
LRRC1 0.00641.62541.14602.3053 GGPS1 0.00221.72701.21782.4491 RAD54L 0.00051.87821.32032.6717
CDC45 0.00611.62701.14882.3043 PSMA5 0.00211.73081.21942.4568 STMN1 0.00041.88701.32782.6818
PIGM 0.00571.64111.15502.3318 DCTPP1 0.00211.73191.21972.4592 GINS1 0.00041.88881.32922.6841
RPA2 0.00571.63551.15412.3177 RAB13 0.00211.73591.22142.4671 SLC1A5 0.00041.89401.33092.6954
BLOC1S4 0.00571.63491.15422.3158 TCF3 0.00191.73931.22652.4664 NUF2 0.00041.89021.33082.6848
POLA2 0.00551.64031.15652.3265 MARCKS 0.00191.74701.22892.4835 CD320 0.00031.90441.33912.7083
PSMD8 0.00531.64731.15982.3399 RAP2A 0.00191.74251.22812.4722 YWHAB 0.00031.92111.34692.7402
PKMYT1 0.00521.64391.16042.3289 FAF1 0.00181.73901.22762.4634 ZWINT 0.00031.91591.34872.7217
RIPK2 0.00491.66111.16642.3657 FAM189B 0.00181.75681.23352.5021 POLR3C 0.00031.95881.36662.8076
WDYHV1 0.00451.66381.17052.3650 NUDT1 0.00181.75301.23262.4933 LAPTM4B 0.00021.98591.38352.8506
DEGS1 0.00451.66121.17052.3576 ALDOA 0.00171.74751.23222.4784 EIF3M 0.00011.98861.39662.8315
ALYREF 0.00451.65881.17022.3513 RAB10 0.00171.74841.23372.4776 PARD3 0.00012.06441.44162.9561
RABIF 0.00451.66221.17112.3591 HRAS 0.00171.75841.23682.5000 EXOSC9 0.00012.07301.45072.9622
TROAP 0.00441.66101.17102.3559 RACGAP1 0.00161.75561.23742.4908 SNRPC 0.00012.06501.44942.9422
SUCO 0.00421.66991.17572.3719 KIF18A 0.00161.75891.23902.4971 CD58 0.00012.09891.46243.0124
NOL10 0.00421.66631.17502.3631 ENAH 0.00161.76171.23992.5029 GEMIN7 0.00012.08531.46012.9781
SNRPE 0.00411.67251.17752.3756 DNAJC8 0.00151.76541.24332.5068 SAC3D1 0.00002.12401.48613.0357
CNOT10 0.00391.67581.18032.3795 GCLM 0.00151.78271.24812.5462 CLTA 0.00002.21591.54533.1775
SKA1 0.00381.67801.18202.3821 PRDX6 0.00141.77641.24862.5275 ERI3 0.00002.26901.58373.2509
BUB3 0.00371.67581.18262.3747 ANO10 0.00141.76681.24622.5048 AGTRAP 0.00002.29341.60473.2778
UBAP2L 0.00361.69131.18752.4086 SLC30A6 0.00131.77951.25102.5313 UCK2 0.00002.40631.67763.4514
126 differential expressed genes about overall survival using Kaplan-Meier.

LASSO Model Building

LASSO regression was applied to 125 genes for feature selection. A prognostic model consisting of 15 gene signatures was obtained after 10-fold cross-validation ( ). The complete gene names of 15 genes are shown in . shows the association between the risk score and survival time with survival status in the TCGA dataset. shows the distribution of KM curves of the risk model in the TCGA dataset. The gene signature model was divided into high-risk and low-risk groups according to the risk score, with a HR of 2.483 representing a risk factor. shows the ROC curves and AUC of the risk model at different times. The AUC values at 1, 3 and 5 years were 0.809, 0.723 and 0.706, respectively, indicating that the model has a strong predictive ability.
Figure 4

Prognostic model based on LASSO regression algorithm. [(A) partial likelihood deviation plotted against log(λ) using the LASSO-Cox regression model; (B) coefficients of selected features are shown by the lambda parameter, the horizontal axis represents the value of the independent variable lambda and the vertical axis represents the coefficients of the independent variable; [(C) Risk score and survival time and survival status cases; (D) this risk model in the TCGA KM survival curve distribution; (E) ROC curves with AUC at different times for this risk model].

Prognostic model based on LASSO regression algorithm. [(A) partial likelihood deviation plotted against log(λ) using the LASSO-Cox regression model; (B) coefficients of selected features are shown by the lambda parameter, the horizontal axis represents the value of the independent variable lambda and the vertical axis represents the coefficients of the independent variable; [(C) Risk score and survival time and survival status cases; (D) this risk model in the TCGA KM survival curve distribution; (E) ROC curves with AUC at different times for this risk model].

Expression of 15 Gene Signatures in LIHC

show the expression of the 15 gene signatures [PRDX6 ( ), GCLM ( ), HTATIP2 ( ), SEMA3F ( ), UCK2 ( ), NOL10 ( ), KIF18A ( ), RAP2A ( ), BOD1 ( ), GDI2 ( ), ZIC2 ( ), GTF3C6 ( ), SLC1A5 ( ), ERI3 ( ) and SAC3D1 ( )] in LIHC cancer tissues relative to that in paraneoplastic tissues and different cancer stages. shows the statistical significance (P value) of gene expression in different tissues and different stages. Fifteen genes were upregulated in the LIHC tissues. Most of these genes were not significantly expressed in different LIHC stages.
Figure 5

Expression of 15 Signature genes in LIHC in cancer/normal and different stages. [(A) PRDX6; (B) GCLM; (C) HTATIP2; (D) SEMA3F; (E) UCK2; (F) NOL10; (G) KIF18A; (H) RAP2A; (I) BOD1; (J) GDI2; (K) ZIC2; (L) GTF3C6; (M) SLC1A5; (N) ERI3; (O) SAC3D1].

Table 2

Statistical significance of 15 genes.

PRDX6 GCLM HTATIP2 SEMA3F UCK2 NOL10 KIF18A RAP2A BOD1 GDI2 ZIC2 GTF3C6 SLC1A5 ERI3 SAC3D1
Normal vs Primary3.53E-102.22E-041.62E-12<1E-121.62E-12<1E-121.62E-12<1E-121.62E-121.65E-12<1E-121.62E-125.78E-101.62E-121.62E-12
Normal vs Stage13.76E-071.59E-01<1E-12<1E-12<1E-121.62E-121.62E-121.62E-12<1E-125.30E-071.62E-12<1E-121.99E-04<1E-12<1E-12
Normal vs Stage21.00E-049.44E-031.87E-101.62E-121.62E-121.62E-121.33E-074.26E-101.62E-122.19E-054.32E-091.45E-135.74E-031.62E-121.62E-12
Normal vs Stage36.64E-035.04E-031.85E-12<1E-12<1E-121.63E-126.44E-109.48E-11<1E-121.41E-102.34E-111.91E-128.03E-051.64E-121.62E-12
Normal vs Stage49.72E-075.29E-011.07E-011.00E-021.14E-033.01E-021.38E-018.15E-028.76E-033.42E-015.50E-023.82E-026.53E-023.51E-029.62E-04
Stage1 vs Stage24.70E-014.81E-024.52E-018.30E-011.48E-023.12E-032.83E-024.52E-017.39E-021.41E-011.39E-011.62E-021.03E-015.63E-042.90E-03
Stage1 vs Stage32.52E-013.13E-029.41E-014.28E-023.80E-043.45E-041.76E-044.33E-023.46E-033.17E-047.20E-036.56E-041.23E-022.00E-031.54E-03
Stage1 vs Stage43.63E-016.71E-017.88E-016.91E-014.41E-014.67E-019.19E-019.83E-017.63E-018.82E-016.09E-011.56E-016.12E-019.15E-012.29E-01
Stage2 vs Stage31.21E-019.63E-014.58E-013.87E-023.32E-011.36E-011.16E-012.18E-011.79E-011.10E-013.03E-012.09E-017.58E-014.29E-015.86E-01
Stage2 vs Stage46.06E-018.47E-019.79E-017.13E-019.60E-018.13E-015.70E-018.38E-017.78E-015.97E-019.45E-016.91E-014.70E-012.78E-019.78E-01
Stage3 vs Stage41.76E-018.22E-017.72E-012.74E-016.85E-015.26E-012.98E-015.53E-015.07E-012.28E-016.54E-019.21E-012.50E-013.38E-018.68E-01
Expression of 15 Signature genes in LIHC in cancer/normal and different stages. [(A) PRDX6; (B) GCLM; (C) HTATIP2; (D) SEMA3F; (E) UCK2; (F) NOL10; (G) KIF18A; (H) RAP2A; (I) BOD1; (J) GDI2; (K) ZIC2; (L) GTF3C6; (M) SLC1A5; (N) ERI3; (O) SAC3D1]. Statistical significance of 15 genes.

Prognostic Analysis of 15 Gene Signatures in LIHC Patients

The KM analysis of the survival prognosis of the 15 gene signatures in LIHC patients showed that high expression of all 15 gene signatures was significantly associated with a poor prognosis in LIHC patients ( ).
Figure 6

Kaplan Meier curve of 15 gene signatures in LIHC patients. (A: UCK2; B: ERI3; C: SAC3D1; D: SLC1A5; E: BOD1; F: GDI2; G: HTATIP2; H: PRDX6; I: GCLM; J: KIF18A; K: RAP2A; L: GTF3C6; M: ZIC2; N: NOL10; O: SEMA3F).

Kaplan Meier curve of 15 gene signatures in LIHC patients. (A: UCK2; B: ERI3; C: SAC3D1; D: SLC1A5; E: BOD1; F: GDI2; G: HTATIP2; H: PRDX6; I: GCLM; J: KIF18A; K: RAP2A; L: GTF3C6; M: ZIC2; N: NOL10; O: SEMA3F).

Correlation of 15 Gene Signatures With Immune Cell Infiltration

shows the correlation of the 15 gene signatures with immune cell infiltration levels in LIHC. All the gene signatures other than those of PRDX6 and HTATIP2 were correlated with tumour purity and B cell, CD4+ T cell, CD8+ T cell, macrophage, neutrophil and dendritic cell levels. Infiltration levels were all significantly and positively correlated.
Figure 7

Correlation of 15 gene signatures with immune cell infiltration [(A) UCK2 correlates with immune cell infiltration; (B) ERI3 correlates with immune cell infiltration; (C) SAC3D1 correlates with immune cell infiltration; (D) SLC1A5 correlates with immune cell infiltration; (E) BOD1 correlates with immune cell infiltration; (F) GDI2 correlates with immune cell infiltration; (G) HTATIP2 correlates with immune cell infiltration; (H) PRDX6 correlated with immune cell infiltration; (I) GCLM correlated with immune cell infiltration; (J) KIF18A correlated with immune cell infiltration; (K) RAP2A correlated with immune cell infiltration; (L) GTF3C6 correlated with immune cell infiltration; (M) ZIC2 correlated with immune cell infiltration; (N) NOL10 correlated with immune cell infiltration; (O). SEMA3F correlated with immune cell infiltration).

Correlation of 15 gene signatures with immune cell infiltration [(A) UCK2 correlates with immune cell infiltration; (B) ERI3 correlates with immune cell infiltration; (C) SAC3D1 correlates with immune cell infiltration; (D) SLC1A5 correlates with immune cell infiltration; (E) BOD1 correlates with immune cell infiltration; (F) GDI2 correlates with immune cell infiltration; (G) HTATIP2 correlates with immune cell infiltration; (H) PRDX6 correlated with immune cell infiltration; (I) GCLM correlated with immune cell infiltration; (J) KIF18A correlated with immune cell infiltration; (K) RAP2A correlated with immune cell infiltration; (L) GTF3C6 correlated with immune cell infiltration; (M) ZIC2 correlated with immune cell infiltration; (N) NOL10 correlated with immune cell infiltration; (O). SEMA3F correlated with immune cell infiltration).

Validation of the Protein Expression of 15 Gene Signatures

The protein expression of the 15 gene signatures in hepatocellular carcinoma tissues and normal liver tissues was verified using the HPA online database ( ). The results showed that PRDX6, GCLM, HTATIP2, NOL10, KIF18A, RAP2A and GDI2 were highly expressed in the hepatocellular liver cancer tissues compared to the normal liver tissues. SEMA3F, BOD1, SLC1A5 and ERI3 were not detected in cholangiocytes and hepatocytes. SAC3D1 was not detected in hepatocellular liver cancer tissues. In addition, UCK2 and ZIC2 were not detected in the hepatocellular carcinoma samples in the protein expression data.
Figure 8

Protein expression of gene signature in hepatocellular carcinoma tissue and normal liver tissue.

Protein expression of gene signature in hepatocellular carcinoma tissue and normal liver tissue.

GO and KEGG Pathway Enrichment Analysis of 15 Gene Signatures

In the GO and KEGG enrichment analyses of the 15 gene signatures, cellular processes, binding, metabolism and cancer were enriched ( ).
Figure 9

GO and KEGG enrichment of 15 gene signatures. [(A) GO enrichment of 15 gene signatures; (B) KEGG enrichment of 15 gene signatures].

GO and KEGG enrichment of 15 gene signatures. [(A) GO enrichment of 15 gene signatures; (B) KEGG enrichment of 15 gene signatures].

Discussion

Tumours usually have altered epigenetic patterns, and epigenetic regulators are frequently mutated in cancer (16). ATAC-seq is an innovative technique for epigenetic studies that cleaves nuclear chromatin regions that are open at a particular condition location by transposes and thus obtains the regulatory sequences of all active transcripts in the genome at that particular condition (12). In the present study, we performed genomic interval distribution statistics using ATAC-seq for peaks identified in different LIHC samples, while comparing chromatin open site differences between different samples. Furthermore, we conducted GO and KEGG enrichment analysis for genes adjacent to these differential sites. A portion of genes associated with the GO functional enrichment pathway was identified. Subsequently, RNA-seq was used to analyse DEGs in the LIHC samples, and a one-way Cox analysis was performed to analyse genes significantly associated with the prognosis in LIHC. In total, 190 key genes were obtained by overlapping the genes screened using the three methods. Further validation by KM analysis yielded 125 key genes. However, due to the large number of 125 key genes, featured genes needed to be further extracted. To take account of the dimensional catastrophe problem, we used LASSO regression analysis for further analysis of the gene set. By using a penalty function to compress the regression coefficients of the independent variables, LASSO provides good shrinkage control and can compress the regression coefficients of some independent variables to 0 (17). Finally, we obtained a sparse model and then obtained genes with a higher significant correlation with the survival prognosis of liver cancer patients. According to 10-fold cross-validation, a penalty parameter, λ = 0.0425, was finally selected, and then λ was substituted into the regression equation of LASSO to ensure that the sum of the absolute values of the regression coefficients of all independent variables was less than or equal to the selected penalty parameter λ. Finally, the regression coefficients of a large number of genetic variables were compressed to 0, and genes with regression coefficients except that were selected and subjected to LASSO regression. Fifteen gene signatures were identified: PRDX6, GCLM, HTATIP2, SEMA3F, UCK2, NOL10, KIF18A, RAP2A, BOD1, GDI2, ZIC2, GTF3C6, SLC1A5, ERI3 and SAC3D1. We generated a prognostic index for each sample for the risk scores of each gene in each sample and then divided the samples into high risk and low risk according to the prognostic index to analyse the overall survival time of the samples. The results revealed that the survival of patients classified as high risk was significantly worse than that of patients classified as low risk. The prognostic model had good predictive power. Subsequently, we investigated the expression of each gene in LIHC patients and patients with various LIHC disease stages and found that each gene was significantly upregulated in cancerous tissues in LIHC patients compared with paracancerous tissues and that highly expressed genes were significantly associated with a poor patient prognosis. This finding suggested that these genes can be used as prognostic predictive biomarkers for LIHC and that the prediction model combining these genes performs well. We also analysed the correlation between gene expression and immune cell infiltration levels using TIMER data and found that other than PRDX6 and HTATIP2, all other 13 gene signatures were significantly and positively correlated with tumour purity and B cell, CD4+ T cell, CD8+ T cell, macrophage, neutrophil and dendritic cell infiltration levels. This finding suggested that these gene signatures may influence tumour progression by regulating the tumour microenvironment. PRDX6 is a unique bifunctional enzyme, which reduces both water-soluble and lipid-soluble peroxides and has unique phospholipase A2 activity (18). A previous study reported that inhibition of PRDX6 expression promoted apoptosis in Hepa1-6 cells (19). Another study reported that interventional treatment of primary liver cancer can reduce serum HTATIP2/TIP30 and B7-H4 levels, improve liver function and quality of life and prolong the survival times of patients (20). A number of studies reported that SEMA3F, UCK2, NOL10, RAP2A, ZIC2 and SAC3D1 are associated with prognosis and tumour immune infiltration in hepatocellular carcinomas (21–25). In addition, KIF18A was reported to be associated with tumour immune cell infiltration in adrenocortical carcinomas (26). SLC1A5 may serve as a potential target for enhancing anti-tumour immunity in the tumour microenvironment (27). According to the literature, GCLM gene polymorphisms are associated with hepatitis B virus-induced liver disease (28). There is limited research on BOD1, GDI2, GTF3C6 and ERI3’ impact on hepatocellular carcinomas. BOD1 is a novel mitogenic protein required for chromosomal localization (29). It may act as a unique cytoplasmic interacting protein to regulate signal pathway,thereby having potential in the treatment of various diseases, including cancer (30). GDI2 belongs to a small family of chaperone proteins expressed mainly in hematopoietic, endothelial and epithelial cells (31). GDI2 expression is abnormal in many cancer types, including pancreatic, ovarian, gastric and oesophageal squamous cell carcinomas (32–34). There are few studies on GTF3C6 and ERI3. One previous study found that an integrated model based on a six-gene signature (35) or a novel ferroptosis-related gene signature (36) could predict OS in patients with hepatocellular carcinomas. Most previous studies mainly used only RNA-seq to perform bioinformatic analyses. In contrast, we used ATAC-seq and RNA-seq, which are mutually authenticating, further strengthens the findings of the present study. In summary, we obtained 15 gene signatures associated with the survival prognosis of hepatocellular carcinoma patients based on ATAC-seq and RNA-seq integration analysis and LASSO regression analysis. Due to the limitations of the experimental conditions, it was not possible to obtain a detailed understanding of the specific mechanism of the action of each of the 15 hepatocellular carcinoma signature genes. Thus far, there have been few studies on BOD1, GDI2, GTF3C6 and ERI3’ impact on hepatocellular carcinomas. The findings of our study provide theoretical basis and directions for future studies. The 15 gene signatures of a survival prognosis in hepatocellular carcinoma patients identified herein may contribute to the development of targeted treatment for hepatocellular carcinoma patients.

Data Availability Statement

The original contributions presented in the study are included in the article/ . Further inquiries can be directed to the corresponding author.

Author Contributions

We contributed equally for this work. All authors contributed to the article and approved the submitted version.

Conflict of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s Note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
  34 in total

1.  On-treatment alpha-fetoprotein is a specific tumor marker for hepatocellular carcinoma in patients with chronic hepatitis B receiving entecavir.

Authors:  Grace L H Wong; Henry L Y Chan; Yee-Kit Tse; Hoi-Yun Chan; Chi-Hang Tse; Angeline O S Lo; Vincent W S Wong
Journal:  Hepatology       Date:  2014-01-30       Impact factor: 17.425

2.  Overexpression of Prdx6 and resistance to peroxide-induced death in Hepa1-6 cells: Prdx suppression increases apoptosis.

Authors:  Bridget Walsh; Amanda Pearl; Sarah Suchy; John Tartaglio; Kristin Visco; Shelley A Phelan
Journal:  Redox Rep       Date:  2009       Impact factor: 4.412

3.  Proteomic alterations of fibroblasts induced by ovarian cancer cells reveal potential cancer targets.

Authors:  X Y Zhang; S S Hong; M Zhang; Q Q Cai; M X Zhang; C J Xu
Journal:  Neoplasma       Date:  2018       Impact factor: 2.575

4.  A cytoplasmic COMPASS is necessary for cell survival and triple-negative breast cancer pathogenesis by regulating metabolism.

Authors:  Lu Wang; Clayton K Collings; Zibo Zhao; Kira Alia Cozzolino; Quanhong Ma; Kaiwei Liang; Stacy A Marshall; Christie C Sze; Rintaro Hashizume; Jeffrey Nicholas Savas; Ali Shilatifard
Journal:  Genes Dev       Date:  2017-11-14       Impact factor: 11.361

5.  Bod1, a novel kinetochore protein required for chromosome biorientation.

Authors:  Iain M Porter; Sarah E McClelland; Guennadi A Khoudoli; Christopher J Hunter; Jens S Andersen; Andrew D McAinsh; J Julian Blow; Jason R Swedlow
Journal:  J Cell Biol       Date:  2007-10-15       Impact factor: 10.539

6.  The accessible chromatin landscape of the human genome.

Authors:  Robert E Thurman; Eric Rynes; Richard Humbert; Jeff Vierstra; Matthew T Maurano; Eric Haugen; Nathan C Sheffield; Andrew B Stergachis; Hao Wang; Benjamin Vernot; Kavita Garg; Sam John; Richard Sandstrom; Daniel Bates; Lisa Boatman; Theresa K Canfield; Morgan Diegel; Douglas Dunn; Abigail K Ebersol; Tristan Frum; Erika Giste; Audra K Johnson; Ericka M Johnson; Tanya Kutyavin; Bryan Lajoie; Bum-Kyu Lee; Kristen Lee; Darin London; Dimitra Lotakis; Shane Neph; Fidencio Neri; Eric D Nguyen; Hongzhu Qu; Alex P Reynolds; Vaughn Roach; Alexias Safi; Minerva E Sanchez; Amartya Sanyal; Anthony Shafer; Jeremy M Simon; Lingyun Song; Shinny Vong; Molly Weaver; Yongqi Yan; Zhancheng Zhang; Zhuzhu Zhang; Boris Lenhard; Muneesh Tewari; Michael O Dorschner; R Scott Hansen; Patrick A Navas; George Stamatoyannopoulos; Vishwanath R Iyer; Jason D Lieb; Shamil R Sunyaev; Joshua M Akey; Peter J Sabo; Rajinder Kaul; Terrence S Furey; Job Dekker; Gregory E Crawford; John A Stamatoyannopoulos
Journal:  Nature       Date:  2012-09-06       Impact factor: 49.962

7.  Comparison of hepatocellular carcinoma miRNA expression profiling as evaluated by next generation sequencing and microarray.

Authors:  Yoshiki Murakami; Toshihito Tanahashi; Rina Okada; Hidenori Toyoda; Takashi Kumada; Masaru Enomoto; Akihiro Tamori; Norifumi Kawada; Y-h Taguchi; Takeshi Azuma
Journal:  PLoS One       Date:  2014-09-12       Impact factor: 3.240

8.  Super-enhancer-driven AJUBA is activated by TCF4 and involved in epithelial-mesenchymal transition in the progression of Hepatocellular Carcinoma.

Authors:  Chi Zhang; Shi Wei; Wei-Peng Sun; Kai Teng; Miao-Miao Dai; Feng-Wei Wang; Jie-Wei Chen; Han Ling; Xiao-Dan Ma; Zi-Hao Feng; Jin-Ling Duan; Mu-Yan Cai; Dan Xie
Journal:  Theranostics       Date:  2020-07-11       Impact factor: 11.556

9.  Variability in estimated gene expression among commonly used RNA-seq pipelines.

Authors:  Sonali Arora; Siobhan S Pattwell; Eric C Holland; Hamid Bolouri
Journal:  Sci Rep       Date:  2020-02-17       Impact factor: 4.379

10.  Identification of tumor-infiltrating immune cells and prognostic validation of tumor-infiltrating mast cells in adrenocortical carcinoma: results from bioinformatics and real-world data.

Authors:  Xi Tian; Wenhao Xu; Yuchen Wang; Aihetaimujiang Anwaier; Hongkai Wang; Fangning Wan; Yu Zhu; Dalong Cao; Guohai Shi; Yiping Zhu; Yuanyuan Qu; Hailiang Zhang; Dingwei Ye
Journal:  Oncoimmunology       Date:  2020-06-23       Impact factor: 8.110

View more
  2 in total

Review 1.  Single-Cell Sequencing and Its Applications in Liver Cancer.

Authors:  Binle Tian; Qi Li
Journal:  Front Oncol       Date:  2022-04-21       Impact factor: 5.738

2.  Integration of RNA-seq and ATAC-seq identifies muscle-regulated hub genes in cattle.

Authors:  Jianfang Wang; Bingzhi Li; Xinran Yang; Chengcheng Liang; Sayed Haidar Abbas Raza; Yueting Pan; Ke Zhang; Linsen Zan
Journal:  Front Vet Sci       Date:  2022-08-11
  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.