Yuting Dong1,2,3, Xiaozhao Liu1,2,3, Bijun Jiang1,2,3, Siting Wei1,2,3, Bangde Xiang4, Ruichu Liao1,2,3, Qiuyan Wang5,6, Ximiao He1,2,3. 1. Department of Physiology, School of Basic Medical Science, Huazhong University of Science and Technology, Wuhan, China. 2. Center for Genomics and Proteomics Research, School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China. 3. Hubei Key Laboratory of Drug Target Research and Pharmacodynamic Evaluation, Huazhong University of Science and Technology, Wuhan, China. 4. Department of Hepatobiliary Surgery, Guangxi Medical University Cancer Hospital, Nanning, China. 5. Center for Genomic and Personalized Medicine, Guangxi Medical University, Nanning, China. 6. Guangxi Key Laboratory for Genomic and Personalized Medicine, Guangxi, Collaborative Innovation Center for Genomic and Personalized Medicine, Nanning, China.
Abstract
BACKGROUND: The alternative usage of promoters provides a way to regulate gene expression, has a significant influence on the transcriptome, and contributes to the cellular transformation of cancer. However, the function of alternative promoters (APs) in hepatocellular carcinoma (HCC) has not been systematically studied yet. In addition, the potential mechanism of regulation to the usage of APs remains unclear. DNA methylation, one of the most aberrant epigenetic modifications in cancers, is known to regulate transcriptional activity. Whether DNA methylation regulates the usage of APs needs to be explored. Here, we aim to investigate the effects of DNA methylation on usage of APs in HCC. METHODS: Promoter activities were calculated based on RNA-seq data. Functional enrichment analysis was implemented to conduct GO terms. Correlation tests were used to detect the correlation between promoter activity and methylation status. The LASSO regression model was used to generate a diagnostic model. Kaplan-Meier analysis was used to compare the overall survival between high and low methylation groups. RNA-seq and whole-genome bisulfite sequencing (WGBS) in HCC samples were performed to validate the correlation of promoter activity and methylation. RESULTS: We identified 855 APs in total, which could be well used to distinguish cancer from normal samples. The correlation of promoter activity and DNA methylation in APs was observed, and the APs with negative correlation were defined as methylation-regulated APs (mrAPs). Six mrAPs were identified to generate a diagnostic model with good performance (AUC = 0.97). Notably, the majority of mrAPs had CpG sites that could be used to predict clinical outcomes by methylation status. Finally, we verified 85.6% of promoter activity variation and 92.3% of methylation changes in our paired RNA-seq and WGBS samples, respectively. The negative correlation between promoter activity and methylation status was further confirmed in our HCC samples. CONCLUSION: The aberrant methylation status plays a critical role in the precision usage of APs in HCC, which sheds light on the mechanism of cancer development and provides a new insight into cancer screening and treatment.
BACKGROUND: The alternative usage of promoters provides a way to regulate gene expression, has a significant influence on the transcriptome, and contributes to the cellular transformation of cancer. However, the function of alternative promoters (APs) in hepatocellular carcinoma (HCC) has not been systematically studied yet. In addition, the potential mechanism of regulation to the usage of APs remains unclear. DNA methylation, one of the most aberrant epigenetic modifications in cancers, is known to regulate transcriptional activity. Whether DNA methylation regulates the usage of APs needs to be explored. Here, we aim to investigate the effects of DNA methylation on usage of APs in HCC. METHODS: Promoter activities were calculated based on RNA-seq data. Functional enrichment analysis was implemented to conduct GO terms. Correlation tests were used to detect the correlation between promoter activity and methylation status. The LASSO regression model was used to generate a diagnostic model. Kaplan-Meier analysis was used to compare the overall survival between high and low methylation groups. RNA-seq and whole-genome bisulfite sequencing (WGBS) in HCC samples were performed to validate the correlation of promoter activity and methylation. RESULTS: We identified 855 APs in total, which could be well used to distinguish cancer from normal samples. The correlation of promoter activity and DNA methylation in APs was observed, and the APs with negative correlation were defined as methylation-regulated APs (mrAPs). Six mrAPs were identified to generate a diagnostic model with good performance (AUC = 0.97). Notably, the majority of mrAPs had CpG sites that could be used to predict clinical outcomes by methylation status. Finally, we verified 85.6% of promoter activity variation and 92.3% of methylation changes in our paired RNA-seq and WGBS samples, respectively. The negative correlation between promoter activity and methylation status was further confirmed in our HCC samples. CONCLUSION: The aberrant methylation status plays a critical role in the precision usage of APs in HCC, which sheds light on the mechanism of cancer development and provides a new insight into cancer screening and treatment.
Liver cancer is the sixth most prevalent cancer and fourth most lethal malignancy globally (1). Hepatocellular carcinoma (HCC) is the dominant tissue subtype of aggressive primary liver cancer, accounting for a great majority of the diagnoses and deaths (2). HCC prognosis is poor worldwide, with the 5-year survival rate ranging from 5% to 30% (3). The main treatment options for patients with HCC include vascular intervention, surgical resection, radiofrequency ablation, or liver transplantation. Most patients have reached the advanced stage of HCC when they are first diagnosed, and only about 20%–30% of patients are eligible for effective treatment (4). Early detection with surveillance is the most effective way to reduce the mortality of HCC (5). Further study on the pathogenesis of HCC is of great significance to the diagnosis and prognosis of tumors.Promoters are the key element in regulating gene expression. In human genomes, most protein-coding genes are co-regulated by numerous promoters (6). The differential usage of promoters has been reported to be highly correlated with disease. For example, the dominance of c-MYC, which is silent in normal tissue, is abnormally activated in Burkitt lymphoma cells as a result of aberrant alternative promoter (AP) usage at the MYC gene locus (7). Another well-studied AP example is RASSF1, which encodes different subtypes RASSF1A and RASSF1C. The former acted as a tumor suppressor gene and the latter had carcinogenic activity (8). These studies of differential promoter usage usually focused on single genes. With the development of sequencing technology, approaches of detecting genome-wide promoter activities were available, including H3K4me3 ChIP-seq (9), Cap Analysis of Gene Expression (CAGE) (10), and short-read (11) and long-read (12) sequencing of RNAs. It is worth noting that the approach to predict promoter activity based on RNA-seq of short reads has a good consistency with previous methods (11). Previous studies have shown that increasing AP repertories is accompanied by elevated differential expression and disease susceptibility (13). In addition, tissue-specific promoter activity could be used to distinguish different cancer subtypes (12).As one of the most essential epigenetic modifications, DNA methylation is involved in oncogenesis (14, 15). In various cancers, gene expression could be silenced by hypermethylation of promoter regions by the interfering transcription factors binding or recruitment of transcriptional repressors (16–18), while the overexpression of oncogenic drivers (19) or instability of chromosomes (20) could be associated with hypomethylated regions. Therefore, DNA methylation detection may be helpful to elucidate molecular mechanisms of HCC development (21). Furthermore, changes in DNA methylation could be used as promising targets for diagnosis or prognosis biomarkers in HCC (22, 23). For example, methylation of the GSTP1 promoter has been reported as a diagnostic marker and indicates poor outcomes (24). Due to the stability and non-invasive detectability in blood, circulating tumor DNA (ctDNA) methylation markers have also been reported for HCC diagnosis in several studies (25–27).In HCC, differential usage of promoters has not been systematically studied, and whether DNA methylation regulates the usage of APs in HCC remains unclear. Here, we firstly systematically analyzed the promoter activities and identified the APs in HCC, and the results indicated that APs could distinguish cancer from normal cells. Then, we correlated the promoter activity of APs with DNA methylation, and the results suggested that AP activity could be regulated by the methylation changes. Furthermore, a diagnostic model by methylation-related APs was generated and the methylation of APs could also be used as prognostic markers, which indicated that AP-related methylation has the potential for molecular diagnoses and prognosis prediction of HCC. Finally, RNA-seq and WGBS were performed to verify the correlation of the promoter activity and methylation status of APs in HCC.
Materials and Methods
Data Collection
The RNA-seq raw data and Infinium Human Methylation 450 K Bead Chip (Illumina 450 K array) matrix data of liver tissues (19 HCC and 19 paired adjacent normal samples) were downloaded from the Gene Expression Omnibus (GEO) cohort (https://www.ncbi.nlm.nih.gov/geoprofiles) GSE77276 (28). Two other independent cohorts of HCC RNA-seq data were downloaded from GSE55758 (8 HCC and 8 paired adjacent normal samples) (29) and GSE105130 (25 HCC and 25 paired adjacent normal samples) (30). In addition, an independent cohort of two paired HCC and normal liver samples with RNA-seq raw data and whole-genome bisulfite sequencing (WGBS) methylation data was downloaded from GSE70091 (31). Liver cancer Illumina 450 K array and related clinical details of GDC TCGA Liver Cancer (LIHC) were downloaded from the UCSC Xena database (32) (https://xenabrowser.net/datapages/). The ONGene and TSGene lists were downloaded from ONGene (33) (http://ongene.bioinfo-minzhao.org/) and TSGene (34) (http://bioinfo.mc.vanderbilt.edu/TSGene/).
Validation Samples Collection
The fresh-frozen tissue specimens were collected from two HCC patients from Guangxi Medical University Cancer Hospital for validation. For each patient, tumor tissues and adjacent normal liver tissues were collected through surgery. Each fresh tissue was aliquoted into three pieces and separately stored at liquid nitrogen using a cryopreservation tube until DNA and RNA extraction. All samples were sequenced by both RNA-seq and Whole Genome Bisulfite Sequencing (WGBS).
RNA-seq and Data Processing
The total RNA was extracted with HiPure Universa miRNA Kit (Magen) from two pairs of fresh frozen tissue samples and quality was confirmed by Nanodrop measurement of OD 260/280 and 260/230 ratios. The material for library construction was 1 μg per sample. Sequencing libraries were constructed following the Illumina TruSeq Stranded protocol. Total RNA Gold kit with Ribo-Zero Gold (Illumina, USA) was used following the manufacturer’s recommendations. Sequencing (2×150 paired-end reads) was performed at Mingma Technologies Co., Ltd in Shanghai.FASTQ format data were assessed using FastQC (v0.11.9) and then fastp (v0.20.1) (35) was used to remove the bases with an average quality value less than 20 and to cut the reads of adapters. Clean reads were mapped to the human reference genome (Gencode v19) by STAR (2.7.5b) (36). The gene and transcript isoform expression was quantified using RSEM (v1.3.1) (37). Bedtools (v2.29.2) (38) was used to transform the bam files to bw format for UCSC genome browser viewing.
Whole-Genome Bisulfite Sequencing
HiPure Tissue DNA Mini Kit (Magen) was used for tissue genomic DNA extraction. After quantification by Qubit fluorometer, 1% unmethylated Lambda DNA was added to 200 ng of gDNA, and then randomly fragmented to 300-bps insert size with Covaris LE220. After end repair and adenylation, methylated adapters were ligated to the fragmented DNA. Bisulfite treatment was performed according to the EZ DNA Methylation-Gold kit (Zymo Research) instruction manual. KAPA HiFi HotStart Uracil + ReadyMix (2×) was used to amplify and purify the DNA fragments. Next, the Qubit Fluorometer dsDNA HS Assay (Thermo Fisher Scientific) and Agilent BioAnalyzer (Agilent) were used to measure and analyze the size distribution of the sequencing library; 2×150 paired-end reads sequencing is performed using an Illumina NovaSeq6000 following Illumina-provided protocols at Mingma Technologies Co., Ltd.
WGBS Data Preprocessing
Standard WGBS data analysis pipeline was followed. The raw FASTQ data were firstly trimmed, adapters were removed using TrimGalore (v0.4.3), and approximately 42 Gbps of data were reserved. Clean reads were next aligned with the human reference genome (hg19) using BSMAP (v2.89) (39). Mapped BAM files were then sorted and PCR deduplicates were removed through SAMtools (v. 1.3.1) and Picard Tools (v.1.92). Finally, MOABS (v. 1.3.4) (40) was used to calculate the methylation ratio per CpG. In promoter regions (−2 kb to 2 kb around TSS), methylation profile was smoothed by gam (Generalized Additive Models) or 50-bps sliding windows with 25-bps steps.
Methylation Analysis of 450K Methylation Array
In the 450K methylation array matrix, the delta mean beta (β) was calculated by β (mean tumor) − β (mean normal). A positive delta β value indicated relative hypermethylation in the tumor while a negative delta β value exhibited relative hypomethylation. The paired Student’s t-test was used for statistics. Methylation profile in promoter regions (−2 kb to 2 kb around TSS) smoothed using the same method as above.
Promoter Identification and Activity Estimation
The R package “proActiv” (v.0.99.0) was used to identify possible promoters and calculate the promoter activity. GTF files (Gencode v19) and STAR junction files were used as input. Promoter activity was obtained by removing single-exon transcripts/promoters and eliminating promoter counts that are NA and zero both in tumor and normal samples. When identifying the differentially regulated promoters (DRPs), the internal promoter activity was also considered.
Differential Analysis of Gene Expression and Promoter Activity
Differential analysis of gene expression was performed by the R package “DESeq2” (v1.28.0) [p-value < 0.05 and |log2(Fold Change)| > 1]. The degree of promoter change is calculated by log2 [promoter activity (mean tumor)/promoter activity (mean normal)]. For each promoter, the one-way analysis of variance (ANOVA) was used to assess the significance of absolute and relative promoter activity variance between tumor and normal samples. The promoters with an activity change level of |Fold Change| > 1.2 and p-value < 0.05 were considered significant DRPs.
Identification of Alternative Promoters
We identified APs by screening both gene expression and promoter activity. The criteria were as follows: (1) p-value ≥ 0.05 and |Fold Change| < 2 of gene expression; (2) mean absolute promoter activity > 0.25 in HCC and normal group; (3) both absolute and relative promoter activity were significantly changed (p-value < 0.05); (4) |Fold Change| > 1.2 of absolute promoter activity.
Dimensionality Reduction and Clustering
Gene expression and promoter activity were subjected to dimensionality reduction using the principal component analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) through R packages “stats” and “Rtsne”.
Functional Enrichment Analysis
To identify the possible functions and pathways of hub genes, gene set enrichment analysis was implemented to conduct GO terms through Metascape (41) online (http://metascape.org/). p-value < 0.01 was used as the cutoff criteria.
Correlation Analysis
In the correlation analysis of CpG methylation and gene expression or promoter activity, the representative CpG sites of 450K were selected as follows: (1) For each CpG site upstream and downstream ±1kb of TSS, Pearson correlation test between methylation and gene expression (or promoter activity) was calculated. (2) The CpG sites with a minimum p-value of Pearson correlation test were selected to represent the methylation level of gene or promoter. Gene expression change [log2(Fold Change)] was obtained from Deseq2, and promoter activity change was normalized by log2(Fold Change) of promoter activity. For the WGBS methylation data of validation part, both representative CpG sites and mean methylation levels of ±1kb of TSS were calculated for promoter methylation, and Pearson correlation test was used for all correlation analysis.
LASSO Regression Analysis
The LASSO regression analysis of binary data was applied to construct a diagnosis model by the R package “glmnet”. The penalty parameter (λ) of this diagnosis model was confirmed through 10-fold cross-validation. The risk score was calculated as follows: model Score = ∑ (promoter activity × regression coefficient). The GSE55758 and GSE105130 datasets were both used for further cross-verification. Receiver operating characteristic (ROC) curves were used to visualize the reliability of the diagnostic model, and the area under the curve (AUC) was also calculated.
Survival Analysis
Kaplan–Meier analysis in 10 years was performed in the R software “survival” package. All samples were classified into two groups according to the best-performed cutoff methylation β value using the “surv-cutpoint” function. p-value < 0.05 was considered statistically significant.
Statistical Analysis
R version 4.0.2 was used for all statistical analysis and visualization. Statistical analysis was performed through R base package stats (v4.0.2). All figures were generated using ggplot2 (v3.1.0) and ggpubr (v0.2). Significance levels were defined as follows: ns: p > 0.05, *p < 0.05, **p < 0.01, ***p < 0.001, and ****p < 0.0001 in boxplot.
Results
The Landscape of Promoter Activities in HCC
In mammal genomes, most genes are co-regulated by multiple promoters. As shown in
, the demo gene has three isoforms but two promoters, because two isoforms (e.g., isofrom1 and isoform2) with the same or nearby transcript start sites (TSS) could be regulated by the identical promoter. To detect the promoter activities in HCC, we analyzed the RNA-seq data of paired HCC and adjusted normal tissues that were downloaded from GEO (GSE77276) (28). RNA-seq reads mapped to the first exons were integrated and normalized to measure the promoter activities by the R package “proActiv” (11). In total, we obtained the activity status of 113,076 possible promoters from the human reference genome, and 70,736 promoter activities of 25,085 genes were obtained from liver tissue. Approximately 57.4% (14,411/25,085) of genes had two or more different promoters (
and
). Then, we compared the differences in gene expression and promoter activity between tumor and normal tissues, respectively (
). We identified 6,879 differentially expressed genes (DEGs) and 8,976 genes with 16,049 DRPs. The upregulated DEGs and genes with DRPs (DRPGs) were partially overlapped, and so were the downregulated ones (
). Using t-distributed stochastic neighbor embedding (t-SNE), it was hard to distinguish the tumor from normal samples by expression of either all genes or DEGs (
), whereas distinguishment was successfully achieved by the activities of either all promoters or DRPs (
). A similar result was obtained through principal component analysis (PCA;
). Those results indicated that promoter activity exhibited a more obvious effect than gene expression in revealing the differences between tumor and normal samples.
Figure 1
Analysis of promoter activities in HCC. (A) Schematic representation of promoter activities of different transcript isoforms. Transcripts with the same or nearby transcript start site are grouped into the identical promoter. Promoter activity is defined as the total unique junction reads spanning at each promoter (see also Materials and Methods). The green track represents gene expression of tissue normalized by reads counts, blue track represents the activity of each promoter. (B) The number of promoters with activities in HCC per gene, a total of 25,085 genes with 70,736 promoters included. (C) Venn diagrams showing the overlap of differentially expressed genes (DEGs) and genes with differentially regulated promoters (DRPGs) for upregulated (top) and downregulated (bottom) ones. (D, E) t-distributed stochastic neighbor embedding (t-SNE) clustering the sequenced samples by FPKM for all genes or DEGs (D) and by promoter activities for all promoters or DRPs (E). Samples were colored by sample types (dark red: HCC; dark blue: adjacent normal tissue). (F, G) Bubble plots showing the enriched biological processes by gene ontology (GO) analysis of gene groups in (C). The bubble color represents the log2 (p-value) while the bubble size represents enriched gene counts.
Analysis of promoter activities in HCC. (A) Schematic representation of promoter activities of different transcript isoforms. Transcripts with the same or nearby transcript start site are grouped into the identical promoter. Promoter activity is defined as the total unique junction reads spanning at each promoter (see also Materials and Methods). The green track represents gene expression of tissue normalized by reads counts, blue track represents the activity of each promoter. (B) The number of promoters with activities in HCC per gene, a total of 25,085 genes with 70,736 promoters included. (C) Venn diagrams showing the overlap of differentially expressed genes (DEGs) and genes with differentially regulated promoters (DRPGs) for upregulated (top) and downregulated (bottom) ones. (D, E) t-distributed stochastic neighbor embedding (t-SNE) clustering the sequenced samples by FPKM for all genes or DEGs (D) and by promoter activities for all promoters or DRPs (E). Samples were colored by sample types (dark red: HCC; dark blue: adjacent normal tissue). (F, G) Bubble plots showing the enriched biological processes by gene ontology (GO) analysis of gene groups in (C). The bubble color represents the log2 (p-value) while the bubble size represents enriched gene counts.Furthermore, we performed functional enrichment analysis on DEG and DRPG overlapped genes, genes unique to DEGs (DEGs-only), and DRPG (DRPGs-only) (
). We noticed that overlapped upregulated genes were associated with proliferation-related ontologies, such as positive regulation of cell cycle and DNA replication. In addition, some cancer-related ontologies, such as regulation of apoptotic signaling pathway, histone acetylation, and positive regulation of ERBB signaling pathway, were enriched in DRPGs-only (
). In downregulated genes, only DRPGs can be specifically enriched to the regulation of cell morphogenesis or Ras protein signal transduction (
). Taken together, those results supported that there was general diversity in promoter activities between HCC and normal tissues. Compared with traditional gene expression analysis, the promoter activity analysis was more effective and more accurate in distinguishing HCC from normal tissues, which could provide more clues to investigate the potential mechanism of tumorigenesis and development.
Identification of Alternative Promoters in HCC
Next, we aimed to identify the APs based on the above calculated promoter activities in HCC. To this end, we firstly defined the APs according to the gene expression and promoter activity. As shown in
, the promoters with differential promoter activities (1.2-fold changes), but whose gene expression was not significantly changed, were defined as APs. A total of 855 APs from 709 genes were filtered by this screening of promoter activity and gene expression (
and
). The heatmap with the normalized promoter activities and the plot with promoter activity and gene expression changes were drawn to show the properties of all 855 APs (
). Sixty-four genes with both upregulated and downregulated APs could be good examples of switch usages of promoters: when one promoter is suppressed, another nearby one is activated (
and
). For APs, while the gene expression changes were not obvious, the promoter activity changed significantly. For example, in the proto-oncogene RARA, the activity of prmtr.27493 was remarkably higher in the tumor, while the activity of prmtr.27494 remained unchanged in both tumor and normal samples (
). Compared with normal tissues, the gene expression of RARA was unchanged (
), while the promoter activity of prmtr.27493 was significantly enhanced in HCC (
). When reviewing the genes with APs, we noticed that there are several other known cancer-associated genes, such as MET (42) (
), MICU1 (43) (
), and SLC19A1 (44) (
). The abnormally upregulated promoter activities in HCC may lead to the changes of the CDS region and produce new protein subtypes, as reported (12). For example, the upregulated prmtr.14927 in MET may lead to the accumulation of a 960-aa protein isoform that lacks the SEMA domain in HCC (
). The t-SNE analysis suggested that APs activity could obviously distinguish tumor tissues from normal tissues (
). A similar result was obtained by PCA (
). We further investigated the association between AP and corresponding transcript isoforms. As shown in
, 60.5% (517/855) of APs only regulate one transcript isoform, and 39.5% (388/855) of APs regulate two or more isoforms. The expression levels of transcription isoforms were positively correlated significantly with the promoter activities (R = 0.65, p < 2.2e-16;
). For APs with only one transcript isoform, differences in promoter activities could lead to 42.4% (219/517) of transcript isoform with significant expression changing (p-value < 0.05) and 31.5% (163/517) of transcript isoform with expression changing (|Fold Change| > 1.2). The remaining promoters may have little effect on the transcript expression changing (
). For APs with multiple transcription isoforms, an AP was identified as an AP with major significant differentially expressed isoforms if one transcription isoform has the most significant change (p-value < 0.05), and an AP was identified as an AP with major differentially expressed isoforms if one transcription isoform has the most expression change (|Fold Change| > 1.2). The results demonstrated that 53% of multiple isoform promoters were classified as AP with major significant differentially expressed isoforms and 33.7% (114/388) were classified as AP with major differentially expressed isoforms (
). Further functional enrichment analysis showed that genes with both upregulated or downregulated APs were enriched in cancer-related ontologies, such as ERBB signaling pathway and positive regulation of cell migration (
). All the results suggested that the usage of APs may play a significant role in the cellular transformation and progression of HCC.
Figure 2
Identification of alternative promoters (APs) in HCC. (A) The schematic illustration of the approach to identify APs in HCC. The promoters with differential activities (tumor vs. normal) but without differential expression were defined as APs (see also Materials and Methods). Green track and red track represent gene expression of normal and tumor tissue normalized by reads counts; blue tracks represent the activity of each promoter. (B) Heatmap showing the normalized promoter activities of APs for HCC and paired normal tissue (upper). The middle and bottom dot plots represent AP activity and gene expression normalized by log2FC. Promoters ranked by log2FC normalized promoter activity. (C) Venn diagram showing 64 genes with APs concurrently harboring both upregulated (Up) and downregulated (Down) promoters. (D) UCSC genome browser screenshot showing mean read count of prmtr.27493 and prmtr.27494 at the RARA gene locus in HCC (red track) and normal tissues (blue track). (E) The boxplot showing the expression of gene RARA in tumor and normal was nearly the same. ns: not significant. p-value > 0.05 (ANOVA, p-value = 0.88). (F) The boxplot showing promoter activity of prmtr.27493 was significantly higher in HCC samples. ***p-value <0.001 (ANOVA, p-value = 1.35e-04). (G) t-SNE plot showing normal (blue dots) and HCC (red dots) samples can be clustered by activities of all APs. (H) Bubble plots showing the enriched biological processes by gene ontology (GO) analysis of gene groups in (C). Bubble color represents the log2(p-value) while the bubble size represents enriched gene counts.
Identification of alternative promoters (APs) in HCC. (A) The schematic illustration of the approach to identify APs in HCC. The promoters with differential activities (tumor vs. normal) but without differential expression were defined as APs (see also Materials and Methods). Green track and red track represent gene expression of normal and tumor tissue normalized by reads counts; blue tracks represent the activity of each promoter. (B) Heatmap showing the normalized promoter activities of APs for HCC and paired normal tissue (upper). The middle and bottom dot plots represent AP activity and gene expression normalized by log2FC. Promoters ranked by log2FC normalized promoter activity. (C) Venn diagram showing 64 genes with APs concurrently harboring both upregulated (Up) and downregulated (Down) promoters. (D) UCSC genome browser screenshot showing mean read count of prmtr.27493 and prmtr.27494 at the RARA gene locus in HCC (red track) and normal tissues (blue track). (E) The boxplot showing the expression of gene RARA in tumor and normal was nearly the same. ns: not significant. p-value > 0.05 (ANOVA, p-value = 0.88). (F) The boxplot showing promoter activity of prmtr.27493 was significantly higher in HCC samples. ***p-value <0.001 (ANOVA, p-value = 1.35e-04). (G) t-SNE plot showing normal (blue dots) and HCC (red dots) samples can be clustered by activities of all APs. (H) Bubble plots showing the enriched biological processes by gene ontology (GO) analysis of gene groups in (C). Bubble color represents the log2(p-value) while the bubble size represents enriched gene counts.
The Activities of AP Were Significantly Correlated With DNA Methylation Status
DNA methylation, one of the most abnormal epigenetic modifications in cancers, is known to regulate transcriptional activity (45). To explore whether DNA methylation regulates the usage of APs in HCC, we first obtained the methylation status of the same paired tissues based on Infinium Human Methylation 450 K BeadChip (Illumina 450 K array) of GSE77276 (28). Then, all the promoters were classified into four groups by the quartiles of promoter activities, and the overall CpG methylation status of the four groups in the region (−2kb–2kb) of transcription start sites (TSSs) was calculated in either cancer or normal samples (
). Notably, in the region 1,000 bps around TSS, the higher promoter activity correlated with lower methylation status in both tumors and normal tissues (
). Next, we compared the methylation status of DRPs. In the vicinity of TSS, promoters upregulated in tumors have lower methylation status than normal tissues, whereas the downregulated promoters would be inclined to have a higher methylation status in tumors (
and
, upper). Last, we focused on the genes with APs, and a similar correlation between the promoter activity and methylation was observed in both upregulated and downregulated APs (
and
, lower).
Figure 3
The activity of AP was significantly correlated with DNA methylation status. (A) Methylation levels of CpGs within ±2 kbps relative to TSS were assessed in four groups classified by the quartiles of promoter activities. Methylation profile was smoothed by gam (Generalized Additive Models). Green to red represents the promoter activity levels from 0 to 100%. (B, C) The methylation profile showing mean methylation levels of TSS nearby region ( ± 2kb) of the DRPs (B) and APs (C). Upregulated (top) and downregulated (bottom) promoters were shown separately. The blue line and red line represent normal and HCC samples, respectively. (D, E) Scatter plots showing the correlation between differential methylation (HCC – normal) and promoter activity by normalized change fold for DRPs (D) and APs (E). The representative CpG sites were filtered from the ±1kb upstream and downstream of TSS (see also Materials and Methods). Black dots represent the differentially regulated promoters with significant methylation changes (|Diff. methylation| > 0.1); only these black dots were used for the Pearson correlation test. (F) The proportions of correlation categories between the promoter activities of APs and their methylation status are shown in the pie chart. Negative, positive, and no correlations are colored by green, orange, and gray, respectively. (G) Similar to (D, E), but for methylation regulated APs (mrAPs). (H) Bubble plots showing the enriched biological processes by gene ontology (GO) analysis of APs with negative and positive correlations in (F).
The activity of AP was significantly correlated with DNA methylation status. (A) Methylation levels of CpGs within ±2 kbps relative to TSS were assessed in four groups classified by the quartiles of promoter activities. Methylation profile was smoothed by gam (Generalized Additive Models). Green to red represents the promoter activity levels from 0 to 100%. (B, C) The methylation profile showing mean methylation levels of TSS nearby region ( ± 2kb) of the DRPs (B) and APs (C). Upregulated (top) and downregulated (bottom) promoters were shown separately. The blue line and red line represent normal and HCC samples, respectively. (D, E) Scatter plots showing the correlation between differential methylation (HCC – normal) and promoter activity by normalized change fold for DRPs (D) and APs (E). The representative CpG sites were filtered from the ±1kb upstream and downstream of TSS (see also Materials and Methods). Black dots represent the differentially regulated promoters with significant methylation changes (|Diff. methylation| > 0.1); only these black dots were used for the Pearson correlation test. (F) The proportions of correlation categories between the promoter activities of APs and their methylation status are shown in the pie chart. Negative, positive, and no correlations are colored by green, orange, and gray, respectively. (G) Similar to (D, E), but for methylation regulated APs (mrAPs). (H) Bubble plots showing the enriched biological processes by gene ontology (GO) analysis of APs with negative and positive correlations in (F).As shown in
, the most significant changes in CpG methylation were located upstream and downstream of 1,000 bps to TSS, and we next focused on these regions to examine the correlation of methylation of each CpG with promoter activity. As shown in the illustrative cartoon (
), we calculated the correlation between each CpG methylation and the related promoter activity and selected the CpG site with the most significant p-value to represent the CpG methylation status of the promoter. The activities of 37.0% (2,468/6,674) of DRPs were significantly negatively correlated (green) with their methylation status, and 23.0% (1,536/6,674) were positively correlated (orange) (
). A negative correlation between the changes of gene expression and methylation in DEGs could be observed (
). The correlation test results showed that negative correlation between promoter activity and methylation status in DRPs was stronger than gene expression (R = −0.23, p-value < 2.2e-16;
and
), as was the correlation results in APs (R = −0.29, p-value = 0.00093;
and
). When examining the correlation of promoter activity and methylation for each AP, we observed that the activities of more than half of APs were significantly correlated with their methylation status, of which 32.8% (189/576) were negatively correlated (green) and 19.8% (114/576) were positively correlated (orange) (
and
). Previous studies showed that gene expression could be silenced by hypermethylation of promoters (16–18) and the overexpression of oncogenes (19) could be associated with hypomethylated regions. We then termed the 189 APs with negative correlations as methylation-regulated APs (mrAPs). Consistent with our expectation, the negative correlation in mrAPs was significant (R = −0.76, p-value = 12e-13;
and
). Next, we further investigated the association between mrAPs and the corresponding transcript isoforms (
). When comparing the transcription isoform status of mrAPs to the ones of APs, we observed that both the frequency of significantly differentially expressed isoform for promoters with one transcript isoform (56.3% of mrAPs versus 42.4% of APs) and the frequency of major significantly differentially expressed isoforms for promoters with multiple transcript isoforms (mrAPs 57.8% vs. APs 53%) in mrAPs were higher than in APs (
). Gene ontology analysis revealed that those methylation-associated promoters were enriched for ontologies known to be deregulated in HCC, such as negative regulation of growth, positive regulation of apoptotic process, and cell matrix adhesion (
). Those results demonstrated that usage of APs may be regulated by DNA methylation in HCC.
The Methylation Regulated APs Could Be Used as Tumor Diagnostic Markers
As shown above, the correlation of promoter activity and DNA methylation in APs was observed, and we then explored whether the activities of mrAPs could serve as diagnostic markers. We evaluated it by the least absolute shrinkage and selection operator (LASSO) models. By the LASSO regression model, six out of 189 mrAPs were selected to generate a diagnostic model (
). The diagnostic model scores of tumors and normal tissues were significantly different (
), and the dimensional-reduction analysis based on the promoter activities of the six mrAPs showed that the classifier was particularly effective (
). The six mrAPs were clustered into four upregulated mrAPs (prmtr.53735 of TNFRSF10, prmtr.32651 of RGS3, prmtr.36049 of CCDC150, and prmtr.5237 of RASSF1) and two downregulated mrAPs (prmtr.37640 of TACC1 and prmtr.39585 of RABGAPL1) by promoter activities (
, upper;
and
). The CpG methylation status of the six mrAPs showed an opposite trend when compared to the promoter activities (
, lower;
and
).
Figure 4
A good performing HCC diagnosis model was generated using mrAPs. (A) The cross-validation fit curve was calculated by the LASSO regression using promoter activities of all mrAPs, and six mrAPs were filtered to generate a diagnostic model. (B) The boxplot showing the significant different model scores of 19 paired tumors and normal samples based on the diagnostic model of 6mrAPs. **** p-value <0.0001 (Wilcoxon test, p-value = 5.7e-11) (C) t-SNE plot showing normal (blue dots) and HCC (red dots) samples can be significantly distinguished by the promoter activities of six mrAPs. (D) The upper heatmap showing the promoter activities of six mrAPs, the lower heatmap showing the screened CpG methylation status of six mrAPs. Samples are arranged in consistent order in both diagrams. (E) t-SNE plot showing normal (blue dots) and HCC (red dots) samples could also be grouped by the activities of six mrAPs in the independent test dataset of GSE105130. (F) Boxplot showing the significantly different model scores of HCC and normal sample of the test dataset of GSE105130. ****p-value <0.0001 (Wilcoxon test, p-value = 2.8e-11) (G) ROC curve showing the performance and prediction accuracy of the diagnostic model in the test dataset of GSE105130.
Table 1
Promoter activity and methylation alterations of mrAPs in the HCC diagnosis model.
A good performing HCC diagnosis model was generated using mrAPs. (A) The cross-validation fit curve was calculated by the LASSO regression using promoter activities of all mrAPs, and six mrAPs were filtered to generate a diagnostic model. (B) The boxplot showing the significant different model scores of 19 paired tumors and normal samples based on the diagnostic model of 6mrAPs. **** p-value <0.0001 (Wilcoxon test, p-value = 5.7e-11) (C) t-SNE plot showing normal (blue dots) and HCC (red dots) samples can be significantly distinguished by the promoter activities of six mrAPs. (D) The upper heatmap showing the promoter activities of six mrAPs, the lower heatmap showing the screened CpG methylation status of six mrAPs. Samples are arranged in consistent order in both diagrams. (E) t-SNE plot showing normal (blue dots) and HCC (red dots) samples could also be grouped by the activities of six mrAPs in the independent test dataset of GSE105130. (F) Boxplot showing the significantly different model scores of HCC and normal sample of the test dataset of GSE105130. ****p-value <0.0001 (Wilcoxon test, p-value = 2.8e-11) (G) ROC curve showing the performance and prediction accuracy of the diagnostic model in the test dataset of GSE105130.Promoter activity and methylation alterations of mrAPs in the HCC diagnosis model.AP, alternative promoter; mrAPs, methylation-regulated APs; Dist., distance; Diff., Difference (Tumor-Normal).Two other independent public datasets of GSE105130 (30) and GSE55758 (29) were then further used as test datasets to assess the diagnostic model. The promoter activities of six mrAPs were successful in discriminating tumor from normal using t-SNE (
and
). The classify model yielded significant differences between tumor and normal samples (
and
). The AUC score of 0.97 (GSE105130) and 0.95 (GSE55758) indicated the good performance of our classifier in both test datasets (
and
). In this section, we constructed a tumor diagnostic model based on promoter activities of six mrAPs with significant diagnostic effects, which indicated that the promoter activity of mrAPs could be valuable in tumor diagnosis.
The Methylation Status of APs Could Be Used as a Prognostic Indicator in HCC
It has been reported that promoter activity could be used as prognostic markers in gastric cancer and renal cancer (11). In our study, approximately half of the promoter activity changed, which was likely due to the alteration of methylation status. We next asked whether the methylation status of APs predicts patient survival in HCC. In order to do this, The Cancer Genome Atlas (TCGA) 450K data of HCC patients with the prognostic information were used for survival analysis. We first focused on the methylation status of the above six mrAPs in the LASSO diagnostic model. As shown in
, compared with normal samples, the gene expression of CCDC150 in tumor samples was not significantly changed, but the activities of prmtr.36049 were notably increased. By comparing the methylation levels of the paired samples, a lower methylation level in CCDC150 in tumor samples was observed (
). Further analysis revealed that the methylation values and promoter activity of CCDC150 were well negatively correlated (R = −0.71, p = 6.4e-7), and the probe with the minimal p-value is cg01265662 (
). The methylation values of cg01265662 were then divided into two clusters based on optimal cutoffs and the length of patient survivals was compared (p-value = 0.00035,
). The higher the methylation level of cg01265662, the better prognostic result was observed. These results indicated that the CpG methylation status of prmtr.36049 of CCDC150 may be used as a prognostic factor in HCC. We further investigated the other five mrAPs and observed that the CpG methylation status of RASSF1, TACC1, and RABGAP1L could also possess prognostic values (
).
Figure 5
The methylation status of APs predicts patient survival in HCC. (A) UCSC genome browser screenshot showing mean read count (top 2 tracks) and 450K methylation beta values (bottom 2 tracks) at the CCDC150 gene locus of HCC (red) and normal tissues (blue or green). The statistical results are shown in the middle (boxplot for promoter activity, ****p-value <0.0001 (ANOVA, p-value = 1.14e-07)) and right (boxplot for gene expression, ns: not significant. p-value > 0.05 (ANOVA, p-value = 0.12)). (B) Methylation beta values of ±500 bps relative to TSS (prmtr.36049). Normal and tumor samples are colored by green and red dots, dots from the same sample were connected by lines. Cg01265662 with the lowest p-value for the correlation test was marked and screened for calculation in (C) and (D). (C) The scatter plot showing the negative correlation between promoter activities of prmtr.36049 and methylation beta values of cg1265662 in HCC (red) and normal (blue) samples (Pearson correlation). (D) The 10-year overall survival curve of methylation levels of cg1265662 in TCGA-LIHC patients in the high and low methylation cohort, showing that methylation of cg1265662 was significantly associated with survival in HCC. (E) The pie plot showing the majority of mrAPs with methylation values available in TCGA-LIHC. (F) The proportion of mrAPs with significant prognostic CpG methylation sites in upregulated and downregulated ones. (G) The proportion of genes with significant prognostic CpG methylation sites in mrAPs grouped by oncogenes (ONG), tumor suppressor genes (TSG), and others. (H–J) The correlations between promoter activity and methylation levels (upper) similar to (C) and corresponding survival curve of CpG sites (bottom) similar to (D) of RARA
(H), APC
(I), and PDZK1
(J) were shown, respectively.
The methylation status of APs predicts patient survival in HCC. (A) UCSC genome browser screenshot showing mean read count (top 2 tracks) and 450K methylation beta values (bottom 2 tracks) at the CCDC150 gene locus of HCC (red) and normal tissues (blue or green). The statistical results are shown in the middle (boxplot for promoter activity, ****p-value <0.0001 (ANOVA, p-value = 1.14e-07)) and right (boxplot for gene expression, ns: not significant. p-value > 0.05 (ANOVA, p-value = 0.12)). (B) Methylation beta values of ±500 bps relative to TSS (prmtr.36049). Normal and tumor samples are colored by green and red dots, dots from the same sample were connected by lines. Cg01265662 with the lowest p-value for the correlation test was marked and screened for calculation in (C) and (D). (C) The scatter plot showing the negative correlation between promoter activities of prmtr.36049 and methylation beta values of cg1265662 in HCC (red) and normal (blue) samples (Pearson correlation). (D) The 10-year overall survival curve of methylation levels of cg1265662 in TCGA-LIHC patients in the high and low methylation cohort, showing that methylation of cg1265662 was significantly associated with survival in HCC. (E) The pie plot showing the majority of mrAPs with methylation values available in TCGA-LIHC. (F) The proportion of mrAPs with significant prognostic CpG methylation sites in upregulated and downregulated ones. (G) The proportion of genes with significant prognostic CpG methylation sites in mrAPs grouped by oncogenes (ONG), tumor suppressor genes (TSG), and others. (H–J) The correlations between promoter activity and methylation levels (upper) similar to (C) and corresponding survival curve of CpG sites (bottom) similar to (D) of RARA
(H), APC
(I), and PDZK1
(J) were shown, respectively.We next analyzed how many mrAPs have prognostic methylation markers in HCC. Among 189 mrAPs, 171 with available probe methylation values from TCGA were used for further analysis (
). The results showed that 83.63% (143/171) of mrAPs had prognostic methylation markers, with 90.11% (82/91) of upregulated mrAPs and 76.25% (61/80) of downregulated mrAPs (
and
). It contained ten switch-usage AP genes, with the example of ARAP1 exhibited in
. The ONGene and TSGene already had catalogs genes closely associated with tumorigenesis and development. Six of eight oncogenes (ONG) and 10 of 11 tumor suppressor genes (TSG) had methylation markers (
). The higher expression of the oncogene RARA might be regulated by the hypomethylation, and the lower methylation status predicted a worse clinical outcome (
), while for TSG APC, the lower methylation status predicts a better prognostic result (
). Among the genes from the ONG and TSG lists, some may also play a role in cancers. For example, PDZK1 (
), which is related to cancer progression, had been reported in different kinds of cancers, such as gastric cancer (46), renal cell carcinoma (47), and breast cancer (48). However, PDZK1 played a different or even opposite role in other tumors. In our study, the lower methylation accompanied by higher expression status had a worse survival trend, which implied that PDZK1 may harbor oncogenic activity in HCC. Taken together, these results demonstrated that CpG methylation status of APs may be used as a prognostic marker by altering the activities of the promoters and provided a new perspective for understanding the underlying mechanisms of cancer development.
Validation of Promoter Activity and Methylation Status by RNA-seq and WGBS
To systematically verify the above results, we collected four samples (two HCC and two paired para-cancer tissues) for RNA-seq and WGBS. Mapping statistical information of RNA-seq and WGBS data are shown in
. A total of 54,293 promoters with activities (
) were obtained, suggesting that the usage of promoters in cancer may have a sample or subtype heterogeneity (9). We first verified the APs and mrAPs activity changes in our validation data by the criteria of 1.2-fold change between tumor and normal tissues. The results showed that activity changes of 86.6% (554/640) APs and 85.6% (137/160) mrAPs identified by a public dataset could be confirmed in at least one pair of our validation data (
and
). We next aimed to verify the methylation status of promoters using WGBS. We calculated the genome-wide CpG methylation and the average methylation levels of the TSS region of promoters. The methylation level of the TSS region gradually decreased with the increase of the promoter activity level in both pairs of HCC patients (
). Then, we focused on verification of the CpG sites selected by the correlation test in the public data previously. A differential methylation ratio over 0.1 with the same alteration trends in both public 450K data and our validation WGBS data would be regarded as confirmed. By WGBS, the average methylation status of the region was more effective for the representativeness of the promoter methylation status and could compensate for the lack of sites and deviations caused by a single methylation site. Then, we further calculated the mean methylation of the promoter regions (−1kb–1kb) and compare it to the methylation status of the selected CpG sites above. So, we next used mean methylation (sequencing reads covered all the samples) of promoter region for the sites without enough methylation information in WGBS for further analysis. About 89.7% (1,769/1973) of the significant methylation changes of the available DRPs CpG sites were verified in our samples (
). Among them, the confirmed ratios for APs and mrAPs were 91.4% (117/128) and 92.3% (60/65), respectively (
). The higher confirmed ratios (90.6% for DRPs, 96.6% for APs, and 95.7% for mrAPs) were achieved when the criteria of significant methylation changes were enhanced to 0.2 (
). The negative correlation between the changes of promoter activity and the regional methylation level in DRPs (R = −0.29, p-value = 2.2e-16;
) and APs (R = −0.22, p-value = 0.047;
) were also confirmed in our validation. Consistent with our expectation, the correlation coefficient in mrAPs was higher (R = −0.41, p-value = 0.017;
). A similar negative correlation was also obtained using selected CpGs sites (
). As shown above, our RNA-seq and WGBS data powerfully verified the negative correlation of promoter activity and methylation status in HCC.
Figure 6
Validation of promoter activity and methylation status by RNA-seq and WGBS. (A) Venn diagram showing the overlap of promoters with activities in our validation dataset of RNA-seq and public dataset of GSE77276. (B, C) The pie chart showing the percentage of all identified APs (B) and mrAPs (C) in GSE77276 with differential promoter activities being confirmed by our validation dataset of RNA-seq. (D–F) The pie chart showing the percentage of all identified DRPs (D), APs (E), and mrAPs (F) in GSE77276 with differential methylation status being confirmed by our validation dataset of WGBS. Changes of methylation of CpG site (site) with over 0.1 in both public 450K data and our validation WGBS data, and also with the same alteration trends, would be regarded as confirmed. The promoter region (region) mean methylation was adopted for testing if the methylation of a CpG site is not available in the WGBS dataset. (G–I) Scatter plots showing the correlation between differential methylation (HCC – normal) and promoter activity by normalized change fold for DRPs (G), APs (H), and mrAPs (I), similar to
, but in our validation datasets of RNA-seq and WGBS. (J) The t-SNE plot showing the normal (blue dots) and HCC (red dots) samples in our validation dataset could be grouped by the promoter activities of six mrAPs used in the diagnostic model in
. (K) Heatmap showing the promoter activities and methylation status of six mrAPs in our validation data. (L) UCSC genome browser screenshot showing the promoter activities and methylation of gene PDZK1 in both public dataset of GSE77276 and our validation datasets. GSE77276 RNA-seq tracks represent mean read counts of 19 samples, and validation dataset RNA-seq tracks represent read counts for HCC or normal samples from GX154044 and GX157272 separately. GSE77276 450K tracks represent the mean methylation value of 19 samples, and validation dataset WGBS tracks represent methylation ratio for samples from GX154044 and GX157272 separately.
Validation of promoter activity and methylation status by RNA-seq and WGBS. (A) Venn diagram showing the overlap of promoters with activities in our validation dataset of RNA-seq and public dataset of GSE77276. (B, C) The pie chart showing the percentage of all identified APs (B) and mrAPs (C) in GSE77276 with differential promoter activities being confirmed by our validation dataset of RNA-seq. (D–F) The pie chart showing the percentage of all identified DRPs (D), APs (E), and mrAPs (F) in GSE77276 with differential methylation status being confirmed by our validation dataset of WGBS. Changes of methylation of CpG site (site) with over 0.1 in both public 450K data and our validation WGBS data, and also with the same alteration trends, would be regarded as confirmed. The promoter region (region) mean methylation was adopted for testing if the methylation of a CpG site is not available in the WGBS dataset. (G–I) Scatter plots showing the correlation between differential methylation (HCC – normal) and promoter activity by normalized change fold for DRPs (G), APs (H), and mrAPs (I), similar to
, but in our validation datasets of RNA-seq and WGBS. (J) The t-SNE plot showing the normal (blue dots) and HCC (red dots) samples in our validation dataset could be grouped by the promoter activities of six mrAPs used in the diagnostic model in
. (K) Heatmap showing the promoter activities and methylation status of six mrAPs in our validation data. (L) UCSC genome browser screenshot showing the promoter activities and methylation of gene PDZK1 in both public dataset of GSE77276 and our validation datasets. GSE77276 RNA-seq tracks represent mean read counts of 19 samples, and validation dataset RNA-seq tracks represent read counts for HCC or normal samples from GX154044 and GX157272 separately. GSE77276 450K tracks represent the mean methylation value of 19 samples, and validation dataset WGBS tracks represent methylation ratio for samples from GX154044 and GX157272 separately.In addition, tumor samples could be successfully distinguished from normal samples by the activities of six mrAPs using t-SNE, which confirmed the accuracy of the above diagnostic model (
). Heatmap showed the activity trends of the six mrAPs used in the diagnostic model, which were consistent with the above results (
). The methylation status of six mrAPs in our validation data showed a similar changing pattern with
(
). Finally, examples mentioned above such as PDZK1 were examined in the UCSC genome browser (
and
). As shown in
, the promoter (prmtr.54498) of the tumor samples was higher than normal, both in the public data and our two samples. The CpG methylation status of prmtr.54498 was lower in the tumor samples, and our WGBS data showed a more pronounced effect. The correlation between the promoter activity and methylation status, including our validation data, is shown in
. Another example of CCDC150 is shown in
. In addition, these observations were further verified by an independent validation based on public WGBS and RNA-seq datasets (31) of the paired tumor and normal samples from two patients (
). Through the comprehensive analysis of our validation data and independent public datasets of RNA-seq and WGBS, we further confirmed the effects of aberrant DNA methylation on the usage of APs in HCC from a genome-wide perspective, which provides a new insight into the exploration of tumor mechanisms.
Discussion
Promoters are one of the key factors that regulate gene expression. Recent studies showed that the differential activities of promoters had a significant impact on the cancer transcriptome and contribute to the cellular transformation of cancer (11, 12). Genome-wide promoter analysis methods such as H3K4me3 ChIP-seq, CAGE, and RNA long-read sequence had a limitation in the numbers of publicly available datasets. In this study, we applied “proActiv”, an R package quantification promoter activity based on the widely used RNA short-read sequencing. Promoter activity inferred by “proActiv” has been advocated to have high consistency with other technologies (11, 12). So far, this study provides the first systematic study of genome-wide promoters usage in HCC. Compared to the gene expression, promoter activity successfully distinguished the tumor samples from normal by either t-SNE or PCA, which suggested an advantage by promoter activity in identifying the differences between tumor and normal samples. In addition, some cancer-related ontologies enriched only in genes with differentially regulated promoters (DRPGs) implied that promoter analysis may provide more information to the potential mechanism of tumorigenesis and development.DEGs of HCC have been emphasized in previous studies, but only a few studies focused on the non-differential genes (non-DEGs). In this study, we focused on the promoters that belong to the non-DEGs. DRPs were considered as an AP if they were derived from non-DEGs. We identified 855 APs from 709 genes, among which are several known cancer-associated genes, such as RARA, ARAP1, and MET. MET, a prototypical receptor tyrosine kinase, has been reported in several cancers and regulates many physiological processes including proliferation, morphogenesis, and survival (42). In our study, the promoter activity of the N short-truncated isoform was significantly increased in HCC patients, which may lead to abnormal SEMA domain lacking protein accumulation. The abnormally increased expression of the N short-truncated MET isoform had also been observed in gastric cancer (12). Further studies are required to determine how the abnormal SEMA-lacking protein accumulation plays a role in tumor development.Few studies have aimed to determine the potential mechanism of regulation in the usage of APs. DNA methylation is one of the most deeply studied epigenetic regulatory potential mechanisms. The canonical mechanisms of transcript silencing caused by hypermethylation include the following: (1) hypermethylation interferes with transcription factor binding, (2) methylated DNA-binding protein (MDBP) prevents the binding of transcription factors to target sequences in the promoter, and (3) hypermethylation changing chromatin structure leads to tighter chromatin structure and transcriptional inactivation (15, 49). This is the first systematic study focusing on the relationship between methylation status and promoter activities in APs. In our study, there are approximately 53% APs activity in cancers likely to be regulated by DNA methylation, among which 62% show canonical negative correlations. A positive correlation had also been reported in a selection of contexts (50). However, its potential mechanism needs further exploration. Taken together, our results indicated that the aberrant methylation states play a critical role in the precision usage of APs in HCC.We next focused on the diagnostic and prognostic values of methylation-regulated APs (mrAPs). Based on the LASSO regression model, six out of 189 mrAPs were selected to generate a diagnostic model, which works well in both the training and testing datasets. For the six mrAPs in the diagnostic model, five mrAPs (prmtr.53763 of TNFRSF10C, prmtr.32651 of RGS3, prmtr.36049 of CCDC150, prmtr.37640 of TACC1, and prmtr.39585 of RABGAP1L) belong to a multiple isoform mrAP with major significantly differentially expressed isoforms. prmtr.5237 of RASSF1 belongs to promoters regulating one transcript isoform with a significant expression change. All of these observations may highlight the more significant effect of multiple-isoform APs with major significantly differentially expressed isoforms on the development of HCC. TNFRSF10C works as an antagonistic receptor that protects cells from TRAIL-induced apoptosis. The copy number variation of TNFRSF10C and the downregulation of protein TNFRSF10C have been reported to be associated with colorectal cancer metastasis (51, 52). In our work, the activity of prmtr.53763 in TNFRSF10C was significantly upregulated and the role of transcript isoform is regulated by this promoter in tumors and needs to be explored in future studies. RGS3 is a GTPase-activating protein that inhibits G-protein-mediated signal transduction and associated with tumor cell proliferation and migration in glioma (53) and gastric cancer (54). Our study observed that GRS3 promoter activity (prmr.32651) is significantly and steadily upregulated in HCC. RASSF1 plays an important role in the occurrence and process of malignant tumors. It contains two well-studied subtypes, RASSF1A and RASSF1C, due to AP usage. Our research showed that hypermethylation of the RASSF1A promoter (prmr.5239) associated with the downregulation of promoter activity and tended to have poorer cancer survival, which was consistent with previous studies (55–57). In addition, the hypomethylation of RASSF1C promoter (prmtr.5237) was associated with the upregulation of promoter activity, which could serve as an oncogene in both our study and previous research (58). By interacting with a variety of complexes, TACC1 participates in tumorigenesis and development. Abnormal TACC1 regulation plays an important role in the occurrence and development of multiple myeloma including breast cancer (59), gastric cancer (60), and ovarian cancer (61). Our research demonstrated that the methylation status of the TACC1 promoter region is significantly related to promoter activity, which implies the new roles of TACC1 in liver cancer. RABGAP1L is a protein coding gene that is functionally involved in endocytosis and intracellular protein transport by regulating the activity of GTPases (62). CCDC150 is a protein coding gene with multiple transcripts. We reported that the CpG methylation status of CCDC150 and RABGAP1L could have prognostic values in HCC, which linked the functions of these two genes to cancer development.It has also been reported that promoter activity could be used as a prognostic marker in gastric cancer and renal cancer (11). DNA methylation could potentially function as a tumor biomarker with high stability and high specificity. Traditionally, DNA methylation studies were mainly based on the DEGs, and the gene promoter regions are usually located upstream and downstream of the most distal transcription start site. However, in reality, more than half of the genes have one or more transcription start sites, and a large amount of gene-related methylated regions are being overlooked. In our study, 83.63% (143/171) of mrAPs had at least one associated methylation site that could be used to predict clinical outcomes. Methylation of four promoters in the diagnostic model and several known oncogenes or tumor suppressor genes’ promoters were included.RARA has been reported to promote tumor progression in breast cancer (63), acute promyeloid leukemia (64), and liver cancer (65). In our study, the methylation level of the promoter (prmtr.27493) of RARA in HCC was significantly decreased, with the increase of promoter activity (65, 66). It is likely that the full-length transcript was overexpressed in HCC, which may promote the development of the tumor. The tumor suppressor gene (TSGene) APC has been most studied in colorectal cancer (67), and its role in liver cancer has also been reported (68–70). In contrast to RARA, hypermethylation of a promoter (prmtr.29535) inhibits the transcription and may contribute to the intensification of tumor progression. The oncogenic activity of RARA and tumor suppressor activity of APC observed in our study supported their roles that were reported in previous research. In addition, there are some cancer-associated genes from the oncogene and TSGene lists. PDZK1 plays a different or even opposite role in different tumors. PDZK1 acts as a tumor suppressor in gastric cancer and renal cancer, but in esophageal adenocarcinoma, breast cancer, and multiple myelomas (MMs), the overexpression of PDZK1 promotes cancer development or drug resistance (71–73). We found that PDZK1 promoter activity was significantly increased in the HCC, and was significantly correlated with methylation status, which showed that the lower methylation group of patients would have a worse prognosis. Our results suggested that PDZK1 may harbor oncogenic activity in HCC.Finally, we used RNA-seq and WGBS in HCC patients to perform a comprehensive verification of our study. The significant changing of promoter activities of 86.6% (554/640) APs and 85.6% (137/160) mrAPs could be confirmed in our validation dataset. A majority of the selected 450K CpGs with significantly changed methylation sites could be confirmed in our WGBS validation dataset, especially in mrAPs. A negative correlation between the change of promoter activity and the methylation variation implied that methylation may regulate the usage of APs in HCC. In addition, both promoter activity and the methylation status of the six methylation-regulated APs used in the diagnostic model could also be verified in the validation data. We extended our validations to two other independent pairs of liver cancer and matched normal samples from the public dataset (31). Some limitations exist in our study due to the small sample size, and more WGBS samples would be investigated in our future studies. However, the relationship between promoter activities and methylation changing of APs in cancer and normal samples could be validated on a genome-wide scale by paired WGBS and RNA-seq data. All in all, our results suggested that the study of APs and their methylation status can have a general application in liver cancer.
Conclusion
Our study demonstrated that promoter activity was more effective for HCC recognition than gene expression, and the usage of APs has a significant influence on the cancer transcriptome. Furthermore, the precise usage of APs could be regulated by DNA methylation in HCC, which would have a great effect on the comprehensive understanding of the tumorigenesis mechanism. Finally, based on methylation-regulated APs, our study provided an effective potential approach for cancer screening and treatment. Taken together, our study provided a new perspective on transcription regulation and contributed to the cellular transformation of cancer.
Data Availability Statement
The RNA-seq and WGBS raw sequence data generated in this paper have been deposited in the Genome Sequence Archive (74) in National Genomics Data Center (75), China National Center forBioinformation / Beijing Institute of Genomics, Chinese Academy of Sciences, under accession number HRA001330 that are publicly accessible at https://ngdc.cncb.ac.cn/gsa-human/browse/HRA001330. The processed data related to WGBS, RNA promoters’ activity and gene expression are publicly deposited at jianguoyun.com. (https://www.jianguoyun.com/p/Da8nQ6UQ7cDxCRiKhqUE). The source data for all the figures and supplementary figures are available in
. The public datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/
.
Ethics Statement
Written informed consent was obtained from the individual(s) for the publication of any potentially identifiable images or data included in this article.
Author Contributions
XH and YD conceived and designed the study. QW and BX collected the tissue sample and performed sequencing. YD, XL, SW, BJ, and RL performed the data analysis. YD and XL drafted the manuscript. XH revised the manuscript. XH and QW supervised the study. All authors contributed to the manuscript and approved the submitted version.
Funding
The project was supported by Fundamental Research Funds for the Central Universities, HUST (No. 2021GCRC073). Guangxi Key Research and Development Program (AB18126055 and AB20297009).
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Publisher’s Note
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
Authors: Daniel G Tanenbaum; William A Hall; Lauren E Colbert; Amanda J Bastien; Daniel J Brat; Jun Kong; Sungjin Kim; Bhakti Dwivedi; Jeanne Kowalski; Jerome C Landry; David S Yu Journal: J Gastrointest Oncol Date: 2016-06
Authors: Ju Dong Yang; Ajitha Mannalithara; Andrew J Piscitello; John B Kisiel; Gregory J Gores; Lewis R Roberts; W Ray Kim Journal: Hepatology Date: 2018-05-09 Impact factor: 17.425
Authors: Yu Jin; Wai Yeow Lee; Soo Ting Toh; Chandana Tennakoon; Han Chong Toh; Pierce Kah-Hoe Chow; Alexander Y-F Chung; Samuel S Chong; London L-P-J Ooi; Wing-Kin Sung; Caroline G-L Lee Journal: J Transl Med Date: 2019-08-20 Impact factor: 5.531
Authors: Shuang G Zhao; William S Chen; Haolong Li; Adam Foye; Meng Zhang; Martin Sjöström; Rahul Aggarwal; Denise Playdle; Arnold Liao; Joshi J Alumkal; Rajdeep Das; Jonathan Chou; Junjie T Hua; Travis J Barnard; Adina M Bailey; Eric D Chow; Marc D Perry; Ha X Dang; Rendong Yang; Ruhollah Moussavi-Baygi; Li Zhang; Mohammed Alshalalfa; S Laura Chang; Kathleen E Houlahan; Yu-Jia Shiah; Tomasz M Beer; George Thomas; Kim N Chi; Martin Gleave; Amina Zoubeidi; Robert E Reiter; Matthew B Rettig; Owen Witte; M Yvonne Kim; Lawrence Fong; Daniel E Spratt; Todd M Morgan; Rohit Bose; Franklin W Huang; Hui Li; Lisa Chesner; Tanushree Shenoy; Hani Goodarzi; Irfan A Asangani; Shahneen Sandhu; Joshua M Lang; Nupam P Mahajan; Primo N Lara; Christopher P Evans; Phillip Febbo; Serafim Batzoglou; Karen E Knudsen; Housheng H He; Jiaoti Huang; Wilbert Zwart; Joseph F Costello; Jianhua Luo; Scott A Tomlins; Alexander W Wyatt; Scott M Dehm; Alan Ashworth; Luke A Gilbert; Paul C Boutros; Kyle Farh; Arul M Chinnaiyan; Christopher A Maher; Eric J Small; David A Quigley; Felix Y Feng Journal: Nat Genet Date: 2020-07-13 Impact factor: 38.330