GBM is the most aggressive form of brain tumor, with high recurrence (~37%) and mortality rates (median survival time <1 year) in 2018 (1,2). GBM originates from astrocytes and is comprised of different types of cell (3), which multiply rapidly and support GBM development via large vascular networks (4). The current standard treatment for GBM includes surgery, followed by radiation therapy and temozolomide chemotherapy (5). However, the median survival time for this procedure is <1 year and very few patients survive >3 years. Despite great efforts, little progress has been made in the treatment of GBM over the past decade (6,7). Furthermore, a series of phase III clinical trials for targeted drugs has failed to improve the overall survival of patients with GBM (8). Thus, it is critical to investigate potential biomarkers of GBM and provide novel insights into the diagnosis and treatment of the disease.Changes in the expression of biomarkers are commonly described in oncological research. Biomarkers provide a framework and assurance for the strategy of diagnostic and therapeutic methods, with the aim of eradicating complications. Previous studies have identified prognostic biomarkers for GBM, such as long coding RNAs, circular RNAs, microRNAs and somatic mutations (9–12). In the past decade, epigenetic modification, especially DNA methylation in carcinogenesis, has become a focus for cancer research. Several methylated markers have been reported. For example, O6-methylguanine DNA methyltransferase promoter methylation is the major emblematic biomarker in GBM, which may predict the response to temozolomide treatment (13–15). Epigenetic changes are heritable and reversible, affecting the spatial conformation of DNA and its transcriptional activities (16). DNA methylation is one of the main epigenetic mechanisms of gene expression regulation in eukaryotes (17). Changes in DNA methylation may affect gene expression and feedback mechanisms, with various positive and negative interactions (18). Thus, abnormally methylated CpG sites are considered potential prognostic factors for GBM.As the majority of previous studies were performed on relatively small sample sizes and limited to a single epigenetic level, the present study integrated 152 GBM RNA-sequencing (RNA-seq) samples and 151 GBM DNA methylation samples from public databases. Bioinformatics-based methods were implemented to identify novel markers of phenotypically important associations among DNA methylation, gene expression and survival time in patients with GBM, using The Cancer Genome Atlas (TCGA) datasets.
Materials and methods
Data preparation
RNA-seq data of 174 GBM samples were downloaded from the TCGA (https://portal.gdc.cancer.gov/) database in October, 2018 according to ‘glioblastoma’ search term. Repeat samples, non-cancerous samples, paracancerous samples and samples without follow-up information were removed using gdcRNA tools (version 4.6.3) (19). Thus, a total of 152 GBM samples were included in the present study. RNA-seq data were normalized using the trimmed mean of M-values method (20).DNA methylation data of 155 GBM samples were downloaded from the University of California Santa Cruz (UCSC)-Xena database (https://xenabrowser.net/datapages/) in October, 2018 according to ‘glioblastoma’ search term. UCSC-Xena is a bioinformatics tool used to visualize functional genomics data from several sources, including TCGA database. DNA methylation was represented as β values, which are bounded variables of the form [M/(M + U + 100)] that are generated by the Illumina 450 k BeadChip array (Illumina, Inc.) (21). Samples without follow-up information were removed, thus a total of 151 samples were used in the present study.The DNA methylation dataset (151 patients) and the RNA-seq dataset (152 patients) were compared, and 49 patients were found to be shared between the two datasets. The research flow chart is presented in Fig. S1.
Identification of differentially expressed genes (DEGs) according to survival time
Gene expression data were used for the differential expression analysis. The median survival time of patients with GBM is <1 year (1). Thus, the patients (n=152) were divided into two groups, survival time <1 year and survival time ≥1 year, according to the median survival time. Differential expression analysis was performed between the two groups using the edgeR package (version 3.8) (22). P<0.05 and |logFC|>0.585 were considered to indicate DEGs.
Identification of differentially methylated genes
Methylation data were used for the differential methylation analysis. The patients (n=151) were divided into two groups, survival time <1 year and survival time ≥1 year, according to the median survival time. Differentially methylated CpG sites were identified by comparing the two groups. CpG sites with ≥50% missing values were removed and CpG sites with <50% missing values were filled using K-means (23). The remaining missing values were imputed using the ChAMP package (version 3.8) (24). Low-quality probes were removed if they met the following criteria: Probes with non-CpG sites, probes associated with single-nucleotide polymorphisms, cross-reactive probes and probes on the X and Y chromosomes. The β-Mixture Quantile Normalization method was used for further type I and II probe correction (25,26). P<0.01 was considered to indicate differentially methylated CpG sites.
Identification of differentially expressed and methylated genes
DNA methylation analysis focused on CpG sites that were located simultaneously in differentially methylated sites and differentially expressed genes. Thus, the differentially expressed genes and the differentially methylated genes were integrated.
Protein-protein (PPI) network construction
The Search Tool for the Retrieval of interacting Genes/Proteins (STRING) database (http://string-db.org) is used to identify associations between known proteins and predicted proteins (27), and was used in the pwresent study to construct a PPI network. The PPI network was visualized using Cytoscape software (version 3.8.0) (28). Each node represents a gene, protein or molecule. Furthermore, the connections between nodes represent the interactions of these biological molecules, which can be used to identify interactions between the proteins encoded by DEGs and methylated genes in GBM. The corresponding protein in the central node can be a core protein or a key candidate gene with critical physiological regulatory functions.
Gene ontology (GO) and kyoto encyclopedia of genes and genomes (KEGG) pathway enrichment analyses
Functional enrichment analyses were performed to determine potential biological processes and pathways enriched by the DEGs and methylated genes in GBM. KEGG pathway enrichment analysis of the differentially expressed and methylated genes was performed using the clusterProfiler package (version 3.8) (29,30), and GO enrichment analysis was performed using Cytoscape plug-in ClueGO (version 2.5.3), which visualizes the non-redundant biological terms for large clusters of genes in a functionally grouped network (31). The cut-off value for significant terms was set as P<0.05.
Identification of hub genes
Cytoscape plug-in cytoHubba (version 1.6) is commonly used to identify hub genes and subnetworks from complex interactomes (32). The four local-based methods; Density of maximum neighborhood component (DMNC), MNC, maximal clique centrality (MCC) and degree, were used to predict and evaluate essential proteins according to ranking nodes in the network. The top 30 genes within the four methods were screened and overlapped, and the overlapping genes were considered to be hub genes.
Survival analysis
The expression values of the hub genes were visualized in box plots and heat maps. Survival analysis was performed using the survminer package (version 0.4.6; http://cran.r-project.org/) and the cut-off values were identified according to the surv_cutpoint function. Survival times were evaluated using Kaplan-Meier survival curves and differences were analyzed using the log-rank test. P<0.05 was considered to indicate a statistically significant difference. Subsequently, Cox proportional hazards regression analysis was performed to further assess the results of Kaplan-Meier survival analysis.
Results
Identification of DEGs according to survival time
A total of 152 GBM gene expression profiles were obtained from the UCSC-Xena database. The patients were divided into 2 groups; survival time <1 year (short group) and survival time ≥1 year (long group) according to the median survival time. A total of 429 downregulated genes and 672 upregulated genes were identified, according to survival time, with P<0.05 and |logFC|>0.585. The results were visualized as hierarchical clusters and volcano plots, as presented in Fig. 1.
Figure 1.
Differentially expressed genes of glioblastoma. (A) Hierarchical clustering. (B) Volcano plots. Red represents upregulated genes; blue represents downregulated genes between the longer survival time (≥1 year) group and the shorter survival time group (<1 year). NOT, no difference; logFC, log-fold change.
A total of 151 GBM DNA methylation profiles were obtained from TCGA database. The clinical characteristics of patients with GBM in the TCGA database are presented in Table I. A total of 4,079 differentially methylated CpG sites were identified, including 509 hypermethylated sites and 3,570 hypomethylated sites (P<0.01; Fig. S2A). The present study screened the top 50 methylation sites in descending order of the absolute value of logFC, as presented in Fig. S2B. Following annotation of the differentially methylated CpG sites, 370 hypermethylated genes and 1,987 hypomethylated genes were identified.
Table I.
Clinical characteristics of patients with glioblastoma.
Characteristic
TCGA (n=152)
UCSC-Xena (n=151)
Age, years
Median
61
60
Range
21-89
21-85
Sex
Male
98
88
Female
54
63
Ethnicity, n
Asian
5
0
Black or African American
10
24
White
136
118
N/A
1
9
Vital status, n
Living
30
45
Dead
122
106
TCGA, The Cancer Genome Atlas; UCSC-Xena, University of California Santa Cruz-Xena.
DNA methylation may downregulate gene expression. Thus, the present study examined the intersection of the differentially expressed and methylated genes. A total of 49 genes from 152 patients overlapped the DNA methylation array data from 151 patients. The clinical characteristics of patients with GBM from TCGA dataset are presented in Table I. A total of 71 genes that were hypomethylated and expressed at high levels, and four genes that were hypermethylated and expressed at low levels in GBM were identified.
PPI network construction and functional enrichment analyses
Within the STRING database, the expression products of the differentially expressed and methylated genes in GBM were used to construct the PPI network (Fig. 2A), with a total of 63 genes (confidence score, >0.15). KEGG pathway enrichment analysis demonstrated that these genes were predominantly enriched in the ‘JAK-STAT signaling pathway’, ‘protein digestion and absorption’, ‘steroid hormone biosynthesis’, ‘cytokine-cytokine receptor interaction’, ‘prolactin signaling pathway’, ‘transcriptional misregulation in cancer’ and ‘ECM-receptor interaction’ (Fig. 2B). GO enrichment analysis demonstrated that these genes were predominantly enriched in ‘cellular response to corticosteroid stimulus’, ‘cell fate determination’, ‘collagen fibril organization’ and ‘collagen biosynthetic process’ (Fig. 2C), which may contribute to GBM development.
Figure 2.
PPI network construction and functional enrichment analyses. (A) PPI network. Nodes represent proteins; lines represent the interactions between proteins. (B) KEGG pathway enrichment analysis results. (C) Gene Ontology enrichment analysis results. A node represents a term; lines represent the interactions between terms; node colors represent the enrichment degree of the terms. The darker the color, the higher the degree of enrichment. PPI, protein-protein interaction; KEGG, Kyoto Encyclopedia of Genes and Genomes.
Hub genes and survival analysis
The top 30 genes were screened by combining the four local-based methods (DMNC, MNC, MCC and degree) in the Cytoscape plug-in cytoHubba based on differentially expressed and methylated genes. The 24 overlapping genes were considered hub genes (Fig. 3). Subsequently, the expression values of the 24 hub genes were visualized as box plots, as presented in Figs. 4 and S3. These hub genes had significantly differential expression in patients who had a survival time ≥1 year (long group) compared with patients who had a survival time <1 year (short group) (P<0.05). Survival analysis was performed in order to further assess the prognostic values of the hub genes. Among the 24 hub genes, 15 possessed potential prognostic value by comparing patients who experienced longer survival times with patients who experienced shorter survival times. Patients with hypomethylated genes expressed at high levels had shorter survival times compared with patients expressing hypermethylated genes at low levels, including COL1A1 (P=0.0280), CYP1B1 (P=0.0370), CYR61 (P=0.0087), ELANE (P=0.0051), FAM20C (P=0.0019), HAL (P=0.0160), HIC1 (P=0.0210), KRT19 (P=0.0150), LIF (P=0.0097), LOX (P=0.0180), PAX3 (P=0.0110), SFRP2 (P=0.0110) and SOCS3 (P=0.0200) (Fig. 5). Furthermore, patients with high expression levels of the hypomethylated COMP gene had longer survival times compared with patients with low levels of the hypermethylated COMP gene (P=0.0400). In addition, patients with low levels of the hypermethylated DLL1 gene had shorter survival times compared with patients with high expression levels of the hypomethylated DLL1 gene (P=0.0047). The details of the 15 hub genes are presented in Table II. Cox proportional hazards regression analysis was performed to evaluate Kaplan-Meier survival analysis results, which are presented in Table III.
Figure 3.
Identification of hub genes. Top 30 genes identified by (A) degree, (B) MCC, (C) MNC and (D) DMNC, respectively. (E) Venn diagram of the hub genes identified in the four methods. The same color represents the same subpart. MCC, maximal clique centrality; MNC, maximum neighborhood component; DMNC, density of MNC.
Figure 4.
Box plots demonstrating the expression values of the hub genes in GBM.
Figure 5.
Survival analysis of patients with glioblastoma by Kaplan-Meier method.
Cox proportional-hazards regression model analysis of the 15 hub genes.
Hub gene
HR
CI
P-value
COMP
0.972
0.848–1.115
0.688
DLL1
0.860
0.703–1.053
0.144
CYR61
0.978
0.776–1.234
0.853
HAL
1.032
0.873–1.219
0.715
SFRP2
0.995
0.866–1.143
0.947
FAM20C
0.930
0.676–1.279
0.656
SOCS3
1.308
0.975–1.754
0.073
KRT19
1.003
0.868–1.159
0.967
HIC1
1.085
0.784–1.503
0.622
ELANE
1.028
0.884–1.196
0.722
CYP1B1
0.971
0.752–1.253
0.819
LIF
0.878
0.717–1.074
0.206
COL1A1
0.928
0.712–1.210
0.580
LOX
1.000
0.789–1.268
0.999
PAX3
0.901
0.821–0.989
0.029
HR, hazard ratio; CI, confidence interval.
Functional enrichment analysis of the 15 hub genes
Functional enrichment analyses were performed to determine the potential biological processes and signaling pathways enriched by the differentially expressed and methylated genes in GBM. KEGG pathway analysis demonstrated that the hub genes were predominantly enriched in the ‘JAK-STAT signaling pathway’, ‘transcriptional misregulation in cancer’, ‘focal adhesion’ and ‘PI3K-Akt signaling pathway’ (Fig. 6A). These pathways have been confirmed to be involved in GBM development and progression. GO enrichment analysis demonstrated that these hub genes were predominantly enriched in ‘skeletal system development’, ‘ossification’ and ‘striated muscle cell differentiation’ (Fig. 6B).
Figure 6.
Functional enrichment analyses of the 15 hub genes. (A) KEGG pathway enrichment analysis results. (B) GO enrichment analysis results. KEGG, Kyoto Encyclopedia of Genes and Genomes; GO, Gene Ontology.
Discussion
GBM is a brain tumor associated with a poor prognosis. Despite advancements made in the treatment of GBM, the median survival time remains <1 year, and few patients survive >3 years (33,34). In the present study, patients from TCGA database were divided into two groups, survival time <1 year and survival time ≥1 year, according to the median survival time. Gene expression and DNA methylation were compared between the genes in both groups. DNA methylation has been demonstrated to be associated with survival outcome via changes in gene expression (35). A previous study of GBM used a joint analysis method of DNA methylation and changes in gene expression to identify prognostic biomarkers for patients with GBM (36). The hypermethylation or hypomethylation of DEGs has a major impact on the development and progression of different types of tumor, either acting as oncogenes or anticancer genes. The current study successfully presented a number of genes that were significantly modulated by DNA methylation.In the present study, KEGG pathway enrichment analysis demonstrated that the differentially expressed and methylated genes were predominantly enriched in the ‘JAK-STAT signaling pathway’. Along with the ‘JAK-STAT signaling pathway’, ‘transcriptional misregulation in cancer’ and ‘ECM-receptor interaction’ have been confirmed to serve a critical role in GBM development (37–41). GO enrichment analysis demonstrated that these genes were predominantly enriched in ‘cell fate determination’, ‘collagen fibril organization’ and ‘collagen biosynthetic process’, which may contribute to GBM development (42–45).Cancer involves a complex regulatory network and the integration of multiple biomarkers into an aggregation model can improve the prognostic value compared with a single biomarker (46). For example, the expression of transmembrane protein 41A is associated with metastasis by modulating E-cadherin in radically resected gastric cancer (47). In the present study, 24 hub genes were identified by overlapping the top 30 genes within four local-based methods (DMNC, MNC, MCC and degree), in the Cytoscape plug-in cytoHubba. Among the 24 hub genes, 15 possessed potential prognostic value between patients who experienced longer survival times and patients who experienced shorter survival times. The hypomethylated genes expressed at high levels, including COL1A1, CYP1B1, CYR61, ELANE, FAM20C, HAL, HIC1, KRT19, LIF, LOX, PAX3, SFRP2 and SOCS3, were associated with a poor prognosis of GBM. Furthermore, the hypermethylated DLL1 gene and hypomethylated COMP gene, expressed at low levels, were demonstrated to have worse clinical outcomes. Among the 15 hub genes corresponding with candidate CpG sites, 7 have been reported as GBM-associated genes. For example, COL1A1 has the potential to stratify patients with GBM into subgroups, to determine the risk of recurrence (48). Overexpression of CYR61 has been reported to enhance the tumorigenicity of glioma cells and stimulate the β-catenin-TCF/Lef and Akt signaling pathways via the activation of integrin-linked kinase (49,50). Furthermore, LIF possesses a self-renewal capacity that prevents the differentiation of glioma-initiating cells (51). As an evolutionarily conserved metabolic gene, LOX is involved in gliomagenesis (52). PAX3 is overexpressed in GBM and strictly regulates the tumorigenicity of glioma cells (53). The transcriptional inhibition of p53 by PAX3 may contribute to glioma formation and the differentiation of glioma stem cells (54). The hypermethylation of SFRP2 occurs in >40% of primary GBMs (55,56). A previous study demonstrated that patients with hypermethylated SOCS3 were associated with a significantly poor prognosis compared with healthy controls (57). However, the results of the present study demonstrated that patients with hypomethylated SOCS3 at low levels had a shorter survival time compared with patients with hypermethylated SOCS3 at high levels Thus, further research is required in order to clarify these inconsistencies.Overall, the present study identified 15 prognostic biomarkers for GBM based on DNA methylation and mRNA expression levels, including COL1A1, CYP1B1, CYR61, ELANE, FAM20C, HAL, HIC1, KRT19, LIF, LOX, PAX3, SFRP2, SOCS3, COMP and DLL1. The results of the current study demonstrated that these genes are regulated by aberrant methylation, which could serve a crucial role in GBM. These genes have the potential to serve as candidate prognostic biomarkers and therapeutic targets in GBM; however, the results require validation within larger cohorts to evaluate their potential as biomarkers in GBM. Furthermore, future studies are required to determine the molecular mechanisms underlying these genes.The present study implemented an integrative analysis approach to analyze DNA methylation with changes in gene expression and assess the association of gene expression changes with GBM survival time. A total of 15 CpG-based genes were identified to possess potential value to predict the prognosis of patients with GBM. However, future studies are required on these gene methylation and/or gene expression biomarkers to develop personalized treatments for patients with GBM.There are a number of limitations associated with the present study. GBM isocitrate dehydrogenase (IDH)-wild-type is the most common astrocytic glioma (58). Thus, existing prognostic factors for this type of tumor should commonly be used for comparative analysis of novel prognostic factors for GBM. However, there are currently no available data within TCGA cohorts. Thus, the present study lacks comparative analyses between the status of IDH-1/2 genes and the 15 CpG-based genes. Furthermore, although GBM-specific prognostic biomarkers were identified via integrative analysis of DNA methylation and gene expression, the present study lacks validation by functional studies both in vitro and in vivo.
Authors: L Saadatpour; E Fadaee; S Fadaei; R Nassiri Mansour; M Mohammadi; S M Mousavi; M Goodarzi; J Verdi; H Mirzaei Journal: Cancer Gene Ther Date: 2016-11-11 Impact factor: 5.987
Authors: Alba A Brandes; Enrico Franceschi; Alexandro Paccapelo; Giovanni Tallini; Dario De Biase; Claudio Ghimenton; Daniela Danieli; Elena Zunarelli; Giovanni Lanza; Enrico Maria Silini; Carmelo Sturiale; Lorenzo Volpin; Franco Servadei; Andrea Talacchi; Antonio Fioravanti; Maria Pia Foschini; Stefania Bartolini; Annalisa Pession; Mario Ermani Journal: Oncologist Date: 2017-03-08
Authors: Dong Xie; Dong Yin; Xiangjun Tong; James O'Kelly; Akio Mori; Carl Miller; Keith Black; Dorina Gui; Johathan W Said; H Phillip Koeffler Journal: Cancer Res Date: 2004-03-15 Impact factor: 12.701