Han Zou1,2,3, Chang Li1,2,3, Siyi Wanggou2,3, Xuejun Li2,3. 1. Xiangya School of Medicine, Central South University, 172 Tongzipo Road, Changsha, Hunan 410013, China. 2. Department of Neurosurgery, Xiangya Hospital, Central South University, 87 Xiangya Road, Changsha, Hunan 410008, China. 3. Hunan International Scientific and Technological Cooperation Base of Brain Tumor Research, Xiangya Hospital, Central South University, No. 87, Xiangya Road, Changsha, Hunan 410008, China.
Glioma is the most common malignant Central Nervous System (CNS) tumor, with a 5-year overall survival (OS) no greater than 35% 1. Despite multiple therapy options, it is still challenging for researchers and doctors to improve the OS 2-5. The reasons are complex but may lie in RNAs because gene expression profiling has been verified as a promising tool to classify tumors and predict the prognosis of cancer 6, 7. To find out the OS related genes, gliomas were classified into several molecular subtypes because not only their prognosis differs stupendously between each other, but also the molecular alteration is an origin that causes glial or precursor cells evolving into different histological types 8, 9.The World Health Organization has reclassified glioma according to the IDH and 1p/19q status in 2016 10. In this study, we selected the genes associated with OS to establish three prognostic models by lasso regression to predict the survival risk of patients from The Cancer Genome Atlas (TCGA) with IDH-wildtype glioma, IDH-mutation 1p/19q intact gliomas and IDH-mutation 1p/19q codeletion gliomas, respectively. All of the three models are able to accurately predict the survival rate of 1 year, 3 years and 5 years. Therefore, the genes in these models are the potential factors that can influence the OS greatly. Also, these genes will be able to give insight into glioma and provide directions for subsequent research.
Materials and Methods
Data collection
All data of the 702 patients including RNA-seq data, clinical information and molecular features were obtained from TCGA (https://cancergenome.nih.gov/). The data collection was conducted in compliance with the publication guidelines and policies for the protection of human subjects provided by TCGA. Ensembl gene ID annotated RNA expression profiles were based on the V24 (hg19) of GENCODE (https://www.gencodegenes.org). Patients with the following conditions were eliminated in the study: (1) no IDH or 1p/19q molecular features (2) no survival information.
Identification of OS related genes
The analysis was carried out among 233 IDH-wildtype, 254 IDH-mutation 1p/19q intact and 166 IDH-mutation 1p/19q codeletion glioma samples using the “survival” and “survminer” packages in R/Bioconductor. All the total 60483 genes were analyzed one by one to perform a univariate survival analysis, and genes with P-value < 0.05 were selected for model establishment.
Lasso regression
Because genes affect each other can cause several colinear gene groups, we performed Lasso using the “lars” and “glmnet” packages in R/Bioconductor to reduce the collinearity influence and enhance the prediction accuracy and interpretability of the statistical mode. Cross-validation was used to select the regularization parameter. In order to quantify the risk of OS, a standard form of risk score (RS) for each patient was calculated combining the expression levels of the RNAs (Expi) and LASSO coefficients (Li), Risk score = .
ROC and Kaplan-Meier curve
To investigate the performance of the prognostic classifier in predicting patient outcome, the area under the curve (AUC) of the receiver-operator characteristic (ROC) was calculated and compared. To validate the efficiency and repeatability of the models, we randomly divided the patients in each group into training set and validation set with a ratio of 6:4 by R. To divide the patients into the high or low risk group in the training set, the best cutoff was determined with the “survival” and “survminer” packages in R/Bioconductor. Kaplan-Meier curves were plotted to estimate the survival status for patients in the high risk and the low risk group in both the training set and the testing set. In order to further assess the accurate role of the genes, heatmap was drawn using the R/Bioconductor package “ggplot2”.
Cox regression
Whether the prognostic value of the multi-RNA-based classifier is independent of clinical features was assessed by multivariate Cox regression model.All the statistical tests were done with R software version 3.5.1 and corresponding fundamental package. The figures were drawn with R software with or without subsequent modification by Adobe Illustrator CS6.
Enrichment analysis of KEGG pathways
Genes associated with OS were included in the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis using the WEB-based Gene SeT AnaLysis Toolkit (http://www.webgestalt.org). The hypergeometric test statistical method and the Benjamini-Hochberg multiple test adjustment method were used. All genes from human beings were used as reference. Top 10 pathways with at least 10 genes involved were considered significantly enriched (FDR < 0.05).
Results
The overall flowchart was summarized in Figure 1.
Figure 1
The overall workflow chart. 702 glioma samples with RNA-seq data were obtained from TCGA. After data screening, 49 samples were excluded. The remaining 653 samples were divided into three subtypes. Univariate survival analysis identified 12205, 6126, and 5206 candidate genes that were associated with the OS of IDH-wildtype, IDH-mutation 1p/19q intact and IDH-mutation 1p/19q codeletion gliomas, respectively. These genes were included in the KEGG pathway enrichment analysis. Three models were established using Lasso regression based on the OS-related genes. The models were validated in the testing sets.
Clinicopathological features of patients in the TCGA
The clinicopathological features of 233 IDH-wildtype, 254 IDH-mutation 1p/19q intact and 166 IDH-mutation 1p/19q codeletion patients were listed in Table 1. The features of IDH-wildtype gliomas are quite different from that of the IDH-mutation gliomas, while the two types of IDH-mutation are similar to each other. Consistent with previous studies, IDH-wildtype patients were older than IDH-mutation patients at diagnosis (50-70 years old vs 20-50 years old), with a poorer prognosis and mainly consisted of glioblastoma and G4 instead of G2 and G3. IDH-mutation 1p/19q intact and IDH-mutation 1p/19q codeletion patients had no significant difference among age, grade, OS and gender, except the histologic types because IDH-mutation 1p/19q intact patients mainly harbored astrocytoma while IDH-mutation 1p/19q codeletion patients mainly harbored oligodendroglioma.
Table 1
Clinicopathologic features for the glioma patients
Characteristics
Group
Age (years)
IDH-wildtype
IDH-mutation 1p/19q intact
IDH-mutation 1p/19q codeletion
1~10
0
0
0
11~20
0
7
2
21~30
13
55
13
31~40
13
84
45
41~50
30
48
32
51~60
66
19
37
61~70
64
9
15
71~80
31
1
6
81~90
8
0
0
91~100
0
0
0
Histology
oligodendroglioma
19
37
116
oligoastrocytoma
15
67
30
astrocytoma
52
110
4
glioblastoma
139
9
0
Grade
G2
19
114
80
G3
67
100
70
G4
139
9
0
Gender
Male
137
130
83
Female
88
93
67
Vital Status
Alive
68
206
145
Dead
165
48
21
Days to Death (mean±SD)
466±570
1068±957
1003±976
Median Days to Death
382
836
710.5
Identification of genes associated with OS
To begin with, we identified three sets of genes by performing univariate survival analysis using Cox Proportional Hazard Regression Model, with the threshold of P-value set as 0.05. The overlapping condition of the genes contained in the three sets is shown in Figure 2. Respectively, 12205 (20.18%), 6125 (10.13%) and 5206 (8.61%) genes were included in the IDH-wildtype, IDH-mutation 1p/19q intact and IDH-mutation 1p/19q codeletion set. The non-coding RNAs accounted for 58.74%, 70.13% and 58.11% of the OS related genes in the IDH-wildtype, IDH-mutation 1p/19q intact and IDH-mutation 1p/19q codeletion set respectively.
Figure 2
Overlapping condition of the three gene sets identified by univariate survival analysis. A Venn diagram presenting the overlapping condition of genes related to OS selected by univariate survival analysis. wt: IDH-wildtype; noncodel: IDH-mutation 1p/19q intact; codel: IDH-mutation 1p/19q codeletion.
Further genes screening and model establishment
We next screened genes associated with OS based on Lasso. Three sets of significant genes optimally predicting the survival risk score of patients with glioma were selected with cross-validation method (Figure 3A, Figure 4A and Figure 5A) and three survival risk score systems based on the 76 genes were constructed (Table 2). Moreover, non-coding RNAs accounted for almost half in IDH-mutation types while less in IDH-wildtype.
Figure 3
Further genes screening and model establishment of IDH-wildtype. (A) Cross-validation of Lasso regression. (B) ROC curve and AUC of the model. (C) Cutpoint determined with R/Bioconductor packages “survival” and “survminer”. (D) Kaplan-Meier curve of high risk and low risk groups.
Figure 4
Further genes screening and model establishment of IDH-mutation 1p/19q codeletion type. (A) Cross-validation of Lasso regression. (B) ROC curve and AUC of the model. (C) Cutpoint determined with R/Bioconductor packages “survival” and “survminer”. (D) Kaplan-Meier curve of high risk and low risk groups.
Figure 5
Further genes screening and model establishment of IDH-mutation 1p/19q intact type. (A) Cross-validation of Lasso regression. (B) ROC curve and AUC of the model. (C) Cutpoint determined with R/Bioconductor packages “survival” and “survminer”. (D) Kaplan-Meier curve of high risk and low risk groups.
Then we plotted time-dependent ROC curves for 1, 3 and 5 years. the AUCs were high enough to support the efficiency of the genes and models (IDH wildtype 1-year AUC: 0.724, 3-year AUC: 0.860, 5-year AUC: 0.924; IDH mutation 1p/19q codeletion 1-year AUC: 0.919, 3-year AUC: 0.834, 5-year AUC: 0.830; IDH mutation 1p/19q intact 1-year AUC: 0.980, 3-year AUC: 0.883, 5-year AUC: 0.817) (Figure 3B, Figure 4B and Figure 5B).To investigate the relationship between RS and survival status of each group of the gliomapatients, Kaplan-Meier analysis and log-rank test were conducted. The optimal cutoff was calculated with the methods described above and shown in Figure 3C, Figure 4C and Figure 5C. Obviously, patients with higher RS generally had a significantly worse OS than those with lower RS (all P-values of the three groups are less than 0.0001) (Figure 3D, Figure 4D and Figure 5D). Furthermore, the ratios of high-risk patients and low-risk patients are 8.7, 0.3 and 0.1 in IDH wildtype, IDH mutation 1p/19q codeletion and IDH mutation 1p/19q intact patients respectively. The results corresponded to the recognition that prognosis of IDH mutation is much better than that of IDH wildtype.
Validation of the models in the TCGA
To test the models, we randomly divided all patients into training sets and testing sets at a ratio of 6:4. Within the training sets, RS cutoff (Figure 6A, Figure 6D and Figure 6G), Kaplan-Meier analysis and log-rank test were made for each subtype. Obviously, patients with higher RS generally had a significantly worse OS than those with lower RS (p<0.0001; Figure 6B, Figure 6E and Figure 6H). Additionally, the RS cutoff was calculated in the training set, and Kaplan-Meier analysis was performed for both the training and testing sets to evaluate whether the models were powerful enough to separate patients into high-risk group and low-risk group. Consequently, all the subtypes' models were good enough to separate high-risk group and low-risk group apart from each other in the testing set (Figure 6C, Figure 6F and Figure 6I).
Figure 6
Validation of the models in the TCGA. We randomly divided the patients into training sets and testing sets by 6:4. (A-C) IDH-wildtype. (D-F) IDH-mutation 1p/19q codeletion type. (G-H) IDH-mutation 1p/19q intact type. (A), (D) and (G) RS cutoff of the training sets. (B), (E) and (H) Kaplan-Meier curve of the training sets. (C), (F) and (I) Kaplan-Meier curve of the testing sets.
Gene function inversion in different subtypes
Surprisingly, there are no overlapping genes between different models, which suggested the enormous difference among the three types. The overlapping condition among genes in model of one subtype and genes related to OS of the other two subtypes was shown in Figure 7. SERPINA5 in IDH-mutation 1p/19q codeletion type, RP11.229A12.2 and RP11.62F24.2 in IDH-mutation 1p/19q intact type were also identified as OS related genes of other two subtypes. Additionally, the heatmap also showed that a protective or risk factor of a glioma subtype can sometimes play a different or even an opposing role in other subtypes of gliomas. Genes with reverse function (changing from protective factor to risk factor or from risk factor to protection factor) included RP11.229A12.2, RP11.62F24.2, C3orf67, RP11.275H4.1 and TBX3. The Kaplan-Meier curves were plotted for the five genes in Figure 8.
Figure 7
Heatmap and overlapping condition in different subtypes. (A) Heatmap of the 76 genes selected in the three risk score models. (B) The overlapping condition among genes in the prognostic model of IDH-wildtype and genes related to OS of the other two subtypes. (C) The overlapping condition among genes in the prognostic model of IDH-mutation 1p/19q codeletion type and genes related to OS of the other two subtypes. (D) The overlapping condition among genes in the prognostic model of IDH-mutation 1p/19q intact type and genes related to OS of the other two subtypes.
Figure 8
Kaplan-Meier curves of genes played opposing roles in different subtypes. (A-C) RP11.229A12.2 expression in IDH-wildtype, IDH-mutation 1p/19q codeletion type and IDH-mutation 1p/19q intact type. (D-F) RP11.62F24.2 expression in IDH-wildtype, IDH-mutation 1p/19q codeletion type and IDH-mutation 1p/19q intact type. (G-H) C3orf67 expression in IDH-wildtype and IDH-mutation 1p/19q intact type. (I-J) RP11.275H4.1 expression in IDH-mutation 1p/19q codeletion type and IDH-mutation 1p/19q intact type. (K-L) TBX3 expression in IDH-wildtype and IDH-mutation 1p/19q codeletion type.
Prognostic value of the models is independent of clinical features
To assess whether the models represent an independent indicator in gliomapatients, the effect of each clinicopathologic feature on survival was analyzed by Cox regression. As shown in Table 3, in the univariate analysis, gender was not a powerful factor in all the three subtypes; histology was not powerful in IDH-mutation types but powerful in IDH-wildtype; age was not a powerful factor in 1p/19q intact type but powerful in the others. Both grade and risk score were powerful in all the three subtypes. Furthermore, hazard ratio (HR) of risk score was the highest (IDH wildtype 33.72 (14.13-80.45), P-value = 2.20E-15, IDH-mutation 1p/19q codeletion 18.07 (6.928-47.13), P-value = 3.28E-9 and IDH-mutation 1p/19q intact 10.59 (6.319-17.74), P-value < 2E-16).
Table 3
Univariate and multivariate analysis of clinicopathologic features and risk score.
After multivariable adjustment, the risk scores of these models remained powerful and independent. Except for age in IDH-wildtype and 1p/19q codeletion types which remained powerful, other factors, even grade was not identified as powerful or independent. The HRs of risk score calculated by the three models (IDH wildtype 125.7044 (32.1569-491.390), P-value = 3.67E-12, IDH-mutation 1p/19q codeletion 7.590 (2.4850-23.181), P-value = 0.00037 and IDH-mutation 1p/19q intact 10.6033 (6.1490-18.284), P-value < 2E-16) were also much higher than other factors, which illustrates the efficiency of the three models.
KEGG pathway enrichment analysis
To identify the key pathways associated with OS, we performed KEGG pathway enrichment analysis for the three sets of OS related genes (Table 4). Cell-energy related pathways, GTP and cAMP were included. This may illustrate that energy requirement of cancer cells was related to the OS. Surprisingly, the genes of IDH-mutation were enriched in alcoholism, systemic lupus erythematosus, hematopoietic cell lineage and diabetes. This might give a cue of the connection among these diseases and might represent the relationship between unhealthy lifestyle and the OS of gliomapatients.
Table 4
KEGG pathway enrichment
Types
Description
Number
FDR
wt
positive regulation of GTPase activity
226
1.54E-09
wt
regulation of GTPase activity
240
2.30E-09
wt
positive regulation of cellular component organization
369
6.57E-09
wt
positive regulation of hydrolase activity
301
6.57E-09
wt
positive regulation of molecular function
547
6.57E-09
wt
regulation of hydrolase activity
409
1.09E-08
wt
positive regulation of catalytic activity
465
1.19E-07
wt
establishment of localization in cell
582
1.56E-07
wt
vesicle-mediated transport
426
2.35E-06
wt
intracellular transport
491
3.19E-06
int
Alcoholism
54
0
int
Systemic lupus erythematosus
46
0
int
Transcriptional misregulation in cancer
33
0.002252
co
ECM-receptor interaction
27
2.93E-05
co
HIF-1 signaling pathway
24
0.04264
co
Alcoholism
35
0.04264
co
Systemic lupus erythematosus
28
0.04264
co
Hematopoietic cell lineage
22
0.04264
co
Cholinergic synapse
24
0.04264
co
cAMP signaling pathway
37
0.04264
co
AGE-RAGE signaling pathway in diabetic complications
22
0.04264
co
Proteoglycans in cancer
37
0.04264
co
Focal adhesion
37
0.04264
wt: IDH-wildtype;
int: IDH-mutation 1p/19q intact;
co: IDH-mutation 1p/19q codeletion;
FDR: false discovery rate.
Discussions
In this study, we identified 76 genes to establish three risk-score models that can effectively predict the overall survival of patients with IDH-wildtype, IDH-mutation 1p/19q intact and IDH-mutation 1p/19q codeletion gliomas respectively.The genes identified by univariate survival analysis are enriched in alcoholism, systemic lupus erythematosus and hematopoietic cell lineage. Recent studies suggested that alcoholism may share a common pathway with glioma through proteasome and oxidative stress, and disulfiram (a drug used for the treatment of alcoholism for decades) has been proven effective in glioma treatment 11, 12. As for the systemic lupus erythematosus, more studies are needed to investigate its relationship with glioma 13. IDH mutation can promote leukemogenesis, and hematopoietic cell lineage was suggested in gliomas in previous studies 14-17. However, whether gliomapatient with these diseases has a short OS and the underlying mechanisms require further studies. Additionally, the RAGE pathway in diabetes and diabetes itself and insulin usage are proved associated with glioma 18, 19.Further studies are required for the 76 genes in the models to explore their roles in affecting the OS of patients with glioma, as only a small part of them has been reported to be related to the OS of cancer. Additionally, the non-coding RNA accounted for almost half of the 76 genes but has seldom been studied in glioma, and they should be attached enough attention for their increasing importance in glioma. Some non-coding RNAs have been verified to play roles in glioma 20-22, and therapies targeting non-coding RNAs have become promising in glioma 23, 24. Molecular pathways regulating the expression of protective RNAs or risk RNAs might provide potential therapeutic targets to improve the prognosis of gliomapatients.Interestingly, several genes changed from protective factors to risk factors in different glioma types and vice versa. For example, TBX3 is associated with a number of cancers, including head and neck squamous cell carcinoma, gastric, breast, ovary, cervical, pancreatic, bladder, liver cancers and melanoma 25. Its special stemness function and role as a microenvironment-dependent and epigenetically regulated lineage-commitment factor makes it a risk factor in cancers 25, 26. Besides, it has been related to the hormonal therapy resistance in breast cancers 27. In this study, TBX3 is a protective factor in IDH-wildtype but a risk factor in IDH-mutation 1p/19q codeletion gliomas. These opposing roles may suggest a complex underlying mechanism for the pathway regulation in gliomas 28. Although we have associated these genes with different molecular features of gliomas, such as IDH and 1p/19q, further studies are needed to explore this phenomenon.One limitation of this study is that we did not validate the models in other databases because no studies with qualified data for a reliable validation were available. With the rapid progress of the next-sequencing applied in cancer, the robustness of the models would be further validated in the future.
Authors: Jing Wu; Casey Kim; Harrison X Bai; Lei Tang; Yinyan Wang; Xuejun Li; Zishu Zhang; Giorgos Karakousis; Raymond Huang; Paul J Zhang; Bo Xiao; Li Yang Journal: World Neurosurg Date: 2018-09-24 Impact factor: 2.104
Authors: James E Hansen; Grace Chan; Yanfeng Liu; Denise C Hegan; Shibani Dalal; Eloise Dray; Youngho Kwon; Yuanyuan Xu; Xiaohua Xu; Elizabeth Peterson-Roth; Erik Geiger; Yilun Liu; Joseph Gera; Joann B Sweasy; Patrick Sung; Sara Rockwell; Robert N Nishimura; Richard H Weisbart; Peter M Glazer Journal: Sci Transl Med Date: 2012-10-24 Impact factor: 17.956
Authors: Xueqing Lun; J Connor Wells; Natalie Grinshtein; Jennifer C King; Xiaoguang Hao; Ngoc-Ha Dang; Xiuling Wang; Ahmed Aman; David Uehling; Alessandro Datti; Jeffrey L Wrana; Jacob C Easaw; Artee Luchman; Samuel Weiss; J Gregory Cairncross; David R Kaplan; Stephen M Robbins; Donna L Senger Journal: Clin Cancer Res Date: 2016-03-22 Impact factor: 12.531
Authors: Pedram Razavi; Matthew T Chang; Guotai Xu; Chaitanya Bandlamudi; Dara S Ross; Neil Vasan; Yanyan Cai; Craig M Bielski; Mark T A Donoghue; Philip Jonsson; Alexander Penson; Ronglai Shen; Fresia Pareja; Ritika Kundra; Sumit Middha; Michael L Cheng; Ahmet Zehir; Cyriac Kandoth; Ruchi Patel; Kety Huberman; Lillian M Smyth; Komal Jhaveri; Shanu Modi; Tiffany A Traina; Chau Dang; Wen Zhang; Britta Weigelt; Bob T Li; Marc Ladanyi; David M Hyman; Nikolaus Schultz; Mark E Robson; Clifford Hudis; Edi Brogi; Agnes Viale; Larry Norton; Maura N Dickler; Michael F Berger; Christine A Iacobuzio-Donahue; Sarat Chandarlapaty; Maurizio Scaltriti; Jorge S Reis-Filho; David B Solit; Barry S Taylor; José Baselga Journal: Cancer Cell Date: 2018-09-10 Impact factor: 31.743