
A risk score staging system based on the expression of seven genes predicts the outcome of bladder cancer.

Abstract

Bladder cancer (BLCA) is among the most malignant types of cancer. At present, the prognostic tools available for this disease are insufficient. In the present study, the transcriptomes of 1,049 BLCA samples from four datasets from the Gene Expression Omnibus, ArrayExpress and The Cancer Genome Atlas (TCGA) were analyzed. Using TCGA RNA-seq data, a risk score staging model was built with random forest variable hunting and multivariate Cox regression to predict the outcome of patients with BLCA. The seven genes identified as predictors of survival time were zinc finger protein 230, BCL2-like 14, AHNAK, transmembrane protein 109, apolipoprotein L2, advanced glycation end-product specific receptor, and amine oxidase, copper containing 2. Patients with a low risk score had a significantly higher survival rate than those with a high risk score in both the training and validation datasets. Association analyses between the risk score and other clinical information demonstrated that the risk score was significantly associated with pathological stage. In a nomogram comparing the risk score with other clinical variables, the risk score spanned the greatest range of points, indicating its relative accuracy. In summary, the risk staging model based on the expression of 7 genes is robust and performs more effectively than other clinical information in predicting prognosis.


Keywords: bladder cancer; gene expression; prognosis; risk score

Year: 2018; PMID: 30008905; PMCID: PMC6036497; DOI: 10.3892/ol.2018.8904

Source DB: PubMed; Journal: Oncol Lett; ISSN: 1792-1074; Impact factor: 2.967

Introduction

Bladder cancer (BLCA) is among the most malignant types of cancer; 76,790 new cases and 16,390 deaths were reported in the United States in 2016 (1). According to a recent study of cancer in China, there were 80,500 new cases of BLCA and 32,900 deaths from the disease in 2015 (2). Metastasis and early relapse are common in BLCA, so determining the prognosis is important for patients with BLCA (3). However, the current clinical staging system is insufficient to predict their outcome (4), and novel molecular biomarkers for predicting BLCA prognosis are urgently required. A previous study reported that single biomarkers often fail to predict patient prognosis accurately, whereas combinations of multiple biomarkers perform more effectively (5). In the present study, random forest variable hunting coupled with multivariate Cox regression was used to build a model based on gene expression levels to evaluate the prognosis of patients with BLCA from The Cancer Genome Atlas (TCGA) dataset. Patients with high risk scores had significantly shorter survival than those with low risk scores, and this was validated in three further independent cohorts. Furthermore, analysis of the association between the risk score and other clinical information demonstrated that the risk score was associated with pathological stage, and a nomogram based on the risk score and clinical information indicated that the risk score contributed most to predicting the outcome of bladder cancer.

Materials and methods

Data processing

mRNA expression levels from the ‘TCGA Bladder Cancer (BLCA)’ dataset (n=407) were downloaded from UCSC Xena (http://xena.ucsc.edu/) and converted to RNA-Seq by Expectation-Maximization (RSEM) values using the Xena website. Genes not expressed in any sample were removed, and log2-transformed RSEM values were retained for model development. Raw data (CEL files) for the expression profiles GSE31684, GSE48075 and E-MTAB-4321 were downloaded from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo) and ArrayExpress (www.ebi.ac.uk/arrayexpress/). Background correction and normalization were performed on the raw data with Robust Multi-array Average (RMA) (6,7). Probes were mapped to HUGO Gene Nomenclature Committee (HGNC)-approved gene symbols (https://www.genenames.org/). Probes without annotation were discarded, and when several probes matched the same gene, their mean value was used to represent the expression of that gene. Z-scores were then calculated for each gene across the samples within each dataset and used for further analysis (8).
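The probe-to-gene collapse and per-gene Z-scoring described above can be sketched as follows. This is a minimal standard-library illustration, not the authors' code (they worked in R); the function names, the toy probe IDs and the gene label `GENE_A` are hypothetical, and using the sample standard deviation (n-1 denominator, as R's `scale()` does) is an assumption, since the text does not specify it.

```python
from collections import defaultdict
from statistics import mean, stdev


def collapse_probes(probe_values, probe_to_gene):
    """Merge probes annotated to the same gene by averaging them sample by sample.

    probe_values:  probe ID -> list of expression values (one per sample)
    probe_to_gene: probe ID -> gene symbol; probes absent here are unannotated
    """
    grouped = defaultdict(list)
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:  # probe without annotation: discarded
            continue
        grouped[gene].append(values)
    # zip(*rows) walks the samples; each gene value is the mean over its probes
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}


def zscore_by_gene(expr):
    """Standardize each gene across the samples of one dataset: (x - mean) / SD."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), stdev(values)  # stdev = sample SD (assumption)
        if sd == 0:  # constant gene: no information, and would divide by zero
            continue
        out[gene] = [(v - mu) / sd for v in values]
    return out


# Hypothetical toy input: two probes for GENE_A, one unannotated probe (p3)
probes = {"p1": [2.0, 4.0, 6.0], "p2": [4.0, 6.0, 8.0], "p3": [1.0, 1.0, 9.0]}
annotation = {"p1": "GENE_A", "p2": "GENE_A"}
expr = collapse_probes(probes, annotation)
z = zscore_by_gene(expr)
```

Because Z-scores are computed within each dataset, the same function would be applied separately to the TCGA matrix and to each microarray dataset.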

Gene selection and model construction

Univariate Cox regression analysis was performed on the training (TCGA) dataset. Genes whose expression was significantly associated with overall survival (OS) in the training dataset (P<0.001) were selected for further analysis. Random forest variable hunting with 100 replications and 100 steps was then used to select the most informative candidate genes: zinc finger protein 230 (ZNF230), BCL2-like 14 (BCL2L14), AHNAK, transmembrane protein 109 (TMEM109), apolipoprotein L2 (APOL2), advanced glycation end-product specific receptor (AGER) and amine oxidase, copper containing 2 (AOC2). Multivariate Cox regression analysis of these candidate genes against OS was used to estimate the coefficients of the risk score, which was calculated as follows:

$$\text{Risk score} = \sum_{i=1}^{7} \beta_i x_i$$

where $\beta_i$ is the multivariate Cox regression coefficient of gene $i$ and $x_i$ is the relative expression level (Z-score) of gene $i$. The coefficients were then locked and used, unchanged, to calculate the risk scores of the three test datasets.
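The univariate screening step (keep genes whose Cox regression P<0.001) can be sketched as below. This is an illustrative from-scratch implementation, not the authors' R code: it fits a one-covariate Cox model by Newton-Raphson using the Breslow approximation for tied event times (R's `survival::coxph` defaults to the Efron approximation, so P-values may differ slightly when ties are present) and reports a two-sided Wald P-value. The function names are hypothetical, and random forest variable hunting is not reproduced here.

```python
import math


def _score_info(beta, x, time, event):
    """Score U(beta) and information I(beta) of the one-covariate Cox partial
    likelihood, with Breslow handling of tied event times."""
    score, info = 0.0, 0.0
    n = len(x)
    for i in range(n):
        if not event[i]:
            continue  # censored subjects contribute only through risk sets
        risk = [j for j in range(n) if time[j] >= time[i]]  # still at risk at time[i]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        m1 = sum(wj * x[j] for wj, j in zip(w, risk)) / s0       # weighted mean of x
        m2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk)) / s0  # weighted mean of x^2
        score += x[i] - m1
        info += m2 - m1 ** 2  # weighted variance of x in the risk set
    return score, info


def cox_univariate(x, time, event, max_iter=50, tol=1e-10):
    """Newton-Raphson fit; returns (beta, hazard_ratio, two-sided Wald P).
    Assumes a finite MLE exists (no perfect separation of events by x)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_info(beta, x, time, event)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_info(beta, x, time, event)  # information at the final estimate
    se = 1.0 / math.sqrt(info)
    p = math.erfc(abs(beta / se) / math.sqrt(2.0))  # P(|Z| > |beta / se|)
    return beta, math.exp(beta), p


def screen_genes(expr, time, event, threshold=0.001):
    """Keep genes whose univariate Cox P-value is below threshold."""
    kept = {}
    for gene, values in expr.items():
        _, hr, p = cox_univariate(values, time, event)
        if p < threshold:
            kept[gene] = (hr, p)
    return kept
```

With the default `threshold=0.001` this mirrors the P<0.001 cutoff in the text; the genes that pass would then be handed to the variable-hunting step.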

Statistical analysis

All statistical analyses were performed with R (version 3.0.1; https://www.r-project.org/) and R packages. Raw data were normalized with the ‘affy’ package (v1.56.0) (9); survival and Cox proportional hazards analyses were performed with the ‘survival’ package (v1.4–8); random forest variable hunting was performed with the ‘randomForestSRC’ package (v2.0.5) (10); and receiver operating characteristic (ROC) curves were drawn with the ‘pROC’ package (v1.11.0) (11). Gene set enrichment analysis (GSEA) was performed using the Java GSEA software (v5.2; http://software.broadinstitute.org/gsea/index.jsp) (12).

Results

Risk score staging system

Candidate genes for the staging system were selected by univariate Cox regression analysis of gene expression against OS in the ‘TCGA Bladder Cancer (BLCA)’ dataset. Random forest variable hunting was then used to select the most suitable combination of candidate genes, and 7 genes were identified (Fig. 1A). Multivariate Cox regression analysis was performed to estimate their coefficients, and the risk score of each patient was calculated using the following formula: Risk score = (0.012050982 × ZNF230) + (−0.124027149 × BCL2L14) + (−0.251893959 × AHNAK) + (0.264530911 × TMEM109) + (0.133540278 × APOL2) + (−0.19351212 × AGER) + (−0.209706035 × AOC2), where each gene symbol denotes the Z-score of that gene's expression. Parameters for each gene are detailed in Table I. Genes with positive coefficients, whose higher expression raises the risk score, were regarded as putative cancer drivers, whereas genes with negative coefficients were regarded as putative tumor suppressors (Fig. 1B).
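Applying the published formula amounts to a weighted sum of Z-scores. The sketch below hard-codes the seven coefficients exactly as reported above; `COEFFICIENTS` and `risk_score` are hypothetical names, and the input is assumed to be a mapping from gene symbol to one patient's Z-score, computed within that patient's dataset as described under Data processing.

```python
# Multivariate Cox coefficients as given in the risk score formula above
COEFFICIENTS = {
    "ZNF230": 0.012050982,
    "BCL2L14": -0.124027149,
    "AHNAK": -0.251893959,
    "TMEM109": 0.264530911,
    "APOL2": 0.133540278,
    "AGER": -0.19351212,
    "AOC2": -0.209706035,
}


def risk_score(zscores):
    """Sum of coefficient x Z-score over the seven model genes for one patient.

    zscores: gene symbol -> Z-score; extra genes are ignored, missing ones raise.
    """
    missing = set(COEFFICIENTS) - set(zscores)
    if missing:
        raise ValueError(f"missing model genes: {sorted(missing)}")
    return sum(beta * zscores[gene] for gene, beta in COEFFICIENTS.items())
```

A patient whose seven Z-scores are all 0 (average expression in that dataset) receives a score of 0; scores above or below the dataset median then define the high- and low-risk groups.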

Figure 1.

Candidate genes identified by random forest variable hunting, including (A) the frequency and (B) the coefficient of each gene. AOC2, amine oxidase, copper containing 2; AGER, advanced glycation end-product specific receptor; APOL2, apolipoprotein L2; TMEM109, transmembrane protein 109; BCL2L14, BCL2-like 14; ZNF230, zinc finger protein 230.

Table I.

Analysis of the candidate genes with univariate and multivariate Cox regression.

	Univariate			Multivariate

Gene	HR	95% CI	P-value	HR	95% CI	P-value
TMEM109	1.50	1.50–1.30	<0.001	1.30	1.08–1.57	0.005
AHNAK	1.40	1.40–1.20	<0.001	1.14	0.95–1.37	0.152
BCL2L14	0.76	0.76–0.68	<0.001	0.82	0.73–0.93	0.003
AOC2	0.79	0.79–0.69	<0.001	1.01	0.85–1.2	0.890
ZNF230	0.73	0.73–0.62	<0.001	0.81	0.69–0.96	0.015
AGER	0.77	0.77–0.68	<0.001	0.88	0.75–1.04	0.133
APOL2	0.73	0.73–0.63	<0.001	0.78	0.67–0.9	0.001

HR, hazard ratio; CI, confidence interval; TMEM109, transmembrane protein 109; BCL2L14, BCL2-like 14; AOC2, amine oxidase, copper containing 2; ZNF230, zinc finger protein 230; AGER, advanced glycation end-product specific receptor; APOL2, apolipoprotein L2.

Risk score predicts survival in the training dataset

The ability of the risk score to predict the outcome of patients with BLCA was evaluated. Using the median risk score as the cutoff, patients in the TCGA dataset were divided into high-risk and low-risk groups. The OS time of patients in the high-risk group was significantly shorter than that of patients in the low-risk group (P=0.0002; Fig. 2A): the median survival was 24.6 months (95% CI, 20–33.5 months) for high-risk patients and 88 months (95% CI, 45.6 months–not reached) for low-risk patients. Recurrence-free survival (RFS) was also compared between the high- and low-risk groups, and the resulting curves resembled those for OS (P=0.026; Fig. 2B). Patients with high risk scores were more prone to early relapse, and the expression pattern of the 7 genes was consistent with their coefficients (Fig. 2C). ROC curves for three-year events were plotted for age, sex and risk score (Fig. 2D), with areas under the curve (AUCs) of 0.608, 0.500 and 0.615, respectively. These results suggest that the risk score staging system predicted the survival of patients with BLCA better than age or sex.
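The median-cutoff stratification and a two-group survival comparison can be sketched as follows. The text does not name the test behind the reported P-values; the log-rank test used here (the test R's `survival::survdiff` performs by default) is an assumption, as is assigning patients whose score equals the median to the low-risk group. Function names are hypothetical.

```python
import math
from statistics import median


def split_by_median(scores):
    """Label each patient 'high' if the risk score exceeds the median, else 'low'."""
    cut = median(scores)
    return ["high" if s > cut else "low" for s in scores]


def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square, P) with 1 degree of freedom."""
    o_minus_e, var = 0.0, 0.0
    for t in sorted({ti for ti, e in zip(time, event) if e}):  # distinct event times
        n = sum(1 for ti in time if ti >= t)                                   # at risk
        n1 = sum(1 for ti, g in zip(time, group) if ti >= t and g == "high")  # at risk, high
        d = sum(1 for ti, e in zip(time, event) if ti == t and e)             # events
        d1 = sum(1 for ti, e, g in zip(time, event, group)
                 if ti == t and e and g == "high")                            # events, high
        o_minus_e += d1 - d * n1 / n  # observed minus expected in the high-risk group
        if n > 1:
            # hypergeometric variance of d1 given the risk set
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0  # no informative event times
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1)
```

A positive observed-minus-expected total means the high-risk group had more deaths than expected under equal hazards, which is the direction reported for Fig. 2A.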

Figure 2.

Risk score predicts survival in the training dataset. (A) Overall survival and (B) recurrence-free survival rates were significantly higher in the low-risk score group than in the high-risk score group. (C) Overall survival outcomes of the patients. (D) Sensitivity, specificity and associated AUC of the 7-gene model for the training dataset, compared with age and sex. AUC, area under curve.

Validation of performance of risk score in test datasets

Because the model may have overfit the training dataset, its robustness was tested in three independent test datasets (GSE31684, GSE48075 and E-MTAB-4321): with the coefficient of each gene locked, risk scores were calculated for all patients, and the median risk score of each dataset was used as that dataset's cutoff. Consistent with the training dataset, the OS rate of the high-risk group was significantly lower than that of the low-risk group in both the GSE31684 and GSE48075 datasets (P=0.050 and P=0.006, respectively; Fig. 3A and B). The progression-free survival curve for E-MTAB-4321 resembled the RFS curve for the training dataset (P=0.0078; Fig. 3C), and the expression patterns of the 7 genes in the GSE31684, GSE48075 and E-MTAB-4321 datasets were also similar to those in the training dataset. These results indicate that the risk score staging system is robust across datasets.
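The validation step differs from training only in that the coefficients are fixed and the cutoff is recomputed per cohort. The sketch below assumes that Z-scores are recomputed within each validation cohort (as stated under Data processing), that the sample standard deviation is used, and that ties at the median go to the low-risk group; `stratify_cohort` and the toy genes `G1`/`G2` are hypothetical.

```python
from statistics import mean, median, stdev


def stratify_cohort(expr, coefficients):
    """Score one validation cohort with locked training coefficients.

    expr:         gene -> normalized expression values, one per patient
    coefficients: gene -> Cox coefficient fixed from the training set
    Returns (scores, labels); labels use this cohort's own median as cutoff.
    """
    n = len(expr[next(iter(coefficients))])
    scores = [0.0] * n
    for gene, beta in coefficients.items():
        values = expr[gene]  # KeyError if a model gene is absent on this platform
        mu, sd = mean(values), stdev(values)
        for k, v in enumerate(values):
            scores[k] += beta * (v - mu) / sd  # Z-score computed within this cohort
    cut = median(scores)  # cohort-specific cutoff, not the training median
    labels = ["high" if s > cut else "low" for s in scores]
    return scores, labels


# Hypothetical toy cohort with two genes and locked coefficients
scores, labels = stratify_cohort(
    {"G1": [1.0, 2.0, 3.0, 4.0], "G2": [4.0, 3.0, 2.0, 1.0]},
    {"G1": 1.0, "G2": 0.5},
)
```

Recomputing the median within each cohort keeps the two groups balanced in size even when platform differences shift the overall level of the scores.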

Figure 3.

Validation of risk score in three independent datasets. The overall survival stratified by the high and low-risk score groups was plotted for the (A) GSE31684 and (B) GSE48075 datasets. (C) Progression-free survival stratified by high and low-risk score groups for the E-MTAB-4321 dataset. Detailed risk scores, survival information and heat maps of gene expression are also included for each dataset.

Association between risk score and clinical information

The association between clinical information and the risk score was analyzed. The risk score was not significantly associated with age, sex or lymph invasion, but was significantly associated with pathological stage (Fig. 4A). A nomogram for three-year survival incorporating the risk score, pathological stage, age, sex and lymph invasion status was then plotted (Fig. 4B). In the nomogram, the risk score spanned the widest range of points (0–100), indicating that it carried the greatest weight among the variables and supporting the relative accuracy of the risk score staging system.
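The observation that the risk score ranged from 0 to 100 points follows from how nomogram points are usually scaled: each predictor contributes β × value to the linear predictor, and the predictor whose contribution spans the widest range is mapped onto 0–100 points, with every other predictor scaled proportionally (the default behaviour of, for example, the R `rms` package's `nomogram` function). The text does not state which software drew Fig. 4B, so this is an assumed convention; `nomogram_scale` is a hypothetical name, and the coefficients and ranges in the usage lines are made-up illustrative values, not the study's estimates.

```python
def nomogram_scale(coef, ranges, max_points=100.0):
    """Build a points function for a linear-predictor nomogram.

    coef:   predictor -> model coefficient (beta)
    ranges: predictor -> (low, high) values shown on the nomogram axis;
            categorical predictors are assumed to be 0/1 dummy codes
    """
    # width of each predictor's contribution beta * x over its axis range
    spans = {k: abs(coef[k] * (hi - lo)) for k, (lo, hi) in ranges.items()}
    widest = max(spans.values())

    def points(name, value):
        lo, hi = ranges[name]
        beta = coef[name]
        base = min(beta * lo, beta * hi)  # lowest contribution on this axis scores 0
        return max_points * (beta * value - base) / widest

    return points, spans


# Hypothetical illustrative inputs (not estimates from the study)
points, spans = nomogram_scale(
    coef={"risk_score": 1.0, "age": 0.02},
    ranges={"risk_score": (-1.0, 1.0), "age": (40, 90)},
)
```

Under this convention only the widest-ranging predictor reaches the full 0–100 axis, which is why that axis length is read as the predictor's relative weight in the model.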

Figure 4.

The association between clinical information and the risk score. (A) Box plots illustrating the association of age, sex, lymph invasion status and pathological stage with the risk score. (B) A nomogram comparing clinical parameters with the risk score. pStage, pathological stage.

Discussion

The prognostic value of clinical information, including tumor-node-metastasis staging and age, is currently unreliable for BLCA (13–15). Therefore, an effective molecular prognostic biomarker is required to guide the therapy and follow-up of patients with BLCA. Various single molecular markers have been proposed for prognosis (16–19), but their performance across datasets has been unsatisfactory. In contrast, combinations of multiple genes have been highlighted as prognostic tools with greater potential (11,20–23). In the present study, a model based on gene expression and multivariate Cox regression analysis performed well in predicting prognosis across 1,049 samples from four independent datasets. The risk score calculated with this model may therefore be suitable for determining the prognosis of patients with BLCA. Of the 7 genes in the model, BCL2L14 has previously been associated with carcinogenesis (24), and a single-nucleotide polymorphism in this gene has been associated with lung cancer (25). The role of AHNAK is controversial and differs between types of cancer (26): AHNAK has been reported to be downregulated in melanoma, where its low expression is associated with reduced survival time (27), whereas high AHNAK expression has been associated with cell migration and invasion in mesothelioma (28). To the best of our knowledge, the remaining genes (ZNF230, TMEM109, APOL2, AGER and AOC2) had not been associated with prognosis prior to the present study. Clinical application of the risk score is feasible because quantifying gene expression in cancer tissue is time-efficient and the model can be applied to data from various platforms. However, the present study has certain limitations.
The study is retrospective; thus, important clinical information, including BLCA subtype and muscle invasiveness, was not included, and other survival endpoints, including progression-free, recurrence-free and metastasis-free survival, were not directly predicted by the model. In summary, the risk score model constructed in this study is robust and performed effectively in predicting the survival of patients with BLCA. The model has the potential to be developed into a prognostic tool for BLCA.

References (28 in total)

1. affy--analysis of Affymetrix GeneChip data at the probe level.

Authors: Laurent Gautier; Leslie Cope; Benjamin M Bolstad; Rafael A Irizarry
Journal: Bioinformatics Date: 2004-02-12 Impact factor: 6.937

2. Validation of a protein panel for the noninvasive detection of recurrent non-muscle invasive bladder cancer.

Authors: Selma Gogalic; Ursula Sauer; Sara Doppler; Andreas Heinzel; Paul Perco; Arno Lukas; Guy Simpson; Hardev Pandha; Andras Horvath; Claudia Preininger
Journal: Biomarkers Date: 2017-01-19 Impact factor: 2.658

3. Different expression patterns of histone H3K27 demethylases in renal cell carcinoma and bladder cancer.

Authors: Zehui Hong; Hui Li; Lili Li; Weilong Wang; Ting Xu
Journal: Cancer Biomark Date: 2017 Impact factor: 4.388

4. Systemic, perioperative management of muscle-invasive bladder cancer and future horizons. (Review)

Authors: Samuel A Funt; Jonathan E Rosenberg
Journal: Nat Rev Clin Oncol Date: 2016-11-22 Impact factor: 66.675

5. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors: Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal: Proc Natl Acad Sci U S A Date: 2005-09-30 Impact factor: 11.205

6. Multistage analysis of variants in the inflammation pathway and lung cancer risk in smokers.

Authors: Margaret R Spitz; Ivan P Gorlov; Qiong Dong; Xifeng Wu; Wei Chen; David W Chang; Carol J Etzel; Neil E Caporaso; Yang Zhao; David C Christiani; Paul Brennan; Demetrius Albanes; Jianxin Shi; Michael Thun; Maria Teresa Landi; Christopher I Amos
Journal: Cancer Epidemiol Biomarkers Prev Date: 2012-05-09 Impact factor: 4.254

7. Pathogenic and Diagnostic Potential of BLCA-1 and BLCA-4 Nuclear Proteins in Urothelial Cell Carcinoma of Human Bladder.

Authors: Matteo Santoni; Francesco Catanzariti; Daniele Minardi; Luciano Burattini; Massimo Nabissi; Giovanni Muzzonigro; Stefano Cascinu; Giorgio Santoni
Journal: Adv Urol Date: 2012-07-02

8. Circulating Biomarkers in Bladder Cancer. (Review)

Authors: Lakshminarayanan Nandagopal; Guru Sonpavde
Journal: Bladder Cancer Date: 2016-10-27

9. AHNAK is downregulated in melanoma, predicts poor outcome, and may be required for the expression of functional cadherin-1.

Authors: Hilary M Sheppard; Vaughan Feisst; Jennifer Chen; Cris Print; P Rod Dunbar
Journal: Melanoma Res Date: 2016-04 Impact factor: 3.599

10. The transcription levels and prognostic values of seven proteasome alpha subunits in human cancers.

Authors: Yunhai Li; Jing Huang; Jiazheng Sun; Shili Xiang; Dejuan Yang; Xuedong Ying; Mengqi Lu; Hongzhong Li; Guosheng Ren
Journal: Oncotarget Date: 2017-01-17
