Identification of genes that predict the biochemical recurrence of prostate cancer.

Abstract

Prostate cancer (PCa) is one of the most prevalent cancer types in men. Biochemical recurrence continues to occur in a large proportion of patients after radical prostatectomy. Thus, prognostic biomarkers are required to determine which treatment is suitable. In the present study, RNA-sequencing gene expression data from The Cancer Genome Atlas was used in order to develop a risk-score staging system based on the expression of eight genes. Cox multivariate regression was used to predict the outcome of patients with PCa. The biochemical recurrence-free survival of patients with low-risk scores was significantly longer compared with patients with high-risk scores (P=5×10−7). This result was further validated using another dataset, GSE70769, from the National Center for Biotechnology Information. The prognostic values of other clinical information and risk scores were evaluated for 5-year biochemical recurrence. The risk score achieved an area under the curve value of 0.819 for predicting the 5-year biochemical recurrence of patients with PCa. The risk score was identified to be significantly associated with primary tumor stage (P<0.01), Gleason score (P<0.01), and lymph node invasion (P<0.05), but was independent of age. Cox multivariate regression revealed that the risk score was an indicator for prediction of biochemical recurrence. Thus, the risk score is a valuable and robust indicator for predicting the biochemical recurrence of patients with PCa.

Keywords: biochemical recurrence; gene expression; model; prognosis; prostate cancer

Year: 2018 PMID: 30127947 PMCID: PMC6096182 DOI: 10.3892/ol.2018.9106

Source DB: PubMed Journal: Oncol Lett ISSN: 1792-1074 Impact factor: 2.967

Introduction

Prostate cancer (PCa) is one of the most prevalent cancer types in men; in 2015, there were 60,300 newly diagnosed cases of PCa in China, resulting in 26,000 mortalities (1). Disease recurrence has been reported in a large proportion of patients following radical prostatectomy (2), and castration-resistant disease typically develops as a result (3,4). Although prognostic and clinical indicators have been implemented, their prognostic effect is not fully understood (5). Thus, clinical biomarkers for PCa biochemical recurrence are required. Huang et al (6) used long non-coding RNAs to develop a prediction model for biochemical recurrence; however, the analysis lacked validation datasets. Over the previous decade, single biomarkers have been identified for the prognosis of PCa (7–9); however, the utilization of these biomarkers requires further investigation owing to the heterogeneity of PCa (10). Multiple gene-based studies of prognostic biomarkers are currently prevalent owing to their robustness in multiple different cancer types (11–17). In the present study, survival-associated genes were identified by associating gene expression and survival information from The Cancer Genome Atlas (TCGA). Using a random forest-based variable hunting approach, eight genes were selected and a risk score staging system was developed. Patients with high-risk scores had significantly poorer survival rates compared with patients with low-risk scores. This result was further validated using an independent dataset from the National Center for Biotechnology Information (NCBI), GSE70769 (18). Analysis of clinicopathological factors revealed that the risk score was independent of age but was significantly associated with Tumor Node Metastasis (TNM) stage (19), lymph node invasion and Gleason score. Cox multivariate regression and the area under the receiver operating characteristic (ROC) curve for 5-year biochemical recurrence revealed that the risk score was an important indicator for prediction of biochemical recurrence.

Materials and methods

Data pre-processing

Raw microarray data of the NCBI dataset GSE70769 was downloaded from Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo) (20). Subsequent to background correction and normalization using the Robust Multi-Array Averaging (RMA) approach (21), the data was used for further analysis. The probe names were annotated according to the manufacturer's annotation file. For genes matching multiple probes, the average values were calculated and used as the expression values for the corresponding genes. TCGA gene expression (https://cancergenome.nih.gov/) data was downloaded from University of California Santa Cruz Xena and converted to fragments per kilobase of transcript sequence per million base pairs sequenced (FPKM) values. The log2-transformed RNA-Seq by Expectation-Maximization (RSEM) values were retained for model development.
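The probe-to-gene averaging step described above can be sketched in a few lines. This is only an illustration in Python, not the authors' R code; the probe identifiers, the expression values and the helper name `collapse_probes` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def collapse_probes(probe_values, probe_to_gene):
    """Average the expression of all probes that map to the same gene.

    probe_values: dict of probe ID -> list of per-sample expression values
    probe_to_gene: dict of probe ID -> gene symbol (unannotated probes are skipped)
    """
    grouped = defaultdict(list)
    for probe_id, values in probe_values.items():
        gene = probe_to_gene.get(probe_id)
        if gene is not None:
            grouped[gene].append(values)
    # zip(*rows) walks sample by sample across all probes of one gene
    return {gene: [mean(col) for col in zip(*rows)] for gene, rows in grouped.items()}

# Hypothetical probe IDs and normalized values, for illustration only
probe_values = {
    "probe_a": [8.0, 9.0, 7.0],
    "probe_b": [10.0, 11.0, 9.0],
    "probe_c": [5.0, 5.5, 6.0],
}
probe_to_gene = {"probe_a": "ASRGL1", "probe_b": "ASRGL1", "probe_c": "HMGCS2"}
gene_values = collapse_probes(probe_values, probe_to_gene)
```

Genes measured by a single probe pass through unchanged, so the same function can be applied to the whole annotated matrix.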

Prediction gene selection, Cox multivariate regression model and validation

Cox univariate regression was performed on TCGA dataset. Genes with relative expression levels associated with biochemical recurrence-free survival (BFS) were retained for a further random forest-based variable hunting approach, performed as previously described (22,23). Following 100 repeats and 100 iterations, genes from the top of the list were selected for further analysis. Finally, eight genes were identified as the most frequently present in the repeats and iterations, thus these eight genes were selected for model development. Next, multivariate Cox regression was performed using the aforementioned genes to construct a linear risk-score model. In the validation datasets, coefficients were locked and the risk score for each sample was calculated. The risk score was calculated using the following formula: $\text{Risk score} = \sum_{i=1}^{n} \beta_i x_i$, where $\beta_i$ indicates the coefficient estimated for gene $i$, $x_i$ refers to the relative expression level of that gene and $n$ is the number of genes in the model (here, eight). For the training dataset, the samples were divided into low- and high-risk groups according to the median risk score using R software (v3.0.1; http://cran.r-project.org/doc/FAQ/R-FAQ.html) and packages (24,25).
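As a minimal sketch of the linear risk score and the median split, the following Python example computes the weighted sum of gene expression values per sample and labels each sample relative to the cohort median. The coefficients and expression values are hypothetical (they are not the values shown in Fig. 1B), only three of the eight genes are used for brevity, and assigning samples at the median to the low-risk group is an assumption.

```python
from statistics import median

def risk_score(coefficients, expression):
    """Linear risk score: sum of beta_i * x_i over the genes in the model."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())

def split_by_median(scores):
    """Label a sample 'high' if its score is above the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {sample: ("high" if s > cutoff else "low") for sample, s in scores.items()}

# Hypothetical coefficients for three of the eight genes (not the values in Fig. 1B)
coefficients = {"ASRGL1": 0.5, "HMGCS2": -0.25, "GNPNAT1": -0.75}

# Hypothetical relative expression levels per sample
samples = {
    "s1": {"ASRGL1": 4.0, "HMGCS2": 2.0, "GNPNAT1": 1.0},
    "s2": {"ASRGL1": 1.0, "HMGCS2": 4.0, "GNPNAT1": 2.0},
    "s3": {"ASRGL1": 2.0, "HMGCS2": 2.0, "GNPNAT1": 2.0},
    "s4": {"ASRGL1": 6.0, "HMGCS2": 0.0, "GNPNAT1": 0.0},
}
scores = {name: risk_score(coefficients, expr) for name, expr in samples.items()}
groups = split_by_median(scores)
```

Locking the coefficients for validation corresponds to reusing the same `coefficients` dictionary on a new cohort and recomputing only the scores and the median cut-off.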

Statistical analysis

Background correction and RMA normalization of raw Affymetrix CEL data were performed using the ‘rma’ function in the ‘affy’ package (v1.56.0) (26). The survival difference between the high-risk and low-risk groups, univariate regression in the training dataset, multivariate Cox proportional hazard model development and multivariate regression with risk score and other clinical indicators were performed using the R package ‘survival’ (v1.4–8). The ROCs were drawn and the area under curve (AUC) calculation was performed using the R package, ‘pROC’ (v1.11.0) (27). All statistical analysis was performed using R software and packages. P<0.05 was considered to indicate a statistically significant difference.
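The source does not name the test used for the survival difference between the two risk groups. A common choice for this kind of comparison is the two-group log-rank test, and the Python sketch below assumes that choice; it is not the R ‘survival’ implementation. The follow-up times and event indicators are hypothetical, and the sketch assumes at least one event time at which both groups are still at risk (otherwise the variance is zero).

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value), 1 degree of freedom.

    times_*: follow-up times; events_*: 1 if recurrence was observed, 0 if censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [row for row in data if row[0] >= t]
        n = len(at_risk)                                  # all samples still at risk
        n_a = sum(1 for row in at_risk if row[2] == 0)    # at risk in group A
        d = sum(1 for row in at_risk if row[0] == t and row[1] == 1)  # events at t
        d_a = sum(1 for row in at_risk
                  if row[0] == t and row[1] == 1 and row[2] == 0)     # events at t in A
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Hypothetical follow-up data (months), for illustration only
high_times, high_events = [2, 3, 4, 5], [1, 1, 1, 1]
low_times, low_events = [6, 8, 9, 10], [1, 0, 0, 0]
chi2, p_value = logrank_test(high_times, high_events, low_times, low_events)
```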

Results

Identification of survival-associated genes

Univariate Cox regression was performed on TCGA dataset, following filtering of the non-primary PCa tissues, by associating BFS and gene expression. Detailed information on the samples enrolled in TCGA dataset is presented in Table I. Genes significantly associated with BFS (P<0.01) were retained for further analysis. As the identified gene panel was relatively large, a random forest-based variable hunting approach was implemented to retrieve the best combination of biomarkers. Eight genes were selected for further model development (Fig. 1A; Table II). The coefficients are presented in Fig. 1B. Positive coefficients suggest that the corresponding genes act as oncogenes, whereas negative coefficients suggest tumor suppressor genes.

Table I.

The Cancer Genome Atlas sample information.

Variables	Samples, n
Age, years
<60	138
>60	170
Tumor stage
T2	131
T3-T4	177
Gleason score
1	21
2	115
3	72
4	38
5	62

T, tumor.

Figure 1.

Genes selected for model development. (A) Frequency of selected genes in random forest variable hunting. (B) Coefficients of genes in the risk score model. CHST1, carbohydrate sulfotransferase 1; ACOX1, acyl-CoA oxidase 1; CTBS, chitobiase; GNPNAT1, glucosamine-phosphate N-acetyltransferase 1; NAGLU, N-acetyl-α-glucosaminidase; LPIN3, lipin 3; ASRGL1, asparaginase like 1; HMGCS2, 3-hydroxy-3-methylglutaryl-CoA synthase 2.

Table II.

Univariate and multivariate Cox regression analysis of candidate genes.

	Cox univariate regression			Cox multivariate regression

Variables	HR	95% CI	P-value	HR	95% CI	P-value
CHST1	1.800	1.300–2.600	0.001	1.380	0.940–2.020	0.100
ACOX1	0.300	0.130–0.680	0.004	1.740	0.630–4.850	0.286
CTBS	0.400	0.250–0.640	<0.001	0.700	0.370–1.320	0.270
GNPNAT1	0.390	0.220–0.710	0.002	0.460	0.220–0.990	0.047
NAGLU	0.550	0.360–0.840	0.006	0.670	0.440–1.010	0.058
LPIN3	2.000	1.300–3.000	0.001	1.220	0.760–1.950	0.419
ASRGL1	1.600	1.200–2.200	0.002	1.810	1.300–2.520	<0.001
HMGCS2	0.740	0.640–0.860	<0.001	0.760	0.630–0.900	0.002

HR, hazard ratio; CI, confidence interval; CHST1, carbohydrate sulfotransferase 1; ACOX1, acyl-CoA oxidase 1; CTBS, chitobiase; GNPNAT1, glucosamine-phosphate N-acetyltransferase 1; NAGLU, N-acetyl-α-glucosaminidase; LPIN3, lipin 3; ASRGL1, asparaginase like 1; HMGCS2, 3-hydroxy-3-methylglutaryl-CoA synthase 2.
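In a Cox model the hazard ratio and the regression coefficient are related by HR = exp(β), so a coefficient can be recovered as β = ln(HR). The Python sketch below applies this to the multivariate hazard ratios of Table II. It is an illustration only: the coefficients actually used in the model are those of Fig. 1B, which are not reproduced in the text and may differ from the values derived here.

```python
import math

# Multivariate hazard ratios as listed in Table II
multivariate_hr = {
    "CHST1": 1.380, "ACOX1": 1.740, "CTBS": 0.700, "GNPNAT1": 0.460,
    "NAGLU": 0.670, "LPIN3": 1.220, "ASRGL1": 1.810, "HMGCS2": 0.760,
}

# beta = ln(HR): HR > 1 gives a positive coefficient, HR < 1 a negative one
log_hr = {gene: math.log(hr) for gene, hr in multivariate_hr.items()}
risk_increasing = sorted(g for g, b in log_hr.items() if b > 0)
risk_decreasing = sorted(g for g, b in log_hr.items() if b < 0)
```

Note that ACOX1 changes direction between the univariate (HR 0.300) and multivariate (HR 1.740) columns of Table II, so the sign obtained this way depends on which column is used.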

Performance of risk score in the training dataset

To assess the prognostic value of the risk score model, BFS was compared between the high- and low-risk groups (using the median risk score as the cut-off). The BFS of the high-risk-score group was significantly shorter compared with the low-risk-score group (P=5×10−7; Fig. 2A). As presented in Fig. 2A, samples with early biochemical recurrence were characterized by a high expression of asparaginase like 1 (ASRGL1), lipin 3 and carbohydrate sulfotransferase 1. However, patients without biochemical recurrence presented with a high expression of glucosamine-phosphate N-acetyltransferase 1 (GNPNAT1), chitobiase, acyl-CoA oxidase 1 (ACOX1), 3-hydroxy-3-methylglutaryl-CoA synthase 2 (HMGCS2) and N-acetyl-α-glucosaminidase (NAGLU), which was consistent with the coefficients (Figs. 1B and 2B). Disease-free survival time was additionally compared between the high- and low-risk groups and the result was similar to the BFS pattern, as the survival of the high-risk group was notably lower compared with that of the low-risk group (Fig. 2C). The 5-year BFS ROC was used to compare the prognostic value of the risk score and other clinicopathological observations (Fig. 2D). The AUCs of age, Gleason index, primary tumor stage, lymph node invasion and risk score were 0.597, 0.647, 0.628, 0.578 and 0.819, respectively. Specifically, patients with the highest risk scores appeared to have a very high mortality risk. These results indicate that the risk score is better at predicting BFS than the other clinical observations.
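The AUC values above can be read as the probability that a randomly chosen patient with recurrence within 5 years has a higher score than a randomly chosen patient without. The Python sketch below computes the AUC that way, by pairwise comparison (equivalent to the Mann-Whitney U statistic). It is an illustration rather than the ‘pROC’ implementation; the scores and labels are hypothetical, and patients censored before 5 years are assumed to have been excluded beforehand.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison; ties count as half a win."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one sample in each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical risk scores and 5-year labels (1 = recurrence within 5 years)
example_scores = [2.1, 1.4, 0.3, -0.5, 0.9, -1.2]
example_labels = [1, 1, 1, 0, 0, 0]
example_auc = auc(example_scores, example_labels)
```

The same function can be applied to any single clinical variable (for example age) in place of the risk score, which is how the predictors listed above can be compared on a common scale.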

Figure 2.

Risk score for prognosis in the training dataset. (A) Biochemical recurrence-free survival rate of high- and low-risk groups. (B) Heat maps of gene expression for each dataset. Blue/red dots in the first panel refer to the low- and high-risk groups, respectively. (C) Disease-free survival rates of high- and low-risk groups. (D) The 5-year survival receiver operating characteristic curve of risk score and other clinical observations and their AUC. *P<0.001, risk score AUC vs. other clinical observations. AUC, area under the curve; T stage, tumor stage; CHST1, carbohydrate sulfotransferase 1; ACOX1, acyl-CoA oxidase 1; CTBS, chitobiase; GNPNAT1, glucosamine-phosphate N-acetyltransferase 1; NAGLU, N-acetyl-α-glucosaminidase; LPIN3, lipin 3; ASRGL1, asparaginase like 1; HMGCS2, 3-hydroxy-3-methylglutaryl-CoA synthase 2.

Validation of risk score performance

The high performance of the risk score may have resulted from overfitting to the training dataset. To test for overfitting and evaluate the robustness of the model, the coefficients were locked and the risk scores were calculated for an independent NCBI dataset (GSE70769). The samples from the independent dataset were likewise divided into high- and low-risk groups, as with the training dataset. The results were similar to the BFS profile of the training dataset. The BFS of patients in the high-risk-score group was significantly shorter than that of the low-risk-score group (P=0.04; Fig. 3A) and was associated with early biochemical recurrence (Fig. 3B). The expression profile was additionally similar to that of the training dataset (Fig. 3C). These results indicate that the risk score is a robust indicator for PCa prognosis.

Figure 3.

Prognostic value of the risk score on survival in a validation dataset. (A) Biochemical recurrence-free survival rate of the high- and low-risk groups in the GSE70769 dataset. (B) Detailed biochemical recurrence survival information. (C) Candidate gene expression. CHST1, carbohydrate sulfotransferase 1; ACOX1, acyl-CoA oxidase 1; CTBS, chitobiase; GNPNAT1, glucosamine-phosphate N-acetyltransferase 1; NAGLU, N-acetyl-α-glucosaminidase; LPIN3, lipin 3; ASRGL1, asparaginase like 1; HMGCS2, 3-hydroxy-3-methylglutaryl-CoA synthase 2.

Association between risk score and other clinical information

Analyses of the association between risk score and clinical information were performed. The results indicated that the risk score was significantly associated with primary tumor stage (P<0.05), Gleason score (P<0.01) and lymph node invasion (P<0.01), but not with age (Fig. 4A). Cox multivariate regression was performed using the risk score and the aforementioned clinical observations. The risk score was the only prognostic indicator identified to be significantly associated with biochemical recurrence (P=3×10−5; Fig. 4B). In summary, these results indicate that risk score is an important clinical indicator of PCa prognosis.

Figure 4.

Clinical information and risk score. (A) The association between risk score and clinicopathological information was evaluated and presented as a box plot. (B) Cox multivariate regression was performed using the risk score and other clinical information. The red dots indicate the hazard ratio, and the red line represents 95% confidence intervals. T, tumor.

Discussion

Despite the low rate of progression, biochemical recurrence and metastasis continue to be observed in a large proportion of patients with PCa (28). Thus, prognostic biomarkers are urgently required. Over the previous decade, single biomarkers have been reported to predict the survival of patients with PCa (3,9,29). However, the single-biomarker approach to cancer prognosis assessment is less robust compared with the more widely reported multiple-biomarker-based models (30–32). Using machine learning and gene expression, the present study developed a Cox multivariate regression-based risk score model. The model was then further evaluated for performance and robustness. The risk score staging system performed well in predicting survival in two datasets from different platforms (RNA-sequencing and microarray). Among the candidate genes selected, serum NAGLU has been reported to be associated with the clinical indicators and survival of gastrointestinal adenocarcinoma (33); and the expression of another gene, GNPNAT1, has been demonstrated to be associated with the progression of castration-resistant PCa (34) via the phosphatidylinositol 3-kinase/protein kinase B signaling pathway. Proteomics analysis revealed that HMGCS2 expression is altered in PCa, and that the expression of this gene is associated with the survival of squamous cell carcinoma following surgery (35,36). It has additionally been revealed to affect the extracellular signal-regulated kinase/c-Jun signaling pathway in hepatocellular carcinoma (37). In addition, ACOX1 has been reported to be associated with migration and metastasis in the xenografts of colorectal carcinoma (38), and associated with the mitogen-activated protein kinase signaling pathway in hepatocellular carcinoma (39). A similar function was detected for ASRGL1 in endometrial carcinoma (39), although the underlying mechanism remains unclear.
Collectively, these results indicate that the candidate genes used in the model are reliable, thus reinforcing the robustness of the model. In a previous study, Huang et al (6) used gene expression to predict biochemical recurrence using TCGA expression data; however, that study lacked a validation dataset. The present study was novel as it developed a robust prediction model for PCa that was validated using another platform. Indeed, the RNA-sequencing data were presented as log2-transformed FPKM values, whereas the microarray data were presented as log2-transformed intensity values. The risk score was calculated using the relative gene expression level, regardless of its unit. This may explain why the model is functional across different platforms. However, the present study has limitations. Firstly, it is a retrospective study, and long-term follow-up and detailed clinical information are unavailable. Thus, bias may have resulted. Secondly, although the robustness of the risk score was validated using another dataset, the clinical utilization of the risk score requires further studies in order to fully confirm its efficiency. The present findings may provide novel insights for predicting the biochemical recurrence of patients with PCa.

39 in total

1. Gene expression profiling-derived immunohistochemistry signature with high prognostic value in colorectal carcinoma.

Authors: Wenjun Chang; Xianhua Gao; Yifang Han; Yan Du; Qizhi Liu; Lei Wang; Xiaojie Tan; Qi Zhang; Yan Liu; Yan Zhu; Yongwei Yu; Xinjuan Fan; Hongwei Zhang; Weiping Zhou; Jianping Wang; Chuangang Fu; Guangwen Cao
Journal: Gut Date: 2013-10-30 Impact factor: 23.059

2. A potential panel of four-long noncoding RNA signature in prostate cancer predicts biochemical recurrence-free survival and disease-free survival.

Authors: Tian-Bao Huang; Chuan-Peng Dong; Guang-Chen Zhou; Sheng-Ming Lu; Yang Luan; Xiao Gu; Lei Liu; Xue-Fei Ding
Journal: Int Urol Nephrol Date: 2017-02-10 Impact factor: 2.370

3. Random survival forests for competing risks.

Authors: Hemant Ishwaran; Thomas A Gerds; Udaya B Kogalur; Richard D Moore; Stephen J Gange; Bryan M Lau
Journal: Biostatistics Date: 2014-04-11 Impact factor: 5.899

4. Loss of ASRGL1 expression is an independent biomarker for disease-specific survival in endometrioid endometrial carcinoma.

Authors: Per-Henrik D Edqvist; Jutta Huvila; Björn Forsström; Lauri Talve; Olli Carpén; Helga B Salvesen; Camilla Krakstad; Seija Grénman; Henrik Johannesson; Oscar Ljungqvist; Mathias Uhlén; Fredrik Pontén; Annika Auranen
Journal: Gynecol Oncol Date: 2015-04-07 Impact factor: 5.482

5. Consistency of Random Survival Forests.

Authors: Hemant Ishwaran; Udaya B Kogalur
Journal: Stat Probab Lett Date: 2010-07-01 Impact factor: 0.870

6. SIRT1 suppresses colorectal cancer metastasis by transcriptional repression of miR-15b-5p.

Authors: Li-Na Sun; Zheng Zhi; Liang-Yan Chen; Qun Zhou; Xiu-Ming Li; Wen-Juan Gan; Shu Chen; Meng Yang; Yao Liu; Tong Shen; Yong Xu; Jian-Ming Li
Journal: Cancer Lett Date: 2017-09-18 Impact factor: 8.679

Review 7. New circulating biomarkers for prostate cancer.

Authors: K Bensalah; Y Lotan; J A Karam; S F Shariat
Journal: Prostate Cancer Prostatic Dis Date: 2007-11-13 Impact factor: 5.554

8. Expression of RABEX-5 and its clinical significance in prostate cancer.

Authors: Hongtuan Zhang; Shang Cheng; Andi Wang; Hui Ma; Bing Yao; Can Qi; Ranlu Liu; Shiyong Qi; Yong Xu
Journal: J Exp Clin Cancer Res Date: 2014-04-09

9. GATA2 as a potential metastasis-driving gene in prostate cancer.

Authors: Yan Ting Chiang; Kendric Wang; Ladan Fazli; Robert Z Qi; Martin E Gleave; Colin C Collins; Peter W Gout; Yuzhuo Wang
Journal: Oncotarget Date: 2014-01-30

Review 10. Long Non-coding RNAs in Urologic Malignancies: Functional Roles and Clinical Translation.

Authors: Jiajia Chen; Zhijun Miao; Boxin Xue; Yuxi Shan; Guobin Weng; Bairong Shen
Journal: J Cancer Date: 2016-08-15 Impact factor: 4.207
