Dehua Zhang1, Muchun Zhang2, Qifu Zhang3, Zhiyi Zhao3, Yong Nie1. 1. Department of Urology, Yiling Hospital of Yichang, Yichang, Hubei, China (mainland). 2. Department of Urology, Bethune First Hospital of Jilin University, Changchun, Jilin, China (mainland). 3. Department of Urology, Jilin Provincial Cancer Hospital, Changchun, Jilin, China (mainland).
Abstract
BACKGROUND Prostate adenocarcinoma (PRAD) is the second most common malignancy in males and the fifth leading cause of cancer mortality. Cancer stem cells (CSCs) play an important role in the occurrence and development of PRAD, however, the prognostic biomarkers associated with CSC features have not been identified in PRAD. MATERIAL AND METHODS In order to identify the prognostic stemness-related genes (SRGs) of PRAD, the RNA sequencing data of patients with PRAD were retrieved from The Cancer Genome Atlas (TCGA) databases. The mRNA expression-based stemness index (mRNAsi) and the differential expressed genes (DEGs) were evaluated and identified. The associations between the mRNAsi and tumorigenesis, overall survival (OS), prostate-specific antigen (PSA) value, and Gleason score were also established by nonparametric test and Kaplan-Meier survival analysis. The SRGs were identified as the overlapped DEGs of PRAD-associated DEGs and the mRNAsi-associated DEGs. Based on the prognostic SRGs, the predict model was constructed. Its accuracy was tested by the area under the curve (AUC) of the receiver operator characteristic (ROC) curve and the risk score. RESULTS A total of 6005 PRAD-associated DEGs and 2462 mRNAsi-associated DEGs were identified. The mRNAsi was significantly upregulated in PRAD and associated with the PSA value and Gleason score. A total of 1631 SRGs were identified, with 36 prognostic SRGs screened by the univariate Cox analysis. Based on the prognostic SRGs, the predict model was constructed with the AUC of 0.986. Moreover, the risk score of the model was proved to be an independent prognostic factor, indicating its significant applicability. CONCLUSIONS Our data demonstrate the mRNAsi as a reliable index for the tumorigenesis, PSA value, and Gleason score of PRAD. Additionally, this study provides a well-applied model for predicting the OS for patients with PRAD based on prognostic SRGs.
BACKGROUND Prostate adenocarcinoma (PRAD) is the second most common malignancy in males and the fifth leading cause of cancermortality. Cancer stem cells (CSCs) play an important role in the occurrence and development of PRAD, however, the prognostic biomarkers associated with CSC features have not been identified in PRAD. MATERIAL AND METHODS In order to identify the prognostic stemness-related genes (SRGs) of PRAD, the RNA sequencing data of patients with PRAD were retrieved from The Cancer Genome Atlas (TCGA) databases. The mRNA expression-based stemness index (mRNAsi) and the differential expressed genes (DEGs) were evaluated and identified. The associations between the mRNAsi and tumorigenesis, overall survival (OS), prostate-specific antigen (PSA) value, and Gleason score were also established by nonparametric test and Kaplan-Meier survival analysis. The SRGs were identified as the overlapped DEGs of PRAD-associated DEGs and the mRNAsi-associated DEGs. Based on the prognostic SRGs, the predict model was constructed. Its accuracy was tested by the area under the curve (AUC) of the receiver operator characteristic (ROC) curve and the risk score. RESULTS A total of 6005 PRAD-associated DEGs and 2462 mRNAsi-associated DEGs were identified. The mRNAsi was significantly upregulated in PRAD and associated with the PSA value and Gleason score. A total of 1631 SRGs were identified, with 36 prognostic SRGs screened by the univariate Cox analysis. Based on the prognostic SRGs, the predict model was constructed with the AUC of 0.986. Moreover, the risk score of the model was proved to be an independent prognostic factor, indicating its significant applicability. CONCLUSIONS Our data demonstrate the mRNAsi as a reliable index for the tumorigenesis, PSA value, and Gleason score of PRAD. Additionally, this study provides a well-applied model for predicting the OS for patients with PRAD based on prognostic SRGs.
Prostate adenocarcinoma (PRAD) is the second most common malignancy in males and the fifth leading cause of cancermortality [1,2]. Generally, PRAD is clinically localized, and radical prostatectomy (RP) is the standard treatment option for these patients [3]. RP provides adequate local control and a favorable prognosis [4]. However, many cases experience tumor recurrence and metastasis, in especial bone metastasis [5]. As for patients with advanced PRAD, androgen deprivation therapy (ADT) is usually used, however, these patients also have poor clinical outcomes [6]. Thus, it is crucial to investigate PRAD occurrence and progression factors and to identify potential biomarkers for predicting the prognosis of patients with PRAD.Like many solid tumors, PRAD is heterogeneous, including many types of cancer cells [7]. Generally, the differentiation of tumor cells is initiated from a population of cancer stem cells (CSCs). CSCs exhibit similar phenotypic and functional properties of normal stem cells, such as self-renewal and reestablishing the heterogeneous tumor cell population [8]. Thus, CSCs are significantly associated with the occurrence, development, and treatment resistance of PRAD [9]. Identification of CSC-associated biomarkers may predict the tumorigenesis and prognosis of PRAD and provide CSC-based diagnostic and therapeutic strategies [10].Nowadays, CSC features have been explored by deep learning methods to assess the degree of oncogenic dedifferentiation [11]. The DNA methylation-based stemness index (mDNAsi) is used to reflect epigenetic features and the other mRNA expression-based stemness index (mRNAsi) is reflective of gene expression. Both stemness indices can be related to the activity of CSCs, tumor dedifferentiation, and pathological grade. However, the applicability of the mRNAsi in predicting the tumorigenesis and prognosis of PRAD has not been verified.In this study, RNA-seq data and clinical information of PRAD samples were collected from The Cancer Genome Atlas (TCGA) databases. Additionally, we identified the mRNAsi and the differential expressed genes (DEGs). The stemness-related genes (SRGs) were identified as the overlapped DEGs of PRAD-associated DEGs and the mRNAsi-associated DEGs. Based on the prognostic SRGs, the predict model was constructed. Thus, this study was the first one focused on the role of the mRNAsi in the PRAD, and to suggest that the mRNAsi could be a reliable prognostic index for PRAD patients which may assist oncologists in clinical diagnosis and treatment.
Material and Methods
Data acquisition
This study was approved by the Ethics Committee of the Yiling Hospital of Yichang (No. LW2020-04-1401). The gene expression profiling of 498 primary PRAD and 52 normal solid tissue available from the TCGA database () were downloaded in the formats of HTSeq-counts and fragments per kilobase per million (FPKM). The demographics, clinical data, and prognostic follow-up information of the PRAD patients were also acquired.
Estimation of the mRNAsi based on the gene expression profiling
The mRNAsi have been reported by Tathiane et al., using an algorithm named one-class logistic regression machine learning (OCLR) to define stem cell signatures of the bulk tissue (CSCs) [11]. In this study, the mRNAsi of PRAD and normal solid tissue were estimated by OCLR based on the normalized gene expression profile [log2 (FPKM+1)]. The associations between the mRNAsi and the tumorigenesis, overall survival (OS), prostate-specific antigen (PSA) value, and Gleason score were also established by nonparametric test and Kaplan-Meier survival analysis.
Identification of DEGs
EdgeR method was used to identify the DEGs between 498 primary PRAD and 52 normal solid tissue. The gene with log2 fold change (FC) >1.0 or <−1.0, and false discovery rate (FDR) value <0.05 was defined as DEG. Besides, the PRAD samples where the mRNAsi was greater than the median was defined as the high stemness tumor and vice versa (the low stemness tumor). Similarly, DEGs analysis between low and high stemness PRAD was also conducted. Furthermore, the Gene Oncology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional enrichment analysis were utilized to explore the biological processes and pathways that DEGs enriched.
Construction of multivariate Cox model
The SRGs were identified as the overlapped DEGs of PRAD-associated DEGs and the mRNAsi-associated DEGs. The prognostic SRGs selected by univariate Cox analysis were integrated into multivariate model. The accuracy of the model was tested by the area under the curve (AUC) of the receiver operator characteristic (ROC) curve. A protein-protein interaction (PPI) network were constructed based on prognostic genes in multivariate model by STRING database [12].
Independent prognosis analysis
For each PRAD sample, the risk score was calculated by the formula generated by the multivariate Cox model as followed:For each single PRAD sample, “n” represented the number of prognostic gene in the multivariate model; “x” represented the number of each patient; and “β” represented coefficient of each prognostic gene. Additionally, the patients were divided into high and low risk group by the median of risk score. Then, the Kaplan-Meier survival analysis was applied to predict the prognosis value of the risk group. Besides, the independent prognosis value for the risk score was evaluated by univariate and multivariate Cox regression analysis, corrected by demographics, Gleason score and new tumor event during the follow-up period.
Statistics analysis
As for descriptive statistics, mean±standard deviation was used for the continuous variables in normal distribution while the mean (range) was used for continuous variables in abnormal distribution. In addition, categorical variables were described by counts and percentages. The ROC curve illustrated the sensitivity and specificity of multivariate models to predict 5-year overall mortality. R software (Institute for Statistics and Mathematics, Vienna, Austria; www.r-project.org, version 3.6.1) were utilized for analysis. Two-sided P value <0.05 was defined as statistically significant for all analysis process.
Results
DEGs identification
The baseline clinical characteristics of all patients with PRAD are summarized in Table 1 and the flowchart of this study is shown in Figure 1. The essential characteristics of the 498 primary PRAD and 52 normal solid tissue with RNA-seq data are summarized in Table 2.
Table 1
Baseline information of patients diagnosed with PRAD.
Variables
Total patients (N=500)
Age, years
Mean±SD
61.01±6.82
Median (range)
61 (41–78)
Gender
Female
0 (0.00%)
Male
500 (100.00%)
T
T2a
13 (2.60%)
T2b
10 (2.00%)
T2c
165 (33.00%)
T3a
159 (31.80%)
T3b
136 (27.20%)
T4
10 (2.00%)
Unknown
7 (1.40%)
N
N0
348 (69.60%)
N1
79 (15.80%)
Unknown
73 (14.60%)
Primary diagnosis
Adenocarcinoma, NOS
487 (97.40%)
Adenocarcinoma with mixed subtypes
3 (0.60%)
Infiltrating duct carcinoma, NOS
9 (1.80%)
Mucinous adenocarcinoma
1 (0.20%)
Gleason score
6
45 (9.00%)
7
250 (50.00%)
8
64 (12.80%)
9
137 (27.40%)
10
4 (0.80%)
PSA ng/mL
Mean±SD
1.86±3.39
Median (range)
0.10 (0.01–323.00)
Unknown
84 (16.80%)
PRAD – prostate adenocarcinoma; SD – standard deviation; NOS – not otherwise specified; PSA – prostate-specific antigen.
Figure 1
The flowchart of all analysis process.
Table 2
Essential characteristics of the primary PRAD and normal solid tissue with RNA-seq data.
Variables
Total samples (N=550)
Age, years
Mean±SD
61.01±6.82
Median (range)
61 (41–78)
Gender
Female
0 (0.00%)
Male
550 (100.00%)
T
T2a
13 (2.36%)
T2b
10 (1.82%)
T2c
164 (29.82%)
T3a
158 (28.72%)
T3b
136 (24.73%)
T4
10 (1.82%)
Unknown
7 (1.27%)
Normal solid tissue
52 (9.46%)
N
N0
346 (62.91%)
N1
79 (14.36%)
Unknown
73 (13.27%)
Normal solid tissue
52 (9.46%)
Primary diagnosis
Adenocarcinoma, NOS
485 (88.18%)
Adenocarcinoma with mixed subtypes
3 (0.54%)
Infiltrating duct carcinoma, NOS
9 (1.64%)
Mucinous adenocarcinoma
1 (0.18%)
Normal solid tissue
52 (9.46%)
Gleason score
6
45 (8.18%)
7
248 (45.09%)
8
64 (11.64%)
9
137 (24.91%)
10
4 (0.72%)
Normal solid tissue
52 (9.46%)
PSA ng/mL
Mean±SD
1.86±3.39
Median (range)
0.10 (0.01–323.00)
Unknown
84 (15.27%)
Normal solid tissue
52 (9.46%)
Type
Primary PRAD
498 (90.55%)
Normal solid tissue
52 (9.46%)
PRAD – prostate adenocarcinoma; SD – standard deviation; NOS – not otherwise specified; PSA – prostate-specific antigen.
A total of 6005 genes including 2697 downregulated genes and 3308 upregulated genes were identified as DEGs between 498 primary PRAD and 52 normal solid tissue. The heatmap and volcano plot were presented in Figure 2A and 2B. The DEGs enriched in the GO term pointed to muscle system process, collagen-containing extracellular matrix, and passive transmembrane transporter activity (Figure 2C), while those enriched in KEGG term were associated with neuroactive ligand-receptor interaction (Figure 2D). Similarly, 2462 genes including 2029 downregulated genes and 433 upregulated genes were identified as DEGs between low and high mRNAsi PRAD. The heatmap and volcano plot are described in Figure 3A and 3B. The stemness-related DEGs enriched in GO (Figure 3C) and KEGG (Figure 3D) terms also included muscle system process, collagen-containing extracellular matrix, passive transmembrane transporter activity and neuroactive ligand-receptor interaction.
Figure 2
The results of differential expression genes analysis and functional enrichment analysis between 498 primary PRAD and 52 normal solid tissue. A total of 6005 genes including 2697 downregulated ones and 3308 upregulated ones were identified as DEGs between 498 primary PRAD and 52 normal solid tissue, illustrating in heatmap (A) and volcano plot (B). Most significant GO (C) and KEGG (D) terms for DEG were muscle system process, collagen–containing extracellular matrix, passive transmembrane transporter activity, and neuroactive ligand–receptor interaction, illustrating in bubble plot. PRAD – prostate adenocarcinoma; DEGs – differential expressed genes; GO – Gene Ontology; KEGG – Kyoto Encyclopedia of Genes and Genomes.
Figure 3
The results of differential expression genes analysis and functional enrichment analysis between low mRNAsi PRAD and high mRNAsi PRAD. A total of 2462 genes including 2029 downregulated ones and 433 upregulated ones were identified as DEGs between low mRNAsi PRAD and high mRNAsi PRAD, illustrating in heatmap (A) and volcano plot (B). GO (C) and KEGG (D) terms including muscle system process, collagen–containing extracellular matrix, passive transmembrane transporter activity and neuroactive ligand–receptor interaction were identified as significantly enriched items in stemness-related DEGs, illustrating in bubble plot. PRAD – prostate adenocarcinoma; mRNAsi – mRNA expression-based stemness index; DEGs – differential expressed genes; GO – Gene Ontology; KEGG – Kyoto Encyclopedia of Genes and Genomes.
Identification of SRGs with prognostic value
As shown in Venn plot, 1631 genes between the PRAD-associated DEGs and 2462 the mRNAsi-associated DEGs were identified as SRGs and included in the univariate Cox analysis (Figure 4A). A total of 36 SRGs with prognostic value were identified by the univariate Cox analysis (Figure 4B). Besides, the mRNAsi of PRAD tissues was significantly higher than that in the normal tissues (Figure 4C, P<0.001). Additionally, the mRNAsi was also significantly associated with the Gleason score (Figure 4D, P<0.001) and PSA value (Figure 4E, P<0.001). Although patients with high mRNAsihad a decreased OS compared to patients with low mRNAsi, the difference was not significant (Figure 4F, P=0.059).
Figure 4
Identification of SRGs with prognostic value. As shown in Venn plot, 1631 DEGs between both high/low stemness tumors and tumor/normal tissues were screened as SRGs and included in the univariate Cox analysis (A). The results of univariate Cox analysis of SRGs with prognostic value were illustrated in the forest plot (B). Besides, significant difference in mRNAsi between normal and tumor tissues was identified (C, P<0.001). And mRNAsi of PRAD were associated with Gleason score (D, P<0.001), PSA value (E, P<0.001) and prognosis (F, P=0.059). SRGs – stemness-related genes; mRNAsi – mRNA expression-based stemness index; DEGs – differential expressed genes; PRAD – prostate adenocarcinoma; PSA – prostate-specific antigen.
The prognostic SRGs were then integrated into the multivariate Cox model. The risk score distribution based on risk score of each sample was described in the scatter plot (Figure 5A) and risk curve (Figure 5B). The risk score was demonstrated to be a prognostic factor for OS in the Kaplan-Meier analysis (Figure 5C, P<0.001). In addition, the efficiency and residual distribution of the multivariate model was accessed by the ROC curve (AUC=0.986) and residual plot (Figure 5D, 5E).
Figure 5
The model diagnosis results of multivariate Cox model constructed based on prognostic SRGs. Prognostic SRGs were integrated into multivariate Cox model. The scatter plot (A) and risk curve (B) of the model demonstrated the risk score distribution based on risk score of each sample. Kaplan-Meier curve suggested the prognostic value of the risk score (C, P<0.001). And the efficiency and residual distribution of the multivariate model was accessed by the ROC curve (AUC=0.860) and residual plot (D, E). SRGs – stemness-related genes; AUC – area under the curve; ROC – receiver operator characteristic.
Independent prognosis analysis and construction of PPI network
The risk score was also revealed to be an independently prognostic indicator for patients with PRAD in univariate (hazard ratio [HR]=1.573, 95% confidence interval [CI] (1.551–1.595), P<0.001) (Figure 6A) and multivariate (HR=1.343, 95% CI (1.323–1.364), <0.001) (Figure 6B) Cox regression model was corrected by demographics, Gleason score, and new tumor event during the follow-up period.
Figure 6
Independent prognosis analysis of the risk score generated from the multivariate Cox model including prognostic SRGs. In univariate (HR=1.573, 95% CI (1.551–1.595), P<0.001) (A) and multivariate (HR=1.343, 95% CI (1.323–1.364), P<0.001) (B) Cox regression model corrected by demographics, Gleason score and new tumor event during the follow-up period, the risk score was shown to be an independently prognostic indicator for PRAD patients. SRGs – stemness-related genes; HR – hazard ratio; CI – confidence interval; PRAD – prostate adenocarcinoma.
A PPI network were also constructed based on prognostic SRGs in multivariate model by STRING database to present their interaction. In Figure 7, prognostic SRGs, such as KIF18B, KIF4A, BIRC5, NCAPG, UBE2C, NEIL3, and EXO1, had a strong interaction relationship.
Figure 7
The construction of the PPI network based on prognostic SRGs. PPI – protein–protein interaction; SRGs – stemness-related genes.
Discussion
PRAD is the second most challenging cause of cancer-related deaths in men [1,13] and its mortality is mainly associated with tumor progression [13]. CSCs play an important role in the occurrence and progression of PRAD and therapy resistance during treatment [9,14]. Thus, there is a pressing need to explore the prognostic SRGs which may assist oncologists in early diagnosis and targeted therapy. In this study, our findings firstly demonstrated that the mRNAsi was a reliable index which was significantly upregulated in PRAD. Based on the prognostic SRGs, we constructed a well-applied prediction model to predict the prognosis of PRAD patients and identified the interaction network of key prognostic SRGs.CSCs possess the self-renewal properties, genomic instability, and are regarded as the engine of tumor evolution. In tumor metabolism of PRAD, CSCs regulate the MYC-dependent metabolic reprogramming by increasing the expression levels of GLS1 glutaminase to utilize the glutamine [15,16]. The glutamine utilization is also mediated by the androgen receptor (AR), whose gene amplification or mutation works in the “androgen-independent” tumor progression [17]. Thus, the CSC-mediated androgen-resistance is associated with AR and glutamine metabolism [18]. Additionally, CSCs can also induce the epithelial-mesenchymal transition (EMT) in PRAD which is associated with the tumor metastasis [19].As the important roles of CSCs in the tumor initiating, tumor progression, and drug resistance of PRAD become known, identification of CSC-related signatures can be used to monitor the tumor development, recurrence, and metastasis; predict the prognosis of patients; and evaluate the efficacy of therapeutic methods. In this study, we used the mRNAsi as the stemness index of transcriptome gene expression and found that the mRNAsi was significantly highly expressed in the PRAD tissue samples compared with normal tissue [20]. Although the patients with higher mRNAsi had a lower OS, due to the favorable prognosis of patients with PRAD, the difference was not significant (P=0.059).It has been previously reported that the mRNAsi was related to the biological processes which were active in CSCs and malignant histopathological grade [11]. Besides, advanced neoplasms were also believed to have high stemness index (mRNAsi), because of the aggressiveness nature and dedifferentiated phenotype of the metastatic and recurrent tumor cells [21]. Moreover, the mRNAsi was also found to be related to some novel oncogenic pathways and somatic alterations, which demonstrate the roles of CSC in the tumorigenesis [11].Screening of serum PSA is the conventional and convenient method for predicting the probability of PCa and widely used in the worldwide [22]. However, its benefit is still controversial, especially in the older men (≥70 or 75 years) or patients with limited life expectancy (<10 years) [23]. Gleason score is also a reliable grading system for the aggressiveness for PRAD and strongly predicts the prognosis of patients with PRAD [24]. The high-grade PRAD can be defined as Gleason score sum 7 or higher [22]. The evaluation of Gleason grading in predicting PRAD behavior and recurrence is similar to expert pathologists by the deep-learning system [25]. However, the effects of PSA and Gleason score in PRAD prognosis are still queried by many scientists and exploring the novel prognostic biomarkers is an urgent issue [26]. In this study, we found the stemness index mRNAsi was also significantly associated with the PSA value and Gleason score, which may provide a reliable index for the prognosis of PRAD.In view of the important roles of CSCs in the tumorigenesis, progression, and drug resistance, we identified the key prognostic SRGs, such as KIF18B, KIF4A, BIRC5, NCAPG, UBE2C, NEIL3, and EXO1. Based on the prognostic SRGs, we built a model to predict the OS of patients with PRAD and the model achieved a good accuracy and applicability (AUC: 0.986). Evaluating the prognostic factors is important for oncologists in clinical decision-marking, thus many previous studies have focused on the identification of predictors for patients with PRAD [5,27,28]. Clinical information (such as age), blood examination (such as PSA value), tumor characteristics (such as androgen-independent and androgen-dependent type) and treatment methods (such as RP and ADT) have been regarded as prognostic factors and proved to be associated with the prognosis of patients with PRAD [28,29]. However, these clinical factors do not include the molecular biomarkers. With the well-application of TCGA dataset, regulatory network of miRNA, mRNA, and alternative splicing has been constructed and prognosis-associated biomarkers have also been identified for patients with PRAD [31,32]. Unfortunately, these prediction model do not cover the CSC-related signatures and prognostic SRGs. Thus, our study is an appropriate supplement to the existing prediction models for the prognosis evaluation of PRAD patients.Although this study was the first one which suggested good accuracy of the mRNAsi in predicting the prognosis of PRAD and constructed a well-applied prediction model based on the CSC-related genes, it still possessed some shortcomings that warrant consideration. Firstly, all the samples involved in this study are from America, and thus we are not quite sure about the applicability of prediction model in European and Asian. Second, this proposed prediction model has not been verified by external databases. Third, the protein expression of key prognostic SRGs have not been verified. Thus, our next study will focus on validating the prediction model and key prognostic SRGs using our own data and other public database.
Conclusions
Our data demonstrated that the mRNAsi as a reliable index for tumorigenesis, PSA value, and Gleason score of PRAD. Additionally, this study provided a well-applied model for predicting the OS for patients with PRAD based on prognostic SRGs.
Authors: Loukia G Karacosta; Barbara A Foster; Gissou Azabdaftari; David M Feliciano; Arthur M Edelman Journal: J Biol Chem Date: 2012-05-31 Impact factor: 5.157
Authors: Damian Szklarczyk; Andrea Franceschini; Stefan Wyder; Kristoffer Forslund; Davide Heller; Jaime Huerta-Cepas; Milan Simonovic; Alexander Roth; Alberto Santos; Kalliopi P Tsafou; Michael Kuhn; Peer Bork; Lars J Jensen; Christian von Mering Journal: Nucleic Acids Res Date: 2014-10-28 Impact factor: 16.971