A qualitative transcriptional signature for predicting the biochemical recurrence risk of prostate cancer patients after radical prostatectomy.

Xiang Li^1,2,3, Haiyan Huang¹, Jiahui Zhang¹, Fengle Jiang¹, Yating Guo¹, Yidan Shi¹, Zheng Guo^1,2,3, Lu Ao^1,2,3.

Abstract

BACKGROUND: The qualitative transcriptional characteristics, the within-sample relative expression orderings (REOs) of genes, are highly robust against batch effects and sample quality variations. Hence, we develop a qualitative transcriptional signature based on REOs to predict the biochemical recurrence risk of prostate cancer (PCa) patients after radical prostatectomy.
METHODS: Gene pairs with REOs significantly correlated with the biochemical recurrence-free survival (BFS) were identified from 131 PCa samples in the training data set. From these gene pairs, we selected a qualitative transcriptional signature based on the within-sample REOs of gene pairs which could predict the recurrence risk of PCa patients after radical prostatectomy.
RESULTS: A signature consisting of 74 gene pairs, named 74-GPS, was developed for predicting the recurrence risk of PCa patients after radical prostatectomy. It uses a majority voting rule: a sample is assigned as high risk when at least 37 gene pairs of the 74-GPS vote for high risk; otherwise, it is assigned as low risk. The signature was validated in six independent datasets produced by different platforms. In each of the validation datasets, the Kaplan-Meier survival analysis showed that the BFS of the low-risk group was significantly better than that of the high-risk group. Analyses of multiomics data of PCa samples from TCGA suggested that both epigenomic and genomic alterations could cause the reproducible transcriptional differences between the two prognostic groups.
CONCLUSIONS: The proposed qualitative transcriptional signature can robustly stratify PCa patients after radical prostatectomy into two groups with different recurrence risk and distinct multiomics characteristics. Hence, 74-GPS may serve as a helpful tool for guiding the management of PCa patients with radical prostatectomy at the individual level.

Keywords: biochemical recurrence-free survival; prostate cancer; qualitative signature; relative expression orderings

Substances:
RNA, Messenger

Year: 2020 PMID: 31961962 PMCID: PMC7065139 DOI: 10.1002/pros.23952

Source DB: PubMed Journal: Prostate ISSN: 0270-4137 Impact factor: 4.104

INTRODUCTION

Prostate cancer (PCa) is the second most frequently diagnosed malignant tumor in men worldwide, with the highest morbidity rates in developed countries. In China, the incidence of PCa has risen rapidly in recent decades as the population has aged and advanced detection services have been introduced. The standard treatment for localized PCa is radical prostatectomy, yet approximately 20% to 40% of patients suffer biochemical recurrence within 10 years. The prostate‐specific antigen (PSA) level is an important indicator of biochemical recurrence for localized and locally advanced PCa after radical prostatectomy. Nevertheless, some PCa patients with poor prognoses have a low PSA level. The currently available clinicopathological features, such as the Gleason grade group, clinical and pathological stage, and surgical margin, are unable to provide accurate predictions of biochemical recurrence. Thus, it is crucial to develop an accurate prognostic signature to predict the recurrence risk for PCa patients after radical prostatectomy.

High‐throughput microarray and RNA‐sequencing technologies have enabled researchers to develop transcriptional prognostic signatures for PCa patients. However, most of the reported transcriptional signatures depend on risk thresholds derived from the quantitative expression measurements of the signature genes, which are vulnerable to measurement variation from batch effects introduced by laboratory conditions, reagent lots, and personnel differences. In fact, subtle quantitative differences in gene expression measurements are quite error‐prone. Data normalization methods for removing batch effects might even exacerbate the batch problems, and these methods are not suitable for the individualized analysis required in clinical application.
On the contrary, the within‐sample relative expression orderings (REOs) of genes, which are qualitative features of transcription, have been shown to be robust against experimental batch effects and differences in probe designs of different platforms. The within‐sample REO is a promising feature for building robust classifiers, for example top‐scoring pair (TSP) and k‐TSP, for which R packages exist. Besides, the within‐sample REOs can be analyzed robustly in an individual sample without normalization, which suits individualized application in clinical practice. More importantly, our previous studies have demonstrated that, unlike signatures based on quantitative expression measurements of the signature genes, REOs‐based qualitative signatures are rather insensitive to differences in tumor cell percentage among specimens sampled from different parts of the same tumor, to the unavoidable partial RNA degradation, and to the amplification bias of low‐input RNA samples. Based on these unique advantages of the within‐sample REOs, we have developed qualitative REOs‐based signatures for predicting the prognosis of breast cancer, colorectal cancer, gastric cancer, liver cancer, and lung cancer. Thus, it is worthwhile to develop a qualitative prognostic signature for PCa patients after radical prostatectomy.

In this study, a qualitative REOs‐based signature consisting of 74 gene pairs, named 74‐GPS, was developed to predict the recurrence risk of PCa patients using 131 samples in the training data set. A sample was assigned as high risk when at least 37 gene pairs of the 74‐GPS voted for high risk; otherwise, it was assigned as low risk. This signature was validated in six independent datasets produced by different platforms, including a total of 660 fresh‐frozen (FF) samples and 106 formalin‐fixed paraffin‐embedded (FFPE) samples.
Using the multiomics data of PCa samples from The Cancer Genome Atlas (TCGA), we analyzed the distinct transcriptomic, epigenetic, and genomic differences between the two prognostic groups. The results might be helpful for understanding the mechanisms of different prognoses and guiding the management for PCa patients.

MATERIALS AND METHODS

Data collection and data preprocessing

Data for PCa were downloaded from the Gene Expression Omnibus (GEO, http://www.ncbi.nlm.nih.gov/geo/) and TCGA (http://cancergenome.nih.gov/) database. A total of 791 FF samples from six datasets and 106 FFPE samples from the GSE54460 data set with BFS data were analyzed, as described in Table 1. These datasets were measured by different platforms, including single‐channel microarray, dual‐channel microarray, and next‐generation sequencing (NGS). For datasets measured by the Affymetrix platform, a robust multi‐array average (RMA) algorithm was used to process the raw mRNA expression data (.CEL files). For datasets measured by the Illumina platform and the Stanford Functional Genomics Facility dual‐channel platform, the processed data were directly used.

Table 1

Description of the datasets used in this study

	PC131	PC332	PC111	PC92	PC89	PC36	PC106
Accession	GSE21032	TCGA	GSE70768	GSE70769	GSE40272	GSE46602	GSE54460
Platform	GPL10264	Illumina Hiseq‐RNAseqV2	GPL10558	GPL10558	GPL9497	GPL570	GPL11154
Sample size	131	332	111	92	89	36	106
Sample type	FF	FF	FF	FF	FF	FF	FFPE
Age	58	61	62	‐	62	63	61.7
Median follow‐up period (mo)	54.5 (1.9‐149.2)	28.9 (0.2‐163.6)	34 (2‐66.8)	79.6 (1.8‐122.3)	43.3 (0‐116)	‐	68.5 (0.7‐180.6)
Pathologic stage
T1‐T2	85	‐	33	48	‐	19	87
T3‐T4	46	‐	78	42	‐	17	18
NA	0	‐	0	2	‐	0	1
Median preoperative PSA (ng/mL)	8.5 (1.1‐132)	9.8 (0.8‐87)	8.6 (3.2‐23.7)	11 (1.5‐117)	6.7 (2.1‐44.5)	18.2 (5.3‐42.5)	10.9 (1.8‐72.6)
Gleason grade group
1	41	‐	17	20	13	16	11
2‐3	74	‐	85	55	65	15	80
4	8	‐	8	5	4	4	10
5	7	‐	1	10	7	1	5
NA	1	‐	0	2	0	0	0
Surgical margin
positive	31	69	26	42	10	16	40
negative	100	244	85	50	78	20	61
NA	0	19	0	0	1	0	5

Abbreviations: FF, fresh‐frozen; FFPE, formalin‐fixed paraffin‐embedded; NA, not available; PSA, prostate‐specific antigen.

The level 3 mRNA‐seq profiles, DNA methylation profiles, and copy number profiles used in the study were obtained from cBioPortal (http://www.cbioportal.org/), in which the gene expression values, the methylation values (β) of the CpG sites, and the copy number alteration status of regions had already been mapped to Entrez gene IDs. We directly downloaded these processed data. Overall, 20 436 genes for gene expression data, 16 182 genes for DNA methylation data, and 23 286 genes for copy number data were analyzed in this study. The level 2 gene mutation data were downloaded from the TCGA portal. A discrete mutation profile covering 11 249 genes, restricted to nonsynonymous mutations, was generated.

Survival analysis

The BFS time was calculated from the date of resection to the date of biochemical recurrence or the date of the last follow‐up visit. The Cox proportional hazards regression model was used to calculate the hazard ratios (HRs) and corresponding 95% confidence intervals (CIs), and to estimate the independent prognostic significance of the signature after adjustment for clinicopathological factors including Gleason grade group, surgical margins, preoperative PSA, and pathological stage. Harrell's concordance index (C‐index) was used to quantify the overall concordance between the predicted risk classification and the BFS time. The log‐rank test was used to compute the P‐value for the differences between the Kaplan‐Meier survival curves of BFS in two distinct subgroups.
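
To make the concordance measure concrete, the following is a minimal Python sketch of Harrell's C‐index for a binary risk call. It is an illustration under stated assumptions, not the authors' R implementation: the function name `harrell_c_index` is hypothetical, pairs with identical follow‐up times are simply skipped, and tied risk predictions are counted as half concordant.

```python
from itertools import combinations


def harrell_c_index(times, events, risk):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times  : follow-up time per patient
    events : 1 if biochemical recurrence was observed, 0 if censored
    risk   : predicted risk (for example 1 = high risk, 0 = low risk)
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption: skip tied times; a pair is usable only when the
        # shorter time ends in an observed recurrence.
        if times[a] == times[b] or not events[a]:
            continue
        usable += 1
        if risk[a] > risk[b]:
            concordant += 1.0  # earlier recurrence had the higher predicted risk
        elif risk[a] == risk[b]:
            concordant += 0.5  # tied predictions count as half
    return concordant / usable if usable else float("nan")
```

Because the signature yields only two risk classes, many pairs have tied predictions, which is why a binary classifier can have a C‐index well below 1 even when its groups separate clearly.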

Development of the qualitative signature

For a gene pair (Ga, Gb), where gene a and gene b have expression levels Ea and Eb, its REO (Ea > Eb or Ea < Eb) within each sample divides the samples into two groups. If the REO of a gene pair was significantly associated with patients' BFS by the univariate Cox proportional‐hazards regression model, the gene pair was defined as a prognosis‐associated gene pair. The Storey procedure was used to convert the P‐values into false discovery rates (FDRs). The significance level was set at 20%. All prognosis‐associated gene pairs were sorted in descending order according to their C‐index values. A forward selection procedure was applied to find the subset of prognosis‐associated gene pairs that achieved the highest C‐index in the training data set. The gene pair with the highest C‐index was selected as a seed, and the other prognosis‐associated gene pairs were added to the seed one by one, in descending order of C‐index, whenever adding the gene pair improved the C‐index. The subset of prognosis‐associated gene pairs with the highest C‐index was chosen as the final prognostic signature. The voting rule was as follows: a patient was classified into the high‐risk group when at least 50% of the gene pairs voted for high risk; otherwise, the patient was classified into the low‐risk group. The R code for developing the REOs‐based signature is available in Supporting Information Methods.
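
The voting rule and the forward selection step can be sketched in Python as follows. This is a hedged illustration rather than the authors' R code: the function names `classify_sample` and `forward_select`, the dictionary input format, and the `score` callback (standing in for the C‐index of a candidate subset) are all assumptions made for the example.

```python
def classify_sample(expression, gene_pairs, threshold=0.5):
    """Assign one sample to 'high' or 'low' risk by majority voting.

    expression : dict mapping gene symbol -> expression value in this sample
    gene_pairs : list of (gene_a, gene_b); a pair votes for high risk when
                 gene_a is expressed above gene_b within the sample
    threshold  : fraction of high-risk votes required for a high-risk call
    """
    votes = sum(1 for a, b in gene_pairs if expression[a] > expression[b])
    return "high" if votes >= threshold * len(gene_pairs) else "low"


def forward_select(candidates, score):
    """Greedy forward selection over candidates sorted by descending C-index.

    score : callable taking a list of candidates and returning the value to
            maximise (hypothetical stand-in for the training-set C-index)
    """
    selected = [candidates[0]]      # seed: the best single gene pair
    best = score(selected)
    for pair in candidates[1:]:
        trial = score(selected + [pair])
        if trial > best:            # keep the pair only if it improves the score
            selected.append(pair)
            best = trial
    return selected, best
```

With 74 gene pairs and the default threshold, the high‐risk call requires at least 37 votes, matching the rule stated in the text. Only within‐sample comparisons are used, so no cross‐sample normalization enters the classification.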

Analysis of epigenomic and genomic data

The RankCompV2 method, which is insensitive to batch effects, was used to identify differentially expressed genes (DEGs) between the high‐risk and low‐risk groups of PCa samples. The Wilcoxon rank‐sum test was used to identify differentially methylated genes (DMGs). Fisher's exact test was used to detect genes whose frequencies of copy number alteration or mutation differed significantly between the two prognostic groups of TCGA samples.
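
For readers unfamiliar with the frequency comparison, a two‐sided Fisher's exact test on a 2 × 2 table (altered versus unaltered samples in the high‐risk and low‐risk groups) can be sketched in Python as below. The function name `fisher_exact_two_sided` is hypothetical, and the two‐sided definition used here (summing all tables no more probable than the observed one) is a common convention assumed for illustration; the paper does not state which variant was used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: high-risk and low-risk groups; columns: altered and unaltered.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x altered samples in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as observed;
    # the small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1)
                if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)
```

As a usage example with made‐up counts, `fisher_exact_two_sided(20, 140, 5, 166)` would compare a gene altered in 20 of 160 high‐risk samples against 5 of 171 low‐risk samples.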

Direction concordance scores

If k genes overlapped between two DEG lists identified from two datasets, of which s genes showed the same dysregulated direction (both up‐ or both downregulated in the high‐risk group compared with the low‐risk group), then the direction concordance score was computed as s/k. Similarly, if k genes overlapped between two DMG lists identified from two datasets, of which s genes were both hyper‐ or both hypomethylated in the high‐risk group compared with the low‐risk group, then the direction concordance score was computed as s/k. For k DMGs that overlapped with DEGs, if s genes were upregulated (or downregulated) and correspondingly hypomethylated (or hypermethylated), the direction concordance score was computed as s/k. This score was used to assess the reproducibility of DEGs identified from multiple independent datasets and the consistency between DEGs and DMGs. The cumulative binomial distribution model was used to evaluate whether a direction concordance score of s/k could be observed by chance:

$$P = 1 - \sum_{i=0}^{s-1} \binom{k}{i} p^{i} (1 - p)^{k-i}$$

where p = 0.5 is the probability of one gene having the concordant dysregulated direction in two lists of genes by chance.
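
The direction concordance score and its binomial tail probability can be computed with a few lines of Python. This is a sketch under the assumption that the test is the upper tail of a Binomial(k, p) distribution, that is, the chance of seeing at least s concordant genes among k; the function name `direction_concordance` is hypothetical.

```python
from math import comb


def direction_concordance(s, k, p=0.5):
    """Return (score, p_value) for s concordant genes among k shared genes.

    p_value is the upper tail of Binomial(k, p): the probability of at
    least s concordant genes if directions agreed only at random.
    """
    score = s / k
    p_value = sum(comb(k, i) * p**i * (1 - p) ** (k - i)
                  for i in range(s, k + 1))
    return score, p_value


# Example with the counts reported for PC131 versus PC332:
# 83 of 84 shared DEGs had the same direction.
score, p_value = direction_concordance(83, 84)
```

For these counts the score rounds to 98.81% and the tail probability equals 85/2^84, far below the 2.2 × 10^−16 bound quoted in the Results.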

Functional enrichment analysis

Functional enrichment analysis was performed using the pathway gene categories of the Kyoto Encyclopedia of Genes and Genomes (KEGG). The hypergeometric distribution model was applied to determine the significance of biological pathways enriched by genes of interest. The Benjamini and Hochberg procedure was used to estimate the FDR. Statistical analysis was carried out with R version 3.5.1.
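
The enrichment test and the FDR adjustment can be illustrated with a short Python sketch. Both function names are hypothetical, and the code shows the textbook forms of the hypergeometric upper tail and the Benjamini‐Hochberg step‐up adjustment rather than the authors' R code.

```python
from math import comb


def hypergeom_enrichment_p(N, K, n, x):
    """P(X >= x) for a hypergeometric draw.

    N : number of background genes
    K : background genes annotated to the pathway
    n : genes of interest (for example DEGs)
    x : genes of interest annotated to the pathway
    """
    denom = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(x, min(K, n) + 1))
    return tail / denom


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A pathway would then be reported as enriched when its adjusted value falls below the chosen FDR level, which is 10% in this study.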

RESULTS

Development of the qualitative REOs‐based signature

The general workflow of this study is described in Figure 1. In total, 131 FF PCa samples measured by the GPL10264 platform (Table 1), denoted as PC131, were used as the training data set. Using the univariate Cox proportional‐hazards regression model with FDR < 20%, we found 80 genes whose expression levels were significantly correlated with the BFS of PCa patients after radical prostatectomy. A total of 3160 gene pairs, consisting of every two of the 80 prognosis‐associated genes, were constructed, and each gene pair classified all samples into two subgroups according to its REO in each sample. Using the univariate Cox proportional‐hazards regression model with FDR < 20%, 1205 prognosis‐associated gene pairs were identified and sorted in descending order according to their C‐index values. Then, based on a forward selection method (see Section 2), 74 gene pairs with the highest C‐index (C‐index = 0.87) were chosen as the final prognostic signature, denoted as 74‐GPS (Table 2). A patient was classified into the high‐risk group when at least 37 of the 74 gene pairs voted for high risk; otherwise, the patient was classified into the low‐risk group. According to this decision rule, samples in the training data set were stratified into two subgroups, 108 samples in the low‐risk group and 23 samples in the high‐risk group, and the BFS of the former group was significantly better than that of the latter group (HR = 63.23, 95% CI: 18.56‐215.40, P < 2.2 × 10^−16, C‐index = 0.87, Figure 2A). A multivariate Cox regression analysis revealed that the 74‐GPS remained significantly correlated with patients' BFS in the training data set when the clinical factors of Gleason grade group, surgical margins, preoperative PSA, and pathological stage were considered, as shown in Figure 3.
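
As a quick arithmetic check of the counts in this paragraph, the number of unordered pairs among 80 genes and the vote threshold implied by the 50% rule work out as follows:

```latex
\binom{80}{2} = \frac{80 \times 79}{2} = 3160,
\qquad
0.5 \times 74 = 37
```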

Figure 1

Table 2

The composition of the 74‐GPS

Pair 1‐25	Gene A	Gene B	Pair 26‐50	Gene A	Gene B	Pair 51‐74	Gene A	Gene B
pair1	ENG	CEBPD	pair26	ASPN	ZNF622	pair51	COL5A2	HELB
pair2	INHBA	CFDP1	pair27	INHBA	ITGA11	pair52	RELN	CTHRC1
pair3	ZHX3	TACC2	pair28	ASPN	OLFML2B	pair53	BGN	TACC2
pair4	OLFML2B	HELB	pair29	ENG	TJP2	pair54	PPP2R2C	TACC2
pair5	COL1A1	HSPA1B	pair30	ZHX3	CCNL1	pair55	FOLH1B	TACC2
pair6	NOTCH3	CEBPD	pair31	LTBP2	ZNF334	pair56	THBS2	CEBPD
pair7	PPP2R2C	CFDP1	pair32	RELN	CLEC14A	pair57	BGN	FZD5
pair8	COL3A1	ZFP36	pair33	COL8A1	HELB	pair58	POSTN	FZD5
pair9	COL1A1	NXF1	pair34	TACC2	ZNF532	pair59	COMP	DLL4
pair10	THBS2	CCNL1	pair35	FOLH1B	ZNF532	pair60	SFRP4	JUNB
pair11	ZHX3	TEP1	pair36	LTBP2	CCNL1	pair61	COL8A1	HOPX
pair12	NIPA1	ZNF532	pair37	COL3A1	TJP2	pair62	ESM1	HOPX
pair13	COL3A1	TACC2	pair38	CLSTN2	FZD5	pair63	LTBP2	CFDP1
pair14	XPO6	NXF1	pair39	COL3A1	FZD5	pair64	CLSTN2	ZFP36
pair15	HOPX	HELB	pair40	CLSTN2	CCNL1	pair65	FOLH1	CEBPD
pair16	CDH13	HELB	pair41	TCF19	HELB	pair66	FAP	LAMP5
pair17	POSTN	SLC25A17	pair42	OR2T2	CTHRC1	pair67	CCNL1	FZD5
pair18	NIPA1	CCNL1	pair43	SFRP4	HOPX	pair68	CXCL14	TACC2
pair19	ASPN	NOX4	pair44	NOTCH3	XPO6	pair69	CTHRC1	DLL4
pair20	FOLH1B	HSPA1B	pair45	PPP2R2C	ITGA11	pair70	PYDC2	MAB21L3
pair21	COL3A1	XPO6	pair46	OR2T11	OLFML2B	pair71	COL10A1	ABCC11
pair22	HOXC4	HELB	pair47	THY1	ITGA11	pair72	CDH13	ZNF334
pair23	COL1A1	CCNL1	pair48	CXCL14	CCNL1	pair73	CCNL1	CEBPD
pair24	COMP	TJP2	pair49	DLL4	HELB	pair74	CPS1	COL5A2
pair25	COL3A1	ENG	pair50	NIPA1	ZFP36

Note: Gene pair votes for high‐risk when Gene A has a higher expression level than Gene B in a sample.

Figure 3

Univariate and multivariate Cox regression analyses for the 74‐GPS in the training data set. The forest plot of univariate (blue lines) and multivariate (orange lines) Cox regression analysis of the predictive signature and available prognostic factors in the training data set PC131. Red color indicates significant P values. P < .1. GPS, gene pairs [Color figure can be viewed at wileyonlinelibrary.com]

Figure 1

Overview of the workflow used in this study. The workflow includes three major steps: the development of the REOs‐based signature in the training datasets (Step 1), the validation of the signature in the six independent validation datasets (Step 2), and the multiomics characteristics analyses of the two prognostic groups (Step 3). CNV, copy number variation; DNA methy, DNA methylation; Exp, expression; REOs, relative expression orderings [Color figure can be viewed at wileyonlinelibrary.com]

Figure 2

The Kaplan‐Meier curves of biochemical recurrence‐free survival for the training and validation datasets. The Kaplan‐Meier curves of biochemical recurrence‐free survival for the training data set PC131 (A) and the six validation datasets PC332 (B), PC111 (C), PC92 (D), PC89 (E), PC36 (F), and PC106 (G). A sample was assigned into the high‐risk group (red lines) when at least 37 gene pairs of the 74‐GPS voted for high‐risk; otherwise, the low risk group (blue lines). GPS, gene pairs [Color figure can be viewed at wileyonlinelibrary.com]

Validation of the qualitative REOs‐based signature

The first validation data set included 332 FF samples from TCGA, denoted as PC332. The 172 patients assigned to the low‐risk group had significantly better BFS than the 160 patients assigned to the high‐risk group (HR = 2.02, 95% CI: 1.06‐3.83, P = 2.78 × 10^−2, C‐index = 0.60; Figure 2B). The second validation data set included 111 FF samples measured by the GPL10558 platform, denoted as PC111. The signature classified 98 and 13 samples into the low‐risk and high‐risk groups, respectively, and the BFS of the former was significantly better than that of the latter (HR = 4.69, 95% CI: 1.63‐13.51, P = 1.15 × 10^−2, C‐index = 0.61; Figure 2C). The signature was also verified in the other three validation datasets with 92, 89, and 36 FF samples, respectively. In each of these datasets, the low‐risk group had significantly longer BFS than the high‐risk group (Figure 2D‐F). Notably, the 106 FFPE samples in the data set GSE54460 were successfully stratified into two different prognostic groups: the 91 patients in the low‐risk group had significantly better BFS than the 15 patients in the high‐risk group (HR = 2.68, 95% CI: 1.40‐5.14, P = 6.82 × 10^−3, C‐index = 0.57; Figure 2G). FFPE samples commonly suffer RNA degradation during preparation and storage, which hampers the clinical application of quantitative transcriptional signatures.

The multivariate Cox regression analysis was also performed in the validation datasets. The results showed that the signature remained significantly associated with patients' BFS in the datasets PC332, PC111, PC92, and PC89 after adjusting for the available clinicopathological factors. Detailed information is shown in Figure 4 and Table S1.

Figure 4

Univariate and multivariate Cox regression analyses for the 74‐GPS in the validation datasets. The forest plot of univariate (blue lines) and multivariate (orange lines) Cox regression analyses of the predictive signature and available prognostic factors in the validation datasets PC332 (A), PC111 (B), PC92 (C), and PC89 (D). Red color indicates significant P values. P < .1. GPS, gene pairs [Color figure can be viewed at wileyonlinelibrary.com]

Distinct transcriptional and functional characteristics of the two prognostic groups

With 10% FDR control, 177 DEGs and 1250 DEGs were identified by RankCompV2 between the high‐ and low‐risk prognostic groups of the datasets PC131 and PC332, respectively. These two lists of DEGs shared 84 genes, of which 83 showed the same dysregulated direction in the high‐risk group compared with the low‐risk group, giving a direction concordance score of 98.81%, which was unlikely to be observed by chance (binomial distribution test, P < 2.2 × 10^−16, see Section 2). Besides, the direction concordance scores between every two of the DEG lists detected from the seven datasets were all unlikely to have occurred by chance (see Table S2). These results suggested that the distinct transcriptional characteristics of the two prognostic groups were highly reproducible across the independent datasets. Functional enrichment analysis of the 1250 DEGs identified from TCGA samples in the data set PC332 revealed that the genes upregulated in the high‐risk group were significantly enriched in pathways associated with cell proliferation, such as the PI3K‐Akt signaling pathway and the TGF‐beta signaling pathway, whereas the downregulated genes were significantly enriched in metabolic pathways, such as the fatty acid degradation pathway and the glutathione metabolism pathway (hypergeometric distribution model, FDR < 10%, Table S3). These results indicated that the tumor cells in the high‐risk patients might grow faster than those in the low‐risk patients and experience dysregulated metabolism, which could lead to the poor prognosis of these patients.

Distinct epigenomic characteristics of the two prognostic groups

In the TCGA data set PC332, 160 samples and 172 samples with DNA methylation data were classified into the high‐risk and low‐risk prognostic groups by the 74‐GPS, respectively. Using the Wilcoxon rank‐sum test with FDR < 1%, 1631 hypermethylated and 624 hypomethylated genes were identified in the high‐risk prognostic group compared with the low‐risk prognostic group. Of the 1631 hypermethylated genes, 12.94% overlapped with the 1250 DEGs between the two prognostic groups. The direction concordance score of hypermethylation with downregulation was 94.31%, which was extremely unlikely to have occurred by chance (binomial distribution test, P < 2.2 × 10^−16). Similarly, 11.86% of the 624 hypomethylated genes overlapped with the DEGs between the high‐risk and low‐risk prognostic groups. The direction concordance score of hypomethylation with upregulation was 93.24%, which was also extremely unlikely to have occurred by chance (binomial distribution test, P = 8.88 × 10^−16).

An additional 160 samples without recurrence information in the TCGA data portal, denoted as PC160, were used to confirm the epigenomic characteristics of the two prognostic groups. In data set PC160, 98 samples and 62 samples with DNA methylation data were classified into the high‐risk and low‐risk prognostic groups by the 74‐GPS, respectively. With 10% FDR control, 588 DEGs were identified by RankCompV2 between the high‐ and low‐risk prognostic groups. The direction concordance score of DEGs between data set PC160 and data set PC332 was 100% (425/425). Using the Wilcoxon rank‐sum test with FDR < 10%, 542 hypermethylated and 237 hypomethylated genes were identified in the high‐risk prognostic group compared with the low‐risk prognostic group. The direction concordance score of DMGs between data set PC160 and data set PC332 was 97.98% (339/346).
Moreover, the direction concordance scores of hypermethylation with downregulation and hypomethylation with upregulation in data set PC160 were 95.83% (23/24) and 97.50% (39/40), respectively. All the direction concordance scores were extremely unlikely to have occurred by chance, as shown in Tables S4 and S5. The consistent and reproducible results observed in datasets PC332 and PC160 implied that the epigenetic alterations may cause reproducible transcriptional alterations between different prognostic groups.

Distinct genomic characteristics of the two prognostic groups

The 331 samples with copy number alteration data in the data set PC332 were divided into 160 high‐risk samples and 171 low‐risk samples. A total of 10 342 genes had significantly higher copy number alteration frequencies in the high‐risk group than in the low‐risk group (Fisher's exact test, FDR < 5%). Of the 2378 genes frequently amplified in the high‐risk group, 6.48% were shared with the DEG list, of which 68.18% were upregulated in the high‐risk group. This was unlikely to have occurred by chance (binomial distribution test, P = 3.75 × 10^−6). Moreover, among the 7964 genes frequently deleted in the high‐risk group, 457 genes were shared with the DEG list, of which 61.49% were downregulated in the high‐risk group. This was also unlikely to have occurred by chance (binomial distribution test, P = 5.16 × 10^−7).

The 109 samples with copy number variation data in the data set PC131 were divided into 24 high‐risk samples and 85 low‐risk samples by the 74‐GPS. At a P‐value threshold of 5%, 4020 DEGs were identified by Student's t‐test between the high‐ and low‐risk prognostic groups, with a direction concordance score of 97.08% (333/343) with the data set PC332. A total of 2957 genes had significantly higher copy number alteration frequencies in the high‐risk group than in the low‐risk group (Fisher's exact test, P < 5%), with a direction concordance score of 94.40% (1483/1571) with the data set PC332. The direction concordance scores of amplification with upregulation and deletion with downregulation in the high‐risk group of the data set PC131 were 74.47% (70/94) and 61.80% (406/657), respectively. All the direction concordance scores were unlikely to have occurred by chance, as shown in Tables S4 and S6. In the data set PC160, 98 samples and 62 samples with copy number variation data were classified into the high‐risk and low‐risk prognostic groups, respectively.
A total of 4893 genes had significantly higher copy number alteration frequencies in the high‐risk group than in the low‐risk group (Fisher's exact test, P < 5%), with a direction concordance score of 74.47% (2138/2871) with the data set PC332. The direction concordance score of amplification with upregulation in the high‐risk group was 82.26% (51/62). All the direction concordance scores were also unlikely to have occurred by chance (shown in Tables S4 and S6). The results observed in the datasets PC332, PC131, and PC160 implied that these copy number alterations, especially amplifications, may cause reproducible transcriptional alterations between different prognostic groups.

Using Fisher's exact test with P < .1, 84 genes whose mutation frequencies tended to differ were detected between the 160 high‐risk samples and the 171 low‐risk samples with somatic mutation data in the data set PC332 (Table S7). Notably, all of the 84 genes had higher mutation frequencies in the high‐risk group than in the low‐risk group, which was unlikely to happen by chance (binomial distribution test, P < 2.2 × 10^−16). In the data set PC160, 12 genes whose mutation frequencies tended to differ were detected between the 97 high‐risk and the 62 low‐risk samples with somatic mutation data (Fisher's exact test, P < .1). In both data sets PC160 and PC332, TP53 had a significantly higher mutation frequency in the high‐risk group than in the low‐risk group. It has been reported that TP53 mutation is correlated with metastasis and poor prognosis of PCa patients. Functional enrichment analysis showed that these 84 mutated genes were significantly enriched in the focal adhesion and PI3K‐Akt signaling pathways (hypergeometric distribution model, FDR < 10%), suggesting that mutation‐induced alteration of genes in these pathways might lead to a poor outcome for PCa patients.

DISCUSSION

In this study, we developed a qualitative transcriptional signature, 74‐GPS, to predict the recurrence risk of PCa patients after radical prostatectomy, which was validated in six independent datasets produced by different platforms, including a total of 660 FF and 106 FFPE samples. Further multiomics analyses showed distinct transcriptomic, epigenetic, and genomic landscapes between the high‐risk and low‐risk groups, which might help to understand the mechanisms underlying the different prognoses and to prescribe more specific and appropriate treatments for PCa patients.

Consistent with our previous study, the qualitative REOs‐based prognostic signature is highly robust against experimental batch effects and differences in probe designs of different platforms. Besides, the signature can be readily applied at the individual level without data normalization, which makes it more reliable and practical than quantitative signatures for risk prediction. At present, most clinical tissue samples are fixed in FFPE blocks and stored in hospitals and tissue banks, which constitutes a huge and precious resource for clinical research. Nevertheless, FFPE samples are generally considered unreliable for gene expression analysis because of RNA degradation during preparation and storage. As shown in our previous study, the expression measurements of thousands of genes changed at least two‐fold in FFPE samples compared with paired FF samples. Therefore, quantitative signatures based on gene expression measurements of FFPE (or FF) samples cannot be applied to FF (or FFPE) samples directly. In contrast, as demonstrated in our previous study and confirmed in this study, most of the REOs of gene pairs in FFPE samples were insensitive to partial RNA degradation, which makes it possible to apply REOs‐based signatures to both FF and FFPE samples.

One limitation of our study was that all samples for the development and validation of the signature were from public databases.
We noticed that, compared with preoperative PSA and surgical margins, the signature lost significance in the datasets PC36 and PC106, which might be attributed to inherent limitations of the available public data, such as the small sample size of data set PC36 or the poor quality of gene expression measurements for FFPE samples in data set PC106. Although the multivariate Cox regression analysis of samples integrated from datasets PC111, PC92, PC36, and PC106 with the commonly available clinicopathological factors showed that the signature remained significantly associated with patients' BFS (shown in Table S1 and Figure S1), it will be necessary to collect an additional data set of independent samples in future work to validate our signature. For more reliable prediction under some circumstances, the 74‐GPS could be combined with preoperative PSA and surgical margins to predict biochemical recurrence for PCa patients.

The multiomics analysis of TCGA samples played an essential role in uncovering the molecular mechanisms underlying the different prognoses of PCa patients after radical prostatectomy. For example, the gene PDGFRB in the PI3K‐Akt signaling pathway, which was upregulated with concordant hypomethylation in the high‐risk group, can regulate cell growth, division, and migration and is correlated with bone metastases and biochemical recurrence of PCa patients. In addition, the gene THBS1 in the TGF‐beta signaling pathway, which was upregulated with concordant hypomethylation in the high‐risk group, has been reported to be positively associated with the invasion of PCa and the recurrence of PCa patients after radical prostatectomy. These results provided evidence that the tumor cells of high‐risk patients have faster growth and stronger migration abilities, which result in poorer prognoses.
According to the National Comprehensive Cancer Network (NCCN) guidelines for prostate cancer patients after radical prostatectomy, PSA measurements should be performed every 6 to 12 months and a digital rectal examination (DRE) is recommended annually for the first 5 years. For PCa patients classified as high risk by the signature, we suggest that closer follow‐up, such as PSA testing every 3 months and DRE every 6 months for the first 5 years, may be better for detecting disease progression in a timely manner. This study may help to guide management and improve prognoses for PCa patients after radical prostatectomy.
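
The individual-level use of a REO-based signature discussed above can be illustrated with a short sketch. It assumes, for illustration only, that each gene pair is stored as (gene A, gene B) and oriented so that expression of A greater than B votes for high risk. The gene names, the toy values, and the function names `count_high_risk_votes` and `classify_sample` are hypothetical and are not taken from the 74‐GPS itself. Because only within-sample orderings are compared, a strictly increasing transformation of a sample's measurements (here a log2 transform) leaves the votes unchanged, which is why no cross-sample normalization is required.

```python
import math
from typing import Dict, List, Optional, Tuple

GenePair = Tuple[str, str]


def count_high_risk_votes(expression: Dict[str, float],
                          gene_pairs: List[GenePair]) -> int:
    """Count the pairs (a, b) whose within-sample ordering is a > b.

    Assumption: every pair is oriented so that a > b votes for high risk.
    """
    return sum(1 for a, b in gene_pairs if expression[a] > expression[b])


def classify_sample(expression: Dict[str, float],
                    gene_pairs: List[GenePair],
                    min_votes: Optional[int] = None) -> str:
    """Majority vote: high risk when at least half of the pairs vote for it."""
    if min_votes is None:
        # Half of the pairs, rounded up (37 for a 74-pair signature).
        min_votes = math.ceil(len(gene_pairs) / 2)
    votes = count_high_risk_votes(expression, gene_pairs)
    return "high risk" if votes >= min_votes else "low risk"


# Hypothetical toy signature and a single sample (not real 74-GPS genes).
toy_pairs = [("G1", "G2"), ("G3", "G4"), ("G5", "G6"), ("G4", "G2")]
toy_sample = {"G1": 8.0, "G2": 3.0, "G3": 1.0,
              "G4": 5.0, "G5": 2.0, "G6": 2.5}

# A strictly increasing transform preserves every within-sample ordering.
log_sample = {gene: math.log2(value) for gene, value in toy_sample.items()}

raw_label = classify_sample(toy_sample, toy_pairs)
log_label = classify_sample(log_sample, toy_pairs)
```

With 74 pairs the default threshold evaluates to 37, which matches the voting rule stated for the 74‐GPS. The sample is classified on its own, with no reference to other samples or to a cohort-level scale.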

CONCLUSION

In conclusion, the qualitative REO‐based 74‐GPS is a robust individual‐level prognostic signature for predicting the BFS of postsurgical PCa patients from different hospitals equipped with different platforms. PCa patients identified by the signature as having a high risk of biochemical recurrence should receive timely treatment or close follow‐up.

AUTHOR CONTRIBUTIONS

LA and ZG designed and supervised the research study; XL and LA performed the research; XL, JHZ, FLJ, YTG, and YDS performed the data analysis; XL, HYH, and LA wrote the R codes; XL and LA drafted the manuscript; LA and ZG revised the manuscript; XL and YTG interpreted the function annotations; XL and HYH drew the figures. All authors read and approved the final manuscript.

CONFLICTS OF INTEREST

The authors declare that they have no conflict of interests.
