Wensheng Zhang, Kun Zhang.
Abstract
For prostate cancer (PCa) patients, biochemical recurrence (BCR) is the first sign of disease relapse and the subsequent metastasis. TP53 mutations are relatively prevalent in advanced PCa forms. We aimed to utilize this knowledge to identify robust transcriptomic signatures for BCR prediction in patients with Gleason score ≥ 7 cancers, which cause most PCa deaths. Using the TCGA-PRAD dataset and the novel data-driven stochastic approach proposed in this study, we identified a 25-gene signature from the genes whose expression in tumors was associated with TP53 mutation statuses. The predictive strength of the signature was assessed by AUC and Fisher's exact test p-value according to the output of support vector machine-based cross validation. For the TCGA-PRAD dataset, the AUC and p-value were 0.837 and 5 × 10⁻¹³, respectively. For five external datasets, the AUCs and p-values ranged from 0.632 to 0.794 and 6 × 10⁻² to 5 × 10⁻⁵, respectively. The signature also performed well in predicting relapse-free survival (RFS). The signature-based transcriptomic risk scores (TRS) explained 28.2% of variation in RFS on average. The combination of TRS and clinicopathologic prognostic factors explained 23–72% of variation in RFS, with a median of 54.5%. Our method and findings are useful for developing new prognostic tools in PCa and other cancers.
Year: 2022 PMID: 35732666 PMCID: PMC9217948 DOI: 10.1038/s41598-022-14436-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Flow chart of the identification approach (A, B, C) and performance/utility evaluation (D-1, D-2) of a TP53 mutation status-associated predictive transcriptomic signature for BCR. In (A), the top differentially expressed genes (DEGs) with respect to TP53 status were identified. In (B), 1000 small subsets of the DEGs were randomly sampled and their predictive strengths for BCR were assessed by SVM-based cross validation. In (C), the results from step (B) were integrated by a “filter” and a novel “wrapper” to obtain an optimized gene signature. In (D-1) and (D-2), the performance of the final signature for BCR prediction and its clinical utility were evaluated in the TCGA dataset and five external datasets using statistical and machine learning methods. See the main text for a more detailed explanation.
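Step (B) of the flow chart can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the subset size of 5, the gene names, and `toy_score` are hypothetical, and `toy_score` is a stand-in for the SVM-based cross-validation score that the study actually uses. The “filter” and “wrapper” of step (C) are not specified in this summary, so the sketch stops at ranking the sampled subsets.

```python
import random


def sample_subsets(genes, n_subsets=1000, subset_size=5, seed=0):
    """Draw n_subsets random gene subsets (without replacement within a subset).

    subset_size=5 is an assumed value; the source only says "small subsets".
    """
    rng = random.Random(seed)
    return [tuple(sorted(rng.sample(genes, subset_size))) for _ in range(n_subsets)]


def rank_subsets(subsets, score_fn):
    """Score every subset and return (score, subset) pairs, best first."""
    scored = [(score_fn(s), s) for s in subsets]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_score(subset):
    """Hypothetical placeholder for a cross-validated predictive strength.

    It rewards subsets that contain made-up 'informative' genes; a real
    analysis would train and validate a classifier here instead.
    """
    informative = {"g3", "g7", "g11"}
    return len(informative & set(subset)) / len(subset)


# Hypothetical list of 50 DEGs named g1..g50.
degs = [f"g{i}" for i in range(1, 51)]
subsets = sample_subsets(degs)
ranked = rank_subsets(subsets, toy_score)
best_score, best_subset = ranked[0]
```

Passing the scorer as a function keeps the sampling loop independent of the classifier, so any cross-validation routine that maps a gene subset to a number could be plugged in.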
Summary of the datasets used, by clinicopathologic characteristics of patients.
| Data ID | Sample size | Gleason pattern (primary + secondary): 3 + 4 | Gleason pattern: 4 + 3 | Gleason pattern: 3 + 5, 4 + 4\|5, 5 + 4\|5 | Clinical T-stage: T1 | Clinical T-stage: T2 | Clinical T-stage: T3 | Clinical T-stage: NA | BCR % | Age at diagnosis, interquartile: Q2 | Age at diagnosis, interquartile: Q3 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TCGA-PRAD | 366 | 114 | 83 | 169 | 124 | 133 | 48 | 61 | 13.6 | 57.0 | 66.5 |
| GSE54460 | 95 | 56 | 24 | 15 | 10 | 67 | 18 | 0 | 53.7 | 56.7 | 66.3 |
| GSE84042 | 57 | 40 | 17 | 0 | 0 | 28 | 29 | 0 | 24.6 | 56.2 | 63.9 |
| GSE21032 | 89 | 53 | 21 | 15 | 43 | 42 | 4 | 0 | 28.1 | 54.3 | 61.8 |
| GSE70768 | 95 | 65 | 21 | 9 | 47 | 32 | 14 | 2 | 20.0 | 56.0 | 65.0 |
| GSE70769 | 70 | 36 | 19 | 15 | 26 | 32 | 9 | 3 | 62.9 | NA | NA |
Signature genes and their relevance to PCa and/or other cancers.
| Symbol | Name | Relevance to cancer/tumor/patient |
|---|---|---|
| | cyclin dependent kinase inhibitor 1A | Variants; advanced PCa |
| | damage specific DNA binding protein 1 | Apoptosis, chemo-resistance regulation and progression; multiple cancer types |
| | eukaryotic translation initiation factor 5A2 | Cell growth, metastasis, chemotherapy resistance; multiple cancer types |
| | glutaryl-CoA dehydrogenase | |
| | glycerol kinase 3 pseudogene | |
| KIAA0196 | Strumpellin | Amplified and overexpressed; PCa |
| | La ribonucleoprotein 4B | Cell migration and invasion; PCa |
| | N-alpha-acetyltransferase 50, NatE catalytic subunit | |
| | NADH: ubiquinone oxidoreductase subunit A9 | Cell proliferation, metastasis; breast cancer |
| | nuclear factor of activated T cells 3 | Tumor growth, cell proliferation and migration; astroglioma |
| | NUMB endocytic adaptor protein | Invasion, metastasis, migration; melanoma |
| | Opa interacting protein 5 | Growth, metastasis and drug-resistance; bladder cancer |
| | 6-phosphofructo-2-kinase/fructose-2,6-biphosphatase 2 | Glycolysis, cell proliferation; pancreatic cancer |
| PLEKHF2 | pleckstrin homology and FYVE domain containing 2 | Amplified, survival; PCa |
| | Ras interacting protein 1 | Cell migration; non-small-cell lung cancer cell lines |
| | ring finger protein 167 | Activates mTORC1 and promotes tumorigenesis; breast and liver cancer cell lines |
| SELENBP1 | selenium binding protein 1 | Tumor growth, progression, survival; lung cancer |
| | solute carrier family 45 member 3 | SLC45A3-ERG fusion, survival; PCa |
| SMC4 | structural maintenance of chromosomes 4 | TGFβ/Smad signaling, cell invasion; glioma cells |
| | transmembrane protein 87A | Cell proliferation and metastasis; gastric cancer |
| | UBX domain protein 2B | |
| | serine and arginine rich splicing factor 10 | Maintenance of oncogenic features; colon cancer cells |
| C3orf67 | Chromosome 3 open reading frame 67 | |
| C14orf169 (NO66) | Chromosome 14 open reading frame 169 | Osteolytic lesions, invasion and metastasis; PCa |
| | CD27 antisense RNA 1 | Progression; acute myeloid leukemia |
Figure 2. Performance of the TP53 mutation status-associated transcriptomic signature for BCR prediction in the discovery dataset (TCGA-PRAD) and five external datasets (GSE54460 and others). The suffixes “-linear”, “-polynomial” and “-radial” indicate the kernel functions used in the SVM models. For a patient in GSE70769, the output BCR label and decision value, i.e., the transcriptomic risk score (TRS), were predicted by the model trained on the GSE70768 dataset. For patients in the other cohorts, the labels and scores were predicted via LOOCV. Together with the actual BCR labels, the output BCR labels are used to build a 2 × 2 contingency table for estimating the p-value, and the TRSs are used to generate the ROC curve. Sn and Sp denote sensitivity and specificity, respectively.
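The evaluation described in this caption can be sketched in plain Python. The example below is illustrative only: the labels and decision values are made-up toy data, the threshold of 0 on the decision value is an assumption, and the two-sided p-value uses one common convention (summing all tables no more probable than the observed one), which may differ from the variant used in the study.

```python
from math import comb


def contingency(actual, predicted):
    """Return (tp, fp, fn, tn) for binary labels where 1 = BCR."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def fisher_exact_two_sided(tp, fp, fn, tn):
    """Two-sided Fisher's exact test p-value for a 2 x 2 table."""
    row1, row2 = tp + fp, fn + tn      # predicted BCR+ / predicted BCR-
    col1 = tp + fn                     # actual BCR+
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(tp)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def auc(actual, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: actual BCR labels and held-out decision values (hypothetical TRS).
actual = [1, 1, 1, 0, 0, 0, 0, 0]
trs = [0.9, 0.4, 0.7, 0.1, -0.3, 0.5, -0.8, -0.2]
predicted = [1 if s > 0 else 0 for s in trs]  # assumed threshold at 0

tp, fp, fn, tn = contingency(actual, predicted)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
p_value = fisher_exact_two_sided(tp, fp, fn, tn)
roc_auc = auc(actual, trs)
```

The predicted labels feed the contingency table and p-value, while the continuous scores feed the AUC, which mirrors the split described in the caption.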
Figure 3. Association between RFS stratification and the BCR partition predicted using the TP53 mutation status-associated prognostic transcriptomic signature in the discovery dataset (TCGA-PRAD) and five external datasets (GSE54460 and others). The suffixes “-linear”, “-polynomial” and “-radial” indicate the kernel functions used in the SVM models. The output BCR label (pre-BCR+ or pre-BCR-) of a patient in GSE70769 is predicted by the model trained on the GSE70768 dataset. For patients in the other cohorts, the labels are predicted via LOOCV. The survival profiles of pre-BCR+ and pre-BCR- samples are depicted by red and black Kaplan–Meier curves, respectively.
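For readers unfamiliar with Kaplan–Meier curves, the product-limit estimate behind each curve can be sketched as below. The follow-up times, event flags, and group assignments are made-up toy values, and `kaplan_meier` is a hypothetical helper name; this is a minimal sketch of the standard estimator, not the code used in the study.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time for each patient
    events : 1 if relapse was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # relapses observed exactly at time t
        n_leaving = 0  # patients leaving the risk set at t (events + censored)
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= n_leaving
    return curve


# Toy cohort split by a hypothetical predicted label (pre-BCR+ vs pre-BCR-).
pre_pos_curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 0, 1])
pre_neg_curve = kaplan_meier([10, 15, 30, 40], [0, 1, 0, 0])
```

Plotting each list as a step function gives one curve per predicted group; a curve that drops earlier and lower indicates shorter relapse-free survival in that group.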
Results from Cox regression model analysis.╫
| Data ID | M-1 (TRS): R² | M-1 (TRS): BIC¶ | M-1 (TRS): p-value | M-2 (CPFs): R² | M-2 (CPFs): BIC | M-2 (CPFs): p-value | M-3 (TRS + CPFs): R² | M-3 (TRS + CPFs): BIC | M-3 (TRS + CPFs): p-value |
|---|---|---|---|---|---|---|---|---|---|
| TCGA-PRAD | 0.43 | 475.18 | 1.1 × 10⁻¹⁰ | 0.44 | 421.18 | 5.8 × 10⁻⁷ | 0.63 | 403.45 | 2.2 × 10⁻¹¹ |
| GSE54460 | 0.12 | 417.59 | 5.8 × 10⁻⁴ | 0.16 | 429.32 | 7.5 × 10⁻³ | 0.23 | 426.79 | 7.6 × 10⁻⁴ |
| GSE84042 | 0.36 | 99.29 | 2.2 × 10⁻³ | 0.17 | 109.63 | 2.6 × 10⁻¹ | 0.51 | 104.67 | 1.3 × 10⁻² |
| GSE21032 | 0.09 | 205.6 | 7.4 × 10⁻³ | 0.58 | 192.68 | 8.5 × 10⁻⁹ | 0.58 | 195.52 | 2.1 × 10⁻⁸ |
| GSE70768 | 0.60 | 126.27 | 9.2 × 10⁻⁸ | 0.41 | 143.26 | 1.9 × 10⁻³ | 0.72 | 130.37 | 9.8 × 10⁻⁸ |
| GSE70769 | 0.09 | 327.67 | 9.0 × 10⁻³ | 0.17 | 308.64 | 9.9 × 10⁻³ | 0.25 | 306.2 | 1.2 × 10⁻³ |
╫ The three models (M-1, M-2 and M-3) are specified by the included predictor variable(s) for cancer relapse-free survival. TRS: transcriptomic risk score. CPFs: clinicopathologic prognostic factors. See the main text for a more detailed description.
¶ BIC: Bayesian Information Criterion.
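As a quick consistency check, the R² values in the Cox regression table can be summarized to reproduce the figures quoted in the abstract (28.2% on average for TRS alone; 23–72% with a median of 54.5% for TRS plus CPFs). The sketch below only re-does that arithmetic on the tabulated values; the variable names are illustrative.

```python
from statistics import mean, median

# R² for model M-1 (TRS only), copied from the table.
r2_trs = {
    "TCGA-PRAD": 0.43, "GSE54460": 0.12, "GSE84042": 0.36,
    "GSE21032": 0.09, "GSE70768": 0.60, "GSE70769": 0.09,
}

# R² for model M-3 (TRS + CPFs), copied from the table.
r2_combined = {
    "TCGA-PRAD": 0.63, "GSE54460": 0.23, "GSE84042": 0.51,
    "GSE21032": 0.58, "GSE70768": 0.72, "GSE70769": 0.25,
}

mean_trs_pct = 100 * mean(r2_trs.values())
range_combined_pct = (100 * min(r2_combined.values()),
                      100 * max(r2_combined.values()))
median_combined_pct = 100 * median(r2_combined.values())

print(f"TRS alone, mean R²: {mean_trs_pct:.1f}%")
print(f"TRS + CPFs, R² range: {range_combined_pct[0]:.0f}-{range_combined_pct[1]:.0f}%")
print(f"TRS + CPFs, median R²: {median_combined_pct:.1f}%")
```

With six datasets, the median is the mean of the two middle values (0.51 and 0.58), which is why it does not coincide with any single row of the table.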