Shengping Zhang, Yafei Xu, Xinjie Hui, Fei Yang, Yueming Hu, Jianlin Shao, Hui Liang, Yejun Wang.
Abstract
Prostate cancer is a leading male malignancy worldwide, yet prognosis prediction remains quite inaccurate. This study aimed to determine whether the prognosis of prostate cancer is associated with the genetic mutation profile, and to build an accurate prognostic predictor based on genetic signatures. Patients diagnosed with prostate cancer in The Cancer Genome Atlas (TCGA) were used for prognostic stratification, and somatic gene mutation profiles were compared between prognostic groups. The genetic features were then used to train machine-learning models to predict prostate cancer prognosis. No gene showed a significant difference in somatic mutation rate between prognostic groups. A total of 43 atypical genes were screened to build a support vector machine model for predicting prostate cancer prognosis, with average accuracies of 66% and 64% under 5-fold cross-validation and training-testing evaluation, respectively. When combined with the National Institute for Health and Care Excellence (NICE) features, the model improved further, reaching a 5-fold cross-validation accuracy of ~71%, much better than NICE alone (62%). To our knowledge, this is the first study of the relationship between genome-wide somatic mutations and prostate cancer prognosis, and it developed an effective prognostic prediction model based on the atypical genetic signatures.
Keywords: atypical features; prognosis prediction; prostate cancer; somatic mutation; support vector machine
Year: 2017 PMID: 29158798 PMCID: PMC5665042 DOI: 10.7150/jca.21261
Source DB: PubMed Journal: J Cancer ISSN: 1837-9664 Impact factor: 4.207
Figure 1. Prognostic stratification of TCGA PCa cases. (a) Stratification of PCa cases by biochemical recurrence status and by tumor status at last follow-up, and the relationship between the two. (b) Stratification of PCa cases by biochemical recurrence status and by NICE criteria, and the relationship between the two. (c) Stratification of PCa cases by tumor status at last follow-up and by NICE criteria, and the relationship between the two. Stacked bar diagrams are shown, with percentages summing to 100%. The number of cases in each subgroup is indicated. Chi-square tests were performed, with the p values indicated in the upper right corner.
Sample size summary and comparison of somatic mutation profiles between PCa prognostic groups
| Recurrence Status | | Tumor Status | |
|---|---|---|---|
| Recurrence # | 58 | With Tumor # | 80 |
| Non_recurrence # | 366 | Tumor Free # | 308 |
| Sign. Genes # | 0 | Sign. Genes # | 0 |
Note: Rate comparisons were performed with both Chi-square tests with FDR correction and EBT.
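The chi-square comparison with FDR correction mentioned in the note can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the gene names and mutation counts are hypothetical, only the group sizes (58 and 366) come from the table above, the Benjamini-Hochberg procedure is assumed as the FDR method because the source does not name one, and the EBT comparison is not covered.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the upper tail
    of the chi-square distribution equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return stat, math.erfc(math.sqrt(stat / 2.0))

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Group sizes from the table above; mutation counts are HYPOTHETICAL.
N_RECURRENCE, N_NON_RECURRENCE = 58, 366
toy_counts = {"GENE_A": (6, 20), "GENE_B": (3, 19), "GENE_C": (1, 5)}

genes = list(toy_counts)
raw_p = []
for gene in genes:
    mut_rec, mut_non = toy_counts[gene]
    _, p = chi_square_2x2(mut_rec, N_RECURRENCE - mut_rec,
                          mut_non, N_NON_RECURRENCE - mut_non)
    raw_p.append(p)

for gene, p, q in zip(genes, raw_p, benjamini_hochberg(raw_p)):
    print(f"{gene}: p={p:.3f}, BH-adjusted={q:.3f}")
```

A gene would count toward "Sign. Genes #" only if its adjusted p-value fell below the chosen threshold; the threshold itself is not stated in this record.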
The list of 43 genes used for PCa prognosis classification
| Signature genes | | | |
|---|---|---|---|
| AHNAK2 | FAM47C | MUC2 | SACS |
| ANKRD30A | FAT2 | MUC4 | SALL1 |
| ANKRD36C | FAT4 | MYH11 | SCN5A |
| APOB | FBN3 | MYT1L | SPOP |
| ATP13A5 | FLG2 | NOD1 | SRCAP |
| BAI3 | FRG1B | PCDHA12 | TP53 |
| CACNA1A | HSPG2 | PIK3CA | TRPM6 |
| CACNA1E | KMT2D | PTEN | USH2A |
| CDH23 | KRTAP4-9 | PTH2 | ZNF208 |
| CNTNAP5 | LPHN3 | PTPRC | ZNF91 |
| EPB41L3 | MUC16 | RYR1 | |
Figure 2. Prediction of PCa prognosis with models based on genetic features. (a) ROC curves of the 5-, 10-, 20-, 30- and 43-gene genetic models (f5, f10, f20, f30 and f43, respectively). The average results of 5-fold cross-validation are shown. (b) AUC and overall accuracy of prognosis prediction models with varied feature size. (c) Comparison of the AUC and overall accuracy of the f43 model with those of models based on the topN and mRMR feature selection strategies. (d) Performance of models under 5-fold cross-validation (CV) and 5-fold training-testing (TT).
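The evaluation scheme named in this caption (5-fold cross-validation, reporting average accuracy and AUC) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' code: the toy mutation vectors are hypothetical, and `fit_centroid` is a simple nearest-centroid stand-in used in place of the support vector machine so that the example has no third-party dependencies.

```python
import random

def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds, keeping class ratios similar."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def auc(scores, labels):
    """AUC as the chance a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, threshold=0.0):
    """Fraction of samples whose thresholded score matches the label."""
    predicted = [1 if s > threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predicted, labels)) / len(labels)

def fit_centroid(X, y):
    """Stand-in classifier (NOT an SVM): compares distances to class centroids."""
    def centroid(cls):
        rows = [x for x, label in zip(X, y) if label == cls]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    def score(x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        return d0 - d1  # positive means closer to the class-1 centroid
    return score

def cross_validate(X, y, fit, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, average both metrics."""
    accs, aucs = [], []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        score = fit([X[i] for i in train], [y[i] for i in train])
        s = [score(X[i]) for i in held_out]
        t = [y[i] for i in held_out]
        accs.append(accuracy(s, t))
        aucs.append(auc(s, t))
    return sum(accs) / k, sum(aucs) / k

# HYPOTHETICAL toy data: two binary "gene mutated?" features per case.
X = [[1, 0] for _ in range(10)] + [[0, 1] for _ in range(10)]
y = [1] * 10 + [0] * 10
mean_acc, mean_auc = cross_validate(X, y, fit_centroid, k=5)
print(f"mean accuracy={mean_acc:.2f}, mean AUC={mean_auc:.2f}")
```

Passing the classifier in as `fit` keeps the fold logic independent of the model, so an SVM implementation could be substituted without changing the evaluation code.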
Figure 3. Prediction of PCa prognosis with models based on the combined NICE and genetic features. (a) The classification performance of NICE on PCa prognosis stratified by tumor status or recurrence. Bootstrapping analysis was performed and the results are presented as mean ± sd. (b) ROC curves of the 43-gene genetic model (f43) and the model based on the combined NICE and genetic features (f43+NICE). The average results of 5-fold cross-validation are shown. (c) Comparison of the overall accuracy of different prognosis prediction models. Student's t-tests were performed, and an asterisk represents p < 0.05.
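The "mean ± sd" bootstrapping summary in panel (a) can be illustrated with a short sketch. The record does not say what was resampled or how many resamples were drawn, so this assumes cases are resampled with replacement and accuracy is recomputed each time; the function name, the default of 1000 resamples, and the toy predictions are all hypothetical.

```python
import random
import statistics

def bootstrap_accuracy(predicted, actual, n_resamples=1000, seed=0):
    """Mean and sample sd of accuracy over bootstrap resamples of the cases."""
    rng = random.Random(seed)
    n = len(actual)
    accuracies = []
    for _ in range(n_resamples):
        # Draw n case indices with replacement and score that resample.
        sample = [rng.randrange(n) for _ in range(n)]
        correct = sum(predicted[i] == actual[i] for i in sample)
        accuracies.append(correct / n)
    return statistics.mean(accuracies), statistics.stdev(accuracies)

# HYPOTHETICAL toy labels: 100 cases, 70 of them classified correctly.
actual = [1] * 50 + [0] * 50
predicted = [1] * 35 + [0] * 15 + [0] * 35 + [1] * 15
mean_acc, sd_acc = bootstrap_accuracy(predicted, actual)
print(f"accuracy = {mean_acc:.3f} ± {sd_acc:.3f}")
```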