Yiru Zhao, Yifan Zhou, Yuan Liu, Yinyi Hao, Menglong Li, Xuemei Pu, Chuan Li, Zhining Wen.
Abstract
BACKGROUND: The aim of gene expression-based clinical modelling in tumorigenesis is not only to predict clinical endpoints accurately, but also to reveal genomic characteristics for downstream analysis that help explain the mechanisms of cancer. Most conventional machine learning methods involve a gene filtering step, in which tens of thousands of genes are first filtered on their expression levels by a statistical method with an arbitrary cutoff. Although gene filtering reduces the feature dimension and helps avoid overfitting, it risks discarding pathogenic genes that are important to the disease.
Keywords: Cancer prognosis prediction; Convolutional neural network; Cox regression; RNA-sequencing; Stationary wavelet transform
Year: 2020; PMID: 32429941; PMCID: PMC7236453; DOI: 10.1186/s12859-020-03544-z
Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169
Fig. 1 The workflow of our study
Fig. 2 The AUCs achieved with different wavelet functions for the prediction of tumor stages and 3-year overall survival. a The AUCs for predicting tumor stages across cancer types in the validation set. b The AUCs for predicting 3-year overall survival across cancer types in the validation set
The detailed information of the data sets for tumor stage prediction
| Cancer type | # of all samples | # of positive samples | # of negative samples | Proportion of 1/0 samples (positive:negative) | Wavelet | AUC |
|---|---|---|---|---|---|---|
| BLCA | 403 | 271 | 132 | 1:0.49 | | 0.72 |
| BRCA | 1055 | 267 | 788 | 1:2.95 | | 0.64 |
| COAD | 442 | 190 | 252 | 1:1.33 | | 0.65 |
| HNSC | 430 | 336 | 94 | 1:0.28 | | 0.69 |
| KIRC | 524 | 204 | 320 | 1:1.57 | | 0.75 |
| KIRP | 259 | 66 | 193 | 1:2.92 | | 0.83 |
| LIHC | 347 | 88 | 259 | 1:2.94 | | 0.63 |
| LUAD | 505 | 110 | 395 | 1:3.59 | | 0.59 |
| LUSC | 492 | 91 | 401 | 1:4.41 | | 0.57 |
| SKCM | 424 | 195 | 229 | 1:1.17 | | 0.62 |
| STAD | 350 | 186 | 164 | 1:0.88 | | 0.60 |
| THCA | 503 | 167 | 336 | 1:2.01 | | 0.64 |
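The "Proportion of 1/0 samples" column can be reproduced from the positive and negative counts by scaling the positive class to 1. The following is a minimal sketch of that arithmetic for three rows of the tumor stage table; the helper name `class_ratio` is hypothetical and is not taken from the study.

```python
# (positive, negative) sample counts copied from three rows of the tumor stage table.
counts = {
    "BLCA": (271, 132),
    "BRCA": (267, 788),
    "LUSC": (91, 401),
}


def class_ratio(positive: int, negative: int) -> str:
    """Format the positive:negative ratio with the positive class scaled to 1.

    Hypothetical helper for illustration only.
    """
    return f"1:{negative / positive:.2f}"


for cancer, (pos, neg) in counts.items():
    # Total samples followed by the class ratio, e.g. BLCA 403 1:0.49
    print(cancer, pos + neg, class_ratio(pos, neg))
```

A ratio well above 1:1, as for LUSC, indicates that negative samples heavily outnumber positive ones, which is worth keeping in mind when reading the AUC column.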
The detailed information of the data sets for 3-year overall survival prediction
| Cancer type | # of all samples | # of positive samples | # of negative samples | Proportion of 1/0 samples (positive:negative) | Wavelet | AUC |
|---|---|---|---|---|---|---|
| BLCA | 248 | 161 | 87 | 1:0.54 | | 0.61 |
| HNSC | 311 | 178 | 133 | 1:0.75 | | 0.61 |
| KIRC | 400 | 109 | 291 | 1:2.67 | | 0.71 |
| LGG | 240 | 79 | 161 | 1:2.04 | | 0.89 |
| LIHC | 196 | 104 | 92 | 1:0.88 | | 0.65 |
| LUAD | 267 | 134 | 133 | 1:0.99 | | 0.67 |
| LUSC | 302 | 154 | 148 | 1:0.96 | | 0.61 |
| OV | 274 | 112 | 162 | 1:1.45 | | 0.59 |
| SKCM | 340 | 112 | 228 | 1:2.04 | | 0.66 |
| UCEC | 279 | 70 | 209 | 1:2.99 | | 0.64 |
Fig. 3 The mean AUCs and the distribution of AUCs achieved by SWT-CNN and SVM over 100 sampling runs for the prediction of tumor stages and 3-year overall survival. a Mean AUCs achieved by SWT-CNN and SVM for predicting tumor stages. b The distribution of AUCs achieved by SWT-CNN and SVM for predicting tumor stages. c Mean AUCs achieved by SWT-CNN and SVM for predicting 3-year overall survival. d The distribution of AUCs achieved by SWT-CNN and SVM for predicting 3-year overall survival
Fig. 4 The results of Kaplan-Meier survival analysis of all cancer types
Genes significantly associated with the 3-year overall survival of OV in the univariate Cox regression
| Characteristics | P value |
|---|---|
| SACS | 0.0002 |
| SSC5D | 0.0002 |
| TSHR<sup>a</sup> | 0.0003 |
| CTD-2006C1.13 | 0.0004 |
| LATS1 | 0.0008 |
| HSPG2<sup>a</sup> | 0.0008 |
| AGPAT9 | 0.0019 |
| STK38L | 0.0032 |
| CACNA1C | 0.0033 |
| AC005330.2<sup>a</sup> | 0.0034 |
| RP11-254F7.2 | 0.0047 |
| MYH2 | 0.0048 |
| ALDOA | 0.0049 |
| HIGD2A | 0.0075 |
| COL1A1 | 0.0101 |
| ANAPC7<sup>a</sup> | 0.0103 |
| GIP | 0.0110 |
| BRD1 | 0.0117 |
| MCL1 | 0.0126 |
| IGDCC4 | 0.0137 |
| FABP4<sup>a</sup> | 0.0142 |
| CHCHD10 | 0.0147 |
| C12orf5 | 0.0148 |
| COL3A1 | 0.0152 |
| FAM196B | 0.0171 |
| CTD-2583A14.10 | 0.0181 |
| DLX4 | 0.0182 |
| ANKRD46 | 0.0183 |
| ABHD15 | 0.0189 |
| COX4I1<sup>a</sup> | 0.0191 |
| EPHB4 | 0.0202 |
| RP5-1024G6.5 | 0.0204 |
| RPL10 | 0.0218 |
| GP9 | 0.0221 |
| RPL15 | 0.0231 |
| SLC34A2<sup>a</sup> | 0.0243 |
| LINC00891 | 0.0246 |
| CD81 | 0.0247 |
| B4GALT4 | 0.0253 |
| BEST3 | 0.0254 |
| ARHGAP5 | 0.0262 |
| CCDC38 | 0.0263 |
| RP11-77K12.10<sup>a</sup> | 0.0269 |
| MMP2 | 0.0284 |
| GLMN | 0.0337 |
| MAFA | 0.0343 |
| NCBP2<sup>a</sup> | 0.0348 |
| DOK6 | 0.0379 |
| P2RY6<sup>a</sup> | 0.0380 |
| RP11-282O18.3 | 0.0381 |
| FOLR1 | 0.0394 |
| ORAI2 | 0.0411 |
| FNBP1L | 0.0412 |
| NLGN2 | 0.0413 |
| LL22NC03-2H8.4 | 0.0421 |
| IER3IP1 | 0.0424 |
| TRPC4 | 0.0427 |
| RPS6 | 0.0429 |
| RP11-894P9.2 | 0.0435 |
| RPS25 | 0.0438 |
| FTH1 | 0.0448 |
| RP11-867G23.10<sup>a</sup> | 0.0453 |
| NPM2 | 0.0467 |
| AP001372.2 | 0.0468 |
| HOXD3 | 0.0469 |
| XX-C283C717.1 | 0.0477 |
| RGMB-AS1 | 0.0500 |

<sup>a</sup> Genes selected by multivariate Cox regression
Fig. 5 The results of the risk score model for predicting the 3-year overall survival of OV. a The survival curves of high-risk and low-risk patients in the OV data set stratified by the risk score model. b The distribution of survival times of high-risk and low-risk patients stratified by the risk score model and SWT-CNN. c The ROC curves achieved by the risk score model and 100 runs of SWT-CNN
Fig. 6 The distribution of survival times of high-risk and low-risk patients for the other cancer types stratified by the risk score model and SWT-CNN
Median survival times of the high-risk and low-risk patients stratified by the risk score model and SWT-CNN
| Type | Risk stratification | Median survival time (risk score) | Median survival time (SWT-CNN) |
|---|---|---|---|
| BLCA | Low risk | 715 | 1401 |
| BLCA | High risk | 413 | 508 |
| HNSC | Low risk | 1218 | 1157 |
| HNSC | High risk | 556 | 862 |
| KIRC | Low risk | 1876 | 1614 |
| KIRC | High risk | 1019 | 787.5 |
| LGG | Low risk | 1341 | 1423.5 |
| LGG | High risk | 722 | 560 |
| LIHC | Low risk | 1566 | 1172.5 |
| LIHC | High risk | 412 | 425 |
| LUAD | Low risk | 1268 | 1125.5 |
| LUAD | High risk | 719.5 | 807 |
| LUSC | Low risk | 1190 | 1111 |
| LUSC | High risk | 927 | 1004 |
| OV | Low risk | 1355 | 1238 |
| OV | High risk | 850.5 | 1155 |
| SKCM | Low risk | 1814 | 1716 |
| SKCM | High risk | 1154 | 1093 |
| UCEC | Low risk | 1700 | 1559.5 |
| UCEC | High risk | 1249 | 1223 |
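One simple way to read the median survival table is to compare, for each model, the gap between the low-risk and high-risk medians. The sketch below does this for three cancer types using values copied from the table. The helper `median_gap` and the choice of a plain difference are illustrative assumptions, not a statistic reported by the study.

```python
# Median survival times copied from the table, as (low risk, high risk),
# in the table's own time unit.
medians = {
    "BLCA": {"risk_score": (715, 413), "swt_cnn": (1401, 508)},
    "LGG": {"risk_score": (1341, 722), "swt_cnn": (1423.5, 560)},
    "OV": {"risk_score": (1355, 850.5), "swt_cnn": (1238, 1155)},
}


def median_gap(low: float, high: float) -> float:
    """Difference between low-risk and high-risk median survival.

    Hypothetical helper for illustration; a larger gap means the two
    groups are further apart on this one summary measure.
    """
    return low - high


gaps = {
    cancer: {model: median_gap(*pair) for model, pair in models.items()}
    for cancer, models in medians.items()
}

for cancer, by_model in gaps.items():
    print(cancer, by_model)
```

On these rows the gap for OV is wider under the risk score model, while for BLCA and LGG it is wider under SWT-CNN. A difference of medians says nothing about statistical significance, so it complements rather than replaces the Kaplan-Meier analysis.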
Fig. 7 The AUCs achieved by the risk score model and the mean AUCs achieved by 100 runs of SWT-CNN for predicting the 3-year overall survival in all data sets
Fig. 8 The architecture of the SWT-CNN model in our study