Sunkyu Kim, Keonwoo Kim, Junseok Choe, Inggeol Lee, Jaewoo Kang.
Abstract
MOTIVATION: Recent advances in deep learning have offered solutions to many biomedical tasks. However, applying deep learning to survival analysis using human cancer transcriptome data remains a challenge. Because the number of genes, which are the input variables of the survival model, is larger than the number of available cancer patient samples, deep-learning models are prone to overfitting. To address this issue, we introduce a new deep-learning architecture called VAECox, which uses transfer learning and fine-tuning.
Year: 2020 PMID: 32657401 PMCID: PMC7355236 DOI: 10.1093/bioinformatics/btaa462
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Statistics of transcriptomics data for 10 cancer types on which VAECox was trained
| Cancer type | # All (before pre-processing) | # Uncensored (before pre-processing) | # All (after pre-processing) | # Uncensored (after pre-processing) |
|---|---|---|---|---|
| BLCA | 398 | 108 | 286 | 72 |
| BRCA | 1039 | 104 | 989 | 100 |
| HNSC | 522 | 170 | 477 | 160 |
| KIRC | 528 | 162 | 512 | 159 |
| LGG | 507 | 91 | 433 | 69 |
| LIHC | 343 | 91 | 267 | 72 |
| LUAD | 480 | 122 | 440 | 113 |
| LUSC | 477 | 158 | 404 | 134 |
| OV | 578 | 301 | 260 | 147 |
| STAD | 396 | 84 | 374 | 77 |
| CESC | 288 | 60 | 251 | 53 |
| COAD | 347 | 52 | 324 | 46 |
| GBM | 592 | 446 | 158 | 106 |
| KIRP | 264 | 31 | 209 | 23 |
| LAML | 173 | 108 | 149 | 92 |
| PAAD | 181 | 66 | 138 | 45 |
| PRAD | 500 | 8 | 375 | 6 |
| SKCM | 440 | 155 | 401 | 149 |
| THCA | 501 | 14 | 495 | 14 |
| UCEC | 540 | 45 | 505 | 43 |
Fig. 1. (A) The architecture of the VAE model used in this study. A hidden layer is added to both the encoder and the decoder of the original VAE. We trained this VAE model on all the TCGA RNA-seq data of patients with 20 cancer types. (B) The architecture of the VAECox model, which predicts a patient’s hazard ratio. The parameters of the first two layers are transferred from the encoder part of the pre-trained VAE model.
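The parameter transfer described in Fig. 1(B) can be illustrated with a minimal, framework-free sketch. Everything below is an assumption made for illustration only: the layer sizes, the `tanh` activation, the names `vae_encoder`, `cox_model` and `predict_log_hazard`, and the choice to treat the encoder's hidden layer and latent-mean layer as the "first two layers". The random weights merely stand in for a pre-trained encoder, and the scalar output is treated here as a log-hazard risk score.

```python
import copy
import math
import random


def init_layer(n_in, n_out, rng):
    """Create a dense layer with small random weights (toy initialisation)."""
    return {
        "W": [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
        "b": [0.0] * n_out,
    }


def dense(x, layer, activation=None):
    """Apply one dense layer to a single input vector."""
    out = []
    for row, bias in zip(layer["W"], layer["b"]):
        z = sum(w * xi for w, xi in zip(row, x)) + bias
        out.append(activation(z) if activation else z)
    return out


rng = random.Random(0)
N_GENES, N_HIDDEN, N_LATENT = 6, 4, 2  # hypothetical toy sizes

# Stand-in for a pre-trained VAE encoder (assumed: hidden layer + latent mean).
vae_encoder = {
    "hidden": init_layer(N_GENES, N_HIDDEN, rng),
    "latent_mean": init_layer(N_HIDDEN, N_LATENT, rng),
}

# Survival model: the first two layers are copied from the encoder,
# and only the final risk layer is freshly initialised.
cox_model = {
    "layer1": copy.deepcopy(vae_encoder["hidden"]),
    "layer2": copy.deepcopy(vae_encoder["latent_mean"]),
    "output": init_layer(N_LATENT, 1, rng),
}


def predict_log_hazard(model, expression):
    """Map one expression vector to a scalar risk score (assumed log hazard)."""
    h = dense(expression, model["layer1"], math.tanh)
    z = dense(h, model["layer2"], math.tanh)
    return dense(z, model["output"])[0]
```

Copying with `copy.deepcopy` means that later fine-tuning of `cox_model` would not alter the stored encoder weights, which keeps the pre-trained parameters reusable for other cancer types.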
Fig. 2. Box plots of the performance of the following survival prediction models on 10 cancer types: Cox-ridge, Cox-LASSO, Cox-nnet and our model, VAECox. We randomly split the data into training (80%) and test (20%) sets. We repeated this process 10 times and obtained 10 C-index scores. The white triangle in each box denotes the average of the 10 C-index scores. The optimal hyperparameters were selected by fivefold cross-validation on the training set.
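For readers unfamiliar with the evaluation metric in Fig. 2, the following is a minimal sketch of Harrell's concordance index together with a seeded 80/20 split. The function names, the handling of tied risk scores (counted as 0.5), and the rule of skipping pairs with tied times are assumptions of this sketch; the record does not say how the study computed the C-index.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) for one random split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def concordance_index(times, events, risk_scores):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when sample i had an observed event and
    its time is strictly shorter than the time of sample j. The pair is
    concordant when sample i also has the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored samples cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; C-index is undefined")
    return concordant / comparable


# Toy data: shorter survival should come with a higher predicted risk.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.5, 0.1]
toy_c_index = concordance_index(toy_times, toy_events, toy_risk)
```

Repeating the split with ten different `seed` values and averaging the ten resulting scores would mirror the procedure described in the caption.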
Fig. 3. Kaplan–Meier plots and log-rank test results for the test patient samples of the 10 cancer types, using VAECox and Cox-nnet. The patient samples are divided into high- and low-risk groups based on the predicted hazard ratios. A patient sample is included in the high-risk group when its hazard ratio is higher than the median hazard ratio of all patient samples.
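The grouping rule and the test named in Fig. 3 can be sketched as follows. This is a plain two-group log-rank test written from the textbook definition, not the authors' code; the function names and the toy data are hypothetical. The p-value uses the fact that the survival function of a chi-square distribution with one degree of freedom equals `erfc(sqrt(x / 2))`.

```python
import math
import statistics


def split_by_median_risk(risk_scores):
    """Flag a sample as high-risk when its score exceeds the median score."""
    cutoff = statistics.median(risk_scores)
    return [score > cutoff for score in risk_scores]


def logrank_test(times, events, in_high_risk):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    observed = expected = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n_high = sum(1 for i in at_risk if in_high_risk[i])
        deaths = [i for i in at_risk if times[i] == t and events[i]]
        d = len(deaths)
        observed += sum(1 for i in deaths if in_high_risk[i])
        expected += d * n_high / n
        if n > 1:
            # Hypergeometric variance of the high-risk event count at time t.
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank statistic is undefined for these groups")
    chi2 = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy example: the two highest-risk samples also have the shortest survival.
toy_risk = [0.9, 0.8, 0.2, 0.1]
toy_groups = split_by_median_risk(toy_risk)
toy_chi2, toy_p = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], toy_groups)
```

A strict `>` comparison follows the caption's wording, so a sample whose score equals the median falls into the low-risk group.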
Fig. 4. Pearson’s correlation values of the five genes with the highest absolute correlation values for each hidden node of the third layer in the BRCA dataset.
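The ranking behind Fig. 4 can be illustrated with a short sketch: correlate one hidden node's activations with each gene's expression across samples, then keep the genes with the largest absolute Pearson correlation. The function names, the gene labels, and the choice to return 0.0 for a constant vector are assumptions made for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant vector as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def top_correlated_genes(node_activations, expression_by_gene, k=5):
    """Return the k (gene, correlation) pairs with the largest |correlation|."""
    correlations = {
        gene: pearson(node_activations, values)
        for gene, values in expression_by_gene.items()
    }
    ranked = sorted(correlations.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:k]


# Toy data: one hidden node and four hypothetical genes across four samples.
toy_node = [1.0, 2.0, 3.0, 4.0]
toy_expression = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0],
    "GENE_B": [4.0, 3.0, 2.0, 1.0],
    "GENE_C": [1.0, 3.0, 2.0, 4.0],
    "GENE_D": [5.0, 5.0, 5.0, 5.0],
}
toy_top = top_correlated_genes(toy_node, toy_expression, k=2)
```

Ranking by absolute value keeps strongly negative correlations such as `GENE_B` alongside strongly positive ones.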
Fig. 5. Enriched KEGG pathways for each hidden node of the third layer in our VAECox model trained using the BRCA dataset. The pathway enrichment test is conducted using the correlation values between the vector of a hidden node and the vector of a gene’s expression values across all BRCA samples.