Li Zhang1, Chenkai Lv1, Yaqiong Jin2,3, Ganqi Cheng1, Yibao Fu1, Dongsheng Yuan1, Yiran Tao1, Yongli Guo2,3, Xin Ni2,3,4, Tieliu Shi1,4.
Abstract
High-risk neuroblastoma is a very aggressive disease, with excessive tumor growth and poor outcomes. Proper stratification of high-risk patients by prognostic outcome is important for treatment. However, survival stratification within high-risk neuroblastoma is still lacking. To fill this gap, we adopted a deep learning algorithm, Autoencoder, to integrate multi-omics data, and combined it with K-means clustering to identify two subtypes with significant survival differences. When compared with PCA, iCluster, and DGscore for classification based on multi-omics data integration, the Autoencoder-based classification outperformed the alternative approaches. Furthermore, we validated the classification in two independent datasets by training machine-learning classification models, and confirmed its robustness. Functional analysis revealed that MYCN amplification occurred more frequently in the ultra-high-risk subtype, in accordance with the overexpression of MYC/MYCN targets in this subtype. In summary, prognostic subtypes identified by deep learning-based multi-omics integration could not only improve our understanding of the molecular mechanism, but also help clinicians make decisions.
Keywords: MYCN amplification; deep learning; high-risk neuroblastoma; machine learning; multi-omics data integration
Year: 2018 PMID: 30405689 PMCID: PMC6201709 DOI: 10.3389/fgene.2018.00477
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
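The integration step summarized in the abstract (Autoencoder features followed by K-means with two clusters) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the data are synthetic, the network is a single tanh bottleneck trained by plain gradient descent, and the names `train_autoencoder` and `kmeans` are hypothetical helpers.

```python
import numpy as np

def train_autoencoder(X, n_hidden=2, lr=0.01, epochs=500, seed=0):
    """Train a one-hidden-layer tanh autoencoder with full-batch gradient
    descent on squared reconstruction error; return bottleneck features."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # encoder
        X_hat = H @ W2 + b2                # linear decoder
        err = (X_hat - X) / n              # gradient of loss w.r.t. X_hat
        dH = (err @ W2.T) * (1.0 - H**2)   # backpropagate through tanh
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return np.tanh(X @ W1 + b1)

def kmeans(Z, k=2, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    labels = np.zeros(len(Z), dtype=int)
    for _ in range(n_iter):
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic stand-ins for two omics layers measured on the same 60 samples.
rng = np.random.default_rng(42)
group = np.repeat([0, 1], 30)                        # hidden subtype
ge = rng.normal(size=(60, 20)) + 3.0 * group[:, None]    # "gene expression"
cna = rng.normal(size=(60, 10)) - 3.0 * group[:, None]   # "copy number"
X = np.hstack([ge, cna])                             # concatenate the layers
X = (X - X.mean(axis=0)) / X.std(axis=0)             # standardize features

Z = train_autoencoder(X)       # low-dimensional integrated features
labels = kmeans(Z, k=2)        # two candidate subtypes
print("cluster sizes:", np.bincount(labels, minlength=2))
```

In a real analysis the cluster labels would then be compared by survival (for example with a log-rank test on EFS and OS), which this sketch does not attempt.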
Figure 1. Overview workflow for the identification of prognostic subtypes by Autoencoder-based multi-omics data integration in high-risk neuroblastoma.
Figure 2. The Kaplan–Meier curves for EFS or OS of two identified subtypes by three multi-omics integration algorithms, Autoencoder (A,B), PCA (C,D), and iCluster (E,F).
Figure 3. Receiver operating characteristic (ROC) curve for four classifiers, including logistic regression, Naïve Bayes, SVM, and XGBoost, that predict the subtypes of samples from two independent datasets, (A) gene expression data from SEQC external validation cohort, and (B) CNA data from TARGET internal validation cohort.
Performance of four classifiers using the training dataset.
| ANOVA + SVM | GE | 56 | 0.9962 | 0.7553 | 0.8446 |
| | CNA | 30 | 0.6586 | 0.5937 | 0.5159 |
| ANOVA + naïve Bayes | GE | 46 | 0.9299 | 0.6755 | 0.8291 |
| | CNA | 24 | 0.6019 | 0.5234 | 0.5506 |
| ANOVA + logistic regression | GE | 44 | 0.9703 | 0.7059 | 0.6053 |
| | CNA | 15 | 0.6782 | 0.6135 | 0.5699 |
| XGBoost | GE | 64 | 0.9602 | 0.7338 | 0.8025 |
| | CNA | 30 | 0.954 | 0.6559 | 0.6317 |
GE, gene expression; CNA, copy number alteration; ANOVA, analysis of variance; SVM, support vector machine; AUC, area under the curve; Average accuracy, average of the accuracies from 10-fold cross-validation; Average AUC, average of the AUC values from 10-fold cross-validation.
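The "ANOVA + classifier" entries and the 10-fold cross-validation averages defined in the footnote can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the data are synthetic, a nearest-centroid rule stands in for SVM, naïve Bayes, and logistic regression, and feature selection is repeated inside each training fold (whether the original analysis did so is not stated here). The names `anova_f` and `cv_average_accuracy` are hypothetical.

```python
import numpy as np

def anova_f(X, y):
    """One-way ANOVA F statistic for each column of X across the classes in y."""
    classes = np.unique(y)
    grand_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    df_between = len(classes) - 1
    df_within = len(y) - len(classes)
    # Small constant guards against division by zero for constant features.
    return (between / df_between) / (within / df_within + 1e-12)

def cv_average_accuracy(X, y, k_features=5, n_folds=10, seed=0):
    """Average accuracy over n_folds, selecting features on training folds only."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        X_train, y_train = X[train_idx], y[train_idx]
        # Rank features by F statistic using the training fold only.
        top = np.argsort(anova_f(X_train, y_train))[::-1][:k_features]
        classes = np.unique(y_train)
        centroids = np.array(
            [X_train[y_train == c][:, top].mean(axis=0) for c in classes]
        )
        # Nearest-centroid prediction (stand-in for the classifiers in the table).
        X_test = X[test_idx][:, top]
        dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        pred = classes[dists.argmin(axis=1)]
        accuracies.append((pred == y[test_idx]).mean())
    return float(np.mean(accuracies))

# Synthetic example: 100 samples, 50 features, the first 5 carry subtype signal.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 50))
X[:, :5] += 2.0 * y[:, None]

avg_acc = cv_average_accuracy(X, y, k_features=5, n_folds=10)
print(f"average 10-fold accuracy: {avg_acc:.3f}")
```

Selecting features within each fold keeps the held-out samples from influencing which features are chosen, so the averaged accuracy is not inflated by the selection step.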
Figure 4. The Kaplan–Meier curves for EFS or OS of two predicted subtypes for the high-risk tumors from the SEQC external validation cohort (A,B) and the TARGET internal validation cohort (C,D).
Hallmark gene sets identified by OEA (FDR < 0.05).
| Up | HALLMARK_MYC_TARGETS_V2 | MYC targets, variant 2 | 9.81E-07 | 4.9E-05 |
| Down | HALLMARK_INTERFERON_ALPHA_RESPONSE | Interferon-alpha response | 5.14E-03 | 5.76E-01 |