| Literature DB >> 31856727 |
Abstract
BACKGROUND: Comprehensive molecular profiling of various cancers and other diseases has generated vast amounts of multi-omics data. Each type of -omics data corresponds to one feature space, such as gene expression, miRNA expression, DNA methylation, etc. Integrating multi-omics data can link different layers of molecular feature spaces and is crucial to elucidate molecular pathways underlying various diseases. Machine learning approaches to mining multi-omics data hold great promise for uncovering intricate relationships among molecular features. However, due to the "big p, small n" problem (i.e., small sample sizes with high-dimensional features), training a large-scale generalizable deep learning model with multi-omics data alone is very challenging.
Keywords: Autoencoder; Biological interaction networks; Data integration; Deep learning; Graph regularization; Multi-omics data; Multi-view learning
Year: 2019 PMID: 31856727 PMCID: PMC6923820 DOI: 10.1186/s12864-019-6285-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 A simple illustration of the proposed framework with two data views, each with an encoder and a decoder. Different views are fused in the latent space, and the fused view is used for supervised learning. Feature interaction networks are incorporated as regularizers into the training objective.
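The two-view design described in Fig. 1 can be sketched in NumPy. This is an illustrative forward pass under assumed shapes and linear encoders/decoders, not the authors' implementation: each view is encoded into a shared latent space, the latent codes are fused by averaging, and each view is reconstructed from its own code (the fused code would feed the supervised head).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_view_params(input_dim, latent_dim):
    # Linear encoder/decoder weights for one data view (e.g., gene expression).
    enc = rng.standard_normal((input_dim, latent_dim)) * 0.1
    dec = rng.standard_normal((latent_dim, input_dim)) * 0.1
    return enc, dec

def encode_and_fuse(views, params):
    # Encode each view into the shared latent space, then fuse by averaging.
    codes = [x @ enc for x, (enc, _) in zip(views, params)]
    fused = sum(codes) / len(codes)
    return codes, fused

def reconstruction_loss(views, codes, params):
    # Sum of per-view mean-squared reconstruction errors.
    total = 0.0
    for x, z, (_, dec) in zip(views, codes, params):
        total += np.mean((x - z @ dec) ** 2)
    return total

# Toy setup: 8 samples, two views with assumed feature dimensions 20 and 10.
n_samples, latent_dim = 8, 4
view_dims = [20, 10]
views = [rng.standard_normal((n_samples, d)) for d in view_dims]
params = [make_view_params(d, latent_dim) for d in view_dims]

codes, fused = encode_and_fuse(views, params)
print(fused.shape)  # (8, 4): one fused latent vector per sample
```

In the full model the reconstruction losses, a supervised loss on the fused code, and the feature-interaction regularizers would be summed and minimized jointly by gradient descent.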
Results for bladder urothelial carcinoma (BLCA) dataset
| Model name | Average precision | AUC |
|---|---|---|
| SVM | 0.587 | 0.688 |
| Decision tree | 0.590 | 0.575 |
| Naive Bayes | 0.456 | 0.635 |
| Random forest | 0.575 | 0.670 |
| AdaBoost | 0.587 | 0.662 |
| Variational AE | 0.528 | 0.563 |
| Adversarial AE | 0.617 | 0.693 |
| Multi-view AE | 0.595 | 0.699 |
| MAE + feat_int | 0.650 | 0.719 |
| MAE + view_sim | 0.652 | 0.723 |
Results for brain lower grade glioma (LGG) dataset
| Model name | Average Precision | AUC |
|---|---|---|
| SVM | 0.591 | 0.713 |
| Decision tree | 0.518 | 0.658 |
| Naive Bayes | 0.568 | 0.742 |
| Random forest | 0.661 | 0.670 |
| AdaBoost | 0.594 | 0.673 |
| Variational AE | 0.628 | 0.642 |
| Adversarial AE | 0.659 | 0.702 |
| Multi-view AE | 0.551 | 0.726 |
| MAE + feat_int | 0.576 | 0.727 |
| MAE + view_sim | 0.737 | 0.819 |
Results using single-omics versus multi-omics on BLCA dataset
| Omics type | Average precision | AUC |
|---|---|---|
| Gene (single-omics) | 0.532 | 0.688 |
| miRNA (single-omics) | 0.368 | 0.507 |
| Protein (single-omics) | 0.399 | 0.567 |
| DNA methylation (single-omics) | 0.601 | 0.634 |
Results using single-omics versus multi-omics on LGG dataset
| Omics type | Average precision | AUC |
|---|---|---|
| Gene (single-omics) | 0.634 | 0.728 |
| miRNA (single-omics) | 0.501 | 0.686 |
| Protein (single-omics) | 0.575 | 0.735 |
| DNA methylation (single-omics) | 0.610 | 0.698 |
AUC scores for predicting PFI and OS on the TCGA pan-cancer dataset
| Model name | AUC (OS) | AUC (PFI) |
|---|---|---|
| SVM | 0.699 | 0.625 |
| Decision Tree | 0.670 | 0.634 |
| Naive Bayes | 0.655 | 0.644 |
| kNN | 0.706 | 0.659 |
| Random Forest | 0.720 | 0.661 |
| AdaBoost | 0.716 | 0.689 |
| MAE + feat_int | 0.765 | 0.721 |
| | 0.763 | |
| | 0.766 | |
AUC scores for PFI with different model architectures
| Number of hidden units | AUC (PFI) |
|---|---|
| 100 | 0.723 |
| 200 | 0.725 |
| 200-100 | 0.726 |
| 100-200 | 0.727 |
| 100-100 | 0.724 |
| 100-100-100 | 0.725 |
| 50-100-200 | 0.726 |
| 100-50-200 | 0.726 |
| 200-100-50 | 0.722 |
Fig. 2 A typical feature interaction network regularizer training loss curve. After about 100 iterations, the training loss (corresponding to the third term in Eq. 16) approaches almost zero, which means the learned molecular feature embeddings become consistent with the provided feature interaction networks.
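The feature-interaction regularizer referred to here is a graph-Laplacian penalty: it is small when features that interact in the provided network have similar learned embeddings. A minimal sketch, assuming a symmetric adjacency matrix `A` over features and an embedding matrix `W` with one row per feature (both names and the toy network are illustrative, not from the paper):

```python
import numpy as np

def graph_laplacian_penalty(W, A):
    # tr(W^T L W) with L = D - A equals 0.5 * sum_ij A_ij * ||w_i - w_j||^2,
    # so it penalizes dissimilar embeddings for interacting features.
    D = np.diag(A.sum(axis=1))
    L = D - A
    return float(np.trace(W.T @ L @ W))

# Toy interaction network over 3 features: features 0 and 1 interact.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])

W_close = np.array([[1.0, 0.0], [1.0, 0.0], [5.0, 5.0]])   # interacting pair identical
W_far   = np.array([[1.0, 0.0], [-1.0, 0.0], [5.0, 5.0]])  # interacting pair differs

print(graph_laplacian_penalty(W_close, A))  # 0.0
print(graph_laplacian_penalty(W_far, A))    # 4.0
```

Driving this term toward zero, as in the loss curve of Fig. 2, pulls the embeddings of connected features together, which is what "consistent with the provided feature interaction networks" means here.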