Milan Picard1, Marie-Pier Scott-Boyer1, Antoine Bodein1, Olivier Périn2, Arnaud Droit1.
Abstract
Increased availability of high-throughput technologies has generated an ever-growing volume of omics data that seek to portray many different but complementary biological layers, including genomics, epigenomics, transcriptomics, proteomics, and metabolomics. New insights from these data have been obtained by machine learning algorithms that have produced diagnostic and classification biomarkers. Most biomarkers obtained to date, however, include only one omic measurement at a time and thus do not take full advantage of recent multi-omics experiments that now capture the entire complexity of biological systems. Multi-omics data integration strategies are needed to combine the complementary knowledge brought by each omics layer. We have summarized the most recent data integration methods and frameworks into five different integration strategies: early, mixed, intermediate, late, and hierarchical. In this mini-review, we focus on challenges and existing multi-omics integration strategies, paying special attention to machine learning applications.
Keywords: Deep learning; Integration strategy; Machine learning; Multi-omics; Multi-view; Network
Year: 2021 PMID: 34285775 PMCID: PMC8258788 DOI: 10.1016/j.csbj.2021.06.030
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Structure of an interpretable artificial neural network. The input layer is followed by an additional pathway layer, where each node corresponds to a known molecular pathway. If a molecule is known to be involved in a pathway, a connection is made between the two. Hence, important pathways implicated in the outcome receive larger weights during training. Figure inspired by Deng et al. (2020) [73].
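The sparse wiring described in this caption can be sketched as a masked dense layer. The following is a minimal illustration rather than the architecture of Deng et al.: the molecule-to-pathway annotation, the `pathway_layer` helper, the `tanh` activation, and the random (untrained) weights are all assumptions made for the example.

```python
import numpy as np

# Hypothetical toy annotation: 5 molecules (rows) and 3 pathways (columns).
# mask[i, j] = 1 if molecule i is annotated to pathway j, else 0.
mask = np.array([
    [1, 0, 0],
    [1, 1, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
], dtype=float)

rng = np.random.default_rng(0)
weights = rng.normal(size=mask.shape)  # dense weights, untrained in this sketch
bias = np.zeros(mask.shape[1])


def pathway_layer(x, weights, mask, bias):
    """Forward pass in which only annotated molecule-pathway links contribute."""
    # Multiplying by the mask zeroes every connection absent from the annotation.
    return np.tanh(x @ (weights * mask) + bias)


x = rng.normal(size=(4, 5))  # 4 samples, 5 molecules
activations = pathway_layer(x, weights, mask, bias)
print(activations.shape)
```

Because each pathway node only sees its annotated molecules, the magnitude of a node's learned weights can be read as a pathway-level importance signal, which is what makes this kind of layer interpretable.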
Fig. 2. Example of a mixed artificial neural network. Each omics block is first reduced to a latent representation using an independent Stacked Sparse Autoencoder (SAE). The learned representations are then integrated in a final shared layer. The common representation is used for downstream analysis such as prediction or clustering. Figure inspired by Xu et al. (2019) [126].
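The data flow of this mixed strategy can be sketched as follows. This is a shape-level illustration only: a single untrained ReLU layer per block stands in for each stacked sparse autoencoder, and the block names, dimensions, and weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(z):
    return np.maximum(z, 0.0)


def encode(block, w):
    """One-layer encoder standing in for a trained stacked sparse autoencoder."""
    return relu(block @ w)


n_samples = 6
latent_dim = 4

# Hypothetical omics blocks measured on the same samples (rows).
blocks = {
    "transcriptomics": rng.normal(size=(n_samples, 50)),
    "methylation": rng.normal(size=(n_samples, 30)),
}

# One independent encoder per block; weights are random, not trained.
encoders = {
    name: rng.normal(size=(x.shape[1], latent_dim)) * 0.1
    for name, x in blocks.items()
}
latents = [encode(x, encoders[name]) for name, x in blocks.items()]

# Integration happens only here: block-specific latents feed one shared layer.
joint = np.concatenate(latents, axis=1)
w_shared = rng.normal(size=(joint.shape[1], 3)) * 0.1
shared = relu(joint @ w_shared)
print(joint.shape, shared.shape)
```

The point of the design is that each block is compressed separately before integration, so a block with many features does not dominate the shared representation simply because of its size.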
A non-exhaustive list of multi-block dimensionality reduction methods for multi-omics datasets. NMF: Non-negative Matrix Factorization, MOFA: Multi-Omics Factor Analysis, JIVE: Joint and Individual Variation Explained, MO: multi-omic.
| Method | Principle | Purpose | Recent applications |
|---|---|---|---|
| jNMF/intNMF/nNMF | Matrix factorization | Disease subtyping, module detection, biomarker discovery | jNMF found biomarkers in MO and pharmacological data connected to drug sensitivity in cancerous cell lines |
| MOFA/MOFA+ | Bayesian Factor Analysis | Biomarker discovery, systemic knowledge | MOFA found new biomarkers and pathways associated with Alzheimer’s disease based on MO data including proteomics, metabolomics, lipidomics |
| iCluster | Gaussian latent variable model | Disease subtyping, biomarker discovery | iCluster was used to identify subtypes of esophageal carcinoma from genomic, epigenomic and transcriptomic data |
| iClusterPlus | Generalized linear regression | Disease subtyping, biomarker discovery | iClusterPlus was used to identify subtypes of non-responsive samples with ovarian cancer from different omics datasets |
| iClusterBayes | Bayesian integrative clustering | Disease subtyping, biomarker discovery | iClusterBayes was used to identify predictive biomarkers and clinically relevant subtypes on MIB cancer from 5 different omics |
| JIVE/aJIVE | Matrix factorization | Disease subtyping, systemic knowledge, module detection | JIVE was used as a dimension reduction technique to improve survival prediction of patients with glioblastoma from mRNA, miRNA and methylation data |
| Integrated PCA (iPCA) [64] | Generalized PCA | Visualization, prediction | iPCA was used as a dimension reduction technique to improve prediction of outcome on lung cancer from CpG methylation data, mRNA and miRNA expression |
| SLIDE | Matrix factorization | Disease subtyping, module detection, biomarker discovery | SLIDE was used on DNA methylation data and gene, protein and miRNA expression for subtyping patients with breast cancer |
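To make the matrix factorization principle listed in the table concrete, the sketch below shows one common formulation of joint NMF: every omics block X_k (samples by features) is approximated as W H_k, with a sample factor matrix W shared across blocks and a block-specific loading matrix H_k. It uses standard multiplicative updates for the summed squared error and is not necessarily the exact algorithm behind the jNMF, intNMF, or nNMF tools. The function name `joint_nmf`, the toy data, and all parameter values are assumptions.

```python
import numpy as np


def joint_nmf(blocks, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor each non-negative block X_k ~ W @ H_k with a shared W."""
    rng = np.random.default_rng(seed)
    n_samples = blocks[0].shape[0]
    W = rng.random((n_samples, rank)) + eps
    Hs = [rng.random((rank, X.shape[1])) + eps for X in blocks]
    errors = []
    for _ in range(n_iter):
        # Update each block-specific H_k while W is held fixed.
        Hs = [H * (W.T @ X) / (W.T @ W @ H + eps) for X, H in zip(blocks, Hs)]
        # Update the shared W using evidence pooled over all blocks.
        numerator = sum(X @ H.T for X, H in zip(blocks, Hs))
        denominator = W @ sum(H @ H.T for H in Hs) + eps
        W = W * numerator / denominator
        # Track the total squared reconstruction error across blocks.
        errors.append(
            sum(np.linalg.norm(X - W @ H) ** 2 for X, H in zip(blocks, Hs))
        )
    return W, Hs, errors


# Hypothetical non-negative toy blocks sharing the same 20 samples.
rng = np.random.default_rng(1)
toy_blocks = [rng.random((20, 15)), rng.random((20, 10))]
W, Hs, errors = joint_nmf(toy_blocks, rank=3)
print(W.shape, [H.shape for H in Hs])
```

The rows of W give one low-dimensional profile per sample that reflects all blocks at once, which is the representation typically clustered for disease subtyping; the columns of each H_k indicate which features of that block drive each factor.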