Indhupriya Subramanian, Srikant Verma, Shiva Kumar, Abhay Jere, Krishanpal Anamika.
Abstract
To study complex biological processes holistically, it is imperative to take an integrative approach that combines multi-omics data to highlight the interrelationships of the involved biomolecules and their functions. With the advent of high-throughput techniques and the availability of multi-omics data generated from large sets of samples, several promising tools and methods have been developed for data integration and interpretation. In this review, we collected the tools and methods that adopt an integrative approach to analyze multiple omics data and summarized their ability to address applications such as disease subtyping, biomarker prediction, and deriving insights into the data. We provide the methodology, use-cases, and limitations of these tools; a brief account of multi-omics data repositories and visualization portals; and the challenges associated with multi-omics data integration.
Keywords: biomarker prediction; data integration; data repositories; disease subtyping; multi-omics
Year: 2020 PMID: 32076369 PMCID: PMC7003173 DOI: 10.1177/1177932219899051
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
List of multi-omics data repositories.
| Data repository | Web link | Disease | Types of multi-omics data available |
|---|---|---|---|
| The Cancer Genome Atlas (TCGA) | | Cancer | RNA-Seq, DNA-Seq, miRNA-Seq, SNV, CNV, DNA methylation, and RPPA |
| Clinical Proteomic Tumor Analysis Consortium (CPTAC) | | Cancer | Proteomics data corresponding to TCGA cohorts |
| International Cancer Genomics Consortium (ICGC) | | Cancer | Whole genome sequencing, genomic variations data (somatic and germline mutation) |
| Cancer Cell Line Encyclopedia (CCLE) | | Cancer cell line | Gene expression, copy number, and sequencing data; pharmacological profiles of 24 anticancer drugs |
| Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) | | Breast cancer | Clinical traits, gene expression, SNP, and CNV |
| TARGET | | Pediatric cancers | Gene expression, miRNA expression, copy number, and sequencing data |
| Omics Discovery Index | | Consolidated data sets from 11 repositories in a uniform framework | Genomics, transcriptomics, proteomics, and metabolomics |
Abbreviations: CNV, copy number variation; miRNA, microRNA; RPPA, reverse phase protein array; SNP, single-nucleotide polymorphism; SNV, single-nucleotide variant.
Figure 1. Overview of multi-omics data integration tools. The tools/methods are grouped based on their approach and are color coded as per their applications. FSMKL indicates feature selection multiple kernel learning; JIVE, joint and individual variation explained; MCIA, multiple co-inertia analysis; MDI, multiple dataset integration; MFA, multiple factor analysis; MOFA, multi-omics factor analysis; NEMO, neighborhood based multi-omics clustering; PFA, pattern fusion analysis; PMA, penalized multivariate analysis; sMBPLS, sparse multi-block partial least squares; SNF, similarity network fusion; NMF, nonnegative matrix factorization; BCC, Bayesian consensus clustering; PSDF, patient-specific data fusion.
Integrative tools and methods addressing multi-omics applications, their usage and availability along with details of the input data used by each tool in their case study.
| Use-case addressed | Tool/method | Tool/method approach | Supervised/unsupervised | Tool/method link | Tool/method language | Omics data and data type supported | Handling missing data by the tool | Disease studied in case study | Input data used in case study | No. of samples in input data | Input data source |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Disease subtyping | PARADIGM | Probabilistic graphical models using directed factor graphs | Unsupervised | | Python | Multi-omics (numerical) | NA | Glioblastoma multiforme | CNV segmentation; gene expression | 230 patient samples and 10 adjacent normal tissues | McLendon et al., 2008 |
| Disease subtyping | iCluster | Joint latent variable model–based clustering method | Unsupervised | | R package | Copy number; DNA methylation; gene expression (numerical) | NA | Breast cancer | Copy number; gene expression | 37 primary breast cancer and 4 breast cancer cell lines | Pollack et al |
| Disease subtyping | iCluster | Joint latent variable model–based clustering method | Unsupervised | | R package | Copy number; DNA methylation; gene expression (numerical) | NA | Glioblastoma multiforme | Copy number; gene expression; DNA methylation | 55 samples with all 3 data sets | McLendon et al., 2008 |
| Disease subtyping | iClusterPlus | Generalized linear regression for the formulation of a joint model | Unsupervised | | R package | Multi-omics (numerical and categorical) | NA | Colorectal cancer | Copy number; gene expression; DNA methylation; exome sequencing | 189 colorectal carcinoma samples | TCGA |
| Disease subtyping | LRAcluster | Probabilistic model with low-rank approximation | Unsupervised | | R package | Multi-omics (numerical and categorical) | NA | Cancer | Mutations; CNVs; DNA methylation; gene expression for 11 different cancers (BRCA, COAD, GBM, HNSC, KIRC, LGG, LUAD, LUSC, PRAD, STAD, and THCA) | 3319 samples | TCGA |
| Disease subtyping | PSDF | Data fusion by Bayesian nonparametric Dirichlet modeling | Unsupervised | | MATLAB | Copy number; gene expression (categorical) | NA | Breast cancer | Copy number; gene expression | 106 samples | Chin et al |
| Disease subtyping | PSDF | Data fusion by Bayesian nonparametric Dirichlet modeling | Unsupervised | | MATLAB | Copy number; gene expression (categorical) | NA | Prostate cancer | Copy number; gene expression | 150 samples | Taylor et al |
| Disease subtyping | BCC | Bayesian consensus clustering | Unsupervised | | R | Multi-omics (numerical) | No | Breast cancer | Gene expression; DNA methylation; miRNA expression; protein | 348 breast cancer samples | TCGA |
| Disease subtyping | MDI | Bayesian models | Unsupervised | | MATLAB | Multi-omics (numerical and categorical) | NA | NA | NA | NA | NA |
| Disease subtyping | SNF | Local K-nearest neighbors (KNN); nonlinear method based on message-passing theory | Unsupervised | | R/MATLAB | Multi-omics (numerical and categorical) | No | Glioblastoma multiforme | mRNA expression; DNA methylation | 215 patients’ GBM data | TCGA |
| Disease subtyping | PFA | Fusion method using PCA, k-means clustering | Unsupervised | | MATLAB | DNA methylation; miRNA expression; gene expression; protein expression (numerical) | No | Cancer | Gene expression; copy number | 415 cell lines data | CCLE |
| Disease subtyping | PFA | Fusion method using PCA, k-means clustering | Unsupervised | | MATLAB | DNA methylation; miRNA expression; gene expression; protein expression (numerical) | No | Kidney renal clear cell carcinoma | Gene expression; miRNA expression; DNA methylation | 122 KIRC samples | TCGA |
| Disease subtyping | PINSPlus | Similarity-based clustering | Unsupervised | | R package | Multi-omics (numerical) | NA | Cancer | 36 cancer data sets | 3653 samples | TCGA and METABRIC |
| Disease subtyping | NEMO | Similarity-based clustering | Unsupervised | | R package | Multi-omics (numerical) | Yes | Acute myeloid leukemia | DNA methylation; miRNA expression; gene expression | Gene expression: 173 samples; miRNA expression: 188 samples; DNA methylation: 194 samples | TCGA |
| Disease subtyping; biomarker prediction | mixOmics | Supervised and unsupervised multivariate methods such as PLS, SPLS, sGCCA, and sPLSDA | Supervised and unsupervised | | R package | Multi-omics (numerical and categorical) | Yes | Breast cancer | Gene expression; miRNA expression; proteomics | 150 samples | TCGA |
| Disease subtyping | moCluster | Consensus PCA (CPCA) approach | Unsupervised | | R package | Multi-omics (numerical) | No | Colorectal cancer | Gene expression; DNA methylation; proteomics | 83 colorectal cancer patients | TCGA; CPTAC |
| Disease subtyping | MCIA | Multiple co-inertia analysis | Unsupervised | | R package | Multi-omics (numerical) | No | Ovarian cancer | Gene expression from multiple platforms (Agilent G4502A, Affymetrix HG-U133 2.0, Illumina HiSeq) | 266 ovarian cancer gene expression data | TCGA |
| Disease subtyping | JIVE | Decomposition method with low-rank approximations | Unsupervised | | MATLAB | Multi-omics (numerical) | Yes | Breast cancer | DNA methylation; miRNA expression; gene expression | 348 samples | TCGA |
| Disease subtyping | MFA | Multiple factor analysis | Unsupervised | | R package | Multi-omics (numerical and categorical) | Yes | Glioma | CGH-array; gene expression | 43 samples | Bredel et al |
| Disease subtyping | rMKL-LPP | Multiple kernel learning | Unsupervised | Executable available on request | NA | Multi-omics (numerical) | No | Glioblastoma multiforme | DNA methylation; miRNA expression; gene expression | 213 samples | TCGA |
| Disease subtyping | iNMF | NMF | Unsupervised | | Python | Multi-omics (numerical) | No | Ovarian cancer | Gene expression; DNA methylation; miRNA expression | 592 samples | TCGA |
| Disease subtyping; biomarker prediction; disease insights | iClusterPlus | Generalized linear regression for the formulation of a joint model | Unsupervised | | R package | Multi-omics (numerical and categorical) | NA | Cancer | Copy number; gene expression; mutation | 729 human cell lines representing 30 tumors | CCLE |
| Biomarker prediction | MOFA | Probabilistic Bayesian model | Unsupervised | | R/Python | Multi-omics (numerical and categorical) | Yes | Chronic lymphocytic leukemia | Mutation; gene expression; DNA methylation; drug response data (63 drugs) | 200 samples of leukemia and lymphoma | Dietrich et al |
| Biomarker prediction | NetICS | Network diffusion method | Unsupervised | | MATLAB | Multi-omics (numerical and categorical) | NA | Cancer | Somatic mutations; CNVs; gene expression; miRNA expression for 5 different cancers (uterine corpus endometrial carcinoma [UCEC], liver hepatocellular carcinoma [LIHC], bladder urothelial carcinoma [BLCA], breast invasive carcinoma [BRCA], and lung squamous cell carcinoma [LUSC]) | UCEC: 560 samples; LIHC: 377 samples; BLCA: 412 samples; BRCA: 1098 samples; LUSC: 504 samples | TCGA |
| Biomarker prediction | FSMKL | Kernel-based machine learning | Supervised | | MATLAB | Multi-omics (numerical and categorical) | NA | Breast cancer | Copy number; gene expression | 2000 samples | METABRIC |
| Biomarker prediction | PMA | Supervised and unsupervised methods such as sparse CCA, sparse mCCA, and sparse sCCA; multivariate methods using CCA | Supervised and unsupervised | | R package | Multi-omics (numerical and categorical) | Yes | Diffuse large B-cell lymphoma | CGH-array; gene expression | 203 samples | Lenz et al |
| Disease insights | PARADIGM | Probabilistic graphical models using directed factor graphs | Unsupervised | | Python | Multi-omics (numerical) | NA | Breast cancer | Gene expression; copy number data | 171 patients’ data | Public data sets from GEO, ArrayExpress, and published studies |
| Disease insights | Joint Bayesian factor | Joint Bayesian factor | Unsupervised | | MATLAB | Multi-omics (numerical) | No | Ovarian cancer | DNA methylation; copy number; gene expression | 74 samples | TCGA |
| Disease insights | CNAmet | Correlation between copy number, methylation, and gene expression using permutation test | Unsupervised | | R package | Copy number; DNA methylation; gene expression (numerical and categorical) | NA | Glioblastoma multiforme | Copy number; gene expression; DNA methylation | 50 samples | TCGA |
| Disease insights | MCIA | Multiple co-inertia analysis | Unsupervised | | R package | Multi-omics (numerical) | No | Cancer | Gene expression; protein expression of NCI-60 panel of leukemia, lymphomas, melanomas, and carcinomas from 9 different tissues | 59 cancer cell lines data | CELLMINER |
| Disease insights | JIVE | Decomposition method with low-rank approximations | Unsupervised | | MATLAB | Multi-omics (numerical) | Yes | Glioblastoma multiforme | miRNA expression; gene expression | 234 samples | TCGA |
| Disease insights | sMBPLS | Sparse multi-block PLS | Supervised | | MATLAB | Multi-omics (numerical) | NA | Ovarian cancer | CNV; DNA methylation; miRNA expression; gene expression | 230 samples | TCGA |
| Disease insights | T-SVD | Singular value decomposition | Supervised | | R package | Multi-omics (numerical) | NA | Ovarian cancer | miRNA expression; gene expression | 487 samples | TCGA |
| Disease insights | Joint NMF | NMF | Semi-supervised | | MATLAB | Multi-omics (numerical) | NA | Ovarian cancer | DNA methylation; miRNA expression; gene expression | 385 samples | TCGA |
Abbreviations: BCC, Bayesian consensus clustering; BLCA, bladder urothelial carcinoma; CCA, canonical correlation analysis; CCLE, Cancer Cell Line Encyclopedia; CGH, comparative genomic hybridization; CNV, copy number variation; CPTAC, Clinical Proteomic Tumor Analysis Consortium; FSMKL, feature selection multiple kernel learning; GEO, Gene Expression Omnibus; BRCA, breast invasive carcinoma; COAD, colon adenocarcinoma; HNSC, head and neck squamous cell carcinoma; JIVE, joint and individual variation explained; KIRC, kidney renal clear cell carcinoma; LGG, low-grade glioma; LIHC, liver hepatocellular carcinoma; LUAD, lung adenocarcinoma; mCCA, multiple canonical correlation analysis; LUSC, lung squamous cell carcinoma; PRAD, prostate adenocarcinoma; STAD, stomach adenocarcinoma; THCA, thyroid cancer; MCIA, multiple co-inertia analysis; MDI, multiple dataset integration; MFA, multiple factor analysis; miRNA, microRNA; MOFA, multi-omics factor analysis; NEMO, neighborhood based multi-omics clustering; NMF, nonnegative matrix factorization; PCA, principal component analysis; PFA, pattern fusion analysis; PLS, partial least squares; PMA, penalized multivariate analysis; rMKL-LPP, regularized multiple kernel learning-locality preserving projections; sCCA, supervised canonical correlation analysis; sMBPLS, sparse multi-block partial least squares; SNF, similarity network fusion; SPLS, sparse partial least squares; sGCCA, sparse generalized canonical correlation analysis; sPLSDA, sparse partial least squares discriminant analysis; TCGA, The Cancer Genome Atlas; T-SVD, thresholding singular value decomposition.
Numerical data type includes continuous data (for instance, segmentation mean data of CGH arrays) and discrete data (for instance, read counts in RNA-Seq), and categorical data type includes all categorical data (for instance, ternary copy number data) including binary data. Missing values are marked “Yes” if the tool handles missing data, “No” if the tool requires missing value handling in preprocessing steps, and “NA” when the information is not available.
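The Yes/No/NA convention for missing data can be turned into a simple screening step when shortlisting tools. The following is a minimal sketch in Python, not part of the review itself: the `ToolEntry` record, the `TOOLS` list, and the `candidate_tools` helper are hypothetical names introduced for illustration, and only a handful of rows from the table above are transcribed. It assumes missing measurements are encoded as `None` and treats "NA" conservatively, as if the tool did not handle missing data.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class ToolEntry:
    """Hypothetical record for one row of the tools table."""
    name: str
    use_cases: Tuple[str, ...]
    # True = "Yes", False = "No", None = "NA" (information not available)
    handles_missing: Optional[bool]


# A few rows transcribed from the table above (illustrative subset only).
TOOLS = [
    ToolEntry("NEMO", ("Disease subtyping",), True),
    ToolEntry("SNF", ("Disease subtyping",), False),
    ToolEntry("iClusterPlus",
              ("Disease subtyping", "Biomarker prediction", "Disease insights"),
              None),
    ToolEntry("MOFA", ("Biomarker prediction",), True),
    ToolEntry("mixOmics", ("Disease subtyping", "Biomarker prediction"), True),
]


def has_missing(matrix: Sequence[Sequence[Optional[float]]]) -> bool:
    """Return True if any entry of the samples-by-features matrix is None."""
    return any(value is None for row in matrix for value in row)


def candidate_tools(use_case: str,
                    matrix: Sequence[Sequence[Optional[float]]]) -> List[str]:
    """List tools for a use case that are compatible with the matrix.

    If the matrix has missing values, only tools marked "Yes" are kept;
    tools marked "No" or "NA" would need imputation in preprocessing.
    """
    needs_missing_support = has_missing(matrix)
    selected = []
    for tool in TOOLS:
        if use_case not in tool.use_cases:
            continue
        if needs_missing_support and tool.handles_missing is not True:
            continue
        selected.append(tool.name)
    return selected
```

For example, `candidate_tools("Disease subtyping", [[0.4, None], [1.2, 0.7]])` keeps only the tools marked "Yes" for that use case, whereas a complete matrix leaves every disease-subtyping entry in the subset eligible.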
List of multi-omics data analysis and visualization portals.
| Portal name | Omics data supported | Source repository | Analysis of private data | Availability | Reference |
|---|---|---|---|---|---|
| cBioPortal | Mutation, copy number, gene expression, miRNA expression, DNA methylation, protein abundance, and clinical data | TCGA and published studies | Yes | | Cerami et al |
| Firebrowse | Mutation, copy number, gene expression, miRNA expression, DNA methylation, protein abundance, and clinical data | TCGA | No | | NA |
| UCSC Xena | Copy number, somatic mutation, DNA methylation, gene and exon expression, protein expression, tissue specific expression data, PARADIGM pathway inference, and phenotype data | TCGA, CCLE, ICGC, GTEx, TARGET, and published studies | Yes | | Goldman et al |
| LinkedOmics | Clinical data, copy number, miRNA expression, mutation, DNA methylation, gene expression, protein expression and abundance, phosphoproteome and glyco-proteome data | TCGA and CPTAC | No | | Vasaikar et al |
| 3Omics | Gene expression, protein and metabolite abundance | User data driven | Yes | | Kuo et al |
| NetGestalt | Gene expression, mutation, and copy number data | TCGA, CPTAC, and published studies | Yes | | Shi et al |
| OASIS | Mutation, copy number, and gene expression data | TCGA, CCLE, GTEx, and published studies | No | | Fernandez-Banet et al |
| Paintomics 3 | Gene expression, miRNA expression, metabolite and region-specific ChIP-Seq, and Methyl-Seq data | User data driven | Yes | | Hernández-de-Diego et al |
| MethHC | DNA methylation, gene expression, and miRNA expression | TCGA | No | | Huang et al |
Abbreviations: CCLE, Cancer Cell Line Encyclopedia; CPTAC, Clinical Proteomic Tumor Analysis Consortium; ICGC, International Cancer Genomics Consortium; miRNA, microRNA; GTEx, Genotype-Tissue Expression; TCGA, The Cancer Genome Atlas.