Amina Lemsara, Salima Ouadfel, Holger Fröhlich.
Abstract
BACKGROUND: Recent years have witnessed an increasing interest in multi-omics data, because these data allow for a better understanding of complex diseases such as cancer on a molecular system level. In addition, multi-omics data increase the chance to robustly identify molecular patient sub-groups and hence open the door towards a better personalized treatment of diseases. Several methods have been proposed for unsupervised clustering of multi-omics data. However, a number of challenges remain, such as the magnitude of features and the large difference in dimensionality across different omics data sources.
Keywords: Deep learning; Multi-omics; Patient clustering
Year: 2020 PMID: 32299344 PMCID: PMC7161108 DOI: 10.1186/s12859-020-3465-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Conceptual overview of our approach: Multi-omics features mapping to a specific pathway are summarized into a pathway-level score via a sparse denoising multi-modal autoencoder architecture. Hidden layer 1 consists of up to [p_j/2] hidden units per omics modality, where p_j is the number of features in omics type j. Hidden units for omics modality j are densely connected to input features of the same omics type, but there are no connections from input features of other data modalities. Hidden layer 2 consists of one hidden unit, which represents the overall multi-omics pathway score. Concatenation of P multi-omics pathway scores for each patient allows for application of consensus sparse NMF clustering in a subsequent step
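The layer layout described in the Fig. 1 caption can be sketched as a forward pass for a single pathway. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the tanh activation, the masking-noise corruption, the weight initialization and the reading of "up to p_j/2" as floor(p_j/2) are all assumptions, and both training and the sparsity penalty are omitted.

```python
import numpy as np


def init_pathway_autoencoder(feature_counts, rng):
    """Random weights for one pathway; feature_counts[j] is p_j (hypothetical helper)."""
    # Assumption: "up to p_j/2" is read as floor(p_j / 2), with at least one unit.
    hidden_sizes = [max(1, p // 2) for p in feature_counts]
    total_hidden = sum(hidden_sizes)
    return {
        "hidden_sizes": hidden_sizes,
        # One encoder/decoder block per modality: no cross-modality connections.
        "enc1": [rng.normal(0.0, 0.1, size=(p, h)) for p, h in zip(feature_counts, hidden_sizes)],
        "dec1": [rng.normal(0.0, 0.1, size=(h, p)) for p, h in zip(feature_counts, hidden_sizes)],
        # Hidden layer 2: a single unit holding the multi-omics pathway score.
        "enc2": rng.normal(0.0, 0.1, size=(total_hidden, 1)),
        "dec2": rng.normal(0.0, 0.1, size=(1, total_hidden)),
    }


def forward(params, modalities, rng, drop_prob=0.2):
    """Return (pathway score per patient, reconstruction per modality)."""
    hidden = []
    for x, w in zip(modalities, params["enc1"]):
        # Denoising step (assumed masking noise): randomly zero some inputs.
        keep = rng.random(x.shape) >= drop_prob
        hidden.append(np.tanh((x * keep) @ w))
    h1 = np.concatenate(hidden, axis=1)
    score = np.tanh(h1 @ params["enc2"])          # shape (n_patients, 1)
    h1_rec = np.tanh(score @ params["dec2"])      # decode back to hidden layer 1
    cuts = np.cumsum(params["hidden_sizes"])[:-1]
    blocks = np.split(h1_rec, cuts, axis=1)       # one block per modality
    recon = [h @ w for h, w in zip(blocks, params["dec1"])]
    return score[:, 0], recon


# Toy usage: three modalities with 6, 4 and 5 pathway features, 3 patients.
rng = np.random.default_rng(0)
feature_counts = [6, 4, 5]
params = init_pathway_autoencoder(feature_counts, rng)
patients = [rng.normal(size=(3, p)) for p in feature_counts]
scores, recon = forward(params, patients, rng)
print(scores.shape, [r.shape for r in recon])
```

Running one such model per pathway and stacking the resulting score vectors column-wise would give the patients-by-P score matrix that the caption passes on to clustering.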
Datasets from The Cancer Genome Atlas (TCGA) used for evaluation: colorectal cancer (CRC), glioblastoma multiforme (GBM), lung squamous cell carcinoma (LSCC) and breast cancer (BRCA). Omics features correspond to those mappable to NCI pathways
| Dataset | Patients | Omics type | Features |
|---|---|---|---|
| CRC | 294 | mRNA | 2295 |
| CRC | 294 | miRNA | 264 |
| CRC | 294 | CNV | 2310 |
| GBM | 273 | mRNA | 2039 |
| GBM | 273 | miRNA | 18 |
| GBM | 273 | DNA methylation | 1798 |
| LSCC | 106 | mRNA | 2039 |
| LSCC | 106 | miRNA | 150 |
| LSCC | 106 | DNA methylation | 1846 |
| BRCA | 747 | mRNA | 2329 |
| BRCA | 747 | miRNA | 99 |
| BRCA | 747 | CNV | 2334 |
Comparison of PathME vs SNF and iCluster in terms of silhouette index. For PathME we report the silhouette index of the consensus clustering as well as the best individual one among 500 randomly initialized sNMF runs
| Disease | Omics type | Number of clusters (PathME / SNF / iCluster) | PathME cophenetic correlation | PathME consensus silhouette | PathME optimal silhouette | SNF silhouette | iCluster silhouette |
|---|---|---|---|---|---|---|---|
| CRC | Multi-omics | 5 / 2 / 2 | 1 | 0.98 | 0.51 | 0.54 | 0.06 |
| GBM | Multi-omics | 4 / 2 / 3 | 1 | 1 | 0.67 | 0.58 | 0.11 |
| LSCC | Multi-omics | 4 / 4 / 2 | 1 | 1 | 0.82 | 0.71 | 0.35 |
| BRCA | Multi-omics | 5 / 2 / 3 | 1 | 0.93 | 0.67 | 0.57 | 0.12 |
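For readers unfamiliar with the silhouette index used in this comparison, the sketch below computes its standard definition from a precomputed distance matrix. It is an illustration only: the function name is hypothetical, and the choice of distance (for example, one minus a consensus value, or a distance between pathway score vectors) is an assumption that this record does not specify.

```python
import numpy as np


def silhouette_index(distances, labels):
    """Mean silhouette over all samples, given pairwise distances and cluster labels."""
    distances = np.asarray(distances, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the same cluster
        # (the self-distance is 0, so divide by n_own - 1).
        a = distances[i, own].sum() / (n_own - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(distances[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        if denom > 0:
            scores[i] = (b - a) / denom
    return scores.mean()


# Toy usage: four points on a line forming two well-separated pairs.
points = np.array([0.0, 1.0, 10.0, 11.0])
dist = np.abs(points[:, None] - points[None, :])
print(silhouette_index(dist, [0, 0, 1, 1]))  # close to 1: compact, separated clusters
print(silhouette_index(dist, [0, 1, 0, 1]))  # negative: clusters cut across the pairs
```

Values near 1 indicate compact, well-separated clusters, while values near 0 or below indicate overlapping or misassigned samples.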
Comparison of PathME vs conventional sNMF consensus clustering of individual omics features in terms of cophenetic correlation and silhouette index
| Disease | Omics type | Cophenetic correlation: PathME | Cophenetic correlation: sNMF (ind. features) | Consensus silhouette: PathME | Consensus silhouette: sNMF (ind. features) |
|---|---|---|---|---|---|
| CRC (5 clusters) | mRNA | 1 | 1 | 0.99 | 0.87 |
| CRC (5 clusters) | miRNA | 1 | 1 | 0.97 | 0 |
| CRC (5 clusters) | CNV | 1 | 0.99 | 0.99 | 0.93 |
| GBM (4 clusters) | mRNA | 1 | 0.99 | 1 | 1 |
| GBM (4 clusters) | miRNA | 1 | 1 | 0.93 | 0.86 |
| GBM (4 clusters) | Methylation | 1 | 1 | 1 | 1 |
| LSCC (4 clusters) | mRNA | 0.92 | 1 | 0.67 | 1.00 |
| LSCC (4 clusters) | miRNA | 1 | 1 | 0.98 | 0.98 |
| LSCC (4 clusters) | Methylation | 1 | 0.99 | 1.00 | 0.97 |
| BRCA (5 clusters) | mRNA | 0.98 | 0.99 | 0.88 | 0.94 |
| BRCA (5 clusters) | miRNA | 1 | 0.91 | 0.99 | 0.49 |
| BRCA (5 clusters) | CNV | 1 | 1 | 1 | 1 |
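Consensus clustering over many randomly initialized runs is commonly summarized by a consensus matrix, whose entry (i, k) is the fraction of runs in which patients i and k fall into the same cluster. The sketch below shows that construction under stated assumptions: the function name and the toy labelings are hypothetical, and the label vectors would in practice come from the individual sNMF runs, which are not reproduced here.

```python
import numpy as np


def consensus_matrix(label_runs):
    """Fraction of runs in which each pair of samples shares a cluster.

    label_runs: array-like of shape (n_runs, n_samples) with one cluster
    label per sample and run. Label values need not match across runs,
    because only co-membership within a run is compared.
    """
    label_runs = np.asarray(label_runs)
    # same[r, i, k] is True when samples i and k share a cluster in run r.
    same = label_runs[:, :, None] == label_runs[:, None, :]
    return same.mean(axis=0)


# Toy usage: three runs over four patients.
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],  # same partition as the first run, labels swapped
    [0, 1, 1, 1],
]
C = consensus_matrix(runs)
print(np.round(C, 2))
```

A cophenetic correlation is typically derived afterwards from a hierarchical clustering of 1 - C; that step is not shown here.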
Fig. 2 t-SNE visualization of CRC consensus clustering based on PathME. Data points (= patients) have been colored according to the consensus sNMF clustering of multi-omics pathway scores. t-SNE visualization of individual omics modalities is based on all features mappable to the pathways used by PathME. t-SNE plots for other datasets can be found in the Supplementary material
Fig. 3 Example of SHAP analysis results obtained for CRC. For gene expression data, gene symbols are shown together with Entrez gene IDs. Further results can be found in the Supplementary material, including Table S8
Fig. 4 Progression-free survival (days) of patients stratified by PathME (left), iCluster (middle) and SNF (right) for GBM. P-values were corrected for the confounding effect of age and adjusted for multiple testing