Mira Park, Doyoen Kim, Kwanyoung Moon, Taesung Park.
Abstract
Recent developments in high-throughput technology have made it possible to accumulate vast amounts of multi-omics data. Because even a single omics dataset has a large number of variables, integrated analysis of multi-omics data suffers from problems such as computational instability and variable redundancy. Most multi-omics analyses repeatedly apply a single supervised analysis for dimensional reduction and variable selection. However, these approaches cannot avoid redundancy and collinearity among variables. In this study, we propose a novel approach using blockwise component analysis, which addresses these limitations by applying variable clustering and sparse principal component (sPC) analysis. Our approach consists of two stages. The first stage identifies homogeneous variable blocks and then extracts sPCs for each omics dataset. The second stage merges the sPCs from each omics dataset and then constructs a prediction model. We also propose a graphical method that shows the results of sparse PCA and model fitting simultaneously. We applied the proposed methodology to glioblastoma multiforme data from The Cancer Genome Atlas. Comparison with existing approaches showed that the proposed methodology is more easily interpretable and has comparable predictive power with a much smaller number of variables.
Keywords: dimensional reduction; multi-omics data; sparse principal component analysis; variable clustering
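The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`variable_blocks`, `sparse_pc`, `blockwise_spcs`), the parameters `threshold` and `keep_frac`, and the simulated data are all assumptions made for illustration. Two simplifications are swapped in to keep the example NumPy-only: a greedy leader clustering on absolute correlation stands in for the variable clustering step, and a hard-thresholded first principal component stands in for a proper sparse PCA solver. No prediction model is fitted; the merged score matrix is what a downstream model would take as input.

```python
import numpy as np

def variable_blocks(X, threshold=0.5):
    """Group columns of X into blocks of mutually similar variables.

    Stand-in for variable clustering: a variable joins the first block whose
    leader it correlates with (|r| >= threshold), otherwise it starts a new block.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    leaders, blocks = [], []
    for j in range(X.shape[1]):
        for b, lead in enumerate(leaders):
            if corr[j, lead] >= threshold:
                blocks[b].append(j)
                break
        else:
            leaders.append(j)
            blocks.append([j])
    return blocks

def sparse_pc(X_block, keep_frac=0.1):
    """First PC of a block with only the largest |loadings| kept (assumed 10%)."""
    Xc = X_block - X_block.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    loading = vt[0]
    k = max(1, int(round(keep_frac * loading.size)))
    keep = np.argsort(np.abs(loading))[-k:]
    sparse = np.zeros_like(loading)
    sparse[keep] = loading[keep]
    sparse /= np.linalg.norm(sparse)      # renormalise the sparse loading vector
    return Xc @ sparse, sparse

def blockwise_spcs(omics, threshold=0.5, keep_frac=0.1):
    """Stage 1 per omics dataset, then merge all sPC scores column-wise (stage 2 input)."""
    scores, names = [], []
    for name, X in omics.items():
        for b, cols in enumerate(variable_blocks(X, threshold), start=1):
            score, _ = sparse_pc(X[:, cols], keep_frac)
            scores.append(score)
            names.append(f"{name}{b}")
    return np.column_stack(scores), names

# Simulated data only: each omics matrix is built from a few latent factors.
rng = np.random.default_rng(0)
n = 60

def toy_omics(n_blocks, block_size):
    latent = rng.normal(size=(n, n_blocks))
    noise = 0.3 * rng.normal(size=(n, n_blocks * block_size))
    return np.repeat(latent, block_size, axis=1) + noise

omics = {"D": toy_omics(3, 20), "MR": toy_omics(2, 20), "MI": toy_omics(2, 10)}
Z, names = blockwise_spcs(omics)
print(Z.shape, names)
```

The merged matrix `Z` has one column per block-level sPC, so a survival model with variable selection, such as the stepwise Cox regression reported in the results table, would operate on a handful of components rather than on thousands of raw variables.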
Year: 2020 PMID: 33147797 PMCID: PMC7663540 DOI: 10.3390/ijms21218202
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Scree plots and dendrograms for GBM data. (A) Scree plot of dissimilarity against the number of clusters. Lines in the scree plot represent cut-offs. (B) Dendrogram from hierarchical clustering. Rectangles in the dendrogram represent homogeneous variable blocks.
Stepwise Cox regression results for the integrated datasets.
| Omics | Block | Number of Variables | Number of Variables Retained | Coefficient (Standard Error) | Hazard Ratio | p-Value ¹ |
|---|---|---|---|---|---|---|
| DNA | D4 | 128 | 13 | −0.09 (0.05) | 0.92 | 0.089 |
| DNA | D5 | 186 | 19 | −0.19 (0.05) | 0.83 | <0.001 |
| DNA | D6 | 153 | 15 | 0.25 (0.06) | 1.29 | <0.001 |
| DNA | D9 | 69 | 7 | 0.21 (0.07) | 1.24 | 0.004 |
| DNA | D11 | 46 | 5 | −0.12 (0.05) | 0.89 | 0.011 |
| mRNA | MR4 | 188 | 19 | 0.19 (0.07) | 1.21 | 0.009 |
| mRNA | MR5 | 161 | 16 | 0.12 (0.06) | 1.13 | 0.033 |
| mRNA | MR9 | 65 | 6 | −0.40 (0.10) | 0.67 | <0.001 |
| mRNA | MR10 | 67 | 7 | −0.20 (0.07) | 0.82 | 0.011 |
| miRNA | MI3 | 32 | 3 | 0.21 (0.08) | 1.24 | 0.005 |
| miRNA | MI5 | 79 | 8 | −0.17 (0.05) | 0.84 | <0.001 |
| miRNA | MI7 | 56 | 6 | −0.21 (0.09) | 0.81 | 0.021 |
| miRNA | MI10 | 22 | 2 | 0.16 (0.06) | 1.17 | 0.016 |

¹ Uncorrected p-value.
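In a Cox model the hazard ratio of a covariate is the exponential of its coefficient, so the Hazard Ratio column can be checked against the Coefficient column. The short sketch below does that using the rounded values printed in the table; because the inputs are rounded to two decimals, the recomputed ratios can differ from the reported ones in the last digit. The Wald-type 95% interval is an added illustration computed from the rounded standard errors, not a result reported in the source, and the helper names are hypothetical.

```python
import math

# (block, coefficient, standard error, reported hazard ratio), copied from the table
rows = [
    ("D4", -0.09, 0.05, 0.92), ("D5", -0.19, 0.05, 0.83), ("D6", 0.25, 0.06, 1.29),
    ("D9", 0.21, 0.07, 1.24), ("D11", -0.12, 0.05, 0.89),
    ("MR4", 0.19, 0.07, 1.21), ("MR5", 0.12, 0.06, 1.13),
    ("MR9", -0.40, 0.10, 0.67), ("MR10", -0.20, 0.07, 0.82),
    ("MI3", 0.21, 0.08, 1.24), ("MI5", -0.17, 0.05, 0.84),
    ("MI7", -0.21, 0.09, 0.81), ("MI10", 0.16, 0.06, 1.17),
]

def hazard_ratio(coef):
    """Hazard ratio per unit increase in the sPC score: exp(coefficient)."""
    return math.exp(coef)

def wald_interval(coef, se, z=1.96):
    """Approximate 95% interval for the hazard ratio (illustrative, from rounded inputs)."""
    return math.exp(coef - z * se), math.exp(coef + z * se)

for block, coef, se, reported in rows:
    lo, hi = wald_interval(coef, se)
    print(f"{block:>5}  HR={hazard_ratio(coef):.2f} (reported {reported:.2f})  "
          f"approx. 95% CI {lo:.2f} to {hi:.2f}")
```

A ratio below 1 (negative coefficient) means higher scores on that sPC are associated with a lower hazard, and a ratio above 1 with a higher hazard.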
Figure 2. MP chart of integrated dataset analysis (MR: mRNA, D: DNA methylation, MI: miRNA).
Figure 3. ROC curve and AUC value of the integrated dataset by iMO-BSPC.
Comparisons of predictability with other approaches.
| Methodology | Single Omics | Single Omics | Single Omics | Single Omics | Single Omics | Single Omics | Single Omics | Single Omics | Single Omics | Multi-Omics | Multi-Omics | Multi-Omics |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | (1) | (1) | (1) | (2) | (2) | (2) | (3) | (3) | (3) | (1) | (2) | (3) |
| Omics | (a) | (b) | (c) | (a) | (b) | (c) | (a) | (b) | (c) | all | all | all |
| PC-before ¹ | 12 | 10 | 10 | 12 | 10 | 10 | 12 | 10 | 10 | 32 | 32 | 32 |
| PC-after ² | 5 | 2 | 2 | 8 | 4 | 3 | 4 | 4 | 3 | 13 | 15 | 13 |
| Variables ³ | 563 | 188 | 225 | 935 | 559 | 167 | 56 | 49 | 17 | 1580 | 1339 | 126 |
| AUC | 0.67 | 0.54 | 0.61 | 0.71 | 0.58 | 0.59 | 0.66 | 0.60 | 0.63 | 0.75 | 0.76 | 0.74 |
| C-index | 0.63 | 0.55 | 0.60 | 0.64 | 0.61 | 0.60 | 0.61 | 0.59 | 0.60 | 0.69 | 0.69 | 0.67 |

(1) PCA without variable clustering, (2) PCA with clustering, (3) iMO-BSPC; (a) DNA methylation, (b) mRNA expression, (c) miRNA expression. ¹ Number of PCs (sPCs) before the stepwise variable selection process; ² number of PCs (sPCs) selected by the stepwise variable selection process; ³ number of variables used for prediction.
Figure 4. MP charts and ROC curves for each single omics dataset using iMO-BSPC. (A) MP chart. (B) ROC curve (DNA: DNA methylation, mRNA: mRNA expression, miRNA: miRNA expression).
Figure 5. Procedure of iMO-BSPC.
Figure 6. Example of a multi-level polar chart (MP chart): the radius of a sector is proportional to the coefficient of the sPC, and the distance from the origin to each point is proportional to the variable loading.