Conghao Wang, Wu Lue, Rama Kaalia, Parvin Kumar, Jagath C Rajapakse.
Abstract
Multi-omics data are increasingly being gathered for investigations of complex diseases such as cancer. However, high dimensionality, small sample sizes, and the heterogeneity of different omics types pose major challenges to integrated analysis. In this paper, we evaluate two network-based approaches for integrating multi-omics data, applied to clinical outcome prediction in neuroblastoma. As a first step, we derive a Patient Similarity Network (PSN) for each omics type by computing distances among patients from omics features. The different omics are then fused in one of two ways: network-level fusion uses the Similarity Network Fusion algorithm to fuse the PSNs derived for individual omics types, whereas feature-level fusion combines the network features obtained from the individual PSNs. We demonstrate our methods on two high-risk neuroblastoma datasets, from the SEQC project and the TARGET project. We propose Deep Neural Network and Machine Learning methods with Recursive Feature Elimination as predictors of the survival status of neuroblastoma patients. Our results indicate that network-level fusion outperformed feature-level fusion for integrating different omics data, whereas feature-level fusion is more suitable for incorporating different feature types derived from the same omics type. We conclude that network-based methods handle heterogeneity and high dimensionality well in the integration of multi-omics data.
Year: 2022 PMID: 36104347 PMCID: PMC9475034 DOI: 10.1038/s41598-022-19019-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Illustration of our methods for fusion of multi-omics data. Feature-level fusion combines the features of patient similarity networks (PSNs), and network-level fusion combines the PSNs derived from individual omics data.
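The two fusion routes in Figure 1 can be sketched on toy data. The following is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian kernel and its `sigma`, the use of weighted degree as the only network feature, and the edge-wise average used as a simple stand-in for the Similarity Network Fusion algorithm are all choices made here for brevity, and the toy matrices `rna` and `meth` are invented.

```python
import math


def patient_similarity(features, sigma=1.0):
    """Build a PSN as a dense matrix: Gaussian kernel on squared Euclidean
    distance between every pair of patients (rows). The kernel is an assumption."""
    n = len(features)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            sim[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    return sim


def weighted_degree(sim):
    """One simple centrality feature: total similarity to all other patients."""
    return [sum(w for j, w in enumerate(row) if j != i)
            for i, row in enumerate(sim)]


def feature_level_fusion(psns):
    """Extract network features per omics type, then concatenate per patient."""
    per_omics = [weighted_degree(sim) for sim in psns]
    return [list(values) for values in zip(*per_omics)]


def network_level_fusion(psns):
    """Fuse the networks first. Edge-wise averaging is a stand-in for SNF here."""
    n = len(psns[0])
    return [[sum(sim[i][j] for sim in psns) / len(psns) for j in range(n)]
            for i in range(n)]


# Invented toy data: 3 patients, two omics types with different feature counts.
rna = [[0.0, 1.0], [0.1, 0.9], [3.0, 3.0]]
meth = [[1.0], [1.2], [5.0]]
psns = [patient_similarity(rna), patient_similarity(meth)]

feature_fused = feature_level_fusion(psns)    # one feature per omics per patient
fused_network = network_level_fusion(psns)    # a single PSN over all omics
network_fused = weighted_degree(fused_network)
```

In this toy setup, patients 0 and 1 are close in both omics types, so they stay strongly connected in the fused network, while patient 2 remains weakly connected to both. Either output (`feature_fused` or `network_fused`) would then be passed to a downstream classifier.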
Results of DNN on SEQC dataset.
| Dataset | Feature type | ACC (%) | F1 score | ROC AUC | Feature dimension |
|---|---|---|---|---|---|
| RNA-Seq (GSE49710) | Centrality | 74.3 ± 2.4 | 0.52 ± 0.04 | 0.79 ± 0.04 | 13 |
| RNA-Seq (GSE49710) | Modularity | 77.7 ± 8.2 | 0.56 ± 0.05 | 0.79 ± 0.05 | 16 |
| RNA-Seq (GSE49710) | Both | 77.3 ± 5.0 | 0.55 ± 0.05 | 0.82 ± 0.03 | 29 |
| RNA-Seq (GSE49710) | Abridged | 79.8 ± 1.1 | 0.59 ± 0.03 | 0.77 ± 0.02 | 4 |
| Microarray (GSE62564) | Centrality | 69.7 ± 12.4 | 0.19 ± 0.11 | 0.54 ± 0.04 | 13 |
| Microarray (GSE62564) | Modularity | 58.9 ± 5.0 | 0.37 ± 0.08 | 0.60 ± 0.03 | 204 |
| Microarray (GSE62564) | Both | 60.1 ± 5.3 | 0.38 ± 0.06 | 0.61 ± 0.03 | 217 |
| Microarray (GSE62564) | Abridged | 62.8 ± 3.9 | 0.43 ± 0.09 | 0.66 ± 0.07 | 133 |
| Network-level fusion | Centrality | 68.6 ± 4.8 | 0.23 ± 0.08 | 0.54 ± 0.06 | 13 |
| Network-level fusion | Modularity | 61.1 ± 4.0 | 0.37 ± 0.08 | 0.61 ± 0.02 | 109 |
| Network-level fusion | Both | 63.9 ± 4.3 | 0.36 ± 0.08 | 0.60 ± 0.03 | 122 |
| Network-level fusion | Abridged | 71.3 ± 2.1 | 0.33 ± 0.06 | 0.64 ± 0.04 | 98 |
| Feature-level fusion | Centrality | 73.3 ± 2.9 | 0.49 ± 0.10 | 0.74 ± 0.03 | 13 |
| Feature-level fusion | Modularity | 75.1 ± 3.1 | 0.55 ± 0.06 | 0.78 ± 0.03 | 220 |
Significant values are in bold.
Results of DNN on TARGET dataset.
| Dataset | Feature type | ACC (%) | F1 score | ROC AUC | Feature dimension |
|---|---|---|---|---|---|
| RNA-Seq | Centrality | 59.4 ± 6.3 | 0.69 ± 0.05 | 0.62 ± 0.07 | 13 |
| RNA-Seq | Modularity | 61.6 ± 6.8 | 0.67 ± 0.09 | 0.62 ± 0.07 | 64 |
| RNA-Seq | Both | 60.7 ± 3.5 | 0.64 ± 0.09 | 0.64 ± 0.04 | 77 |
| RNA-Seq | Abridged | 61.5 ± 4.6 | 0.66 ± 0.1 | 0.66 ± 0.05 | 38 |
| Methylation | Centrality | 57.1 ± 6.5 | 0.36 ± 0.25 | 0.54 ± 0.09 | 13 |
| Methylation | Modularity | 54.1 ± 4.2 | 0.49 ± 0.19 | 0.51 ± 0.03 | 22 |
| Methylation | Both | 54.6 ± 1.6 | 0.31 ± 0.11 | 0.52 ± 0.06 | 35 |
| Methylation | Abridged | 54.1 ± 0.1 | 0.51 ± 0.13 | 0.53 ± 0.06 | 34 |
| Network-level fusion | Centrality | 57.2 ± 5.2 | 0.57 ± 0.07 | 0.62 ± 0.08 | 13 |
| Network-level fusion | Modularity | 64.8 ± 8.7 | 0.65 ± 0.17 | 0.69 ± 0.07 | 44 |
| Feature-level fusion | Centrality | 55.7 ± 3.0 | 0.45 ± 0.29 | 0.55 ± 0.02 | 13 |
| Feature-level fusion | Modularity | 60.5 ± 7.9 | 0.67 ± 0.05 | 0.61 ± 0.10 | 55 |
| Feature-level fusion | Both | 61.1 ± 8.0 | 0.58 ± 0.12 | 0.61 ± 0.07 | 68 |
| Feature-level fusion | Abridged | 56.8 ± 8.1 | 0.64 ± 0.12 | 0.60 ± 0.15 | 39 |
Significant values are in bold.
Results of linear classifiers on SEQC dataset.
| Fusion strategy | Classifier | ACC (%), all features | F1 score, all features | ROC AUC, all features | ACC (%), RFE | F1 score, RFE | ROC AUC, RFE |
|---|---|---|---|---|---|---|---|
| Network-level fusion | SVM | 61.0 ± 3.2 | 0.32 ± 0.01 | 0.49 ± 0.04 | 78.5 ± 0.3 | 0.00 ± 0.00 | 0.55 ± 0.01 |
| Network-level fusion | Decision tree | 65.5 ± 2.8 | 0.23 ± 0.04 | 0.54 ± 0.03 | 67.0 ± 1.5 | 0.22 ± 0.04 | 0.52 ± 0.00 |
| Network-level fusion | Random forest | 76.7 ± 0.3 | 0.05 ± 0.04 | 0.50 ± 0.01 | 77.5 ± 0.6 | 0.03 ± 0.04 | 0.50 ± 0.01 |
| Network-level fusion | Logistic regression | 63.1 ± 4.2 | 0.33 ± 0.02 | 0.47 ± 0.04 | 78.3 ± 0.0 | 0.00 ± 0.00 | 0.57 ± 0.02 |
| Feature-level fusion | SVM | 74.7 ± 4.4 | 0.46 ± 0.08 | 0.48 ± 0.03 | 69.9 ± 11.5 | 0.51 ± 0.02 | 0.76 ± 0.03 |
| Feature-level fusion | Decision tree | 73.9 ± 2.5 | 0.39 ± 0.07 | 0.47 ± 0.02 | 74.7 ± 2.1 | 0.37 ± 0.07 | 0.62 ± 0.05 |
| Feature-level fusion | Random forest | 75.6 ± 0.8 | 0.25 ± 0.04 | 0.48 ± 0.01 | 77.9 ± 2.8 | 0.31 ± 0.08 | 0.58 ± 0.05 |
| 73.5 ± 1.3 | 0.51 ± 0.01 | 0.53 ± 0.02 | |||||
Significant values are in bold.
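Several rows above pair an accuracy near 78% with an F1 score of 0.00. One way such a combination can arise is a classifier that assigns every patient to the majority class when the classes are imbalanced. The sketch below illustrates this on invented labels (2 positives out of 10, a hypothetical split and not the SEQC class ratio), using plain-Python versions of the three reported metrics. The function names are illustrative only.

```python
def accuracy(y_true, y_pred):
    """Fraction of patients whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Invented, imbalanced labels: 2 positive patients out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A degenerate classifier that predicts the majority class for everyone.
y_pred = [0] * 10
scores = [0.1] * 10  # constant score, so no ranking information

acc = accuracy(y_true, y_pred)   # 0.8
f1 = f1_score(y_true, y_pred)    # 0.0
auc = roc_auc(y_true, scores)    # 0.5
```

Accuracy alone rewards the degenerate classifier, which is why the F1 score and ROC AUC columns are worth reading alongside it.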
Results of linear classifiers on TARGET dataset.
| Fusion strategy | Classifier | ACC (%), all features | F1 score, all features | ROC AUC, all features | ACC (%), RFE | F1 score, RFE | ROC AUC, RFE |
|---|---|---|---|---|---|---|---|
| Network-level fusion | SVM | 68.2 ± 3.1 | 0.69 ± 0.03 | 0.53 ± 0.01 | 60.5 ± 4.9 | 0.56 ± 0.10 | 0.60 ± 0.09 |
| Network-level fusion | Decision tree | 54.2 ± 4.0 | 0.51 ± 0.08 | 0.45 ± 0.03 | 48.4 ± 1.9 | 0.47 ± 0.05 | 0.59 ± 0.07 |
| Network-level fusion | Random forest | 51.7 ± 4.8 | 0.42 ± 0.03 | 0.46 ± 0.05 | 54.1 ± 6.6 | 0.51 ± 0.06 | 0.44 ± 0.05 |
| 61.2 ± 6.1 | 0.54 ± 0.18 | 0.68 ± 0.02 | |||||
| Feature-level fusion | SVM | 58.0 ± 4.1 | 0.58 ± 0.01 | 0.53 ± 0.05 | 57.3 ± 2.7 | 0.60 ± 0.03 | 0.60 ± 0.06 |
| Feature-level fusion | Decision tree | 48.3 ± 3.0 | 0.50 ± 0.08 | 0.51 ± 0.05 | 57.3 ± 3.6 | 0.60 ± 0.06 | 0.48 ± 0.03 |
| Feature-level fusion | Random forest | 54.1 ± 5.8 | 0.55 ± 0.04 | 0.52 ± 0.05 | 56.7 ± 0.8 | 0.53 ± 0.05 | 0.57 ± 0.02 |
| Feature-level fusion | Logistic regression | 56.7 ± 1.7 | 0.58 ± 0.04 | 0.50 ± 0.05 | 56.7 ± 2.0 | 0.58 ± 0.02 | 0.54 ± 0.01 |
Significant values are in bold.
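The RFE columns in the classifier tables refer to Recursive Feature Elimination: fit a model, discard the feature with the smallest weight, and refit on the remaining features until the desired number is left. The sketch below shows that loop with a small gradient-descent logistic regression written in plain Python so that it is self-contained. In practice a library implementation such as scikit-learn's `RFE` would normally be used. The learning rate, epoch count, one-feature-per-step schedule, and toy data are assumptions made for illustration, not settings taken from the paper.

```python
import math


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Batch gradient-descent logistic regression; returns the feature weights."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted prob. minus label
            grad_b += err / n
            for j, xj in enumerate(row):
                grad_w[j] += err * xj / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w


def rfe(X, y, n_keep):
    """Recursive feature elimination: refit, drop the weakest feature, repeat."""
    kept = list(range(len(X[0])))  # original column indices still in play
    while len(kept) > n_keep:
        subset = [[row[j] for j in kept] for row in X]
        weights = fit_logistic(subset, y)
        weakest = min(range(len(kept)), key=lambda k: abs(weights[k]))
        kept.pop(weakest)
    return kept


# Invented toy data: column 0 separates the classes, columns 1 and 2 are noise.
X = [
    [2.0, 0.1, -0.1], [1.5, -0.1, 0.1], [1.0, 0.1, 0.1], [2.5, -0.1, -0.1],
    [-2.0, 0.1, -0.1], [-1.5, -0.1, 0.1], [-1.0, 0.1, 0.1], [-2.5, -0.1, -0.1],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]

selected = rfe(X, y, n_keep=1)  # [0]: only the informative column survives
```

Because the model is refit after every removal, the ranking of the remaining features can change from one step to the next, which is what distinguishes RFE from a one-shot filter on the initial weights.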
Comparison of results on SEQC dataset.
| Method | ACC (%) | F1 score |
|---|---|---|
| RGCCA | 58.6 ± 5.9 | 0.43 ± 0.34 |
| MOFA | 67.7 ± 4.5 | 0.79 ± 0.03 |
| DIABLO | 71.1 ± 8.5 | – |
| Ours | 78.9 ± 4.5 | 0.57 ± 0.11 |
Comparison of results on TARGET dataset.
| Method | ACC (%) | F1 score |
|---|---|---|
| RGCCA | 43.7 ± 1.5 | 0.43 ± 0.09 |
| MOFA | 52.3 ± 6.0 | 0.51 ± 0.09 |
| DIABLO | 64.2 ± 5.4 | – |
| Ours | 70.1 ± 1.6 | 0.71 ± 0.01 |