Tongxin Wang, Wei Shao, Zhi Huang, Haixu Tang, Jie Zhang, Zhengming Ding, Kun Huang.
Abstract
To fully utilize the advances in omics technologies and achieve a more comprehensive understanding of human diseases, novel computational methods are required for integrative analysis of multiple types of omics data. Here, we present a novel multi-omics integrative method named Multi-Omics Graph cOnvolutional NETworks (MOGONET) for biomedical classification. MOGONET jointly explores omics-specific learning and cross-omics correlation learning for effective multi-omics data classification. We demonstrate that MOGONET outperforms other state-of-the-art supervised multi-omics integrative analysis approaches across different biomedical classification applications using mRNA expression data, DNA methylation data, and microRNA expression data. Furthermore, MOGONET can identify important biomarkers from different omics data types related to the investigated biomedical problems.
Year: 2021 PMID: 34103512 PMCID: PMC8187432 DOI: 10.1038/s41467-021-23774-w
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Illustration of MOGONET.
MOGONET combines GCN for multi-omics-specific learning and VCDN for multi-omics integration. For clear and concise illustration, an example of one sample is chosen to demonstrate the VCDN component for multi-omics integration. Preprocessing is first performed on each omics data type to remove noise and redundant features. Each omics-specific GCN is trained to perform class prediction using omics features and the corresponding sample similarity network generated from the omics data. The cross-omics discovery tensor is calculated from the initial predictions of omics-specific GCNs and forwarded to VCDN for final prediction. MOGONET is an end-to-end model and all networks are trained jointly.
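The step from omics-specific predictions to the cross-omics discovery tensor can be illustrated with a small sketch. It assumes the tensor for one sample is the outer product of the per-omics class-probability vectors, flattened before it is passed to VCDN; the caption does not spell out this construction, so treat it as an assumption. The function name `cross_omics_tensor` and the probability values are hypothetical.

```python
from itertools import product
from math import prod


def cross_omics_tensor(predictions):
    """Return the flattened outer product of per-omics probability vectors.

    predictions: one list of class probabilities per omics type, all of
    length c. The result has c ** len(predictions) entries; each entry is
    the product of one class probability from every omics type.
    (Assumed construction, not confirmed by the caption.)
    """
    return [prod(combo) for combo in product(*predictions)]


# Hypothetical initial predictions for one sample with two classes.
mrna = [0.8, 0.2]
meth = [0.6, 0.4]
mirna = [0.7, 0.3]

flat = cross_omics_tensor([mrna, meth, mirna])
print(len(flat))   # number of entries fed to the integration network
print(flat[0])     # entry where all three omics types predict class 0
```

Because each input vector sums to one, the entries of the flattened tensor also sum to one. Entries where the omics types agree on a class are large when the individual predictions are confident, which is the kind of cross-omics label correlation an integration network could exploit.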
Summary of datasets.
| Dataset | Categories (number of samples) | Number of features (mRNA, meth, miRNA) | Number of features for training (mRNA, meth, miRNA) |
|---|---|---|---|
| ROSMAP | NC: 169, AD: 182 | 55,889, 23,788, 309 | 200, 200, 200 |
| LGG | Grade 2: 246, Grade 3: 264 | 20,531, 20,114, 548 | 2000, 2000, 548 |
| KIPAN | KICH: 66, KIRC: 318, KIRP: 274 | 20,531, 20,111, 445 | 2000, 2000, 445 |
| BRCA | Normal-like: 115, Basal-like: 131, HER2-enriched: 46, Luminal A: 436, Luminal B: 147 | 20,531, 20,106, 503 | 1000, 1000, 503 |
mRNA refers to mRNA expression data. meth refers to DNA methylation data. miRNA refers to miRNA expression data. The ROSMAP dataset is for the classification of Alzheimer’s disease (AD) patients vs. normal control (NC). The LGG dataset is for grade classification in low-grade glioma (LGG). The KIPAN dataset is for kidney cancer type classification with chromophobe renal cell carcinoma (KICH), clear renal cell carcinoma (KIRC), and papillary renal cell carcinoma (KIRP). The BRCA dataset is for breast invasive carcinoma (BRCA) PAM50 subtype classification with normal-like, basal-like, human epidermal growth factor receptor 2 (HER2)-enriched, Luminal A, and Luminal B subtypes.
Classification results on ROSMAP dataset.
| Method | ACC | F1 | AUC |
|---|---|---|---|
| KNN | 0.657 ± 0.036 | 0.671 ± 0.044 | 0.709 ± 0.045 |
| SVM | 0.770 ± 0.024 | 0.778 ± 0.016 | 0.770 ± 0.026 |
| Lasso | 0.694 ± 0.037 | 0.730 ± 0.033 | 0.770 ± 0.035 |
| RF | 0.726 ± 0.029 | 0.734 ± 0.021 | 0.811 ± 0.019 |
| XGBoost | 0.760 ± 0.046 | 0.772 ± 0.045 | 0.837 ± 0.030 |
| NN | 0.755 ± 0.021 | 0.764 ± 0.021 | 0.827 ± 0.025 |
| GRridge | 0.760 ± 0.034 | 0.769 ± 0.029 | 0.841 ± 0.023 |
| block PLSDA | 0.742 ± 0.024 | 0.755 ± 0.023 | 0.830 ± 0.025 |
| block sPLSDA | 0.753 ± 0.033 | 0.764 ± 0.035 | 0.838 ± 0.021 |
| NN_NN | 0.766 ± 0.023 | 0.777 ± 0.019 | 0.819 ± 0.017 |
| NN_VCDN | 0.775 ± 0.026 | 0.790 ± 0.018 | 0.843 ± 0.021 |
| MOGONET_NN (Ours) | 0.804 ± 0.016 | 0.808 ± 0.010 | 0.858 ± 0.024 |
| MOGONET (Ours) | 0.815 ± 0.023 | 0.821 ± 0.022 | 0.874 ± 0.012 |
Classification results on BRCA dataset.
| Method | ACC | F1_weighted | F1_macro |
|---|---|---|---|
| KNN | 0.742 ± 0.024 | 0.730 ± 0.023 | 0.682 ± 0.025 |
| SVM | 0.729 ± 0.018 | 0.702 ± 0.015 | 0.640 ± 0.017 |
| Lasso | 0.732 ± 0.012 | 0.698 ± 0.015 | 0.642 ± 0.026 |
| RF | 0.754 ± 0.009 | 0.733 ± 0.010 | 0.649 ± 0.013 |
| XGBoost | 0.781 ± 0.008 | 0.764 ± 0.010 | 0.701 ± 0.017 |
| NN | 0.754 ± 0.028 | 0.740 ± 0.034 | 0.668 ± 0.047 |
| GRridge | 0.745 ± 0.016 | 0.726 ± 0.019 | 0.656 ± 0.025 |
| block PLSDA | 0.642 ± 0.009 | 0.534 ± 0.014 | 0.369 ± 0.017 |
| block sPLSDA | 0.639 ± 0.008 | 0.522 ± 0.016 | 0.351 ± 0.022 |
| NN_NN | 0.796 ± 0.012 | 0.784 ± 0.014 | 0.723 ± 0.018 |
| NN_VCDN | 0.792 ± 0.010 | 0.781 ± 0.006 | 0.721 ± 0.018 |
| MOGONET_NN (Ours) | 0.805 ± 0.017 | 0.782 ± 0.030 | 0.737 ± 0.038 |
| MOGONET (Ours) | 0.829 ± 0.018 | 0.825 ± 0.016 | 0.774 ± 0.017 |
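The BRCA classes are imbalanced (46 HER2-enriched vs. 436 Luminal A samples), which is why the table reports both a weighted and a macro-averaged F1 score. The sketch below shows how the two averages differ, using the standard definitions; the toy labels are made up for illustration and are not taken from the dataset.

```python
from collections import Counter


def per_class_f1(y_true, y_pred, label):
    """F1 score for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    labels = sorted(set(y_true))
    return sum(per_class_f1(y_true, y_pred, c) for c in labels) / len(labels)


def f1_weighted(y_true, y_pred):
    """Mean of per-class F1, weighted by the number of true samples per class."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(n * per_class_f1(y_true, y_pred, c) for c, n in support.items()) / total


# Made-up toy labels: a large class predicted well, a small class predicted poorly.
y_true = ["LumA"] * 8 + ["HER2"] * 2
y_pred = ["LumA"] * 8 + ["LumA", "HER2"]

macro = f1_macro(y_true, y_pred)
weighted = f1_weighted(y_true, y_pred)
print(f"macro={macro:.3f} weighted={weighted:.3f}")
```

In this toy case the weighted score is higher than the macro score, because the macro average gives the poorly predicted minority class the same weight as the majority class. The same pattern appears in the table, where F1_macro is lower than F1_weighted for every method.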
Fig. 2 Performance comparison of multi-omics data classification via MOGONET and single-omics data classification via GCN (n = 5 experiments for each model).
a Results of the ROSMAP dataset. b Results of the LGG dataset. c Results of the BRCA dataset. Means of evaluation metrics with standard deviations from different experiments are shown in the figure, where the error bar represents plus/minus one standard deviation. mRNA, meth, and miRNA refer to single-omics data classification via GCN with mRNA expression data, DNA methylation data, and miRNA expression data, respectively. mRNA + meth, mRNA + miRNA, and meth + miRNA refer to classification with two types of omics data. mRNA + meth + miRNA refers to classification with three types of omics data. Source data are provided as a Source Data file.
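The "mean ± standard deviation" entries in the tables and the error bars in this figure summarize repeated experiments. A minimal sketch of that summary follows, assuming the sample standard deviation (n - 1 in the denominator); the source does not state which estimator was used, and the five accuracy values are hypothetical.

```python
from statistics import mean, stdev


def summarize(scores):
    """Format repeated-run scores as 'mean ± standard deviation'.

    Uses the sample standard deviation (statistics.stdev); this choice is
    an assumption, since the estimator is not stated in the source.
    """
    return f"{mean(scores):.3f} ± {stdev(scores):.3f}"


# Hypothetical accuracies from five runs of one model.
acc_runs = [0.80, 0.82, 0.81, 0.83, 0.79]

summary = summarize(acc_runs)
print(summary)
```

With only five runs, the population and sample estimators differ noticeably (by a factor of about 1.12), so it is worth stating which one is used when comparing against reported values.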
Fig. 3 Performance of MOGONET under different values of hyper-parameter k.
a Results of the ROSMAP dataset. b Results of the BRCA dataset. The dashed lines represent the results of the best-performing existing multi-omics integration methods (GRridge for ROSMAP and XGBoost for BRCA). MOGONET outperformed the best existing methods under different k values. Source data are provided as a Source Data file.
Important omics biomarkers identified by MOGONET in the ROSMAP dataset.
| Omics data type | Biomarkers |
|---|---|
| mRNA expression (8) | |
| DNA methylation (5) | |
| miRNA expression (17) | |
Important omics biomarkers identified by MOGONET in the BRCA dataset.
| Omics data type | Biomarkers |
|---|---|
| mRNA expression (15) | |
| DNA methylation (9) | |
| miRNA expression (6) | |
Classification results on LGG dataset.
| Method | ACC | F1 | AUC |
|---|---|---|---|
| KNN | 0.729 ± 0.034 | 0.738 ± 0.033 | 0.799 ± 0.038 |
| SVM | 0.754 ± 0.046 | 0.757 ± 0.050 | 0.754 ± 0.046 |
| Lasso | 0.761 ± 0.018 | 0.767 ± 0.022 | 0.823 ± 0.027 |
| RF | 0.748 ± 0.012 | 0.742 ± 0.010 | 0.823 ± 0.010 |
| XGBoost | 0.756 ± 0.040 | 0.767 ± 0.032 | 0.840 ± 0.023 |
| NN | 0.737 ± 0.023 | 0.748 ± 0.024 | 0.810 ± 0.037 |
| GRridge | 0.746 ± 0.038 | 0.756 ± 0.036 | 0.826 ± 0.044 |
| block PLSDA | 0.759 ± 0.025 | 0.738 ± 0.031 | 0.825 ± 0.023 |
| block sPLSDA | 0.685 ± 0.027 | 0.662 ± 0.030 | 0.730 ± 0.026 |
| NN_NN | 0.740 ± 0.039 | 0.756 ± 0.036 | 0.824 ± 0.036 |
| NN_VCDN | 0.740 ± 0.030 | 0.771 ± 0.021 | 0.826 ± 0.031 |
| MOGONET_NN (Ours) | 0.804 ± 0.025 | 0.811 ± 0.023 | 0.832 ± 0.029 |
| MOGONET (Ours) | 0.816 ± 0.016 | 0.814 ± 0.014 | 0.840 ± 0.027 |