Haijie Liu, Jiaojiao Guan, He Li, Zhijie Bao, Qingmei Wang, Xun Luo, Hansheng Xue.
Abstract
Multiple sclerosis (MS) is an autoimmune disease whose disease-related genes are difficult to pinpoint. Identifying these genes effectively would help improve the diagnosis and treatment of MS. Current methods for identifying disease-related genes rely mainly on the guilt-by-association hypothesis and pay little attention to the global topological information of the whole protein-protein interaction (PPI) network. Meanwhile, network representation learning (NRL) has attracted considerable attention in network analysis because of its promising performance in node representation and many downstream tasks. In this paper, we introduce NRL into the task of disease-related gene prediction and propose a novel framework for identifying the disease-related genes of multiple sclerosis. The proposed framework contains three main steps: capturing the topological structure of the PPI network using NRL-based methods, encoding the learned features into a low-dimensional space using a stacked autoencoder, and training a support vector machine (SVM) classifier to predict disease-related genes. Compared with three state-of-the-art algorithms, our proposed framework shows superior performance on the task of predicting disease-related genes of multiple sclerosis.
Keywords: PPI network; deep learning; disease gene prediction; multiple sclerosis; network embedding
Year: 2020 PMID: 32373160 PMCID: PMC7186413 DOI: 10.3389/fgene.2020.00328
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. The workflow of the proposed NRL-based framework. The framework contains three main parts: (A) learning the topological structure of the protein-protein interaction network, (B) transforming network embedding features into low-dimensional space, and (C) training the support vector machine classifier to predict disease-related genes.
Figure 2. (A) Overview of DeepWalk. It consists of three main parts: random walk generation, representation learning, and hierarchical softmax. This figure was extracted from the original paper. (B) Two types of search strategies from node 5, BFS and DFS. (C) The random walk procedure in node2vec.
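The random walk procedure named in panel (C) can be sketched in a few lines. The following is a minimal illustration of the standard second-order node2vec walk, not the authors' implementation: the toy graph, the function names `transition_weights` and `node2vec_walk`, and the parameter values are all assumptions made for the example. A step back to the previous node is weighted 1/p, a step to a node that is also a neighbor of the previous node is weighted 1, and a step farther away is weighted 1/q. With p = q = 1 the walk reduces to the uniform random walk used by DeepWalk.

```python
import random

# Hypothetical toy graph (adjacency sets); not taken from the paper's PPI network.
TOY_GRAPH = {
    1: {2, 3},
    2: {1, 3, 4},
    3: {1, 2},
    4: {2, 5},
    5: {4},
}


def transition_weights(graph, prev, curr, p, q):
    """Unnormalized node2vec weights for leaving `curr` after arriving from `prev`."""
    weights = {}
    for nxt in graph[curr]:
        if nxt == prev:
            # Distance 0 from the previous node: a return step.
            weights[nxt] = 1.0 / p
        elif nxt in graph[prev]:
            # Distance 1 from the previous node: stays local (BFS-like).
            weights[nxt] = 1.0
        else:
            # Distance 2 from the previous node: moves outward (DFS-like).
            weights[nxt] = 1.0 / q
    return weights


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Generate one biased walk containing at most `length` nodes."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        curr = walk[-1]
        neighbors = sorted(graph[curr])
        if not neighbors:
            break  # dead end: stop early
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        w = transition_weights(graph, walk[-2], curr, p, q)
        step = rng.choices(neighbors, weights=[w[n] for n in neighbors], k=1)[0]
        walk.append(step)
    return walk


if __name__ == "__main__":
    print(node2vec_walk(TOY_GRAPH, start=1, length=10, p=2.0, q=0.5,
                        rng=random.Random(0)))
```

In a full pipeline, many such walks per node would be fed to a skip-gram model to learn the embeddings; that stage is omitted here.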
The experimental results of NRL-based methods and other baselines.
| ED | 0.6032 (0.0165) | 0.5933 (0.0204) | 0.6439 (0.0163) | 0.6356 (0.0216) |
| SPL | 0.6136 (0.0296) | 0.6033 (0.0198) | 0.6703 (0.0205) | 0.6531 (0.0208) |
| RWR | 0.5312 (0.0113) | 0.5203 (0.0305) | 0.5431 (0.0195) | 0.5321 (0.0233) |
| LINE-SAE-SVM | 0.5527 (0.0102) | 0.5403 (0.0218) | 0.5838 (0.0106) | 0.5716 (0.0198) |
| node2vec-SAE-SVM | 0.7472 (0.0283) | |||
| DeepWalk-SAE-SVM | 0.6941 (0.0288) | 0.6914 (0.0315) | 0.7554 (0.0204) |
The bold values indicate the best performance.
Figure 3. Accuracy and AUPRC values of three network representation learning algorithms with four different numbers of dimensions. The x-axis represents three different methods. The y-axis represents the values of Accuracy (left) and AUROC (right).
Figure 4. Accuracy, F1, AUROC, and AUPRC values of three network representation learning algorithms with four different numbers of dimensions and different autoencoder structures. The x-axis represents four different evaluation metrics. The y-axis represents the value of the evaluation metric.
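For readers unfamiliar with the evaluation metrics used in these results (accuracy, F1, and the areas under the ROC and precision-recall curves), the sketch below shows one common way to compute each from binary labels and classifier scores. It is an illustration under stated assumptions, not the authors' evaluation code: the labels and scores are made up, the 0.5 decision threshold is arbitrary, and AUPRC is estimated here by average precision, which may differ from the estimator used in the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """AUPRC estimate: mean of the precision values at the rank of each positive."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Made-up example data: 1 = disease-related gene, 0 = other gene.
Y_TRUE = [1, 0, 1, 1, 0, 0]
SCORES = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
Y_PRED = [1 if s >= 0.5 else 0 for s in SCORES]  # arbitrary 0.5 threshold

if __name__ == "__main__":
    print("Accuracy:", round(accuracy(Y_TRUE, Y_PRED), 4))
    print("F1:      ", round(f1_score(Y_TRUE, Y_PRED), 4))
    print("AUROC:   ", round(auroc(Y_TRUE, SCORES), 4))
    print("AUPRC:   ", round(average_precision(Y_TRUE, SCORES), 4))
```

Accuracy and F1 depend on the chosen threshold, whereas AUROC and AUPRC are computed from the score ranking alone.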
The experimental results of NRL-based methods with different classifiers.
| Logistic Regression | LINE | 0.5272 (0.0131) | 0.5172 (0.0125) | 0.5596 (0.0138) | 0.5391 (0.0248) |
| Logistic Regression | node2vec | 0.6483 (0.0163) | 0.6483 (0.0163) | 0.6899 (0.0236) | 0.6409 (0.0208) |
| Logistic Regression | DeepWalk | 0.5793 (0.0250) | 0.5793 (0.0150) | 0.6658 (0.0216) | 0.6153 (0.0200) |
| Random Forest | LINE | 0.6176 (0.0188) | 0.6276 (0.0188) | 0.6208 (0.0216) | 0.6057 (0.0263) |
| Random Forest | node2vec | 0.7172 (0.0117) | 0.7012 (0.0217) | 0.7400 (0.0126) | 0.7191 (0.0203) |
| Random Forest | DeepWalk | 0.6959 (0.0215) | 0.6759 (0.0163) | 0.7336 (0.0185) | 0.7008 (0.0202) |
Figure 5. AUROC with different parameter combinations of p and q in the node2vec algorithm. The x-axis represents different parameter combinations. The y-axis represents the value of AUROC.
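To give a feel for what varying p and q does, the sketch below tabulates how the two parameters shift the step probabilities of a node2vec walker. In node2vec, a return step is weighted 1/p, a step to a common neighbor of the previous node is weighted 1, and a step farther out is weighted 1/q. The grid values and the function name `step_probabilities` are hypothetical; they are not the combinations evaluated in the figure. The example assumes, for simplicity, one candidate neighbor of each kind.

```python
from itertools import product


def step_probabilities(p, q, n_return=1, n_local=1, n_outward=1):
    """Normalized chance of a return, local, or outward step.

    n_return, n_local and n_outward count the candidate neighbors of each
    kind; the defaults assume one of each (an illustrative simplification).
    """
    w_return = n_return / p      # back to the previous node
    w_local = float(n_local)     # to a common neighbor of the previous node
    w_outward = n_outward / q    # two hops away from the previous node
    total = w_return + w_local + w_outward
    return w_return / total, w_local / total, w_outward / total


# Hypothetical grid; NOT the parameter values used in the paper.
GRID = [0.25, 1.0, 4.0]

if __name__ == "__main__":
    for p, q in product(GRID, repeat=2):
        ret, local, out = step_probabilities(p, q)
        print(f"p={p:<5} q={q:<5} return={ret:.2f} local={local:.2f} outward={out:.2f}")
```

Small p makes the walker backtrack more often, and small q pushes it outward in a DFS-like manner; large q keeps it near the starting neighborhood, which resembles BFS.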