Ping Luo (1), Yuanyuan Li (1,2), Li-Ping Tian (3), Fang-Xiang Wu (1,4,5).
(1) Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Canada. (2) School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan, China. (3) School of Information, Beijing Wuzi University, Beijing, China. (4) Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, Canada. (5) Department of Computer Science, University of Saskatchewan, Saskatoon, Canada.
Abstract
MOTIVATION: Computationally predicting disease genes helps scientists optimize in-depth experimental validation and accelerates the identification of real disease-associated genes. Modern high-throughput technologies have generated a vast amount of omics data, and integrating these data is expected to improve the accuracy of computational prediction. As an integrative model, a multimodal deep belief net (DBN) can capture cross-modality features from heterogeneous datasets to model a complex system. Studies have shown its power in image classification and tumor subtype prediction. However, multimodal DBNs have not been used to predict disease-gene associations.

RESULTS: In this study, we propose a method for predicting disease-gene associations using a multimodal DBN (dgMDL). Specifically, latent representations of protein-protein interaction networks and gene ontology terms are first learned by two DBNs independently. Then, a joint DBN learns cross-modality representations from the two sub-models, taking the concatenation of their latent representations as the multimodal input. Finally, disease-gene associations are predicted from the learned cross-modality representations. The proposed method is compared with two state-of-the-art algorithms using 5-fold cross-validation on a set of curated disease-gene associations. dgMDL achieves an AUC of 0.969, which is superior to that of the competing algorithms. Further analysis of the top-10 unknown disease-gene pairs also demonstrates the ability of dgMDL to predict new disease-gene associations.

AVAILABILITY AND IMPLEMENTATION: Prediction results and a reference implementation of dgMDL in Python are available at https://github.com/luoping1004/dgMDL.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
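The three-stage architecture described in the abstract (two modality-specific DBNs, then a joint DBN over their concatenated outputs) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which is in the repository linked above. The `RBM` class, the `train_dbn` helper, the layer sizes, the learning rate, and the random binary inputs standing in for protein-protein interaction and gene ontology features are all assumptions made for illustration. The sketch trains each layer greedily with one-step contrastive divergence.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli RBM trained with one-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b_v)

    def fit(self, data, epochs=20, lr=0.05):
        n = data.shape[0]
        for _ in range(epochs):
            # Positive phase: hidden activations driven by the data.
            h0 = self.hidden_probs(data)
            h_sample = (self.rng.random(h0.shape) < h0).astype(float)
            # Negative phase: one reconstruction step.
            v1 = self.visible_probs(h_sample)
            h1 = self.hidden_probs(v1)
            # CD-1 parameter updates, averaged over the batch.
            self.W += lr * (data.T @ h0 - v1.T @ h1) / n
            self.b_v += lr * (data - v1).mean(axis=0)
            self.b_h += lr * (h0 - h1).mean(axis=0)
        return self

def train_dbn(data, layer_sizes, rng):
    """Greedy layer-wise pretraining; returns the RBM stack and top-layer features."""
    layers, features = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(features.shape[1], n_hidden, rng).fit(features)
        layers.append(rbm)
        features = rbm.hidden_probs(features)
    return layers, features

rng = np.random.default_rng(0)
n_pairs = 200  # hypothetical number of disease-gene pairs

# Synthetic binary stand-ins for the two modalities (assumption, not real data).
ppi_features = (rng.random((n_pairs, 64)) < 0.2).astype(float)
go_features = (rng.random((n_pairs, 48)) < 0.2).astype(float)

# Stage 1: one DBN per modality, trained independently.
_, ppi_latent = train_dbn(ppi_features, [32, 16], rng)
_, go_latent = train_dbn(go_features, [24, 16], rng)

# Stage 2: the joint DBN takes the concatenated latent representations as input.
joint_input = np.concatenate([ppi_latent, go_latent], axis=1)
_, joint_latent = train_dbn(joint_input, [16, 8], rng)

print(joint_input.shape, joint_latent.shape)
```

Each row of `joint_latent` is a cross-modality representation of one pair. The final prediction step is left out here because the abstract does not specify which classifier operates on these representations.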