Chengqian Lu¹, Mengyun Yang¹, Feng Luo², Fang-Xiang Wu³, Min Li¹, Yi Pan⁴, Yaohang Li⁵, Jianxin Wang¹
1. School of Information Science and Engineering, Central South University, Changsha, People's Republic of China.
2. School of Computing, Clemson University, Clemson, SC, USA.
3. Division of Biomedical Engineering, University of Saskatchewan, Saskatchewan, Canada.
4. Department of Computer Science, Georgia State University, Atlanta, GA, USA.
5. Department of Computer Science, Old Dominion University, Norfolk, VA, USA.
Abstract
Motivation: Accumulating evidence indicates that long non-coding RNAs (lncRNAs) play pivotal roles in various biological processes, and mutations and dysregulation of lncRNAs are implicated in many human diseases. Predicting lncRNA-disease associations can therefore aid disease diagnosis and treatment. Although many computational methods have been developed, precisely identifying lncRNA-disease associations, especially for novel lncRNAs, remains challenging. Results: In this study, we propose a method, named SIMCLDA, for predicting potential lncRNA-disease associations based on inductive matrix completion. We compute the Gaussian interaction profile kernel of lncRNAs from known lncRNA-disease interactions, and the functional similarity of diseases from disease-gene and gene-gene ontology associations. We then extract primary feature vectors from the lncRNA Gaussian interaction profile kernel and from the disease functional similarity, respectively, by principal component analysis. For a new lncRNA, we calculate its interaction profile from the interaction profiles of its neighbors. Finally, we complete the association matrix within the inductive matrix completion framework, using the primary feature vectors from the constructed feature matrices. Computational results show that SIMCLDA can predict lncRNA-disease associations with higher accuracy than previous methods. Furthermore, case studies show that SIMCLDA can effectively predict candidate lncRNAs for renal cancer, gastric cancer and prostate cancer. Availability and implementation: https://github.com//bioinfomaticsCSU/SIMCLDA. Supplementary information: Supplementary data are available at Bioinformatics online.
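The steps summarized in the Results can be illustrated with a small NumPy sketch. It is a minimal illustration under our own assumptions, not the SIMCLDA implementation: the helper names `gip_kernel`, `pca_features`, `neighbor_profile`, `imc_objective` and `imc_als` are hypothetical; the kernel bandwidth parameter `gamma_prime=1.0` is an assumed default; the disease side uses a GIP kernel as a stand-in, because the ontology-based functional similarity needs external gene annotations; the similarities used to find a new lncRNA's neighbors are assumed to be supplied from elsewhere; and the inductive matrix completion model Y ≈ X W Hᵀ Zᵀ is fitted here by alternating ridge regression on the fully observed 0/1 matrix, which may differ from the solver used in the paper.

```python
import numpy as np


def gip_kernel(Y, gamma_prime=1.0):
    """Gaussian interaction profile (GIP) kernel between the rows of Y.

    Row i of Y is the binary interaction profile of entity i. The bandwidth
    is gamma_prime divided by the mean squared profile norm (gamma_prime=1
    is an assumed default).
    """
    sq_norms = np.sum(Y ** 2, axis=1)
    mean_sq = sq_norms.mean()
    gamma = gamma_prime / mean_sq if mean_sq > 0 else 0.0
    # Pairwise squared Euclidean distances between profiles
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))


def pca_features(S, k):
    """Project the rows of a similarity matrix onto its top-k principal axes."""
    centered = S - S.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:k].T


def neighbor_profile(Y, sims, k):
    """Hypothetical helper: profile of a new lncRNA as the similarity-weighted
    mean of the profiles of its k most similar known lncRNAs.

    `sims` (non-negative similarities to each known lncRNA) is assumed to come
    from a source other than Y, since a new lncRNA has no known associations.
    """
    idx = np.argsort(sims)[::-1][:k]
    w = sims[idx]
    if w.sum() == 0:
        return np.zeros(Y.shape[1])
    return (w @ Y[idx]) / w.sum()


def imc_objective(Y, X, Z, W, H, lam):
    """Squared reconstruction error of Y ~ X W H^T Z^T plus ridge penalties."""
    R = Y - X @ W @ H.T @ Z.T
    return np.sum(R ** 2) + lam * (np.sum(W ** 2) + np.sum(H ** 2))


def imc_als(Y, X, Z, rank, lam=0.1, n_iter=20, seed=0):
    """Fit Y ~ X W H^T Z^T by alternating ridge regression (assumed solver).

    Each half-step minimises imc_objective exactly over W or H, so the
    objective never increases.
    """
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((X.shape[1], rank))
    H = 0.1 * rng.standard_normal((Z.shape[1], rank))
    y = Y.flatten(order="F")     # column-major vec(Y)
    yt = Y.T.flatten(order="F")  # column-major vec(Y^T)

    def ridge(M, b):
        A = M.T @ M + lam * np.eye(M.shape[1])
        return np.linalg.solve(A, M.T @ b)

    for _ in range(n_iter):
        # vec(X W B^T) = (B kron X) vec(W), with B = Z H
        W = ridge(np.kron(Z @ H, X), y).reshape(W.shape, order="F")
        # vec(Z H A^T) = (A kron Z) vec(H), with A = X W, fitting Y^T
        H = ridge(np.kron(X @ W, Z), yt).reshape(H.shape, order="F")
    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Y = (rng.random((30, 20)) < 0.2).astype(float)  # toy lncRNA x disease matrix
    X = pca_features(gip_kernel(Y), 8)     # lncRNA features
    Z = pca_features(gip_kernel(Y.T), 8)   # disease features (GIP stand-in)
    W, H = imc_als(Y, X, Z, rank=4)
    scores = X @ W @ H.T @ Z.T             # predicted association scores
    print(scores.shape)
```

In this sketch, the entries of `scores` at pairs with no known association would be ranked to propose candidates. Because the model is inductive, a new lncRNA only needs a feature row in `X` (for example, derived from a profile estimated with `neighbor_profile`); `W` and `H` do not have to be refitted.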