Dipan Shaw (1), Hao Chen (2), Minzhu Xie (3), Tao Jiang (4, 5)
1. Department of Computer Science and Engineering, University of California, Riverside, CA, 92521, USA. dshaw003@ucr.edu.
2. Department of Computer Science and Engineering, University of California, Riverside, CA, 92521, USA.
3. College of Information Science and Engineering, Hunan Normal University, Changsha, China. xieminzhu@hotmail.com.
4. Department of Computer Science and Engineering, University of California, Riverside, CA, 92521, USA. jiang@cs.ucr.edu.
5. Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China. jiang@cs.ucr.edu.
Abstract
BACKGROUND: Long non-coding RNAs (lncRNAs) regulate diverse biological processes via interactions with proteins. Because experimental methods for identifying these interactions are expensive and time-consuming, many computational methods have been proposed. Although these computational methods have achieved promising prediction performance, they neglect the fact that a gene may encode multiple protein isoforms and that different isoforms of the same gene may interact differently with the same lncRNA.

RESULTS: In this study, we propose a novel method, DeepLPI, for predicting the interactions between lncRNAs and protein isoforms. Our method uses sequence and structure data to extract intrinsic features and expression data to extract topological features. To combine these different data, we adopt a hybrid framework that integrates a multimodal deep learning neural network and a conditional random field. To overcome the lack of known interactions between lncRNAs and protein isoforms, we apply a multiple instance learning (MIL) approach. In our experiment on the human lncRNA-protein interactions in the NPInter v3.0 database, DeepLPI improved the prediction performance by 4.7% in terms of AUC and 5.9% in terms of AUPRC over the state-of-the-art methods. Our further correlation analyses between interacting lncRNAs and protein isoforms also illustrated that their co-expression information helped predict the interactions. Finally, we give some examples where DeepLPI was able to outperform the other methods in predicting mouse lncRNA-protein interactions and novel human lncRNA-protein interactions.

CONCLUSION: Our results demonstrated that the use of isoforms and MIL contributed significantly to the improvement of performance in predicting lncRNA and protein interactions. We believe that such an approach would find more applications in predicting other functional roles of RNAs and proteins.
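The MIL idea mentioned in the abstract can be illustrated with a minimal sketch. It treats each gene as a "bag" whose "instances" are its protein isoforms: interaction labels are known only at the gene level, so isoform-level scores have to be aggregated before they can be compared with those labels. The sketch assumes the standard MIL max rule (a bag is positive if at least one instance is positive); the abstract does not state how DeepLPI aggregates isoform scores, and all gene names, isoform names, scores, and the 0.5 threshold below are hypothetical.

```python
from typing import Dict, List


def bag_score(instance_scores: List[float]) -> float:
    """Standard MIL max rule (an assumption here, not taken from the paper):
    a bag is scored by its highest-scoring instance."""
    if not instance_scores:
        raise ValueError("a bag needs at least one instance")
    return max(instance_scores)


def gene_level_scores(
    isoform_scores: Dict[str, Dict[str, float]]
) -> Dict[str, float]:
    """Collapse per-isoform scores into one score per gene (bag)."""
    return {
        gene: bag_score(list(scores.values()))
        for gene, scores in isoform_scores.items()
    }


# Hypothetical isoform-level interaction scores against a single lncRNA.
isoform_scores = {
    "GENE_A": {"GENE_A-201": 0.12, "GENE_A-202": 0.91},
    "GENE_B": {"GENE_B-201": 0.08, "GENE_B-202": 0.15},
}

gene_scores = gene_level_scores(isoform_scores)

# Hypothetical decision threshold for calling a gene-level interaction.
predicted = {gene: score >= 0.5 for gene, score in gene_scores.items()}
print(gene_scores)
print(predicted)
```

Under this rule, GENE_A is called as interacting because one of its isoforms scores highly, even though its other isoform does not. That is the situation the abstract describes, where different isoforms of the same gene may interact differently with the same lncRNA.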