Cangzhi Jia1, Yue Bi1, Jinxiang Chen2,3, André Leier4,5, Fuyi Li2,3, Jiangning Song2,3,6. 1. School of Science, Dalian Maritime University, Dalian 116026, China. 2. Department of Biochemistry and Molecular Biology, Monash Biomedicine Discovery Institute. 3. Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia. 4. Department of Genetics, School of Medicine. 5. Department of Cell, Developmental and Integrative Biology, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, USA. 6. ARC Centre of Excellence for Advanced Molecular Imaging, Monash University, Melbourne, VIC 3800, Australia.
Abstract
MOTIVATION: Different from traditional linear RNAs (containing 5' and 3' ends), circular RNAs (circRNAs) are a special type of RNAs that have a closed ring structure. Accumulating evidence has indicated that circRNAs can directly bind proteins and participate in a myriad of different biological processes. RESULTS: For identifying the interaction of circRNAs with 37 different types of circRNA-binding proteins (RBPs), we develop an ensemble neural network, termed PASSION, which is based on the concatenated artificial neural network (ANN) and hybrid deep neural network frameworks. Specifically, the input of the ANN is the optimal feature subset for each RBP, which has been selected from six types of feature encoding schemes through incremental feature selection and application of the XGBoost algorithm. In turn, the input of the hybrid deep neural network is a stacked codon-based scheme. Benchmarking experiments indicate that the ensemble neural network reaches the average best area under the curve (AUC) of 0.883 across the 37 circRNA datasets when compared with XGBoost, k-nearest neighbor, support vector machine, random forest, logistic regression and Naive Bayes. Moreover, each of the 37 RBP models is extensively tested by performing independent tests, with the varying sequence similarity thresholds of 0.8, 0.7, 0.6 and 0.5, respectively. The corresponding average AUC obtained are 0.883, 0.876, 0.868 and 0.883, respectively, highlighting the effectiveness and robustness of PASSION. Extensive benchmarking experiments demonstrate that PASSION achieves a competitive performance for identifying binding sites between circRNA and RBPs, when compared with several state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION: A user-friendly web server of PASSION is publicly accessible at http://flagship.erc.monash.edu/PASSION/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: Different from traditional linear RNAs (containing 5' and 3' ends), circular RNAs (circRNAs) are a special type of RNAs that have a closed ring structure. Accumulating evidence has indicated that circRNAs can directly bind proteins and participate in a myriad of different biological processes. RESULTS: For identifying the interaction of circRNAs with 37 different types of circRNA-binding proteins (RBPs), we develop an ensemble neural network, termed PASSION, which is based on the concatenated artificial neural network (ANN) and hybrid deep neural network frameworks. Specifically, the input of the ANN is the optimal feature subset for each RBP, which has been selected from six types of feature encoding schemes through incremental feature selection and application of the XGBoost algorithm. In turn, the input of the hybrid deep neural network is a stacked codon-based scheme. Benchmarking experiments indicate that the ensemble neural network reaches the average best area under the curve (AUC) of 0.883 across the 37 circRNA datasets when compared with XGBoost, k-nearest neighbor, support vector machine, random forest, logistic regression and Naive Bayes. Moreover, each of the 37 RBP models is extensively tested by performing independent tests, with the varying sequence similarity thresholds of 0.8, 0.7, 0.6 and 0.5, respectively. The corresponding average AUC obtained are 0.883, 0.876, 0.868 and 0.883, respectively, highlighting the effectiveness and robustness of PASSION. Extensive benchmarking experiments demonstrate that PASSION achieves a competitive performance for identifying binding sites between circRNA and RBPs, when compared with several state-of-the-art methods. AVAILABILITY AND IMPLEMENTATION: A user-friendly web server of PASSION is publicly accessible at http://flagship.erc.monash.edu/PASSION/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.