Xiaoyong Pan, Yi Fang, Xianfeng Li, Yang Yang, Hong-Bin Shen.
Abstract
BACKGROUND: RNA-binding proteins (RBPs) play crucial roles in various biological processes. Deep learning-based methods have proven powerful for predicting RBP binding sites on RNAs. However, training deep learning models is time-consuming and computationally intensive.
Keywords: Circular RNAs; Deep learning; Linear RNAs; RNA-binding proteins
Year: 2020 PMID: 33297946 PMCID: PMC7724624 DOI: 10.1186/s12864-020-07291-6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Details of the training and independent test sets. Each RBP has one training set and one test set; the numbers shown are averages across all RBPs.
| RNA type | # of RBPs | Positive data of each RBP | Negative data of each RBP |
|---|---|---|---|
| Linear RNAs | 154 | Training: 44,119 Independent test: 11,030 | Training: 44,119 Independent test: 11,030 |
| Circular RNAs | 37 | Training: 3,680 Independent test: 920 | Training: 3,680 Independent test: 920 |
Fig. 1 The workflow of the RBPsuite webserver. RBPsuite first breaks the full-length sequence into segments of 101 nucleotides. For linear RNAs, the binding scores of individual segments are calculated by iDeepS. For circRNAs, the binding scores of individual segments are calculated by CRIP. The output page gives the binding score for each segment, the motifs identified on the segment, and the score distribution of RBP binding sites within the input sequence.
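The segmentation step in this workflow can be illustrated with a minimal sketch. The caption only states that the input is split into 101-nucleotide segments; the non-overlapping step, the `N` padding of the final segment, and the function name `segment_sequence` are assumptions made for illustration and are not taken from RBPsuite itself.

```python
from typing import List, Tuple


def segment_sequence(
    seq: str,
    window: int = 101,  # segment length stated in the workflow caption
    step: int = 101,    # ASSUMPTION: non-overlapping windows
    pad: str = "N",     # ASSUMPTION: short final segment is padded with 'N'
) -> List[Tuple[int, str]]:
    """Split a sequence into fixed-length segments.

    Returns a list of (start_offset, segment) pairs, where every segment
    has exactly `window` characters.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if not seq:
        return []

    segments = []
    for start in range(0, len(seq), step):
        chunk = seq[start:start + window]
        # Pad the trailing chunk so downstream models see a fixed length.
        if len(chunk) < window:
            chunk += pad * (window - len(chunk))
        segments.append((start, chunk))
    return segments


# Example: a 240 nt toy sequence is covered by three 101 nt segments.
segments = segment_sequence("ACGU" * 60)
for start, segment in segments:
    print(start, len(segment))
```

Each segment would then be scored independently, by iDeepS for linear RNAs or by CRIP for circRNAs, which is why the results are reported per segment.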
Fig. 2 The AUCs of the updated iDeepS for linear RNAs on 154 RBPs
The expected runtime of RBPsuite when predicting the binding sites of a specific RBP on a linear RNA and on a circRNA, for sequences of different lengths
| RNA type | Sequence length (nt) | Time (s) |
|---|---|---|
| Linear RNA | 1,000 | 5.62 |
| Linear RNA | 10,000 | 15.43 |
| Linear RNA | 100,000 | 115.59 |
| circRNA | 1,000 | 9.07 |
| circRNA | 10,000 | 9.79 |
| circRNA | 100,000 | 20.26 |
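To make the scaling in the runtime table easier to read, the short sketch below converts each reported time into milliseconds per nucleotide. The figures are copied from the table; the dictionary layout and the name `ms_per_nt` are illustrative choices only.

```python
# Reported runtimes in seconds, keyed by RNA type and sequence length (nt).
runtimes = {
    "Linear RNA": {1_000: 5.62, 10_000: 15.43, 100_000: 115.59},
    "circRNA": {1_000: 9.07, 10_000: 9.79, 100_000: 20.26},
}

# Per-nucleotide cost in milliseconds for every table entry.
ms_per_nt = {
    rna_type: {length: seconds / length * 1000 for length, seconds in rows.items()}
    for rna_type, rows in runtimes.items()
}

for rna_type, rows in ms_per_nt.items():
    for length, cost in rows.items():
        print(f"{rna_type:10s} {length:>7d} nt  {cost:.4f} ms/nt")
```

The per-nucleotide cost falls as the sequence gets longer for both RNA types. One plausible reading is that a fixed startup cost dominates short inputs, although the table alone does not establish the cause.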
Fig. 3 The output of RBPsuite for the general model. Clicking a protein of interest shows the detailed results for that protein. In the table, the detected motif on the predicted binding site is marked in red
Fig. 4 The results of RBPsuite for predicting AGO2 binding sites on hsa_circ_0054654. A) The 101 nt segments of hsa_circ_0054654 with a binding score greater than 0.5. B) The score distribution of the 18 segments from hsa_circ_0054654, where the star marks the verified binding sites derived from CLIP-seq read peaks