Jidong Zhang, Bo Liu, Zhihan Wang, Klaus Lehnert, Mark Gahegan.
Abstract
BACKGROUND: Traditional biological experiments for analyzing the binding sites of RNA-binding proteins (RBPs) are laborious, and replacing them with an efficient computational approach has long been a challenging task. RBPs play a vital role in post-transcriptional control. Identifying RBP binding sites is a key step in dissecting the essential mechanisms by which RBPs regulate genes through the control of splicing, stability, localization and translation. Traditional methods for detecting RBP binding sites are time-consuming and computationally intensive. Recently, computational methods have been incorporated into RBP research. Nevertheless, many of them rely not only on RNA sequence data but also on additional data, such as RNA secondary structure, to improve prediction performance, which requires preliminary work to prepare a learnable representation of the structural data.
Keywords: Bioinformatics; Convolutional neural network; Graph convolution network; RNA-binding protein
Year: 2022 PMID: 35768792 PMCID: PMC9241231 DOI: 10.1186/s12859-022-04798-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 The structure of DeepPN. The RBP sequence is first one-hot encoded. It then enters the main part of the network for hidden feature extraction, and the final result is produced by three fully connected layers
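The one-hot step named in the caption can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the helper name `one_hot_encode`, the `ACGU` column order, the mapping of `T` to `U`, and the all-zero row for unknown symbols are all assumptions.

```python
# Assumed column order for the encoding; the paper does not state one here.
ALPHABET = "ACGU"


def one_hot_encode(sequence):
    """Return one 4-element row per nucleotide (hypothetical helper)."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    encoded = []
    # Upper-case the input and treat DNA-style T as U (an assumption).
    for base in sequence.upper().replace("T", "U"):
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        # Unknown symbols such as N are left as all-zero rows (an assumption).
        encoded.append(row)
    return encoded


for row in one_hot_encode("ACGUN"):
    print(row)
```

Each nucleotide needs only a single dictionary lookup, so the cost of the encoding grows linearly with sequence length.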
Time spent in processing data for one-hot, k-gram and k-mer (seconds)
| RBPs | One-hot (s) | k-gram (s) | k-mer (s) |
|---|---|---|---|
| C17ORF85 PAR-CLIP | 1.21 | 166.94 | 56.26 |
| CAPRIN1 PAR-CLIP | 4.43 | 698.55 | 236.86 |
| C22ORF28 PAR-CLIP | 5.17 | 776.85 | 264.13 |
| ALKBH5 PAR-CLIP | 0.72 | 108.97 | 36.60 |
| ELAVL1 HITS-CLIP | 4.86 | 730.46 | 247.91 |
| HNRNPC iCLIP | 10.34 | 1707.43 | 544.39 |
| SFRS1 HITS-CLIP | 9.56 | 1646.51 | 557.26 |
| AGO2 HITS-CLIP | 21.79 | 3801.02 | 1310.53 |
| TDP43 iCLIP | 39.82 | 6768.75 | 2195.09 |
| AGO1-4 PAR-CLIP | 17.09 | 2875.98 | 975.49 |
| TIAL1 iCLIP | 21.40 | 3501.71 | 1181.97 |
| TIA1 iCLIP | 7.76 | 1350.94 | 454.44 |
| EWSR1 PAR-CLIP | 8.70 | 1373.17 | 467.09 |
| ELAVL1 PAR-CLIP(A) | 11.60 | 2005.25 | 677.86 |
| ELAVL1 PAR-CLIP(B) | 5.19 | 847.13 | 288.93 |
| FUS PAR-CLIP | 18.47 | 2863.57 | 994.37 |
| PUM2 PAR-CLIP | 5.65 | 904.35 | 307.65 |
| IGF2BP1-3 PAR-CLIP | 4.61 | 696.19 | 234.87 |
| MOV10 PAR-CLIP | 7.79 | 1214.91 | 429.22 |
| ELAVL1 PAR-CLIP(C) | 58.01 | 9836.02 | 3327.17 |
| ZC3H7B PAR-CLIP | 12.16 | 1955.89 | 656.31 |
| PTB HITS-CLIP | 24.31 | 3705.75 | 1285.36 |
| TAF15 PAR-CLIP | 4.78 | 702.27 | 237.64 |
| QKI PAR-CLIP | 6.17 | 982.99 | 328.93 |
| Sum | | 51221.59 | 17296.34 |
| Average | | 2134.23 | 720.68 |
The shortest total time and average time among the three data processing methods are shown in bold font
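The k-gram and k-mer totals (51221.59 s and 17296.34 s) can be compared with the one-hot column by summing its per-dataset rows. The sketch below does that arithmetic; the variable names are illustrative, and the one-hot total and average are recomputed from the listed rows rather than quoted from the paper.

```python
# One-hot times in seconds, copied row by row from the table above.
one_hot_seconds = [
    1.21, 4.43, 5.17, 0.72, 4.86, 10.34, 9.56, 21.79,
    39.82, 17.09, 21.40, 7.76, 8.70, 11.60, 5.19, 18.47,
    5.65, 4.61, 7.79, 58.01, 12.16, 24.31, 4.78, 6.17,
]

# Totals reported in the Sum row for the other two methods.
KGRAM_TOTAL = 51221.59
KMER_TOTAL = 17296.34

one_hot_total = round(sum(one_hot_seconds), 2)
one_hot_average = round(one_hot_total / len(one_hot_seconds), 2)

print(one_hot_total)    # 311.59
print(one_hot_average)  # 12.98
print(f"k-gram / one-hot: {KGRAM_TOTAL / one_hot_total:.1f}x")
print(f"k-mer / one-hot:  {KMER_TOTAL / one_hot_total:.1f}x")
```

On these rows, one-hot processing is faster in total than k-gram processing by roughly two orders of magnitude.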
The number of total samples including positive and negative samples in each dataset
| RBP | Samples | RBP | Samples |
|---|---|---|---|
| C17ORF85 PAR-CLIP | 3754 | EWSR1 PAR-CLIP | 31649 |
| CAPRIN1 PAR-CLIP | 16041 | ELAVL1 PAR-CLIP(A) | 51249 |
| C22ORF28 PAR-CLIP | 18505 | ELAVL1 PAR-CLIP(B) | 18702 |
| ALKBH5 PAR-CLIP | 2410 | FUS PAR-CLIP | 66061 |
| ELAVL1 HITS-CLIP | 17031 | PUM2 PAR-CLIP | 17343 |
| HNRNPC iCLIP | 41266 | IGF2BP1-3 PAR-CLIP | 15377 |
| SFRS1 HITS-CLIP | 36633 | MOV10 PAR-CLIP | 26780 |
| AGO2 HITS-CLIP | 92346 | ELAVL1 PAR-CLIP(C) | 238888 |
| TDP43 iCLIP | 167110 | ZC3H7B PAR-CLIP | 40980 |
| AGO1-4 PAR-CLIP | 68212 | PTB HITS-CLIP | 88274 |
| TIAL1 iCLIP | 78984 | TAF15 PAR-CLIP | 13904 |
| TIA1 iCLIP | 34184 | QKI PAR-CLIP | 19418 |
Fig. 2 The accuracy and loss without early stopping on C17ORF85 PAR-CLIP, CAPRIN1 PAR-CLIP and SFRS1 HITS-CLIP
Fig. 3 The test accuracy on all the RBP datasets for DeepPN and ChebNet (A). The test accuracy on large datasets is much higher than on small datasets for both DeepPN and ChebNet (B). The distribution of the test accuracy for DeepPN and ChebNet (C)
The AUC results for each method
| RBP | DeepPN | GraphProt | Deepnet-rbp | iDeepV |
|---|---|---|---|---|
| C17ORF85 PAR-CLIP | 0.800 | 0.820 | 0.740 | |
| CAPRIN1 PAR-CLIP | 0.855 | 0.834 | 0.824 | |
| C22ORF28 PAR-CLIP | 0.785 | 0.751 | 0.792 | |
| ALKBH5 PAR-CLIP | 0.660 | 0.680 | 0.643 | |
| ELAVL1 HITS-CLIP | 0.955 | 0.966 | 0.966 | |
| HNRNPC iCLIP | 0.977 | 0.952 | 0.962 | |
| SFRS1 HITS-CLIP | 0.898 | 0.931 | 0.905 | |
| AGO2 HITS-CLIP | 0.868 | 0.765 | 0.809 | |
| TDP43 iCLIP | 0.874 | 0.876 | 0.935 | |
| AGO1-4 PAR-CLIP | 0.912 | 0.895 | 0.881 | |
| TIAL1 iCLIP | 0.926 | 0.833 | 0.870 | |
| TIA1 iCLIP | 0.928 | 0.861 | 0.891 | |
| EWSR1 PAR-CLIP | 0.954 | 0.935 | 0.962 | |
| ELAVL1 PAR-CLIP(A) | 0.967 | 0.959 | 0.966 | |
| ELAVL1 PAR-CLIP(B) | 0.935 | 0.961 | 0.962 | |
| FUS PAR-CLIP | 0.977 | 0.968 | 0.976 | |
| PUM2 PAR-CLIP | 0.952 | 0.954 | 0.965 | |
| IGF2BP1-3 PAR-CLIP | 0.889 | 0.879 | 0.923 | |
| MOV10 PAR-CLIP | 0.863 | 0.854 | 0.896 | |
| ELAVL1 PAR-CLIP(C) | 0.991 | 0.994 | 0.990 | |
| ZC3H7B PAR-CLIP | 0.820 | 0.796 | 0.883 | |
| PTB HITS-CLIP | 0.938 | 0.937 | 0.936 | |
| TAF15 PAR-CLIP | 0.974 | 0.970 | 0.978 | |
| QKI PAR-CLIP | 0.975 | 0.957 | 0.965 | |
| Average | 0.887 | 0.903 | 0.913 | |
The best performance is marked in bold
The AUC results for GraphProt, Deepnet-RBP and iDeepV are taken from original papers
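For readers unfamiliar with the metric, the sketch below shows one common way to compute an AUC value from binary labels and predicted scores: the fraction of positive-negative pairs in which the positive sample receives the higher score, with ties counted as half. It is an illustration only, not the evaluation code used for any method in the table; the function name `roc_auc` and the toy data are assumptions.

```python
def roc_auc(labels, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy example: 1 marks a binding site, 0 a non-binding site.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))  # 0.75
```

The double loop is quadratic in the number of samples, which is acceptable for a small illustration; on datasets of the sizes listed above, a rank-based implementation from an established library would be the more practical choice.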
Fig. 4 The number of AUC scores falling in each range for each method