R Nagarajan, M Michael Gromiha.
Abstract
Protein-RNA complexes play key roles in several cellular processes through the interactions of amino acids with RNA. To understand the recognition mechanism, it is important to identify the specific amino acids involved in RNA binding. Various computational methods have been developed for predicting RNA-binding residues from protein sequence. However, their performance depends mainly on the training dataset, the features selected to build the model, and the learning capacity of the model. Hence, it is important to reveal the correspondence between the performance of the methods and the properties of RNA-binding proteins (RBPs). In this work, we collected all available methods for predicting RNA-binding residues and evaluated their performance on unbiased, stringent and diverse datasets of RBPs with less than 25% sequence identity, grouped by structural class, fold, superfamily, family, protein function, RNA type, RNA strand and RNA conformation. We identified the best methods for each type of RBP, as well as the types of RBPs for which prediction requires further refinement. We also analyzed the performance of these methods on disordered regions, on structures not included in the training datasets, and on recently solved structures. The reliability of prediction is better than that obtained by randomly choosing any method or combination of methods. This approach would be a valuable resource for biologists to choose the best method for their type of RBP when designing experiments, and the tool is freely accessible online at www.iitm.ac.in/bioinfo/RNA-protein/.
Year: 2014 PMID: 24658593 PMCID: PMC3962366 DOI: 10.1371/journal.pone.0091140
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Prediction accuracy (%) of binding sites in different structural classes.
| Method | Average | all-α | all-β | α+β | α/β | Low resolution | Multidomain | Peptides | Small proteins |
|---|---|---|---|---|---|---|---|---|---|
| Distance ≤ 3.5 Å | | | | | | | | | |
| BindN | 66.14 | 62.74 | 63.52 | 62.80 | 64.91 | 72.11 | 65.26 | 66.66 | 71.08 |
| BindN+ | 75.71 | 75.37 | 75.50 | 73.58 | 75.20 | 81.64 | 74.59 | 76.81 | 74.77 |
| NAPS | 64.46 | 63.12 | 60.39 | 60.64 | 63.19 | 64.67 | 60.22 | 82.97 | 60.45 |
| Pprint | 72.70 | 70.40 | 71.12 | 70.13 | 73.10 | 70.55 | 70.10 | 84.78 | 71.35 |
| RNABindR v2.0 | 72.00 | 75.40 | 67.22 | 70.93 | 73.25 | 70.76 | 73.74 | 89.13 | 55.63 |
| RNAProB | 66.53 | 66.05 | 70.13 | 65.61 | 71.58 | 71.69 | 61.43 | 68.12 | 57.59 |
| Distance ≤ 6.0 Å | | | | | | | | | |
| BindN | 60.99 | 59.68 | 59.95 | 59.62 | 60.95 | 69.07 | 59.50 | 57.69 | 61.49 |
| BindN+ | 69.95 | 69.35 | 69.70 | 67.90 | 67.69 | 79.11 | 64.38 | 69.95 | 71.50 |
| NAPS | 60.12 | 59.72 | 57.33 | 57.96 | 58.34 | 60.62 | 57.27 | 70.67 | 59.03 |
| Pprint | 73.46 | 70.71 | 71.23 | 69.89 | 73.51 | 76.90 | 68.04 | 79.09 | 78.28 |
| RNABindR v2.0 | 69.51 | 73.33 | 66.95 | 68.55 | 70.59 | 72.50 | 67.81 | 78.37 | 57.98 |
| RNAProB | 64.54 | 62.52 | 64.88 | 63.36 | 65.06 | 69.39 | 56.97 | 78.37 | 55.76 |
Accuracy = (sensitivity+specificity)/2.
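As an illustration of the Accuracy defined above, the Python sketch below computes it from confusion-matrix counts of binding (positive) and non-binding (negative) residues. The function names and the residue counts are hypothetical, chosen only to show that a predictor which labels every residue as non-binding scores exactly 50%, one reason an averaged measure like this is informative when binding residues are a minority.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Percentage of true RNA-binding residues predicted as binding."""
    return 100.0 * tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Percentage of true non-binding residues predicted as non-binding."""
    return 100.0 * tn / (tn + fp)


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Accuracy = (sensitivity + specificity) / 2, in percent."""
    return (sensitivity(tp, fn) + specificity(tn, fp)) / 2


# Hypothetical protein with 40 binding and 600 non-binding residues.
print(balanced_accuracy(tp=30, fn=10, tn=540, fp=60))  # 82.5
# A predictor that labels every residue as non-binding:
print(balanced_accuracy(tp=0, fn=40, tn=600, fp=0))    # 50.0
```

The sketch assumes at least one binding and one non-binding residue; otherwise sensitivity or specificity is undefined and the functions raise `ZeroDivisionError`.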
Figure 1. Prediction performance of computational methods in various folds, superfamilies and families.
Typical examples of best and least predicted folds, superfamilies and families.
| Fold/superfamily/family | Distance (Å) | Best method | Sensitivity | Specificity | Accuracy1 | Accuracy2 | MCC | Lowest Accuracy2 | Method (lowest) | MCC (lowest) |
|---|---|---|---|---|---|---|---|---|---|---|
| Fold | | | | | | | | | | |
| NSP3 homodimer (1) | ≤ 3.5 | BindN+ | 93.75 | 89.86 | 90.26 | 91.81 | 0.65 | 60.28 | NAPS | 0.14 |
| Ribosomal proteins (2) | ≤ 6.0 | Pprint | 90.62 | 88.73 | 91.40 | 89.68 | 0.79 | 61.60 | RNAProB | 0.32 |
| Superfamily | | | | | | | | | | |
| Rho N-terminal domain (1) | ≤ 3.5 | RNAProB | 100.00 | 97.25 | 97.46 | 98.62 | 0.85 | 49.70 | NAPS | 0 |
| tRNA-binding arm (1) | ≤ 6.0 | RNAProB | 88.89 | 99.72 | 99.46 | 94.31 | 0.89 | 61.39 | NAPS | 0.07 |
| Family | | | | | | | | | | |
| SM motif of SNRNP (1) | ≤ 3.5 | RNAProB | 83.33 | 93.94 | 93.06 | 88.63 | 0.65 | 43.94 | BindN+ | 0 |
| L23p (1) | ≤ 6.0 | Pprint | 88.57 | 95.65 | 92.59 | 92.11 | 0.85 | 62.86 | RNAProB | 0.41 |
Accuracy1 = (TP + TN)/(TP + TN + FP + FN).
Accuracy2 = (sensitivity + specificity)/2.
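The three measures reported in the table can be computed together from confusion-matrix counts. The Python sketch below is a minimal illustration with hypothetical counts and a hypothetical helper name; MCC is the standard Matthews correlation coefficient, and returning 0 when its denominator vanishes is a common convention assumed here, not necessarily the one the authors used.

```python
import math


def metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Accuracy1 and Accuracy2 (in %) and MCC from confusion-matrix counts."""
    sens = 100.0 * tp / (tp + fn)
    spec = 100.0 * tn / (tn + fp)
    acc1 = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    acc2 = (sens + spec) / 2
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: MCC is reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Accuracy1": acc1, "Accuracy2": acc2, "MCC": mcc}


# Hypothetical protein with 40 binding and 600 non-binding residues.
print(metrics(tp=30, fn=10, tn=540, fp=60))
# Predicting every residue as non-binding:
# Accuracy1 = 93.75, Accuracy2 = 50.0, MCC = 0.0
print(metrics(tp=0, fn=40, tn=600, fp=0))
```

The second call shows why Accuracy1 alone can be misleading for this task: a predictor that never calls a residue binding still reaches 93.75% Accuracy1 on these hypothetical counts, while Accuracy2 and MCC expose it as uninformative.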
Figure 2. Prediction performance of computational methods in disordered regions.
Prediction accuracy of binding sites in different RNA types (3.5 Å cutoff).
| RNA type | Best method | Sensitivity | Specificity | Accuracy1 | Accuracy2 | MCC | Lowest Accuracy2 | Method (lowest) | MCC (lowest) |
|---|---|---|---|---|---|---|---|---|---|
| mRNA (7) | RNABindR v2.0 | 77.03 | 80.48 | 80.48 | 78.76 | 0.35 | 56.74 | NAPS | 0.07 |
| Pre-miRNA (2) | BindN | 80.00 | 81.38 | 81.20 | 80.69 | 0.19 | 45.70 | RNAProB | 0 |
| rRNA (54) | BindN+ | 83.01 | 74.65 | 79.08 | 78.83 | 0.52 | 63.62 | NAPS | 0.24 |
| sRNA (1) | RNAProB | 61.90 | 96.81 | 90.43 | 79.36 | 0.66 | 60.81 | BindN | 0.23 |
| siRNA (3) | RNABindR v2.0 | 82.64 | 77.89 | 79.82 | 80.27 | 0.48 | 60.93 | Pprint | 0.19 |
| snRNA (3) | BindN+ | 90.11 | 91.15 | 90.65 | 90.63 | 0.66 | 73.64 | NAPS | 0.20 |
| tRNA (28) | RNAProB | 59.86 | 93.74 | 91.49 | 76.80 | 0.44 | 59.25 | NAPS | 0.09 |
| viral RNA (12) | BindN+ | 49.21 | 83.38 | 81.90 | 66.29 | 0.17 | 57.60 | NAPS | 0.07 |
Accuracy1 = (TP + TN)/(TP + TN + FP + FN).
Accuracy2 = (sensitivity + specificity)/2.
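Writing P and N for the numbers of binding and non-binding residues, Accuracy1 = (TP + TN)/(P + N) = (P·sensitivity + N·specificity)/(P + N), a residue-count-weighted mean of sensitivity and specificity, whereas Accuracy2 is their unweighted mean. The Python check below, a sketch using the RNA-type values as transcribed above, confirms both relations within the table's rounding and prints the gap between the two accuracies.

```python
# (RNA type, sensitivity, specificity, Accuracy1, Accuracy2), from the RNA-type table.
ROWS = [
    ("mRNA", 77.03, 80.48, 80.48, 78.76),
    ("Pre-miRNA", 80.00, 81.38, 81.20, 80.69),
    ("rRNA", 83.01, 74.65, 79.08, 78.83),
    ("sRNA", 61.90, 96.81, 90.43, 79.36),
    ("siRNA", 82.64, 77.89, 79.82, 80.27),
    ("snRNA", 90.11, 91.15, 90.65, 90.63),
    ("tRNA", 59.86, 93.74, 91.49, 76.80),
    ("viral RNA", 49.21, 83.38, 81.90, 66.29),
]

for name, sens, spec, acc1, acc2 in ROWS:
    # Accuracy2 is the plain mean; allow 0.01 for rounding in the table.
    assert abs((sens + spec) / 2 - acc2) <= 0.01, name
    # Accuracy1 is a weighted mean of sensitivity and specificity,
    # so it must lie between them.
    assert min(sens, spec) <= acc1 <= max(sens, spec), name
    print(f"{name:10s} Accuracy1 - Accuracy2 = {acc1 - acc2:+.2f}")
```

The gap is largest for viral RNA, tRNA and sRNA, where sensitivity is lowest. In those rows Accuracy1 sits close to specificity, which under the identity above means non-binding residues make up most of the residues being scored.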
Prediction performance of different methods in two independent datasets.
| Method | Data set 1, ≤3.5 Å: Accuracy1 | Data set 1, ≤3.5 Å: Accuracy2 | Data set 1, ≤3.5 Å: MCC | Data set 1, ≤6.0 Å: Accuracy1 | Data set 1, ≤6.0 Å: Accuracy2 | Data set 1, ≤6.0 Å: MCC | Data set 2, ≤3.5 Å: Accuracy1 | Data set 2, ≤3.5 Å: Accuracy2 | Data set 2, ≤3.5 Å: MCC | Data set 2, ≤6.0 Å: Accuracy1 | Data set 2, ≤6.0 Å: Accuracy2 | Data set 2, ≤6.0 Å: MCC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BindN | 74.88 | 64.00 | 0.23 | 70.35 | 61.26 | 0.21 | 75.49 | 62.78 | 0.21 | 71.26 | 60.98 | 0.20 |
| BindN+ | 79.45 | 70.65 | 0.34 | 77.43 | 66.93 | 0.33 | 78.75 | 68.01 | 0.30 | 76.77 | 65.46 | 0.30 |
| NAPS | 66.29 | 60.89 | 0.17 | 63.58 | 58.36 | 0.15 | 66.61 | 62.80 | 0.18 | 64.55 | 60.61 | 0.16 |
| Pprint | 70.74 | 66.22 | 0.25 | 73.82 | 66.59 | 0.31 | 70.70 | 64.80 | 0.21 | 72.05 | 65.17 | 0.26 |
| RNABindR v2.0 | 65.95 | 68.78 | 0.27 | 71.28 | 67.38 | 0.32 | 65.12 | 66.90 | 0.22 | 70.47 | 66.72 | 0.28 |
| RNAProB | 82.21 | 60.15 | 0.22 | 73.43 | 58.47 | 0.21 | 80.48 | 55.71 | 0.13 | 71.03 | 55.20 | 0.13 |
Data set 1: protein-RNA complexes analyzed in this work that were not used in developing the respective methods.
Data set 2: protein-RNA complexes published since June 2012, after the publication of the analyzed prediction methods.
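Different summary measures can single out different "best" methods on the same data. The Python sketch below, using the independent-dataset values as transcribed above and a hypothetical helper `best_method`, picks the top method for each data set and cutoff by Accuracy1 and by MCC.

```python
# Column layout of the independent-dataset table: for each data set (1, 2)
# and distance cutoff (3.5, 6.0 Å), the columns Accuracy1, Accuracy2, MCC.
COLUMNS = [
    (ds, cutoff, metric)
    for ds in (1, 2)
    for cutoff in (3.5, 6.0)
    for metric in ("Accuracy1", "Accuracy2", "MCC")
]

# Values transcribed from the table, in COLUMNS order.
TABLE = {
    "BindN":         [74.88, 64.00, 0.23, 70.35, 61.26, 0.21, 75.49, 62.78, 0.21, 71.26, 60.98, 0.20],
    "BindN+":        [79.45, 70.65, 0.34, 77.43, 66.93, 0.33, 78.75, 68.01, 0.30, 76.77, 65.46, 0.30],
    "NAPS":          [66.29, 60.89, 0.17, 63.58, 58.36, 0.15, 66.61, 62.80, 0.18, 64.55, 60.61, 0.16],
    "Pprint":        [70.74, 66.22, 0.25, 73.82, 66.59, 0.31, 70.70, 64.80, 0.21, 72.05, 65.17, 0.26],
    "RNABindR v2.0": [65.95, 68.78, 0.27, 71.28, 67.38, 0.32, 65.12, 66.90, 0.22, 70.47, 66.72, 0.28],
    "RNAProB":       [82.21, 60.15, 0.22, 73.43, 58.47, 0.21, 80.48, 55.71, 0.13, 71.03, 55.20, 0.13],
}


def best_method(ds: int, cutoff: float, metric: str) -> str:
    """Return the method with the highest value in one column of the table."""
    i = COLUMNS.index((ds, cutoff, metric))
    return max(TABLE, key=lambda method: TABLE[method][i])


for ds in (1, 2):
    for cutoff in (3.5, 6.0):
        print(f"Data set {ds}, <= {cutoff} Å:",
              "Accuracy1 ->", best_method(ds, cutoff, "Accuracy1"),
              "| MCC ->", best_method(ds, cutoff, "MCC"))
```

BindN+ has the highest MCC in all four settings, while RNAProB has the highest Accuracy1 at ≤3.5 Å on both data sets despite the lowest Accuracy2 there. One plausible reading, consistent with the Accuracy1 identity for imbalanced data, is that RNAProB favors predicting residues as non-binding.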
Comparison between the ensemble method and the best methods on different datasets.
| Data set | Number of subgroups | Subgroups where ensemble is most accurate (≤3.5 Å) | Subgroups where best method is most accurate (≤3.5 Å) | Subgroups where ensemble is most accurate (≤6.0 Å) | Subgroups where best method is most accurate (≤6.0 Å) |
|---|---|---|---|---|---|
| Class | 8 | 2 (76.03) | 6 (77.48) | 0 (69.69) | 8 (74.06) |
| Fold | 90 | 11 (74.09) | 79 (80.93) | 4 (68.69) | 86 (76.87) |
| Superfamily | 100 | 13 (74.08) | 87 (80.94) | 4 (68.68) | 96 (76.85) |
| Family | 126 | 17 (73.67) | 109 (80.67) | 5 (68.15) | 121 (76.63) |
| RNA conformation | 4 | 0 (70.69) | 4 (75.23) | 0 (65.35) | 4 (71.22) |
| RNA strand | 2 | 0 (67.48) | 2 (71.31) | 0 (63.57) | 2 (68.56) |
| RNA type | 8 | 0 (70.61) | 8 (78.95) | 0 (64.48) | 8 (72.48) |
| Protein function | 21 | 1 (68.11) | 20 (75.74) | 0 (62.48) | 21 (70.79) |
Average accuracies (%) are given in parentheses.
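The subgroup counts in the ensemble comparison table can be turned into win rates. The short Python sketch below, using the counts as transcribed (average accuracies omitted), checks that in every row the ensemble and best-method counts add up to the number of subgroups, which suggests no ties were counted separately, and prints the share of subgroups in which the ensemble was the more accurate choice.

```python
# (category, subgroups, ensemble wins at 3.5 Å, best-method wins at 3.5 Å,
#  ensemble wins at 6.0 Å, best-method wins at 6.0 Å), from the table.
ROWS = [
    ("Class", 8, 2, 6, 0, 8),
    ("Fold", 90, 11, 79, 4, 86),
    ("Superfamily", 100, 13, 87, 4, 96),
    ("Family", 126, 17, 109, 5, 121),
    ("RNA conformation", 4, 0, 4, 0, 4),
    ("RNA strand", 2, 0, 2, 0, 2),
    ("RNA type", 8, 0, 8, 0, 8),
    ("Protein function", 21, 1, 20, 0, 21),
]

for name, n, ens35, best35, ens60, best60 in ROWS:
    # Every subgroup is credited to exactly one side, so the counts must add up.
    assert ens35 + best35 == n and ens60 + best60 == n, name
    print(f"{name:17s} ensemble most accurate in {ens35 / n:6.1%} (3.5 Å), "
          f"{ens60 / n:6.1%} (6.0 Å) of subgroups")
```

At the ≤6.0 Å cutoff the ensemble is never the more accurate choice for class, RNA conformation, RNA strand, RNA type or protein function subgroups.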
Figure 3. Web application to provide the best methods based on the type of RBPs.
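The core idea behind recommending a method per RBP type can be sketched as a lookup over the benchmark tables. The Python example below is a hypothetical helper, `recommend_method`, not the interface or selection rule of the web application: it simply returns the method with the highest accuracy in the first block of the structural-class table (Average column omitted).

```python
# Accuracy (%) per structural class, first block of the structural-class table.
CLASSES = ["all-α", "all-β", "α+β", "α/β",
           "Low resolution", "Multidomain", "Peptides", "Small proteins"]
ACCURACY = {
    "BindN":         [62.74, 63.52, 62.80, 64.91, 72.11, 65.26, 66.66, 71.08],
    "BindN+":        [75.37, 75.50, 73.58, 75.20, 81.64, 74.59, 76.81, 74.77],
    "NAPS":          [63.12, 60.39, 60.64, 63.19, 64.67, 60.22, 82.97, 60.45],
    "Pprint":        [70.40, 71.12, 70.13, 73.10, 70.55, 70.10, 84.78, 71.35],
    "RNABindR v2.0": [75.40, 67.22, 70.93, 73.25, 70.76, 73.74, 89.13, 55.63],
    "RNAProB":       [66.05, 70.13, 65.61, 71.58, 71.69, 61.43, 68.12, 57.59],
}


def recommend_method(structural_class: str) -> str:
    """Hypothetical helper: the method with the highest accuracy for one class."""
    i = CLASSES.index(structural_class)
    return max(ACCURACY, key=lambda method: ACCURACY[method][i])


for c in CLASSES:
    print(f"{c:15s} -> {recommend_method(c)}")
```

By this rule BindN+ is chosen for six of the eight classes and RNABindR v2.0 for all-α proteins and peptides. The all-α margin (75.40 versus 75.37) is small enough that it may not be meaningful, so a practical tool would likely weigh several measures rather than a single accuracy column.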