Jingna Si, Jing Cui, Jin Cheng, Rongling Wu.
Abstract
Protein-RNA interactions play vital roles in many cellular processes, such as protein synthesis, sequence encoding, RNA transfer, and gene regulation at the transcriptional and post-transcriptional levels. Approximately 6%-8% of all proteins are RNA-binding proteins (RBPs). Identifying these RBPs and their binding residues is a major aim of structural biology. A number of experimental methods have been developed to determine protein-RNA interactions, but they are expensive, time-consuming, and labor-intensive. As an alternative, researchers have developed many computational approaches that predict RBPs and protein-RNA binding sites by combining various machine learning methods with sequence and/or structural features. These approaches fall into three kinds: prediction from protein sequence, prediction from protein structure, and protein-RNA docking. In this paper, we review all existing studies on the prediction of RNA-binding sites, RBPs, and protein-RNA complexes, covering the data sets used in different approaches, the sequence and structural features used in several predictors, the classification of prediction methods, performance comparisons, evaluation methods, and future directions.
Keywords: RNA-binding proteins (RBPs); RNA-binding site; bioinformatics; macromolecular docking; prediction
Year: 2015 PMID: 26540053 PMCID: PMC4661811 DOI: 10.3390/ijms161125952
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Strategies for RNA-binding site and RBP prediction.
Commonly used data sets for RNA-binding site identification.
| ID | Reference | Publication Year | Notes |
|---|---|---|---|
| PRIPU dataset | [ | 2015 | The dataset contains positive and unlabeled examples. This is an innovation because earlier data sets usually include negative samples that are not true negatives; some may even be unknown positives |
| RB344 a | [ | 2015 | 344 RNA-binding proteins, almost entirely non-redundant at 30% sequence identity |
| RB172 | [ | 2014 | 172 protein entries with sequence identity of less than 25% |
| RB75 | [ | 2012 | 75 RNP complexes released between 1 January and 28 April 2011 from PDB database b, non-redundant at 40% sequence identity |
| RB199 | [ | 2011 | Extracted dataset (May 2010) from PDB database. Proteins with >30% sequence identity or structures with resolution worse than 3.5 Å were removed |
| RB164 | [ | 2010 | The data were downloaded from RsiteDB. After removing protein and RNA chains with sequence identity above 25% and 60%, respectively, 205 non-redundant protein–RNA chains in 164 complexes were obtained |
| RB86 | [ | 2008 | 86 RNA-binding protein chains were collected for training and fivefold cross validation |
| RB147 | [ | 2007 | RB109 extended with novel RNA-binding complexes added since 2006 |
| RB109 | [ | 2006 | 109 RNA–protein complexes solved by X-ray crystallography, extracted from the PDB. Proteins with >30% sequence identity or structures with resolution worse than 3.5 Å were removed |
a RB: Abbreviation of RNA-binding dataset; b PDB: Protein Data Bank.
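Several of the data sets above are built by discarding low-resolution structures and then removing redundant chains above a sequence-identity cutoff. The following is a minimal sketch of that kind of filter, not the procedure used by any of the cited studies. The `Chain` record, the `filter_chains` function, and the greedy keep-the-best-resolution strategy are assumptions made for illustration, and `crude_identity` is only a rough stand-in built on Python's `difflib`; a real pipeline would use an alignment-based identity from a dedicated clustering tool.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List


@dataclass
class Chain:
    chain_id: str      # hypothetical identifier, e.g. "1ABC_A"
    sequence: str      # one-letter amino-acid sequence
    resolution: float  # structure resolution in angstroms


def crude_identity(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; NOT an alignment-based sequence identity."""
    if not a or not b:
        return 0.0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / min(len(a), len(b))


def filter_chains(
    chains: List[Chain],
    max_identity: float = 0.30,    # e.g. the 30% cutoff quoted for RB199
    max_resolution: float = 3.5,   # e.g. the 3.5 angstrom cutoff quoted for RB199
    identity: Callable[[str, str], float] = crude_identity,
) -> List[Chain]:
    """Greedy filter: drop poorly resolved chains, then redundant ones."""
    kept: List[Chain] = []
    # Visit better-resolved structures first so they become the representatives.
    for chain in sorted(chains, key=lambda c: c.resolution):
        if chain.resolution > max_resolution:
            continue
        if all(identity(chain.sequence, k.sequence) <= max_identity for k in kept):
            kept.append(chain)
    return kept
```

Passing a different `identity` callable (for example, one that wraps a pairwise aligner) changes the redundancy criterion without touching the filtering loop.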
A selection of Web servers for RNA-binding site prediction, RNA-binding protein prediction, and protein–RNA complex docking.
| Methods | URLs | References | Available | Seq/Struc/Docking | Sites/Protein |
|---|---|---|---|---|---|
| PRIPU | | Cheng | ○ | seq | site |
| RNABindRPlus | | Walia | ○ | | site |
| CatRAPID omics | | Agostini | ○ | | site |
| SRCPred | | Fernandez | ○ | | site |
| SPOT | | Zhao | X | | protein |
| PRBR | | Ma | ○ | | site |
| RNAPred | | Kumar | ○ | | protein |
| RPISeq | | Muppirala | ○ | | site |
| BindN+ | | Wang | ○ | | site |
| NAPS | | Carson | X | | site |
| PiRaNhA | | Murakami | ○ | | site |
| PRNA | | Liu | X | | site |
| RNA | | Li | X | | site |
| RISP | | Tong | X | | site |
| PRINTR | | Wang | X | | site |
| PPRInt | | Kumar | ○ | | site |
| RNABindR | | Terribilini | ○ | | site |
| BindN | | Wang and Brown (2006) [ | ○ | | site |
| SVMProt | | Han | X | | protein |
| RBPDetector | | Yang | ○ | struc | site |
| SPOT-Seq-RNA | | Yang | X | | protein |
| DRNA | | Zhao | X | | protein |
| OPRA | Program available upon request from the authors | Perez-Cano and Fernandez-Recio (2010) [ | ○ | | site |
| PRIP | | Maetschke | X | | site |
| KYG | | Kim | X | | protein |
| DARS-RNP and QUASI-RNP | | Tuszynska and Bujnicki (2011) [ | ○ | docking | complex |
| PatchDock | | Schneidman-Duhovny | ○ | | complex |
| Haddock | | Dominguez | ○ | | complex |
| Hex | | Ritchie and Kemp (2000) [ | ○ | | complex |
| FTDock (3D-Dock) | | Gabb | ○ | | complex |
| GRAMM | | Katchalski-Katzir | ○ | | complex |
○: the URL is currently available; X: the URL is no longer available; URL: Uniform Resource Locator.
Evaluation parameters.
| Parameter | Meaning | Expression a |
|---|---|---|
| Accuracy (ACC) | Percentage of correct predictions | $\frac{TP + TN}{TP + TN + FP + FN}$ |
| Sensitivity | Percentage of correctly predicted positives | $\frac{TP}{TP + FN}$ |
| Specificity | Percentage of correctly predicted negatives | $\frac{TN}{TN + FP}$ |
| Strength | Mean of sensitivity and specificity | $\frac{\text{Sensitivity} + \text{Specificity}}{2}$ |
| MCC | Matthews correlation coefficient | $\frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$ |
| Precision | Positive predictive rate | $\frac{TP}{TP + FP}$ |
| F-measure | The harmonic mean of precision and sensitivity | $\frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$ |
| AUC b | Probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one | $\frac{1}{nT}\sum_{i=1}^{n} T_i$ |
a TP = True positive number; TN = True negative number; FP = False positive number; FN = False negative number; b In the AUC formulation, i takes on values from 1 to n (the number of negatives), T is the total number of positives in the test set, and T_i is the number of positives that score higher than the ith highest scoring negative.
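The evaluation parameters above can all be computed from the four confusion-matrix counts, plus the raw prediction scores for AUC. The sketch below is one way to do that under the standard textbook definitions; it is not taken from any of the reviewed tools. The function names `binary_metrics` and `auc_from_scores` are hypothetical, the F-measure is taken as the harmonic mean of precision and sensitivity (the usual F1 definition, an assumption of this sketch), and ties between a positive and a negative score are counted as half a win, which is a common convention rather than something the source specifies.

```python
import math


def _ratio(numerator, denominator):
    """Safe division: return 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Threshold-based measures from confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "strength": (sensitivity + specificity) / 2,
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
        "precision": precision,
        "f_measure": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def auc_from_scores(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return _ratio(wins, len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Made-up counts and scores, for illustration only.
    print(binary_metrics(tp=40, tn=45, fp=5, fn=10))
    print(auc_from_scores([0.9, 0.8, 0.4], [0.7, 0.3]))
```

The pairwise AUC loop is quadratic in the number of residues; it is kept here because it mirrors the probabilistic definition directly, and a rank-based computation would be preferable for large test sets.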
Performance of the state-of-the-art methods for RNA-binding site prediction.
| Methods | Data Set | ACC | SEN | SPE | AUC | MCC | Strength | F-Measure | Precision | Reference | Feature |
|---|---|---|---|---|---|---|---|---|---|---|---|
| PiRaNhA | RB75 | - | - | - | 0.822 | 0.435 | - | - | - | [ | Sequence-based |
| PPRInt | RB75 | - | - | - | 0.779 | 0.339 | - | - | - | [ | |
| | RB172 | 0.71 | - | - | - | 0.25 | 0.66 | - | - | [ | |
| | RB344 | 0.70 | 0.45 | 0.82 | 0.68 | 0.28 | - | 0.49 | 0.53 | [ | |
| BindN | RB75 | - | - | - | 0.733 | 0.297 | - | - | - | [ | |
| | RB172 | 0.75 | - | - | - | 0.23 | 0.64 | - | - | [ | |
| BindN+ | RB75 | - | - | - | 0.821 | 0.397 | - | - | - | [ | |
| | RB172 | 0.79 | - | - | - | 0.34 | 0.71 | - | - | [ | |
| | RB344 | 0.72 | 0.32 | 0.89 | 0.68 | 0.26 | - | 0.41 | 0.56 | [ | |
| RNABindR | RB75 | - | - | - | 0.708 | 0.317 | - | - | - | [ | |
| RNABindR v2.0 | RB172 | 0.66 | - | - | - | 0.27 | 0.69 | - | - | [ | |
| PRBR | RB75 | - | - | - | N/A a | 0.294 | - | - | - | [ | |
| NAPS | RB75 | - | - | - | 0.679 | 0.215 | - | - | - | [ | |
| | RB172 | 0.66 | - | - | - | 0.17 | 0.61 | - | - | [ | |
| RNAProB | RB172 | 0.82 | - | - | - | 0.22 | 0.60 | - | - | [ | |
| KYG * | RB75 | - | - | - | N/A | 0.382 | - | - | - | [ | Structure-based |
| DRNA * | RB75 | - | - | - | N/A | 0.382 | - | - | - | [ | |
| | RB344 | 0.75 | 0.21 | 0.94 | N/A | 0.22 | - | 0.31 | 0.54 | [ | |
| OPRA * | RB75 | - | - | - | N/A | 0.296 | - | - | - | [ | |
| Ren’s method | RB344 | 0.68 | 0.48 | 0.76 | 0.68 | 0.26 | - | 0.48 | 0.48 | [ | |
| Meta-predictor b | RB75 | - | - | - | 0.835 | 0.460 | - | - | - | [ | |
a N/A: not available; MCC: Matthews correlation coefficient; AUC: area under the curve; SEN: sensitivity; SPE: specificity; b Meta-predictor developed from the top three sequence-based methods according to the authors' benchmark (PiRaNhA, PPRInt and BindN+); * The meta-predictor is composed of the methods labeled with an asterisk.
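The four RB344 rows in the table above report SEN, SPE, F-measure, and precision together, which makes it possible to check which definition of F-measure the reported values follow. The sketch below copies those numbers from the table and compares each reported F-measure with two candidate formulas: the harmonic mean of sensitivity and precision, and the harmonic mean of sensitivity and specificity. The function names and the 0.01 tolerance (chosen because the table rounds to two decimals) are assumptions of this sketch.

```python
# Values copied from the RB344 rows of the table above:
# (method, SEN, SPE, reported F-measure, precision)
RB344_ROWS = [
    ("PPRInt", 0.45, 0.82, 0.49, 0.53),
    ("BindN+", 0.32, 0.89, 0.41, 0.56),
    ("DRNA", 0.21, 0.94, 0.31, 0.54),
    ("Ren's method", 0.48, 0.76, 0.48, 0.48),
]


def harmonic_mean(a, b):
    """Harmonic mean of two non-negative numbers (0.0 if both are zero)."""
    return 2 * a * b / (a + b) if (a + b) else 0.0


def check_f_measure(rows, tolerance=0.01):
    """Compare each reported F-measure with two candidate definitions."""
    report = []
    for method, sen, spe, f_reported, precision in rows:
        f_from_precision = harmonic_mean(sen, precision)
        f_from_specificity = harmonic_mean(sen, spe)
        report.append({
            "method": method,
            "reported": f_reported,
            "sen_precision": round(f_from_precision, 3),
            "sen_specificity": round(f_from_specificity, 3),
            "matches_precision_form": abs(f_from_precision - f_reported) <= tolerance,
            "matches_specificity_form": abs(f_from_specificity - f_reported) <= tolerance,
        })
    return report


if __name__ == "__main__":
    for entry in check_f_measure(RB344_ROWS):
        print(entry)
```

Under these assumptions, all four reported values agree with the sensitivity-precision form within rounding, and none agrees with the sensitivity-specificity form.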