Yoichi Murakami, Ruth V Spriggs, Haruki Nakamura, Susan Jones.
Abstract
The PiRaNhA web server is a publicly available online resource that automatically predicts the location of RNA-binding residues (RBRs) in protein sequences. The goal of functional annotation of sequences in the field of RNA binding is to provide predictions of high accuracy that require only small numbers of targeted mutations for verification. The PiRaNhA server uses a support vector machine (SVM), with position-specific scoring matrices, residue interface propensity, predicted residue accessibility and residue hydrophobicity as features. The server allows the submission of up to 10 protein sequences, and the predictions for each sequence are provided on a web page and via email. The prediction results are provided in sequence format with predicted RBRs highlighted, in text format with the SVM threshold score indicated and as a graph which enables users to quickly identify those residues above any specific SVM threshold. The graph effectively enables the increase or decrease of the false positive rate. When tested on a non-redundant data set of 42 protein sequences not used in training, the PiRaNhA server achieved an accuracy of 85%, specificity of 90% and a Matthews correlation coefficient of 0.41 and outperformed other publicly available servers. The PiRaNhA prediction server is freely available at http://www.bioinformatics.sussex.ac.uk/PIRANHA.
Year: 2010 PMID: 20507911 PMCID: PMC2896099 DOI: 10.1093/nar/gkq474
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Example PiRaNhA server prediction results for 30S ribosomal protein S9 (PDB-ID 2J00, chain I). (I) The sequence format webpage where the predicted RBRs are highlighted in red. (II) The text format results, which include the sequence and the SVM values and can be downloaded. (III) The graphical interpretation of the results in which the submitted sequence is plotted against SVM threshold values. The x-axis shows the submitted sequence and the y-axis the threshold for the prediction. The optimal SVM threshold value (0.4411) is rescaled to zero, for ease of interpretation. The graph has a ‘click and drag’ zoom function to enable the easy highlighting of residues (IV) above a desired threshold to produce a finer grained graph (V).
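The thresholding idea behind the graph in Figure 1 can be illustrated with a minimal Python sketch. It assumes that "rescaled to zero" means subtracting the optimal threshold (0.4411) from each raw SVM score and that a residue is selected when its rescaled score is strictly above the chosen cut-off; the server's actual rescaling may differ. The function names and the example scores are hypothetical and are not taken from the server.

```python
# Optimal SVM threshold quoted in the Figure 1 caption.
OPTIMAL_THRESHOLD = 0.4411


def rescale(scores, optimum=OPTIMAL_THRESHOLD):
    """Shift raw SVM scores so that the optimal threshold sits at zero.

    Assumption: the rescaling is a simple subtraction.
    """
    return [score - optimum for score in scores]


def residues_above(sequence, scores, threshold=0.0, optimum=OPTIMAL_THRESHOLD):
    """Return (1-based position, residue, rescaled score) for every residue
    whose rescaled score is strictly greater than `threshold`."""
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    shifted = rescale(scores, optimum)
    return [
        (position, residue, score)
        for position, (residue, score) in enumerate(zip(sequence, shifted), start=1)
        if score > threshold
    ]


# Hypothetical four-residue example; these scores are made up for illustration.
example_sequence = "MKRA"
example_scores = [0.9411, 0.1, 0.4411, 0.6]
default_hits = residues_above(example_sequence, example_scores)
strict_hits = residues_above(example_sequence, example_scores, threshold=0.3)
```

Raising `threshold` above zero keeps fewer residues, which is how a user would trade a lower false positive rate for fewer predicted RBRs.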
Figure 2. Two example PiRaNhA predictions: (A) 30S ribosomal protein S9 (PDB-ID 2J00, chain I), (B) CCA-adding enzyme (PDB-ID 2DRB, chain A). In the left panel, the three-dimensional structures of the protein–RNA complexes are shown, where TP, FP and FN are highlighted as cyan, yellow and pink CPK spheres, respectively, and the remaining protein residues and RNA nucleotides are represented as red and green sticks, respectively. In the right panel, the protein sequence is shown. Non-RBRs are indicated with the symbol ‘−’ and RBRs with ‘+’. The TPs are highlighted in red. The prediction performance measures listed for each example are TP (true positives), FP (false positives), TN (true negatives), FN (false negatives), Sn (sensitivity), Sp (specificity), Acc (accuracy), MCC (Matthews correlation coefficient) and precision.
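The performance measures named in Figure 2 follow from the four confusion counts. The sketch below uses the standard textbook definitions of sensitivity, specificity, accuracy, precision and the Matthews correlation coefficient; it is an illustration, not the authors' evaluation code. The function name is hypothetical, the counts in the example are invented, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute Sn, Sp, Acc, precision and MCC from confusion counts."""

    def ratio(numerator, denominator):
        # Convention assumed here: an undefined ratio is reported as 0.0.
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "Sn": ratio(tp, tp + fn),                    # sensitivity (recall)
        "Sp": ratio(tn, tn + fp),                    # specificity
        "Acc": ratio(tp + tn, tp + fp + tn + fn),    # accuracy
        "precision": ratio(tp, tp + fp),
        "MCC": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Invented counts for illustration only; they are not results from the paper.
example = binary_metrics(tp=10, fp=5, tn=80, fn=5)
```

Because non-binding residues typically outnumber RBRs, accuracy and specificity can be high even when few RBRs are recovered, which is why MCC is reported alongside them.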