Huiying Zhao, Yuedong Yang, Yaoqi Zhou.
Abstract
Mechanistic understanding of many key cellular processes often involves identification of RNA binding proteins (RBPs) and RNA binding sites in two separate steps. Here, they are predicted simultaneously by structural alignment to known protein-RNA complex structures followed by binding assessment with a DFIRE-based statistical energy function. This method achieves 98% accuracy and 91% precision for predicting RBPs and 93% accuracy and 78% precision for predicting RNA-binding amino-acid residues for a large benchmark of 212 RNA binding and 6761 non-RNA binding domains (leave-one-out cross-validation). Additional tests revealed that the method makes no false-positive predictions among 311 DNA binding domains but correctly detects six domains that bind both DNA and RNA. In addition, it correctly identified 31 of 75 unbound RNA-binding domains, with 92% accuracy and 65% precision for predicted binding residues, and achieved an 86% success rate when applied to the SCOP (Structural Classification of Proteins) RNA binding domain superfamily. It further predicts 25 of 2076 structural genomics targets as RBPs; 20 of these 25 (80%) are putatively RNA binding. The superior performance over existing methods indicates the importance of dividing structures into domains, using a Z-score to measure relative structural similarity, and using a statistical energy function to measure protein-RNA binding affinity.
Year: 2010 PMID: 21183467 PMCID: PMC3082898 DOI: 10.1093/nar/gkq1266
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Distribution of the top TM-score-ranked templates on RB212/NB6761.
Figure 2. Distribution of the top Z-score-ranked templates on RB212/NB6761.
Figure 3. Sensitivity versus false positive rate, given by TM-align (cross), PSIBLAST (open triangle), Z-score (open diamond), TM-score combined with the DRNA energy score (closed circle) and Z-score combined with the DRNA energy score (solid line).
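The combination of a relative similarity score with an energy score, as compared in Figure 3, can be illustrated with a minimal sketch. It assumes the Z-score is the ordinary standard score of one template's alignment score relative to all templates scanned; the record does not give the paper's exact definition. The function names, the cutoffs `z_cut` and `e_cut`, and the input tuples are all hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standard score of each alignment score relative to the whole set.

    Assumption: plain (x - mean) / std; the paper's definition may differ.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All templates score the same, so none stands out.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def best_template(hits, z_cut, e_cut):
    """Pick the top Z-score template and apply both cutoffs.

    hits: list of (template_id, similarity_score, binding_energy).
    z_cut, e_cut: hypothetical thresholds supplied by the caller.
    Returns (template_id, z_score, is_predicted_rbp).
    """
    zs = z_scores([h[1] for h in hits])
    idx = max(range(len(hits)), key=lambda i: zs[i])
    template_id, _, energy = hits[idx]
    # Lower (more negative) energy is treated as stronger predicted binding.
    is_rbp = zs[idx] >= z_cut and energy <= e_cut
    return template_id, zs[idx], is_rbp
```

A target is called RNA-binding here only when its best template is both unusually similar relative to the other templates and yields a sufficiently low energy, which mirrors the two-stage idea of alignment followed by binding assessment.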
Targets predicted as RNA-binding in the HOLO set but not in the APO set
| HOLO (a) | APO (b) | TM_HA (c) | SeqID (d) | TMP (e) | TM_H (f) | Z_HT (g) | E_H (h) | TM_AT (i) | Z_AT (j) | E_A (k) |
|---|---|---|---|---|---|---|---|---|---|---|
| 2atwA2 | 1hh2P3 | 0.95 | 47.9 | 2asbA3 | 0.66 | 1.4 | −17.4 | 0.57 | 0.98 | −14.7 |
| 1uvlA | 1hi8B | 0.98 | 96.2 | 2r7xA | 0.43 | 1.2 | −27.9 | 0.42 | 1.1 | −25.9 |
| 2j03S | 1ovyA | 0.56 | 54.3 | 1jj2M | 0.60 | 1.2 | −59.3 | 0.46 | 1.1 | −37.3 |
(a) Targets from the HOLO set.
(b) Targets from the APO set.
(c) TM-score between HOLO and APO targets.
(d) Sequence identity between APO and HOLO targets, calculated by bl2seq in blast2.2.
(e) Template for the HOLO target.
(f) TM-score between the template and the HOLO target.
(g) Z-score between the HOLO target and the template.
(h) Binding energy of the template RNA–HOLO target complex.
(i) TM-score between the APO target and the template.
(j) Z-score between the APO target and the template.
(k) Binding energy of the template RNA–APO target complex.
DNA binding proteins predicted as RBPs from the DB311 set
| Target | Template | SeqID (%) | Sens. | Speci. | Acc. | Prec. | MCC |
|---|---|---|---|---|---|---|---|
| 2nvqB1 | 2o5iM | 12.7 | 1.00 | 0.99 | 0.99 | 0.44 | 0.66 |
| 2nvqB2 | 2o5iM | 8.7 | 1.00 | 0.99 | 0.99 | 0.63 | 0.79 |
| 2qkbA | 1zbiB | 15.8 | 0.52 | 0.99 | 0.89 | 0.94 | 0.64 |
| 2p6rA1 | 2db3A | 15.6 | – | – | – | – | – |
| 1p7hN | 1ooaA1 | 22.4 | – | – | – | – | – |
| 1sfuA | 2gxbB | 27.1 | – | – | – | – | – |
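The per-target columns above (sensitivity, specificity, accuracy, precision, MCC) are standard binary-classification measures over binding-residue predictions. The sketch below shows how they follow from confusion-matrix counts; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard metrics from confusion-matrix counts.

    tp/fp/tn/fn: true/false positives and negatives (e.g. residues
    predicted as RNA-binding versus residues observed to bind RNA).
    Ratios with a zero denominator are reported as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        # Matthews correlation coefficient
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Illustrative counts only (not data from the paper).
example = binary_metrics(tp=8, fp=2, tn=85, fn=5)
```

Because MCC uses all four counts, it stays informative when binding residues are a small minority of a domain, a situation in which accuracy alone can look high.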
Figure 4. The native binding regions of DNA (in orange) and RNA (in red) of target domain 1 of chain B (PDB ID 2nvqB1) are compared to the predicted RNA binding region (in green) as well as the corresponding RNA binding region of the template (2o5iM) (in green, all binding region in blue). The native RNA binding region of the target lies entirely within the predicted region (red is a part of green). For clarity, residues with index >1000 in 2o5i are not shown.
Structural genomics targets (SG2076) predicted as RBPs
| Target | Template | TM-score | Z-score | Energy | Function |
|---|---|---|---|---|---|
| 1vhyA1 | 2rfkA2 | 0.56 | 1.5 | −14.0 | RB |
| 1nnhA | 1asyA2 | 0.78 | 2.8 | −13.5 | RB |
| 1nzjA | 1gaxA1 | 0.49 | 1.2 | −16.8 | RB |
| 2oceA5 | 2ix1A4 | 0.65 | 1.4 | −12.2 | UK |
| 2f96A | 2a1rB | 0.57 | 1.4 | −13.5 | RB |
| 2cphA | 1fxlA2 | 0.70 | 1.3 | −17.9 | RB |
| 3cymA1 | 2a1rB | 0.56 | 1.3 | −11.9 | RB |
| 1tuaA1 | 1ec6A | 0.68 | 1.4 | −11.5 | RB |
| 2q07A2 | 1r3eA2 | 0.67 | 2.1 | −10.9 | RB |
| 1yvcA | 2bh2A1 | 0.72 | 1.8 | −13.5 | RB |
| 1t5yA2 | 1r3eA2 | 0.77 | 2.8 | −15.3 | RB |
| 3go5A2 | 2ix1A4 | 0.68 | 1.5 | −13.7 | RB |
| 2k52A | 2ix1A4 | 0.63 | 1.3 | −12.4 | RB |
| 1zkpA | 2fk6A | 0.78 | 2.3 | −15.9 | RB |
| 1x40A | 2f8kA | 0.62 | 1.3 | −10.8 | UB |
| 2ogkD | 1jj2D | 0.62 | 1.8 | −25.5 | RB |
| 2cpfA | 1fxlA2 | 0.74 | 1.5 | −12.0 | RB |
| 1yezA | 2bh2A1 | 0.69 | 1.6 | −14.9 | RB |
| 2e5hA | 1fxlA2 | 0.74 | 1.5 | −13.3 | RB |
| 3frnA3 | 1jj2J | 0.51 | 1.2 | −20.4 | UK |
| 2jz2A | 1jj2P | 0.59 | 1.3 | −33.5 | UK |
| 3ir9A | 1rlgB | 0.56 | 1.2 | −11.5 | UK |
| 3hp7A1 | 1h3eA2 | 0.63 | 1.4 | −12.5 | RB |
| 1wi6A | 1fxlA2 | 0.70 | 1.3 | −17.6 | RB |
| 1wdtA4 | 1fjgI | 0.55 | 1.4 | −29.7 | RB |
RB: Targets annotated as having putative functions related to RNA binding in the NCBI database.
UK: Function unknown.
UB: Non-RNA binding.