Yao Chi Chen, Karen Sargsyan, Jon D Wright, Yi-Shuian Huang, Carmay Lim.
Abstract
Increasing numbers of protein structures are solved each year, but many of these structures belong to proteins whose sequences are homologous to sequences in the Protein Data Bank. Nevertheless, the structures of homologous proteins belonging to the same family contain useful information because functionally important residues are expected to preserve physico-chemical, structural and energetic features. This information forms the basis of our method, which detects RNA-binding residues of a given RNA-binding protein as those residues that preserve physico-chemical, structural and energetic features in its homologs. Tests on 81 RNA-bound and 35 RNA-free protein structures showed that our method yields a higher fraction of true RNA-binding residues (higher precision) than two structure-based and two sequence-based machine-learning methods. Because the method requires no training data set and has no parameters, its precision does not degrade when applied to 'novel' protein sequences, unlike methods that are parameterized for a given training data set. It was used to predict the 'unknown' RNA-binding residues in the C-terminal RNA-binding domain of human CPEB3. The two predicted residues, F430 and F474, were experimentally verified to bind RNA, in particular F430, whose mutation to alanine or asparagine nearly abolished RNA binding. The method has been implemented in a webserver called DR_bind1, which is freely available with no login requirement at http://drbind.limlab.ibms.sinica.edu.tw.
Year: 2013 PMID: 24343026 PMCID: PMC3919582 DOI: 10.1093/nar/gkt1299
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Performance of DR_bind1 based on 81 RNA-bound protein structures compared to that of KYG, OPRA, BindN+ or Pprint using default settings
| | DR_bind1 | KYG | OPRA | BindN+ | Pprint |
|---|---|---|---|---|---|
| TP | 166 | 1820 | 1021 | 2235 | 2516 |
| FP | 75 | 2916 | 1018 | 1868 | 3534 |
| TN | 14 628 | 11 787 | 13 685 | 12 835 | 11 169 |
| FN | 2892 | 1238 | 2037 | 823 | 542 |
| Sensitivity | 0.05 | 0.60 | 0.33 | 0.73 | 0.82 |
| Specificity | 0.99 | 0.80 | 0.93 | 0.87 | 0.76 |
| Precision | 0.69 | 0.38 | 0.50 | 0.54 | 0.42 |
| Accuracy | 0.83 | 0.77 | 0.83 | 0.85 | 0.77 |
| MCC | 0.16 | 0.34 | 0.31 | 0.54 | 0.46 |
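The derived rows in the table above appear to follow the standard confusion-matrix definitions. The sketch below is a minimal check under that assumption; the function name `binary_metrics` is hypothetical and not part of DR_bind1. It recomputes the DR_bind1 column from its TP, FP, TN and FN counts.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Standard per-residue metrics from confusion-matrix counts.

    Assumption: the table uses the usual textbook definitions.
    """
    sensitivity = tp / (tp + fn)                # fraction of true binders recovered
    specificity = tn / (tn + fp)                # fraction of non-binders rejected
    precision = tp / (tp + fp)                  # fraction of predictions that are true binders
    accuracy = (tp + tn) / (tp + fp + tn + fn)  # fraction of residues classified correctly
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# DR_bind1 counts for the 81 RNA-bound structures (first table).
m = binary_metrics(tp=166, fp=75, tn=14628, fn=2892)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, these values match the DR_bind1 column: high precision and specificity alongside low sensitivity, since the method makes few predictions.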
Performance of DR_bind1 based on 35 RNA-free protein structures compared to that of KYG, OPRA, BindN+ or Pprint using default settings^a
| | DR_bind1 | KYG | OPRA | BindN+ | Pprint |
|---|---|---|---|---|---|
| TP | 47 (64) | 457 (528) | 179 (224) | 554 (601) | 673 (735) |
| FP | 46 (45) | 1699 (1688) | 440 (549) | 1007 (1001) | 1773 (1798) |
| TN | 8307 (8522) | 6654 (6879) | 7913 (8018) | 7346 (7566) | 6580 (6769) |
| FN | 903 (967) | 493 (503) | 771 (807) | 396 (430) | 277 (296) |
| Sensitivity | 0.05 (0.06) | 0.48 (0.51) | 0.19 (0.22) | 0.58 (0.58) | 0.71 (0.71) |
| Specificity | 0.99 (0.99) | 0.80 (0.80) | 0.95 (0.94) | 0.88 (0.88) | 0.79 (0.79) |
| Precision | 0.51 (0.59) | 0.21 (0.24) | 0.29 (0.29) | 0.35 (0.38) | 0.28 (0.29) |
| Accuracy | 0.90 (0.89) | 0.76 (0.77) | 0.87 (0.86) | 0.85 (0.85) | 0.78 (0.78) |
| MCC | 0.13 (0.17) | 0.20 (0.23) | 0.16 (0.17) | 0.37 (0.39) | 0.34 (0.35) |
^a Numbers with and without parentheses are based on the RNA-bound and free protein structures, respectively.
Performance of DR_bind1 based on 41 ribosomal (or 40 nonribosomal) RNA-bound protein structures compared to that of KYG, OPRA, BindN+ or Pprint using default settings^a
| | DR_bind1 | KYG | OPRA | BindN+ | Pprint |
|---|---|---|---|---|---|
| TP | 102 (64) | 1334 (486) | 931 (90) | 1679 (556) | 1782 (734) |
| FP | 19 (56) | 812 (2104) | 593 (425) | 730 (1138) | 1406 (2128) |
| TN | 3673 (10 955) | 2880 (8907) | 3099 (10 586) | 2962 (9873) | 2286 (8883) |
| FN | 1883 (1009) | 651 (587) | 1054 (983) | 306 (517) | 203 (339) |
| Sensitivity | 0.05 (0.06) | 0.67 (0.45) | 0.47 (0.08) | 0.85 (0.52) | 0.90 (0.68) |
| Specificity | 0.99 (0.99) | 0.78 (0.81) | 0.84 (0.96) | 0.80 (0.90) | 0.62 (0.81) |
| Precision | 0.84 (0.53) | 0.62 (0.19) | 0.61 (0.17) | 0.70 (0.33) | 0.56 (0.26) |
| Accuracy | 0.66 (0.91) | 0.74 (0.78) | 0.71 (0.88) | 0.82 (0.86) | 0.72 (0.80) |
| MCC | 0.15 (0.16) | 0.44 (0.18) | 0.33 (0.06) | 0.63 (0.34) | 0.50 (0.33) |
^a Numbers with and without parentheses were derived from 40 nonribosomal and 41 ribosomal RNA-bound protein structures, respectively.
Figure 1. Frequency distribution of the precision values derived from ribosomal (top) and nonribosomal (bottom) RNA-bound protein structures using DR_bind1 (black curves), KYG (gray curves), OPRA (dotted curves), BindN+ (dashed curves) and Pprint (dash-dot curves). (a) Ribosomal RNA-bound protein structures. (b) Nonribosomal RNA-bound protein structures.
Performance of DR_bind1 based on 41 ribosomal (or 40 nonribosomal) RNA-bound protein structures compared to that of KYG, OPRA, BindN+ or Pprint for the same number of predictions made by DR_bind1^a
| | DR_bind1 | KYG | OPRA | BindN+ | Pprint |
|---|---|---|---|---|---|
| TP | 102 (64) | 82 (33) | 76 (26) | 97 (59) | 90 (48) |
| FP | 19 (56) | 39 (87) | 45 (94) | 24 (61) | 31 (72) |
| TN | 3673 (10 955) | 3653 (10 924) | 3647 (10 917) | 3668 (10 950) | 3661 (10 939) |
| FN | 1883 (1009) | 1903 (1040) | 1909 (1047) | 1888 (1014) | 1895 (1025) |
| Sensitivity | 0.05 (0.06) | 0.04 (0.03) | 0.04 (0.02) | 0.05 (0.05) | 0.05 (0.04) |
| Specificity | 0.99 (0.99) | 0.99 (0.99) | 0.99 (0.99) | 0.99 (0.99) | 0.99 (0.99) |
| Precision | 0.84 (0.53) | 0.68 (0.28) | 0.63 (0.22) | 0.80 (0.49) | 0.74 (0.40) |
| Accuracy | 0.66 (0.91) | 0.66 (0.91) | 0.66 (0.91) | 0.66 (0.91) | 0.66 (0.91) |
| MCC | 0.15 (0.16) | 0.10 (0.07) | 0.09 (0.05) | 0.14 (0.14) | 0.12 (0.11) |
^a Numbers with and without parentheses were derived from 40 nonribosomal and 41 ribosomal RNA-bound protein structures, respectively.
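In the table above, every method has the same TP + FP total as DR_bind1 (121 ribosomal, 120 nonribosomal), so the methods are compared at an equal number of positive calls. The record does not say how the other methods' predictions were reduced to that number. The sketch below assumes one plausible scheme: rank residues by each method's score and call only the top k positive. The function name `confusion_at_top_k` and the toy scores are hypothetical.

```python
def confusion_at_top_k(scores, labels, k):
    """Call the k highest-scoring residues positive and count TP, FP, TN, FN.

    scores: per-residue prediction scores (higher means more likely to bind).
    labels: per-residue ground truth, 1 for RNA-binding and 0 otherwise.
    Ties are broken by residue order because Python's sort is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    predicted = set(order[:k])                   # indices called positive
    tp = sum(1 for i in predicted if labels[i])  # predicted and truly binding
    fp = len(predicted) - tp                     # predicted but not binding
    fn = sum(labels) - tp                        # binding residues that were missed
    tn = len(labels) - tp - fp - fn              # everything else
    return tp, fp, tn, fn


# Toy example: five residues, keep the two top-scoring ones.
scores = [0.9, 0.1, 0.8, 0.4, 0.7]
labels = [1, 0, 0, 1, 1]
tp, fp, tn, fn = confusion_at_top_k(scores, labels, k=2)
print(tp, fp, tn, fn)
print("precision at k:", tp / (tp + fp))
```

Fixing k makes precision directly comparable across methods, because the denominator TP + FP is identical for all of them.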
Figure 2. Venn diagram showing four sets of true positives predicted by DR_bind1, KYG, OPRA, BindN+ and Pprint. (a) Ribosomal true positives. (b) Nonribosomal true positives.
Performance of DR_bind1 based on 41 ribosomal (or 40 nonribosomal) RNA-bound protein structures compared to that of dRNA-3D^a
| Homolog structures | DR_bind1: None^b | DR_bind1: No complex^c | dRNA-3D: Best complex^d | dRNA-3D: Second best complex^e |
|---|---|---|---|---|
| TP | 110 (74) | 101 (58) | 1950 (873) | 1295 (627) |
| FP | 24 (66) | 22 (54) | 173 (321) | 463 (681) |
| TN | 3668 (10 945) | 3670 (10 957) | 3519 (10 690) | 3229 (10 330) |
| FN | 1875 (999) | 1884 (1015) | 35 (200) | 690 (446) |
| Sensitivity | 0.06 (0.07) | 0.05 (0.05) | 0.98 (0.81) | 0.65 (0.58) |
| Specificity | 0.99 (0.99) | 0.99 (1) | 0.95 (0.97) | 0.87 (0.94) |
| Precision | 0.82 (0.53) | 0.82 (0.52) | 0.92 (0.73) | 0.74 (0.48) |
| Accuracy | 0.67 (0.91) | 0.66 (0.91) | 0.96 (0.96) | 0.80 (0.91) |
| MCC | 0.15 (0.17) | 0.15 (0.15) | 0.92 (0.75) | 0.54 (0.48) |
^a Numbers with and without parentheses were derived from 40 nonribosomal and 41 ribosomal RNA-bound protein structures, respectively.
^b Numbers were derived without free/complex structures of homologs.
^c Numbers were derived without complex structures of homologs.
^d Numbers were derived based on the best matching complex structure.
^e Numbers were derived based on the second best matching complex structure.
Figure 3. Experimental evaluation of the predicted RNA-interacting amino acid residues in CPEB3. (a) Salient features of CPEB3 showing the N-terminal glutamine-rich region (Q) and the C-terminal RBD composed of two RRMs and zinc fingers (Zif). The myc-tagged wt and the RRM1-deleted (ΔRRM1) hCPEB3 are shown. All point mutations are located in the RRM1 domain. (b) The 293T lysates containing wt or various mutant CPEB3 proteins were cross-linked with the radiolabeled 1904 RNA probe for RNA-binding assay or used for western blotting with myc antibody. (c) The normalized RNA-binding abilities of various CPEB3 mutants were expressed relative to the wt CPEB3, which was arbitrarily set to 1. Gray and black bars indicate that the two sets of experiments were conducted separately. The data from three independent experiments were expressed as mean ± standard deviation. One and two asterisks denote the statistical significance, *P < 0.05 and **P < 0.001, respectively, from the Student’s t-test.