Abstract
Computational prediction of nucleic acid binding sites in proteins is necessary to disentangle functional mechanisms in most biological processes and to explore the binding mechanisms. Several strategies have been proposed, but the state-of-the-art approaches differ widely in i) the definition of nucleic acid binding sites; ii) the training and test datasets; iii) the algorithmic methods for the prediction strategies; iv) the performance measures and v) the distribution and availability of the prediction programs. Here we report a large-scale assessment of 19 web servers and 3 stand-alone programs on 41 datasets including more than 5000 proteins derived from 3D structures of protein-nucleic acid complexes. Well-defined binary assessment criteria (specificity, sensitivity, precision, accuracy…) are applied. We found that i) the tools have greatly improved over the years; ii) some of the approaches suffer from theoretical defects and there is still room for sorting out the essential mechanisms of binding; iii) RNA binding and DNA binding appear to follow similar driving forces and iv) dataset bias may exist in some methods.
Year: 2015 PMID: 26681179 PMCID: PMC4683125 DOI: 10.1371/journal.pcbi.1004639
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. Binding site definition and assessment metrics can result in accuracy variation.
A) Binding site definition of the Zif268 protein based on different distance cutoffs. ‘+’ marks binding sites and ‘-’ marks non-binding sites. With a distance cutoff of 6.0 Å, 40 residues are defined as binding sites, twice the number obtained with a cutoff of 3.5 Å. B) Two metrics for measuring prediction accuracy in terms of AUC. The old metric mixes the residues from all proteins together and measures AUC on the pooled data. The metric used in this work measures AUC for each protein and averages the AUC values, taking protein length into account. C) A scheme illustrating the irrelevant comparison between the binding sites of one protein and the non-binding sites of another. Because protein A and protein B may differ in the size of their nucleic acid binding regions and in binding affinity, they may have different energy funnels. The dashed region shows the binding region of the two proteins. Binary assessment that mixes all residues together will inevitably include comparisons between non-binding sites of protein A and binding sites of protein B, shown by the double-arrowed line.
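The difference between the two AUC metrics in panel B can be illustrated with a minimal Python sketch. It assumes that "considering protein length" means a length-weighted mean of per-protein AUC values; the function names and the toy scores are hypothetical and are not taken from the assessed programs.

```python
from typing import List, Sequence, Tuple

# One protein = (per-residue labels, per-residue scores); 1 marks a binding residue.
Protein = Tuple[List[int], List[float]]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (binding, non-binding) residue pairs ranked
    correctly; ties count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs both binding and non-binding residues")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pooled_auc(proteins: Sequence[Protein]) -> float:
    """'Old' metric: mix residues from all proteins, then compute one AUC."""
    labels = [y for ys, _ in proteins for y in ys]
    scores = [s for _, ss in proteins for s in ss]
    return auc(labels, scores)


def length_weighted_auc(proteins: Sequence[Protein]) -> float:
    """Per-protein AUC averaged with protein length as weight (assumed reading)."""
    total = sum(len(ys) for ys, _ in proteins)
    return sum(len(ys) * auc(ys, ss) for ys, ss in proteins) / total


# Hypothetical toy data: both proteins are ranked perfectly on their own,
# but protein B is scored on a lower overall scale than protein A.
proteins = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.2]),                    # protein A
    ([1, 0, 0, 0, 0, 0], [0.25, 0.1, 0.1, 0.05, 0.05, 0.0]),  # protein B
]
print(pooled_auc(proteins), length_weighted_auc(proteins))
```

In this toy case each protein has a per-protein AUC of 1.0, yet the pooled AUC drops to 20/21 because the binding residue of protein B scores below a non-binding residue of protein A, which is the cross-protein comparison sketched in panel C.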
Summary of the existing approaches in nucleic acid binding site prediction.
| RNA | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sequence-based | Brief/Features | ||||||||||||||||
| Year | Name | Server | Program | Website | Binding Site Definition | PSSM | RP | ASA | HP | SS | EC | Q | SA | Training | window | Dataset | reference |
| 2006 | BindN | ✓ |
| 3.5Å | ✓ | ✓ | SVM | 11 | R107(PRINR25),D62(PDNA-62) | [ | |||||||
| 2007 | RNABindR | ✓ |
| 5Å | ✓ | NB | 25 | R147 | [ | ||||||||
| 2008 | PPRInt | ✓ |
| 6Å | ✓ | ✓ | SVM | 17 | R107 from BindN, R86 from PPRInt | [ | |||||||
| 2008 | RNAproB | 6Å/3.5Å | ✓ | SVM | 25 | R107 from BindN, R86 from PPRInt, R109 from RNABindR | [ | ||||||||||
| 2008 | PRINTR | ✓ |
| ENTANGLE | ✓ | SVM | 15 | R109 from RNABindR | [ | ||||||||
| 2008 | RISP | ✓ |
| 3.5Å | ✓ | SVM | 7 | R147 from RNABindR, R71(PRNA-71,NA) | [ | ||||||||
| 2009 | PiRaNhA | ✓ |
| 3.9Å | ✓ | ✓ | ✓ | ✓ | SVM | 23 | R81 from PPRint,R42(NA) | [ | |||||
| 2010 | BindN+ | ✓ |
| 3.5Å | ✓ | ✓ | ✓ | SVM | 11 | R107(PRINR25),D62(PDNA-62) | [ | ||||||
| 2010 | NAPS | ✓ |
| 4.5Å | ✓ | ✓ | DT | 7 | R109 from RNABindR, D84 from Pro-dna, D274 from DISIS, D62(PDNA-62) | [ | |||||||
| 2010 | PRBR | ✓ |
| 3.5Å | ✓ | ✓ | RF | 11 | R180 (RBP-180) | [ | |||||||
| 2011 | Sungwook | H-bond | ✓ | ✓ | ✓ | SVM | 9 | R3149(PRI3149),R727(PRI727),R267(PRI267) | [ | ||||||||
| 2011 | SRCPred | ✓ |
| 3.5Å | ✓ | NN | 5 | R160(PRNA160) | [ | ||||||||
| 2011 | Predict_RBP | ✓ |
| ENTANGLE | ✓ | ✓ | ✓ | SVM | 15 | R107 from BindN, R86 from PPRInt, R109 from RNABindR | [ | ||||||
| 2011 | meta2 | ✓ |
| 3.5Å | Meta-server | R44, R38 from OPRA, R180 from PRBR, R111 from RNABindR, R81 from PiRaNha, R86 from KYG | [ | ||||||||||
| 2012 | Qian-Zhong | 6Å/3.5Å/ENTANGLE | ✓ | ✓ | SVM | 25 | R107 from BindN, R86 from PPRInt, R109 from RNABindR | [ | |||||||||
| 2014 | RNABindRPlus | ✓ |
| 5Å | ✓ | SVM | 21 | R28,R44,R111,R198 | [ | ||||||||
| 2015 | RBRIdent | ✓ |
| ENTANGLE | ✓ | ✓ | ✓ | RF | 9 | R281 | [ | ||||||
| Structure-based | |||||||||||||||||
| 2006 | KYG | ✓ |
| 7Å | ✓ | ✓ | Function | R86 | [ | ||||||||
| 2008 | RsiteDB | ✓ |
| 7Å | ✓ | Clustering | [ | ||||||||||
| 2008 | DR_bind1 | ✓ |
| HBPLUS | ✓ | ✓ | ✓ | Function | D56 from Susan 2003, D69, R81 | [ | |||||||
| 2009 | PRIP | ✓ |
| 5Å | ✓ | ✓ | SVM | 19 | R147(R144) and R109 from RNABindR | [ | |||||||
| 2010 | OPRA | By contact | 4Å | statistical potentials | Function | R316,R38 | [ | ||||||||||
| 2011 | DRNA | ✓ | 4.5Å | ✓ | Function | R250(RB250),R212(RB212),RBD292(NA) | [ | ||||||||||
| 2010 | Struct-NB | 5Å | ✓ | NB | R147 from RNABindR | [ | |||||||||||
| 2010 | PRNA | ✓ |
| ENTANGLE | ✓ | ✓ | ✓ | ✓ | RF | 5 | R205 | [ | |||||
| 2014 | aaRNA | ✓ |
| 3.5Å | ✓ | ✓ | ✓ | ✓ | NN | 11 | R67,R141,R205 | [ | |||||
| 2014 | RBRDetector | ✓ |
| 4.5Å, 10%rASA | ✓ | SVM | 11 | R264, R75 | [ | ||||||||
| 2014 | Xiaoyong | ENTANGLE | ✓ | ✓ | ✓ | ✓ | RF | 5 | R205(PRNA) | [ | |||||||
| 2015 | RBscore | ✓ |
| 3.5-6Å | ✓ | ✓ | ✓ | ✓ | R130,R116 | [ | |||||||
| 2015 | RNAProSite | ✓ |
| ||||||||||||||
| DNA | |||||||||||||||||
| Sequence-based | |||||||||||||||||
| 2004 | DBS-Pred | ✓ |
| 3.5Å | ✓ | ✓ | ✓ | NN | 3 | D62(PDNA-62), NRTF-915 | [ | ||||||
| 2005 | DBS-PSSM | ✓ |
| 3.5Å | ✓ | NN | 5 | D62(PDNA-62), PDNA-RDN(NA), PDNA-NR90(NA) | [ | ||||||||
| 2006 | DNABindR | ΔASA>1 | ✓ | ✓ | ✓ | ✓ | ✓ | NB | 9 | D171 | [ | ||||||
| 2007 | DISIS | ✓ |
| 6Å | ✓ | ✓ | ✓ | SVM | 9 | D274 | [ | ||||||
| 2007 | DP-Bind | ✓ |
| 3.5Å | ✓ | kernel regression | D62(PDNA-62) | [ | |||||||||
| 2009 | ProteDNA | ✓ |
| 4.5Å | ✓ | SVM, SSEA | 11 | D253 | [ | ||||||||
| 2009 | DbindR | ✓ |
| 3.5Å | ✓ | ✓ | RF | 11 | D374 | [ | |||||||
| 2009 | SDCPred | ✓ |
| 3.5Å | ✓ | ✓ | NN | 5 | D159(PDNA159) | [ | |||||||
| 2014 | Byungkyu | H-bond | ✓ | SVM | 9 | D143 | [ | ||||||||||
| Structure-based | |||||||||||||||||
| 1999 | Hidetoshi | 3.5Å | Function | D52 | [ | ||||||||||||
| 2003 | Susan | ΔASA>1 | ✓ | Patch analysis | D56 | [ | |||||||||||
| 2005 | DBS-kernel | 4.5Å | ✓ | ✓ | SVM | D83(NA) | [ | ||||||||||
| 2005 | Pro-dna | ✓ |
| 4.5Å | ✓ | ✓ | ✓ | SVM | D99 (D96,D50) | [ | |||||||
| 2005 | PreDs | ✓ |
| 3.0Å | ✓ | Function | D63 | [ | |||||||||
| 2007 | DISPLAR | ✓ |
| 5Å | ✓ | NN | 15 | D428 | [ | ||||||||
| 2007 | DR_bind1 | ✓ |
| HBPLUS | Energy based | Function | D56 from Susan 2003, D69, R81 | [ | |||||||||
| 2008 | DBD-Hunter | ✓ |
| 4.5Å | ✓ | Function | D179(DB179) | [ | |||||||||
| 2010 | DNABINDPROT | ✓ |
| NUCPLOT | ✓ | ✓ | GNM | 3 | D54 | [ | |||||||
| 2011 | metaDBSite | ✓ |
| 3.5Å | Meta-server | NA | D316(PDNA-316),D232(PDNA-232) | [ | |||||||||
| 2011 | Xiong | 4.5Å, 10%rASA | ✓ | ✓ | SVM | 11 | D206 | [ | |||||||||
| 2012 | Sucharita | ΔASA>0.1 | ✓ | SVM | D130(NA) | [ | |||||||||||
| 2013 | Duo-Duo | 4.0Å | ✓ | SVM | 11 | D62(PDNA-62) | [ | ||||||||||
| 2013 | PreDNA | ✓ |
| 3.5Å | ✓ | ✓ | ✓ | SVM | 11 | D62(PDNA-62), D224 | [ | ||||||
| 2013 | DNABind | ✓ |
| 4.5Å, 10%rASA | ✓ | ✓ | SVM, template | 11 | D206 | [ | |||||||
| 2014 | Bi-Qing | 6Å | ✓ | ✓ | ✓ | SVM | 9 | D90 | [ | ||||||||
PSSM: position specific scoring matrix derived from sequence alignment
RP: residue propensity
ASA: accessible surface area
HP: hydrophobicity
SS: secondary structure
EC: conservation entropy
Q: electrostatic/pKa
SA: structural alignment
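The distance-cutoff binding site definitions listed in the table (3.5Å to 7Å) can be sketched as follows. This is a minimal illustration that assumes a residue counts as binding when any of its atoms lies within the cutoff of any nucleic acid atom; the function name, the input layout and the toy coordinates are hypothetical, and individual tools may apply the cutoff differently.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]


def label_binding_residues(
    protein_atoms: Dict[str, List[Coord]],
    nucleic_atoms: Sequence[Coord],
    cutoff: float,
) -> Dict[str, bool]:
    """Mark a residue as binding if any of its atoms is within `cutoff`
    (in Å) of any nucleic acid atom. The any-atom rule is an assumption."""
    return {
        residue_id: any(
            math.dist(atom, na_atom) <= cutoff
            for atom in atoms
            for na_atom in nucleic_atoms
        )
        for residue_id, atoms in protein_atoms.items()
    }


# Hypothetical toy complex: one nucleic acid atom at the origin and three
# single-atom residues at 3.0 Å, 5.0 Å and 8.0 Å from it.
nucleic = [(0.0, 0.0, 0.0)]
residues = {
    "ARG1": [(3.0, 0.0, 0.0)],
    "LYS2": [(0.0, 5.0, 0.0)],
    "ALA3": [(0.0, 0.0, 8.0)],
}

strict = label_binding_residues(residues, nucleic, cutoff=3.5)
loose = label_binding_residues(residues, nucleic, cutoff=6.0)
print(sum(strict.values()), sum(loose.values()))
```

With these toy coordinates the 3.5 Å cutoff labels one residue as binding and the 6.0 Å cutoff labels two, mirroring how the choice of cutoff changes the size of the reference binding site.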
Minimum performance of all programs.
| Programs | Distance Cutoff (Å) | Dataset | wAUC | mAUC | sAUC | tAUC | SEN | SPC | PPV | ACC | F1 | MCC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DBP datasets | | | | | | | | | | | | |
| Sequence-based | | | | | | | | | | | | |
| DBS-Pred | 6 | RBscore_D381 | - | - | - | - | 0.418 | 0.781 | 0.214 | 0.736 | 0.283 | 0.154 |
| ProteDNA | 6 | RBscore_D381 | - | - | - | - | 0.029 | 0.999 | 0.799 | 0.878 | 0.055 | 0.137 |
| BindN+_DNA | 6 | New_D31 | | 0.780 | 0.091 | 0.776 | 0.421 | 0.908 | 0.358 | 0.856 | 0.387 | 0.307 |
| RNABindR | 6 | New_D31 | | 0.753 | 0.116 | 0.779 | 0.648 | 0.759 | 0.246 | 0.747 | 0.356 | 0.280 |
| RNABindRPlus | 6 | New_D31 | | 0.732 | 0.117 | 0.759 | 0.229 | 0.956 | 0.390 | 0.878 | 0.289 | 0.236 |
| DBS-PSSM | 5.5 | New_D31 | 0.688 | 0.728 | 0.111 | 0.748 | 0.541 | 0.825 | 0.265 | 0.795 | 0.356 | 0.273 |
| RBscore_SVM | 6 | New_D31 | 0.684 | 0.703 | 0.104 | 0.737 | 0.000 | 1.000 | 0.000 | 0.892 | 0.000 | 0.000 |
| BindN_DNA | 6 | Luscombe_D129 | 0.654 | 0.678 | 0.093 | 0.680 | 0.341 | 0.856 | 0.282 | 0.783 | 0.308 | 0.182 |
| RBRIdent | 4.5 | New_D31 | 0.652 | 0.683 | 0.104 | 0.655 | 0.105 | 0.953 | 0.167 | 0.883 | 0.129 | 0.071 |
| PPRInt | 4 | New_D31 | 0.652 | 0.691 | 0.105 | 0.690 | 0.450 | 0.799 | 0.148 | 0.774 | 0.222 | 0.155 |
| xypan | 6 | DNABINDPROT_D54 | 0.635 | 0.628 | 0.109 | 0.641 | 0.092 | 0.982 | 0.469 | 0.850 | 0.153 | 0.156 |
| PRNA | 6 | New_D31 | 0.626 | 0.653 | 0.110 | 0.646 | 0.336 | 0.866 | 0.234 | 0.809 | 0.276 | 0.173 |
| PRBR | 6 | ProteDNA_D253 | 0.612 | 0.587 | 0.139 | 0.602 | 0.317 | 0.824 | 0.362 | 0.702 | 0.338 | 0.148 |
| Predict_RBP | 4.5 | New_D31 | | 0.554 | 0.130 | 0.556 | 0.000 | 1.000 | 0.000 | 0.917 | 0.000 | 0.000 |
| Structure-based | | | | | | | | | | | | |
| DISPLAR | 6 | New_D31 | - | - | - | - | 0.359 | 0.939 | 0.416 | 0.876 | 0.386 | 0.318 |
| DNABINDPROT | 5.5 | BindN_D62 | - | - | - | - | 0.075 | 0.929 | 0.233 | 0.737 | 0.113 | 0.005 |
| RBscore | 6 | New_D31 | | 0.839 | 0.089 | 0.843 | 0.438 | 0.918 | 0.395 | 0.866 | 0.415 | 0.341 |
| aaRNA | 6 | New_D31 | | 0.813 | 0.060 | 0.834 | 0.552 | 0.891 | 0.381 | 0.854 | 0.451 | 0.378 |
| RNAProSite | 6 | Luscombe_D129 | | 0.803 | 0.081 | 0.790 | 0.760 | 0.696 | 0.290 | 0.705 | 0.420 | 0.329 |
| DNABind | 6 | RBscore_D381 | | 0.798 | 0.152 | 0.774 | 0.614 | 0.898 | 0.492 | 0.859 | 0.546 | 0.467 |
| KYG | 6 | DNABINDPROT_D54 | 0.707 | 0.703 | 0.074 | 0.712 | 0.444 | 0.802 | 0.281 | 0.749 | 0.344 | 0.206 |
| RBP datasets | | | | | | | | | | | | |
| Sequence-based | | | | | | | | | | | | |
| DBS-Pred | 6 | aaRNA_R141 | - | - | - | - | 0.359 | 0.790 | 0.190 | 0.738 | 0.248 | 0.116 |
| ProteDNA | 4.5 | Sungwook_R267 | - | - | - | - | 0.001 | 0.999 | 0.140 | 0.888 | 0.002 | 0.003 |
| RNABindRPlus | 6 | RNABindR_R111 | | 0.725 | 0.113 | 0.720 | 0.321 | 0.914 | 0.316 | 0.848 | 0.319 | 0.233 |
| RNABindR | 6 | New_R15 | | 0.756 | 0.093 | 0.737 | 0.661 | 0.681 | 0.279 | 0.678 | 0.393 | 0.258 |
| RBscore_SVM | 6 | New_R15 | 0.687 | 0.697 | 0.135 | 0.697 | 0.046 | 0.987 | 0.396 | 0.839 | 0.083 | 0.090 |
| DBS-PSSM | 5.5 | New_R15 | 0.670 | 0.695 | 0.089 | 0.672 | 0.473 | 0.780 | 0.277 | 0.734 | 0.349 | 0.207 |
| RBRIdent | 4 | New_R15 | 0.670 | 0.699 | 0.127 | 0.671 | 0.169 | 0.952 | 0.313 | 0.863 | 0.219 | 0.160 |
| PRBR | 6 | New_R15 | 0.667 | 0.680 | 0.112 | 0.664 | 0.305 | 0.877 | 0.317 | 0.787 | 0.311 | 0.185 |
| BindN+_RNA | 6 | New_R15 | 0.667 | 0.687 | 0.089 | 0.672 | 0.378 | 0.835 | 0.300 | 0.763 | 0.334 | 0.194 |
| PPRInt | 5.5 | New_R15 | 0.637 | 0.658 | 0.131 | 0.648 | 0.356 | 0.814 | 0.255 | 0.745 | 0.297 | 0.150 |
| PRNA | 6 | New_R15 | 0.612 | 0.629 | 0.141 | 0.624 | 0.295 | 0.865 | 0.290 | 0.775 | 0.293 | 0.159 |
| xypan | 6 | New_R15 | 0.609 | 0.631 | 0.106 | 0.620 | 0.115 | 0.981 | 0.534 | 0.845 | 0.189 | 0.193 |
| BindN_RNA | 4 | New_R15 | 0.608 | 0.642 | 0.121 | 0.626 | 0.324 | 0.842 | 0.209 | 0.783 | 0.254 | 0.139 |
| Predict_RBP | 6 | New_R15 | | 0.581 | 0.155 | 0.576 | 0.022 | 1.000 | 1.000 | 0.846 | 0.043 | 0.136 |
| Structure-based | | | | | | | | | | | | |
| DISPLAR | 6 | Sungwook_R267 | - | - | - | - | 0.214 | 0.956 | 0.433 | 0.856 | 0.287 | 0.234 |
| DNABINDPROT | 5 | Sungwook_R3149 | - | - | - | - | 0.038 | 0.948 | 0.239 | 0.677 | 0.066 | -0.029 |
| DR_bind1 | 4.5 | meta2_R44 | - | - | - | - | 0.285 | 0.942 | 0.655 | 0.758 | 0.397 | 0.311 |
| RBscore | 6 | KYG_R86 | | 0.845 | 0.095 | 0.862 | 0.502 | 0.936 | 0.697 | 0.837 | 0.584 | 0.496 |
| RNAProSite | 6 | meta2_R44 | | 0.805 | 0.101 | 0.798 | 0.675 | 0.770 | 0.573 | 0.740 | 0.620 | 0.427 |
| aaRNA | 5.5 | New_R15 | | 0.785 | 0.070 | 0.777 | 0.539 | 0.854 | 0.396 | 0.806 | 0.457 | 0.348 |
| KYG | 6 | Sungwook_R267 | 0.685 | 0.690 | 0.073 | 0.685 | 0.362 | 0.816 | 0.235 | 0.755 | 0.285 | 0.150 |
| DNABind | 4.5 | Sungwook_R267 | 0.570 | 0.608 | 0.149 | 0.600 | 0.260 | 0.827 | 0.169 | 0.759 | 0.205 | 0.073 |
-: binary predictors and AUC not applicable.
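The binary columns of the table (SEN, SPC, PPV, ACC, F1, MCC) follow from the per-residue confusion counts. The sketch below uses the standard textbook formulas; the function name is hypothetical, the counts are illustrative rather than taken from any assessed program, and returning 0 for an undefined ratio is an assumed convention.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard binary metrics from confusion counts of residues.
    Undefined ratios (zero denominator) are reported as 0.0 by assumption."""

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SEN": ratio(tp, tp + fn),                # sensitivity (recall)
        "SPC": ratio(tn, tn + fp),                # specificity
        "PPV": ratio(tp, tp + fp),                # precision
        "ACC": ratio(tp + tn, tp + fp + fn + tn),  # accuracy
        "F1": ratio(2 * tp, 2 * tp + fp + fn),
        "MCC": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts for a predictor that finds 40 of 60 binding residues.
m = binary_metrics(tp=40, fp=10, fn=20, tn=130)

# Hypothetical degenerate predictor that labels every residue as non-binding.
none = binary_metrics(tp=0, fp=0, fn=108, tn=892)
print(m, none)
```

The degenerate case shows why accuracy alone can mislead on imbalanced data: a predictor that never reports a binding residue still reaches an accuracy of 0.892 here, while its sensitivity, F1 and MCC are all 0.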
Summary of datasets used in tests.
| Name | Protein# | After screening | Reference | Seq id (%) | Str id | Resolution (Å) |
|---|---|---|---|---|---|---|
| BindN_R107 | 107 | 95 | [ | 25 | NA | 3.5 |
| DR_bind1_R69 | 69 | 69 | [ | NA | CATH | 3 |
| DR_bind1_R81 | 81 | 79 | [ | NA | CATH | 3 |
| KYG_R86 | 86 | 85 | [ | 50 | NA | NA |
| PPRInt_R86 | 86 | 83 | [ | 70 | NA | 3 |
| PRNA_R205 | 205 | 189 | [ | 25 | NA | 3 |
| SRCPred_R160 | 160 | 124 | [ | 25 | NA | NA |
| PRBR_R180 | 180 | 142 | [ | 25 | NA | 3.5 |
| RNABindR_R106 | 106 | 100 | [ | NA | NA | NA |
| RNABindR_R109 | 109 | 100 | [ | 30 | NA | 3.5 |
| RNABindR_R144 | 144 | 137 | [ | NA | NA | NA |
| RNABindR_R147 | 147 | 138 | [ | 30 | NA | 3.5 |
| RNABindR_R198 | 198 | 187 | [ | 30 | NA | 3.5 |
| RNABindR_R111 | 111 | 101 | [ | 30 | NA | 3.5 |
| meta2_R44 | 44 | 44 | [ | 40 | NA | NA |
| aaRNA_R67 | 67 | 67 | [ | 30 | NA | NA |
| aaRNA_R141 | 141 | 136 | [ | 25 | NA | 3 |
| aaRNA_R205 | 205 | 200 | [ | 25 | NA | 3 |
| RBscore_R130 | 130 | 130 | [ | 25 | TMscore<0.7 | 3.5 |
| RBscore_R116 | 117 | 116 | [ | 25 | TMscore<0.7 | 3.5 |
| Sungwook_R267 | 267 | 178 | [ | 60 | NA | 3 |
| Sungwook_R727 | 727 | 574 | [ | NA | NA | 3 |
| Sungwook_R3149 | 3149 | 2632 | [ | NA | NA | 3 |
| New_R15 | 15 | 15 | | 25 | NA | 5 |
| BindN_D62 | 62 | 66 | [ | 25 | NA | NA |
| ProteDNA_D253 | 253 | 253 | [ | 20 | NA | 3.5 |
| Pro-dna_D99 | 99 | 188 | [ | 20 | NA | 3 |
| Hidetoshi_D52 | 52 | 49 | [ | NA | NA | 3.2 |
| Shandar_D140 | 140 | 138 | [ | 25 | NA | 2.5 |
| Susan_D56 | 56 | 54 | [ | NA | CATH | 3 |
| DBD-Hunter_D179 | 179 | 177 | [ | 35 | NA | 3 |
| Luscombe_D129 | 129 | 182 | [ | NA | NA | 3 |
| DBindR_D374 | 374 | 329 | [ | 25 | NA | 3.5 |
| DISPLAR_D428 | 428 | 390 | [ | 50 | NA | NA |
| DNABINDPROT_D54 | 54 | 50 | [ | NA | NA | NA |
| PreDNA_D224 | 224 | 216 | [ | 25 | NA | 3 |
| RBscore_D381 | 381 | 381 | [ | 25 | NA | 3.5 |
| metaDBSite_D232 | 232 | 225 | [ | 30 | NA | 3 |
| metaDBSite_D316 | 316 | 308 | [ | 30 | NA | 3 |
| SDCPred_D159 | 159 | 158 | [ | 25 | NA | 2.5 |
| New_D31 | 31 | 31 | | 25 | NA | 5 |
| RBscore_P627 | 628 | 627 | [ | NA | NA | 3.5 |
| All_P5114 | 5058 | 5058 | | NA | NA | NA |
NA: not applicable