Liangjiang Wang, Susan J Brown.
Abstract
BindN (http://bioinformatics.ksu.edu/bindn/) takes an amino acid sequence as input and predicts potential DNA or RNA-binding residues with support vector machines (SVMs). Protein datasets with known DNA or RNA-binding residues were selected from the Protein Data Bank (PDB), and SVM models were constructed using data instances encoded with three sequence features, including the side chain pKa value, hydrophobicity index and molecular mass of an amino acid. The results suggest that DNA-binding residues can be predicted at 69.40% sensitivity and 70.47% specificity, while prediction of RNA-binding residues achieves 66.28% sensitivity and 69.84% specificity. When compared with previous studies, the SVM models appear to be more accurate and more efficient for online predictions. BindN provides a useful tool for understanding the function of DNA and RNA-binding proteins based on primary sequence data.
Year: 2006 PMID: 16845003 PMCID: PMC1538853 DOI: 10.1093/nar/gkl298
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Performance of the SVMs for prediction of DNA- and RNA-binding residues in proteins
| Prediction type | Accuracy (%) | Sensitivity (%) | Specificity (%) | ROC AUC |
|---|---|---|---|---|
| DNA-binding | 70.31 | 69.40 | 70.47 | 0.7524 |
| RNA-binding | 69.32 | 66.28 | 69.84 | 0.7308 |
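The accuracy, sensitivity and specificity columns above follow the usual confusion-matrix definitions for a two-class problem. The sketch below shows how such percentages could be computed from per-residue counts. It is a minimal illustration, not the authors' code: the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Return accuracy, sensitivity and specificity as percentages.

    tp / fn: binding residues predicted as binding / non-binding.
    tn / fp: non-binding residues predicted as non-binding / binding.
    """
    positives = tp + fn          # all true binding residues
    negatives = tn + fp          # all true non-binding residues
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one binding and one non-binding residue")
    total = positives + negatives
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / positives,
        "specificity": 100.0 * tn / negatives,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = binary_metrics(tp=70, fn=30, tn=630, fp=270)
print(example)
```

Because non-binding residues typically outnumber binding residues, accuracy computed this way is dominated by specificity, which is one reason sensitivity and specificity are reported separately.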
Figure 1. ROC curves for prediction of DNA and RNA-binding residues with SVMs.
Performance comparison of the web servers for prediction of DNA-binding residues
| Web server | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|
| BindN | 72.18 | 65.22 | 72.84 |
| DBS-PSSM | 67.82 | 36.73 | 70.79 |
Figure 2. Representative prediction results shown in the context of three-dimensional structures. In each complex, correctly predicted binding residues (true positives) are in red and spacefill; correctly predicted non-binding residues (true negatives) are in green and wireframe; binding residues predicted as non-binding (false negatives) are in blue and spacefill; non-binding residues predicted as binding (false positives) are in yellow and spacefill; the nucleic acid molecule is shown in purple. (a) Putative DNA-binding residues predicted for the mouse ETS-1 transcription factor. The structure (PDB ID: 1K79) includes residues 331–440 of the ETS-1 protein. Chain D of 1K79 was used as the input sequence to BindN with the expected specificity set to 90%. (b) Putative RNA-binding residues predicted for the box C/D RNA-binding domain of the archaeal protein L7Ae. Chain B of the structure (PDB ID: 1RLG) was used for BindN prediction with the expected specificity set to 90%.
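The colour scheme in the Figure 2 legend amounts to assigning each residue one of four outcome labels by comparing its predicted and observed binding status. A minimal sketch of that mapping follows. The names `STYLE` and `label_residues`, and the use of boolean lists as input, are assumptions made for illustration; they do not describe the BindN output format or the software used to render the figure.

```python
# Outcome label -> (colour, rendering style), as stated in the Figure 2 legend.
STYLE = {
    "TP": ("red", "spacefill"),
    "TN": ("green", "wireframe"),
    "FN": ("blue", "spacefill"),
    "FP": ("yellow", "spacefill"),
}


def label_residues(predicted: list, observed: list) -> list:
    """Label each residue TP, FP, FN or TN.

    predicted[i] is True if residue i was predicted as binding;
    observed[i] is True if residue i binds in the known structure.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    labels = []
    for pred, obs in zip(predicted, observed):
        if pred and obs:
            labels.append("TP")
        elif pred:
            labels.append("FP")
        elif obs:
            labels.append("FN")
        else:
            labels.append("TN")
    return labels


# Hypothetical four-residue example covering each outcome once.
labels = label_residues([True, True, False, False], [True, False, True, False])
for position, label in enumerate(labels, start=1):
    colour, rendering = STYLE[label]
    print(position, label, colour, rendering)
```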
Figure 3. Sample output from the BindN server. Putative DNA-binding residues were predicted for the Arabidopsis transcription factor WRKY1 (residues 301–380). Most of the positive predictions are located in the putative WRKY DNA-binding domain, for which structural data are still not available.