Tao Li(1), Qian-Zhong Li, Shuai Liu, Guo-Liang Fan, Yong-Chun Zuo, Yong Peng.
(1) Laboratory of Theoretical Biophysics, School of Physical Sciences and Technology, College of Computer Science and The National Research Center for Animal Transgenic Biotechnology, Inner Mongolia University, Hohhot, 010021, China.
Abstract
MOTIVATION: Protein-DNA interactions take part in many crucial processes that are essential for cellular function. Identifying DNA-binding sites in proteins is important for understanding the molecular mechanisms of protein-DNA interaction. We have therefore developed an improved method that predicts DNA-binding sites by integrating a structural alignment algorithm with support vector machine-based methods. RESULTS: Evaluated on a new non-redundant protein set of 224 chains, the method achieves 80.7% sensitivity and 82.9% specificity in a 5-fold cross-validation test. In addition, it predicts DNA-binding sites with 85.1% sensitivity and 85.3% specificity when tested on a dataset of 62 protein-DNA complexes. Compared with a recently published method, BindN+, our method achieves a 7% higher area under the receiver operating characteristic curve (AUC) when tested on the same dataset. Many important problems in cell biology require that the dense non-linear interactions between functional modules be considered. Thus, our prediction method will be useful in detecting such complex interactions.
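For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how sensitivity and specificity are conventionally computed from per-residue binary predictions. It uses the standard textbook definitions; the abstract does not describe the authors' exact evaluation protocol, so the function name `sensitivity_specificity` and the toy labels are illustrative assumptions, not the paper's code or data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Assumed convention: 1 = DNA-binding residue, 0 = non-binding residue.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # binding, predicted binding
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # binding, missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-binding, predicted non-binding
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-binding, falsely flagged

    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels invented for illustration only (not from the paper's datasets).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In a 5-fold cross-validation setting, these counts would typically be computed on each held-out fold and then pooled or averaged; which of the two the authors used is not stated in the abstract.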