Xiao-Hui Niu, Xue-Hai Hu, Feng Shi, Jing-Bo Xia.
Abstract
DNA-binding proteins play a vital role in many biological processes. Predicting DNA-binding proteins from amino acid sequence is a significant scientific problem that has not yet been fully resolved. Chaos game representation (CGR) exposes patterns hidden in protein sequences and visually reveals previously unknown structure. Fractal dimensions (FD) are good tools for measuring the size of complex, highly irregular geometric objects. To extract the intrinsic correlation between protein sequences and the DNA-binding property, this paper applies the CGR algorithm, fractal dimension, and amino acid composition to formulate numerical features of protein samples. Seven groups of features are extracted, all of which can be computed directly from the primary sequence, and each group is evaluated by the 10-fold cross-validation test and the jackknife test. In the numerical experiments, the group combining amino acid composition and fractal dimension (a 21-dimensional vector) gives the best result, with an average accuracy of 81.82% and an average Matthews correlation coefficient (MCC) of 0.6017. The resulting predictor is also compared with the existing method DNA-Prot and shows better performance.
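Two quantities named in the abstract have standard definitions that can be illustrated briefly: amino acid composition (the relative frequency of each of the 20 standard residues, which would account for 20 of the 21 dimensions) and the Matthews correlation coefficient. The Python sketch below is a minimal illustration under those textbook definitions only. The function names, the residue ordering, and the handling of non-standard residues are assumptions made for this example and are not taken from the paper; the CGR and fractal dimension features are not reproduced because the abstract does not specify how they are computed.

```python
import math
from collections import Counter

# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-dimensional vector of residue frequencies.

    Characters outside the 20 standard residues are ignored
    (an assumption; the paper may treat them differently).
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical toy sequence and confusion-matrix counts, not data from the paper.
    vec = amino_acid_composition("MKRAAC")
    print(len(vec), round(sum(vec), 6))
    print(matthews_cc(tp=40, tn=42, fp=8, fn=10))
```

Under this reading, a 21-dimensional feature vector would be the 20 composition values with one fractal dimension value appended, though the exact construction used in the paper may differ.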
Keywords: Chaos game representation; Cross validation; Fractal dimension; Protein classification
Year: 2013 PMID: 24189096 DOI: 10.1016/j.jtbi.2013.10.009
Source DB: PubMed Journal: J Theor Biol ISSN: 0022-5193 Impact factor: 2.691