
Residue-level prediction of DNA-binding sites and its application on DNA-binding protein predictions.

Nitin Bhardwaj, Hui Lu.

Abstract

Protein-DNA interactions are crucial to many cellular activities, such as expression control and DNA repair. These interactions between amino acids and nucleotides are highly specific, and any aberration at the binding site can render the interaction completely ineffective. In this study, we have three aims focusing on DNA-binding residues on the protein surface: to develop an automated approach for fast and reliable recognition of DNA-binding sites; to improve the prediction by distance-dependent refinement; and to use these predictions to identify DNA-binding proteins. We use a support vector machine (SVM)-based approach that exploits the features of DNA-binding residues to distinguish them from non-binding residues. The features include the residue's identity, charge, solvent accessibility, average potential, the secondary structure it is embedded in, neighboring residues, and location in a cationic patch. These features, collected from 50 proteins, are used to train the SVM. Testing is then performed on another set of 37 proteins, much larger than any testing set used in previous studies. The testing set has no more than 20% sequence identity, both among its own members and with the proteins in the training set, thus removing redundancy due to homology. This set also contains proteins from a DNA-binding structural class not present in the training set. With the above features, an accuracy of 66% with balanced sensitivity and specificity is achieved without relying on homology or evolutionary information. We then develop a post-processing scheme that improves the prediction using the relative location of the predicted residues. Balanced success is then achieved, with average sensitivity, specificity and accuracy of 71.3%, 69.3% and 70.5%, respectively. Average net prediction is also around 70%.
Finally, we show that the number of predicted DNA-binding residues can be used to differentiate DNA-binding proteins from non-DNA-binding proteins with an accuracy of 78%. The results presented here demonstrate that machine learning can be applied to the automated identification of DNA-binding residues and that the success rate can be improved as more features are added. Such functional site prediction protocols can be useful in guiding subsequent work such as site-directed mutagenesis and macromolecular docking.
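
The evaluation measures quoted in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes binary per-residue labels (1 = DNA-binding, 0 = non-binding), assumes that "net prediction" means the mean of sensitivity and specificity (the abstract does not define it), and uses a hypothetical count threshold `min_binding_residues` for the protein-level call, since the abstract does not state the cutoff.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ResidueMetrics:
    sensitivity: float
    specificity: float
    accuracy: float
    net_prediction: float  # assumed here: mean of sensitivity and specificity


def residue_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> ResidueMetrics:
    """Compute per-residue metrics from binary labels (1 = binding, 0 = non-binding)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return ResidueMetrics(sensitivity, specificity, accuracy,
                          (sensitivity + specificity) / 2)


def is_dna_binding_protein(y_pred: Sequence[int], min_binding_residues: int = 10) -> bool:
    """Protein-level call from the number of predicted binding residues.

    The threshold is a hypothetical placeholder, not a value from the paper.
    """
    return sum(y_pred) >= min_binding_residues


if __name__ == "__main__":
    # Toy labels for one protein; not data from the study.
    truth = [1, 1, 0, 0, 1, 0, 0, 0]
    predicted = [1, 0, 0, 1, 1, 0, 0, 0]
    print(residue_metrics(truth, predicted))
    print(is_dna_binding_protein(predicted, min_binding_residues=3))
```

Reporting sensitivity and specificity alongside accuracy matters here because binding residues are typically a minority of surface residues, so accuracy alone can look good for a predictor that rarely predicts binding.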

Year:  2007        PMID: 17316627      PMCID: PMC1993824          DOI: 10.1016/j.febslet.2007.01.086

Source DB:  PubMed          Journal:  FEBS Lett        ISSN: 0014-5793            Impact factor:   4.124

