
DNA-Prot: identification of DNA binding proteins from protein sequence information using random forest.

K Krishna Kumar, Ganesan Pugalenthi, P N Suganthan.

Abstract

DNA-binding proteins (DNABPs) are important for various cellular processes, such as transcriptional regulation, recombination, replication, repair, and DNA modification. So far, various bioinformatics and machine learning techniques have been applied to identify DNA-binding proteins from protein structure, but only a few methods are available for identifying DNA-binding proteins from protein sequence. In this work, we report a random forest method, DNA-Prot, to identify DNA-binding proteins from protein sequence. Training was performed on a dataset containing 146 DNA-binding proteins and 250 non-DNA-binding proteins. The algorithm was tested on a dataset containing 92 DNA-binding proteins and 100 non-DNA-binding proteins. We obtained 80.31% accuracy in training and 84.37% accuracy in testing. Benchmarking analysis on an independent dataset of 823 DNA-binding proteins and 823 non-DNA-binding proteins shows that our approach can distinguish DNA-binding proteins from non-DNA-binding proteins with more than 80% accuracy. We also compared our method with the DNAbinder method on the test dataset and two independent datasets. Both methods showed comparable performance on the test dataset. On the benchmark dataset containing 823 DNA-binding proteins and 823 non-DNA-binding proteins, DNA-Prot performed significantly better, with 81.83% accuracy, whereas DNAbinder achieved only 61.42% accuracy using amino acid composition and 63.5% using PSSM profiles. Similarly, DNA-Prot achieved better performance on the benchmark dataset containing 88 DNA-binding proteins and 233 non-DNA-binding proteins. These results show that DNA-Prot can be used efficiently to identify DNA-binding proteins from sequence information. The dataset and a standalone version of the DNA-Prot software can be obtained from http://www3.ntu.edu.sg/home/EPNSugan/index_files/dnaprot.htm.
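The abstract does not list the exact sequence features DNA-Prot feeds to its random forest; amino acid composition is named only as one of DNAbinder's encodings. As a minimal sketch, assuming a simple 20-dimensional amino acid composition encoding (an illustrative assumption, not necessarily DNA-Prot's feature set), the Python below shows how a protein sequence can be turned into a fixed-length vector, and how accuracy, sensitivity and specificity are computed from predictions with DNA-binding proteins labelled 1. The function names are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids in a fixed order, so every vector lines up.
# Assumption: plain composition features; DNA-Prot's actual features are
# not given in the abstract.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in `sequence`.

    Non-standard letters (e.g. X, B, Z) are skipped, so the 20 fractions
    sum to 1 whenever at least one standard residue is present.
    """
    counts = Counter(r for r in sequence.upper() if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def binary_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, sensitivity and specificity, with 1 = DNA-binding."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy sequence and labels, for illustration only.
    print(amino_acid_composition("MKKRAKK"))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 0]))
```

Vectors like these could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier` through its `fit` and `predict` methods; which implementation DNA-Prot used is not stated here. Reporting sensitivity and specificity alongside accuracy matters for unbalanced sets such as the 88 versus 233 benchmark, where accuracy alone can hide poor recall of the minority class.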


Year:  2009        PMID: 19385697     DOI: 10.1080/07391102.2009.10507281

Source DB:  PubMed          Journal:  J Biomol Struct Dyn        ISSN: 0739-1102


  37 in total

1.  DNA-protein interactions: methods for detection and analysis. [Review]

Authors:  Bipasha Dey; Sameer Thukral; Shruti Krishnan; Mainak Chakrobarty; Sahil Gupta; Chanchal Manghani; Vibha Rani
Journal:  Mol Cell Biochem       Date:  2012-03-08       Impact factor: 3.396

2.  iRSpot-GAEnsC: identifing recombination spots via ensemble classifier and extending the concept of Chou's PseAAC to formulate DNA samples.

Authors:  Muhammad Kabir; Maqsood Hayat
Journal:  Mol Genet Genomics       Date:  2015-08-30       Impact factor: 3.291

3.  DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information.

Authors:  Farman Ali; Saeed Ahmed; Zar Nawab Khan Swati; Shahid Akbar
Journal:  J Comput Aided Mol Des       Date:  2019-05-23       Impact factor: 3.686

4.  FTWSVM-SR: DNA-Binding Proteins Identification via Fuzzy Twin Support Vector Machines on Self-Representation.

Authors:  Yi Zou; Yijie Ding; Li Peng; Quan Zou
Journal:  Interdiscip Sci       Date:  2021-11-06       Impact factor: 2.233

5.  A predicted physicochemically distinct sub-proteome associated with the intracellular organelle of the anammox bacterium Kuenenia stuttgartiensis.

Authors:  Marnix H Medema; Miaomiao Zhou; Sacha A F T van Hijum; Jolein Gloerich; Hans J C T Wessels; Roland J Siezen; Marc Strous
Journal:  BMC Genomics       Date:  2010-05-12       Impact factor: 3.969

6.  Use Chou's 5-Step Rule to Predict DNA-Binding Proteins with Evolutionary Information.

Authors:  Weizhong Lu; Zhengwei Song; Yijie Ding; Hongjie Wu; Yan Cao; Yu Zhang; Haiou Li
Journal:  Biomed Res Int       Date:  2020-07-27       Impact factor: 3.411

7.  A sequence-based multiple kernel model for identifying DNA-binding proteins.

Authors:  Yuqing Qian; Limin Jiang; Yijie Ding; Jijun Tang; Fei Guo
Journal:  BMC Bioinformatics       Date:  2021-05-31       Impact factor: 3.169

8.  Prediction of DNA binding proteins using local features and long-term dependencies with primary sequences based on deep learning.

Authors:  Guobin Li; Xiuquan Du; Xinlu Li; Le Zou; Guanhong Zhang; Zhize Wu
Journal:  PeerJ       Date:  2021-05-03       Impact factor: 2.984

9.  DNA-binding protein prediction using plant specific support vector machines: validation and application of a new genome annotation tool.

Authors:  Graham B Motion; Andrew J M Howden; Edgar Huitema; Susan Jones
Journal:  Nucleic Acids Res       Date:  2015-08-24       Impact factor: 16.971

10.  UMAP-DBP: An Improved DNA-Binding Proteins Prediction Method Based on Uniform Manifold Approximation and Projection.

Authors:  Jinyue Wang; Shengli Zhang; Huijuan Qiao; Jiesheng Wang
Journal:  Protein J       Date:  2021-06-27       Impact factor: 2.371


Beijing Coyote Bioscience Co., Ltd. © 2022-2023.