
Identification of DNA-binding proteins by combining auto-cross covariance transformation and ensemble learning.

Bin Liu, Shanyi Wang, Qiwen Dong, Shumin Li, Xuan Liu.

Abstract

DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities, ranging from DNA replication to gene expression control. With the rapid development of next-generation sequencing techniques, the number of known protein sequences is increasing at an unprecedented rate. It is therefore necessary to develop computational methods that identify DNA-binding proteins from protein sequence information alone. In this study, a novel method called iDNA-KACC is presented, which combines the Support Vector Machine (SVM) with the auto-cross covariance transformation. The protein sequences are first converted into a profile-based protein representation, and then into a series of fixed-length vectors by the auto-cross covariance transformation with Kmer composition. This scheme effectively captures the sequence-order effect. The vectors are then fed into an SVM to discriminate the DNA-binding proteins from the non-DNA-binding ones. iDNA-KACC achieves an overall accuracy of 75.16% and a Matthews correlation coefficient of 0.5 in a rigorous jackknife test. Its performance is further improved by employing an ensemble learning approach, and the improved predictor is called iDNA-KACC-EL. Experimental results on an independent dataset show that iDNA-KACC-EL outperforms all the other state-of-the-art predictors, indicating that it would be a useful computational tool for DNA-binding protein identification.
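
The auto-cross covariance step described in the abstract can be illustrated with a short sketch. It shows how a profile of any length maps to a fixed-length vector: for each lag, the covariance between every pair of profile columns is averaged over all positions separated by that lag. This is a minimal sketch under stated assumptions, not the authors' implementation: the profile is assumed to be a matrix with one row per residue and one score column per amino-acid type, the names `acc_transform` and `max_lag` are hypothetical, and the Kmer-composition step of iDNA-KACC is not reproduced because the abstract does not specify it.

```python
def acc_transform(profile, max_lag):
    """Auto-cross covariance of a profile (illustrative sketch).

    profile : list of rows, one row per residue, each row a list of
              per-column scores (assumed layout, e.g. 20 columns).
    max_lag : largest separation between positions (hypothetical name).

    Returns a flat list of max_lag * n_cols * n_cols values, so the
    output length does not depend on the sequence length.
    """
    length = len(profile)
    if length <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    n_cols = len(profile[0])

    # Centre each column on its mean over the whole sequence.
    means = [sum(row[c] for row in profile) / length for c in range(n_cols)]
    centered = [[row[c] - means[c] for c in range(n_cols)] for row in profile]

    features = []
    for lag in range(1, max_lag + 1):
        span = length - lag  # number of position pairs at this lag
        for a in range(n_cols):
            for b in range(n_cols):
                # a == b gives the auto covariance of one column,
                # a != b gives the cross covariance of two columns.
                total = sum(
                    centered[j][a] * centered[j + lag][b] for j in range(span)
                )
                features.append(total / span)
    return features


# Toy two-column profiles of different lengths give vectors of equal length.
short_profile = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
long_profile = short_profile + [[1.0, 0.0], [0.0, 1.0]]
print(len(acc_transform(short_profile, 2)), len(acc_transform(long_profile, 2)))
```

Because the covariances pair positions that are a fixed distance apart, the resulting vector retains some sequence-order information while still having a fixed length, which is what a classifier such as an SVM requires as input.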

Year:  2016        PMID: 28113908     DOI: 10.1109/TNB.2016.2555951

Source DB:  PubMed          Journal:  IEEE Trans Nanobioscience        ISSN: 1536-1241            Impact factor:   2.935


