Effective DNA binding protein prediction by using key features via Chou's general PseAAC.

Sheikh Adilina, Dewan Md Farid, Swakkhar Shatabda

Abstract

DNA-binding proteins (DBPs) are responsible for several cellular functions, ranging from the immune system to the transport of oxygen. In recent studies, scientists have classified DBPs with supervised machine learning methods that use information from the protein sequence only. Most of these methods work effectively on the training sets, but the performance of most of them degrades on the independent test set. This leaves room for improving prediction by reducing over-fitting. In this paper, we extracted several features solely from the protein sequence and carried out two different types of feature selection on them. Our results were comparable on the training set and significantly improved on the independent test set. On the independent test set our accuracy was 82.26%, an improvement of 1.62% over the previous best state-of-the-art methods. Sensitivity and area under the receiver operating characteristic curve on the independent test set were also higher, at 0.95 and 0.823 respectively.
Copyright © 2018 Elsevier Ltd. All rights reserved.
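
Illustrative note: the abstract reports accuracy, sensitivity, and area under the ROC curve for features derived from the protein sequence alone, but it does not name the features or give the evaluation code. The sketch below is a minimal illustration only, not the authors' method. It assumes amino acid composition as a stand-in sequence feature, and the function names, toy labels, and scores are all hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Fraction of each standard residue in a protein sequence.

    Assumption: this is only one simple sequence-based feature; the
    paper's actual feature set is not described in the abstract.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred):
    """True-positive rate TP / (TP + FN), with label 1 as the DBP class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = DNA-binding, 0 = non-binding.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

if __name__ == "__main__":
    print(accuracy(y_true, y_pred))
    print(sensitivity(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

Unlike accuracy and sensitivity, the AUC does not depend on the 0.5 decision threshold used here, which is one reason the two kinds of metric are usually reported together.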

Keywords:  Classification algorithm; DNA binding proteins; Feature selection; Handling overfitting; Independent test set; Sequence based features

Year: 2018; PMID: 30316822; DOI: 10.1016/j.jtbi.2018.10.027

Source DB: PubMed; Journal: J Theor Biol; ISSN: 0022-5193; Impact factor: 2.691


