Identifying DNA-binding proteins based on multi-features and LASSO feature selection.

Shengli Zhang, Fu Zhu, Qianhao Yu, Xiaoyue Zhu.

Abstract

DNA-binding proteins play an indispensable role in the maintenance and processing of genetic information, but their sheer number makes identification by traditional experimental methods inefficient. In contrast, machine learning methods offer satisfactory speed and accuracy when used to study these molecules. This work extracts four types of features from primary and secondary sequence information: Reduced sequence and index-vectors (RS), Pseudo-amino acid components (PseAACS), Position-specific scoring matrix-Auto Cross Covariance Transform (PSSM-ACCT), and Position-specific scoring matrix-Discrete Wavelet Transform (PSSM-DWT). Using the LASSO dimension reduction method, we experiment with combinations of feature submodels to obtain the optimal number of top-ranked features. The selected features are then used to train three classifiers, Ensemble subspace discriminant, Ensemble bagged tree, and KNN, to predict DNA-binding proteins. Three datasets, PDB594, PDB1075, and PDB186, are used to evaluate the proposed approach: PDB1075 and PDB594 for five-fold cross-validation, and PDB186 for the independent test. In five-fold cross-validation, the Ensemble subspace discriminant reaches accuracies of 86.98% on PDB1075 and 88.9% on PDB594. The accuracy of the independent test with multi-classifier voting is 83.33%, which suggests that the proposed methodology can predict DNA-binding proteins effectively.
© 2021 Wiley Periodicals LLC.
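
The abstract names two generic steps, LASSO-based feature ranking and multi-classifier voting, without giving implementation details. The following is a minimal pure-Python sketch of both ideas under stated assumptions: the LASSO is solved by coordinate descent (one standard solver, not necessarily the one the authors used), features are ranked by the absolute value of their coefficients, and voting is a simple majority. All function names, the penalty value `alpha`, and the toy data are hypothetical and chosen only for illustration; they do not come from the paper.

```python
from collections import Counter


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the LASSO proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 on centred data."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [value - y_mean for value in y]  # residual when w = 0
    scale = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if scale[j] == 0.0:
                continue  # a constant feature carries no information
            # Correlation of feature j with the residual that excludes j.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * w[j])
                      for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / scale[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                w[j] = new_w
    return w


def top_ranked_features(weights, k):
    """Indices of up to k features with the largest non-zero |weight|."""
    order = sorted(range(len(weights)), key=lambda j: abs(weights[j]),
                   reverse=True)
    return [j for j in order[:k] if weights[j] != 0.0]


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    With an odd number of classifiers and binary labels no ties occur.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(predicted, actual):
    """Fraction of samples whose predicted label matches the actual label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data: 8 samples, 3 features; the label mostly follows
# the sign of feature 0, with one deliberately noisy label at the end.
X_toy = [[1, 1, 1], [-1, 1, 1], [1, -1, 1], [-1, -1, 1],
         [1, 1, -1], [-1, 1, -1], [1, -1, -1], [-1, -1, -1]]
y_toy = [1, 0, 1, 0, 1, 0, 1, 1]

weights = lasso_coordinate_descent(X_toy, y_toy, alpha=0.2)
selected = top_ranked_features(weights, k=2)
```

On this toy data the penalty drives the coefficients of the two weakly related features to zero, so only feature 0 is selected. In a real pipeline the selected columns would be passed to each classifier, and the per-classifier predictions would be combined with `majority_vote`.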

Keywords:  DNA-binding proteins; LASSO; multi-features; position-specific scoring matrix

Year:  2021        PMID: 33476047     DOI: 10.1002/bip.23419

Source DB:  PubMed          Journal:  Biopolymers        ISSN: 0006-3525            Impact factor:   2.505


  7 in total

1.  i6mA-VC: A Multi-Classifier Voting Method for the Computational Identification of DNA N6-methyladenine Sites.

Authors:  Tian Xue; Shengli Zhang; Huijuan Qiao
Journal:  Interdiscip Sci       Date:  2021-04-08       Impact factor: 2.233

2.  FTWSVM-SR: DNA-Binding Proteins Identification via Fuzzy Twin Support Vector Machines on Self-Representation.

Authors:  Yi Zou; Yijie Ding; Li Peng; Quan Zou
Journal:  Interdiscip Sci       Date:  2021-11-06       Impact factor: 2.233

3.  Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins.

Authors:  Die Chen; Hua Zhang; Zeqi Chen; Bo Xie; Ye Wang
Journal:  Comput Math Methods Med       Date:  2022-06-28       Impact factor: 2.809

4.  An Integrated Machine Learning Scheme for Predicting Mammographic Anomalies in High-Risk Individuals Using Questionnaire-Based Predictors.

Authors:  Cheuk-Kay Sun; Yun-Xuan Tang; Tzu-Chi Liu; Chi-Jie Lu
Journal:  Int J Environ Res Public Health       Date:  2022-08-08       Impact factor: 4.614

5.  MLP-Based Regression Prediction Model For Compound Bioactivity.

Authors:  Yongfei Qin; Chao Li; Xia Shi; Weigang Wang
Journal:  Front Bioeng Biotechnol       Date:  2022-07-13

6.  Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition.

Authors:  Quang H Nguyen; Hoang V Tran; Binh P Nguyen; Trang T T Do
Journal:  ACS Omega       Date:  2022-08-30

7.  Feature Selection Based on Adaptive Particle Swarm Optimization with Leadership Learning.

Authors:  Zhiwei Ye; Yi Xu; Qiyi He; Mingwei Wang; Wanfang Bai; Hongwei Xiao
Journal:  Comput Intell Neurosci       Date:  2022-08-28