
EL_LSTM: Prediction of DNA-Binding Residue from Protein Sequence by Combining Long Short-Term Memory and Ensemble Learning.

Jiyun Zhou, Qin Lu, Ruifeng Xu, Lin Gui, Hongpeng Wang.   

Abstract

Most previous work on DNA-binding residue prediction did not consider the relationships between residues. In this paper, we propose a novel approach to DNA-binding residue prediction, referred to as EL_LSTM, which has two main components. The first is a Long Short-Term Memory (LSTM) network, which learns pairwise relationships between residues through a bi-gram model and then learns feature vectors for all residues. The second is an ensemble-learning-based classifier introduced to tackle the data imbalance problem in binding residue prediction. We use a variant of the bagging strategy in ensemble learning to obtain balanced samples. Evaluations on PDNA-224 and DBP-123 show that classifiers using feature relationships outperform those without them by at least 0.028 on MCC, 1.18 percent on ST, and 0.012 on AUC, which indicates the usefulness of feature relationships for DNA-binding residue prediction. Evaluation of ensemble learning indicates an improvement of at least 0.021 on MCC, 1.32 percent on ST, and 0.018 on AUC over a single LSTM classifier. Comparisons with state-of-the-art predictors show that the proposed EL_LSTM outperforms them significantly. Further feature analysis validates the effectiveness of LSTM for the prediction of DNA-binding residues.
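The abstract does not specify which bagging variant is used, so the following is only a minimal sketch under one common assumption: the majority class (non-binding residues) is split into disjoint chunks, each chunk is paired with all minority-class (binding) residues to form a balanced training bag, and the base classifiers trained on those bags are combined by averaging their predicted probabilities. The names `balanced_bags`, `ensemble_predict`, and `mcc` are hypothetical and are not taken from the paper; the base models here are placeholder callables rather than LSTMs. The MCC function follows the standard definition of the Matthews correlation coefficient.

```python
import math
import random


def balanced_bags(labels, seed=0):
    """Build balanced index bags (assumed scheme, not the paper's exact one).

    Majority-class indices are shuffled and split into disjoint chunks;
    every bag contains all minority-class indices plus one chunk.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    majority = majority[:]
    rng.shuffle(majority)
    n_bags = max(1, len(majority) // len(minority))
    # Striding by n_bags yields disjoint chunks of near-equal size.
    return [sorted(minority + majority[b::n_bags]) for b in range(n_bags)]


def ensemble_predict(models, x, threshold=0.5):
    """Average the binding probabilities of all base models, then threshold."""
    mean_prob = sum(model(x) for model in models) / len(models)
    return int(mean_prob >= threshold), mean_prob


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Toy data: 3 binding residues and 9 non-binding residues.
    labels = [1] * 3 + [0] * 9
    for bag in balanced_bags(labels):
        print(bag, "positives:", sum(labels[i] for i in bag))

    # Placeholder base models that return a fixed probability.
    models = [lambda x: 0.2, lambda x: 0.9, lambda x: 0.7]
    print(ensemble_predict(models, x=None))
    print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

With a 1:3 class ratio as in the toy data, this scheme produces three bags, each with an equal number of binding and non-binding residues, so every base classifier sees balanced data while the ensemble as a whole still uses every non-binding example.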


Year: 2018; PMID: 30040656; DOI: 10.1109/TCBB.2018.2858806

Source DB: PubMed; Journal: IEEE/ACM Trans Comput Biol Bioinform; ISSN: 1545-5963; Impact factor: 3.710


