SAResNet: self-attention residual network for predicting DNA-protein binding.

Long-Chen Shen, Yan Liu, Jiangning Song, Dong-Jun Yu.

Abstract

Knowledge of the specificity of DNA-protein binding is crucial for understanding the mechanisms of gene expression, regulation and gene therapy. In recent years, deep-learning-based methods for predicting DNA-protein binding from sequence data have achieved significant success. Nevertheless, current state-of-the-art computational methods are limited by small datasets with insufficient experimental data. To address this, we propose a novel transfer-learning-based method, termed SAResNet, which combines the self-attention mechanism with a residual network structure. More specifically, the attention-driven module captures the position information of the sequence, while the residual network structure ensures that high-level features of the binding site can be extracted. In addition, the pre-training strategy used by SAResNet improves the learning ability of the network and accelerates its convergence during transfer learning. SAResNet was extensively tested on 690 ChIP-seq datasets and achieved an average AUC of 92.0%, which is 4.4% higher than that of the best state-of-the-art method currently available. On smaller datasets, the improvement in predictive performance is more pronounced. Overall, we demonstrate that superior DNA-protein binding prediction from DNA sequences can be achieved by combining the attention mechanism with a residual structure, and we develop a novel pipeline accordingly. The proposed methodology is generally applicable and can be used to address other sequence classification problems.
© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
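
The combination of self-attention and residual connections described in the abstract can be illustrated with a minimal sketch. This is not the SAResNet architecture: the one-hot encoding, the single attention head, the dense (rather than convolutional) residual block, the random weights, and all function names are assumptions made purely for illustration.

```python
import math
import random

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as 4-dimensional one-hot rows (unknown bases -> all zeros)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention: every position attends to every other."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v), weights

def residual_block(x, w):
    """y = x + relu(x @ w); the skip connection passes the input through unchanged."""
    fx = matmul(x, w)
    return [[xi + max(0.0, fi) for xi, fi in zip(xr, fr)] for xr, fr in zip(x, fx)]

def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d = len(BASES)
x = one_hot("ACGTTGCA")
wq, wk, wv, wr = (random_matrix(d, d, rng) for _ in range(4))
attended, weights = self_attention(x, wq, wk, wv)
out = residual_block(attended, wr)
print(len(out), len(out[0]))  # one feature row per sequence position
```

In a real model the weights would be learned, first on a large pre-training set and then fine-tuned on each smaller dataset, which is the transfer-learning step the abstract refers to.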

Keywords: DNA-protein binding; bioinformatics; deep residual network; self-attention mechanism; sequence analysis; transfer learning

Year: 2021; PMID: 33837387; PMCID: PMC8579196; DOI: 10.1093/bib/bbab101

Source DB: PubMed; Journal: Brief Bioinform; ISSN: 1467-5463; Impact factor: 11.622

