Why can deep convolutional neural networks improve protein fold recognition? A visual explanation by interpretation.

Yan Liu, Yi-Heng Zhu, Xiaoning Song, Jiangning Song, Dong-Jun Yu.

Abstract

As an essential task in protein structure and function prediction, protein fold recognition has attracted increasing attention. Most existing machine learning-based approaches to protein fold recognition rely heavily on handcrafted features that depict the characteristics of different protein folds; however, effective feature extraction remains the bottleneck for further performance improvement. As a powerful feature extractor, a deep convolutional neural network (DCNN) can automatically extract discriminative features for fold recognition without human intervention and has demonstrated impressive performance on this task. Despite this encouraging progress, a DCNN often acts as a black box, which makes it challenging for users to understand what happens inside the network and why it works well for protein fold recognition. In this study, we explore the intrinsic mechanism of DCNN and explain why it works for protein fold recognition using a visual explanation technique. More specifically, we first trained a VGGNet-based DCNN model, termed VGGNet-FE, which extracts fold-specific features from the predicted protein residue-residue contact map for protein fold recognition. Subsequently, based on the trained VGGNet-FE, we implemented a new contact-assisted predictor, termed VGGfold, for protein fold recognition; we then used a deconvolution technique to visualize the features extracted by each of the convolutional layers in VGGNet-FE. Furthermore, we visualized the high-level semantic information, termed the fold-discriminative region, of a predicted contact map from the localization map obtained from the last convolutional layer of VGGNet-FE. The visualizations confirm that VGGNet-FE can effectively extract distinct fold-discriminative regions for different types of protein folds, thereby accounting for the improved performance of VGGfold for protein fold recognition.
In summary, this study contributes both to understanding the working principle of DCNNs in protein fold recognition and to exploring the relationship between the predicted protein contact map and protein tertiary structure. The proposed visualization method is flexible and can be applied to other DCNN-based bioinformatics and computational biology questions. The online web server of VGGfold is freely available at http://csbio.njust.edu.cn/bioinf/vggfold/.
© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
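
The abstract mentions a localization map derived from the last convolutional layer but does not specify how it is computed. The following is a minimal sketch of one common way to build such a map, a Grad-CAM-style weighting; that choice is an assumption, not a detail confirmed by the abstract. The function name `localization_map` and the toy values are hypothetical and purely illustrative.

```python
def localization_map(feature_maps, gradients):
    """Grad-CAM-style map (an assumed technique, not taken from the abstract).

    feature_maps: K channels of H x W activations from the last conv layer.
    gradients:    K channels of H x W gradients of the class score
                  with respect to those activations.
    Returns an H x W map scaled to [0, 1].
    """
    n_channels = len(feature_maps)
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])

    # One weight per channel: the spatial average of that channel's gradient.
    weights = [
        sum(sum(row) for row in gradients[k]) / (height * width)
        for k in range(n_channels)
    ]

    # Weighted sum over channels, then ReLU to keep only positive evidence.
    cam = [
        [
            max(0.0, sum(weights[k] * feature_maps[k][i][j]
                         for k in range(n_channels)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] so the map can be overlaid on the input contact map.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[value / peak for value in row] for row in cam]
    return cam


# Toy input: two 2x2 channels with hypothetical values (not from the paper).
features = [
    [[1.0, 0.0], [0.0, 2.0]],
    [[0.0, 3.0], [1.0, 0.0]],
]
grads = [
    [[1.0, 1.0], [1.0, 1.0]],      # mean gradient  1.0
    [[-1.0, -1.0], [-1.0, -1.0]],  # mean gradient -1.0
]
cam = localization_map(features, grads)
print(cam)
```

In this toy case the first channel supports the class and the second opposes it, so only positions where the first channel dominates remain non-zero. In practice the resulting map would be upsampled to the size of the contact map before being overlaid on it.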

Keywords:  convolutional neural network; deconvolution; protein fold recognition; visual explanation

Year: 2021; PMID: 33537753; PMCID: PMC8425391; DOI: 10.1093/bib/bbab001

Source DB: PubMed; Journal: Brief Bioinform; ISSN: 1467-5463; Impact factor: 11.622

