Improving protein fold recognition using triplet network and ensemble deep learning.

Yan Liu, Ke Han, Yi-Heng Zhu, Ying Zhang, Long-Chen Shen, Jiangning Song, Dong-Jun Yu.

Abstract

Protein fold recognition is a critical step toward protein structure and function prediction; it aims to identify the most likely fold type of a query protein. In recent years, the development of deep learning (DL) techniques has led to major advances in this field, and the sensitivity of protein fold recognition has improved dramatically. Most DL-based methods take an intermediate bottleneck layer as the feature representation of proteins with new fold types. However, this strategy is indirect and inefficient, and it rests on the assumption that the bottleneck layer provides a good representation of proteins with new fold types. To address this problem, we develop a new computational framework that combines a triplet network with ensemble DL. We first train a DL-based model, termed FoldNet, a deep convolutional network trained with triplet loss. FoldNet directly optimizes the protein fold embedding itself, so that proteins with the same fold type lie closer to each other than to proteins with different fold types in the new embedding space. Subsequently, using the trained FoldNet, we implement a new residue-residue contact-assisted predictor, termed FoldTR, which improves protein fold recognition. Furthermore, we propose a new ensemble DL method, termed FSD_XGBoost, which combines the protein fold embedding with two other discriminative fold-specific features extracted by two DL-based methods, SSAfold and DeepFR. The Top 1 sensitivity of FSD_XGBoost increases to 74.8% at the fold level, which is ~9% higher than that of the state-of-the-art method. Together, the results suggest that fold-specific features extracted by different DL methods complement each other, and that their combination can further improve recognition at the fold level. The FoldTR web server and benchmark datasets are publicly available at http://csbio.njust.edu.cn/bioinf/foldtr/.
© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
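
The triplet objective mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the distance function, margin, or triplet sampling used by FoldNet, so the Euclidean distance, the margin of 1.0, and the names `euclidean` and `triplet_loss` below are assumptions for illustration only. This is the generic hinge-style triplet loss, not the authors' implementation.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet loss: max(0, d(a, p) - d(a, n) + margin).

    anchor and positive stand for embeddings of proteins with the same
    fold type; negative stands for a protein with a different fold type.
    The margin value is an assumed hyperparameter, not taken from the paper.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings (hypothetical values, not real protein data).
anchor = [0.0, 0.0]
same_fold = [0.0, 1.0]    # distance 1 from the anchor
other_fold = [3.0, 4.0]   # distance 5 from the anchor

# Same-fold protein is already closer by more than the margin: zero loss.
loss_satisfied = triplet_loss(anchor, same_fold, other_fold)

# Roles swapped: the "positive" is farther than the "negative", so the loss is positive.
loss_violated = triplet_loss(anchor, other_fold, same_fold)
```

Minimizing a loss of this form over many triplets pushes same-fold embeddings together and different-fold embeddings apart, which is the property the abstract attributes to the FoldNet embedding space.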

Keywords:  bioinformatics; convolutional neural network; ensemble deep learning; protein fold recognition; triplet loss

Year:  2021        PMID: 34226918      PMCID: PMC8768454          DOI: 10.1093/bib/bbab248

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   13.994

