
Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design.

Yue Cao, Payel Das, Vijil Chenthamarakshan, Pin-Yu Chen, Igor Melnyk, Yang Shen.

Abstract

Designing novel protein sequences for a desired 3D topological fold is a fundamental yet nontrivial task in protein engineering. Challenges arise from the complex sequence-fold relationship, as well as from the difficulty of capturing the diversity of sequences (and therefore of structures and functions) within a fold. To overcome these challenges, we propose Fold2Seq, a novel transformer-based generative framework for designing protein sequences conditioned on a specific target fold. To model the complex sequence-structure relationship, Fold2Seq jointly learns a sequence embedding using a transformer and a fold embedding from the density of secondary structural elements in 3D voxels. On test sets with single, high-resolution, and complete structure inputs for individual folds, our experiments demonstrate improved or comparable performance of Fold2Seq in terms of speed, coverage, and reliability for sequence design, compared with existing state-of-the-art methods, which include data-driven deep generative models and physics-based RosettaDesign. The unique advantages of fold-based Fold2Seq over a structure-based deep model and RosettaDesign become more evident on three additional real-world challenges originating from low-quality, incomplete, or ambiguous input structures. Source code and data are available at https://github.com/IBM/fold2seq.
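
The abstract does not spell out how the fold input is featurized. As a rough illustration of what a "density of secondary structural elements in 3D voxels" could look like, the sketch below places residues on a small cubic grid and accumulates a Gaussian-smoothed density per secondary-structure class. The function name `sse_density_grid`, the three-class label set, the grid size, and the Gaussian width are all assumptions made for illustration and are not the authors' settings; the actual implementation is in the linked repository.

```python
import numpy as np

# Hypothetical label set: the abstract does not say which classes are used.
SSE_CLASSES = ("helix", "strand", "loop")


def sse_density_grid(coords, sse_labels, grid_size=8, sigma=1.0):
    """Return a (G, G, G, n_classes) array of smoothed per-class residue density.

    coords     : (N, 3) residue coordinates (e.g. C-alpha positions)
    sse_labels : length-N sequence of labels drawn from SSE_CLASSES
    grid_size  : number of voxels G along each axis (assumed value)
    sigma      : Gaussian width in voxel units (assumed value)
    """
    coords = np.asarray(coords, dtype=float)

    # Shift to the origin and scale uniformly so the longest axis spans the grid.
    mins = coords.min(axis=0)
    longest = max(float((coords.max(axis=0) - mins).max()), 1e-8)
    scaled = (coords - mins) * (grid_size / longest)

    # Voxel centers: 0.5, 1.5, ..., grid_size - 0.5 along each axis.
    c = np.arange(grid_size) + 0.5
    gx, gy, gz = np.meshgrid(c, c, c, indexing="ij")
    centers = np.stack([gx, gy, gz], axis=-1)  # shape (G, G, G, 3)

    grid = np.zeros((grid_size, grid_size, grid_size, len(SSE_CLASSES)))
    for pos, label in zip(scaled, sse_labels):
        k = SSE_CLASSES.index(label)
        sq_dist = np.sum((centers - pos) ** 2, axis=-1)
        # Each residue adds a Gaussian bump to the channel of its class.
        grid[..., k] += np.exp(-sq_dist / (2.0 * sigma**2))
    return grid


# Toy input: four residues with made-up coordinates and labels.
toy_coords = [[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 3, 3]]
toy_labels = ["helix", "helix", "strand", "loop"]
toy_grid = sse_density_grid(toy_coords, toy_labels, grid_size=4)
print(toy_grid.shape)
```

A fixed-size grid of this kind gives every protein the same input shape regardless of chain length, which is one plausible reason to prefer a coarse fold representation over exact backbone coordinates when input structures are incomplete or low-quality.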


Year:  2021        PMID: 34423306      PMCID: PMC8375603     

Source DB:  PubMed          Journal:  Proc Mach Learn Res


