
De Novo Protein Design for Novel Folds Using Guided Conditional Wasserstein Generative Adversarial Networks.

Mostafa Karimi, Shaowen Zhu, Yue Cao, Yang Shen.

Abstract

Although data on protein sequence and structure are accumulating rapidly, the number of protein architectural types (structural folds) remains small. This study addresses the following question: how well can one reveal the underlying sequence-structure relationships and design protein sequences for an arbitrary, potentially novel, structural fold? In response, we have developed novel deep generative models, namely semisupervised gcWGAN (guided, conditional Wasserstein generative adversarial networks). To overcome training difficulties and improve design quality, we build our models on a conditional Wasserstein GAN (WGAN), which uses the Wasserstein distance in its loss function. Our major contributions include (1) constructing a low-dimensional and generalizable representation of the fold space for the conditional input, (2) developing an ultrafast sequence-to-fold predictor (or oracle) and incorporating its feedback into the WGAN as a loss term to guide model training, and (3) exploiting sequence data with and without paired structures to enable a semisupervised training strategy. Assessed by the oracle over 100 novel folds not in the training set, gcWGAN generates more successful designs and covers 3.5 times more target folds than a competing data-driven method (cVAE). Assessed by sequence- and structure-based predictors, gcWGAN designs are physically and biologically sound. Assessed by a structure predictor over representative novel folds, including one that is not even among the basis folds, gcWGAN designs have comparable or better fold accuracy than cVAE designs, with much greater sequence diversity and novelty. The ultrafast data-driven model is further shown to boost the success of a principle-driven de novo method (RosettaDesign) by generating design seeds and tailoring the design space. In conclusion, gcWGAN explores uncharted sequence space to design proteins by learning generalizable principles from current sequence-structure data.
Data, source code, and trained models are available at https://github.com/Shen-Lab/gcWGAN.
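The idea of adding oracle feedback to a Wasserstein GAN loss can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the negative log-probability form of the oracle term, and the `weight` parameter are assumptions chosen for illustration, and the critic scores and oracle probabilities are made-up numbers standing in for real network outputs.

```python
import numpy as np

def critic_loss(real_scores, fake_scores):
    """Wasserstein critic loss: minimizing it widens the gap between
    critic scores on real sequences and on generated sequences."""
    return float(np.mean(fake_scores) - np.mean(real_scores))

def guided_generator_loss(fake_scores, oracle_probs, target_fold, weight=1.0):
    """Generator loss = Wasserstein term + weighted oracle feedback.

    The oracle term (assumed form) is the mean negative log-probability
    that the oracle assigns to the target fold for each generated sequence.
    """
    wasserstein_term = -float(np.mean(fake_scores))
    # Clip to avoid log(0) when the oracle gives the target fold zero probability.
    target_probs = np.clip(oracle_probs[:, target_fold], 1e-12, 1.0)
    oracle_term = -float(np.mean(np.log(target_probs)))
    return wasserstein_term + weight * oracle_term

# Toy inputs (hypothetical): critic scores for two real and two generated
# sequences, and oracle fold probabilities over two candidate folds.
real_scores = np.array([1.0, 3.0])
fake_scores = np.array([0.0, 2.0])
oracle_probs = np.array([[0.5, 0.5],
                         [0.25, 0.75]])

c_loss = critic_loss(real_scores, fake_scores)
g_loss = guided_generator_loss(fake_scores, oracle_probs, target_fold=1, weight=0.5)
print(c_loss, g_loss)
```

In this sketch, a generated batch that the oracle assigns more confidently to the target fold yields a lower generator loss, which is the sense in which the oracle "guides" training; the relative weighting of the two terms is a tunable assumption here.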

Year: 2020    PMID: 32945673    PMCID: PMC7775287    DOI: 10.1021/acs.jcim.0c00593

Source DB: PubMed    Journal: J Chem Inf Model    ISSN: 1549-9596    Impact factor: 4.956


