
Generating functional protein variants with variational autoencoders.

Alex Hawkins-Hooker, Florence Depardieu, Sebastien Baur, Guillaume Couairon, Arthur Chen, David Bikard.

Abstract

The vast expansion of protein sequence databases provides an opportunity for new protein design approaches that seek to learn the sequence-function relationship directly from natural sequence variation. Deep generative models trained on protein sequence data have been shown to learn biologically meaningful representations helpful for a variety of downstream tasks, but their potential for direct use in the design of novel proteins remains largely unexplored. Here we show that variational autoencoders trained on a dataset of almost 70,000 luciferase-like oxidoreductases can be used to generate novel, functional variants of the luxA bacterial luciferase. We propose separate VAE models to work with aligned sequence input (MSA VAE) and raw sequence input (AR-VAE), and offer evidence that while both are able to reproduce patterns of amino acid usage characteristic of the family, the MSA VAE is better able to capture long-distance dependencies reflecting the influence of 3D structure. To confirm the practical utility of the models, we used them to generate variants of luxA whose luminescence activity was validated experimentally. We further showed that conditional variants of both models could be used to increase the solubility of luxA without disrupting function. Altogether, 6/12 of the variants generated using the unconditional AR-VAE and 9/11 generated using the unconditional MSA VAE retained measurable luminescence, together with all 23 of the less distant variants generated by conditional versions of the models; the most distant functional variant contained 35 differences relative to the nearest training set sequence. These results demonstrate the feasibility of using deep generative models to explore the space of possible protein sequences and generate useful variants, providing a method complementary to rational design and directed evolution approaches.
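
To make the idea of a VAE over aligned sequence input concrete, the following is a minimal sketch of the standard VAE training objective (negative ELBO) applied to a one-hot encoded aligned sequence. It is not the authors' MSA VAE or AR-VAE architecture: the 21-symbol alphabet (20 amino acids plus a gap), the function names, the toy sequence, and the stand-in encoder and decoder outputs are all assumptions made for illustration.

```python
import math
import random

# Assumed alphabet: 20 standard amino acids plus the alignment gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"

def one_hot(aligned_seq):
    """Encode an aligned sequence as a list of one-hot rows (length x 21)."""
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    rows = []
    for aa in aligned_seq:
        row = [0.0] * len(ALPHABET)
        row[index[aa]] = 1.0
        rows.append(row)
    return rows

def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]

def reconstruction_nll(x_one_hot, probs):
    """Categorical negative log-likelihood, summed over alignment columns."""
    nll = 0.0
    for x_row, p_row in zip(x_one_hot, probs):
        nll -= sum(x * math.log(p) for x, p in zip(x_row, p_row) if x > 0)
    return nll

def negative_elbo(x_one_hot, probs, mu, logvar):
    """Reconstruction term plus KL regulariser: the quantity a VAE minimises."""
    return reconstruction_nll(x_one_hot, probs) + kl_to_standard_normal(mu, logvar)

# Toy usage with stand-in network outputs (hypothetical values, no training).
x = one_hot("MK-F")
mu, logvar = [0.0, 0.0], [0.0, 0.0]            # pretend encoder output
z = reparameterize(mu, logvar, random.Random(0))  # latent sample fed to a decoder
uniform = [[1.0 / len(ALPHABET)] * len(ALPHABET) for _ in x]  # pretend decoder output
loss = negative_elbo(x, uniform, mu, logvar)
```

In a real model the encoder and decoder would be neural networks and the loss would be averaged over the training alignment; here a uniform decoder and a standard-normal posterior are used only so the two terms of the objective can be inspected in isolation.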

Year:  2021        PMID: 33635868      PMCID: PMC7946179          DOI: 10.1371/journal.pcbi.1008736

Source DB:  PubMed          Journal:  PLoS Comput Biol        ISSN: 1553-734X            Impact factor:   4.475


