Navigating the protein fitness landscape with Gaussian processes.

Philip A Romero, Andreas Krause, Frances H Arnold.

Abstract

Knowing how protein sequence maps to function (the "fitness landscape") is critical for understanding protein evolution as well as for engineering proteins with new and useful properties. We demonstrate that the protein fitness landscape can be inferred from experimental data, using Gaussian processes, a Bayesian learning technique. Gaussian process landscapes can model various protein sequence properties, including functional status, thermostability, enzyme activity, and ligand binding affinity. Trained on experimental data, these models achieve unrivaled quantitative accuracy. Furthermore, the explicit representation of model uncertainty allows for efficient searches through the vast space of possible sequences. We develop and test two protein sequence design algorithms motivated by Bayesian decision theory. The first one identifies small sets of sequences that are informative about the landscape; the second one identifies optimized sequences by iteratively improving the Gaussian process model in regions of the landscape that are predicted to be optimized. We demonstrate the ability of Gaussian processes to guide the search through protein sequence space by designing, constructing, and testing chimeric cytochrome P450s. These algorithms allowed us to engineer active P450 enzymes that are more thermostable than any previously made by chimeragenesis, rational design, or directed evolution.
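As an illustration of the approach described in the abstract, the following Python sketch fits a Gaussian process to a handful of sequence-property pairs and ranks candidate sequences by an upper confidence bound. It is not the authors' implementation: the one-hot encoding, the squared-exponential kernel, the hyperparameters, the hypothetical `ucb_pick` rule, and the toy sequences and measurements are all assumptions chosen for brevity.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq):
    """Flatten a sequence into a one-hot vector of length len(seq) * 20."""
    x = np.zeros((len(seq), len(ALPHABET)))
    for i, aa in enumerate(seq):
        x[i, ALPHABET.index(aa)] = 1.0
    return x.ravel()


def kernel(A, B, length_scale=2.0):
    """Squared-exponential kernel (an assumed choice) on one-hot vectors.

    For one-hot encodings the squared Euclidean distance is twice the
    Hamming distance, so similarity decays with the number of mismatches.
    """
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-sq / (2.0 * length_scale**2))


def gp_posterior(X_train, y_train, X_test, noise=0.1):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = kernel(X_train, X_train) + noise**2 * np.eye(len(X_train))
    K_s = kernel(X_train, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = 1.0 - (v**2).sum(0)  # kernel(x, x) == 1 for this kernel
    return mean, np.sqrt(np.clip(var, 0.0, None))


def ucb_pick(mean, std, beta=2.0):
    """Index of the candidate with the largest upper confidence bound."""
    return int(np.argmax(mean + beta * std))


# Toy data: sequences and property values are made up for illustration.
train_seqs = ["ACDE", "ACDF", "GCDE"]
y_train = np.array([1.0, 1.5, 0.2])
candidates = ["ACDE", "WWWW", "GCDF"]

X_train = np.array([one_hot(s) for s in train_seqs])
X_test = np.array([one_hot(s) for s in candidates])

# Center the measurements so the zero-mean prior is reasonable.
offset = y_train.mean()
mean, std = gp_posterior(X_train, y_train - offset, X_test)
mean = mean + offset

best = ucb_pick(mean, std)
for seq, m, s in zip(candidates, mean, std):
    print(f"{seq}: predicted {m:.2f} +/- {s:.2f}")
print("next sequence to test:", candidates[best])
```

The posterior standard deviation is small for sequences close to the training data and large for distant ones. A search procedure can use that property to balance testing uncertain regions of sequence space against testing sequences already predicted to perform well.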

Year: 2012    PMID: 23277561    PMCID: PMC3549130    DOI: 10.1073/pnas.1215251110

Source DB: PubMed    Journal: Proc Natl Acad Sci U S A    ISSN: 0027-8424    Impact factor: 11.205

