Cluster learning-assisted directed evolution.

Yuchi Qiu, Jian Hu, Guo-Wei Wei.

Abstract

Directed evolution, a strategy for protein engineering, optimizes protein properties (i.e., fitness) through expensive and time-consuming screening or selection of a large mutational sequence space. Machine learning-assisted directed evolution (MLDE), which screens sequence properties in silico, can accelerate the optimization and reduce the experimental burden. This work introduces an MLDE framework, cluster learning-assisted directed evolution (CLADE), that combines hierarchical unsupervised clustering sampling with supervised learning to guide protein engineering. The clustering sampling selectively picks and screens variants in targeted subspaces, which guides the subsequent generation of diverse training sets. In the last stage, accurate predictions from supervised learning models improve final outcomes. By sequentially screening 480 sequences out of 160,000 in a four-site combinatorial library, in five equal experimental batches, CLADE achieves global maximal fitness hit rates of up to 91.0% and 34.0% on the GB1 and PhoQ datasets, respectively, improved from the 18.6% and 7.2% obtained by random-sampling-based MLDE.
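The screening budget in the abstract can be checked with a little arithmetic, and the idea of cluster-guided batch sampling can be illustrated with a toy loop. The sketch below is not the authors' implementation: the function names (`library_size`, `batch_size`, `cluster_guided_batch`), the fitness-proportional cluster weighting, and the small exploration floor are all assumptions made for illustration, and the 160,000 figure is taken to correspond to 20 amino acids at each of four sites.

```python
import random
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical residues


def library_size(n_sites, alphabet=AMINO_ACIDS):
    """Number of variants in a full combinatorial library over n_sites."""
    return len(alphabet) ** n_sites


def batch_size(total_screened, n_batches):
    """Variants per batch when the budget is split into equal batches."""
    if total_screened % n_batches:
        raise ValueError("budget does not split into equal batches")
    return total_screened // n_batches


def cluster_guided_batch(clusters, measured, n_pick, rng):
    """Pick n_pick unscreened variants, favouring clusters with high mean fitness.

    clusters: dict mapping cluster id -> list of variants (hypothetical input)
    measured: dict mapping variant -> non-negative fitness screened so far
    """
    weights = {}
    for cid, members in clusters.items():
        seen = [measured[v] for v in members if v in measured]
        # Assumed rule: mean measured fitness plus a small floor, so that
        # clusters with no measurements can still be explored.
        weights[cid] = (sum(seen) / len(seen) if seen else 0.0) + 1e-3
    pool = {cid: [v for v in members if v not in measured]
            for cid, members in clusters.items()}
    picked = []
    while len(picked) < n_pick:
        open_ids = [cid for cid in pool if pool[cid]]
        if not open_ids:
            break  # every variant has already been screened or picked
        cid = rng.choices(open_ids, weights=[weights[c] for c in open_ids])[0]
        picked.append(pool[cid].pop(rng.randrange(len(pool[cid]))))
    return picked


print(library_size(4), batch_size(480, 5))  # → 160000 96

# Toy demo on a two-site library, clustered (arbitrarily) by first residue.
toy_variants = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
toy_clusters = {}
for variant in toy_variants:
    toy_clusters.setdefault(variant[0], []).append(variant)
toy_measured = {v: 1.0 for v in toy_variants[:10]}
next_batch = cluster_guided_batch(toy_clusters, toy_measured, 96, random.Random(0))
```

Under these assumptions, each of the five batches holds 96 variants, and the full budget of 480 covers 0.3% of the 160,000-variant library. The published method clusters hierarchically on sequence encodings and trains supervised models on the screened variants; neither step is reproduced here.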

Keywords:  Protein engineering; clustering; directed evolution; fitness; machine learning

Year:  2021        PMID: 35811998      PMCID: PMC9267417          DOI: 10.1038/s43588-021-00168-y

Source DB:  PubMed          Journal:  Nat Comput Sci        ISSN: 2662-8457


