On the sparsity of fitness functions and implications for learning.

David H Brookes, Amirali Aghazadeh, Jennifer Listgarten.

Abstract

Fitness functions map biological sequences to a scalar property of interest. Accurate estimation of these functions yields biological insight and sets the foundation for model-based sequence design. However, the fitness datasets available to learn these functions are typically small relative to the large combinatorial space of sequences; characterizing how much data are needed for accurate estimation remains an open problem. There is a growing body of evidence demonstrating that empirical fitness functions display substantial sparsity when represented in terms of epistatic interactions. Moreover, the theory of Compressed Sensing provides scaling laws for the number of samples required to exactly recover a sparse function. Motivated by these results, we develop a framework to study the sparsity of fitness functions sampled from a generalization of the NK model, a widely used random field model of fitness functions. In particular, we present results that allow us to test the effect of the Generalized NK (GNK) model's interpretable parameters (sequence length, alphabet size, and assumed interactions between sequence positions) on the sparsity of fitness functions sampled from the model and, consequently, the number of measurements required to exactly recover these functions. We validate our framework by demonstrating that GNK models with parameters set according to structural considerations can be used to accurately approximate the number of samples required to recover two empirical protein fitness functions and an RNA fitness function. In addition, we show that these GNK models identify important higher-order epistatic interactions in the empirical fitness functions using only structural information.
Copyright © 2021 the Author(s). Published by PNAS.
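
The link the abstract draws between assumed interactions and sparsity can be illustrated with a small numerical sketch. The code below is not the authors' implementation. It assumes a binary alphabet, a standard NK-style model in which each neighborhood contributes an independent Gaussian lookup table, and cyclic adjacent neighborhoods; the names `sample_nk_fitness` and `hadamard_matrix` and the values `L = 6`, `K = 3` are hypothetical choices made for illustration. It samples one fitness function, computes its Walsh-Hadamard (epistatic) coefficients, and compares the set of nonzero coefficients with the set of all subsets of the neighborhoods.

```python
import itertools
import numpy as np

def sample_nk_fitness(L, neighborhoods, rng):
    """Return all 2^L binary sequences and one sampled NK-style fitness value each."""
    # Row i of seqs is the integer i written in binary, most significant bit first.
    seqs = np.array(list(itertools.product([0, 1], repeat=L)))
    f = np.zeros(len(seqs))
    for V in neighborhoods:
        # Assumption: one independent Gaussian value per configuration of positions in V.
        table = rng.standard_normal(2 ** len(V))
        # Encode each local configuration as an integer index into the table.
        idx = seqs[:, V] @ (2 ** np.arange(len(V)))
        f += table[idx]
    return seqs, f

def hadamard_matrix(L):
    """Sylvester-ordered 2^L x 2^L Walsh-Hadamard matrix via Kronecker products."""
    H = np.array([[1.0]])
    for _ in range(L):
        H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))
    return H

L, K = 6, 3  # hypothetical sequence length and neighborhood size
neighborhoods = [[(j + k) % L for k in range(K)] for j in range(L)]

rng = np.random.default_rng(0)
seqs, f = sample_nk_fitness(L, neighborhoods, rng)
coeffs = hadamard_matrix(L) @ f / 2 ** L

# Coefficient j describes the interaction among the positions where seqs[j] equals 1.
observed = {
    tuple(int(p) for p in np.flatnonzero(seqs[j]))
    for j in np.flatnonzero(np.abs(coeffs) > 1e-9)
}

# Every subset of every neighborhood, including the empty set (the constant term).
predicted = {
    tuple(sorted(S))
    for V in neighborhoods
    for r in range(len(V) + 1)
    for S in itertools.combinations(V, r)
}

print(len(observed), "nonzero coefficients out of", 2 ** L)
print("support matches neighborhood subsets:", observed == predicted)
```

Under these assumptions the sampled function has nonzero coefficients only for interactions contained in at least one neighborhood, so changing `L`, `K`, or the neighborhood layout changes the sparsity directly. Larger alphabets and the sample-complexity bounds themselves are outside the scope of this sketch.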

Keywords:  compressed sensing; epistasis; fitness functions; protein structure

Year:  2022        PMID: 34937698      PMCID: PMC8740588          DOI: 10.1073/pnas.2109649118

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   12.779


