Learning protein constitutive motifs from sequence data.

Jérôme Tubiana, Simona Cocco, Rémi Monasson.

Abstract

Statistical analysis of evolutionarily related protein sequences provides information about their structure, function, and history. We show that Restricted Boltzmann Machines (RBM), designed to learn complex high-dimensional data and their statistical features, can efficiently model protein families from sequence information. Here we apply RBM to 20 protein families, and present detailed results for two short protein domains (Kunitz and WW), one long chaperone protein (Hsp70), and synthetic lattice proteins for benchmarking. The features inferred by the RBM are biologically interpretable: they are related to structure (residue-residue tertiary contacts, extended secondary motifs such as α-helices and β-sheets, and intrinsically disordered regions), to function (activity and ligand specificity), or to phylogenetic identity. In addition, we use RBM to design new protein sequences with putative properties by composing and 'turning up' or 'turning down' the different modes at will. Our work therefore shows that RBM are versatile and practical tools that can be used to unveil and exploit the genotype-phenotype relationship for protein families.
© 2019, Tubiana et al.

Keywords:  coevolution; computational biology; machine learning; physics of living systems; sequence analysis; systems biology

Year: 2019; PMID: 30857591; PMCID: PMC6436896; DOI: 10.7554/eLife.39397

Source DB: PubMed; Journal: Elife; ISSN: 2050-084X; Impact factor: 8.140


  22 in total

1.  Coevolutionary Couplings Unravel PAM-Proximal Constraints of CRISPR-SpCas9.

Authors:  Yi Li; José A De la Paz; Xianli Jiang; Richard Liu; Adarsha P Pokkulandra; Leonidas Bleris; Faruck Morcos
Journal:  Biophys J       Date:  2019-10-08       Impact factor: 4.033

2.  Epistasis and entrenchment of drug resistance in HIV-1 subtype B.

Authors:  Avik Biswas; Allan Haldane; Eddy Arnold; Ronald M Levy
Journal:  Elife       Date:  2019-10-08       Impact factor: 8.140

3.  ScanNet: an interpretable geometric deep learning model for structure-based protein binding site prediction.

Authors:  Jérôme Tubiana; Dina Schneidman-Duhovny; Haim J Wolfson
Journal:  Nat Methods       Date:  2022-05-30       Impact factor: 28.547

4.  Size and structure of the sequence space of repeat proteins.

Authors:  Jacopo Marchi; Ezequiel A Galpern; Rocio Espada; Diego U Ferreiro; Aleksandra M Walczak; Thierry Mora
Journal:  PLoS Comput Biol       Date:  2019-08-15       Impact factor: 4.475

5.  Obtaining extremely large and accurate protein multiple sequence alignments from curated hierarchical alignments.

Authors:  Andrew F Neuwald; Christopher J Lanczycki; Theresa K Hodges; Aron Marchler-Bauer
Journal:  Database (Oxford)       Date:  2020-01-01       Impact factor: 3.451

6.  Coevolutionary Analysis of Protein Subfamilies by Sequence Reweighting.

Authors:  Duccio Malinverni; Alessandro Barducci
Journal:  Entropy (Basel)       Date:  2019-11-16       Impact factor: 2.524

7.  Generating functional protein variants with variational autoencoders.

Authors:  Alex Hawkins-Hooker; Florence Depardieu; Sebastien Baur; Guillaume Couairon; Arthur Chen; David Bikard
Journal:  PLoS Comput Biol       Date:  2021-02-26       Impact factor: 4.475

8.  Direct coupling analysis of epistasis in allosteric materials.

Authors:  Barbara Bravi; Riccardo Ravasio; Carolina Brito; Matthieu Wyart
Journal:  PLoS Comput Biol       Date:  2020-03-02       Impact factor: 4.475

9.  Unsupervised Inference of Protein Fitness Landscape from Deep Mutational Scan.

Authors:  Jorge Fernandez-de-Cossio-Diaz; Guido Uguzzoni; Andrea Pagnani
Journal:  Mol Biol Evol       Date:  2021-01-04       Impact factor: 16.240

10.  A Bacterial Inflammation Sensor Regulates c-di-GMP Signaling, Adhesion, and Biofilm Formation.

Authors:  Arden Perkins; Dan A Tudorica; Raphael D Teixeira; Tilman Schirmer; Lindsay Zumwalt; O Maduka Ogba; C Keith Cassidy; Phillip J Stansfeld; Karen Guillemin
Journal:  mBio       Date:  2021-06-22       Impact factor: 7.867
