
Learning the protein language: Evolution, structure, and function.

Tristan Bepler, Bonnie Berger.

Abstract

Language models have recently emerged as a powerful machine-learning approach for distilling information from massive protein sequence databases. From readily available sequence data alone, these models discover evolutionary, structural, and functional organization across protein space. Using language models, we can encode amino-acid sequences into distributed vector representations that capture their structural and functional properties, as well as evaluate the evolutionary fitness of sequence variants. We discuss recent advances in protein language modeling and their applications to downstream protein property prediction problems. We then consider how these models can be enriched with prior biological knowledge and introduce an approach for encoding protein structural knowledge into the learned representations. The knowledge distilled by these models allows us to improve downstream function prediction through transfer learning. Deep protein language models are revolutionizing protein biology. They suggest new ways to approach protein and therapeutic design. However, further developments are needed to encode strong biological priors into protein language models and to increase their accessibility to the broader community.
Copyright © 2021 Elsevier Inc. All rights reserved.

Keywords:  contact prediction; deep neural networks; inductive bias; language models; natural language processing; protein sequences; proteins; transfer learning; transmembrane region prediction

Year: 2021; PMID: 34139171; PMCID: PMC8238390; DOI: 10.1016/j.cels.2021.05.017

Source DB: PubMed; Journal: Cell Syst; ISSN: 2405-4712; Impact factor: 11.091


