
Inferring Protein Sequence-Function Relationships with Large-Scale Positive-Unlabeled Learning.

Hyebin Song, Bennett J Bremer, Emily C Hinds, Garvesh Raskutti, Philip A Romero.

Abstract

Machine learning can infer how protein sequence maps to function without requiring a detailed understanding of the underlying physical or biological mechanisms. However, it is challenging to apply existing supervised learning frameworks to large-scale experimental data generated by deep mutational scanning (DMS) and related methods. DMS data often contain high-dimensional and correlated sequence variables, experimental sampling error and bias, and missing data. Notably, most DMS data do not contain examples of negative sequences, making it challenging to directly estimate how sequence affects function. Here, we develop a positive-unlabeled (PU) learning framework to infer sequence-function relationships from large-scale DMS data. Our PU learning method displays excellent predictive performance across ten large-scale sequence-function datasets, representing proteins of different folds, functions, and library types. The estimated parameters pinpoint key residues that dictate protein structure and function. Finally, we apply our statistical sequence-function model to design highly stabilized enzymes.
Copyright © 2020 Elsevier Inc. All rights reserved.
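
To make the positive-unlabeled idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It swaps in the simpler Elkan-Noto correction: train a classifier to separate labeled (selected) sequences from unlabeled ones, then divide its output by an estimate of c = P(labeled | functional). The toy data, the ground-truth rule "functional iff position 0 is A", and every function name here are assumptions made for illustration only.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def active_features(seq):
    """Sparse one-hot encoding: one active feature index per sequence position."""
    return [pos * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)
            for pos, aa in enumerate(seq)]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clip to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def score(w, b, seq):
    """Estimated probability that a sequence is *labeled* (not yet 'functional')."""
    return sigmoid(b + sum(w[j] for j in active_features(seq)))

def fit_labeled_vs_unlabeled(seqs, labels, epochs=200, lr=0.5, l2=1e-3):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    n = len(seqs)
    d = len(seqs[0]) * len(AMINO_ACIDS)
    w, b = [0.0] * d, 0.0
    feats = [active_features(s) for s in seqs]
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for idx, y in zip(feats, labels):
            err = sigmoid(b + sum(w[j] for j in idx)) - y
            grad_b += err
            for j in idx:
                grad_w[j] += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def toy_example(seed=0):
    rng = random.Random(seed)

    def rand_tail():
        return "".join(rng.choice(AMINO_ACIDS) for _ in range(2))

    # Hypothetical ground truth: a sequence is functional iff position 0 is 'A'.
    positives = ["A" + rand_tail() for _ in range(60)]                # selected set
    unlabeled = [rng.choice("AG") + rand_tail() for _ in range(120)]  # mixed library
    seqs = positives + unlabeled
    labels = [1.0] * len(positives) + [0.0] * len(unlabeled)
    w, b = fit_labeled_vs_unlabeled(seqs, labels)

    # Elkan-Noto constant c = P(labeled | functional), estimated here on the
    # training positives for brevity; a held-out set would be preferable.
    c = sum(score(w, b, s) for s in positives) / len(positives)

    def corrected(seq):
        return min(1.0, score(w, b, seq) / c)

    return corrected("AKL"), corrected("GKL")

if __name__ == "__main__":
    print(toy_example())
```

In this toy setting the corrected score for a sequence carrying A at the first position should exceed the score for one carrying G. That is the kind of residue-level signal the abstract says the estimated parameters can pinpoint, although the published model and its estimation procedure differ from this sketch.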

Keywords:  deep mutational scanning; positive-unlabeled learning; protein engineering; protein sequence function relationships; statistical learning; supervised learning

Year: 2020; PMID: 33212013; PMCID: PMC7856229; DOI: 10.1016/j.cels.2020.10.007

Source DB: PubMed; Journal: Cell Syst; ISSN: 2405-4712; Impact factor: 10.304


