Inferring Protein Sequence-Function Relationships with Large-Scale Positive-Unlabeled Learning.

Hyebin Song, Bennett J Bremer, Emily C Hinds, Garvesh Raskutti, Philip A Romero.

Abstract

Machine learning can infer how protein sequence maps to function without requiring a detailed understanding of the underlying physical or biological mechanisms. It is challenging to apply existing supervised learning frameworks to large-scale experimental data generated by deep mutational scanning (DMS) and related methods. DMS data often contain high-dimensional and correlated sequence variables, experimental sampling error and bias, and missing data. Notably, most DMS data do not contain examples of negative sequences, making it challenging to directly estimate how sequence affects function. Here, we develop a positive-unlabeled (PU) learning framework to infer sequence-function relationships from large-scale DMS data. Our PU learning method displays excellent predictive performance across ten large-scale sequence-function datasets, representing proteins of different folds, functions, and library types. The estimated parameters pinpoint key residues that dictate protein structure and function. Finally, we apply our statistical sequence-function model to design highly stabilized enzymes.
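
To make the positive-unlabeled idea concrete, here is a minimal sketch on synthetic data. It is not the method of this paper: it swaps in the simpler, well-known correction of Elkan and Noto (2008), which assumes labeled positives are selected completely at random from all positives. A classifier is trained to separate labeled from unlabeled sequences, and its output is then divided by a constant estimated on held-out labeled positives. The toy "protein" (functional only when position 0 is alanine), the function names, and all hyperparameters are assumptions made for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode a sequence as a flat 0/1 vector, ordered position by position."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, AMINO_ACIDS.index(aa)] = 1.0
    return x.ravel()


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, s, lr=0.5, steps=2000, l2=1e-3):
    """Gradient-descent logistic regression of the label indicator s on X."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - s) / len(s) + l2 * w)
        b -= lr * np.mean(p - s)
    return w, b


def run_toy_example(seed=0, n=2000, length=4, n_train=1500):
    rng = np.random.default_rng(seed)

    # Hypothetical toy library: random sequences, about half with 'A' at position 0.
    letters = rng.choice(list(AMINO_ACIDS), size=(n, length))
    letters[rng.random(n) < 0.5, 0] = "A"
    seqs = ["".join(row) for row in letters]

    # Hidden ground truth (never shown to the model): functional iff position 0 is 'A'.
    y = np.array([q[0] == "A" for q in seqs])
    # Observed labels: only half of the functional sequences are labeled positive;
    # everything else is unlabeled (s = 0), whether functional or not.
    s = (y & (rng.random(n) < 0.5)).astype(float)

    X = np.array([one_hot(q) for q in seqs])

    # Step 1: model P(labeled | sequence) on the training split.
    w, b = fit_logistic(X[:n_train], s[:n_train])
    g = sigmoid(X @ w + b)

    # Step 2: estimate c = P(labeled | functional) on held-out labeled positives.
    held_out = np.arange(n) >= n_train
    c = g[held_out & (s == 1)].mean()

    # Step 3: corrected estimate of P(functional | sequence).
    prob = np.clip(g / c, 0.0, 1.0)
    return w, prob, y


w, prob, y = run_toy_example()
top = AMINO_ACIDS[int(np.argmax(w[:len(AMINO_ACIDS)]))]
print("largest position-0 coefficient:", top)
print("mean predicted P(functional), truly functional:", round(float(prob[y].mean()), 2))
print("mean predicted P(functional), non-functional:", round(float(prob[~y].mean()), 2))
```

The fitted coefficients play the role the abstract assigns to "estimated parameters": the residue that determines function in this toy receives the largest weight at its position. Real DMS experiments typically sample the positive and unlabeled sets separately, so the random-labeling assumption used here would need to be revisited before applying this sketch to actual data.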

Entities: Chemical

Keywords: deep mutational scanning; positive-unlabeled learning; protein engineering; protein sequence function relationships; statistical learning; supervised learning

Substances: Proteins

Year: 2020 PMID: 33212013 PMCID: PMC7856229 DOI: 10.1016/j.cels.2020.10.007

Source DB: PubMed Journal: Cell Syst ISSN: 2405-4712 Impact factor: 10.304
