
Learned Embeddings from Deep Learning to Visualize and Predict Protein Sets.

Christian Dallago, Konstantin Schütze, Michael Heinzinger, Tobias Olenyi, Maria Littmann, Amy X Lu, Kevin K Yang, Seonwoo Min, Sungroh Yoon, James T Morton, Burkhard Rost.

Abstract

Models from machine learning (ML) or artificial intelligence (AI) increasingly assist in guiding experimental design and decision making in molecular biology and medicine. Recently, Language Models (LMs) have been adapted from Natural Language Processing (NLP) to encode the implicit language written in protein sequences. Protein LMs show enormous potential in generating descriptive representations (embeddings) for proteins from just their sequences, in a fraction of the time required by previous approaches, yet with comparable or improved predictive ability. Researchers have trained a variety of protein LMs that are likely to illuminate different angles of the protein language. By leveraging the bio_embeddings pipeline and modules, simple and reproducible workflows can be laid out to generate protein embeddings and rich visualizations. Embeddings can then be used as input features through machine learning libraries to develop methods predicting particular aspects of protein function and structure. Beyond the workflows included here, embeddings have been used as proxies for traditional homology-based inference and even to align similar protein sequences. A wealth of possibilities remains for researchers to harness through the tools provided in the following protocols.
© 2021 The Authors. Current Protocols published by Wiley Periodicals LLC.

The following protocols are included in this manuscript:
Basic Protocol 1: Generic use of the bio_embeddings pipeline to plot protein sequences and annotations
Basic Protocol 2: Generate embeddings from protein sequences using the bio_embeddings pipeline
Basic Protocol 3: Overlay sequence annotations onto a protein space visualization
Basic Protocol 4: Train a machine learning classifier on protein embeddings
Alternate Protocol 1: Generate 3D instead of 2D visualizations
Alternate Protocol 2: Visualize protein solubility instead of protein subcellular localization
Support Protocol: Join embedding generation and sequence space visualization in a pipeline
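
As a rough illustration of the idea behind Basic Protocol 4 (using per-protein embeddings as input features for a classifier), the sketch below trains a nearest-centroid classifier in plain Python. It is not the protocol's own code and does not call bio_embeddings: the vectors are synthetic random numbers standing in for real embeddings, the labels "membrane" and "soluble" are made up, and the helper names (make_toy_embeddings, fit_centroids, predict) are hypothetical. Nearest-centroid is chosen only because it needs no external library; the manuscript itself refers to machine learning libraries for this step.

```python
import random
from statistics import fmean


def make_toy_embeddings(n_per_class, dim, seed=0):
    """Return shuffled (vector, label) pairs of synthetic data.

    Assumption: these random vectors are a stand-in for fixed-size
    per-protein embeddings; no protein language model is involved.
    """
    rng = random.Random(seed)
    # Two made-up classes, centred at -1 and +1 in every dimension.
    centres = {"membrane": -1.0, "soluble": 1.0}
    data = []
    for label, centre in centres.items():
        for _ in range(n_per_class):
            vector = [rng.gauss(centre, 1.0) for _ in range(dim)]
            data.append((vector, label))
    rng.shuffle(data)
    return data


def fit_centroids(train):
    """Average the training vectors of each class into one centroid."""
    by_label = {}
    for vector, label in train:
        by_label.setdefault(label, []).append(vector)
    return {
        label: [fmean(column) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def predict(centroids, vector):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], vector))


# Hold out 20 of the 100 synthetic "proteins" for evaluation.
data = make_toy_embeddings(n_per_class=50, dim=16)
train, test = data[:80], data[80:]

centroids = fit_centroids(train)
correct = sum(predict(centroids, vector) == label for vector, label in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

With real data, the synthetic vectors would be replaced by embeddings loaded from the pipeline's output, and the held-out split should take sequence similarity between training and test proteins into account, since a random split can overestimate performance.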

Keywords:  deep learning embeddings; machine learning; protein annotation pipeline; protein representations; protein visualization

Year:  2021        PMID: 33961736     DOI: 10.1002/cpz1.113

Source DB:  PubMed          Journal:  Curr Protoc        ISSN: 2594-1321


  9 in total

1.  Contrastive learning on protein embeddings enlightens midnight zone.

Authors:  Michael Heinzinger; Maria Littmann; Ian Sillitoe; Nicola Bordin; Christine Orengo; Burkhard Rost
Journal:  NAR Genom Bioinform       Date:  2022-06-11

2.  Protein transfer learning improves identification of heat shock protein families.

Authors:  Seonwoo Min; HyunGi Kim; Byunghan Lee; Sungroh Yoon
Journal:  PLoS One       Date:  2021-05-18       Impact factor: 3.240

3.  Embeddings from protein language models predict conservation and variant effects.

Authors:  Céline Marquet; Michael Heinzinger; Tobias Olenyi; Christian Dallago; Kyra Erckert; Michael Bernhofer; Dmitrii Nechaev; Burkhard Rost
Journal:  Hum Genet       Date:  2021-12-30       Impact factor: 5.881

4.  Protein embeddings and deep learning predict binding residues for various ligand classes.

Authors:  Maria Littmann; Michael Heinzinger; Christian Dallago; Konstantin Weissenow; Burkhard Rost
Journal:  Sci Rep       Date:  2021-12-13       Impact factor: 4.379

5.  Lightweight ProteinUnet2 network for protein secondary structure prediction: a step towards proper evaluation.

Authors:  Katarzyna Stapor; Krzysztof Kotowski; Tomasz Smolarczyk; Irena Roterman
Journal:  BMC Bioinformatics       Date:  2022-03-22       Impact factor: 3.169

6.  Machine learning modeling of family wide enzyme-substrate specificity screens.

Authors:  Samuel Goldman; Ria Das; Kevin K Yang; Connor W Coley
Journal:  PLoS Comput Biol       Date:  2022-02-10       Impact factor: 4.475

7.  Identification of Phage Receptor-Binding Protein Sequences with Hidden Markov Models and an Extreme Gradient Boosting Classifier.

Authors:  Dimitri Boeckaerts; Michiel Stock; Bernard De Baets; Yves Briers
Journal:  Viruses       Date:  2022-06-17       Impact factor: 5.818

8.  A roadmap for the functional annotation of protein families: a community perspective.

Authors:  Valérie de Crécy-Lagard; Rocio Amorin de Hegedus; Cecilia Arighi; Jill Babor; Alex Bateman; Ian Blaby; Crysten Blaby-Haas; Alan J Bridge; Stephen K Burley; Stacey Cleveland; Lucy J Colwell; Ana Conesa; Christian Dallago; Antoine Danchin; Anita de Waard; Adam Deutschbauer; Raquel Dias; Yousong Ding; Gang Fang; Iddo Friedberg; John Gerlt; Joshua Goldford; Mark Gorelik; Benjamin M Gyori; Christopher Henry; Geoffrey Hutinet; Marshall Jaroch; Peter D Karp; Liudmyla Kondratova; Zhiyong Lu; Aron Marchler-Bauer; Maria-Jesus Martin; Claire McWhite; Gaurav D Moghe; Paul Monaghan; Anne Morgat; Christopher J Mungall; Darren A Natale; William C Nelson; Seán O'Donoghue; Christine Orengo; Katherine H O'Toole; Predrag Radivojac; Colbie Reed; Richard J Roberts; Dmitri Rodionov; Irina A Rodionova; Jeffrey D Rudolf; Lana Saleh; Gloria Sheynkman; Francoise Thibaud-Nissen; Paul D Thomas; Peter Uetz; David Vallenet; Erica Watson Carter; Peter R Weigele; Valerie Wood; Elisha M Wood-Charlson; Jin Xu
Journal:  Database (Oxford)       Date:  2022-08-12       Impact factor: 4.462

9.  PredictProtein - Predicting Protein Structure and Function for 29 Years.

Authors:  Michael Bernhofer; Christian Dallago; Tim Karl; Venkata Satagopam; Michael Heinzinger; Maria Littmann; Tobias Olenyi; Jiajun Qiu; Konstantin Schütze; Guy Yachdav; Haim Ashkenazy; Nir Ben-Tal; Yana Bromberg; Tatyana Goldberg; Laszlo Kajan; Sean O'Donoghue; Chris Sander; Andrea Schafferhans; Avner Schlessinger; Gerrit Vriend; Milot Mirdita; Piotr Gawron; Wei Gu; Yohan Jarosz; Christophe Trefois; Martin Steinegger; Reinhard Schneider; Burkhard Rost
Journal:  Nucleic Acids Res       Date:  2021-07-02       Impact factor: 16.971
