Using deep learning to annotate the protein universe.

Maxwell L Bileschi, David Belanger, Drew H Bryant, Theo Sanderson, Brandon Carter, D Sculley, Alex Bateman, Mark A DePristo, Lucy J Colwell.

Abstract

Understanding the relationship between amino acid sequence and protein function is a long-standing challenge with far-reaching scientific and translational implications. State-of-the-art alignment-based techniques cannot predict function for one-third of microbial protein sequences, hampering our ability to exploit data from diverse organisms. Here, we train deep learning models to accurately predict functional annotations for unaligned amino acid sequences across rigorous benchmark assessments built from the 17,929 families of the protein families database Pfam. The models infer known patterns of evolutionary substitutions and learn representations that accurately cluster sequences from unseen families. Combining deep models with existing methods significantly improves remote homology detection, suggesting that the deep models learn complementary information. This approach extends the coverage of Pfam by >9.5%, exceeding additions made over the last decade, and predicts function for 360 human reference proteome proteins with no previous Pfam annotation. These results suggest that deep learning models will be a core component of future protein annotation tools.
© 2022. The Author(s), under exclusive licence to Springer Nature America, Inc.

Year: 2022    PMID: 35190689    DOI: 10.1038/s41587-021-01179-w

Source DB: PubMed    Journal: Nat Biotechnol    ISSN: 1087-0156    Impact factor: 68.164


  8 in total

1.  Constructing benchmark test sets for biological sequence analysis using independent set algorithms.

Authors:  Samantha Petti; Sean R Eddy
Journal:  PLoS Comput Biol       Date:  2022-03-07       Impact factor: 4.475

2.  DeePVP: Identification and classification of phage virion proteins using deep learning.

Authors:  Zhencheng Fang; Tao Feng; Hongwei Zhou; Muxuan Chen
Journal:  Gigascience       Date:  2022-08-11       Impact factor: 7.658

3.  Unifying the known and unknown microbial coding sequence space.

Authors:  Chiara Vanni; Matthew S Schechter; Silvia G Acinas; Albert Barberán; Pier Luigi Buttigieg; Emilio O Casamayor; Tom O Delmont; Carlos M Duarte; A Murat Eren; Robert D Finn; Renzo Kottmann; Alex Mitchell; Pablo Sánchez; Kimmo Siren; Martin Steinegger; Frank Oliver Gloeckner; Antonio Fernàndez-Guerra
Journal:  Elife       Date:  2022-03-31       Impact factor: 8.713

4.  Deep embeddings to comprehend and visualize microbiome protein space.

Authors:  Krzysztof Odrzywolek; Zuzanna Karwowska; Jan Majta; Aleksander Byrski; Kaja Milanowska-Zabel; Tomasz Kosciolek
Journal:  Sci Rep       Date:  2022-06-20       Impact factor: 4.996

5.  A roadmap for the functional annotation of protein families: a community perspective.

Authors:  Valérie de Crécy-Lagard; Rocio Amorin de Hegedus; Cecilia Arighi; Jill Babor; Alex Bateman; Ian Blaby; Crysten Blaby-Haas; Alan J Bridge; Stephen K Burley; Stacey Cleveland; Lucy J Colwell; Ana Conesa; Christian Dallago; Antoine Danchin; Anita de Waard; Adam Deutschbauer; Raquel Dias; Yousong Ding; Gang Fang; Iddo Friedberg; John Gerlt; Joshua Goldford; Mark Gorelik; Benjamin M Gyori; Christopher Henry; Geoffrey Hutinet; Marshall Jaroch; Peter D Karp; Liudmyla Kondratova; Zhiyong Lu; Aron Marchler-Bauer; Maria-Jesus Martin; Claire McWhite; Gaurav D Moghe; Paul Monaghan; Anne Morgat; Christopher J Mungall; Darren A Natale; William C Nelson; Seán O'Donoghue; Christine Orengo; Katherine H O'Toole; Predrag Radivojac; Colbie Reed; Richard J Roberts; Dmitri Rodionov; Irina A Rodionova; Jeffrey D Rudolf; Lana Saleh; Gloria Sheynkman; Francoise Thibaud-Nissen; Paul D Thomas; Peter Uetz; David Vallenet; Erica Watson Carter; Peter R Weigele; Valerie Wood; Elisha M Wood-Charlson; Jin Xu
Journal:  Database (Oxford)       Date:  2022-08-12       Impact factor: 4.462

6.  Conditional generative modeling for de novo protein design with hierarchical functions.

Authors:  Tim Kucera; Matteo Togninalli; Laetitia Meng-Papaxanthos
Journal:  Bioinformatics       Date:  2022-05-26       Impact factor: 6.931

7.  Prediction of evolutionary constraint by genomic annotations improves functional prioritization of genomic variants in maize.

Authors:  Guillaume P Ramstein; Edward S Buckler
Journal:  Genome Biol       Date:  2022-09-01       Impact factor: 17.906

8.  Organizing the bacterial annotation space with amino acid sequence embeddings.

Authors:  Susanna R Grigson; Jody C McKerral; James G Mitchell; Robert A Edwards
Journal:  BMC Bioinformatics       Date:  2022-09-23       Impact factor: 3.307
