Embeddings from deep learning transfer GO annotations beyond homology.

Maria Littmann, Michael Heinzinger, Christian Dallago, Tobias Olenyi, Burkhard Rost.

Abstract

Knowing protein function is crucial to advance molecular and medical biology, yet experimental function annotations through the Gene Ontology (GO) exist for fewer than 0.5% of all known proteins. Computational methods bridge this sequence-annotation gap typically through homology-based annotation transfer by identifying sequence-similar proteins with known function or through prediction methods using evolutionary information. Here, we propose predicting GO terms through annotation transfer based on proximity of proteins in the SeqVec embedding rather than in sequence space. These embeddings originate from deep learned language models (LMs) for protein sequences (SeqVec) transferring the knowledge gained from predicting the next amino acid in 33 million protein sequences. Replicating the conditions of CAFA3, our method reaches an Fmax of 37 ± 2%, 50 ± 3%, and 57 ± 2% for BPO, MFO, and CCO, respectively. Numerically, this appears close to the top ten CAFA3 methods. When restricting the annotation transfer to proteins with < 20% pairwise sequence identity to the query, performance drops (Fmax BPO 33 ± 2%, MFO 43 ± 3%, CCO 53 ± 2%); this still outperforms naïve sequence-based transfer. Preliminary results from CAFA4 appear to confirm these findings. Overall, this new concept is likely to change the annotation of proteins, in particular for proteins from smaller families or proteins with intrinsically disordered regions.
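The annotation transfer and the Fmax evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`transfer_annotations`, `fmax`), the toy two-dimensional embeddings, the placeholder term identifiers, and the distance-to-score mapping `1 / (1 + d)` are assumptions made for illustration. The Fmax function follows the protein-centric definition used in CAFA (precision averaged over proteins with at least one prediction at the threshold, recall averaged over all evaluated proteins) and, for brevity, does not propagate terms to their GO ancestors.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_annotations(query, lookup, k=1):
    """Transfer GO terms from the k lookup proteins closest to the query.

    lookup: list of (embedding, set_of_go_terms) for annotated proteins.
    Returns {go_term: score}. The score 1 / (1 + distance) is a
    hypothetical mapping chosen for this sketch, not taken from the paper.
    """
    nearest = sorted(lookup, key=lambda entry: euclidean(query, entry[0]))[:k]
    scores = {}
    for embedding, terms in nearest:
        similarity = 1.0 / (1.0 + euclidean(query, embedding))
        for term in terms:
            # Keep the best score if several neighbours share a term.
            scores[term] = max(scores.get(term, 0.0), similarity)
    return scores


def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax over a grid of score thresholds.

    predictions: {protein_id: {go_term: score}}
    truth:       {protein_id: non-empty set of true go_terms}
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for protein_id, true_terms in truth.items():
            scored = predictions.get(protein_id, {})
            predicted = {term for term, s in scored.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                # Precision only counts proteins with a prediction at t.
                precisions.append(hits / len(predicted))
            recalls.append(hits / len(true_terms))
        if not precisions:
            continue
        p = sum(precisions) / len(precisions)
        r = sum(recalls) / len(recalls)
        if p + r > 0:
            best = max(best, 2 * p * r / (p + r))
    return best


# Toy lookup set: embeddings and GO terms are made-up placeholders.
lookup = [
    ([0.0, 0.0], {"GO:A"}),
    ([1.0, 0.0], {"GO:B"}),
    ([5.0, 5.0], {"GO:C"}),
]
predicted = transfer_annotations([0.1, 0.0], lookup, k=2)
score = fmax({"q1": predicted}, {"q1": {"GO:A"}})
```

In practice the embeddings would come from a protein language model such as SeqVec, and restricting `lookup` to proteins below a sequence-identity cutoff would mimic the < 20% identity experiment mentioned in the abstract.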

Year: 2021; PMID: 33441905; PMCID: PMC7806674; DOI: 10.1038/s41598-020-80786-0

Source DB: PubMed; Journal: Sci Rep; ISSN: 2045-2322; Impact factor: 4.379

