Embeddings from deep learning transfer GO annotations beyond homology.

Maria Littmann^1,2, Michael Heinzinger^3,4, Christian Dallago^3,4, Tobias Olenyi^3, Burkhard Rost^3,5,6,7.

Abstract

Knowing protein function is crucial to advance molecular and medical biology, yet experimental function annotations through the Gene Ontology (GO) exist for fewer than 0.5% of all known proteins. Computational methods bridge this sequence-annotation gap typically through homology-based annotation transfer by identifying sequence-similar proteins with known function or through prediction methods using evolutionary information. Here, we propose predicting GO terms through annotation transfer based on proximity of proteins in the SeqVec embedding rather than in sequence space. These embeddings originate from deep learned language models (LMs) for protein sequences (SeqVec) transferring the knowledge gained from predicting the next amino acid in 33 million protein sequences. Replicating the conditions of CAFA3, our method reaches an Fmax of 37 ± 2%, 50 ± 3%, and 57 ± 2% for BPO, MFO, and CCO, respectively. Numerically, this appears close to the top ten CAFA3 methods. When restricting the annotation transfer to proteins with < 20% pairwise sequence identity to the query, performance drops (Fmax BPO 33 ± 2%, MFO 43 ± 3%, CCO 53 ± 2%); this still outperforms naïve sequence-based transfer. Preliminary results from CAFA4 appear to confirm these findings. Overall, this new concept is likely to change the annotation of proteins, in particular for proteins from smaller families or proteins with intrinsically disordered regions.
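The annotation transfer described in the abstract can be illustrated with a minimal sketch. It assumes that each protein is already represented by a fixed-length embedding vector (for example one derived from SeqVec) and that a lookup set of annotated proteins is available. The function names, the toy vectors, the placeholder term labels, and the `1 / (1 + distance)` weighting are illustrative assumptions, not the authors' implementation or scoring scheme.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_annotations(query_embedding, lookup, k=1):
    """Transfer GO terms from the k nearest annotated proteins.

    lookup maps a protein ID to (embedding, set of GO terms).
    Returns a dict mapping each transferred GO term to a score in (0, 1].
    The weighting 1 / (1 + distance) is a hypothetical choice made for
    this sketch; closer neighbours simply contribute more.
    """
    # Rank all annotated proteins by distance to the query in embedding space.
    hits = sorted(
        (euclidean(query_embedding, emb), pid)
        for pid, (emb, _terms) in lookup.items()
    )[:k]

    scores = defaultdict(float)
    for dist, pid in hits:
        weight = 1.0 / (1.0 + dist)
        for term in lookup[pid][1]:
            # Average the weights over the number of neighbours used.
            scores[term] += weight / len(hits)
    return dict(scores)


if __name__ == "__main__":
    # Toy lookup set: two-dimensional vectors and placeholder term labels.
    toy_lookup = {
        "P1": ([0.0, 0.0], {"GO:A"}),
        "P2": ([3.0, 4.0], {"GO:B"}),
    }
    print(transfer_annotations([0.0, 0.0], toy_lookup, k=2))
```

With `k=1` the query inherits only the terms of its single closest neighbour. Larger `k` pools terms from several neighbours and down-weights the more distant ones. Restricting the lookup set to proteins below a sequence-identity cutoff, as the abstract does for the < 20% experiment, would amount to filtering `lookup` before the call.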

Year: 2021 PMID: 33441905 PMCID: PMC7806674 DOI: 10.1038/s41598-020-80786-0

Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379

