
NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information.

Shuwei Yao, Ronghui You, Shaojun Wang, Yi Xiong, Xiaodi Huang, Shanfeng Zhu.

Abstract

With the explosive growth of protein sequences, large-scale automated protein function prediction (AFP) is becoming challenging. A protein is usually associated with dozens of gene ontology (GO) terms, so AFP is regarded as a large-scale multi-label classification problem. Under the learning to rank (LTR) framework, our previous NetGO tool integrated massive networks and multi-type information about protein sequences to achieve good performance while handling all possible GO terms (>44 000). In this work, we propose an updated version, NetGO 2.0, which further improves the performance of large-scale AFP. NetGO 2.0 additionally incorporates literature information via logistic regression and deep sequence information via a recurrent neural network (RNN) into the framework. We generate datasets following the critical assessment of functional annotation (CAFA) protocol. Experimental results show that NetGO 2.0 significantly outperformed NetGO in biological process ontology (BPO) and cellular component ontology (CCO). In particular, NetGO 2.0 achieved a 12.6% improvement over NetGO in terms of area under the precision-recall curve (AUPR) in BPO and around 2.6% in terms of $F_{\max}$ in CCO. These results demonstrate the benefits of incorporating text and deep sequence information for functional annotation in BPO and CCO. The NetGO 2.0 web server is freely available at http://issubmission.sjtu.edu.cn/ng2/.
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
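
The abstract reports $F_{\max}$ without defining it. As a rough illustration, the sketch below assumes the protein-centric definition commonly used in CAFA evaluations: at each score threshold, precision is averaged over proteins with at least one prediction, recall is averaged over all benchmark proteins, and $F_{\max}$ is the best harmonic mean across thresholds. The function name `f_max`, the toy protein IDs, and the placeholder term names are hypothetical and are not taken from NetGO 2.0.

```python
from typing import Dict, Set


def f_max(predictions: Dict[str, Dict[str, float]],
          truth: Dict[str, Set[str]],
          steps: int = 100) -> float:
    """Protein-centric F_max over thresholds t = 1/steps, ..., 1.

    Assumes every protein in `truth` has at least one true term.
    """
    best = 0.0
    for k in range(1, steps + 1):
        t = k / steps
        precisions = []      # one entry per protein with >= 1 prediction at t
        recall_sum = 0.0     # summed over all benchmark proteins
        for protein, true_terms in truth.items():
            scores = predictions.get(protein, {})
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # no protein has a prediction at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data (hypothetical): two proteins, placeholder term names.
toy_predictions = {
    "P1": {"term_a": 0.9, "term_b": 0.4},
    "P2": {"term_c": 0.8},
}
toy_truth = {
    "P1": {"term_a"},
    "P2": {"term_c", "term_d"},
}
toy_score = f_max(toy_predictions, toy_truth)
```

On the toy data, the best trade-off occurs at thresholds between 0.4 and 0.8, where precision is 1 and recall is 0.75, giving a score of 6/7. Real evaluations would also propagate predictions and annotations up the GO hierarchy, which this sketch omits.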

Year:  2021        PMID: 34038555      PMCID: PMC8262706          DOI: 10.1093/nar/gkab398

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


  18 in total

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  UniProtKB/Swiss-Prot, the Manually Annotated Section of the UniProt KnowledgeBase: How to Use the Entry View.

Authors:  Emmanuel Boutet; Damien Lieberherr; Michael Tognolli; Michel Schneider; Parit Bansal; Alan J Bridge; Sylvain Poux; Lydie Bougueleret; Ioannis Xenarios
Journal:  Methods Mol Biol       Date:  2016

3.  DeepGOPlus: improved protein function prediction from sequence.

Authors:  Maxat Kulmanov; Robert Hoehndorf
Journal:  Bioinformatics       Date:  2020-01-15       Impact factor: 6.937

4.  DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation.

Authors:  Ronghui You; Xiaodi Huang; Shanfeng Zhu
Journal:  Methods       Date:  2018-06-06       Impact factor: 3.608

5.  NetGO: improving large-scale protein function prediction with massive network information.

Authors:  Ronghui You; Shuwei Yao; Yi Xiong; Xiaodi Huang; Fengzhu Sun; Hiroshi Mamitsuka; Shanfeng Zhu
Journal:  Nucleic Acids Res       Date:  2019-07-02       Impact factor: 16.971

6.  GOLabeler: improving sequence-based large-scale protein function prediction by learning to rank.

Authors:  Ronghui You; Zihan Zhang; Yi Xiong; Fengzhu Sun; Hiroshi Mamitsuka; Shanfeng Zhu
Journal:  Bioinformatics       Date:  2018-07-15       Impact factor: 6.937

7.  InterProScan 5: genome-scale protein function classification.

Authors:  Philip Jones; David Binns; Hsin-Yu Chang; Matthew Fraser; Weizhong Li; Craig McAnulla; Hamish McWilliam; John Maslen; Alex Mitchell; Gift Nuka; Sebastien Pesseat; Antony F Quinn; Amaia Sangrador-Vegas; Maxim Scheremetjew; Siew-Yit Yong; Rodrigo Lopez; Sarah Hunter
Journal:  Bioinformatics       Date:  2014-01-21       Impact factor: 6.937

8.  An expanded evaluation of protein function prediction methods shows an improvement in accuracy.

Authors:  Yuxiang Jiang; Tal Ronnen Oron; Wyatt T Clark; Asma R Bankapur; Daniel D'Andrea; Rosalba Lepore; Christopher S Funk; Indika Kahanda; Karin M Verspoor; Asa Ben-Hur; Da Chen Emily Koo; Duncan Penfold-Brown; Dennis Shasha; Noah Youngs; Richard Bonneau; Alexandra Lin; Sayed M E Sahraeian; Pier Luigi Martelli; Giuseppe Profiti; Rita Casadio; Renzhi Cao; Zhaolong Zhong; Jianlin Cheng; Adrian Altenhoff; Nives Skunca; Christophe Dessimoz; Tunca Dogan; Kai Hakala; Suwisa Kaewphan; Farrokh Mehryary; Tapio Salakoski; Filip Ginter; Hai Fang; Ben Smithers; Matt Oates; Julian Gough; Petri Törönen; Patrik Koskinen; Liisa Holm; Ching-Tai Chen; Wen-Lian Hsu; Kevin Bryson; Domenico Cozzetto; Federico Minneci; David T Jones; Samuel Chapman; Dukka Bkc; Ishita K Khan; Daisuke Kihara; Dan Ofer; Nadav Rappoport; Amos Stern; Elena Cibrian-Uhalte; Paul Denny; Rebecca E Foulger; Reija Hieta; Duncan Legge; Ruth C Lovering; Michele Magrane; Anna N Melidoni; Prudence Mutowo-Meullenet; Klemens Pichler; Aleksandra Shypitsyna; Biao Li; Pooya Zakeri; Sarah ElShal; Léon-Charles Tranchevent; Sayoni Das; Natalie L Dawson; David Lee; Jonathan G Lees; Ian Sillitoe; Prajwal Bhat; Tamás Nepusz; Alfonso E Romero; Rajkumar Sasidharan; Haixuan Yang; Alberto Paccanaro; Jesse Gillis; Adriana E Sedeño-Cortés; Paul Pavlidis; Shou Feng; Juan M Cejuela; Tatyana Goldberg; Tobias Hamp; Lothar Richter; Asaf Salamov; Toni Gabaldon; Marina Marcet-Houben; Fran Supek; Qingtian Gong; Wei Ning; Yuanpeng Zhou; Weidong Tian; Marco Falda; Paolo Fontana; Enrico Lavezzo; Stefano Toppo; Carlo Ferrari; Manuel Giollo; Damiano Piovesan; Silvio C E Tosatto; Angela Del Pozo; José M Fernández; Paolo Maietta; Alfonso Valencia; Michael L Tress; Alfredo Benso; Stefano Di Carlo; Gianfranco Politano; Alessandro Savino; Hafeez Ur Rehman; Matteo Re; Marco Mesiti; Giorgio Valentini; Joachim W Bargsten; Aalt D J van Dijk; Branislava Gemovic; Sanja Glisic; Vladmir Perovic; Veljko Veljkovic; Nevena Veljkovic; Danillo C Almeida-E-Silva; Ricardo Z N Vencio; Malvika Sharan; Jörg Vogel; Lakesh Kansakar; Shanshan Zhang; Slobodan Vucetic; Zheng Wang; Michael J E Sternberg; Mark N Wass; Rachael P Huntley; Maria J Martin; Claire O'Donovan; Peter N Robinson; Yves Moreau; Anna Tramontano; Patricia C Babbitt; Steven E Brenner; Michal Linial; Christine A Orengo; Burkhard Rost; Casey S Greene; Sean D Mooney; Iddo Friedberg; Predrag Radivojac
Journal:  Genome Biol       Date:  2016-09-07       Impact factor: 13.583

9.  A large-scale evaluation of computational protein function prediction.

Authors:  Predrag Radivojac; Wyatt T Clark; Tal Ronnen Oron; Alexandra M Schnoes; Tobias Wittkop; Artem Sokolov; Kiley Graim; Christopher Funk; Karin Verspoor; Asa Ben-Hur; Gaurav Pandey; Jeffrey M Yunes; Ameet S Talwalkar; Susanna Repo; Michael L Souza; Damiano Piovesan; Rita Casadio; Zheng Wang; Jianlin Cheng; Hai Fang; Julian Gough; Patrik Koskinen; Petri Törönen; Jussi Nokso-Koivisto; Liisa Holm; Domenico Cozzetto; Daniel W A Buchan; Kevin Bryson; David T Jones; Bhakti Limaye; Harshal Inamdar; Avik Datta; Sunitha K Manjari; Rajendra Joshi; Meghana Chitale; Daisuke Kihara; Andreas M Lisewski; Serkan Erdin; Eric Venner; Olivier Lichtarge; Robert Rentzsch; Haixuan Yang; Alfonso E Romero; Prajwal Bhat; Alberto Paccanaro; Tobias Hamp; Rebecca Kaßner; Stefan Seemayer; Esmeralda Vicedo; Christian Schaefer; Dominik Achten; Florian Auer; Ariane Boehm; Tatjana Braun; Maximilian Hecht; Mark Heron; Peter Hönigschmid; Thomas A Hopf; Stefanie Kaufmann; Michael Kiening; Denis Krompass; Cedric Landerer; Yannick Mahlich; Manfred Roos; Jari Björne; Tapio Salakoski; Andrew Wong; Hagit Shatkay; Fanny Gatzmann; Ingolf Sommer; Mark N Wass; Michael J E Sternberg; Nives Škunca; Fran Supek; Matko Bošnjak; Panče Panov; Sašo Džeroski; Tomislav Šmuc; Yiannis A I Kourmpetis; Aalt D J van Dijk; Cajo J F ter Braak; Yuanpeng Zhou; Qingtian Gong; Xinran Dong; Weidong Tian; Marco Falda; Paolo Fontana; Enrico Lavezzo; Barbara Di Camillo; Stefano Toppo; Liang Lan; Nemanja Djuric; Yuhong Guo; Slobodan Vucetic; Amos Bairoch; Michal Linial; Patricia C Babbitt; Steven E Brenner; Christine Orengo; Burkhard Rost; Sean D Mooney; Iddo Friedberg
Journal:  Nat Methods       Date:  2013-01-27       Impact factor: 28.547

10.  The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens.

Authors:  Naihui Zhou; Yuxiang Jiang; Timothy R Bergquist; Alexandra J Lee; Balint Z Kacsoh; Alex W Crocker; Kimberley A Lewis; George Georghiou; Huy N Nguyen; Md Nafiz Hamid; Larry Davis; Tunca Dogan; Volkan Atalay; Ahmet S Rifaioglu; Alperen Dalkıran; Rengul Cetin Atalay; Chengxin Zhang; Rebecca L Hurto; Peter L Freddolino; Yang Zhang; Prajwal Bhat; Fran Supek; José M Fernández; Branislava Gemovic; Vladimir R Perovic; Radoslav S Davidović; Neven Sumonja; Nevena Veljkovic; Ehsaneddin Asgari; Mohammad R K Mofrad; Giuseppe Profiti; Castrense Savojardo; Pier Luigi Martelli; Rita Casadio; Florian Boecker; Heiko Schoof; Indika Kahanda; Natalie Thurlby; Alice C McHardy; Alexandre Renaux; Rabie Saidi; Julian Gough; Alex A Freitas; Magdalena Antczak; Fabio Fabris; Mark N Wass; Jie Hou; Jianlin Cheng; Zheng Wang; Alfonso E Romero; Alberto Paccanaro; Haixuan Yang; Tatyana Goldberg; Chenguang Zhao; Liisa Holm; Petri Törönen; Alan J Medlar; Elaine Zosa; Itamar Borukhov; Ilya Novikov; Angela Wilkins; Olivier Lichtarge; Po-Han Chi; Wei-Cheng Tseng; Michal Linial; Peter W Rose; Christophe Dessimoz; Vedrana Vidulin; Saso Dzeroski; Ian Sillitoe; Sayoni Das; Jonathan Gill Lees; David T Jones; Cen Wan; Domenico Cozzetto; Rui Fa; Mateo Torres; Alex Warwick Vesztrocy; Jose Manuel Rodriguez; Michael L Tress; Marco Frasca; Marco Notaro; Giuliano Grossi; Alessandro Petrini; Matteo Re; Giorgio Valentini; Marco Mesiti; Daniel B Roche; Jonas Reeb; David W Ritchie; Sabeur Aridhi; Seyed Ziaeddin Alborzi; Marie-Dominique Devignes; Da Chen Emily Koo; Richard Bonneau; Vladimir Gligorijević; Meet Barot; Hai Fang; Stefano Toppo; Enrico Lavezzo; Marco Falda; Michele Berselli; Silvio C E Tosatto; Marco Carraro; Damiano Piovesan; Hafeez Ur Rehman; Qizhong Mao; Shanshan Zhang; Slobodan Vucetic; Gage S Black; Dane Jo; Erica Suh; Jonathan B Dayton; Dallas J Larsen; Ashton R Omdahl; Liam J McGuffin; Danielle A Brackenridge; Patricia C Babbitt; Jeffrey M Yunes; Paolo Fontana; Feng Zhang; Shanfeng Zhu; Ronghui You; Zihan Zhang; Suyang Dai; Shuwei Yao; Weidong Tian; Renzhi Cao; Caleb Chandler; Miguel Amezola; Devon Johnson; Jia-Ming Chang; Wen-Hung Liao; Yi-Wei Liu; Stefano Pascarelli; Yotam Frank; Robert Hoehndorf; Maxat Kulmanov; Imane Boudellioua; Gianfranco Politano; Stefano Di Carlo; Alfredo Benso; Kai Hakala; Filip Ginter; Farrokh Mehryary; Suwisa Kaewphan; Jari Björne; Hans Moen; Martti E E Tolvanen; Tapio Salakoski; Daisuke Kihara; Aashish Jain; Tomislav Šmuc; Adrian Altenhoff; Asa Ben-Hur; Burkhard Rost; Steven E Brenner; Christine A Orengo; Constance J Jeffery; Giovanni Bosco; Deborah A Hogan; Maria J Martin; Claire O'Donovan; Sean D Mooney; Casey S Greene; Predrag Radivojac; Iddo Friedberg
Journal:  Genome Biol       Date:  2019-11-19       Impact factor: 13.583

  8 in total

Review 1.  I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction.

Authors:  Xiaogen Zhou; Wei Zheng; Yang Li; Robin Pearce; Chengxin Zhang; Eric W Bell; Guijun Zhang; Yang Zhang
Journal:  Nat Protoc       Date:  2022-08-05       Impact factor: 17.021

2.  DeepGOZero: improving protein function prediction from sequence and zero-shot learning based on ontology axioms.

Authors:  Maxat Kulmanov; Robert Hoehndorf
Journal:  Bioinformatics       Date:  2022-06-24       Impact factor: 6.931

3.  Prediction of Gastric Cancer-Related Genes Based on the Graph Transformer Network.

Authors:  Yan Chen; Xuan Sun; Jiaxing Yang
Journal:  Front Oncol       Date:  2022-06-30       Impact factor: 5.738

4.  Identifying Functions of Proteins in Mice With Functional Embedding Features.

Authors:  Hao Li; ShiQi Zhang; Lei Chen; Xiaoyong Pan; ZhanDong Li; Tao Huang; Yu-Dong Cai
Journal:  Front Genet       Date:  2022-05-16       Impact factor: 4.772

5.  PANNZER-A practical tool for protein function prediction.

Authors:  Petri Törönen; Liisa Holm
Journal:  Protein Sci       Date:  2021-10-14       Impact factor: 6.725

6.  LectinOracle: A Generalizable Deep Learning Model for Lectin-Glycan Binding Prediction.

Authors:  Jon Lundstrøm; Emma Korhonen; Frédérique Lisacek; Daniel Bojar
Journal:  Adv Sci (Weinh)       Date:  2021-12-04       Impact factor: 16.806

7.  Evolution of Protein Functional Annotation: Text Mining Study.

Authors:  Ekaterina V Ilgisonis; Pavel V Pogodin; Olga I Kiseleva; Svetlana N Tarbeeva; Elena A Ponomarenko
Journal:  J Pers Med       Date:  2022-03-16

8.  Compound heterozygous variants in DYNC2H1 in a foetus with type III short rib-polydactyly syndrome and situs inversus totalis.

Authors:  Chen Cheng; Xiuxiu Li; Sheng Zhao; Qian Feng; Xiang Ren; Xinlin Chen
Journal:  BMC Med Genomics       Date:  2022-03-12       Impact factor: 3.063
