
Accurate protein function prediction via graph attention networks with predicted structure information.

Boqiao Lai, Jinbo Xu.

Abstract

Experimental protein function annotation does not scale with fast-growing sequence databases: only a tiny fraction (<0.1%) of protein sequences have experimentally determined functional annotations. Computational methods can predict protein function quickly, but their accuracy remains unsatisfactory. Building on recent breakthroughs in protein structure prediction and protein language models, we develop GAT-GO, a graph attention network (GAT) method that can substantially improve protein function prediction by leveraging predicted structure information and protein sequence embeddings. Our experimental results show that GAT-GO greatly outperforms the latest sequence- and structure-based deep learning methods. On the PDB-mmseqs test set, where the training and test proteins share <15% sequence identity, GAT-GO yields Fmax (maximum F-score) of 0.508, 0.416 and 0.501 and area under the precision-recall curve (AUPRC) of 0.427, 0.253 and 0.411 for the Molecular Function (MFO), Biological Process (BPO) and Cellular Component (CCO) ontology domains, respectively. This is much better than the homology-based method BLAST (Fmax 0.117, 0.121, 0.207; AUPRC 0.120, 0.120, 0.163), which does not use any structure information. On the PDB-cdhit test set, where the training and test proteins are more similar, GAT-GO obtains Fmax of 0.637, 0.501 and 0.542 and AUPRC of 0.662, 0.384 and 0.481 for the MFO, BPO and CCO domains, respectively, despite using predicted rather than experimental structure information. These results significantly exceed those of the recently published method DeepFRI, which uses experimental structures and achieves Fmax of 0.542, 0.425 and 0.424 and AUPRC of only 0.313, 0.159 and 0.193.
© The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
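The Fmax values quoted in the abstract can be made concrete with a small sketch. It assumes the protein-centric definition used in the CAFA evaluations (precision averaged over proteins with at least one prediction at a threshold, recall averaged over all annotated proteins, and the best F-score taken over thresholds). The abstract does not state which variant the authors used, so this is an illustration rather than their evaluation code. The function name `fmax`, the threshold grid and the toy GO term labels are all hypothetical.

```python
def fmax(y_true, y_score, thresholds=None):
    """Protein-centric Fmax (CAFA-style definition, assumed here).

    y_true:  list of sets, the true GO terms of each protein
    y_score: list of dicts, predicted score in [0, 1] per GO term
    """
    if thresholds is None:
        # Hypothetical grid: 0.01, 0.02, ..., 0.99
        thresholds = [t / 100 for t in range(1, 100)]

    n_annotated = sum(1 for terms in y_true if terms)
    best = 0.0
    for t in thresholds:
        precisions = []   # one entry per protein with >= 1 prediction at t
        recall_sum = 0.0  # summed over proteins with >= 1 true term
        for true_terms, scores in zip(y_true, y_score):
            predicted = {term for term, s in scores.items() if s >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            if true_terms:
                recall_sum += hits / len(true_terms)
        if not precisions or n_annotated == 0:
            continue  # nothing predicted at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / n_annotated
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data with made-up term labels (not real GO identifiers).
y_true = [{"GO:A", "GO:B"}, {"GO:C"}]
y_score = [
    {"GO:A": 0.9, "GO:B": 0.4, "GO:C": 0.2},
    {"GO:C": 0.8, "GO:A": 0.6},
]
print(round(fmax(y_true, y_score), 3))  # prints 0.857
```

In this toy case the maximum is reached at two different threshold ranges (precision 0.75 with recall 1.0, and precision 1.0 with recall 0.75), which shows that Fmax summarizes the best precision-recall trade-off rather than performance at a fixed cutoff.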

Keywords:  Deep Learning; Gene Ontology; Graph Attention Networks; Machine Learning; Protein Function Prediction

Year: 2022    PMID: 34882195    PMCID: PMC8898000    DOI: 10.1093/bib/bbab502

Source DB: PubMed    Journal: Brief Bioinform    ISSN: 1467-5463    Impact factor: 11.622


