TALE: Transformer-based protein function Annotation with joint sequence-Label Embedding.

Yue Cao, Yang Shen.

Abstract

MOTIVATION: Facing the increasing gap between high-throughput sequence data and limited functional insights, computational protein function annotation provides a high-throughput alternative to experimental approaches. However, current methods can have limited applicability while relying on protein data besides sequences, or lack generalizability to novel sequences, species and functions.
RESULTS: To overcome aforementioned barriers in applicability and generalizability, we propose a novel deep learning model using only sequence information for proteins, named Transformer-based protein function Annotation through joint sequence-Label Embedding (TALE). For generalizability to novel sequences we use self attention-based transformers to capture global patterns in sequences. For generalizability to unseen or rarely seen functions (tail labels), we embed protein function labels (hierarchical GO terms on directed graphs) together with inputs/features (1D sequences) in a joint latent space. Combining TALE and a sequence similarity-based method, TALE+ outperformed competing methods when only sequence input is available. It even outperformed a state-of-the-art method using network information besides sequence, in two of the three gene ontologies. Furthermore, TALE and TALE+ showed superior generalizability to proteins of low similarity, new species, or rarely annotated functions compared to training data, revealing deep insights into the protein sequence-function relationship. Ablation studies elucidated contributions of algorithmic components toward the accuracy and the generalizability.
AVAILABILITY: The data, source codes and models are available at https://github.com/Shen-Lab/TALE.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) (2021). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
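
ILLUSTRATIVE SKETCH: The abstract names two ideas that a toy example may make more concrete: scoring function labels against a sequence in a joint latent space, and combining a model's scores with those of a sequence similarity-based method. The Python below is a minimal sketch under stated assumptions, not the authors' implementation (which is in the repository linked above). The function names, the dot-product-plus-sigmoid scoring, the convex combination and its weight `alpha`, the placeholder term names, and all numbers are hypothetical choices made for illustration only.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def score_labels(seq_embedding, label_embeddings):
    """Score each label by how well its embedding matches the sequence
    embedding in a shared latent space (assumed: dot product + sigmoid)."""
    return {
        term: sigmoid(dot(seq_embedding, vec))
        for term, vec in label_embeddings.items()
    }


def combine(model_scores, similarity_scores, alpha=0.5):
    """Assumed convex combination of model scores and similarity-based
    scores; a term missing from one source contributes 0 from it."""
    terms = set(model_scores) | set(similarity_scores)
    return {
        t: alpha * model_scores.get(t, 0.0)
        + (1.0 - alpha) * similarity_scores.get(t, 0.0)
        for t in terms
    }


# Toy data: a 2-dimensional "sequence embedding" and three placeholder labels.
seq_embedding = [1.0, 0.0]
label_embeddings = {
    "term_a": [2.0, 0.0],   # aligned with the sequence embedding
    "term_b": [-2.0, 1.0],  # opposed to it
    "term_c": [0.0, 1.0],   # orthogonal to it
}

scores = score_labels(seq_embedding, label_embeddings)
combined = combine(scores, {"term_c": 1.0}, alpha=0.5)
```

Because labels live in the same space as sequences, a rarely annotated label can still receive a score as long as it has an embedding; how that embedding is learned from the GO hierarchy is outside the scope of this sketch.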

Year:  2021        PMID: 33755048      PMCID: PMC8479653          DOI: 10.1093/bioinformatics/btab198

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


