DeepText2GO: Improving large-scale protein function prediction with deep semantic text representation.

Ronghui You, Xiaodi Huang, Shanfeng Zhu.

Abstract

As of April 2018, UniProtKB had collected more than 115 million protein sequences, yet fewer than 0.15% of these proteins have been associated with experimental GO annotations. Automatic protein function prediction (AFP) is therefore increasingly important for narrowing this gap. Previous studies have concluded that sequence homology-based methods are highly effective in AFP, and that mining motif, domain, and functional information from protein sequences is also very helpful. Information sources other than sequences, such as text, may be useful for AFP as well. Instead of the bag-of-words (BOW) representation used in traditional text-based AFP, we propose a new method called DeepText2GO that relies on deep semantic text representation, together with different kinds of available protein information such as sequence homology, families, domains, and motifs, to improve large-scale AFP. Furthermore, DeepText2GO integrates text-based methods with sequence-based ones by means of a consensus approach. Extensive experiments on a benchmark dataset extracted from UniProt/SwissProt demonstrate that DeepText2GO significantly outperforms both text-based and sequence-based methods.
Copyright © 2018 Elsevier Inc. All rights reserved.
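
The abstract says that text-based and sequence-based predictors are combined by a consensus approach but does not spell out the combination rule. As a rough illustration only, the sketch below uses a plain weighted average of per-GO-term scores, which is one common way to form a consensus and is not claimed to be the authors' method. The function name `consensus_scores`, the component names, the weights, and the scores are all hypothetical; the GO identifiers are used purely as example labels.

```python
from collections import defaultdict


def consensus_scores(component_scores, weights):
    """Combine per-GO-term scores from several predictors by weighted average.

    component_scores: {predictor_name: {go_term: score in [0, 1]}}
    weights:          {predictor_name: non-negative weight}
    A GO term missing from a predictor's output contributes 0 for that predictor.
    """
    total_weight = sum(weights[name] for name in component_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    combined = defaultdict(float)
    for name, scores in component_scores.items():
        for go_term, score in scores.items():
            combined[go_term] += weights[name] * score

    return {term: value / total_weight for term, value in combined.items()}


# Hypothetical scores for a single protein from two component predictors.
component_scores = {
    "text": {"GO:0005524": 0.8, "GO:0003677": 0.2},
    "homology": {"GO:0005524": 0.6},
}
weights = {"text": 1.0, "homology": 1.0}  # assumed equal weights

result = consensus_scores(component_scores, weights)
# Rank GO terms by combined score, highest first.
ranked = sorted(result.items(), key=lambda item: item[1], reverse=True)
```

In this toy setting a term supported by both predictors ranks above a term supported by only one. In practice the weights would have to be tuned on held-out data rather than fixed by hand.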

Keywords:  Large-scale protein function prediction; Text classification

Year:  2018        PMID: 29883746     DOI: 10.1016/j.ymeth.2018.05.026

Source DB:  PubMed          Journal:  Methods        ISSN: 1046-2023            Impact factor:   3.608


  9 in total

1.  NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information.

Authors:  Shuwei Yao; Ronghui You; Shaojun Wang; Yi Xiong; Xiaodi Huang; Shanfeng Zhu
Journal:  Nucleic Acids Res       Date:  2021-07-02       Impact factor: 16.971

2.  Accurate protein function prediction via graph attention networks with predicted structure information.

Authors:  Boqiao Lai; Jinbo Xu
Journal:  Brief Bioinform       Date:  2022-01-17       Impact factor: 11.622

3.  TALE: Transformer-based protein function Annotation with joint sequence-Label Embedding.

Authors:  Yue Cao; Yang Shen
Journal:  Bioinformatics       Date:  2021-03-23       Impact factor: 6.937

4.  PANDA2: protein function prediction using graph neural networks.

Authors:  Chenguang Zhao; Tong Liu; Zheng Wang
Journal:  NAR Genom Bioinform       Date:  2022-02-02

5.  UDSMProt: universal deep sequence models for protein classification.

Authors:  Nils Strodthoff; Patrick Wagner; Markus Wenzel; Wojciech Samek
Journal:  Bioinformatics       Date:  2020-04-15       Impact factor: 6.937

6.  A thorough analysis of the contribution of experimental, derived and sequence-based predicted protein-protein interactions for functional annotation of proteins.

Authors:  Stavros Makrodimitris; Marcel Reinders; Roeland van Ham
Journal:  PLoS One       Date:  2020-11-25       Impact factor: 3.240

7.  A roadmap for metagenomic enzyme discovery. (Review)

Authors:  Serina L Robinson; Jörn Piel; Shinichi Sunagawa
Journal:  Nat Prod Rep       Date:  2021-11-17       Impact factor: 13.423

8.  Hierarchical deep learning for predicting GO annotations by integrating protein knowledge.

Authors:  Gabriela A Merino; Rabie Saidi; Diego H Milone; Georgina Stegmayer; Maria J Martin
Journal:  Bioinformatics       Date:  2022-08-05       Impact factor: 6.931

9.  Automatic Gene Function Prediction in the 2020's. (Review)

Authors:  Stavros Makrodimitris; Roeland C H J van Ham; Marcel J T Reinders
Journal:  Genes (Basel)       Date:  2020-10-27       Impact factor: 4.096
