
GOLabeler: improving sequence-based large-scale protein function prediction by learning to rank.

Ronghui You, Zihan Zhang, Yi Xiong, Fengzhu Sun, Hiroshi Mamitsuka, Shanfeng Zhu.

Abstract

Motivation: Gene Ontology (GO) has been widely used to annotate the functions of proteins and to understand their biological roles. Currently, <1% of the >70 million proteins in UniProtKB have experimental GO annotations, which makes automated function prediction (AFP) of proteins a strong necessity. AFP is a hard multilabel classification problem because a single protein can be annotated with a varying number of GO terms. Most of these proteins have only sequences as input information, which underlines the importance of sequence-based AFP (SAFP: sequences are the only input). Furthermore, homology-based SAFP tools are competitive in AFP competitions, but they do not necessarily work well for so-called difficult proteins, which have <60% sequence identity to proteins that already have annotations. Thus, the vital and challenging problem now is how to develop a method for SAFP, particularly for difficult proteins.
Methods: The key to this method is to extract not only homology information but also diverse, deep-rooted information/evidence from sequence inputs and to integrate them into a predictor both effectively and efficiently. We propose GOLabeler, which integrates five component classifiers trained on different features, including GO term frequency, sequence alignment, amino acid trigrams, domains and motifs, and biophysical properties, in the framework of learning to rank (LTR), a machine learning paradigm that is especially powerful for multilabel classification.
Results: Extensive and thorough experiments on large-scale datasets revealed numerous favorable aspects of GOLabeler, including a significant performance advantage over state-of-the-art AFP methods.
Availability and implementation: http://datamining-iip.fudan.edu.cn/golabeler.
Supplementary information: Supplementary data are available at Bioinformatics online.
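
The integration step described under Methods can be pictured as follows: each component classifier assigns a score to every candidate (protein, GO term) pair, and a ranking model learns how to combine those scores so that true annotations rank above false ones. The sketch below is a minimal pairwise linear ranker in pure Python. It is not GOLabeler's implementation (the abstract does not name the LTR algorithm used), and the component labels, term names, toy scores, and function names are all illustrative assumptions.

```python
import math
import random

# Hypothetical feature order: one score per component classifier.
COMPONENTS = ["term_frequency", "alignment", "trigram", "domain_motif", "biophysical"]


def combined_score(weights, features):
    """Linear combination of the component scores for one candidate GO term."""
    return sum(w * x for w, x in zip(weights, features))


def train_pairwise_ranker(proteins, epochs=200, lr=0.1, seed=0):
    """Learn weights so that, within each protein, true terms outscore false ones.

    proteins: list of proteins; each protein is a list of (features, label)
    tuples, one per candidate GO term, with label 1 for a true annotation.
    Minimizes the pairwise logistic loss log(1 + exp(-(s_pos - s_neg))) by SGD.
    """
    rng = random.Random(seed)
    weights = [0.0] * len(COMPONENTS)

    # Pairs are formed only between terms of the same protein.
    pairs = []
    for candidates in proteins:
        positives = [f for f, label in candidates if label == 1]
        negatives = [f for f, label in candidates if label == 0]
        pairs.extend((p, n) for p in positives for n in negatives)

    for _ in range(epochs):
        rng.shuffle(pairs)
        for pos, neg in pairs:
            diff = [p - n for p, n in zip(pos, neg)]
            margin = combined_score(weights, diff)
            # Derivative of log(1 + exp(-margin)) with respect to margin.
            grad = -1.0 / (1.0 + math.exp(margin))
            weights = [w - lr * grad * d for w, d in zip(weights, diff)]
    return weights


def rank_terms(weights, candidates):
    """Return candidate term names sorted from highest to lowest combined score."""
    return sorted(
        candidates,
        key=lambda term: combined_score(weights, candidates[term]),
        reverse=True,
    )


# Toy training data (made up for illustration): two proteins, each with
# candidate GO terms described by five component scores and a 0/1 label.
TRAINING_PROTEINS = [
    [
        ([0.9, 0.8, 0.6, 0.7, 0.5], 1),
        ([0.8, 0.1, 0.2, 0.1, 0.3], 0),
        ([0.2, 0.7, 0.5, 0.6, 0.4], 1),
        ([0.3, 0.2, 0.1, 0.0, 0.2], 0),
    ],
    [
        ([0.5, 0.1, 0.7, 0.8, 0.6], 1),
        ([0.9, 0.2, 0.1, 0.2, 0.1], 0),
    ],
]

# Candidate terms for an unseen protein (placeholder names, not real GO IDs).
QUERY_CANDIDATES = {
    "term_y": [0.9, 0.1, 0.1, 0.1, 0.2],  # frequent term, little other evidence
    "term_x": [0.3, 0.9, 0.6, 0.7, 0.5],  # rarer term, supported by most components
}

if __name__ == "__main__":
    learned = train_pairwise_ranker(TRAINING_PROTEINS)
    print(dict(zip(COMPONENTS, learned)))
    print(rank_terms(learned, QUERY_CANDIDATES))
```

On this toy data the learned weights place `term_x` ahead of `term_y`, because a term that is merely frequent is outranked by one supported by alignment, trigram, domain, and biophysical evidence. A ranking objective is a natural fit here because a multilabel predictor ultimately has to order many candidate GO terms per protein rather than make one isolated yes/no decision.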

Year: 2018    PMID: 29522145    DOI: 10.1093/bioinformatics/bty130

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937

