RBP-TSTL is a two-stage transfer learning framework for genome-scale prediction of RNA-binding proteins.

Xinxin Peng, Xiaoyu Wang, Yuming Guo, Zongyuan Ge, Fuyi Li, Xin Gao, Jiangning Song.

Abstract

RNA-binding proteins (RBPs) are critical for the post-transcriptional control of RNAs and play vital roles in many biological processes, such as RNA localization and gene regulation. Computational methods that can accurately identify RBPs are therefore highly desirable and have important implications for biomedical and biotechnological applications. Here, we propose RBP-TSTL, a two-stage deep transfer learning framework for accurate prediction of RBPs. In the first stage, knowledge from the self-supervised pre-trained model was extracted as feature embeddings and used to represent the protein sequences. In the second stage, a customized deep learning model was initialized on an annotated RBP pre-training dataset before being fine-tuned on the dataset of each target species. This two-stage transfer learning strategy allows the RBP-TSTL model to be trained effectively and improves its prediction performance. Extensive benchmarking of RBP-TSTL models trained on features generated by the self-supervised pre-trained model against models trained on hand-crafted encoding features demonstrated the effectiveness of the proposed two-stage knowledge transfer strategy. Using the best-performing RBP-TSTL models, we further conducted genome-scale RBP predictions for Homo sapiens, Arabidopsis thaliana, Escherichia coli, and Salmonella, and established a computational compendium containing all predicted putative RBP candidates. We anticipate that RBP-TSTL will be a useful tool for characterizing RNA-binding proteins and exploring their sequence-structure-function relationships.
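The two stages described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation. The abstract does not say how per-residue embeddings are summarized, so mean pooling is assumed for stage 1, and a logistic regression trained by stochastic gradient descent stands in for the customized deep learning model in stage 2. The function names (mean_pool, train, two_stage_fit, toy_protein), the learning rates, the epoch counts and the toy data are all hypothetical.

```python
import math
import random


# Stage 1 (assumed): the pre-trained model is taken to return one d-dim
# vector per residue; mean pooling turns these into one fixed-length vector.
def mean_pool(residue_embeddings):
    length = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[j] for row in residue_embeddings) / length for j in range(dim)]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(weights, x)) + bias)


def train(data, weights, bias, lr, epochs, seed=0):
    """SGD on the log loss, starting from the supplied parameters."""
    rng = random.Random(seed)
    weights = list(weights)  # copy: the caller's parameters stay unchanged
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = predict_proba(weights, bias, x) - y  # d(log loss)/d(logit)
            for j in range(len(weights)):
                weights[j] -= lr * err * x[j]
            bias -= lr * err
    return weights, bias


# Stage 2 (simplified stand-in): initialize on the pre-training set, then
# fine-tune the same parameters on one target-species set with a smaller
# learning rate. A logistic regression replaces the paper's deep network.
def two_stage_fit(source, target, dim, lr_pre=0.1, lr_ft=0.01,
                  epochs_pre=50, epochs_ft=20):
    w, b = train(source, [0.0] * dim, 0.0, lr_pre, epochs_pre)
    return train(target, w, b, lr_ft, epochs_ft)


# Hypothetical toy data: "RBPs" have a shifted first embedding dimension.
def toy_protein(rng, is_rbp, length=30, dim=4):
    shift = 1.0 if is_rbp else -1.0
    return [[rng.gauss(shift if j == 0 else 0.0, 1.0) for j in range(dim)]
            for _ in range(length)]


rng = random.Random(42)
source = [(mean_pool(toy_protein(rng, y)), y) for y in [0, 1] * 100]
target = [(mean_pool(toy_protein(rng, y)), y) for y in [0, 1] * 10]
held_out = [(mean_pool(toy_protein(rng, y)), y) for y in [0, 1] * 50]

w, b = two_stage_fit(source, target, dim=4)
accuracy = sum((predict_proba(w, b, x) >= 0.5) == bool(y)
               for x, y in held_out) / len(held_out)
print(f"held-out accuracy: {accuracy:.2f}")
```

Reusing the parameters learned on the larger pre-training set as the starting point, and fine-tuning them with a smaller learning rate, is one common way to keep the target-species model close to what was learned in pre-training; the actual RBP-TSTL architecture and training schedule may differ.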
© The Author(s) 2022. Published by Oxford University Press.

Keywords:  RNA binding proteins; deep learning; knowledge transfer learning; pre-trained language model; sequence analysis

Year: 2022; PMID: 35649392; PMCID: PMC9294422; DOI: 10.1093/bib/bbac215

Source DB: PubMed; Journal: Brief Bioinform; ISSN: 1467-5463; Impact factor: 13.994
