IDP-Seq2Seq: identification of intrinsically disordered regions based on sequence to sequence learning.

Yi-Jun Tang, Yi-He Pang, Bin Liu.

Abstract

MOTIVATION: Intrinsically disordered regions (IDRs) are widely distributed in proteins and are related to many important biological functions, so accurate prediction of IDRs is critical for protein structure and function analysis. However, existing computational methods construct their predictive models solely in sequence space and fail to convert that space into a 'semantic space' that reflects the structural characteristics of proteins. Furthermore, although length-dependent predictors have shown promising results, new fusion strategies should be explored to improve their predictive performance and generalization.
RESULTS: In this study, we applied Sequence to Sequence Learning (Seq2Seq), derived from natural language processing (NLP), to map protein sequences into a 'semantic space' that reflects structural patterns, with the help of predicted residue-residue contacts (CCMs) and other sequence-based features. Furthermore, an attention mechanism was used to capture the global associations between all residue pairs in a protein. Three length-dependent predictors were constructed: IDP-Seq2Seq-L for long disordered region prediction, IDP-Seq2Seq-S for short disordered region prediction and IDP-Seq2Seq-G for both long and short disordered region prediction. Finally, these three predictors were fused into one predictor, called IDP-Seq2Seq, to improve discriminative power and generalization. Experimental results on four independent test datasets and the CASP test dataset showed that IDP-Seq2Seq is insensitive to the ratio of long to short disordered regions and outperforms other competing methods.
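To make the fusion step concrete, here is a minimal sketch of one simple way to combine per-residue disorder scores from three length-dependent predictors. The abstract does not specify the fusion strategy actually used, so the weighted averaging, the 0.5 threshold, the function names `fuse_scores` and `disordered_regions`, and the example scores are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Optional, Sequence, Tuple


def fuse_scores(per_model_scores: Sequence[Sequence[float]],
                weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-residue scores from several predictors.

    Hypothetical fusion rule: the source abstract does not state how the
    three predictors are fused.
    """
    if not per_model_scores:
        raise ValueError("need at least one predictor")
    length = len(per_model_scores[0])
    if any(len(s) != length for s in per_model_scores):
        raise ValueError("all predictors must score the same residues")
    if weights is None:
        weights = [1.0] * len(per_model_scores)  # plain average by default
    if len(weights) != len(per_model_scores):
        raise ValueError("need exactly one weight per predictor")
    total = sum(weights)
    return [
        sum(w * s[i] for w, s in zip(weights, per_model_scores)) / total
        for i in range(length)
    ]


def disordered_regions(scores: Sequence[float],
                       threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of runs scoring >= threshold."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a disordered run begins here
        elif score < threshold and start is not None:
            regions.append((start, i))  # the run ended at the previous residue
            start = None
    if start is not None:
        regions.append((start, len(scores)))  # run extends to the last residue
    return regions


# Made-up scores for a six-residue toy sequence (not real predictor output).
long_scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.1]     # stand-in for IDP-Seq2Seq-L
short_scores = [0.3, 0.6, 0.9, 0.1, 0.9, 0.2]    # stand-in for IDP-Seq2Seq-S
generic_scores = [0.6, 0.7, 0.8, 0.3, 0.8, 0.0]  # stand-in for IDP-Seq2Seq-G

fused = fuse_scores([long_scores, short_scores, generic_scores])
regions = disordered_regions(fused)
```

In this toy case the long-region predictor alone misses the isolated high score at index 4, while the averaged scores recover it as a short region next to the longer run at the start of the sequence.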
AVAILABILITY AND IMPLEMENTATION: For the convenience of experimental scientists, a user-friendly and publicly accessible web server for the new predictor has been established at http://bliulab.net/IDP-Seq2Seq/. It is anticipated that IDP-Seq2Seq will become a useful tool for the identification of IDRs.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Year:  2021        PMID: 32702119     DOI: 10.1093/bioinformatics/btaa667

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  23 in total

Review 1.  Protein Function Analysis through Machine Learning.

Authors:  Chris Avery; John Patterson; Tyler Grear; Theodore Frater; Donald J Jacobs
Journal:  Biomolecules       Date:  2022-09-06

2.  Identification and Classification of Enhancers Using Dimension Reduction Technique and Recurrent Neural Network.

Authors:  Qingwen Li; Lei Xu; Qingyuan Li; Lichao Zhang
Journal:  Comput Math Methods Med       Date:  2020-10-18       Impact factor: 2.238

3.  A Method for Identifying Vesicle Transport Proteins Based on LibSVM and MRMD.

Authors:  Zhiyu Tao; Yanjuan Li; Zhixia Teng; Yuming Zhao
Journal:  Comput Math Methods Med       Date:  2020-10-19       Impact factor: 2.238

4.  Identification of Methicillin-Resistant Staphylococcus Aureus From Methicillin-Sensitive Staphylococcus Aureus and Molecular Characterization in Quanzhou, China.

Authors:  Zhimin Bai; Min Chen; Qiaofa Lin; Ying Ye; Hongmei Fan; Kaizhen Wen; Jianxing Zeng; Donghong Huang; Wenfei Mo; Ying Lei; Zhijun Liao
Journal:  Front Cell Dev Biol       Date:  2021-01-21

5.  Prediction of lncRNA-Protein Interactions via the Multiple Information Integration.

Authors:  Yifan Chen; Xiangzheng Fu; Zejun Li; Li Peng; Linlin Zhuo
Journal:  Front Bioeng Biotechnol       Date:  2021-02-25

6.  Accurate identification of RNA D modification using multiple features.

Authors:  Lijun Dou; Wenyang Zhou; Lichao Zhang; Lei Xu; Ke Han
Journal:  RNA Biol       Date:  2021-03-17       Impact factor: 4.652

7.  i4mC-EL: Identifying DNA N4-Methylcytosine Sites in the Mouse Genome Using Ensemble Learning.

Authors:  Yanjuan Li; Zhengnan Zhao; Zhixia Teng
Journal:  Biomed Res Int       Date:  2021-05-29       Impact factor: 3.411

Review 8.  Representation learning applications in biological sequence analysis.

Authors:  Hitoshi Iuchi; Taro Matsutani; Keisuke Yamada; Natsuki Iwano; Shunsuke Sumi; Shion Hosoda; Shitao Zhao; Tsukasa Fukunaga; Michiaki Hamada
Journal:  Comput Struct Biotechnol J       Date:  2021-05-23       Impact factor: 7.271

9.  Identifying Antioxidant Proteins by Using Amino Acid Composition and Protein-Protein Interactions.

Authors:  Yixiao Zhai; Yu Chen; Zhixia Teng; Yuming Zhao
Journal:  Front Cell Dev Biol       Date:  2020-10-29

Review 10.  Recent Advances in Predicting Protein S-Nitrosylation Sites.

Authors:  Qian Zhao; Jiaqi Ma; Fang Xie; Yu Wang; Yu Zhang; Hui Li; Yuan Sun; Liqi Wang; Mian Guo; Ke Han
Journal:  Biomed Res Int       Date:  2021-02-09       Impact factor: 3.411
