Discovering nuclear targeting signal sequence through protein language learning and multivariate analysis.

Yun Guo, Yang Yang, Yan Huang, Hong-Bin Shen.

Abstract

Nuclear localization signals (NLSs) are peptides that target proteins to the nucleus by binding to carrier proteins in the cytoplasm that transport their cargo across the nuclear membrane. Accurate identification of NLSs can help elucidate the functions of nuclear protein complexes. Existing NLS predictors are usually specific to certain species or depend largely on prior knowledge of the basic residues in NLSs. Thus, a more general predictor is highly desirable to reduce the potentially high rates of false positives and false negatives when discovering new NLSs. Here, we report a new method, INSP (Identification Nucleus Signal Peptide), which identifies NLSs mainly on the basis of statistical knowledge and machine learning algorithms. In our NLS machine learning model, we treat the query protein sequence as text and extract sequence context features using a natural language model. These word-vector features encode discriminative knowledge of NLS motif frequency and are thus useful for model recognition. The output of the machine learning model is then fused with statistical knowledge of the query sequence to build a final multivariate regression model for NLS peptide identification. The experimental results demonstrate promising performance of the new INSP approach. INSP is freely available at www.csbio.sjtu.edu.cn/bioinf/INSP/ for academic use.
Copyright © 2020 Elsevier Inc. All rights reserved.
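
The pipeline described in the abstract (sequence treated as text, a model score, fusion with a sequence statistic) can be illustrated with a minimal sketch. This is not the INSP implementation: the overlapping k-mer tokenisation, the choice of K/R fraction as the statistic, the function names, and the fusion weights are all assumptions made for illustration, and the model score is passed in as a plain number rather than computed by a trained language model.

```python
from typing import List

# Assumption: lysine (K) and arginine (R) are taken as the "basic" residues.
BASIC_RESIDUES = set("KR")


def kmer_words(sequence: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mers, treated as 'words'."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def basic_fraction(peptide: str) -> float:
    """Fraction of residues that are K or R (a simple sequence statistic)."""
    if not peptide:
        return 0.0
    return sum(res in BASIC_RESIDUES for res in peptide.upper()) / len(peptide)


def fused_score(model_score: float, peptide: str,
                w_model: float = 0.7, w_stat: float = 0.3,
                bias: float = 0.0) -> float:
    """Linear fusion of a model output with a sequence statistic.

    The weights are hypothetical placeholders, not values from the paper;
    in a regression setting they would be fitted on labelled data.
    """
    return w_model * model_score + w_stat * basic_fraction(peptide) + bias


# The classic monopartite NLS of the SV40 large T antigen.
candidate = "PKKKRKV"
words = kmer_words(candidate)
score = fused_score(0.9, candidate)  # 0.9 is a made-up model output
print(words)
print(round(score, 4))
```

In a fuller version, the k-mer "words" would feed a word-embedding model to produce the context features, and the fusion weights would be estimated by regression rather than fixed by hand.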

Keywords:  Machine learning; Natural language processing; Nuclear localization signal; Targeting signal prediction

Year:  2019        PMID: 31883904     DOI: 10.1016/j.ab.2019.113565

Source DB:  PubMed          Journal:  Anal Biochem        ISSN: 0003-2697            Impact factor:   3.365


  5 in total

1.  Transcriptome analysis of Leucojum aestivum and identification of genes involved in norbelladine biosynthesis.

Authors:  Laurence Tousignant; Aracely Maribel Diaz-Garza; Bharat Bhusan Majhi; Sarah-Eve Gélinas; Aparna Singh; Isabel Desgagne-Penix
Journal:  Planta       Date:  2022-01-03       Impact factor: 4.116

2.  Genome-Wide Identification of PLATZ Transcription Factors in Ginkgo biloba L. and Their Expression Characteristics During Seed Development.

Authors:  Xin Han; Hao Rong; Yating Tian; Yanshu Qu; Meng Xu; Li-An Xu
Journal:  Front Plant Sci       Date:  2022-06-23       Impact factor: 6.627

Review 3.  Karyopherin-mediated nucleocytoplasmic transport.

Authors:  Casey E Wing; Ho Yee Joyce Fung; Yuh Min Chook
Journal:  Nat Rev Mol Cell Biol       Date:  2022-01-20       Impact factor: 113.915

Review 4.  Tools for the Recognition of Sorting Signals and the Prediction of Subcellular Localization of Proteins From Their Amino Acid Sequences.

Authors:  Kenichiro Imai; Kenta Nakai
Journal:  Front Genet       Date:  2020-11-25       Impact factor: 4.599

Review 5.  Representation learning applications in biological sequence analysis.

Authors:  Hitoshi Iuchi; Taro Matsutani; Keisuke Yamada; Natsuki Iwano; Shunsuke Sumi; Shion Hosoda; Shitao Zhao; Tsukasa Fukunaga; Michiaki Hamada
Journal:  Comput Struct Biotechnol J       Date:  2021-05-23       Impact factor: 7.271
