EPSOL: sequence-based protein solubility prediction using multidimensional embedding.

Xiang Wu, Liang Yu.

Abstract

MOTIVATION: Heterologous expression of recombinant proteins requires host cells such as Escherichia coli, and protein solubility greatly affects protein yield. A highly accurate predictor that forecasts protein solubility in an E. coli expression system before experimental work begins, and thereby helps improve production yield and reduce production cost, is highly sought.
RESULTS: This paper presents EPSOL, a novel deep learning architecture for predicting protein solubility in an E. coli expression system, which automatically obtains comprehensive protein feature representations using multidimensional embedding. EPSOL outperformed all existing sequence-based solubility predictors, achieving an accuracy of 0.79 and a Matthews correlation coefficient of 0.58. This higher performance permits large-scale screening for sequence variants with enhanced manufacturability, and it makes solubility predictions for new recombinant proteins in an E. coli expression system more reliable.
AVAILABILITY AND IMPLEMENTATION: EPSOL's best model and results can be downloaded from GitHub (https://github.com/LiangYu-Xidian/EPSOL).
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) (2021). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
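
The two metrics reported in the abstract, accuracy and the Matthews correlation coefficient (MCC), can both be computed from a binary confusion matrix. The following is a minimal sketch of those standard formulas, not code from the EPSOL repository; the function names and the example counts are hypothetical and are not the paper's data.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts for a soluble (positive) vs. insoluble (negative)
    # classifier; chosen only to illustrate the arithmetic.
    tp, tn, fp, fn = 40, 40, 10, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")
    print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when the soluble and insoluble classes are imbalanced.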

Year: 2021   PMID: 34145885   DOI: 10.1093/bioinformatics/btab463

Source DB: PubMed   Journal: Bioinformatics   ISSN: 1367-4803   Impact factor: 6.937


  13 in total

1.  Identification of Diagnostic Markers for Breast Cancer Based on Differential Gene Expression and Pathway Network.

Authors:  Shumei Zhang; Haoran Jiang; Bo Gao; Wen Yang; Guohua Wang
Journal:  Front Cell Dev Biol       Date:  2022-01-12

2.  Pseudo-188D: Phage Protein Prediction Based on a Model of Pseudo-188D.

Authors:  Xiaomei Gu; Lina Guo; Bo Liao; Qinghua Jiang
Journal:  Front Genet       Date:  2021-12-01       Impact factor: 4.599

Review 3.  Application of Sparse Representation in Bioinformatics.

Authors:  Shuguang Han; Ning Wang; Yuxin Guo; Furong Tang; Lei Xu; Ying Ju; Lei Shi
Journal:  Front Genet       Date:  2021-12-15       Impact factor: 4.599

Review 4.  DrugHybrid_BS: Using Hybrid Feature Combined With Bagging-SVM to Predict Potentially Druggable Proteins.

Authors:  Yuxin Gong; Bo Liao; Peng Wang; Quan Zou
Journal:  Front Pharmacol       Date:  2021-11-30       Impact factor: 5.810

Review 5.  Genomic Variation Prediction: A Summary From Different Views.

Authors:  Xiuchun Lin
Journal:  Front Cell Dev Biol       Date:  2021-11-25

6.  iAIPs: Identifying Anti-Inflammatory Peptides Using Random Forest.

Authors:  Dongxu Zhao; Zhixia Teng; Yanjuan Li; Dong Chen
Journal:  Front Genet       Date:  2021-11-30       Impact factor: 4.599

7.  KK-DBP: A Multi-Feature Fusion Method for DNA-Binding Protein Identification Based on Random Forest.

Authors:  Yuran Jia; Shan Huang; Tianjiao Zhang
Journal:  Front Genet       Date:  2021-11-29       Impact factor: 4.599

8.  A SNARE Protein Identification Method Based on iLearnPlus to Efficiently Solve the Data Imbalance Problem.

Authors:  Dong Ma; Zhihua Chen; Zhanpeng He; Xueqin Huang
Journal:  Front Genet       Date:  2022-01-28       Impact factor: 4.599

9.  The Characterization of Structure and Prediction for Aquaporin in Tumour Progression by Machine Learning.

Authors:  Zheng Chen; Shihu Jiao; Da Zhao; Quan Zou; Lei Xu; Lijun Zhang; Xi Su
Journal:  Front Cell Dev Biol       Date:  2022-02-01

10.  Identification of Helicobacter pylori Membrane Proteins Using Sequence-Based Features.

Authors:  Mujiexin Liu; Hui Chen; Dong Gao; Cai-Yi Ma; Zhao-Yue Zhang
Journal:  Comput Math Methods Med       Date:  2022-01-12       Impact factor: 2.238
