Learning to predict expression efficacy of vectors in recombinant protein production.

Wen-Ching Chan, Po-Huang Liang, Yan-Ping Shih, Ueng-Cheng Yang, Wen-chang Lin, Chun-Nan Hsu.

Abstract

BACKGROUND: Recombinant protein production is a useful biotechnology for producing large quantities of highly soluble proteins. Currently, the most widely used production approach is to fuse a target protein into different vectors in Escherichia coli (E. coli). However, the production efficacy of a vector varies across target proteins, and trial and error is still the common practice for finding out how well a vector works for a given target protein. Previous studies are limited in that they assumed that proteins would be over-expressed and focused only on the solubility of the expressed proteins. In fact, many pairings of vectors and proteins result in no expression.
RESULTS: In this study, we applied machine learning to train models that predict whether a vector-protein pairing will be expressed in E. coli. For expressed cases, the models further predict whether the expressed protein will be soluble. We collected a set of real cases from the clients of our recombinant protein production core facility, where six different vectors were designed and studied. This set of cases is used for both training and evaluation of our models. We evaluate three different models based on support vector machines (SVMs) and their ensembles. Unlike many previous works, these models use the sequence of the target protein as well as the sequence of the whole fusion vector as features. We show that a model that classifies a case directly into one of the three classes (no expression, inclusion body, and soluble) outperforms a model that considers the nested structure of the three classes, while a model that can take advantage of the hierarchical structure of the three classes performs slightly worse than, but comparably to, the best model. Compared with previous works, our best method still achieves the highest prediction accuracy. Lastly, we briefly present two ways to use the trained model in the design of recombinant protein production systems to improve the chance of producing highly soluble protein.
CONCLUSION: In this paper, we show that a machine learning approach to predicting the efficacy of a vector for a target protein in a recombinant protein production system is promising and may complement traditional knowledge-driven study of the efficacy. We will release our program in the public domain to share with other labs when this paper is published.
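
The difference between the flat three-class model and the nested model described in RESULTS can be sketched as follows. This is only an illustration, not the authors' implementation: the abstract does not specify the sequence encoding, so amino acid composition is assumed here, and a simple nearest-centroid classifier stands in for the SVMs so that the sketch needs no third-party library. All function and class names are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
LABELS = ("no expression", "inclusion body", "soluble")


def composition(seq):
    """Fraction of each of the 20 standard amino acids in seq."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]


def features(vector_seq, target_seq):
    """Assumed encoding: target protein alone, plus the whole fusion."""
    return composition(target_seq) + composition(vector_seq + target_seq)


class NearestCentroid:
    """Stand-in classifier (NOT an SVM): predicts the closest class mean."""

    def fit(self, X, y):
        counts = Counter(y)
        sums = {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
        self.centroids = {
            label: [v / counts[label] for v in total]
            for label, total in sums.items()
        }
        return self

    def predict_one(self, x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))
        return min(self.centroids, key=sq_dist)


def fit_flat(X, y):
    """Flat model: one classifier chooses among all three classes."""
    model = NearestCentroid().fit(X, y)
    return model.predict_one


def fit_nested(X, y):
    """Nested model: expressed or not first, then soluble or inclusion body."""
    stage1 = NearestCentroid().fit(X, [lab != "no expression" for lab in y])
    expressed = [(x, lab) for x, lab in zip(X, y) if lab != "no expression"]
    stage2 = NearestCentroid().fit(
        [x for x, _ in expressed], [lab for _, lab in expressed]
    )

    def predict(x):
        if not stage1.predict_one(x):
            return "no expression"
        return stage2.predict_one(x)

    return predict
```

Both `fit_flat` and `fit_nested` return a function that maps one feature vector to one of the three labels, so the two designs can be compared on the same held-out cases. The nested variant trains its second stage only on expressed cases, which mirrors the structure the abstract describes, where solubility is predicted only for pairings that express.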

Year:  2010        PMID: 20122193      PMCID: PMC3009492          DOI: 10.1186/1471-2105-11-S1-S21

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


