Shuichi Hirose, Tamotsu Noguchi.
Abstract
Recombinant protein technology is essential for conducting protein science and using proteins as materials in pharmaceutical or industrial applications. Although obtaining soluble proteins is still a major experimental obstacle, knowledge about protein expression/solubility under standard conditions may increase the efficiency and reduce the cost of proteomics studies. In this study, we present a computational approach to estimate the probability of protein expression and solubility for two different protein expression systems: in vivo Escherichia coli and wheat germ cell-free, from only the sequence information. It implements two kinds of methods: a sequence/predicted structural property-based method that uses both the sequence and predicted structural features, and a sequence pattern-based method that utilizes the occurrence frequencies of sequence patterns. In the benchmark test, the proposed methods obtained F-scores of around 70%, and outperformed publicly available servers. Applying the proposed methods to genomic data revealed that proteins associated with translation or transcription have a strong tendency to be expressed as soluble proteins by the in vivo E. coli expression system. The sequence pattern-based method also has the potential to indicate a candidate region for modification, to increase protein solubility. All methods are available for free at the ESPRESSO server (http://mbs.cbrc.jp/ESPRESSO).
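As an illustration only, the two quantities mentioned in the abstract can be sketched in a few lines of Python. This is not the ESPRESSO implementation: the function names `pattern_frequencies` and `f_score`, the use of fixed-length substrings as "sequence patterns", and the choice of the balanced F1 form of the F-score are all assumptions made for this sketch, since the abstract does not specify them.

```python
from collections import Counter


def pattern_frequencies(sequence: str, k: int = 3) -> dict[str, float]:
    """Relative occurrence frequency of each length-k substring.

    Assumption: a "sequence pattern" is modelled here as a plain k-mer;
    the abstract does not define the patterns actually used.
    """
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    total = len(windows)
    if total == 0:
        # Sequence shorter than k: no patterns to count.
        return {}
    counts = Counter(windows)
    return {pattern: n / total for pattern, n in counts.items()}


def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F-score (F1): harmonic mean of precision and recall.

    Assumption: the reported F-score is F1; tp, fp, fn are counts of
    true positives, false positives and false negatives.
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, a hypothetical benchmark with 7 true positives, 3 false positives and 3 false negatives gives precision and recall of 0.7 each, so `f_score(7, 3, 3)` evaluates to about 0.7, the same order as the roughly 70% reported above.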
Year: 2013 PMID: 23436767 DOI: 10.1002/pmic.201200175
Source DB: PubMed Journal: Proteomics ISSN: 1615-9853 Impact factor: 3.984