| Literature DB >> 24278168 |
Jian Tian, Yuhong Zhang, Bo Liu, Dongyang Zuo, Tao Jiang, Jun Guo, Wei Zhang, Ningfeng Wu, Yunliu Fan.
Abstract
Pichia pastoris is commonly used for the production of recombinant proteins because it preferentially secretes them, which lowers production costs and increases yields of the target proteins. However, not all recombinant proteins can be successfully secreted in P. pastoris. A computational method that predicts the likelihood of a protein being secreted into the supernatant would therefore be of considerable value; however, to the best of our knowledge, no such tool has yet been developed. We present a machine-learning approach called Presep that assesses the likelihood of a recombinant protein being secreted by P. pastoris based on its pseudo amino acid composition (PseAA). In 20-fold cross-validation, Presep achieved a Matthews correlation coefficient (MCC) of 0.78 and an overall accuracy (Q2) of 95%. The computational results were validated experimentally by expressing six β-galactosidase genes in P. pastoris strain GS115 and comparing their secretion with the Presep predictions. A strong correlation (R² = 0.967) was observed between the secretion propensity predicted by Presep and the experimentally measured secretion percentage. Together, these results demonstrate that Presep can predict how likely a given protein is to be secreted by P. pastoris. The model may serve as a valuable tool for assessing whether P. pastoris is a suitable host before biological experiments begin. The Presep prediction tool can be freely downloaded at http://www.mobioinfor.cn/Presep.
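The abstract reports performance from 20-fold cross-validation. As a rough illustration of how such folds can be built, the sketch below deals each class out across k folds so that every test fold keeps a similar ratio of secreted to non-secreted proteins. Stratification, the seed, and the helper name `stratified_kfold` are assumptions made for illustration; the source does not describe how the Presep folds were drawn.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=20, seed=0):
    """Hypothetical helper: split sample indices into k folds with similar class ratios.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class out round-robin so every fold gets a similar share of it.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    all_idx = set(range(len(labels)))
    splits = []
    for test in folds:
        # Every sample not in this fold's test set is used for training.
        train = sorted(all_idx - set(test))
        splits.append((train, sorted(test)))
    return splits
```

Each sample lands in exactly one test fold, so pooling the 20 held-out predictions gives one prediction per protein, from which MCC and Q2 can be computed.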
Year: 2013 PMID: 24278168 PMCID: PMC3836778 DOI: 10.1371/journal.pone.0079749
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The effect of parameter settings on the prediction performance of Presep for Type I (A) and Type II (B) PseAAC modes.
The x-axis represents the weight factor (w), and the y-axis represents the lambda parameter (λ). Colour changes indicate differences in the Matthews correlation coefficient (MCC).
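To show what the weight factor w and the lambda parameter λ control, here is a minimal sketch of a Type I PseAAC encoding, assuming Presep's Type I mode follows Chou's original parallel-correlation formulation (the source does not give the formulas). The vector has 20 + λ components: 20 amino-acid frequencies, plus λ sequence-order correlation factors θ_j scaled by w, all normalised by the same denominator. The function name `pseaac_type1` is hypothetical, and the caller must supply the physicochemical scales (Chou's formulation typically uses hydrophobicity, hydrophilicity and side-chain mass); no real scale values are included here.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop):
    """Z-score a property dict over the 20 standard amino acids (population SD)."""
    values = [prop[a] for a in AMINO_ACIDS]
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return {a: (prop[a] - mean) / sd for a in AMINO_ACIDS}


def pseaac_type1(seq, properties, lam=19, w=0.05):
    """Hypothetical Chou-style Type I PseAAC vector of length 20 + lam.

    properties: list of dicts mapping each standard amino acid to a raw value.
    seq must contain only the 20 standard residues.
    """
    seq = seq.upper()
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lambda")
    props = [standardize(p) for p in properties]

    # Normalised occurrence frequency of each amino acid (sums to 1).
    freqs = [seq.count(a) / n for a in AMINO_ACIDS]

    # theta_j: mean squared property difference between residues j positions apart.
    thetas = []
    for j in range(1, lam + 1):
        total = 0.0
        for i in range(n - j):
            a, b = seq[i], seq[i + j]
            total += sum((p[b] - p[a]) ** 2 for p in props) / len(props)
        thetas.append(total / (n - j))

    # A larger w gives the sequence-order terms more weight relative to composition.
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]
```

With the setting w = 0.05 and λ = 19 used elsewhere in this record, each sequence maps to a 39-dimensional vector whose components sum to 1. Type II (series-correlation) encoding is not covered by this sketch.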
Prediction performance of Presep with different parameters.
| Coding type | w | λ | MCC | Q2 | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| I | 0.05 | 19 | 0.78 | 0.95 | 0.82 | 0.97 |
| II | 0.05 | 20 | 0.78 | 0.95 | 0.78 | 0.98 |
| I | 0.05 | 20 | 0.78 | 0.95 | 0.82 | 0.97 |
| I | 0.35 | 20 | 0.78 | 0.95 | 0.82 | 0.97 |
| I | 0.5 | 16 | 0.78 | 0.95 | 0.82 | 0.97 |
| II | 0.45 | 2 | 0.78 | 0.95 | 0.83 | 0.97 |
| I | 0.25 | 19 | 0.77 | 0.95 | 0.81 | 0.97 |
| I | 0.45 | 20 | 0.77 | 0.95 | 0.81 | 0.97 |
| II | 0.15 | 4 | 0.77 | 0.95 | 0.82 | 0.97 |
| II | 0.2 | 3 | 0.77 | 0.95 | 0.82 | 0.97 |
Results are based on the Secreprot dataset with 20-fold cross-validation.
w, weight factor; λ, lambda parameter; MCC, Matthews correlation coefficient.
Q2, overall prediction accuracy.
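The four columns of the table can all be derived from a 2×2 confusion matrix. The sketch below uses the standard definitions, assuming "secreted" is the positive class (the source does not say which class counts as positive); the helper name `binary_metrics` is hypothetical. MCC is worth reporting next to Q2 when classes are unbalanced: if the rounded table values are taken at face value, 0.82p + 0.97(1 − p) = 0.95 gives p ≈ 0.13, suggesting that only about 13% of the proteins belong to the positive class, so a high Q2 alone would say little about how well that class is recognised.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Hypothetical helper: sensitivity, specificity, Q2 and MCC from confusion counts."""
    total = tp + tn + fp + fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    q2 = (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "q2": q2, "mcc": mcc}


# Made-up counts for illustration only, not taken from the Secreprot dataset.
print(binary_metrics(tp=8, tn=85, fp=5, fn=2))
```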
Figure 2. The average prediction accuracy, calculated cumulatively over predictions with an RI above a given value.
This result was obtained using random forest (RF) with 20-fold cross-validation.
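The cumulative curve in Figure 2 can be computed by keeping only predictions whose RI meets a cutoff and measuring accuracy on that subset. The sketch below takes RI values as given, because the record does not state how RI is computed; it also assumes an inclusive cutoff (RI ≥ t), since the caption's "above" may be meant strictly. The helper name `cumulative_accuracy` is hypothetical.

```python
def cumulative_accuracy(ri_values, correct, thresholds):
    """Hypothetical helper: accuracy over predictions whose RI is >= each threshold.

    ri_values: reliability value of each prediction (computed elsewhere).
    correct: booleans, True when the prediction matched the true label.
    Returns {threshold: (accuracy, fraction_of_predictions_kept)}; thresholds
    that keep no predictions are omitted.
    """
    results = {}
    n = len(ri_values)
    for t in thresholds:
        kept = [c for r, c in zip(ri_values, correct) if r >= t]
        if kept:
            results[t] = (sum(kept) / len(kept), len(kept) / n)
    return results
```

Reporting the kept fraction alongside accuracy shows the trade-off: raising the cutoff usually improves accuracy but leaves fewer proteins with a confident prediction.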
Figure 3. Prediction performance of Presep with different protein lengths.
The x-axis represents the number of residues, taken from the N- or C-terminus, that were used to predict secretion propensity in the dataset. The y-axis represents prediction performance, evaluated by the Matthews correlation coefficient (MCC). A Type I PseAAC coding scheme was used with a weight factor (w) of 0.05 and a lambda parameter (λ) of 19. Results are based on the Secreprot dataset with 20-fold cross-validation.
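Truncating sequences to a terminal fragment is simple slicing, but the PseAAC setting constrains how short a fragment can be. With λ = 19, the last correlation factor θ_19 needs at least one pair of residues 19 positions apart, so a fragment must contain at least 20 residues. The hypothetical helper below enforces that constraint; the source does not state how fragments shorter than this were handled.

```python
def terminal_fragment(seq, n_residues, terminus="N", lam=19):
    """Hypothetical helper: keep only the first (N) or last (C) n_residues of seq.

    Raises ValueError if the fragment is too short for a PseAAC with this lambda,
    because theta_lam needs at least one residue pair lam positions apart.
    """
    if terminus == "N":
        fragment = seq[:n_residues]
    elif terminus == "C":
        # seq[-0:] would return the whole sequence, so guard the zero case.
        fragment = seq[-n_residues:] if n_residues > 0 else ""
    else:
        raise ValueError("terminus must be 'N' or 'C'")
    if len(fragment) <= lam:
        raise ValueError(f"fragment of {len(fragment)} residues is too short for lambda={lam}")
    return fragment
```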
Predicted secretion propensity and experimental results for the six β-galactosidases.
| Protein | Predicted propensity: no secretion | Predicted propensity: secretion | Experiment: intracellular (%) | Experiment: extracellular (%) |
|---|---|---|---|---|
| LacB | 0.10 | 0.90 | 7.7±1.3 | 92.3±0.3 |
| CelB | 0.65 | 0.35 | 93.0±2.2 | 7.0±2.2 |
| BglKL | 0.50 | 0.50 | 69.8±3.3 | 30.2±3.3 |
| BglZQ | 0.82 | 0.18 | 99.6±0.1 | 0.4±0.1 |
| GalC168 | 0.77 | 0.23 | 94.2±3.8 | 5.8±3.8 |
| BG42–106 | 0.77 | 0.23 | 99.8±0.1 | 0.2±0.1 |
The six β-galactosidase proteins were encoded using the Type I PseAAC mode with a weight factor (w) of 0.05 and a lambda parameter (λ) of 19.
Figure 4. Correlation between the secretion propensity predicted by Presep and the experimentally determined extracellular percentage (%).
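As a check on Figure 4, the correlation can be recomputed from the six rows of the β-galactosidase table. The sketch assumes a simple linear least-squares fit, for which R² equals the squared Pearson correlation; the source does not say exactly how its fit was computed, so the result should land close to, but not necessarily exactly at, the reported R² = 0.967. The helper name `r_squared` is hypothetical.

```python
def r_squared(x, y):
    """Hypothetical helper: R² of a least-squares line, i.e. the squared Pearson r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Predicted secretion propensity and mean extracellular percentage for LacB,
# CelB, BglKL, BglZQ, GalC168 and BG42-106, copied from the table above.
predicted = [0.90, 0.35, 0.50, 0.18, 0.23, 0.23]
extracellular = [92.3, 7.0, 30.2, 0.4, 5.8, 0.2]
print(round(r_squared(predicted, extracellular), 3))
```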