Jian Tian, Qingbin Li, Xiaoyu Chu, Ningfeng Wu.
Abstract
In a natural host, most of the synonymous codons of a gene have been evolutionarily selected and are related to protein expression and function. When designing a new gene, however, most existing codon optimization tools select high-frequency-usage codons and neglect the contribution of low-frequency-usage codons (rare codons) to the expression of the target gene in the host. In this study, we developed Presyncodon, a method available as a web server, to predict the coding sequence of a gene from a protein sequence using built-in evolutionary information for a specific expression host. The synonymous codon-usage patterns of peptides were studied in three genomic datasets (Escherichia coli, Bacillus subtilis, and Saccharomyces cerevisiae). Machine-learning models were constructed to predict the selection of synonymous codons (low- or high-frequency-usage codon) in a gene. The method can be used to design new genes easily and efficiently from protein sequences for optimal expression in the three expression hosts (E. coli, B. subtilis, and S. cerevisiae). Presyncodon is free to academic and noncommercial users and is accessible at http://www.mobioinfor.cn/presyncodon_www/index.html.
Keywords: codon optimization; expression host; gene design; presyncodon; web server
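To make the terms in the abstract concrete, the following is a minimal sketch of how synonymous codon usage can be tabulated from a set of coding sequences and how a codon can be labelled as high- or low-frequency. It is not the Presyncodon implementation. The function names, the labelling rule (a codon counts as "low" when its share falls below the share expected under uniform usage of its synonymous family), and the `-` padding of peptide windows are all assumptions made for illustration. The window sizes of five and seven residues are the ones named in the Figure 2 caption.

```python
from collections import Counter, defaultdict

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

# Group codons into synonymous families, keyed by amino acid ("*" = stop).
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    SYNONYMS[aa].append(codon)

# Amino acids with more than one codon; Met and Trp have a single codon each,
# which leaves 18 amino acids for which a codon choice exists.
MULTI_CODON_AA = sorted(
    aa for aa, codons in SYNONYMS.items() if aa != "*" and len(codons) > 1
)


def split_codons(gene):
    """Split a coding sequence into codons, ignoring a trailing partial codon."""
    usable = len(gene) - len(gene) % 3
    return [gene[i:i + 3] for i in range(0, usable, 3)]


def relative_usage(genes):
    """Fraction of each codon within its synonymous family across all genes."""
    counts = Counter(c for g in genes for c in split_codons(g.upper()))
    usage = {}
    for codons in SYNONYMS.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            usage[c] = counts[c] / total if total else 0.0
    return usage


def frequency_class(codon, usage):
    """Hypothetical rule: 'low' if used less often than under uniform usage."""
    family_size = len(SYNONYMS[CODON_TABLE[codon]])
    return "high" if usage[codon] >= 1.0 / family_size else "low"


def peptide_windows(protein, size=5):
    """Amino-acid window centred on each residue, padded with '-' at the ends."""
    half = size // 2
    padded = "-" * half + protein + "-" * half
    return [padded[i:i + size] for i in range(len(protein))]
```

In a setting like the one the abstract describes, each peptide window would serve as the input to a per-amino-acid classifier and the frequency class of the central residue's codon would be the label. The real feature encoding and class definition used by Presyncodon are not given in this record.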
Year: 2018; PMID: 30518113; PMCID: PMC6321224; DOI: 10.3390/ijms19123872
Source DB: PubMed; Journal: Int J Mol Sci; ISSN: 1422-0067; Impact factor: 5.923
Figure 1. Flowchart summarizing the Presyncodon approach.
Figure 2. Prediction performance of the 18 classifiers for the 18 amino acids at different matched cut-offs and window sizes (left: five amino acids; right: seven amino acids) in E. coli, B. subtilis, and S. cerevisiae. The x-axis is the matched percentage and the y-axis is the prediction accuracy of the 18 classifiers. Each open circle represents the prediction accuracy of one of the 18 classifiers. The horizontal divisions in each box are, from top to bottom, the upper whisker, 3rd quartile, median, 1st quartile, and lower whisker. The cross line in each box is the mean prediction accuracy of all 18 classifiers. All results were calculated using ten-fold cross-validation.
Figure 3. Screenshot of the web version of Presyncodon.
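The ten-fold cross-validation and the box statistics mentioned in the Figure 2 caption can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' evaluation code: `train` and `predict` are hypothetical callables supplied by the caller, the fold assignment is a simple seeded shuffle, and the quartiles use the default method of Python's `statistics.quantiles`, which may differ from the convention used to draw the published figure.

```python
import random
import statistics


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_accuracy(samples, labels, train, predict, k=10, seed=0):
    """Overall accuracy when each fold is held out once and the rest is used for training."""
    correct = 0
    for held_out in k_fold_indices(len(samples), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        model = train(
            [samples[i] for i in train_idx],
            [labels[i] for i in train_idx],
        )
        correct += sum(predict(model, samples[i]) == labels[i] for i in held_out)
    return correct / len(samples)


def box_summary(accuracies):
    """Quartiles and mean of per-classifier accuracies, as summarized in a box plot."""
    q1, median, q3 = statistics.quantiles(accuracies, n=4)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "mean": statistics.fmean(accuracies),
    }
```

With one accuracy value per classifier (18 values in the setting of Figure 2), `box_summary` gives the numbers that each box and cross line would represent.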