Bevan Kai-Sheng Chung, Dong-Yup Lee.
Abstract
BACKGROUND: The construction of customized nucleic acid sequences allows us to have greater flexibility in gene design for recombinant protein expression. Among the various parameters considered for such DNA sequence design, individual codon usage (ICU) has been implicated as one of the most crucial factors affecting mRNA translational efficiency. However, previous works have also reported the significant influence of codon pair usage, also known as codon context (CC), on the level of protein expression.
Year: 2012 PMID: 23083100 PMCID: PMC3495653 DOI: 10.1186/1752-0509-6-134
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Figure 1. Multi-objective codon optimization solution. The optimal solutions generated by MOCO lie on the Pareto front (region in yellow).
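The idea of solutions lying on a Pareto front can be illustrated with a minimal sketch. It assumes each candidate sequence has already been scored on two objectives, here called `icu_fitness` and `cc_fitness` (hypothetical names), and that higher is better for both. The scores are made-up numbers, and the filter is a generic non-dominated sort, not necessarily the procedure MOCO uses.

```python
from typing import List, Tuple

# (icu_fitness, cc_fitness): hypothetical scores, both assumed "higher is better".
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative candidates only; not data from the paper.
candidates = [(0.9, 0.2), (0.7, 0.6), (0.4, 0.8), (0.5, 0.5), (0.3, 0.3)]
front = pareto_front(candidates)
print(front)
```

Under these assumptions, a candidate such as `(0.5, 0.5)` is discarded because `(0.7, 0.6)` is better on both objectives, while the three remaining candidates each represent a different trade-off.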
Figure 2. General codon optimization workflow. In the codon optimization step, either ICO, CCO or MOCO can be used to optimize the sequence.
ICU and CC biasness analysis
| Null hypothesis ( | ||||||||
| Alternative hypothesis ( | ||||||||
| No. of biased amino acids (P-value < 0.05) | 18 | 17 | 19 | 17 | 18 | 19 | 18 | 19 |
| No. of unbiased amino acids (P-value ≥ 0.05) | 1 | 2 | 0 | 2 | 1 | 0 | 1 | 0 |
| No. of singular amino acids | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| No. of unevaluated amino acids (Expected count < 5) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total no. of amino acids | 21 | 21 | 21 | 21 | 21 | 21 | 21 | 21 |
| No. of biased amino acid pairs (P-value < 0.05) | 314 | 99 | 327 | 15 | 354 | 259 | 372 | 282 |
| No. of unbiased amino acid pairs (P-value ≥ 0.05) | 26 | 23 | 12 | 65 | 38 | 36 | 19 | 9 |
| No. of singular amino acid pairs | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| No. of unevaluated amino acid pairs (Expected count < 5) | 76 | 294 | 77 | 336 | 24 | 121 | 25 | 125 |
| Total no. of amino acid pairs | 420 | 420 | 420 | 420 | 420 | 420 | 420 | 420 |
The chi-squared statistic is computed based on the observed occurrence of each codon (pair) and the expected occurrence under the null hypothesis of uniform distribution. Any amino acid (pair) with p-value < 0.05 is considered to exhibit significantly biased codon (pair) usage. Singular amino acids (methionine and tryptophan) and singular amino acid pairs (pairs only consisting of methionine and/or tryptophan) are not amenable to the biasness analysis since they are not encoded by more than one synonymous codon (pair). Chi-squared statistic and p-value are not calculated for amino acid (pair) with expected counts less than 5 (see Materials and Methods for details). Abbreviations: D, codon (pair) distribution of all genes in the genome; D, codon (pair) distribution of high-expression genes; U, uniform distribution.
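The test against the uniform null hypothesis can be sketched as follows. This is a minimal illustration, assuming a chi-squared goodness-of-fit test with (number of synonymous codons − 1) degrees of freedom. The function names, the `min_expected` parameter and the codon counts are hypothetical, and the authors' actual procedure is described in their Materials and Methods.

```python
import math
from typing import Dict, Optional, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable with integer df.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    starting from Q(1, y) = exp(-y) or Q(1/2, y) = erfc(sqrt(y)).
    """
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def codon_bias_test(
    counts: Dict[str, int], min_expected: float = 5.0
) -> Optional[Tuple[float, float]]:
    """Return (statistic, p-value), or None when the test is not applicable."""
    k = len(counts)
    if k < 2:
        return None  # singular amino acid (pair): only one codon (pair)
    expected = sum(counts.values()) / k  # uniform null: equal expected counts
    if expected < min_expected:
        return None  # left unevaluated, as in the table
    stat = sum((obs - expected) ** 2 / expected for obs in counts.values())
    return stat, chi2_sf(stat, k - 1)


# Hypothetical leucine codon counts, for illustration only.
leu = {"CTG": 60, "CTA": 5, "CTC": 10, "CTT": 10, "TTA": 10, "TTG": 25}
stat, p_value = codon_bias_test(leu)
print(f"chi2 = {stat:.1f}, biased = {p_value < 0.05}")
```

The same function applies to codon pairs if the dictionary keys are synonymous codon pairs for one amino acid pair rather than single codons.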
Figure 3. PCA of ICU and CC distributions. The first and second principal components (PC1 and PC2) are plotted to show the differences in the ICU and CC distributions of (top 5%) high-expression genes (H), (bottom 5%) low-expression genes (L) and all genes (A) found in the genomes of E. coli (EC), L. lactis (LL), P. pastoris (PP) and S. cerevisiae (SC). The unbiased distribution (U) is also included for each plot as reference.
Figure 4. Codon optimization validation. The in silico cross-validation of the optimization procedures is performed according to the presented workflow.
Tournament matrix
| | xICO | xCCO | xMOCO | xRCA |
| xICO | | 7 | 19 | 95 |
| | | 2 | 18 | 99 |
| | | 4 | 15 | 93 |
| | | 5 | 22 | 99 |
| xCCO | 92 | | 82 | 97 |
| | 96 | | 93 | 100 |
| | 96 | | 86 | 100 |
| | 93 | | 89 | 99 |
| xMOCO | 78 | 15 | | 97 |
| | 74 | 5 | | 100 |
| | 83 | 12 | | 99 |
| | 75 | 9 | | 99 |
| xRCA | 5 | 2 | 3 | |
| | 0 | 0 | 0 | |
| | 6 | 0 | 1 | |
| | 1 | 0 | 0 | |
For every gene, the p of the optimal sequences generated by the respective optimization approaches are compared pairwise for each expression host. The numbers of tournament wins/losses by each approach over all the genes in each expression host are then added up. The sequences generated by ICO, CCO, MOCO and RCA are indicated as xICO, xCCO, xMOCO and xRCA, respectively. In each cell, the numbers from top to bottom correspond to the data for E. coli, L. lactis, P. pastoris and S. cerevisiae, respectively.
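The pairwise tally described above can be sketched as follows for a single expression host. This is a minimal illustration that assumes a higher per-gene score wins a comparison and that ties count for neither side. The function name and the scores are hypothetical, not values from the paper.

```python
from itertools import permutations
from typing import Dict, List, Tuple


def tournament_matrix(scores: Dict[str, List[float]]) -> Dict[Tuple[str, str], int]:
    """Count, for each ordered pair (a, b), the genes on which a beats b.

    scores maps an approach name to its per-gene scores, with genes in the
    same order for every approach. Assumption: higher score wins.
    """
    wins: Dict[Tuple[str, str], int] = {}
    for a, b in permutations(scores, 2):
        wins[(a, b)] = sum(sa > sb for sa, sb in zip(scores[a], scores[b]))
    return wins


# Hypothetical scores for four genes in one host, for illustration only.
scores = {
    "xICO": [0.2, 0.5, 0.4, 0.3],
    "xCCO": [0.9, 0.4, 0.8, 0.3],
    "xRCA": [0.1, 0.1, 0.1, 0.1],
}
wins = tournament_matrix(scores)
for (a, b), n in wins.items():
    print(f"{a} beats {b} on {n} gene(s)")
```

Because ties are not counted, the two entries for a pair of approaches need not sum to the number of genes.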