Jian Tian, Yaru Yan, Qingxia Yue, Xiaoqing Liu, Xiaoyu Chu, Ningfeng Wu, Yunliu Fan.
Abstract
Of the 20 common amino acids, 18 are encoded by multiple synonymous codons. These synonymous codons are not redundant; in fact, all codons contribute substantially to protein expression, structure and function. In this study, the codon usage pattern of E. coli genes was learned from sequenced E. coli genomes. A machine-learning-based method, Presyncodon, was proposed to predict synonymous codon selection in E. coli based on the learned codon usage pattern of each residue in the context of its specific fragment. The prediction results indicate that Presyncodon can predict synonymous codon selection for E. coli genes with high accuracy. Two reporter genes (egfp and mApple) were designed by the method with a combination of low- and high-frequency-usage codons. The fluorescence intensity of eGFP and mApple expressed from the genes designed by this method was about 2.3- and 1.7-fold greater, respectively, than that from genes with only high-frequency-usage codons in E. coli. Therefore, both low- and high-frequency-usage codons make positive contributions to the functional expression of heterologous proteins. This method could be used to design synthetic genes for heterologous gene expression in biotechnology.
Year: 2017 PMID: 28855614 PMCID: PMC5577221 DOI: 10.1038/s41598-017-10546-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Clustering results of the codon usage pattern of different species. The rows and columns represent the codon usage patterns and the different bacterial subspecies. The species between species id 5 and 6 (red arrow) is Bacteroides fragilis NCTC 9343. The numbers from 1 to 64 refer to the bacterial genera. 1 Helicobacter pylori, 2 Acetobacter pasteurianus, 3 Bacillus amyloliquefaciens, 4 Bacillus subtilis, 5 Zymomonas mobilis, 6 Alteromonas macleodii, 7 Lactobacillus plantarum, 8 Lactobacillus casei, 9 Lactobacillus rhamnosus, 10 Coxiella burnetii, 11 Mannheimia haemolytica, 12 Shewanella baltica, 13 Vibrio cholerae, 14 Yersinia pestis, 15 Acinetobacter baumannii, 16 Haemophilus influenzae, 17 Enterococcus faecalis, 18 Listeria monocytogenes, 19 Bacillus anthracis, Bacillus cereus or Bacillus thuringiensis, 20 Staphylococcus aureus, 21 Lactococcus lactis, 22 Streptococcus agalactiae, 23 Legionella pneumophila, 24 Mycoplasma hyopneumoniae, 25 Chlamydia trachomatis, 26 Chlamydophila pneumoniae, 27 Chlamydophila psittaci, 28 Lactobacillus reuteri, 29 Streptococcus dysgalactiae, 30 Streptococcus pyogenes, 31 Streptococcus pneumoniae, 32 Streptococcus suis, 33 Borrelia burgdorferi, 34 Prochlorococcus marinus, 35 Clostridium botulinum, 36 Candidatus Kinetoplastibacterium, 37 Francisella tularensis, 38 Campylobacter jejuni, 39 Rickettsia prowazekii, 40 Rickettsia rickettsii, 41 Wolbachia endosymbiont, 42 Mycoplasma gallisepticum, 43 Mycoplasma hyorhinis, 44 Brucella melitensis, 45 Corynebacterium glutamicum, 46 Propionibacterium acnes, 47 Corynebacterium diphtheriae, 48 Corynebacterium pseudotuberculosis, 49 Xylella fastidiosa, 50 Treponema pallidum, 51 Enterobacter cloacae, 52 Klebsiella pneumoniae, 53 Escherichia coli, 54 Salmonella enterica, 55 Neisseria meningitidis, 56 Burkholderia pseudomallei, 57 Bifidobacterium animalis, 58 Bifidobacterium longum, 59 Mycobacterium bovis, Mycobacterium canettii or Mycobacterium tuberculosis, 60 Rhodopseudomonas palustris, 61 Pseudomonas fluorescens or Pseudomonas aeruginosa, 62 Ralstonia solanacearum, 63 Pseudomonas putida, 64 Pseudomonas stutzeri.
Figure 2. The entropy of the codon usage pattern of the middle amino acid with different amino acid neighbors in E. coli. The x-axis represents the number of adjacent amino acids. The y-axis represents the average entropy of all codon usage patterns of the middle amino acid with the corresponding adjacent amino acids. The data were calculated from the 65 genomes of E. coli (Table S1).
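The entropy plotted in Figure 2 can be illustrated with a minimal sketch. It assumes Shannon entropy in bits and an unweighted mean over contexts; the source does not state the logarithm base or the averaging scheme, and the function names and codon observations below are made up for illustration.

```python
import math
from collections import Counter


def codon_entropy(codons):
    """Shannon entropy, in bits, of the codon distribution in `codons`."""
    counts = Counter(codons)
    total = sum(counts.values())
    # Sum of p * log2(1/p) over the observed codons.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def average_entropy(codons_by_context):
    """Unweighted mean entropy over contexts (context -> list of codons)."""
    entropies = [codon_entropy(c) for c in codons_by_context.values()]
    return sum(entropies) / len(entropies)


# Made-up leucine codon observations, for illustration only.
no_neighbors = {"L": ["CTG", "CTG", "TTA", "CTC"]}
two_neighbors = {"KLG": ["CTG", "CTG"], "ALS": ["TTA", "CTC"]}

print(average_entropy(no_neighbors))   # 1.5
print(average_entropy(two_neighbors))  # 0.5
```

In this toy case the average entropy drops once the neighbors are taken into account, which is the kind of trend the figure is described as showing.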
Figure 3. The codon usage patterns of leucine (L), arginine (R) and serine (S) in specific fragments of E. coli. All genes of E. coli were divided into five-codon windows. Identical amino acid fragments were merged, and the codon usage bias of the middle amino acid (L, R and S) in the fragment was calculated. Each row represents the codon usage bias of the middle amino acid (L, R or S) in an amino acid fragment of five residues. Each column represents a codon encoding the target amino acid. The color scale from blue to red represents the usage frequency of the codon.
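The windowing step described for Figure 3 can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes the windows slide one codon at a time (the source does not say whether they overlap), and the function names, the partial codon table, and the two toy genes are hypothetical.

```python
from collections import Counter, defaultdict

# Partial codon table: only the codons that occur in the toy genes below
# (standard genetic code).
CODON_TO_AA = {"ATG": "M", "AAA": "K", "CTG": "L", "TTA": "L", "GGT": "G", "GCA": "A"}


def window_codon_usage(genes, codon_to_aa, window=5):
    """Map each amino acid fragment to counts of the codon at its middle position."""
    half = window // 2
    usage = defaultdict(Counter)
    for cds in genes:
        # Split the coding sequence into codons, ignoring any trailing partial codon.
        codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
        residues = [codon_to_aa[c] for c in codons]
        for mid in range(half, len(codons) - half):
            fragment = "".join(residues[mid - half:mid + half + 1])
            # Identical fragments from different genes are merged here.
            usage[fragment][codons[mid]] += 1
    return usage


def frequencies(counter):
    """Convert codon counts into relative usage frequencies."""
    total = sum(counter.values())
    return {codon: n / total for codon, n in counter.items()}


# Two made-up genes that both encode M-K-L-G-A but differ in the leucine codon.
toy_genes = ["ATGAAACTGGGTGCA", "ATGAAATTAGGTGCA"]
usage = window_codon_usage(toy_genes, CODON_TO_AA)
print(frequencies(usage["MKLGA"]))  # {'CTG': 0.5, 'TTA': 0.5}
```

Each key of `usage` corresponds to one row of the heat map, and the normalized counts correspond to the colors in that row.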
Figure 4. The prediction performance of the 18 classifiers for the 18 amino acids with different matched cutoffs and window sizes ((A) five amino acids, (B) seven amino acids) in E. coli. The x-axis is the matched percent and the y-axis is the prediction accuracy of the 18 classifiers. Each open circle represents the prediction accuracy of one of the 18 classifiers. The horizontal divisions (from top to bottom) in each box are the upper whisker, 3rd quartile, median, 1st quartile and lower whisker, respectively. The cross line in each box is the mean prediction accuracy of all 18 classifiers. All of the results were calculated based on ten-fold cross-validation.
Figure 5. Fluorescence intensity of E. coli containing the reporter genes (egfp or mApple). The reporter genes egfp-codon and mApple-codon were designed based on the model in this study. In the genes egfp-genscript and mApple-genscript, most of the low-frequency-usage codons were changed to high-frequency-usage codons of E. coli using GenScript software. Each strain harboring the corresponding expression plasmid was grown in auto-induction medium containing 50 μg/mL kanamycin. Data are averages of ten independent experiments. The error bars represent the standard error.
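For readers unfamiliar with the error bars, the mean and standard error of a set of replicate readings can be computed as in the sketch below. It assumes the standard error is the sample standard deviation divided by the square root of the number of replicates; the function name is hypothetical and the readings are made-up numbers, not data from the study.

```python
import math
import statistics


def mean_and_sem(values):
    """Return the mean and the standard error of the mean of `values`."""
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


# Made-up fluorescence readings, for illustration only.
readings = [10.0, 12.0, 14.0]
mean, sem = mean_and_sem(readings)
print(mean, round(sem, 3))
```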