Barbara Maertens, Anne Spriestersbach, Uritza von Groll, Udo Roth, Jan Kubicek, Michael Gerrits, Marcus Graf, Michael Liss, Daniela Daubert, Ralf Wagner, Frank Schäfer.
Abstract
The genetic code is universal, but recombinant protein expression in heterologous systems is often hampered by divergent codon usage. Here, we demonstrate that reprogramming by standardized multi-parameter gene optimization software and de novo gene synthesis is a suitable general strategy to improve heterologous protein expression. This study compares expression levels of 94 full-length human wt and sequence-optimized genes coding for pharmaceutically important proteins such as kinases and membrane proteins in E. coli. Fluorescence-based quantification revealed increased protein yields for 70% of in vivo expressed optimized genes compared to the wt DNA sequences and also resulted in increased amounts of protein that can be purified. The improvement in transgene expression correlated with higher mRNA levels in our analyzed examples. In all cases tested, expression levels using wt genes in tRNA-supplemented bacterial strains were outperformed by optimized genes expressed in non-supplemented host cells.
Year: 2010 PMID: 20506237 PMCID: PMC2970903 DOI: 10.1002/pro.408
Source DB: PubMed Journal: Protein Sci ISSN: 0961-8368 Impact factor: 6.725
Side-by-Side Comparison of 100 wt and Sequence-Optimized Human Genes
| Ref_seq. | Name | Protein size (kDa) | Wild-type construct correct and commercially available | Optimized construct available upon de novo synthesis | ratio total expression (opt/wt) |
|---|---|---|---|---|---|
| NM_002648 | Pim-1 oncogene (PIM1) | 35.6 | ✓ | ✓ | 1.65 ▴ |
| NM_006875 | Pim-2 oncogene (PIM2) | 34.2 | ✓ | ✓ | 1.29 ▴ |
| NM_001001852 | Pim-3 oncogene (PIM3) | 35.8 | ✓ | 5.79 ▴ | |
| NM_003668 | Mitogen-activated protein kinase-activated protein kinase 5 (MAPKAPK5) | 54.2 | ✓ | ✓ | 2.77▴ |
| NM_025195 | Tribbles homolog 1 (Drosophila) (TRIB1) | 41 | ✓ | ✓ | 0.83▾ |
| NM_004972 | Janus kinase 2 (JAK2) | 130 | ✓ | 11.51▴ | |
| NM_002037 | FYN oncogene related to SRC, FGR, YES (FYN) | 60.7 | ✓ | ✓ | 0.96▸ |
| NM_002110 | Hemopoietic cell kinase (HCK) | 59.6 | ✓ | 1.21▴ | |
| NM_005356 | Lymphocyte-specific protein tyrosine kinase (LCK) | 58 | ✓ | 0.41▾ | |
| NM_002011 | Fibroblast growth factor receptor 4 (FGFR4) | 87.9 | ✓* | / | |
| NM_002019 | Fms-related tyrosine kinase 1 (FLT1) | 150.7 | ✓ | 0 | |
| NM_005163 | v-akt murine thymoma viral oncogene homolog 1 (AKT1) | 55.6 | ✓ | ✓ | 0.10▾ |
| NM_003161 | Ribosomal protein S6 kinase, 70kDa, polypeptide 1 (S6K) | 59.1 | ✓ | 11.01▴ | |
| NM_005627 | Serum/glucocorticoid regulated kinase 1 (SGK1) | 48.9 | ✓ | ✓ | 3.76▴ |
| NM_005308 | G protein-coupled receptor kinase 5 (GPRK5) | 67.7 | ✓ | ✓ | 1.17▴ |
| NM_004333 | v-raf murine sarcoma viral oncogene homolog B1 (BRAF1) | 84.4 | ✓ | 0 | |
| NM_002880 | v-raf-1 murine leukemia viral oncogene homolog 1 (c-Raf) | 73 | ✓ | ✓ | 1.47▴ |
| NM_002576 | p21 protein (Cdc42/Rac)-activated kinase 1 (PAK1) | 60.6 | ✓* | / | |
| NM_002577 | p21 protein (Cdc42/Rac)-activated kinase 2 (PAK2) | 58 | / | ||
| NM_002755 | Mitogen-activated protein kinase kinase 1 (MKK1) | 43.4 | ✓ | 1.26▴ | |
| NM_004073 | Polo-like kinase 3 (Drosophila) (PLK3) | 71.6 | ✓ | ✓ | 0 |
| NM_005030 | Polo-like kinase 1 (Drosophila) (PLK1) | 68.2 | ✓ | ✓ | 50 ▴ |
| NM_002745 | Mitogen-activated protein kinase 1 (MAPK1) | 41.4 | ✓ | 0.38▾ | |
| NM_001315 | Mitogen-activated protein kinase 14 (MAPK14) | 41.6 | ✓ | 1.52▴ | |
| NM_002750 | Mitogen-activated protein kinase 8 (MAPK8) | 48.3 | ✓ | 0.80 ▾ | |
| NM_002093 | Glycogen synthase kinase 3 beta (GSK3B) | 46.7 | ✓ | 1.31 ▴ | |
| NM_002753 | Mitogen-activated protein kinase 10 (MAPK10) | 52.5 | ✓ | 1.11 ▴ | |
| NM_001292 | CDC-like kinase 3 (CLK3), transcript variant phclk3/152 | 16.8 | ✓ | 0 | |
| NM_000906 | Natriuretic peptide receptor A/guanylate cyclase A (atrionatriuretic peptide receptor A) (NPR1) | 119 | ✓ | ✓ | 0.95▸ |
| NM_001892 | Casein kinase 1, alpha 1 (CK1) | 37.2 | ✓ | 3.89 ▴ | |
| NM_005760 | CCAAT/enhancer binding protein zeta (CEBPZ) | 120.9 | ✓ | ✓* | / |
| NM_015658 | Nucleolar complex associated 2 homolog (S. cerevisiae) (NOC2L) | 85 | ✓ | 1.34 ▴ | |
| NM_022451 | Nucleolar complex associated 3 homolog (S. cerevisiae) (NOC3L) | 92.5 | ✓* | / | |
| NM_024078 | Nucleolar complex associated 4 homolog (S. cerevisiae) (NOC4L) | 58.5 | ✓ | ✓ | 0.76 ▾ |
| NM_003703 | NOP14 nucleolar protein homolog (yeast) (NOP14) | 97.7 | ✓ | ✓ | 0 |
| NM_014976 | Programmed cell death 11 (PDCD11) | 208.7 | ✓ | 1.91 ▴ | |
| NM_006331 | EMG1 nucleolar protein homolog (S. cerevisiae) (EMG1) | 26.7 | ✓ | 1.27 ▴ | |
| NM_014233 | Upstream binding transcription factor, RNA polymerase I (UBF) | 89.4 | ✓* | / | |
| NM_139071 | SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily d, member 1 (SMARCD1) | 54.9 | ✓ | 1.22▴ | |
| NM_000758 | Colony stimulating factor 2 (granulocyte-macrophage) (CSF2) | 14.4 | ✓ | ✓ | 1.72▴ |
| NM_000585 | Interleukin 15 (IL-15) | 14.9 | ✓ | ✓ | 2.26▴ |
| NM_001001437 | Chemokine (C-C motif) ligand 3-like 3 (CCL3L3) | 7.8 | ✓ | 0.81▾ | |
| NM_009855 | CD80 antigen (Cd80) | 30.5 | ✓ | 10 ▴ | |
| NM_000586 | Interleukin 2 (IL-2) | 14.6 | ✓ | ✓ | 0.89▾ |
| NM_000589 | Interleukin 4 (IL-4) | 14.9 | ✓ | 1.40▴ | |
| NM_000600 | Interleukin 6 (interferon, beta 2) (IL-6) | 21 | ✓ | ✓ | 1.10▴ |
| NM_002187, NM_000882 | Interleukin 12A and 12B (IL-12A and IL-12B) | 65 | ✓ | 0.1▾ | |
| AY890689, NM_000619, NP_000610 | Synthetic construct Homo sapiens clone FLH031198.01L interferon gamma (IFNG) | 16.8 | ✓ | 0.53▾ | |
| NM_006850 | Interleukin 24 (IL-24) | 18.2 | ✓ | 2.71▴ | |
| NM_000880 | Interleukin 7 (IL-7) | 17.4 | ✓ | ✓ | 2▴ |
| NM_000572 | Interleukin 10 (IL-10) | 18.6 | ✓ | ✓ | 1.00▸ |
| NM_024013 | Interferon, alpha 1 (IFN-α) | 19.2 | ✓ | 5 ▴ | |
| NM_000594 | Tumor necrosis factor (TNF-α) | 17.4 | ✓ | 1.56▴ | |
| NM_002985 | Chemokine (C-C motif) ligand 5 (CCL5) | 7.5 | ✓ | 2.72▴ | |
| NM_021975 | v-rel reticuloendotheliosis viral oncogene homolog A (avian) (RELA) | 60.2 | ✓ | 2 ▴ | |
| NM_020529 | Nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor, alpha (NFκBIA) | 35.6 | ✓ | ✓ | 1.11▴ |
| NM_001429 | E1A binding protein p300 (EP300) | 264.1 | ✓ | 0 | |
| NM_002228 | Jun oncogene (AP1) | 35.7 | ✓ | ✓ | 2.63▴ |
| NM_002049 | GATA binding protein 1 (globin transcription factor 1) (GATA1) | 42.7 | ✓ | 0.91▾ | |
| NM_015859 | General transcription factor IIA, 1, 19/37kDa (TFIIA) | 41.5 | ✓ | ✓ | 50 ▴ |
| NM_000546 | Tumor protein p53 (p53) | 43.7 | ✓ | 0.51▾ | |
| NM_003403 | YY1 transcription factor (YY1) | 44.7 | ✓ | ✓ | 2.57▴ |
| NM_001514 | General transcription factor IIB (TFIIB) | 34.8 | ✓ | 0.83▾ | |
| NM_004379 | cAMP responsive element binding protein 1 (CREB1) | 36.7 | ✓ | 1.30▴ | |
| NM_016269 | Lymphoid enhancer-binding factor 1 (LEF1) | 44.2 | ✓ | 0 | |
| NM_018952 | Homeobox B6 (HOXB6) | 25.4 | ✓ | 1.17▴ | |
| NM_005901 | SMAD family member 2 (SMAD2) | 52.3 | ✓ | ✓ | 0.85▾ |
| NM_005238 | v-ets erythroblastosis virus E26 oncogene homolog 1 (avian) (ETS-1) | 50 | ✓ | 0.86▾ | |
| NM_014596 | Zinc ribbon domain containing 1 (ZNRD1) | 13.9 | ✓ | 1.70▴ | |
| NM_000633 | B-cell CLL/lymphoma 2 (Bcl-2) | 26.3 | ✓ | 1.16▴ | |
| NM_001005862 | v-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian) (HER-2) | 137.9 | ✓ | 0 | |
| NM_003042 | Solute carrier family 6 (neurotransmitter transporter, GABA), member 1 (SLC6A1) | 67 | ✓ | ✓ | 1.18▴ |
| NM_001045 | Solute carrier family 6 (neurotransmitter transporter, serotonin), member 4 (SLC6A4) | 70.3 | ✓ | 1.92▴ | |
| NM_014229 | Solute carrier family 6 (neurotransmitter transporter, GABA), member 11 (SLC6A11) | 70.6 | ✓ | 0.46▾ | |
| NM_016615 | Solute carrier family 6 (neurotransmitter transporter, GABA), member 13 (SLC6A13) | 63.7 | ✓ | ✓ | 1.60▴ |
| NM_024006 | Vitamin K epoxide reductase complex, subunit 1 (VKORC1) | 18.2 | ✓ | 0 | |
| NM_003264 | Toll-like receptor 2 (TLR2) | 89.8 | ✓ | 2 ▴ | |
| NM_030956 | Toll-like receptor 10 (TLR10) | 94 | ✓ | ✓ | 9.83▴ |
| NM_016562 | Toll-like receptor 7 (TLR7) | 120.9 | ✓ | 1.5 ▴ | |
| NM_012140 | Solute carrier family 25 (mitochondrial carrier; dicarboxylate transporter), member 10 (SLC25A10) | 31.2 | ✓ | 1.28▴ | |
| NM_014437 | Solute carrier family 39 (zinc transporter), member 1 (SLC39A1) | 34.2 | ✓ | ✓ | 0 |
| NM_000447 | Presenilin 2 (Alzheimer disease 4) (PSEN2) | 52.6 | ✓ | 0.71▾ | |
| NM_000220 | Potassium inwardly-rectifying channel, subfamily J, member 1 (KCNJ1) | 44.7 | ✓ | 0 | |
| NM_021625 | Transient receptor potential cation channel, subfamily V, member 4 (TRPV4) | 98.2 | ✓ | 10 ▴ | |
| NM_001651 | Aquaporin 5 (AQP5) | 28.3 | ✓ | ✓ | 0 |
| NM_005228 | Epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian) (EGFR) | 134.2 | ✓ | 0 | |
| NM_001753 | Caveolin 1, caveolae protein (CAV1) | 20.4 | ✓ | ✓ | 10.95▴ |
| NM_000593 | Transporter 1, ATP-binding cassette, sub-family B (MDR/TAP) (TAP1) | 80.9 | ✓ | ✓ | 0 |
| NM_000544 | Transporter 2, ATP-binding cassette, sub-family B (MDR/TAP) (TAP2) | 75.6 | ✓ | 0 | |
| NM_005561 | Lysosomal-associated membrane protein 1 (LAMP1) | 44.7 | ✓ | ✓ | 0 |
| NM_002294 | Lysosomal-associated membrane protein 2 (LAMP2) | 44.9 | ✓ | 0 | |
| NM_014398 | Lysosomal-associated membrane protein 3 (LAMP3) | 44.3 | ✓ | 0 | |
| NM_000086 | Ceroid-lipofuscinosis, neuronal 3 (CLN3) | 47.6 | ✓ | 0.31▾ | |
| NM_014319 | LEM domain containing 3 (LEMD3) | 99.9 | ✓ | 0 | |
| NM_000914 | Opioid receptor, mu 1 (OPRM1) | 44.7 | ✓ | 1.49▴ | |
| NM_023921 | Taste receptor, type 2, member 10 (TAS2R10) | 35.3 | ✓ | 0 | |
| NM_002507 | Nerve growth factor receptor (NGFR) | 45.1 | ✓ | 0 | |
| NM_004523 | Kinesin family member 11 (KIF11) | 119.1 | ✓ | 2.33▴ | |
| NM_001012271 | Baculoviral IAP repeat-containing 5 (BIRC5) | 16.4 | ✓ | 0.59▾ | |
| NM_001786 | Cell division cycle 2, G1 to S and G2 to M (CDC2) | 34 | ✓ | 3.47▴ | |
Genes are grouped by the protein families they encode (kinases, RNA polymerase and ribosomal proteins, cytokines, transcription factors, membrane proteins, and three “other proteins”). Columns (left to right): Ref_seq.: GenBank accession number; Name: gene symbol and complete gene name; protein size: size of the full-length protein; wild-type construct correct and commercially available: availability from RZPD (Germany) or Geneservice (UK) as of the beginning of 2007; Optimized construct available upon de novo synthesis: whether the cDNA could be synthesized (✓*: wild-type construct failed to be synthesized); ratio total expression (opt/wt): absolute fluorescence value measured for expression of the optimized construct divided by the fluorescence value measured for expression of the wild-type construct (each the average of triplicates). Symbols: upright arrowhead (▴): expression wt < opt; horizontal arrowhead (▸): expression wt = opt (±5%); downward arrowhead (▾): expression wt > opt; (0): no expression; (/): no analysis possible because the wild-type construct was lacking. Cytokines were expressed without their signal sequences. Except for CD80 antigen (NM_009855, mouse), all sequences are of human origin.
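The ratio and arrowhead conventions in this legend can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the authors' analysis pipeline: the function names and the triplicate fluorescence readings are hypothetical, and the ±5% band is taken from the legend.

```python
from statistics import mean


def expression_ratio(opt_readings, wt_readings):
    """Mean fluorescence of the optimized construct divided by that of the wt construct."""
    return mean(opt_readings) / mean(wt_readings)


def arrowhead(ratio, tolerance=0.05):
    """Map an opt/wt ratio to the legend's symbol, treating +/- 5 % as 'equal'."""
    if ratio > 1 + tolerance:
        return "▴"  # expression wt < opt
    if ratio < 1 - tolerance:
        return "▾"  # expression wt > opt
    return "▸"      # expression wt = opt within the tolerance band


# Hypothetical triplicate readings in arbitrary fluorescence units (not measured data).
opt = [3700.0, 3800.0, 3780.0]
wt = [1000.0, 1000.0, 1000.0]

ratio = expression_ratio(opt, wt)
print(round(ratio, 2), arrowhead(ratio))
```

Entries marked (0) or (/) in the table carry no usable ratio and would have to be handled before calling these functions.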
Figure 1. Workflow of the multi-gene study: 100 wt and sequence-optimized genes were cloned or synthesized into a pQE-T7 E. coli expression vector. PT7: T7 promoter; lac O: lac operator; RBS: ribosome-binding site; ATG: start codon; 6xHis: N-terminal hexahistidine tag; wt/optimized: cloning cassette to receive the gene coding sequence; Amb: amber stop codon; Stop: translational stop; ori: origin of replication; lacI: Lac repressor gene; Kanamycin: kanamycin resistance gene. The N-terminal 6xHis tag is exoproteolytically cleavable using the TAGzyme system. Every QIAgene E. coli construct contains a universal stop point for the TAGzyme protease (ref. 18). His tag sequences can be deleted by NdeI restriction to generate a construct for expression of an untagged protein. The amber stop codon (UAG, Amb) can be used to incorporate a label by means of the amber suppression principle (ref. 19). Each wt and optimized construct was expressed in E. coli cells in vivo. The total cell lysate was labeled with the dye Chromeo P503, which becomes fluorescent only upon binding to an amino group of a protein. Lysates were separated on an SDS gel and scanned using an Ettan DIGE™ Fluorescent Scanner. Signals were quantified using the ImageQuant™ TL software. The factor shown (3.76) is the ratio of protein expression using the optimized (opt) and wild-type (wt) sequences. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]
Summary of In Vivo Expression Results of wt and Sequence-Optimized Genes
| | Cases of opt > wt (%) | Cases of opt = wt (%) | Cases of opt < wt (%) | No expression (%) | Ratio total expression (opt/wt, mean) | Ratio total expression (opt/wt, median) |
|---|---|---|---|---|---|---|
| All genes | 54.3 | 3.2 | 20.2 | 22.3 | / | / |
| Expressible genes | 70.0 | 4.1 | 25.9 | / | 3.50 | 1.29 |
In total, 73 out of 94 constructs could be expressed in vivo.
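As a rough illustration of how such a summary can be derived, the sketch below tallies the categories and computes the mean and median for a small subset of ratios copied from the first rows of Table I (PIM1 through FLT1, skipping the FGFR4 entry marked "/"). The function name is hypothetical, and treating a ratio of 0 as "no expression" and ±5% as "equal" follows the Table I legend; the output describes this subset only, not the full data set.

```python
from statistics import mean, median

# opt/wt ratios for ten Table I entries (PIM1 ... FLT1, without FGFR4); 0 means no expression.
ratios = [1.65, 1.29, 5.79, 2.77, 0.83, 11.51, 0.96, 1.21, 0.41, 0.0]


def summarize(values, tolerance=0.05):
    """Count outcome categories and compute mean/median over expressible genes."""
    expressible = [r for r in values if r > 0]
    counts = {
        "opt > wt": sum(r > 1 + tolerance for r in expressible),
        "opt = wt": sum(1 - tolerance <= r <= 1 + tolerance for r in expressible),
        "opt < wt": sum(r < 1 - tolerance for r in expressible),
        "no expression": len(values) - len(expressible),
    }
    return counts, mean(expressible), median(expressible)


counts, avg, med = summarize(ratios)
for label, n in counts.items():
    print(f"{label}: {n} ({100 * n / len(ratios):.1f}% of all genes in the subset)")
print(f"mean ratio: {avg:.2f}, median ratio: {med:.2f}")
```

Even in this small subset the mean lies well above the median because a few large ratios dominate it, which is the same pattern as the mean of 3.50 and median of 1.29 reported in the table.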
Amount of Protein After IMAC Purification Using 25 wt and Sequence-Optimized Genes for Expression in E. coli
| Name | Protein yield, wild-type gene (mg/L) | Protein yield, optimized gene (mg/L) | Purification mode: native (N) / denat. (D) | Yield ratio after purification, opt/wt (elution fractions) | Ratio total expression, opt/wt (lysate) |
|---|---|---|---|---|---|
| Pim-1 | 10.5 | 19.4 | N | 1.85 | 1.65 ▴ |
| Pim-2 | n.d. | 10 | N | | 1.29 ▴ |
| MAPKAPK5 | 14.9 | 80 | N | 5.37 | 2.77 ▴ |
| TRIB1 | 11.3 | 6.3 | N | 0.56 | 0.83 ▾ |
| FYN | n.d. | 6 | N | | 0.96 ▸ |
| MAPK1 | 24.3 | 11.5 | N | 0.47 | 0.38 ▾ |
| MAPK14 | n.d. | 43 | N | | 1.52 ▴ |
| MAPK8 | 33.8 | 42.2 | N | 1.25 | 0.80 ▾ |
| EMG1 | 30 | 31.5 | N | 1.05 | 1.27 ▴ |
| IFN gamma | 76 | 22 | N | 0.29 | 0.53 ▾ |
| IFN alpha | 0 | 13.3 | N | | 5.00 ▴ |
| TNF alpha | n.d. | 68 | N | | 1.56 ▴ |
| NFKB1a | 34 | 34.8 | N | 1.02 | 1.11 ▴ |
| YY1 | n.d. | 24 | N | | 2.57 ▴ |
| TFIIB | 19.1 | 4 | N | 0.21 | 0.83 ▾ |
| Kif11 | 2 | 3.5 | N | 1.75 | 2.33 ▴ |
| CDC2 | 8.2 | 14 | N | 1.71 | 3.47 ▴ |
| Caveolin 1 | 0 | 1.2 | N | | 10.95 ▴ |
| CSF2 | 8.5 | 23.8 | D | 2.80 | 1.72 ▴ |
| IL-4 | 25.5 | 34.8 | D | 1.36 | 1.40 ▴ |
| IL-6 | 19.6 | 21.2 | D | 1.08 | 1.10 ▴ |
| IL-7 | 0 | 16.2 | D | | 2.00 ▴ |
| IL-10 | 6 | 5.8 | D | 0.97 | 1.00 ▸ |
| IFN alpha | 0 | 25.5 | D | | 5.00 ▴ |
| TNF alpha | 25.5 | 38.6 | D | 1.51 | 1.56 ▴ |
| CCL5 | n.d. | 63 | D | | 2.72 ▴ |
| TFIIB | 25.9 | 19 | D | 0.73 | 0.83 ▾ |
Columns are (left to right): Name: gene symbol; wt gene: amount of protein quantified after expression and purification using the wt coding sequence; optimized gene: amount of protein quantified after expression and purification using the optimized coding sequence; n.d.: not determined; native/denat.: purification performed under native or denaturing conditions; yield ratio (opt/wt): factor calculated from protein yield in purification elution fractions; ratio total expression (opt/wt): factor calculated from expression level in crude lysates (see Table I).
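The yield ratio defined here is a simple quotient of the two yield columns. The sketch below recomputes it for a few rows of the table as a sanity check; the dictionary layout and function name are illustrative choices, `None` stands in for "n.d.", and returning no ratio when the wt yield is 0 or not determined mirrors the empty cells in the table rather than any stated rule.

```python
# (wt yield, optimized yield) in mg/L, copied from the purification table; None = n.d.
yields_mg_per_l = {
    "Pim-1": (10.5, 19.4),
    "Pim-2": (None, 10.0),
    "MAPKAPK5": (14.9, 80.0),
    "TRIB1": (11.3, 6.3),
    "IFN gamma": (76.0, 22.0),
    "Caveolin 1": (0.0, 1.2),
}


def yield_ratio(wt_yield, opt_yield):
    """Return opt/wt rounded to two decimals, or None if the wt yield is missing or zero."""
    if wt_yield is None or wt_yield == 0:
        return None
    return round(opt_yield / wt_yield, 2)


yield_ratios = {name: yield_ratio(wt, opt) for name, (wt, opt) in yields_mg_per_l.items()}
for name, value in yield_ratios.items():
    print(f"{name}: {value if value is not None else 'no ratio'}")
```

For the rows with both yields available, the recomputed values agree with the "yield ratio" column of the table.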
Figure 2. Optimized sequences increase the yield of soluble protein in the in vivo E. coli expression system. Expression in E. coli BL21(DE3) and Ni-NTA purification via the His tag under native conditions are shown for four wild-type (WT) and optimized (OPT) sequences and for optimized CAV1 (wt CAV1 cannot be detected). Samples were analyzed on an SDS gel and stained with Coomassie Brilliant Blue. Arrows indicate the protein of interest; arrowheads show lysozyme. Elution fractions (E) were quantified with a Bradford assay, and 3 μg of protein from the sequence-optimized expression was separated alongside the same volume of the wt protein fraction. TL: total lysate; CL: cleared lysate; R: resolubilized membrane fraction; BT: breakthrough; W: wash; 2.5 μL of each fraction were separated. Note that some protein in the cleared lysates is insoluble and that purification of the soluble protein results in enrichment in the elution fraction. Marker: PageRuler Prestained Protein Ladder (Fermentas); for more information on the genes and proteins see Table I. [Color figure can be viewed in the online issue, which is available at www.interscience.wiley.com.]
Figure 3. Enhanced codon usage is only one aspect of gene optimization. (A) Human wt genes coding for PLK1, SMARCD1, CSF2, AP-1, YY1, CCL5, and CAV1 were expressed in Rosetta2 (white) and BL21CodonPlusRIPL (grey) E. coli strains, both supplemented with rare tRNAs. Wt and sequence-optimized genes coding for the same proteins were expressed in E. coli BL21(DE3) (black). (B) Lysates were labeled and quantified using the fluorescent dye Chromeo P503, separated on an SDS gel, and analyzed with an Ettan DIGE scanner. Protein bands were evaluated using the ImageQuant TL software. Every expression was done in triplicate. WT: wild-type sequences; OPT: optimized sequences; C: control (mock transformation).
Figure 4. mRNA level correlates with the amount of recombinant protein. Expression of CK1 (A) and LCK (B) was analyzed at the mRNA and protein levels at four time points after IPTG induction. 6 × 10⁸ cells were harvested, and total RNA was isolated and used for relative quantification by two-step real-time PCR. Real-time PCR measurements were done in triplicate with samples from two independent experiments. The fold changes in mRNA expression relative to the mRNA level at T0 are plotted against the time after induction. Representative Western blots show the expression levels of the corresponding proteins. Total cell lysates from an identical number of cells at the different time points post induction were subjected to SDS-PAGE and subsequent Western blotting using Penta-His HRP Conjugate.