Dominic Simm, Blagovesta Popova, Gerhard H Braus, Stephan Waack, Martin Kollmar.
Abstract
Heterologous protein expression is an important method for analysing cellular functions of proteins, in genetic circuit engineering and in overexpressing proteins for biopharmaceutical applications and structural biology research. The degeneracy of the genetic code, which enables a single protein to be encoded by a multitude of synonymous gene sequences, plays an important role in regulating protein expression, but substantial uncertainty exists concerning the details of this phenomenon. Here we analyse the influence of a profiled codon usage adaptation approach on protein expression levels in the eukaryotic model organism Saccharomyces cerevisiae. We selected green fluorescent protein (GFP) and human α-synuclein (αSyn) as representatives of stable and intrinsically disordered proteins, serving as a benchmark and a challenging test case, respectively. A new approach was implemented to design typical genes resembling the codon usage of any subset of endogenous genes. Using this approach, synthetic genes for GFP and αSyn were generated, heterologously expressed and evaluated in yeast. We demonstrate that GFP is expressed at high levels, and that the toxic αSyn can be adapted to endogenous, low-level expression. The new software is publicly available as a web application for performing host-specific protein adaptations to a set of the most commonly used model organisms (https://odysseus.motorprotein.de).
Year: 2022 PMID: 35688911 PMCID: PMC9187722 DOI: 10.1038/s41598-022-13089-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Example of a Markov chain. A typical gene sequence is designed for a protein sequence starting with M-D-G-E. The RSdCU frequencies are computed from the set of selected sequences, which could be all genes of a species, the subset of the 10% most highly expressed genes of a species, all genes coding for trans-membrane proteins of a species, or any other user-specified set of genes. All frequencies are normalized within each codon box. The Markov chain is built by using the RSdCU values as the transition/emission probabilities.
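The chain described in this caption can be illustrated with a minimal sketch. It assumes that the RSdCU values act as first-order transition probabilities from one codon to the next, normalized within the codon box of the next amino acid. All frequencies below are invented for illustration and are not values from the study, and the names `START`, `TRANSITION`, `pick_codon` and `design_gene` are hypothetical, not part of Odysseus.

```python
import random

# Invented frequencies for illustration only (not data from the study).
# START[amino_acid][codon]: probability of the first codon within its codon box.
START = {"M": {"ATG": 1.0}}

# TRANSITION[previous_codon][amino_acid][codon]: probability that `codon`
# follows `previous_codon`, normalized within the codon box of `amino_acid`.
TRANSITION = {
    "ATG": {"D": {"GAT": 0.65, "GAC": 0.35}},
    "GAT": {"G": {"GGT": 0.5, "GGC": 0.2, "GGA": 0.2, "GGG": 0.1}},
    "GAC": {"G": {"GGT": 0.4, "GGC": 0.3, "GGA": 0.2, "GGG": 0.1}},
    "GGT": {"E": {"GAA": 0.7, "GAG": 0.3}},
    "GGC": {"E": {"GAA": 0.6, "GAG": 0.4}},
    "GGA": {"E": {"GAA": 0.75, "GAG": 0.25}},
    "GGG": {"E": {"GAA": 0.5, "GAG": 0.5}},
}


def pick_codon(box, rng):
    """Draw one codon from a codon box according to its probabilities."""
    codons = list(box)
    weights = [box[codon] for codon in codons]
    return rng.choices(codons, weights=weights, k=1)[0]


def design_gene(protein, start, transition, rng):
    """Walk the chain along the protein and return one sampled gene sequence."""
    codons = [pick_codon(start[protein[0]], rng)]
    for amino_acid in protein[1:]:
        # The codon box for this position depends on the previously chosen codon.
        box = transition[codons[-1]][amino_acid]
        codons.append(pick_codon(box, rng))
    return "".join(codons)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same gene
    print(design_gene("MDGE", START, TRANSITION, rng))
```

Each call samples one possible gene. Deriving the table from a different gene subset, for example highly expressed genes only, would change the probabilities and therefore the genes that are typically produced.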
Figure 2. Odysseus flowchart. The inputs to the process (top of the scheme) are a sequence (protein or DNA) in FASTA format and the host organism for which the gene will be designed. The resulting DNA sequence is the output of the process (bottom of the scheme). Computations during the process are represented by boxes, databases by cylinders, decisions by diamonds and the direction of data flow by arrows. Data input from external databases and computations with external software are represented by dotted lines.
Plasmids used in this study.
| Plasmid | Description | Source |
|---|---|---|
| pRS306 | pRS306 | [ |
| pME4859 | pRS306- | This study |
| pME4860 | pRS306- | This study |
| pME4861 | pRS306- | This study |
| pME4853 | pRS306 | This study |
| pME4854 | pRS306- | This study |
| pME4855 | pRS306- | This study |
| pME4856 | pRS306- | This study |
| pME4857 | pRS306- | This study |
| pME4858 | pRS306- | This study |
Yeast strains used in this study.
| Strain | Genotype | Source |
|---|---|---|
| W303-1A | | EUROSCARF |
| RH3771 | W303 containing 1 genomic copy | This study |
| RH3772 | W303 containing 2 genomic copies | This study |
| RH3773 | W303 containing 3 genomic copies | This study |
| RH3774 | W303 containing 1 genomic copy | This study |
| RH3775 | W303 containing 2 genomic copies | This study |
| RH3776 | W303 containing 3 genomic copies | This study |
| RH3777 | W303 containing 1 genomic copy | This study |
| RH3778 | W303 containing 2 genomic copies | This study |
| RH3779 | W303 containing 3 genomic copies | This study |
| RH3756 | W303 containing 1 genomic copy | This study |
| RH3757 | W303 containing 2 genomic copies | This study |
| RH3758 | W303 containing 1 genomic copy | This study |
| RH3759 | W303 containing 2 genomic copies | This study |
| RH3760 | W303 containing 1 genomic copy | This study |
| RH3761 | W303 containing 2 genomic copies | This study |
| RH3762 | W303 containing 1 genomic copy | This study |
| RH3763 | W303 containing 2 genomic copies | This study |
| RH3764 | W303 containing 3 genomic copies | This study |
| RH3765 | W303 containing 1 genomic copy | This study |
| RH3766 | W303 containing 2 genomic copies | This study |
| RH3767 | W303 containing 3 genomic copies | This study |
| RH3768 | W303 containing 1 genomic copy | This study |
| RH3769 | W303 containing 2 genomic copies | This study |
| RH3770 | W303 containing 3 genomic copies | This study |
| RH3780 | W303 containing 1 genomic copy | This study |
| RH3781 | W303 containing 2 genomic copies | This study |
| RH3465 | W303 containing 1 genomic copy | [ |
| RH3466 | W303 containing 1 genomic copy | [ |
| RH3467 | W303 containing 2 genomic copies | [ |
Figure 3. GPome-plots (genome versus proteome plots). The plots show the relative codon usage (RCU) of the 308 most highly expressed proteins (“highly expressed”), the following 1013 proteins with medium expression, and the 5024 least expressed proteins of S. cerevisiae, plotted against the RCU of all predicted yeast genes (x-axis). For comparison, the RCUs of the highly expressed proteins are shown unweighted and weighted. Weighting means that each gene is multiplied by its absolute abundance as given by the PaxDB data.
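The difference between the unweighted and the weighted RCU can be made concrete with a small sketch. It assumes that RCU is the frequency of a codon normalized within its codon box, and that weighting multiplies the codon counts of each gene by that gene's abundance. The sequences and abundances are invented toy values, the codon table covers only two amino acids, and the function name `relative_codon_usage` is hypothetical.

```python
from collections import Counter, defaultdict

# Partial codon table for the toy example (only the codon boxes used below).
CODON_TO_AA = {"GAT": "D", "GAC": "D", "GAA": "E", "GAG": "E"}


def relative_codon_usage(genes, abundances=None):
    """Return {codon: frequency}, normalized within each codon box.

    genes:      {gene name: coding DNA sequence}
    abundances: optional {gene name: abundance}; if given, the codon counts
                of each gene are multiplied by the gene's abundance.
    """
    counts = Counter()
    for name, sequence in genes.items():
        weight = 1.0 if abundances is None else abundances[name]
        for i in range(0, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if codon in CODON_TO_AA:
                counts[codon] += weight

    # Sum the (weighted) counts per amino acid to normalize within each box.
    box_totals = defaultdict(float)
    for codon, count in counts.items():
        box_totals[CODON_TO_AA[codon]] += count

    return {codon: count / box_totals[CODON_TO_AA[codon]]
            for codon, count in counts.items()}


# Invented toy data: geneA is nine times as abundant as geneB.
genes = {"geneA": "GATGATGAAGAC", "geneB": "GACGACGAGGAG"}
abundances = {"geneA": 9.0, "geneB": 1.0}

unweighted = relative_codon_usage(genes)
weighted = relative_codon_usage(genes, abundances)

if __name__ == "__main__":
    for codon in sorted(unweighted):
        print(codon, round(unweighted[codon], 3), round(weighted[codon], 3))
```

In this toy case the weighted usage moves towards the codon preferences of the abundant gene, which is the effect the weighted panel is meant to expose.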
Figure 4. Steady-state protein levels of GFP. Three types of gene design were tested in combination with one to three gene copies. All designed genes are based on the weighting scheme, in which each codon of a subset of genes is multiplied by its expression level as provided by PaxDB data. Gene1 is based on the subset of the 5024 least expressed genes, gene2 is based on the 308 most highly expressed genes, and gene3 is based on the inversion of the codon usage of the most highly expressed genes. (A) Western blot analysis of crude protein extracts from yeast strains expressing GAL1-driven GFP from one, two and three copies. Protein expression was induced for 6 h in galactose-containing medium, crude protein extracts were prepared and equal protein amounts from all samples were used for Western blotting. The membrane was probed with anti-GFP antibody. GAPDH antibody was used as a loading control. The full-sized blots are available in Supplementary Fig. S2. (B) Quantification of the protein levels of GFP. Densitometric analysis of the immunodetection of GFP, relative to the GAPDH loading control. The significance of the differences was calculated with a one-way ANOVA (**p = 0.002; ****p < 0.0001; n = 3). (C) Live-cell fluorescence microscopy of yeast cells expressing GFP from three copies. Scale bar: 5 µm. (D) Quantification of the fluorescence intensity of GFP-expressing cells with different copy numbers and coding sequences. The mean fluorescence intensities were quantified using the SlideBook6 software package (n = 100 per strain, except n = 200 for the control). The significance of the differences was calculated with a one-way ANOVA (*****p = 0.0).
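The quantification in panel (B) combines two simple calculations: normalizing each GFP band to its GAPDH loading control, and comparing the groups with a one-way ANOVA. The sketch below shows both steps under stated assumptions. The band intensities are invented, the helper names are hypothetical, and only the F statistic is computed (turning it into a p-value requires the F distribution, for instance from a statistics library). It is not the analysis pipeline used in the study.

```python
from statistics import mean


def normalize_to_loading_control(target, loading):
    """Divide each target band intensity by its loading-control intensity."""
    return [t / l for t, l in zip(target, loading)]


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    grand_mean = mean([x for group in groups for x in group])

    # Variation of the group means around the grand mean.
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2
                     for group in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((x - mean(group)) ** 2
                    for group in groups for x in group)

    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Invented densitometry readings, three replicates per design (n = 3).
gapdh = [50.0, 50.0, 50.0]
design_a = normalize_to_loading_control([50.0, 100.0, 150.0], gapdh)
design_b = normalize_to_loading_control([100.0, 150.0, 200.0], gapdh)
design_c = normalize_to_loading_control([250.0, 300.0, 350.0], gapdh)

f_statistic = one_way_anova_f([design_a, design_b, design_c])

if __name__ == "__main__":
    print(design_a, design_b, design_c)
    print(f_statistic)
```

A larger F statistic means the differences between designs are large relative to the scatter among replicates.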
Figure 5. Expression of designed and human α-synuclein. (A) Western blot analysis for determination of the protein level of αSyn. Protein expression was induced for 6 h, crude protein extracts were prepared and the protein concentrations were determined with a Bradford assay. 160 μg crude protein extract from samples gene4 (L), gene5 (M) and gene6 (H), and 40 μg from samples “human” were used for Western blotting. The membrane was probed with anti-αSyn antibody. GAPDH antibody was used as a loading control. The full-sized blots are available in Supplementary Fig. S3. (B) Quantification of the protein levels of αSyn. Densitometric analysis of the immunodetection of αSyn, relative to the GAPDH loading control. The significance of the differences was calculated with a one-way ANOVA (*p = 0.0107; ***p = 0.00014; n = 3). (C) Western blot analysis of crude protein extracts from yeast strains expressing GAL1-driven αSyn from one, two and three copies. Protein expression was induced for 6 h, crude protein extracts were prepared and the protein concentrations were determined with a Bradford assay. 160 μg crude protein extract from samples gene7, gene8 and gene9, and 40 μg from samples “human” were used for Western blotting. The membrane was probed with anti-αSyn antibody. GAPDH antibody was used as a loading control. The full-sized blots are available in Supplementary Fig. S4. (D) Quantification of the steady-state protein level of αSyn. Densitometric analysis of the immunodetection of αSyn, relative to the GAPDH loading control. The significance of the differences was calculated with a one-way ANOVA (*p = 0.038; -p = 0.41; ***p = 0.00034; n = 3). (E) Quantification of SNCA gene expression. RNA was prepared from yeast strains after 6 h induction of αSyn expression. Relative αSyn mRNA levels were determined by qRT-PCR and normalized against H2A. Expression values represent the mean of three replicates ± standard error. (F) Growth analysis of yeast cells expressing αSyn from one, two and three gene copies, driven by the inducible GAL1 promoter, on non-inducing (‘OFF’: glucose) and inducing (‘ON’: galactose) SC-URA medium after 3 days. Yeast cells expressing GFP from the same promoter were used as a control.
Figure 6. The “inverted” codon usage. The schematic at the top illustrates how an inverted codon usage is generated, in comparison with a switched codon usage. The plots at the bottom show the relative codon usage of the 308 most highly expressed proteins, weighted and inverted (left plot), and the relative codon usage of human αSyn (right plot).