Literature DB >> 21586584

Re-engineering multicloning sites for function and convenience.

Nathan C Crook¹, Elizabeth S Freeman, Hal S Alper.

Abstract

Multicloning sites (MCSs) in standard expression vectors are widely used and thought to be benign, non-interacting elements that exist for mere convenience. However, MCSs impose a necessary distance between promoter elements and genes of interest. As a result, the choice of cloning site defines the genetic context and may introduce significant mRNA secondary structure in the 5'-untranslated region leading to strong translation inhibition. Here, we demonstrate the first performance-based assessment of MCSs in yeast, showing that commonly used MCSs can induce dramatic reductions in protein expression, and that this inhibition is highly promoter and gene dependent. In response, we develop and apply a novel predictive model of structure-based translation inhibition to design improved MCSs for significantly higher and more consistent protein expression. In doing so, we were able to minimize the inhibitory effects of MCSs with the yeast TEF, CYC and GPD promoters. These results highlight the non-interchangeable nature of biological parts and represent the first complete, global redesign of a genetic circuit of such widespread importance as a multicloning site. The improved translational control offered by these designed MCSs is paramount to obtaining high titers of heterologous proteins in eukaryotes and to enabling precise control of genetic circuits.

Entities: Chemical Disease Gene Species

Mesh：

Substances：

Year: 2011 PMID： 21586584 PMCID： PMC3152365 DOI： 10.1093/nar/gkr346

Source DB: PubMed Journal: Nucleic Acids Res ISSN： 0305-1048 Impact factor: 16.971

INTRODUCTION

Expression vectors with predefined multiple cloning sites (MCSs) are among the most common tools employed in molecular biology and genetics. These vectors have enabled the facile expression and cloning of recombinant genes and have recently ushered in the era of synthetic biology (1). The flexibility of restriction enzyme sites in MCSs facilitate easy cloning of genes of interest for diverse applications from genetic analysis to creation of biofuels-producing strains. Common improvements to vectors containing MCSs are focused at controlling transcript levels [via promoter replacement/engineering (2), transcription machinery engineering (3) or copy number manipulations (4)] or translation rate [e.g. by improving codon bias (5) or by reducing expression noise (6)]. In all these applications, multiple cloning sites are thought to be benign, non-interacting elements that exist for mere convenience. However, a promoter element is usually placed upstream of the MCS. As a result, several base pairs (or even multiple restriction sites) will appear in the 5′-untranslated region (5′-UTR) of the mRNA of the cloned gene depending on the restriction site chosen. Thus, it is conceivable that the composition of these sites can significantly influence translation efficiencies of the downstream gene. Here, we demonstrate the first performance-based assessment of multiple cloning sites and develop a novel theoretical framework enabling the prediction of a MCS's effect on translation. Furthermore, we apply this understanding to rationally redesign these sites for improved function and reduced variability associated with restriction enzyme choice. We posit that this phenomenon of 5′-UTR structure inhibition is most pronounced when using shorter, codon-optimized genes. Secondary structure in the 5′-UTR of mRNA has been found to affect expression in both prokaryotes (7,8) and eukaryotes (9–14) at the translational level. In prokaryotes, translation is initiated by the assembly of the 70S initiation complex on the ribosome binding site (RBS), normally within a few base pairs of the start codon, and it is thought that RNA secondary structure can inhibit translation by occluding the RBS (7,15). In fact, predictive models of RBS performance explicitly treat the inhibitory effect of 5′-UTR secondary structure (7). Due to the differences in translation initiation in prokaryotes, the design criteria of prior methods would be of little use in highly relevant eukaryotic systems such as Saccharomyces cerevisiae. Hence, a novel modeling approach resulting from a distinct theoretical framework is needed to address the issue of 5′-UTR secondary structure for yeast systems. In eukaryotes, the 43S initiation complex must scan along the 5′-UTR before commencing translation at the start codon, often 50 bp or more from the 5′-cap structure (15). It has been hypothesized that the presence of secondary structure in these organisms decreases the rate of translation initiation by impeding ribosome scanning (12). Multicloning sites impose distance (and therefore a high likelihood of structure) between a promoter and the gene of interest in a restriction site-dependent manner, leading to the hypothesis that cloning location affects protein expression, especially in eukaryotes. In several cases, irreproducible or conflicting results have been explained by differences in restriction site usage (16,17). However, most attempts at mitigating translation-inhibiting secondary structure in eukaryotes result in ‘quick fixes’ such as point mutations, which are only applicable for the precise gene construct under consideration (18–22). Moreover, no prior work has successfully minimized secondary structure to optimize a genetic component of such widespread importance as the multicloning site or to develop a system which achieves nearly context-independent levels of protein expression, both of which are of critical significance to obtaining high titers of heterologous proteins in eukaryotes and to enabling precise control of genetic circuits. Therefore, due to their enormous utility and widespread use for heterologous gene expression, the characterization and optimization of MCSs to minimize the effects of mRNA structure in a more general context represents a promising and novel avenue toward improving protein titers and controlling protein production. A variety of algorithms exist for the prediction of RNA secondary structure (23–25). A common approach is to compute the free energy of the strand of interest through a partition function, using empirically determined base-stacking energies to weight each possible conformation (26,27). One limitation of this approach is that enumeration of all possible conformations becomes impractical for large strands, so certain classes of folds (e.g. pseudoknots) are commonly ignored, though are possibly significant. It is important to note that a strand's free energy of folding computed in this manner is not a simple function of its composition. Since MCSs must additionally contain certain sequence motifs, any attempt to rationally design MCSs based on minimized free energy is prohibitively difficult, necessitating the use of a metaheuristic such as a genetic or hill-climbing algorithm. This difficulty is exacerbated by the requirement that designed MCSs refrain from folding regardless of where the gene of interest is inserted, highlighting the potential rarity of desirable MCSs. In this study, we establish the variations in downstream protein translation imparted by multicloning sites and isolate the effect of secondary structure-based inhibition especially in cases of short, codon-optimized genes. This effect is demonstrated using the MCS of a common yeast vector system (28,29). Due to the unacceptably large variance found along the cloning site, a predictive model was developed to redesign multiple cloning sites with minimized secondary structure and thus improved mRNA translation. These models led to promoter specific, redesigned multiple cloning sites that outperform standard constructs.

MATERIALS AND METHODS

Strains and media

Yeast expression vectors were propagated in Escherichia coli DH10β. All yeast experiments were carried out in S. cerevisiae BY4741 (MATa; his3Δ1; leu2Δ0; met15Δ0; ura3Δ0) obtained from EUROSCARF, Frankfurt, Germany. Escherichia coli strains were routinely cultivated in Lysogeny Broth (LB) medium (30) (Teknova) at 37°C with 225 RPM orbital shaking. LB was supplemented with 100 µg/ml ampicillin (Sigma) when needed for plasmid maintenance and propagation. Yeast strains were cultivated on a yeast synthetic complete (YSC) medium containing 6.7 g of Yeast Nitrogen Base (Difco)/l, 20 g glucose/l and a mixture of appropriate nucleotides and amino acids (CSM-URA, Qbiogene) referred to here as YSC Ura−. All medium was supplemented with 1.5% agar for solid media. For E. coli transformations, 25 µl of electrocompetent E. coli DH10β (30) were mixed with 30 ng of ligated DNA and electroporated [2 mm Electrporation Cuvettes (Bioexpress) with Biorad Genepulser Xcell] at 2.5 kV. Transformants were rescued for 1 h at 37°C in 1 ml SOC Buffer (Cellgro), plated on LB agar and incubated overnight. Single clones were amplified in 5 ml LB medium and incubated overnight at 37°C. Plasmids were isolated (QIAprep Spin Miniprep Kit, Qiagen) and confirmed by sequencing. For yeast transformations, 50 µl of chemically competent S. cerevisiae BY4741 were transformed with 1 µg purified plasmid according to manufacturer's instructions (Frozen EZ Yeast Transformation II Kit, Zymo Research), plated on YSC Ura− agar, and incubated for 2 days at 30°C. Single colonies were picked into 5 ml YSC Ura− and incubated at 30°C.

Plasmid construction

Cloning procedures

PCR reactions were performed with Phusion DNA Polymerase (Finnzymes Inc.) according to manufacturer specifications. Digestions were performed according to manufacturer's (NEB) instructions, with digestions close to the end of a linearized strand running overnight and digestions of circular strands running for 1 h at 37°C. PCR products and digestions were cleaned with a QIAquick PCR Purification Kit (Qiagen). Phosphatase reactions were performed with Antarctic Phosphatase (NEB) according to manufacturer's instructions and heat-inactivated for 15 min at 65°C. Ligations (T4 DNA Ligase, Fermentas) were performed for 6 h at 22°C followed by heat inactivation at 65°C for 15 min.

yECitrine pBLUESCRIPT SK multicloning site series

yECitrine was cloned from pT5Y (Supplementary Table S1) using PCR. Primers matching 29 bp of yECitrine were used to add restriction sites to both ends of the gene, for a total of eight different yECitrine PCR products (forward primers: 16–23, reverse primer: 25). After digestion, these yECitrine fragments were each ligated separately into the multi-cloning sites of p416-TEF, p416-GPD and p416-CYC. The pCYC0xYFP series used oligo 26 as reverse primer because the XhoI site is not unique in p416-CYC. pGPD06YFP, pTEF06YFP, pCYC06YFP and pCYC08YFP were made with assembly PCR [see Designed Multicloning Site Series. TEFp, GPDp or CYC1p, CYC1 terminator, and assembly oligos (pGPD06YFP, pTEF06YFP and pCYC06YFP: 28,29, pCYC08YFP: 28 and 30) comprised the first reaction. Full-length product was amplified, digested and ligated as for the designed MCS series]. pCYC09YFP was constructed by swapping CYC1 for GPD in construct pGPD09YFP through SacI–XbaI fragmentation. This resulted in 27 distinct plasmids (Supplementary Table S4).

yECitrine designed multicloning site series

Novel MCSs were generated with assembly PCR. PCR products of TEFp (primers 31–32), GPDp (primers 33–34) or CYC1p (primers 35–36) were combined with CYC1 terminator (primers 37–38) and assembly oligos (39–42, 43–45, 46–48, 49–52 or 53–56) at 30 nM each and amplified (94°C for 1 min, 68°C for 2 min, 72°C for 3 min, 25 cycles). Full-length product was then amplified from 2.5 µl of this mixture (forward primers 31, 33 or 35; reverse primer 38), digested with SacI and KpnI and ligated to a phosphatased SacI–KpnI fragment of p416. yECitrine was inserted at each restriction site as for the pBLUESCRIPT SK series (forward primers 16–24, reverse primers 25, 26 or 57 as necessary) resulting in the constructs shown in Supplementary Table S2. pCYC111YFP was constructed with CYC1p, CYC1 terminator and primer 58 using assembly PCR because XhoI is not unique in this construct.

LacZ pBLUESCRIPT SK multicloning site series

LacZ was isolated from whole-genome extract of E. coli K12-MG1665 (Wizard Genomic DNA Purification Kit, Promega) with PCR (primers 59–60), fragmented with XbaI and ClaI and ligated to p416-GPD. LacZ was inserted at XbaI as for the pBLUESCRIPT SK series (primers 61–62). pTEF03LacZ, pTEF05LacZ, pTEF07LacZ and pTEF09LacZ were constructed using assembly PCR [LacZ-CYC1term (primers 38 and 63) and assembly oligos (pTEF03LacZ: 65, pTEF05LacZ: 66–67, pTEF07LacZ: 66 and 68, pTEF09LacZ: 66, 69–70) comprised the first reaction. Full-length product was amplified in a second reaction (primers 38 and 64)]. Each product was digested with XbaI and KpnI, and ligated to p416-TEF. The resulting plasmids are shown in Supplementary Table S5.

GFP pBLUESCRIPT SK multicloning site series

GFP was isolated from pZE-GFP (31) using PCR (forward primers 71–75, reverse primer 76), fragmented and ligated to p416-TEF at XbaI, BamHI, EcoRI, ClaI and XhoI as for the pBLUESCRIPT SK series. The resulting plasmids are shown in (Supplementary Table S6).

Flow cytometry analysis

Yeast colonies were picked in triplicate, grown in YSC Ura− to mid-log phase, resuspended to an optical density of 0.5 in ice-cold water, and analyzed (FACSCalibur Flow Cytometer, BD Biosciences. Excitation wavelength: 488 nm, detection wavelength: 530 nm). Day-to-day variability was accounted for by analyzing all comparable transformants on the same day. An average fluorescence and standard deviation was calculated from the mean values for the biological replicates. Flow cytometry data was analyzed using FlowJo software.

β-Galactosidase assay

Yeast colonies were picked in triplicate, grown in YSC Ura− to an optical density of 0.5, and prepared according to manufacturer's instructions (Novabright β-Galactosidase Enzyme Reporter Gene Chemiluminescent Detection Kit for Yeast Cells, Invitrogen). Luminescence was quantified with a SpectraMax M3 Multi-Mode Microplate Reader (Molecular Devices). Day-to-day variability was accounted for by analyzing all comparable transformants on the same day.

Computational studies and modeling efforts

Nupack2.1.2 (24) was used to perform all RNA folding calculations. Folding conditions of 30°C, 1 M Na+ and 0 M Mg2+ were utilized. All reported energies are the free energies of the ensemble of potential structures, as opposed to the minimum free energy structure. Pseudoknots were not considered due to computational limitations. First and second round computations were run on an intel Xeon processor running MATLAB. Third round computations were run on all cores of an intel core i7 processor running MATLAB. Most optimizations were run over 24 h.

First round of optimization

The first set of MCSs (pTEF1xYFP and pCYC11xYFP) were designed with the goal of maximizing the ensemble free energy of the complete 5′-UTR (32–35). Design proceeded using a hill-climbing algorithm in a two-step process, using the free energy of the longest possible 5′-UTR (i.e. cloning into the last possible restriction site in the MCS) as its score. The restriction sites were first reordered to maximize free energy, followed by the addition of up to 5 bp between each restriction site to further increase free energy (Figure 1B and C).

Figure 1.

Model construction and multicloning site design methodology. (A) The first round of modeling implemented a hill-climbing algorithm to search for regions of the mRNA whose free energy of folding correlated strongly with fluorescence. MCSs were then designed via a two-step process, (B) a hill-climbing algorithm to find the optimal ordering of restriction sites followed by, (C) a hill-climbing algorithm to further decrease the likelihood of secondary structure formation and (D) the second round of modeling undertook an exhaustive search of all possible pairs of regions to find the set which showed the greatest predictive ability.

Second round of modeling and optimization

To address the limitations of the first model of structure-based translation inhibition, a model framework was developed incorporating two (or more) regions whose free energy of folding correlates with protein production. These free energy barriers can occur as the complex is scanning along the 5′-UTR or as the complex is binding to the 5′-cap structure. If Ni is the number of complexes in state i and N + 1 is the number of complexes in the next state, then we have: where ΔG is the magnitude of the free energy barrier and β represents the Boltzmann constant of the system (i.e. how energetic each complex is and thus how likely it is to traverse energetic barriers). Such results from statistical mechanics are valid due to the large number of yeast cells measured. If there are N complexes in the first (unbound) state, we have: where β are the Boltzmann constants at each state, ΔG are the free energies of each barrier between them. We can rewrite the product to yield Assuming there are i states and the rate of translation initiation (hence protein production) is proportional to the number of initiation complexes in the last state (the state closest to the start codon), we have where f is the fluorescence value and C is a proportionality constant (since the data have been normalized to the fluorescence of a particular construct). If we take the logarithm of both sides, we can correlate the logarithm of the fluorescence to barrier free energies by fitting the Boltzmann constants and the proportionality constant, C: where the hat denotes the estimator of a variable. This framework was used to develop models for the second and third rounds of modeling. Models and novel MCSs were evaluated using the ensemble free energies of two disjoint segments of RNA as predictors. The boundaries for each segment were measured relative to the start codon. Although possibly between the boundaries of each segment, nucleotides which were not between the start of the 5′-UTR and 30 bp after the start codon were not included in folding calculations. In addition to the pBLUESCRIPT SK MCS data, the yECitrine expression resulting from a number of other post-promoter ‘inserts’ (see ‘yECitrine insert series’) were also used to train the predictive model for each promoter. A hill-climbing algorithm was implemented to search for the two segments whose free energies best correlated with the data for all the available constructs according to the framework above (Figure 1A). The correlation coefficient was used to score each potential model. Hill-climbing algorithms were similarly used to search for the best possible MCS in a two-step process similar to the first round of optimization (Figure 1B and C). For each potential MCS, a score was calculated using the model developed above. A positive value was given to those positions which, when yECitrine was inserted at that site, resulted in a higher predicted fluorescence than had been predicted at the same position (e.g. the third site from the end of the promoter) in other MCSs. A negative score was similarly given to underperforming positions. The total score for each potential MCS was the sum of these positive and negative values, and the MCSs with the greatest scores were selected.

Third round of modeling

Due to increased computational resources, the third round of modeling used an exhaustive search of pairs of disjoint predictive regions in all available data instead of a hill-climbing algorithm (Figure 1D). The predicted residual sum of squares (PRESS) was used to score each pair, as computed by the hat matrix.

RESULTS

Performance-based assessment of the pBLUESCRIPT SK multiple cloning site in yeast

To gain a quantitative performance assessment of a commonly used multiple cloning site in yeast, we inserted an optimized YFP fluorescent protein, yECitrine (36), after each restriction site in the p416 vector (29). This base vector is derived from the commonly used pRS yeast shuttle vector (28) and contains the popular pBLUESCRIPT SK MCS. Three common distinct yeast promoters were chosen to drive expression of these cassettes. Protein output [as measured by fluorescence of yECitrine (YFP)] changes significantly and exhibits drastic decreases as a function of position along the MCS (Figure 2). These results demonstrate that the choice of restriction site is not benign and can significantly influence performance. Moreover, this phenomenon is not strictly controlled by spacing/length as the relative fluorescence at each site depends strongly on the promoter being used to drive transcription. Additionally, it is clear that there exist promoter-specific effects beyond what would be expected from strength differences. Indeed, if the fluorescence trend was simply scaled by promoter strength, the graphs shown in Figure 2 would be identical. It is also worthy of note that the fluorescence trends are not monotonically decreasing, implying that any predictor function of MCS performance must not vary monotonically with the length of mRNA between the end of the promoter and the start codon. Finally, transcript analysis of yECitrine was not found to significantly correlate with fluorescence, thus implying the observed restriction site-dependent performance was predominately a translation-level effect (Information I in Supplementary Data). Based on this characterization, it is therefore imperative that any studies relying on the precise quantity of protein (e.g. promoter strength assays or comparative enzyme assays) consider and report the intervening nucleotides between the promoter and the gene of interest, as they can confound measurements of gene expression or activity.

Figure 2.

Performance assessment of the pBLUESCRIPT SK multicloning site. Three promoters (TEF, CYC1and GPD) were used to drive yECitrine inserted at each available restriction site of the pBLUESCRIPT SK MCS in the p416 vector. Each series has been scaled to unity at the first restriction site. Unscaled fluorescence values for pGPD01YFP, pTEF01YFP and pCYC101YFP are 1050, 611 and 30.2, respectively (Supplementary Table S4). Error bars represent the standard deviation in fluorescence observed across biological triplicates. Fluorescence is seen to vary in a promoter-specific manner across each of the sites in the MCS.

Determination of possible correlates of 5′-UTR-dependent translational inhibition

Given evidence that the restriction site-dependent inhibition is a translation-level effect, several physical characteristics of mRNA were considered as possible correlates of yECitrine fluorescence. Initially, both 5′-UTR GC content and length were evaluated using an expanded data set consisting of the TEFpmut5 promoter (2,37) and various intervening sequences (Supplementary Table S1). This data set represented the first instance in which we observed this translational inhibition, inspiring a more complete characterization of this effect in the wild-type, canonical TEF, GPD and CYC1 promoters in subsequent experiments. TEFpmut5 is almost identical to TEF, containing eight point mutations and retaining 95% of TEF's promoter activity, indicating that the two promoters are comparable. Relative fluorescence was plotted against length and GC content for these TEFpmut5 constructs (Figure 3A and B), and no clear relationship was observed in either variable. However, upon plotting the computed thermodynamic folding energy of the 5′-UTR (a more direct predictor of secondary structure) against yECitrine expression (Figure 3C), a clear monotonic downward trend was observed, consistent with earlier reports that significant 5′-UTR secondary structure can inhibit gene expression (11–13). Since RNA transcription begins in the 3′-end of the promoter, different promoters will yield different base pair compositions (and hence differing secondary structure) in the 5′-UTR. This result partially explains the promoter-specific impact of MCS found in Figure 2. Therefore, it was hypothesized that restriction site-dependent inhibition in the multicloning site was best explained by the thermodynamic free energy of folding of the 5′-UTR.

Figure 3.

Prospective correlates of expression in the TEFpmut5 insert series. yECitrine expression levels were measured in each of the TEFpmut5 constructs listed in Supplementary Table S1 and plotted against (A) 5′-UTR length, (B) GC content and (C) folding energy. Each plot has been scaled relative to the fluorescence of pT5Y. Error bars represent the standard deviation in fluorescence observed across biological triplicates. Fluorescence is seen to monotonically vary with free energy level, thus suggesting 5′-UTR secondary structure as the leading cause of this phenomenon.

Comparing the impact of 5′-UTR structure to codon usage and gene length

Beyond 5′-UTR structure, gene-specific traits such as length and codon usage can impact translation. To this end, genes for β-galactosidase and an E. coli optimized green fluorescent protein (GFP) (31) were inserted into the MCS of p416-TEF and performance was compared with yECitrine. The codon adaptation index (CAI) (38), a common measure of codon optimality, for both β-galactosidase and GFP in yeast are quite low. In addition, β-galactosidase is relatively long (>3 kb), whereas the lengths of GFP and yECitrine are almost identical (∼700 bp) (Table 1). In the case of yECitrine, a short, codon-optimal gene, 5′-UTR structure dominated as reporter output varied greatly as a function of cloning position (Figure 4). In contrast, as the gene of interest becomes longer and uses progressively rarer codons (as with β-galactosidase and GFP), the effects of gene length or codon biases become the rate-limiting steps in translation, dwarfing the effects of secondary structure. As a result, the restriction site-dependent effects of mRNA secondary structure are substantially muted by poor codon usage and/or large size (Figure 4). Therefore, the effect documented here of 5′-UTR structure inhibition is extremely relevant to synthetic biology in which codon-optimized genes are routinely being synthesized and used.

Table 1.

Genetic parameters for yECitrine, eGFP and LacZ

Gene	Length (bp)	Codon adaptation index
yECitrine	717	0.519
eGFP	756	0.0888
LacZ	3075	0.0570

Codon adaptation indices were computed with JCat (40) in S. cerevisiae and gene lengths are reported.

Figure 4.

Effect of gene length and codon usage on translational inhibition. LacZ and eGFP expression levels were measured in each of the constructs listed in Supplementary Tables S5 and S6, respectively, and compared with data for yECitrine. Each series has been normalized to unity at the first restriction site. Position on the MCS has been measured according to the unique restriction sites in the p416 vector. Error bars represent the standard deviation in reporter output observed across three biological replicates. The impact of 5′-UTR inhibition is most pronounced in short, codon optimized genes.

Re-engineering multicloning sites

Given the substantial effect MCSs can have on protein production, we sought to redesign these elements by mitigating secondary structure inhibition. An initial, crude model based on complete minimization of secondary structure across the entire 5′-UTR enabled the design of improved MCSs: TEF1 and CYC11 (Information II in Supplementary Data). However, this model is fundamentally limited as it suggested that protein output always decreased as a function of length across the 5′-UTR. Counterexamples to this feature were found in our data set. Due to this shortcoming, GPD1 was not constructed and a more accurate model framework was developed to redesign multicloning sites. To address the observation that adding specific sets of nucleotides between the promoter and the start codon can yield increases in translational efficiency, a new model framework was developed incorporating two (or more) regions whose free energy of folding correlates with protein production (Figure 5). Such a model is grounded in the fundamental biology of the process. Successful initiation requires the presence of eIF4a, an ATP-dependent helicase, which unwinds mRNA in preparation for ribosome loading. In addition, scanning through a structured 5′-UTR requires ATP, though the enzyme responsible is unknown (15). Thus, the initiation complex can be modeled as a particle passing through several states (Figure 5), each separated by a free energy of folding, before reaching the start codon (see ‘Materials and Methods’ section). The models which best explained the available data (CYCModel1, TEFModel1 and GPDModel1) are shown in Table 2. It is important to note that in no model was the presence of mRNA structure beneficial for reporter expression. To validate these models, a second set of promoter-specific MCSs were generated: TEF2, CYC12 and GPD2, detailed in Supplementary Table S2. It is important to note that this design process was nontrivial due to the large number of sequence constraints which must be satisfied in addition to the requirement that the designed MCSs refrain from folding in a variety of genetic contexts, in contrast to attempts at structure minimization in other systems, for which the number of sequence constraints is relatively low and applicability is restricted to a specific gene construct (7). Furthermore, the promoters for which these MCSs are designed differ in transcriptional output by up to two orders of magnitude from one another, providing an excellent test of our framework's applicability in multiple transcriptional contexts. yECitrine was cloned at each restriction site for the three MCSs, and the fluorescence measurements are shown in Figure 6.

Figure 5.

Table 2.

Computational models of yECitrine fluorescence based on 5′-UTR structure

Name	Region 1	Region 2	Model	Correlation coefficient	Predicted residual sum of squares	Residual sum of squares
CYC1Model1	[−166, −45]	[−37, 6]	ln( f ) = 0.0986 × ΔG₁ + 0.1253 × ΔG₂ + 0.5004	0.7809	0.3016	0.01701
TEFModel1	[−137, −7]	[−6, −1]	ln( f ) = 0.1042 × ΔG₁ + 41.5185 × ΔG₂ − 0.6856	0.5922	1.5128	1.5226
GPDModel1	[−115, −98]	[−53, 19]	ln( f ) = 2.3378 × ΔG₁ + 0.1227 × ΔG₂ − 1.4524	0.8340	1.2294	0.02174
CYC1Model2	[−105, −95]	[−53, −5]	ln( f ) = 1.1331 × ΔG₁ + 0.0936 × ΔG₂ − 0.1545	0.8600	0.1904
TEFModel2	[−93, −87]	[−32, −8]	ln( f ) = 106.9974 × ΔG₁ + 0.3197 × ΔG₂ + 0.4363	0.9100	0.2278
GPDModel2	[−126, −99]	[−76, −4]	ln( f ) = 0.6411 × ΔG₁ + 0.1221 × ΔG₂ + 1.2860	0.9536	0.2264

Indicated regions are measured relative to the first nucleotide of the start codon. The correlation coefficient was computed for all data available at the time of model training. The PRESS was computed with the hat matrix after regression. The residual sum of squares was computed for CYCModel1, TEFModel1 and GPDModel1 with the natural log of the data from pCYC12xYFP, pTEF2xYFP and pGPD2xYFP, respectively.

Figure 6.

Performance of designed multicloning sites. Performance of (A) pGPD2xYFP, (B) pTEF2xYFP and (C) pCYC12xYFP are depicted. Three MCSs were designed with the aid of the models listed in Table 2 and inserted after GPD, TEF or CYC1, respectively. Data in (A) have been scaled to the fluorescence of pGPD01YFP, in (B) to pTEF01YFP and in (C) to pCYC101YFP. The scaling for each series within each graph are identical. Position on the MCS has been measured according to the unique restriction sites in the p416 vector. Error bars represent the standard deviation in fluorescence observed across three biological replicates. These MCSs had improved performance compared with pBLUESCRIPT SK.

Model of translation inhibition by secondary structure in the 5′-UTR. The pre-initiation complex (green) scans in the 3′-direction and is impeded by one or more regions of mRNA structure, decreasing the rate of translation initiation. To capture this effect, a model was created that allowed for two or more regions of secondary structure that can influence translational efficiency. Performance of designed multicloning sites. Performance of (A) pGPD2xYFP, (B) pTEF2xYFP and (C) pCYC12xYFP are depicted. Three MCSs were designed with the aid of the models listed in Table 2 and inserted after GPD, TEF or CYC1, respectively. Data in (A) have been scaled to the fluorescence of pGPD01YFP, in (B) to pTEF01YFP and in (C) to pCYC101YFP. The scaling for each series within each graph are identical. Position on the MCS has been measured according to the unique restriction sites in the p416 vector. Error bars represent the standard deviation in fluorescence observed across three biological replicates. These MCSs had improved performance compared with pBLUESCRIPT SK. Computational models of yECitrine fluorescence based on 5′-UTR structure Indicated regions are measured relative to the first nucleotide of the start codon. The correlation coefficient was computed for all data available at the time of model training. The PRESS was computed with the hat matrix after regression. The residual sum of squares was computed for CYCModel1, TEFModel1 and GPDModel1 with the natural log of the data from pCYC12xYFP, pTEF2xYFP and pGPD2xYFP, respectively. The redesigned MCS for the GPD promoter exhibited superior performance over the original, unoptimized MCS (Figure 6A). This new MCS, GPD2, shows negligible multicloning site inhibition for the first eight restriction sites, which coupled with high levels of yECitrine expression, makes this the ideal MCS for this strong promoter (Figure 6A). Furthermore, this trend was predicted by GPDModel1, lending support to the hypothesis that protein expression is influenced by secondary structure in a few key regions of the 5′-UTR (Supplementary Figure S3A). The excellent agreement between model and observation suggests that secondary structure may be the only significant translational rate-limiting step in protein expression for this extraordinarily strong promoter with a short, codon-optimized protein. In further extension of this approach, the TEF-promoter-specific MCS TEF2 shows improved performance over pBLUESCRIPT SK or TEF1, exhibiting similar or increased expression levels across the sites in the MCS (Figure 6B). Furthermore, the observed expression trend was predicted remarkably well by TEFModel1, showing that mRNA structure is also a major limiting factor in this promoter (Supplementary Figure S3B), albeit not as limiting as in the GPD promoter case. Applying this approach for a yet weaker promoter (CYC1), a new MCS, CYC12, was designed that provides better, more consistent performance across the first four restriction sites than CYC11 or pBLUESCRIPT SK (Figure 6C). However, CYC11 (Supplementary Figure S2B) provides better performance than CYC12 or pBLUESCRIPT SK when cloning after the fourth restriction site. The measured performance of CYC12 was well predicted by CYC1Model1, validating its predictive ability (Supplementary Figure S3C). Taken together, these results indicate that the expression-inhibiting effects of multicloning sites can be substantially mitigated in a variety of transcriptional contexts through minimization of 5′-UTR secondary structure. In addition, no designed MCS elicited a significant change in gene expression noise, indicating that these constructs are ideal for development of precisely controlled gene networks (Supplementary Figure S4). However, it should be noted that neither TEF2 nor CYC12 matched the outstanding performance of GPD2, either due to random errors in the modeling process or due to the manifestation of other rate-limiting steps in expression not accounted for in our simplistic structure-based model of expression. As TEF and CYC1 are both substantially weaker promoters than GPD, the presence of additional rate-limiting factors (possibly stemming at the transcriptional level) is not surprising. Finally, all data collected above was used to upgrade the weighting factors and relevant 5′-UTR regions in our models (Table 2). These upgraded models are expected to give researchers more accurate predictions of 5′-UTR structure-based inhibition of protein expression in yeast.

DISCUSSION

We have demonstrated that simplistic models of 5′-UTR RNA secondary structure can be used to predict and rationally design multicloning site performance. The approach defined here is novel and significant for several reasons: (i) most modeling and prediction efforts in this area have examined prokaryotic systems (especially for ribosome binding sites), whereas this work utilizes yeast, a eukaryotic system. The mechanics of eukaryotic translation are sufficiently different and require a novel mechanistic approach. (ii) Most prior studies evaluate the impact of 5′ hairpin loops and their inhibitory effect on translation, especially when sequestering the start AUG. In contrast, our work demonstrates that the observed translation inhibition by structure was highly dependent on the position of the secondary structure, and not always a set distance from the transcription initiation site. (iii) Most prior studies evaluate the impact of specific point mutations that can change secondary structure. No prior work has successfully predicted and achieved a global redesign of a genetic circuit of such widespread importance as a multicloning site. In contrast to prior studies, this method of prediction and optimization of 5′-UTR structure is valid in a general context, enabling significant increases in expression despite the implementation of a diverse set of promoters and restriction sites. This aspect of translation-level control seems to be most strongly pronounced when expressing short, codon-optimized gene products. Moreover, this effect exhibits a promoter-specific nature implying that individual components of gene expression cassettes cannot be designed in isolation. It is also important to note that this phenomenon is not a generic effect of 5′-UTR length, as indicated by (i) the significant increases in expression observed upon adding length to the 5′-UTR and (ii) the inability of one-part folding models to predict the behavior of TEF1 and CYC11. Although this effect was first experimentally characterized here for pBLUESCRIPT SK, it is expected that other MCSs will behave similarly in yeast and perhaps other eukaryotes. In particular, 5′-UTR based folding models predict that significant secondary structure issues can arise in other common MCSs such as the one present in pUC. As a result, it is important to understand and appreciate this impact especially when attempting to compare experiments or genes cloned into distinct sites. Optimization of 5′-UTR secondary structure, therefore, represents a facile and cost-effective way to increase protein expression and product titers in eukaryotic bioprocesses, especially when it is undesirable to change promoters. Designed MCSs were found to be superior to the multicloning site found in the commonly used pBLUESCRIPT SK plasmids, and in the case of GPD2 showed negligible activity reduction along the MCS. This experiment shows not only that MCSs have a significant effect on translation, but also that MCSs can be rationally engineered to mitigate this effect. Such a model-based optimization approach is unprecedented for this ubiquitous genetic component and highlights the importance of rational design in synthetic biology. It is expected that a similar approach can be undertaken for other eukaryotic expression vectors. Control of 5′-UTR secondary structure also represents an alternative to promoter engineering, allowing protein expression to be controllably weakened by up to an order of magnitude without altering the dynamics of its regulation. As demonstrated, optimization of 5′-UTR secondary structure is context specific, making the performance of each multicloning site highly dependent on the upstream promoter. It is not unreasonable to expect that the nucleotides of the open reading frame could also participate in translation-inhibiting secondary structure. Therefore, in cases where inhibition due to secondary structure is significant (i.e. in highly codon optimized genes), the assumption of interchangeability of promoter, MCS and gene becomes highly questionable. These results go against several of the tenets of synthetic biology, especially with respect to the assumption of completely interchangeable, non-interacting parts and are part of a growing body of work indicating the non-modularity of genetic components (39). Yet, as the cost of gene synthesis decreases, these results demonstrate that it is more desirable to create entire self-sustained transcriptional/translational units—from promoter to terminator. This paradigm is in contrast to the widespread assumption that two arbitrary sequences, when attached, will not generate translation-inhibiting structure. These results have significant implications beyond redesign of gene expression cassettes. Expression vectors with multiple cloning sites have seen widespread use across the field of functional genetics and basic cloning. Given the strong difference in performance across sites in the MCS, experiments and conclusions will be highly dependent on these sites. Therefore, conclusions about gene impact, function or activity as well as promoter strength analysis will depend highly on the cloning sites used. As a result, many conflicting results and conclusions may be attributed to this phenomenon. In conclusion, we have demonstrated the first performance-based analysis of multiple cloning sites in yeast systems. Following this, we have shown that a simplistic model of 5′-UTR secondary structure with two regions can predict this phenomenon when it is the most dominant determinant of protein translation. Under these conditions, we have for the first time successfully redesigned multiple cloning sites for function rather than simple convenience. It is anticipated that this work can be extended to other vectors and potentially to other organisms, both eukaryotic and prokaryotic. The capacity to design MCSs with consistent performance across cloning sites will greatly impact the ease and utility of recombinant cloning and genetic analysis.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Institutes of Health (grant number R01GM090221); the National Science Foundation (Graduate Research Fellowship to N.C.); University of Texas at Austin Institute for Cellular and Molecular Biology (Summer Undergraduate Research Fellowship to E.F.). The content is solely the responsibility of the authors and does not necessarily reflect the official views of the National Institute of General Medical Sciences or the National Institutes of Health. Funding for open access charge: National Institutes of Health (grant R01GM090221). Conflict of interest statement. None declared.

37 in total

1. Mfold web server for nucleic acid folding and hybridization prediction.

Authors: Michael Zuker
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

2. Vienna RNA secondary structure server.

Authors: Ivo L Hofacker
Journal: Nucleic Acids Res Date: 2003-07-01 Impact factor: 16.971

Review 3. The molecular mechanics of eukaryotic translation.

Authors: Lee D Kapp; Jon R Lorsch
Journal: Annu Rev Biochem Date: 2004 Impact factor: 23.643

4. Role of mRNA secondary structure in translational repression of the maize transcriptional activator Lc(1,2).

Authors: L Wang; S R Wessler
Journal: Plant Physiol Date: 2001-03 Impact factor: 8.340

5. Engineering of promoter replacement cassettes for fine-tuning of gene expression in Saccharomyces cerevisiae.

Authors: Elke Nevoigt; Jessica Kohnke; Curt R Fischer; Hal Alper; Ulf Stahl; Gregory Stephanopoulos
Journal: Appl Environ Microbiol Date: 2006-08 Impact factor: 4.792

6. Engineering yeast transcription machinery for improved ethanol tolerance and production.

Authors: Hal Alper; Joel Moxley; Elke Nevoigt; Gerald R Fink; Gregory Stephanopoulos
Journal: Science Date: 2006-12-08 Impact factor: 47.728

7. Translational regulation of the JunD messenger RNA.

Authors: John D Short; Curt M Pfarr
Journal: J Biol Chem Date: 2002-06-24 Impact factor: 5.157

8. Optimized cassettes for fluorescent protein tagging in Saccharomyces cerevisiae.

Authors: Mark A Sheff; Kurt S Thorn
Journal: Yeast Date: 2004-06 Impact factor: 3.239

9. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure.

Authors: David H Mathews; Matthew D Disney; Jessica L Childs; Susan J Schroeder; Michael Zuker; Douglas H Turner
Journal: Proc Natl Acad Sci U S A Date: 2004-05-03 Impact factor: 11.205

10. Folding free energies of 5'-UTRs impact post-transcriptional regulation on a genomic scale in yeast.

Authors: Markus Ringnér; Morten Krogh
Journal: PLoS Comput Biol Date: 2005-12-09 Impact factor: 4.475

10 in total

Review 1. Bricks and blueprints: methods and standards for DNA assembly.

Authors: Arturo Casini; Marko Storch; Geoffrey S Baldwin; Tom Ellis
Journal: Nat Rev Mol Cell Biol Date: 2015-06-17 Impact factor: 94.444

2. Functional optimization of gene clusters by combinatorial design and assembly.

Authors: Michael J Smanski; Swapnil Bhatia; Dehua Zhao; YongJin Park; Lauren B A Woodruff; Georgia Giannoukos; Dawn Ciulla; Michele Busby; Johnathan Calderon; Robert Nicol; D Benjamin Gordon; Douglas Densmore; Christopher A Voigt
Journal: Nat Biotechnol Date: 2014-11-24 Impact factor: 54.908

3. The activity of yeast Apn2 AP endonuclease at uracil-derived AP sites is dependent on the major carbon source.

Authors: Kasey Stokdyk; Alexandra Berroyer; Zacharia A Grami; Nayun Kim
Journal: Curr Genet Date: 2021-01-01 Impact factor: 3.886

4. Use of expression-enhancing terminators in Saccharomyces cerevisiae to increase mRNA half-life and improve gene expression control for metabolic engineering applications.

Authors: Kathleen A Curran; Ashty S Karim; Akash Gupta; Hal S Alper
Journal: Metab Eng Date: 2013-07-12 Impact factor: 9.783

5. Characterization of plasmid burden and copy number in Saccharomyces cerevisiae for optimization of metabolic engineering applications.

Authors: Ashty S Karim; Kathleen A Curran; Hal S Alper
Journal: FEMS Yeast Res Date: 2012-11-20 Impact factor: 2.796

6. Restriction site free cloning (RSFC) plasmid family for seamless, sequence independent cloning in Pichia pastoris.

Authors: Thomas Vogl; Mudassar Ahmad; Florian W Krainer; Helmut Schwab; Anton Glieder
Journal: Microb Cell Fact Date: 2015-07-14 Impact factor: 5.328

7. One-pot DNA construction for synthetic biology: the Modular Overlap-Directed Assembly with Linkers (MODAL) strategy.

Authors: Arturo Casini; James T MacDonald; Joachim De Jonghe; Georgia Christodoulou; Paul S Freemont; Geoff S Baldwin; Tom Ellis
Journal: Nucleic Acids Res Date: 2013-10-22 Impact factor: 16.971

8. Design of synthetic yeast promoters via tuning of nucleosome architecture.

Authors: Kathleen A Curran; Nathan C Crook; Ashty S Karim; Akash Gupta; Allison M Wagman; Hal S Alper
Journal: Nat Commun Date: 2014-05-27 Impact factor: 14.919

9. A seamless and iterative DNA assembly method named PS-Brick and its assisted metabolic engineering for threonine and 1-propanol production.

Authors: Shuwen Liu; Haihan Xiao; Fangfang Zhang; Zheng Lu; Yun Zhang; Aihua Deng; Zhongcai Li; Cui Yang; Tingyi Wen
Journal: Biotechnol Biofuels Date: 2019-07-15 Impact factor: 6.040

10. Small, synthetic, GC-rich mRNA stem-loop modules 5' proximal to the AUG start-codon predictably tune gene expression in yeast.

Authors: Erwin Lamping; Masakazu Niimi; Richard D Cannon
Journal: Microb Cell Fact Date: 2013-07-29 Impact factor: 5.328

10 in total