
Sort-Seq Approach to Engineering a Formaldehyde-Inducible Promoter for Dynamically Regulated Escherichia coli Growth on Methanol.

Julia Rohlhill¹, Nicholas R Sandoval², Eleftherios T Papoutsakis¹.

Abstract

Tight and tunable control of gene expression is a highly desirable goal in synthetic biology for constructing predictable gene circuits and achieving preferred phenotypes. Elucidating the sequence-function relationship of promoters is crucial for manipulating gene expression at the transcriptional level, particularly for inducible systems dependent on transcriptional regulators. Sort-seq methods employing fluorescence-activated cell sorting (FACS) and high-throughput sequencing allow for the quantitative analysis of sequence-function relationships in a robust and rapid way. Here we utilized a massively parallel sort-seq approach to analyze the formaldehyde-inducible Escherichia coli promoter (Pfrm) with single-nucleotide resolution. A library of mutated formaldehyde-inducible promoters was cloned upstream of gfp on a plasmid. The library was partitioned into bins via FACS on the basis of green fluorescent protein (GFP) expression level, and mutated promoters falling into each expression bin were identified with high-throughput sequencing. The resulting analysis identified two 19 base pair repressor binding sites, one upstream of the -35 RNA polymerase (RNAP) binding site and one overlapping with the -10 site, and assessed the relative importance of each position and base therein. Key mutations were identified for tuning expression levels and were used to engineer formaldehyde-inducible promoters with predictable activities. Engineered variants demonstrated up to 14-fold lower basal expression, 13-fold higher induced expression, and a 3.6-fold stronger response as indicated by relative dynamic range. Finally, an engineered formaldehyde-inducible promoter was employed to drive the expression of heterologous methanol assimilation genes and achieved increased biomass levels on methanol, a non-native substrate of E. coli.


Keywords: biosensor; promoter engineering; sort-seq; transcription factor binding; transcriptional fine-tuning

Year: 2017; PMID: 28463494; PMCID: PMC5569641; DOI: 10.1021/acssynbio.7b00114

Source DB: PubMed; Journal: ACS Synth Biol; ISSN: 2161-5063; Impact factor: 5.110

Precise control of gene expression is a necessity for designing predictable gene circuits in synthetic biology and increasing the yields of products encoded by biosynthetic pathways. Controlling the rate of transcription typically involves controlling the interactions between the RNA polymerase (RNAP), the promoter DNA, and any associated transcriptional regulators. Constitutive expression systems, while often used for heterologous protein production, fail to optimize expression levels of metabolic intermediates and thus often require the cell to carry high metabolic burdens.[1,2] Synthetic gene regulatory schemes frequently employ transcription factors such as activators and repressors to introduce various positive- and negative-feedback control mechanisms. Complex synthetic regulatory networks require orthogonal transcription factors with preferably modular DNA-binding and -sensing domains. When controlling the expression of measurable reporters, such as fluorescent proteins or antibiotic resistance markers, these transcription-factor-based biosensors have shown promise by increasing the efficiency of high-throughput screens and selections and by allowing real-time monitoring of intracellular metabolite concentrations in dynamic pathway regulation.[3,4] Dynamic regulation, which is widespread in natural systems, eliminates the need for expensive inducers and offers the potential for optimized schemes that minimize unnecessary metabolic burdens through autonomous pathway balancing.[4] Efforts to expand the synthetic biology toolbox have focused on characterizing a range of biosensors[5] and engineering existing regulators[6] to respond to new effectors.[7,8] Biosensor development could greatly benefit from additional small-molecule sensors and the elucidation of their corresponding operators. Protein–DNA binding interactions can be investigated and ultimately manipulated by quantifying the sequence–function relationship of promoter DNA. 
A method for elucidating sequence–function relationships employs fluorescence-activated cell sorting (FACS) and high-throughput sequencing and is generally called “sort-seq” or “FACS-seq”.[9] These sort-seq schemes begin with the generation of a library of mutated sequences for the regulatory element or protein of interest that is large and diverse enough to contain variants displaying a wide range of activities.[9] This library is linked to a genetic reporter and sorted into bins on the basis of fluorescence. Here, green fluorescent protein (GFP) expression levels represent the activity of the promoter library cloned upstream of the gfp gene (Figure 1). Subsequent sequencing of gated populations enables the use of various analysis methods to quantify the activities of hundreds of thousands of variants. One such technique, originating from information theory, allows the quantification of the relationship between two variables, here the base at each nucleotide position (sequence) and output expression level (function) as determined by discrete sorted bins.[10] This quantification is achieved by calculating the mutual information, that is, the dependence of the two random variables on each other:[11,12]

$$I(b_i; \mu) = \sum_{b_i} \sum_{\mu} f(b_i, \mu) \log_2 \frac{f(b_i, \mu)}{f(b_i)\, f(\mu)} - c$$

where b_i is the base at position i, μ is the activity bin, f(x, y) and f(x) represent the joint and marginal frequency distributions, respectively, and c is a correction factor.[12,13] If the bases at position i are independent of the resulting expression bin μ, that position is inconsequential to gene expression. Conversely, mutations with skewed distributions, occurring more frequently in low- or high-expression bins, identify vital nucleotide positions that play a deterministic role in the expression level and the resulting expression bin. While sort-seq approaches have been used to investigate regulatory sequences and proteins,[14] they have rarely been used in combination with mutual information techniques. 
Two papers of interest used the approach to analyze mammalian enhancers[11] (termed a massively parallel reporter assay (MPRA)) and CRP activator binding[12] to the prokaryotic lac promoter.
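
The mutual information calculation described above can be illustrated with a minimal sketch. It assumes the plain plug-in estimator computed from observed frequencies and deliberately omits the correction factor c, whose exact form is not given here. The function names (`mutual_information`, `information_footprint`) and the toy reads are hypothetical and are not taken from the authors' analysis pipeline.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Plug-in mutual information (in bits) between base and expression bin.

    `pairs` is a list of (base, bin) observations for ONE nucleotide position.
    Assumption: no finite-sample correction (the factor c) is applied.
    """
    n = len(pairs)
    joint = Counter(pairs)                      # counts of (base, bin)
    base_counts = Counter(b for b, _ in pairs)  # marginal counts of bases
    bin_counts = Counter(m for _, m in pairs)   # marginal counts of bins
    mi = 0.0
    for (base, mu), count in joint.items():
        f_joint = count / n
        f_base = base_counts[base] / n
        f_bin = bin_counts[mu] / n
        mi += f_joint * math.log2(f_joint / (f_base * f_bin))
    return mi


def information_footprint(reads, bins):
    """Mutual information at every position of equal-length promoter reads.

    `reads[k]` is a promoter sequence and `bins[k]` is the sorting bin it
    was recovered from (hypothetical input layout).
    """
    length = len(reads[0])
    return [
        mutual_information([(read[i], mu) for read, mu in zip(reads, bins)])
        for i in range(length)
    ]


# Toy example: position 0 fully determines the bin, position 1 never varies.
toy_reads = ["AC", "AC", "TC", "TC"]
toy_bins = [0, 0, 1, 1]
toy_footprint = information_footprint(toy_reads, toy_bins)
```

In this toy case the first position carries all of the information about the bin and the invariant second position carries none, which is the behavior an information footprint relies on to flag positions that matter for expression.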

Figure 1

Sort-seq experimental method. The promoter library was generated using error-prone polymerase chain reaction (PCR) and transformed into NEB5α and ΔfrmR strains. The resulting populations spanned a large range of GFP expression levels and were sorted into seven or eight bins using FACS. The sorted populations were tagged and the promoters sequenced, allowing for the identification of mutations leading to higher or lower expression levels. These mutations could then be used to generate inducible promoters with predictable and tunable responses.

Formaldehyde is a toxic compound but also a common cellular metabolite produced endogenously in all cells at low concentrations from various demethylation reactions.[15] Escherichia coli has a native formaldehyde-inducible promoter, Pfrm, that is found upstream of the frmRAB formaldehyde-detoxification operon. FrmR, the first product of the operon, is a member of the DUF156 family of DNA-binding transcriptional regulators.[16] It binds the frmRAB promoter region and is negatively allosterically modulated by formaldehyde.[16,17] FrmR is specific to formaldehyde, responding to acetaldehyde, methylglyoxal, and glyoxal to far lesser degrees and not at all to a range of other aldehydes and alcohols tested.[16,17] The genes frmA and frmB encode a formaldehyde dehydrogenase and S-formylglutathione hydrolase, respectively, and are responsible for detoxifying formaldehyde to formic acid in a glutathione-dependent pathway.[18] The negative-feedback regulation of the frmRAB operon is similar to that of many other prokaryotic operons, whereby the transcription factor represses its own transcription.[19] Characterizing Pfrm and the Pfrm–FrmR relationship adds another operator–regulator to the synthetic biology toolkit and offers further insight into protein–DNA molecular binding mechanisms. In addition to its ubiquitous role in all cells, formaldehyde is a central metabolic intermediate for methylotrophs. 
Synthetic methylotrophy, or the utilization of C1 compounds such as methanol as a carbon and energy source by non-native methylotrophs, has been pursued in earnest recently as methanol availability increases and its price declines.[20,21] Previous studies have shown labeling in E. coli glycolytic intermediates from 13C-methanol by heterologous expression of three enzymes from Bacillus methanolicus.[22] Methanol dehydrogenase (Mdh) converts methanol to formaldehyde, and hexulose-6-phosphate synthase (Hps) condenses formaldehyde with d-ribulose 5-phosphate to form hexulose 6-phosphate (Hu6P), after which phospho-3-hexuloisomerase (Phi) isomerizes Hu6P to fructose-6-phosphate (F6P) for entrance into central metabolism. Our group recently utilized a superior Mdh from Bacillus stearothermophilus(23) along with the B. methanolicus codon-optimized Hps and Phi enzymes to achieve E. coli growth on methanol with a small (1 g/L) yeast extract supplementation, demonstrating extensive 13C labeling from 13C-methanol into glycolytic and tricarboxylic acid intermediates and amino acids, as well as methanol conversion to the specialty chemical naringenin.[24] We have also demonstrated a strategy of scaffoldless enzyme assembly that can be used to achieve superior outcomes in synthetic methylotrophy.[25] Placing formaldehyde assimilation genes under the control of formaldehyde regulation emulates the native regulation of the methylotroph B. methanolicus, whereby hps and phi are transcriptionally induced by formaldehyde,[26] and results in autonomous pathway balancing. This dynamically regulated substrate utilization scheme is particularly beneficial considering the toxicity of formaldehyde, the metabolic burden of constitutively expressing the heterologous methanol assimilation genes at high levels, and the additional burden expected from future strain engineering for the production of valuable chemicals or secondary metabolites. Here we first characterize the native E. 
coli Pfrm response and regulation, identifying parameters for improvement. We investigate the influence of the Pfrm architecture on the strength of repressor binding by testing a set of Pfrm variants and isolate approximate FrmR-binding regions. We then describe the deconstruction and analysis of the Pfrm promoter using a sort-seq approach, obtaining, with single-nucleotide resolution, a map of the importance of each nucleotide position for expression, both with and without formaldehyde induction. The analysis of the resulting rich data set allowed us to identify mutations capable of changing promoter activity in a directed manner by manipulating repressor and RNAP binding interactions, and this information was used to design a set of formaldehyde-responsive promoters with tunable basal and induced expression levels. An engineered promoter was further used to implement and improve formaldehyde-controlled E. coli consumption of the non-native substrate methanol.

Results and Discussion

The E. coli Formaldehyde-Inducible Promoter Is an Ideal Candidate for Engineering

We aimed to construct a formaldehyde reporter plasmid and analyze the response of the native Pfrm. To construct the reporter plasmid, termed Pfrm–GFP–Plac–FrmR, Pfrm was cloned upstream of gfp, and the FrmR repressor was cloned under the control of the lac promoter to limit titration issues. Pfrm promoter activity was assayed by monitoring GFP expression via flow cytometry. The expression of the reporter plasmid was assayed in the NEB5α and ΔfrmR strains, representing two different regulatory systems (Figure 2a,b). In the NEB5α strain, FrmR levels were autoregulated by Pfrm on the E. coli chromosome in addition to being expressed from the reporter plasmid. In the ΔfrmR strain, FrmR was expressed only from the reporter plasmid.

Figure 2

Regulatory mechanisms and response of the E. coli formaldehyde-inducible promoter. Two configurations were used to investigate regulation. A Pfrm–GFP–Plac–FrmR reporter plasmid was used (a) with or (b) without chromosomal expression of FrmR under control of Pfrm. Representative time–response curves for the two configurations after induction with 0–500 μM formaldehyde are shown below their respective regulatory schemes. (c) Response curves fit to the Hill equation for the NEB5α strain with plasmid-expressed FrmR (blue circles) and the ΔfrmR strain with plasmid-expressed FrmR (green triangles). Numbers denote Hill coefficients. Error bars represent standard deviations of two replicates tested on different days.

Time–response curves for formaldehyde concentrations from 1 to 500 μM show maximum activity from 100 to 250 min and up to an 8-fold dynamic range, calculated as the ratio of induced activity to uninduced activity (Figure 2a,b). As expected, FrmR expression was higher in the NEB5α strain, as evidenced by the 3-fold lower GFP expression at time zero compared with the ΔfrmR strain. The response curves were modeled with the Hill equation,[27,28] which is used to characterize the induction response and cooperativity of the system. This is of particular interest here because of the tetrameric structure of FrmR.[17,29] The Hill equation is given by

$$P = P_{\min} + (P_{\max} - P_{\min}) \frac{[I]^n}{K^n + [I]^n}$$

where P is the promoter activity; Pmin and Pmax represent the basal promoter activity and the maximum promoter activity following induction, respectively; [I] is the formaldehyde inducer concentration; n is the Hill coefficient; and K is the apparent dissociation constant, which is related to the inducer concentration at which the promoter activity is half-maximal. The Hill coefficient indicates the cooperativity of the system (i.e., the positive or negative effect a single ligand-binding event has on subsequent events), with increasing values >1 corresponding to more sigmoidal response curves and higher cooperativity. 
The apparent Hill coefficient was 1.18 ± 0.13 when FrmR was present on the chromosome, indicating a noncooperative promoter–repressor–RNAP interaction when FrmR is both expressed from a plasmid and autoregulated by formaldehyde (Table S4). Without chromosomal FrmR expression, the apparent Hill coefficient was 0.46 ± 0.02 (Table S4). The response curve for the NEB5α strain without plasmid-expressed FrmR can be seen in Figure S2. Disruption of negative autoregulation typically increases the Hill coefficient, leading to a tighter sigmoidal response curve.[30,31] The opposite trend is seen here, with derepression continually increasing with higher formaldehyde concentrations, possibly as a result of the toxicity of formaldehyde and the interruption of autoregulation due to the high levels of plasmid-expressed FrmR. Compared with similarly characterized operator–regulator biosensors with dynamic ranges up to 210-fold,[5] the 5.5-fold range of the Pfrm–FrmR system at 100 μM formaldehyde induction has ample room for improvement. It is also highly sensitive, responding to dosed formaldehyde concentrations as low as 1 μM. This initial characterization suggests that the E. coli Pfrm is a strong candidate for engineering. Further promoter characterization through deconstruction and analysis of the Pfrm architecture can identify operator regions with high engineering potential.
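
To make the role of the Hill coefficient concrete, the sketch below evaluates a Hill-type response curve and the dynamic range (ratio of induced to uninduced activity). It assumes the commonly used form P = Pmin + (Pmax − Pmin)[I]^n / (K^n + [I]^n), in which K equals the half-maximal inducer concentration. Only the Hill coefficients 1.18 and 0.46 come from the text; the Pmin, Pmax, and K values are hypothetical placeholders, not fitted parameters from this study.

```python
def hill_response(inducer, p_min, p_max, k, n):
    """Promoter activity at a given inducer concentration.

    Assumed form: P = Pmin + (Pmax - Pmin) * I^n / (K^n + I^n).
    """
    return p_min + (p_max - p_min) * inducer**n / (k**n + inducer**n)


def dynamic_range(induced, uninduced):
    """Fold change between induced and uninduced promoter activity."""
    return induced / uninduced


# Hypothetical parameters (arbitrary fluorescence units, K in uM).
P_MIN, P_MAX, K_HALF = 1.0, 8.0, 100.0

# Compare the two apparent Hill coefficients reported in the text.
concentrations_um = [0.0, 1.0, 10.0, 100.0, 500.0]
curve_n_118 = [hill_response(c, P_MIN, P_MAX, K_HALF, 1.18) for c in concentrations_um]
curve_n_046 = [hill_response(c, P_MIN, P_MAX, K_HALF, 0.46) for c in concentrations_um]
```

With identical Pmin, Pmax, and K, the curve with n = 0.46 rises earlier at low inducer concentrations and approaches saturation more slowly than the curve with n = 1.18, which matches the qualitative description of continually increasing derepression.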

Inverted Repeats Are Central to Pfrm Architecture and Response

The most distinct feature of the E. coli Pfrm promoter is a 19 base pair (bp) perfect inverted repeat, originally hypothesized as a FrmR binding site (Figure 3).[32] Operator sequences often contain inverted repeats, which can form hairpin loops, an important structural feature for protein–DNA interactions.[33] A similarly regulated FrmR homologue was identified in Salmonella enterica under the control of a promoter lacking the large inverted repeat of the E. coli Pfrm.[29] A smaller incomplete 5′-ATAGTATA/TATAGTAT-3′ inverted repeat was noted within the Salmonella promoter, disruption of which was shown to ablate FrmR binding.[16] Here, Pfrm–GFP constructs were generated with different architectures to investigate each side of the large 19 bp inverted repeat, here termed site A and site B. Sites were replaced with scrambled and reverse-complemented sequences to test the importance of their presence and orientation for FrmR binding.
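
The notion of a perfect versus an incomplete inverted repeat can be checked mechanically: two half-sites form a perfect inverted repeat when the second half equals the reverse complement of the first. The following is a small sketch of that check; the helper names are hypothetical, and the example half-sites are the short repeats quoted in this article.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def inverted_repeat_mismatches(left, right):
    """Count positions where `right` differs from the reverse complement of `left`.

    Zero mismatches means the two half-sites form a perfect inverted repeat.
    """
    expected = reverse_complement(left)
    if len(expected) != len(right):
        raise ValueError("half-sites must have equal length")
    return sum(a != b for a, b in zip(expected, right))


# Four-nucleotide repeat discussed for E. coli: perfect.
ecoli_mismatches = inverted_repeat_mismatches("ATAC", "GTAT")

# Salmonella repeat described as incomplete: one position deviates.
salmonella_mismatches = inverted_repeat_mismatches("ATAGTATA", "TATAGTAT")
```

Counting mismatches rather than returning a yes/no answer makes it possible to describe imperfect repeats, and to see how a single point mutation can lengthen or shorten a repeat.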

Figure 3

Response of Pfrm binding site variants 3 h after dosing with 0 or 100 μM formaldehyde. The 19 bp inverted repeat is shown with two green arrows facing one another to represent their complementary relationship. These two FrmR binding sites were analyzed by removing or reverse-complementing the sites in tested constructs. The promoter was deleted to yield the negative control, and the positive construct is the native Pfrm sequence. Error bars represent standard deviations of two replicates tested on different days.

Replacing site A and site B individually with scrambled sequences resulted in constructs 1 and 3, respectively (Figure 3). Both constructs showed response to formaldehyde, supporting the hypothesis that each site is independently capable of binding FrmR. The presence of only one site leads to higher expression compared with the native two-site Pfrm, an effect that is amplified with 100 μM formaldehyde induction. The formaldehyde concentration of 100 μM was chosen to provide strong induction without inhibiting growth. Site B leads to greater repression compared with site A, as evidenced by the higher expression levels for construct 3 compared with construct 1. The enhanced effect of site B is likely due to its position, which is closer to the transcription start site and overlaps with the −10 site. Because of the overlap between FrmR binding site B and the −10 site, it was difficult to resolve expression differences due to FrmR binding from those due to RNAP binding. In order to investigate this, we shifted site B upstream, decreasing the space between sites A and B from 15 to 10 nucleotides and separating site B from the −10 region in constructs 6–8. The −10 site was optimized to the canonical “TATAAT” sequence to investigate the effect of FrmR binding with increased basal expression due to RNAP binding. FrmR was still capable of binding to the shifted site B, as shown by the formaldehyde responsiveness of construct 7, which lacks site A. 
Construct 6 included both fully intact sites and maintained low basal expression while more than doubling the induced expression with a 7.3-fold dynamic range, suggesting the ability to increase the dynamic range of the promoter by separating the manipulation of RNAP binding from the transcriptional repressor binding. Ablation of FrmR binding was achieved in constructs 4 and 5, as shown by the lack of formaldehyde response. The expression levels of constructs 4 and 5 should therefore represent only the effect of RNAP binding on transcription. Construct 4 exhibits 1.6-fold higher expression levels than construct 5, possibly because of an effect of the scrambled or reversed site A sequence on RNAP binding. Construct 4 utilizes scrambled sequences for both site A and site B, confirming their necessity for FrmR binding in E. coli. Construct 5 has a scrambled site B and a partial reverse complement for site A that was unable to recover binding. The partial reverse complement of site A cannot bind FrmR independently on the basis of construct 5, but it appears to cause stronger repression when site B is also present. Construct 2, with a reversed site A, shows lower basal and induced expression compared with construct 1, which had a scrambled site A. The analogous construct 8 with reversed site A also showed stronger basal and induced repression compared with construct 7 with scrambled site A. This indicates a relationship between the binding sites since the partial reverse complement of site A contributes to repression only when site B is present and not independently.

High-Diversity Library Generation Ensures Rich Information Output

We aimed to generate a high-diversity Pfrm library containing variants covering a wide range of basal and induced activities, as we cannot analyze the effect of mutations that are not represented. Promoter library construction requires careful consideration for sort-seq experiments to ensure enough diversity to create the variants of interest for downstream analysis. The 200 bp Pfrm promoter on the Pfrm–GFP–Plac–FrmR reporter plasmid was targeted for mutation with error-prone polymerase chain reaction (PCR) using primers flanking the region (Table S2). The variability present in the Pfrm libraries was assayed using flow cytometry and compared with that in the unmutated parent Pfrm–GFP–Plac–FrmR strain. Increasing mutation frequency in the promoter region leads to a wider fluorescence distribution and therefore a wider range of promoter activity. However, a critical threshold mutation frequency causes an increasing percentage of the population to lose function, usually because of an inability to initiate transcription resulting from an inability to bind RNAP. The goal was to use a highly diverse library containing millions of unique sequences while retaining function. In this case, the majority of the population should also retain formaldehyde responsiveness, since the difference in activities among the library clones is essential for identifying repressor binding sites with high precision and accuracy. The final Pfrm library was chosen after three successive rounds of error-prone PCR achieved a mutation frequency that maximized the spread of the expressing population while minimizing the relative size of the nonexpressing population. The final library had an average of 6.6 mutations per 200 bp of the promoter, with an average of 2.4 of those mutations located within the first 80 bases. 
It covered a wider range of GFP expression, as evidenced both visually (Figure S3) and by a 2.2-fold increase in the robust percent coefficient of variation (%rCV), a metric for the spread of the fluorescence distribution that is resistant to outlier effects (Table S5).[34] Importantly, the fluorescence distribution of the Pfrm library showed a geometric mean similar to that of the unmutated Pfrm, and therefore, its wider distribution was due not only to clones with lower expression but also to clones with higher GFP expression. Individual colonies were selected and assayed with flow cytometry to confirm the existence of promoter variants resulting in unique fluorescence distributions under both induced and uninduced conditions. Limiting constraints affecting the size and diversity of the promoter library similarly limit the information output from sort-seq analysis. Mutational bias was minimized by using a blend of polymerases (see Methods) instead of the Taq polymerase, which has a well-documented preference for mutating A’s and T’s.[35] Analysis of the final Pfrm library bias from sequencing results indicated a slight preference for mutating C’s and G’s, with C → T/G → A being the most common mutation at a mutation rate of 1.7% and A → C/T → G being the rarest at a rate of 0.15% (Figure S4 and Table S6). The per-position mutation frequency was 2.8% on average and ranged from 1% to 8% (Figure S5). Mutations at each nucleotide position along the length of the 200 bp Pfrm are therefore represented in the final library, and while mutation bias skews the depth of the library by variable representation of mutations, this skew is taken into account in the calculation of mutual information.
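
As an illustration of an outlier-resistant spread metric like the %rCV mentioned above, the sketch below uses one percentile-based definition that appears in flow cytometry software: half the distance between the 84.13th and 15.87th percentiles, divided by the median, times 100. Whether this is the exact definition behind Table S5 is an assumption, and the helper names and toy data are hypothetical.

```python
import statistics


def percentile(values, pct):
    """Percentile with linear interpolation between neighboring ranks."""
    ordered = sorted(values)
    rank = (pct / 100) * (len(ordered) - 1)
    low = int(rank)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (rank - low) * (ordered[high] - ordered[low])


def robust_cv_percent(values):
    """Robust percent CV (assumed definition, see lead-in).

    Uses the central ~68% of the distribution, so a few extreme events
    barely move the result.
    """
    half_width = (percentile(values, 84.13) - percentile(values, 15.87)) / 2
    return 100 * half_width / statistics.median(values)


# Toy fluorescence values: a narrow "parent" and a wider "library".
parent_like = list(range(45, 56))
library_like = list(range(1, 102))
fold_increase = robust_cv_percent(library_like) / robust_cv_percent(parent_like)
```

Because both toy populations have nearly the same median, the fold increase in this metric reflects a wider distribution rather than a shift in its center, which is the property the library comparison depends on.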

High-Resolution Binding Sites and Identification of Mutations of Interest from Sort-Seq Data

We aimed to identify nucleotide position targets for engineering and the FrmR binding region with high precision and accuracy through analysis of the sequencing data for each expression gate. Sequencing data were processed to calculate information footprints for each of three experiments with different levels of expressed and bound FrmR (Figures 4 and S6). The first two experiments involved the NEB5α strain under induced (100 μM formaldehyde) and uninduced conditions with FrmR expressed from both the plasmid and the chromosome (Figure 4), and the third experiment involved FrmR expression from the plasmid only in the ΔfrmR strain, presumably resulting in lower FrmR expression levels (Figure S6). Information footprints visualize the contribution of each nucleotide in the promoter sequence to GFP expression by analyzing the relationship between the mutations at a specific nucleotide position and the bins into which the mutated promoters were sorted.[12] Deleterious mutations reducing transcription would be consistently sorted into low-expression bins, identifying the corresponding nucleotide position as one with high “information content”, or high potential to affect gene expression. Similarly, high-information-content positions are identified from mutations at positions causing higher GFP expression and consistently falling into high-expression bins. High-information-content nucleotide positions within the promoter region are therefore ideal engineering targets for influencing output gene expression.

Figure 4

Information footprints of the E. coli Pfrm with different levels of FrmR binding in the NEB5α strain. The distributions of GFP expression from 10 000 cells are shown in the (a) uninduced and (c) induced Pfrm libraries. Information footprints for the (b) uninduced and (d) induced with 100 μM formaldehyde experiments illustrate significant nucleotide positions with and without FrmR bound. Yellow highlighting shows a large inverted repeat, while the two left and two right orange sites indicate smaller inverted repeats relevant to FrmR binding. Error bars indicate uncertainties inferred from subsampling.

Heat maps displaying the distribution of mutations across different sequencing bins reveal the effects of mutating any individual nucleotide with exceptional precision (Figure 5). While information footprints communicate the correlation between nucleotide position and gene expression, heat maps include more detailed information about the specific bases at each nucleotide position. Information footprints do not differentiate between positive or deleterious mutations, but heat maps visually display sequencing information by showing the distribution of A, T, C, and G at each nucleotide position for each expression gate. High-expression (up) mutations of interest were identified by analyzing mutations with a strong pattern of enrichment in high-expression sorting bins compared with the unsorted Pfrm library. Similarly, low-expression (down) mutations of interest had extremely low occurrence in high-expression bins and were highly enriched in low-expression bins. These trends confirm that the mutations of interest influenced the level of GFP expression and resulting sorting bin.

Figure 5

Identification of up and down mutations through sequence analysis. (a) Heat maps of the Pfrm sequence in each of eight sorting bins in the NEB5α strain under uninduced conditions. Mutated promoters isolated from low-GFP-expression bins are represented in the top heat maps, while those from high-GFP-expression bins are represented in the lower heat maps. Enrichment was calculated as the log2-fold change of each mutation relative to the unsorted promoter library, where red mutations are highly enriched and blue mutations are rare in each given bin. The native sequence is shown in black below the heat maps, and the identified up/down mutations are shown at their specified locations in green and red, respectively. (b) Enrichment profiles for three identified down mutations GCA → AGT from directly upstream of the −35 site are shown for each sorting bin. Down mutations are highly enriched (red) in lower-expression bins and rare (blue) in higher-expression bins. (c) Enrichment profiles for two identified up mutations AA → GG located far upstream. Up mutations are highly enriched (red) in high-expression bins and rare (blue) in low-expression bins.

Comparing the information footprints for induced and uninduced experiments yields a single-nucleotide-resolution binding site for the transcriptional repressor FrmR. Within the larger 19 bp inverted repeat, four-nucleotide inverted repeats can be identified (Figure 4). These 5′-ATAC/GTAT-3′ inverted repeats upstream of the −35 site and overlapping with the −10 site have lower information content in the induced state when less FrmR is bound to the region (Figure 4). Within site A, three positions exist with much higher information content in the uninduced state compared with the induced state. Two are within the four-nucleotide inverted repeats 5′-ATAC/GTAT-3′, and one is directly centered in the nine-nucleotide spacer between site A and site B. Site B shows a similar pattern complicated by the −10 site. 
The two 5′-ATAC/GTAT-3′ nucleotide positions show higher information content in the uninduced state versus the induced state. The 3′ side of the inverted repeat is entirely within the −10 site. The mutational distribution across expression bins also identifies secondary versions of the FrmR binding site, noting a three-nucleotide 5′-ATA/TAT-3′ inverted repeat and an imperfect 10-nucleotide 5′-ATATAGAATA/TATAGTATAT-3′ inverted repeat directly flanking the six-nucleotide G and C tracts in site A and site B, respectively (Figure S7). Sequences with mutations extending each of these four inverted-repeat structures were sorted into low-expression bins, indicating that the DNA hairpin loops adopted multiple possible conformations (Figure S7). The RNAP binding site, consisting of two six-nucleotide regions centered at the −35 and −10 positions, is easily identifiable in the information footprints. The Pfrm promoter uses the canonical “TTGACA” −35 site and the noncanonical “TAGTAT” −10 site, with the optimal 17-nucleotide spacer. An alternate “TATAGT” −10 site two nucleotides upstream was previously hypothesized[32] but is not supported by the low information content at those two positions. In agreement with the information footprints, the heat maps (Figure 5) show that mutations in the −35 and −10 regions occur frequently in the lowest-GFP-expression sorting bin. This effect is particularly true for the −35 region, which is the consensus sequence, but less so for the −10 region, where three “up” mutations can be identified. Using these detailed information footprints and binding site information enabled the specific engineering of Pfrm promoters with predictable activities.
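
The enrichment values behind the heat maps are described as the log2-fold change of each mutation in a sorted bin relative to the unsorted library. A minimal sketch of that calculation follows. The pseudocount, which keeps the logarithm defined when a mutation is absent from a bin, is an added assumption, and the read counts and function names are hypothetical.

```python
import math


def log2_enrichment(bin_count, bin_total, lib_count, lib_total, pseudocount=0.5):
    """log2-fold change of a mutation's frequency in one bin vs the unsorted library.

    Assumption: a small pseudocount is added to every term so that a
    mutation with zero reads in a bin does not produce log2(0).
    """
    bin_freq = (bin_count + pseudocount) / (bin_total + pseudocount)
    lib_freq = (lib_count + pseudocount) / (lib_total + pseudocount)
    return math.log2(bin_freq / lib_freq)


def enrichment_profile(counts_per_bin, totals_per_bin, lib_count, lib_total):
    """Enrichment of one mutation across all sorting bins, lowest bin first."""
    return [
        log2_enrichment(count, total, lib_count, lib_total)
        for count, total in zip(counts_per_bin, totals_per_bin)
    ]


# Hypothetical read counts for a "down" mutation across four bins:
# frequent in low-expression bins, nearly absent in high-expression bins.
down_profile = enrichment_profile(
    counts_per_bin=[80, 40, 5, 0],
    totals_per_bin=[1000, 1000, 1000, 1000],
    lib_count=20,
    lib_total=1000,
)
```

A profile that is positive in the lowest bins and negative in the highest bins corresponds to a down mutation; the mirror-image pattern corresponds to an up mutation.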

Informed Design of Tunable Formaldehyde-Inducible Promoters

Mutations identified during sequencing analysis were used in combination to generate variants capable of a wider range of basal and induced promoter activities. Twelve Pfrm variants were cloned using inverse PCR (see Methods and Table S3). Repression mutations were used for constructs 14 and 15, up mutations were used for construct 20, and combinations were used for other constructs. Site B down mutations, A → T at position −25 and C → T at position −20, extend the four-nucleotide inverted repeat to six nucleotides in constructs 14–17 and cause extremely low expression (Figure 6). The induced expression from only one of the four constructs is higher than the uninduced expression from the native Pfrm. The same two down mutations are present in constructs 22–25, but their effects are negated by three up mutations in the −10 region, which essentially scramble the inverted repeat within site B and cause much higher basal and induced expression levels.

Figure 6

Response of specifically constructed Pfrm variants 3 h after dosing with 0 or 100 μM formaldehyde. Variants were constructed with mutations for higher (green) or lower (red) expression levels. The promoter was deleted to yield the negative control, and the positive construct was the native Pfrm sequence. Error bars represent standard deviations of two replicates tested on different days.

Mutations that were highly enriched in high-expression bins were used in combination to create high-expression formaldehyde-responsive promoters. Construct 20 (Figure 6) features six up mutations, including three within the −10 region, and had 27-fold higher uninduced GFP expression compared with the native promoter. Construct 20 also retains formaldehyde responsiveness, with 2-fold higher GFP expression in response to 100 μM formaldehyde than the native Pfrm. Construct 24 similarly displays high expression levels with only two essentially nonfunctional down mutations. Increasing the site A inverted repeat from four to five nucleotides has a significant effect on repression, as seen from the 6-fold lower basal and 3-fold lower induced expression in construct 25 compared with construct 24. The designed constructs exhibited the expression levels expected on the basis of the lengths of site A and site B inverted repeats for down mutations or the disruption of sites for up mutations. Quantitative sequence activity models seek to predict the behavior of variants assuming that mutations make additive contributions to activity. These models fail to account for secondary structures in the DNA and sequence features that are particularly important for transcription factor binding. Individual down mutations may cause lower GFP expression, but in combination they silence each other. 
For example, a G → A mutation at position −39 increases the length of the site A inverted repeat from four to five nucleotides, as in constructs 14, 17, 18, 19, 20, and 25 (Figure ), while a T → C mutation at position −57 would similarly lengthen the site inverted repeat. However, when the two mutations occur together, the shorter four-nucleotide site A is maintained as in constructs 15, 19, and 23.
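The non-additive behavior described above can be illustrated with a minimal sketch. It assumes that the two mutated positions face each other across the inverted repeat, so that either substitution alone creates a complementary pair while both together do not. The sequence below is a hypothetical toy operator, not the Pfrm sequence, and the indices are 0-based string positions rather than promoter coordinates.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def arm_length(seq: str, left_inner: int, right_inner: int) -> int:
    """Count consecutive complementary pairs, walking outward from the
    innermost base of each arm of an inverted repeat."""
    i, j, length = left_inner, right_inner, 0
    while i >= 0 and j < len(seq) and COMPLEMENT[seq[i]] == seq[j]:
        length += 1
        i -= 1
        j += 1
    return length


def mutate(seq: str, index: int, ref: str, alt: str) -> str:
    """Return seq with one substitution, checking the reference base first."""
    if seq[index] != ref:
        raise ValueError(f"expected {ref} at index {index}, found {seq[index]}")
    return seq[:index] + alt + seq[index + 1:]


# Hypothetical toy operator (NOT Pfrm): a 4-nt inverted repeat (ATAC ... GTAT)
# flanked by a T (index 1) and a G (index 15) that cannot pair with each other.
NATIVE = "CTATACTTTTTGTATGC"
LEFT_INNER, RIGHT_INNER = 5, 11

right_only = mutate(NATIVE, 15, "G", "A")  # downstream G -> A pairs with the T
left_only = mutate(NATIVE, 1, "T", "C")    # upstream T -> C pairs with the G
both = mutate(right_only, 1, "T", "C")     # C and A cannot pair

for name, seq in [("native", NATIVE), ("right only", right_only),
                  ("left only", left_only), ("both", both)]:
    print(name, arm_length(seq, LEFT_INNER, RIGHT_INNER))
```

In this toy case each single substitution lengthens the repeat from four to five nucleotides, whereas the double mutant reverts to four, which an additive model of mutation effects would not capture.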

Application of the Engineered Formaldehyde-Responsive Promoter Enables Higher Methanol Growth

The E. coli Pfrm is uniquely suited to achieving dynamically regulated E. coli growth on methanol. Methanol is converted to the toxic intermediate formaldehyde by methanol dehydrogenase (Mdh) in the first step of assimilation, and therefore proper pathway balancing is vital to prevent the accumulation of formaldehyde in the cell and the associated growth inhibition. Here, with knowledge gained from our previous studies,[24] we pursue a strategy for autonomously sustainable synthetic E. coli methylotrophy using formaldehyde-inducible promoters. Pfrm and the high-expression Pfrm construct 20 (P20) were placed upstream of the methanol assimilation Mdh–Hps–Phi operon in a ΔfrmA Δpgi strain. The frmA gene, encoding formaldehyde dehydrogenase, was deleted to minimize the loss of formaldehyde to carbon dioxide. Formaldehyde dissimilation still occurs in the ΔfrmA strain and has been attributed to promiscuous aldehyde dehydrogenase activity, but it occurs at a much lower rate.[24] The phosphoglucose isomerase gene (pgi) was similarly deleted to force the fructose 6-phosphate (F6P) from methanol assimilation down glycolysis, minimizing the loss of carbon to carbon dioxide during the conversion of F6P to glucose 6-phosphate and eventually ribulose 5-phosphate through the oxidative pentose phosphate pathway. The Pfrm strain successfully consumed methanol and grew to a higher cell density when the medium was supplemented with 60 or 240 mM methanol (Figure 7a,b,d,e), demonstrating for the first time formaldehyde-induced synthetic methylotrophy. The yields on methanol, calculated as reported elsewhere[24] by assuming that methanol consumption accounted for the additional biomass in cultures supplemented with methanol, were similar for the Pfrm and P20 strains (Figure 7c,f), but the P20 strain achieved significantly higher biomass titers than the Pfrm strain with 60 or 240 mM methanol. We hypothesize that the higher formaldehyde-induced expression of key methanol assimilation genes in the P20 strain enables more efficient management and assimilation of toxic intracellular formaldehyde.
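The yield calculation described above can be sketched as follows. This is a minimal illustration rather than the authors' script: the OD-to-cell-dry-weight conversion factor is a hypothetical parameter that must be measured for the strain and instrument, and normalizing growth curves by multiplicative scaling is an assumption.

```python
METHANOL_MW_G_PER_MOL = 32.04  # molar mass of CH3OH


def normalize_growth_curve(ods, start_od=0.2):
    """Scale an OD time series so that its first point equals start_od.
    (Scaling rather than shifting is an assumption of this sketch.)"""
    if not ods or ods[0] <= 0:
        raise ValueError("need a positive starting OD")
    factor = start_od / ods[0]
    return [od * factor for od in ods]


def yield_on_methanol(od_with, od_without, methanol_consumed_mm,
                      gcdw_per_l_per_od):
    """Yield in gCDW per g methanol, attributing the extra biomass of the
    methanol-supplemented culture entirely to the methanol consumed.

    gcdw_per_l_per_od is a hypothetical calibration factor (gCDW/L per OD).
    """
    extra_biomass_g_per_l = (od_with - od_without) * gcdw_per_l_per_od
    methanol_g_per_l = methanol_consumed_mm / 1000.0 * METHANOL_MW_G_PER_MOL
    if methanol_g_per_l <= 0:
        raise ValueError("methanol consumption must be positive")
    return extra_biomass_g_per_l / methanol_g_per_l


# Illustrative numbers only, not data from this study.
example_yield = yield_on_methanol(od_with=1.0, od_without=0.8,
                                  methanol_consumed_mm=10.0,
                                  gcdw_per_l_per_od=0.35)
print(round(example_yield, 3))
```

Because the yield is a ratio of a small biomass difference to a small methanol consumption, its uncertainty is dominated by the replicate-to-replicate variation in both measurements.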

Figure 7

Growth and yield on methanol for Pfrm–Mdh–Hps–Phi and P20–Mdh–Hps–Phi plasmids in the ΔfrmA Δpgi E. coli strain. Strains were grown in M9 minimal medium supplemented with 1 g/L yeast extract with or without (a–c) 60 mM or (d–f) 240 mM methanol. (a, d) Growth curves normalized to a starting optical density (OD) of 0.2. Numbers denote millimolar methanol consumed for the dosed Pfrm and P20 strains. (b, e) OD at 24 and 48 h for the Pfrm and P20 strains with and without methanol dosing. (c, f) Yields on methanol for the Pfrm and P20 strains in grams of cell dry weight (CDW) per gram of methanol at 24 h. Error bars represent standard deviations (n = 4).

Conclusions

We have demonstrated the systematic and quantitative characterization, dissection, and analysis of the E. coli formaldehyde-inducible promoter with single-nucleotide resolution. We characterized the regulation and response of the native Pfrm and determined the general FrmR operator region using designed promoter variants. The sort-seq approach and analysis succeeded not only in confirming the FrmR binding site but also in quantifying the effect of each nucleotide on both expression and formaldehyde inducibility. Utilizing strategically placed up and down mutations, we were able to engineer promoters with a range of basal and induced expression levels in a predictable manner. Placing an engineered formaldehyde-responsive promoter with higher basal and induced expression levels upstream of the methanol assimilation genes achieved higher biomass titers than the native E. coli Pfrm, demonstrating not only formaldehyde-controlled synthetic methylotrophy but also its improvement through a sort-seq-guided engineering approach. The formaldehyde-inducible E. coli promoter is one of dozens of uncharacterized promoters regulated by simple inducible transcriptional regulators. The sort-seq method, analysis, engineering, and application described here can be applied to any transcriptional regulator–operator sequence to be used in synthetic biology or metabolic engineering applications, particularly for the characterization of additional biosensors for gene circuits and dynamic pathway regulation.

Methods

Strains, Plasmids, and Growth Media

E. coli strain NEB5α (New England Biolabs (NEB), Ipswich, MA) was used for plasmid cloning and maintenance. The ΔfrmR (JW0348-1) and ΔfrmA (JW0347-1) knockout strains were ordered from the Keio deletion collection.[36] The double-deletion ΔfrmA Δpgi strain was constructed by the method of Datsenko and Wanner[37] on the existing Keio ΔfrmA strain cured of its kanamycin resistance cassette. The genes on the P–Mdh–Hps–Phi plasmid were placed in an operon configuration under the trc promoter,[38] and the RBS calculator v2.0[39,40] was used to design synthetic ribosome binding sites for each gene. All of the strains and plasmids used can be found in Table S1. PCR primers were synthesized by Integrated DNA Technologies (IDT) (Coralville, IA) and can be found in Table S2.

For flow cytometry analysis, cells were grown in MOPS minimal medium[41] supplemented with 0.4% (w/v) xylose, carbenicillin (100 μg/mL), and isopropyl β-D-1-thiogalactopyranoside (IPTG) (0.1 mM). Minimal medium was chosen because it gave more defined response curves than rich medium. Cultures were inoculated from overnight cultures to an optical density (OD) of 0.05 in 5 mL of medium in 15 mL disposable culture tubes, incubated at 37 °C for 2–3 h, and dosed with 0–500 μM formaldehyde (MP Biomedicals, Santa Ana, CA).

Inverse PCR was used to create Pfrm variant constructs 1–8 and 14–25 by amplifying the plasmid backbones and incorporating promoter mutations on the amplification primers (Table S3). Variant plasmids were recircularized with the Q5 Site-Directed Mutagenesis Kit (NEB) and directly transformed into high-efficiency chemically competent NEB5α cells (NEB). Variant promoter sequences were confirmed with Sanger sequencing (UD Sequencing and Genotyping Center).

For methanol growth assays, cells were grown in M9 minimal medium[42] supplemented with 1 g/L yeast extract, carbenicillin (100 μg/mL), and 0, 60, or 240 mM methanol (Sigma-Aldrich, St. Louis, MO). Overnight cultures were pelleted, resuspended in M9, and used to inoculate 30 mL of fresh medium in 250 mL baffled flasks with rubber stoppers to an OD of approximately 0.2. Growth curves were normalized to a starting OD of 0.2. All of the E. coli cultures were grown at 37 °C with 250 rpm shaking.

Promoter Library Generation

Error-prone PCR targeting the 200 bp Pfrm (Figure S1) was performed using the GeneMorph II Random Mutagenesis Kit (Agilent, Santa Clara, CA). The resulting promoter library was purified with a PCR purification kit (QIAGEN, Germantown, MD) and twice used as a template to obtain higher mutation rates. The P–GFP–P–FrmR plasmid was amplified omitting the native Pfrm, treated with DpnI (NEB), and extracted from agarose with a gel extraction kit (QIAGEN). The purified promoter library with unmutated overhang sequences was inserted back into the plasmid backbone using the NEBuilder HiFi DNA Assembly Master Mix (NEB). Two 20 μL reactions were transformed into a total of 20 aliquots containing 50 μL of chemically competent NEB5α cells. Following a 1 h recovery in 250 μL of SOC medium (NEB), all of the transformations were combined into two screw-top 125 mL flasks with 30 mL of LB medium supplemented with 1% (w/v) xylose and 100 μg/mL ampicillin. Cultures were incubated at 37 °C for 8 h, sampled every hour for flow cytometry analysis, and stored frozen at −80 °C in 15% (v/v) glycerol. Plating on solid LB medium supplemented with 100 μg/mL carbenicillin after the initial 1 h recovery indicated a library size of approximately 11 million. For the creation of the ΔfrmR promoter library, 4 mL of NEB5α library frozen stocks were thawed and miniprepped (QIAGEN), and 1 μg was transformed into ΔfrmR electrocompetent cells. Following a 1 h recovery in 3 mL of SOC medium, the cells were transferred to 15 mL of LB medium supplemented with 100 μg/mL ampicillin for 3 h and stored at −80 °C.
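A library size of this kind is typically back-calculated from colony counts on dilution plates. The sketch below shows that arithmetic under stated assumptions: the colony count, dilution, and plated volume are illustrative values, not figures from this study, and the total recovery volume assumes that 20 transformations of 50 μL of cells plus 250 μL of SOC were pooled.

```python
def estimate_library_size(colonies, dilution_factor, plated_volume_ul,
                          total_volume_ul):
    """Estimate the number of independent transformants in a pooled recovery.

    colonies          colonies counted on one plate
    dilution_factor   fold dilution applied before plating (e.g. 1000)
    plated_volume_ul  volume of diluted culture spread on the plate
    total_volume_ul   total volume of the pooled, undiluted recovery
    """
    if plated_volume_ul <= 0 or total_volume_ul <= 0:
        raise ValueError("volumes must be positive")
    cfu_per_ul = colonies * dilution_factor / plated_volume_ul
    return cfu_per_ul * total_volume_ul


# Illustrative values only: 20 x (50 + 250) uL pooled = 6000 uL (assumed).
size = estimate_library_size(colonies=183, dilution_factor=1000,
                             plated_volume_ul=100, total_volume_ul=6000)
print(f"{size:,.0f} transformants")
```

Plating immediately after the recovery step, before outgrowth, matters here because later counts would include daughter cells and overestimate the number of independent clones.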

Flow Cytometry and Sorting

Cells were analyzed and sorted with a BD FACSAria IIu flow cytometer (Becton Dickinson (BD), Franklin Lakes, NJ). A blue solid-state laser (488 nm excitation) and a 530/30 nm filter were used to measure eGFP. FCS files were analyzed using Flowing Software v2.5.1 (Cell Imaging Core, Turku Centre for Biotechnology, Turku, Finland). For flow cytometry sampling, the geometric mean of the FITC-A fluorescence for 10 000 events was taken as the “promoter activity”. Prior to sorting, the cytometer was calibrated using Accudrop beads (BD) and SPHERO Rainbow calibration particles (Spherotech, Lake Forest, IL). On the day of sorting, 2–4 mL of library frozen stocks were thawed, centrifuged to remove excess glycerol, and used to inoculate 25 mL of MOPS minimal medium supplemented with 0.4% (w/v) xylose, 100 μg/mL ampicillin, and 0.1 mM IPTG. Cells were monitored hourly until they reached a postrecovery state (∼5 h), as indicated by 85–90% of cells expressing GFP. Cells were then dosed with 0 or 100 μM formaldehyde and sorted 2 h later (Figure S8). The final promoter library in the NEB5α and ΔfrmR strains was sorted into eight gates with approximately equivalent populations, and 1 000 000 events were collected from each gate directly into LB medium. Populations were recovered at 37 °C overnight and miniprepped for sequencing. Plating indicated that approximately 70% of the sorted cells survived.
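The two numerical steps in this section, summarizing a sample by its geometric mean and drawing eight gates with roughly equal populations, can be sketched in a few lines. This is an illustration on already exported fluorescence values, not the cytometer's gating software. It assumes strictly positive FITC-A values, and placing the gate boundaries at the octiles of the distribution is an assumption about how the gates were drawn.

```python
import bisect
import math
import statistics


def promoter_activity(fitc_values):
    """Geometric mean of per-event fluorescence (values must be > 0)."""
    if not fitc_values or min(fitc_values) <= 0:
        raise ValueError("geometric mean needs positive values")
    return math.exp(sum(math.log(v) for v in fitc_values) / len(fitc_values))


def equal_population_gates(fitc_values, n_gates=8):
    """Return n_gates - 1 boundaries that split the events into
    approximately equally populated gates (quantile cut points)."""
    return statistics.quantiles(fitc_values, n=n_gates)


def assign_gate(value, boundaries):
    """Index of the gate (0 = dimmest) that an event falls into."""
    return bisect.bisect_right(boundaries, value)


# Synthetic events for illustration only.
events = list(range(1, 801))
boundaries = equal_population_gates(events)
counts = [0] * 8
for v in events:
    counts[assign_gate(v, boundaries)] += 1
print(boundaries, counts)
```

The geometric mean is the natural summary for fluorescence data, which are usually closer to log-normal than normal, and quantile-based gates keep the number of sorted cells per bin comparable across the expression range.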

Next-Generation Sequencing and Analysis

Multiplexed sequencing libraries were constructed per the manufacturer’s instructions with a Nextera DNA Library Preparation Kit (Illumina, San Diego, CA). Pooled libraries were sequenced on a MiSeq desktop sequencer (Illumina) using paired-end sequencing with a read length of 2 × 201 bases at the University of Delaware DNA Sequencing and Genotyping Center. Reads within each experiment and sorted population were processed to remove redundant sequences and reads shorter than 201 nucleotides. The number of mismatches between each read and the native sequence (the Hamming distance) was calculated, and reads with more than approximately 17 mismatches were discarded. Over 3 million reads met all of the quality standards and were used for further analysis. Within each bin in an experiment, we calculated the frequency of each base at each position from the aligned reads and divided it by the total number of reads in the experiment to obtain the joint distribution f(b, μ) in eq . The marginal distributions, f(b) and f(μ), were calculated by summing the joint distribution along the appropriate dimension. The correction factor c(13) in eq was calculated as described previously[12] with the number of possible bases n equal to 4 and the number of bins nμ equal to 7 for the ΔfrmR experiment and 8 for the induced and uninduced NEB5α experiments. Sequence data are available at https://www.ncbi.nlm.nih.gov/bioproject/383844.
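The read filtering and per-position analysis described above can be sketched as follows. The equations themselves are not reproduced in this text, so the sketch assumes the mutual-information form commonly used for sort-seq information footprints, I = Σ f(b, μ) log2[f(b, μ) / (f(b) f(μ))] − c, with the finite-sample correction c = (n − 1)(nμ − 1) log2(e) / (2N) for N reads. Both the formula and the function names are assumptions of this illustration, not a transcription of the authors' code.

```python
import math
from collections import Counter

BASES = "ACGT"


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def filter_reads(reads, native, max_mismatches):
    """Keep full-length, non-redundant reads close enough to the native
    sequence (assumes reads are already trimmed and aligned to it)."""
    seen, kept = set(), []
    for read in reads:
        if len(read) != len(native) or read in seen:
            continue
        seen.add(read)
        if hamming(read, native) <= max_mismatches:
            kept.append(read)
    return kept


def information_at_position(bins, position):
    """Mutual information (bits) between base identity at one position and
    sort bin. `bins` holds one list of reads per bin.
    Returns (raw, corrected)."""
    total = sum(len(reads) for reads in bins)
    if total == 0:
        raise ValueError("no reads")
    joint = Counter()
    for mu, reads in enumerate(bins):
        for read in reads:
            joint[(read[position], mu)] += 1
    f_b, f_mu = Counter(), Counter()  # marginals: sum the joint distribution
    for (b, mu), count in joint.items():
        f_b[b] += count / total
        f_mu[mu] += count / total
    raw = sum((c / total) * math.log2((c / total) / (f_b[b] * f_mu[mu]))
              for (b, mu), c in joint.items())
    # Assumed finite-sample correction: (n - 1)(n_mu - 1) log2(e) / (2N).
    correction = ((len(BASES) - 1) * (len(bins) - 1)
                  * math.log2(math.e) / (2 * total))
    return raw, raw - correction
```

For example, `filter_reads(reads, native, max_mismatches=17)` would apply the mismatch threshold quoted above, and calling `information_at_position` for every position of the promoter yields an information footprint in which positions contacted by the repressor or RNAP stand out.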

