
Transcription control engineering and applications in synthetic biology.

Michael D Engstrom, Brian F Pfleger.

Abstract

In synthetic biology, researchers assemble biological components in new ways to produce systems with practical applications. One of these practical applications is control of the flow of genetic information (from nucleic acid to protein), a.k.a. gene regulation. Regulation is critical for optimizing protein (and therefore activity) levels and the subsequent levels of metabolites and other cellular properties. The central dogma of molecular biology posits that information flow commences with transcription, and accordingly, regulatory tools targeting transcription have received the most attention in synthetic biology. In this mini-review, we highlight many past successes and summarize the lessons learned in developing tools for controlling transcription. In particular, we focus on engineering studies where promoters and transcription terminators (cis-factors) were directly engineered and/or isolated from DNA libraries. We also review several well-characterized transcription regulators (trans-factors), giving examples of how cis- and trans-acting factors have been combined to create digital and analogue switches for regulating transcription in response to various signals. Last, we provide examples of how engineered transcription control systems have been used in metabolic engineering and more complicated genetic circuits. While most of our mini-review focuses on the well-characterized bacterium Escherichia coli, we also provide several examples of the use of transcription control engineering in non-model organisms. Similar approaches have been applied outside the bacterial kingdom indicating that the lessons learned from bacterial studies may be generalized for other organisms.

Keywords:  Bacteria; Metabolic engineering; Promoter; Synthetic biology; Termination; Transcription

Year:  2017        PMID: 29318198      PMCID: PMC5655343          DOI: 10.1016/j.synbio.2017.09.003

Source DB:  PubMed          Journal:  Synth Syst Biotechnol        ISSN: 2405-805X


Introduction

A central tenet of synthetic biology is the ability to assemble a practical system from biological components. In most cases, the components are proteins synthesized from genetic information encoded in DNA. One of the challenges faced by synthetic biologists is designing the appropriate genetic sequence to control protein expression at the level that optimizes system function. The central dogma of molecular biology (Fig. 1) describes the flow of genetic information from nucleic acid polymers (DNA and RNA) to amino acid polymers (proteins) [1]. Transcription is the process during which information encoded in DNA is converted to a transient carrier of information, mRNA. Translation is the process in which the mRNA code is converted to an amino acid polymer by ribosomes, themselves RNA catalysts (ribozymes) [2], [3], [4], [5]. The flow of genetic information terminates with the production of a protein, but often post-translational modifications and processes (see below) are needed to produce the final functional component. To optimize the abundance and other properties of the protein and the performance of the ultimate application, the rate and timing of each of these processes can be controlled by both cis-acting (within or adjacent to the protein coding sequence) and trans-acting (separate from the protein coding sequence) components.
Fig. 1

Overview of the central dogma of molecular biology. An operon encoding orfA and orfB is under the control of the indicated promoter (PKnown), which is engaged by the RNA polymerase holoenzyme (green and blue boxes, red circles, and black arcs). Sequences that will be transcribed into the ribosome binding sites (RBSs) for orfA and orfB (purple boxes) and the transcription terminator (TKnown; two inverted repeats [inverted arrowheads] flanked by a poly A-tract [red box] and poly U-tract [grey box]) are indicated. Red octagons represent translation stop codons. At the level of transcription control, alterations to the promoter and terminator sequences, the presence of transcription activators and repressors (depicted as golden ovals with green hexagon ligands bound), and nucleoid/supercoiling states may all influence orfA and orfB transcription (not shown). Through the process of transcription itself, the associated orfA and orfB mRNA is produced, which may in turn be bound by ribosomes (depicted as two variable size purple half circles) at the RBS. To terminate transcription, the depicted terminator sequence forms a stem-loop (blue), mediated by base pairing/stacking with the complementary sequences of the former inverted repeats (half arrowheads), and mRNA synthesis ceases at this site (poly A- and poly U-tracts are indicated as above). Modifying RBS strength, RNA riboswitches, aptazymes (aptamer ribozymes), and RNA stability may each contribute to translation control (not shown). Folding of a translated protein (green and orange circles) with the assistance of folding chaperones (not shown), enzymes that promote post-translation modifications such as phosphorylation (P in a white circle) or glycosylation (white hexagons), and the presence of proteolytic enzymes (indicated in the figure and potentially targeted to a protein through a specific motif [Tag]) represent methods of post-translation control.

Transcriptional control is a prevalent mode of regulating protein production because it is the first step in the process. 
Accordingly, it has received the most attention in terms of tool development and engineering applications. Transcriptional regulation allows a cell to allocate its valuable resources towards production of proteins when these proteins would increase fitness in a given environment. Furthermore, transcriptional regulation also provides a method for the cell to control the levels/stoichiometry of enzymes to avoid futile cycles, underutilized enzymes, build-up of undesired/toxic metabolites, and assembly of macromolecular structures. Over the last several decades, researchers have assembled methodologies for creating tools that enable fine-tuning of gene expression without the evolution time required to optimize natural systems. These tools have been applied to many organisms and heterologous systems. Indeed, fine tuning the levels of enzymes and other proteins has proven to be essential to many applications in metabolic engineering [6], [7], [8], [9], [10], [11], [12], [13], [14], [15] and common synthetic biology systems such as toggle switches and Boolean logic circuits [16], [17], [18], [19], [20], [21], [22], [23], cellular memory and calculation elements [24], [25], [26], [27], and regulatory oscillators [13], [20], [28]. Promoter and terminator libraries [12], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40] have enabled many of these applications. This mini-review will describe elements of transcriptional control and applications for engineered systems, but many control topics are beyond its scope. Although we will focus primarily on well-characterized examples of transcriptional control in model bacteria (e.g., Escherichia coli), we acknowledge that these transcriptional control methods are being increasingly extended to non-model organisms and eukaryotes. We are confident that the approaches developed for E. 
coli and other simple systems will lead to the creation of transcription tools for these organisms and find use in both metabolic engineering and synthetic biology applications. While this review will focus on transcription control, there are additional layers of control over protein expression at the post-transcriptional (e.g., riboswitches/toehold RNA switches [41], [42], [43], [44], RNA splicing/ribozymes [9], [45], [46], [47], and at the level of modifying ribosome binding site [RBS] strength [48], [49]) and post-translational levels (e.g., proteolytic activation/inactivation [50], [51], [52], [53], [54], [55], [56], glycosylation [57], [58], [59], [60], phosphopantetheine addition [61], [62], [63], phosphorylation [64], [65], [66], [67], [68], quaternary structure formation [69], [70], [71], [72], [73], [74], protease tags [75], and folding chaperones [76]). However, these are methods beyond the intended scope of this mini-review. Likewise, most genes in bacteria are arranged in operons [77], [78], [79], [80], [81]. In addition to understanding promoter strength and transcription termination (discussed here, but in a different context), engineering synthetic operons requires careful RBS design and positioning [48], [49]; an understanding of gene positioning, order, and orientation [78], [79], [80], [81]; an understanding of RNA stability and structure [82], [83], [84]; and an understanding of intergenic DNA and RNA sequences (related to the other operon engineering considerations) [29], [30], [85]. These too are all beyond the scope of this mini-review. Please note that transcription and translation are coupled in prokaryotes (i.e., simultaneously occurring in the same location). Intriguingly, once these processes are decoupled, transcripts become rapidly inactivated through such mechanisms as alterations to the RNA structure, RNA degradation, and forming RNA-DNA R-Loops [86], [87], [88], [89], [90], [91]. 
Furthermore, the bacterial nucleoid structure and DNA supercoiling play a substantial [92], [93], [94], [95], [96], [97], [98], but still poorly understood, role in transcription regulation. Nevertheless, transcription-translation decoupling, DNA structure, and supercoiling are all beyond the intended scope of this mini-review.

Prokaryotic promoters, transcription initiation, termination, and tools

A logical point to begin any discussion of transcription regulation is at the level of the machinery and DNA/RNA sequence elements that facilitate mRNA transcript production. In particular, the machinery referred to here includes bacterial RNA polymerase (RNAP) holoenzymes. The DNA/RNA sequence elements referred to above include promoters and terminators. Remarkably, the components of the RNAP holoenzyme, α2ββ′ωσ, are conserved among bacteria [74], [99], [100] and have similar characteristics to eukaryotic and viral RNA polymerases [70], [73], [99], [100]. Additionally, RNAP holoenzymes have different activities toward different promoters and terminators [16], [35], [36], [101], [102], [103], [104], [105], [106], [107], [108], which motivates promoter and terminator library development.

Elements of promoter recognition by RNA polymerases

Among the well-characterized prokaryotic promoters are those promoters recognized by σ70 (sigma-70) and other group 1 sigma factors. Indeed, group 1 sigma factors represent those factors that drive the expression of genes necessary for cell viability and other housekeeping activities [109], [110], [111]. In E. coli, σ70 is the principal sigma factor (σA is the principal sigma factor in other prokaryotes) [109], [110], [111]. Although there are many additional sigma factors that serve important functions under a variety of different cellular conditions [109], [110], [111], [112], [113], we will primarily focus on σ70- or σA-regulated promoters. σ70, in complex with RNAP, recognizes key promoter sequences during early events of transcription initiation (for additional descriptions of transcription initiation and the anatomy of RNAP holoenzyme-promoter interactions please see Fig. 2A, Fig. 2B, and references [70], [71], [72], [110], [111], [114], [115], [116], [117], [118], [119], [120], [121]). This sigma factor is further sub-divided into four regions; elements of sub-region 4 (sub-region 4.2) and 2 (sub-regions 2.3 and 2.4) are involved in recognizing the −35 and −10 regions of a typical σ70 promoter, respectively [70], [71], [72], [109], [110], [111], [114], [115], [120], [121]. The −35 sequence has the consensus sequence TTGACA, while the −10 sequence has the consensus sequence (TATAAT) [121], [122], [123]. An additional TGN sequence upstream of the −10 sequence (the extended −10 sequence) is present in some bacterial promoters and is recognized by the 3.0 sub-region of σ70 [70], [71], [72], [111], [114], [115], [120], [121]. The optimal spacing between the end of the −35 sequences and the beginning of the −10 sequences is 17-bp (base pairs), which is consistent with sub-regions 2, 3, and 4 positioning when σ70 is docked into the RNAP holoenzyme [70], [71], [72], [109], [110], [111], [114], [120], [121], [122], [123]. 
In addition, a 6- to 8-bp variable region (recognized by the 1.2 sub-region of σ70) immediately downstream of the −10 sequence (the discriminator) also serves a function during early transcription initiation [71], [72], [110], [111], [114], [117], [118], [119], [120], [121].
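
The promoter elements described above (the −35 and −10 consensus hexamers, the 17-bp optimal spacer, and the optional extended −10 TGN motif) can be summarized for a candidate sequence with a short script. The following is a minimal Python sketch for illustration only: the function names, the dictionary keys, and the idea of reporting mismatch counts are assumptions introduced here, not a method from the cited work, and a close match to consensus does not by itself predict promoter strength.

```python
# Consensus values are taken from the text; everything else is a hypothetical illustration.
MINUS_35_CONSENSUS = "TTGACA"
MINUS_10_CONSENSUS = "TATAAT"
OPTIMAL_SPACER_BP = 17


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def describe_promoter(seq: str, minus35_start: int, spacer_len: int) -> dict:
    """Compare a candidate promoter (non-template strand, 5'->3') with the
    sigma-70 consensus elements.

    minus35_start: 0-based index of the first base of the -35 hexamer.
    spacer_len: number of bp between the end of -35 and the start of -10.
    """
    seq = seq.upper()
    minus35 = seq[minus35_start:minus35_start + 6]
    minus10_start = minus35_start + 6 + spacer_len
    minus10 = seq[minus10_start:minus10_start + 6]
    if len(minus35) < 6 or len(minus10) < 6:
        raise ValueError("sequence too short for the requested coordinates")
    # Extended -10: a TGN triplet immediately upstream of the -10 hexamer.
    upstream_triplet = seq[minus10_start - 3:minus10_start]
    return {
        "minus35": minus35,
        "minus10": minus10,
        "minus35_mismatches": mismatches(minus35, MINUS_35_CONSENSUS),
        "minus10_mismatches": mismatches(minus10, MINUS_10_CONSENSUS),
        "spacer_deviation": spacer_len - OPTIMAL_SPACER_BP,
        "extended_minus10": upstream_triplet[:2] == "TG",
    }


if __name__ == "__main__":
    # Hypothetical test sequence: consensus hexamers, 17-bp spacer ending in TGC.
    candidate = "TTGACA" + "A" * 14 + "TGC" + "TATAAT" + "GGGGGGG"
    print(describe_promoter(candidate, minus35_start=0, spacer_len=17))
```

The coordinates are supplied by the caller rather than searched for, because locating promoter elements in unannotated DNA is a separate problem that this sketch does not attempt.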
Fig. 2

A representation of RNA polymerase holoenzyme-promoter interactions. (A) RNAP β and β′ subunits are indicated by green boxes. The RNAP α subunits (red circles and black arcs) are in contact with the bent upstream UP element. σ70 (dark blue box) sub-region 4.2 (pale green), 3.0 (pale orange), 2.3/2.4 (pale purple), 1.2 (pale pink), and 1.1 are associated with the indicated −35 sequence (pale green), extended −10 sequence (pale orange), −10 sequence (pale purple), discriminator (pale pink), and double stranded DNA, respectively. In this case, promoter elements are separated by 17-bp. (B) RNAP holoenzyme promoter binding is obstructed (red arrow and a red cross) by DNA bending mediated by nucleoid structuring factors or transcription factors (purple ovals, circles, and lines), other transcription regulators (golden ovals with green hexagon ligands bound) bound to operator sequences (gold box), spacing greater than 17-bp between the −35 and −10 sequences, and a −35 sequence deviating from the canonical −35 sequence (yellow and red bars). These figures are modified from the work by Browning and Busby [121].

Optimal spacing and function of promoter sequences reflects the position of individual components in the RNAP holoenzyme both during promoter recognition and subsequent transcription initiation events. Briefly, during the early stages of promoter recognition, the RNAP holoenzyme α2 subunits CTDs (carboxyl-terminal domains) bind to the promoter UP elements (an AT-rich sequence [114], [121], [124], [125]), the σ70 sub-region 4.2 (proximal to the α2 subunits CTDs) associates with the promoter −35 sequence, and sub-regions 2.3 and 2.4 associate with the −10 sequence [70], [71], [72], [109], [110], [111], [114], [115], [120], [121]. 
A linker region between σ70 sub-region 4.2 and sub-regions 2.3, 2.4, and 3.0 accounts for the spacing between the −35 and −10 sequences (i.e., the linker is in an extended conformation when σ70 is docked into the RNAP holoenzyme) [70], [71], [72], [109], [110], [111], [114], [120], [121]. The clamp and jaw formed by the β and β′ subunits of the RNAP holoenzyme, respectively, are in an open conformation, while promoter DNA at this point is still in a closed duplex (referred to as the closed complex) [71], [72], [114], [126]. After initial contact with a promoter, DNA is bent around the RNAP holoenzyme; first the upstream portion of DNA proximal to the α2 subunits CTDs is bent, followed by bending of the region immediately upstream of the −10 sequence through the −10 sequence itself [70], [71], [72], [114]. During this bending, σ70 sub-region 1.2 is in contact with the discriminator, base stacks in duplex DNA are disrupted in the AT-rich −10 sequence, the base at position −11 (usually an A) is flipped into σ70 sub-region 2.3, and the duplex DNA from the −11/−10 nucleotides through the transcription initiation site (+1; transcription usually initiates with a purine [122], [127]) and the adjacent bases (+2/+3) is eventually dehybridized, forming the open complex (this step is kinetically limiting) [70], [71], [72], [111], [114], [115], [117], [120], [121], [128]. A separation between the clamp and jaw of RNAP, where central Mg2+ ions are coordinated by amino acids found in the β′ subunit and water molecules are coordinated by amino acids found in both the β′ and β subunits [70], [71], [72], [74], [114], forms the active site cleft. The single-stranded transcriptional start site (on the template strand), typically seven or eight nucleotides downstream of the −10 sequence [70], [71], [72], [110], [114], [120], is positioned into the active site cleft, and isomerization events result in a tightening (closing) of the RNAP jaw and clamp [71], [72], [114], [126]. 
Intriguingly, recent work shows that the RNAP jaw and clamp may transiently close during initial promoter recognition and reopen before the final tightening above [126]. Discriminator sequence-RNAP holoenzyme contacts presumably assist in positioning the template strand (i.e., open DNA strands may be scrunched into this cleft with different DNA-RNAP holoenzyme interactions leading to transcription initiation sites different from the typical seven- or eight- nucleotides downstream of the −10 sequence noted above) [71], [72], [114], [116], [119], [120]. The discriminator also serves a function in mediating open complex stability and may influence length of the initially transcribed RNA polymer at the time of promoter escape. As noted above, there is no consensus sequence for the discriminator region [116], [118]. Therefore, it instead seems that interactions between open DNA and σ70 sub-region 1.2, the presence of G/(G/C) bases at positions −6/-5, interactions between DNA and σ70 sub-region 1.1, continued interactions between σ70 and the aforementioned promoter elements above, spacing of these promoter elements, and other interactions between open- and closed-form DNA and RNAP holoenzyme subunits all likely contribute to recognition of the discriminator sequence [70], [71], [72], [110], [111], [114], [115], [116], [117], [118], [119], [120], [121]. It has recently been demonstrated that open complexes at certain promoters have discriminator regions that mediate stable open complexes and increased length of abortive, and presumably initial escape transcripts, while other promoters with discriminators that mediate unstable open complexes have shorter length abortive and initial escape transcripts [116]. 
The reason for these differences is likely accounted for by weaker discriminator sequence interactions with the RNAP holoenzyme along with needing to prescrunch these sequences into the RNAP active site cleft to make any productive interactions [70], [116], [118], [119]. Thus, relieving stress buildup resulting from scrunching open complex DNA into the active site cleft may be a significant driver of RNAP promoter escape [116], [119], [129], [130].

Promoter libraries

RNAP holoenzyme activity can vary over a wide range depending on the sequence of a series of promoters [16], [36], [101], [102], [103], [104], [105]. This correlation has been used to tune control of gene expression at the level of promoter recognition and transcription initiation. Constitutive promoters under the control of σ70, in contrast to non-constitutive inducible and repressible promoters (see the Transcription Factors section below), usually direct comparable levels of gene expression independent of environmental and regulator context. Although libraries of characterized non-constitutive promoters exist [33], [34], [38], [39], [40] (http://parts.igem.org/Promoters/Catalog/Ecoli/Multiple, http://parts.igem.org/Promoters/Catalog/Ecoli/Repressible, http://parts.igem.org/Promoters/Catalog/Ecoli/Positive), this mini-review will primarily focus on constitutive promoter libraries (e.g., http://parts.igem.org/Promoters/Catalog/Ecoli/Constitutive). These constitutive promoters are often characterized as being weak (yielding low levels of reporter gene expression) or strong (yielding high levels of reporter gene expression). Although weak promoters often deviate from the −35 and −10 consensus sequences and typical element spacing (strong promoters closely match these consensus sequences and typical element spacing), considerable variability may still exist between promoters with identical or nearly identical −35 and −10 elements [32], [37], [101], [102], [105], [131]. In addition, interactions between the 5′ UTRs (untranslated regions) and identical promoters contribute to gene expression variability, suggesting that promoter context is as important as the −35 and −10 sequences [29], [30]. Promoter libraries are usually generated either through synthesizing putative constitutive promoter sequences [29], [33], [37], randomly mutating the promoter sequence [12], [34], or generating libraries of putative promoter inserts [29], [30], [36]. For instance, Jensen et al. 
[37] demonstrated that a synthetic promoter sequence library originally engineered for Lactococcus lactis (see below) and inserted upstream of lacZ (encoding β-galactosidase) could be used in E. coli to titrate lacZ expression over four orders of magnitude. This library was synthesized with changes primarily to regions flanking the −35 and −10 sequences, with a few −35 and −10 sequence variants. As one may expect, the strongest promoters in this collection have the −35 and −10 consensus sequences, along with optimal spacing. However, many of the weaker promoters in this collection also match these elements and some strong promoters do not. The reasons for these deviations remain unclear, but may result from the complex promoter-RNAP holoenzyme interactions, DNA supercoiling, and 3D DNA architecture noted in the sections above. Indeed, Mutalik et al. [30] developed an analysis of variance (ANOVA) tool to quantify transcription variability arising from different promoter and 5′ UTR couplings. In other work by Mutalik et al. [29], these researchers subsequently leveraged known and random promoters, coupled to 5′ UTR, monocistron, and bicistron variants, to obtain transcription and translation initiation constructs that promote reliable reporter gene expression over a 1000-fold dynamic range. In the work by Alper et al. [12], the PLTet-O1 promoter was randomly mutated and inserted upstream of a gfp (encoding green fluorescent protein) reporter gene, and a library of 22 promoter mutants was selected for further characterization based on high reporter expression across the entire population of E. coli harboring these constructs. This promoter library had a 196-fold range in GFP production among member constructs. The cat (chloramphenicol acetyltransferase) gene was also swapped for gfp in this promoter library, the minimum inhibitory concentrations (MICs) of chloramphenicol were measured, and these MICs were correlated with promoter strength. 
This swap not only demonstrates that gene expression from these library promoters was constitutive, but that these promoters were modular. Leveraging this modularity, Alper et al. inserted library promoters upstream of ppc (encoding phosphoenolpyruvate carboxylase) and dxs (encoding deoxy-xylulose-P synthase). Biomass yields and lycopene production were to varying degrees associated with promoter strength. The Anderson promoter library (http://parts.igem.org/Promoters/Catalog/Anderson) is a tool composed of 20 promoters for titrating gene expression over three different orders of magnitude. This library is composed primarily of −35 and −10 sequence variants. Finally, the Weizmann Institute of Science promoter library [36] features nearly 2000 unique members for titratable and dynamic expression of a gfp reporter. This library was engineered using random inserts of genomic DNA cloned upstream of a previously promoterless reporter gene, a technique often followed when promoter sequences or regulation profiles are unknown [36], [132], [133], [134], [135], [136], [137]. Analyzing transcriptome data for putative transcription start sites or conserved upstream sequences are additional methods to identify promoter elements [127], [138], [139], [140], [141]. Promoter libraries have been developed and implemented in non-model organisms. As alluded to above, Jensen et al. [37] synthesized an Lc. lactis library using the known consensus sequences of the −35 and −10 regions along with variable sequence adjacent motifs. This library yielded titratable lacZ expression over four different orders of magnitude in this organism. Unlike E. coli above, in terms of lacZ reporter expression, Lc. lactis is less tolerant to changes in the −35 and −10 sequences. However, as was the case with E. coli, the spacer region and additional promoter elements also contribute to titratable lacZ expression in Lc. lactis. As was the case for Lc. 
lactis [37], to engineer a Corynebacterium glutamicum library, Rytter et al. [33] took consensus sequences for the C. glutamicum −10 region (TGNGNTAYAATGG) and the E. coli −35 region, with and without an AT-rich region upstream, and primarily changed the spacer between these regions to titrate lacZ expression over three orders of magnitude. Finally, Markley et al. [34] used mutagenesis of a truncated version of the well-characterized Synechococcus elongatus strain PCC 7002 promoter PCpcB to engineer a library of 29 unique members to control expression of a yfp (encoding yellow fluorescent protein) reporter gene. This library yielded titratable expression of yfp over three orders of magnitude, with a mutation changing the original PCpcB −10 sequence (TATAAA) to the E. coli −10 consensus being responsible for much of the increased yfp expression of the top-performing library member. Fig. 3 provides a roadmap for engineering promoter libraries, which may be a useful guide for further extending promoter libraries in other non-model organisms.
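
The library ranges quoted in this section are given either as fold ranges (e.g., 196-fold or 1000-fold) or as orders of magnitude; the two are related by a base-10 logarithm. The short Python sketch below makes that conversion explicit. The function name and the example reporter values are hypothetical and are not data from any of the cited libraries.

```python
import math


def dynamic_range(levels):
    """Return (fold_range, orders_of_magnitude) for positive reporter levels.

    fold_range is max/min; orders_of_magnitude is log10 of that ratio.
    """
    lowest, highest = min(levels), max(levels)
    if lowest <= 0:
        raise ValueError("reporter levels must be positive")
    fold = highest / lowest
    return fold, math.log10(fold)


if __name__ == "__main__":
    # Hypothetical reporter measurements (arbitrary units) for a small library.
    example_levels = [2.0, 15.0, 120.0, 392.0]
    fold, orders = dynamic_range(example_levels)
    print(f"{fold:.0f}-fold range, about {orders:.1f} orders of magnitude")
```

By this measure a 196-fold range corresponds to a little over two orders of magnitude, and a 1000-fold range to three.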
Fig. 3

A procedure to generate promoter and terminator libraries. To test putative promoters as sites of transcription initiation, known promoters (PKnown) or random sequences (PTest) may be inserted upstream of a previously promoterless ribosome binding site (RBS)-reporter1 gene translational fusion (purple box with a green arrow). These promoters may be constitutive (shown) or subject to transcription regulation by activators and repressors (not shown). Pale green and pale purple boxes indicate the −35 and −10 promoter sequences, respectively. To test putative transcription terminators, known terminators (TKnown) or random sequences (TTest) may be inserted upstream of a promoterless RBS-reporter2 translational fusion (purple box with a red arrow) and downstream of the PKnown/Test-reporter1 transcription fusion above. Additional known or random terminators may be inserted in tandem, and a known terminator may be inserted downstream of reporter2 to ensure termination of transcription after RNAP reads-through this gene (not shown). Inverted white arrows represent the sequence forming a terminator stem-loop (inverted repeats), and red and grey boxes represent poly A- and poly U-tracts, respectively. To introduce variability in promoter and terminator libraries (thin yellow, orange, red, grey, and white bars) these cis-regulatory elements may be derived from randomly sheared DNA, synthesized on demand with desired sequences, subjected to mutagenic PCR (polymerase chain reaction), or other similar methods. These cis-regulatory element constructs may be introduced into an organism of interest, reporter gene expression may be assessed for each library member (green and red suns), and this procedure may be iterated (starting with new random/mutated sequences or using library member sequences as inputs for further sequence variation generation) to achieve a desired result or a wider expression variability profile. 
To identify sequence characteristics that may be responsible for reporter gene expression level differences (rows of green [gene expression from promoters] and red [gene expression from transcription read-through] suns) among library member sequences (indicated), these constructs should be sequenced and compared with other members or known promoter sequences (iterated as desired).
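
The sequence-variation step of the procedure in Fig. 3 can be mimicked in silico before any cloning is done. The following Python sketch generates a set of unique point-mutated variants of a parent promoter sequence, loosely analogous to mutagenic PCR. It is an illustration under stated assumptions: the function names, the per-base mutation rate, and the parent sequence are all hypothetical, and the reporter measurement itself (the green and red suns in the figure) remains an experimental step that this code does not model.

```python
import random

BASES = "ACGT"


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each base with a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def build_library(parent: str, size: int, rate: float = 0.05, seed: int = 0):
    """Return up to `size` unique variants of `parent` (upper-case DNA).

    The attempt limit guards against an endless loop when the rate is so
    low that new variants are rarely produced.
    """
    rng = random.Random(seed)
    variants = set()
    attempts = 0
    while len(variants) < size and attempts < 100 * size:
        candidate = mutate(parent, rate, rng)
        if candidate != parent:
            variants.add(candidate)
        attempts += 1
    return sorted(variants)


if __name__ == "__main__":
    # Hypothetical parent: consensus -35 and -10 hexamers with a 17-bp spacer.
    parent = "TTGACA" + "A" * 17 + "TATAAT"
    for variant in build_library(parent, size=5, rate=0.1, seed=1):
        print(variant)
```

Sequenced library members with measured reporter levels could then be used as new parents for another round, which corresponds to the iteration described in the caption.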


Transcription terminators

In E. coli, transcription primarily terminates either with the assistance of Rho helicase (Rho-dependent termination) or through innate elements of the RNA transcript itself (Rho-independent termination) [142], [143], [144]. Engineering termination elements is especially important for many transcriptional control systems to avoid polar effects associated with transcription. These polar effects may include transcription unit desegmentation (i.e., circumventing transcription regulation of downstream genes) [35], [145], [146], [147], [148], [149], [150], [151], [152], increased translation of upstream genes by prolonging transcription-translation coupling along with other alterations to translation mechanics (i.e., translational coupling) [91], [144], [150], [153], [154], [155], [156], potential arrest of downstream transcription through R-loop formation [144], [157], [158], [159], altered DNA supercoiling [160], and changes in plasmid copy number [161]. Multiple terminators present in a single cell must be designed such that homologous recombination will not delete or invert intervening sequences between these constructs, underscoring the need for a broad understanding of a variety of different terminators. As a hexamer complex, Rho associates with RNA, recognizes a C-rich and G-poor RNA rut (Rho utilization) sequence, allows RNA to be threaded through an open region in the Rho complex, and subsequently closes around the associated RNA [142], [143], [144]. 
Signals that are still poorly understood pause RNAP-mediated RNA elongation or promote RNAP backtracking (although nucleoid structure may contribute by perturbing RNAP elongation dynamics and promoting RNAP backtracking) [162], [163], and Rho, powered by ATP hydrolysis, proceeds to wrap RNA around the hexamer complex [142], [143] and catches up to this previously paused RNAP (i.e., resuming transcript elongation may be a requirement for Rho-mediated termination [143], [164]). Subsequently, Rho disrupts the RNA-DNA duplex inside of RNAP, and transcription terminates after dissociation of the RNA-DNA duplex and rehybridization of the DNA-DNA duplex [142], [143]. In contrast to Rho-dependent termination, Rho-independent termination events usually occur when an RNA stem-loop forms in the RNAP exit channel and a poly U-tract downstream of this stem-loop is dehybridized from the associated DNA poly A-tract [35], [142], [144], [165]. In particular, pausing at this poly U-tract allows the nucleation of a GC-rich stem [35], [142], [144]. An additional poly A-tract is sometimes present upstream of some terminator stem-loops [35], [142], [144], but the precise function of this sequence remains unclear (e.g., this sequence may be the poly U-tract for transcription directed from the opposite strand of DNA, promote extension of the terminator stem into the poly U-tract, etc.). Well-known Rho-independent transcription terminators include the TRrnB1 and TRrnB2 terminators [166], [167], [168], the TTE T7 phage terminator [167], [168], and the λ phage TO terminator [169], [170], [171]. To further characterize transcriptional termination and solve the issue of integrating multiple terminators in tandem without promoting homologous recombination, additional libraries of known and tested Rho-independent terminators have been developed (http://parts.igem.org/Terminators/Catalog) [35]. 
As was the case for promoter libraries above, RNAP also has a wide range of activities toward intrinsic terminators [35], [106], [107], [108].

Terminator libraries

In the work by Chen et al. [35], 582 natural and synthetic Rho-independent terminators were characterized and a biophysical model of terminator strength was developed. This group used an experimental setup where a single promoter, PBAD (see below), drove expression of a gfp reporter gene. Downstream of gfp, putative terminators were positioned to terminate expression 5′ of a promoterless rfp (red fluorescent protein) reporter gene (i.e., rfp will only be expressed if RNAP reads through these tested terminators). Terminator strengths were determined by taking the ratio of gfp to rfp expression levels, normalized to the same ratio for a terminator-less RNA molecule, and varied over three different orders of magnitude. As was previously known for intrinsic terminators, the presence of a poly A-tract upstream (5′) of a GC-rich terminator stem, a longer terminator stem (approximately 8-bp), a stable unpaired loop region, and a poly U-tract downstream (3′) of the stem were all characteristics associated with strong terminators studied by this group. Free energy changes associated with the formation of all of these features, thus, went into the design of a biophysical model for predicting terminator strength. In addition, through identifying a variety of strong terminators in their library and model, Chen et al. were able to predictably terminate transcription with reduced homologous recombination-mediated changes to these terminator constructs. Fig. 3, in addition to describing promoter library generation, describes building terminator libraries.
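The ratio-based strength measure described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic as stated in the text (gfp-to-rfp ratio for a tested terminator, normalized to the same ratio for a terminator-less control); the function name and the fluorescence values are hypothetical and are not taken from Chen et al. [35].

```python
def terminator_strength(gfp, rfp, gfp_control, rfp_control):
    """Return the gfp/rfp ratio of a tested terminator construct,
    normalized to the gfp/rfp ratio of a terminator-less control.

    All inputs are background-corrected fluorescence values (arbitrary units).
    Larger results indicate less read-through into rfp, i.e. a stronger terminator.
    """
    for value in (gfp, rfp, gfp_control, rfp_control):
        if value <= 0:
            raise ValueError("fluorescence values must be positive")
    return (gfp / rfp) / (gfp_control / rfp_control)


if __name__ == "__main__":
    # Hypothetical measurements: the control expresses both reporters equally,
    # while the tested terminator reduces rfp 100-fold.
    strength = terminator_strength(gfp=1000.0, rfp=10.0,
                                   gfp_control=1000.0, rfp_control=1000.0)
    print(f"terminator strength: {strength:.1f}")
```

Normalizing to the control removes differences in reporter brightness and promoter activity, so values from different constructs can be compared on one scale.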

Transcription control through transcription factor tools

As described above, promoters can often be classified as weak or strong. However, trans-acting factors, referred to as transcription factors, have the potential to invert these promoter strength metrics (i.e., a weak promoter could be transformed into a strong promoter, and a strong promoter into a weak one) (Fig. 4A). Factors that transform a weak promoter into a strong one are referred to as transcription activators; factors that transform a strong promoter into a weak one are referred to as transcription repressors. Promoters with titratable (linear/dynamic) activity toward these transcription factors are generally used for analogue-type computation (Fig. 4A and B) [25]. Promoters that strongly respond to transcription activators or repressors (often with feedforward and feedback loops), resulting in a step-change in promoter strength, are generally used for digital-type gene circuits [16], [18], [19], [25] (Fig. 4A and C). DNA binding by many of these transcription factors is affected by environmental conditions, such as the presence of small metabolites and other ligands. In this way, small molecules can be used as switches of transcriptional control. Of particular note, in terms of digital transcription control design, Nielsen et al. [23] have developed the automated system Cello (http://www.cellocad.org/). Cello takes a circuit specification written in the Verilog design language and converts it into DNA sequences, drawing on reliable and modular gene circuit components to automate gene circuit engineering. In the following sections, we highlight some of the more commonly used transcription factors and how they have been used to control gene expression by synthetic biologists.
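One common way to picture the difference between titratable (analogue) and step-change (digital) promoter responses is a Hill function, where a low Hill coefficient gives a graded response and a high coefficient gives a switch-like one. The sketch below is an illustrative assumption rather than a model taken from the studies cited above; the function names, parameter names, and values are hypothetical.

```python
def hill_activation(ligand, k_half, n, v_max=1.0, basal=0.0):
    """Promoter activity that rises with ligand concentration.

    k_half: ligand concentration giving half-maximal activity.
    n: Hill coefficient (n = 1 is graded; larger n is more switch-like).
    """
    occupancy = ligand ** n / (k_half ** n + ligand ** n)
    return basal + (v_max - basal) * occupancy


def hill_repression(ligand, k_half, n, v_max=1.0, basal=0.0):
    """Promoter activity that falls with ligand concentration."""
    free = k_half ** n / (k_half ** n + ligand ** n)
    return basal + (v_max - basal) * free


if __name__ == "__main__":
    # Compare a graded response (n = 1) with a switch-like response (n = 8)
    # over a range of hypothetical ligand concentrations (arbitrary units).
    for ligand in (0.25, 0.5, 1.0, 2.0, 4.0):
        graded = hill_activation(ligand, k_half=1.0, n=1)
        switch = hill_activation(ligand, k_half=1.0, n=8)
        print(f"ligand={ligand:5.2f}  graded={graded:.3f}  switch-like={switch:.3f}")
```

Both curves pass through half-maximal activity at the same ligand concentration; only the steepness around that point differs, which is the distinction drawn between the faint and dark curves of Fig. 4A.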
Fig. 4

An overview of transcription regulation following analogue and digital expression dynamics. (A) A hypothetical transcription activity dynamics plot is depicted. Faint green and red lines represent transcription activity (activation and repression, respectively) subject to linear/analogue dynamics. Dark green and red curves represent activation and repression subject to step-change/digital dynamics, respectively. Individual points represent one hypothetical measurement of transcription activity. Concentrations of hypothetical activator or repressor ligands increase from left to right on the horizontal axis, and transcription activity increases from the bottom to the top of the vertical axis. (B) An example of a system subject to linear/analogue dynamics. This model is modified from the work of Daniel et al. [25]. Activator (gold ovals) fused to Reporter 1 (blue suns) is produced from a gene expressed from a low copy plasmid and subject to a positive feedback (auto-activation) loop at PActivator (green arrow). Activator (bound to inducer) also binds target sites on a high copy number plasmid activating reporter2, again at PActivator, and simultaneously titrates Activator from the low copy plasmid above. Based on inducer concentration, this model allows for log-linear titration of Reporter 2 (green suns) levels [25]. (C) A digital AND logic gate (engineered from two NOR gates and a NOT gate) is symbolically represented along with the associated truth table. The depicted model is modified from the work of Tamsir et al. [18]. This circuit is distributed among three co-cultured cells (pale brown, blue, and purple circles), and the scenario depicted represents the ON output state, when Reporter (green suns) is produced (1 and a green box in the truth table). Both inducer 1 (brown diamond) and inducer 2 (blue pentagon) are present (1 and a green box in the truth table) and bound to Regulators 1 and 2, respectively. 
These regulators bind PInducer1 and PInducer2, respectively, activating repressor (green arrow). Repressor is subsequently produced and represses inducer3 (red crossbar), the gene encoding the producer of inducer 3 (purple hexagon), at PRepress. Inducer 3 is not secreted (red arrow with a cross), and Regulator 3 (without inducer 3 bound) binds PInducer3 repressing repressor. Subsequently, Repressor does not repress reporter, at PRepress, leading to reporter expression and Reporter production. In the absence of either or both inducers 1 and 2 (0 and red boxes in the truth table) inducer3 is not repressed, which leads to inducer 3 secretion. Regulator 3 (with inducer 3 bound) activates repressor, at PRepress. Repressor subsequently represses reporter, blocking Reporter production (0 and a red box in the truth table).


CI and Cro

Together, CI and Cro are transcription factors that naturally function as a toggle switch and have been utilized in a variety of synthetic regulatory systems. In nature, Cro binds PRM and represses cI, and CI binds PR and represses cro [172], [173], [174], [175], [176]. This toggle switch controls the fate of a cell infected with bacteriophage λ. If the Cro binding state predominates, a lytic infection with this phage occurs, but if the CI state predominates, the phage may enter the lysogenic cycle [172], [173], [174], [175], [176]. The Cro and CI binding sites have the same consensus sequence (TATCACCGGCGGTGATA), but each repressor has different affinities for variants of this sequence, explaining the switch-like behavior [172], [173], [174], [175], [176], [177], [178]. The Cro and CI binding sites overlap the respective promoters of each repressor, which suggests that each blocks transcription by sterically interfering with RNAP holoenzyme promoter binding (potentially through DNA looping) [115], [121], [173], [174], [175], [179]. In synthetic biology, CI was one of three repressors tested in early toggle switch development by Gardner et al. [16]. Gardner et al. concluded that the CI857 temperature-sensitive variant (functional as a repressor at 30 °C, but not at 42 °C [180]) is an extremely efficient repressor at PLS1con. Kotula et al. [24] constructed a Cro and CI memory element based on the natural PR and PRM promoters. This group used a stable CI-ON state (resulting in lacZ reporter gene repression) coupled to TetR (see below) repression of cro. Adding aTc (anhydrotetracycline) derepressed cro, which in turn repressed cI and generated a stable Cro-ON state (resulting in lacZ reporter gene expression). Using a mouse model, where mice were treated with a strain of E. coli harboring this memory element and with or without aTc, Kotula et al. also demonstrated that this memory element is a useful in vivo reporter system. Finally, Tamsir et al. [18] used CI, CAP (catabolite activator protein), AraC, and TetR (see below) during their construction of all 16 two-input Boolean logic gates.
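The bistable behavior of two mutually repressing regulators, such as CI and Cro, can be illustrated with a dimensionless two-repressor model of the kind commonly used to describe synthetic toggle switches. The sketch below is an assumption-laden illustration, not a fitted model of the λ switch: u and v stand for the two repressor levels, and the synthesis rates, cooperativity exponents, time step, and starting points are all hypothetical values chosen to show two stable states.

```python
def simulate_toggle(u0, v0, alpha1=10.0, alpha2=10.0, beta=2.0, gamma=2.0,
                    dt=0.01, steps=5000):
    """Integrate a two-repressor toggle model with the forward Euler method.

    du/dt = alpha1 / (1 + v**beta)  - u   (u is repressed by v)
    dv/dt = alpha2 / (1 + u**gamma) - v   (v is repressed by u)

    Returns the (u, v) levels after `steps` time steps of size `dt`.
    """
    u, v = float(u0), float(v0)
    for _ in range(steps):
        du = alpha1 / (1.0 + v ** beta) - u
        dv = alpha2 / (1.0 + u ** gamma) - v
        u += dt * du
        v += dt * dv
    return u, v


if __name__ == "__main__":
    # The same circuit settles into different states depending on its history.
    print("start u-high:", simulate_toggle(u0=5.0, v0=0.1))
    print("start v-high:", simulate_toggle(u0=0.1, v0=5.0))
```

Because each repressor shuts off synthesis of the other, whichever regulator starts ahead stays ahead, which is the memory property exploited by the Cro and CI memory element described above.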

CAP (catabolite activator protein)

CAP serves a key regulatory function in catabolite repression and has been leveraged in synthetic systems to control gene expression. When CAP binds cAMP (cyclic AMP) it has a higher affinity for its cognate binding sites (AATGTGATCTAGATCACATTT), which are often upstream of operons related to carbohydrate metabolism [181], [182], [183], [184]. Indeed, the αCTD subunits of the RNAP holoenzyme make contacts with CAP, which improves promoter recognition by the holoenzyme [115], [181], [185], [186]. In terms of catabolite repression, adenylyl cyclase activity and cAMP synthesis are modulated by the availability of glucose in the extracellular milieu [184]. When glucose levels are high, adenylyl cyclase activity is low, and when glucose levels are low, adenylyl cyclase activity is high [184]. Synthetic systems based on promoters that are subject to catabolite repression (e.g., PBAD [found on vector pBAD and others], PLacUV5 expression systems, etc.) are often used when strong negative regulation of genes is desired (i.e., for auto-induction systems that will only produce desired proteins when glucose is depleted from the extracellular milieu) [187], [188]. Daniel et al. [25], additionally, used PBAD to engineer transcription systems capable of performing analogue computation. Other components of these transcription control systems include LacI, AraC, and LuxR (see below).

LacI

Describing LacI-mediated repression of the E. coli lac operon represents one of the earliest models of gene regulation [77], [189]. Naturally, LacI dimers bind the operator sequence (TGGAATTGTGAGCGGATAACAATT) in the absence of the small molecule allolactose (or isopropyl β-d-1-thiogalactopyranoside [IPTG]) [77], [189], [190], [191], [192], [193], [194]. Occupancy at these operator sequences obstructs RNAP holoenzyme binding to PLac (a weak promoter without additional CAP-mediated activation) [121], [189], [192], [195], [196], [197]. In addition, LacI-mediated DNA looping (potentially assisted by nucleoid structuring factors and DNA supercoiling) may obstruct RNAP holoenzyme-PLac association [115], [121], [179], [189], [195], [196], [198]. When present, allolactose or IPTG occupies ligand-binding pockets on LacI, which results in a conformational change of this regulator and reduces its affinity for operator binding sequences [77], [189], [191], [193], [194]. Numerous regulation tools (noted above and below) have been based on LacI-mediated repression, which has been used to engineer a variety of genetic circuits. To explore intrinsic and extrinsic noise in gene expression, Elowitz et al. [199] used LacI-mediated repression of reporter genes yfp and cfp (encoding the yellow and cyan fluorescent proteins, respectively) placed nearly equidistant from the origin of replication on opposite sides of the E. coli chromosome. This group identified noisy reporter gene expression when LacI repressed these reporter genes. However, they found that varying levels of IPTG affected intrinsic and extrinsic reporter gene noise differently. Increasing IPTG levels decreased intrinsic noise, whereas extrinsic noise initially increased and then decreased. LacI was used, in addition to the CI (see above) and TetR (see below) repressors, to construct the toggle switches used by Gardner et al. [16]. 
In this work, LacI repressed cI or tetR expression in addition to a gfp reporter gene expressed from PTrc-2 (a strong synthetic promoter, containing a single LacI operator sequence, generated by fusing PTrp and PLac). This approximates a step-change in reporter expression, where derepression by addition of IPTG is supplemented by cI expression and subsequent CI-mediated lacI repression at PLS1con. Earlier work has established that LacI-mediated regulation is subject to the all-or-nothing phenomenon (a step-change as opposed to a titration in gene expression) [200], [201]. Indeed, suboptimal induction with IPTG leads to a heterogeneous response at the single cell level (i.e., some cells express high levels of lacZ while others do not), but at the cell population level it may appear as though the cells have uniform expression levels, or levels similar to those of a fully induced system [201], [202]. Nevertheless, other groups have used systems where it was possible to titrate IPTG-mediated derepression [25], [200], [202]. For example, titratable expression is sometimes achieved when lacY (the lactose and IPTG importer) is deleted from the background of the E. coli host strain [200], [202].
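The way a population-level measurement can hide an all-or-nothing response is easy to show with simple arithmetic. In the sketch below, a mixed population of fully ON and fully OFF cells is compared with a uniform population in which every cell sits at an intermediate level; the cell counts and expression values are hypothetical and chosen only for illustration.

```python
from statistics import fmean


def count_cells_near(cells, level, tolerance):
    """Count cells whose expression lies within `tolerance` of `level`."""
    return sum(1 for c in cells if abs(c - level) <= tolerance)


if __name__ == "__main__":
    # Hypothetical single-cell reporter levels (arbitrary units).
    # All-or-nothing population: 40 cells fully ON, 60 cells OFF.
    bimodal = [100.0] * 40 + [1.0] * 60
    # Titratable population: every cell at the same intermediate level.
    uniform = [40.6] * 100

    mean_bimodal = fmean(bimodal)
    mean_uniform = fmean(uniform)
    print(f"bulk mean, bimodal population: {mean_bimodal:.1f}")
    print(f"bulk mean, uniform population: {mean_uniform:.1f}")

    # The bulk means agree, yet no cell in the bimodal population
    # actually expresses the reporter at the intermediate level.
    print("bimodal cells near the mean:",
          count_cells_near(bimodal, mean_bimodal, tolerance=5.0))
    print("uniform cells near the mean:",
          count_cells_near(uniform, mean_uniform, tolerance=5.0))
```

Distinguishing the two cases therefore requires single-cell measurements (for example, flow cytometry or microscopy) rather than a bulk readout.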

AraC

AraC-mediated regulation of the ara operon is another well-characterized system of transcriptional control. In nature, AraC is both a repressor and activator depending on l-arabinose-mediated isomerizations of this regulator [203], [204]. Indeed, when l-arabinose is absent in the extracellular milieu, AraC dimers bind promoter proximal and upstream distal sites in an outstretched configuration, which bends/loops DNA and obstructs RNAP holoenzyme promoter binding [115], [121], [204]. AraC recognizes an AT-rich sequence (usually 6-bp long) flanked by the sequence TRTGGAC on the upstream side and GCTA on the downstream side [203]. In the presence of l-arabinose, AraC binds this metabolite and isomerizes to a collapsed configuration [204]; AraC and CAP (PAra is a weak promoter without CAP activation) occupy binding sites proximal to PAra and PAraC and induce RNAP holoenzyme promoter binding and open complex formation [204]. This may be achieved by locally bending DNA such that the αCTDs of the RNAP holoenzyme can still contact CAP despite the presence of AraC dimers between these two elements [115], [204]. AraC has been mobilized to engineer a variety of systems under the transcriptional control of this regulator. Many of these engineered systems promote both digital/step-change dynamics [18], [23] and analogue/linear dynamics [25], [205]. It is intriguing to note, however, that for AraC-mediated induction at subsaturating levels of l-arabinose (intermediate induction) a dichotomy exists at the single cell level [205], [206]. Thus, a subset of cells in the population express a gene of interest from PBAD, while the remaining cells do not express this gene of interest. This gives the false impression that all cells in the population express the gene of interest at the same intermediate level [205], [206]. As was the case for LacI above, however, systems can be engineered where AraC-mediated regulation is titratable. For example, titratable expression may be achieved when arabinose importers (e.g., araE) are constitutively expressed (natively, expression is linked to the presence of arabinose) from the E. coli chromosome [205].

TetR

TetR-mediated negative regulation of PTet is another model of tight E. coli gene repression. In its natural context, TetR binds as a dimer to the operator sequence (TCTATCATTGATAGG), which in turn obstructs RNAP holoenzyme-promoter interactions [115], [207], [208], [209]. In the presence of tetracycline (or the analogue aTc) this small molecule binds a tetracycline binding pocket in each dimer, which in turn causes a conformational change in this regulator and reduces its affinity for the above operator sequence [103], [207], [208], [209], [210]. Regulation by TetR at cognate promoters is incredibly tight, with as high as a 5000-fold difference between aTc induced and uninduced expression from some promoters [103]. TetR-mediated regulation has been leveraged to engineer synthetic regulatory elements. When constructing all two-input Boolean logic gates using site-specific recombinases to invert promoters or terminators controlling reporter gene expression, Siuti et al. [19] used TetR- and aTc-mediated regulation to control one of two site-specific recombinases (the other recombinase being controlled by a small regulatory RNA and LuxR- and AHL (acyl-homoserine lactone)-mediated regulation at PLux [see below]). TetR, CAP, LacI, and AraC (discussed above) are also useful components for automated logic gate assembly by the Cello design system [23]. Kotula et al. [24], as noted above, used TetR- and aTc-mediated regulation as a trigger element to regulate cro expression. Elowitz and Leibler [28] used TetR, along with LacI and CI (see above), to construct a synthetic repressilator. As the name of this construct suggests, Elowitz and Leibler described the synchronization and subsequent innate oscillatory character of reporter and regulator protein production and turnover.

LuxR

Although best understood for its functions during the quorum sensing phenomenon [211], [212], [213], [214], AHL-mediated regulation of LuxR is also associated with a variety of engineered transcription control systems. In nature, LuxI synthesizes the auto-inducer AHL, which in turn binds LuxR once sufficiently high levels of AHL are present in the cellular milieu (i.e., AHL binds reversibly and weakly to LuxR at low AHL levels) [211], [212], [213], [214], [215]. AHL-bound LuxR dimers recognize a LuxR box (ACCTGTAGGATCGTACAGGT) upstream of the extended −10 sequence of PLux [211], [212], [213], [214], [215], [216], [217]. In the case of PLux, LuxR dimers make contact with the αCTD subunits of the RNAP holoenzyme, which in turn facilitates promoter recognition (of the weak PLux) [216]. In synthetic biology, leveraging the natural functions of AHL and LuxR in quorum sensing allowed Basu et al. [218] to engineer a system where cells could be induced (aTc-mediated derepression of PLtet-O1) to produce AHL (sender cells) that in turn binds LuxR in receiver cells leading to a pulse of gfp expression (under control of PLux). Due to limited diffusion of AHL in solid media, follow-up work [219] extended LuxR-mediated regulation to demonstrate that band patterns of fluorescent reporter production could form when undifferentiated mixtures of receiver cells (harboring a complex mixture of plasmids with varying copy numbers encoding CI, LuxR variants, LacI variants, promoters responsive to these regulators, and gfp or dsred reporters) were plated with AHL-sender cells. Thus, LuxR- and AHL-mediated regulation could be useful for engineering auto-induction systems [220], [221] to simplify protein production and assist in cross-strain and cross-species communication in microbial consortia [222], [223], [224], [225], [226].

TALEs (transcription activator-like effectors)

Unlike the previous regulators above, which recognize specific DNA sequences, TALEs (transcription activator-like effectors) have a modular structure and may be naturally or artificially engineered to recognize nearly any specific DNA sequence. Thus, these regulators are especially useful tools for engineering transcription control regulation at any sequence on demand (i.e., where no known regulators are able to bind to desired target sequences or known regulators have been used elsewhere in the same construct) [227]. Although originally described as plant transcriptional activators secreted by prokaryotic plant pathogens [228], [229], [230], [231], [232], TALEs have been well-studied as modular DNA binding proteins. TALEs are large (>120 kDa) DNA binding proteins with N- (amino-) and C-termini containing secretion and nuclear localization signals, respectively, and a highly repetitive central helical region [228], [229], [231], [232], [233], [234]. The internal helical region typically contains 15.5–19.5 near perfect repeats of 34 amino acids, which accounts for most of the size of these regulators and imparts the overwhelming majority of DNA sequence-specific binding activity [228], [231], [232], [233], [234], [235], [236], [237], [238]. Indeed, Boch et al. [235] and Moscou and Bogdanove [236] cracked the TALE DNA binding specificity code, where residues 12/13 of the 34 amino acid repeat impart this nucleotide binding specificity (H/D=C, N/G=T, N/I=A, N/S=N, N/N=R, N/—=Y, H/G=T, H/A=V, N/D=C, N/K=G, H/—=T, H/I=C, H/N=G, N/A=G, and I/G=T). Thus, shuffling different repeat regions imparts TALEs with new DNA binding sequence specificities [21], [237], [239], [240], [241], [242], [243], [244], [245], a testament to the TALE modularity. The region immediately N-terminal to the central repeat region is required for initial nonspecific DNA binding [231], [233], [246], [247], [248]. 
In this initial event, the TALE associates loosely with DNA in an extended conformation and begins a search for its cognate binding sequence through 1D diffusion [246]. Upon recognition of its cognate binding sequence, the extended TALE conformation contracts, and the helical repeats tightly wrap around the major groove of this binding sequence [246]. In addition, the C-terminal domain of natural TALEs may contain regions that interact with transcription factors or may be fused to other regulatory proteins [21], [228], [230], [232], [240], [243], [244], [245]. TALEs have been mobilized as tools to control gene expression in synthetic biology. For the most part, TALE tools have been implemented in eukaryotes [21], [237], [239], [240], [243], [244], [245]. However, Politz et al. [242] demonstrated that a TALE recognizing the LacI operator sequence could effectively block transcription initiation and elongation. Expanding on this work and to demonstrate that TALE-mediated repression in E. coli could be reversed with the addition of small molecule ligands, Copeland et al. [241] used TALEs with TEV protease [249], [250] cleavage sequences and a tev gene expression construct under the control of PTet (aTc inducible). Upon TetR derepression of tev (by the addition of aTc) and subsequent TEV-mediated TALE cleavage, a formerly TALE-repressed gfp or mcherry reporter could be expressed. This response to a small ligand (aTc) had similar dynamics to IPTG-mediated LacI derepression of an mcherry reporter gene, despite being an indirect response. In addition, by changing the binding specificity of the repeat region inside of the TALE scaffold used above, Copeland et al. demonstrated multiplexed gene repression at two E. coli chromosomal loci, the lysA and sucA promoters.
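The modularity of the TALE code lends itself to a simple lookup: given the residue 12/13 pairs of successive repeats, the predicted target can be written out using IUPAC nucleotide codes. The sketch below uses only the pairings listed in the text above; writing a missing residue 13 as "*" is a notational assumption, the function names are hypothetical, and the example repeat array is invented for illustration.

```python
# Residue 12/13 pair -> IUPAC nucleotide code, as listed in the text.
# "*" denotes a missing residue 13 (notation assumed for this sketch).
RVD_CODE = {
    "HD": "C", "NG": "T", "NI": "A", "NS": "N", "NN": "R",
    "N*": "Y", "HG": "T", "HA": "V", "ND": "C", "NK": "G",
    "H*": "T", "HI": "C", "HN": "G", "NA": "G", "IG": "T",
}

# Bases allowed by each IUPAC code used above.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "V": "ACG", "N": "ACGT",
}


def rvds_to_target(rvds):
    """Translate a list of residue 12/13 pairs into a degenerate target site."""
    try:
        return "".join(RVD_CODE[rvd] for rvd in rvds)
    except KeyError as err:
        raise ValueError(f"unknown residue pair: {err.args[0]}") from None


def site_matches(dna, target):
    """Check whether a DNA sequence is compatible with a degenerate target."""
    dna = dna.upper()
    return len(dna) == len(target) and all(
        base in IUPAC[code] for base, code in zip(dna, target)
    )


if __name__ == "__main__":
    repeats = ["HD", "NG", "NI", "NN"]  # hypothetical four-repeat array
    target = rvds_to_target(repeats)
    print("predicted target:", target)
    print("matches CTAG:", site_matches("CTAG", target))
    print("matches CTAC:", site_matches("CTAC", target))
```

Shuffling the entries of the repeat list changes the predicted target, which mirrors how repeat shuffling is used to retarget TALEs.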

CRISPRi (clustered regularly interspaced short palindromic repeats interference)

The CRISPR (clustered regularly interspaced short palindromic repeats) system is believed to have evolved as a defensive system against foreign DNA [251], [252], [253], [254], [255]. Spacers intervening between inverted repeats (form stem-loops in RNA transcripts) contain nucleic acid sequences of bacteriophages and other mobile genetic elements [251], [252], [253], [254], [255]. In the case of Type II CRISPR systems, the spacer locus is transcribed as a single RNA transcript, a tracrRNA binds adjacent to the spacers (in the stem-loop), and RNase III-mediated processing generates mature guide crRNAs (containing the spacer nucleic acid sequences above) and tracrRNAs [256], [257]. These RNAs are subsequently loaded into Cas9 forming a mature endonuclease complex that will cleave DNAs in a sequence specific manner (directed by the guide crRNAs above) [257], [258], [259], [260], [261], [262], [263]. An additional sequence motif, referred to as the PAM (protospacer adjacent motif), also serves functions during target sequence recognition and subsequent DNA-RNA hybridization [257], [258], [259], [260], [261], [262], [263]. In the case of the commonly used Cas9 variant from Streptococcus pyogenes, NGG (the reverse complement of which occurs upstream of the DNA sequence hybridized to the RNA spacer sequence) is the PAM utilized by this enzyme [261], [262]. Although the catalytically active Cas9 has been extensively used for recombination-mediated genetic engineering [261], [262], [264], [265], [266], catalytically inactive Cas9 (dCas9) also has important regulatory functions [6], [7], [8], [267], [268], [269], [270], [271], [272], [273]. Modifications to the CRISPR system promote sequence-specific transcriptional regulation, CRISPRi (CRISPR interference). Indeed, CRISPR spacer sequences and dCas9 can be directed to nearly any DNA sequence, a similar phenomenon to what was described for TALEs above. 
Thus, as was the case for TALEs, CRISPRi is an especially useful tool for targeting regulatory proteins (e.g., dCas9) to any DNA sequence on demand [227]. For example, Qi et al. [268] used a dCas9 variant to block transcription initiation and elongation using sgRNAs (single guide RNAs) directed to various targets in the promoters and coding sequences of variant rfp and gfp reporter genes. Blocking transcription elongation was the most efficient method to decrease expression of the tested reporter genes. To demonstrate that CRISPRi systems could be used to probe a complex regulatory network, Qi et al. also used sgRNAs directed against genes in the lac operon and genes involved in catabolite repression (see above) to knock down their expression. Bikard et al. [269] used a dCas9 variant to block transcription initiation and elongation by using crRNA guides directed against the gfp reporter open reading frame, promoter, and ribosome binding site. This same group also fused the ω subunit of the RNAP core to dCas9 and demonstrated that transcription activation could be promoted by crRNA guides directed at sequences 43 to 59 base pairs upstream of the −35 sequence of a weak promoter. To minimize acetate production in E. coli, Fernandez-Rodriguez et al. [14] used dCas9, a series of light-responsive two-component systems (see below), and sgRNAs directed against poxB, pta, and ackA (all involved in acetate production [274]). This demonstrates that dCas9 is also useful for controlling transcription of genes that encode factors controlling flux through a metabolic pathway. By fusing activators and repressors to dCas9, others have also shown (for the most part in eukaryotes) that these hybrid proteins can mediate transcription activation and repression [270], [271], [272], [273].
CRISPRi can also be used to regulate gene expression in non-model organisms. Indeed, Yao et al. [6] used CRISPRi to first regulate gfp reporter gene expression in the cyanobacterium Synechocystis sp. PCC 6803. This group then used CRISPRi to repress expression of phaE (involved in polyhydroxybutyrate synthesis [275]), glgC (involved in generating glycogen storage molecules [276]), and a variety of aldehyde dehydrogenases and reductases (involved in depleting aldehyde pools available for alkane production [277]). Gordon et al. [8] used CRISPRi in Synechococcus sp. strain PCC 7002 to tune expression of the cpc operon (involved in phycocyanin antennae synthesis) by using sgRNAs directed against cpcB. They also used sgRNAs directed against glnA (depletes α-ketoglutarate pools by converting this metabolite to glutamine, which promotes ammonium [preferred nitrogen source of cyanobacteria] assimilation and is involved in signaling carbon/nitrogen balance [278]) to moderately increase α-ketoglutarate levels, activate NtcA (a catabolic gene regulator activated by reduced ammonium assimilation and the modest increase in α-ketoglutarate levels [278], [279], [280]), and subsequently increase lactate levels (driven by NtcA-mediated enhanced flux to pyruvate from fixed carbon [279]). Similarly, to redirect carbon flow toward succinate accumulation, Huang et al. [7] used sgRNAs directed against glgC, sdhA, and sdhB (repressing sdhA and sdhB blocks production of fumarate from succinate [281], [282]). Thus, CRISPRi is a useful technology for rationally routing metabolites for the purposes of metabolic engineering.
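The PAM requirement described earlier in this section can be illustrated with a short sketch that lists candidate target sites next to an NGG motif on either strand of a DNA sequence. This is only an illustration: the function names are hypothetical, the 20-nucleotide spacer length is an assumption (a commonly used length for S. pyogenes Cas9 guides), and none of the cited studies is claimed to have used this procedure.

```python
# Illustrative sketch only: names and the 20-nt spacer length are assumptions.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_ngg_targets(dna: str, spacer_len: int = 20):
    """List (strand, start, protospacer, pam) for every site followed by NGG.

    The protospacer is the spacer_len bases immediately 5' of the PAM on the
    strand being read. For the "-" strand, start is an index into the
    reverse complement of the input, not into the input itself.
    """
    dna = dna.upper()
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        # i is the index of the first PAM base; a full protospacer must fit 5' of it.
        for i in range(spacer_len, len(seq) - 2):
            pam = seq[i:i + 3]
            if pam[1:] == "GG":  # N can be any base, followed by GG
                hits.append((strand, i - spacer_len, seq[i - spacer_len:i], pam))
    return hits


if __name__ == "__main__":
    example = "A" * 20 + "TGG"
    for hit in find_ngg_targets(example):
        print(hit)
```

A real guide-design workflow would add further filters (for example, off-target matches elsewhere in the genome), which are outside the scope of this sketch.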

Two-component regulatory systems

Unlike the one-component systems described above, two-component systems perturb gene expression in response to signals that are often located outside of the cytoplasm. In general, a sensor kinase binds a ligand (or, in some cases, a low ligand concentration shifts the receptor-ligand complex equilibrium toward releasing the ligand). In turn, this ligand binding (or release) stimulates autokinase activity (often at specific histidine residues) of the sensor kinase [64], [65], [66]. The specific response regulator for the activated sensor kinase (crosstalk is generally avoided in prokaryotic systems [64], [283], [284], [285]) is subsequently phosphorylated (usually at specific aspartate residues), undergoes conformational changes, and mediates differential gene regulation (direct or indirect activation or repression) at regulator-specific promoters [64], [65], [66]. Two-component systems have been used to regulate synthetic transcription systems. Anderson et al. [22] used the PhoPQ system (PhoQ kinase is inactive under high Mg2+ concentrations [64], [65]), the cognate PMgrB, PLux and PSal (positively regulated by NahR in the presence of salicylate [286]), and gfp and inv (Invasin) [287] reporters to develop an AND digital logic gate. Intriguingly, this logic gate can be used to program bacterial invasion of mammalian cells of interest. Using a chimeric receptor (composed of a Cph1 photoreceptor [288], [289], [290] and the EnvZ histidine kinase domain [291]), Levskaya et al. [292] developed a sensor that would be inactivated in the presence of red light (i.e., OmpR would not be phosphorylated by EnvZ [291], [293] to drive expression of a lacZ reporter from POmpC). In an additional impressive feat, the same chimeric light receptor, LuxR, LuxI, CI, and a lacZ reporter were used by Tabor et al. [294] to build a system that could recognize the edge of projected shapes. Recently, to engineer E. coli capable of distinguishing between red, green, and blue light, Fernandez-Rodriguez et al. [14] used the above chimeric receptor (switched on by infrared light and off by red light, switching the OmpR response regulator on or off, respectively); the CcaS receptor (switched on by green light and off by far-red light, switching the CcaR response regulator on or off, respectively [295], [296]); the YF1 receptor (switched off by blue light, which switches off response regulator FixJ [297]); CI and PhlF regulators; a T7 RNAP core enzyme; supplemental sigma factors for this T7 RNAP core; promoters recognized by each regulator or T7 RNAP holoenzyme; and bFMO, lacZ, and gusA reporter genes. These complex circuits allowed this group to replicate several colored images projected onto E. coli cultures harboring these constructs.
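The light-sensing behaviour described above can be summarized as a small truth table. The sketch below is a deliberately simplified Boolean abstraction: the function name and the True/False encoding are hypothetical, the resetting wavelengths (infrared and far-red) are ignored, and the red-light response follows the description of the Levskaya et al. sensor, in which red light prevents OmpR phosphorylation. It does not model the downstream CI, PhlF, or T7 RNAP layers of the published circuit.

```python
# Simplified Boolean sketch; names and encoding are assumptions for illustration.
def response_regulator_states(red: bool, green: bool, blue: bool) -> dict:
    """Map the presence of red, green, and blue light to regulator activity."""
    return {
        # Chimeric Cph1-EnvZ receptor: red light switches it off,
        # so OmpR is not phosphorylated.
        "OmpR": not red,
        # CcaS: green light switches it on, activating CcaR.
        "CcaR": green,
        # YF1: blue light switches it off, which switches off FixJ.
        "FixJ": not blue,
    }


if __name__ == "__main__":
    for colour in ("red", "green", "blue"):
        lights = {c: (c == colour) for c in ("red", "green", "blue")}
        print(colour, response_regulator_states(**lights))
```

Each single colour yields a distinct combination of regulator states, which is the property a downstream circuit needs in order to tell the three colours apart.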

Conclusion

In this mini-review, we have discussed several features of the natural biology of transcriptional regulation and several systems that have been used to control transcription in model bacteria. Engineering control at this level is especially valuable for titrating protein (and therefore activity) levels, which in turn allows titration of metabolites of interest, and for engineering a plethora of gene circuits. Indeed, as outlined by the central dogma of molecular biology, transcriptional regulation represents the first level at which the flow of genetic information from nucleic acid polymers (DNA and RNA) to amino acid polymers (proteins) can be regulated. Transcriptional regulation, as discussed in detail above, is mediated by controlling promoter strength, by the positioning and strength of terminators, and by transcriptional regulators. These methods of engineering transcription control can be readily extrapolated to controlling metabolic flux and to control engineering in non-model organisms and eukaryotes. Nevertheless, there are additional layers of post-transcriptional and post-translational control not covered by this mini-review. Furthermore, translation is reduced once transcription of an mRNA ceases. However, this phenomenon and the global effects of nucleoid structure and DNA supercoiling on gene transcription and translation (presumably at the apex of genetic information flow) remain poorly understood. Thus, along with combining the well-characterized regulatory systems discussed above in novel and interesting ways (potentially for practical applications or improved predictability of engineered constructs), exploring the use of novel regulatory systems and expanding control engineering to non-model organisms and eukaryotes remain areas of active research.