Cauã A Westmann1, Luana de Fátima Alves2,3, Rafael Silva-Rocha1, María-Eugenia Guazzaroni2.
Abstract
Although functional metagenomics has been widely employed for the discovery of genes relevant to biotechnology and biomedicine, its potential for assessing the diversity of transcriptional regulatory elements of microbial communities has remained poorly explored. Here, we experimentally mined novel constitutive promoter sequences in metagenomic libraries by combining a bi-directional reporter vector, high-throughput fluorescence assays and predictive computational methods. Through the expression profiling of fluorescent clones from two independent soil sample libraries, we analyzed the regulatory dynamics of 260 clones with candidate promoters as a set of active metagenomic promoters in the host Escherichia coli. Through an in-depth analysis of selected clones, we were able to further explore the architecture of metagenomic fragments and to report the presence of multiple promoters per fragment, with a dominant promoter driving the expression profile. These approaches resulted in the identification of 33 novel active promoters from metagenomic DNA originating from very diverse phylogenetic groups. The in silico and in vivo analysis of these individual promoters allowed the generation of a constitutive promoter consensus for exogenous sequences recognizable by E. coli in metagenomic studies. The results presented here demonstrate the potential of functional metagenomics for exploring environmental bacterial communities as a source of novel regulatory genetic parts to expand the toolbox for microbial engineering.
Keywords: bi-directional reporter; constitutive promoters; functional metagenomics; high-throughput screening; synthetic biology
Year: 2018 PMID: 29973927 PMCID: PMC6019500 DOI: 10.3389/fmicb.2018.01344
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Figure 1. Schematic representation of the workflow for finding, characterizing and cross-validating novel bacterial cis-regulatory elements in environmental samples. From left to right: first, we generated metagenomic libraries from soil samples in E. coli DH10B. The DNA fragments were cloned into a bi-directional reporter trap-vector, pMR1 (bearing the mCherry and GFPlva fluorescent reporters), which allowed screening for promoters on both DNA strands. Second, we manually screened all visibly fluorescent clones from our metagenomic libraries and analyzed the expression patterns of all green fluorescent clones on a microplate reader over 8 h. Last, we selected 10 clones based on their GFPlva expression patterns for an in-depth analysis combining experimental (small DNA insert library generation) and in silico promoter prediction. This integrated strategy allowed us to identify, validate and estimate the accessibility of novel promoter regions from metagenomic libraries.
Table 1. Features of the generated metagenomic libraries.
| Total number of clones | 100,000 | 90,000 |
| Percentage of clones with insert (%) | 60 | 70 |
| Number of clones with insert | 60,000 | 63,000 |
| Total number and rate | 400 (1:150) | 700 (1:90) |
| Total number and rate | 270 (1:220) | 400 (1:157) |
| Total number and rate | 130 (1:460) | 300 (1:210) |
| Average insert size (kb) | 4.5 | 3.7 |
| Total metagenomic library size (Mb) | 270 | 233 |
| Estimated number of genomes | 60 | 52 |
Rate represented by the number of fluorescent clones divided by the total number of clones with inserts.
Assuming 4.5 Mb per genome (Raes et al., .
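The derived rows of the table follow from the first rows by simple arithmetic, which can be sketched as below. This is an illustrative check only, not the authors' script: the names `Library`, `library_1` and `library_2` are hypothetical labels (the table columns carry no names here), and the 4.5 Mb genome size is the assumption stated in the footnote.

```python
from dataclasses import dataclass

GENOME_SIZE_MB = 4.5  # average genome size assumed in the table footnote


@dataclass
class Library:
    """One metagenomic library (hypothetical container for the table values)."""
    total_clones: int
    insert_fraction: float  # fraction of clones carrying an insert
    avg_insert_kb: float    # average insert size in kb

    @property
    def clones_with_insert(self) -> int:
        return round(self.total_clones * self.insert_fraction)

    @property
    def size_mb(self) -> float:
        # clones with insert x average insert size, converted from kb to Mb
        return self.clones_with_insert * self.avg_insert_kb / 1000

    @property
    def genome_equivalents(self) -> float:
        return self.size_mb / GENOME_SIZE_MB

    def clones_per_hit(self, fluorescent_clones: int) -> float:
        # the "1:N" rate: clones with insert per fluorescent clone
        return self.clones_with_insert / fluorescent_clones


# Values taken from the two data columns of the table.
library_1 = Library(total_clones=100_000, insert_fraction=0.60, avg_insert_kb=4.5)
library_2 = Library(total_clones=90_000, insert_fraction=0.70, avg_insert_kb=3.7)

for name, lib, hits in (("library_1", library_1, 400), ("library_2", library_2, 700)):
    print(name, lib.clones_with_insert, round(lib.size_mb),
          round(lib.genome_equivalents), round(lib.clones_per_hit(hits)))
```

Under these assumptions the computed values agree with the tabulated library sizes, genome estimates and overall fluorescent-clone rates after rounding.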
Figure 2. Evaluating the expression dynamics of fluorescent clones. (A) LB-agar plate under blue light excitation comprising a subset of isolated metagenomic clones expressing the GFPlva (top) and mCherry (bottom) fluorescent reporters. A few clones were observed to express both reporters. All isolated clones were initially considered to hold at least one endogenous promoter. (B,C) Indirect assessment of maturation times of both fluorescent reporters, GFPlva (B) and mCherry (C), 8 h (light bars) and 24 h (dark bars) after the beginning of the experiment. Maturation times are substantially lower for mCherry than for GFPlva, which excluded the former from further analyses. Positive controls for GFP and mCherry are represented by p100 and pRED, respectively. Fluorescence data were normalized by OD600 values for each sample, followed by normalization by values from the negative control (empty pMR1). Data were transformed to log2 scale to allow better visualization of fluorescence variation. (D) Hierarchical representation of a metaconstitutome (i.e., all expression profiles from a single metagenomic library, USP3, in E. coli). Fluorescence time-lapse dynamics were measured over 8 h for each clone and represented as heat maps. Promoter activities (calculated as GFP/OD600) were normalized by the negative control (E. coli DH10B harboring empty pMR1) and transformed to log2 scale in order to facilitate the visualization of subtle activities. Expression profiles of the positive controls (p100, p106, and p114: strong, medium and low expression, respectively) and of the negative control (pMR1) are indicated by black arrows on the left side of the heat map. Data are representative of three independent experiments.
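The normalization described in the legend (GFP/OD600, normalization by the negative control, log2 transformation) can be sketched as follows. This is a minimal illustration, not the authors' analysis code. It assumes that "normalized by the negative control" means a ratio to the empty-pMR1 activity at the same time point; the legend does not say whether a ratio or a subtraction was used. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def promoter_activity(gfp: Sequence[float], od600: Sequence[float]) -> List[float]:
    """Fluorescence per unit of culture density (GFP/OD600) at each time point."""
    return [g / o for g, o in zip(gfp, od600)]


def log2_relative_activity(
    sample_gfp: Sequence[float],
    sample_od: Sequence[float],
    control_gfp: Sequence[float],
    control_od: Sequence[float],
) -> List[float]:
    """log2 of sample activity relative to the negative control.

    ASSUMPTION: normalization is a ratio to the negative control; this is
    one reading of the legend, not a confirmed detail of the original analysis.
    """
    sample = promoter_activity(sample_gfp, sample_od)
    control = promoter_activity(control_gfp, control_od)
    return [math.log2(s / c) for s, c in zip(sample, control)]


# Made-up readings for two time points, for illustration only.
profile = log2_relative_activity(
    sample_gfp=[400.0, 1600.0], sample_od=[0.5, 1.0],
    control_gfp=[100.0, 200.0], control_od=[0.5, 1.0],
)
print(profile)
```

On the log2 scale a value of 0 means activity equal to the negative control, and each unit corresponds to a two-fold difference, which is why weak promoters remain visible in a heat map.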
Table 2. Description of the ORFs contained in plasmids from the selected clones (pCAW1 to pCAW10) and their sequence similarities.
| 55% | Proteobacteria or Verrucomicrobia | 1 | Minus | 131 | Hypothetical protein (416) | 68% | Alginate lyase |
| | | 2 | Plus | 271 | Hypothetical protein (261) | 73% | 17β-hydroxysteroid dehydrogenase |
| | | 3 | Plus | 295 | Beta-glucosidase (777) | 66% | Beta-glucosidase |
| 52% | Actinobacteria | 1 | Plus | 304 | Unknown | 33% | Unknown |
| | | 2 | Plus | 249 | Unknown | 33% | Unknown |
| 53% | Proteobacteria | 1 | Minus | 318 | IS4 family transposase (320) | 96% | IS4 family transposase |
| | | 2 | Minus | 1011 | DNA-directed RNA polymerase subunit beta' (1430) | 83% | RNA polymerase - Beta Subunit |
| | | 3 | Plus | 120 | Uncharacterised protein (135) | 47% | Unknown |
| | | 4 | Plus | 151 | Uncharacterised protein (130) | 37% | Unknown |
| | | 5 | Plus | 94 | Uncharacterised protein (64) | 82% | Unknown |
| | | 6 | Plus | 96 | Uncharacterised protein (86) | 48% | Unknown |
| | | 7 | Plus | 173 | Predicted protein (585) | 26% | Unknown |
| 61% | Proteobacteria | 1 | Minus | 245 | Inosine monophosphate cyclohydrolase (246) | 63% | IMP cyclohydrolase |
| | | 2 | Minus | 214 | Phosphodiesterase (498) | 40% | Phosphodiesterase |
| | | 3 | Minus | 402 | Hypothetical protein A2Y08_02680 (625) | 43% | Unknown |
| | | 4 | Plus | 142 | Gentisate 1,2-dioxygenase (349) | 60% | Gentisate 1,2-dioxygenase |
| 54% | Verrucomicrobia | 1 | Plus | 642 | Pyruvate:ferredoxin oxidoreductase (1565) | 80% | Pyruvate:ferredoxin oxidoreductase |
| 57% | Chloroflexi or Proteobacteria | 1 | Plus | 159 | Hypothetical protein BGO39_33875 (215) | 65% | MerR family |
| | | 2 | Plus | 336 | Hypothetical protein BGO39_33870 (347) | 78% | PrsW intramembrane metalloprotease |
| | | 3 | Plus | 163 | Hypothetical protein BGO39_33865 (173) | 75% | Chromate transporter |
| 46% | Actinobacteria | 1 | Minus | 391 | Hypothetical protein A2X07_06330 (480) | 45% | Por secretion system sorting domain |
| | | 2 | Minus | 250 | Hypothetical protein (586) | 65% | Polysaccharide lyase |
| 57% | Actinobacteria | 1 | Plus | 508 | Hypothetical protein AUH20_02325 (597) | 76% | 5-oxoprolinase / Hydantoinase_B |
| | | 2 | Minus | 348 | Oxidoreductase (336) | 61% | Flavin-utilizing monooxygenases |
| | | 3 | Plus | 314 | Hypothetical protein ETSY1_46935 (279) | 76% | Cellulose biosynthesis BcsQ |
| 43% | Bacteroidetes or Proteobacteria | 1 | Minus | 81 | Hypothetical protein (129) | 50% | Unknown |
| | | 2 | Minus | 303 | Formylglycine-generating enzyme (379) | 65% | Formylglycine-generating enzyme |
| | | 3 | Minus | 457 | Acetylglucosamine-6-sulfatase (504) | 67% | Acetylglucosamine-6-sulfatase |
| 56% | Proteobacteria | 1 | Plus | 204 | Hypothetical protein (195) | 50% | Unknown |
Classification based on the PhyloPythiaS web server (Patil et al., 2012).
Truncated proteins.
aa, amino acids.
Sequences with an E-value higher than 0.001 in Blastp searches were considered to be unknown proteins.
Classification based on Blastp.
Figure 3. Schematic representation of six metagenomic inserts (contigs) showing predicted ORFs and experimentally validated/characterized promoters. Each contig is identified on the far left of each subfigure. Promoters are indicated by elbow-shaped arrows and named according to their relative position in the contig. Promoter directionality, with respect to the leading and lagging strands, is represented by yellow and blue colors, respectively. Asterisks over specific promoters indicate regulatory regions that were cross-validated by matching in silico predictions. Dark arrows represent predicted ORFs, according to their relative positions in each contig (see Table 2 for more information). All genetic features are drawn to their original relative sizes, following the 1 kb scale depicted at the bottom of the figure. Beneath each metagenomic insert, a heat map cluster represents the whole set of promoter activities measured during 8-h fluorescence assays. The first line of each cluster shows the original expression profile initially measured for each metagenomic insert. All other lines represent expression activities from de novo experimentally validated promoters within each contig (small DNA fragments). The second line of each cluster represents the endogenous promoter whose activity is most similar to the original expression profile of the contig. All expression profiles are identified at the rightmost side of each line by their respective contig/promoter name. For the supplementary set of analyzed contigs, see Figure S4.
Figure 4. Consensus of RpoD-related metagenomic promoters. (A) Known consensus sequences of the RpoD-dependent promoter determined in vitro, TTGACA (−35) and TATAAT (−10), separated by 17 ± 2 bp in E. coli (Shimada et al., 2014). (B) Known consensus sequences of 582 promoters experimentally validated in E. coli (Shimada et al., 2014; Gama-Castro et al., 2016; Keseler et al., 2017). (C) The sequences of the 33 promoters experimentally validated in this study were aligned and subjected to Logo analysis (Crooks et al., 2004). The consensus from the metagenomic set (C) is very similar to the one from the experimentally validated set from E. coli (B).
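To make the consensus in (A) concrete, the sketch below scans a DNA sequence for a −35 box and a −10 box separated by 15 to 19 bp. It is a naive illustration and not the predictive method used in the study: the mismatch-counting score, the `max_mismatches` parameter and the function names are assumptions introduced here, and only the two hexamers and the spacer range come from the legend.

```python
from typing import List, Tuple

MINUS_35 = "TTGACA"      # -35 consensus from panel (A)
MINUS_10 = "TATAAT"      # -10 consensus from panel (A)
SPACERS = range(15, 20)  # 17 +/- 2 bp between the two boxes
BOX = 6                  # length of each hexamer


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_promoters(seq: str, max_mismatches: int = 2) -> List[Tuple[int, int, int]]:
    """Return (start of -35 box, spacer length, total mismatches) for each hit.

    ASSUMPTION: a hit is any -35/-10 pair whose combined mismatch count
    against the consensus is at most max_mismatches (hypothetical scoring).
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - BOX + 1):
        m35 = mismatches(seq[i:i + BOX], MINUS_35)
        if m35 > max_mismatches:
            continue
        for spacer in SPACERS:
            j = i + BOX + spacer  # start of the candidate -10 box
            if j + BOX > len(seq):
                break
            total = m35 + mismatches(seq[j:j + BOX], MINUS_10)
            if total <= max_mismatches:
                hits.append((i, spacer, total))
    return hits


# Synthetic example: perfect boxes separated by a 17-bp spacer.
example = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "CC"
print(scan_promoters(example, max_mismatches=0))
```

Only the forward strand is scanned here; applying the same function to the reverse complement would be needed to cover both strands, as the bi-directional reporter does experimentally.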