Mining regulatory 5'UTRs from cDNA deep sequencing datasets.

Jonathan Livny, Matthew K Waldor.

Abstract

Regulatory 5' untranslated regions (r5'UTRs) of mRNAs, such as riboswitches, modulate the expression of genes involved in varied biological processes in both bacteria and eukaryotes. New high-throughput sequencing technologies could provide powerful tools for the discovery of novel r5'UTRs, but the size and complexity of the datasets generated by these technologies make it difficult to differentiate r5'UTRs from the multitude of other types of RNAs detected. Here, we developed and implemented a bioinformatic approach to identify putative r5'UTRs within large datasets of RNAs recently identified by pyrosequencing of the Vibrio cholerae small transcriptome. The RNAs retained by this screen represented only approximately 1% of all non-overlapping RNAs, yet included 75% of previously annotated r5'UTRs along with 69 candidate V. cholerae r5'UTRs. These candidates include several putative functional homologues of diverse r5'UTRs characterized in other species, as well as numerous candidates upstream of genes involved in pathways not known to be regulated by r5'UTRs, such as fatty acid oxidation and peptidoglycan catabolism. Two of these novel r5'UTRs were experimentally validated using a GFP reporter-based approach. Our findings suggest that the number and diversity of pathways regulated by r5'UTRs have been underestimated and that deep sequencing-based transcriptomics will be extremely valuable in the search for novel r5'UTRs.

Year:  2009        PMID: 19969537      PMCID: PMC2836559          DOI: 10.1093/nar/gkp1121

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Non-coding RNAs are now known to regulate gene expression in species from all kingdoms of life. Regulatory RNAs in bacteria, which have been identified in diverse species, fall into two main classes: trans-acting RNAs (sRNAs) and regulatory 5′ untranslated regions (r5′UTRs) [reviewed in ref. (1)]. sRNAs are transcribed independently from their target genes and, in most cases, hybridize to cognate mRNAs over short regions of imperfect complementarity, thereby modulating mRNA stability and/or availability for translation. In contrast, r5′UTRs are encoded as part of the mRNA and regulate transcription elongation, translation initiation, or message stability by switching between alternative structures in response to a specific stimulus. r5′UTRs participate in the regulation of a variety of cellular functions, including the biosynthesis, metabolism, and transport of amino acids, small metabolites, and vitamins, the heat- and cold-shock responses, and the autoregulation of ribosomal protein expression (2–6). While all r5′UTRs mediate regulation of gene expression after transcription initiation, the mechanisms by which they act vary considerably. Riboswitches are the most diverse and best-studied class of r5′UTRs. Binding of a cognate metabolite to a riboswitch alters its conformation and thereby affects the stability of a transcription terminator or the accessibility of the ribosome binding site (7). A similar mechanism is employed by T-boxes, r5′UTRs identified mainly in Gram-positive species, whose interaction with uncharged tRNAs destabilizes a Rho-independent terminator 5′ of aminoacyl-tRNA synthetase genes as well as of genes involved in amino acid biosynthesis and transport (8–10). Leader peptides, which have been identified predominantly in Gram-negative species, also regulate amino acid biosynthesis operons by modulating transcription elongation (4).
These r5′UTRs encode small ORFs containing clusters of codons for one or more specific amino acids, followed by a Rho-independent terminator. When the cellular level of the cognate amino acid(s) is low, ribosomal stalling at these clusters destabilizes the adjacent terminator, leading to increased expression of the downstream operon. r5′UTRs can also mediate post- or co-transcriptional autoregulation of gene expression through direct interactions with proteins, a mechanism commonly used to regulate the expression of ribosomal proteins and of proteins mediating the cold-shock response (3,5). Finally, r5′UTRs known as thermosensors undergo a temperature-dependent conformational shift that affects transcription or translation of the downstream gene (6). Because r5′UTRs are relatively short and usually do not encode proteins, functional homologues of known r5′UTRs are difficult to identify on the basis of primary sequence conservation. However, since secondary structure plays a central role in r5′UTR function, covariance models that detect conservation of predicted RNA secondary structure have proven useful in identifying functional homologues of characterized r5′UTRs. Kingdom-wide searches using covariance models have led to the identification of many putative homologues of known r5′UTRs in diverse species and have provided important insights into the evolution of r5′UTRs (11,12). Beyond such homology searches, several recent studies using bioinformatic approaches not based on homology to known r5′UTRs have yielded novel classes of r5′UTRs regulating biological pathways not previously known to be regulated by r5′UTRs (13–16). These observations suggest that current annotations of r5′UTRs represent only a partial catalogue, particularly for Gram-negative species, in which fewer r5′UTRs are known. High-throughput DNA sequencing technologies have recently been used to profile bacterial transcriptomes with unprecedented sensitivity (17–20).
These studies have generated very large datasets that contain a great diversity of transcripts, including primary transcripts, processed derivatives, and degradation intermediates of messenger, structural, catalytic and regulatory RNAs. These new methodologies hold great potential for discovery of novel regulatory RNAs. However, to date, methods to distinguish r5′UTRs from the large number of other functional transcripts or from ‘transcriptional noise’ have not been reported. Here, we developed and implemented a bioinformatic approach to mine Vibrio cholerae cDNA deep sequencing datasets for r5′UTR-encoding loci. The results of this screen validate the sensitivity and specificity of our approach in distinguishing r5′UTRs from other types of transcripts, including catalytic, structural and trans-acting regulatory RNAs. Subsequent analyses of the RNAs identified in our screen revealed several putative V. cholerae functional homologues of known r5′UTRs that had been missed in previous annotations, including one that had been misannotated as an sRNA. We also identified dozens of candidates for novel r5′UTRs, two of which were shown to regulate expression of their downstream genes.

MATERIALS AND METHODS

Bioinformatic analyses

Alignment of the 454 reads [Supplementary Table S2 in ref. (17)] to the V. cholerae N16961 genome was conducted with BLASTN version 2.2.17 from NCBI. For each read, only the top hit was kept, and only if the percent identity of the alignment was ≥90%. Filtering of the 454 dataset for putative r5′UTRs was tested with a variety of filters and parameters; the reported combination was chosen because it yielded the highest ratio of known or putative r5′UTRs to candidate r5′UTRs. For filter IV (see Results), reads in different samples were considered to be overlapping if both the 5′- and 3′-ends of one read were within 40 nucleotides of the corresponding ends of the other read. Genome sequences and ORF and COG annotations were obtained from NCBI (accession numbers NC_002505 and NC_002506). Gene Ontology (GO) role category designations were obtained from TIGR. Rfam annotations were based on version 9.1. Transcription terminators were predicted by RNAMotif, TransTerm and FindTerm as described (21). The Artemis Comparison Tool release 9 (22) was used to visualize 454 read abundances superimposed on genome annotations. Information on the function of Escherichia coli proteins was obtained from EcoCyc (23).
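
The read-mapping rule above (one top hit per read, kept only if its identity is at least 90%) can be sketched in Python. This is a minimal sketch, assuming BLAST tabular output (the 12-column format produced by `blastall -m 8` or BLAST+ `-outfmt 6`) and taking the hit with the highest bit score as the top hit; the function name `best_hits` and the example rows are hypothetical, not taken from the study.

```python
import csv
import io

# Standard 12 columns of NCBI BLAST tabular output (blastall -m 8 / blastn -outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, min_identity=90.0):
    """Return {read_id: hit}, keeping each read's top hit only if pident >= min_identity."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        hit["pident"] = float(hit["pident"])
        hit["bitscore"] = float(hit["bitscore"])
        read_id = hit["qseqid"]
        # Assumption: the top hit is the one with the highest bit score.
        if read_id not in best or hit["bitscore"] > best[read_id]["bitscore"]:
            best[read_id] = hit
    # The identity threshold is applied to the top hit only; weaker hits are never promoted.
    return {r: h for r, h in best.items() if h["pident"] >= min_identity}


# Hypothetical rows: read1 has two hits, read2 has a single low-identity hit.
example = (
    "read1\tNC_002505\t100.00\t120\t0\t0\t1\t120\t5000\t5119\t1e-60\t222\n"
    "read1\tNC_002506\t95.00\t120\t6\t0\t1\t120\t800\t919\t1e-50\t190\n"
    "read2\tNC_002505\t85.00\t110\t16\t0\t1\t110\t9000\t9109\t1e-30\t120\n"
)
hits = best_hits(example)
print(sorted(hits))             # ['read1']
print(hits["read1"]["sseqid"])  # NC_002505
```

Applying the threshold after choosing the top hit, rather than before, mirrors the wording "only the top hit was kept and only if" and prevents a lower-ranked, higher-identity alignment from being substituted for the best one.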

Construction of GFP reporter strains

5′UTRs were amplified using the oligos listed in Supplementary Table S1 and cloned into the NsiI and NheI sites of plasmid pXG10 (24). The 5′-end of each 5′UTR insert was defined by the 5′-end of the corresponding cDNAs detected by 454 sequencing. Each r5′UTR insert included the start codon of its 3′ ORF along with up to 19 additional codons.

GFP reporter assays

Escherichia coli DH5α or V. cholerae N16961 strains carrying the indicated plasmids, as well as strains carrying a control plasmid [pXG0 (24)] expressing luciferase instead of GFP, were grown overnight in LB + 2.5 µg/ml chloramphenicol (Cm2.5). For the experiments shown in Figure 3, these overnight cultures were subcultured 1:50 in 96-well plates in M63 medium (0.2% glucose, 1 mM MgSO4, 1 µg/ml vitamin B1, Cm2.5) supplemented with 16 L-amino acids (Sigma Aldrich) (all except L-leucine, glycine, L-histidine and L-lysine), each added to a final concentration of 20–200 µM. Where indicated, L-leucine and glycine (Sigma Aldrich) were each added to a final concentration of 2.5 mM. For experiments described in Figure 4, overnight cultures were subcultured 1:250 in LB Cm2.5 and grown to OD600 ∼ 0.8–1. Aliquots of these cultures were washed twice with one volume of M63 medium and diluted to an OD600 of ∼0.4 in M63 medium supplemented with casamino acids (Difco) to the final concentrations indicated. Total culture fluorescence was measured using a Synergy HT Multi-Mode Microplate Reader (BioTek) with a 485/20 nm excitation filter, a 528/20 nm emission filter and a measurement height of 8.0 mm. GFP fluorescence was calculated by subtracting the fluorescence of strains carrying the luciferase control plasmid from that of strains carrying the GFP fusions. For experiments described in Figure 4, GFP fluorescence was normalized to OD600.
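
The background correction described above (GFP-fusion signal minus the autofluorescence of matched luciferase-control cultures, optionally divided by OD600 as for Figure 4) is simple arithmetic. The sketch below assumes that replicate wells are averaged before subtraction, which the Methods do not specify; the function name `gfp_signal` and the plate-reader values are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence


def gfp_signal(fusion_wells: Sequence[float], control_wells: Sequence[float],
               od600: Optional[float] = None) -> float:
    """Background-corrected GFP fluorescence for one strain and condition.

    fusion_wells  -- raw readings (485/528 nm) from wells carrying a GFP fusion
    control_wells -- readings from matched luciferase-control wells (autofluorescence)
    od600         -- optional culture density used to normalise the signal
    """
    signal = mean(fusion_wells) - mean(control_wells)
    if od600 is not None:
        if od600 <= 0:
            raise ValueError("OD600 must be positive")
        signal /= od600
    return signal


# Hypothetical plate-reader values, for illustration only.
print(gfp_signal([5200, 5000, 5100], [1100, 1000, 900]))             # 4100
print(gfp_signal([5200, 5000, 5100], [1100, 1000, 900], od600=0.5))  # 8200.0
```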
Figure 3.

Expression of GFP fused to the indicated 5′UTRs. Escherichia coli strains carrying the indicated fusions were grown in defined medium lacking both glycine and L-leucine (blue diamonds) or supplemented with either glycine (red squares) or L-leucine (green triangles). Results from a representative experiment are shown.

Figure 4.

Effect of increased amino acid concentration on the expression of GFP fused to known or candidate r5′UTRs. Results from representative experiments in E. coli and V. cholerae are shown.

RESULTS AND DISCUSSION

Summary of V. cholerae small transcriptome datasets used in this study

The datasets used in our analyses were obtained by 454 pyrosequencing of DNA libraries complementary to primary transcripts and transcript fragments (from here on referred to collectively as transcripts) 14–200 nucleotides in length isolated from four independent cultures of V. cholerae (17). Of the 681 205 total reads in these datasets, 362 345 align with 100% identity to the V. cholerae genome, corresponding to 37 494 non-identical transcripts and 6208 sets of non-overlapping transcripts. Transcripts overlapping 17 of 19 (90%) and 20 of 22 (91%) previously annotated or characterized sRNAs and r5′UTRs, respectively, were identified. Initial analysis of these datasets yielded numerous candidates for novel sRNAs, several of which were confirmed by northern analysis (17). The identification of so many hitherto unannotated putative sRNAs along with the sensitivity with which previously annotated r5′UTRs were identified suggested to us that there would likely be unannotated V. cholerae r5′UTRs among the transcripts detected by deep sequencing. However, identifying unknown r5′UTRs within these large datasets required an effective way to distinguish such transcripts from the great number and diversity of other types of transcripts.
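
The reduction from aligned reads to "non-identical transcripts" and then to "sets of non-overlapping transcripts" can be pictured as two steps: collapsing reads with identical coordinates, then clustering transcripts that overlap on the same strand. The text does not say exactly how the sets were built, so the sketch below assumes strand-specific, transitive overlap clustering on 1-based inclusive coordinates; the function names and example reads are hypothetical.

```python
from collections import Counter


def collapse_reads(alignments):
    """Collapse reads into non-identical transcripts.

    alignments -- iterable of (replicon, strand, start, end) tuples, one per aligned read.
    Returns a Counter mapping each distinct transcript to its read count.
    """
    return Counter(alignments)


def overlap_sets(transcripts):
    """Group transcripts that overlap, directly or transitively, on the same replicon and strand."""
    by_key = {}
    for replicon, strand, start, end in sorted(transcripts):
        by_key.setdefault((replicon, strand), []).append((start, end))
    groups = []
    for (replicon, strand), spans in by_key.items():
        current, current_end = [spans[0]], spans[0][1]
        for start, end in spans[1:]:
            if start <= current_end:  # overlaps the growing cluster
                current.append((start, end))
                current_end = max(current_end, end)
            else:                     # gap: close the cluster and start a new one
                groups.append((replicon, strand, current))
                current, current_end = [(start, end)], end
        groups.append((replicon, strand, current))
    return groups


reads = [
    ("NC_002505", "+", 100, 180),
    ("NC_002505", "+", 100, 180),  # identical read, collapsed with the one above
    ("NC_002505", "+", 150, 260),  # overlaps the first transcript
    ("NC_002505", "+", 400, 480),
    ("NC_002505", "-", 100, 180),  # opposite strand: kept separate
]
collapsed = collapse_reads(reads)
groups = overlap_sets(collapsed)
print(len(collapsed), len(groups))  # 4 3
```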

Filtering the datasets for r5′UTRs

We took several steps to filter our 454 datasets for transcripts derived from r5′UTRs. First, we discarded all transcripts that did not overlap putative 5′UTRs, defined as the 100 bp upstream of annotated start codons that do not overlap other annotated genes. This filter led to a large reduction in the numbers of total transcripts and known sRNAs but to only a small decrease in the number of previously annotated r5′UTRs (Figure 1, filter I). Nevertheless, nearly half of previously annotated trans-acting regulatory, structural, and catalytic RNAs and nearly 20% of all unique transcripts detected overlapped putative 5′UTRs. Indeed, transcripts overlapping the 5′UTRs of 1048 (27%) V. cholerae ORFs were identified. Most of these 5′UTR transcripts are likely the result of aborted transcription or incomplete mRNA degradation and do not represent r5′UTRs. Because transcripts produced by r5′UTR-mediated regulation usually do not extend into the coding region of the mRNA, we next removed all transcripts overlapping annotated ORFs. This step also led to a substantial decrease in the number of total transcripts but to only a modest reduction in the number of known r5′UTRs (Figure 1, filter II). To enrich our datasets for r5′UTRs relative to sRNAs, we next eliminated all transcripts shorter than 100 nucleotides (Figure 1, filter III). This was based on previous observations that transcripts associated with characterized r5′UTRs are almost always longer than 100 nucleotides, whereas a number of known sRNAs are <100 nucleotides in length. Finally, from the set of transcripts fulfilling the above criteria, we removed those that did not overlap another transcript in at least one of the three other independent samples (Figure 1, filter IV), reasoning that transcripts detected in only one of the four independent samples were less likely to correspond to real r5′UTRs.
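
Filters I–IV can be expressed as a chain of predicates over aligned transcripts. This is a minimal sketch, not the authors' code: it assumes 1-based inclusive coordinates and same-strand comparisons for filters I and II, lets filter II enforce the requirement that a putative 5′UTR not overlap other genes, and, for filter IV, compares each transcript against all transcripts from other samples using the 40-nt end criterion given in Methods. All names and coordinates are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Tx(NamedTuple):
    sample: str
    replicon: str
    strand: str  # "+" or "-"
    start: int   # 1-based, inclusive, start <= end
    end: int


class Orf(NamedTuple):
    replicon: str
    strand: str
    start: int
    end: int


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    return a_start <= b_end and b_start <= a_end


def upstream_window(orf: Orf, width: int = 100) -> Tuple[int, int]:
    """The `width` bp immediately 5' of the start codon, on the ORF's strand."""
    if orf.strand == "+":
        return max(1, orf.start - width), orf.start - 1
    return orf.end + 1, orf.end + width  # minus strand: start codon sits at `end`


def same_strand(tx: Tx, orf: Orf) -> bool:
    return tx.replicon == orf.replicon and tx.strand == orf.strand


def filter_i(tx: Tx, orfs: List[Orf]) -> bool:
    """Keep transcripts overlapping a putative 5'UTR window."""
    return any(same_strand(tx, o) and overlaps(tx.start, tx.end, *upstream_window(o))
               for o in orfs)


def filter_ii(tx: Tx, orfs: List[Orf]) -> bool:
    """Drop transcripts that extend into any annotated ORF."""
    return not any(same_strand(tx, o) and overlaps(tx.start, tx.end, o.start, o.end)
                   for o in orfs)


def filter_iii(tx: Tx, min_len: int = 100) -> bool:
    """Drop transcripts shorter than min_len nucleotides."""
    return tx.end - tx.start + 1 >= min_len


def same_transcript(a: Tx, b: Tx, tol: int = 40) -> bool:
    """Both ends within `tol` nt of the corresponding ends of the other transcript."""
    return (a.replicon == b.replicon and a.strand == b.strand
            and abs(a.start - b.start) <= tol and abs(a.end - b.end) <= tol)


def filter_iv(tx: Tx, all_tx: List[Tx]) -> bool:
    """Keep transcripts matched by a transcript from a different sample."""
    return any(o.sample != tx.sample and same_transcript(tx, o) for o in all_tx)


def mine_r5utrs(transcripts: List[Tx], orfs: List[Orf]) -> List[Tx]:
    # `and` short-circuits, so this is equivalent to applying filters I-IV in sequence.
    return [t for t in transcripts
            if filter_i(t, orfs) and filter_ii(t, orfs)
            and filter_iii(t) and filter_iv(t, transcripts)]


orfs = [Orf("NC_002505", "+", 1000, 1600)]
transcripts = [
    Tx("A", "NC_002505", "+", 880, 990),   # passes all four filters
    Tx("B", "NC_002505", "+", 890, 985),   # 96 nt: fails filter III, but supports A in filter IV
    Tx("A", "NC_002505", "+", 950, 1100),  # runs into the ORF: fails filter II
    Tx("C", "NC_002505", "+", 700, 800),   # outside the 100 bp window: fails filter I
]
print(mine_r5utrs(transcripts, orfs))
```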
Figure 1.

Results of in silico mining of V. cholerae cDNA datasets for r5′UTRs. The values shown correspond to the percentage of all transcripts in the 454 datasets remaining after addition of each filter.

As shown in Figure 1, application of filters I–IV removed the vast majority of the total transcripts and of the sRNAs in the dataset but retained most of the annotated r5′UTRs. Specifically, 96% of the total unique transcripts and 82% of the annotated sRNAs were eliminated, whereas only 25% of the annotated r5′UTRs were removed. The three sRNAs that remained were C1, msr and 6S RNA. C1 is an uncharacterized small intergenic transcript that was discovered by our group in a bioinformatic screen and annotated as a putative sRNA (25). As described below, C1 actually corresponds to an r5′UTR rather than to an sRNA. msr, whose biological function is poorly understood, is a non-coding RNA gene found in retron elements of diverse bacteria (26,27). 6S RNA has been well characterized in E. coli, where it acts as a trans-acting regulatory RNA; however, unlike the vast majority of other characterized sRNAs, 6S RNA does not target mRNAs but instead interacts with and modulates the activity of RNA polymerase (28). Thus, none of the sRNAs remaining in our filtered dataset corresponds to a canonical mRNA-targeting V. cholerae sRNA such as RyhB, Qrr1–4, VrrA or MicX. Taken together, these observations suggest that our approach distinguished known r5′UTRs from other transcripts, including other regulatory or catalytic non-coding RNAs, with good sensitivity and specificity. The above analysis yielded transcripts corresponding to 15 characterized or putative r5′UTRs previously annotated in the Rfam database. Rfam is a collection of multiple sequence alignments, consensus secondary structures, and covariance models (CMs) representing families of non-coding RNAs.
New members of these families are identified by Rfam based on predicted secondary-structure conservation, using sensitive BLAST filters in combination with CMs (12). We were initially surprised that previously annotated putative V. cholerae TPP riboswitches and a putative r5′UTR of ribosomal protein S15 were identified in our screen. Based on characterization of their homologues in other species, these r5′UTRs are thought to act through sequestration of ribosome binding sites (29,30). Thus, unlike r5′UTRs that regulate expression through transcription termination, these putative r5′UTRs were not expected to produce discrete short transcripts. However, it has been shown that r5′UTRs that do not employ Rho-independent transcriptional termination nonetheless form stem-loop structures at specific locations in the 5′UTR. We therefore postulate that, even in the absence of a strong termination signal, these structured elements act as sites for transcription termination, RNA processing and/or boundaries for RNA degradation, reproducibly yielding short transcripts that terminate within the 5′UTR and are detectable by high-throughput sequencing. Five previously annotated r5′UTRs were eliminated by our filtering. Three were eliminated because their corresponding transcripts overlapped ORFs, while the other two were lost because their transcripts terminated >100 bp upstream of the annotated start codons of their respective 3′ ORFs. Two of the r5′UTRs removed by our filtering, LR-PK1 and mini-ykkC, are putative structured motifs identified in computational screens that have yet to be experimentally validated and thus may not correspond to functional r5′UTRs. However, one of the r5′UTRs missed has been experimentally characterized, and the other two belong to well-characterized families of r5′UTRs and thus also likely correspond to bona fide r5′UTRs. It is therefore likely that a significant number of unannotated r5′UTRs are also missing from our list of candidate r5′UTRs.

Identification of candidate V. cholerae functional homologues of previously characterized or predicted r5′UTRs based on conserved genomic context

In addition to transcripts overlapping previously characterized or putative regulatory RNAs, we identified transcripts corresponding to 69 r5′UTRs that were not annotated in Rfam (from here on referred to as candidate r5′UTRs). Because r5′UTRs are co-transcribed with their target mRNAs, we reasoned that comparing the genomic locations of candidate and known r5′UTRs relative to their 3′ genes might enable us to identify functional homologues of known r5′UTRs missed in previous annotations. We therefore compared the Clusters of Orthologous Groups (COG) designations of genes downstream of all loci we identified with those of genes downstream of known or putative r5′UTRs in Rfam. We also conducted similar comparisons with genes downstream of putative riboswitch-like elements (RLEs) found in the RibEx database (13). Unlike Rfam, RibEx identifies putative r5′UTRs based on conserved primary sequence upstream of orthologous groups of genes in multiple genera, and it has been used to identify several hundred putative RLE families in addition to those annotated in Rfam. All genes 3′ of previously annotated V. cholerae r5′UTRs shared COG designations with genes 3′ of numerous Rfam r5′UTRs in other species. We also found many candidate r5′UTRs that share 3′ conserved genomic context (3′CGC) with known or putative r5′UTRs annotated in the Rfam database and/or with putative RLEs in RibEx (Table 1). In some cases, candidates shared 3′CGC with only a few annotated Rfam r5′UTRs and/or with seemingly functionally unrelated families of r5′UTRs, suggesting that the apparent conservation in genomic context might be coincidental rather than reflecting a functional or evolutionary relationship between the candidate and previously characterized r5′UTRs. However, 12 candidates (bold in Table 1) were found to share 3′CGC with more than 10 known or putative Rfam r5′UTRs in the same family, strongly suggesting that they correspond to bona fide but previously unannotated V. cholerae r5′UTRs.
We were surprised to find that C1 shared 3′CGC with numerous r5′UTRs in Rfam. For C1 and 14 candidate r5′UTRs in Table 1 (italicized), there is additional experimental and/or bioinformatic evidence suggesting that they correspond to bona fide r5′UTRs. Seven of these candidates are discussed in more detail below.
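
The 3′CGC comparison amounts to counting, for each Rfam family, how many of its members lie upstream of a gene that shares a COG designation with the gene downstream of a candidate, and flagging families with more than 10 such members (the threshold used for the bold candidates in Table 1). The sketch assumes the Rfam side has already been reduced to one (family, COG set of the 3′ gene) pair per family member; the function name and COG identifiers are placeholders, not real annotations.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def shared_context(candidate_cogs: Set[str],
                   rfam_members: Iterable[Tuple[str, Set[str]]],
                   min_members: int = 10) -> Tuple[Counter, List[str]]:
    """Count Rfam family members whose 3' gene shares a COG with the candidate's 3' gene.

    candidate_cogs -- COG IDs assigned to the gene immediately 3' of the candidate
    rfam_members   -- one (family, COG IDs of its 3' gene) pair per annotated Rfam r5'UTR
    Returns per-family counts and the families with more than `min_members` matches.
    """
    counts = Counter(family for family, cogs in rfam_members
                     if candidate_cogs & cogs)  # any shared COG counts the member once
    strong = sorted(family for family, n in counts.items() if n > min_members)
    return counts, strong


# Placeholder COG IDs, for illustration only.
members = ([("Glycine", {"COG_A"})] * 12
           + [("SAM", {"COG_A", "COG_B"})]
           + [("TPP", {"COG_C"})] * 5)
counts, strong = shared_context({"COG_A"}, members)
print(dict(counts))  # {'Glycine': 12, 'SAM': 1}
print(strong)        # ['Glycine']
```

Keeping the counts per family, rather than a single total, reflects the distinction drawn above between candidates matching many members of one family and those matching a few members of unrelated families.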
Table 1.

Candidate r5′UTRs sharing conserved genomic context with known families of r5′UTRs or with putative RibEx RLEs[a]

No./name[a] | ORF No. | ORF product | 3′CGC (Rfam)[b] | 3′CGC and/or putative motif (RibEx)[b,c]
1 | VC0326 | 50S ribosomal protein L10 | L10 leader (117) | RLE0035 (35)
2 | VC0570 | 50S ribosomal protein L13 | L13 leader (64) | RLE0227 (17)
3 | VC0647 | Polynucleotide phosphorylase/polyadenylase | S15 (1) | RLE0154I (8)
4* | VC0705 | Chorismate mutase/prephenate dehydratase | T-box (11), T-box (1) | -
5 | VC0875 | Prolyl-tRNA synthetase | T-box (15) | RLE018 (5)
6* | VC0894 | Thiamine biosynthesis protein ThiI | SAM-IV (1) | RLE0079 (7)
7 | VC1091 | Oligopeptide ABC transporter, periplasmic oligopeptide-binding protein | T-box (4), SAM (8) | RLE0210 (6)
8* | VC1623 | Carboxynorspermidine decarboxylase | speF (25), Lysine (17) | -
9 | VC2030 | Ribonuclease E | rne5 (32) | -
10* | VC2108 | Erythronate-4-phosphate dehydrogenase | T-box (3), SAM (1) | -
11* | VC2334 | Hypothetical protein | ykoK (3) | RLE0310 (6)
12 | VC2356 | Sodium/alanine symporter | Glycine (118) | -
13 | VC2439 | Methyl-accepting chemotaxis protein | GEMM RNA motif (20), SAM (1) | -
14 | VC2522 | Hypothetical protein | yybP-ykoY (9) | RLE0334 (5)
15 | VC2645 | Acetylornithine deacetylase | TPP (4) | -
16 | VC2712 | Xanthine/uracil permease family protein | PyrR (32), Purine (17), TPP (4) | -
17 | VCA0142 | C4-dicarboxylate transport transcriptional regulatory protein | MOCO RNA motif (1), GEMM RNA motif (2) | RLE0123 (2)
18 | VCA0179 | NupC family protein | Purine (11) | -
19 | VCA0278 | Serine hydroxymethyltransferase | Glycine (8) | RLE0085 (7)
20 | VCA0287 | Threonyl-tRNA synthetase | T-box (60) | RLE020 (5)
C1* | VC2490 | 2-isopropylmalate synthase | T-box (11), Leu leader (28), ydaO-yuaA (1) | -
21 | VC0007 | 50S ribosomal protein L34 | - | -
22 | VC0218 | Ribosomal protein L28 | - | RLE0348
23 | VC0324 | 50S ribosomal protein L11 | - | RLE0241, RLE0148 (6)
24 | VC2597 | 30S ribosomal protein S10 | - | RLE0110 (25)
25 | VC2679 | 50S ribosomal protein L31 | - | RLE0089 (8)
26 | VCA0166 | Cold-shock transcriptional regulator CspA | - | RLE0357 (6)
27 | VCA0184 | Cold-shock DNA-binding domain-containing protein | - | RLE0357 (6)
28 | VCA0933 | Cold-shock domain-containing protein | - | RLE0357 (6)
29 | VCA0819 | Co-chaperonin GroES | - | RLE0003 (75)
30 | VCA1075 | Hypothetical protein | - | RLE0037
31 | VCA0518 | Bifunctional fructose-specific PTS protein | - | RLE0062 (8)
32 | VC2431 | DNA topoisomerase IV subunit B | - | RLE0226, SAM
33 | VC2738 | Phosphoenolpyruvate carboxykinase | - | RLE0239 (5)
34* | VC1923 | Trigger factor | - | RLE0241 (7)
35* | VC1046 | 3-ketoacyl-CoA thiolase | - | RLE0244 (2)
36 | VC1258 | DNA gyrase, subunit A | - | RLE0300 (4)
37 | VC0633 | Outer membrane protein OmpU | - | RLE0331 (5)
38 | VC0972 | Porin, putative | - | RLE0331 (5)
39 | VC1130 | DNA-binding protein H-NS | - | RLE0337 (5)

[a] Asterisks denote candidates containing or directly upstream of a putative transcription terminator.

[b] The number in parentheses denotes the number of r5′UTRs in each family, or of RLEs, found to share 3′CGC with the candidate r5′UTR.

[c] RLEs predicted to be encoded by the candidate are underlined; RLEs with 3′CGC are not.

C1: The gene 3′ of C1, encoding 2-isopropylmalate synthase, is annotated as the first gene in a putative leucine biosynthesis operon homologous to the leuABCD operon found in E. coli and many other Enterobacteriaceae. In E. coli, this operon is regulated by the leader peptide LeuL (31). As shown in Figure 2A, we found that the C1 transcript overlaps a short open reading frame of 20 residues containing two clusters of three leucine codons. Importantly, half of these are CUA codons, which represent only 8% of all annotated leucine codons in V. cholerae. Such over-representation of rare codons is a hallmark of characterized leader peptides and is important for their function (32). As in other leader peptides, the short ORF encoded by C1 is followed by a Rho-independent terminator, which presumably mediates transcription termination in the absence of ribosome stalling. These observations suggest that C1 was misannotated as an sRNA and instead likely corresponds to the V. cholerae LeuL leader peptide.
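
The two sequence features used to recognise C1 as a leader peptide, runs of codons for the cognate amino acid and an excess of a rare codon such as CUA, are easy to check computationally. The sketch below reads an ORF in frame from its first base, reports runs of at least three consecutive codons from a given codon set, and computes the fraction of those codons that are the rare one. The toy ORF is hypothetical and is not the C1 sequence; the minimum run length of three is an assumption matching the clusters described here.

```python
from typing import List, Set, Tuple

LEU = {"UUA", "UUG", "CUU", "CUC", "CUA", "CUG"}
PHE = {"UUU", "UUC"}


def codons(orf: str) -> List[str]:
    """Split an in-frame ORF (RNA or DNA alphabet) into codons, ignoring a trailing partial codon."""
    orf = orf.upper().replace("T", "U")
    return [orf[i:i + 3] for i in range(0, len(orf) - len(orf) % 3, 3)]


def codon_clusters(orf: str, aa_codons: Set[str], min_run: int = 3) -> List[Tuple[int, int]]:
    """Return (first codon index, run length) for runs of >= min_run consecutive aa_codons."""
    runs, start = [], None
    for i, codon in enumerate(codons(orf) + [""]):  # "" is a sentinel that closes a final run
        if codon in aa_codons:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i - start))
            start = None
    return runs


def rare_codon_fraction(orf: str, aa_codons: Set[str], rare: str = "CUA") -> float:
    """Fraction of the amino acid's codons in the ORF that are the rare codon."""
    hits = [c for c in codons(orf) if c in aa_codons]
    return sum(c == rare for c in hits) / len(hits) if hits else 0.0


# Hypothetical toy ORF: AUG, CUA CUG CUA, AAA, UUA CUA CUU, GGC, UAA (stop).
toy = "AUGCUACUGCUAAAAUUACUACUUGGCUAA"
print(codon_clusters(toy, LEU))       # [(1, 3), (5, 3)]
print(rare_codon_fraction(toy, LEU))  # 0.5
```

The same functions apply to a phenylalanine leader such as candidate No. 4 by passing `PHE` as the codon set; the rare-codon fraction is then only meaningful when compared with the genome-wide codon usage, as with the 8% CUA background quoted above.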
Figure 2.

Features of putative V. cholerae (A) LeuL and (B) PheL leader peptides. The two numbers in bold denote the relative positions of the 5′- and 3′-ends of each transcript based on the 454 data. The third number indicates the relative position of the downstream ORF as annotated by NCBI. Cognate clusters of codons for each leader peptide are shown in red. The ‘+’ and ‘#’ symbols denote stop codons.

Candidate No. 4: VC0705 is a homologue of E. coli pheA, which is subject to co-transcriptional regulation through a leader peptide (33). Indeed, as shown in Figure 2B, candidate No. 4 possesses all the features of a phenylalanine-regulated leader peptide: it encodes a 45 bp open reading frame containing a cluster of six phenylalanine codons directly upstream of a putative Rho-independent terminator. Candidate No. 9: E. coli RNase E has been shown to reduce the stability of its own transcript through interactions with the rne 5′UTR (34). Putative homologues of this RNase E-regulated motif have been identified by Rfam in several genera of Gammaproteobacteria, including Salmonella and Yersinia. Candidate No. 20: In E. coli, binding of threonyl-tRNA synthetase to the 5′UTR of its own mRNA has been shown to prevent initiation of translation (35). Candidate No. 3: Expression of E. coli polynucleotide phosphorylase (Pnp) is post-transcriptionally autoregulated through degradation of a double-stranded structure in the pnp mRNA leader (36–38). Candidate No. 1: In E. coli, L10 binds the 5′UTR of its own transcript to modulate its translation (39). Candidate No. 16: In silico annotations of both purine and PyrR-dependent r5′UTRs suggest that they are found almost exclusively in Gram-positive species, with only five of the 357 purine or PyrR r5′UTRs in the Rfam database predicted in Gram-negative strains. However, variants of purine-sensing riboswitches have recently been discovered in Mesoplasma florum that share conserved structure with sequences upstream of putative xanthine/uracil permease genes in Vibrio sp. (40).
Interestingly, only one of the seven putative r5′UTR homologues described above corresponds to a putative riboswitch, suggesting that covariance models may be more effective in identifying functional homologues of known riboswitches than those of other types of r5′UTRs. This may reflect the fact that riboswitch function requires a relatively high level of structural conservation. Specifically, the secondary structures of riboswitches are constrained at both the expression platform and the aptamer region, the latter needing to maintain a very specific conformation to preserve ligand specificity. In contrast, the only structural constraint on leader peptide function lies in the terminator/antiterminator region. Similarly, autoregulatory r5′UTRs need only maintain structures that modulate RBS accessibility and/or affect transcript stability in response to protein binding. However, while covariance models such as those used by Rfam may be effective in identifying well-conserved functional homologues of known riboswitches, transcriptome mining may be more effective in identifying significantly diverged variants of known riboswitch families, such as functional homologues of the purine-sensing riboswitches in Mesoplasma florum.
In addition to the seven loci described above, we found a number of candidate r5′UTRs that do not share 3′CGC with r5′UTR families annotated in Rfam but whose genomic context nonetheless strongly suggests that they are r5′UTRs (candidates 21–28 in Table 1). Five of these candidates are upstream of genes encoding ribosomal proteins, consistent with several studies showing that post-transcriptional and co-transcriptional autoregulation are common mechanisms for modulating the expression of ribosomal proteins (41). Indeed, transcription of the V. cholerae S10 operon has been shown to be regulated by an attenuator in the 5′UTR (42). We also identified candidate loci upstream of genes encoding putative V. cholerae homologues of the E. coli cold-shock proteins CspA and CspE. In E. coli, several genes involved in the cold-shock response, including cspA and cspE, are subject to autoregulation mediated by structural changes in their 5′UTRs (5). Finally, candidate No. 29 was identified 5′ of the gene encoding the co-chaperonin GroES; a putative ‘fourU’ thermosensor has been identified in the 5′UTR of groES in Salmonella sp. (43). Taken together, these findings suggest that our mining was effective in identifying previously unannotated functional homologues of characterized r5′UTRs.

Identification of candidate r5′UTRs that do not share conserved genomic context with known or putative r5′UTRs

Of the 69 previously unannotated putative r5′UTRs identified in our screen, 30 do not share 3′CGC with known families of r5′UTRs (Table 2). These candidate r5′UTRs were found upstream of genes implicated in a variety of cellular processes, as well as upstream of 12 ORFs encoding hypothetical proteins. Multiple candidates were identified upstream of genes implicated in glycolysis, electron transport and peptidoglycan biosynthesis/metabolism, suggesting that these may represent members of novel r5′UTR families that, like other families of r5′UTRs such as TPP riboswitches, regulate different steps in the same pathway or process.
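
The observation that several Table 2 candidates fall in the same processes can be reproduced with a simple keyword tally over the GO role categories. The rows below are transcribed from Table 2 with the hypothetical proteins omitted; the keyword matching and the function name are illustrative assumptions, not the authors' procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (candidate no., GO role category), transcribed from Table 2; hypothetical proteins omitted.
TABLE2: List[Tuple[int, str]] = [
    (40, "Anaerobic respiration"),
    (41, "Carbohydrate metabolism"),
    (42, "Ciliary or flagellar motility"),
    (43, "DNA replication, synthesis of RNA primer"),
    (44, "Electron transport"),
    (45, "Electron transport"),
    (46, "Electron transport, protein thiol-disulfide exchange, cytochrome complex assembly"),
    (47, "Fermentation"),
    (48, "Glycolysis"),
    (49, "Glycolysis"),
    (50, "Nucleobase, nucleoside and nucleotide interconversion"),
    (51, "Pentose-phosphate shunt"),
    (52, "Peptidoglycan biosynthesis"),
    (53, "Peptidoglycan metabolism"),
    (54, "Protein secretion"),
    (55, "Regulation of nitrogen utilization"),
    (56, "Regulation of transcription, DNA-dependent"),
    (57, "Sodium ion transport, hydrogen transport"),
    (58, "Targeting of mRNA for destruction, involved in RNA interference"),
]


def candidates_per_process(rows: Iterable[Tuple[int, str]],
                           keywords: List[str]) -> Dict[str, List[int]]:
    """Map each keyword to the candidates whose GO role category mentions it (case-insensitive)."""
    hits = defaultdict(list)
    for number, category in rows:
        for keyword in keywords:
            if keyword.lower() in category.lower():
                hits[keyword].append(number)
    return dict(hits)


print(candidates_per_process(TABLE2, ["glycolysis", "electron transport", "peptidoglycan"]))
# {'electron transport': [44, 45, 46], 'glycolysis': [48, 49], 'peptidoglycan': [52, 53]}
```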
Table 2. Candidates for novel r5′UTRs lacking conserved genomic context with Rfam r5′UTRs or RibEx RLEs

| Name | ORF | ORF product | Gene Ontology (GO) role category |
|---|---|---|---|
| 40 | VC2656 | Fumarate reductase flavoprotein subunit | Anaerobic respiration |
| 41 | VCA0013 | Maltodextrin phosphorylase | Carbohydrate metabolism |
| 42 | VC2188 | Flagellin | Ciliary or flagellar motility |
| 43 | VC2678 | Primosome assembly protein PriA | DNA replication, synthesis of RNA primer |
| 44 | VC1442 | Cytochrome c oxidase, subunit CcoN | Electron transport |
| 45 | VC2295 | Na+-translocating NADH-quinone reductase subunit A | Electron transport |
| 46* | VC2701 | Thiol:disulfide interchange protein precursor | Electron transport, protein thiol-disulfide exchange, cytochrome complex assembly |
| 47 | VCA1067 | Aldehyde dehydrogenase | Fermentation |
| 48 | VC0374 | Glucose-6-phosphate isomerase | Glycolysis |
| 49 | VCA0843 | Glyceraldehyde-3-phosphate dehydrogenase | Glycolysis |
| 50 | VC0986 | Adenylate kinase | Nucleobase, nucleoside and nucleotide interconversion |
| 51 | VCA0623 | Transaldolase B | Pentose-phosphate shunt |
| 52 | VCA0870 | D-alanyl-D-alanine endopeptidase | Peptidoglycan biosynthesis |
| 53 | VC2421 | N-acetyl-anhydromuramyl-L-alanine amidase | Peptidoglycan metabolism |
| 54 | VC0322 | Preprotein translocase subunit SecE | Protein secretion |
| 55 | VC2748 | Nitrogen regulation protein NR(II) | Regulation of nitrogen utilization |
| 56* | VC1796 | Middle operon regulator-related protein | Regulation of transcription, DNA-dependent |
| 57 | VC1901 | Sodium/proton antiporter | Sodium ion transport, hydrogen transport |
| 58 | VC0347 | RNA-binding protein Hfq | Targeting of mRNA for destruction, involved in RNA interference |
| 59 | VC0038 | Hypothetical protein | |
| 60 | VC0381 | Hypothetical protein | |
| 61 | VC1576 | Hypothetical protein | |
| 62 | VC1613 | Hypothetical protein | |
| 63 | VC1891 | Hypothetical protein | |
| 64 | VC2002 | Hypothetical protein | |
| 65 | VC2264 | Hypothetical protein | |
| 66 | VC2647 | Hypothetical protein | |
| 67 | VCA0327 | Hypothetical protein | |
| 68 | VCA0363 | Hypothetical protein | |
| 69 | VCA0743 | Hypothetical protein | |

An asterisk (*) denotes candidates containing or directly upstream of a putative transcription terminator.
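To make the pathway clustering concrete, the short Python sketch below transcribes the GO role categories from Table 2 and tallies candidates under three coarse process labels. The labels and the prefix-matching rule are illustrative choices made here, not part of the authors' analysis.

```python
from collections import Counter

# Candidate number -> GO role category, transcribed from Table 2.
# Candidates 59-69 (hypothetical proteins) have no category and map to None.
TABLE2_GO = {
    40: "Anaerobic respiration",
    41: "Carbohydrate metabolism",
    42: "Ciliary or flagellar motility",
    43: "DNA replication, synthesis of RNA primer",
    44: "Electron transport",
    45: "Electron transport",
    46: "Electron transport, protein thiol-disulfide exchange, cytochrome complex assembly",
    47: "Fermentation",
    48: "Glycolysis",
    49: "Glycolysis",
    50: "Nucleobase, nucleoside and nucleotide interconversion",
    51: "Pentose-phosphate shunt",
    52: "Peptidoglycan biosynthesis",
    53: "Peptidoglycan metabolism",
    54: "Protein secretion",
    55: "Regulation of nitrogen utilization",
    56: "Regulation of transcription, DNA-dependent",
    57: "Sodium ion transport, hydrogen transport",
    58: "Targeting of mRNA for destruction, involved in RNA interference",
    **{n: None for n in range(59, 70)},
}

# Coarse labels chosen for this illustration; each is matched as a
# case-insensitive prefix of the GO role category.
PROCESSES = ("Glycolysis", "Electron transport", "Peptidoglycan")


def tally_processes(table):
    """Count candidates per coarse process label (plus hypothetical proteins)."""
    counts = Counter()
    for category in table.values():
        if category is None:
            counts["Hypothetical protein"] += 1
            continue
        for process in PROCESSES:
            if category.lower().startswith(process.lower()):
                counts[process] += 1
    return counts


counts = tally_processes(TABLE2_GO)
print(len(TABLE2_GO))                      # → 30
print({p: counts[p] for p in PROCESSES})   # → {'Glycolysis': 2, 'Electron transport': 3, 'Peptidoglycan': 2}
```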


Using a GFP reporter approach to measure r5′UTR-mediated regulation

To experimentally test and begin to characterize a few of the candidate r5′UTRs identified in our screen, we adapted an approach that was developed by Urban et al. (24) to study sRNA-mediated regulation of mRNAs in trans. Urban and colleagues constructed a plasmid into which 5′UTRs of interest can be introduced directly downstream of a constitutive promoter to create translational fusions with a gene encoding GFP. The fluorescence generated from GFP is used as a means to gauge GFP expression from different fusions; a control fusion of GFP to the E. coli LacZ 5′UTR serves as a negative control for these assays. Since the identical constitutive promoter is present in all fusions, these constructs can be used to measure regulation of gene expression that is not mediated by changes in transcription initiation. To test the efficacy of this approach for measuring r5′UTR-mediated regulation, we compared expression of GFP fused to two characterized r5′UTRs, E. coli LeuL and the V. cholerae Glycine riboswitch, to that of GFP fused to the E. coli LacZ 5′UTR. As shown in Figure 3, when cultures were grown in minimal media supplemented with 16 amino acids excluding leucine and glycine, expression of all three fusions increased significantly. In the control (LacZ) fusion, GFP expression was similar when this medium was supplemented with either glycine or leucine (Figure 3). In contrast, expression of GFP fused to the V. cholerae Glycine riboswitch was almost completely repressed when glycine was added to the media; inhibition of GFP expression by glycine appears to be fairly specific, since addition of leucine did not repress GFP expression (Figure 3). Amino acid specificity was also observed with GFP expression from the LeuL fusion. However, in this case, expression was markedly decreased when leucine was added; the addition of glycine did not inhibit expression (Figure 3). 
Taken together, these observations suggest that monitoring GFP expression with this reporter system is a useful technique for investigating r5′UTR-mediated regulation. Finally, as with the LeuL fusion, expression of a C1-GFP fusion was repressed by leucine but not by glycine, providing strong support for the bioinformatic evidence implicating C1 as the V. cholerae LeuL.

Figure 3. Expression of GFP fused to the indicated 5′UTRs. Escherichia coli strains carrying the indicated fusions were grown in defined media lacking both glycine and L-leucine (blue diamonds) or supplemented with either glycine (red squares) or L-leucine (green triangles). Results from a representative experiment are shown.

In the case of LeuL, our results with the GFP fusion are consistent with previous studies showing that the LeuL leader down-regulates expression of the leuABCD operon in response to leucine (31,32). However, the glycine-mediated repression of GFP expression by the V. cholerae Glycine riboswitch was surprising, as Mandal et al. (44) found that glycine had the opposite effect on expression of a reporter gene fused to the B. subtilis Glycine riboswitch. Interestingly, even though the B. subtilis and V. cholerae Glycine riboswitches share significant structural conservation (44), they are encoded 5′ of unrelated genes: the former upstream of gcvT, an aminomethyltransferase that mediates conversion of glycine to serine, and the latter upstream of vc1422, a putative sodium/alanine symporter. VC1422 is a homologue of E. coli CycA, an APC family transporter of glycine, serine and alanine (45), as well as of several other gene products annotated as glycine symporters in both Gram-positive and Gram-negative species. Our findings suggest that even though the aptamer regions of the V. cholerae and B. subtilis Glycine riboswitches share significant structural conservation that has maintained their specificity for glycine, the two riboswitches elicit opposite regulatory responses on their respective 3′ genes. The V. cholerae Glycine riboswitch appears to have evolved to up-regulate glycine uptake in the absence of glycine, whereas the B. subtilis Glycine riboswitch has evolved to up-regulate glycine catabolism when glycine is abundant. The mechanisms by which these similar riboswitches elicit opposite effects on the expression of their respective 3′ genes warrant further investigation.
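Each reporter comparison in this section reads a fusion's response against the LacZ 5′UTR control fusion. The Python sketch below shows one way to compute that comparison. It assumes fluorescence is divided by culture density (OD600) before comparison. The normalizer, the `Reading` type, and the 0.5 cutoff in `classify` are hypothetical choices, not details given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One plate-reader measurement (hypothetical data structure)."""
    fluorescence: float  # GFP fluorescence, arbitrary units
    od600: float         # culture density; used as the normalizer by assumption


def normalized(reading: Reading) -> float:
    """GFP fluorescence per unit of culture density."""
    if reading.od600 <= 0:
        raise ValueError("od600 must be positive")
    return reading.fluorescence / reading.od600


def relative_response(test_base: Reading, test_treated: Reading,
                      ctrl_base: Reading, ctrl_treated: Reading) -> float:
    """Fold change of a test fusion under a treatment (e.g. added glycine),
    divided by the fold change of the LacZ 5'UTR control fusion under the
    same treatment. Because all fusions share one constitutive promoter, a
    value well below 1 is consistent with repression acting after
    transcription initiation."""
    test_fold = normalized(test_treated) / normalized(test_base)
    ctrl_fold = normalized(ctrl_treated) / normalized(ctrl_base)
    return test_fold / ctrl_fold


def classify(ratio: float, cutoff: float = 0.5) -> str:
    """Label a relative response using a hypothetical symmetric cutoff."""
    if ratio <= cutoff:
        return "repressed"
    if ratio >= 1 / cutoff:
        return "induced"
    return "control-like"
```

The same ratio also covers the casamino acid comparison in the next subsection: treat 0.1% CAA as the baseline and 1% CAA as the treatment. A candidate that behaves like candidate No. 12 would then fall in the "control-like" band.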

Two candidates for novel r5′UTRs down-regulate expression of their downstream genes in response to increased amino acid concentrations

In the experiments described above, the cognate signals for the r5′UTRs of interest were known from previous studies. However, for candidate r5′UTRs that do not share 3′CGC with well-characterized classes of r5′UTRs, these signals are difficult to determine a priori. Since many r5′UTRs are known to be regulated by amino acids, we constructed fusions of GFP with several candidate r5′UTRs. We then measured their expression in minimal media supplemented with either 1% or 0.1% casamino acids (CAA). As shown in Figure 4, a construct carrying the E. coli LacZ 5′UTR produced more GFP in 1% than in 0.1% CAA, presumably due to an increase in translation efficiency. In contrast, fusions of GFP with E. coli or V. cholerae LeuL or with the V. cholerae Glycine riboswitch exhibited less GFP expression in high CAA. Similar patterns of GFP expression (higher in 0.1% than in 1% CAA) were also observed when two candidate r5′UTRs were fused to GFP (Figure 4, pbpG and thiI). One of these candidates is upstream of vca0870, which encodes the V. cholerae homologue of penicillin-binding protein 7 (pbpG), a protein involved in peptidoglycan metabolism (46). The other candidate is upstream of a gene annotated as thiI. ThiI has been implicated in thiamine biosynthesis and tRNA modification. In Salmonella typhimurium, ThiI is the only component of the thiamine biosynthesis pathway whose expression is not regulated by TPP riboswitches (47). These observations suggest that the pbpG and thiI UTRs mediate co- or post-transcriptional repression of their respective downstream genes when amino acid concentrations increase. However, these data do not reveal whether the influence of the pbpG and thiI UTRs on gene expression is triggered by direct interaction with amino acids or requires the participation of other factor(s). As shown in Figure 4, both candidate r5′UTRs exhibited more GFP expression in 0.1% than in 1% CAA in V. cholerae as well as in E. coli.
Thus, if additional factors are required for the regulatory effects of these V. cholerae UTRs, these factors appear to be conserved in E. coli. The relative expression of GFP in low and high CAA from the reporter construct carrying the 5′UTR of candidate No. 12 was similar to that of the control LacZ construct (Figure 4). This suggests that this candidate r5′UTR is not sensitive to changes in amino acid concentration. Alternatively, this candidate may not correspond to an r5′UTR.

Figure 4. Effect of increased amino acid concentration on the expression of GFP fused to known or candidate r5′UTRs. Results from representative experiments in E. coli and V. cholerae are shown.

As shown in Table 1, the thiI r5′UTR shares 3′CGC with one SAM-IV riboswitch and with the RLE0079 motif. The latter motif was identified upstream of thiI homologues in seven Gram-negative species (13). We identified a canonical Rho-independent terminator near the 3′ end of the thiI 5′UTR. This suggests that the regulatory effects of this UTR on thiI expression may be achieved through a terminator/antiterminator switch. Indeed, in Northern analyses, the abundance of a small transcript overlapping the thiI 5′UTR was markedly increased in high versus low CAA (data not shown). Putative Rho-independent terminators were also identified within 100 bp of the thiI start codon in several E. coli strains, Shewanella sp., Streptococcus sp. and Vibrio sp. (21). The thiI homologues in these strains may therefore be regulated by a similar mechanism. The pbpG r5′UTR does not share 3′CGC with any known or putative r5′UTRs, so it is not clear whether this motif is conserved in other species. Since no terminator was predicted in the pbpG 5′UTR, the mechanism by which this r5′UTR regulates its downstream message remains unclear.
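The text does not specify how the terminators were predicted. The Python sketch below only illustrates the shape that "canonical Rho-independent terminator" refers to: a GC-rich hairpin followed immediately by a run of U residues (T in the DNA sequence). It is a toy scan, not the method used in the study. Dedicated tools such as TransTermHP score imperfect stems, G·U pairs and folding energies. All thresholds below, and the convention that the UTR sequence ends just before the start codon, are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge(intervals):
    """Merge overlapping half-open intervals into disjoint regions."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def find_terminators(seq, stem_min=5, stem_max=12, loop_min=3, loop_max=8,
                     tail_len=8, tail_min_t=5, min_gc=0.6):
    """Naive scan of the sense strand for Rho-independent terminator shapes.

    A hit is a perfectly Watson-Crick-paired stem with GC fraction >= min_gc,
    a loop of loop_min..loop_max nt, and, right after the hairpin, a tail of
    tail_len nt with at least tail_min_t T residues. Overlapping hits are
    merged and returned as half-open [start, end) regions.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for stem in range(stem_min, stem_max + 1):
        for loop in range(loop_min, loop_max + 1):
            hairpin = 2 * stem + loop
            for i in range(n - hairpin - tail_len + 1):
                left = seq[i:i + stem]
                right = seq[i + stem + loop:i + hairpin]
                if right != revcomp(left):
                    continue  # arms do not pair perfectly
                if sum(base in "GC" for base in left) / stem < min_gc:
                    continue  # stem not GC-rich enough
                tail = seq[i + hairpin:i + hairpin + tail_len]
                if tail.count("T") >= tail_min_t:
                    hits.append((i, i + hairpin + tail_len))
    return merge(hits)


def terminators_near_start(utr, max_dist=100, **kwargs):
    """Regions ending within max_dist nt of the 3' end of utr, assuming utr
    ends immediately upstream of the start codon."""
    return [(s, e) for s, e in find_terminators(utr, **kwargs)
            if len(utr) - e <= max_dist]
```

The 100-nt default in `terminators_near_start` mirrors the distance quoted above for the thiI homologues. Everything else in the scan is an assumption.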

CONCLUSIONS

Taken together, our findings suggest that transcriptome profiles acquired through new deep sequencing techniques will be a rich source of information about r5′UTRs. We developed a simple set of filters to mine the V. cholerae small transcriptome acquired by pyrosequencing of cDNA libraries. Our approach appears to be effective. We identified most of the previously annotated r5′UTRs (though most have not been experimentally verified), while retaining relatively few of the total transcripts or trans-acting regulatory RNAs found in the original datasets. We also identified numerous candidate r5′UTRs, not annotated in previous computational screens, that share conserved genomic context with known r5′UTRs. Finally, we identified candidate r5′UTRs upstream of several classes of genes whose expression has not previously been shown to be subject to regulation by r5′UTRs. Thus, our findings highlight the utility of mining deep-sequencing transcriptome data as a complementary approach to computational screens for identifying r5′UTRs. Overall, our observations suggest that the distribution of known classes of r5′UTRs and the diversity of functions regulated by r5′UTRs are much greater than previous in silico genomics-based annotations have suggested. Conservation-based computational approaches such as Rfam are invaluable for identifying r5′UTRs. However, their reliance on homology to known r5′UTRs is an inherent limitation that precludes the identification of new classes of r5′UTRs. In addition, these approaches often rely on seed alignments of r5′UTRs from closely related species. As a result, functional homologues of known r5′UTRs often cannot be identified in species that are highly diverged from those represented in the seed.
Thus, annotations of r5′UTRs could be significantly improved, particularly outside well-studied genera, by two steps: using high-throughput transcriptomics to identify novel r5′UTRs and/or functional homologues of known r5′UTRs in less well-studied bacterial species, and then integrating these loci into kingdom-wide bioinformatic screens. Several recent studies have revealed that the diversity of ligands and environmental cues that elicit r5′UTR-mediated regulation is greater than previously thought. As more families of r5′UTRs are identified using a variety of approaches, the task of identifying each of their specific cognate signals will therefore become increasingly daunting. The GFP reporter approach we implemented here for validating r5′UTR-mediated regulation should help address this challenge. It provides an efficient way to screen large numbers of candidate r5′UTRs under a wide variety of conditions.
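The filtering step mentioned in these conclusions is not detailed in this section. To show what a filter of this general kind can look like, the Python sketch below implements one plausible rule. It keeps a sequenced RNA if the RNA overlaps no ORF on its own strand and its 3′ end lies a short distance upstream of a same-strand start codon. This is a hypothetical illustration, not the authors' filter set. The `Feature` type, the overlap rule and the 50-nt default are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Feature:
    """A stranded genomic interval, half-open [start, end), 0-based."""
    start: int
    end: int
    strand: str  # "+" or "-"
    name: str = ""


def overlaps(a: Feature, b: Feature) -> bool:
    """True if two half-open intervals share at least one position."""
    return a.start < b.end and b.start < a.end


def gap_to_start(rna: Feature, orf: Feature) -> int:
    """Distance (nt) from the RNA's 3' end to the ORF's start codon on the
    same strand; negative if the start codon is not downstream."""
    if rna.strand == "+":
        return orf.start - rna.end   # 3' end = rna.end, start codon = orf.start
    return rna.start - orf.end       # minus strand: coordinates run backwards


def candidate_r5utrs(rnas: List[Feature], orfs: List[Feature],
                     max_gap: int = 50) -> List[Tuple[str, str]]:
    """Hypothetical filter: keep RNAs that overlap no same-strand ORF and whose
    3' end lies at most max_gap nt upstream of a same-strand start codon.
    Returns (RNA name, nearest downstream ORF name) pairs."""
    hits = []
    for rna in rnas:
        same = [o for o in orfs if o.strand == rna.strand]
        if any(overlaps(rna, o) for o in same):
            continue
        nearby = [(gap_to_start(rna, o), o) for o in same
                  if 0 <= gap_to_start(rna, o) <= max_gap]
        if nearby:
            _, orf = min(nearby, key=lambda pair: pair[0])
            hits.append((rna.name, orf.name))
    return hits
```

In practice such a rule would be one of several filters. Its output pairs each retained RNA with the gene it might regulate, which is the input a reporter screen like the one above would need.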

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

National Institutes of Health (National Institute of Allergy and Infectious Diseases K99/R00 Pathways to Independence Award AI-076608 to J.L., R37-AI-42347 to M.K.W.); Howard Hughes Medical Institute (to M.K.W.). Funding for open access charge: Howard Hughes Medical Institute.

Conflict of interest statement. None declared.