sRNAscanner: a computational tool for intergenic small RNA detection in bacterial genomes.

Jayavel Sridhar, Narmada Sambaturu, Suryanarayanan Ramkumar Narmada, Radhakrishnan Sabarinathan, Hong-Yu Ou, Zixin Deng, Kanagaraj Sekar, Ziauddin Ahamed Rafi, Kumar Rajakumar.

Abstract

BACKGROUND: Bacterial non-coding small RNAs (sRNAs) have attracted considerable attention due to their ubiquitous nature and contribution to numerous cellular processes including survival, adaptation and pathogenesis. Existing computational approaches for identifying bacterial sRNAs demonstrate varying levels of success and there remains considerable room for improvement.
METHODOLOGY/PRINCIPAL FINDINGS: Here we have proposed a transcriptional signal-based computational method to identify intergenic sRNA transcriptional units (TUs) in completely sequenced bacterial genomes. Our sRNAscanner tool uses position weight matrices derived from experimentally defined E. coli K-12 MG1655 sRNA promoter and rho-independent terminator signals to identify intergenic sRNA TUs through sliding window based genome scans. Analysis of genomes representative of twelve species suggested that sRNAscanner demonstrated equivalent sensitivity to sRNAPredict2, the best performing bioinformatics tool available presently. However, each algorithm yielded substantial numbers of known and uncharacterized hits that were unique to one or the other tool only. sRNAscanner identified 118 novel putative intergenic sRNA genes in Salmonella enterica Typhimurium LT2, none of which were flagged by sRNAPredict2. Candidate sRNA locations were compared with available deep sequencing libraries derived from Hfq-co-immunoprecipitated RNA purified from a second Typhimurium strain (Sittka et al. (2008) PLoS Genetics 4: e1000163). Sixteen potential novel sRNAs computationally predicted and detected in deep sequencing libraries were selected for experimental validation by Northern analysis using total RNA isolated from bacteria grown under eleven different growth conditions. RNA bands of expected sizes were detected in Northern blots for six of the examined candidates. Furthermore, the 5'-ends of these six Northern-supported sRNA candidates were successfully mapped using 5'-RACE analysis.
CONCLUSIONS/SIGNIFICANCE: We have developed, computationally examined and experimentally validated the sRNAscanner algorithm. Data derived from this study has successfully identified six novel S. Typhimurium sRNA genes. In addition, the computational specificity analysis we have undertaken suggests that approximately 40% of sRNAscanner hits with high cumulative sum of scores represent genuine, undiscovered sRNA genes. Collectively, these data strongly support the utility of sRNAscanner and offer a glimpse of its potential to reveal large numbers of sRNA genes that have to date defied identification. sRNAscanner is available from: http://bicmku.in:8081/sRNAscanner or http://cluster.physics.iisc.ernet.in/sRNAscanner/.

Year:  2010        PMID: 20700540      PMCID: PMC2916834          DOI: 10.1371/journal.pone.0011970

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Systematic experimental and computational approaches have led to the identification of ∼92 small RNAs (sRNAs) in Escherichia coli K12 MG1655 alone [1]. Many sRNAs have been assigned regulatory roles in the survival and physiology of the organism [2]. Prokaryotic sRNAs are known to play roles in regulation of sporulation [3], sugar metabolism [4], iron homeostasis [5], survival under oxidative stress [6], DNA damage repair, maintenance of cell surface components [7] and regulation of pathogenicity [8]. Though sRNAs do not code for peptides they exert their function through antisense modes by RNA–RNA base pairing [9], [10] or by antagonizing target proteins through RNA–protein interactions [11]. Genomic screens for sRNAs have been most extensively conducted in the model organisms E. coli K-12 [12], [13] and Bacillus subtilis [3]. More recently, significant numbers of sRNAs in pathogens such as Staphylococcus aureus [14], Pseudomonas aeruginosa [15] and Listeria monocytogenes [16] have been identified, though functional roles of the majority remain to be determined. Most computational methods, such as QRNA [17] and Intergenic Sequence Inspector [18], use intergenic sequence conservation among related genomes to identify sRNAs. By contrast, the RNAz [19] and sRNAPredict [15], [20] programs utilize estimated thermodynamic stability of conserved RNA structures and existing ‘orphan’ promoter and terminator annotations for sRNA predictions, respectively. Previous studies by Argaman et al. [12], Chen et al. [21], Pfeiffer et al. [22] and Valverde et al. [23] had used promoter and terminator signals to predict sRNAs but did not provide computational scripts for general use. This study implements a generic transcriptional signal detection strategy and applies it systematically to obtain reproducible computational results and matching ‘prediction scores’. 
Furthermore, sRNAPredict [15], [20] and SIPHT [24] require available promoter information and databases of rho-independent terminators predicted by TransTermHP [25] to identify sRNAs. Moreover, sRNAPredict2 requires as inputs sequence and structure conservation data as identified by Blast and QRNA, respectively, markedly hampering detection of sRNAs mapping to non-conserved intergenic sequences. The proposed tool overcomes these limitations by searching genome sequences for orphan transcriptional signals and integrating signal co-ordinates to identify candidate intergenic sRNAs without any pre-requirements. Comparative genomic approaches are restricted to identifying sRNA candidates located within conserved genomic backbone regions common to closely related bacteria [26]. However, most bacterial species have significant cumulative spans of multiple strain-specific sequences or islands, dispersed along the genome, many of which play key adaptive and/or pathogenesis-related roles [27], [28]. Indeed, genomic island-borne sRNAs have been identified in S. aureus [14] and Salmonella enterica serovar Typhimurium [22], [29]. Furthermore, sRNAs transcribed from strain-specific regions of S. Typhimurium were reported to partake in complex networks for stress adaptation and virulence regulation [8], [22], [28], [29] leading Toledo-Arana et al. [8] to emphasize the need for identification of strain-specific sRNAs in pathogens. S. Typhimurium is an important food-borne pathogen that causes a substantial burden of diarrhoeal disease globally. Life-threatening systemic infections can also occur in those with severe co-morbidities, at extremes of age and/or with impaired immune systems.
We have constructed a position weight matrix (PWM) based tool named sRNAscanner, using E. coli K-12 MG1655 sRNA-specific transcriptional signals as positive training data, for the identification of intergenic sRNAs. Experimentally characterized E. coli sRNA promoters appear to vary slightly in base distribution frequencies when compared to E. coli mRNA promoters (Table S1a), though it remains possible that observed differences may be statistically insignificant. sRNAscanner cut-off thresholds were identified using the known E. coli K-12 MG1655 sRNAs as a positive dataset [30]. The predictive abilities of sRNAscanner and sRNAPredict2 [20] were then compared by analysing 13 bacterial genomes representative of diverse species. As a specific case study, we analyzed a S. Typhimurium complete genome sequence and experimentally validated a small set of previously uncharacterized predictions. Our results strongly support the accuracy and utility of sRNAscanner as a tool for the discovery of novel sRNA genes within intergenic regions of bacterial genomes and hint at the broader power of customized PWMs as a generic strategy for detection of defined genomic features in diverse bacterial genomes.

Methods

Summary of the sRNAscanner program

sRNAscanner uses as inputs matching complete bacterial genome sequence and protein coding table files in standard FASTA and tab-delimited text formats, respectively, to identify sRNA genes in intergenic regions. The sRNAscanner suite consists of algorithms to perform the following functions: (a) construct PWMs from sRNA-specific transcriptional signals, (b) search complete genome sequences using constructed PWMs to identify ‘orphan’ intergenic promoter and terminator locations, (c) perform coordinate based integration of promoter/terminator signals to define putative intergenic transcriptional units (TU) and (d) select predicted TUs based on Cumulative Sum of Scores (CSS) values above a nominated threshold. The CSS value is determined by summing three individual matrix-specific Sum of Scores (SS) values for each candidate TU (see below for calculation of SS value). sRNAscanner uses pre-computed PWMs and the following pre-defined parameters to predict intergenic sRNAs: promoter box 1 SS value (≥2), promoter box 2 SS value (≥2), terminator SS value (≥3), spacer 1 range (defines distance between promoter boxes 1 and 2; 12–18), spacer 2 range (defines distance between promoter box 2 and terminator signal; 40–350), Unique Hit value (200) and CSS (≥14). The Unique Hit value identifies potential TUs from a set of overlapping hits based on the presence of closely located start coordinates mapped within a defined window size, which by default is set at 200 bp. sRNAscanner selects the TU with the maximum CSS value from each overlapping set as a unique representative hit for the set. Note: all parameters can be altered by users as required. Predicted TUs are examined for the presence of a putative ribosome binding site and initiation codon; if both signals are identified the TUs are classified as coding for putative mini-proteins [28]. Remaining TUs are considered to code for candidate sRNA molecules. A flowchart summarizing the sRNAscanner algorithm is shown in Figure 1.
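The Unique Hit step described above can be sketched as follows. This is a minimal illustration and not the sRNAscanner source: the `Hit` record, the function name `unique_hits` and the rule of anchoring each group at its first start coordinate are all assumptions made for the example; only the 200 bp default window and the "keep the maximum CSS" rule come from the text.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    start: int   # TU start coordinate (bp)
    end: int     # TU end coordinate (bp)
    css: float   # cumulative sum of scores for the TU


def unique_hits(hits, window=200):
    """Keep one representative hit per cluster of nearby start coordinates.

    Assumed grouping rule: a hit joins the current group while its start lies
    within `window` bp of the first start in that group.
    """
    representatives = []
    group = []
    for hit in sorted(hits, key=lambda h: h.start):
        if group and hit.start - group[0].start > window:
            representatives.append(max(group, key=lambda h: h.css))
            group = []
        group.append(hit)
    if group:
        representatives.append(max(group, key=lambda h: h.css))
    return representatives


# Toy coordinates and scores, invented for illustration only.
hits = [Hit(100, 260, 14.2), Hit(150, 300, 15.8),
        Hit(290, 420, 14.9), Hit(900, 1050, 16.1)]
selected = unique_hits(hits)
```

With these toy values the first three hits collapse into one group, represented by the hit starting at 150, and the hit at 900 stands alone.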
Figure 1

Flowchart illustrating an overview of the sRNAscanner algorithm.

The final step was performed using the web-based TargetRNA [41] utility and/or by comparison of sRNAscanner hits with RNA deep sequencing datasets. The output dataset obtained is shown as the red outlined box at the bottom of the figure. sRNAscanner hits supported by TargetRNA only are classed as possible sRNA candidates, whilst those supported by deep sequencing are considered as probable sRNA candidates. Details of parameter values used in this study are as indicated in the text.

Construction of PWMs from training data

sRNAscanner computes a PWM of four rows and x columns for N input sequences each having x residues; N and x can be any positive integer. The program uses multiple sequences of sRNA-specific transcriptional signals in FASTA format as input for the construction of alignment matrices. The alignment matrix captures the number of occurrences, n_{i,j}, of letter i at position j across the set of aligned sequences. Subsequently, actual occurrence values were converted into log-odds scores, values that reflect the positional weights of each of the four bases (A, T, G, C) at each position. Frequency calculations and scoring schemes were adopted from previous algorithms and the positional weights were derived from the alignment matrix itself. A PWM was then derived from the above alignment matrix using the following formula (see Hertz and Stormo, 1999 [31] for details):

$$W_{i,j} = \ln\frac{(n_{i,j} + p_i)/(N + 1)}{p_i} \approx \ln\frac{f_{i,j}}{p_i}$$

In this formula N is the total number of input sequences and p_i is the a priori probability of the letter i occurring at position j of an input sequence; by definition for a four component system (A, C, G & T) this expected frequency is 0.25 for each of the four nucleotides. f_{i,j} = n_{i,j}/N is the frequency of the letter i in position j. Importantly, the precise genomic base frequencies of the training or test genomes do not play a role in the construction of the PWM. The log-odds scores are used for the construction of the PWM; the algorithm was implemented using the PWM_create module of the sRNAscanner program. We have used ten promoter boxes and twenty-one rho-independent terminators [21] of experimentally-verified E. coli K-12 sRNA genes as training data to construct PWM1 (promoter box1), PWM2 (promoter box2) and PWM3 (rho-independent terminator) (Table S1 and Figure S1).
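As an illustration of the construction just described, the sketch below builds a log-odds matrix from a handful of aligned sequences. It assumes the pseudocount form of the Hertz and Stormo weight, ln(((n + p)/(N + 1))/p) with p = 0.25; whether PWM_create uses exactly this variant is an assumption, and the function name `build_pwm` and the toy sequences are hypothetical.

```python
import math

BASES = "ACGT"


def build_pwm(sequences, prior=0.25):
    """Return {base: [weight per position]} for equal-length aligned sequences."""
    n_seqs = len(sequences)
    width = len(sequences[0])
    if any(len(seq) != width for seq in sequences):
        raise ValueError("all training sequences must have the same length")
    pwm = {base: [0.0] * width for base in BASES}
    for j in range(width):
        column = [seq[j].upper() for seq in sequences]
        for base in BASES:
            n_ij = column.count(base)  # occurrences of this base at position j
            # Pseudocount-corrected log-odds weight (assumed form, see lead-in).
            pwm[base][j] = math.log(((n_ij + prior) / (n_seqs + 1)) / prior)
    return pwm


# Four invented hexamers standing in for aligned promoter boxes.
toy_training = ["TTGACA", "TTGACT", "TTTACA", "CTGACA"]
pwm = build_pwm(toy_training)
```

A base seen in every training sequence at a position receives a positive weight, while a base never seen there receives a negative but finite weight, which is the practical reason for the pseudocount.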

Identification of intergenic sRNA specific transcriptional units

PWM1, PWM2 and PWM3 matrices were used individually to scan entire genome sequences, one nucleotide at a time, by a sliding window method as described previously [31]. The width of each sliding window was equal to the length of its matching input PWM. The matrix-specific SS value of each DNA sequence window was calculated by adding the PWM-determined scores corresponding to each of the respective bases within the window as described previously [31]. Each successive sliding window was assigned a SS value and it was compared against a selected threshold SS value obtained by analysis of the 92 known E. coli K-12 sRNA genes from the sRNAMap and Rfam datasets (http://srnamap.mbc.nctu.edu.tw/). sRNAscanner was run with an arbitrary minimum SS value of 1 for each of the three matrices to identify potential intergenic TUs which were then compared manually with the known K-12 sRNA genes to identify concordant pairs. Using these criteria and no imposed CSS cut-off, 66 of the 92 known sRNAs were identified as possessing sRNAscanner-detectable potential transcriptional signals (Table S2). Re-iterative empirical analyses using progressively higher matrix-specific SS values were performed to identify matrix-specific default SS thresholds that sought to maximize sensitivity whilst minimizing false-positive hits; SS cut-offs determined were as mentioned previously. Sequences having PWM1-, PWM2- and PWM3-specific SS values above the threshold scores were selected as potential promoter box 1, promoter box 2 and terminator signal hits, respectively. Next, the orientation, relative position and spacing of PWM-detected hits were examined against pre-defined allowable ranges for spacer 1 and spacer 2 to identify potential TUs. Spacer parameters used were based on analysis of the length and transcriptional signal spacing features of known E. coli and other Enterobacteriaceae sRNAs. Sequences satisfying both spacer checks and a selected CSS cut-off value were identified as likely TUs. 
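The sliding-window scan can be sketched as below: the window is as wide as the matrix, moves one nucleotide at a time, and its SS value is the sum of the matrix entries for the bases it contains. The function name `scan`, the toy matrix and the toy sequence are illustrative assumptions, and the sketch scans the forward strand only.

```python
def scan(sequence, pwm, threshold):
    """Return (start, SS) for every window whose SS meets the threshold.

    `pwm` maps each base to a list of per-position weights.
    """
    width = len(next(iter(pwm.values())))
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in pwm for base in window):
            continue  # skip windows containing ambiguous bases such as N
        ss = sum(pwm[base][j] for j, base in enumerate(window))
        if ss >= threshold:
            hits.append((start, round(ss, 2)))
    return hits


# Invented 3-column matrix that favours the motif TGA.
toy_pwm = {
    "A": [-1.0, -1.0, 1.0],
    "C": [-1.0, -1.0, -1.0],
    "G": [-1.0, 1.0, -1.0],
    "T": [1.0, -1.0, -1.0],
}
hits = scan("CCTGACCTGA", toy_pwm, threshold=2.0)
```

Here the two TGA occurrences, at 0-based positions 2 and 7, each score 3.0 and are the only windows reported.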
The PWM3 SS value was expected to contribute most to the CSS score as for the known E. coli K-12 TUs detected by the program, PWM3 scores varied from 4.54–11.19, whilst the top values for PWM1 and PWM2 were 4.98 and 6.03, respectively. Importantly, higher SS values on one or both of the other matrices would not have compensated for a single below-threshold score. Identified TUs were compared with protein coding annotation files. Non-redundant, intact, non-overlapping TUs identified within intergenic regions alone and lacking putative ribosome binding sites and start codons were reported as probable sRNA-specific intergenic TUs.
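The coordinate-based integration step can be sketched as follows. The spacer ranges and the CSS cut-off are the defaults quoted in the text; the function name `assemble_tus`, the definition of a spacer as the gap between the end of one signal and the start of the next, and the 6 bp box lengths are assumptions made for this example. Input hits are assumed to have already passed their matrix-specific SS thresholds, so a high score on one matrix cannot rescue a below-threshold score on another.

```python
def assemble_tus(box1_hits, box2_hits, term_hits, box1_len, box2_len,
                 spacer1=(12, 18), spacer2=(40, 350), css_min=14.0):
    """Combine (start, SS) hits from the three matrices into candidate TUs."""
    tus = []
    for p1, ss1 in box1_hits:
        for p2, ss2 in box2_hits:
            gap1 = p2 - (p1 + box1_len)       # spacer 1: box 1 end to box 2 start
            if not spacer1[0] <= gap1 <= spacer1[1]:
                continue
            for t, ss3 in term_hits:
                gap2 = t - (p2 + box2_len)    # spacer 2: box 2 end to terminator
                if not spacer2[0] <= gap2 <= spacer2[1]:
                    continue
                css = ss1 + ss2 + ss3         # cumulative sum of scores
                if css >= css_min:
                    tus.append({"box1": p1, "box2": p2,
                                "terminator": t, "css": round(css, 2)})
    return tus


# Invented hit lists on one strand; coordinates and scores are illustrative.
box1 = [(100, 4.5), (400, 2.1)]
box2 = [(121, 5.0)]
term = [(250, 6.0), (900, 9.0)]
tus = assemble_tus(box1, box2, term, box1_len=6, box2_len=6)
```

Only the combination 100/121/250 satisfies both spacer checks, giving a single candidate TU with CSS 15.5.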

sRNAscanner availability and requirements

Project name: sRNAscanner; Home page: http://bicmku.in:8081/sRNAscanner or http://cluster.physics.iisc.ernet.in/sRNAscanner/; Operating system: Linux/Unix platforms; Programming language: C++; Compiler: g++/gcc 4.2 or higher; License: GNU GPL.

Bacterial strain and growth conditions

S. enterica Typhimurium wild type strain SL1344 (JVS-1574, MPIIB culture collection) was used for experimental validation. For early stationary phase (ESP) and late stationary phase (LSP) cultures, 25 ml of Luria-Bertani broth was inoculated with a 1/100 overnight culture and grown at 37°C in a shaking incubator (220 rpm) in a 100 ml flask. Optical density at 600 nm (OD600) was monitored. Two ESP cultures (OD600 = 0.5 [OD-0.5], OD600 = 2.0 [OD-2.0]) and four LSP cultures (3 h [3H], 6 h [6H], 9 h [9H] and overnight [ON] post-OD600 = 2.0) were obtained. Approximately 10^8 ESP (OD600 = 0.5) cells were treated with mitomycin C (0.5 µg/ml) [SOS], acidic LB (pH 5.4) [Acid] or cold shock (15°C) [Cold] for 30 min to induce an SOS response, acid stress or cold shock conditions, respectively. The abbreviations in square brackets denote the eleven growth conditions. Salmonella pathogenicity island 1 (SPI-1) induced cultures [SPI-1] were grown with high salt-containing LB broth (0.3 M NaCl) for 12 hours at 37°C/220 rpm in tightly closed tubes. Salmonella pathogenicity island 2 (SPI-2) induced cultures [SPI-2] were prepared by inoculating 70 ml of SPI-2 medium [32] in 250 ml flasks, with 1/100 inoculums grown in SPI-2 medium overnight, and incubated at 37°C/220 rpm until reaching an OD600 = 0.3. The above cultures were spun down and the cell pellets mixed with stop mixture [95% ethanol (v/v), 5% phenol (v/v)] and immediately frozen in liquid nitrogen.

RNA isolation and Northern blot analysis

Total RNA was prepared from frozen cells using the TRIzol (Invitrogen) method and treated with DNase I (Fermentas) as described previously [32]. Approximately 10 µg of RNA for each growth condition was added to 2× RPA buffer and run on 6% polyacrylamide/7 M urea gels, along with a pUC8 DNA ladder (Fermentas). After separation, RNA was transferred to Hybond-XL nylon membranes (GE Healthcare) and UV cross-linked. Potential sRNA transcripts were detected using γ-ATP end-labeled oligonucleotide probes (Table S3).

5′ RACE mapping of RNA transcripts

5′ RACE experiments were performed as described by Vogel and Wagner [33]. In summary, primary transcripts were treated with tobacco acid pyrophosphatase (TAP), ligated to A4 RNA adapters (500 pmol) at the 5′ ends and reverse transcribed into cDNA with random hexamers (400 ng) using Superscript II Reverse Transcriptase (Invitrogen). Next, the first strand of the cDNA molecule was PCR amplified using an adapter-specific primer (JVO-0367) and matching sRNA-specific primer (Table S3). Amplified 5′ RACE products were cloned into TOPO pCR2.1 and sequenced from both ends with M13 primers.

Results and Discussion

Optimization of sRNAscanner with known E. coli K-12 MG1655 (NC_000913) sRNA data

We analysed the E. coli K-12 MG1655 (NC_000913) genome using pre-defined parameters (see User Guide) and matrices trained with data from ten promoter boxes and twenty one rho-independent terminators [21] of experimentally verified E. coli K-12 sRNA genes. To maximize sensitivity at the expense of specificity, we ran this analysis without application of a CSS cutoff. Predicted intergenic sRNA-specific transcriptional units were compared with the 92 reported E. coli K-12 sRNAs available in sRNAmap [1] and/or Rfam [34]. Physical locations of 66 of the 92 experimentally-validated sRNAs fully or partially overlapped with sRNAscanner-identified putative TUs. However, application of the program without a CSS cut-off led to extremely low specificity with >2,500 putative intergenic TU identified. Subsets of known MG1655 sRNA predicted by sRNAscanner and other computational and experimental methods are shown as a Venn diagram (Figure 2). The mean and standard deviation of the CSS of experimentally verified MG1655 sRNA transcriptional units detected by sRNAscanner were used to define a stringent CSS cut-off value of 14 (mean + standard deviation = 13.87). Nevertheless, the substantial overlap between whisker plots of CSS values for the known sRNAs and the uncharacterized sRNAscanner hits (Figure 3A) and the fact that these two sets remained unresolved even when CSS score distributions were plotted as a histogram (Figure 3B), suggested that many genuine E. coli K-12 intergenic TUs remained to be experimentally defined or that the matrices and/or the sRNAscanner algorithm lacked specificity. Interestingly, the single uncharacterized hit outlier with a CSS = 19.56 has also been predicted by SIPHT (Figure 3A). Lists of sRNAscanner-predicted (CSS>14) known and novel candidate sRNA TUs in MG1655 are as shown (Table S2 and Table S4).
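The cut-off derivation above (mean plus one standard deviation of the CSS values of detected known sRNAs, 13.87, taken as a cut-off of 14) can be sketched as below. The CSS values in the example are invented; the use of the sample rather than population standard deviation, the rounding up to the next integer and the function name `css_cutoff` are assumptions, since the text does not state them.

```python
import math
import statistics


def css_cutoff(css_values):
    """Return (mean + SD, integer cut-off) for a list of CSS values."""
    mean = statistics.mean(css_values)
    sd = statistics.stdev(css_values)   # sample SD; an assumption
    raw = mean + sd
    return raw, math.ceil(raw)          # rounding up is also an assumption


# Invented CSS values for a handful of detected known sRNAs.
toy_css = [11.6, 12.2, 12.9, 13.3, 14.3]
raw, cutoff = css_cutoff(toy_css)
```

For these toy values the raw threshold is a little under 14 and the integer cut-off is 14, mirroring how 13.87 is used as 14 in the text.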
Figure 2

Venn diagram showing the set of known E. coli K-12 MG1655 sRNA genes detected or missed by sRNAscanner.

The program was run using the training set-derived PWMs and parameters described in the text. The pale green ellipse shown in dotted outline highlights the set of 66 known sRNA genes detected when the program was run without a CSS cut-off threshold. The darker green vertical oval indicates the set of 22 known sRNAs and a further 170 potentially novel intergenic sRNA detected using a CSS>14 cut-off. The sets of known E. coli K-12 MG1655 sRNA genes predicted bioinformatically by Wassarman et al. [13], Argaman et al. [12] and Chen et al. [21] are shown in blue-, red- and green-outline ovals, respectively. A further 61 sRNA genes identified through diverse experimental and bioinformatic means are shown in the yellow-outline oval.

Figure 3

Distribution of sRNAscanner cumulative sum of scores (CSS) for known sRNA and uncharacterized hits in E. coli K-12 MG1655.

The program was run using default parameters mentioned in the text. (A) The lower and upper boundaries of the whisker plot boxes represent the 25th and 75th percentiles, respectively. The vertical lines extending from the boxes indicate the full range of the remaining CSS values with the exception of a single outlier, indicated as a cross, for the uncharacterized hits plot. (B) Histogram showing the CSS distributions of the two sets of sRNAscanner hits.

Analysis of sRNAscanner performance characteristics

sRNAscanner was run with the training set derived matrices and pre-defined parameters. Excluding the 10 sRNAs used to inform the PWM1 and PWM2 matrices, sRNAscanner (CSS>14) detected 24% of the known E. coli K-12 sRNA genes [1]. Assessment of the specificity of sRNA prediction tools remains extremely challenging as there are no gold standards and known bacterial sRNAs are likely to represent no more than the tip of a vast ‘RNome’ iceberg. Even experimental validation is problematic as individual sRNA may only be expressed under highly specific conditions and/or at extremely low levels. We have attempted to examine the specificity of sRNAscanner through three bioinformatics approaches. sRNA genes used to inform the training dataset were included in these subsequent analyses. Firstly, we have generated a conventional Receiver Operating Characteristic (ROC) plot [35] based on analysis of the E. coli K-12 genome (Figure 4A). The set of known K-12 sRNAs predicted by sRNAscanner were defined as the ‘True positive’ set and the impact of the full range of CSS cut-off values was assessed. The ROC plot and related normalized frequency distribution graph (Figure 4B) suggested a major sensitivity–specificity sacrifice with there being no classical optimum point; favoring either led to a marked deterioration of the other. However, even by these criteria the sensitivity (Sn) – specificity (Sp) performance of sRNAscanner at CSS>14 (Sn = 32%; Sp = 95%) was comparable to that of sRNAPredict2 (Sn = 20%; Sp = 96%). Secondly, we compared the performance of the pre-computed training-set-derived PWMs with those of randomly generated ‘equivalent’ matrices and used both sets of matrices to analyse the E. coli K-12 genome sequence. 
Equivalent random matrices were generated by randomly shuffling entire columns within each matrix (R1 random matrices) (Figure S2), the numbers within individual columns (R2 random matrices) (Figure S3), and a combination of these two shuffling strategies (R3 random matrices) (Figure S4). This approach preserved the precise SS characteristics for matching genuine and random matrices and allowed the same SS and CSS thresholds to be used. However, only the R1 random matrices represented the same combination of nucleotide preferences, though present in distinct permutations as compared to the original matrices. The training and random PWM sets were used to search the E. coli K-12 genome to identify occurrences of each motif and, through integration of these data, TU-like arrangements. The Occurrence Frequencies (OF) of individual motifs were defined as the number of predictions per nucleotide of the genome. The ratios of OF obtained with the random and rationally-derived original matrices were expected to be inversely proportional to the ratios of matrix specificities [36]. However, with the exception of the comparison between the genuine and R1 versions of PWM2, all three training PWMs had higher OF than matching random matrices when applied to the K-12 genome sequence (Figure 4C). This was most marked for PWM3, with its three random versions exhibiting less than 20% of the hits observed with the training set-derived matrix. These data strongly argued against the random nature of bacterial intergenic DNA and demonstrated the relative abundance of terminator-like motifs in intergenic regions. Hits identified by the random matrices were compared with known sRNA regions to identify the number of known sRNA TUs detected. The stringent requirement for the correctly ordered, orientated and appropriately spaced occurrence of each of the three independently detected transcriptional signals was expected to filter out much of the noise.
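The three shuffling strategies can be sketched as below, with a matrix held as a list of columns of four scores each. The function names, the fixed seed and the toy matrix are assumptions for illustration. Each strategy only rearranges existing entries, so the set of scores in every column, and hence the maximum attainable SS, is unchanged.

```python
import random


def shuffle_r1(columns, rng):
    """R1: permute whole columns; each column's base preferences stay intact."""
    shuffled = [col[:] for col in columns]
    rng.shuffle(shuffled)
    return shuffled


def shuffle_r2(columns, rng):
    """R2: permute the four scores within each column; column order is kept."""
    return [rng.sample(col, len(col)) for col in columns]


def shuffle_r3(columns, rng):
    """R3: apply the column permutation, then the within-column permutation."""
    return shuffle_r2(shuffle_r1(columns, rng), rng)


rng = random.Random(0)  # fixed seed so the example is reproducible
# Invented 4-position matrix; each inner list holds the A, C, G, T scores.
pwm_columns = [
    [1.2, -1.6, -0.5, 0.1],
    [-0.9, 0.8, -1.6, 0.3],
    [0.4, -0.2, 1.1, -1.6],
    [-1.6, 0.6, 0.2, 0.9],
]
r1 = shuffle_r1(pwm_columns, rng)
r2 = shuffle_r2(pwm_columns, rng)
r3 = shuffle_r3(pwm_columns, rng)
```

Only R1 keeps each column's assignment of scores to bases, which matches the remark that R1 alone retains the original combination of nucleotide preferences.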
Indeed, use of the training dataset-derived PWMs resulted in identification of 66 known sRNA TUs (CSS scores [mean, range]: 12.87, 8.65–17.57), while use of the R1 random PWM, the best performing of the random versions, yielded only 14 known sRNA TUs with lower CSS scores (11.42, 9.77–14.09). The R2 and R3 shuffled matrices identified 5 and 9 potential sRNA TUs, respectively. Hence, the training matrices detected more than four times as many known sRNA TUs but only approximately twice as many total ‘TU’ hits as the R1 matrices (Figures 4D and 4E). Nevertheless, as the random matrices yielded up to 68% as many total ‘TU’ hits as the training set-derived PWMs, it would appear that, even with a stringent CSS>14 cut-off, at best only about 40% of positive calls were valid. As a third approach, we hypothesized that the ratio of the numbers of hits obtained with the full complement of concatenated genuine intergenic DNA to those found on randomly shuffled intergenic sequences would provide a qualitative measure of specificity. The concatenated sequence comprising all K-12 intergenic sequences fused end-to-end (VIGS) was subjected to random nucleotide shuffling to generate ten random variants (RIGS-1 – RIGS-10). A length distribution histogram of the ‘sRNA’ hits in the VIGS and RIGS sequences is shown in Figure 4F. Consistent with a moderate level of specificity, the concatenated native intergenic sequence yielded approximately three times as many hits as those identified on the ‘average’ random intergenic sequence (435 vs 152) (Table S5). Use of future additional filters and/or genus-adapted PWMs may lead to incremental increases in specificity, perhaps with minimal loss of sensitivity. For example, TransTermHP-2.07-predicted rho-independent terminators in E. coli K-12 and S. Typhimurium LT2 typically exhibited PWM3 scores of ≥6 as opposed to the PWM3 minimum score criterion of >3, suggesting a possible route to specificity gain.
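The third approach can be sketched as below: the concatenated intergenic sequence is shuffled at the single-nucleotide level, which preserves base composition while destroying motif structure, and hit counts on native and shuffled sequences are then compared. The function name `shuffled_variants`, the seed and the toy sequence are assumptions; the hit counts 435 and 152 are those quoted in the text.

```python
import random
from collections import Counter


def shuffled_variants(sequence, n_variants=10, seed=0):
    """Return nucleotide-shuffled copies of a sequence (same base composition)."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_variants):
        bases = list(sequence)
        rng.shuffle(bases)
        variants.append("".join(bases))
    return variants


# Short invented stand-in for the concatenated intergenic sequence (VIGS).
vigs = "ATGCGTTTAACCGGTTATATCGCG" * 4
rigs = shuffled_variants(vigs)  # analogous to RIGS-1 to RIGS-10

# Ratio of native to average shuffled hit counts reported in the text.
native_to_random = 435 / 152
```

The ratio works out to roughly 2.9, which is the "approximately three times" figure cited above.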
Figure 4

The three approaches used to estimate the specificity of sRNAscanner.

Conventional ROC (A) and normalized frequency distribution (B) plots were generated following analysis of the E. coli K-12 genome. The brown line in (A) denotes the point on the ROC curve which corresponds to CSS = 14. For these analyses, the set of 92 known sRNA were defined as the true positive set. Random matrices-based specificity analysis data are shown in panels (C), (D) and (E). (C) Histogram indicating the occurrence frequencies or predictions per nucleotide of intergenic hits with each of the three training set-derived matrices and the matching R1, R2 and R3 randomly shuffled versions of these matrices. The test genome sequence analysed was that of E. coli K-12 MG1655. (D) Graph showing the numbers of known MG1655 sRNA TU predicted by sRNAscanner within each of five CSS ranges plotted against the mid-point CSS value for the CSS range when the program was run with the training set-derived PWM or each of the three matching sets of random PWM in turn. (E) Bar graph showing the total numbers of hits (known and uncharacterized) predicted by sRNAscanner when the program was run with the training set-derived PWM and each of the matching random PWM. (F) Histogram showing the distribution of candidate ‘sRNA TUs’ predicted by length of sRNA within a composite sequence comprising concatenated intergenic sequences from E. coli K-12 (VIGS) and ten randomly shuffled variants on this sequence (RIGS-1 – RIGS-10).

Head to head comparison of sRNAscanner and sRNAPredict2

A diverse group of bacterial genome sequences representative of Enterobacteriaceae, Vibrionaceae, Pseudomonadaceae, Bacillaceae, Clostridiaceae, Chlamydiaceae and Lactobacillaceae were analyzed using sRNAscanner. Intergenic transcriptional unit data derived from sRNAscanner analyses were compared with previously reported sRNAPredict2 results [20]. Manual curation of these predictions identified partial or complete overlaps with known sRNAs. sRNAscanner (CSS>14) and sRNAPredict2 detected a total of 180 (Sn = 31.3%) and 184 (Sn = 32%) known sRNA genes, respectively, across all 13 bacterial genomes investigated (Table 1). Despite this similar overall sensitivity, the two tools recovered different subsets of the known sRNAs: 0 to 23 known sRNAs per genome, comprising a total of 88 known sRNAs, were predicted uniquely by sRNAscanner, whereas 92 known sRNAs were predicted uniquely by sRNAPredict2. sRNAPredict2 also yielded appreciably more uncharacterized hits than sRNAscanner (2953 vs 2344), suggesting a higher signal-to-noise ratio for sRNAscanner. Similarly, large numbers of novel hits missed by sRNAPredict2 were predicted by sRNAscanner, and vice versa. Indeed, combined use of the two tools may potentially offer a degree of cross-validation. Nevertheless, sRNAscanner as optimized presently appeared to be more appropriate for the analysis of genomes of Enterobacteriaceae and other medium/low G+C organisms. sRNAscanner sensitivity versus known sRNAs ranged from 51% for Clostridium tetani E88 (28.6% G+C) to 24% for Salmonella Typhi CT18 (51.9% G+C) to 0% for Mycobacterium tuberculosis CDC1551 (65.6% G+C). Detailed lists of known and putative sRNA regions predicted by sRNAscanner in the above genomes are provided as supplementary data files (see Table S4 and File S1).
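The sensitivity figures quoted above can be re-derived from the per-genome counts in Table 1. The following is a minimal sketch rather than part of sRNAscanner: the dictionary layout and the function name `sensitivity` are illustrative assumptions, while the counts themselves are transcribed from Table 1.

```python
# Per-genome counts transcribed from Table 1:
# (known sRNAs, known sRNAs found by sRNAscanner, known sRNAs found by sRNAPredict2)
TABLE1 = {
    "Bacillus anthracis Ames": (97, 49, 60),
    "Clostridium tetani E88": (53, 27, 20),
    "Chlamydia trachomatis D-UW-3-Cx": (3, 1, 0),
    "Helicobacter pylori 26695": (4, 2, 0),
    "Mycobacterium tuberculosis CDC1551": (15, 0, 0),
    "Pseudomonas aeruginosa PAO1": (26, 3, 4),
    "Salmonella Typhi CT18": (63, 15, 27),
    "Staphylococcus aureus N315": (32, 17, 24),
    "Streptococcus pneumoniae TIGR4": (25, 9, 3),
    "Streptococcus pyogenes M1 GAS": (16, 4, 6),
    "Yersinia pestis KIM": (42, 7, 17),
    "Salmonella Typhimurium LT2": (106, 24, 4),
    "Escherichia coli K-12 MG1655": (92, 22, 19),
}


def sensitivity(detected: int, known: int) -> float:
    """Percentage of known sRNAs that a tool recovered (Sn)."""
    return 100.0 * detected / known


# Pooled totals across all 13 genomes.
total_known = sum(known for known, _, _ in TABLE1.values())
scanner_total = sum(scanner for _, scanner, _ in TABLE1.values())
predict_total = sum(predict for _, _, predict in TABLE1.values())

print(f"sRNAscanner:  {scanner_total}/{total_known} "
      f"(Sn = {sensitivity(scanner_total, total_known):.1f}%)")
print(f"sRNAPredict2: {predict_total}/{total_known} "
      f"(Sn = {sensitivity(predict_total, total_known):.1f}%)")

# Per-genome sRNAscanner sensitivity, highest first.
for name, (known, scanner, _) in sorted(
    TABLE1.items(), key=lambda item: -sensitivity(item[1][1], item[1][0])
):
    print(f"{name}: {sensitivity(scanner, known):.0f}%")
```

Pooling the counts before dividing gives the overall Sn values, while the per-genome loop shows the spread between low and high G+C genomes that a pooled figure hides.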
Table 1

Comparison of sRNA gene predictions obtained using sRNAscanner and sRNAPredict2.

Bacterial strain | GenBank Acc. No | %G+C | No. of known sRNA genes (a) | sRNAscanner (b): Known | sRNAscanner (b): Novel | sRNAPredict2 (b): Known | sRNAPredict2 (b): Novel | sRNAscanner AND sRNAPredict2 (b): Known | sRNAscanner AND sRNAPredict2 (b): Novel | Unique to sRNAscanner (b): Known | Unique to sRNAPredict2 (b): Known
Bacillus anthracis Ames* | AE016879 | 35.4 | 97 | 49 | 535 | 60 | 869 | 34 | 97 | 15 | 26
Clostridium tetani E88* | AE015927 | 28.6 | 53 | 27 | 285 | 20 | 132 | 17 | 27 | 10 | 3
Chlamydia trachomatis D-UW-3-Cx* | AE001273 | 41.3 | 3 | 1 | 27 | 0 | 43 | 0 | 6 | 1 | 0
Helicobacter pylori 26695* | AE000511 | 38.9 | 4 | 2 | 107 | 0 | 50 | 0 | 4 | 2 | 0
Mycobacterium tuberculosis CDC1551* | AE000516 | 65.6 | 15 | 0 | 1 | 0 | 50 | 0 | 0 | 0 | 0
Pseudomonas aeruginosa PAO1* | AE004091 | 66.6 | 26 | 3 | 17 | 4 | 34 | 1 | 0 | 2 | 3
Salmonella enterica serovar Typhi CT18* | AL513382 | 51.9 | 63 | 15 | 175 | 27 | 572 | 11 | 31 | 4 | 16
Staphylococcus aureus N315* | BA000018 | 32.8 | 32 | 17 | 253 | 24 | 144 | 12 | 30 | 5 | 12
Streptococcus pneumoniae TIGR4* | AE005672 | 39.7 | 25 | 9 | 190 | 3 | 62 | 3 | 16 | 6 | 0
Streptococcus pyogenes M1 GAS* | AE004092 | 38.5 | 16 | 4 | 162 | 6 | 56 | 3 | 8 | 1 | 3
Yersinia pestis KIM* | AE009952 | 47.7 | 42 | 7 | 287 | 17 | 755 | 7 | 46 | 0 | 10
Salmonella typhimurium LT2$ | AE006468 | 52.2 | 106 | 24 | 135 | 4 | 65 | 1 | 0 | 23 | 3
Escherichia coli K12-MG1655$ | U00096 | 50.8 | 92 | 22 | 170 | 19 | 121 | 3 | 7 | 19 | 16

(a) Complete lists of non-coding sRNA (including cis-regulatory & leader RNA) for the selected genomes were obtained from the Rfam database [34] which excludes tRNA and rRNA. Additional known sRNAs collated from the sRNAMap database [1] were also included in the lists.

(b) Number of known and novel sRNA genes predicted using the following strategies: (1) sRNAscanner, (2) sRNAPredict2, (3) predicted by sRNAscanner AND sRNAPredict2, (4) predicted by sRNAscanner and NOT by sRNAPredict2, (5) predicted by sRNAPredict2 and NOT by sRNAscanner. The sRNAscanner predictions were performed using the selected CSS cut-off (CSS>14).

(*) The sRNAPredict2 data shown for 11 genomes were reproduced from Livny et al., 2006 [20].

($) K-12 and LT2 were newly analysed in this study using the latest version of sRNAPredict2 with the default parameters and blast partners described by Livny et al., 2006 [20].

Identification of novel sRNAs in Salmonella enterica Typhimurium SL1344

Analysis of the S. Typhimurium LT2 genome using sRNAscanner under default conditions yielded a total of 38 known and 118 novel candidate sRNAs (Figure 5, Table S4). The genomic locations of the 118 novel sRNA candidates were compared with putative intergenic transcripts detected in deep sequencing libraries derived from Hfq-co-immunoprecipitated RNA obtained from S. Typhimurium SL1344 grown under multiple conditions [32], [37], [38] [unpublished data, J. Vogel]. S. Typhimurium SL1344 was used for all subsequent experimental validation as no comparable RNA deep sequencing dataset was available for S. Typhimurium LT2. Sixteen novel sRNA candidates were detected by both sRNAscanner and deep sequencing analysis (Table 2).
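The comparison of predicted sRNA loci with deep sequencing transcripts described above amounts to an interval-overlap test. The sketch below is one simple way to express it and is not the authors' procedure: the function names and the deep sequencing intervals are hypothetical, and only the two predicted coordinate pairs (sRNA1 and sRNA6) are taken from Table 2.

```python
def normalize(start: int, end: int) -> tuple[int, int]:
    """Return (low, high) so reverse-strand loci (start > end) compare correctly."""
    return (start, end) if start <= end else (end, start)


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True when two closed genomic intervals share at least one nucleotide."""
    a_low, a_high = normalize(*a)
    b_low, b_high = normalize(*b)
    return a_low <= b_high and b_low <= a_high


# Predicted coordinates (start, end) from Table 2; sRNA6 lies on the reverse strand.
predicted = {
    "sRNA1": (257730, 257795),
    "sRNA6": (3757015, 3756884),
}

# Hypothetical deep sequencing transcript intervals, for illustration only.
deep_seq_transcripts = [(257700, 257800), (1000000, 1000100)]

supported = sorted(
    name
    for name, locus in predicted.items()
    if any(overlaps(locus, transcript) for transcript in deep_seq_transcripts)
)
print(supported)
```

Normalizing the coordinates first matters because Table 2 lists reverse-strand candidates with a start coordinate larger than the end coordinate.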
Figure 5

Venn diagram showing the numbers of known sRNAs in Salmonella Typhimurium LT2 that have been identified or reported by Pfeiffer et al. [22], Papenfort et al. [39] and Rfam [34], Padalon-Brauch et al. [29] and Sittka et al. [32], [38].

The circles shown in red dotted outline and green solid outline, excluding the central pale green curve-sided triangular area, indicate the numbers of known sRNAs predicted by sRNAscanner without and with the use of a CSS cut-off (CSS>14), respectively. The central pale green curve-sided triangular area, including the innermost circle outlined in purple, represents the 118 novel, intergenic, non-overlapping candidate sRNAs predicted in this study; the innermost circle outlined in purple represents the 16-member subset comprising sRNA candidates found to have likely mRNA transcripts by comparison with RNA deep sequencing datasets [32], [38]. The $ superscript symbol indicates the five candidates belonging to both the Pfeiffer et al. [22] and Sittka et al. [32], [38] sets; the asterisk symbol denotes the one sRNA candidate mapping to the Padalon-Brauch et al. [29], Papenfort et al. [39] and Sittka et al. [32], [38] sets.

Table 2

Thirty three novel candidate sRNAs predicted by sRNAscanner AND RNA deep sequencing data or TargetRNA identification of putative cognate targets.

sRNA Id (a) | Start (b) | End (c) | Length (c) | Flanking gene id (d) | 5′RACE mapping (e) | Strand (g) | Target mRNA (h) | mRNA function (i) | Northern (f) | Reference (l)
sRNA1 | 257730 | 257795 | ∼66 | STM0219/STM0220 | 257730 | >>> | NSD | | Yes |
sRNA2 | 2313304 | 2313591 | ∼289 | STM2213/STM2214 | NT (m) | >>> | NSD | | No |
sRNA3 | 2808084 | 2808210 | ∼127 | STM2665/STM2667 | 2808135 | >>> | STM2284 | glpA: sn-glycerol-3-phosphate dehydrogenase | Yes | [R14]
sRNA4 | 3018904 | 3019048 | ∼145 | STM2875/STM2876 | NT | >>> | NSD | | No |
sRNA5 | 4597115 | 4597181 | ∼71 | STM4351/STM4355.S | NT | <>< | STM1875 | yobA: putative copper resistance protein | No | [R13]
sRNA6 | 3757015 | 3756884 | ∼132 | STM3587/STM3588 | 3757010 | <<< | NSD | | Yes |
sRNA7 | 3275292 | 3275116 | ∼177 | STM3114/STM3115 | Not mapped | ><< | STM0687 | ybfM: putative outer membrane protein | No | [R19], [R20]
sRNA8 | 3240558 | 3240489 | ∼70 | STM3078/STM3079.S | 3240515 | <<< | NSD | | Yes |
sRNA9 | 757026 | 756967 | ∼60 | STM0693/STM0694 | NT | <<< | NSD | | No |
sRNA10 | 679927 | 679828 | ∼100 | STM0616/STM0617 | 679922 | <<< | NSD | | Yes |
sRNA11 | 139455 | 139727 | 273 | STM0118/STM0119 | NT | >>> | STM3954 | yigG: putative inner membrane protein | No | [R15]
sRNA12 | 3733803 | 3733723 | ∼81 | STM3564/STM3565 | 3733765 | <<> | NSD | | Yes |
sRNA13 | 1359947 | 1360181 | ∼235 | STM1283/STM1284 | NT | <>< | NSD | | No |
sRNA14 | 1415459 | 1415501 | ∼43 | STM1337/STM1338 | NT | >>> | NSD | | No |
sRNA15 | 1691673 | 1691952 | ∼280 | STM1601/STM1602 | NT | <>> | NSD | | No |
sRNA16 | 1334570 | 1334697 | ∼128 | STM1249/STM1250 | NT | <<> | STM0225 | hlpA: periplasmic chaperone | No |
sRNA17 | 2905005 | 2905353 | ∼348 | STM2762/STM2763 | NT | <>> | STM0938 | ybjE: putative inner membrane protein | isrM (Northern) | [28]
sRNA18 | 691922 | 691979 | ∼57 | STM0627/STM0628 | NT | <>> | STM1403 | sscB: secretion system chaperone$ | NT | [41]
sRNA19 | 2633992 | 2634070 | ∼78 | STM2513/STM2514 | NT | <>< | STM1426 | ribE: riboflavin synthase alpha chain | NT | [R11]
sRNA20 | 4072486 | 4072617 | ∼131 | STM3862/STM3863 | NT | <>< | STM2154 | mrp: putative ATP-binding protein | STnc410 (Predicted) | [23], [R12]
sRNA21 | 4561999 | 4562304 | ∼305 | STM4316/STM4317 | NT | <>> | STM4316 | STM4316: putative cytoplasmic protein | NT |
sRNA22 | 3528698 | 3528642 | ∼56 | STM3360/STM3361 | NT | ><> | STM3773 | STM3773: putative transcriptional regulator | NT |
sRNA23 | 3474485 | 3474389 | ∼96 | STM3305/STM3306 | NT | ><> | STM0244 | rcsF: colanic acid synthesis regulator | NT | [43], [R14]
sRNA24 | 2116695 | 2116622 | ∼74 | STM2037/STM2038 | NT | <<> | STM4370 | yifI: putative cytoplasmic protein | NT |
sRNA25 | 1627809 | 1627537 | ∼272 | STM1551/STM1552 | NT | <<> | STM3766 | STM3766: putative cytoplasmic protein | NT |
sRNA26 | 75471 | 75555 | ∼84 | STM0064/STM0066 | NT | >>> | STM1379 | orf48: putative amino acid permease | NT |
sRNA27 | 2077177 | 2077243 | ∼66 | STM1994/STM1995 | NT | <>> | STM4206 | STM4206: putative phage glucose translocase | rseX (Northern) | [38], [39], [R16]
sRNA28 | 230161 | 230370 | ∼209 | STM0194/STM0195 | NT | >>> | STM0176 | stiB: putative fimbrial chaperone | NT | [44]
sRNA29 | 4315449 | 4315163 | ∼286 | STM4102/STM4103 | NT | <<< | STM0335 | STM0335: putative outer membrane protein | NT |
sRNA30 | 3598250 | 3597931 | ∼319 | STM3445/STM3444 | NT | <<< | STM3138 | mcpA: putative methyl-accepting chemotaxis protein | NT | [R17]
sRNA31 | 3555129 | 3554959 | ∼170 | STM3384/STM3383 | NT | ><> | STM4162 | thiF: thiamine-biosynthetic protein | NT | [R18]
sRNA32 | 611107 | 610950 | ∼157 | STM0550/STM0549 | NT | <<< | STM3630 | dppA: dipeptide transport protein | NT | [R21], [R22]
sRNA33 | 3528835 | 3528644 | ∼191 | STM3361/STM3360 | NT | ><> | STM1417 | ssaP: type III secretion system apparatus protein$ | NT | [42]

(a) The sixteen sRNA candidates (sRNA1 – sRNA16 [shown in bold]) predicted by sRNAscanner identified in deep sequencing RNA libraries [37], [38] were chosen for experimental validation by Northern and 5′RACE analyses; five of these sixteen deep sequencing-supported hits, shown underlined, were also identified by TargetRNA. The remaining 17 sRNA candidates listed were associated with TargetRNA-identified putative mRNA targets.

(b, c) sRNAscanner-predicted transcript coordinates and length (nt).

(d) Genes flanking candidate sRNA loci obtained from KEGG genome maps.

(e) 5′ ends of the primary transcripts identified using 5′RACE experiments.

(f) Stable transcripts identified by Northern analysis in this or other recent studies.

(g) The middle arrowhead represents the orientation of the sRNA gene; left and right arrowheads indicate orientations of flanking genes.

(h) Potential primary mRNA target identified using the TargetRNA tool [41].

(i) GenBank functional annotations of the putative target mRNAs.

(l) References relevant to the predicted target genes and/or the recently independently identified/predicted sRNAs; full details of references indicated with ‘R’ are provided in Supporting Information (File S2).

(m) NT denotes not tested.


Northern and 5′ RACE based verification of novel sRNAs predicted by both sRNAscanner and deep sequencing

Northern blot experiments using oligonucleotide probes targeting the 16 novel sRNA candidates mentioned above were performed (Table S3). RNA samples were harvested from cells grown and/or subjected to eleven different growth conditions. Six of the candidates (sRNA1, sRNA3, sRNA6, sRNA8, sRNA10 and sRNA12) yielded distinct Northern-detectable transcripts of broadly similar sizes to the sRNAscanner-predicted entities (Figure 6). The additional non-specific bands seen with sRNA3-, sRNA6- and sRNA8-specific probes may comprise degraded and/or processed forms of the matching sRNAs or overlapping mRNA transcripts. Given the above assumption, sRNA1 and sRNA12 were expressed under all growth conditions tested; sRNA8 and sRNA10 were detected in late stationary phase samples only, whilst sRNA3 appeared to be induced specifically under cold shock conditions. The sRNAscanner-predicted sRNA6 overlapped with a previously proposed processed 5′UTR fragment of the yhiI transcript [38] that was likely to match the transcript we detected under ESP-2.0 conditions. However, in this study the sRNA6 locus was also found to express a distinct ∼70 nt transcript found under LSP and SPI-1/SPI-2 inducing conditions only.
Figure 6

Total RNA was isolated from Salmonella Typhimurium SL1344 grown under eleven different conditions and subjected to Northern blotting using candidate sRNA-specific oligonucleotide probes.

Details of growth conditions examined are outlined in the Materials and Methods section. The curved arrows indicate the six putative Northern-detected transcripts mapping to loci predicted by sRNAscanner. Additional bands seen for sRNA3, sRNA6 and sRNA8 are believed to represent degradation and/or processed forms of cognate sRNAs or overlapping mRNA transcripts. The to-scale schematics shown below each gel image indicate sRNAscanner-predicted TUs (red/black/blue), deep sequencing identified transcripts (orange line) and 5′RACE-defined transcript start-sites (vertical black arrow). The yellow boxes indicate the probes used to detect transcripts by Northern blot experiments. Red boxes represent putative promoter sequences; blue boxes indicate putative terminator sequences.

The 5′ends of six candidate sRNA transcripts corresponding to the same Northern-supported candidates were successfully mapped by 5′RACE analysis. The 5′ RNA termini identified for sRNA1, sRNA6 and sRNA10 were coherent with computationally predicted transcriptional start sites but start-sites of the remaining three candidates varied significantly from those predicted by sRNAscanner (Table 2). The extents of overlap between sRNA predicted entities, deep sequencing identified sequences and 5′RACE mapped start-sites are shown schematically in Figure 6; Northern-detected transcripts were excluded as their precise locations could not be conclusively inferred on the basis of available data.
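The agreement between predicted and 5′RACE-mapped start sites can be checked directly from the coordinates in Table 2. The snippet below is an illustrative sketch only: the 10 nt tolerance and the variable names are assumptions introduced here, not values used by the authors, while the coordinate pairs are taken from Table 2.

```python
# (sRNAscanner-predicted start, 5'RACE-mapped 5' end) from Table 2.
# For reverse-strand candidates the listed start is already the 5' end.
START_SITES = {
    "sRNA1": (257730, 257730),
    "sRNA3": (2808084, 2808135),
    "sRNA6": (3757015, 3757010),
    "sRNA8": (3240558, 3240515),
    "sRNA10": (679927, 679922),
    "sRNA12": (3733803, 3733765),
}

# Hypothetical tolerance (nt) for calling a prediction concordant; an assumption.
TOLERANCE_NT = 10

# Absolute distance between the predicted and the experimentally mapped 5' end.
offsets = {
    name: abs(predicted - mapped)
    for name, (predicted, mapped) in START_SITES.items()
}

concordant = [name for name, offset in offsets.items() if offset <= TOLERANCE_NT]
discordant = [name for name, offset in offsets.items() if offset > TOLERANCE_NT]

for name, offset in offsets.items():
    print(f"{name}: {offset} nt from the predicted start")
print("Concordant:", concordant)
print("Discordant:", discordant)
```

Under this assumed tolerance the split reproduces the grouping described in the text: sRNA1, sRNA6 and sRNA10 fall within a few nucleotides of the prediction, whereas sRNA3, sRNA8 and sRNA12 differ by tens of nucleotides.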

Potential biological significance of sRNAscanner predictions for Salmonella Typhimurium

Recent discoveries of three sRNAscanner identified hits that had originally been classified as novel provide further biological validation of this algorithm; sRNA17, sRNA20 and sRNA27 are now known as isrM [29], STnc410 [22] and rseX [39], [40], respectively (Table 2). As many functionally characterized sRNAs are antisense regulators of cognate mRNA targets [41], we hypothesized that the presence of a matching TargetRNA hit may allow for more reliable identification of genuine sRNAs. However, we emphasize that bioinformatically-derived predictions of sRNA–mRNA interactions remain fraught with problems. Consequently, pending experimental validation by gel-shift assays or other methodologies, TargetRNA data need to be treated as truly putative. We identified 22 sRNAscanner hits with TargetRNA-identified potential mRNA targets (Figure S5); five had also been detected in the deep sequencing dataset (Table 2). Several TargetRNA-identified genes play roles in pathogenesis. sRNA18 putatively targets STM1403, which codes for SscB, a type III secretion system (T3SS) chaperone encoded by Salmonella pathogenicity island 2 (SPI-2). SscB is needed for normal secretion and function of the SseF T3SS effector, which in turn is required for Salmonella-induced epithelial cell filamentation and bacterial proliferation in macrophages [42]. sRNA33 is predicted to regulate ssaP, which is postulated to code for part of the SPI-2 T3SS translocon apparatus itself [43]. sRNA23 is predicted to regulate RcsF, which has been proposed as one of two proximal membrane-located sensors for the Rcs phosphorelay signal transduction system that coordinately regulates expression of SPI-1/SPI-2, flagellar, fimbrial and capsule-related colanic acid synthesis genes [44]. sRNA28 is hypothesized to target stiB, a fimbrial chaperone gene, potentially allowing for sRNA28-based fine-tuning of Sti fimbriae expression [45]. sRNAs have also been shown to regulate S. Typhimurium outer membrane protein (OMP) profiles in response to envelope stress [46] or nutrient availability [39]. Similarly, sRNA29 and sRNA7 are predicted to interact with OMP-encoding genes (Table 2). Clearly, data supported solely by sRNAscanner and TargetRNA bioinformatics predictions remain speculative and robust experimentation would be required to validate these prior to drawing firm conclusions.

Conclusions

We have developed and implemented a simple PWM-based strategy for the discovery of intergenic sRNA genes. Despite use of a small, single species-derived training set, we have demonstrated the major utility of sRNAscanner to predict large numbers of potential sRNA genes in diverse bacterial species. Undoubtedly, it is vital to further experimentally validate the predictive accuracy of sRNAscanner and other sRNA prediction programmes using Northern blot analysis, ultra-high-density cDNA sequencing [37], [38] and other emerging tools. Nevertheless, caution is advisable in interpretation of results as each experimental method has its own strengths and weaknesses. Furthermore, transcriptional signals would be expected to vary considerably between phylogenetically distant organisms. Consistent with this idea, we found that the E. coli-derived PWMs used in this study performed well with medium and low GC genomes but not with high GC genomes. Consequently, we propose that an organism-targeted approach is likely to lead to significantly enhanced performance characteristics. Importantly, the tool developed and the strategy proposed would allow users to generate individualized PWMs based on species-, genus- or family-derived training sets to better identify sRNA genes in selected bacterial organisms. In addition, a reiterative process of PWM optimization and selection of rationally informed cut-offs based on newly discovered and validated sRNAs may allow for progressively higher levels of specificity without excessive loss of sensitivity. Finally, we propose that PWM-based scanning strategies may in time prove to be a powerful way of revealing other cryptic codes not only in DNA but in protein molecules as well.

Supporting Information

Details of sRNAscanner training dataset. (0.03 MB PDF)
List of known E. coli K-12 MG1655 sRNA TUs identified by sRNAscanner. (0.08 MB PDF)
Oligonucleotides used in this study. (0.02 MB PDF)
Details of known and novel sRNA regions predicted by sRNAscanner in 13 bacterial genomes. (0.02 MB PDF)
Analysis of Virtual Intergenic Genome Sequences (VIGS) and Random Intergenic Genome Sequences (RIGS) derived from the E. coli K-12 genome using sRNAscanner and Glimmer. (0.03 MB PDF)
Training set-derived PWM1 - PWM3 matrices. (0.04 MB PDF)
R1 versions of random matrices. (0.03 MB PDF)
R2 versions of random matrices. (0.03 MB PDF)
R3 versions of random matrices. (0.03 MB PDF)
TargetRNA-identified putative sRNA-mRNA interactions. (0.07 MB PDF)
Details of known and novel sRNAs predicted by sRNAscanner in the 13 genomes analysed. (0.47 MB XLS)
Supplementary References. (0.03 MB PDF)