Xing Xu, Yongmei Ji, Gary D Stormo.
Abstract
An increasing number of cis-regulatory RNA elements have been found to regulate gene expression post-transcriptionally in various biological processes in bacterial systems. Effective computational tools for large-scale identification of novel regulatory RNAs are strongly desired to facilitate our exploration of gene regulation mechanisms and regulatory networks. We present a new computational program named RSSVM (RNA Sampler+Support Vector Machine), which employs Support Vector Machines (SVMs) for efficient identification of functional RNA motifs from random RNA secondary structures. RSSVM uses a set of distinctive features to represent the common RNA secondary structure and structural alignment predicted by RNA Sampler, a tool for accurate common RNA secondary structure prediction, and is trained with functional RNAs from a variety of bacterial RNA motif/gene families covering a wide range of sequence identities. When tested on a large number of known and random RNA motifs, RSSVM shows a significantly higher sensitivity than other leading RNA identification programs while maintaining the same false positive rate. RSSVM performs particularly well on sets with low sequence identities. The combination of RNA Sampler and RSSVM provides a new, fast, and efficient pipeline for large-scale discovery of regulatory RNA motifs. We applied RSSVM to multiple Shewanella genomes and identified putative regulatory RNA motifs in the 5' untranslated regions (UTRs) in S. oneidensis, an important bacterial organism with extraordinary respiratory and metal reducing abilities and great potential for bioremediation and alternative energy generation. From 1002 sets of 5'-UTRs of orthologous operons, we identified 166 putative regulatory RNA motifs, including 17 of the 19 known RNA motifs from Rfam, an additional 21 RNA motifs that are supported by literature evidence, 72 RNA motifs overlapping predicted transcription terminators or attenuators, and other candidate regulatory RNA motifs. 
Our study provides a list of promising novel regulatory RNA motifs potentially involved in post-transcriptional gene regulation. Combined with the previous cis-regulatory DNA motif study in S. oneidensis, this genome-wide discovery of cis-regulatory RNA motifs may offer more comprehensive views of gene regulation at a different level in this organism. The RSSVM software, predictions, and analysis results on Shewanella genomes are available at http://ural.wustl.edu/resources.html#RSSVM.
Year: 2009 PMID: 19343219 PMCID: PMC2659441 DOI: 10.1371/journal.pcbi.1000338
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. The Receiver Operating Characteristic (ROC) curves of RSSVM, RNAz and Dynalign+LIBSVM on all test sets and on test sets with identities lower than 70%.
“▵” and “○” mark the results at P-value cutoff of 0.90 and 0.50, respectively. Detailed data for this figure are provided in Dataset S1.
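As a minimal sketch of how a single point on such a ROC curve can be obtained, the snippet below computes the false positive rate and sensitivity at the two marked P-value cutoffs. The scores and labels are hypothetical toy values, not data from Dataset S1, and calling a set positive when its score is at or above the cutoff is an assumption.

```python
def roc_point(scores, labels, cutoff):
    """Return (false positive rate, sensitivity) when score >= cutoff is called positive."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= cutoff)
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives


# Hypothetical classifier P-values: True = known RNA motif, False = shuffled control.
scores = [0.99, 0.95, 0.80, 0.60, 0.40, 0.92, 0.55, 0.30, 0.10, 0.05]
labels = [True, True, True, True, True, False, False, False, False, False]

# One ROC point per cutoff marked in the figure (0.90 and 0.50).
roc = [roc_point(scores, labels, cutoff) for cutoff in (0.90, 0.50)]
for cutoff, (fpr, sens) in zip((0.90, 0.50), roc):
    print(f"cutoff={cutoff:.2f}  FPR={fpr:.2f}  sensitivity={sens:.2f}")
```

Sweeping the cutoff over all observed scores instead of two fixed values would trace the full curve.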
Figure 2. The Correlation Coefficients of RNA classification (CC) by RSSVM, RNAz, Dynalign+LIBSVM and QRNA on test sets with different sequence identities (detailed values are in Table S1).
(A) At the overall FPR of 0.05. (B) At the more stringent overall FPR of 0.01 or 0.02. The lowest possible FPR that Dynalign+LIBSVM can achieve is 0.02.
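The caption does not define CC. Assuming it denotes the Matthews correlation coefficient commonly used for binary classifiers, a minimal sketch of the calculation at a fixed FPR looks like this. The confusion-matrix counts are hypothetical and chosen only so that the FPR equals 0.05.

```python
import math


def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally 0 when any marginal total is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def false_positive_rate(fp, tn):
    """Fraction of true negatives that were called positive."""
    return fp / (fp + tn)


# Hypothetical counts: 100 real motif sets and 100 shuffled controls.
tp, fn, fp, tn = 80, 20, 5, 95
fpr = false_positive_rate(fp, tn)
cc = matthews_cc(tp, fp, tn, fn)
print(f"FPR={fpr:.2f}  CC={cc:.3f}")
```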
Figure 3. The Correlation Coefficients of predicted structures (CC) by RNA Sampler and RNAalifold, the corresponding core algorithms used by RSSVM and RNAz, respectively, for predicting common RNA structures, on test sets with different sequence identities.
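For structure prediction, a correlation coefficient is often approximated as the geometric mean of base-pair sensitivity and positive predictive value. The sketch below assumes that definition, which the caption does not state, and compares two hypothetical dot-bracket structures.

```python
import math


def pairs_from_dot_bracket(structure):
    """Convert a dot-bracket string into a set of (i, j) base-pair indices."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.add((stack.pop(), i))
    return pairs


def structure_cc(predicted, reference):
    """Approximate CC as sqrt(sensitivity * PPV) over base pairs."""
    pred = pairs_from_dot_bracket(predicted)
    ref = pairs_from_dot_bracket(reference)
    if not pred or not ref:
        return 0.0
    true_pos = len(pred & ref)
    sensitivity = true_pos / len(ref)
    ppv = true_pos / len(pred)
    return math.sqrt(sensitivity * ppv)


# Hypothetical example: the prediction misses one of four reference base pairs.
reference = "((((...))))"
predicted = "(((.....)))"
cc_structure = structure_cc(predicted, reference)
print(f"structure CC = {cc_structure:.3f}")
```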
Table 1. Numbers of predicted regulatory RNAs with supporting evidence by RSSVM, RNAz and QRNA in the 1002 orthologous 5′-UTRs of five Shewanella species.
| | RSSVM (FPR = 0.01) | RNAz (FPR = 0.01) | QRNA |
| Total number of predicted regulatory RNAs | 166 | 109 | 112 |
| False positives on shuffled sequences | 0 | 0 | 13 |
| Matching known RNA motifs in Rfam | 17 | 16 | 11 |
| Overlapping with predicted transcription terminators or attenuators | 72 | 49 | 40 |
| Overlapping with predicted transcription terminators | 62 | 42 | 31 |
| Overlapping with predicted transcription attenuators | 56 | 37 | 32 |
| With literature support | 21 | 11 | 7 |
We searched all the orthologous UTRs with Infernal using all bacterial RNA motif models from Rfam; 19 known RNA motifs gave Infernal scores higher than 10 bits and occurred in at least two orthologous sequences of a UTR set. Six of the 19 RNA motifs have orthologous sequences from S. oneidensis and E. coli in the Rfam seed alignments.
Putative transcription terminators predicted by Rnall [43].
Putative transcription attenuators predicted by a previous comparative genomics study [44].
Numbers in the parentheses are the total numbers of known RNA motifs or predicted transcription terminators/attenuators in the 1002 Shewanella 5′-UTR sequence sets.
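The counts in the table above support two quick derived figures: the fraction of the 19 known Rfam motifs each program recovers, and the number of predictions overlapping both a terminator and an attenuator. The second figure assumes that the "terminators or attenuators" row is the union of the two rows below it, so inclusion-exclusion applies; the dictionary keys are illustrative names.

```python
KNOWN_RFAM_MOTIFS = 19  # known motifs found by the Infernal search

# Counts copied from the table above.
table1 = {
    "RSSVM": {"rfam": 17, "term_or_att": 72, "term": 62, "att": 56},
    "RNAz": {"rfam": 16, "term_or_att": 49, "term": 42, "att": 37},
    "QRNA": {"rfam": 11, "term_or_att": 40, "term": 31, "att": 32},
}

summary = {}
for program, row in table1.items():
    summary[program] = {
        # Fraction of known Rfam motifs recovered.
        "rfam_recall": round(row["rfam"] / KNOWN_RFAM_MOTIFS, 3),
        # |T and A| = |T| + |A| - |T or A| (inclusion-exclusion, assumed union).
        "both": row["term"] + row["att"] - row["term_or_att"],
    }

for program, values in summary.items():
    print(program, values)
```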
Table 2. Predicted regulatory RNAs that match the known cis-regulatory RNA elements or genes in the Rfam database.
| Rank | GI | RSSVM | RNAz | QRNA | Gene Name | Gene Product | Rfam ID | Matching RNA Family in Rfam | Family Type |
| 1 | | | | + | trpE | anthranilate synthase component I | RF00513 | Trp_leader | RNA element |
| 3 | | | | + | SO1202 | conserved hypothetical protein | RF00005 | tRNA | tRNA |
| 4 | | | | + | SO4727 | conserved hypothetical protein | RF00558 | L20_leader | RNA element |
| 6 | | | | | ppiD | peptidyl-prolyl | RF00506 | Thr_leader | RNA element |
| 7 | | | | + | thrA | aspartokinase I/homoserine dehydrogenase, threonine-sensitive | RF00506 | Thr_leader | RNA element |
| 8 | | | | + | hisG | ATP phosphoribosyltransferase | RF00514 | His_leader | RNA element |
| 16 | | | | | rpsB | ribosomal protein S2 | RF00127 | t44 RNA | RNA gene |
| 34 | | | | + | SO1071 | conserved hypothetical protein | RF00080 | yybP-ykoY | Riboswitch |
| 39 | | | | + | pheA | chorismate mutase/prephenate dehydratase | RF00513 | Trp_leader | RNA element |
| 64 | | | | + | SO1007 | conserved hypothetical protein | RF00168 | Lysine | Riboswitch |
| 73 | | | 0.240 | + | Rne | ribonuclease E | RF00370 | sroD RNA | RNA gene |
| 93 | | | | | SO0547 | conserved hypothetical protein | RF00522 | PreQ1 | Riboswitch |
| 100 | | | | | SO2715 | TonB-dependent receptor | RF00059 | TPP | Riboswitch |
| 117 | | | | + | lysC | aspartokinase III, lysine-sensitive | RF00168 | Lysine | Riboswitch |
| 120 | | | 0.420 | | thiC | thiamin biosynthesis protein ThiC | RF00059 | TPP | Riboswitch |
| 125 | | | | | nadB | L-aspartate oxidase | RF00522 | PreQ1 | Riboswitch |
| 133 | | | | | SO0774 | 5-formyltetrahydrofolate cyclo-ligase family protein | RF00013 | 6S RNA | RNA gene |
| 195 | | 0.903 | 0.014 | | rpsO | ribosomal protein S15 | RF00114 | S15 leader | RNA element |
| 302 | | 0.661 | | + | SO0815 | TonB-dependent receptor C-terminal domain protein | RF00174 | Cobalamin | Riboswitch |
The rank is based on the P-value of RSSVM.
Bold fonts represent predictions above the P-value cutoff for RSSVM (0.95) or RNAz (0.50).
“+” represents QRNA predictions that fit the “RNA” model in at least two pairwise alignments.
The shuffled sequences were identified as “RNA” by QRNA.
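The footnotes above amount to three decision rules, sketched below. Treating "above the cutoff" as a strict inequality is an assumption, as is the QRNA alignment count used in the example; the RSSVM and RNAz values are those of the rank 195 row (rpsO).

```python
RSSVM_CUTOFF = 0.95          # P-value cutoff for RSSVM (from the footnote)
RNAZ_CUTOFF = 0.50           # P-value cutoff for RNAz (from the footnote)
QRNA_MIN_RNA_ALIGNMENTS = 2  # pairwise alignments that must fit the "RNA" model


def program_calls(rssvm_p, rnaz_p, qrna_rna_alignments):
    """Return which programs call a UTR set a regulatory RNA."""
    return {
        "RSSVM": rssvm_p > RSSVM_CUTOFF,  # strict ">" is an assumption
        "RNAz": rnaz_p > RNAZ_CUTOFF,
        "QRNA": qrna_rna_alignments >= QRNA_MIN_RNA_ALIGNMENTS,
    }


# Rank 195 (rpsO): RSSVM 0.903, RNAz 0.014, no "+" for QRNA.
# The alignment count of 1 is hypothetical; the table only shows the missing "+".
rpsO_calls = program_calls(0.903, 0.014, 1)
print(rpsO_calls)
```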
Figure 4. The Venn diagram of the numbers of predicted regulatory RNAs by RSSVM, RNAz and QRNA.
The numbers in the parentheses are of the predictions matching known RNA motifs.
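The seven regions of a three-way Venn diagram follow directly from set operations on the per-program prediction lists. The sketch below uses hypothetical UTR-set identifiers, not the actual predictions, to show the bookkeeping.

```python
def venn3_regions(a, b, c):
    """Split three sets into the seven disjoint regions of a Venn diagram."""
    return {
        "A only": a - b - c,
        "B only": b - a - c,
        "C only": c - a - b,
        "A and B only": (a & b) - c,
        "A and C only": (a & c) - b,
        "B and C only": (b & c) - a,
        "A and B and C": a & b & c,
    }


# Hypothetical UTR-set identifiers predicted by each program.
rssvm = {"utr1", "utr2", "utr3", "utr4"}
rnaz = {"utr2", "utr3", "utr5"}
qrna = {"utr3", "utr4", "utr6"}

regions = venn3_regions(rssvm, rnaz, qrna)
counts = {name: len(members) for name, members in regions.items()}
print(counts)
```

Because the regions are disjoint, their sizes sum to the size of the union, which is a useful consistency check on any published Venn diagram.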
Table 3. Predicted regulatory RNAs that have supporting literature evidence.
| Rank | GI | RSSVM | RNAz | QRNA (Q) / Terminator (T) / Attenuator (A) | Gene Name | Gene Product | Knowledge of Regulation | Reference |
| 5 | | | | Q T - | ilvG | acetolactate synthase II, large subunit | Leader peptide, and transcription attenuator | |
| 17 | | | | - T A | ldhA | D-lactate dehydrogenase | Possible post-transcriptional effect | |
| 23 | | | 0.105 | - T - | aspS | aspartyl-tRNA synthetase | tRNA synthetase leader | |
| 25 | | | | - - - | ilvI | acetolactate synthase III, large subunit | Leader peptide, and transcription attenuator | |
| 26 | | | | - - - | flgB | flagellar basal-body rod protein FlgB | Putative GEMM element | |
| 27 | | | 0.241 | - - - | aroH | phospho-2-dehydro-3-deoxyheptonate aldolase, trp-sensitive | Possible transcription termination | |
| 35 | | | | Q T A | leuA | 2-isopropylmalate synthase | Leader peptide, and transcription attenuator | |
| 41 | | | 0.094 | - - - | pdhR | pyruvate dehydrogenase complex repressor | PdhR-box in | |
| 52 | | | | - T - | adhE | aldehyde-alcohol dehydrogenase | Stem-loop for occupying RBS in | |
| 55 | | | 0.196 | - - - | ahpC | Alkyl hydroperoxide reductase, C subunit | Post-transcriptionally regulated by CsrA in | |
| 63 | | | 0.012 | Q T A | glnS | glutaminyl-tRNA synthetase | tRNA synthetase leader | |
| 83 | | | 0.451 | - T A | SO1769 | glutamate decarboxylase, putative | Possible post-transcriptional regulation in S. oneidensis | |
| 88 | | | | Q T - | rpoB | DNA-directed RNA polymerase, beta subunit | Transcriptional attenuation | |
| 91 | | | | - - - | rplJ | ribosomal protein L10 | Ribosomal protein leader | Rfam |
| 105 | | | 0.274 | Q T - | pflB | formate acetyltransferase | Possible post-transcriptional regulation | |
| 106 | | | | - - - | SO3896 | Outer membrane porin, putative | Post-transcriptional regulation in S. oneidensis | |
| 109 | | | | Q T A | rpsL | ribosomal protein S12 | Ribosomal protein leader | |
| 112 | | | 0.179 | - T - | fliE | flagellar hook-basal body complex protein FliE | Putative GEMM element | |
| 124 | | | 0.108 | - T - | secE | preprotein translocase, SecE subunit | RNaseIII sites in the leader sequence of SecE in E. coli | |
| 147 | | | 0.456 | - T - | speA | biosynthetic arginine decarboxylase | Possible post-transcriptional regulation in S. oneidensis | |
| 161 | | | | Q - - | aroF | phospho-2-dehydro-3-deoxyheptonate aldolase, tyr-sensitive | Attenuator sensing tyr-tRNA | |
| 163 | | | 0.102 | - - - | rplU | ribosomal protein L21 | Ribosomal protein leader | |
Footnotes are the same as those in Table 2.
Figure 5. The predicted transcription terminator and anti-terminator structures of the LeuA operon in Shewanella.
(A) Alternative terminator and anti-terminator stem-loop structures improved on the previously proposed structures. Base pairs in the red boxes are the positions where compensatory mutations are observed; blue lines are leucine codons enriched in the leader peptide coding region. (B) Structural alignment of the anti-antiterminator and terminator structure in five Shewanella species. The orange arrows correspond to the anti-antiterminator stem and the violet arrows correspond to the terminator stem. Colored columns represent aligned positions within the stems: red and pink colors represent conserved base pairings, and yellow and green colors represent base pairings with covariant mutations.
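The distinction the caption draws between conserved base pairings and pairings with covariant mutations can be illustrated with a small classifier over aligned stem columns. This is a simplified sketch on a hypothetical three-sequence alignment, not the Shewanella leuA leader; the category names and the treatment of G-U wobble pairs as valid pairings are assumptions.

```python
# Watson-Crick pairs plus G-U wobble pairs.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_pair(alignment, i, j):
    """Classify aligned columns i and j as one stem position across sequences."""
    pairs = [(seq[i], seq[j]) for seq in alignment]
    if any(pair not in CANONICAL for pair in pairs):
        return "not paired in all sequences"
    left = {pair[0] for pair in pairs}
    right = {pair[1] for pair in pairs}
    if len(left) == 1 and len(right) == 1:
        return "conserved"   # identical base pair in every sequence
    if len(left) > 1 and len(right) > 1:
        return "covariant"   # both sides changed, pairing is preserved
    return "consistent"      # one side changed, e.g. G-C to G-U


# Hypothetical alignment: a 3-bp stem (columns 0-9, 1-8, 2-7) around a 4-nt loop.
alignment = [
    "GCGAAAACGC",
    "GUGAAAACAC",
    "GCGAAAAUGC",
]

stem_classes = {(i, j): classify_pair(alignment, i, j) for i, j in [(0, 9), (1, 8), (2, 7)]}
print(stem_classes)
```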