Zizhen Yao, Jeffrey Barrick, Zasha Weinberg, Shane Neph, Ronald Breaker, Martin Tompa, Walter L Ruzzo.
Abstract
Noncoding RNAs (ncRNAs) are important functional RNAs that do not code for proteins. We present a highly efficient computational pipeline for discovering cis-regulatory ncRNA motifs de novo. The pipeline differs from previous methods in that it is structure-oriented, does not require a multiple-sequence alignment as input, and is capable of detecting RNA motifs with low sequence conservation. We also integrate RNA motif prediction with RNA homolog search, which improves the quality of the RNA motifs significantly. Here, we report the results of applying this pipeline to Firmicute bacteria. Our top-ranking motifs include most known Firmicute elements found in the RNA family database (Rfam). Comparing our motif models with Rfam's hand-curated motif models, we achieve high accuracy in both membership prediction and base-pair-level secondary structure prediction (at least 75% average sensitivity and specificity on both tasks). Of the ncRNA candidates not in Rfam, we find compelling evidence that some of them are functional, and analyze several potential ribosomal protein leaders in depth.
Year: 2007 PMID: 17616982 PMCID: PMC1913097 DOI: 10.1371/journal.pcbi.0030126
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Pipeline Flowchart
The boxes with solid lines indicate steps involving intensive computation (approximate running time is specified next to each). Other intermediate steps are specified in the boxes with dashed lines.
Figure 2. The Empirical p-Value Distribution Based on the Permutation Test
The black curve shows the complementary cumulative distribution function for the composite scores on randomized datasets (i.e., for each score, the fraction of permuted alignments exceeding that score). The red pluses show the p-values for the composite scores of the motifs in the original (unpermuted) datasets. All p-values are greater than or equal to 2 × 10⁻⁴ as there are only 5,000 samples in the background distribution.
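The empirical p-value described in this legend can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `empirical_p_values` is hypothetical, and clamping a zero count to one (so that the smallest reportable p-value is 1/N) is an assumption chosen to reproduce the 2 × 10⁻⁴ floor that the legend attributes to the 5,000 background samples.

```python
from bisect import bisect_right


def empirical_p_values(null_scores, observed_scores):
    """Return one empirical p-value per observed score.

    null_scores: composite scores from permuted (randomized) datasets.
    observed_scores: composite scores from the original datasets.
    Both names are hypothetical; the source does not name its variables.
    """
    null_sorted = sorted(null_scores)
    n = len(null_sorted)
    p_values = []
    for score in observed_scores:
        # Number of background scores strictly greater than the observed score.
        exceeding = n - bisect_right(null_sorted, score)
        # Assumption: a count of zero is reported as 1/n, the resolution
        # limit of a background with n samples.
        p_values.append(max(exceeding, 1) / n)
    return p_values
```

With 5,000 background scores, an observed score above every permuted score receives 1/5,000 = 2 × 10⁻⁴, which is why no smaller p-value can appear in the figure.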
Motifs That Correspond to Rfam Families
Motif Prediction Accuracy Compared with Rfam
High-Ranking Motifs Not Found in Rfam
Figure 3. Putative Autoregulatory Structure in L19 mRNA Leaders
(A) Sequence alignment of a conserved RNA structure found in the 5′ UTR of Firmicute rplS genes. Possible promoter −35 and −10 boxes in genomic DNA are shown, followed by the putative mRNA leader with the predicted secondary structures (P1 and P2), ribosome binding sites, and start codons highlighted. Numbers represent inserted nucleotides that are not shown. The examples shown are representative of 34 total sequences in the complete alignment, available in the online supplement at http://bio.cs.washington.edu/supplements/yzizhen/pipeline. Species abbreviations: Ame, Alkaliphilus metalliredigenes; Bac, Bacillus sp. NRRL; Bce, Bacillus cereus; Bcl, Bacillus clausii; Bha, Bacillus halodurans; Bsu, Bacillus subtilis; Chy, Carboxydothermus hydrogenoformans; Cpe, Clostridium perfringens; Dre, Desulfotomaculum reducens; Efa, Enterococcus faecalis; Fnu, Fusobacterium nucleatum; Gka, Geobacillus kaustophilus; Lac, Lactobacillus acidophilus; Ljo, Lactobacillus johnsonii; Lmo, Listeria monocytogenes; Lpl, Lactobacillus plantarum; Lsa, Lactobacillus sakei; Lsl, Lactobacillus salivarius; Oih, Oceanobacillus iheyensis; Sau, Staphylococcus aureus; Smu, Streptococcus mutans; Spn, Streptococcus pneumoniae; Spy, Streptococcus pyogenes; Sth, Streptococcus thermophilus; Swo, Syntrophomonas wolfei.
(B) Consensus sequence and secondary structure. Pairs supported by compensatory mutations (when both bases in a pair mutate between sequences in the alignment) and compatible mutations (when only one base mutates but pairing is preserved, e.g., G-C to G-U) are boxed.
(C) Structural model of the B. subtilis L19 mRNA leader, showing a possible alternate structure that could be stabilized by L19 binding to repress translation.
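The two kinds of supporting mutation defined in panel (B) can be made concrete with a small classifier. This is a sketch under stated assumptions: the function name `classify_pair_change`, the label strings, and the choice to treat Watson-Crick pairs plus G-U wobble as the set of valid pairs are all illustrative and are not taken from the paper's pipeline.

```python
# Assumption: valid pairs are Watson-Crick pairs plus G-U wobble pairs.
VALID_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def classify_pair_change(pair_a, pair_b):
    """Compare the same paired alignment columns in two sequences.

    pair_a, pair_b: (5' base, 3' base) tuples taken from two aligned sequences.
    Returns "conserved", "compatible", "compensatory", or "unpaired".
    """
    if pair_a not in VALID_PAIRS or pair_b not in VALID_PAIRS:
        # At least one sequence cannot form the pair, so pairing is not preserved.
        return "unpaired"
    # Count how many of the two positions differ between the sequences.
    changed = sum(x != y for x, y in zip(pair_a, pair_b))
    return ("conserved", "compatible", "compensatory")[changed]
```

For instance, `classify_pair_change(("G", "C"), ("G", "U"))` falls in the compatible category (one base changes, pairing is kept), whereas a G-C to A-U change is compensatory because both bases differ.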
Figure 4. Putative Autoregulatory Structure in L13–S9 mRNA Leaders
(A) Sequence alignment of a conserved RNA structure found in the 5′ UTR of Firmicute rplM–rpsI operons. The examples shown are representative of 27 total sequences in the complete alignment, available in the online supplement at http://bio.cs.washington.edu/supplements/yzizhen/pipeline. Details are as in the legend for Figure 3, with additional species abbreviations: Lde, Lactobacillus delbrueckii; Lla, Lactococcus lactis; Lme, Leuconostoc mesenteroides; Ppe, Pediococcus pentosaceus; Sag, Streptococcus agalactiae; Sep, Staphylococcus epidermidis; Sha, Staphylococcus haemolyticus; Ssa, Staphylococcus saprophyticus.
(B) Consensus sequence and secondary structure.
(C) Structural model of the B. subtilis L13–S9 mRNA leader, showing a possible alternate structure that could be stabilized by L13 or S9 binding to repress translation.