Pål Saetrom, Ragnhild Sneve, Knut I Kristiansen, Ola Snøve, Thomas Grünfeld, Torbjørn Rognes, Erling Seeberg.
Abstract
Several methods exist for predicting non-coding RNA (ncRNA) genes in Escherichia coli (E. coli). Besides about sixty known ncRNA genes (excluding tRNAs and rRNAs), various methods have predicted more than a thousand ncRNA genes, but only 95 of these candidates were confirmed by more than one study. Here, we introduce a new method that uses automatic discovery of sequence patterns to predict ncRNA genes. The method predicts 135 novel candidates. In addition, the method predicts 152 genes that overlap with predictions in the literature. We test sixteen predictions experimentally and show that twelve of these are actual ncRNA transcripts. Six of the twelve verified candidates were novel predictions. The relatively high confirmation rate indicates that many of the untested novel predictions are also ncRNAs, and we therefore speculate that E. coli contains more ncRNA genes than previously estimated.
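The headline numbers in the abstract can be combined with simple arithmetic. This minimal Python sketch uses only the counts stated in the abstract and assumes, as the phrase "In addition" suggests, that the 135 novel candidates and the 152 overlapping predictions are disjoint sets.

```python
# Counts quoted in the abstract; nothing else is assumed.
novel_predictions = 135        # candidates not predicted in earlier studies
overlapping_predictions = 152  # predictions overlapping earlier studies (assumed disjoint)
tested = 16                    # predictions checked by northern hybridization
confirmed = 12                 # tested predictions that were real transcripts
confirmed_novel = 6            # confirmed predictions that were novel

total_predictions = novel_predictions + overlapping_predictions
confirmation_rate = confirmed / tested
novel_share_of_confirmed = confirmed_novel / confirmed

print(f"total predictions: {total_predictions}")              # total predictions: 287
print(f"confirmation rate: {confirmation_rate:.0%}")          # confirmation rate: 75%
print(f"novel share of confirmed: {novel_share_of_confirmed:.0%}")  # novel share of confirmed: 50%
```

A 75% confirmation rate on a small sample (16 tests) is the basis for the abstract's speculation; the sketch does not model uncertainty in that estimate.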
Year: 2005 PMID: 15942029 PMCID: PMC1143698 DOI: 10.1093/nar/gki644
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Table 1. Top ten predictions, sorted by prediction confidence
| ID | Position | Length | Strand | Score | Annotation |
|---|---|---|---|---|---|
| I001 | 271879 | 100 | + | 0.22 | 271880–272035 + Carter |
| I002 | 4230937 | 150 | − | 0.22 | 4230927–4231086 − Carter |
| I003 | 719883 | 75 | + | 0.21 | 719854–719973 + Carter |
| I004 | 3766615 | 50 | + | 0.21 | Novel |
| I005 | 303544 | 50 | − | 0.19 | Novel |
| I006 | 262270 | 82 | − | 0.18 | Novel |
| I007 | 4626216 | 75 | + | 0.17 | Novel |
| I008 | 1702671 | 75 | + | 0.16 | 1702604–1702818 + Tjaden |
| I009 | 1859481 | 125 | + | 0.16 | 1859567–1859646 + Carter |
| I010 | 4527911 | 50 | + | 0.15 | 4527862–4527941 + Carter |
The given position is the 5′ end for predictions on the positive strand and the 3′ end for predictions on the negative strand. The score is the classifier output for the highest-scoring sequence window within the predicted sequence.
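Reading the note above literally, `Position` is always the leftmost genomic coordinate, because on the − strand the 3′ end has the lowest coordinate; each prediction then spans Position to Position + Length − 1 on either strand. For I002 this gives 4230937–4231086, which ends on the same base as the overlapping Carter annotation and supports that reading. The Python sketch below converts a row to an interval, measures overlap with an annotation, and scores a sequence by its best window. The window size, step, and the GC-fraction classifier are hypothetical stand-ins; the paper's classifier is built from automatically discovered sequence patterns and is not reproduced here.

```python
from typing import Callable, Tuple


def to_interval(position: int, length: int, strand: str) -> Tuple[int, int, int, int]:
    """Return (left, right, five_prime, three_prime), 1-based inclusive.

    Assumes `position` is the leftmost base on both strands (5' end on +,
    3' end on -), as the table note implies.
    """
    left, right = position, position + length - 1
    if strand == "+":
        return left, right, left, right
    if strand in ("-", "\u2212"):  # accept ASCII hyphen or Unicode minus
        return left, right, right, left
    raise ValueError(f"unknown strand: {strand!r}")


def overlap_length(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Number of bases shared by two 1-based inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def best_window_score(sequence: str, window: int, step: int,
                      classifier: Callable[[str], float]) -> float:
    """Score a sequence by its highest-scoring window (hypothetical classifier)."""
    last_start = max(len(sequence) - window, 0)
    return max(classifier(sequence[s:s + window])
               for s in range(0, last_start + 1, step))


def gc_fraction(window: str) -> float:
    """Toy stand-in classifier: fraction of G and C bases in the window."""
    return (window.count("G") + window.count("C")) / len(window)


# I002 from Table 1 against the Carter prediction 4230927–4231086 (− strand)
left, right, five, three = to_interval(4230937, 150, "\u2212")
print(left, right, five, three)                           # 4230937 4231086 4231086 4230937
print(overlap_length((left, right), (4230927, 4231086)))  # 150
print(best_window_score("AAAAGGGGCCCCAAAA", 4, 2, gc_fraction))  # 1.0
```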
Table 2. Known ncRNA genes included in the set of intergenic sequences
| Gene | Overlap | Strand | Prediction | Previous predictions |
|---|---|---|---|---|
| C0067 | 60 of 124 | + | Not predicted | n/a |
| rdlA | 66 of 66 | + | Predicted 50 nt (−) | ? |
| rdlB | 65 of 65 | + | Not predicted | ? |
| rdlC | 67 of 67 | + | Not predicted | ? |
| IS061 | 60 of 157 | − | Not predicted | n/a |
| IS092 | 116 of 159 | − | Not predicted | n/a |
| rygC | 76 of 150 | + | Predicted 50 nt (+ and −) | + |
| SroG | 110 of 147 | − | Predicted 89 nt (−) | − |
| rdlD | 63 of 63 | + | Not predicted | − |
| SroH | 61 of 159 | − | Not predicted | + |
The overlap is the number of the ncRNA's nucleotides that fall within the intergenic sequences, out of the gene's total length. The last column lists the strand of, and the reference to, previous predictions that overlap the gene.
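An "n of m" overlap value can be computed by counting the gene's bases that fall inside any intergenic interval. The Python sketch below is illustrative only: the gene and intergenic coordinates are hypothetical, chosen so that a 124-nt gene whose first 64 nt lie in a coding region yields "60 of 124". Collecting covered positions in a set is an assumed design choice that avoids double-counting if intergenic intervals overlap.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def included_bases(gene: Interval, intergenic: Iterable[Interval]) -> int:
    """Count gene bases that fall inside at least one intergenic interval."""
    covered: Set[int] = set()
    for start, end in intergenic:
        lo, hi = max(start, gene[0]), min(end, gene[1])
        covered.update(range(lo, hi + 1))  # empty range when there is no overlap
    return len(covered)


def overlap_label(gene: Interval, intergenic: Iterable[Interval]) -> str:
    """Format the overlap the way the table does, e.g. '60 of 124'."""
    gene_length = gene[1] - gene[0] + 1
    return f"{included_bases(gene, intergenic)} of {gene_length}"


# Hypothetical coordinates: gene 1001–1124; bases 1001–1064 lie in a coding region.
print(overlap_label((1001, 1124), [(1065, 1400)]))  # 60 of 124
```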
Unconfirmed transcripts from (8) included in the set of intergenic sequences
| Contig | Overlap | Strand | Prediction | Previous predictions |
|---|---|---|---|---|
| Contig_440 | 68 of 105 | + | Predicted 50 nt (+) and 50 nt (−) | + |
| Contig_68 | 76 of 157 | + | Predicted 49 nt (+) | + |
| Contig_606 | 83 of 103 | + | Predicted 63 nt (+) and 50 nt (−) | + |
| Contig_223 | 80 of 141 | − | Predicted 50 nt (−) | − |
| Contig_496 | 73 of 73 | + | Predicted 61 nt (+) and 49 nt (−) | − |
| Contig_286 | 102 of 102 | + | Predicted 50 nt (−) | + |
| Contig_181 | 43 of 43 | − | Not predicted | ? |
See Table 2 for header explanations.
Six predictions of varying confidence that were tested experimentally
| ID | Position | Length | Strand | Score | Annotation |
|---|---|---|---|---|---|
| I014 | 4373943 | 60 | − | 0.14 | Novel |
| I016 | 1218274 | 50 | − | 0.14 | Novel |
| I035 | 914278 | 100 | + | 0.1 | 914218–914571 ± Rivas; 914259–914378 + Carter |
| I044 | 4366175 | 50 | + | 0.1 | Novel |
| I209 | 4006562 | 50 | + | 0.025 | 4006513–4006565 − Carter |
| I211 | 214141 | 50 | − | 0.025 | Novel |
See Table 1 for details on the prediction position.
Figure 1. Northern hybridizations of selected predictions against total RNA from lag, log, and early and late stationary phases confirm 12 of the 16 selected transcripts. The figure shows the complete northern blots after a low-stringency wash. Boxed bands are those that remained after repeated higher-stringency washes; the blots from those washes are not shown because of poor resolution and image quality. The indicated sizes are approximate because the panels are individual blots aligned side by side; see Supplementary Figure 2 for size estimates based on each individual blot. Note that most blots show a ∼120 nt band that corresponds to 5S rRNA.
Transcripts detected by primer extension
| Transcript | Strand | 5′ start | Predicted distance | Size (nt) | 5′ gene | 5′ gene strand | 3′ gene | 3′ gene strand |
|---|---|---|---|---|---|---|---|---|
| I001 | + | 271804 | 75 | 75 | b0257 | + | ykfC | + |
| I002 | − | 4231116 | 179 | 310 | b4024 (‘lysC’) | − | b4025 (‘pgi’) | + |
| I004 | + | 3766359 | 256 | n/a | o153 (‘yibG’) | + | yibH | − |
| I014 | − | 4374139 | 196 | 300 | o188 (‘efp’) | + | o155 (‘sugE’) | + |
The table lists the transcripts' 5′ ends; their orientation; the distance between the 5′ ends and the predicted transcripts; the transcripts' estimated size; and the name and orientation of 5′ and 3′ flanking genes (relative to the + strand). Note that the I004 5′ start point overlaps prediction HB_200 of Carter and colleagues (12), but we did not detect any northern signal that corresponded to this 5′ start (see Figure 1).
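The "Predicted distance" column can be reproduced from numbers on this page: in all four rows it equals the difference between the primer-extension 5′ start and the Position listed for the prediction (Table 1, or the table of tested predictions for I014), and in every case the mapped 5′ end lies upstream of that Position in the transcript's own orientation. The Python sketch below checks this reading; that the authors measured to the listed Position (which, per the note under Table 1, is the predicted 3′ end for − strand predictions) is an inference from the numbers, not something the text states.

```python
# (transcript, strand, primer-extension 5' start, listed prediction Position, reported distance)
rows = [
    ("I001", "+", 271804, 271879, 75),
    ("I002", "\u2212", 4231116, 4230937, 179),
    ("I004", "+", 3766359, 3766615, 256),
    ("I014", "\u2212", 4374139, 4373943, 196),
]


def upstream_distance(strand: str, five_prime_start: int, position: int) -> int:
    """Signed distance of the mapped 5' start upstream of the listed Position.

    Positive means upstream in the transcript's orientation: lower coordinates
    on the + strand, higher coordinates on the - strand.
    """
    if strand == "+":
        return position - five_prime_start
    return five_prime_start - position  # any other value is treated as the - strand


for name, strand, start, position, reported in rows:
    d = upstream_distance(strand, start, position)
    print(name, d, d == reported)
# I001 75 True
# I002 179 True
# I004 256 True
# I014 196 True
```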