Abstract
The knowledge about classes of non-coding RNAs (ncRNAs) is growing very fast, and it is mainly the structure that is the common characteristic shared by members of the same class. For correct characterization of such classes it is therefore of great importance to analyse the structural features in detail. In this manuscript I present RNAlishapes, which combines various secondary structure analysis methods, such as suboptimal folding and shape abstraction, with a comparative approach known as RNA alignment folding. RNAlishapes makes use of an extended thermodynamic model and covariance scoring, which allows covariation of paired bases to be rewarded. Applied to a set of bacterial trp-operon leaders using shape abstraction, the algorithm was able to identify the two alternating conformations of this attenuator. Besides providing in-depth analysis methods for aligned RNAs, the tool also shows fairly good prediction accuracy. Therefore, RNAlishapes provides the community with a powerful tool for structural analysis of classes of RNAs and is also a reasonable method for consensus structure prediction based on sequence alignments. RNAlishapes is available for online use and download at http://rna.cyanolab.de.
Year: 2006 PMID: 17020924 PMCID: PMC1636479 DOI: 10.1093/nar/gkl692
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
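The covariance scoring mentioned in the abstract rewards alignment columns whose bases change together while remaining able to pair. The Python sketch below illustrates one simple way such a bonus could be computed for a pair of columns: it averages, over all pairs of sequences, the number of positions at which two valid base pairs differ. This is an illustration under stated assumptions. The function name `covariation_score`, the set of allowed pairs and the averaging are hypothetical and are not taken from RNAlishapes, whose exact scoring is not given here.

```python
from itertools import combinations

# Assumption: canonical Watson-Crick pairs plus G-U wobble count as valid pairs.
VALID_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}


def covariation_score(alignment, i, j):
    """Average covariation between alignment columns i and j (0-based).

    For every pair of sequences in which both sequences form a valid base
    pair at (i, j), add the number of positions (0, 1 or 2) at which the
    two base pairs differ. Conserved or non-pairing columns score 0.
    """
    column_pairs = [(seq[i].upper(), seq[j].upper()) for seq in alignment]
    total = 0
    comparisons = 0
    for p, q in combinations(column_pairs, 2):
        comparisons += 1
        if p in VALID_PAIRS and q in VALID_PAIRS:
            total += (p[0] != q[0]) + (p[1] != q[1])
    return total / comparisons if comparisons else 0.0


if __name__ == "__main__":
    toy_alignment = ["GAAAC", "CAAAG", "AAAAU"]
    print(covariation_score(toy_alignment, 0, 4))
```

In the toy alignment, columns 0 and 4 hold G-C, C-G and A-U, so every sequence pair differs at both positions and the score is at its maximum of 2. An identical pair in every sequence would score 0, since it shows conservation but no covariation.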
Figure 1. Analysis of aligned tRNAs. A ClustalW alignment of 10 arbitrarily chosen tRNAs from Rfam was analysed with RNAlishapes. (A) The consensus structure predicted by RNAlishapes drawn as a squiggle plot using RNAplot from the Vienna RNA package (62). The sequence shown consists of the most frequent base at each position. Colours indicate different stems (see B). (B) The alignment produced by ClustalW. Additionally, the consensus structure is given on the last line together with the score in parentheses. The different stems are colour coded in the alignment as well as in the consensus structure. Note that helical regions do not need to have the same length in all sequences. (C) Output of RNAlishapes when running in shape probabilistic mode. Four consensus shapes with a probability >10⁻⁶ have been predicted. For each, the free energy and the dot-bracket representation of the shrep (both on the first line) and the probability of the shape and the shape notation (both on the second line) are given.
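The caption above relates a dot-bracket structure to its shape notation. As a rough sketch of what such an abstraction does, the Python code below reduces a dot-bracket string to its helix nesting pattern: unpaired bases are dropped, and a helix interrupted only by bulges or internal loops is collapsed into a single `[]`. This is assumed to correspond to the most abstract shape level. The function names are hypothetical and this is not the RNAlishapes implementation.

```python
def pair_table(dot_bracket):
    """Map each paired position to its partner (0-based indices)."""
    stack, partner = [], {}
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            opener = stack.pop()
            partner[opener] = pos
            partner[pos] = opener
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def abstract_shape(dot_bracket):
    """Return the helix nesting pattern of a dot-bracket string."""
    partner = pair_table(dot_bracket)

    def children(i, j):
        # Outermost base pairs strictly between positions i and j.
        found, k = [], i + 1
        while k < j:
            if k in partner:
                found.append((k, partner[k]))
                k = partner[k] + 1
            else:
                k += 1
        return found

    def shape_of(i, j):
        kids = children(i, j)
        # Exactly one enclosed pair means stacking, a bulge or an internal
        # loop: treat it as a continuation of the same helix.
        while len(kids) == 1:
            i, j = kids[0]
            kids = children(i, j)
        return "[" + "".join(shape_of(a, b) for a, b in kids) + "]"

    return "".join(shape_of(a, b) for a, b in children(-1, len(dot_bracket)))


if __name__ == "__main__":
    print(abstract_shape("((..((...))..((...))..))"))
```

Many different dot-bracket strings map to the same shape string, which is why a shape can stand for a whole class of similar structures, each class being represented by its shrep.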
Figure 2. Alternate consensus structures for trp-Attenuators. Analysis of trp-operon leaders from different Corynebacterium spp. and Streptomyces spp. (A) MmFE structure for the alignment shown in (C). The blue hairpin corresponds to the terminator hairpin. (B) Shrep of the second best shape. The consensus structure comprises the same sequence regions as the structure in (A), making these two structures mutually exclusive. (C) Alignment of eight trp-operon leaders from different Corynebacterium spp. and Streptomyces spp. Colours indicate the different stems. Bases paired in both alternative structures are coded by the mixed colour.
Figure 3. Consensus structure of T-box leader. T-box leader sequences from 16 species have been aligned using ClustalW. The resulting alignment has an average percentage identity of ∼59.1% and shows gap-rich regions. Consensus structures and their score (in parentheses) computed by RNAlishapes and RNAalifold are shown on the second last and last line, respectively.
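The average percentage identity quoted for this alignment can be illustrated with a short Python sketch. Definitions differ in how gaps are treated. The version here is an assumption made for illustration and not necessarily the definition behind the ∼59.1% figure: it ignores columns where both sequences have a gap and counts a gap against a base as a mismatch. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_identity(seq_a, seq_b, gap="-"):
    """Percentage of identical columns between two aligned sequences.

    Columns where both sequences carry a gap are ignored; a gap opposite
    a base counts as a mismatch (an assumption, conventions differ).
    """
    columns = [(a, b) for a, b in zip(seq_a, seq_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def mean_pairwise_identity(alignment, gap="-"):
    """Average pairwise_identity over all pairs of aligned sequences."""
    if len(alignment) < 2:
        raise ValueError("need at least two aligned sequences")
    pairs = list(combinations(alignment, 2))
    return sum(pairwise_identity(a, b, gap) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    print(mean_pairwise_identity(["ACGU", "ACGA", "AC-U"]))
```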
Prediction accuracy for data-set I
| Algorithm | PI | % Sen. | % Sel. | MCC | % Sen. | % Sel. | MCC | % Sen. | % Sel. | MCC |
|---|---|---|---|---|---|---|---|---|---|---|
| RNAlishapes | H | 100.0 | 100.0 | 1.000 | 60.0 | 59.5 | 0.596 | 70.9 | 75.3 | 0.731 |
| | M | 100.0 | 100.0 | 1.000 | 69.1 | 75.2 | 0.720 | 81.2 | 88.4 | 0.847 |
| RNAalifold* | H | 90.5 | 100.0 | 0.950 | 78.9 | 77.8 | 0.782 | 59.8 | 60.6 | 0.601 |
| | M | 77.8 | 100.0 | 0.880 | 57.4 | 57.4 | 0.571 | 84.4 | 92.1 | 0.881 |
| ILM* | H | 76.2 | 69.6 | 0.722 | 43.7 | 36.5 | 0.395 | 51.3 | 43.0 | 0.469 |
| | M | 100.0 | 75.0 | 0.863 | 70.4 | 55.1 | 0.620 | 59.9 | 51.5 | 0.554 |
| PFOLD* | H | 95.2 | 100.0 | 0.975 | 66.2 | 88.7 | 0.765 | 70.9 | 92.6 | 0.810 |
| | M | 100.0 | 100.0 | 1.000 | 87.0 | 92.2 | 0.895 | n.c. | n.c. | n.c. |
Sensitivity (Sen.), selectivity (Sel.) and correlation coefficient (MCC, Matthews correlation coefficient) for consensus structures predicted by RNAlishapes, RNAalifold, ILM and Pfold for RNase P, tRNA-PHE and SSU rRNA. PI = mean pairwise sequence identity, H = high, M = medium, n.c. = not computed, * = data taken from Ref. (26).
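The three measures in this table can be illustrated by comparing the base pairs of a predicted structure with those of a reference structure. The Python sketch below uses the simplest definitions: sensitivity = TP / (TP + FN) and selectivity = TP / (TP + FP), with the MCC approximated by the geometric mean of the two, a common approximation for base-pair prediction. These choices are assumptions for illustration. The cited benchmark may count false positives differently, and the function names are hypothetical.

```python
import math


def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded in a dot-bracket string."""
    stack, pairs = [], set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def prediction_accuracy(reference, predicted):
    """Return (sensitivity, selectivity, approximate MCC) as fractions."""
    ref, pred = base_pairs(reference), base_pairs(predicted)
    true_pos = len(ref & pred)    # pairs present in both structures
    false_neg = len(ref - pred)   # reference pairs that were missed
    false_pos = len(pred - ref)   # predicted pairs not in the reference
    sen = true_pos / (true_pos + false_neg) if ref else 0.0
    sel = true_pos / (true_pos + false_pos) if pred else 0.0
    # Assumption: MCC approximated as sqrt(sensitivity * selectivity).
    return sen, sel, math.sqrt(sen * sel)


if __name__ == "__main__":
    sen, sel, mcc = prediction_accuracy("((..))", "(....)")
    print(f"Sen. {100 * sen:.1f}%  Sel. {100 * sel:.1f}%  MCC {mcc:.3f}")
```

Under this approximation, a prediction that finds half of the reference pairs and no wrong ones has a sensitivity of 50%, a selectivity of 100% and an MCC of about 0.707.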
Prediction accuracy for data-set II
| RNA family | Alignment source | Length (nt) | PI (%) | % Sensitivity (S / F) | % Selectivity (S / F) | Correlation (S / F) |
|---|---|---|---|---|---|---|
| U5 (RF00020) | Rfam | 122 | ∼85 | 96.7 / 96.7 | 100.0 / 100.0 | 0.983 / 0.983 |
| | ClustalW | 122 | ∼85 | 93.3 | 96.6 | 0.949 |
| 5S (RF00001) | Rfam | 120 | ∼85 | 61.8 / 61.8 | 80.8 / 80.8 | 0.703 / 0.703 |
| | ClustalW | 120 | ∼84 | 61.8 | 72.4 | 0.664 |
| Group II intron (RF00029) | Rfam | 84 | ∼83 | 100.0 / 100.0 | 100.0 / 100.0 | 1.000 / 1.000 |
| | ClustalW | 84 | ∼83 | 89.5 / 89.5 | 94.4 | 0.918 |
| SRP bact. (RF00169) | Rfam | 104 | ∼84 | 90.0 / 90.0 | 93.1 / 93.1 | 0.914 / 0.914 |
| | ClustalW | 104 | ∼83 | 90.0 | 93.1 | 0.914 |
| SRP euk. (RF00017) | Rfam | 310 | ∼83 | 82.6 | 93.7 | 0.890 |
| | ClustalW | 310 | ∼83 | 66.3 / 66.3 | 72.2 | 0.690 |
| 6S (RF00013) | Rfam | 203 | ∼66 | 69.8 / 69.8 | 75.5 / 75.5 | 0.724 / 0.724 |
| | ClustalW | 203 | ∼66 | 58.5 | 72.1 | 0.647 |
Sensitivity, selectivity and correlation (Matthews correlation coefficient) for consensus structures predicted by RNAlishapes (S) and RNAalifold (F) for U5 snRNA, 5S RNA, Group II intron, bacterial signal recognition particle (SRP) RNA, eukaryotic SRP RNA and 6S RNA. Where one approach performs better, the corresponding value is given in bold. (PI = mean pairwise sequence identity, Rfam = Rfam seed alignment, ClustalW = sequences from the Rfam seed alignment realigned using ClustalW).