Rolf Backofen, Fabian Amman, Fabrizio Costa, Sven Findeiß, Andreas S Richter, Peter F Stadler.
Abstract
The genomes of most prokaryotes give rise to surprisingly complex transcriptomes, comprising not only protein-coding mRNAs, often organized as operons, but also dozens or even hundreds of highly structured small regulatory RNAs and unexpectedly high levels of antisense transcripts. Comprehensive surveys of prokaryotic transcriptomes, and the need to characterize their non-coding components as well, depend heavily on computational methods and workflows, many of which have been developed or at least adapted specifically for use with bacterial and archaeal data. This review provides an overview of the state of the art of RNA bioinformatics, focusing on applications to prokaryotes.
Keywords: RNA bioinformatics; RNA–RNA interaction; TSS annotation; gene finding; secondary structure prediction; target prediction
Year: 2014 PMID: 24755880 PMCID: PMC4152356 DOI: 10.4161/rna.28647
Source DB: PubMed Journal: RNA Biol ISSN: 1547-6286 Impact factor: 4.652

Figure 1. Comparison of automated TSS annotation from dRNA-seq data with TSSpredator and TSSAR. The upper plot pair shows the mapped read coverage in the treated (L+) and untreated (L-) libraries for an exemplary region from H. pylori dRNA-seq data. Blue dashed lines indicate TSS annotated by TSSpredator (using default parameters). The middle plot pair shows essentially the same data, but only the read start coverage is plotted; this is how TSSAR looks at the data. Dashed red lines indicate TSS annotated by TSSAR (P value cutoff of 10⁻⁴). The bottom part shows the positions of the annotated genes in the region considered. The read coverage plots indicate that the data produced by dRNA-seq are more complex than the method description might suggest; statistical data analysis is therefore required.
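
The difference between the two views in Figure 1 (full read coverage versus read start coverage) can be illustrated with a minimal sketch. The function name `coverage_profiles` and the read representation as 0-based half-open `(start, end)` intervals on one strand are assumptions made for illustration; this is not the code of TSSpredator or TSSAR, and it performs no statistical testing.

```python
def coverage_profiles(reads, region_length):
    """Return (full_coverage, start_coverage) for a region.

    reads: iterable of (start, end) tuples, 0-based, half-open,
           all assumed to lie on the same strand (hypothetical input format).
    """
    full = [0] * region_length    # number of reads overlapping each position
    starts = [0] * region_length  # number of reads whose 5' end is at each position
    for start, end in reads:
        if not (0 <= start < end <= region_length):
            continue  # ignore reads that do not fit inside the region
        starts[start] += 1
        for pos in range(start, end):
            full[pos] += 1
    return full, starts


# Toy example: three reads share the same 5' end at position 2.
reads = [(2, 6), (2, 5), (2, 7), (4, 8)]
full, starts = coverage_profiles(reads, 10)
print(full)
print(starts)
```

In the start-coverage profile the shared 5' end stands out as a single sharp peak, whereas in the full coverage it is smeared over the read length. Comparing such start counts between the treated and untreated libraries is the kind of signal a statistical TSS caller would then evaluate.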

Figure 2. Evolutionary signals are used to classify multiple sequence alignments as non-coding or protein-coding. RNAz combines structural and thermodynamic descriptors with measures of sequence conservation to detect excess conservation of secondary structure, while RNAcode identifies increased conservation of putative ORFs compared with the observed conservation of the nucleic acid sequences. Well-conserved structured RNAs, such as Xanthomonas sX13, which is involved in virulence-specific gene expression and hfq mRNA regulation, can easily be identified with RNAz. The E. coli transcript C0343, originally annotated as a small RNA, does not exhibit typical features of a structured RNA. Instead, RNAcode reveals a well-conserved short coding sequence. Dual transcripts such as B. subtilis sR1 are detectable by both RNAz and RNAcode.

Figure 3. Features describing a secondary structure graph. Each graph is described by the set of all neighborhood subgraphs (indicated by shaded areas) up to a maximal radius r around a reference nucleotide (marked by a circle).
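
A minimal sketch of the neighborhood idea in Figure 3, assuming the secondary structure is given in dot-bracket notation without pseudoknots. Nucleotides are nodes, and edges connect backbone neighbors and base-paired positions. The function names are hypothetical, and the sketch only lists which nucleotides fall into each neighborhood; how the subgraphs are turned into feature vectors is not covered here.

```python
from collections import deque


def structure_graph(dot_bracket):
    """Build an adjacency map with backbone and base-pair edges."""
    n = len(dot_bracket)
    adj = {i: set() for i in range(n)}
    for i in range(n - 1):           # backbone edges i -- i+1
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    stack = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":              # base-pair edge between matching brackets
            j = stack.pop()
            adj[i].add(j)
            adj[j].add(i)
    return adj


def neighborhood(adj, root, radius):
    """Nodes within `radius` edges of `root` (breadth-first search)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue                 # do not expand beyond the radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def neighborhood_features(adj, root, max_radius):
    """All neighborhoods of `root` for radii 0..max_radius."""
    return [sorted(neighborhood(adj, root, r)) for r in range(max_radius + 1)]


adj = structure_graph("((...))")
print(neighborhood_features(adj, 0, 2))
```

Because base pairs are edges, a neighborhood around a paired nucleotide reaches across the helix, so positions that are distant in sequence but close in structure end up in the same subgraph.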

Table 1. Web servers for genome-scale prediction of sRNA target genes

| Name | Feature: conservation | Feature: accessibility | Feature: seed region | Classifier | Functional enrichment |
|---|---|---|---|---|---|
| CopraRNA | X | X | X | - | X |
| IntaRNA | - | X | X | - | X |
| RNApredator | - | X | - | - | X |
| sRNATarget | - | - | X | X | - |
| sTarPicker | - | X | X | X | - |
| TargetRNA2 | X | X | X | - | - |

All web servers are based on computational methods that score the sRNA–target interaction by its hybridization energy and by additional features as indicated in the table. Some servers directly support functional enrichment analysis of the highest-ranking target predictions.

Figure 4. Comparative prediction of sRNA targets as implemented in the CopraRNA pipeline. For a given pair of sRNA and mRNA sequences, the associated homologs are selected. In the next step, the best interaction in each species is determined and scored by its P value. Finally, all species-specific P values are combined into a single joint P value while taking the evolutionary distances into account.
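
The final step in Figure 4, combining species-specific P values into one joint P value, can be sketched with a weighted inverse-normal (Stouffer) combination. This is a stand-in technique chosen for illustration, not necessarily the method CopraRNA uses: the function name `combine_p_values` and the weights are hypothetical, the weights are assumed to have been derived beforehand from evolutionary distances (for example, to down-weight closely related species), and the sketch ignores any remaining dependence between the P values.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def combine_p_values(p_values, weights):
    """Weighted Stouffer combination of one-sided P values.

    p_values: per-species P values, each strictly between 0 and 1.
    weights:  positive per-species weights (hypothetical; assumed to
              encode evolutionary distance in some way).
    """
    if len(p_values) != len(weights) or not p_values:
        raise ValueError("need one weight per P value")
    # Convert each P value to a z-score; small P gives large positive z.
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in p_values]
    z_joint = sum(w * z for w, z in zip(weights, z_scores))
    z_joint /= sqrt(sum(w * w for w in weights))
    return 1.0 - _STD_NORMAL.cdf(z_joint)


# Toy example: three homologous sRNA-mRNA pairs with made-up P values and weights.
joint = combine_p_values([0.01, 0.03, 0.20], [1.0, 0.5, 0.8])
print(joint)
```

A target that scores moderately well in several species can thus obtain a smaller joint P value than any single-species prediction, which is the rationale for the comparative approach.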