Jayavel Sridhar, Paramasamy Gunasekaran.
Abstract
Bacterial small RNAs (sRNAs) were once regarded as potent regulators of gene expression and are now considered essential for their diversified roles. Many small RNAs are now reported to have a wide array of regulatory functions, ranging from environmental sensing to pathogenesis. Traditionally, noncoding transcripts were rarely detected by means of genetic screens. However, the availability of approximately 2200 prokaryotic genome sequences in public databases facilitates an efficient computational search for those molecules, followed by experimental validation. In principle, four major computational methods have been applied to predict sRNA locations from bacterial genome sequences: (1) comparative genomics, (2) secondary structure and thermodynamic stability, (3) 'orphan' transcriptional signals and (4) ab initio methods regardless of sequence or structure similarity. Most of these tools were applied to locate putative genomic sRNA locations, followed by experimental validation of those transcripts. Computational screening has therefore simplified the sRNA identification process in bacteria. In this review, the many small RNA prediction methods and tools reported in the past decade are discussed comprehensively and assessed based on their attributes, compatibility, and prediction accuracy.
Keywords: base composition; comparative genomics; ncRNA; sRNA prediction; structure stability; transcriptional signal
Year: 2013 PMID: 23516022 PMCID: PMC3596055 DOI: 10.4137/BBI.S11213
Source DB: PubMed Journal: Bioinform Biol Insights ISSN: 1177-9322
Figure 1. (A) Comparative genomics-based protocols used in the computational sRNA prediction tools QRNA, ERPIN, ISI and RNAZ; (B) methodology adopted in the transcriptional signal-based sRNA finders sRNAscanner and sRNAPredict; (C) sequence-based ab initio sRNA detection methods Atypical GC, RNAGENiE and smyRNA; (D) non-sequence-based ab initio sRNA detection methods PsRNA and NAPP.
Abbreviations: IGR, InterGenic Region; sRNA, small RNA; rRNA, ribosomal RNA; tRNA, transfer RNA; CDS, Coding Domain Sequence; KO, KEGG Orthology; TFBS, Transcription Factor Binding Site.
Summary of the various computational methods applied for sRNA prediction in bacteria.
| 1 | Comparative genomics | Applies an SCFG to test alignments and classify them into COD, RNA and OTH models | First systematic method for ncRNA detection among closely related organisms. Intergenic conservation is treated as an indicator of sRNA regions | Restricted to pairwise alignments |
| Reads multiple sequence alignments and secondary structures to infer a secondary structure profile (SSP) | Complex RNA descriptors are not required. Dynamic programming is applied to search for helix and hairpin structures (SSP) with a log-odds score and E-value | Multiple sequence alignments and consensus structures are mandatory |
| Searches for sRNAs based on intergenic conservation (IGR), RNA structural features and terminators | ISI has retained many sRNAs in | Conserved IGRs without flanking promoters and terminators are missed |
| An HMM-based covariance model (CM) is used to build the RNA secondary structure and search |
| Detects RNA-specific common stems from multiple sequence alignments using a distribution-mixture method | CM-based search of a particular RNA against genomes is computationally efficient | False positives are reported. Novel predictions are not possible |
| RNA structure and thermodynamic stability based methods | RNAZ applies SVM-based structural regression analysis to compute a z-score and differentiate the minimum free energy structures | It applies RNAFOLD to generate secondary structures from sequence alignments | It can handle alignments with a minimum of 10 sequences. |
| It is part of the sRNA annotation pipeline used in the Rfam database. RNAZ can be applied for large-scale genomic screens | It requires a fixed sequence alignment as input. Poor sensitivity with low pairwise sequence identity |
| 2 | Transcriptional signal based sRNA finders | Generic sRNA finder applicable to any genome with specific training data | sRNA-specific promoter and terminator signals are applied to identify IGR sRNAs. It predicts the maximum number of known sRNAs in Enterobacteriaceae | Current dataset has sensitivity with medium and low %GC genomes |
| Coordinate-based algorithms integrate the locations of promoters/TFBS and terminators along with sequence conservation |
| 3 | Sequence dependent ab initio sRNA detection methods | Computes the G and C content at each position using a sliding window and predicts RNA regions | Simple method to predict sRNA locations with existing information from other databases | Fully depends on information from other databases. Cannot work with strains not indexed in other databases. |
| Known RNA structural elements are trained with neural networks and applied to differentiate RNA and non-RNA genes | Only continuous atypical positions are considered as possible RNA regions | Not applied to a particular RNA family |
| Utilizes differential distributions of sequence motifs between ncRNAs and background genome sequences | Functional RNA elements (double helices, uridine turns, UNCG loops, tetraloop receptors and mispairs) are trained. Accuracy is high when motifs are combined with the free energy of folding | Reliability is questionable due to lack of experimental validation |
| 4 | Sequence independent ab initio sRNA detection methods | Uses KEGG orthology numbers of the flanking genes to locate sRNA-specific intergenic regions | Maximally scoring substrings of the input genome above the threshold are identified as RNA regions | Family-specific RNA identification is not possible |
| IGRs of the reference genome are tiled into 50 nt segments and classified based on their occurrence profile in 1000 genomes | First orthology-based method successfully applied to predict sRNA-specific gene clusters | Identification of 'novel' sRNAs and of flanking genes without KO numbers is not possible |
| Searching for 'RNA-rich' clusters in query genomes will identify sRNAs | Search is restricted to the sRNAs reported in the reference, and tracking of 'novel' sRNAs is not possible |
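
The sliding-window G+C computation listed among the sequence-dependent ab initio methods can be illustrated with a minimal Python sketch. This is not the published Atypical GC implementation. The function names, the window size (50 nt), the step (10 nt) and the deviation threshold `delta` are all assumptions chosen for illustration, and the genome-wide G+C fraction is used here as a simple stand-in for the background.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq (case-insensitive); 0.0 for an empty string."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(genome: str, window: int = 50, step: int = 10) -> List[Tuple[int, float]]:
    """Return (start, GC fraction) for every full window along the genome."""
    return [
        (start, gc_fraction(genome[start:start + window]))
        for start in range(0, len(genome) - window + 1, step)
    ]


def atypical_regions(
    genome: str, window: int = 50, step: int = 10, delta: float = 0.15
) -> List[Tuple[int, int]]:
    """Merge overlapping or touching windows whose GC fraction deviates from
    the genome-wide GC fraction by more than delta (an assumed, illustrative
    threshold). Regions are returned as half-open (start, end) coordinates."""
    background = gc_fraction(genome)
    regions: List[Tuple[int, int]] = []
    for start, gc in sliding_gc(genome, window, step):
        if abs(gc - background) <= delta:
            continue  # window looks like the background composition
        end = start + window
        if regions and start <= regions[-1][1]:
            # Extend the previous region instead of opening a new one.
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions


# Toy sequence: an AT-only background with a 50 nt GC-only island at 200-250.
toy_genome = "AT" * 100 + "GC" * 25 + "AT" * 100
candidate_regions = atypical_regions(toy_genome)
```

On the toy sequence, the flagged windows merge into a single candidate region around the GC-rich island. Candidates from a real genome would still need filtering against annotated genes, which is where the dependence on other databases noted in the table comes in.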