Giovanni Bussotti, Cedric Notredame, Anton J Enright.
Abstract
In recent years there has been growing interest in the field of non-coding RNA. This surge is a direct consequence of the discovery of a huge number of new non-coding genes and of the finding that many of these transcripts are involved in key cellular functions. In this context, accurately detecting and comparing RNA sequences has become important. Aligning nucleotide sequences is a key requisite when searching for homologous genes. Accurate alignments reveal evolutionary relationships, conserved regions and, more generally, any biologically relevant pattern. Comparing RNA molecules is, however, a challenging task. The nucleotide alphabet is simpler and therefore less informative than that of amino acids. Moreover, for many non-coding RNAs, evolution is likely to be constrained mostly at the structural level rather than at the sequence level. This results in very poor sequence conservation, which impedes comparison of these molecules. These difficulties define a context where new methods are urgently needed in order to exploit experimental results to their full potential. This review focuses on the comparative genomics of non-coding RNAs in the context of new sequencing technologies, and especially on two important and timely research aspects: the development of new methods to align RNAs and the analysis of high-throughput data.
Year: 2013 PMID: 23887659 PMCID: PMC3759867 DOI: 10.3390/ijms140815423
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Number of publications in PubMed found using the keywords “ncRNA” (dark grey) and “regulatory RNA” (pale grey). The x-axis represents the timeline; the y-axis shows the number of publications matching “ncRNA” or “regulatory RNA” in PubMed, normalized by the total number of publications in that year (expressed in parts per ten thousand).
Figure 2. RNA mutations are tightly linked to the conservation of RNA structure. (a) Example where the mutation of a C into an A is compensated by the change of the paired G into a U. The two positions are not independent: each change is matched by a change at the partner position so that the structure remains unchanged; (b) Same hairpin as shown in (a). The presence of the compensatory mutation is highlighted by the multiple sequence comparison.
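The compensatory change described in Figure 2 can be illustrated with a minimal Python sketch. It is not a method from the review: the function name `compensatory_columns`, the toy hairpin sequences and the list of paired columns are all hypothetical, chosen only to reproduce the C-to-A and G-to-U example from the caption. The sketch assumes the sequences are already aligned and that the paired columns are known.

```python
# Watson-Crick pairs plus the G-U wobble pair.
VALID_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def compensatory_columns(alignment, paired_columns):
    """Return the paired columns (i, j) that show a compensatory change.

    A column pair is reported when every sequence forms a valid base pair
    at (i, j) but the sequences do not all use the same pair.
    """
    hits = []
    for i, j in paired_columns:
        observed = {(seq[i], seq[j]) for seq in alignment}
        if observed <= VALID_PAIRS and len(observed) > 1:
            hits.append((i, j))
    return hits


# Hypothetical 14-nt hairpin: 5-bp stem (columns 0-4 and 9-13), 4-nt loop.
alignment = [
    "GGCACUUCGGUGCC",  # column 2 = C pairs with column 11 = G
    "GGAACUUCGGUUCC",  # column 2 = A pairs with column 11 = U
]
paired_columns = [(0, 13), (1, 12), (2, 11), (3, 10), (4, 9)]

print(compensatory_columns(alignment, paired_columns))
```

In this toy case only the column pair (2, 11) is reported, because both sequences keep a valid pair there while using different nucleotides. A single uncompensated substitution would break the pair and would not be reported.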
A summary of methods, datasets and browsers for non-coding RNA analysis. The first column indicates the resource type, the second the resource name, the third the PubMed ID when available (otherwise the web address), and the fourth a brief description of the resource.
| Resource type | Resource | PubMed ID | Description |
|---|---|---|---|
| Comparing ncRNAs (Section 2) | Mfold | 6163133 | Single sequence RNA secondary structure prediction |
| | RNAfold | 12824340 | |
| | WAR | 18492721 | Web server allowing the execution of different alignment methods |
| | RNAalifold | 12079347 | Folding previously aligned RNAs (Plan A) |
| | PFOLD | 12824339 | |
| | ILM | 14693809 | |
| | Construct | 10518612 | |
| | Dynalign | 11902836 | Sankoff-derived algorithm for the simultaneous alignment and secondary structure prediction (Plan B) |
| | Foldalign | 9278497 | |
| | Stemloc | 15790387 | |
| | Consan | 16952317 | |
| | pmmulti | 15073017 | |
| | R-Coffee | 18420654 | Aligners taking into account previously estimated secondary structure (Plan C) |
| | RNAcast | 16020472 | |
| | SARA | 18689811 | 3D structure alignment method |
| | DIAL | 17567620 | |
| | iPARTS | 20507908 | |
| | ARTS | 16204124 | |
| | SARSA | 18502774 | |
| | LaJolla | | |
| | FRASS | 20553602 | |
| Detecting ncRNAs (Section 3) | ML-heuristic | 16267089 | Profile HMM |
| | RAGA | 9358168 | Genetic algorithm |
| | RSEARCH | 14499004 | Covariance model |
| | Infernal | 12095421 | |
| | BlastR | 21624887 | BLAST-based dinucleotide homology search |
| Datasets and browsers (Section 4) | ENCODE | 22955616 | Consortium |
| | Ensembl | 22086963 | |
| | FANTOM | 11217851 | |
| | HAVANA | | Annotation team |
| | GENCODE | 22955987 | Project for the annotation of all human gene features |
| | UCSC | 12045153 | Genome browser |
| | VEGA | 18003653 | |
| | RefSeq | 18927115 | Collection of DNA, transcripts, and proteins |
| | Rfam | 12520045 | ncRNA database |
| | NRED | 18829717 | |
| | lncRNAdb | 21112873 | |
| | RNAdb | 17145715 | |
| | fRNAdb | 17099231 | |
| | NONCODE | 15608158 | |
Figure 3. Consistency of RNA secondary structure predictions. In this example the human mir-3180 (Rfam accession ID RF02010; AJ323057.1/363-249) was folded using different approaches, yielding different output structures. (a) Secondary structure of the family as estimated by Rfam release 10.1; (b) RNAfold web server prediction based on Vienna RNA package version 2.0.0 [93]; (c) Mfold web server prediction, running Mfold version 4.6 [71].
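One simple way to quantify how much two predicted structures such as those in Figure 3 disagree is to count the base pairs present in one structure but not in the other. The sketch below assumes the structures are written in dot-bracket notation without pseudoknots. The function names and example strings are hypothetical and are not the mir-3180 predictions shown in the figure.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for pos, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def base_pair_distance(structure_a, structure_b):
    """Count base pairs found in exactly one of the two structures."""
    if len(structure_a) != len(structure_b):
        raise ValueError("structures must have the same length")
    return len(base_pairs(structure_a) ^ base_pairs(structure_b))


# Hypothetical predictions for the same 12-nt sequence.
prediction_a = "((((....))))"
prediction_b = "(((......)))"
prediction_c = ".((....))..."

print(base_pair_distance(prediction_a, prediction_b))
print(base_pair_distance(prediction_a, prediction_c))
```

A distance of zero means the two predictions contain exactly the same base pairs. Larger values indicate more disagreement, but the measure does not say which prediction is closer to the true structure.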
Figure 4. Number of non-coding and protein-coding genes annotated over the last Ensembl releases. The x-axis indicates the number and the date of the release. The y-axis reports the number of ncRNA (blue line) and protein-coding genes (red line).