Corinna Theis, Christian Höner zu Siederdissen, Ivo L Hofacker, Jan Gorodkin.
Abstract
Recent progress in predicting RNA structure is moving towards filling the 'gap' in 2D RNA structure prediction where, for example, predicted internal loops often form non-canonical base pairs. This is increasingly recognized with the steady increase of known RNA 3D modules. There is a general interest in matching structural modules known from one molecule to other molecules for which the 3D structure is not yet known. We have created a pipeline, metaRNAmodules, which completely automates extracting putative modules from the FR3D database and mapping such modules to Rfam alignments to obtain comparative evidence. Subsequently, the modules, initially represented by a graph, are turned into models for the RMDetect program, which allows testing their discriminative power using real and randomized Rfam alignments. An initial extraction of 22 495 3D modules in all PDB files results in 977 internal loop and 17 hairpin modules with clear discriminatory power. Many of these modules describe only minor variants of each other. Indeed, mapping of the modules onto Rfam families results in 35 unique locations in 11 different families. The metaRNAmodules pipeline source for the internal loop modules is available at http://rth.dk/resources/mrm.
Year: 2013 PMID: 24005040 PMCID: PMC3905863 DOI: 10.1093/nar/gkt795
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overview of the metaRNAmodules pipeline. metaRNAmodules extracts putative modules from FR3D, a database derived from PDB (A). During the mapping step on modified Rfam alignments (B), a large fraction of the modules is filtered out. The training of the new models (C) filters out further modules. The remaining modules are filtered and ranked according to (D).
Rfam families and cluster representatives
| Family | No. of modules | PDB res. | Rep. | Alignment pos. | Fit |
|---|---|---|---|---|---|
| 5S rRNA | 3 | 9:23-30/54-60 | 2GYC | 36-68/112-136 | |
| | | 9:32-37/43-48 | 3CCQ | 73-87/99-105 | |
| | | A:71-79/97-105 | 2QBE | 154-168/193-211 | |
| U4 snRNA | 1 | F:28-34/42-45 | 2OZB | 30-36/46-49 | |
| SRP RNA | 2 | A:181-187/212-216 | 2J37 | 254-261/286-290 | |
| | | B:190-194/205-209 | 1L9A | 264-268/279-283 | |
| TPP riboswitch | 1 | X:59-63/76-80 | 2GDI | 272-277/290-295 | |
| SAM riboswitch | 1 | A:17-21/31-38 | 2GIS | 18-22/35-42 | |
| Purine riboswitch | 1 | X:22-25/45-52 | 1Y26 | 28-31/52-60 | |
| Bact. SRP RNA | 1 | A:14-18/29-33 | 1CQ5 | 49-53/65-69 | |
| Bact. SSU rRNA | 16 | A:1124-1132/1142-1149 | 1XNQ | 1241-1249/1270-1278 | |
| | | A:147-153/168-175 | 1N36 | 160-166/181-188 | |
| | | A:1246-1253/1284-1291 | 2UXD | 1377-1384/1415-1423 | |
| | | A:1303-1307/1330-1334 | 1HNW | 1435-1439/1462-1466 | |
| | | A:1384-1387/1475-1479 | 2HHH | 1534-1537/1646-1650 | |
| | | A:1429-1436/1465-1471 | 2VHO | 1562-1569/1614-1620 | |
| | | A:242-247/277-284 | 2QPO | 298-303/334-343 | |
| | | A:409-417/426-433 | 1VOV | 468-476/487-494 | |
| | | A:446-455/477-488 | 1N36 | 507-516/535-564 | |
| | | A:502-512/539-543 | 2B64 | 579-589/616-620 | |
| | | A:515-522/527-536 | 2QBF | 592-599/604-613 | |
| | | A:682-688/699-708 | 2GY9 | 761-767/778-788 | |
| | | A:63-69/99-103 | 2HGR | 63-70/106-113 | |
| | | A:779-783/799-803 | 2B9M | 861-865/881-885 | |
| | | A:826-829/857-874 | 1HRO | 908-911/942-959 | |
| | | A:887-894/905-910 | 2QB9 | 972-979/990-995 | |
| PK-G12 23S rRNA | 1 | B:2295-2299/2317-2337 | 3BBX | 14-18/36-68 | |
| Arch. SRP RNA | 1 | B:193-199/208-214 | 1QZW | 254-260/269-275 | |
| 5S rRNA | 1 | 9:33-49 | 2GYC | 73-105 | |
| tRNA | 1 | C:912-926 | 1WZ2 | 15-31 | |
| Purine riboswitch | 1 | X:31-39 | 1Y26 | 37-46 | |
| Bact. SSU rRNA | 4 | A:507-524 | 1IBM | 584-601 | |
| | | A:320-333 | 2J02 | 379-392 | |
| | | A:341-348 | 1N36 | 400-407 | |
| | | A:689-698 | 1VS7 | 770-779 | |
The Table presents Rfam families and the number of 3D modules for each family after merging the modules by family and position. The ten families at the top represent internal loop modules; the four families below represent hairpin modules. For each merged cluster, a representative, namely, the model with maximal , is shown. Columns three and four give the chain and residue numbers of the PDB sequence and the PDB ID from which the representative was extracted. ‘Alignment pos.’ indicates the position of the module in the Rfam alignment with the full sequence aligned. The last column shows how well the representative fits into the consensus secondary structure of Rfam.
****means the module is located in a single-stranded region, i.e. it fits very well in the consensus secondary structure.
***indicates a fairly good fit, overlapping no more than two paired bases of the consensus structure.
**denotes representatives that fit less well because the paired bases of the module overlap between three and five base pairs of stem regions of the consensus structure.
*implies that more than half of the paired bases match an already paired base of the consensus structure.
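The four fit categories above can be read as a simple decision rule. The following is a minimal sketch of one possible reading, not the pipeline's own code: the function name `fit_stars`, its inputs (sets of alignment columns), and the choice to apply the rules from best to worst fit are all assumptions made for illustration. It counts overlapping paired bases rather than base pairs.

```python
def fit_stars(module_paired, consensus_paired):
    """Rate how a module fits the consensus structure (illustrative only).

    Hypothetical inputs, both iterables of alignment columns:
      module_paired    - columns whose bases are paired inside the module
      consensus_paired - columns that are paired in the Rfam consensus
    Rules are checked from best to worst fit (an assumption; the footnotes
    do not state an order).
    """
    module_cols = set(module_paired)
    overlap = len(module_cols & set(consensus_paired))

    if overlap == 0:
        return "****"   # module lies entirely in a single-stranded region
    if overlap <= 2:
        return "***"    # overlaps no more than two paired bases
    if overlap <= 5:
        return "**"     # overlaps three to five paired bases
    if overlap > len(module_cols) / 2:
        return "*"      # more than half of the module's paired bases overlap
    return None         # case not covered by the footnotes
```

For example, `fit_stars({1, 2, 3}, {10, 11})` yields `"****"` because no module column is paired in the consensus.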
Figure 2. The Figure shows the absolute score distributions (‘A’) and density plots (‘B’) for the FR3D model of the U4 spliceosomal RNA kink-turn with structure (…(((&)))) on and . The coloured right tails show 1- , i.e. all values at or above the 80% quantile values (8.7 and 20.5, respectively), of each distribution. and show the mean of each coloured region and = 9.7.
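The caption describes taking the right tail of each score distribution at a quantile and comparing the means of the two tails. A minimal sketch of that calculation follows. The function names, the nearest-rank quantile definition, and the reading that the reported value is the difference between the two tail means are assumptions, since the symbols in the caption did not survive extraction.

```python
import math


def quantile_nearest_rank(values, p):
    """Return the p-quantile of values using the nearest-rank definition."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered)))  # 1-based rank
    return ordered[rank - 1]


def tail_mean(values, p=0.8):
    """Mean of all values at or above the p-quantile (the right tail)."""
    threshold = quantile_nearest_rank(values, p)
    tail = [v for v in values if v >= threshold]
    return sum(tail) / len(tail)


def tail_mean_difference(scores_a, scores_b, p=0.8):
    """Difference between the right-tail means of two score distributions."""
    return tail_mean(scores_a, p) - tail_mean(scores_b, p)
```

Both score lists must be non-empty. Other quantile definitions, for example with interpolation, would give slightly different thresholds on small samples.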
Minimal and maximal for different quantile values
| P | Min. | Max. | No. of models | % |
|---|---|---|---|---|
| Internal loop modules | | | | |
| 0.80 | –7.23 | 33.66 | 977 | 49.3 |
| 0.85 | –9.57 | 33.76 | 958 | 48.3 |
| 0.90 | –11.99 | 35.32 | 832 | 42.0 |
| 0.95 | –11.99 | 35.45 | 572 | 28.9 |
Minimal and maximal for different quantile values P as well as the number of models with and the percentage of 1982 models.
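The percentage column can be checked directly from the model counts and the total of 1982 models. This short sketch uses the counts from the table; the variable and function names are illustrative.

```python
TOTAL_MODELS = 1982  # total number of models stated in the table note

# Number of models retained at each quantile value P (from the table).
counts = {0.80: 977, 0.85: 958, 0.90: 832, 0.95: 572}


def percent_of_total(count, total=TOTAL_MODELS):
    """Percentage of the total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


percentages = {p: percent_of_total(n) for p, n in counts.items()}
```

The resulting values agree with the last column of the table.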
Figure 3. Rfam alignment with mapped modules. Two modules (Module 1 and Module 2) with different structures map onto the Rfam SRP family (RF00017) at position 270–274/285–289. Highlighted regions below the alignment show the module base pairs. ‘’ are single base pairs of canonical or non-canonical pairing type, whereas ‘’ denote bases pairing twice, i.e. to two nucleotides of type ‘’ or ‘’ [see (34)]. Rfam sec. struc. is the consensus secondary structure of Rfam. ‘’ are used for ‘internal’ helices enclosing a multi-furcation of all terminal stems, ‘’ show simple terminal stems, ‘’ and ‘’ denote unpaired bases of a hairpin loop and a bulge loop, respectively, and ‘.’ are insertions relative to a known structure. The histogram below shows the level of conservation of the alignment bases: the higher a bar, the more conserved the column. The modules fit very well in the consensus secondary structure. Covariation information for model training can be obtained for four columns (270, 271, 285 and 290). The other columns are highly conserved.