Jan Jelínek1,2, David Hoksza1,3, Jan Hajič1, Jan Pešek1, Jan Drozen1, Tomáš Hladík1, Michal Klimpera1, Jiří Vohradský2, Josef Pánek2.
Abstract
The secondary structure of RNA molecules provides insights into the identity and function of RNAs. With RNAs readily sequenced, the question of their structural characterization is increasingly important. However, RNA structure is difficult to acquire: its experimental identification is technically very demanding, while computational prediction is not accurate enough, especially for large structures of long sequences. We address this situation with rPredictorDB, a predictive database of RNA secondary structures that aims to form a middle ground between the experimentally identified structures in PDB and the predicted consensus secondary structures in Rfam. The database contains individual secondary structures predicted with a template-based prediction tool for the homologs of RNA families that have at least one homolog with an experimentally solved structure. Experimentally identified structures serve as the structural templates, so the predictions are more reliable than the de novo predictions in Rfam. The sequences are downloaded from public resources. So far, rPredictorDB covers 7365 RNAs with their secondary structures. Plots of the secondary structures use the Traveler package, which gives a readable display of RNAs with long sequences and complex structures, such as ribosomal RNAs. The RNAs in the output of rPredictorDB are extensively annotated and can be viewed, browsed, searched and downloaded according to taxonomic, sequence and structure data. Additionally, the structure of user-provided sequences can be predicted using the templates stored in rPredictorDB.
Year: 2019 PMID: 31032840 PMCID: PMC6482342 DOI: 10.1093/database/baz047
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. rPredictorDB architecture.
RNA families included in rPredictorDB and their templates and sources. Names of RNAs are derived from names of PDB structures
| RNA family | Sequence source | Template | Length (nt) |
|---|---|---|---|
| 16S rRNA | | | 1542 |
| 18S rRNA | SILVA | | 1869 |
| 18S rRNA | | | 1995 |
| 5S rRNA | Rfam (RF00001) | | 120 |
| 5S rRNA | | | 120 |
| 5.8S rRNA | Rfam (RF00002) | | 169 |
| 6S | Rfam (RF00013) | | 184 |
| | | | 187 |
| 9S rRNA | Rfam (RF02545) | | 621 |
| Cobalamin riboswitch | Rfam (RF00174) | | 172 |
| C-DI-AMP riboswitch | Rfam (RF00379) | | 123 |
| CRPV-IRES | Rfam (RF00458) | Mammalian CRPV-IRES (PDB ID 6D9J) | 190 |
| CSFV IRES | Rfam (RF00209) | Viral CSFV IRES (PDB ID 4C4Q) | 233 |
| FMN riboswitch | Rfam (RF00050) | PDB ID 3F2Y^f | 112 |
| Fungi U3 | Rfam (RF01846) | | 333 |
| gcvB | ( | | 206 |
| GLMS ribozyme | Rfam (RF00234) | | 141 |
| Group I catalytic intron | Rfam (RF00028) | | 192 |
| Group II intron lariat | NCBI^a | | 418 |
| Group II intron lariat in post-catalytic state^c | NCBI^a | | 621 |
| IRES HCV | Rfam (RF00061) | | 257 |
| Lariat capping ribozyme | Rfam (RF01807) | | 188 |
| Lysine riboswitch | Rfam (RF00168) | | 161 |
| Mammalian CPEB3 ribozyme | Rfam (RF00622) | | 78 |
| M-box | Rfam (RF00380) | | 161 |
| micF | Rfam (RF00033) | | 95 |
| MLV encapsidation signal | Rfam (RF00374) | Viral MLV (PDB ID 1U6P) | 101 |
| ms1 | ( | | 304 |
| oxyS | Rfam (RF00035) | | 109 |
| PHI29 PROHEAD RNA | Rfam (RF00044) | Bacteriophage PHI29 (PDB ID 1FOQ) | 117 |
| RNaseP arch | Rfam (RF00373) | | 347 |
| RNaseP bact a | NCBI^a | | 347 |
| RNaseP bact b | Rfam (RF00011) | PDB ID 2A64^f | 414 |
| RNaseP nuc | Rfam (RF00009) | | 341 |
| ryhB | ( | | 90 |
| SAM I | Rfam (RF00162) | | 94 |
| spot42 | Rfam (RF00021) | | 119 |
(Continued)
Figure 2. Snapshots of the rPredictorDB input (a) and output (b) interface. The searched RNA is Sus scrofa 18S rRNA.
Figure 3. A snapshot of rPredictorDB output for C. porcelanus TR RNA. Panels (a) and (c) show the secondary structure of the template (H. sapiens TR RNA) displayed by RNAplot and Traveler, respectively. Panels (b) and (d) show the secondary structure of C. porcelanus TR predicted using H. sapiens TR RNA as a template, displayed by RNAplot and Traveler, respectively.
Figure 4. A snapshot of rPredictorDB output for Hepacivirus C IRES HCV RNA. Panels (a) and (c) show the secondary structure of the template (Hepacivirus C IRES HCV RNA, PDB ID 5A2Q) displayed by RNAplot and Traveler, respectively. Panels (b) and (d) show the secondary structure of Hepacivirus C IRES HCV RNA with accession no. U23386, predicted using Hepacivirus C IRES HCV RNA (PDB ID 5A2Q) as a template, displayed by RNAplot and Traveler, respectively.
Continued
| RNA family | Sequence source | Template | Length (nt) |
|---|---|---|---|
| SRP bact small | Rfam (RF00169) | | 114 |
| SRP bact large | Rfam (RF01854) | | 266 |
| SRP Metazoa | NCBI^a | | 301 |
| Tetrahymena ribozyme | NCBI^a | PDB ID 1X8W^f | 247 |
| | Rfam (RF00025) | | 159 |
| THF riboswitch | Rfam (RF01831) | PDB ID 4LVV^f | 89 |
| tmRNA | Rfam (RF00023) | | 377 |
| TPP riboswitch | NCBI^a | | 83 |
| tRNA Gly eukaryotic | Rfam (RF00005) | | 74 |
| tRNA Gly bacterial | | | 75 |
| u2 | Rfam (RF00004) | | 188 |
| u1 | Rfam (RF00003) | | 163 |
| u4 | Rfam (RF00015) | | 144 |
| u5 | Rfam (RF00020) | | 116 |
| u6 | Rfam (RF00026) | | 112 |
| Vertebrate TR | Rfam (RF00024) | | 451 |
| Yeast u1 | Rfam (RF00488) | | 565 |
^a The sequences were obtained by an NCBI BLAST search with the 'somewhat similar sequences' parameters against the nr database, using query sequences taken from PDB. The reason was that the sequences in the appropriate Rfam family seemed incompatible with the PDB structure: they were either short fragments or had very low sequence similarity to the PDB sequence.
^b Sequences and/or the template structure were copied from the paper publishing the template structure.
^c This family contains several very short fragments that produce substructures hard to match with the template structure. Nevertheless, we included them in rPredictorDB because they had significant BLAST e-values (< 1 × 10^−12) and because they are a good example of RNAs with extremely fragmented sequences.
^d It is impossible to decide which template should be used based on taxonomy, as some bacteria, e.g. Firmicutes, contain 6S RNAs of both template types. Therefore, the template producing a structure with a better z-score is used for each 6S RNA.
^e The template is applied to sequences according to taxonomy, i.e. a eukaryotic template to eukaryotic sequences and a prokaryotic template to prokaryotic sequences.
^f Organism not described, or a synthetic expression system was used.
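
The two template-assignment rules described in the footnotes (by taxonomy where taxonomy is decisive, by z-score for 6S RNA) can be illustrated with a minimal Python sketch. It is not the rPredictorDB implementation: the `Template` class, the function names, the domain labels and the `higher_is_better` flag are all hypothetical. In particular, the source does not say whether a "better" z-score is the higher or the lower one, so the direction is left as a parameter.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Template:
    """Hypothetical record for a structural template."""
    name: str
    # Taxonomic group the template applies to; None when taxonomy
    # cannot decide (as the footnote describes for 6S RNA).
    domain: Optional[str] = None


def pick_by_taxonomy(templates: Sequence[Template], sequence_domain: str) -> Template:
    """Return the template whose domain matches the sequence's domain."""
    for template in templates:
        if template.domain == sequence_domain:
            return template
    raise LookupError(f"no template for domain {sequence_domain!r}")


def pick_by_zscore(
    templates: Sequence[Template],
    zscores: Dict[str, float],
    higher_is_better: bool = True,  # assumption: direction is not given in the source
) -> Template:
    """Return the template whose predicted structure has the better z-score.

    `zscores` maps a template name to the z-score of the structure
    predicted for one sequence with that template.
    """
    choose = max if higher_is_better else min
    return choose(templates, key=lambda template: zscores[template.name])
```

For a family such as tRNA Gly, `pick_by_taxonomy` would be called with the sequence's taxonomic group; for 6S RNA, both templates would first be used for prediction and `pick_by_zscore` would then keep the better-scoring one.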