Corinna Theis, Craig L Zirbel, Christian Höner Zu Siederdissen, Christian Anthon, Ivo L Hofacker, Henrik Nielsen, Jan Gorodkin.
Abstract
Recent experimental and computational progress has revealed a large potential for RNA structure in the genome. This has been driven by computational strategies that exploit multiple genomes of related organisms to identify common sequences and secondary structures. However, these computational approaches have two main challenges: they are computationally expensive and they have a relatively high false discovery rate (FDR). Simultaneously, RNA 3D structure analysis has revealed modules composed of non-canonical base pairs which occur in non-homologous positions, apparently by independent evolution. These modules can, for example, occur inside structural elements which in RNA 2D predictions appear as internal loops. Hence one question is if the use of such RNA 3D information can improve the prediction accuracy of RNA secondary structure at a genome-wide level. Here, we use RNAz in combination with 3D module prediction tools and apply them on a 13-way vertebrate sequence-based alignment. We find that RNA 3D modules predicted by metaRNAmodules and JAR3D are significantly enriched in the screened windows compared to their shuffled counterparts. The initially estimated FDR of 47.0% is lowered to below 25% when certain 3D module predictions are present in the window of the 2D prediction. We discuss the implications and prospects for further development of computational strategies for detection of RNA 2D structure in genomic sequence.
Year: 2015 PMID: 26509713 PMCID: PMC4624896 DOI: 10.1371/journal.pone.0139900
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Graphical outline of the approach.
The 13-way alignment is sliced into windows of size 40–120 nts. The windows are shuffled by SISSIz, and both the original and shuffled data are scanned by RNAz for thermodynamically stable and evolutionarily conserved candidates (p-score > 0.9). For those windows the false discovery rate (FDR) is estimated. Both the original and shuffled windows are also scanned by metaRNAmodules (mRm) and JAR3D for reliable 3D module predictions. The windows with and without 3D module predictions are counted for a Fisher’s exact test (FET). Windows with 3D module predictions are also used to estimate the FDR.
Fig 2. Details of the genome-wide scan.
(a) Windows (boxed) with the H. sapiens sequence as reference are screened with RNAz for (b) structural motifs. The metaRNAmodules models are applied on each sequence of the window (c) whereas the JAR3D models are applied on loop regions of the RNAz consensus structure (d). In both cases, predictions of different models can overlap. The same procedure is used to scan the shuffled data.
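To make the loop-extraction step in panel (d) concrete, here is a minimal sketch of how hairpin and internal loops could be located in a consensus structure given in dot-bracket notation. It is not the authors' implementation: the function names `pair_table` and `find_loops` are hypothetical, positions are 0-based, and the sketch assumes a pseudoknot-free structure that uses only `(`, `)` and `.`.

```python
def pair_table(structure):
    """Map each paired position to its partner in a dot-bracket string."""
    stack, pairs = [], {}
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')'")
            opener = stack.pop()
            pairs[opener] = pos
            pairs[pos] = opener
    if stack:
        raise ValueError("unbalanced structure: extra '('")
    return pairs


def find_loops(structure):
    """Return (hairpins, internal_loops) as tuples of 0-based positions.

    A hairpin is reported as its closing pair (i, j).
    An internal loop is reported as (i, k, l, j): outer pair (i, j),
    inner pair (k, l), with unpaired bases on both strands.
    """
    pairs = pair_table(structure)
    hairpins, internal_loops = [], []
    for i, j in sorted((i, j) for i, j in pairs.items() if i < j):
        # Collect the base pairs directly enclosed by (i, j).
        enclosed = []
        pos = i + 1
        while pos < j:
            if pos in pairs:
                enclosed.append((pos, pairs[pos]))
                pos = pairs[pos] + 1  # skip over the enclosed helix
            else:
                pos += 1
        if not enclosed:
            hairpins.append((i, j))
        elif len(enclosed) == 1:
            k, l = enclosed[0]
            # Require unpaired bases on both sides (excludes stacks and bulges).
            if k - i - 1 > 0 and j - l - 1 > 0:
                internal_loops.append((i, k, l, j))
    return hairpins, internal_loops


hairpins, internal_loops = find_loops("((..((...))..))")
```

For the toy structure above, the sketch reports one hairpin closed by positions 5 and 9 and one internal loop between the pairs (1, 13) and (4, 10). Loop sequences extracted this way would then be the input to a loop-level scoring tool.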
Contingency table for mRm model RF00177_235_2HHH_1380_1383_1471_1475 showing the number of original (second column) and shuffled (third column) windows with (“3D module+”) and without (“3D module–”) the module.
Furthermore, the respective fold change (the ratio of 3D module– to 3D module+ windows, and the ratio of original to shuffled windows with and without 3D modules) and the sum (“Total”) of each row and column are shown. The adjusted p-value of the FET equals 9.8 × 10⁻⁴ with an odds ratio of (227/111833)/(62/53326) = 1.746.
| FET | original | shuffled | Fold change | Total |
|---|---|---|---|---|
| 3D module+ | 227 | 62 | 3.7 | 289 |
| 3D module– | 111833 | 53326 | 2.1 | 165159 |
| Fold change | 492.7 | 860.1 | | |
| Total | 112060 | 53388 | | 165448 |
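The derived quantities in this table can be re-computed from the four cell counts. The sketch below is an illustration, not the authors' code: the function names are hypothetical, and the one-sided Fisher's exact test it computes is unadjusted, so it is not expected to reproduce the adjusted p-value quoted in the caption.

```python
from math import exp, lgamma


def odds_ratio(a, b, c, d):
    """Odds ratio as written in the caption: (a/c) / (b/d)."""
    return (a / c) / (b / d)


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_greater(a, b, c, d):
    """Unadjusted one-sided Fisher's exact test.

    Returns P(X >= a) under the hypergeometric null for the table
    [[a, b], [c, d]], with row and column totals held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    log_denom = log_comb(n, row1)
    return sum(
        exp(log_comb(col1, k) + log_comb(n - col1, row1 - k) - log_denom)
        for k in range(a, min(row1, col1) + 1)
    )


# Cell counts taken from the table above.
a, b = 227, 62          # 3D module+ : original, shuffled
c, d = 111833, 53326    # 3D module- : original, shuffled

fold_changes = {
    "module+ (original/shuffled)": round(a / b, 1),   # 3.7
    "module- (original/shuffled)": round(c / d, 1),   # 2.1
    "original (module-/module+)": round(c / a, 1),    # 492.7
    "shuffled (module-/module+)": round(d / b, 1),    # 860.1
}
ratio = round(odds_ratio(a, b, c, d), 3)              # 1.746
p_unadjusted = fisher_greater(a, b, c, d)
grand_total = a + b + c + d                           # 165448
```

Working in log space with `lgamma` avoids overflow in the binomial coefficients, which are far too large to handle as floating-point numbers directly at these window counts.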
False discovery rates for RNAz candidates (p-score > 0.9, 0.25 ≤ GC content ≤ 0.75) and 3D module predictions.
Only 3D modules that are enriched and have an odds ratio ≥ 1.25 are taken into account. Furthermore, JAR3D internal loop (IL) modules are counted only if their average 3D sequence length is ≥ 9. Subscripts denote which method for module prediction is used. “win/win” shows the number of shuffled and original windows, in that order, that match a 3D module.
| FDR | win/win (shuffled/original) |
|---|---|
| 47.0% | |
| 34.1% | 815/2390 |
| 36.0% | 1224/3401 |
| 37.0% | 2020/5465 |
| 27.8% | 25/90 |
| 24.8% | 25/101 |
| 26.4% | 37/140 |
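The FDR values listed with window counts are numerically consistent with a simple ratio of shuffled to original windows matching a 3D module. The sketch below checks this. The formula is inferred from the numbers in the table rather than quoted from the source, and the function name `estimate_fdr` is hypothetical.

```python
def estimate_fdr(shuffled_windows, original_windows):
    """Estimate the FDR as hits in shuffled data over hits in original data."""
    if original_windows <= 0:
        raise ValueError("original_windows must be positive")
    return shuffled_windows / original_windows


# (shuffled, original) window counts copied from the table above.
window_counts = [
    (815, 2390),
    (1224, 3401),
    (2020, 5465),
    (25, 90),
    (25, 101),
    (37, 140),
]

fdr_percent = [f"{100 * estimate_fdr(s, o):.1f}%" for s, o in window_counts]
# fdr_percent == ['34.1%', '36.0%', '37.0%', '27.8%', '24.8%', '26.4%']
```

The small denominators in the last three rows (90 to 140 original windows) mean those estimates rest on few windows, which is worth keeping in mind when comparing them with the first three rows.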