Yohei Okada, Yutaka Saito, Kengo Sato, Yasubumi Sakakibara.
Abstract
Identifying non-protein-coding RNAs (ncRNAs) in genomes is a crucial task in both molecular cell biology and bioinformatics. Secondary structure is a key feature in ncRNA analysis because the biological functions of ncRNAs are closely tied to their secondary structures. Although the minimum free energy (MFE) structure of an RNA sequence is regarded as its most stable structure, MFE alone is not an appropriate measure for identifying ncRNAs because the free energy is heavily biased by nucleotide composition. Several alternative measures have therefore been proposed, such as the structure conservation index (SCI) and the base pair distance (BPD), both of which rely on MFE structures. However, these measures are not suitable for identifying ncRNAs in some settings, including genome-wide searches, where they incur a high false discovery rate. In this study, we propose improved measures based on SCI and BPD that apply generalized centroid estimators to gain robustness against low-quality multiple alignments. Our experiments show that the proposed methods achieve higher accuracy than the original SCI and BPD, not only on human-curated structural alignments but also on low-quality alignments produced by CLUSTAL W. Furthermore, the centroid-based SCI on CLUSTAL W alignments is more accurate than, or comparable to, the original SCI on structural alignments generated with RAF, a high-quality structural aligner that requires about twice the computation time on average. We conclude that our methods are better suited than the original SCI and BPD to genome-wide alignments, which are of low quality from the standpoint of secondary structure.
Keywords: centroid estimators; non-coding RNAs; structure conservation index
Year: 2011 PMID: 22303350 PMCID: PMC3268607 DOI: 10.3389/fgene.2011.00054
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
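The two baseline measures named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions: SCI as the consensus folding energy of an alignment divided by the mean MFE of its individual sequences, and base pair distance as the number of base pairs found in exactly one of two structures. The function names and the input values are hypothetical; the energies and dot-bracket structures would in practice come from a folding program, and the centroid-based variants proposed in the paper are not reproduced here.

```python
from statistics import mean


def structure_conservation_index(consensus_energy, single_energies):
    """SCI = consensus folding energy / mean single-sequence MFE (assumed definition)."""
    avg = mean(single_energies)
    if avg == 0:
        raise ValueError("mean single-sequence MFE is zero; SCI is undefined")
    return consensus_energy / avg


def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def base_pair_distance(struct_a, struct_b):
    """Count base pairs present in exactly one of two equal-length structures."""
    if len(struct_a) != len(struct_b):
        raise ValueError("structures must have the same length")
    return len(base_pairs(struct_a) ^ base_pairs(struct_b))


# Hypothetical energies in kcal/mol, for illustration only.
sci = structure_conservation_index(-20.0, [-25.0, -15.0, -20.0])
bpd = base_pair_distance("((..))", "(....)")
```

Under these definitions, an SCI near 1 means the consensus structure is about as stable as the individual structures, and a small base pair distance means the compared structures largely agree.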
Figure 1. The distribution of the reference alignments over the bins of the normalized Shannon entropy. The density of each bar indicates the number of sequences in the alignments.
Figure 2. The discrimination capacity of C-SCI, C-BPD, SCI, and BPD in AUC on the reference alignments and the CLUSTAL W alignments for each bin of normalized Shannon entropy.
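The entropy binning used in Figures 1 and 2 can be sketched as follows. This record does not give the exact formula, so the code assumes one common definition: the Shannon entropy of each alignment column (gap characters counted as ordinary symbols), averaged over the alignment length. The function names and toy alignments are hypothetical.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    total = len(column)
    counts = Counter(column)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def mean_column_entropy(alignment):
    """Average column entropy of an alignment given as equal-length strings.

    Assumption: normalization is by alignment length, and '-' is treated
    as a regular symbol. The paper's exact normalization may differ.
    """
    if not alignment or not alignment[0]:
        raise ValueError("alignment must contain at least one non-empty row")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    return sum(column_entropy(col) for col in zip(*alignment)) / length


# Identical rows carry no variation; one differing column out of two raises the mean.
conserved = mean_column_entropy(["ACGU", "ACGU"])
variable = mean_column_entropy(["AC", "AG"])
```

Low values correspond to highly conserved alignments and high values to alignments with many substitutions or gaps.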
Area under the ROC curve of all the methods.
| Method | Reference | RAF | CLUSTAL W |
|---|---|---|---|
| C-SCI | 0.937 | 0.942 | 0.837 |
| C-BPD (consensus) | 0.890 | 0.896 | 0.805 |
| C-BPD (pairwise) | 0.744 | 0.747 | 0.655 |
| SCI | 0.795 | 0.776 | 0.632 |
| BPD (consensus) | 0.756 | 0.755 | 0.672 |
| BPD (pairwise) | 0.711 | 0.713 | 0.621 |
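For readers unfamiliar with the metric in the table above, the area under the ROC curve can be computed directly from scores as the probability that a randomly chosen positive outranks a randomly chosen negative, with ties counted as one half. The sketch below uses that rank-based formulation; the function name and the scores are hypothetical and are not data from the study. It assumes higher scores indicate ncRNA, so a distance-like measure such as BPD, where smaller values presumably indicate stronger conservation, would need to be negated first.

```python
def auc(positive_scores, negative_scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. Assumes higher scores mean 'more likely positive'.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical scores for true ncRNA alignments and shuffled negative controls.
example_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.2])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of positives from negatives.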
Calculation time of each measurement.
| Method | RAF: Time | RAF: Total time | CLUSTAL W: Time | CLUSTAL W: Total time |
|---|---|---|---|---|
| C-SCI | 0.965 ± 1.65 | 3.05 ± 7.35 | 0.979 ± 1.67 | 1.01 ± 1.71 |
| C-BPD | 0.948 ± 1.63 | 3.03 ± 7.33 | 0.961 ± 1.65 | 0.989 ± 1.69 |
| C-BPD | 0.267 ± 0.444 | 2.35 ± 6.67 | 0.268 ± 0.444 | 0.295 ± 0.477 |
| SCI | 0.157 ± 0.270 | 2.24 ± 6.56 | 0.159 ± 0.273 | 0.187 ± 0.306 |
| BPD | 0.182 ± 0.881 | 2.27 ± 6.61 | 0.159 ± 0.274 | 0.186 ± 0.307 |
| BPD | 0.095 ± 0.158 | 2.18 ± 6.51 | 0.111 ± 0.809 | 0.138 ± 0.817 |
Calculation times are given in seconds. Time: elapsed time for calculating the measurement only. Total time: total elapsed time for aligning the sequences and calculating the measurement. All experiments were executed on a Linux machine with an AMD Opteron 2200SE (2.8 GHz).
Figure 3. Elapsed time of calculating SCI following RAF alignments and C-SCI following CLUSTAL W alignments for the alignments of five sequences with respect to the length of sequences. All the experiments were executed on a Linux machine with AMD Opteron 2200SE (2.8 GHz).
Figure 4. An example of the results on CLUSTAL W alignments with low entropy. Predicted common secondary structures of five sequences in the HIV_PBS family are shown. (A) The common secondary structure predicted by CentroidAlifold with γ = 1.0. (B) The common secondary structure predicted by RNAalifold. (C) The common secondary structure predicted by CentroidAlifold with γ = 1.0 for one of the negative control alignments generated by SISSIz. (D) The common secondary structure predicted by RNAalifold for one of the negative control alignments.