Yingfeng Wang, Amir Manzour, Pooya Shareghi, Timothy I Shaw, Ying-Wai Li, Russell L Malmberg, Liming Cai.
Abstract
BACKGROUND: The computational identification of RNAs in genomic sequences requires the identification of signals of RNA sequences. Shannon base pairing entropy is an indicator of RNA secondary structure fold certainty in the detection of structural, non-coding RNAs (ncRNAs). Under the Boltzmann ensemble of secondary structures, the probability of a base pair is estimated from its frequency across all the alternative equilibrium structures. However, such an entropy has yet to deliver the desired performance for distinguishing ncRNAs from random sequences. Developing novel methods to improve the entropy measure performance may result in more effective ncRNA gene finding based on structure detection.
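As an illustration of the entropy measure described above, the following minimal sketch computes a Shannon base pairing entropy from a table of base pair probabilities. The function name, the dictionary input format, the use of the natural logarithm, and the normalization by sequence length are assumptions made for this example; the abstract itself does not fix these details.

```python
import math


def base_pairing_entropy(pair_probs, seq_len):
    """Shannon base pairing entropy, normalized by sequence length.

    pair_probs: dict mapping (i, j) with i < j to the probability p_ij that
                positions i and j pair (hypothetical input format).
    seq_len:    length n of the sequence.

    Assumed definition: -(1/n) * sum over pairs of p_ij * ln(p_ij).
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    total = 0.0
    for p in pair_probs.values():
        if p > 0.0:  # the limit of p * ln(p) as p -> 0 is 0
            total -= p * math.log(p)
    return total / seq_len
```

A pair that is certain (probability 1) contributes nothing, so a sequence whose ensemble is dominated by one structure gets a low entropy, while pairs with intermediate probabilities raise it.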
Year: 2012 PMID: 22537005 PMCID: PMC3358654 DOI: 10.1186/1471-2105-13-S5-S1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Comparisons of averaged Z-scores of Shannon base pairing entropies computed by NUPACK, RNAfold, and TRIPLE for each of the 13 ncRNA datasets downloaded from [18].
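A Z-score of this kind compares the entropy of a real sequence with the entropies of its shuffled counterparts. The sketch below shows one way to compute it. The function name is hypothetical, and the sign convention (positive when the real sequence has lower entropy than the shuffled background) is an assumption chosen so that larger values indicate a more certain fold.

```python
import statistics


def entropy_z_score(real_entropy, shuffled_entropies):
    """Z-score of a real sequence's entropy against a shuffled background.

    Assumed sign convention: Z > 0 means the real sequence has lower
    entropy than the mean of the shuffled sequences.
    """
    mu = statistics.mean(shuffled_entropies)
    sigma = statistics.stdev(shuffled_entropies)  # sample standard deviation
    if sigma == 0:
        raise ValueError("shuffled entropies have zero variance")
    return (mu - real_entropy) / sigma
```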
Comparisons of TRIPLE and NUPACK by the percentages of sequences falling in each category of a Z-score range.
| ncRNA | Method | Z ≥ 2.0 (%) | Z ≥ 1.5 (%) | Z ≥ 1.0 (%) | Z ≥ 0.5 (%) |
|---|---|---|---|---|---|
| Hh1 | TRIPLE | 26.67 | 40.00 | 53.33 | 73.33 |
| | NUPACK | 0.00 | 0.00 | 20.00 | 53.33 |
| sno_guide | TRIPLE | 14.43 | 24.45 | 38.39 | 58.19 |
| | NUPACK | 0.73 | 8.80 | 27.63 | 45.23 |
| sn_splice | TRIPLE | 40.51 | 50.63 | 60.76 | 65.82 |
| | NUPACK | 3.80 | 18.99 | 48.10 | 70.89 |
| SRP | TRIPLE | 35.06 | 44.16 | 59.74 | 67.53 |
| | NUPACK | 3.90 | 36.36 | 72.73 | 85.71 |
| tRNA | TRIPLE | 29.56 | 51.33 | 70.97 | 86.02 |
| | NUPACK | 0.00 | 2.30 | 12.04 | 32.21 |
| intron | TRIPLE | 60.75 | 69.16 | 78.50 | 85.98 |
| | NUPACK | 1.87 | 19.63 | 61.68 | 85.05 |
| riboswitch | TRIPLE | 34.64 | 48.37 | 60.13 | 78.43 |
| | NUPACK | 1.96 | 18.95 | 45.75 | 69.28 |
| miRNA | TRIPLE | 81.48 | 88.89 | 94.07 | 97.04 |
| | NUPACK | 0.00 | 12.59 | 68.15 | 97.78 |
| telomerase | TRIPLE | 29.41 | 35.29 | 41.18 | 58.82 |
| | NUPACK | 11.76 | 17.65 | 35.29 | 47.06 |
| RNase | TRIPLE | 50.70 | 70.42 | 81.69 | 92.25 |
| | NUPACK | 5.63 | 23.94 | 48.59 | 72.54 |
| regulatory | TRIPLE | 22.41 | 24.14 | 32.76 | 56.90 |
| | NUPACK | 1.72 | 3.45 | 18.97 | 51.72 |
| tmRNA | TRIPLE | 18.64 | 32.20 | 45.76 | 55.93 |
| | NUPACK | 1.69 | 8.47 | 27.12 | 37.29 |
| rRNA | TRIPLE | 36.16 | 50.62 | 70.87 | 83.06 |
| | NUPACK | 4.75 | 21.07 | 42.56 | 61.16 |
Random sequences were obtained with di-nucleotide shuffling of the real ncRNA sequences.
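Di-nucleotide shuffling produces a random sequence with exactly the same counts of adjacent nucleotide pairs as the original. The source does not say which implementation was used; the sketch below follows the general Eulerian-walk idea associated with Altschul and Erickson, using rejection sampling for the "last edge" of each nucleotide. All names are hypothetical.

```python
import random
from collections import Counter


def _reaches(v, target, last_edge):
    """Follow last edges from v and report whether target is reached."""
    seen = set()
    while v != target and v not in seen:
        seen.add(v)
        v = last_edge[v]
    return v == target


def dinucleotide_shuffle(seq, rng=None):
    """Shuffle seq while preserving its exact di-nucleotide counts."""
    rng = rng or random.Random()
    if len(seq) < 3:
        return seq
    last = seq[-1]
    # successors[a] lists every nucleotide that follows an occurrence of a.
    successors = {}
    for a, b in zip(seq, seq[1:]):
        successors.setdefault(a, []).append(b)
    # Pick a final outgoing edge for every nucleotide except the last one,
    # retrying until those edges all lead to the final nucleotide.
    while True:
        last_edge = {v: rng.choice(s) for v, s in successors.items() if v != last}
        if all(_reaches(v, last, last_edge) for v in last_edge):
            break
    # Shuffle the remaining edges and keep the chosen last edge at the end.
    order = {}
    for v, succ in successors.items():
        rest = list(succ)
        if v != last:
            rest.remove(last_edge[v])
            rng.shuffle(rest)
            rest.append(last_edge[v])
        else:
            rng.shuffle(rest)
        order[v] = rest
    # Walk the edges in order, starting from the original first nucleotide.
    out = [seq[0]]
    used = Counter()
    for _ in range(len(seq) - 1):
        cur = out[-1]
        out.append(order[cur][used[cur]])
        used[cur] += 1
    return "".join(out)
```

The shuffled sequence keeps the first and last nucleotides of the input, which is a property of this kind of walk rather than an extra constraint.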
Comparisons of TRIPLE and NUPACK by the percentages of sequences falling in each category of a Z-score range.
| ncRNA | Method | Z ≥ 2.0 (%) | Z ≥ 1.5 (%) | Z ≥ 1.0 (%) | Z ≥ 0.5 (%) |
|---|---|---|---|---|---|
| Hh1 | TRIPLE | 6.67 | 33.33 | 53.33 | 73.33 |
| | NUPACK | 0.00 | 0.00 | 20.00 | 60.00 |
| sno_guide | TRIPLE | 14.91 | 25.43 | 41.10 | 57.95 |
| | NUPACK | 0.98 | 9.05 | 28.85 | 45.72 |
| sn_splice | TRIPLE | 31.65 | 43.04 | 56.96 | 65.82 |
| | NUPACK | 5.06 | 26.58 | 51.90 | 69.62 |
| SRP | TRIPLE | 32.47 | 45.45 | 55.84 | 68.83 |
| | NUPACK | 3.90 | 37.66 | 72.73 | 87.01 |
| tRNA | TRIPLE | 24.07 | 45.31 | 64.25 | 79.47 |
| | NUPACK | 0.00 | 2.12 | 14.69 | 33.45 |
| intron | TRIPLE | 59.81 | 68.22 | 74.77 | 84.11 |
| | NUPACK | 1.87 | 22.43 | 66.36 | 85.98 |
| riboswitch | TRIPLE | 32.03 | 44.44 | 56.86 | 71.90 |
| | NUPACK | 1.96 | 21.57 | 46.41 | 69.28 |
| miRNA | TRIPLE | 75.56 | 81.48 | 90.37 | 93.33 |
| | NUPACK | 0.00 | 9.63 | 70.37 | 98.52 |
| telomerase | TRIPLE | 23.53 | 29.41 | 41.18 | 58.82 |
| | NUPACK | 5.88 | 29.41 | 29.41 | 52.94 |
| RNase | TRIPLE | 38.03 | 56.34 | 72.54 | 87.32 |
| | NUPACK | 10.56 | 26.06 | 52.11 | 76.06 |
| regulatory | TRIPLE | 18.97 | 25.86 | 31.03 | 51.72 |
| | NUPACK | 0.00 | 1.72 | 24.14 | 50.00 |
| tmRNA | TRIPLE | 15.25 | 27.12 | 38.98 | 57.63 |
| | NUPACK | 3.39 | 6.78 | 27.12 | 42.37 |
| rRNA | TRIPLE | 34.09 | 47.31 | 64.88 | 79.96 |
| | NUPACK | 6.40 | 21.69 | 43.19 | 60.74 |
Random sequences were obtained with single nucleotide shuffling of the real ncRNA sequences.
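Single nucleotide shuffling is the simpler of the two background models: it permutes the positions of the sequence, so only the nucleotide composition is preserved. A minimal sketch, with a hypothetical function name and an optional seeded generator for reproducibility:

```python
import random


def mononucleotide_shuffle(seq, rng=None):
    """Return a random permutation of seq (composition is preserved)."""
    rng = rng or random.Random()
    chars = list(seq)
    rng.shuffle(chars)  # in-place Fisher-Yates shuffle
    return "".join(chars)
```

Unlike di-nucleotide shuffling, this does not keep the counts of adjacent nucleotide pairs, so the two backgrounds can give different Z-scores for the same sequence.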
Comparisons of TRIPLE and RNAfold by the percentages of sequences falling in each category of a Z-score range.
| ncRNA | Method | Z ≥ 2.0 (%) | Z ≥ 1.5 (%) | Z ≥ 1.0 (%) | Z ≥ 0.5 (%) |
|---|---|---|---|---|---|
| Hh1 | TRIPLE | 26.67 | 40.00 | 53.33 | 73.33 |
| | RNAfold | 0.00 | 0.00 | 20.00 | 53.33 |
| sno_guide | TRIPLE | 14.43 | 24.45 | 38.39 | 58.19 |
| | RNAfold | 1.71 | 7.82 | 23.96 | 43.03 |
| sn_splice | TRIPLE | 40.51 | 50.63 | 60.76 | 65.82 |
| | RNAfold | 6.33 | 21.52 | 54.43 | 69.62 |
| SRP | TRIPLE | 35.06 | 44.16 | 59.74 | 67.53 |
| | RNAfold | 5.19 | 24.68 | 58.44 | 71.43 |
| tRNA | TRIPLE | 29.56 | 51.33 | 70.97 | 86.02 |
| | RNAfold | 0.18 | 4.25 | 24.78 | 47.96 |
| intron | TRIPLE | 60.75 | 69.16 | 78.50 | 85.98 |
| | RNAfold | 2.80 | 17.76 | 60.75 | 84.11 |
| riboswitch | TRIPLE | 34.64 | 48.37 | 60.13 | 78.43 |
| | RNAfold | 0.65 | 17.65 | 47.06 | 70.59 |
| miRNA | TRIPLE | 81.48 | 88.89 | 94.07 | 97.04 |
| | RNAfold | 0.00 | 7.41 | 65.93 | 97.78 |
| telomerase | TRIPLE | 29.41 | 35.29 | 41.18 | 58.82 |
| | RNAfold | 0.00 | 23.53 | 41.18 | 58.82 |
| RNase | TRIPLE | 50.70 | 70.42 | 81.69 | 92.25 |
| | RNAfold | 1.41 | 12.68 | 34.51 | 59.15 |
| regulatory | TRIPLE | 22.41 | 24.14 | 32.76 | 56.90 |
| | RNAfold | 0.00 | 6.90 | 27.59 | 63.79 |
| tmRNA | TRIPLE | 18.64 | 32.20 | 45.76 | 55.93 |
| | RNAfold | 1.69 | 10.17 | 33.90 | 50.85 |
| rRNA | TRIPLE | 36.16 | 50.62 | 70.87 | 83.06 |
| | RNAfold | 1.45 | 15.70 | 35.33 | 56.82 |
Random sequences were obtained with di-nucleotide shuffling of the real ncRNA sequences.
Comparisons of TRIPLE and RNAfold by the percentages of sequences falling in each category of a Z-score range.
| ncRNA | Method | Z ≥ 2.0 (%) | Z ≥ 1.5 (%) | Z ≥ 1.0 (%) | Z ≥ 0.5 (%) |
|---|---|---|---|---|---|
| Hh1 | TRIPLE | 6.67 | 33.33 | 53.33 | 73.33 |
| | RNAfold | 0.00 | 0.00 | 20.00 | 53.33 |
| sno_guide | TRIPLE | 14.91 | 25.43 | 41.10 | 57.95 |
| | RNAfold | 1.47 | 7.33 | 24.21 | 44.01 |
| sn_splice | TRIPLE | 31.65 | 43.04 | 56.96 | 65.82 |
| | RNAfold | 6.33 | 24.05 | 53.16 | 68.35 |
| SRP | TRIPLE | 32.47 | 45.45 | 55.84 | 68.83 |
| | RNAfold | 5.19 | 29.87 | 59.74 | 77.92 |
| tRNA | TRIPLE | 24.07 | 45.31 | 64.25 | 79.47 |
| | RNAfold | 0.00 | 6.19 | 26.19 | 48.85 |
| intron | TRIPLE | 59.81 | 68.22 | 74.77 | 84.11 |
| | RNAfold | 1.87 | 16.82 | 58.88 | 85.98 |
| riboswitch | TRIPLE | 32.03 | 44.44 | 56.86 | 71.90 |
| | RNAfold | 1.31 | 20.92 | 49.67 | 71.24 |
| miRNA | TRIPLE | 75.56 | 81.48 | 90.37 | 93.33 |
| | RNAfold | 0.74 | 10.37 | 69.63 | 97.78 |
| telomerase | TRIPLE | 23.53 | 29.41 | 41.18 | 58.82 |
| | RNAfold | 5.88 | 17.65 | 35.29 | 58.82 |
| RNase | TRIPLE | 38.03 | 56.34 | 72.54 | 87.32 |
| | RNAfold | 1.41 | 15.49 | 35.92 | 61.27 |
| regulatory | TRIPLE | 18.97 | 25.86 | 31.03 | 51.72 |
| | RNAfold | 0.00 | 5.17 | 32.76 | 67.24 |
| tmRNA | TRIPLE | 15.25 | 27.12 | 38.98 | 57.63 |
| | RNAfold | 0.00 | 11.86 | 35.59 | 45.76 |
| rRNA | TRIPLE | 34.09 | 47.31 | 64.88 | 79.96 |
| | RNAfold | 1.86 | 17.98 | 37.60 | 57.64 |
Random sequences were obtained with single nucleotide shuffling of the real ncRNA sequences.
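The table columns are cumulative: a sequence with Z ≥ 2.0 is also counted under Z ≥ 1.5, Z ≥ 1.0, and Z ≥ 0.5. The sketch below shows how such percentages could be derived from a list of per-sequence Z-scores. The function name is hypothetical and the Z-scores in the usage example are invented for illustration, not taken from the study.

```python
def percent_at_or_above(z_scores, thresholds=(2.0, 1.5, 1.0, 0.5)):
    """Percentage of sequences whose Z-score meets each threshold."""
    if not z_scores:
        raise ValueError("z_scores must not be empty")
    n = len(z_scores)
    return {
        t: round(100.0 * sum(1 for z in z_scores if z >= t) / n, 2)
        for t in thresholds
    }


# Invented Z-scores for five sequences of one hypothetical dataset.
example = percent_at_or_above([2.5, 1.7, 1.2, 0.6, -0.3])
```

Because the thresholds are nested, the percentages in each row can only stay the same or increase from left to right.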
Figure 2. Percentages of free energy of stems from 51 Rfam datasets (percentages of stems with free energy less than -12 are not given in this figure).
Figure 3. Cumulative percentages of free energy of stems from 51 Rfam datasets (cumulative percentages of stems with free energy less than -12 are not given in this figure). Note the step at -3.4.
Figure 4. Illustration of the application of the generic production rule $S \rightarrow aRbT$ that produces a base pair between positions $i$ and $j$ for the query sequence $x$, provided that the start non-terminal $S_0$ derives $x_1 x_2 \cdots x_{i-1}\, S\, x_{k+1} \cdots x_n$. Note that given $i$ and $j$, the position of $k$ can vary.
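To make the role of k concrete, the sketch below lists which positions each symbol on the right-hand side of S → aRbT covers once i, j, and k are fixed. It assumes that S derives the segment from position i to position k, that a and b are the paired nucleotides at positions i and j, and that R and T may cover empty segments; these are reading assumptions for this example, and the function names are hypothetical.

```python
def rule_spans(i, j, k):
    """Positions (1-based, inclusive) covered by each symbol of S -> a R b T.

    Assumes S covers x_i .. x_k with i < j <= k. An empty span is written
    as (start, start - 1).
    """
    if not (1 <= i < j <= k):
        raise ValueError("require 1 <= i < j <= k")
    return {
        "a": (i, i),          # left nucleotide of the base pair
        "R": (i + 1, j - 1),  # segment enclosed by the pair
        "b": (j, j),          # right nucleotide of the base pair
        "T": (j + 1, k),      # segment following the pair
    }


def splits_for_pair(i, j, n):
    """All choices of k for a fixed pair (i, j) in a sequence of length n."""
    return [rule_spans(i, j, k) for k in range(j, n + 1)]
```

For a fixed pair (i, j), every k from j to the sequence length gives a different way to apply the rule, which is why a probability for the pair has to account for all of them.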