Ali Esmaili-Taheri, Mohammad Ganjtabesh.
Abstract
BACKGROUND: The function of an RNA in cellular processes is directly related to its structure. The free energy of an RNA structure is another important key to its function, as only some structures with a specific level of free energy can take part in cellular reactions. Therefore, to perform a specific function, a particular RNA structure with a specific level of free energy is required. For a given RNA structure, the goal of the RNA design problem is to design an RNA sequence that folds into the given structure. To mimic the biological features of RNA sequences and structures, some sequence and energy constraints should be considered in designing RNA. Although the level of free energy is important, it is not considered in the available approaches for the RNA design problem.
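As a minimal illustration of the design problem stated above, the following Python sketch checks a necessary (but not sufficient) condition for a candidate sequence: every base pair in the target structure must be formable by the sequence. It assumes the target is written in dot-bracket notation without pseudoknots and that Watson-Crick and G-U wobble pairs are allowed. The names `pair_table` and `is_compatible` are hypothetical and are not part of ERD. Deciding whether a sequence actually folds into the target requires an energy-based folding program, which this sketch does not attempt.

```python
# Hypothetical helpers for illustration only; not taken from ERD.
# Assumption: structures use dot-bracket notation with '(' , ')' and '.'.
ALLOWED_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),  # wobble pairs
}


def pair_table(structure: str) -> dict:
    """Map every paired position to the index of its partner."""
    stack = []
    pairs = {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return pairs


def is_compatible(sequence: str, structure: str) -> bool:
    """Return True if the sequence can form every base pair of the structure."""
    if len(sequence) != len(structure):
        return False
    pairs = pair_table(structure)
    return all((sequence[i], sequence[j]) in ALLOWED_PAIRS for i, j in pairs.items())
```

A design tool would typically only consider compatible sequences as candidates and then score them with a folding model.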
Year: 2015 PMID: 25626878 PMCID: PMC4384295 DOI: 10.1186/s12859-014-0444-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Hierarchical decomposition [5]. The hierarchical decomposition of the target structure into its sub-structures.
Biological description of the sequences and structures in dataset A
| Name | Description | Length (nt) |
|---|---|---|
| A1 | Minimal catalytic domains of the hairpin ribozyme satellite RNA of the Tobacco ringspot | 65 |
| A2 | U3 snoRNA 5’ domain from Chlamydomonas reinhardtii, in vivo probing | 79 |
| A3 | H. marismortui 5S rRNA | 122 |
| A4 | VS Ribozyme from Neurospora mitochondria | 166 |
| A5 | XS1 Ribozyme, Bacillus subtilis P RNA based ribozyme | 314 |
| A6 | Homo sapiens Ribonuclease P RNA | 340 |
| A7 | S20 mRNA from E. coli | 372 |
| A8 | Group II intron ribozyme D135 from Saccharomyces cerevisiae mitochondria | 583 |
Biological description of the sequences and structures in dataset B
| Name | Description | Length (nt) |
|---|---|---|
| B1 | pre-amiR-lfy-1 | 178 |
| B2 | pre-amiR-lfy-2 | 178 |
| B3 | pre-amiR-white-1 | 178 |
| B4 | pre-amiR-white-2 | 178 |
| B5 | pre-amiR-ft-1 | 178 |
| B6 | pre-amiR-ft-2 | 178 |
| B7 | pre-amiR-trichome | 178 |
| B8 | pre-amiR-mads-1 | 178 |
| B9 | pre-amiR-mads-2 | 178 |
| B10 | pre-amiR-yabby-1 | 178 |
| B11 | pre-amiR-yabby-2 | 178 |
| B12 | pre-miRNA | 176 |
Results for dataset A-C10
| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| A1 | 38 | | 47 | 33.09 | 50 | 12.56 | 50 | 50 | 0.17 |
| A2 | 50 | | 39 | 35.85 | 50 | 4.38 | 50 | 50 | 0.08 |
| A3 | 32 | 1.08 | 40 | 49.05 | 46 | 76.88 | 50 | | |
| A4 | 50 | | 16 | 114.78 | 50 | 172.61 | 50 | 50 | 0.93 |
| A5 | 4 | 1558.55 | 42 | 283.96 | 35 | 1740.54 | 50 | | |
| A6 | 4 | 2612.28 | 14 | 287.76 | 20 | 3704.05 | 0 | | |
| A7 | 0 | | 22 | 335.26 | 0 | | 50 | | |
| A8 | 19 | | 0 | | 0 | | 0 | | 1097.67 |
Comparison of the success count (SC) and expected time of the existing approaches on dataset A-C10, which includes 10% sequence constraints. Entries with no result are marked as such, and bold face indicates the best results.
Results for dataset A-C20
| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| A1 | 32 | 0.23 | 30 | 32.06 | 50 | 9.24 | 50 | 50 | |
| A2 | 50 | | 45 | 37.57 | 50 | 5.80 | 50 | 50 | 0.06 |
| A3 | 47 | | 45 | 61.52 | 45 | 28.97 | 50 | | 0.77 |
| A4 | 0 | | 0 | | 0 | | 0 | 0 | |
| A5 | 2 | 4130.43 | 38 | 305.61 | 5 | 17698.03 | 0 | | |
| A6 | 12 | 592.57 | 39 | 309.57 | 50 | 836.15 | 50 | 50 | |
| A7 | 0 | | 0 | | 0 | | 0 | 0 | |
| A8 | 0 | | 0 | | | | 0 | 0 | |
Comparison of the success count (SC) and expected time of the existing approaches on dataset A-C20, which includes 20% sequence constraints. Entries with no result are marked as such, and bold face indicates the best results.
Results for dataset A-C30
| | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| A1 | 42 | | 40 | 31.65 | 50 | 8.80 | 50 | 50 | 0.12 |
| A2 | 50 | | 41 | 37.52 | 50 | 3.35 | 50 | 50 | 0.05 |
| A3 | 20 | | 7 | 60.67 | 34 | 164.47 | 0 | | 4.74 |
| A4 | 0 | | 0 | | | | 0 | 0 | |
| A5 | 0 | | 0 | | 0 | | 0 | | |
| A6 | 0 | | 0 | | 0 | | 0 | 0 | |
| A7 | 0 | | 0 | | 0 | | 0 | 0 | |
| A8 | 32 | | 43 | 1124.62 | 25 | 1059.20 | 33 | | 124.56 |
Comparison of the success count (SC) and expected time of the existing approaches on dataset A-C30, which includes 30% sequence constraints. Entries with no result are marked as such, and bold face indicates the best results.
Results for dataset B
| | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| B1 | 29 | 7.16 | 46 | 131.05 | 50 | 22.28 | 0 | 50 | | 50 | 74.20 |
| B2 | 26 | 12.01 | 40 | 143.55 | 50 | 24.29 | 0 | 50 | | 50 | 53.52 |
| B3 | 25 | 10.50 | 44 | 139.07 | 50 | 10.58 | 50 | 50 | | 50 | 48.05 |
| B4 | 27 | 12.59 | 36 | 171.93 | 50 | 23.63 | 0 | 50 | | 50 | 77.71 |
| B5 | 25 | 12.10 | 48 | 169.66 | 50 | 11.79 | 50 | 50 | | 50 | 67.24 |
| B6 | 22 | 14.42 | 40 | 185.99 | 50 | 11.87 | 0 | 50 | | 50 | 77.20 |
| B7 | 44 | | 37 | 173.37 | 50 | 24.58 | 0 | 50 | 2.38 | 50 | 57.86 |
| B8 | 26 | 12.68 | 45 | 168.02 | 50 | 15.23 | 50 | 50 | | 50 | 55.80 |
| B9 | 31 | 11.27 | 41 | 173.59 | 50 | 11.38 | 50 | 50 | | 50 | 85.09 |
| B10 | 27 | 9.62 | 43 | 171.63 | 50 | 13.50 | 50 | 50 | | 50 | 71.05 |
| B11 | 44 | | 40 | 167.91 | 50 | 11.75 | 0 | 50 | 3.13 | 50 | 61.63 |
| B12 | 27 | 14.53 | 47 | 175.10 | 50 | 28.69 | 0 | 50 | | 50 | 70.12 |
Comparison of the success count (SC) and expected time of the existing approaches on dataset B, which includes natural sequence constraints. The best results are indicated in bold face.
The average free energy values of sequences generated for dataset B
| | | | | | | | |
|---|---|---|---|---|---|---|---|
| B1 | -72.69 | -152.16 | -117.89 | -112.41 | − | -105.87 | |
| B2 | -72.69 | -151.71 | -120.34 | -115.04 | − | -106.49 | |
| B3 | -75.19 | -155.22 | -115.10 | -109.54 | -108.30 | -108.36 | |
| B4 | -69.29 | -149.49 | -120.21 | -113.08 | − | -103.32 | |
| B5 | -75.19 | -154.02 | -105.83 | -111.39 | -104.20 | -109.97 | |
| B6 | -71.49 | -152.47 | -115.60 | -107.05 | − | -107.63 | |
| B7 | -75.49 | -156.88 | -115.35 | -118.80 | − | -112.18 | |
| B8 | -69.69 | -152.14 | -113.89 | -110.18 | -99.61 | -104.70 | |
| B9 | -72.19 | -149.95 | -116.41 | -111.26 | -111.40 | -106.91 | |
| B10 | -73.49 | -155.33 | -123.21 | -111.02 | -107.40 | -110.94 | |
| B11 | -76.79 | -156.54 | -123.35 | -114.59 | − | -114.78 | |
| B12 | -74.49 | -154.24 | -113.75 | -115.51 | − | -110.36 | |
The average free energy values of the sequences generated by the different approaches and the corresponding natural energy values for dataset B. ERD-EC denotes ERD run with the energy constraint specified. The energy values closest to the natural ones are indicated in bold face.
The expected similarity of the generated sequences
| | | | | | |
|---|---|---|---|---|---|
| | 74.97 | 75.09 | 47.91 | 98.01 | |
| | 72.76 | 76.93 | 51.01 | 97.40 | |
| | 80.20 | 78.30 | 60.15 | 97.37 | |
| | 80.55 | 70.69 | | 97.12 | 53.57 |
The expected similarity between the sequences generated by each approach on the different datasets. The lowest similarities are indicated in bold face.
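The exact definition of expected similarity is not given in this excerpt. As a hedged illustration only, the sketch below assumes it can be approximated by the mean pairwise percent identity over a set of equal-length designed sequences; the function names `identity` and `mean_pairwise_identity` are hypothetical and this is not necessarily the measure used to produce the table above.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def mean_pairwise_identity(sequences: list) -> float:
    """Average identity over all unordered pairs (an assumed similarity measure)."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(identity(a, b) for a, b in pairs) / len(pairs)
```

Under this assumed measure, a lower value means the designed sequences for one target are more diverse.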
Results for dataset C
| Approach | Expected time | SC | Expected energy distance | Expected similarity |
|---|---|---|---|---|
| INFO-RNA | 43.79 | 32 | 54.86 | 28.69 |
| MODENA | 165.86 | 395 | 27.05 | 35.73 |
| NUPACK | 483.41 | 336 | 20.21 | 38.91 |
| RNAifold | 1888.32 | 272 | 29.09 | 31.12 |
| ERD | | | 13.22 | 39.44 |
| ERD-EC | 217.99 | 386 | | |
Comparison of the expected time, success count (SC), expected energy distance, and expected similarity to the natural sequences of the existing approaches on dataset C. ERD-EC denotes ERD run with the energy constraint specified. SC indicates how many sequences were successfully designed. The best results are indicated in bold face.
The nucleotide distribution in the generated sequences for all datasets
| Approach | Paired | Paired | Paired | Unpaired | Unpaired | Unpaired | Unpaired | Total | Total | Total | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Natural | 0.39 | 0.49 | 0.12 | 0.38 | 0.19 | 0.17 | 0.26 | 0.27 | 0.22 | 0.25 | 0.26 |
| ERD | | | | 0.41 | | | 0.23 | | | | |
| INFO-RNA | 0.05 | 0.90 | 0.05 | | 0.20 | 0.28 | 0.15 | 0.15 | 0.36 | 0.40 | 0.09 |
| MODENA | 0.13 | 0.83 | 0.04 | 0.83 | 0.07 | 0.05 | 0.06 | 0.37 | 0.28 | 0.28 | 0.07 |
| NUPACK | 0.28 | 0.72 | 0.01 | | 0.22 | 0.14 | | 0.24 | 0.31 | | 0.18 |
| RNAifold | 0.07 | 0.92 | 0.02 | 0.90 | 0.03 | 0.04 | 0.03 | 0.39 | 0.27 | 0.29 | 0.04 |
The distribution of nucleotides in paired and unpaired regions is calculated for all existing approaches. The total distribution of nucleotides is also presented. The values closest to the natural ones are indicated in bold face.
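One way such a distribution could be tabulated is sketched below in Python. It assumes the structure is given in dot-bracket notation, where `.` marks an unpaired position and any other symbol marks a paired one, and it reports per-nucleotide frequencies in each region; the function name `nucleotide_distribution` is hypothetical, and the columns of the table above may be defined differently (for example by base-pair type in the paired region).

```python
from collections import Counter


def nucleotide_distribution(sequence: str, structure: str) -> dict:
    """Relative nucleotide frequencies in paired, unpaired, and all positions.

    Assumption: '.' in the dot-bracket string marks an unpaired position;
    every other symbol marks a paired position.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")

    counts = {"paired": Counter(), "unpaired": Counter(), "total": Counter()}
    for base, symbol in zip(sequence, structure):
        region = "unpaired" if symbol == "." else "paired"
        counts[region][base] += 1
        counts["total"][base] += 1

    # Normalise each non-empty region so its frequencies sum to 1.
    distribution = {}
    for region, counter in counts.items():
        size = sum(counter.values())
        if size:
            distribution[region] = {base: n / size for base, n in counter.items()}
    return distribution
```

Averaging these frequencies over all designed sequences of an approach would give one row of a table like the one above, under the stated assumptions.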
Figure 2. A typical output of the ERD web server. Here, the target structure and its sequence constraints (fixed nucleotides in internal and hairpin loops) are given as input, and 10 RNA sequences are designed with respect to the given constraints.