Rune B Lyngsø, James W J Anderson, Elena Sizikova, Amarendra Badugu, Tomas Hyland, Jotun Hein.
Abstract
BACKGROUND: RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three-dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem from an algorithmic viewpoint has an elegant and efficient solution, the inverse RNA folding problem appears to be hard.
Year: 2012 PMID: 23043260 PMCID: PMC3534541 DOI: 10.1186/1471-2105-13-260
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Design of artificial SV11 RNA. Dot plot of base pair Boltzmann probabilities for the sequence designed for the bistable SV11 target. Superimposed on the dot plot are the base pairs of the two metastable SV11 structures, shown as open squares in different shades of grey. The secondary structures are also shown in the same shades of grey. Dots reflecting Boltzmann probabilities were rescaled by a factor of 0.75 to clearly separate them from any enclosing square representing a structure base pair. The two conformations, which share no base pairs, are also shown: the native state (top) and the metastable state (bottom).
Performance on two structure targets
| | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Identical structures | 173 | 61.5 | – | 0 | – | – | – | 4 | 139.0 | – | 3.3 |
| Different structures | 54 | 71.5 | 7.5 | 32 | 94.1 | 19.9 | 8.1 | 28 | 132.5 | 31.8 | 9.6 |
Performance on 291 two-structure targets generated by folding shorter RNASTRAND sequences at 20°C and 37°C.
Performance on Rfam data set
| 01 | 117 | ∗ | ||||||
| 02 | 151 | 0.0 / – | 0.0 / 175 | 0.0 / 277 | 0.0 / 141999 | ∗ | ||
| 03 | 161 | 0.0 / – | 0.0 / 304 | 0.0 / 50698 | ∗ | |||
| 04 | 193 | 0.0 / – | ∗ | |||||
| 05 | 74 | |||||||
| 06 | 89 | ∗ | ||||||
| 07 | 154 | ∗ | ||||||
| 08 | 54 | |||||||
| 09 | 348 | 0.0 / – | 0.0 / 4127 | ∗ | ||||
| 10 | 357 | 0.0 / 245868 | 0.0 / 180 | 0.0 / – | 0.0 / 4046 | 0.0 / 8007 | 0.0 / 92211 | ∗ |
| 11 | 382 | 0.0 / 500078 | 0.0 / 184 | 0.0 / – | 0.0 / 7040 | 0.0 / 16634 | 0.0 / 77273 | ∗ |
| 12 | 215 | 0.0 / – | ∗ | |||||
| 13 | 185 | 0.0 / – | ∗ | |||||
| 14 | 87 | ∗ | ||||||
| 15 | 140 | 0.0 / – | ∗ | |||||
| 16 | 129 | 0.0 / 18734 | 0.0 / 11 | 0.0 / – | 0.0 / 102 | 0.0 / 124 | ∗ | |
| 17 | 301 | 0.0 / – | ∗ | |||||
| 18 | 360 | 0.0 / – | 0.0 / 101125 | ∗ | ||||
| 19 | 83 | ∗ | ||||||
| 20 | 119 | 0.0 / 3149 | 0.0 / 10 | 0.0 / – | 0.0 / 7.8 | 0.0 / 15 | 0.0 / 810 | ∗ |
| 21 | 118 | ∗ | ||||||
| 22 | 148 | 0.0 / – | ∗ | |||||
| 24 | 451 | 0.0 / 138348 | 0.0 / 182 | 0.0 / – | 0.0 / 1530 | 0.0 / 4170 | ∗ | |
| 25 | 210 | 0.0 / – | 0.0 / 463 | ∗ | ||||
| 26 | 102 | 0.0 / – | ∗ | |||||
| 27 | 79 | |||||||
| 28 | 344 | 0.0 / 125 | 0.0 / – | 0.0 / 6003 | ∗ | |||
| 29 | 73 | ∗ | ||||||
| 30 | 340 | 0.0 / – | 0.0 / 5011 | 0.0 / 118373 | ∗ | |||
| Total successes | 24 | 23 | 10 | 22 | 19 | 22 | 3 | |
Comparison of five inverse folding methods on 29 Rfam structures.
Nucleotide distribution in designed sequences
| | |||||||||||
| Original data | 0.57 | 0.30 | 0.13 | 0.30 | 0.20 | 0.23 | 0.27 | 0.23 | 0.24 | 0.28 | 0.24 |
| Frnakenstein | 0.55 | 0.36 | 0.09 | 0.32 | 0.31 | 0.09 | 0.29 | 0.25 | 0.29 | 0.19 | 0.26 |
| MODENA | 0.82 | 0.18 | 0 | 0.82 | 0.06 | 0.06 | 0.06 | 0.48 | 0.22 | 0.22 | 0.07 |
| INFO-RNA | 0.93 | 0.06 | 0.01 | 0.36 | 0.22 | 0.20 | 0.22 | 0.19 | 0.35 | 0.32 | 0.14 |
| RNA-SSD | 0.56 | 0.44 | 0 | 0.32 | 0.24 | 0.19 | 0.25 | 0.27 | 0.26 | 0.24 | 0.23 |
| RNAinverse | 0.46 | 0.41 | 0.14 | 0.29 | 0.25 | 0.21 | 0.25 | 0.23 | 0.24 | 0.26 | 0.26 |
| NUPACK | 0.73 | 0.27 | 0 | 0.42 | 0.26 | 0.09 | 0.22 | 0.28 | 0.31 | 0.22 | 0.18 |
| Inv | 0.32 | 0.39 | 0.28 | 0.30 | 0.26 | 0.22 | 0.22 | 0.20 | 0.21 | 0.30 | 0.29 |
Comparison of the nucleotide distributions of the successfully designed sequences from different methods on the Rfam dataset, with distribution observed across the original sequences from Rfam shown in the first row.
Figure 2. Analysis of positional fitness schemes. Plot of the minimum objective value in the population through the generations of the GA for the default parameters (solid black line) and ten variations in which a single feature is changed by invoking the respective option shown in the legend. These correspond to choosing positions for mutation uniformly at random, as well as based on positional fitness schemes 1, 2, 3, and 5; choosing pairs of sequences for recombination uniformly at random or based on individual fitnesses; and choosing recombination points uniformly at random, as well as based on positional fitness schemes 1 and 2.
Performance on RNASTRAND data set
| RNASTRAND | 189 | 178 | 196 | 176 | 185 | 73 |
| RNASTRAND-Refolded | 383 | 377 | 383 | 336 | 383 | 113 |
Successes of the benchmarked approaches on the RNASTRAND and RNASTRAND-Refolded data sets.
Length dependency of success rate
| Frnakenstein | 0.50 | 0.50 | 0.56 | 0.76 | 0.62 | 0.44 | 0.97 | 0.47 | 0.17 | 0.22 |
| MODENA | 0.47 | 0.50 | 0.56 | 0.76 | 0.59 | 0.44 | 0.97 | 0.44 | 0.03 | 0.14 |
| INFO-RNA | 0.50 | 0.50 | 0.56 | 0.78 | 0.65 | 0.53 | 0.91 | 0.50 | 0.28 | 0.19 |
| RNAinverse | 0.50 | 0.50 | 0.56 | 0.78 | 0.62 | 0.42 | 0.97 | 0.42 | 0.08 | 0.00 |
| NUPACK | 0.50 | 0.50 | 0.56 | 0.78 | 0.56 | 0.44 | 0.97 | 0.47 | 0.14 | 0.16 |
| Inv | 0.44 | 0.38 | 0.47 | 0.51 | 0.15 | 0 | 0 | 0 | 0 | 0 |
The 363 unique structures of the RNASTRAND data set were binned by length into 10 bins of roughly equal size. For each bin, the range of lengths covered, the average length of its structures, and the number of structures are listed, together with the success ratio of each method on that bin.
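The binning described in this caption can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `bin_by_length` and `summarise`, the tie handling (structures of equal length may land in adjacent bins), and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def bin_by_length(lengths, n_bins=10):
    """Return n_bins lists of indices, ordered by length, of roughly equal size.

    Hypothetical helper: the first (len % n_bins) bins get one extra item.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    base, extra = divmod(len(order), n_bins)
    bins, start = [], 0
    for b in range(n_bins):
        size = base + (1 if b < extra else 0)
        bins.append(order[start:start + size])
        start += size
    return bins


def summarise(lengths, solved, n_bins=10):
    """Per bin: length range, mean length, count, and success ratio.

    `solved[i]` is True if the method designed a sequence for structure i.
    """
    rows = []
    for idx in bin_by_length(lengths, n_bins):
        if not idx:  # fewer structures than bins
            continue
        bin_lengths = [lengths[i] for i in idx]
        rows.append({
            "range": (min(bin_lengths), max(bin_lengths)),
            "mean_length": mean(bin_lengths),
            "count": len(idx),
            "success_ratio": sum(solved[i] for i in idx) / len(idx),
        })
    return rows


if __name__ == "__main__":
    # Toy data, made up for illustration only.
    toy_lengths = [12, 40, 25, 33, 18, 90, 57, 61]
    toy_solved = [True, True, False, True, True, False, False, True]
    for row in summarise(toy_lengths, toy_solved, n_bins=4):
        print(row)
```

Sorting once and slicing keeps the bin sizes within one of each other, which matches "roughly equal size"; a different rule would be needed if structures of identical length had to share a bin.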