Peter Eastman, Jade Shi, Bharath Ramsundar, Vijay S Pande.
Abstract
We use reinforcement learning to train an agent for computational RNA design: given a target secondary structure, design a sequence that folds to that structure in silico. Our agent uses a novel graph convolutional architecture that allows a single model to be applied to arbitrary target structures of any length. After training it on randomly generated targets, we test it on the Eterna100 benchmark and find that it outperforms all previous algorithms. Analysis of its solutions shows that it has successfully learned some advanced strategies identified by players of the game Eterna, allowing it to solve some very difficult structures. On the other hand, it has failed to learn other strategies, possibly because they were not required for the targets in the training set. This suggests that future improvements to the training protocol may yield further gains in performance.
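To make the design task concrete, the following is a minimal sketch of the necessary (but not sufficient) condition a candidate sequence must meet: every paired position in the target must hold bases that can pair. Dot-bracket notation for the target, the helper names, and the example hairpin are assumptions made for illustration and do not come from the paper. A real check would also fold the sequence with an energy model and compare the predicted structure to the target.

```python
# Illustrative sketch only: dot-bracket input and all names here are assumptions.
# Watson-Crick pairs plus the G-U wobble pair.
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) index pairs for matched parentheses."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_compatible(sequence, structure):
    """True if every paired position in the target holds bases that can pair.

    This does not predict folding; it only rules out sequences that could
    never adopt the target structure.
    """
    if len(sequence) != len(structure):
        return False
    return all(
        (sequence[i], sequence[j]) in ALLOWED_PAIRS
        for i, j in parse_dot_bracket(structure)
    )


target = "((((....))))"  # hypothetical hairpin: 4-pair stem, 4-base loop
print(is_compatible("GGGCGAAAGCCC", target))
print(is_compatible("AAAAAAAAAAAA", target))
```

A filter like this is cheap, so it can discard impossible candidates before the more expensive folding step is run.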
Year: 2018 PMID: 29927936 PMCID: PMC6029810 DOI: 10.1371/journal.pcbi.1006176
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. The architecture of the policy network.
The shape of each layer's output is given in parentheses, where N is the number of bases in the sequence. "Conv1" and "Conv7" indicate single base convolution and seven base convolution layers, as described in the text. "Dense" indicates a dense (fully connected) layer, where every output depends on every input.
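The difference between a single base convolution and a dense layer can be illustrated with a small sketch. The layer sizes, random weights, and function names below are hypothetical and are not the paper's. The seven base convolution is left out because the caption does not say which seven bases it draws on. The sketch perturbs the features of one base and records which outputs change.

```python
import random

random.seed(0)
N, C_IN, C_OUT = 6, 4, 3  # hypothetical sizes, chosen only for illustration


def random_matrix(rows, cols):
    """Build a rows x cols matrix of Gaussian values as a list of rows."""
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


def matvec(vec, weights):
    """Multiply a length-K vector by a K x M weight matrix."""
    return [
        sum(v * row[m] for v, row in zip(vec, weights))
        for m in range(len(weights[0]))
    ]


def single_base_conv(x, w):
    """Apply the same weights to each base independently: (N, C_IN) -> (N, C_OUT)."""
    return [matvec(base, w) for base in x]


def dense(x, w):
    """Flatten all bases, then mix them together: (N, C_IN) -> (C_OUT,)."""
    flat = [value for base in x for value in base]
    return matvec(flat, w)


x = random_matrix(N, C_IN)
w_conv1 = random_matrix(C_IN, C_OUT)
w_dense = random_matrix(N * C_IN, C_OUT)

# Perturb only the features of base 0.
x_perturbed = [row[:] for row in x]
x_perturbed[0] = [v + 1.0 for v in x_perturbed[0]]

conv_before = single_base_conv(x, w_conv1)
conv_after = single_base_conv(x_perturbed, w_conv1)
# One flag per base: did that base's output row change?
conv_rows_changed = [b != a for b, a in zip(conv_before, conv_after)]
# Did the dense output change at all?
dense_changed = dense(x, w_dense) != dense(x_perturbed, w_dense)

print(conv_rows_changed)
print(dense_changed)
```

In this sketch only the output row for base 0 responds in the single base convolution, while the dense output responds to a change at any base, which matches the caption's description of a dense layer.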
Puzzles in the Eterna100 benchmark.
The ones our method succeeded in solving are marked with an X.
| Puzzle | Solved | Puzzle | Solved |
| --- | --- | --- | --- |
| 1. Simple Hairpin | X | 51. medallion | X |
| 2. Arabidopsis Thaliana 6 RNA—Difficulty Level 0 | X | 52. [RNA] Repetitious Sequences 8/10 | |
| 3. Prion Pseudoknot—Difficulty Level 0 | X | 53. Documenting repetitious behavior | |
| 4. Human Integrated Adenovirus—Difficulty Level 0 | X | 54. 7 multiloop | X |
| 5. The Gammaretrovirus Signal—Difficulty Level 0 | X | 55. Kyurem 7 | X |
| 6. Saccharomyces Cerevisiae—Difficulty Level 0 | X | 56. JF1 | X |
| 7. Fractal 2 | X | 57. multilooping fun | |
| 8. G-C Placement | X | 58. Multiloop… | X |
| 9. The Sun | X | 59. hard Y | X |
| 10. Frog Foot | X | 60. Mat—Elements & Sections | |
| 11. InfoRNA test 16 | X | 61. Chicken feet | |
| 12. Mat—Martian 2 | X | 62. Bug 18 | |
| 13. square | X | 63. Fractal star x5 | X |
| 14. Six legd turtle 2 | X | 64. Crop circle 2 | |
| 15. Small and Easy 6 | X | 65. Branching Loop | X |
| 16. Fractile | X | 66. Bug 38 | |
| 17. Six legd Turtle | X | 67. Simple Single Bond | |
| 18. snoRNA SNORD64 | X | 68. Taraxacum officinale | |
| 19. Chalk Outline | X | 69. Headless Bug on Windshield | |
| 20. InfoRNA bulge test 9 | X | 70. Pokeball | |
| 21. Tilted Russian Cross | X | 71. Variation of a crop circle | |
| 22. This is ACTUALLY Small And Easy 6 | X | 72. Loop next to a Multiloop | X |
| 23. Shortie 4 | X | 73. Snowflake 4 | |
| 24. Shape Test | X | 74. Mat—Cuboid | |
| 25. The Minitsry | X | 75. Misfolded Aptamer 6 | X |
| 26. stickshift | X | 76. Snowflake 3 | |
| 27. U | X | 77. Hard Y and a bit more | |
| 28. Still Life (Sunflower In A Vase) | X | 78. Mat—Lot 2–2 B | |
| 29. Quasispecies 2–2 Loop Challenge | X | 79. Shapes and Energy | |
| 30. Corner bulge training | X | 80. Spiral of 5's | |
| 31. Spiral | X | 81. Campfire | |
| 32. InfoRNA bulge test | X | 82. Anemone | X |
| 33. Worm 1 | | 83. Fractal 3 | |
| 34. just down to 1 bulge | X | 84. Kyurem 5 | X |
| 35. Iron Cross | | 85. Snowflake Necklace (or v2.0) | |
| 36. loops and stems | X | 86. Methaqualone C16H14N2O Structural Representation | |
| 37. Water Strider | X | 87. Cat's Toy 2 | |
| 38. The Turtle(s) Move(s) | | 88. Zigzag Semicircle | |
| 39. Adenine | X | 89. Short String 4 | |
| 40. Tripod5 | X | 90. Gladius | |
| 41. Shortie 6 | X | 91. Thunderbolt | |
| 42. Runner | X | 92. Mutated chicken feet | |
| 43. Recoil | X | 93. Chicken Tracks | X |
| 44. [CloudBeta] An Arm and a Leg 1.0 | X | 94. Looking Back Again | |
| 45. [CloudBeta] 5 Adjacent Stack Multi-Branch Loop | X | 95. Multilooping 6 | X |
| 46. Triple Y | X | 96. Cesspool | |
| 47. Misfolded Aptamer | X | 97. Hoglafractal | |
| 48. Flower power | X | 98. Bullseye | |
| 49. Kudzu | X | 99. Shooting Star | |
| 50. "1,2,3and4bulges" | | 100. Teslagon | |
Fig 2. Difficulty of solving puzzles as measured by (a) clock time or (b) steps, versus sequence length. Blue dots represent the 58 puzzles that were successfully solved on the first attempt. Red dots represent the 42 puzzles on which the agent gave up after running for 24 hours. (Two of these were eventually solved on later attempts.) Longer sequences take more time per step, so fewer steps can be completed in 24 hours.
Fig 3. Examples of puzzles with short stems solved by our model: Shortie 6 (a), and Kyurem 7 (b). The Shortie 6 solution strategy involved introducing asymmetric base pairing patterns of GC/CG for stems 1 and 2 and CG/CG for stems 3 and 4. At first glance, it seems that the model has successfully learned how to boost 4-loops, since it makes G-mutations at edge bases of the 4-loops at stems 3 and 4. However, while stem 4 is boosted correctly, the G-mutation in the 4-loop of stem 3 is actually in the wrong position and does not stabilize the loop. Kyurem 7 is significantly more difficult due to the presence of two multiloops joining the short stems, but our model successfully proposes asymmetric stem designs around each multiloop to stabilize the structure. Multiloop 1 consists of three stems with a common pattern of alternating GC/AU pairs, whereas multiloop 2 consists of stems composed almost entirely of GC pairs, except for the single AU closing pair in the third stem. Mutations to all but four base pairs in Kyurem 7, highlighted in red, result in misfolding of the structure, indicating the need for precise design of the stem base pairings.