Literature DB >> 31298841

Automated Design of Diverse Stand-Alone Riboswitches.

Michelle J Wu, Johan O L Andreasson, Wipapat Kladwang, William Greenleaf, Rhiju Das.

Abstract

Riboswitches that couple binding of ligands to conformational changes offer sensors and control elements for RNA synthetic biology and medical biotechnology. However, design of these riboswitches has required expert intuition or software specialized to transcription or translation outputs; design has been particularly challenging for applications in which the riboswitch output cannot be amplified by other molecular machinery. We present a fully automated design method called RiboLogic for such "stand-alone" riboswitches and test it via high-throughput experiments on 2875 molecules using RNA-MaP (RNA on a massively parallel array) technology. These molecules consistently modulate their affinity to the MS2 bacteriophage coat protein upon binding of flavin mononucleotide, tryptophan, theophylline, and microRNA miR-208a, achieving activation ratios of up to 20 and significantly better performance than control designs. By encompassing a wide diversity of stand-alone switches and highly quantitative data, the resulting ribologic-solves experimental data set provides a rich resource for further improvement of riboswitch models and design methods.

Entities: Chemical Disease Gene Species

Keywords: RNA; computer-assisted design; high-throughput measurements; molecular design; riboswitch; thermodynamic model

Mesh：

Substances：

Year: 2019 PMID： 31298841 PMCID： PMC6703183 DOI： 10.1021/acssynbio.9b00142

Source DB: PubMed Journal: ACS Synth Biol ISSN： 2161-5063 Impact factor: 5.110

Riboswitches use RNA conformational changes to transduce sensing of molecules in the cellular milieu into modulation of RNA transcription, ribosomal translation, pre-mRNA splicing, and RNA cleavage.[1] The ability to perform de novo design of arbitrary riboswitches would have broad impacts in synthetic biology as well as for RNA diagnostics, therapeutics, and biomedical imaging. Supporting these efforts, there are a rapidly growing number of synthetic and natural RNA “aptamer” sequences that bind drugs, metabolites, proteins, and other biologically important molecules that expand the possible inputs for novel riboswitches, and powerful design rules and software to create riboswitches with transcription and translation outputs.[2−4] Similarly, the possible outputs of riboswitches are being expanded to triggering of “light-up” fluorescence and toggling activities of CRISPR/Cas and other ribonucleoprotein complexes.[5−9] These newer applications would benefit from riboswitch mechanisms that do not require external molecular machinery or energy dissipation but instead broadcast their output simply after reaching thermodynamic equilibrium. Such molecules would be more likely to retain their functions when moved into different RNA contexts or used in extracellular environments where energy or additional molecular machinery cannot be provided. Inspired by the concept of stand-alone executables in software engineering, we term such molecules “stand-alone” riboswitches. Creating stand-alone riboswitches leads to a new design challenge. Natural and synthetic riboswitches achieve maximal activation ratios—defined as the ratio of observed output signal in the presence and absence of the input ligand—by toggling between states that are barely activated to states that are weakly activated, rather than to states that saturate the output.[10] Biological control is then achieved by subsequent amplification steps such as ribosomal translation of many proteins per activation step.[10,11] Effective stand-alone riboswitches, which forego such amplification machinery, require quantitative conversion between two distinct states, rather than changing the frequency of transient sampling of an active state. This design constraint necessitates a trade-off with good activation ratios.[10] Testifying to the difficulty of this additional trade-off, development of light-up sensors has required significant trial-and-error; success has been achieved through screening of many constructs, the majority of which exhibit little to no switching, with median activation ratios close to 1 and best-case activation ratios of 10.[5,6,9,10] Moreover, computational predictions of the success of light-up designs are poor (Figure S1), suggesting the need for richer datasets characterizing diverse RNAs. Further exemplifying their inherent design difficulty, stand-alone switches for CRISPR/Cas9 or other ribonucleoprotein complexes, which would enable reversible control of these complexes in therapeutic settings, have not been achieved.[12−14] Here, we present a detailed computational and experimental study involving thousands of diverse molecules to test the fully automated design of stand-alone riboswitches. For computational design, we describe RiboLogic, an algorithm for designing sequences of RNA molecules that are predicted to change their secondary structure in response to interactions with other molecules. Unlike prior software that might be applied to stand-alone switch design,[3,15−19] this package only requires the user to provide small aptamer segments to bind desired input molecules and the desired structures adopted in each state. For experimental characterization, we evaluate the switching of thousands of designed RNA molecules in vitro using repurposed Illumina sequencers, through the recently developed RNA-MaP (RNA on a massively parallel array) platform.[20−23] RiboLogic designs stand-alone riboswitches based on a flexible set of user-specified constraints. The algorithm accounts for any number of folding conditions, as defined by the concentrations of ligands defined by the user. These ligands can be small molecules, proteins with known aptamers, or other RNA strands engaged through base-pairing interactions. For example, in some of our tests below, we used flavin mononucleotide (FMN) as an input ligand; FMN binds to a small aptamer sequence discovered by in vitro selection (Figure A,D).[24] The user only needs to specify the sequence of this aptamer and the estimated dissociation constant of the aptamer-ligand complex under the experimental conditions, and RiboLogic will place this “input” segment within the design and optimize the surrounding sequence in each of the riboswitch states, simulating ligand binding to the aptamer (see Methods for details). In this example, the two states are RNA with no FMN present and with a concentration of 200 μM FMN (Figure B). For each of the target riboswitch states, the user can specify either a full desired secondary structure or, more simply, the substructure of an “output” segment that must be adopted or not adopted by the RNA in order to trigger or suppress an output, respectively. For example, in some of our tests below, we used binding of a fluorescently tagged MS2 viral coat protein to an MS2 RNA hairpin segment within the design as an output (Figure A,D); such interactions underlie most systems for CRISPR interference and activation and in situ RNA visualization but have not yet been used in standalone switch design.[5−9,12−14] The user only needs to specify the sequence and “active” secondary structure of this output element, and RiboLogic places this sequence relative to the input aptamer element and optimize surrounding sequences during its design process. We note that unlike prior natural and synthetic riboswitches, we demand that the RNA’s MS2 output segment take on the desired hairpin secondary structure as its dominant structure in the ON state, rather than simply sampling this structure more frequently than in the OFF state. Such complete conversion of structure is needed for a stand-alone riboswitch that would work without further amplification.

Figure 1

RiboLogic uses a graph representation and two scoring functions to design stand-alone riboswitches. (A) This energy diagram represents the thermodynamic model used, where the ligand-bound state is given an energetic bonus due to the chemical potential of the binding of the ligand. (B) A user specifies design constraints for a riboswitch of interest, e.g., the formation of the MS2 hairpin in the absence of a ligand and the nonformation of the hairpin in the absence of a ligand. (C) The sequence is initialized to all A’s except for known sequence constraints. (D) A graph representation is used to constrain the sequence space that is sampled by RiboLogic. In this example, the goal is to design a riboswitch whose formation of the MS2 RNA hairpin is modulated by the presence of the flavin mononucleotide (FMN) molecule. Bases connected by an arc are part of these secondary structure elements and are constrained to be complementary in sequence update. (E) Two scoring metrics are used to evaluate each design candidate. The base pair distance measures the number of base pairs that must be broken or formed to reach the target structure, while the base pair probability (bpp) score quantifies the probability of formation of each base pair in the target structure. (F) The scores change as expected during computational design, with the base pair distance decreasing and the base pair probability score increasing over optimization steps. RiboLogic uses simulated annealing to sample the space of possible sequences to satisfy the given constraints (Figure D,E). At each step, the sequence is mutated either at a single base or by sliding the position of a functional element (e.g., the FMN aptamer or MS2 hairpin; colored nucleotides in Figure D). For each sequence that is sampled, the minimum free energy secondary structure is determined for each solution condition (e.g., without and with 200 μM FMN) and evaluated by two scores (Figure E,F). The first score is a base pair distance that measures the number of base pairs that must be broken or formed to obtain the target structure or substructures in each solution condition, summed over the different solution conditions. The second score is a base pair probability score that sums the probabilities of formation of all base pairs that should form in the target structure or substructures, providing a smoother quantitative measure of structure formation than the first base pair distance score. RiboLogic implements several additional strategies to narrow the sequence space being explored. Mutation of the sampled sequences leverages a dependency graph-based approach, which ensures that bases that are paired in any target structure are always complementary in sequence (e.g., N’s connected by blue lines in Figure D).[25] In the case of designing riboswitches responsive to other input RNA molecules, the algorithm provides the option to automatically introduce the sequence complementary to the input in order to promote favorable interactions between the designed RNA and input RNA. As test cases for our methods, we designed stand-alone riboswitches where the binding of a small molecule or oligonucleotide ligand modulates the formation of the MS2 RNA hairpin, which can then transduce outputs by recruiting machinery coupled to the MS2 bacteriophage coat protein. This is the first example of MS2-controlling riboswitches, which could have broad applications.[26−28] We applied a quantitative, high-throughput array technology that enables fluorescence measurements over millions of individual RNA clusters generated on an Illumina array, which has been extensively tested using the MS2 system (Figure A,B).[20,22,23] The formation of the MS2 RNA hairpin was detected by flowing fluorescently labeled MS2 protein at increasing concentrations to get a binding curve (Figure B,C). The dissociation constant K was fit over tens to hundreds of clusters for each design, yielding a distribution of K measurements for each state (Figure D). By taking the median of each distribution, we calculated a K as a quantitative measure of the binding of each design, and the ratio of these K values with and without input ligand (e.g., FMN) gives an activation ratio, which we use as our figure of merit for riboswitches. This activation ratio is equal to the ratio of fluorescence of the riboswitch with and without input ligand at low MS2 concentrations and is therefore the most relevant performance measure for stand-alone switches that need to work without output amplification.[10] By carrying out fits of data from subnanomolar to many micromolar MS2 concentrations, we achieve high precision in these measurements. The resulting K values and activation ratios were strongly correlated across experimental replicates, confirming the high precision of the method (r2 = 0.94 for log K; errors in activation ratios well under 2-fold; see Figure S2).

Figure 2

Functional tests of riboswitches using a high-throughput array. (A) Each cluster on the array initially contained a single species of ssDNA from a synthesized oligo pool. dsDNA was generated by Klenow extension with a biotinylated primer, and RNA was transcribed by RNA polymerase until being stalled at the streptavidin roadblock. (B) Fluorescently labeled MS2 protein was flowed in at varying concentrations to enable measurement of binding. (C) The array technology enables measurement of binding curves over tens or hundreds of replicate clusters for each design and solution condition. (D) The median over the distribution of fit K’s was used to estimate the activation ratio of switching. In this example of an ON switch, the activation ratio of 11 was measured over 172 independent clusters displaying the same switch. We applied the algorithm to design switches responsive to three different small molecules–flavin mononucleotide (FMN), theophylline, and tryptophan. For stand-alone OFF switches, the MS2 hairpin should form when the ligand is absent and be disrupted when the ligand is added (Figure A). For ON switches, the MS2 hairpin should form only when the FMN is present and otherwise be disrupted (Figure B). By applying secondary structure constraints to the MS2 hairpin region in both the absence and presence of the ligand, we set up a two-state design problem. We were able to obtain a set of structurally diverse designs (Figures and 4A), and we experimentally characterized thousands of these molecules with the RNA-MaP method.

Figure 3

Figure 4

Design of ligand-responsive riboswitches. (A) Clustering of FMN switches based on the sum of base pair distances of predicted secondary structures reveals that RiboLogic designs with diverse structures achieve high activation ratios. (B) Distributions of experimentally measured activation ratios are shown for various types of designs, with medians shown as vertical lines. RiboLogic generally achieves significantly better activation ratios than baseline, as determined by a Wilcoxon rank-sum test (***p < 0.001). Baseline is the measured activation ratio for sequences made for other design problems. (C) In practice, several of the most promising designs would be experimentally screened to evaluate switch efficiency. To mimic this, we bootstrapped sets of ten designs and chose the design with the best activation ratio. The distributions of activation ratios for these best-of-ten designs were compared between RiboLogic and baseline. A best-of-ten strategy yields designs with significantly higher activation ratios than baseline.

Top ligand-responsive riboswitch designs. (A) Predicted secondary structures for a top OFF switches show disruption of the MS2 hairpin (red) upon binding of FMN, theophylline, or tryptophan (blue). (B) Predicted secondary structures for top ON switches show formation of the MS2 hairpin (red) upon binding of FMN, theophylline, or tryptophan (blue). Design of ligand-responsive riboswitches. (A) Clustering of FMN switches based on the sum of base pair distances of predicted secondary structures reveals that RiboLogic designs with diverse structures achieve high activation ratios. (B) Distributions of experimentally measured activation ratios are shown for various types of designs, with medians shown as vertical lines. RiboLogic generally achieves significantly better activation ratios than baseline, as determined by a Wilcoxon rank-sum test (***p < 0.001). Baseline is the measured activation ratio for sequences made for other design problems. (C) In practice, several of the most promising designs would be experimentally screened to evaluate switch efficiency. To mimic this, we bootstrapped sets of ten designs and chose the design with the best activation ratio. The distributions of activation ratios for these best-of-ten designs were compared between RiboLogic and baseline. A best-of-ten strategy yields designs with significantly higher activation ratios than baseline. We found that RiboLogic designs achieved activation ratios significantly better than unrelated designs made for other ligands, which were used as baseline comparisons (Figure B). For example, theophylline and tryptophan designs, which are expected not to respond to FMN-binding, were used as baseline measurements for comparison to FMN designs. For example, the median activation ratio for RiboLogic designs of FMN-responsive ON switches was 1.5 (Figure B, Table , Table S1). As the baseline comparison, the median activation ratios with respect to FMN for designs meant to be responsive to theophylline or tryptophan was 1.2. For each of the six switch design challenges (three ligands, ON vs OFF) the difference was significant (p < 10–10; Figure B, Table S2). In addition, RiboLogic designs also perform significantly better than no switching (activation ratio 1) in almost all design problems. We also provide a success rate by counting the number of designs that perform better than the median or 95th percentile of baseline designs (Table S3). Since other existing automated methods are not compatible with our design problem, we also compare our performance to previous rational design efforts of similar systems. Previous characterization of reversible riboswitches yielded a median activation ratio of 1.2.[6,9,29]

Table 1

Activation Ratios for RiboLogic Designs

design	maximum AR	median AR	best-of-ten median AR	count
FMN OFF	9.74	0.987	2.57	1357
FMN ON	14.4	1.46	3.89	853
theophylline OFF	9.92	1.73	4.86	97
theophylline ON	15.4	0.991	3.44	99
tryptophan OFF	4.29	1.17	2.28	89
tryptophan ON	4.55	1.08	2.09	94
miRNA OFF	21.8	0.825	1.66	188
miRNA ON	20.0	1.17	2.84	98

For each of these six small-molecule-triggered challenges, the best activation ratio was over 4-fold, and extended up to 15-fold for the theophylline ON switch tests (Figure B). In addition, previous design efforts generally involve experimentally testing several designs and choosing the best one.[3,6,9,30] Thus, we conducted a best-of-ten analysis, in which we randomly drew subsets of 10 designs and scored the best activation ratios. These best-of-ten trials showed clear separation of the activation ratios from baselines, and in the majority of cases gave activation ratios of 2.0 or greater (Figure C, Table S4). In addition, most designs exhibited K’s close to the affinity of the MS2 coat protein under the conditions in which they were supposed to be active (with ligand for ON switches; without ligand for OFF switches) (Figure S3). This bias likely reflects our design constraint that the stand-alone riboswitches should quantitatively convert to MS2-binding structures when activated rather than requiring subsequent molecular machinery to amplify their output. The stand-alone switch with the highest activation ratio of 15.4 achieved a K of 10 nM in the activated state, within experimental error of the intrinsic dissociation constant of the MS2 coat protein-RNA hairpin interaction (6 nM, measured in the same experiment). However, the activation ratios fell short of the thermodynamic optimum described by Wayment-Steele et al.(10) (Figure S4 and S5). We further tested if RiboLogic could design stand-alone riboswitches that are responsive to RNA inputs instead of small molecule ligands. Specifically, we applied the algorithm to design 286 switches that modulate MS2 binding based on the presence of miR-208a, a 22-nt miRNA implicated in cardiac hypertrophy.[31] This type of RNA-based system could be used in diagnostic devices or linked to downstream therapeutic events. Using RiboLogic, we were able to design both ON and OFF switches triggered by the miRNA strand (Figure A,B). We found that these designs generally took more iterations of optimization to satisfy the constraints as compared to the ligand-responsive switches (Figure S6), but diverse mechanisms were achieved (Figure C). Disappointingly, experimental evaluation did not show a significant difference between RiboLogic and baseline designs in terms of activation ratio. Nevertheless, the best-of-ten comparison showed significant differences and maximum activation ratios of 20 exceeded those of small molecule activated switches (Figure D,E, Table ). These computational and experimental observations suggest that design for RNA-responsive switches may be intrinsically more difficult, despite the larger binding energy of the RNA compared to the small molecule ligands, perhaps due to a large number of competing binding modes where the input RNAs hybridize to alternative locations in the riboswitch design. At the same time, this automated procedure can still lead to excellent microRNA sensors, at the expense of characterizing more designs.

Figure 5

Design of miRNA-responsive riboswitches. (A) This OFF switch is predicted to form the MS2 hairpin (red) only in the absence of the miRNA (blue). (B) This ON switch is predicted to form the MS2 hairpin (red) only in the presence of the miRNA (blue). (C) Clustering of miRNA switches based on the base pair distance between predicted secondary structures in the absence of the miRNA reveals that RiboLogic designs with diverse structures achieve high activation ratios. (D) The distribution of experimentally measured activation ratios are shown as scatter and violin plots, with medians shown as horizontal lines. Across all design problems, there is no significant difference between RiboLogic and baseline designs, as determined by a Wilcoxon rank-sum test. (E) We conducted a best-of-ten analysis by bootstrapping sets of ten designs and choosing the design with the best activation ratio. The distributions of activation ratios for these best-of-ten designs were compared between RiboLogic and baseline. This analysis results in designs with significantly higher activation ratios, but the distributions remain similar, with the exception of a few high performaning designs. Across these design challenges, we found that stand-alone riboswitches with high activation ratios could take a variety of forms. Some high performing designs had the MS2 sequence nested between the two sides of the aptamer, while others had the MS2 outside, with only a short hairpin between the two halves of the ligand-binding internal loop (Figure ; compare designs 2297 and 2343 to 512 and 2534). Some designs formed relatively simple secondary structures with long stems, while others formed more complex folds with three-way junctions (Figure ; compare designs 512 and 2357 to 1555 and 2534). Several structures contain large single-stranded regions, while some have regions designed to bind the functional elements when they are inactive (Figure ; compare design 2534 to 512). The size of our dataset enabled statistical analyses of these secondary structure features, highlighting several that were correlated with activation ratios (Figure S7). For example, the data showed that having more base pairs shared between states correlated with higher resulting activation ratios. Still, the correlations of any single feature with activation ratio, while statistically significant, were weak (r2 < 0.01). Machine learning models that take into account multiple features to predict design success will be interesting to develop and test prospectively. A related insight into current design limitations is also enabled by the diversity and large number of our riboswitches. We note that the designs produced by RiboLogic have features that are distinct from designs created by human experts. For the small molecule sensitive riboswitches (Figure ), the RiboLogic designs include numerous stems outside the aptamer segments that need to be broken or formed. These designs are not as “concise” as expert-designed riboswitches seen in the literature,[5,19] although it should be noted that some natural riboswitches do involve ornate conformational rearrangements.[32] For the miRNA-sensitive riboswitches (Figures ), the binding of the input miRNA and the RiboLogic riboswitch is typically not through a completely contiguous, long RNA–RNA duplex, as is typically the case in, e.g., toehold riboswitches[33,34] or DNA logical devices[35,36] designed by human experts. Automated riboswitch design might improve if these features seen in human designs were rewarded or seeded into the RiboLogic design algorithm. We hypothesized that errors in current RNA secondary structure energetic models might be limiting for RiboLogic stand-alone riboswitch designs. We carried out comparisons of K’s and activation ratios predicted by the ViennaRNA and NUPACK packages for small molecule and miRNA riboswitches, respectively. We saw poor correlations for both (r of 0.06 and 0.01 for small molecule and miRNA riboswitches, respectively; Figures S8 and 6). Several designs predicted to have poor activation ratios (near or lower than 1.0) in fact gave activation ratios near 10.0; and other designs predicted to have outstanding activation ratios (greater than 100.0) gave experimental activation ratios lower than 1.0 (Figure B). This experiment–theory correlation was better for small-molecule riboswitches compared to the miRNA riboswitches, consistent with the generally better activation ratios of the former, relative to baseline measurements (compare Figures B and 5D; Table S1). Future design efforts would benefit from more accurate computational models of RNA folding energetics. We present all data collected herein as the ribologic-solves dataset (Supplemental Data) to help guide and validate such improvements.

Figure 6

Comparison of predicted and measured activation ratios. (A) For small molecule riboswitches, the predicted activation ratio is somewhat correlated with measured activation ratio. (B) For miRNA riboswitches, the correlation between prediction and experiment is poor. Here, we have presented RiboLogic, an automated algorithm for designing stand-alone riboswitches that transduce input ligand binding into output effector binding without energy input or amplification by other molecular machines. We show that RiboLogic generates designs with diverse structural mechanisms and achieves activation ratios comparable to previous efforts in rational design of reversible riboswitches. In combination with improved thermodynamic models and high-throughput measurement techniques, we expect that this method and these data will enable improved automated design of switchable RNA elements for a wide variety of applications in biotechnology and medicine.

Methods

Design Algorithm

Overview

Given secondary structure constraints in multiple states defined by ligands or short RNA inputs, our method optimizes an RNA sequence using a simulated annealing algorithm. The starting sequence is arbitrarily set to all A’s, with the exception of known sequence constraints and updates to ensure complementarity in the target secondary structures. The length is specified by the user and is not changed during sequence optimization. In each step, a random mutation is made, and the new sequence is evaluated using a base pair distance and a base pair probability score. The sequence is updated on the basis of a Metropolis–Hastings acceptance criterion:where ΔG is the difference in score between the updated and current sequences and Tdesign is the temperature parameter. This temperature parameter is decreased over the course of the optimization and can be tuned by the user. By default, it decreases linearly from 5 to 1 over the course of design. This process is repeated until a satisfactory sequence is found or the maximum number of iterations specified by the user is reached.

Constraints

Sequence constraints can include fixed bases at specified positions as well as substrings that are disallowed from the final sequence. Secondary structure constraints can be given for multiple user-specified states, as defined by varying concentrations of the input ligands. For small molecule and protein ligands, the aptamer sequence, secondary structure, and dissociation constant must be specified. For each state, secondary structure constraints can be applied to any part of the input sequence, including any RNA inputs, and bases can be specified to be unpaired, paired to any other base, or paired with a specific other base. Secondary structure elements’ positions can be left unspecified, and RiboLogic will optimize its position as well. To further ensure diversity, for the tests herein, we enforced two different global arrangements of the aptamer and MS2 hairpin elements—one with the two parts of the aptamer loop adjacent to each other and one with the MS2 sequence nested within the aptamer segments.

Sequence Update

Sequences are represented in a dependency graph structure as described by Flamm et al.(25) Briefly, each base is a node and each base pair in the constraints forms an edge between nodes. The graph is maintained such that nodes connected by an edge are always complementary. Each time a base is mutated, its entire connected component is mutated accordingly to ensure that all nodes connected to the selected base maintains complementarity. In addition, sequence constraints are incorporated into this graph, disallowing mutations that would force a constrained base to change. In the case of RNA inputs, our method provides the option to automatically introduce the complement of the input sequence into the design sequence in order to promote interactions between strands. This complementary segment can be altered in length, moved, or mutated as a sequence update step.

Scoring Functions

Two scoring functions are used: a primary score based on a single minimum free energy secondary structure, and a base pair probability-based secondary score that is used in the primary score’s place when the it is the same between two sequences. On the basis of the predicted minimum free energy structures in each state, a base pair distance to the target secondary structure is calculated. The base pair distance is the number of base pairs that must be broken or formed in order to get from one secondary structure to the other.[37] If only a substructure is specified, this can include the breaking of base pairs formed with nucleotides outside of the subsequence specified. In addition, for small molecule riboswitches, if the energy of the ligand-bound conformation, with energetic bonus, is not lower than the ligand-free conformation, a penalty equal to the ΔG between the two states is applied to the base pair distance.where ΔG–aptamer is the free energy of the RNA alone in kcal/mol, [L] is the concentration of the input ligand, K is the affinity of the input ligand, ΔG+aptamer is the free energy of the RNA constrained to form the aptamer, R is the gas constant, T is the experimental temperature (37 °C = 310.15 K). We consider only structures that form the desired aptamer, as opposed to doing a minimum free energy calculation with an energetic bonus. This allows the algorithm to guide the sequence toward those that have a more favorable aptamer-forming conformation, even if it is not the minimum free energy structure. We used a value of of 133 for FMN and 150 for theophylline and tryptophan, based on initial K estimates for those input ligands (Figure S4) and experimental [L] = 200 μM, 2 mM, and 2.4 mM (FMN, theophylline, and tryptophan, respectively). However, since the score in eq is not highly sensitive to single mutations, a secondary base pair probability score is used when the base pair distance is unchanged between sequence updates. This measure of secondary structure formation over the full ensemble is defined bywhere s is the index of the folding state, i and j are indices of the base position in the sequence, X is an indicator variable representing whether base i and j should be paired in state s, and p is the probability of base i and j forming in state s according to the partition function calculation. The value of the indicator variable is 1 if the base pair should be formed, −1 if it should not be formed, and 0 if it is unconstrained. Folding of each sequence can be modeled using either ViennaRNA[38] or NUPACK.[39] NUPACK 3.0.5.[39] was used for design involving more than one RNA, in order to properly model multistrand RNA folding, while ViennaRNA 2.1.9[38] was used for designs involving small molecule aptamers. The score used for the Metropolis–Hastings criterion in eq was:By default, the sequence search terminates once the base pair distance reaches 0 or the number of steps reaches 10 000 steps. The software also provides the option to continue optimizing the sequence after the base pair distance reaches 0. Sequences were not filtered in any way before proceeding to experimental characterization.

Computation and Code Availability

All computation was performed on Intel Xeon Processors E5–2650. The code is available at https://github.com/wuami/RiboLogic. Average computation time for the design of a ligand-induced riboswitch varied widely, both across runs and depending on the design problem (Figure S6). Every 1000 iterations took about 2 min on one core.

High-Throughput Array Experiments

The experimental methods have been described in detail previously.[20,22] Briefly, DNA templates for designs were synthesized (CustomArray, Bothell, WA) and sequenced on Illumina MiSeq instruments, and RNA was transcribed directly on the sequencing chip in a repurposed Illumina Genome Analyzer II instrument. Fluorescently labeled MS2 protein was introduced at concentrations from 1.5 nM to 3 μM at room temperature. Incubation times varied from 0.8 to 1.5 h at the lowest concentrations to 10–20 min at the highest concentrations. Fluorescence images were collected and quantified to generate binding curves in buffer of 100 mM Tris-HCl pH 7.5, 80 mM KCl, 4 mM MgCl2, 0.1 mg/mL BSA, 1 mM DTT, 10 μg/mL yeast tRNA, 0.012% Tween20. These curves were measured in the absence and presence of the ligand of interest, with concentrations of 200 μM FMN, 2 mM theophylline, 4 mM tryptophan, and 100 nM miR-208a. These conditions were selected based on the K of each ligand. Each design was measured over an average of about 100 individual clusters on the flow cell. Median fit K values over all clusters for each design were used to compute the activation ratio. Designs were prepared and analyzed as part of the Eterna massive open laboratory experiments (rounds R95, R101, and R107).[42] Designs for which K measurements were made over fewer than 10 clusters were excluded from our analysis to avoid poor quality measurements. For diversity analysis, Levenshtein distance was computed between each pair of sequences to obtain a distance matrix. Complete-linkage hierarchical clustering was performed to obtain a dendrogram with each design as a leaf (hclust in R). For statistical analysis, two-sided Wilcoxon rank sum test was used to determine if activation ratios between design types were significantly different. Predicted K’s were computed as described by Wayment-Steele et al.(10) Calculations were performed in R,[40] with example scripts available at https://github.com/wuami/RiboLogic. The full dataset is available as Supplementary Data.

Chemical Mapping Experiments

One-dimensional chemical mapping measurements were performed as described previously.[41] 1M7 was used for FMN and tryptophan aptamers, while DMS was used for the theophylline aptamer.

40 in total

Review 1. Transcription attenuation: once viewed as a novel regulatory strategy.

Authors: C Yanofsky
Journal: J Bacteriol Date: 2000-01 Impact factor: 3.490

2. Design of multistable RNA molecules.

Authors: C Flamm; I L Hofacker; S Maurer-Stroh; P F Stadler; M Zehl
Journal: RNA Date: 2001-02 Impact factor: 4.942

3. Fluorescence imaging of cellular metabolites with RNA.

Authors: Jeremy S Paige; Thinh Nguyen-Duc; Wenjiao Song; Samie R Jaffrey
Journal: Science Date: 2012-03-09 Impact factor: 47.728

4. A new algorithm for RNA secondary structure design.

Authors: Mirela Andronescu; Anthony P Fejes; Frank Hutter; Holger H Hoos; Anne Condon
Journal: J Mol Biol Date: 2004-02-20 Impact factor: 5.469

5. NUPACK: Analysis and design of nucleic acid systems.

Authors: Joseph N Zadeh; Conrad D Steenberg; Justin S Bois; Brian R Wolfe; Marshall B Pierce; Asif R Khan; Robert M Dirks; Niles A Pierce
Journal: J Comput Chem Date: 2011-01-15 Impact factor: 3.376

Review 6. Riboswitches as versatile gene control elements.

Authors: Brian J Tucker; Ronald R Breaker
Journal: Curr Opin Struct Biol Date: 2005-06 Impact factor: 6.809

7. Clustering of RNA secondary structures with application to messenger RNAs.

Authors: Ye Ding; Chi Yu Chan; Charles E Lawrence
Journal: J Mol Biol Date: 2006-02-02 Impact factor: 5.469

8. Programming biomolecular self-assembly pathways.

Authors: Peng Yin; Harry M T Choi; Colby R Calvert; Niles A Pierce
Journal: Nature Date: 2008-01-17 Impact factor: 49.962

9. Computational design and experimental validation of oligonucleotide-sensing allosteric ribozymes.

Authors: Robert Penchovsky; Ronald R Breaker
Journal: Nat Biotechnol Date: 2005-10-23 Impact factor: 54.908

10. MicroRNA-208a is a regulator of cardiac hypertrophy and conduction in mice.

Authors: Thomas E Callis; Kumar Pandya; Hee Young Seok; Ru-Hang Tang; Mariko Tatsuguchi; Zhan-Peng Huang; Jian-Fu Chen; Zhongliang Deng; Bronwyn Gunn; Janelle Shumate; Monte S Willis; Craig H Selzman; Da-Zhi Wang
Journal: J Clin Invest Date: 2009-08-10 Impact factor: 14.808

11 in total

1. Bacterial two-component systems as sensors for synthetic biology applications.

Authors: John T Lazar; Jeffrey J Tabor
Journal: Curr Opin Syst Biol Date: 2021-10-15

2. RNA secondary structure packages evaluated and improved by high-throughput experiments.

Authors: Hannah K Wayment-Steele; Wipapat Kladwang; Alexandra I Strom; Jeehyung Lee; Adrien Treuille; Alex Becka; Rhiju Das
Journal: Nat Methods Date: 2022-10-03 Impact factor: 47.990

3. Closing the loop on crowdsourced science.

Authors: James M Robson; Alexander A Green
Journal: Proc Natl Acad Sci U S A Date: 2022-06-10 Impact factor: 12.779

Review 4. RNA-based fluorescent biosensors for live cell imaging of small molecules and RNAs.

Authors: Yichi Su; Ming C Hammond
Journal: Curr Opin Biotechnol Date: 2020-02-19 Impact factor: 9.740

5. Point-of-Use Detection of Environmental Fluoride via a Cell-Free Riboswitch-Based Biosensor.

Authors: Walter Thavarajah; Adam D Silverman; Matthew S Verosloff; Nancy Kelley-Loughnane; Michael C Jewett; Julius B Lucks
Journal: ACS Synth Biol Date: 2019-12-20 Impact factor: 5.110

6. RNA-based fluorescent biosensors for live cell detection of bacterial sRNA.

Authors: Rebekah Z Kitto; Kylee E Christiansen; Ming C Hammond
Journal: Biopolymers Date: 2020-08-12 Impact factor: 2.505

Review 7. Developments of Riboswitches and Toehold Switches for Molecular Detection-Biosensing and Molecular Diagnostics.

Authors: Tin Hoang Trung Chau; Dung Hoang Anh Mai; Diep Ngoc Pham; Hoa Thi Quynh Le; Eun Yeol Lee
Journal: Int J Mol Sci Date: 2020-04-30 Impact factor: 5.923