Riboswitches that couple binding of ligands to conformational changes offer sensors and control elements for RNA synthetic biology and medical biotechnology. However, design of these riboswitches has required expert intuition or software specialized to transcription or translation outputs; design has been particularly challenging for applications in which the riboswitch output cannot be amplified by other molecular machinery. We present a fully automated design method called RiboLogic for such "stand-alone" riboswitches and test it via high-throughput experiments on 2875 molecules using RNA-MaP (RNA on a massively parallel array) technology. These molecules consistently modulate their affinity to the MS2 bacteriophage coat protein upon binding of flavin mononucleotide, tryptophan, theophylline, and microRNA miR-208a, achieving activation ratios of up to 20 and significantly better performance than control designs. By encompassing a wide diversity of stand-alone switches and highly quantitative data, the resulting ribologic-solves experimental data set provides a rich resource for further improvement of riboswitch models and design methods.
Riboswitches that couple binding of ligands to conformational changes offer sensors and control elements for RNA synthetic biology and medical biotechnology. However, design of these riboswitches has required expert intuition or software specialized to transcription or translation outputs; design has been particularly challenging for applications in which the riboswitch output cannot be amplified by other molecular machinery. We present a fully automated design method called RiboLogic for such "stand-alone" riboswitches and test it via high-throughput experiments on 2875 molecules using RNA-MaP (RNA on a massively parallel array) technology. These molecules consistently modulate their affinity to the MS2bacteriophage coat protein upon binding of flavin mononucleotide, tryptophan, theophylline, and microRNA miR-208a, achieving activation ratios of up to 20 and significantly better performance than control designs. By encompassing a wide diversity of stand-alone switches and highly quantitative data, the resulting ribologic-solves experimental data set provides a rich resource for further improvement of riboswitch models and design methods.
Riboswitches
use RNA conformational
changes to transduce sensing of molecules in the cellular milieu into
modulation of RNA transcription, ribosomal translation, pre-mRNA splicing,
and RNA cleavage.[1] The ability to perform de novo design of arbitrary riboswitches would have broad
impacts in synthetic biology as well as for RNA diagnostics, therapeutics,
and biomedical imaging. Supporting these efforts, there are a rapidly
growing number of synthetic and natural RNA “aptamer”
sequences that bind drugs, metabolites, proteins, and other biologically
important molecules that expand the possible inputs for novel riboswitches,
and powerful design rules and software to create riboswitches with
transcription and translation outputs.[2−4] Similarly, the possible
outputs of riboswitches are being expanded to triggering of “light-up”
fluorescence and toggling activities of CRISPR/Cas and other ribonucleoprotein
complexes.[5−9] These newer applications would benefit from riboswitch mechanisms
that do not require external molecular machinery or energy dissipation
but instead broadcast their output simply after reaching thermodynamic
equilibrium. Such molecules would be more likely to retain their functions
when moved into different RNA contexts or used in extracellular environments
where energy or additional molecular machinery cannot be provided.
Inspired by the concept of stand-alone executables in software engineering,
we term such molecules “stand-alone” riboswitches.Creating stand-alone riboswitches leads to a new design challenge.
Natural and synthetic riboswitches achieve maximal activation ratios—defined
as the ratio of observed output signal in the presence and absence
of the input ligand—by toggling between states that are barely
activated to states that are weakly activated, rather than to states
that saturate the output.[10] Biological
control is then achieved by subsequent amplification steps such as
ribosomal translation of many proteins per activation step.[10,11] Effective stand-alone riboswitches, which forego such amplification
machinery, require quantitative conversion between two distinct states,
rather than changing the frequency of transient sampling of an active
state. This design constraint necessitates a trade-off with good activation
ratios.[10] Testifying to the difficulty
of this additional trade-off, development of light-up sensors has
required significant trial-and-error; success has been achieved through
screening of many constructs, the majority of which exhibit little
to no switching, with median activation ratios close to 1 and best-case
activation ratios of 10.[5,6,9,10] Moreover, computational predictions
of the success of light-up designs are poor (Figure S1), suggesting the need for richer datasets characterizing
diverse RNAs. Further exemplifying their inherent design difficulty,
stand-alone switches for CRISPR/Cas9 or other ribonucleoprotein complexes,
which would enable reversible control of these complexes in therapeutic
settings, have not been achieved.[12−14]Here, we present
a detailed computational and experimental study
involving thousands of diverse molecules to test the fully automated
design of stand-alone riboswitches. For computational design, we describe
RiboLogic, an algorithm for designing sequences of RNA molecules that
are predicted to change their secondary structure in response to interactions
with other molecules. Unlike prior software that might be applied
to stand-alone switch design,[3,15−19] this package only requires the user to provide small aptamer segments
to bind desired input molecules and the desired structures adopted
in each state. For experimental characterization, we evaluate the
switching of thousands of designed RNA molecules in vitro using repurposed Illumina sequencers, through the recently developed
RNA-MaP (RNA on a massively parallel array) platform.[20−23]RiboLogic designs stand-alone riboswitches based on a flexible
set of user-specified constraints. The algorithm accounts for any
number of folding conditions, as defined by the concentrations of
ligands defined by the user. These ligands can be small molecules,
proteins with known aptamers, or other RNA strands engaged through
base-pairing interactions. For example, in some of our tests below,
we used flavin mononucleotide (FMN) as an input ligand; FMN binds
to a small aptamer sequence discovered by in vitro selection (Figure A,D).[24] The user only needs to specify
the sequence of this aptamer and the estimated dissociation constant
of the aptamer-ligand complex under the experimental conditions, and
RiboLogic will place this “input” segment within the
design and optimize the surrounding sequence in each of the riboswitch
states, simulating ligand binding to the aptamer (see Methods for details). In this example, the two states are
RNA with no FMN present and with a concentration of 200 μM FMN
(Figure B). For each
of the target riboswitch states, the user can specify either a full
desired secondary structure or, more simply, the substructure of an
“output” segment that must be adopted or not adopted
by the RNA in order to trigger or suppress an output, respectively.
For example, in some of our tests below, we used binding of a fluorescently
tagged MS2 viral coat protein to an MS2 RNA hairpin segment within
the design as an output (Figure A,D); such interactions underlie most systems for CRISPR
interference and activation and in situ RNA visualization
but have not yet been used in standalone switch design.[5−9,12−14] The user only
needs to specify the sequence and “active” secondary
structure of this output element, and RiboLogic places this sequence
relative to the input aptamer element and optimize surrounding sequences
during its design process. We note that unlike prior natural and synthetic
riboswitches, we demand that the RNA’s MS2 output segment take
on the desired hairpin secondary structure as its dominant structure
in the ON state, rather than simply sampling this structure more frequently
than in the OFF state. Such complete conversion of structure is needed
for a stand-alone riboswitch that would work without further amplification.
Figure 1
RiboLogic
uses a graph representation and two scoring functions
to design stand-alone riboswitches. (A) This energy diagram represents
the thermodynamic model used, where the ligand-bound state is given
an energetic bonus due to the chemical potential of the binding of
the ligand. (B) A user specifies design constraints for a riboswitch
of interest, e.g., the formation of the MS2 hairpin
in the absence of a ligand and the nonformation of the hairpin in
the absence of a ligand. (C) The sequence is initialized to all A’s
except for known sequence constraints. (D) A graph representation
is used to constrain the sequence space that is sampled by RiboLogic.
In this example, the goal is to design a riboswitch whose formation
of the MS2 RNA hairpin is modulated by the presence of the flavin
mononucleotide (FMN) molecule. Bases connected by an arc are part
of these secondary structure elements and are constrained to be complementary
in sequence update. (E) Two scoring metrics are used to evaluate each
design candidate. The base pair distance measures the number of base
pairs that must be broken or formed to reach the target structure,
while the base pair probability (bpp) score quantifies the probability
of formation of each base pair in the target structure. (F) The scores
change as expected during computational design, with the base pair
distance decreasing and the base pair probability score increasing
over optimization steps.
RiboLogic
uses a graph representation and two scoring functions
to design stand-alone riboswitches. (A) This energy diagram represents
the thermodynamic model used, where the ligand-bound state is given
an energetic bonus due to the chemical potential of the binding of
the ligand. (B) A user specifies design constraints for a riboswitch
of interest, e.g., the formation of the MS2 hairpin
in the absence of a ligand and the nonformation of the hairpin in
the absence of a ligand. (C) The sequence is initialized to all A’s
except for known sequence constraints. (D) A graph representation
is used to constrain the sequence space that is sampled by RiboLogic.
In this example, the goal is to design a riboswitch whose formation
of the MS2 RNA hairpin is modulated by the presence of the flavin
mononucleotide (FMN) molecule. Bases connected by an arc are part
of these secondary structure elements and are constrained to be complementary
in sequence update. (E) Two scoring metrics are used to evaluate each
design candidate. The base pair distance measures the number of base
pairs that must be broken or formed to reach the target structure,
while the base pair probability (bpp) score quantifies the probability
of formation of each base pair in the target structure. (F) The scores
change as expected during computational design, with the base pair
distance decreasing and the base pair probability score increasing
over optimization steps.RiboLogic uses simulated annealing to sample the space of
possible
sequences to satisfy the given constraints (Figure D,E). At each step, the sequence is mutated
either at a single base or by sliding the position of a functional
element (e.g., the FMN aptamer or MS2 hairpin; colored
nucleotides in Figure D). For each sequence that is sampled, the minimum free energy secondary
structure is determined for each solution condition (e.g., without and with 200 μM FMN) and evaluated by two scores
(Figure E,F). The
first score is a base pair distance that measures the number of base
pairs that must be broken or formed to obtain the target structure
or substructures in each solution condition, summed over the different
solution conditions. The second score is a base pair probability score
that sums the probabilities of formation of all base pairs that should
form in the target structure or substructures, providing a smoother
quantitative measure of structure formation than the first base pair
distance score. RiboLogic implements several additional strategies
to narrow the sequence space being explored. Mutation of the sampled
sequences leverages a dependency graph-based approach, which ensures
that bases that are paired in any target structure are always complementary
in sequence (e.g., N’s connected by blue lines
in Figure D).[25] In the case of designing riboswitches responsive
to other input RNA molecules, the algorithm provides the option to
automatically introduce the sequence complementary to the input in
order to promote favorable interactions between the designed RNA and
input RNA.As test cases for our methods, we designed stand-alone
riboswitches
where the binding of a small molecule or oligonucleotide ligand modulates
the formation of the MS2 RNA hairpin, which can then transduce outputs
by recruiting machinery coupled to the MS2bacteriophage coat protein.
This is the first example of MS2-controlling riboswitches, which could
have broad applications.[26−28] We applied a quantitative, high-throughput
array technology that enables fluorescence measurements over millions
of individual RNA clusters generated on an Illumina array, which has
been extensively tested using the MS2 system (Figure A,B).[20,22,23] The formation of the MS2 RNA hairpin was detected by flowing fluorescently
labeled MS2 protein at increasing concentrations to get a binding
curve (Figure B,C).
The dissociation constant K was fit over tens to hundreds of clusters for each design,
yielding a distribution of K measurements for each state (Figure D). By taking the median of each distribution,
we calculated a K as
a quantitative measure of the binding of each design, and the ratio
of these K values with
and without input ligand (e.g., FMN) gives an activation
ratio, which we use as our figure of merit for riboswitches. This
activation ratio is equal to the ratio of fluorescence of the riboswitch
with and without input ligand at low MS2 concentrations and is therefore
the most relevant performance measure for stand-alone switches that
need to work without output amplification.[10] By carrying out fits of data from subnanomolar to many micromolar
MS2 concentrations, we achieve high precision in these measurements.
The resulting K values
and activation ratios were strongly correlated across experimental
replicates, confirming the high precision of the method (r2 = 0.94 for log K; errors in activation ratios well under 2-fold; see Figure S2).
Figure 2
Functional tests of riboswitches using
a high-throughput array.
(A) Each cluster on the array initially contained a single species
of ssDNA from a synthesized oligo pool. dsDNA was generated by Klenow
extension with a biotinylated primer, and RNA was transcribed by RNA
polymerase until being stalled at the streptavidin roadblock. (B)
Fluorescently labeled MS2 protein was flowed in at varying concentrations
to enable measurement of binding. (C) The array technology enables
measurement of binding curves over tens or hundreds of replicate clusters
for each design and solution condition. (D) The median over the distribution
of fit K’s was
used to estimate the activation ratio of switching. In this example
of an ON switch, the activation ratio of 11 was measured over 172
independent clusters displaying the same switch.
Functional tests of riboswitches using
a high-throughput array.
(A) Each cluster on the array initially contained a single species
of ssDNA from a synthesized oligo pool. dsDNA was generated by Klenow
extension with a biotinylated primer, and RNA was transcribed by RNA
polymerase until being stalled at the streptavidin roadblock. (B)
Fluorescently labeled MS2 protein was flowed in at varying concentrations
to enable measurement of binding. (C) The array technology enables
measurement of binding curves over tens or hundreds of replicate clusters
for each design and solution condition. (D) The median over the distribution
of fit K’s was
used to estimate the activation ratio of switching. In this example
of an ON switch, the activation ratio of 11 was measured over 172
independent clusters displaying the same switch.We applied the algorithm to design switches responsive to
three
different small molecules–flavin mononucleotide (FMN), theophylline,
and tryptophan. For stand-alone OFF switches, the MS2 hairpin should
form when the ligand is absent and be disrupted when the ligand is
added (Figure A).
For ON switches, the MS2 hairpin should form only when the FMN is
present and otherwise be disrupted (Figure B). By applying secondary structure constraints
to the MS2 hairpin region in both the absence and presence of the
ligand, we set up a two-state design problem. We were able to obtain
a set of structurally diverse designs (Figures and 4A), and we experimentally
characterized thousands of these molecules with the RNA-MaP method.
Figure 3
Top ligand-responsive
riboswitch designs. (A) Predicted secondary
structures for a top OFF switches show disruption of the MS2 hairpin
(red) upon binding of FMN, theophylline, or tryptophan (blue). (B)
Predicted secondary structures for top ON switches show formation
of the MS2 hairpin (red) upon binding of FMN, theophylline, or tryptophan
(blue).
Figure 4
Design of ligand-responsive riboswitches. (A)
Clustering of FMN
switches based on the sum of base pair distances of predicted secondary
structures reveals that RiboLogic designs with diverse structures
achieve high activation ratios. (B) Distributions of experimentally
measured activation ratios are shown for various types of designs,
with medians shown as vertical lines. RiboLogic generally achieves
significantly better activation ratios than baseline, as determined
by a Wilcoxon rank-sum test (***p < 0.001). Baseline
is the measured activation ratio for sequences made for other design
problems. (C) In practice, several of the most promising designs would
be experimentally screened to evaluate switch efficiency. To mimic
this, we bootstrapped sets of ten designs and chose the design with
the best activation ratio. The distributions of activation ratios
for these best-of-ten designs were compared between RiboLogic and
baseline. A best-of-ten strategy yields designs with significantly
higher activation ratios than baseline.
Top ligand-responsive
riboswitch designs. (A) Predicted secondary
structures for a top OFF switches show disruption of the MS2 hairpin
(red) upon binding of FMN, theophylline, or tryptophan (blue). (B)
Predicted secondary structures for top ON switches show formation
of the MS2 hairpin (red) upon binding of FMN, theophylline, or tryptophan
(blue).Design of ligand-responsive riboswitches. (A)
Clustering of FMN
switches based on the sum of base pair distances of predicted secondary
structures reveals that RiboLogic designs with diverse structures
achieve high activation ratios. (B) Distributions of experimentally
measured activation ratios are shown for various types of designs,
with medians shown as vertical lines. RiboLogic generally achieves
significantly better activation ratios than baseline, as determined
by a Wilcoxon rank-sum test (***p < 0.001). Baseline
is the measured activation ratio for sequences made for other design
problems. (C) In practice, several of the most promising designs would
be experimentally screened to evaluate switch efficiency. To mimic
this, we bootstrapped sets of ten designs and chose the design with
the best activation ratio. The distributions of activation ratios
for these best-of-ten designs were compared between RiboLogic and
baseline. A best-of-ten strategy yields designs with significantly
higher activation ratios than baseline.We found that RiboLogic designs achieved activation ratios
significantly
better than unrelated designs made for other ligands, which were used
as baseline comparisons (Figure B). For example, theophylline and tryptophan designs,
which are expected not to respond to FMN-binding, were used as baseline
measurements for comparison to FMN designs. For example, the median
activation ratio for RiboLogic designs of FMN-responsive ON switches
was 1.5 (Figure B, Table , Table S1). As the baseline comparison, the median activation
ratios with respect to FMN for designs meant to be responsive to theophylline
or tryptophan was 1.2. For each of the six switch design challenges
(three ligands, ON vs OFF) the difference was significant
(p < 10–10; Figure B, Table S2). In addition, RiboLogic designs also perform significantly
better than no switching (activation ratio 1) in almost all design
problems. We also provide a success rate by counting the number of
designs that perform better than the median or 95th percentile of
baseline designs (Table S3). Since other
existing automated methods are not compatible with our design problem,
we also compare our performance to previous rational design efforts
of similar systems. Previous characterization of reversible riboswitches
yielded a median activation ratio of 1.2.[6,9,29]
Table 1
Activation Ratios
for RiboLogic Designs
design
maximum AR
median AR
best-of-ten median AR
count
FMN OFF
9.74
0.987
2.57
1357
FMN ON
14.4
1.46
3.89
853
theophylline
OFF
9.92
1.73
4.86
97
theophylline ON
15.4
0.991
3.44
99
tryptophan OFF
4.29
1.17
2.28
89
tryptophan ON
4.55
1.08
2.09
94
miRNA OFF
21.8
0.825
1.66
188
miRNA ON
20.0
1.17
2.84
98
For each of these six
small-molecule-triggered challenges, the
best activation ratio was over 4-fold, and extended up to 15-fold
for the theophylline ON switch tests (Figure B). In addition, previous design efforts
generally involve experimentally testing several designs and choosing
the best one.[3,6,9,30] Thus, we conducted a best-of-ten analysis,
in which we randomly drew subsets of 10 designs and scored the best
activation ratios. These best-of-ten trials showed clear separation
of the activation ratios from baselines, and in the majority of cases
gave activation ratios of 2.0 or greater (Figure C, Table S4).
In addition, most designs exhibited K’s close to the affinity of the MS2 coat protein
under the conditions in which they were supposed to be active (with
ligand for ON switches; without ligand for OFF switches) (Figure S3). This bias likely reflects our design
constraint that the stand-alone riboswitches should quantitatively
convert to MS2-binding structures when activated rather than requiring
subsequent molecular machinery to amplify their output. The stand-alone
switch with the highest activation ratio of 15.4 achieved a K of 10 nM in the activated
state, within experimental error of the intrinsic dissociation constant
of the MS2 coat protein-RNA hairpin interaction (6 nM, measured in
the same experiment). However, the activation ratios fell short of
the thermodynamic optimum described by Wayment-Steele et al.(10) (Figure S4 and S5).We further tested if RiboLogic could design stand-alone
riboswitches
that are responsive to RNA inputs instead of small molecule ligands.
Specifically, we applied the algorithm to design 286 switches that
modulate MS2 binding based on the presence of miR-208a, a 22-nt miRNA
implicated in cardiac hypertrophy.[31] This
type of RNA-based system could be used in diagnostic devices or linked
to downstream therapeutic events. Using RiboLogic, we were able to
design both ON and OFF switches triggered by the miRNA strand (Figure A,B). We found that
these designs generally took more iterations of optimization to satisfy
the constraints as compared to the ligand-responsive switches (Figure S6), but diverse mechanisms were achieved
(Figure C). Disappointingly,
experimental evaluation did not show a significant difference between
RiboLogic and baseline designs in terms of activation ratio. Nevertheless,
the best-of-ten comparison showed significant differences and maximum
activation ratios of 20 exceeded those of small molecule activated
switches (Figure D,E, Table ). These computational
and experimental observations suggest that design for RNA-responsive
switches may be intrinsically more difficult, despite the larger binding
energy of the RNA compared to the small molecule ligands, perhaps
due to a large number of competing binding modes where the input RNAs
hybridize to alternative locations in the riboswitch design. At the
same time, this automated procedure can still lead to excellent microRNA
sensors, at the expense of characterizing more designs.
Figure 5
Design of miRNA-responsive
riboswitches. (A) This OFF switch is
predicted to form the MS2 hairpin (red) only in the absence of the
miRNA (blue). (B) This ON switch is predicted to form the MS2 hairpin
(red) only in the presence of the miRNA (blue). (C) Clustering of
miRNA switches based on the base pair distance between predicted secondary
structures in the absence of the miRNA reveals that RiboLogic designs
with diverse structures achieve high activation ratios. (D) The distribution
of experimentally measured activation ratios are shown as scatter
and violin plots, with medians shown as horizontal lines. Across all
design problems, there is no significant difference between RiboLogic
and baseline designs, as determined by a Wilcoxon rank-sum test. (E)
We conducted a best-of-ten analysis by bootstrapping sets of ten designs
and choosing the design with the best activation ratio. The distributions
of activation ratios for these best-of-ten designs were compared between
RiboLogic and baseline. This analysis results in designs with significantly
higher activation ratios, but the distributions remain similar, with
the exception of a few high performaning designs.
Design of miRNA-responsive
riboswitches. (A) This OFF switch is
predicted to form the MS2 hairpin (red) only in the absence of the
miRNA (blue). (B) This ON switch is predicted to form the MS2 hairpin
(red) only in the presence of the miRNA (blue). (C) Clustering of
miRNA switches based on the base pair distance between predicted secondary
structures in the absence of the miRNA reveals that RiboLogic designs
with diverse structures achieve high activation ratios. (D) The distribution
of experimentally measured activation ratios are shown as scatter
and violin plots, with medians shown as horizontal lines. Across all
design problems, there is no significant difference between RiboLogic
and baseline designs, as determined by a Wilcoxon rank-sum test. (E)
We conducted a best-of-ten analysis by bootstrapping sets of ten designs
and choosing the design with the best activation ratio. The distributions
of activation ratios for these best-of-ten designs were compared between
RiboLogic and baseline. This analysis results in designs with significantly
higher activation ratios, but the distributions remain similar, with
the exception of a few high performaning designs.Across these design challenges, we found that stand-alone
riboswitches
with high activation ratios could take a variety of forms. Some high
performing designs had the MS2 sequence nested between the two sides
of the aptamer, while others had the MS2 outside, with only a short
hairpin between the two halves of the ligand-binding internal loop
(Figure ; compare
designs 2297 and 2343 to 512 and 2534). Some designs formed relatively
simple secondary structures with long stems, while others formed more
complex folds with three-way junctions (Figure ; compare designs 512 and 2357 to 1555 and
2534). Several structures contain large single-stranded regions, while
some have regions designed to bind the functional elements when they
are inactive (Figure ; compare design 2534 to 512). The size of our dataset enabled statistical
analyses of these secondary structure features, highlighting several
that were correlated with activation ratios (Figure S7). For example, the data showed that having more base pairs
shared between states correlated with higher resulting activation
ratios. Still, the correlations of any single feature with activation
ratio, while statistically significant, were weak (r2 < 0.01). Machine learning models that take into account
multiple features to predict design success will be interesting to
develop and test prospectively.A related insight into current
design limitations is also enabled
by the diversity and large number of our riboswitches. We note that
the designs produced by RiboLogic have features that are distinct
from designs created by human experts. For the small molecule sensitive
riboswitches (Figure ), the RiboLogic designs include numerous stems outside the aptamer
segments that need to be broken or formed. These designs are not as
“concise” as expert-designed riboswitches seen in the
literature,[5,19] although it should be noted that
some natural riboswitches do involve ornate conformational rearrangements.[32] For the miRNA-sensitive riboswitches (Figures ), the binding of
the input miRNA and the RiboLogic riboswitch is typically not through
a completely contiguous, long RNA–RNA duplex, as is typically
the case in, e.g., toehold riboswitches[33,34] or DNA logical devices[35,36] designed by human experts.
Automated riboswitch design might improve if these features seen in
human designs were rewarded or seeded into the RiboLogic design algorithm.We hypothesized that errors in current RNA secondary structure
energetic models might be limiting for RiboLogic stand-alone riboswitch
designs. We carried out comparisons of K’s and activation ratios predicted by the
ViennaRNA and NUPACK packages for small molecule and miRNA riboswitches,
respectively. We saw poor correlations for both (r of 0.06 and 0.01 for small molecule
and miRNA riboswitches, respectively; Figures S8 and 6). Several designs predicted
to have poor activation ratios (near or lower than 1.0) in fact gave
activation ratios near 10.0; and other designs predicted to have outstanding
activation ratios (greater than 100.0) gave experimental activation
ratios lower than 1.0 (Figure B). This experiment–theory correlation was better for
small-molecule riboswitches compared to the miRNA riboswitches, consistent
with the generally better activation ratios of the former, relative
to baseline measurements (compare Figures B and 5D; Table S1). Future design efforts would benefit
from more accurate computational models of RNA folding energetics.
We present all data collected herein as the ribologic-solves dataset (Supplemental Data) to help guide
and validate such improvements.
Figure 6
Comparison of predicted and measured activation
ratios. (A) For
small molecule riboswitches, the predicted activation ratio is somewhat
correlated with measured activation ratio. (B) For miRNA riboswitches,
the correlation between prediction and experiment is poor.
Comparison of predicted and measured activation
ratios. (A) For
small molecule riboswitches, the predicted activation ratio is somewhat
correlated with measured activation ratio. (B) For miRNA riboswitches,
the correlation between prediction and experiment is poor.Here, we have presented RiboLogic, an automated
algorithm for designing
stand-alone riboswitches that transduce input ligand binding into
output effector binding without energy input or amplification by other
molecular machines. We show that RiboLogic generates designs with
diverse structural mechanisms and achieves activation ratios comparable
to previous efforts in rational design of reversible riboswitches.
In combination with improved thermodynamic models and high-throughput
measurement techniques, we expect that this method and these data
will enable improved automated design of switchable RNA elements for
a wide variety of applications in biotechnology and medicine.
Methods
Design
Algorithm
Overview
Given secondary structure constraints in multiple
states defined by ligands or short RNA inputs, our method optimizes
an RNA sequence using a simulated annealing algorithm. The starting
sequence is arbitrarily set to all A’s, with the exception
of known sequence constraints and updates to ensure complementarity
in the target secondary structures. The length is specified by the
user and is not changed during sequence optimization. In each step,
a random mutation is made, and the new sequence is evaluated using
a base pair distance and a base pair probability score. The sequence
is updated on the basis of a Metropolis–Hastings acceptance
criterion:where ΔG is the difference
in score between the updated and current sequences and Tdesign is the temperature parameter. This temperature
parameter is decreased over the course of the optimization and can
be tuned by the user. By default, it decreases linearly from 5 to
1 over the course of design. This process is repeated until a satisfactory
sequence is found or the maximum number of iterations specified by
the user is reached.
Constraints
Sequence constraints
can include fixed
bases at specified positions as well as substrings that are disallowed
from the final sequence. Secondary structure constraints can be given
for multiple user-specified states, as defined by varying concentrations
of the input ligands. For small molecule and protein ligands, the
aptamer sequence, secondary structure, and dissociation constant must
be specified. For each state, secondary structure constraints can
be applied to any part of the input sequence, including any RNA inputs,
and bases can be specified to be unpaired, paired to any other base,
or paired with a specific other base. Secondary structure elements’
positions can be left unspecified, and RiboLogic will optimize its
position as well. To further ensure diversity, for the tests herein,
we enforced two different global arrangements of the aptamer and MS2
hairpin elements—one with the two parts of the aptamer loop
adjacent to each other and one with the MS2 sequence nested within
the aptamer segments.
Sequence Update
Sequences are represented
in a dependency
graph structure as described by Flamm et al.(25) Briefly, each base is a node and each base pair
in the constraints forms an edge between nodes. The graph is maintained
such that nodes connected by an edge are always complementary. Each
time a base is mutated, its entire connected component is mutated
accordingly to ensure that all nodes connected to the selected base
maintains complementarity. In addition, sequence constraints are incorporated
into this graph, disallowing mutations that would force a constrained
base to change. In the case of RNA inputs, our method provides the
option to automatically introduce the complement of the input sequence
into the design sequence in order to promote interactions between
strands. This complementary segment can be altered in length, moved,
or mutated as a sequence update step.
Scoring Functions
Two scoring functions are used: a
primary score based on a single minimum free energy secondary structure,
and a base pair probability-based secondary score that is used in
the primary score’s place when the it is the same between two
sequences. On the basis of the predicted minimum free energy structures
in each state, a base pair distance to the target secondary structure
is calculated. The base pair distance is the number of base pairs
that must be broken or formed in order to get from one secondary structure
to the other.[37] If only a substructure
is specified, this can include the breaking of base pairs formed with
nucleotides outside of the subsequence specified. In addition, for
small molecule riboswitches, if the energy of the ligand-bound conformation,
with energetic bonus, is not lower than the ligand-free conformation,
a penalty equal to the ΔG between the two states
is applied to the base pair distance.where ΔG–aptamer is the
free energy of the RNA alone in kcal/mol, [L] is
the concentration of the input ligand, K is the affinity of the input ligand, ΔG+aptamer is the free energy of the RNA constrained
to form the aptamer, R is the gas constant, T is the experimental temperature (37 °C = 310.15 K).
We consider only structures that form the desired aptamer, as opposed
to doing a minimum free energy calculation with an energetic bonus.
This allows the algorithm to guide the sequence toward those that
have a more favorable aptamer-forming conformation, even if it is
not the minimum free energy structure. We used a value of of 133 for FMN and 150 for theophylline
and tryptophan, based on initial K estimates for those input ligands (Figure S4) and experimental [L] = 200 μM,
2 mM, and 2.4 mM (FMN, theophylline, and tryptophan, respectively).However, since the score in eq is not highly sensitive to single mutations, a secondary
base pair probability score is used when the base pair distance is
unchanged between sequence updates. This measure of secondary structure
formation over the full ensemble is defined bywhere s is the index
of the
folding state, i and j are indices
of the base position in the sequence, X is an indicator variable representing whether
base i and j should be paired in
state s, and p is the probability of base i and j forming in state s according to the partition
function calculation. The value of the indicator variable is 1 if
the base pair should be formed, −1 if it should not be formed,
and 0 if it is unconstrained.Folding of each sequence can be
modeled using either ViennaRNA[38] or NUPACK.[39] NUPACK
3.0.5.[39] was used for design involving
more than one RNA, in order to properly model multistrand RNA folding,
while ViennaRNA 2.1.9[38] was used for designs
involving small molecule aptamers.The score used for the Metropolis–Hastings
criterion in eq was:By default, the sequence search terminates
once the base pair distance reaches 0 or the number of steps reaches
10 000 steps. The software also provides the option to continue
optimizing the sequence after the base pair distance reaches 0. Sequences
were not filtered in any way before proceeding to experimental characterization.
Computation and Code Availability
All computation was
performed on Intel Xeon Processors E5–2650. The code is available
at https://github.com/wuami/RiboLogic.Average computation time for the design of a ligand-induced
riboswitch varied widely, both across runs and depending on the design
problem (Figure S6). Every 1000 iterations
took about 2 min on one core.
High-Throughput Array Experiments
The experimental
methods have been described in detail previously.[20,22] Briefly, DNA templates for designs were synthesized (CustomArray,
Bothell, WA) and sequenced on Illumina MiSeq instruments, and RNA
was transcribed directly on the sequencing chip in a repurposed Illumina
Genome Analyzer II instrument. Fluorescently labeled MS2 protein was
introduced at concentrations from 1.5 nM to 3 μM at room temperature.
Incubation times varied from 0.8 to 1.5 h at the lowest concentrations
to 10–20 min at the highest concentrations. Fluorescence images
were collected and quantified to generate binding curves in buffer
of 100 mM Tris-HCl pH 7.5, 80 mM KCl, 4 mM MgCl2, 0.1 mg/mL
BSA, 1 mM DTT, 10 μg/mL yeast tRNA, 0.012% Tween20. These curves
were measured in the absence and presence of the ligand of interest,
with concentrations of 200 μM FMN, 2 mM theophylline, 4 mM tryptophan,
and 100 nM miR-208a. These conditions were selected based on the K of each ligand. Each design
was measured over an average of about 100 individual clusters on the
flow cell. Median fit K values over all clusters for each design were used to compute the
activation ratio. Designs were prepared and analyzed as part of the
Eterna massive open laboratory experiments (rounds R95, R101, and
R107).[42]Designs for which K measurements were made over
fewer than 10 clusters were excluded from our analysis to avoid poor
quality measurements. For diversity analysis, Levenshtein distance
was computed between each pair of sequences to obtain a distance matrix.
Complete-linkage hierarchical clustering was performed to obtain a
dendrogram with each design as a leaf (hclust in R). For statistical
analysis, two-sided Wilcoxon rank sum test was used to determine if
activation ratios between design types were significantly different.
Predicted K’s
were computed as described by Wayment-Steele et al.(10) Calculations were performed in R,[40] with example scripts available at https://github.com/wuami/RiboLogic. The full dataset is available as Supplementary Data.
Chemical Mapping
Experiments
One-dimensional chemical
mapping measurements were performed as described previously.[41] 1M7 was used for FMN and tryptophan aptamers,
while DMS was used for the theophylline aptamer.
Authors: Joseph N Zadeh; Conrad D Steenberg; Justin S Bois; Brian R Wolfe; Marshall B Pierce; Asif R Khan; Robert M Dirks; Niles A Pierce Journal: J Comput Chem Date: 2011-01-15 Impact factor: 3.376
Authors: Hannah K Wayment-Steele; Wipapat Kladwang; Alexandra I Strom; Jeehyung Lee; Adrien Treuille; Alex Becka; Rhiju Das Journal: Nat Methods Date: 2022-10-03 Impact factor: 47.990
Authors: Walter Thavarajah; Adam D Silverman; Matthew S Verosloff; Nancy Kelley-Loughnane; Michael C Jewett; Julius B Lucks Journal: ACS Synth Biol Date: 2019-12-20 Impact factor: 5.110
Authors: Tin Hoang Trung Chau; Dung Hoang Anh Mai; Diep Ngoc Pham; Hoa Thi Quynh Le; Eun Yeol Lee Journal: Int J Mol Sci Date: 2020-04-30 Impact factor: 5.923
Authors: Doo Nam Kim; Bernhard C Thiel; Tyler Mrozowich; Scott P Hennelly; Ivo L Hofacker; Trushar R Patel; Karissa Y Sanbonmatsu Journal: Nat Commun Date: 2020-01-09 Impact factor: 14.919