Aaron W. Feldman, Floyd E. Romesberg. Department of Chemistry, The Scripps Research Institute, La Jolla, California 92037, United States.
Abstract
The information available to any organism is encoded in a four nucleotide, two base pair genetic code. Since its earliest days, the field of synthetic biology has endeavored to impart organisms with novel attributes and functions, and perhaps the most fundamental approach to this goal is the creation of a fifth and sixth nucleotide that pair to form a third, unnatural base pair (UBP) and thus allow for the storage and retrieval of increased information. Achieving this goal, by definition, requires synthetic chemistry to create unnatural nucleotides and a medicinal chemistry-like approach to guide their optimization. With this perspective, almost 20 years ago we began designing unnatural nucleotides with the ultimate goal of developing UBPs that function in vivo, and thus serve as the foundation of semi-synthetic organisms (SSOs) capable of storing and retrieving increased information. From the beginning, our efforts focused on the development of nucleotides that bear predominantly hydrophobic nucleobases and thus that pair not based on the complementary hydrogen bonds that are so prominent among the natural base pairs but rather via hydrophobic and packing interactions. It was envisioned that such a pairing mechanism would provide a basal level of selectivity against pairing with natural nucleotides, which we expected would be the greatest challenge; however, this choice mandated starting with analogs that have little or no homology to their natural counterparts and that, perhaps not surprisingly, performed poorly. Progress toward their optimization was driven by the construction of structure-activity relationships, initially from in vitro steady-state kinetic analysis, then later from pre-steady-state and PCR-based assays, and ultimately from performance in vivo, with the results augmented three times with screens that explored combinations of the unnatural nucleotides that were too numerous to fully characterize individually. 
The structure-activity relationship data identified multiple features required by the UBP, and perhaps most prominent among them were a substituent ortho to the glycosidic linkage that is capable of both hydrophobic packing and hydrogen bonding, and nucleobases that stably stack with flanking natural nucleobases in lieu of the potentially more stabilizing stacking interactions afforded by cross-strand intercalation. Most importantly, after the examination of hundreds of unnatural nucleotides and thousands of candidate UBPs, the efforts ultimately resulted in the identification of a family of UBPs that are well recognized by DNA polymerases when incorporated into DNA and that have been used to create SSOs that store and retrieve increased information. In addition to achieving a longstanding goal of synthetic biology, the results have important implications for our understanding of both the molecules and forces that can underlie biological processes, so long considered the purview of molecules benefiting from eons of evolution, and highlight the promise of applying the approaches and methodologies of synthetic and medicinal chemistry in the pursuit of synthetic biology.
The field of synthetic biology was first
defined in 1911 by Stéphane
Leduc[1] with the goal of creating new biological
forms and functions. The modern field is largely focused on using
the engineering-like approach of design, test, and standardize to
optimize naturally derived “parts”, most commonly novel
DNA elements. However, the most fundamental approach to create new
forms and functions is to develop unnatural base pairs (UBPs) that
expand the genetic alphabet and, thus, increase the amount of information
that may be stored in an organism’s DNA. The effort to expand
the genetic alphabet was first championed by Steven Benner and focused
on unnatural nucleotides bearing hydrogen-bonding (H-bonding) patterns
that are orthogonal to those employed by the natural base pairs (Figure ).[2] However, it was unclear if H-bonding was the only force
suitable for controlling base pairing. Indeed, the Kool laboratory
reported the remarkable observation that a DNA polymerase could selectively
insert the difluorotoluene analog of dTTP opposite dA in the template.[3] We and the Hirao laboratory thus initiated efforts
to use hydrophobic and packing forces to control UBP formation. Hirao
has focused on derivatizing natural purine and pyrimidine scaffolds
to create “shape complementary” UBPs (Figure ),[4] while we focused on nucleotides bearing nucleobase analogs with
little to no homology to their natural counterparts.
Figure 1
dNaM-dTPT3, dZ-dP, and dDs-dPx (R = H or −CH(OH)–CH2OH)
UBPs.
Although our efforts were consistent
with the tenets of modern
synthetic biology, we used synthetic chemistry to generate the required
parts, which is in many ways more consistent with Leduc’s original
vision.[1] Furthermore, we optimized the
parts using a medicinal chemistry-like approach, inspired by the perspective
that any effort endeavoring to develop foreign, man-made parts that
function within a cell will need to optimize solubility, cellular
uptake, stability, off-target activity, toxicity, dose, and dosing
regimen. The field of medicinal chemistry approaches the same problems
through the construction of structure–activity relationships
(SARs) that allow for empirical optimization in the absence of a complete
understanding of the process being optimized, a strategy that we adapted
for UBP development.

Here, we recount our efforts to develop
UBPs that ultimately culminated
in the discovery of the dNaM–dTPT3 UBP (Figure ), as
well as a family of related UBPs, all of which have now been used
to create semi-synthetic organisms (SSOs) that store[5,6] and retrieve[7] increased information.
This places us at the doorstep of realizing Leduc’s vision
of creating organisms with novel forms and functions.
Parts Optimization: First Generation UBP Candidates
The synthetic biology parts
required for the expansion of the genetic
alphabet are the unnatural nucleotides that selectively pair to form
a UBP, and their optimization, at least initially, was measured by
both duplex stability and DNA polymerase recognition. In most cases,
analogs were synthesized as triphosphates (referred to as dXTP, where X is a nucleoside analog), as well as phosphoramidites
for incorporation into DNA via solid phase synthesis. While characterization
of the stability of duplex DNA containing the UBPs revealed many interesting
trends, such as the effects of solvation,[8] stability proved uncorrelated with polymerase recognition, as is
also the case with natural base pairs,[9] and polymerase recognition quickly emerged as our primary focus.
The majority of our early efforts employed the exonuclease-deficient
Klenow fragment of E. coli DNA polymerase I (Kf).
We focused specifically on two steps, the insertion of an unnatural
triphosphate opposite its cognate analog in a template (a step that
we also refer to as UBP synthesis or unnatural nucleotide incorporation),
and the continued primer extension by insertion of the next correct
triphosphate, in each case characterizing efficiency (second-order
rate constant, kcat/KM) and fidelity ((kcat/KM)correct/(kcat/KM)incorrect). Steady-state
conditions were employed, which allowed rapid SAR construction but
only provided information about the rate-limiting step of insertion
or extension, both of which are actually a complex series of reactions
(e.g., triphosphate binding, conformational changes, phosphoryl transfer,
product release). During early development, this was not a problem
as the rate-limiting step was invariably phosphoryl transfer.

Our search began with the simple benzene and naphthalene nucleobase
analogs (Figure A).[10,11] We found that dDMTP is a poor polymerase substrate,
as incorporation opposite dDM or dTM in
the template was virtually undetectable. In contrast, dTMTP was more efficiently incorporated opposite dDM (kcat/KM = 1.4 × 10⁶ M⁻¹ min⁻¹) and
dTM (kcat/KM = 2.2 × 10⁶ M⁻¹ min⁻¹), only 20–30-fold less than the efficiency
with which a natural base pair is synthesized in the same sequence
context. However, both the dTM–dDM heteropair and dTM–dTM self-pair
are limited by poor extension and mispairing with dA, likely because
dA is the most hydrophobic of the natural nucleotides. We also found
that d2Np efficiently directs the insertion of d2NpTP (kcat/KM = 2.8 × 10⁶ M⁻¹ min⁻¹), but misincorporation of dATP was again problematic
(kcat/KM = 1.1 × 10⁵ M⁻¹ min⁻¹). Addition of a methyl substituent to the position ortho to the
glycosidic bond (hereafter referred to simply as the ortho position)
yielded d2MN, which generally increased the rate of incorporation
of the triphosphate against hydrophobic analogs in the template. However,
d2MNTP is again efficiently inserted opposite dA. Moving
the single methyl substituent from the ortho to the meta position
(d3MN) reduces insertion opposite dA but also reduces
the efficiency of UBP synthesis. The additional methyl substituent
of dDMN resulted in efficient pairing with both dTM and d2MN, but it also increased the rate of
dATP insertion when in the template. Thus, while mispairing generally
remained problematic, at this point it was clear that several of these
UBPs showed promising synthesis rates. However, none of them supported
continued primer extension at a detectable level (kcat/KM < 10³ M⁻¹ min⁻¹).
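The selectivity implied by these rate constants can be checked with simple arithmetic. The following is a minimal Python sketch rather than the authors' analysis code: the function and constant names are hypothetical, and fidelity is taken, as defined earlier in this section, to be the ratio of kcat/KM for the correct insertion to kcat/KM for the incorrect one. The two input values are those quoted above for a d2Np template.

```python
def fidelity(eff_correct: float, eff_incorrect: float) -> float:
    """Ratio of second-order rate constants (kcat/KM), correct over incorrect."""
    if eff_incorrect <= 0:
        raise ValueError("eff_incorrect must be positive")
    return eff_correct / eff_incorrect


# Efficiencies quoted in the text for a d2Np template, in M^-1 min^-1.
D2NPTP_OPPOSITE_D2NP = 2.8e6  # synthesis of the d2Np self-pair
DATP_OPPOSITE_D2NP = 1.1e5    # misincorporation of dATP

if __name__ == "__main__":
    ratio = fidelity(D2NPTP_OPPOSITE_D2NP, DATP_OPPOSITE_D2NP)
    print(f"d2NpTP is favored over dATP by roughly {ratio:.0f}-fold")
```

A preference of only about 25-fold for the unnatural triphosphate is consistent with the description of dATP misincorporation as problematic for this analog.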
Figure 2
(A) First generation
methylated benzene and naphthalene analogs.
(B) Second generation methylated benzene analogs. Sugar and phosphate
groups omitted for clarity.
The isocarbostyril scaffold also received early development
attention
(Figure A,B).[10,12−14] When in the template, the simplest of the series,
dICS, directs the insertion of triphosphates bearing
simple substituted benzene nucleobases, such as dDMTP,
dTMTP, or dDMNTP, with only modest efficiency,
but d2MNTP is inserted significantly more efficiently.
Addition of a 1-propynyl group to the 7-position of dICS, affording dPICS, results in a triphosphate that is
generally inserted more efficiently opposite other unnatural analogs.
The methyl group of dPIM resulted in efficient but indiscriminate
insertion of the triphosphate, while addition of the methyl group
of dMICS resulted in the indiscriminate templating of
natural triphosphates. Although the methyl group of d5MICS had little systematic effect on UBP synthesis, it did dramatically
reduce self-pairing. Despite variable rates of UBP synthesis, the
rates of extension remained problematic. While 6-aza substitution
(dNICS) results in 2-fold reduced self-pair synthesis,
it also results in a 2-fold increased rate of extension. Remarkably,
replacing the oxygen of dICS with sulfur (dSICS) results in an 80-fold increase in the rate of self-pair synthesis
and a 4-fold increase in the rate of extension. The combination of
both modifications in dSNICS retained the increased rate
of synthesis but further increased the rate of extension 12-fold (kcat/KM = 2.2 × 10⁴ M⁻¹ min⁻¹). These
results provided early hints as to the important but complicated role
of the ortho substituent.
Figure 3
(A) Isocarbostyril analogs. (B) Heteroatom derivatized
isocarbostyril
analogs. (C) Furan and thiophene fused pyridone and thiopyridone analogs.
Sugar and phosphate groups omitted for clarity.
A series of pyridones and thiopyridones fused to furan and
thiophene
rings in both meta- and para-linked orientations
were investigated (Figure C).[15] UBP synthesis with these
analogs was generally inefficient, with a para-linked furan appearing
to be particularly detrimental. The thiopyridone triphosphate analogs
are generally inserted more efficiently, but the effect of sulfur
was less pronounced than with the isocarbostyril scaffold, and none
of these analogs emerged as particularly promising.

Nucleoside
analogs bearing azaindole scaffolds (Figure ) were found to efficiently
pair with various unnatural nucleotides in the template, and with
reasonable selectivity against the natural triphosphates when in the
template, with the exception of dP7AI, which efficiently
templates the insertion of dCTP.[16] However,
while UBP synthesis again proved amenable to optimization, the resulting
UBPs were generally refractory to extension, which by now had emerged
as the greatest challenge to optimization.
Figure 4
Azaindole analogs. Sugar
and phosphate groups omitted for clarity.
A Structural Interlude
The dominant SAR that emerged from
these first generation UBP candidates
was that while the hydrophobicity and aromatic surface area of aromatic
nucleobase derivatives appeared promising for the optimization of
UBP synthesis, the resulting UBPs generally proved refractory to continued
primer extension. The NMR structure of a duplex containing the dPICS self-pair (Figure A) was solved to better understand this SAR.[17] These studies revealed a generally unperturbed duplex structure,
with the propargyl moieties of dPICS disposed as expected
in the major groove; however, considerable distortion was observed
at the site of the UBP itself. Rather than adopting the canonical
edge-on Watson–Crick geometry, the dPICS nucleobases
interact via cross-strand intercalation. This intercalation appears
to be driven by favorable stacking of the large aromatic surfaces
of each dPICS nucleobase and has been observed with other
nucleotides bearing nucleobase analogs with extended aromatic surface
area.[18] We hypothesized that the same mode
of pairing occurs at the primer terminus, which we envisioned would
account for the efficient rates of UBP synthesis, but also the inefficient
rates of continued extension, as deintercalation would be required
to appropriately position the primer terminus.
Figure 5
(A) Duplex structure
of DNA containing the dPICS–dPICS UBP.[16] (B) Duplex structure
of DNA containing the dNaM–d5SICS UBP.[36] (C) Structure of d5SICSTP paired opposite dNaM in the polymerase active site.[37] In chemical structures, sugar and phosphate
groups omitted for clarity.
Continued Parts Optimization: Second Generation UBP Candidates
Based on the SAR generated with the first generation analogs, we
began testing whether smaller nucleobase analogs could be optimized
for UBP synthesis, with the expectation that they would be less prone
to cross-strand intercalate. These second generation efforts started
with a more complete analysis of the simple benzene scaffold explored
previously (Figure B). The parent analog, dBENTP, is poorly recognized
by Kf.[19] We then explored dMM1, dMM2, and dMM3, but found little improvement.[19] The efficiency of triphosphate insertion was
progressively increased with dDMTP, dDM2TP, dDM3TP, dDM4TP, and dDM5TP, and insertion opposite dA was eliminated with dTMTP. Despite this progress with insertion, extension of primers terminating
with these analogs remained inefficient.

We next explored heteroatom
derivatization of these small scaffolds
with bromo-, fluoro-, and cyano-substituents (Figure A). Bromo- and cyano-substituents tended
to decrease the rate of mispairing with natural nucleotides,[20] but extension rates remained poor. A systematic
analysis of fluoro substitution identified a single meta substituent
(d3FB) as particularly interesting, with the resulting
self-pair both synthesized and extended with at least moderate efficiency,[21] but further optimization efforts proved unproductive.
Figure 6
(A) Bromo-,
cyano-, and fluoro-substituted benzene analogs. (B)
Methoxy-substituted benzene analogs. Sugar and phosphate groups omitted
for clarity.
The situation improved
with a family of methoxy-derivatized analogs
(Figure B), and those
possessing an ortho-methoxy substituent were the first to provide
UBPs that were consistently extended with at least reasonable efficiency.[22] In fact, with dMMO2 paired against
dTM in the template, the resulting primer was extended
with an efficiency that is only 36-fold lower than that of a natural
base pair in the same sequence context. Mutation of the polymerase
indicated that the increased extension resulted from the ability of
the ortho-methoxy group to accept an H-bond, and similar substituents
within the natural nucleobases are known to play a similar role.[23−25] Consistent with the need for an ortho H-bond acceptor at the primer
terminus, dTM paired opposite dMMO2 in the
template is only extended poorly (kcat/KM = 6.3 × 10³ M⁻¹ min⁻¹). Although the d3FB self-pair was an exception, the SAR strongly suggested that an ortho
H-bond acceptor was essential, and its inclusion emerged as a central
design theme.

A variety of heterocyclic N-
and C-nucleotides, which can also position an H-bond
acceptor at the position
ortho to the glycosidic linkage, were next examined[26−28] (Figures and 8). The pyridone analog triphosphates were inserted with only marginal
efficiency, but the resulting UBPs were extended with greater efficiency,
consistent with the proposed role of the ortho H-bond acceptor. Conversion
of dPYR to the corresponding thiopyridone (dSPYR) resulted in UBPs that were still reasonably well extended but synthesized
less efficiently.[29] No pyridine analogs
were efficiently inserted as triphosphates, but d2Py was
better extended when at the primer terminus than was d3Py or d4Py, again consistent with the importance of an
H-bond acceptor in the developing minor groove.[28] Problematically, however, the pyridine analogs paired more
efficiently with dATP than any unnatural triphosphates when in the
template, and no improvements were found with various alkyl or heteroatom
substituents or with an increased aromatic surface (dQL).[30]
Figure 7
Derivatized monocyclic pyridone analogs.
Sugar and phosphate groups
omitted for clarity.
Figure 8
Pyridine and substituted pyridine analogs. Sugar and phosphate
groups omitted for clarity.
The most pronounced SAR generated from these second generation
candidates was that efficient UBP synthesis requires a hydrophobic
ortho substituent, while efficient extension requires the same substituent
to be more hydrophilic, at least hydrophilic enough to act as an H-bond
acceptor. This apparent physicochemical dichotomy challenged our confidence
that both UBP synthesis and extension could be simultaneously optimized.
Parts Optimization: Third Generation UBP Candidates
With no clear
strategy to satisfy the conflicting demands of UBP
synthesis and extension, we pivoted to a screen-based strategy. Two
complementary screens were performed, one gel-based screen that focused
on UBP incorporation and extension under steady state conditions,
and one fluorescence-based plate screen that focused on the efficiency
and fidelity of full length synthesis.[29] Remarkably, from 3600 candidate UBPs, both screens identified dMMO2–dSICS as the most promising. This
UBP appeared to satisfy the physicochemical contradiction, because
the sulfur of the dSICS thioamide is relatively hydrophobic
but still able to accept an H-bond, while simple bond rotation allows
the methoxy group of dMMO2 to direct either a hydrophobic
methyl group or the oxygen lone pairs into the developing minor groove.

The identification of dMMO2–dSICS reinvigorated our design efforts. Steady-state kinetics revealed
that while dMMO2 and dSICS were both incorporated
and extended relatively efficiently (kcat/KM = 3.4 × 10⁵ M⁻¹ min⁻¹ and 1.7 × 10⁶ M⁻¹ min⁻¹, respectively, with
dSICS in the template, and 1.4 × 10⁶ M⁻¹ min⁻¹ and 1.1 × 10⁶ M⁻¹ min⁻¹, respectively, with
dMMO2 in the template), replication fidelity was limited
by dSICS self-pairing. Based on previous SAR, we explored
the addition of a methyl group to the distal aromatic ring of dSICS to introduce steric interactions to disfavor self-pairing.
These efforts identified d5SICS, and thus dMMO2–d5SICS emerged as our lead UBP.

We next
turned to the optimization of dMMO2TP insertion
opposite d5SICS, which was now the rate-limiting step
of replication (Figure ). Based on previous results that a meta-fluoro
substituent or an expansion of aromatic surface area increased the
efficiency of triphosphate incorporation, we explored the analogs
d5FM and dNaM (Figure B).[31] Gratifyingly,
opposite d5SICS, both d5FMTP and dNaMTP are more efficiently inserted than dMMO2TP. Although the former is limited by the synthesis and extension
of the dA–d5FM mispair, the efficiency
and fidelity of replicating DNA containing the dNaM–d5SICS UBP is excellent (kcat/KM = 5.0 × 10⁶ M⁻¹ min⁻¹ and 3.7 × 10⁷ M⁻¹ min⁻¹, for insertion of dNaMTP opposite
d5SICS and d5SICSTP opposite dNaM, respectively, and 1.2 × 10⁶ M⁻¹ min⁻¹ and 2.7 × 10⁶ M⁻¹ min⁻¹, respectively, for each subsequent extension).
In fact, dNaM–d5SICS was the first
of our UBP candidates to be amplified in a standard PCR reaction with
high efficiency and fidelity.[32] To determine
whether dNaM is optimal for pairing with d5SICS, we further explored derivatization of the dMMO2 scaffold
with Cl, Br, I, Me, and Pr meta-substituents, but
none improved replication.[33] We next examined
28 analogs with different para-substituents, and the SAR was augmented
with PCR amplification assays
using Taq, OneTaq, and KOD polymerases (Figure ).[34,35] Several of the most
promising analogs were further derivatized with a meta-fluoro or methoxy
substituent. These efforts identified dEMO, dFIMO, and dFEMO as better partners for d5SICS than dMMO2 (Figure B); however, none outperformed dNaM in these in vitro assays.
Figure 9
(A) Para- and meta-substituted
dMMO2 analogs. (B)
Optimized dMMO2 analogs. Sugar and phosphate groups omitted
for clarity.
Parts Standardization
The successful development of a synthetic biology “part”
includes not only its optimization for function, but also its standardization
for function in different contexts, which here corresponds to recognition
of the unnatural triphosphates by different DNA polymerases and within
different local sequence contexts. To explore the extent of standardization
of dNaMTP and d5SICSTP, we first examined
the fidelity with which DNA containing the corresponding UBP was PCR-amplified
in a variety of different sequence contexts using DeepVent, Taq, or Phusion polymerase, which demonstrated that retention
of the UBP was near or in excess of 99% per doubling.[36] DNA containing a UBP within a 40-nt stretch of randomized
natural nucleotides was PCR amplified 1024-fold with OneTaq, and the products were analyzed by deep sequencing.
A slight enrichment of the sequence 5′-GCNaM was
observed, but this bias is no greater than that observed with natural
sequences, demonstrating a sufficient level of standardization of
the dNaM–d5SICS UBP for in
vivo use.
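To put a per-doubling retention figure in perspective, the sketch below compounds it over a full amplification. It is illustrative only: the function names are hypothetical, retention is assumed to be constant for every doubling, and the 99% value (reported above for DeepVent, Taq, or Phusion) is combined with the 1024-fold amplification (reported above for OneTaq) purely as a worked example, not as a result from the source.

```python
import math


def doublings(fold_amplification: float) -> float:
    """Number of doublings needed to reach a given fold amplification."""
    if fold_amplification < 1:
        raise ValueError("fold_amplification must be at least 1")
    return math.log2(fold_amplification)


def overall_retention(per_doubling: float, fold_amplification: float) -> float:
    """Fraction of product retaining the UBP, assuming constant retention per doubling."""
    if not 0.0 <= per_doubling <= 1.0:
        raise ValueError("per_doubling must lie between 0 and 1")
    return per_doubling ** doublings(fold_amplification)


if __name__ == "__main__":
    n = doublings(1024)
    kept = overall_retention(0.99, 1024)
    print(f"{n:.0f} doublings at 99% retention each keep {kept:.1%} of the UBP")
```

Under these assumptions, a 1024-fold amplification corresponds to 10 doublings, and 99% retention per doubling leaves roughly 90% of the amplified DNA still carrying the UBP.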
A Second Structural Interlude
Having
identified a family of promising UBPs, we again returned
to a structural characterization. Collaborating with the Dwyer group,
we first solved the structure of a DNA duplex containing dMMO2–d5SICS. Surprisingly, the UBP still formed a
cross-strand intercalated structure,[37] and
solution-state NOEs revealed that the dNaM–d5SICS UBP did as well (Figure B),[38] although significantly
less so than the dPICS self-pair. While our interests
centered around function and not structure, this mode of pairing with
dNaM–d5SICS raised a perplexing question:
because the UBP resembles a natural mispair more than a correct pair,
how is it efficiently recognized by DNA polymerases, which are known
to have evolved to reject triphosphates that form mispairs?

To address this question, we collaborated with the Marx group to
solve the structures of the binary complex of KlenTaq DNA polymerase
bound to a primer–template with dNaM in the templating
position and with a primer terminating with a ddC, as well as the
corresponding ternary complex with d5SICSTP bound. The
data revealed that formation of the UBP drives the same large conformational
change of the polymerase caused by the formation of a natural base
pair (Figure )[39,40] and, remarkably, that the conformational change of the polymerase
reciprocally drives a conformational change within the UBP, causing
it to adopt an edge-to-edge paired natural Watson–Crick-like
structure (Figure C). Thus, while a natural base pair is replicated with an induced-fit
mechanism, the UBP is replicated with a similar, but subtly different,
mutually induced-fit mechanism, providing a mechanistic explanation
for its efficient replication. Nonetheless, after synthesis, the nascent
UBP again adopts a cross-strand intercalated structure,[41] explaining the SAR data that identified a requirement
for deintercalation prior to extension. At this point, we surmised
that any further optimization of the UBP would require the careful
optimization of intrastrand packing with neighboring natural nucleotides
over cross-strand intercalation.
Figure 10
(A) Superimposition of the binary complex
of KlenTaq polymerase
bound to DNA with dNaM in the templating position in
the open conformation (yellow) and the corresponding ternary complex
bound to d5SICSTP in the closed conformation (purple).
(B) Superimposition of ternary complex between KlenTaq polymerase,
dNaM template DNA, and d5SICSTP (purple),
or a natural dG template and dCTP (gray). Reproduced from ref (38). Copyright 2012 Nature
Publishing Group.
Growing the Family of in Vitro Replicated UBPs
Based on the proposed mechanism of replication, we speculated
that
dNaM–d5SICS might be optimized by
distal ring contraction and heteroatom derivatization of d5SICS, potentially favoring deintercalation, and thus extension, while
preserving synthesis efficiency. Thus, we explored four derivatives
with the distal ring replaced with para- or meta-linked thienyl, methyl
furanyl, or methyl thienyl rings, as well as an additional derivative
to explore fluorination at the meta position (Figure ).[42] Gratifyingly,
we found that dTPT2, dTPT3, and dFTPT3 form UBPs with dNaM that are more efficiently replicated
within DNA than the dNaM-d5SICS UBP (as
demonstrated by a pre-steady-state kinetic assay, as the steady-state
rates were now limited by product dissociation[43]). The most efficiently replicated was dNaM–dTPT3, which thus emerged as our lead UBP.
Figure 11
Distal
ring-contracted d5SICS analogs. Sugar and phosphate
groups omitted for clarity.
At this point, we had explored several analogs of both dMMO2 and d5SICS since the identification of dMMO2–dSICS, but we had never explored
these analogs
as partners with previous generation analogs. Thus, using a PCR-based
screen, we examined the amplification of DNA containing 111 different
unnatural nucleotides (resulting in approximately 6000 candidate UBPs)
drawn from our now expanded set of analogs. While we found that dNaM–dTPT3 is generally the most efficiently
replicated of the UBPs examined, we identified seven additional and
physicochemically distinct UBPs that are replicated significantly
better than dNaM–d5SICS (Figure ). Again drawing
on established tenets of medicinal chemistry, these results are important,
because our long-term goal of using the UBPs in a living SSO brings
with it additional constraints, which may be differently satisfied
by UBPs with differing physicochemical properties.
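As a rough check on the scale of the PCR-based screen, the figure of approximately 6000 candidate UBPs is consistent with simple pair counting over 111 analogs. The sketch below assumes, which the text does not state explicitly, that every unordered pairing of the 111 unnatural nucleotides was treated as a candidate; the function name `candidate_pairs` and the option to include self-pairs are hypothetical and used only for illustration.

```python
from math import comb


def candidate_pairs(n_nucleotides: int, include_self_pairs: bool = False) -> int:
    """Count unordered candidate pairs among n_nucleotides analogs.

    Assumption (not from the source): every analog may be paired with
    every other analog, and optionally with itself.
    """
    pairs = comb(n_nucleotides, 2)  # distinct unordered hetero pairs
    if include_self_pairs:
        pairs += n_nucleotides      # add one self-pair per analog
    return pairs


if __name__ == "__main__":
    print(candidate_pairs(111))                            # hetero pairs only
    print(candidate_pairs(111, include_self_pairs=True))   # hetero pairs plus self-pairs
```

Under this assumption the count is 6105 without self-pairs and 6216 with them, either of which rounds to the approximately 6000 candidates quoted above.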
Figure 12
A family of well replicated dNaM–dTPT3-like UBPs. Sugar and phosphate groups omitted for clarity.
In Vivo Performance and Optimization
In 2014, we demonstrated that when dNaMTP and d5SICSTP are imported into Escherichia coli via transgenic expression of an algal nucleoside triphosphate transporter, they are used by cellular polymerases to replicate DNA containing the UBP.[5] However, unlike in vitro replication, replication in this SSO showed significant sequence biases, some of which were still observed with the dNaM–dTPT3 UBP.[6] This is perhaps not surprising considering that the in vivo replication environment is distinct from the in vitro environment, which had been used to generate the SAR that drove UBP optimization and standardization. Thus, we screened 135 candidate UBPs, drawn from 91 unnatural triphosphates selected to cover the range of analogs that had been explored in vitro, for those that, when added to the media, were able to support high-level retention of the corresponding UBP on a plasmid within the SSO.[44] Much of the SAR generated was consistent with that generated in vitro, but there were several interesting differences. In particular, the in vivo environment was somewhat more permissive to the nature of the ortho substituent that had proven so critical for in vitro replication (although the best UBPs still retained these H-bond acceptors). Perhaps more importantly, the in vivo screen identified four additional UBPs, each formed by pairing a dNaM analog opposite dTPT3, that are more efficiently replicated and with less sequence bias than dNaM–dTPT3 (Figure 13), the most promising of which was dCNMO–dTPT3. While this UBP is at present the most promising lead for further SSO development, it is again the physicochemical diversity offered by a family of well replicated UBPs that is likely to prove most valuable as our efforts shift toward achieving in vivo transcription and translation, which we have only just begun to explore with dNaM–dTPT3.
Figure 13
A family of UBPs optimized for in vivo expansion of the genetic alphabet. Sugar and phosphate groups omitted for clarity.
Conclusions and the Chemist’s Approach to Synthetic Biology
We have used synthetic chemistry, coupled with the methods of medicinal chemistry, to develop a family of UBPs that function not only in vitro but also in vivo and have used them to create SSOs that can store more information in their DNA. The SARs elucidated from the examination of over 150 unnatural nucleotides have guided development and identified key elements that the unnatural nucleotides must possess, most clearly, an ortho substituent that is capable of providing a hydrophobic surface as well as an H-bond acceptor and a nucleobase surface that favors intrastrand packing over cross-strand intercalation. While this Account has recounted our efforts to optimize replication, we have also recently demonstrated that DNA containing the UBPs may be transcribed into RNA in an SSO and used during translation at the ribosome to produce proteins with noncanonical amino acids.[7] This lays the foundation for the creation of SSOs with forms and functions not available to their natural counterparts and thereby achieves a long-standing goal of synthetic biology.
At its core, synthetic biology aims to create parts that function within living cells, imparting them with novel attributes. While the tenets of the field were originally implicitly founded on the use of chemistry to create those parts, its modern incarnation has focused on the use of parts assembled from natural components or components intended to mimic their natural counterparts. This would seem justified by the eons of evolution that optimized the natural components for functioning in a cell, at least for a similar function. However, most natural components are recognized by or interact with multiple other components in every cell, possibly in unknown ways, and thus their introduction may have unintended consequences. While truly synthetic parts made by chemists do not benefit from eons of evolution, they are foreign to cells, possibly even drawing upon forces not used by their natural counterparts, and thus they may be more orthogonal and possibly easier to introduce and optimize without perturbation. In this case, optimization must proceed with less information, because less is known about how the parts might interact with their biological targets, and the methodology and lessons of medicinal chemistry, which conceptually face essentially the same challenges, provide the blueprint for success. If the combination of synthetic and medicinal-like chemistry can produce molecules that function alongside those evolved by nature for the most central of its processes, to store and retrieve information, then this approach is likely to be capable of discovering molecules that effectively participate in any biological process, potentially opening a new vista for chemists in synthetic biology.