Sequence-defined synthetic oligomers and polymers are promising molecular media for permanently storing digital information. However, the information decoding process relies on degradative sequencing methods such as mass spectrometry, which consumes the information-storing polymers upon decoding. Here, we demonstrate the nondestructive decoding of sequence-defined oligomers of enantiopure α-hydroxy acids, oligo(l-mandelic-co-d-phenyl lactic acid)s (oMPs), and oligo(l-lactic-co-glycolic acid)s (oLGs) by 13C nuclear magnetic resonance spectroscopy. We were able to nondestructively decode a bitmap image (192 bits) encoded using a library of 12 equimolar mixtures of an 8-bit-storing oMP and oLG, synthesized through semiautomated flow chemistry in less than 1% of the reaction time required for the repetition of conventional batch reactions. Our results highlight the potential of bundles of sequence-defined oligomers as efficient media for encoding and decoding large-scale information based on the automation of their synthesis and nondestructive sequencing processes.
The storage of information produced by
human activities is essential
for civilization. The explosion in data production in recent decades
demands a corresponding expansion in data storage capacity. However,
conventional technologies based on magnetic, optical, and electronic
media consume a significant amount of physical space and energy for
storing and maintaining information.[1−3] Long-term storage also
requires the periodic refreshment of the data, given the deterioration
of media over time.[4] Large-scale information
storage in sequence-defined
biomacromolecules such as DNA has
evolved from curiosity-driven research to a promising alternative
to existing information storage technologies.[5−8] DNAs can store digital information
in their chemical structures using only a few atoms per bit, the unit
of digital information.[9−11] In addition, the structural integrity of these information-storing
macromolecules can be preserved for an extended period without requiring
additional energy for extensive cooling or the periodic refreshment
of the stored data.[12] Compared to DNA,
sequence-defined polymers are beneficial thanks to the simpler chemical
structures of monomers, ease of synthesis, and enhanced stability
in various conditions.[13,14] Therefore, sequence-defined oligomers
and polymers have attracted recent interest as information storage
media.[15−18]
However, realization of these macromolecular media for information
storage requires that several key challenges be overcome. In these
media, information is encoded in the form of sequences of the monomers
constituting the polymer chains through the repetitive coupling of
the individual monomers in a stepwise manner.[19−21] Consequently,
encoding information by chemical synthesis imposes significant cost
and time constraints for large-scale information storage. To overcome
this challenge, parallel synthesis of sequence-defined macromolecules
by automated and continuous processes is necessary to accelerate the
rate of chemical encoding.[22−26]
Decoding the information stored in sequence-defined macromolecules
also presents a challenge. Most of the current methods for decoding
the information stored in sequence-defined polymers rely on destructive
techniques such as tandem mass spectrometry, which involves the fragmentation
of the parent molecules.[27−31] Consequently, these methods inevitably consume the information-storing
polymers during each decoding attempt, making large-scale or additional
synthesis processes necessary for replenishing the polymers, especially
when frequent decoding is required. Therefore, nondestructive methods
for sequencing synthetic polymers must be developed for macromolecular
media to become practically suitable for information storage.[32−37] Recently, nanopore sequencing of oligonucleotides has been adopted
for decoding sequence-defined oligomers.[38−40]
On the
other hand, NMR spectroscopy, which can detect the structural
differences around the atom of interest, is widely used for the analysis
of microstructures (arrangements of enantiomeric repeating units along
the polymer backbone) of polymers. In particular, 13C NMR
has been used to measure comonomer distributions and tacticity of
stereoregular polymers such as vinyl polymers
and poly(l/d-lactide)s.[41−44] However, the results of 13C NMR spectroscopy only show the cumulative populations of
the relative orientation of the enantiomeric repeating units constituting
the polymer backbone;[45−47] this is especially true for high-molecular-weight
polymers with molecular weight distribution. Previously, Meyer and
co-workers reported the sequencing of the repeating fragments composed
of lactic, glycolic, or caproic acid by 1H and 13C NMR.[48,49] Despite these previous works, full sequencing
of oligoesters up to octamers has not been achieved.
Here, we
report the spectroscopic sequencing of sequence-defined
oligoesters composed of enantiopure α-hydroxy acids. A library
of sequence-defined oligomers of the enantiopure α-hydroxy acids,
oligo(l-mandelic-co-d-phenyl lactic
acid)s (oMPs), and oligo(l-lactic-co-glycolic
acid)s (oLGs) was constructed by semiautomated flow chemistry within
less than 1% of the reaction time required to prepare the same set
of oligomers by conventional batch reactions and the accompanying
purification processes. The sequence of each oligoester could be unambiguously
decoded from a single 13C nuclear magnetic resonance (NMR)
spectrum. In addition, we show that a maximum of 32 bits (4 bytes)
of digital information can be stored in an NMR sample containing an
equimolar mixture of oMP and oLG and that this information can be
decoded by a single 13C NMR measurement based on the nonoverlapping
chemical shifts in the sequence-indicating peaks of oMP and oLG. Our
results highlight the potential of bundles of sequence-defined oligomers
as efficient media for encoding and decoding large-scale information
through the automation of their synthesis and nondestructive sequencing
processes.
Results and Discussion
Accelerated Synthesis of Sequence-Defined Octamers of α-Hydroxy
Acids by Flow Chemistry
We employed a cross-convergent approach
to synthesize the sequence-defined oligomers in a step-economical
manner.[50−54] The cross-convergent approach involves the deconstruction of the
target sequence into smaller segments built from building blocks composed
of a minimum number of uniquely identifiable monomers. Therefore,
the sequence-defined building blocks obtained from the permutation
of the monomers covering all of the possible sequences of a minimal
number of repeating units are prerequisites for the cross-convergent
approach (Scheme 1).
The permutation in which l-mandelic acid (M) represents 0
and d-phenyl lactic acid (P) represents 1 yielded four dyads
(MM, MP, PM, and PP), which cover all possible sequences of the dimers
of M and P having protective groups for the hydroxyl and carboxyl
end groups. Cross convergence of these dyads would produce 16 tetramers
of M and P (tetrads) covering all of the possible sequences during
the divergent stage of the synthesis, such that the number of possible
products is maximized.
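The bit-to-monomer mapping and the enumeration of building blocks described above can be sketched in a few lines of Python. This is only an illustration of the combinatorics, not the authors' software; the function names are hypothetical, and the only facts taken from the text are that M encodes 0, P encodes 1, that there are four dyads, and that their cross convergence gives 16 tetrads.

```python
from itertools import product

# Mapping stated in the text: M (l-mandelic acid) = 0, P (d-phenyl lactic acid) = 1.
BIT_TO_MONOMER = {"0": "M", "1": "P"}
MONOMER_TO_BIT = {m: b for b, m in BIT_TO_MONOMER.items()}


def bits_to_sequence(bits: str) -> str:
    """Translate a bit string into an M/P monomer sequence."""
    return "".join(BIT_TO_MONOMER[b] for b in bits)


def sequence_to_bits(sequence: str) -> str:
    """Translate an M/P monomer sequence back into a bit string."""
    return "".join(MONOMER_TO_BIT[m] for m in sequence)


def all_blocks(length: int) -> list[str]:
    """Every possible M/P block of the given length."""
    return ["".join(p) for p in product("MP", repeat=length)]


# Four dyads; pairing every dyad with every dyad gives the 16 tetrads.
dyads = all_blocks(2)
tetrads = [left + right for left in dyads for right in dyads]

print(dyads)
print(len(tetrads))
print(sequence_to_bits("MPPPPMPM"))
```

Pairing every dyad with every dyad reproduces exactly the set of length-4 blocks, which is why a four-member dyad feed is sufficient to cover all tetrad sequences.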
Scheme 1
Synthesis of Sequence-Defined
Oligoesters
The cross-convergent synthesis of the sequence-defined
oligomers
and polymers relies on the repetition of a set of chemical reactions,
wherein the number of required synthetic steps increases in proportion
to the target molecular weight or the number of sequence-defined products.
To encode information in oligoesters at a rate higher than that of
conventional batch processes, we used a semiautomated method to
synthesize all the possible tetrads in a continuous flow process[26] (Figure 1). The controlled feeding of the desired dyads to the corresponding
deprotection reactor line, one for the desilylation of the tert-butyldimethylsilyl (TBDMS) group with trifluoroborane
etherate (BF3·Et2O) and the other for the
allyl transfer reaction to morpholine catalyzed by tetrakis(triphenylphosphine)palladium(0)
(Pd(PPh3)4), was achieved by using a set of
computer-controlled six-way valve systems. The hydroxyl product
resulting from the desilylation process was purified with water while
the carboxyl product from the deallylation process was purified with
a 1 M HCl aqueous solution in an in-line extractor. Finally, the two
deprotected precursors were converged for esterification while injecting
a coupling agent, 1-(3-dimethylaminopropyl)-3-ethylcarbodiimide hydrochloride
(EDC) and 4-(dimethylamino) pyridinium 4-toluenesulfonate (DPTS).
Figure 1
Flow chemistry
for encoding 8-bit information in sequence-defined
oMPs. A continuous flow synthesis of four dyads yielded a 16-tetrad
library. Subsequently, a sequence-defined octameric oMP, MPPPPMPM,
could be obtained by cross convergence of MPPP and PMPM using a semiautomated
flow system.
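The retrosynthetic bookkeeping behind the cross-convergent route can be illustrated with a short sketch that splits a target octamer into the two tetrads and four dyads it is assembled from. The helper names are hypothetical and the sketch covers only the sequence logic, not the deprotection or coupling chemistry; the example sequence is the one given in the Figure 1 caption.

```python
def split_in_half(block: str) -> tuple[str, str]:
    """Split a block into its Si-terminal and All-terminal halves."""
    if len(block) % 2 != 0:
        raise ValueError("block length must be even")
    mid = len(block) // 2
    return block[:mid], block[mid:]


def convergence_plan(octamer: str) -> dict[str, list[str]]:
    """List the tetrads and dyads needed to assemble an M/P octamer."""
    if len(octamer) != 8 or set(octamer) - set("MP"):
        raise ValueError("expected an 8-unit sequence of M and P")
    tetrads = list(split_in_half(octamer))
    dyads = [dyad for tetrad in tetrads for dyad in split_in_half(tetrad)]
    return {"tetrads": tetrads, "dyads": dyads}


plan = convergence_plan("MPPPPMPM")
print(plan)
```

For MPPPPMPM the plan returns the tetrads MPPP and PMPM, matching the caption, together with the dyads each tetrad is built from.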
However, the coupling yield of this fully continuous
setup was
lower (35–40% after purification) than that of the batch reaction
(typically >90%) to form an identical tetrad; this was due to the
residual byproducts produced during the deprotection steps, such as
allyl morpholinium and PPh3, which remained in the reaction
mixture after the allyl transfer reaction. To counter the detrimental
effects of these residual byproducts, we included an offline purification
step for their removal using an automated instrument for silica column
chromatography. This step could be completed within 15 min. The reinjection
of the purified dyads into the flow reactor for esterification improved
the coupling yield to 90% or greater, which was comparable to that
of the batch reaction. The repetition of this semiautomated process
allowed the synthesis of a 16-tetrad library to be completed on a
gram scale within 24 h.
The resulting tetrads of M and P were
subsequently subjected to
cross convergence to form an octameric oMP having the desired sequence
representing an 8-bit binary code. The target tetrads encoding 4-bit
fragments of information (200 mg) were injected into the flow reactor.
This was followed by offline purification via column chromatography.
This flow process produced an octameric oMP encoding the target 8-bit
information within 1 h; this period included the offline purification
step. The purified oMPs, obtained in yields similar to those of the tetrads,
were fully characterized by 1H and 13C NMR spectroscopy
and matrix-assisted laser desorption/ionization time-of-flight
(MALDI-TOF) mass spectrometry, which confirmed their purity.
Based on the nonoverlapping of the peaks of l-lactic acid
(L) and glycolic acid (G) units with those of the M and P units in 13C NMR spectroscopy, we also synthesized a sequence-defined
oligoester of L and G as an 8-bit-storing molecular medium using the
flow chemistry setup described above under the same conditions. The injection
of four dyads of L and G having the same protective groups as those
of the dyads of M and P into the flow chemistry setup yielded 16 tetrads
on the 0.5 g scale in 24 h. The purified tetrads, obtained in yields
of 88–92%, were subsequently converged to the targeted octameric
oLGs using the flow reactor employed for the synthesis of the oMPs.
The overall yields of oLGs from dyads were above 75%.
Nondestructive Sequencing of oMPs and oLGs by NMR Spectroscopy
The use of enantiomeric α-hydroxy acids as monomers for the
cross convergence to oMPs and oLGs gives the resulting oligoesters
an absolutely defined stereochemical configuration. This
simplified the analysis of the NMR signals, as it removed the complexities
arising from the splitting of the peaks caused by the uncertainty
of the chemistry and stereochemical configuration of the neighboring
monomers.
We first examined the 13C NMR spectra of
the tetrads of M and P to obtain information for decoding the sequence
of the constituting monomers of the oMP in relation to the neighboring
monomer units. We determined that the peaks corresponding to the ipso-carbons of the aromatic substituents of M and P could
be used for sequencing, as these peaks appeared as singlets at the
specific chemical shifts corresponding to the relative position of
the TBDMS protective group (Si-terminus) with respect to that of the
carboxyl terminus having an allyl protective group (All-terminus).
The positions of the monomers were numbered from 1 to 4 in the direction
from the Si-terminus to the All-terminus. The peaks corresponding
to the ipso-carbons of the four M units of the homotetrad
MMMM from the Si-terminus to the All-terminus appeared in the order
M(1), M(2), M(4), and M(3) at the chemical shifts of 138.64, 133.16,
133.16, and 132.76 parts per million (ppm), respectively. Similarly,
the 13C NMR spectrum of the tetrad PPPP showed a set of ipso-carbon peaks of the P repeating units in the order
P(1), P(2), P(4), and P(3) at 137.76, 136.13, 135.58, and 135.28 ppm,
respectively.
In contrast to the NMR spectra of the homotetrads,
MMMM and PPPP,
those of the heterotetrads indicated that the ipso-carbon of the phenyl group of the tetrads experiences an electronic
shielding/deshielding effect arising from the presence of aromatic
substituents in the vicinity. The ipso-carbons of
the M units were deshielded when they were neighbors with P units
but shielded when they were neighbors with M units. Similar trends
were observed in the case of the 13C NMR peaks of the ipso-carbons of the P units, which were shifted downfield
when the repeating units were neighbors with P units and upfield when
the repeating units were neighbors with M units. For example, the 13C NMR spectrum of MMPM shows the downfield shift of the ipso-carbon peaks of M(2) and M(4), in contrast to the spectrum
of MMMM owing to the presence of P(3) (Figure 2A). In the case of tetrad PPMP, the ipso-carbon peaks corresponding to P(2) and P(4) were upfield-shifted,
in contrast to the spectrum of PPPP, because of the presence of M(3)
(Figure 2B). These
slight variances in the chemical shifts of the peaks of the repeating
units of the oligoesters arising from the chemical structures of the
neighboring units were used to estimate the sequence of the repeating
units (Figure S1).
Figure 2
13C NMR spectra
of homotetrads and heterotetrads of
M and P units. (A) 13C NMR spectra of MMMM and MMPM. The
overlapped ipso-carbons of M units at positions 2
and 4 (133.16 ppm) were deshielded (133.55 and 133.30 ppm, respectively)
when neighboring with the P unit. (B) 13C NMR spectra of
PPPP and PPMP. The ipso-carbons of P units at positions
2 and 4 (136.13 and 135.58 ppm, respectively) were shielded (135.67
and 135.39 ppm, respectively) when neighboring with the M unit.
The sequencing of octameric oMPs by 13C NMR spectroscopy
was expectedly more complicated than the sequencing of tetrads because
of the increased number of possible sequences corresponding to the
eight peaks appearing within a narrow chemical shift range. To identify
a subtle change in the sequences of oMPs, we investigated a series
of 13C NMR spectra of oMPs having one M unit moving from
the 2-position to 7-position. Despite the identical composition of
monomers, these oMPs clearly exhibited the ipso-carbon
peaks corresponding to all repeating units at distinct chemical
shifts (Figure S13). Encouraged by these
results, we established the rules for decoding the sequence of oMPs
based on the electronic shielding/deshielding effects caused by the
neighboring monomers. These decoding rules are summarized in Figure 3A.
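As a rough illustration of how the first decoding rule works, the sketch below assigns an ipso-carbon peak to an M or P unit from its chemical shift, using the shift windows quoted for Rule i in the Methods (132.5–133.5 and 138–139 ppm for M; 134.5–137.5 ppm for P). The function names are hypothetical, the window edges are treated as inclusive by assumption, and the shifts in the usage lines are individual values quoted in the text, used only as test inputs and not as a measured spectrum.

```python
from typing import Optional

# Chemical-shift windows (ppm) quoted for Rule i in the Methods.
M_WINDOWS = [(132.5, 133.5), (138.0, 139.0)]
P_WINDOWS = [(134.5, 137.5)]


def in_any_window(shift_ppm: float, windows: list[tuple[float, float]]) -> bool:
    """True if the shift falls inside one of the windows (edges inclusive)."""
    return any(low <= shift_ppm <= high for low, high in windows)


def classify_ipso_peak(shift_ppm: float) -> Optional[str]:
    """Return 'M', 'P', or None for a peak outside both ipso-carbon regions."""
    if in_any_window(shift_ppm, M_WINDOWS):
        return "M"
    if in_any_window(shift_ppm, P_WINDOWS):
        return "P"
    return None


def count_units(peaks_ppm: list[float]) -> dict[str, int]:
    """Count peaks assigned to M and to P, ignoring unassigned peaks."""
    counts = {"M": 0, "P": 0}
    for shift in peaks_ppm:
        unit = classify_ipso_peak(shift)
        if unit is not None:
            counts[unit] += 1
    return counts


# Shift values mentioned in the text, used here only as illustrative inputs.
print(classify_ipso_peak(138.6))
print(count_units([138.6, 136.0, 133.0, 118.7]))
```

The 118.7 ppm value belongs to the allyl ω-carbon and is therefore left unassigned by this rule; the remaining rules, which depend on neighbor effects, are not covered by this sketch.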
Figure 3
NMR sequencing of octameric
oMPs. (A) Decoding diagrams of sequence-defined
octameric oMPs. (B) Decoding of octameric oMPs based on the 13C NMR spectrum and deciphered chemical structure of oMP. Red circles
indicate the chemical shift that should be checked for sequencing.
Based on these rules, we attempted NMR sequencing
of an oMP, Si-MPMMPPPM-All,
as shown in Figure 3B. The 13C NMR spectrum exhibited four peaks in the 134.5–137.5
ppm range, indicating that the oMP consists of four M and four P units
as per Rule i. The presence of the peak at 138.6 ppm suggests that
the units at positions 1 and 2 are M and P, respectively. Rule iii
suggests that an M unit is present at position 3, given the absence
of a peak at 136.0 ppm. The three units at the All-terminus are P,
P, and M units, which are present at positions 6, 7, and 8, respectively.
This is based on the peak of ω-carbon of the
allyl protective group at 118.7 ppm. Finally, given that no peak was
present at 133.0 ppm, Rule v suggests that an M unit is at position
4 and a P unit at position 5. The results of the nondestructive sequencing
of the oMP using the above-described rules to decode its 13C spectrum were verified by comparing them with the results of tandem
mass sequencing performed using a MALDI-TOF/TOF mass spectrometer
(Figure S15). Thus, this nondestructive
sequencing method can be applied repeatedly without a loss of the
oligoester, which remains intact in the solution. In addition, the
solution can be stored in a conventional NMR tube for more than a
year.
Similarly, the sequence of an oLG could be decoded by
analyzing
the 13C NMR spectrum to determine the chemical shifts of
the peaks corresponding to the α-carbons of the L and G units
composing the sequence-defined octamer. The positions of the monomers
of the oLG were assigned from A to H, starting from the Si-terminus
to the All-terminus. After confirming the decodability of a single
G unit at different positions of oLG, we composed the rules for decoding
the sequence of oLGs based on the peak of the α-carbon. These
decoding rules are summarized in Figure 4A.
Figure 4
NMR sequencing of octameric oLGs. (A) Decoding
diagrams of sequence-defined
octameric oLGs. (B) Decoding of octameric oLGs based on the 13C NMR spectrum and the deciphered chemical structure of oLG. Red
circles indicate the checkpoint for sequencing.
As a demonstration, Si-LLGGLLLL-All was sequenced
based on the
decoding rules, as shown in Figure 4B. The number of peaks present in the 68.0–70.0
ppm range suggests that the oLG is composed of six L and two G units
(Rule i). The presence of a peak at 68.0 ppm suggests that the first
unit at the A position is L. Rule iii suggests that an L unit is present
at position B, given the presence of a peak at 68.5 ppm. The absence
of a peak at 68.8 ppm suggests that the third repeating unit is G.
The appearance of a peak at 60.5 ppm is indicative of a fourth repeating
unit of G. Rule v suggests that an L unit is present at position H,
given the absence of a peak in the 61.0–61.2 ppm region. The
absence of a peak in the 69.4–69.5 ppm range suggests that
the seventh repeating unit is L. Finally, the appearance of a peak
at 69.2 ppm is indicative of an L unit at position F. Therefore, the
final sequence of the oLG was determined to be Si-LLGGLLLL-All, which
is in keeping with the proposed structure of oLG. Moreover, tandem
mass sequencing performed using an electrospray ionization (ESI) mass
spectrometer yielded a sequence identical to that obtained from NMR
sequencing of oLG (Figure S16).
Our
rules for decoding the sequences of oligoesters based on their 13C NMR spectra indicate that there is no duplication between
the 256 possible 13C NMR spectra of oMPs or the 256 possible
spectra of oLGs. This one-to-one correspondence between oligoesters
and their respective NMR spectra suggests that the sequences of these
oligoesters can be determined by comparing the acquired spectrum with
a library of the spectra of 8-bit-storing oligoesters. We envisage
that these presynthesized oligoesters covering all of the possible
sequences could be used as a pool of 8-bit packets to compose and
store large-size digital information that can be readily decoded by
nondestructive NMR sequencing.
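The library-comparison idea can be sketched as a simple peak-list lookup. The reference entries below are the ipso-carbon shifts reported above for the homotetrads MMMM and PPPP, which are the only complete peak lists given in the text; a real library would hold the 256 octamer spectra. The function names, the 0.05 ppm tolerance, and the representation of overlapping peaks as repeated values are all assumptions made for illustration.

```python
from typing import Optional

# ipso-carbon shifts (ppm) reported in the text for the two homotetrads.
REFERENCE = {
    "MMMM": (138.64, 133.16, 133.16, 132.76),
    "PPPP": (137.76, 136.13, 135.58, 135.28),
}


def matches(observed, reference, tol_ppm: float) -> bool:
    """Compare two peak lists after sorting, peak by peak, within a tolerance."""
    if len(observed) != len(reference):
        return False
    pairs = zip(sorted(observed), sorted(reference))
    return all(abs(obs - ref) <= tol_ppm for obs, ref in pairs)


def identify(observed, library=REFERENCE, tol_ppm: float = 0.05) -> Optional[str]:
    """Return the single library sequence matching the peaks, else None."""
    hits = [name for name, ref in library.items() if matches(observed, ref, tol_ppm)]
    return hits[0] if len(hits) == 1 else None


# Slightly perturbed copy of the MMMM entry, standing in for a new measurement.
print(identify([132.78, 133.15, 133.17, 138.62]))
```

Returning None when zero or several entries match keeps the lookup conservative: a sequence is reported only when the one-to-one correspondence described above actually holds for the observed peaks.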
Encoding and Decoding of Digital Information in Oligoesters
We chose two pairs of enantiomeric α-hydroxy acids, l-mandelic acid/d-phenyl lactic acid and l-lactic
acid/glycolic acid, to compose oMPs and oLGs with the aim of confirming
that there is no overlapping of the peaks of oligoesters having markedly
different substituent chemistries. The NMR spectra of these oligoesters
showed that their peaks of interest were present in different chemical
shift regions and did not overlap (Figure S17). Hence, the sequences of an oMP and oLG could be decoded simultaneously
from a single NMR spectrum of a mixture of the two oligoesters. Therefore,
a mixture of an oMP and oLG can store 16 bits, which can be decoded
based on a single 13C NMR measurement of the mixture.
To demonstrate this idea, we attempted the accelerated encoding of
information in sequence-defined oligoesters through flow chemistry
to show that the nondestructive sequencing of oligoesters by NMR spectroscopy
can be exploited for the archival storage and retrieval of digital
information (Figure 5). A bitmap image (192 bits) was converted into a chemical sequence
distributed into 12 sets of oMPs and oLGs. A set of an oMP and oLG,
each storing 8 bits, was constructed by repeating the semiautomated
flow processes using tetrads of M and P or L and G as the precursors.
The encoding of all 192 bits of information in the library of 12 oMPs
and oLGs could be completed within 12 h by running two flow processes
in parallel. The encoding time was significantly lower (∼1%)
than the time required to complete the synthesis of the same set of
oligoesters by repeating the conventional batch reaction and purification
processes. Following the decoding rules, the 12 sets of oMPs and oLGs
could be completely decoded (Figure S18).
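The distribution of a 192-bit image over 12 oMP/oLG pairs can be sketched as follows. Several details are assumptions made only for this illustration: the text states that M encodes 0 and P encodes 1, but the L = 0, G = 1 mapping, the choice to put the first byte of each 16-bit word in the oMP, and the use of tube order as the address are not specified. The bit pattern is a placeholder, not the image used in the study.

```python
# M = 0 and P = 1 follow the text; L = 0 and G = 1 is an assumption.
OMP_ALPHABET = {"0": "M", "1": "P"}
OLG_ALPHABET = {"0": "L", "1": "G"}
BITS_PER_OLIGOMER = 8
BITS_PER_TUBE = 2 * BITS_PER_OLIGOMER


def to_monomers(bits: str, alphabet: dict) -> str:
    """Translate bits into a monomer sequence using the given alphabet."""
    return "".join(alphabet[b] for b in bits)


def to_bits(sequence: str, alphabet: dict) -> str:
    """Translate a monomer sequence back into bits."""
    inverse = {monomer: bit for bit, monomer in alphabet.items()}
    return "".join(inverse[m] for m in sequence)


def encode_bitmap(bits: str) -> list:
    """Split a bit string into (oMP, oLG) pairs, one pair per tube."""
    if len(bits) % BITS_PER_TUBE != 0:
        raise ValueError("bit string length must be a multiple of 16")
    tubes = []
    for start in range(0, len(bits), BITS_PER_TUBE):
        word = bits[start:start + BITS_PER_TUBE]
        omp = to_monomers(word[:BITS_PER_OLIGOMER], OMP_ALPHABET)
        olg = to_monomers(word[BITS_PER_OLIGOMER:], OLG_ALPHABET)
        tubes.append((omp, olg))
    return tubes


def decode_tubes(tubes: list) -> str:
    """Rebuild the bit string from the ordered list of (oMP, oLG) pairs."""
    return "".join(
        to_bits(omp, OMP_ALPHABET) + to_bits(olg, OLG_ALPHABET)
        for omp, olg in tubes
    )


placeholder_bits = "01" * 96  # 192 placeholder bits, not the actual image
tubes = encode_bitmap(placeholder_bits)
print(len(tubes), tubes[0])
```

Because each tube holds one 16-bit word, 192 bits map onto exactly 12 tubes, and decoding the tubes in address order restores the original bit string.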
Figure 5
Encoding and decoding process of enantiopure oligoesters. The converted
bitmap image was encoded into 12 sets of sequence-defined octameric
oMP and oLG by a semiautomated flow process. Nondestructive decoding
of the 04 mixture revealed the absolute sequence of oMP
and oLG, which could be converted back to digital information (highlighted
with a red rectangle in the bitmap image).
The synthesized oligoesters were grouped into 12
sets of equimolar
mixtures of the oMP and oLG (1:2 w/w, 15 μmol), which were dissolved
in 0.5 mL of CDCl3 and stored in conventional NMR tubes
labeled 1–12, respectively. The assigned tube numbers corresponded
to the externally given addresses for the 12 sets of 16 bits of information.
Each tube was measured on an NMR spectrometer (Varian, 125 MHz for 13C) to acquire the spectrum. The sequencing of oMP and oLG
was completed using a single spectrum containing the ipso-carbon peaks of oMPs and the α-carbon peaks of oLG, which
did not overlap. The 12 spectra of the mixtures of oMP and oLG could
be decoded completely, and the stored information could be fully retrieved
within 1 h by comparing the spectra recorded with the minimum
number of scans (32 scans in 110 s) with the existing reference
spectra of oMPs and oLGs (Figure 6).
Figure 6
13C NMR spectra in a range of ipso-carbons
(left), ω-carbon (center), and α-carbons
(right) of the octameric mixture 09 with different numbers
of scans. The peaks in the spectrum with 32 scans (maroon) could be
distinguished and matched with the reference spectrum (black).
Oligoesters could be degraded by hydrolysis or
the epimerization
of α-proton in solution, in particular at high temperatures
or in the presence of basic catalysts.[55,56] The end groups
of oMPs and oLGs used in this study are protected by TBDMS and an
allyl ester group, which prevents degradation by hydrolysis. To investigate
stability, the oligoesters were contained in NMR tubes with a cap
and parafilm seal and without any additives. The decoding and retrieval
of the stored image could be achieved without any reading errors even
after 10 months using the same NMR samples (Figure S19). We also note that the storage capacity in the mixture
of oMP and oLG could be expanded by the correlation of two sequences.
For example, 4-bit information can be encoded by the correlation of
position 1 of oMP and position A of oLG, which would allow the mixture of
oMP and oLG to store 32 bits.
Conclusions
In conclusion, we demonstrated the nondestructive
sequencing of
oligoesters composed of enantiopure α-hydroxy
acids by NMR spectroscopy based on the sequence-specific peaks of absolutely configured oligoesters in their 13C NMR spectra. The sequence-defined octameric oligoesters
were synthesized at an accelerated rate by a step-economical cross-convergent
synthesis based on semiautomated flow chemistry while using a feed
of sequence-defined dyads and tetrads of M and P or L and G. The flow
chemistry-based synthesis of the information-storing oligoesters accelerates
the rate of encoding by a factor of 100 compared with the rate of
synthesis for conventional batch processes. The synthesized sequence-defined
octaesters, oMPs and oLGs, could both store 8-bit information. The 13C NMR spectra of the oMPs and oLGs contained the peaks arising
from the enantiopure monomers with respect to their relative positions
between the Si- and All-protective groups. This, in turn, allowed
for sequencing without the degradation of the information-storing
molecules. We also demonstrated that the nondestructive decoding of
the information-storing oligoesters can be combined with accelerated
encoding through flow chemistry to allow for the permanent storage
of digital information without requiring any additional energy or
synthesis processes for maintaining the stored information. Thus,
our results suggest that the bundles of sequence-defined oligomers
can serve as efficient media for storing large-scale information based
on their automated synthesis and nondestructive sequencing.
Methods
Materials
l-Lactic acid (≥98%), l-mandelic acid (≥99%), allyl bromide, tert-butyldimethylsilyl chloride, trifluoroborane etherate, tetrakis(triphenylphosphine)palladium
(0), and morpholine were purchased from Sigma-Aldrich and used without
further purification. Glycolic acid (≥98%) and 1-(3-dimethylaminopropyl)-3-ethylcarbodiimide
hydrochloride were purchased from Tokyo Chemical Industry and used
without purification. d-Phenyl lactic acid (≥95%)
was purchased from AK Scientific, Inc. and used without purification.
Dichloromethane was distilled over CaH2 under N2.
General Instruments
A Legato 101 syringe pump was purchased
from Kd Scientific. A Cadent 3TM syringe pump was purchased from IMI
Norgren. SEP-10 was purchased from Zaiput Flow Technologies. PFA tubing
(1/16″ OD/0.02″ and 0.03″ ID) was purchased from
Revodix. An Omnifit EZ column was purchased from Diba Industries Inc.
Gastorr AG-42-01 was purchased from GL Sciences. A CF-2 fraction collector
was purchased from Spectrum Chemical Mfg. Corp. Automated column chromatography
was performed on a Biotage Selekt flash chromatography purification
system equipped with a Sfär silica column cartridge. Hexane
and ethyl acetate were used as eluents. 1H NMR and 13C NMR spectra were recorded on a Varian INOVA 500 MHz NMR
spectrometer in CDCl3. ESI-MS analyses were performed on
a SCIEX TripleTOF 5600. MALDI-TOF MS/MS analyses were performed on
a Bruker Ultraflex TOF/TOF mass spectrometer.
Continuous Flow Synthesis for Sequence-Defined Tetrads
All permutations of enantiopure oMP and oLG tetrads were generated
by a continuous flow system consisting of a synthetic step and cleaning
step, which is operated via six-way syringe pumps connected with a
programmable controller. About 2 mmol of a dyad (1 M in DCM) and BF3·Et2O (7 M in DCM) were injected at 0.1 mL/min
and mixed through a T-mixer. The mixture was allowed to react in the
reaction loop (volume of 2 mL). Simultaneously, 2 mmol of a dyad (1
M in tetrahydrofuran (THF)) and Pd(PPh3)4 and
morpholine mixture (0.03 M and 1.05 M in THF) were injected at 0.1
mL/min and mixed through the T-mixer. The mixture was allowed to react
in the reaction loop (volume of 2 mL). Deprotected dyads, hydroxyl
and carboxylic acids, were purified by automated flash column chromatography
using HEX/EA and ether/MeOH eluents, respectively. Subsequently, the
purified dyads (1 M in DCM, 0.1 mL/min) were reinjected and mixed
with EDC·HCl and DPTS (0.7 M and 0.07 M in DCM, 0.2 mL/min).
The mixture was allowed to react in the reaction loop (volume of 6
mL). After the synthesis step, DCM (6 mL) was flushed through
the flow reactor to clean the reaction loop. Synthesis and cleaning
cycles were repeated 16 times for generating every permutation of
tetrads. Collected tetrads were purified with automated flash column
chromatography (86–92% yield).
Synthesis of Octameric Oligoesters by the Semiautomated Flow
System
The octameric oMPs and oLGs were obtained by flow
synthesis following the same conditions described for the synthesis
of tetrads. About 200 mg of a tetrad (0.25–0.5 M in DCM) and
BF3·Et2O (7 M in DCM) were injected at
0.1 mL/min and mixed through a T-mixer. The mixture was allowed to
react in the reaction loop (volume of 2 mL). Simultaneously, 200 mg
of a tetrad (0.25–0.5 M in THF) and Pd(PPh3)4 and morpholine mixture (0.03 and 1.05 M in THF) were injected
at 0.1 mL/min and mixed through the T-mixer. The mixture was allowed
to react in the reaction loop (volume of 2 mL). Deprotected tetrads
were purified by automated flash column chromatography. The purified
tetrads (0.25–0.5 M in DCM, 0.1 mL/min) were reinjected and
mixed with EDC·HCl and DPTS (0.7 M and 0.07 M in DCM, 0.3 mL/min).
The mixture was allowed to react in the reaction loop (volume of 9
mL). The collected octameric oligoester was purified with automated
flash column chromatography (85–91% yield).
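For orientation, the nominal residence time in each reaction loop can be estimated as the loop volume divided by the total flow rate. The sketch below assumes that each of the two streams entering a T-mixer runs at the flow rate stated for it (the text does not say whether the 0.1 mL/min quoted for the deprotection steps is per stream or combined) and ignores dispersion and mixer dead volume. The function name and the `steps` dictionary are hypothetical.

```python
def residence_time_min(loop_volume_ml, flow_rates_ml_per_min):
    """Nominal residence time (min) = loop volume / sum of inlet flow rates."""
    total_flow = sum(flow_rates_ml_per_min)
    if total_flow <= 0:
        raise ValueError("total flow rate must be positive")
    return loop_volume_ml / total_flow


# (loop volume in mL, [inlet flow rates in mL/min]) taken from the text,
# ASSUMING each stated rate applies to one inlet stream.
steps = {
    "deprotection (dyads and tetrads)": (2.0, [0.1, 0.1]),
    "tetrad coupling": (6.0, [0.1, 0.2]),
    "octamer coupling": (9.0, [0.1, 0.3]),
}

for name, (volume, flows) in steps.items():
    print(f"{name}: {residence_time_min(volume, flows):.1f} min")
```

Under these assumptions the estimates are roughly 10 min for the deprotections, 20 min for the tetrad coupling, and 22.5 min for the octamer coupling; if 0.1 mL/min were instead the combined deprotection flow, the first value would double.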
13C NMR Sequencing of Octameric oMPs
(Rule i) The number of M units in an oMP can be determined by counting the
number of peaks in the chemical shift regions of 132.5–133.5
ppm and 138–139 ppm. The number of peaks in the 134.5–137.5
ppm range is identical to the number of P units (Figure S2). (Rule ii) The monomer units at positions 1 and
2 can be determined from the peak of the ipso-carbon that is most strongly deshielded by the nearby
silicon atom (Figure S3). Within the range of
138.4–138.6 ppm, the peak of the ipso-carbon of Si-M(1) appears further downfield when it neighbors M(2)
than when it neighbors P(2). Similarly,
the peak of the ipso-carbon of Si-P(1) appears
in the range of 137.24–137.60 ppm, with its exact position depending on the adjacent monomer
unit. (Rule iii) The presence of a P unit at position 3 can be predicted
based on the peak of the monomer unit at position 2 (Figure S4). The peak corresponding to P(3) appears at 133.4
ppm in the case of M(2) and 135.9 ppm in the case of P(2). (Rule iv)
The ω-carbon peak of the allyl protecting group
appears at a characteristic chemical shift within the range of 118.36–119.13
ppm that depends on the sequence of the monomer units at positions 6–8
(Figure S5). (Rule v) The monomer units
at positions 4 and 5 can be predicted by an indirect method (Figure S6). With P at position 3, the presence
of the peak at 134.9 ppm is indicative of M(4). In the case of M(3)
and P(2), the peak at 133.0 ppm suggests that the P unit is at position
4. In the case of M(3) and M(2), the peak at 132.9 ppm suggests that
the P unit is at position 4.
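Rules i and ii lend themselves to a simple lookup over a peak list. The following is a minimal sketch, not the authors' analysis code: it assumes the spectrum has already been reduced to a list of peak positions in ppm, uses the chemical shift windows quoted above, and applies them to a made-up peak list (`example_peaks`) that is purely illustrative and not measured data. All function names are hypothetical.

```python
def in_window(ppm, low, high):
    """True if a peak position lies inside a closed chemical shift window."""
    return low <= ppm <= high


def count_omp_units(peaks):
    """Rule i: count M and P units from peaks in the quoted windows."""
    n_m = sum(
        1 for p in peaks
        if in_window(p, 132.5, 133.5) or in_window(p, 138.0, 139.0)
    )
    n_p = sum(1 for p in peaks if in_window(p, 134.5, 137.5))
    return n_m, n_p


def first_unit(peaks):
    """Rule ii (position 1 only): Si-M(1) vs Si-P(1) ipso-carbon window."""
    if any(in_window(p, 138.4, 138.6) for p in peaks):
        return "M"
    if any(in_window(p, 137.24, 137.60) for p in peaks):
        return "P"
    return None  # no diagnostic peak found


# Hypothetical peak list (ppm), invented for illustration only.
example_peaks = [138.5, 132.8, 133.1, 135.2, 135.9, 136.4, 134.9, 133.0]

print(count_omp_units(example_peaks))  # (4, 4)
print(first_unit(example_peaks))       # M
```

Assigning position 2 and the remaining rules would require the finer shift differences shown in Figures S3 to S6, which are not reproduced in the text and are therefore left out of the sketch.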
13C NMR Sequencing of Octameric oLGs
(Rule i) The number of L units in the oLG can be determined by counting
the number of peaks in the chemical shift region of 67.9–69.5
ppm. The number of peaks in the range of 60.2–61.5 ppm
is identical to the number of G units (Figures S7 and S8). (Rule ii) The monomer unit at position A
can be determined based on the presence of the peak of the α-carbon
at a specific chemical shift (Figures S9 and S10). The peak corresponding
to the α-carbon of L(A) appears at 68.0–68.1 ppm and that of G(A) appears at 61.3–61.5
ppm. (Rule iii) The monomer unit at position B can likewise be identified
from the presence of a peak in specific regions (Figures S9 and S10). The peak corresponding
to the α-carbon of L(B) appears at 68.5 ppm and that of G(B)
appears at 60.3–60.5 ppm. (Rule iv) The monomer units at positions
C and D are predicted by an indirect method (Figure S11). When the L unit
is at position B, the peak at 68.8 ppm suggests that the L unit is
at position C. In the case of L(B) and G(C), the peak at 60.5–60.6
ppm suggests that the G unit is at position D. In the case of L(B)
and L(C), the peak at 68.9 ppm suggests that the L unit is at position
D. The α-carbon of G(B) appears at 60.3, 60.4 or 60.5 ppm in
the case of neighboring G(C), L(C) and L(D), or L(C) and G(D), respectively.
When the G units are at positions B and C, the peak at 60.7 ppm suggests
that the G unit is at position D. (Rule v) The monomer unit at position
H is predicted by the presence of the α-carbon peak at a specific chemical
shift (Figure S12). The peak corresponding
to the α-carbon of G(H) appears at 61.0–61.2 ppm. (Rule
vi) The monomer units at positions F and G are predicted by an indirect
method (Figure S12). The α-carbon
of G(H) appears at 61.0, 61.1, or 61.2 ppm in the case of neighboring
L(F) and L(G), G(F) and L(G), or G(G), respectively. The peak corresponding
to the α-carbon of L(H) appears at 69.4 or 69.5 ppm in the case
of neighboring L(F) and G(G) or G(F) and G(G), respectively. In the
case of G(G) and G(H), the peak at 60.9 ppm suggests that the G unit
is at position F. In the case of L(G) and L(H), the peak at 69.2 ppm
suggests that the L unit is at position F.
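Part of the oLG rule set can be written as table lookups. The sketch below covers only rule i (unit counts), rule ii (position A), and the peak assignments of rules v and vi that fix positions F to H from the G(H) and L(H) peaks; positions B to D and the L(G)/L(H) branch are deliberately omitted. It assumes that peak positions are reported to 0.1 ppm and that the diagnostic peaks do not coincide with other signals, which real spectra may not guarantee. The function names, the tables, and the `example_peaks` list are hypothetical; the peak list is invented for illustration and is not measured data.

```python
def tenths(ppm):
    """Round a chemical shift to the nearest 0.1 ppm, as an integer count of tenths."""
    return int(round(ppm * 10))


# Rule vi lookups: shift (tenths of ppm) -> units at (F, G, H).
# None means the unit is not fixed by that peak alone.
G_H_PEAKS = {610: ("L", "L", "G"), 611: ("G", "L", "G"), 612: (None, "G", "G")}
L_H_PEAKS = {694: ("L", "G", "L"), 695: ("G", "G", "L")}


def count_olg_units(peaks):
    """Rule i: count L and G units from peaks in the quoted regions."""
    n_l = sum(1 for p in peaks if 67.9 <= p <= 69.5)
    n_g = sum(1 for p in peaks if 60.2 <= p <= 61.5)
    return n_l, n_g


def unit_at_a(peaks):
    """Rule ii: L(A) at 68.0-68.1 ppm, G(A) at 61.3-61.5 ppm."""
    shifts = {tenths(p) for p in peaks}
    if shifts & {680, 681}:
        return "L"
    if shifts & {613, 614, 615}:
        return "G"
    return None


def units_at_fgh(peaks):
    """Rules v and vi: assign (F, G, H) from the G(H) or L(H) peak."""
    shifts = {tenths(p) for p in peaks}
    for table in (G_H_PEAKS, L_H_PEAKS):
        for shift, (f, g, h) in table.items():
            if shift in shifts:
                # With G(G) and G(H), a peak at 60.9 ppm indicates G(F).
                if f is None and 609 in shifts:
                    f = "G"
                return f, g, h
    return None, None, None


# Hypothetical peak list (ppm), invented for illustration only.
example_peaks = [68.0, 68.5, 68.8, 68.3, 68.6, 69.0, 60.8, 61.1]

print(count_olg_units(example_peaks))  # (6, 2)
print(unit_at_a(example_peaks))        # L
print(units_at_fgh(example_peaks))     # ('G', 'L', 'G')
```

Working in integer tenths of a ppm avoids floating-point equality tests on shift values; a tolerance-based comparison would be the natural alternative for data reported at higher precision.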