Andrew B MacConnell, Patrick J McEnaney, Valerie J Cavett, Brian M Paegel.
Abstract
The promise of exploiting combinatorial synthesis for small molecule discovery remains unfulfilled due primarily to the "structure elucidation problem": the back-end mass spectrometric analysis that significantly restricts one-bead-one-compound (OBOC) library complexity. The very molecular features that confer binding potency and specificity, such as stereochemistry, regiochemistry, and scaffold rigidity, are conspicuously absent from most libraries because isomerism introduces mass redundancy and diverse scaffolds yield uninterpretable MS fragmentation. Here we present DNA-encoded solid-phase synthesis (DESPS), comprising parallel compound synthesis in organic solvent and aqueous enzymatic ligation of unprotected encoding dsDNA oligonucleotides. Computational encoding language design yielded 148 thermodynamically optimized sequences with Hamming string distance ≥ 3 and total read length <100 bases for facile sequencing. Ligation is efficient (70% yield), specific, and directional over 6 encoding positions. A series of isomers served as a testbed for DESPS's utility in split-and-pool diversification. Single-bead quantitative PCR detected 9 × 10^4 molecules/bead and sequencing allowed for elucidation of each compound's synthetic history. We applied DESPS to the combinatorial synthesis of a 75,645-member OBOC library containing scaffold, stereochemical and regiochemical diversity using mixed-scale resin (160-μm quality control beads and 10-μm screening beads). Tandem DNA sequencing/MALDI-TOF MS analysis of 19 quality control beads showed excellent agreement (<1 ppt) between DNA sequence-predicted mass and the observed mass. DESPS synergistically unites the advantages of solid-phase synthesis and DNA encoding, enabling single-bead structural elucidation of complex compounds and synthesis using reactions normally considered incompatible with unprotected DNA.
The widespread availability of inexpensive oligonucleotide synthesis, enzymes, DNA sequencing, and PCR makes implementation of DESPS straightforward and may prompt the chemistry community to revisit the synthesis of more complex and diverse libraries.
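The Hamming distance constraint on the encoding language can be checked directly. The sketch below is illustrative only and is not the authors' design software: it computes all pairwise Hamming distances among the ten set 1 coding regions listed in the thermodynamic parameter table and returns the smallest one. The function names are assumptions introduced for this example.

```python
from itertools import combinations

# Set 1 coding regions ([+] strand), copied from the thermodynamic parameter table.
SET1 = [
    "TGGAAAGT", "ACGGAGCA", "TTGGAGTT", "AAGGAGGT", "AGAAAGCA",
    "ACAGAACT", "TAAGGAGT", "ATGGGAGT", "TGAAGGAA", "TTGAGGAT",
]

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(seqs: list[str]) -> int:
    """Smallest Hamming distance over all unordered pairs of sequences."""
    return min(hamming(a, b) for a, b in combinations(seqs, 2))

if __name__ == "__main__":
    print(min_pairwise_distance(SET1))
```

For these ten sequences the minimum evaluates to 3, consistent with the stated constraint. A minimum distance of 3 means a single substitution error in a read still leaves it closer to the intended code word than to any other.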
Keywords: DNA-encoded libraries; combinatorial synthesis; one-bead-one-compound; split-and-pool
Year: 2015 PMID: 26290177 PMCID: PMC4571006 DOI: 10.1021/acscombsci.5b00106
Source DB: PubMed Journal: ACS Comb Sci ISSN: 2156-8944 Impact factor: 3.784
Figure 1. DNA-encoded solid-phase synthesis. (A) TentaGel Rink-amide resin (160-μm diameter) is first elaborated with a common linker (gray) containing a coumarin chromophore and arginine. Linker resin is further functionalized with an alkyne and Fmoc-protected glycine. Azide-functionalized DNA headpiece (HDNA), consisting of two complementary strands of DNA (cyan) covalently joined via two PEG tethers (magenta), is coupled substoichiometrically (0.004 equiv) to alkyne sites via CuAAC, yielding bifunctional-HDNA resin (Fmoc-protected amine for chemical coupling and 5′-phosphoryl-CC-3′ overhang for enzymatic cohesive end ligation). (B) A forward primer module (green) is first enzymatically ligated to resin. Encoded synthesis proceeds as alternating steps of monomer coupling (scaffold elements shown in purple hues, side chain elements shown in orange hues) and coding module ligation (correspondingly in purple or orange hues). After the last encoding step, a reverse primer module (green) is ligated. The finished resin displays oligomer and a structure-encoding DNA message flanked by primer binding sequences for PCR amplification. (C) The DNA sequence encodes the series of reaction conditions that the bead experienced. Here, the DNA sequence encodes acylation with chloroacetic acid, treatment with methylamine, acylation with (2S,3E)-5-chloro-2,4-dimethyl-3-pentenoic acid, treatment with 3-methoxypropylamine, and acylation with N-Fmoc-L-proline followed by Fmoc removal.
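Panel C amounts to a table lookup: each encoding position carries a code word, and each code word stands for one reaction step. The sketch below illustrates that idea only. The codebook is hypothetical: the code-word-to-reaction assignments and the two-position example are invented for illustration and are not taken from the paper, and the function assumes the coding regions have already been extracted from the read in position order.

```python
# HYPOTHETICAL codebook: (encoding position, coding region) -> reaction step.
# The sequence-to-reaction assignments below are placeholders, not the
# assignments used in the published library.
CODEBOOK = {
    (1, "TGGAAAGT"): "acylation with chloroacetic acid",
    (2, "AGCTTCAG"): "treatment with methylamine",
    (3, "TTGGAGTT"): "acylation with (2S,3E)-5-chloro-2,4-dimethyl-3-pentenoic acid",
    (4, "TACCTCGA"): "treatment with 3-methoxypropylamine",
    (5, "AAGGAGGT"): "acylation with N-Fmoc-L-proline, then Fmoc removal",
}

def decode(coding_regions: list[str]) -> list[str]:
    """Translate position-ordered coding regions into a synthetic history."""
    history = []
    for position, code in enumerate(coding_regions, start=1):
        key = (position, code)
        if key not in CODEBOOK:
            raise ValueError(f"no codebook entry for position {position}, code {code!r}")
        history.append(CODEBOOK[key])
    return history

if __name__ == "__main__":
    read = ["TGGAAAGT", "AGCTTCAG", "TTGGAGTT", "TACCTCGA", "AAGGAGGT"]
    for step_number, step in enumerate(decode(read), start=1):
        print(step_number, step)
```

Keying the lookup on position as well as sequence lets the same code word mean different things at different positions, which is one plausible way to reuse a small set of sequences across steps.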
Scheme 1. DNA-Encoded Solid-Phase Synthesis Reaction Sequence
Figure 2. Encoding language design and optimization. (A) Each target heteroduplex coding module (schematic at top) is composed of two hybridized oligonucleotide strands. Each strand is 5′-phosphorylated (yellow “P”) and displays a strand-specific overhang sequence (orange or purple) and a complementary coding region (gray background). (B) Sufficiently self-complementary sequences may form undesired homoduplexes. Enforcing a coding region sequence structure of either 5′-NNRRRRNN-3′ or 5′-NNYYYYNN-3′ decreases the stability of potential homoduplexes relative to the target heteroduplex. (C) Some sequences (e.g., homopolymers) can form stable off-target heteroduplexes with occluded, unreactive overhangs. (D) Self-complementary sequences can form intramolecular secondary structures (hairpins) that prevent target heteroduplex formation.
Encoding Sequence Thermodynamic Parameters and Ligation Yield
Set 1

| ID | coding [+] | TM,het | ΔGhet | ΔGhomo | ΔGhomo | ΔΔGhet/homo | ΔΔGhet/homo | TM,hp | ΔTM,hp | ΔGhet,2° | ΔΔGhet/het,2° | OH1 yield (%) | OH3 yield (%) | OH5 yield (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1X01 | TGGAAAGT | 37.1 | –13.4 | –3.9 | –5.0 | –9.5 | –8.4 | – | – | –3.6 | –9.8 | 70 | 71 | 68 |
| 1X02 | ACGGAGCA | 49.9 | –16.3 | –3.6 | –6.9 | –12.7 | –9.4 | – | – | –3.6 | –12.7 | 70 | 70 | 63 |
| 1X03 | TTGGAGTT | 37.1 | –13.4 | –1.6 | –5.0 | –11.8 | –8.4 | 1.9 | 35.2 | –3.6 | –9.8 | 72 | 73 | 69 |
| 1X04 | AAGGAGGT | 40.7 | –14.2 | –4.9 | –4.7 | –9.3 | –9.5 | – | – | –3.6 | –10.6 | 75 | 74 | 66 |
| 1X05 | AGAAAGCA | 38.5 | –13.8 | –3.5 | –3.1 | –10.2 | –10.6 | 20.6 | 17.9 | –3.6 | –10.1 | 74 | 74 | 67 |
| 1X06 | ACAGAACT | 36.5 | –11.4 | –3.5 | –2.0 | –7.8 | –9.4 | – | – | –3.6 | –7.7 | 72 | 72 | 61 |
| 1X07 | TAAGGAGT | 33.5 | –12.1 | –4.9 | –3.1 | –7.2 | –9.0 | – | – | –3.6 | –8.5 | 72 | 74 | 68 |
| 1X08 | ATGGGAGT | 40.9 | –14.1 | –5.4 | –6.5 | –8.7 | –7.6 | – | – | –3.6 | –10.5 | 74 | 75 | 65 |
| 1X09 | TGAAGGAA | 36.3 | –13.7 | –3.5 | –3.5 | –10.1 | –10.1 | – | – | –3.6 | –10.0 | 71 | 73 | 66 |
| 1X10 | TTGAGGAT | 35.4 | –13.2 | –1.6 | –3.1 | –11.6 | –10.1 | 20.0 | 15.4 | –4.6 | –8.6 | 75 | 75 | 70 |
Strand designations are [+] for top strand and [−] for bottom strand.
TM,het = melting temperature of the target heteroduplex (50 mM Na+, 10 mM Mg2+, 1 mM nucleotide triphosphates, 10 μM each oligonucleotide).
ΔGhet = target heteroduplex free energy of formation.
ΔGhomo = most stable homoduplex (of all overhang-appended parents) free energy of formation.
ΔGhet,2° = most stable off-target heteroduplex (of all overhang-appended parents) free energy of formation.
TM,hp = highest hairpin melting temperature (of all overhang-appended parents); entries marked “–” yielded no predicted hairpin formation; no hairpins predicted for any overhang-appended [+] parent.
ΔΔGhet/homo = ΔGhet – ΔGhomo.
ΔΔGhet/het,2° = ΔGhet – ΔGhet,2°.
ΔTM,hp = TM,het – TM,hp.
OHX yield = experimentally measured ligation yield of overhang-appended parent for each set’s overhangs (OH1, OH3, and OH5 for set 1; OH2, OH4, and OH6 for set 2). OH1 = 5′-ATGG-3′; OH2 = 5′-TCA-3′; OH3 = 5′-GTT-3′; OH4 = 5′-CTA-3′; OH5 = 5′-TTC-3′; OH6 = 5′-CGC-3′.
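The derived columns follow from the definitions above by subtraction, which gives a quick consistency check on the table. The sketch below recomputes them for four rows. It assumes the column order ID, TM,het, ΔGhet, two ΔGhomo values, two ΔΔGhet/homo values, TM,hp, ΔTM,hp, ΔGhet,2°, ΔΔGhet/het,2°, inferred from the arithmetic rather than stated in the source. The 0.15 tolerance is also an assumption, chosen to absorb the 0.1 differences that presumably arise because the tabulated inputs are rounded.

```python
# Rows copied from the set 1 table; None marks "no predicted hairpin".
# Assumed column order: ID, TM_het, dG_het, dG_homo (x2), ddG_het/homo (x2),
# TM_hp, dTM_hp, dG_het2, ddG_het/het2.
ROWS = [
    ("1X01", 37.1, -13.4, -3.9, -5.0, -9.5, -8.4, None, None, -3.6, -9.8),
    ("1X03", 37.1, -13.4, -1.6, -5.0, -11.8, -8.4, 1.9, 35.2, -3.6, -9.8),
    ("1X05", 38.5, -13.8, -3.5, -3.1, -10.2, -10.6, 20.6, 17.9, -3.6, -10.1),
    ("1X10", 35.4, -13.2, -1.6, -3.1, -11.6, -10.1, 20.0, 15.4, -4.6, -8.6),
]

TOL = 0.15  # assumed tolerance for rounding in the tabulated values

def max_deviation(row: tuple) -> float:
    """Largest gap between a tabulated derived value and its recomputed value."""
    (_id, tm_het, dg_het, dg_homo_a, dg_homo_b,
     ddg_homo_a, ddg_homo_b, tm_hp, dtm_hp, dg_het2, ddg_het2) = row
    deviations = [
        abs((dg_het - dg_homo_a) - ddg_homo_a),  # ddG_het/homo = dG_het - dG_homo
        abs((dg_het - dg_homo_b) - ddg_homo_b),
        abs((dg_het - dg_het2) - ddg_het2),      # ddG_het/het2 = dG_het - dG_het2
    ]
    if tm_hp is not None:
        deviations.append(abs((tm_het - tm_hp) - dtm_hp))  # dTM_hp = TM_het - TM_hp
    return max(deviations)

if __name__ == "__main__":
    for row in ROWS:
        print(row[0], "consistent" if max_deviation(row) <= TOL else "check")
```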
Chart 1. DNA-Encoded Oligomer Synthesis and Single-Bead Quantitation
Chart 2. DNA-Encoded Compound Purity and Side Product Identification
Figure 3. DNA-encoded combinatorial library plan and quality control. (A) The library scaffold features a linear arrangement of three positions for diversification (Pos1, Pos2, Pos3). Each position displays either an amino acid or N-substituted glycine. Amino acids featured Cα diversity in side chain, side chain stereochemistry, and N-methylation. The central position, Pos2, uniquely featured 1 of 6 different “linker” amino acids in addition to the L- and D- complement. N-substitution of glycine was executed with 1 of 21 different amines (gray). (B) The mixed-scale combinatorial DESPS was conducted in wells of a filtration microplate that housed a mixture of 160- and 10-μm bifunctional-HDNA library resin. The 160-μm QC beads were harvested by filtration and single beads were placed into separate wells for qPCR analysis. The resultant amplicons were purified and sequenced. The single QC beads were retrieved from qPCR supernatant and transferred to individual trifluoroacetic acid cleavage reactions, and the cleavage products were subjected to mass spectrometric analysis. (C) DNA sequence data (shown as numeric identifiers) were used to predict the compound structure on each QC bead. The predicted exact mass of [M + H]+ (green) agreed with the observed predominant ion (black) in the MALDI-TOF mass spectra.
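The comparison in panel C reduces to a relative mass error between the sequence-predicted and observed [M + H]+ values. The sketch below shows one way to express it, reading "ppt" as parts per thousand, which is an assumption about the abbreviation. The m/z values in the example are hypothetical and are not data from the paper.

```python
def mass_error_ppt(predicted_mz: float, observed_mz: float) -> float:
    """Relative mass error in parts per thousand (assumed meaning of 'ppt')."""
    if predicted_mz <= 0:
        raise ValueError("predicted m/z must be positive")
    return abs(observed_mz - predicted_mz) / predicted_mz * 1000.0

if __name__ == "__main__":
    # Hypothetical predicted and observed [M + H]+ values for one QC bead.
    predicted, observed = 650.40, 650.75
    error = mass_error_ppt(predicted, observed)
    print(f"{error:.2f} ppt, within 1 ppt: {error < 1.0}")
```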