Literature DB >> 28446765

Three-dimensional modeling of single stranded DNA hairpins for aptamer-based biosensors.

Iman Jeddi1, Leonor Saiz2.   

Abstract

Aptamers consist of short oligonucleotides that bind specific targets. They provide advantages over antibodies, including robustness, low cost, and reusability. Their chemical structure allows the insertion of reporter molecules and surface-binding agents in specific locations, which have been recently exploited for the development of aptamer-based biosensors and direct detection strategies. Mainstream use of these devices, however, still requires significant improvements in optimization for consistency and reproducibility. DNA aptamers are more stable than their RNA counterparts for biomedical applications but have the disadvantage of lacking the wide array of computational tools for RNA structural prediction. Here, we present the first approach to predict from sequence the three-dimensional structures of single stranded (ss) DNA required for aptamer applications, focusing explicitly on ssDNA hairpins. The approach consists of a pipeline that integrates sequentially building ssDNA secondary structure from sequence, constructing equivalent 3D ssRNA models, transforming the 3D ssRNA models into ssDNA 3D structures, and refining the resulting ssDNA 3D structures. Through this pipeline, our approach faithfully predicts the representative structures available in the Nucleic Acid Database and Protein Data Bank databases. Our results, thus, open up a much-needed avenue for integrating DNA in the computational analysis and design of aptamer-based biosensors.

Entities:  

Mesh:

Substances:

Year:  2017        PMID: 28446765      PMCID: PMC5430850          DOI: 10.1038/s41598-017-01348-5

Source DB:  PubMed          Journal:  Sci Rep        ISSN: 2045-2322            Impact factor:   4.379


Introduction

Cellular-level protein production is traditionally determined using several bioanalytical approaches, which rely on antibody or enzyme recognition. These include flow cytometry coupled with intracellular cytokine staining, enzyme-linked immunospot (ELISPOT), enzyme linked immunosorbent assay (ELISA), and polymerase chain reaction (PCR)[1, 2]. ELISA and PCR are robust technologies for detecting either cytokines or cytokine mRNA in blood; however, they cannot be used to identify specific populations of cytokine producing cells[2]. In contrast, flow cytometry and ELISPOT report the frequency of cytokine positive cells but not the cytokine concentration[1]. While these well-established methods can be sensitive and robust, they employ complex detection procedures, involving expensive reagents and multiple time consuming washing steps. In addition, these traditional strategies provide no information about the temporal dynamics of cytokine production, which is vital information in understanding the body’s immune response[3]. In order to fill this gap, aptamer-based affinity sensing strategies are emerging as viable alternatives to antibody-based immunoassays[4]. Aptamer-based biorecognition elements are short nucleic acid molecules and thus are more robust and simple than antibody-based probes. Aptamers bind specifically diverse targets including ions, organic dyes, amino acids, nucleotides, RNA, biological cofactors, other small organic molecules, oligosaccharides, peptides, toxins, enzymes, growth factors, transcription factors, antibodies, viral proteins and/or components, cells, and bacteria[5]. The selection of aptamers is termed Systematic Evolution of Ligands by Exponential Enrichment (SELEX) and involves the discovery of full-length aptamers from large pools of randomized single-stranded DNA or RNA (1014 to 1015 variants). The selection process is highly iterative and involves exposing the oligonucleotides to a target that is either coupled to a matrix or surface. The unbound molecules are then washed away and the bound molecules are recovered and amplified. This process results in highly robust and specific aptamers that are 25–40 nucleotides long[6]. The relatively simple chemical structure of aptamers allows the insertion of electrochemical or fluorescent reporter molecules[7] as well as surface-binding agents[8] in specific locations on the oligonucleotide[9]. During probe-target binding, the conformation change of the aptamer may be exploited to generate an analytical signal[10]. A number of aptamer-based biosensors have been successfully used to measure cell secretion of proteins[11, 12]; however, several opportunities for improvement remain before commercialization is feasible[13]. These include enhancing the sensitivity and improving the manufacturability and repeatability of aptamer-based sensors. Both of these challenges are not easily overcome without a better understanding of the molecular level interactions of the aptamer-biosensor surface (for improving manufacturability, reproducibility, and sensitivity) as well as of the aptamer-protein complex (for improving specificity and sensitivity). Despite the growing widespread use in biotechnology and the substantial number of biomedical applications of DNA aptamers, including its clinical use as therapeutic agents for a number of human diseases[14-16] and its increasing use in drug delivery[17], the considerable array of computational tools available for single stranded RNA structure prediction (e.g. see refs 18–21 for recent reviews)[18-21] are lacking for its DNA counterpart. Until now, the 3D computational tools available for DNA have been restricted to model mainly double-stranded DNA structures[22], lacking the ability to analyze single-stranded DNA hairpins and other more complex structures. The ability to faithfully predict the 3D structure of single stranded DNA and RNA from sequence has the potential to revolutionize the way aptamers are selected and allow for crucial applications not only into aptamer design but also for biosensor set-up design and the molecular level understanding of structure and dynamics of single stranded oligonucleotide systems[23]. Here, we present the first approach to predict the three-dimensional structures of single stranded DNA required for aptamer applications that extends current sequence-based computational efforts for RNA[18-21]. Our approach faithfully predicts the representative resolved structures available in the Nucleic Acid Database (NDB) and Protein Data Bank (PDB) databases from a pipeline that integrates 2D and 3D structural tools, including Mfold, Assemble 2, Chimera, VMD, and Molecular Dynamics (MD) simulations. Explicitly, we build ssDNA secondary structure from sequence, construct equivalent 3D ssRNA models, transform the 3D ssRNA models into ssDNA 3D structures, and finally refine the resulting ssDNA 3D structures through energy minimization. To thoroughly evaluate our approach, we considered all hairpin-like ssDNA molecules with experimentally solved 3D structures selected through an exhaustive search for ssDNA molecules and aptamers in the PDB database. Our results indicate that this approach works exceptionally well for the hairpin-like structural motif of ssDNA, the focus of the current work. To test the robustness of the results, we performed additional atomistic MD simulations for a sub-set of representative ssDNA molecules and aptamers. The atomistic details available in MD simulations with explicit solvent have been fundamental at uncovering the molecular level mechanisms of key experimental observations and deepen our understanding of the interactions and properties of biological complexes in their natural environment[24-28]. Partly because of the lack of solved 3D structures, very few MD simulation studies have focused on aptamers[29, 30]. Our results show that MD simulations can, indeed, be used to further improve the structural predictions and that the predictions are representative of those obtained from the dynamics of the systems under conditions that mimic their targeted environment.

Methods

Workflow for three-dimensional structure generation from sequence

The workflow to construct the 3D structures of the ssDNA molecules from the nucleotide sequence consists of four main steps (Fig. 1), comprising building the DNA secondary structure using Mfold in step 1, constructing refined equivalent 3D RNA models using Assemble2 and Chimera in step 2, translating the 3D RNA models into DNA models using VMD in step 3, and refining the final 3D DNA structure through minimization using VMD in step 4, as described below. Typically, this process takes about one hour to complete. Illustrative examples of the structures obtained at the end of each of the four steps of the workflow are shown in Supplementary Figure 1.
Figure 1

Workflow used to construct the ssDNA 3D structures from the sequence. The approach consists of four main steps, involving building the ssDNA secondary structure from the sequence using Mfold (step 1), constructing refined equivalent 3D ssRNA models using Assemble2/Chimera (step 2), translating the 3D ssRNA models into ssDNA models using VMD (step 3), and refining the 3D ssDNA structures using VMD (step 4).

Workflow used to construct the ssDNA 3D structures from the sequence. The approach consists of four main steps, involving building the ssDNA secondary structure from the sequence using Mfold (step 1), constructing refined equivalent 3D ssRNA models using Assemble2/Chimera (step 2), translating the 3D ssRNA models into ssDNA models using VMD (step 3), and refining the 3D ssDNA structures using VMD (step 4).

Step 1: Build ssDNA secondary structure from sequence

The first step of our approach is the prediction of secondary structures from DNA sequences. The input consists of the DNA sequences, which for this study were selected after searching the PDB database with a focus on the hairpin-like common fold of ssDNA, as detailed in the results section. Starting with the nucleotide sequence, the secondary structures of the ssDNA molecules were predicted using the mfold web server (http://mfold.rna.albany.edu/?=mfold)[31], based on free energy minimization techniques. In mfold, all possible secondary structures are approximated based on Watson-Crick base pairing and the most thermodynamically stable structures are selected[32, 33]. The initial sequences were selected as linear at a temperature of 37 °C and ionic concentration of 1 M of Na+, 0 M of Mg2+, computing only fold configurations within 5% from the minimum free energy, and considering a maximum number of 50 folds with no limit to the maximum distance between paired bases. In addition to the predicted secondary structure of the ssDNA this step provides the minimum free energy of the fold.

Step 2: Construct equivalent 3D ssRNA models and refine structures

The second step is to use the predicted secondary structures as a starting point to generate the 3D structures of the equivalent ssRNA models using Assemble2/Chimera. The 3D structures were modeled and visualized using Assemble2[34] and Chimera[35] following a manual process of individually selecting the 2D helical and non-helical elements of the ssDNA molecule and translating the residues into equivalent 3D RNA models. The 3D RNA models were then refined using 100 iterations to remove geometric deficiencies and optimize the structural parameters such as bond length and angles, planarity of certain groups, non-bonded contacts, and restricted torsion angles. The refinement was achieved by the geometrical least squares method using the Konnert–Hendrickson algorithm[36] as implemented in the Assemble2 program.

Step 3: Transform the 3D ssRNA models into ssDNA 3D structures

In the third step, the refined ssRNA 3D structures were imported into VMD[37], where hydrogen atoms were added using the AutoPSF VMD plugin (step 3a), and the ssRNAs modified into ssDNA 3D structures by: (i) identifying each uracil residue and replacing the H5 atom with a methyl group using the Molefacture VMD plugin (step 3b) and (ii) replacing the ribose sugar backbone with deoxyribose using the AutoPSF VMD plugin and manually renaming the modified uracil residues to thymine in the pdb file (step 3c). The output of this step consists of both pdb and psf files that contain the atomic coordinates (.pdb) and atom type, charge, mass and bonding information (.psf).

Step 4: Refine final ssDNA 3D structures

The fourth step of our approach consists of further refinement of the ssDNA 3D structures obtained in step 3. It uses the AutoIMD VMD plugin to refine the structures through energy minimization using 10,000 iterations. The output of this step consists of the final coordinate (.pdb) file in addition to the psf file obtained in the previous step. We will use these two files, in addition to files containing the topology of the molecules and the force-field parameters, for conducting further analysis through molecular dynamics simulations as detailed below.

Molecular Dynamics Simulations

Description of the systems and initial structure set-up

We carried out molecular dynamics simulation studies for 5 ssDNA sequences from the pool of 24 structures selected after an exhaustive search from the PDB database, as detailed in the results section, using as initial configurations our predicted 3D structure and the corresponding experimentally resolved structure from the PDB database. For the sake of simplicity, throughout this paper, the five original ssDNA resolved through conventional experimental methods will be referred to as “original” and those derived here using computational modeling methods will be referred to as “predicted”. They are labeled using their corresponding PDB IDs. The predicted and original structures for the single stranded DNA hairpins corresponding to the PDB ID entries 1BJH, 1LA8, 2M8Y, 2VAH, and 2L5K were each solvated in a water box to closer represent the biological as well as the biosensor environment. In the case of the original systems, first we used the AutoPSF VMD plugin to add the hydrogen atoms to the ssDNAs, missing from the NMR solution structures. Using the Solvate plugin in VMD[37], a layer of water of about 25 Å in each direction from the atom with the largest coordinate in that direction was created to fully immerse the ssDNA molecules. In addition, each system was neutralized by replacing a predetermined number of water molecules with sodium ions. Table 1 contains the resulting dimensions of the simulation cells as well as the type and number of ions used to neutralize each system.
Table 1

Dimensions of the simulation box, number of atoms of the ssDNA molecules, ion type and number of ions in the system, and total number of atoms for each of the 10 different molecular dynamics simulations.

Structure L X (Å) L Y (Å) L Z (Å)ssDNA atomsIon Type# of IonsTotal Atoms
1BJH original707471349Na+ 1032780
1BJH predicted757370349Na+ 1033947
1LA8 original727977411Na+ 1239720
1LA8 predicted727278411Na+ 1236867
2M8Y original747778474Na+ 1439899
2M8Y predicted747181474Na+ 1438411
2VAH original727482569Na+ 1738974
2VAH predicted747487569Na+ 1742901
2L5K original757594728Na+ 2247412
2L5K predicted757594728Na+ 2246119
Dimensions of the simulation box, number of atoms of the ssDNA molecules, ion type and number of ions in the system, and total number of atoms for each of the 10 different molecular dynamics simulations.

Molecular Dynamics Simulation Details

We carried out the ten different simulations using the atomistic molecular dynamics (MD) simulation code NAMD[38]. The details of each system are summarized in Table 1. All the MD simulations were carried out using the NAMD2.9 software package with the recent version of the all-atom CHARMM force field for the nucleic acids[39] (CHARMM27) and the rigid TIP3P model for water[40], which is consistent with the nucleic acids force field. Following standard procedures, the solvated ssDNA systems were first minimized for 100,000 steps using the conjugate gradient energy minimization method as implemented in NAMD. This was followed by a NVT equilibration run at a temperature of 300 K and a NPT production run of a total of 10 ns at a temperature of 300 K and a pressure of 1 atm. To maintain these conditions, we used the Langevin dynamics method with a friction constant of 1 and the Nose-Hoover Langevin piston method[41]. The simulations were carried out using a time step of 2 fs. In all the simulations, we used three-dimensional periodic boundary conditions and the minimum image convention[42] to calculate the short-range Lennard-Jones interactions using a spherical cutoff distance of 12 Å with a switch distance of 10 Å. The long-range electrostatic interactions were calculated by using the particle-mesh Ewald (PME) method[43].

Results and Discussion

Selection of ssDNA Candidates

We performed an exhaustive search for ssDNA molecules and aptamers with experimentally solved 3D structures in the Protein Data Bank database (http://www.pdb.org). The selection method consisted of two independent searches using the following keywords: “DNA hairpin” and “aptamer DNA”. This resulted in a total of 772 entries. The search results were then filtered by ‘Polymer Type’; only including entries that were classified under “DNA”. This resulted in reducing the pool of potential candidates to 97 entries as detailed in Table 2. Subsequently, 20 entries containing ligands were excluded from the list of candidates. The remaining 77 structures were then individually reviewed and 53 additional structures that did not correspond to the hairpin-like structural motif were eliminated. The exclusion criteria in this last step eliminated the following types of structures: duplex, triplex, three-way junction, dimer, dimer G-quadruplex, quadruplex duplex hybrid, complexed with non-protein ligand, Z-DNA, and non-nucleotide sequence modifications. This last step resulted in the exclusion from the list of all the DNA sequences with a G-score greater than zero as calculated using the QGRS Mapper software program[44]. QGRS Mapper uses a scoring system to predict the presence of quadruplex forming G-rich sequences in nucleotide sequences and non-zero scores indicate the possibility of G-quadruplex formation with higher scores representing better G-quadruplex forming candidates. The remaining pool of 24 candidates (Table 2), with 21 different sequences, was selected as our initial set of 21 DNA sequences for 3D structural prediction.
Table 2

Protein Data Bank database search results for the selection of ssDNA candidates.

PMIDPDB IDChain LengthSequenceQGRSExcludeExclusion Criteria
8069626134D31TCCTCCTTTTTTAGGAGGATTTTTTGGTGGT15YesTriplex
8069626135D31TCCTCCTTTTTTAGGAGGATTTTTTGGTGGT15YesTriplex
8069626136D31TCCTCCTTTTTTAGGAGGATTTTTTGGTGGT15YesTriplex
7613864184D7GCATGCT0YesDimer Quadruplex
97379261A8N12GGGCTTTTGGGC0YesDimer Quadruplex
97379271A8W12GGGCTTTTGGGC0YesDimer Quadruplex
90926591AC716ATCCTAGTTATAGGAT0NoN/A
93981691AO913GAGAGAXTCTCTC0YesComplexed with non-protein ligand
76641261AU68CATGCATG0YesComplexed with non-protein ligand
93845291AW427ACCTGGGGGAGTATTGCGGAGGAAGGT0YesComplexed with non-protein ligand
9000625 1BJH 11 GTACAAAGTAC 0 No N/A
93677761C1111TCCCGTTTCCA0YesDimer Quadruplex
123718561CS713GUTTTGXCAAAAC0YesComplexed with non-protein ligand
22996691D1616CGCGCGTTTTCGCGCG0YesZ-DNA
106536381DB622CGACCAACGTGTCGCCTGGTCG0YesComplexed with non-protein ligand
107561901DGO18AGGATCCTUTTGGATCCT0NoN/A
N/A1ECU19GCGCGAAACTGTTTCGCGC0NoN/A
109241011EN118GTCCCTGTTCGGGCGCCA0NoN/A
110902801EZN36CGTGCACCCGCTTGCGGCGACTTGTCGTTGTGCACG0YesThree way junction
117901441FV811TATCATCGATA0YesNon-nucleotide modified residues
113527241G5L6CCAAAG0YesComplexed with non-protein ligand
113527241GJ26CCAAAG0YesComplexed with non-protein ligand
119527901IDX18AGGATCCTTUTGGATCCT0NoN/A
119527901II118AGGATCCUTTTGGATCCT0NoN/A
118436261JU023CTTGCTGAAGCGCGCACGGCAAG0YesDimer
118436261JUA23CTTGCTGAAGCGCGCACGGCAAG0YesDimer
119913551JVE27CCTAATTATAACGAAGTTATAATTAGG0NoN/A
124494141KR87GCGAAGC0NoN/A
118954431L0R14ACGAAGTGCGAAGC0YesComplexed with non-protein ligand
11849039 1LA8 13 CGCGGTGTCCGCG 0 No N/A
118490391LAE13CGCGGTPTCCGCG0YesNon-nucleotide modified residues
118490381LAI13CGCGGTGTCCGCG0YesDuplex
118490381LAQ13CGCGGTPTCCGCG0YesDuplex
118490381LAS13CGCGGTPTCCGCG0YesDuplex
125604791MF57GCATGCT0YesDimer Quadruplex
125649211MP710GCCAGAGAGC0YesComplexed with non-protein ligand
127556091NGO27CTCTTTTTGTAAGAAATACAAGGAGAG0NoN/A
127556091NGU27CTCTCCTTGTATTTCTTACAAAAAGAG0NoN/A
127580811P0U13GCATCGACGATGC0NoN/A
85253811PNN24GAAGAAGAG0YesTriplex
124494141PQT7GCGAAGC0NoN/A
129524631PUY13GTTTTGXCAAAAC0YesComplexed with non-protein ligand
104810341QE722CTAGAGGATCCTTTUGGATCCT0NoN/A
N/A1QYK7GCATGCT0YesDimer Quadruplex
N/A1QYL7GCATGCT0YesDimer Quadruplex
151991711SNJ36CGTGCAGCGGCTTGCCGGCACTTGTGCTTCTGCACG0YesThree way junction
146848971UE29GCGAAAGCT0YesDuplex
146848971UE38GCGAAAGC0YesDuplex
89015501XUE17GTGGAATGCAATGGAAC0NoN/A
75836541ZHU10CAATGCAATG0NoN/A
8548453229D17CCAGACUGAAGAUCUGG0YesNon-nucleotide modified residues
98181482ARG30TGACCAGGGCAAACGGTAGGTGAGTGGTCA18YesComplexed with non-protein ligand
166201212AVH11GGGGTTTGGGG0YesG-Quadruplex
N/A2F1Q42GCACTGCATCCTTGGACGCTTGCGCCACTTGTGGTGCAGTGC0YesFour way junction
168665562GKU24TTGGGTTAGGGTTAGGGTTAGGGA42YesG-Quadruplex
N/A2K6717TTAATTTNNNAAATTAA0YesNon-nucleotide modified residues
N/A2K6817TTAATTTNNNAAATTAA0YesNon-nucleotide modified residues
N/A2K6917TTAATTTNNNAAATTAA0YesNon-nucleotide modified residues
193744202K718GCGAAAGC0NoN/A
193215012K8Z8TCGTTGCT0YesDimer
190706212KAZ13GGGACGTAGTGGG0YesDimer Quadruplex
214101962L1313TATTATXATAATA0YesNon-nucleotide modified residues
22129448 2L5K 23 CAGTTGATCCTTTGGATACCCTG 0 No N/A
225070542LO512GGCCGCAGTGCC0NoN/A
225070542LO810GCCGCAGTGC0NoN/A
225070542LOA10GCCGCAGTGC0YesComplexed with non-protein ligand
227984992LSC12CGCGAAUUCGCG0YesComplexed with non-protein ligand
23794476 2M8Y 15 CGCGAAGCATTCGCG 0 No N/A
237944762M8Z27GGTTGGCGCGAAGCATTCGCGGGTTGG9YesQuadruplex Duplex Hybrid
237944762M9032GCGCGAAGCATTCGCGGGGAGGTGGGGAAGGG21YesQuadruplex Duplex Hybrid
237944762M9130GGGAAGGGCGCGAAGCATTCGCGAGGTAGG7YesQuadruplex Duplex Hybrid
237944762M9234AGGGTGGGTGCTGGGGCGCGAAGCATTCGCGAGG17YesQuadruplex Duplex Hybrid
237944762M9332TTGGGTGGGCGCGAAGCATTCGCGGGGTGGGT29YesQuadruplex Duplex Hybrid
86581682NEO19CCCGATGCXGCAATTCGGG0YesComplexed with non-protein ligand
173620082O3M22AGGGAGGGCGCTGGGAGGAGGG39YesG-Quadruplex
173885702OEY25CCATCGTCTACCTTTGGTAGGATGG0YesComplexed with non-protein ligand
90209822PIK23CACTCCTGGTTTTTCCAGGAGTG0YesComplexed with non-protein ligand
18515837 2VAH 18 AGGATCCTUTTGGATCCT 0 No N/A
185158372VAI18AGGATCCTUTTGGATCCT0NoN/A
222876243QXR22AGGGAGGGCGCUGGGAGGAGGG39YesG-Quadruplex
N/A3T867GCATGCT0YesComplexed with non-protein ligand
224093134DKZ12CGCGAAXXCGCG0YesNon-nucleotide modified residues
8107090148D15GGTTGGTGTGGTTGG20YesG-Quadruplex
93845291AW427ACCTGGGGGAGTATTGCGGAGGAAGGT14YesComplexed with non-protein ligand
97997031BUB15GGTTGGTGTGGTTGG20YesG-Quadruplex
107561991C3215GGTTGGTGTGGTTGG20YesG-Quadruplex
107561991C3415GGTTGGTGTGGTTGG20YesG-Quadruplex
107561991C3515GGTTGGTGTGGTTGG20YesG-Quadruplex
107561991C3815GGTTGGTGTGGTTGG20YesG-Quadruplex
106536381DB622CGACCAACGTGTCGCCTGGTCG0YesComplexed with non-protein ligand
87578011QDF15GGTTGGTGTGGTTGG20YesG-Quadruplex
87578011QDH15GGTTGGTGTGGTTGG20YesG-Quadruplex
152148021RDE15GGTTGGTGTGGTTGG20YesG-Quadruplex
156371581Y8D16GGGGTGGGAGGAGGGT21YesDimer Quadruplex
98181482ARG30TGACCAGGGCAAACGGTAGGTGAGTGGTCA18YesComplexed with non-protein ligand
171457162IDN15GGTTGGTGTGGTTGG20YesG-Quadruplex
239350712M5325TGTGGGGGTGGACGGGCCGGGTAGA21YesG-Quadruplex

The details of the selection criteria are discussed in the text (see results section). For each experimentally resolved structure from the PDB database, denoted by its PDB ID, the table provides the corresponding publication denoted by its PubMed identifier (PMID), the sequence as provided in the PDB database, the chain length, the G-score denoted by QGRS obtained using the QGRS Mapper, and, if the DNA candidate was excluded from our list of potential candidates, the exclusion criteria.

Protein Data Bank database search results for the selection of ssDNA candidates. The details of the selection criteria are discussed in the text (see results section). For each experimentally resolved structure from the PDB database, denoted by its PDB ID, the table provides the corresponding publication denoted by its PubMed identifier (PMID), the sequence as provided in the PDB database, the chain length, the G-score denoted by QGRS obtained using the QGRS Mapper, and, if the DNA candidate was excluded from our list of potential candidates, the exclusion criteria. The 24 ssDNA molecules had 21 unique sequences ranging in length from 7 to 27 nucleotides, all with structures solved by NMR spectroscopy, and comprising a wide variety of systems of biological and biomedical interest, ranging from models of biologically relevant sequences, such as those to which HIV-1 nucleocapsid protein binds during reverse transcription[45] or those with uracil (a constituent of RNA) bases[46], to DNA aptamers, such as those that bind Mucin 1, a cell-surface glycoprotein overexpressed in a number of cancers[47]. In addition, five representative ssDNA hairpins (Table 2; entries in bold typesetting), with PDB IDs 1BJH[48], 1LA8[49], 2M8Y[50], 2VAH[46], and 2L5K[47], were selected from the 24 candidates for additional analysis through atomistic molecular dynamics simulations. The length of these molecules ranges from 11 (1BJH) to 24 (2L5K) nucleotides.

Three-dimensional Structure Prediction from Sequence

The approach followed, as indicated in the workflow (Fig. 1) and detailed in the methods section, successfully captures the 3D structures predicted from the 21 different sequences of the 24 ssDNA hairpins identified from the PDB database. In order to test the accuracy of the 3D prediction method, each predicted ssDNA structure was aligned with the corresponding ssDNA structure downloaded from the PDB database. To measure the degree of similarity, the corresponding Root Mean Square Deviation (RMSD) of the sugar-phosphate backbone between each pair of structures was calculated. Figure 2 shows an overlay of each of the 21 predicted 3D structures (ssDNA colored red) and the corresponding 24 NMR structures downloaded from the PDB database (ssDNA colored blue), together with calculated values of the RMSDs. Our results indicate that our approach is able to faithfully predict the 3D structure of a wide variety of ssDNA hairpins, with sequences ranging from 7 to 27 nucleotides. In all cases, the RMSD values obtained ranged from 1.9 Å (PDB ID: 1PQT) for the shortest sequences to 8.59 Å (1NGU) for the longest cases, and were typically around 3.5 Å, with an average value of 4 Å. Interestingly, our approach was also able to accurately predict the 3D structure for ssDNA sequences containing uracil bases, namely 1DGO, 1IDX, 1II1, 1QE7, 2VAH, and 2VAI. Three particular predicted structures, namely 2LO8, 1EN1, and 1NGU, differed more than those corresponding to typical values we obtained for other ssDNAs with similar sequence lengths. Figure 3 shows the RMSD values obtained for the 24 predicted structures with respect to the experimental ones as a function of chain length of the DNA. For instance, the RMSD obtained for 2LO8 with a chain length of 10 nucleotides is 5.53 Å, while the corresponding value for 1ZHU is 3.05 Å. Interestingly, the sequence 2LO8 is only 10nt, and is part of 2LO5. However, its RMSD is bigger than the latter because the two additional bases in 2LO5 make a CG base-pairing and increase the stability of the loop. In the case of 1EN1, the higher RMSD value (compared for instance with the value of 3.01 obtained for 1II1 with also 18 nucleotides) mostly originates from the unstructured tail at the 3’ end of the hairpin, which was predicted as more structured. In this case, additional molecular dynamics simulations, as the ones we present in the next section for five of the ssDNA sequences, are likely to improve the prediction of the model by relaxing the positions of the nucleotides at the tail.
Figure 2

Overlay of the 3D predicted structures (ssDNA colored red) and the corresponding experimental structures downloaded from the PDB database (ssDNA colored blue) for the 24 ssDNA hairpin structures. Each structure is labeled by its corresponding PDB ID and the calculated RMSD values (in Angstroms) are shown in parenthesis.

Figure 3

Values of the RMSD for the 24 ssDNA predicted structures with respect to the experimental ones as a function of the nucleotide chain length.

Overlay of the 3D predicted structures (ssDNA colored red) and the corresponding experimental structures downloaded from the PDB database (ssDNA colored blue) for the 24 ssDNA hairpin structures. Each structure is labeled by its corresponding PDB ID and the calculated RMSD values (in Angstroms) are shown in parenthesis. Values of the RMSD for the 24 ssDNA predicted structures with respect to the experimental ones as a function of the nucleotide chain length. The results obtained for the 3D predictions of the five ssDNA hairpins selected from the pool of 24 candidates for additional analysis through atomistic molecular dynamics simulations, namely 1BJH, 1LA8, 2M8Y, 2VAH, and 2L5K, were in good agreement with the experimental data. Specifically, we obtained RMSD values of ranging from ~2.4 Å for 1BJH with 11 nucleotides to ~4.1 Å for 2VAH and 2L5K with 18 and 23 nucleotides, respectively. In these five cases, the secondary structure prediction obtained in step 1 of our approach using Mfold as described in the methods section resulted in the following: 1BJH is characterized by a minimum free energy of 0.04 kcal/mol and consists of a 4-base pair stacked stem and a 3-nucleotide hairpin loop; 1LA8 is characterized by a minimum free energy of −4.9 kcal/mol and consists of a 5-base pair stacked stem and a 3-nucleotide hairpin loop; 2M8Y is characterized by a minimum free energy of −6.3 kcal/mol and consists of a 6-base pair helical stacked stem followed by a 3-nucleotide hairpin loop; 2VAH is characterized by a minimum free energy of −6.2 kcal/mol and consists of a 7-base pair helical stacked stem and a 4-nucleotide hairpin loop; and 2L5K is characterized by a minimum free energy of 0.05 kcal/mol and consists of a 3-base pair stacked stem attached to a 4-base pair stacked stem by two strings of 3-base single stranded nucleotides, followed by a 3-nucleotide hairpin loop. As in the other cases, the predicted secondary structures of these five ssDNA molecules are in agreement with the resolved structures available through the Protein Data Bank database.

Refinement of 3D Predictions and Dynamics through Molecular Dynamics Simulations

We have used fully atomistic molecular dynamics (MD) simulations to further improve our structural predictions for the 1BJH, 1LA8, 2M8Y, 2VAH, and 2L5K cases and study the dynamics of the systems. For these cases, the 3D structural predictions obtained from sequence (Fig. 2) were solvated in water and used as initial configurations of the molecular dynamics simulations, as detailed in the methods section. Figure 4 shows the initial configuration of the MD simulation corresponding to the 1LA8 ssDNA structure, solvated in water containing 12 sodium ions. In all five cases, the DNA structures evolve as a function of time during the 10-ns length of the simulations. In Fig. 5, we display three snapshots for each of the five different ssDNAs obtained from the simulations at different time intervals, which correspond to 1 ns, 5 ns, and 10 ns. Our results indicate that, in all cases, the hairpin structure of the ssDNAs remains stable during the 10 ns of the simulations. The overlays of Fig. 5 correspond to the ssDNA structures at the different time intervals (ssDNA colored red) with respect to the corresponding structure downloaded from the PDB database (ssDNA colored blue). The corresponding RMSD values calculated at different times along the 10-ns simulation, corresponding to 1 ns, 2 ns, 3 ns, 4 ns, 5 ns, 6 ns, 7 ns, 8 ns, and 9 ns, are summarized in Table 3. These values are typical and in the same range as values used to demonstrate similarity of proteins[51, 52]. The quantitative RMSD calculation and the qualitative visual comparison of the pairs of structures show that the approach successfully predicts the three-dimensional structure of ssDNA hairpins.
Figure 4

Initial configuration of the MD simulation of the predicted 3D ssDNA structure for the sequence CGCGGTGTCCGCG, corresponding to 1LA8. Only the molecules within the central simulation cell are shown. The ssDNA molecule was solvated in water and the system charge neutralized with 12 sodium ions. The atoms of the water molecules are represented as white (hydrogen) and red (oxygen) lines and the sodium ions are displayed as yellow spheres with radii corresponding to its atomic van der Waals radius. The ssDNA molecule is shown in blue with the backbone and bases represented as new ribbons and the additional components represented as lines. The unit cell dimensions are L x = 72 Å, L y = 72 Å, and L z = 78 Å.

Figure 5

Evolution of the ssDNA molecules as a function of time. The overlays of the predicted structures taken from the MD simulations (ssDNA colored red) and the corresponding experimental structures downloaded from the PDB database (ssDNA colored blue) correspond to (a) 1BJH, (b) 1LA8, (c) 2M8Y, (d) 2VAH, and (e) 2L5K. The snapshots for the predicted ssDNA molecules correspond to (from left to right) 1 ns, 5 ns, and 10 ns. The corresponding RMSD values are summarized in Table 3. Water molecules and ions within the simulation cell are not shown.

Table 3

RMSD values (in Angstroms) of the sugar-phosphate backbone for the predicted ssDNA structures at different time points along the MD simulations with respect to the corresponding experimental structures downloaded from the PDB database.

1BJH1LA82M8Y2VAH2L5K
3D-Prediction2.413.983.564.104.07
1 ns2.003.441.463.167.02
2 ns4.404.281.563.065.77
3 ns2.112.732.043.264.69
4 ns2.363.692.053.364.69
5 ns2.172.951.623.072.79
6 ns2.472.921.353.532.90
7 ns3.221.781.652.933.59
8 ns2.642.282.902.843.12
9 ns3.322.321.852.694.32

The corresponding values for the 3D predictions (Fig. 2) are included for completeness.

Initial configuration of the MD simulation of the predicted 3D ssDNA structure for the sequence CGCGGTGTCCGCG, corresponding to 1LA8. Only the molecules within the central simulation cell are shown. The ssDNA molecule was solvated in water and the system charge neutralized with 12 sodium ions. The atoms of the water molecules are represented as white (hydrogen) and red (oxygen) lines and the sodium ions are displayed as yellow spheres with radii corresponding to its atomic van der Waals radius. The ssDNA molecule is shown in blue with the backbone and bases represented as new ribbons and the additional components represented as lines. The unit cell dimensions are L x = 72 Å, L y = 72 Å, and L z = 78 Å. Evolution of the ssDNA molecules as a function of time. The overlays of the predicted structures taken from the MD simulations (ssDNA colored red) and the corresponding experimental structures downloaded from the PDB database (ssDNA colored blue) correspond to (a) 1BJH, (b) 1LA8, (c) 2M8Y, (d) 2VAH, and (e) 2L5K. The snapshots for the predicted ssDNA molecules correspond to (from left to right) 1 ns, 5 ns, and 10 ns. The corresponding RMSD values are summarized in Table 3. Water molecules and ions within the simulation cell are not shown. RMSD values (in Angstroms) of the sugar-phosphate backbone for the predicted ssDNA structures at different time points along the MD simulations with respect to the corresponding experimental structures downloaded from the PDB database. The corresponding values for the 3D predictions (Fig. 2) are included for completeness. Similar MD simulations were performed using the experimentally resolved structures as initial configurations, following the approach detailed in the methods section. To explore the structural deviation of the ssDNA molecules with respect to their corresponding starting structure, the RMSD of the sugar-phosphate backbone of each structure was calculated independently over the course of the 10-ns trajectory. The RMSDs provides valuable information on the conformational flexibility of the biomolecule and provides a means to evaluate the structural deviation of the biomolecule through time. The results of the RMSD studies (Fig. 6) show that the predicted structures have similar time-dependent behavior in comparison to the original structures.
Figure 6

Time evolution of the RMSDs for the sugar-phosphate backbone with respect to the structures at time 0 for the predicted (red curves) and original (blue curves) structures. The panels correspond to (a) 1BJH, (b) 1LA8, (c) 2M8Y, (d) 2VAH, and (e) 2L5K.

Time evolution of the RMSDs for the sugar-phosphate backbone with respect to the structures at time 0 for the predicted (red curves) and original (blue curves) structures. The panels correspond to (a) 1BJH, (b) 1LA8, (c) 2M8Y, (d) 2VAH, and (e) 2L5K.

Conclusions

DNA aptamers are more stable than their RNA counterparts, which is especially relevant for biomedical applications, but lack the wide range of computational tools for structural predictions currently available for single stranded RNA. Here, we have presented the first approach to predict the prototypical 3D structures of ssDNA required for aptamer applications. So far, there has not been a computational tool available for 3D structure prediction of DNA aptamers from their sequence and very few structures have been resolved experimentally. These limitations have constituted a major bottleneck for the emergent application of aptamers in biotechnology and for clinical use in biomedical applications. Our results strongly support that it is possible to accurately determine the structure of single-stranded DNA from sequence and provide an approach that works exceptionally well for the hairpin-like structural motif. We have shown that our approach faithfully predicts representative structures available in the Protein Data Bank and Nucleic Acids databases. Specifically, we have extensively validated the prediction capabilities of our approach against a pool of 24 ssDNA molecules with experimentally solved 3D structures selected through an exhaustive search for ssDNA molecules and aptamers in the PDB database (http://www.pdb.org), with sequences ranging from 7 to 27 nucleotides. The studied cases are representative of a wide variety of systems of biological and biomedical interest, including Mucin 1-binding aptamers and ssDNAs with uracil bases. The resulting structures can subsequently be used as inputs in computational docking methods to study the interactions with ligands[53]. We have shown that atomistic MD simulations can be used to further improve the structural predictions by studying the dynamics of the systems under conditions that mimic their targeted environment. Our results indicate that our approach works exceptionally well for the most common hairpin-like structural motif of ssDNA and it is expected to work for other systems with similar interactions between bases, like bulge loops and internal loops. Our results break the ground for future work to expand the applicability of our approach to other ssDNA folds, including G-quadruplex folds, typical of G-rich sequences such as those in the thrombin-binding aptamer, as well as the effect of covalent modifications to increase the stability of aptamers. For instance, it has been shown that phosphorothioate substitutions can substantially alter RNA conformation[54]. Understanding the specific interactions involved in stabilizing biomolecular complexes immobilized on solid substrates is essential for designing biosensors with improved sensitivity, specificity, and reliability. Several key studies have shown that the sensitivity of aptamer-based biosensors is related to the surface crowding of the immobilized aptamers on the biosensor surface[11, 55, 56] and have suggested that steric hindrance or electrostatic repulsion of the charged aptamer molecules at higher concentrations on the surface of the biosensor have an impact on biosensor performance. However, this area is still poorly understood because of the lack of understanding of the molecular level interactions occurring at the biosensor surface. Furthermore, understanding how small sequence changes can lead to a difference in affinity can be used to make marked improvements in biosensor performance by employing rational design techniques[57-60]. Our approach, therefore, provides a much-needed starting point to gain better insights in the performance of aptamer-based biosensors with the aim of improving biosensor performance by rational design techniques.
  52 in total

1.  RNA-Puzzles: a CASP-like evaluation of RNA three-dimensional structure prediction.

Authors:  José Almeida Cruz; Marc-Frédérick Blanchet; Michal Boniecki; Janusz M Bujnicki; Shi-Jie Chen; Song Cao; Rhiju Das; Feng Ding; Nikolay V Dokholyan; Samuel Coulbourn Flores; Lili Huang; Christopher A Lavender; Véronique Lisi; François Major; Katarzyna Mikolajczak; Dinshaw J Patel; Anna Philips; Tomasz Puton; John Santalucia; Fredrick Sijenyi; Thomas Hermann; Kristian Rother; Magdalena Rother; Alexander Serganov; Marcin Skorupski; Tomasz Soltysinski; Parin Sripakdeevong; Irina Tuszynska; Kevin M Weeks; Christina Waldsich; Michael Wildauer; Neocles B Leontis; Eric Westhof
Journal:  RNA       Date:  2012-02-23       Impact factor: 4.942

2.  Surface immobilization methods for aptamer diagnostic applications.

Authors:  Subramanian Balamurugan; Anne Obubuafo; Steven A Soper; David A Spivak
Journal:  Anal Bioanal Chem       Date:  2007-09-23       Impact factor: 4.142

3.  In silico study on multidrug resistance conferred by I223R/H275Y double mutant neuraminidase.

Authors:  Hua Tan; Kun Wei; Jiguang Bao; Xiaobo Zhou
Journal:  Mol Biosyst       Date:  2013-11

4.  Aptamer-based electrochemical biosensor for interferon gamma detection.

Authors:  Ying Liu; Nazgul Tuleouva; Erlan Ramanculov; Alexander Revzin
Journal:  Anal Chem       Date:  2010-10-01       Impact factor: 6.986

5.  Significance of root-mean-square deviation in comparing three-dimensional structures of globular proteins.

Authors:  V N Maiorov; G M Crippen
Journal:  J Mol Biol       Date:  1994-01-14       Impact factor: 5.469

6.  Solution structure of a truncated anti-MUC1 DNA aptamer determined by mesoscale modeling and NMR.

Authors:  Meriem Baouendi; Jean A H Cognet; Catia S M Ferreira; Sotiris Missailidis; Jérome Coutant; Martial Piotto; Edith Hantz; Catherine Hervé du Penhoat
Journal:  FEBS J       Date:  2012-01-03       Impact factor: 5.542

Review 7.  Aptamers as targeted therapeutics: current potential and challenges.

Authors:  Jiehua Zhou; John Rossi
Journal:  Nat Rev Drug Discov       Date:  2016-11-03       Impact factor: 84.694

8.  QGRS Mapper: a web-based server for predicting G-quadruplexes in nucleotide sequences.

Authors:  Oleg Kikin; Lawrence D'Antonio; Paramjeet S Bagga
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

9.  Ab initio thermodynamic modeling of distal multisite transcription regulation.

Authors:  Leonor Saiz; Jose M G Vilar
Journal:  Nucleic Acids Res       Date:  2007-12-01       Impact factor: 16.971

View more
  22 in total

Review 1.  Advances and Challenges in Small-Molecule DNA Aptamer Isolation, Characterization, and Sensor Development.

Authors:  Haixiang Yu; Obtin Alkhamis; Juan Canoura; Yingzhu Liu; Yi Xiao
Journal:  Angew Chem Int Ed Engl       Date:  2021-02-09       Impact factor: 15.336

2.  Gut bacterial isoamylamine promotes age-related cognitive dysfunction by promoting microglial cell death.

Authors:  Yun Teng; Jingyao Mu; Fangyi Xu; Xiangcheng Zhang; Mukesh K Sriwastva; Qiaohong M Liu; Xiaohong Li; Chao Lei; Kumaran Sundaram; Xin Hu; Lifeng Zhang; Juw Won Park; Jae Yeon Hwang; Eric C Rouchka; Xiang Zhang; Jun Yan; Michael L Merchant; Huang-Ge Zhang
Journal:  Cell Host Microbe       Date:  2022-06-01       Impact factor: 31.316

3.  In-Silico Selection of Aptamer Targeting SARS-CoV-2 Spike Protein.

Authors:  Yu-Chao Lin; Wen-Yih Chen; En-Te Hwu; Wen-Pin Hu
Journal:  Int J Mol Sci       Date:  2022-05-22       Impact factor: 6.208

Review 4.  Thermotropic Liquid Crystal-Assisted Chemical and Biological Sensors.

Authors:  Nicolai Popov; Lawrence W Honaker; Maia Popova; Nadezhda Usol'tseva; Elizabeth K Mann; Antal Jákli; Piotr Popov
Journal:  Materials (Basel)       Date:  2017-12-23       Impact factor: 3.623

Review 5.  Multifunctional Nanotechnology-Enabled Sensors for Rapid Capture and Detection of Pathogens.

Authors:  Fatima Mustafa; Rabeay Y A Hassan; Silvana Andreescu
Journal:  Sensors (Basel)       Date:  2017-09-15       Impact factor: 3.576

6.  Nucleobase-Guanidiniocarbonyl-Pyrrole Conjugates as Novel Fluorimetric Sensors for Single Stranded RNA.

Authors:  Željka Ban; Biserka Žinić; Robert Vianello; Carsten Schmuck; Ivo Piantanida
Journal:  Molecules       Date:  2017-12-13       Impact factor: 4.411

Review 7.  Molecular Modeling Applied to Nucleic Acid-Based Molecule Development.

Authors:  Arne Krüger; Flávia M Zimbres; Thales Kronenberger; Carsten Wrenger
Journal:  Biomolecules       Date:  2018-08-27

8.  DNA aptamers for the recognition of HMGB1 from Plasmodium falciparum.

Authors:  Diego F Joseph; Jose A Nakamoto; Oscar Andree Garcia Ruiz; Katherin Peñaranda; Ana Elena Sanchez-Castro; Pablo Soriano Castillo; Pohl Milón
Journal:  PLoS One       Date:  2019-04-09       Impact factor: 3.240

9.  Aptamers targeting protein-specific glycosylation in tumor biomarkers: general selection, characterization and structural modeling.

Authors:  Ana Díaz-Fernández; Rebeca Miranda-Castro; Natalia Díaz; Dimas Suárez; Noemí de-Los-Santos-Álvarez; M Jesús Lobo-Castañón
Journal:  Chem Sci       Date:  2020-07-21       Impact factor: 9.825

10.  Atomic Scale Interactions between RNA and DNA Aptamers with the TNF-α Protein.

Authors:  Homayoun Asadzadeh; Ali Moosavi; Georgios Alexandrakis; Mohammad R K Mofrad
Journal:  Biomed Res Int       Date:  2021-07-16       Impact factor: 3.411

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.