
DNA barcode by flossing through a cylindrical nanopore.

Abstract

We report an accurate method to determine DNA barcodes from dwell time measurements of protein tags (barcodes) along the DNA backbone, using Brownian dynamics simulations of a model DNA together with a recursive theoretical scheme that improves the measurements to almost 100% accuracy. The heavier protein tags along the DNA backbone introduce a large speed variation in the chain, which can be understood using non-equilibrium tension propagation theory. Starting from an initial rough classification of velocities into "fast" (nucleotide) and "slow" (protein tag) domains, we introduce a physically motivated interpolation scheme that determines the barcode velocities rather accurately. Our theoretical analysis of the motion of DNA through a cylindrical nanopore opens up the possibility of its experimental realization and carries over to multi-nanopore devices used for barcoding. This journal is © The Royal Society of Chemistry.


Year: 2021 PMID: 34178311 PMCID: PMC8190898 DOI: 10.1039/d1ra00349f

Source DB: PubMed Journal: RSC Adv ISSN: 2046-2069 Impact factor: 4.036

Introduction

A DNA barcode consists of a short DNA sequence taken from a targeted gene, such as COI or cox I (cytochrome c oxidase I),[1] in the mitochondrial genome of animals. The unique combination of nucleotide bases in a barcode allows us to distinguish one species from another. Rather than relying on traditional taxonomic identification methods, DNA barcoding provides an alternative and reliable framework for categorizing a wide variety of specimens obtained from the natural environment. Although researchers had long relied on DNA sequencing techniques to identify unknown species, in 2003 Hebert et al.[2] proposed barcoding of the mitochondrial COI region to classify cryptic species[3] across the animal kingdom. Since then, several studies have shown the potential of barcoding for conserving biodiversity,[4] estimating phyletic diversity, identifying disease vectors,[5] authenticating herbal products,[6] unambiguously labeling food products,[7,8] and protecting endangered species.[4] Traditional sequencing methods based on chemical analysis are still widely used in the biological community to determine barcodes.
Since the discovery of nanopore-based DNA detection,[9-12] the last three decades have witnessed rapid progress in single-nanopore sequencing methods.[13-16] Consequently, a variety of experimental protocols are being explored for cost-effective, high-throughput, reagent-free, real-time sequencing,[17] sequence mapping,[18] detection of duplex DNA for genomic profiling,[19] sorting of ultralong human genomic DNA,[20] detection of topological variations of DNA at the single-molecule level,[21] and barcode copying.[22] Recently, the possibility of determining DNA barcodes has been demonstrated in a dual nanopore device by scanning a captured dsDNA multiple times while applying a periodic net bias across the two pores.[23-26] Theoretical and simulation studies have also been reported for double nanopore systems.[27-29] In this article, we investigate a similar strategy in silico in a cylindrical nanopore and demonstrate that a cylindrical nanopore can have a competitive advantage over a dual nanopore system. By studying a model dsDNA with barcodes using Brownian dynamics, we establish an important result: because the dwell times and speeds of the barcodes (“tags”) differ markedly from those of the nucleotide segments (“monomers”), current blockade time information alone is not enough and inevitably leads to an underestimation of the distances between the barcodes. Furthermore, using ideas from tension propagation theory,[30,31] we demonstrate that information about the fast-moving nucleotides between the barcodes, which is not easily accessible experimentally, is key to resolving this underestimation. We suggest how to obtain this information experimentally and provide a physically motivated “two-step” interpolation scheme for an accurate determination of barcodes, even when the separations of the (unknown) tags have a broad distribution.

Methods

The Model System: Our in silico coarse-grained (CG) model of a dsDNA consists of 1024 monomers interspersed with 8 barcodes at different locations (shown in Fig. 1 and Table 1). It is motivated by an experimental study by Zhang et al., who used a dual nanopore device on a 48 500 bp long dsDNA with 75 bp long protein tags at random locations along the chain.[24-26] Here we explore whether a cylindrical nanopore with an applied bias force can resolve the barcodes with similar or better accuracy. We purposely chose the positions of the 8 barcodes (Table 1) to study how disparate distances among the barcodes affect their measurement.

Fig. 1

(a) Schematic of a model dsDNA captured in a cylindrical nanopore of diameter d = 2σ and thickness tpore, where σ is the diameter of each monomer (purple beads). Protein tags (barcodes) of the same diameter and of different colors (only three are shown here) are interspersed along the dsDNA backbone. Opposite but unequal forces are applied in the nanopore to straighten the dsDNA as it translocates in the direction of the net bias force. (b) Positions of the protein tags along the contour length of the model dsDNA of length L = 1024σ, which represents an actual dsDNA of 48 500 base pairs. The locations of the tags are listed in Table 1.

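Table 1. Tag positions (monomer index) along the dsDNA; each separation is measured from the preceding tag (for T₁, from the start of the chain)

Tag #	T₁	T₂	T₃	T₄	T₅	T₆	T₇	T₈
Position	154	369	379	399	614	625	696	901
Separation	154	215	10	20	215	11	71	205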
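As a quick arithmetic check of Table 1, the short Python sketch below recomputes the tag separations and the distance of every tag from T₅, the reference tag used later in Table 2. It assumes, as the table implies, that positions are monomer indices and that the separation of T₁ is measured from the start of the chain; the function names are illustrative only.

```python
# Tag positions (monomer indices) copied from Table 1.
positions = {"T1": 154, "T2": 369, "T3": 379, "T4": 399,
             "T5": 614, "T6": 625, "T7": 696, "T8": 901}

def separations(pos):
    """Separation of each tag from the preceding tag (T1 from the chain start, index 0)."""
    out, prev = {}, 0
    for tag, p in pos.items():  # dicts keep insertion order in Python 3.7+
        out[tag] = p - prev
        prev = p
    return out

def distance_from(pos, ref):
    """Absolute contour distance (in monomers) of every tag from a reference tag."""
    return {tag: abs(p - pos[ref]) for tag, p in pos.items()}

print(separations(positions))
print(distance_from(positions, "T5"))
```

The second result reproduces the "relative distance w.r.t. T₅" column of Table 2.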

The tags T2, T3 and T4 are closely spaced and form a group. Likewise, T5 and T6 form another closely spaced group that lies relatively close to T7. The tags T1 and T8 are farther from the rest of the tags. The general scheme of the BD simulation strategy for a translocating homopolymer under an alternating force bias has been discussed in our recent publications[27,28] and in the accompanying ESI.† In this article, tags are introduced by choosing their mass and friction coefficient to be different from those of the other monomers along the chain, which requires a modification of the BD algorithm, as discussed in ESI-I.† The protein tags used in the experiments[24-26] correspond to about three monomers in the simulation, and the heavier, extended tags introduce a larger viscous drag. Instead of explicitly adding side chains at the tag locations, we made the mass and the friction coefficient of the tags 3 times larger, which we find is enough to resolve the distance between the tags. Two forces, applied in opposite directions at each end of the cylinder, keep the DNA straight inside the channel and allow translocation in the direction of the net bias (see Fig. 1 and 2).
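To make the idea of per-bead mass and friction concrete, here is a minimal Python sketch of a single Langevin (inertial Brownian dynamics) update in which the tags carry 3 times the bulk mass and friction. It is an illustrative Euler-type integrator in reduced units with hypothetical function and parameter names, not the modified BD algorithm described in ESI-I; the forces (bonds, bending, pore confinement, bias) are assumed to be supplied by the caller, and treating the Table 1 positions directly as 0-based array indices is also an assumption.

```python
import numpy as np

def langevin_step(x, v, force, mass, gamma, kT, dt, rng):
    """Advance positions x and velocities v (shape (N, 3)) by one time step dt.

    Euler-type update of m dv = (F - gamma v) dt + sqrt(2 gamma kT) dW,
    with per-bead mass and friction arrays (shape (N,)).
    """
    m = mass[:, None]
    g = gamma[:, None]
    # Random force with variance 2 gamma kT / dt per component (fluctuation-dissipation).
    random_force = np.sqrt(2.0 * g * kT / dt) * rng.standard_normal(x.shape)
    v_new = v + (force - g * v + random_force) / m * dt
    x_new = x + v_new * dt
    return x_new, v_new

# Hypothetical setup in reduced units: 1024 beads, tags at the Table 1 indices.
N = 1024
tag_index = np.array([154, 369, 379, 399, 614, 625, 696, 901])
mass = np.ones(N)
gamma = np.ones(N)
mass[tag_index] *= 3.0    # m_tag = 3 m_bulk
gamma[tag_index] *= 3.0   # gamma_tag = 3 gamma_bulk
```

Because m and γ are scaled by the same factor, the damping rate γ/m of a tag equals that of a monomer, but its acceleration under the same force is three times smaller, which is one way to see why the tags move more slowly through the pore.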

Fig. 2

Demonstration of the epoch when the bias voltage is flipped. (a) The last barcode has yet to translocate in the downward direction under a downward net bias. (b) A later time, when all the barcodes have crossed the cylindrical pore during downward translocation while a portion of the end segment still remains inside the pore. At this point the bias is flipped to an upward bias, and translocation now occurs in the upward direction. In this way, the DNA remains captured at all times during repeated scans.

Results and discussion

Barcodes from repeated scanning: As could potentially be done in nanopore experiments, we switch the differential bias once the first or the last tag (T1 or T8) has translocated through the nanopore during an up (U) → down (D) or D → U translocation, while the end segment is still inside the pore (see Fig. 2). In this way the DNA remains captured in the cylindrical pore and the barcodes are scanned multiple times. The question we ask is whether the actual barcode locations can be recovered from these scanning measurements, so that the method can be applied to determine unknown barcodes. We monitor two important quantities, as demonstrated in Fig. 3 and explained below: (i) the dwell time of each monomer and tag, and (ii) the time delay between the arrivals of any two tags at the pore. For each upward or downward scan we measure the dwell time of a bead (either a monomer or a tag) with index m as

$$W_m^{U(D)} = t_{\rm exit}^{U(D)}(m) - t_{\rm arrival}^{U(D)}(m), \qquad (1)$$

where $t_{\rm arrival}^{U(D)}(m)$ and $t_{\rm exit}^{U(D)}(m)$ are the arrival and exit times of the mth bead, as further demonstrated in Fig. 3(a). The corresponding dwell velocities $v_m^{U}$ and $v_m^{D}$ of the mth bead along the nanopore channel axis (see Fig. 3(a)) are obtained as

$$v_m^{U(D)} = \frac{t_{\rm pore}}{W_m^{U(D)}}. \qquad (2)$$
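The bookkeeping behind the dwell time and dwell velocity is simple per-bead arithmetic. The sketch below assumes that the dwell velocity is the pore thickness t_pore divided by the dwell time W_m; the function names and the sample times are illustrative only, not simulation output.

```python
import numpy as np

def dwell_times(t_arrival, t_exit):
    """Dwell time W_m = t_exit(m) - t_arrival(m) for each bead m."""
    return np.asarray(t_exit, dtype=float) - np.asarray(t_arrival, dtype=float)

def dwell_velocities(t_arrival, t_exit, t_pore):
    """Dwell velocity along the pore axis, ASSUMED here to be t_pore / W_m."""
    W = dwell_times(t_arrival, t_exit)
    if np.any(W <= 0):
        raise ValueError("exit time must be later than arrival time for every bead")
    return t_pore / W

# Toy arrival/exit times for three beads in a pore of thickness 2 (reduced units).
t_arr = [0.0, 1.0, 2.5]
t_ex = [2.0, 3.5, 5.5]
print(dwell_velocities(t_arr, t_ex, t_pore=2.0))
```

The same two functions would be applied separately to the U → D and D → U scans, since the two directions are not symmetric.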

Fig. 3

(a) Calculation of the dwell (wait) time for T7, which has index m = 696; the dwell velocity is then calculated using eqn (2). (b) Calculation of the time delay between tags T7 and T8 while they move downward. Note that the corresponding quantity for upward translocation must be computed separately, as the tag positions are not symmetric along the chain.

In an actual experiment, one measures only the dwell velocities of the tags, which are inferred from the current blockade durations.

Non-uniformity of the dwell velocity: The presence of tags with heavier mass (mtag = 3mbulk) and larger solvent friction (γtag = 3γbulk) introduces a large variation in the dwell times, and hence in the dwell velocities, of the DNA beads and tags (see Fig. 4). In general, there is no up-down symmetry in the dwell time or velocity, because the tags are not located symmetrically along the chain backbone. We therefore average the physical quantities over U → D and D → U translocation data. The average dwell velocity clearly shows two distinct velocity envelopes, with the tags residing on the lower envelope. Fig. 4(b) shows that the dwell velocities of the tags (filled green circles) are significantly lower than the velocities of the nucleotides between the tags, which, as explained later, leads to an underestimation of the barcode distances. We further notice that increasing the pore width resolves the barcodes better.
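The direction averaging and the two velocity envelopes can be sketched as follows: average each bead's dwell velocity over the U → D and D → U data, then compare the mean over the tag beads with the mean over all other beads. The array layout (one dwell velocity per bead index), the function names and the toy numbers are assumptions for illustration.

```python
import numpy as np

def direction_average(v_down, v_up):
    """Average each bead's dwell velocity over U -> D and D -> U data."""
    return 0.5 * (np.asarray(v_down, dtype=float) + np.asarray(v_up, dtype=float))

def envelope_means(v_avg, tag_index):
    """Return (mean tag velocity, mean monomer velocity) for direction-averaged data."""
    v_avg = np.asarray(v_avg, dtype=float)
    is_tag = np.zeros(v_avg.size, dtype=bool)
    is_tag[tag_index] = True
    return v_avg[is_tag].mean(), v_avg[~is_tag].mean()

# Toy data: four beads, bead 1 is a tag.
v_avg = direction_average([1.0, 0.4, 1.2, 1.1], [1.2, 0.2, 1.0, 0.9])
print(envelope_means(v_avg, [1]))
```

A tag mean well below the monomer mean is the numerical counterpart of the tags sitting on the lower envelope in Fig. 4(b).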

Fig. 4

(a) Dwell velocity of the monomers in a cylindrical nanopore system for downward and upward translocation. (b) Dwell velocities averaged over both directions (green circles); filled symbols correspond to the dwell velocities of the tags. The blue curve shows the direction-averaged velocity of a homopolymer (no tags). The green and blue solid lines are the averages of the corresponding colored curves. The magenta solid line represents the average velocity of the entire chain obtained from eqn (6).

Barcode estimation using a cylindrical nanopore setup

If the dsDNA with barcodes were a rigid rod, one could obtain the barcode distances $d_{mn}^{D}$ and $d_{mn}^{U}$ between tags $T_m$ and $T_n$ directly from the tag dwell velocities (shown for downward translocation only):

$$d_{mn}^{D} = v_m^{D}\,\Delta t_{mn}^{D}. \qquad (3)$$

Here $\Delta t_{mn}^{D}$ is the time delay between the arrivals of $T_m$ and $T_n$ at the pore during downward translocation (Fig. 3(b) illustrates the special case m = 7, n = 8). The corresponding upward expression is obtained by replacing D with U and m with n, i.e. $d_{mn}^{U} = v_n^{U}\,\Delta t_{mn}^{U}$. Eqn (3) therefore gives the shortest distance between the tags, not necessarily the contour length (the actual distance) along the chain. However, these are the only data accessible in experiments, and they are likely to underestimate the barcode distances. Fig. 5(a) shows the data for 300 scans; the averages with error bars are listed in the 3rd column of Table 2. Except for T6, these measurements grossly underestimate the actual positions and carry large error bars.
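A toy calculation shows why a rigid-rod estimate falls short when the monomers between two tags move faster than the tags themselves. In this hypothetical model each bond passes the pore at the local bead velocity, so the arrival delay between the two tags is a sum of bond-crossing times; the rigid-rod estimate then multiplies that delay by the slow tag velocity. The velocities are made-up numbers, not simulation data.

```python
def arrival_delay(bead_velocities, bond_length=1.0):
    """Toy delay between two tag arrivals: each bond crosses at the local bead velocity."""
    return sum(bond_length / v for v in bead_velocities)

def rigid_rod_distance(v_tag, delta_t):
    """Rigid-rod estimate: the whole segment is assumed to move at the tag velocity."""
    return v_tag * delta_t

# Hypothetical segment of 100 bond lengths: the two end beads (tags) are slow,
# the 98 monomers in between move twice as fast.
v_tag, v_fast = 0.5, 1.0
velocities = [v_tag] + [v_fast] * 98 + [v_tag]
delta_t = arrival_delay(velocities)
print(rigid_rod_distance(v_tag, delta_t))   # prints 51.0, versus a true contour length of 100
```

The factor of two in this toy exaggerates the effect, but the direction of the error is the same as the underestimation seen in the eqn (3) column of Table 2.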

Fig. 5

Barcodes generated using different methods. In each panel, the colored symbols/lines refer, from left to right, to the barcodes T1, T2, T3, T4, T6, T7 and T8, respectively; for better visibility only every sixth data point is shown. The open and filled symbols represent barcodes for U → D and D → U translocation obtained (a) using eqn (3), (c) using method 1, and (e) using method 2. In (b), (d) and (f) the solid lines mark the actual locations of the barcodes, and the dashed lines the averages from (a), (c) and (e), respectively. The improved accuracy of the latter two methods is readily visible in (d) and (f), where the simulation and the actual data are almost indistinguishable.

Table 2. Barcodes from various methods (distances relative to T₅)

Tag	Relative distance w.r.t. T₅	Barcode: eqn (3) ×	Barcode: Method 1 ✓	Barcode: Method 2 ✓
T₁	460	373 ± 122	459 ± 59	460 ± 43
T₂	245	197 ± 67	250 ± 39	250 ± 32
T₃	235	183 ± 63	237 ± 38	237 ± 32
T₄	215	167 ± 54	211 ± 35	211 ± 30
T₅	0	0	0	0
T₆	11	11 ± 3	14 ± 4	11 ± 3
T₇	82	68 ± 23	86 ± 23	86 ± 21
T₈	287	230 ± 73	287 ± 65	287 ± 73
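The size of the eqn (3) shortfall can be read directly off Table 2. The snippet below uses only the mean values in the table (error bars are ignored) and computes, for each tag, the absolute and fractional underestimate relative to the actual distance from T₅; the dictionary and function names are illustrative.

```python
# Mean values from Table 2 (distances relative to T5); error bars are ignored.
actual = {"T1": 460, "T2": 245, "T3": 235, "T4": 215, "T6": 11, "T7": 82, "T8": 287}
eqn3 = {"T1": 373, "T2": 197, "T3": 183, "T4": 167, "T6": 11, "T7": 68, "T8": 230}

def shortfall(actual, estimate):
    """Absolute and fractional underestimate of each tag's distance."""
    return {tag: (actual[tag] - estimate[tag], (actual[tag] - estimate[tag]) / actual[tag])
            for tag in actual}

for tag, (absolute, fraction) in shortfall(actual, eqn3).items():
    print(f"{tag}: short by {absolute} ({fraction:.1%})")
```

For every tag except T6 the mean eqn (3) barcode falls short by roughly 17–22%, so the absolute shortfall is largest for the tags farthest from T₅ (87 for T₁).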

Tension propagation (TP) theory explains the source of the discrepancy and provides a solution

Unlike that of a rigid rod, the motion of a semi-flexible chain under an external bias is governed by tension propagation. In TP theory and its implementation in Brownian dynamics, the motion of the subchain on the cis side decouples into two domains.[30,31] In the vicinity of the pore, the tension front affects the motion directly, while the second domain, beyond the reach of the TP front, remains unperturbed. In our case, after tag $T_m$ translocates through the pore, the following monomers are quickly dragged into the pore by the tension front, analogous to the uncoiling of a rope pulled from one end (see the movie in the ESI†). This sudden faster motion continues to grow, reaching its maximum when the tension front hits the subsequent tag $T_n$, which has larger inertia and viscous drag. At this time (called the tension propagation time[32]) the faster motion of the monomers (Fig. S2† in ESI) begins to taper down to the velocity of tag $T_n$. The process repeats from one segment to the next. The contour lengths of these faster-moving segments between two barcodes are not accounted for in eqn (3). Experimental protocols that extract barcode information through eqn (3) alone (i.e., by measuring current blockade times) are therefore likely to underestimate the barcode distances unless the data are corrected for the faster-moving monomers between two tags. How, then, can the barcodes be determined correctly? A close look at Fig. 1(b) and the 3rd column of Table 2 provides clues. The isolated tags far from T5 (such as T1 and T8) have larger error margins, while T6, which is adjacent to T5, has the correct distance from eqn (3). This is simply because in the latter case the contour length between T5 and T6 is almost equal to the shortest distance. Evidently, the error margins increase with increasing separation.
To compare the barcodes obtained from eqn (3) with the actual contour lengths between tag pairs (2nd column of Table 2), we invoke Flory theory[33] and describe the conformations of the translocating segments in terms of a dynamical effective exponent, which reveals the different behavior of small and large segments between adjacent barcodes. The heatmap in Fig. 6 confirms that when the separation between a tag pair is small compared with the DNA length, the connecting segment behaves like a rigid rod (effective exponent > 0.6). For the isolated tags, by contrast, an effective exponent < 0.6 indicates that the barcodes are shorter than the respective contour lengths. This explains why eqn (3) underestimates the barcodes for widely spaced tags while yielding accurate barcodes for tags located in groups.
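One simple way to turn the Flory picture into a number is to invert the scaling R = b N^ν for a single segment. The function below is a hypothetical estimator of an effective exponent from an end-to-end distance R, a bond count N and a bond length b; the dynamical effective exponent in Fig. 6 is computed from the simulation trajectories, and its precise definition is not reproduced in this section.

```python
import math

def effective_exponent(end_to_end, n_bonds, bond_length=1.0):
    """Invert the Flory scaling R = b * N**nu for one segment: nu = ln(R/b) / ln(N).

    Hypothetical single-segment estimator, not the dynamical definition used for Fig. 6.
    """
    if n_bonds < 2:
        raise ValueError("need at least two bonds for a meaningful exponent")
    if end_to_end <= 0:
        raise ValueError("end-to-end distance must be positive")
    return math.log(end_to_end / bond_length) / math.log(n_bonds)

print(effective_exponent(100.0, 100))   # fully stretched segment
print(effective_exponent(10.0, 100))    # strongly coiled segment
```

A fully stretched segment (R = N b) gives exactly 1, whereas Flory's exponent for a swollen three-dimensional coil is 3/5, which is the 0.6 threshold used in the text.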

Fig. 6

The dynamical effective exponent for the segment connecting each tag pair, represented as a two-dimensional heatmap on a color scale ranging from blue to white.

Within the experimental setup, we suggest the following two methods, which account for the larger velocities of the monomers.

Method 1 – Barcode from known end-to-end tag distance: To measure the barcode distances accurately, one needs the velocity of the entire chain. If the distance d18 between T1 and T8 is comparable to the chain length (d18 ≃ L), the velocity of the segment d18 approximately accounts for the average velocity of the entire chain, vchain, and corrects the problem, as demonstrated next. Assuming we know d18, we first estimate the velocity of the chain as

$$v_{\rm chain} = \frac{d_{18}}{\Delta t_{18}^{D}}, \qquad (4)$$

where $\Delta t_{18}^{D}$ is the time delay between the arrivals of T1 and T8 at the pore for U → D translocation. We then estimate the barcode distance between tags $T_m$ and $T_n$ as

$$d_{mn}^{D} = v_{\rm chain}\,\Delta t_{mn}^{D}. \qquad (5)$$

In a similar fashion, one can calculate $d_{mn}^{U}$ using the $\Delta t_{18}^{U}$ and $\Delta t_{mn}^{U}$ information. How do we know d18? One can use d18 ≈ Lscan and vchain ≈ v̄scan from eqn (6), where v̄scan is the average velocity of the scanned length Lscan from repeated scanning, as discussed in the next paragraph. This method is effective for estimating widely spaced barcodes, but it overestimates the barcode distances when multiple barcodes are close together, as is evident in Fig. 5(d) and the 4th column of Table 2. Thus, we know how to obtain barcode distances accurately when the tags are close together (eqn (3)) and when they are widely separated (eqn (5)). We now apply the physics behind these two schemes to derive an interpolation scheme that works for all separations among the barcodes.

Method 2 – Barcode using two-step method: The average scan time τ̄scan for the entire chain, which can be measured experimentally, gives a better estimate of the average velocity of the chain. Here Lscan is the maximal length up to which the dsDNA segment remains captured inside the nanopore while being scanned; it denotes the theoretical maximum beyond which the dsDNA escapes from the nanopore, so that Lscan approximates L. For example, in our simulation Lscan = 0.804L.
We denote the average scan velocity as

$$\bar v_{\rm scan} = \frac{L_{\rm scan}}{\bar\tau_{\rm scan}}, \qquad \bar\tau_{\rm scan} = \frac{1}{N_{\rm scan}}\sum_{i=1}^{N_{\rm scan}} \tau_{\rm scan}(i), \qquad (6)$$

where τscan(i) is the scan time for the ith event and Nscan = 300. To proceed further, we use our established result that the monomers of the dsDNA segments between the tags move with velocity v̄scan, while the tags move with their respective dwell velocities $v_m$ and $v_n$ (eqn (2)). We then calculate the segment velocity between two tags as a weighted average of the velocities of the tags and of the DNA segment in between. First, considering the tag velocities only, we estimate the approximate number of monomers between the tags, $N_{mn} \approx d_{mn}^{D}/\langle b_l\rangle$, from eqn (3), where ⟨b_l⟩ is the bond length. We then calculate the segment velocity accurately by incorporating weighted velocity contributions from both the tags and the monomers connecting them (eqn (7)). Here $n_{\rm next}$ is the number of monomers adjacent to each tag that share the tag velocity; we checked that choosing $n_{\rm next}$ ≈ 1–3 makes no noticeable difference to the final result. Finally, the barcodes are estimated by multiplying the 2-step velocity $v_{mn}^{D}$ from eqn (7) by the tag time delay,

$$d_{mn}^{D} = v_{mn}^{D}\,\Delta t_{mn}^{D}, \qquad (8)$$

for U → D translocation, and the procedure is repeated for D → U translocation. This 2-step method accurately captures the distance between barcodes whether the two tags are close together or far apart. Table 2 and Fig. 5 summarize our main results.
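Methods 1 and 2 can be summarized in a few lines of Python. The first two functions compute the average scan velocity Lscan/⟨τscan⟩ and the Method 1 distance (chain velocity d18/Δt18 times the tag delay Δt_mn). Because the explicit weighting of eqn (7) is not written out in this text, two_step_distance uses an assumed weighting: the two tags and their n_next neighbours move at the tag dwell velocities, and the remaining monomers, counted from the tag-only estimate v_m Δt_mn, move at v̄scan. All function names and numbers are hypothetical.

```python
import numpy as np

def scan_velocity(L_scan, scan_times):
    """Average scan velocity: scanned length over the mean scan time <tau_scan>."""
    return L_scan / np.mean(scan_times)

def method1_distance(delta_t_mn, d18, delta_t_18):
    """Method 1: chain velocity from the known T1-T8 distance, times the tag delay."""
    v_chain = d18 / delta_t_18
    return v_chain * delta_t_mn

def two_step_distance(v_m, v_n, delta_t_mn, v_scan, bond_length=1.0, n_next=2):
    """Method 2 with an ASSUMED weighting (the text's eqn (7) is not reproduced).

    Step 1: rough number of monomers between the tags from the tag-only estimate.
    Step 2: weighted velocity in which each tag and its n_next neighbours move at
    the tag's dwell velocity and all other monomers move at v_scan.
    """
    n_mn = v_m * delta_t_mn / bond_length            # step 1: tag-only monomer count
    n_slow = 2 * (1 + n_next)                        # two tags plus their neighbours
    n_fast = max(n_mn - 2 * n_next, 0.0)             # remaining fast monomers
    v_seg = ((1 + n_next) * (v_m + v_n) + n_fast * v_scan) / (n_slow + n_fast)
    return v_seg * delta_t_mn                        # step 2: distance = velocity x delay

# Toy inputs in reduced units (not simulation data).
v_bar = scan_velocity(823.0, [100.0, 110.0, 90.0])
print(method1_distance(50.0, 800.0, 1000.0))
print(two_step_distance(0.5, 0.5, 10.0, 1.0), two_step_distance(0.5, 0.5, 200.0, 1.0))
```

With these toy inputs, closely spaced tags (Δt = 10) give a distance only slightly above the tag-only estimate of 5, while widely spaced tags (Δt = 200) are pulled toward the v̄scan-based estimate of 200; this interpolating behaviour is what the two-step idea is meant to provide.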

Summary & future work

Motivated by recent experiments, we have designed an in silico barcode-determination experiment in a cylindrical nanopore using Brownian dynamics on a model dsDNA with known barcode locations. We chose the barcode locations carefully so that their separations span a broad distribution. We find that if only the dwell time data of the tags from multiple scans of the dsDNA are used to calculate the velocity of the segments joining two tags, the barcode distances are underestimated for tags that are farther apart. Our simulations show that the source of this underestimation lies in neglecting the information contained in the faster-moving DNA segments between any two tags. We use non-equilibrium tension propagation theory to explain the non-monotonic velocity of the chain segments, in which the barcodes lie at the lower bound of the velocity envelope, as shown in Fig. 4(b). The emerging picture shows how to rectify this error with an interpolation scheme that determines barcodes accurately for all separations, which we validate using simulation data. We also suggest how to implement the scheme in an experimental setup. The TP-theory-based interpolation scheme is quite general, and we have ample evidence that it will work in a double nanopore system as well. Our computational work closely parallels a recent experimental study of sequence mapping using a plasmid DNA labeled with oligodeoxynucleotides at several locations.[18] Their velocity renormalization scheme is essentially the same as ours, indicating the role of the velocity of the entire chain in an accurate determination of genomic distances. The origin of velocity fluctuations has also been discussed by Bo et al.[34] for short chains.
However, these studies collected data from individual translocation events and did not scan the chain repeatedly. We conclude with some remarks about the limitations of the model and some refinements we plan to make in the future. Because we carried out Brownian dynamics without hydrodynamic interactions, any discussion of the Zimm time, as mentioned in some experiments, is not relevant here. Also, in such simulation protocols the velocities are typically several orders of magnitude faster than in experiments (a general rule of thumb for most CG simulations) and cannot be compared with experimental velocities directly. The net force bias is also a variable that can make the velocity faster or slower. There are, however, other ways to compare with experiments, such as matching simulated quantities like the wait-time distribution with those obtained experimentally; this will require an adjustment of the external bias. In a cylindrical nanopore, the current blockades due to the tags are sequential once the DNA is captured in the pore,[18] whereas in a double nanopore setup, once the DNA is captured it is not known which fragment of the DNA resides between the pores, which may lead to ambiguity in identifying the tags. It is therefore worthwhile to explore the concept of flossing in a cylindrical nanopore setup. Finally, in this study we consider the electric field to be local, confined strictly inside the pore, whereas the field is nonzero in the vicinity of the pore as well, which will affect both the translocation speed and the dwell time distribution. At typical experimental salt concentrations, a dsDNA carries a screened partial charge of 0.2–0.3 times the charge of an electron.[34] With the electric field present beyond the pore, it will be worthwhile to study how adding a screened Coulomb charge to the DNA monomers[35,36] in our model affects the flossing.
We believe our results will promote new experimental and theoretical studies on nanopore translocation.

Conflicts of interest

The authors declare no competing financial interest.

References

Voltage-driven DNA translocations through a nanopore.

Biological identifications through DNA barcodes.

Rapid nanopore discrimination between single polynucleotide molecules.

Recognizing a single base in an individual DNA strand: a step toward DNA sequencing in nanopores.

Origins and consequences of velocity fluctuations during DNA passage through a nanopore.

Characterization of individual polynucleotide molecules using a membrane channel.

Single Molecule DNA Resensing Using a Two-Pore Device.

Electrical DNA Sequence Mapping Using Oligodeoxynucleotide Labels and Nanopores.

Detecting topological variations of DNA at single-molecule level.

On-Chip Stretching, Sorting, and Electro-Optical Nanopore Sensing of Ultralong Human Genomic DNA.

Electronic Mapping of a Bacterial Genome with Dual Solid-State Nanopores and Active Single-Molecule Control.

Discriminating protein tags on a dsDNA construct using a Dual Nanopore Device.