Xinjia Zhao1,2, Yuru Liu1, Xiaoyu Chen1,2, Zhuang Mi1,2, Wei Li1, Pengye Wang1,2,3, Xinyan Shan1, Xinghua Lu1,2,4,3. 1. Beijing National Laboratory for Condensed-Matter Physics and Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China. 2. School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100190, China. 3. Songshan Lake Materials Laboratory, Dongguan, Guangdong 523808, China. 4. Center for Excellence in Topological Quantum Computation, Chinese Academy of Sciences, Beijing 100190, China.
Abstract
Detection and characterization of an individual cisplatin adduct on a single DNA molecule is a demanding task. We explore the characteristic features of cisplatin adducts in the nanopore sequencing signal in terms of dwell time, genome anchored current trace, and basecalling accuracy. Dwell time analysis reveals that the offset between the motor protein and the nanopore constriction region is about 14 bases in the nanopore device we examined. Characteristic distortions due to cisplatin adducts are illustrated in genome anchored current trace analysis and constitute a fingerprint for identifying the cisplatin adduct. The sharp increase in odds ratio at the location of the adduct sites provides an additional feature for detecting the adduct. By combining these methods, single cisplatin adducts can be detected with high fidelity on a single read of the DNA sequence. The study demonstrates an effective method for the detection and characterization of single cisplatin adducts on DNA at the single-molecule level and with single-nucleotide spatial resolution.
Cisplatin is an effective,
widely used drug for many cancers, which
shows a high level and broad spectrum of antitumor activities.[1−6] It is known that the binding of this drug molecule to
the DNA molecule disables the physiological activities of DNA and eventually
induces cell apoptosis. Great efforts have been made to probe and
understand the interaction between cisplatin molecules and
DNA and how the cisplatin adduct affects DNA replication in the
cell.[3,7] Many experimental methods such as nuclear
magnetic resonance, optical/magnetic tweezers,[8] and atomic force microscopy[9,10] have been employed
to probe the cisplatin–DNA interactions at the single-molecule
level. Besides these methods, the nanopore is an emerging experimental
tool capable of detecting a single DNA molecule as well as its
modifications.[11−15] For the cisplatin–DNA interaction, Zhou et al. employed a solid-state
nanopore for the real-time monitoring of DNA configuration and revealed
three distinct stages of the interaction process.[7] Further understanding of the dynamic behavior of the cisplatin–DNA
interaction requires accurate information on the location and number
of cisplatin adducts bonded to a single DNA molecule. Spatially
resolved detection and characterization of individual cisplatin adducts
along a single DNA molecule, however, has not been reported
previously.

Recent developments in sequence-based nanopore techniques
have made
it possible to detect the subtle modification along the DNA strand
with single-base resolution. For example, sequencers from Oxford Nanopore
Technologies (ONT) have been employed not only in DNA[16,17] and RNA sequencing[18,19] but also in DNA methylation detection[20−22] and in discriminating adducts by size, regiochemistry, and functional groups.[23] This unique experimental tool opens an additional
door toward the characterization of cisplatin–DNA interaction
at the sequence level. Here, we report the study on the detection
and characterization of single cisplatin adducts on DNA molecules
by using ONT’s nanopore sequencer MinION. The presence of the
drug molecule shows a significant influence not only on the effective
cross section of the nucleotide but also on the DNA unwinding process.
As a result, the dwell time is lengthened by the presence of cisplatin
adducts, and a deeper current blockade occurs when the cisplatin-bonded nucleotide
enters the nanopore constriction region. In addition, the perturbed
current level reduces the accuracy of basecalling. The comprehensive
analysis of these effects provides a fingerprint of cisplatin adduct
on DNA with base-resolved spatial resolution.
Results and Discussion
Experimental Setup and Sample Information
The schematic
diagram for nanopore[24] sequencing is shown
in Figure 1a. Under
an external drive voltage, a double-stranded DNA is unwound by a motor
protein, and then one of the two strands transits through the nanopore.
As the single-stranded DNA (ssDNA) passes through the constriction
region of the nanopore, variation in base size is imprinted in the
current signal. The raw current signal is then basecalled to reveal
the DNA sequence and other information for further processing, as shown
in Figure 1b.
Figure 1
Experimental
setup and analysis methods in this study. (a) Simplified
diagram for nanopore DNA sequencing. (b) Analysis pipelines as performed.
The DNA sequence, as well as information on model state and move,
is basecalled from raw signal. Then, dwell time analysis is performed
with a homebuilt algorithm. Genome anchored current trace analysis
is acquired by program Tombo. ESB and odds ratio are obtained by program
Eligos2. The three analysis methods can be performed in parallel.
Four DNA samples have been used in our study. The
first DNA sample
is 2449 bp long, with cisplatin bonded with two adjacent guanines
at the site of 2386–2387 bp. The second sample is the same
DNA just without cisplatin. The third DNA sample is 2454 bp long,
with a similar sequence as the first sample and with cisplatin bonded
at the site of 2414–2415 bp. The fourth sample is the same
DNA as the third sample, without cisplatin adduct. In the first sample,
the cisplatin molecules adduct to the strand of the DNA that transits
through the nanopore. In contrast, the cisplatin-adducted ssDNA
of the third sample does not transit through the nanopore; its complementary
strand is sequenced instead. The second and fourth samples are used
as control groups. For the detailed sequences of the DNA samples and the sample
preparation method, please see the Methods section.

Data analysis plays an increasingly important role in single-molecule
analysis based on nanopores.[25−27] In this study, three analysis
methods are employed to characterize the effect of cisplatin adducts
on the DNA sequencing results, as detailed below.
Dwell Time
First, the dwell time for each base during
sequencing is computed by summing the move delays in the basecalled
results. Since the DNA translocation is controlled by the motor protein
attached to the nanopore entrance, the dwell time for each base is
associated with the DNA unwinding time. A homebuilt LabVIEW program
is used to load the basecalling results and compute the dwell time
for each base in the standard sequence.[28] Figure 2 shows
the dwell time as a function of the sequence for each sample. Each
curve is averaged over 4000 reads of the basecalled current signals,
and the standard error is below 0.1 ms. Figure 2a presents the result for the first and second
samples. The two curves are almost the same for most of the sequence.
An evident difference, however, emerges in the last hundred bases, as
shown in the zoom-in plot of Figure 2b. The dwell time for the cisplatin-adducted ssDNA
(first sample, red line) is clearly higher than that for the control
group (second sample, black line). In particular, at the site of 2372
bp, the dwell time increases sharply, by about fourfold, as marked by the
arrow in Figure 2b.
The site of 2372 bp is about 14 bp before the binding position of
the cisplatin molecule, which is at 2386–2387 bp.
This difference reveals the offset distance between nanopore current
sensing region and the motor protein, which is consistent with the
estimation from physical model structures. The influence of cisplatin
on the DNA unwinding process is clearly illustrated here. The overall
increase in dwell time around the binding site is due to the increased
viscous force resulting from the presence of cisplatin adduct. As
a comparison, Figure 2c,d shows the dwell time for the third and fourth samples.
There are still some slight variations between the two curves around
the cisplatin bonding site at 2414 bp, but the difference is much
smaller than that in Figure 2a,b. This is reasonable, as the cisplatin-adducted
strand does not enter the motor protein and thus generates much less
obstruction to the unwinding process. We note that the dwell time
starts to drop in the last 20 bases. This is because the drag
force decreases as the strand approaches the end of translocation, resulting
in an increase in speed.[29] The length of the
accelerated region is consistent with the Kuhn length.[29]
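
The dwell time computation described above can be illustrated with a minimal Python sketch. It is not the homebuilt LabVIEW program used in this study. It assumes a simplified event list of `(duration_ms, move)` pairs, where `move` is the number of bases the strand advanced at that event; the function names `dwell_times` and `mean_dwell` are hypothetical, and the reads are assumed to be already aligned to the reference.

```python
def dwell_times(events):
    """Return per-base dwell times from (duration_ms, move) events.

    move == 0: the strand did not advance, so the time is added to the
               current base.
    move >= 1: the strand advanced by `move` bases; skipped bases get a
               dwell of 0.0 because no event was assigned to them.
    """
    dwell = []
    for duration, move in events:
        if move == 0 and dwell:
            dwell[-1] += duration
        else:
            # Pad any skipped bases, then start timing the new base.
            dwell.extend([0.0] * (max(move, 1) - 1))
            dwell.append(duration)
    return dwell


def mean_dwell(per_read):
    """Average dwell per position over reads aligned to one reference."""
    n = len(per_read)
    return [sum(column) / n for column in zip(*per_read)]


# Illustrative events only (not measured data).
events = [(2.0, 1), (2.0, 0), (2.0, 1), (2.0, 2), (2.0, 0)]
print(dwell_times(events))  # [4.0, 2.0, 0.0, 4.0]
```

Averaging such per-read profiles position by position gives a curve of the kind plotted in Figure 2, provided each read has first been mapped onto the standard sequence.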
Figure 2
Dwell time during DNA sequencing. (a) Dwell time for each
base
along the cisplatin-adducted ssDNA (test group, red line) and the
nonadduct ssDNA (control group, black line). The DNA sample is 2449
bp long, and the cisplatin is attached at the site of 2386–2387
bp. (b) Zoom-in of figure (a) in the range of 2200–2449 bp.
There is a sharp peak at the 2372 bp site, as marked by red arrow.
(c) Dwell time for each base along the ssDNA with cisplatin attached
to the other chain (test group, red line) and the nonadduct ssDNA
(control group, black line). The DNA sample is 2454 bp long, and the
cisplatin is attached at the site of 2414–2415 bp. (d) Zoom-in
of figure (c) in the range of 2200–2454 bp. The contrast is
much smaller than that in (b).
Genome Anchored Current Trace Analysis
Second, a genome
anchored plotting is generated to investigate the signal deviation.
The calculation is performed by software Tombo, a suite of useful
tools for the analysis and visualization of raw nanopore signal. It
has been employed to detect base modifications in DNA sequences like
DNA methylation.[20−22] The raw current signal can be normalized and assigned
to each base of the sequence by a resquiggle algorithm in Tombo. Figure 3a shows the resquiggled
signals of individual reads derived from the first and second samples,
overlaid for comparison. The horizontal axis is the sequence of the
ssDNA that transits through the nanopore. The signal for each base
corresponds to the third base in a 5-mer model state (see k-mer signal
level plot in the Supporting Information). For example, the resquiggled signals at 2380 for base “C”
are associated with the 5-mer model state “CTCCT.” The
test group (red line) and control group (black line) exhibit a clear
difference around the cisplatin binding sites, between base positions
2384 and 2390. The cisplatin-adducted bases “GG” are
marked by red stars. Quantitative analysis shows that the offset and
standard deviation in the signal are most significant at the cisplatin-adducted
sites. Figure 3b shows
the statistics of the resquiggled current traces for the bases from
2384 to 2391 bp, for both the test group and the control group. The cisplatin
adduct results in a deeper current blockade and a higher standard
deviation at sites 2386–2387 bp in the test group, as compared
to the control group. There are also some influences on the
sites near the adducted bases. The spreading of the influence arises
partly because the current blockade is determined
by several DNA bases within the nanopore constriction region. In addition,
the cisplatin molecule binds to two adjacent G bases, resulting
in significant conformational changes in the adducted G bases, as discussed
later.
Figure 3
Genome anchored current traces. (a) Genome anchored current traces
for the 2449 bp DNA samples (the first and second samples) obtained
by the resquiggle process. Red lines represent current signal of cisplatin-adducted
samples (test group). Black lines represent samples without cisplatin
(control group). Red stars indicate the cisplatin adduct sites. (b)
Statistics in resquiggled current traces for specific bases of interest
in (a). (c) Genome anchored current traces for the 2454 bp DNA samples
(the third and fourth samples). (d) Statistics in resquiggled current
traces for specific bases of interest in (c).
As a comparison, the resquiggled signal and analysis for the third
and fourth samples show no sign of cisplatin adduct, as shown in Figure 3c,d. This is reasonable
since the cisplatin does not pass through the nanopore with the DNA
strand in the third sample.
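
The per-base comparison between test and control groups can be sketched as follows. This is a minimal illustration using only the Python standard library, not Tombo's own API; it assumes the normalized current level of each read at a given base has already been extracted by the resquiggle step. The function names and the numbers are hypothetical.

```python
from statistics import mean, stdev


def position_stats(levels):
    """Mean and sample standard deviation of the current levels at one base."""
    return mean(levels), stdev(levels)


def compare_position(test_levels, control_levels):
    """Signal offset and spread of the test group relative to the control."""
    t_mean, t_sd = position_stats(test_levels)
    c_mean, c_sd = position_stats(control_levels)
    return {
        "shift": t_mean - c_mean,   # negative means a deeper blockade
        "sd_ratio": t_sd / c_sd,    # above 1 means a broader distribution
    }


# Illustrative normalized levels only (not measured data).
test = [-1.0, -2.0, -3.0]
control = [0.0, 0.5, 1.0]
print(compare_position(test, control))
```

A base where `shift` is strongly negative and `sd_ratio` is well above 1 would correspond to the behavior reported for sites 2386–2387 bp in the test group.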
Error at Specific Base and Odds Ratio
Third, the influence
of cisplatin on basecalling accuracy is characterized by two terms,
i.e., error at specific base (ESB) and odds ratio. The distorted ionic
current signals give rise to sequencing errors in the basecalling
algorithm. ESB is defined, for each position, as the sum of substitutions,
insertions, and deletions divided by the total number of mapped
reads, obtained from read alignment against the reference
sequence. Odds ratios for all nucleotide positions were computed with
Fisher’s exact test, comparing the error of reads derived from
cisplatin-adducted DNA with that derived from the corresponding control
sample. Computer program Eligos2 is used to compute the sequencing
errors in individual bases and to compare the differences in error
fractions, producing odds ratios for individual nucleotide positions.[23,30] Figure 4a shows
the ESB of the first and second samples in the range of 2384–2394
bp. The ESB in the test sample is evidently higher than that in the
control sample, especially for the sites of 2388–2390 bp. The
odds ratio, as shown in Figure 4b, presents a clear peak at the site of 2389 bp, with a peak
value of 110 over a background fluctuation below 5. It shows an error
in basecalling that spans three nucleotides. The center position of
the peak is 2–3 bp behind the known adduct sites (2386 and
2387 bp). This lag means that the presence of the adduct affects
the basecalling result a few bases later, which is likely due to
the basecalling algorithm. As a comparison, there is no distinct difference
in ESB between the third and fourth samples, as shown in Figure 4c. This is consistent
with the result shown in Figure 3c,d. As a result, there is no clear peak in the plot
of odds ratio, as shown in Figure 4d. As described in ref (23), current signal distortions induced by DNA adducts
are visualized as the π values in differential ionic
signal (DIS) plots. Figure 4e,f shows DIS plots for the 2449 and 2454 bp samples, respectively.
A clear peak emerges at the site of 2382 bp in Figure 4e, indicating that cisplatin mostly influences
the current signal at this site. In contrast, there is no difference
in Figure 4f.
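
The ESB and odds ratio definitions above can be made concrete with a short Python sketch. It is an illustration under stated assumptions and does not reproduce the internals of Eligos2: the function names are hypothetical, the error counts are made up, and the one-sided alternative used for Fisher's exact test is an assumption rather than a documented Eligos2 setting.

```python
from math import comb


def esb(substitutions, insertions, deletions, mapped_reads):
    """Error at specific base: all error types over the mapped reads."""
    return (substitutions + insertions + deletions) / mapped_reads


def odds_ratio(err_test, n_test, err_ctrl, n_ctrl):
    """Odds of an erroneous call in the test sample relative to the control.

    Undefined (ZeroDivisionError) if the control has no errors or the
    test sample has no correct reads at this position.
    """
    ok_test, ok_ctrl = n_test - err_test, n_ctrl - err_ctrl
    return (err_test * ok_ctrl) / (ok_test * err_ctrl)


def fisher_greater(err_test, n_test, err_ctrl, n_ctrl):
    """One-sided Fisher exact p value: P(test errors >= observed)."""
    total_err = err_test + err_ctrl
    total = n_test + n_ctrl
    upper = min(total_err, n_test)
    # Hypergeometric tail: choose k erroneous reads for the test sample.
    tail = sum(
        comb(total_err, k) * comb(total - total_err, n_test - k)
        for k in range(err_test, upper + 1)
    )
    return tail / comb(total, n_test)


# Illustrative counts only (not measured data).
print(esb(3, 1, 1, 100))            # 0.05
print(odds_ratio(3, 4, 1, 4))       # 9.0
print(fisher_greater(3, 4, 1, 4))   # 17/70, about 0.243
```

Evaluating `odds_ratio` at every reference position for an adducted sample against its control yields a profile of the kind discussed for Figure 4.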
Figure 4
Odds ratio,
ESB, and DIS plots. (a,b) Odds ratio for the 2449 and
2454 bp samples, respectively. Insets show the zoom-in plot near the
adduct sites. A clear peak emerges on the site of 2389 bp in (a).
(c,d) Radar plots displaying ESB of the four samples. The test samples
are plotted in red, and the control samples are plotted in black.
The positions of cisplatin adduct are marked with red stars. (e,f)
DIS plots for the 2449 and 2454 bp samples, respectively. A clear
peak emerges on the site of 2382 bp in (e).
Discussion
In our experiment, the cisplatin binds
to the two adjacent G–C base pairs, forming a 1,2-d(GG) intrastrand
cisplatin cross-link, with the Pt atom bonded to the N7 atoms of the purine
bases in the adjacent guanines, as shown in Figure 5. The energy of activation for guanosine
substitution with cisplatin is about 18 kcal/mol, and for GG adduct
closure, the energy is about 21 kcal/mol.[31,32] Such strong Pt–N bonding ensures that cisplatin transits
with the ssDNA through the nanopore in the first sample. There are
several distorted configurations proposed by theoretical computations,
showing that the cross-link causes distortion of the adjacent G–C
base pairs.[33−38] Table 1 shows the
published bond lengths for all six hydrogen bonds in the normal and
the distorted configurations. The hydrogen bond length in a normal
DNA molecule without cisplatin adduct is about 1.8 ± 0.1 Å.
When a cisplatin adduct occurs, on average, the hydrogen bonds are
weakened with some of the bonds elongated to beyond 2.0 Å. Figure 5c shows one of the
3D configurations of cisplatin-adducted G–C base pairs. The
cisplatin adduct affects the DNA unwinding process if the distorted
G–C base pairs enter the motor protein. The increased dwell
time around the sites of cisplatin adduct may be due to two possible
reasons: distorted hydrogen bonds and size effect. However, because
the hydrogen bonds in the cisplatin-adducted G–C
base pairs are weaker on average than those in the normal configuration,
the distorted hydrogen bonds are unlikely to account
for the increased dwell time at the corresponding sites. It is thus
plausible that the increased size as well as the increased rigidity
of the cisplatin-adducted guanines strongly perturbs the
unwinding process.
Figure 5
Model structure of cisplatin adduct to DNA. (a) Plan view
of a
cisplatin adduct to the two adjacent G–C pairs forming a 1,2-d(GG)
intrastrand cisplatin cross-link. Cisplatin is colored in green. There
are six hydrogen bonds between G and C pairs, as numbered from one
to six. (b) 3D conformation of adjacent G–C pairs without cisplatin
adduct. (c) 3D conformation of 1,2-d(GG) intrastrand cisplatin cross-link.
Cisplatin is drawn in green. This model is the solution structure
from ref (37).
Table 1
Published Hydrogen Bond Lengths in Units of Å (a)

name     HB-1    HB-2    HB-3    HB-4    HB-5    HB-6
normal   1.798   1.871   1.802   1.802   1.870   1.798
2npw     2.096   1.821   1.451   1.580   1.722   1.863
2nq0     2.022   1.826   1.510   1.614   1.743   1.879
1a84     1.905   1.945   1.846   2.341   2.146   1.906
1ksb     1.916   1.851   1.911   1.943   1.702   1.793
1au5     1.718   2.038   1.873   1.969   1.702   1.793
3lpv     3.075   2.849   2.640   2.949   2.790   2.932
1aio     2.738   2.804   2.680   2.770   2.735   2.690

(a) The names in the first column are Protein Data Bank access numbers.
Based on the results of this study, a primary scheme
can be developed
for the detection of single cisplatin adduct in a single DNA sequence
read, especially for the 1,2-d(GG) intrastrand cisplatin cross-link.
First, the location of cisplatin adduct can be statistically determined
by the profile of dwell time and/or odds ratio. The actual cisplatin
binding site is 14 bp after the location of the peak center in the
profile of dwell time and is 2–3 bp before the peak center
in the odds ratio profile. As shown in Figures 2 and 4, the peaks
in both profiles are very sharp, with peak width of a few nucleotides.
The spatial resolution in determining the position of cisplatin adduct
is about one nucleotide for both methods. Second, disturbances in
ionic current signal can serve as fingerprint to identify the characteristics
of the cisplatin adducts. Figure 6a shows the receiver operating characteristic (ROC)
curve displaying the ability to discriminate between the adduct and
control sequence based on current levels at individual sites. Figure 6b shows the p values derived from Eligos2 for in silico mixtures containing
various percentages of reads from adduct-containing samples in the
presence of 10 000 reads from the control sample. We estimated
the detection level of an individual cisplatin adduct by calculating the p value using Fisher’s exact test after mixing reads
from cisplatin-adducted DNA with those of the control at different
percentages of cisplatin adducts. It turns out that it is possible
to detect cisplatin adducts at a fraction as low as 2.5% at the p value cutoff of 0.05. Compared to previous single-molecule experimental
methods, the nanopore-based method as demonstrated in this study has
advantages of high detection sensitivity, base-resolved spatial resolution,
and high device portability. While the accuracy in nanopore sequencing
is still under improvement, the parallel acquisition with multiple
channels generates massive data traces that can be used to squeeze
the noise down to a very low level.
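
The localization step of this scheme can be sketched in a few lines of Python. The offset of 14 bases and the lag of 2–3 bases are the values reported in this study; the function names and the two toy profiles are hypothetical, with only the peak positions (2372 and 2389 bp) and the approximate peak heights taken from the text.

```python
MOTOR_OFFSET = 14       # bases between dwell time peak and adduct (this study)
BASECALL_LAG = (2, 3)   # bases by which the odds ratio peak trails the adduct


def peak_position(profile):
    """Position whose value is largest in a {position: value} profile."""
    return max(profile, key=profile.get)


def locate_adduct(dwell_ratio, odds):
    """Estimate the adduct site from both profiles and check agreement."""
    from_dwell = peak_position(dwell_ratio) + MOTOR_OFFSET
    odds_peak = peak_position(odds)
    from_odds = list(range(odds_peak - BASECALL_LAG[1],
                           odds_peak - BASECALL_LAG[0] + 1))
    return from_dwell, from_odds, from_dwell in from_odds


# Toy profiles: test/control dwell time ratio and odds ratio by position.
dwell_ratio = {2371: 1.1, 2372: 4.0, 2373: 1.3}
odds = {2388: 30.0, 2389: 110.0, 2390: 25.0}
print(locate_adduct(dwell_ratio, odds))  # (2386, [2386, 2387], True)
```

Both estimates point to the known adduct sites at 2386–2387 bp, which is the consistency check the combined analysis relies on.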
Figure 6
Precision for the detection of cisplatin
adducts. (a) ROC curve
displaying the ability to discriminate between the cisplatin adducts
and control sequence based on different bases as indicated in the
inset legend. (b) p value derived from Eligos2 for
different in silico mixtures of various percentages of reads from
adduct-containing samples in the presence of 10 000 reads from
the control sample.
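
A ROC analysis of the kind shown in Figure 6a can be sketched as below. This is a generic, standard-library illustration and not the code used to produce the figure. It assumes the classification rule "call an adduct when the normalized current level at a site is at or below a threshold", which matches the deeper blockade reported for adducted bases; the function names and current levels are hypothetical.

```python
def roc_curve(adduct, control):
    """ROC points (FPR, TPR) for the rule: adduct if level <= threshold."""
    thresholds = sorted(set(adduct) | set(control))
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(x <= t for x in adduct) / len(adduct)
        fpr = sum(x <= t for x in control) / len(control)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Illustrative normalized current levels only (not measured data).
adduct_levels = [-2.0, -1.5, -1.0, 0.0]
control_levels = [-0.5, 0.2, 0.5, 1.0]
print(auc(roc_curve(adduct_levels, control_levels)))  # 0.9375
```

An area close to 1 indicates that the current level at that base separates adducted from control reads well, while an area near 0.5 indicates no discrimination.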
Conclusions
In
summary, the cisplatin adduct on DNA has been detected and characterized
by nanopore sequencing. Dwell time analysis reveals an offset of 14
bases between the motor protein and the nanopore constriction region.
Genome anchored plots illustrate current signal perturbation by the
presence of cisplatin adducts. Characteristic distortions, visualized
in ESB profiles and DIS plots, constitute a fingerprint for the identification
of cisplatin adduct. The location of adducting sites can also be measured
by the sharp increase in odds ratio. By combining these parallel analysis
methods, it is possible to detect a cisplatin adduct on a single read
of a DNA sequence with high fidelity. The results, as well as the strategy
and analysis methods employed in this study, are useful for similar
research tasks.
Methods
Sample Preparation
Oligonucleotides containing a unique
1,2-intrastrand cross-link were constructed as previously described
with minor modifications.[39,40] Briefly, the two adjacent G-residues
within the 5′-TTTCTTCTCTTTGGTTCTTCCTC-3′
oligonucleotide were cross-linked by incubating
the oligonucleotide with activated cisplatin for 20 min at 37 °C
in the dark. After purification, the site-specifically modified oligonucleotides
were allowed to anneal to their complementary strands. The DNA duplexes
containing the 1,2-intrastrand cross-link were then ligated to
a 30 bp DNA duplex and a 2.5 kb DNA fragment with adhesive overhangs.
1D Ligation Sequencing
Nanopore experiments were carried
out with ONT’s sequencer MinION, flow cell version FLO-MIN106,
and ligation kit SQK-LSK108. Single-end sequencing was implemented
by designing a single dA cohesive end on the sample. 1D ligation sequencing
was conducted according to ONT’s protocol. Raw data were recorded
by ONT’s MinKNOW software and basecalled locally by Albacore
version 2.2.6. Basecalled data were resquiggled with Tombo version
1.5 as provided by ONT.
Sequence Alignment
In dwell time
analysis, we utilized
a normal algorithm, written in LabVIEW subVI. Detailed implementation
of this alignment algorithm will be discussed elsewhere. Minimap2
is used for sequence alignment in calculating ESB and odds ratio in
Eligos2.
ESB and Odds Ratio
ESB and odds ratio were acquired
by Eligos2 open-source software, version 2.0.0.[23,30] Detailed process procedure can be found on the website.
Structure Visualization by VMD
The CsgG nanopore in Figure 1a is obtained from the
Protein Data Bank.[24] Cisplatin binding
structures are downloaded from the Protein Data Bank.[37] 3D visualizations are generated by VMD software.[41]