Nicole Kretschy1, Matej Sack1, Mark M Somoza1. 1. Institute of Inorganic Chemistry, Faculty of Chemistry, University of Vienna, Althanstraße 14 (UZA II), A-1090 Vienna, Austria.
Abstract
The fluorescence intensity of Cy3 and Cy5 dyes is strongly dependent on the nucleobase sequence of the labeled oligonucleotides. Sequence-dependent fluorescence may significantly influence the data obtained from many common experimental methods based on fluorescence detection of nucleic acids, such as sequencing, PCR, FRET, and FISH. To quantify sequence-dependent fluorescence, we have measured the fluorescence intensity of Cy3 and Cy5 bound to the 5' end of all 1024 possible double-stranded DNA 5mers. The fluorescence intensity was also determined for these dyes bound to the 5' end of fixed-sequence double-stranded DNA with a variable-sequence 3' overhang adjacent to the dye. The labeled DNA oligonucleotides were made using light-directed, in situ microarray synthesis. The results indicate that the fluorescence intensity of both dyes is sensitive to all five bases or base pairs, that the sequence dependence is stronger for double- (vs single-) stranded DNA, and that the dyes are sensitive to both the adjacent dsDNA sequence and the 3'-ssDNA overhang. Purine-rich sequences result in higher fluorescence. The results can be used to estimate measurement error in experiments with fluorescently labeled DNA, as well as to optimize the fluorescent signal by considering the nucleobase environment of the labeling cyanine dye.
The fluorescence of
molecules is always sensitive to environmental
conditions, although the magnitude of changes in the fluorescence
intensity of any particular fluorophore depends on its specific modes
of interaction with its environment.[1] Fluorescent
molecules can be used as molecular environmental probes by selecting
dyes with strong responses to, for example, pH,[2] viscosity,[3] polarizability,[4] elasticity,[5] and polarity;[6] however, in applications where the fluorescent
intensity is to serve as a proxy for the abundance of the labeled
molecule, environmental sensitivity is a liability that can result
in reduced measurement accuracy.[7] The cyanine
dyes Cy3 and Cy5 are among the most widely used and versatile[8] oligonucleotide labels in, e.g., microarray experiments,
fluorescent in situ hybridization (FISH), real-time PCR (RT-PCR),
and FRET studies[9,10] and are considered to be relatively
environmentally insensitive.[11] However,
Cy3 and Cy5 consist of two indole rings connected by three- or five-carbon
polymethine bridges, which can undergo cis–trans isomerization from the first excited singlet state,
a process that competes with fluorescence.[12−15] In viscous or restrictive environments,
or with conformationally locked dye variants, the rate of isomerization
is reduced or eliminated and the dyes are more fluorescent.[16] When Cy3 and Cy5 are tethered to the end of
double-stranded DNA they assume a planar capping configuration similar
to that of an additional base pair,[17,18] which inhibits
isomerization and increases their fluorescence quantum yield and lifetime.[15] At least in the case of Cy3, the range of motion
available is not fully restricted when attached to either single-
or double-stranded DNA, with time-resolved fluorescence anisotropy
measurements indicating decay components corresponding to rotation
with DNA as well as relative to DNA.
Recent experiments have
shown that both Cy3 and Cy5 are also quite
sensitive to the particular nucleobase sequence of the ssDNA oligonucleotide
to which they are attached,[19,20] with the fluorescence
intensity varying by a factor of about 2 between the brightest and
the darkest labeled oligonucleotide in the case of Cy3, and a factor
of about 3 in the case of Cy5. The variation in fluorescence intensity
for ssDNA is strongly correlated with purine content, with purine-rich
sequences associated with high intensity, and high pyrimidine content,
particularly cytosine, with low intensity.[19]
The magnitude of the sequence-dependent fluorescence is large enough
to affect the accuracy of experimental data derived from Cy3- and
Cy5-labeled single-stranded DNA, but there is currently no data available
on sequence-dependent effects in double-stranded DNA. In experimental
methods based on labeled oligonucleotides, fluorescence is recorded
either from the double-stranded hybrid (e.g., Sanger and next-generation
sequencing, and molecular beacons[21]) or
from the unhybridized strand alone (e.g., hydrolyzed labeled TaqMan
probe fragments[22]). High-throughput DNA
sequencing-by-synthesis is likely to be particularly vulnerable to
sequence-dependent fluorescence because all short nucleobase sequences
will be repeatedly encountered, and detection failures (deletion errors)
from sequences highly unfavorable to fluorescence would be systematic
and therefore not easily detectable with resequencing. Furthermore,
the optical systems of sequencers need to balance dynamic range of
detection with throughput, making their throughput sensitive to dyes
with significant variations in fluorescence.[23] Even though our fluorescence data are obtained on microarrays, most
genomics microarray data is fairly insensitive to sequence-dependent
fluorescence because the labeling is typically based on reverse transcription
using labeled random primers or other quasi-random methods.[24] Nevertheless, gene-specific fluorescence intensity
effects, due to differences in the relative abundance of nucleobases
in particular genes, have been detected.[25]
Since both Cy3- and Cy5-labeled single- and double-stranded oligonucleotides
are commonly used, we present here comprehensive results for double-stranded
DNA to complement and strengthen previous results for Cy3 and Cy5
5′-labeled single-stranded DNA.[19] Two types of sequence-dependent dye–dsDNA interactions, as
illustrated in Figure 1, have been measured: the relative intensity of the dyes at the 5′
end of each of the 1024 possible double-stranded DNA 5mers (Figure 1B), and the relative
intensity of the dyes bound to the 5′ end of a fixed-sequence
double helix, but with a variable 5mer sequence 3′ overhang
adjacent to the dyes (Figure 1C). The sequence-dependent contribution of the overhang is
relevant since in many experimental contexts, such as PCR and FISH,
a short 5′-labeled oligonucleotide is used to quantify the
presence of much longer DNA or RNA molecules. Detailed data on the
sequence-dependent fluorescence of cyanine dyes on single-stranded
DNA (Figure 1A) has
been previously reported for Cy3, Cy5, Dy547, and Dy647;[19,26] this ssDNA data showed that over the range of all possible 5mers,
the intensity of Cy3 varied by about a factor of 2, and in the case
of Cy5, by a factor of about 3. There was also a clear pattern to
the data: the fluorescence follows, to a good approximation, the cumulative
distribution function of a normal distribution, with purine-rich sequences
resulting in high intensities and pyrimidine-rich sequences resulting
in low intensities. In addition, 5′ guanines promote higher
fluorescence much more so than 5′ adenosines, and 5′
cytosines result in much lower fluorescence in comparison with 5′
thymidines. Here we will show that broadly similar trends also hold
true for double-stranded DNA.
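The sequence space described above is small enough to enumerate directly. The following Python sketch, offered only as an illustration, generates all 1024 5mers and computes the purine fraction that the text associates with brightness. The function names `all_kmers` and `purine_fraction` are hypothetical and are not part of the published analysis.

```python
from itertools import product

PURINES = {"A", "G"}

def all_kmers(k=5, alphabet="ACGT"):
    """Return every DNA sequence of length k (4**k sequences)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]

def purine_fraction(seq):
    """Fraction of the bases in seq that are purines (A or G)."""
    return sum(base in PURINES for base in seq) / len(seq)

kmers = all_kmers(5)
print(len(kmers))                # 1024
print(purine_fraction("GAAAA"))  # 1.0
print(purine_fraction("CGTGG"))  # 0.6
```

Purine fraction is only a coarse descriptor; as the text notes, the identity and position of each base also matter.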
Figure 1
Interaction modes of dyes (red) on DNA. (A) 5′ dye with adjacent
nucleobases (blue) in
ssDNA. (B) 5′ dye with base-paired nucleobases (orange) in
dsDNA. (C) 5′ dye with nucleobases of ssDNA (green) adjacent
to a terminal dye on dsDNA.
Results and Discussion
The results for the sequence-dependent fluorescence of cyanine
dyes have been highly consistent, with adjacent purine bases promoting
fluorescence relative to pyrimidine bases both in single-stranded DNA[19,26] and, in the results presented here, in double-stranded DNA. In addition,
for both ssDNA and dsDNA, a guanine immediately adjacent to the dye
consistently results in the highest fluorescence, but in the more
distal positions, adenine, rather than guanine, typically results
in higher fluorescence. Of the pyrimidines, cytosine, rather than
thymine, is most strongly associated with low fluorescence.
Cy3 and Cy5 dsDNA Interactions
Figure 2 summarizes the results for both the 5′
Cy3 and Cy5 terminal labeling experiments on dsDNA. These data correspond
to the case where the random linker is used and the permuted nucleobases
form a double strand (Scheme 1A). Here, the dye interactions with the single-stranded segment
are present, but the data reflect the average over all possible
sequences. As was the case with the data from Cy3- and Cy5-labeled
ssDNA,[19] the overall range of fluorescence
intensity is about a factor of 2 for Cy3 and a factor of 3 for Cy5
(Figure 2A). In order
to be able to compare the fluorescence intensity data for dsDNA with
ssDNA, the array design included reference ssDNA sequences. These
sequences have a very similar design, but with bases rearranged to
prevent hybridization. Figure 2A shows that both Cy3 and Cy5 on dsDNA have a somewhat extended
range of fluorescence intensity in comparison to Cy3 and Cy5 on ssDNA
(horizontal lines). Most of the additional range lies at the
low-intensity end; i.e., the sequences resulting in the
highest fluorescence have similar intensities on ssDNA and
dsDNA.
Figure 2
Double-stranded DNA labeling with Cy3 and Cy5 (Figure 1B). (A) Relative fluorescence
intensity of Cy3 and Cy5 end-labeled 5mers, ranked from most to least
intense. The intensity falls by 55% for Cy3 and almost 70% for Cy5.
The horizontal lines show the fluorescence intensity of single-stranded
reference sequences on the same arrays. Fluorescence intensity consensus
sequences of all 1024 dsDNA 5mers 5′-end-labeled using (B)
Cy3 and (C) Cy5. The fluorescence range was divided into eight
bins of equal intensity range, and the consensus sequence of
the 5mers in each bin is plotted.
Scheme 1
Sequence Design for the 5′-Dye Self-Hybridizing DNA Strands
Sequence (A) is used to measure the interaction of the dyes with dsDNA and sequence (B) is used to
measure the interactions of the dyes with the ssDNA overhang of dsDNA.
The fluorescence intensity of most,
or perhaps all,
dyes is dependent on the nucleobase environment. In many cases the
mechanism is a photoinduced charge transfer between the bases and
the dye (fluorescein,[27] coumarin,[28] rhodamine,[29] and
pyrene[30]), in which case the quenching
efficiency is determined by proximity and base redox potential, dG
< dA < dC < dT, when the bases are reduced, or the reverse
order when oxidized.[28] Ethidium bromide,
another well-known dsDNA fluorescence label, undergoes quenching via
proton transfer to the solvent; intercalation enhances fluorescence
by reducing solvent exposure.[31] In the
case of the cyanine dyes, however, charge transfer is not thermodynamically
favored.[32,33] Instead, the intensity of cyanine dyes conjugated
with DNA is attributed to the modulation of the rotational isomerization
barrier in the excited state.[12−14]
NMR data indicate that
Cy3 and Cy5, 5′-linked to dsDNA,
are positioned at the end of the double helix in a capping
configuration similar to that of an additional base pair.[17,18] This arrangement should restrict the rate of cis–trans isomerization of the dyes, increasing fluorescence relative to the
free dye. However, relative to the same dyes bound to the end of ssDNA,
differences in the rate of isomerization are less clear since the
dyes stack with the terminal base in both cases. Simulations and experiments
indicate that the quantum yield of Cy3 is higher on ssDNA vs dsDNA,
and that on dsDNA the strength of the stacking interaction depends
on the identity of the terminal base pair.[15,34,35] Our experiments indicate that the fluorescence
of Cy3 and Cy5 is somewhat greater on dsDNA; however, the differences
between our results and previously published results,[15] which show a 2-fold greater fluorescence of Cy3 on ssDNA,
may be due to the particular choice of cyanine dye. In particular,
we conjugate the dyes to DNA using the Cy3 and Cy5 phosphoramidites rather
than the sulfonated versions of these dyes, which were used by Sanborn et al.[15] and are more commonly used for protein
labeling. The sulfonates increase the hydrophilicity of the dyes,
which could affect the strength of the stacking interactions with
the nucleobases. We have previously measured the intensity of sulfonated
Cy3 and Cy5 on DNA, and found a very strong pattern of sequence-specific
fluorescence distinct from that of the unsulfonated dyes.[19]
In order to visualize the relationship
between the nucleobase sequence
and the fluorescence intensity, the consensus sequences for each intensity bin
are plotted in Figure 2B and C for Cy3 and Cy5, respectively. These data are
quite similar to those obtained with the same dyes on ssDNA.[19] The most apparent differences in the dsDNA data
are that cytosine is less prominent in the weakly fluorescent sequences,
and that cytosine is more prominent in the distal positions of the
strongly fluorescent sequences, particularly for Cy5. If, as previous
studies have indicated, the fluorescence intensity of cyanine dyes
is greater on ssDNA, there might be bias in the consensus toward adenine-
and thymine-rich sequences, which will tend to destabilize the double
helix near the dyes, resulting in a higher locally single-stranded
(“frayed” ends) population of DNA. Relative to
our previous data for Cy3 and Cy5 on ssDNA, this trend is not apparent.
In the dsDNA data (Figure 2), the melting temperatures of the consensus sequences for
the most fluorescent intensity bins are higher than those of the
equivalent bins in the ssDNA data for both Cy3 and Cy5, owing to the
increased population of cytosines.
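As a rough illustration of how consensus data of this kind could be computed, the following Python sketch assigns intensities to eight equal-width bins and calculates the per-position information content in bits, as plotted in a sequence logo. The function names, the omission of a small-sample correction, and the toy inputs are assumptions made for illustration; this is not the pipeline used to produce the figures.

```python
import math
from collections import Counter

def equal_range_bins(intensities, n_bins=8):
    """Assign each intensity to one of n_bins equal-width bins spanning
    [min, max]; bin 0 holds the brightest values."""
    lo, hi = min(intensities), max(intensities)
    width = (hi - lo) / n_bins
    bins = []
    for value in intensities:
        if width == 0:
            bins.append(0)
            continue
        idx = int((hi - value) / width)
        bins.append(min(idx, n_bins - 1))  # the minimum goes in the last bin
    return bins

def information_content(seqs):
    """Per-position information in bits (2 minus the Shannon entropy) for
    equal-length DNA sequences, with no small-sample correction."""
    bits = []
    for column in zip(*seqs):
        total = len(column)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in Counter(column).values())
        bits.append(2.0 - entropy)
    return bits

# Hypothetical toy inputs, not measured data.
print(equal_range_bins([1.0, 0.9, 0.5, 0.3]))  # [0, 1, 5, 7]
print(information_content(["GAAAA", "GAAAC", "GAGAT", "GAAAG"]))
```

A position where a single base dominates approaches 2 bits, whereas a position where all four bases are equally frequent carries 0 bits.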
Cy3 and Cy5 Overhang Interactions
In the results described
above, the dyes must also be interacting with the immediately adjacent
ssDNA overhang segment as illustrated in Figure 1C and Scheme 1B. In order to estimate how this ssDNA modulates the
fluorescence, the random nucleobase linker was replaced with segments
representing all possible 5mers. To avoid having too many overall
permutations, only two dsDNA sequences were used, one associated with
strong fluorescence (GAAAA) and one with weak fluorescence (CGTGG).
About 10 replicates of each of the 2048 resulting sequences fit on
a single microarray, allowing accurate relative intensity comparisons
between sequences. In the dsDNA data shown in Figure 2, the sequence GAAAA resulted in the 33rd
and 100th brightest fluorescence for Cy3 and Cy5, respectively. The
sequence CGTGG resulted in the 1008th and 898th brightest fluorescence
for Cy3 and Cy5, respectively. The results from the overhang experiment,
using Cy3 as the dye, are shown in Figure 3. In Figure 3A, the intensity of each sequence has been normalized
to that of the most intense sequence, which, as expected, belongs
to the Cy3-dsGAAAA set. Most of the sequences with Cy3-dsCGTGG are
darker than any of those with Cy3-dsGAAAA. Figure 3A shows that the intensity of the
dye is determined to a similar extent by the dsDNA segment and the adjacent
ssDNA segment, since the intensity difference between the two curves
is similar to the range of intensities within each curve.
Figure 3
5′-Cy3-dsDNA with a permuted 3′ overhang (Figure 1C). The dsDNA strand
to which the Cy3 is attached has one of two sequences: GAAAA (bright)
or CGTGG (dark). (A) Relative fluorescence of Cy3-GAAAA and Cy3-CGTGG
ranked from most to least intense over the range of all ssDNA 3′
overhang 5mers. The intensity falls by ∼35% for both Cy3-GAAAA
and Cy3-CGTGG. Fluorescence intensity consensus sequences of all 1024
5mers on the 3′-overhang of (B) Cy3-dsGAAAA and (C) Cy3-dsCGTGG.
The fluorescence range was divided into eight bins of equal intensity
range. The consensus sequence is plotted for each bin.
The relationship between the nucleobase sequence
of the permuted
overhang and the fluorescence intensity is shown using consensus logos
in Figure 3B and C,
for Cy3-dsGAAAA and Cy3-dsCGTGG, respectively. The consensus sequences
show a similar pattern to those of the ssDNA data and the dsDNA data
with the random overhang; the most fluorescent signal results from
sequences with high purine content and the least fluorescent signal
results from sequences with high pyrimidine content, particularly
cytosine. Two additional trends are clearly visible in the consensus
sequence data. First, the information content (bits) for each position
is typically lower than that for the data with the random overhang.
This is because in the present case, there is no single dominant base
at any position, e.g., both purines are approximately equally probable
in the most fluorescent sequences. This trend can also be anticipated
from the shape of the intensity curves in Figure 3A, which span a smaller range of intensity
than those in Figure 2 for the same number of permuted sequences, indicating
a reduced sequence dependence of fluorescence. Second, the more distal
bases are more prominent in the consensus sequences, which suggests
that the dye is interacting more strongly with these more distal bases.
One possibility is that the presence of the dye on the terminus of
the double-stranded segment may tend to displace the more proximal
overhang bases to conformations where they cannot affect the cis–trans isomerization rate. This
is consistent with NMR data indicating that Cy3 occupies much of the
available stacking space at the end of dsDNA.[18]
Data for Cy5 on double-stranded DNA with a permuted overhang
are shown in Figure 4.
These data were collected using the same methods and the same microarray
design, only using Cy5 instead of Cy3. As with Cy3, the intensity
difference between the two curves in Figure 4A is similar to the range of intensities
within each curve, showing that the intensity of Cy5 is determined to a similar extent
by both the dsDNA segment and the adjacent ssDNA overhang
segment. Unlike in the case of Cy3, all of the Cy5-dsCGTGG sequences
are darker than the darkest of the Cy5-dsGAAAA sequences. The specific
sequence Cy5-dsGAAAA in the random linker data set resulted in an
intensity of 0.8 relative to that of Cy5-dsGAACC, the most intense sequence,
suggesting that the gap between the curves in Figure 4A could be significantly increased by using
GAACC as the fixed double-stranded sequence. Although the two curves
in Figure 4A appear
to have different shapes, this is due only to the large fluorescence
intensity difference between them. Independently normalizing the Cy5-dsCGTGG
data would cause it to overlap very closely with the Cy5-dsGAAAA data,
indicating that both double-stranded sequences modulate the interaction
of the dye with the overhang bases to a similar extent.
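The difference between normalizing both curves to a shared maximum and normalizing each curve independently can be sketched in a few lines of Python. The intensity values below are hypothetical and are chosen so that the dark set is an exactly scaled copy of the bright set; real data would overlap only approximately. The helper name `normalize` is likewise an assumption.

```python
def normalize(values, reference=None):
    """Divide values by reference (default: the maximum of values)."""
    ref = max(values) if reference is None else reference
    return [v / ref for v in values]

# Hypothetical raw intensities (arbitrary units), ranked bright to dark.
bright = [1000.0, 900.0, 750.0, 600.0]
dark = [500.0, 450.0, 375.0, 300.0]  # same shape at half the brightness

global_max = max(bright + dark)
dark_shared = normalize(dark, global_max)  # shared scale, as in a joint plot
dark_own = normalize(dark)                 # independent normalization
bright_own = normalize(bright)

print(dark_shared)             # [0.5, 0.45, 0.375, 0.3]
print(dark_own == bright_own)  # True
```

On the shared scale the dark curve looks flatter because its absolute range is smaller, even though its relative range matches that of the bright curve.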
Figure 4
5′-Cy5-dsDNA with a permuted 3′ overhang (Figure 1C). The dsDNA strand
to which the Cy5 is attached has one of two sequences: GAAAA (bright)
or CGTGG (dark). (A) Relative fluorescence of Cy5-GAAAA and Cy5-CGTGG,
ranked from most to least intense over the range of all ssDNA 3′
overhang 5mers. The intensity falls by ∼40% for both Cy5-GAAAA
and Cy5-CGTGG. Fluorescence intensity consensus sequences of all 1024
5mers on the 3′-overhang of (B) Cy5-dsGAAAA and (C) Cy5-dsCGTGG.
The fluorescence range was divided into eight bins of equal intensity
range. The consensus sequence is plotted for each bin.
The relationship between the nucleobase sequence
of the permuted
overhang and the fluorescence intensity is shown using consensus logos
in Figure 4B for Cy5-dsGAAAA
and in Figure 4C for
Cy5-dsCGTGG. As in the case of Cy3, the highest fluorescence is
strongly associated with purines while the lowest fluorescence is
strongly associated with pyrimidines. Between the purines, guanine
is clearly more relevant than adenine in promoting fluorescence. Cytosine
is also much more common than thymine in the sequences associated
with low fluorescence. As a result of the dominance of these two bases,
the information content of the consensus sequences is higher in the
case of Cy5. The trend observed for Cy3, that the dye interacts more
strongly with more distal bases, is also observed with Cy5.
For both Cy3 and Cy5, sequences resulting in the lowest intensity
among the dye-dsCGTGG subset have intensities similar to the darkest
from the data sets with the random overhang in Figure 2. Since the use of a random nucleobase linker
should be equivalent to averaging over all linker base permutations,
the expectation was that the minimum fluorescence measured in the
permuted overhang experiments would be significantly lower than that
measured using the random overhang. One possibility is that the range
over which the fluorescence intensity of Cy-dyes can be modulated
via interactions with DNA is restricted. This seems reasonable since
the total range over which the fluorescence quantum yield of Cy3 can
be modulated by restricting the rate of cis–trans isomerization is about a factor of 8 at room temperature,
and Cy3 on DNA appears to be limited to the lower half of this range.[15] Nevertheless, some additional range of fluorescence
intensity could likely be measured in permuted sequences longer than
5mers. In most of the consensus sequences in Figures 2, 3, and 4, there is information content in the fifth base,
the most distal, indicating that this base also participates in modulating
the intensity, so a sixth or seventh base is also likely to contribute
to the modulation of fluorescence. Another perspective in this regard
is that the shapes of the curves in Figures 2A, 3A, and 4A can be interpreted as cumulative distribution
functions where the variable is the normalized relative fluorescence.
To a good approximation, the fluorescence intensities of Cy3 and Cy5
on random DNA sequences have probability mass functions approximating
those of binomial distributions, where the two results are purine
or pyrimidine.[19] Most random 5mer sequences
will contain a mix of purines and pyrimidines, which will result in
intermediate fluorescence in the central region of the distribution.
A few sequences will contain mostly or exclusively purines or pyrimidines,
resulting in, respectively, fluorescence at the high and low tails
of the intensity distribution. Increasing the permuted sequence length
(Bernoulli trials) should result in a few sequences in the tails of
the distribution that extend the range of fluorescence.
These
results are consistent with previous experiments on the fluorescence
of Cy3 and Cy5, which have also shown similar patterns of nucleobase
dependency. Studies on the interactions of Cy3 with nucleoside monophosphate
solutions have found a pattern of nucleobase-specific enhancement
of fluorescence, dG > dA > dT > dC > no DNA.[36] Experiments on an intercalating cyanine dye
derived from thiazole orange demonstrated a strong association of fluorescence with purine
DNA homopolymers but not with pyrimidine homopolymers; the resulting
relative fluorescence intensities followed the pattern dG > dA ≫ dC > dT > no DNA
(100, 39, 2.3, 1.8, and 0.5, respectively).[37] Computer simulations in this study also indicated
that the dyes associate poorly with poly(dC) and poly(dT), while binding
strongly to poly(dG) and poly(dA). All these results fit well with
the model that π–π interactions between cyanine
dyes and nucleobases decrease the cis–trans isomerization rate. Purines, with a more extensive π system,
are more effective than pyrimidines. The extent of the π system
follows the order dG(14) > dA(12) > dT(10) = dC(10) in terms of number
of π electrons, and the order dG(153 Å²) > dA(142 Å²) = dT(142 Å²) > dC(127 Å²)
in terms of surface area.[38] These
results apply directly to the 5′ nucleobase in our terminal
labeling experiments since this is the base that is directly adjacent
to the dye. We consistently observe, for both single- and double-stranded
data, that cyanine dye fluorescence follows the same trend, dG > dA > dT > dC,
indicating that the terminal base directly affects rotational
isomerization. The data also consistently show that adjacent nonterminal
bases modulate dye fluorescence, with a distance-dependent influence,
indicating that sequence-dependent rigidity of the single- or double-stranded
DNA also contributes to the observed fluorescence of Cy3 and Cy5.
We hypothesize that the ability of the terminal base to hinder the
rotational isomerization of the dye increases when it is part of a
more rigid sequence of bases. The flexibility of DNA, particularly
dsDNA, is of ongoing interest due to its role in packing and in the
formation of protein–DNA complexes.[39] Many available degrees of freedom of the bases contribute to DNA
rigidity or flexibility, not all of which may be relevant to restricting
the isomerization of the terminal dye; nevertheless, multiple experimental
approaches indicate that purine stacks are more rigid than pyrimidine
stacks in ssDNA.[40,41] A similar pattern is observed
in dsDNA, also related to differences in base stacking area, dG (139 Å²) > dA (128 Å²) > dC (102 Å²) > dT (95 Å²),
and stacking free energy,
dA ≫ dG > dT ≈ dC (2.0, 1.3, 1.1, and 1.0 kcal·mol⁻¹), for B-form geometry, based on melting temperature
changes.[38] Other experiments based on 5′
dangling DNA hairpins and 3′ RNA unpaired nucleotides give
similar stability results: A ≈ G > T/U > C.[42,43] Sequence-specific flexibility data for di- and tetramers,[44,45] obtained from crystal structures and molecular dynamics simulations,
appear to be less relevant in this case because they treat paired
bases symmetrically and as a single rigid unit, such that, e.g., the
deformability of AA(TT) = TT(AA). While this treatment is relevant
to the ability of dsDNA to bend, the hydrogen bonding between Watson–Crick
pairs does not contribute to duplex stabilization; instead, duplex
stability is mainly determined by base-stacking interactions.[46] This suggests that, at short length scales,
the relevant modes of DNA dynamics are largely decoupled from the complementary
strand and interact with the cyanine dyes by restricting the available
torsional volume and by changing high-frequency coordinates of the
potential energy surface of the excited state.[47] Our experiments are based on the two cyanine dyes commonly
used for DNA labeling, but sulfonated variants of Cy3 and Cy5 appear
to interact differently with nucleobases.[15,19] The sulfonates increase water solubility, but may modify the stacking
interaction with DNA bases; stacking stability is dominated by hydrophobic
effects with contributions from dispersion and electrostatic forces,[38] all of which are likely to be affected by the
charges on the sulfonates.
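The binomial picture used earlier in this section for the intensity distribution can be made concrete with a short calculation. The sketch below is an illustration under the simplifying assumption that each position is independently a purine with probability 1/2. It lists, for 5mers through 7mers, the number of all-purine sequences and the probability of drawing one at random; the function name `purine_count_pmf` is hypothetical.

```python
from math import comb

def purine_count_pmf(n, p=0.5):
    """Binomial probability of k purines in a random n-mer, k = 0..n,
    assuming each position is a purine with probability p."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for n in (5, 6, 7):
    pmf = purine_count_pmf(n)
    all_purine = 2**n  # sequences built only from A and G
    total = 4**n       # all sequences of length n
    print(n, all_purine, total, pmf[-1])
# 5 32 1024 0.03125
# 6 64 4096 0.015625
# 7 128 16384 0.0078125
```

Longer permuted segments place a smaller fraction of sequences in the all-purine and all-pyrimidine tails, but those sequences are more extreme in composition, which is the sense in which the text expects them to extend the range of fluorescence.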
Conclusion
With the data presented here, we have sought to clarify and quantify
the impact of sequence-dependent fluorescence of Cy3 and Cy5 tethered
to double-stranded DNA. The results are consistent with previous results
for Cy3, Cy5, and similar cyanine dyes tethered to single-stranded
DNA.[19,26] The results are also consistent with measurements
of the fluorescence yield of Cy3 in solution with each of the DNA
nucleoside monophosphates, which also follows the pattern G > A > T > C.[36] The preponderance of evidence
supports the hypothesis that stronger cyanine dye–nucleobase
stacking interactions of the purines relative to the pyrimidines restrict
the cis–trans isomerization
rate of these dyes, enhancing fluorescence. The results can be used
in the planning and analysis of experiments based on the labeling
of DNA (and probably RNA) with cyanine dyes. For example, TaqMan or
molecular beacon PCR probes and FISH probes using cyanine dye reporters
can be designed with one or more guanines or adenines immediately
adjacent to the dye for increased signal. The sequence for the latter
two of these probes can also be adjusted so that the reporter dye
is adjacent to a purine-rich segment of the target upon hybridization.
In the case of next-generation sequencing-by-synthesis, where high
throughput relies on maintaining the low end of the dynamic range
near the noise threshold,[48,49] the data analysis pipeline
can take into account the effect on measured fluorescence of adjacent
nucleobases when determining the probability of a correct nucleobase
assignment.
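As one hedged illustration of how these trends might inform probe design, the Python sketch below ranks candidate 5′-labeled sequences by a simple heuristic: the number of purines in the five dye-proximal bases, with ties broken by the identity of the base adjacent to the dye in the order G > A > T > C reported above. The scoring scheme and the names `brightness_heuristic` and `rank_candidates` are assumptions made for illustration; they are not a fitted model and do not replace the measured intensities.

```python
# Rank of the base adjacent to the dye, following the dG > dA > dT > dC
# trend reported in the text (higher means brighter).
ADJACENT_RANK = {"G": 3, "A": 2, "T": 1, "C": 0}

def brightness_heuristic(seq, window=5):
    """Hypothetical score: (purines in the dye-proximal window,
    rank of the base adjacent to the dye). Not a fitted model."""
    proximal = seq[:window].upper()  # seq is written 5' to 3', dye at 5' end
    purines = sum(base in "AG" for base in proximal)
    return (purines, ADJACENT_RANK[proximal[0]])

def rank_candidates(candidates, window=5):
    """Sort candidate sequences from most to least favorable."""
    return sorted(candidates,
                  key=lambda s: brightness_heuristic(s, window),
                  reverse=True)

print(rank_candidates(["CGTGG", "GAAAA", "AAGGA", "TTCCT"]))
# ['GAAAA', 'AAGGA', 'CGTGG', 'TTCCT']
```

Because the measured data show position-dependent effects that a purine count cannot capture, a lookup against the full 5mer intensity tables would be preferable wherever those data are available.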
Experimental Procedures
Microarray Synthesis
Glass slides (Schott Nexterion D, cleanroom-cleaned) were functionalized with N-(3-triethoxysilylpropyl)-4-hydroxybutyramide
(Gelest SIT8189.5). The slides were loaded in a stainless steel rack,
placed in a plastic container, and covered with 0.5 L of a solution
consisting of 10 g of the silane in a 95:5 (v/v) ethanol:water plus
1 mL acetic acid. The slides were gently agitated for 4 h at room
temperature and then washed twice for 20 min each with the above solution
without the silane. The slides were drained, blown dry with argon,
and cured in a preheated vacuum oven (120 °C) overnight and stored
in a desiccator cabinet.

For the synthesis of terminally labeled
oligonucleotides on microarrays we used the technique of maskless
array synthesis (MAS).[50,51] MAS was developed for in situ
synthesis of high-density DNA microarrays and consists of an optical
system and a chemical delivery system. The optical system consists
of a digital micromirror device (DMD), an array of individually tiltable
mirrors, which direct ultraviolet light from a mercury lamp to the
corresponding feature on the microarray via 1:1 imaging optics. Microarray
layout and oligonucleotide sequences are determined by selective removal
of the photocleavable protecting groups on the phosphoramidites at
the 5′ termini of the oligonucleotides. A computer synchronizes
the light exposure pattern with solvent and
reagent delivery to the synthesis surface. The chemical system consists
of a slightly modified PerSeptive Biosystems Expedite 8909 synthesizer.
Oligonucleotide synthesis chemistry is similar to that used in conventional
solid-phase synthesis. The standard acid-labile 5′-OH protecting
group of the phosphoramidites is replaced with the photocleavable
nitrophenylpropyloxycarbonyl (NPPOC) group.[52] Upon absorption of light near 365 nm, the NPPOC group comes off,
leaving a free hydroxyl group that is able to react with an activated
phosphoramidite in the next coupling cycle. An exposure solvent consisting
of 1% (m/v) imidazole in DMSO is needed during ultraviolet exposure
to promote the cleavage of the NPPOC group.[51] The coupling reactions were performed with 30 mM NPPOC phosphoramidite
monomers and 0.25 M dicyanoimidazole (both from SAFC) for 60 s. In
the case of the Cy3 and Cy5 phosphoramidites (GE Healthcare 28–9172–98
and Glen Research 10–5915–95; Figure 5), the coupling reaction time was extended
to 10 min at a monomer concentration of 15 mM. Acetylation with a
1:1 mix of tert-butylphenoxyacetyl acetic anhydride
in tetrahydrofuran (Cap A) and 10% N-methylimidazole
in tetrahydrofuran/pyridine (8:1) (Cap B) after each coupling reaction
was used to ensure that only correctly synthesized sequences receive
the fluorescent label.
Figure 5
Molecular structures of the Cy3 and Cy5 cyanine
dye phosphoramidites
used in this study. After the end of the synthesis and the chemical
deprotection step, the dyes are linked to the 5′ DNA nucleoside
via a phosphodiester bond.
After microarray synthesis the substrate
was vigorously washed
for 2 h with acetonitrile in a 50 mL Falcon tube to remove uncoupled
Cy3 or Cy5 phosphoramidites, which tend to adhere nonspecifically
to the glass surface. The base and phosphate protecting groups were
removed by immersing the glass slide into 1:1 (v/v) ethylenediamine
in ethanol for 2 h at room temperature. Following deprotection, the
microarrays were washed twice with distilled water and dried with
argon.
Microarray Design
In principle, the resolution of the
digital micromirror device, 768 × 1024, allows for simultaneous
measurement of all possible n-mers up to n = 9 (262 144), but in these experiments, only permutations
of 5mers were included in order to include multiple replicates and
to dedicate more microarray surface area to each sequence and therefore
to achieve a good signal-to-noise ratio. The 1024 sequences were laid
out in a 25-in-36 pattern; that is, each “feature” (contiguous
area where a single sequence is synthesized) on the microarray corresponded
to a 5 by 5 block of mirrors surrounded by a one-mirror-sized margin
where no DNA was synthesized. Each of the 1024 single-sequence features
was replicated 20 times on each microarray in the case of the double-stranded
experiments (Figure B), and 10 times in the case of the double-stranded DNA with single-stranded
overhang experiments (Figure C).
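The layout arithmetic above can be checked with a short calculation. The sketch below assumes that the one-mirror margin is shared between neighboring features, so that each feature occupies a 6 by 6 tile of mirrors (one reading of the 25-in-36 pattern). The function and variable names are illustrative and not part of the original work.

```python
# Layout arithmetic for a 768 x 1024 digital micromirror device (DMD).
# Assumption: each feature is a 5 x 5 mirror block inside a 6 x 6 tile,
# i.e. the one-mirror margin is shared between neighboring features.

DMD_ROWS, DMD_COLS = 768, 1024
TILE = 6  # 5 synthesis mirrors + 1 margin mirror per axis (assumed)


def max_nmer_length(n_mirrors: int) -> int:
    """Largest n such that all 4**n n-mers fit with one mirror per sequence."""
    n = 0
    while 4 ** (n + 1) <= n_mirrors:
        n += 1
    return n


def feature_capacity(rows: int, cols: int, tile: int) -> int:
    """Number of whole tile x tile features that fit on the mirror array."""
    return (rows // tile) * (cols // tile)


total_mirrors = DMD_ROWS * DMD_COLS
n_max = max_nmer_length(total_mirrors)
capacity = feature_capacity(DMD_ROWS, DMD_COLS, TILE)

n_5mers = 4 ** 5
ds_features = n_5mers * 20            # double-stranded experiment, 20 replicates
overhang_features = n_5mers * 10 * 2  # two fixed duplexes, 10 replicates each

print(f"mirrors: {total_mirrors}, largest complete n-mer set: n = {n_max}")
print(f"feature capacity with 6 x 6 tiles: {capacity}")
print(f"dsDNA features: {ds_features}, overhang features: {overhang_features}")
```

Under this tiling assumption, either experiment needs 20 480 features, which fits within the available tiles and leaves a small remainder for the single-stranded and control sequences.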
Double-Stranded DNA Annealing
To
promote hairpin-loop
formation and self-hybridization, after deprotection the array was
incubated in 40 mL PBS buffer (0.65 M Na+, pH 7.4) starting
at 50 °C and cooled to room temperature over 30 min. Then it
was washed with final wash buffer for a few seconds and dried with
a microarray centrifuge. Successful hairpin loop formation was then
verified by hybridization of a Cy3-labeled oligonucleotide (5′-Cy3-GGC
GGC GGG TTC A-3′) to two unlabeled complementary sequences
on the array: (1) a sequence (TGA ACC CGC CGC CGT CCA TCCT TGG ACG
GCG GCG GGT TCA) that self-hybridized via hairpin-loop formation in
the previous step and is therefore blocked from hybridization with
the added oligonucleotide, and (2) a sequence (TGA ACC CGC CGC C)
that cannot self-hybridize but is fully complementary to the added
labeled sequence.
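The logic of this control can be checked directly from the three sequences given above. The following sketch assumes that all sequences are written 5′ to 3′ and that the hairpin control consists of two 18-nucleotide arms around the four-nucleotide TCCT loop; the helper and variable names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Sequences as printed in the text (spaces removed), assumed 5'->3'.
probe = "GGC GGC GGG TTC A".replace(" ", "")
hairpin_control = (
    "TGA ACC CGC CGC CGT CCA TCCT TGG ACG GCG GCG GGT TCA".replace(" ", "")
)
linear_control = "TGA ACC CGC CGC C".replace(" ", "")

LOOP_LEN = 4  # central TCCT loop
arm_len = (len(hairpin_control) - LOOP_LEN) // 2
five_prime_arm = hairpin_control[:arm_len]
loop = hairpin_control[arm_len:arm_len + LOOP_LEN]
three_prime_arm = hairpin_control[arm_len + LOOP_LEN:]

# The two arms of control (1) are reverse complements: it can fold on itself.
hairpin_folds = reverse_complement(five_prime_arm) == three_prime_arm

# The probe's binding site is exactly control (2), and the same site is
# present at the 5' end of control (1), where a folded hairpin occupies it.
probe_site = reverse_complement(probe)
site_in_linear = probe_site == linear_control
site_in_hairpin = hairpin_control.startswith(probe_site)

print(hairpin_folds, site_in_linear, site_in_hairpin)
```

Both controls therefore carry the same probe-binding site, so a difference in Cy3 signal between them reflects whether that site is occupied by the hairpin stem.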
Sequence Design
Three principal
considerations were
applied to the sequence design: (1) The double-stranded sequences
should all have equal melting temperatures since they must all form
duplexes equally under the single hybridization condition of the microarray,
(2) the melting temperature should be relatively high in order to
ensure stable duplex formation, and (3) the surface density of labeled
oligonucleotides should be constant for all experimental oligonucleotides
on the microarray so that fluorescence intensity differences between
them can be attributed to sequence-dependent effects. To meet these
design principles the double-stranded oligonucleotides have the design
illustrated in Scheme . The sequences contain self-complementary segments to allow
for duplex formation. The central TCCT sequence is known to bend easily
to promote hairpin loop formation.[53] The N1N2N3N4N5 segment represents the 5mer experimental
nucleobases that base pair with the complementary N5cN4cN3cN2cN1c segment. On the 3′ side of N1N2N3N4N5 is the fixed sequence CCGCCGCC
which hybridizes with the GGCGGCGG sequence on the opposite side of
the hairpin. This GC-rich stretch is used to increase the melting
temperature. The P1P2P3P4P5 sequence is derived from the experimental
5mer sequence N1N2N3N4N5 using nonidentity, noncomplementarity
logic: for all i, if Ni = dA then Pi = dC; or if Ni = dC then Pi = dT;
or if Ni = dG then Pi = dA; or if Ni = dT then Pi = dG. These strands hybridize with their complementary
sequences P5cP4cP3cP2cP1c. The P1P2P3P4P5 and P5cP4cP3cP2cP1c sequences
have a double function: (1) they equilibrate the base composition
in order to ensure equal number density of all experimental sequences
on the array, and (2) they increase and homogenize the melting temperatures
(to Tm = 63 °C, salt adjusted, 50
mM Na+) by giving all the complementary DNA sequences on
the array exactly five of each nucleobase (plus the fixed GC sequences)
while retaining self-complementarity. The sequences are separated
from the glass substrate with a random linker 10mer sequence synthesized
from an equimolar mix of the four DNA phosphoramidites. The random
linker replaces the traditional poly(dT) linker to avoid the
potential bias of any particular interaction of the dye and a dT homopolymer.
An alternative perspective is that the dye will interact with both
the double-stranded and single-stranded segments, but the interaction
with the single-stranded segment will be the average of all possible
sequences. In the second set of experiments, the single-stranded sequence
is permuted. The results of both data sets can be used to estimate
the relative contributions, to dye intensity variation, of the single-
vs double-stranded segments.

With these rules,
all of the sequences (excluding the linker) have
exactly 5 adenosines, 15 cytidines, 13 guanosines, and 7 thymidines.
Since the coupling efficiency of each of the four DNA phosphoramidites
can be different and can vary with time and by batch, equal numbers
of each base in each of the sequences assures equal representation
of the experimental oligonucleotides. This sequence design, in conjunction
with acetic anhydride capping after the coupling reactions, ensures
equal density and melting temperature and that only accurately synthesized
sequences receive the final coupling with the Cy3 or Cy5 phosphoramidite.
An alternative approach, to use simpler sequences and then adjust
the data for the measured coupling efficiencies, is less reliable
since the coupling efficiencies of the phosphoramidites used in maskless
array synthesis are measured with fluorescent dye terminal labeling
experiments,[54−57] which limits their accuracy due to the sequence-dependent fluorescence
intensity of single-stranded DNA.[19]

The second set of experiments, with the dyes attached to fixed-sequence
double-stranded DNA and a variable single-stranded overhang, has a
similar design (Scheme B). Here, the permuted overhang sequence N1N2N3N4N5 is added at
the 3′ end to put it adjacent to the 5′ fluorescent
label. The F and F are complementary but are
no longer permuted; N1N2N3N4N5 is either GAAAA or CGTGG. GAAAA
and CGTGG were chosen from the initial double-stranded experiments
as sequences resulting in high and low fluorescence intensity, respectively,
for both Cy3 and Cy5.

In order to allow direct comparisons between
the relative fluorescence
intensities of the dyes on single- vs double-stranded DNA, each dsDNA
microarray design included sequences that cannot self-hybridize to
form dsDNA, but have a very similar overall sequence design and base
composition. Since most of the microarray features were needed for
the dsDNA permutations, only a sampling of 57 labeled ssDNA permutations
was included. These sequences were chosen to be representative of
the range of expected fluorescence intensities for ssDNA found in
previous experiments.[19] To prevent the
self-hybridization of these sequences, the N5cN4cN3cN2cN1c segment
was inverted to N1cN2cN3cN4cN5c, the P5cP4cP3cP2cP1c segment was
inverted to P1cP2cP3cP4cP5c, palindromic N 5mers were avoided, and the segment GGCGGCGG
was reordered to GCGGCGGG.
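The base-composition argument can be verified by enumerating all 1024 designs. The sketch below is a minimal illustration that assumes the 5′ to 3′ segment order N1N2N3N4N5, CCGCCGCC, P1P2P3P4P5, TCCT, P5cP4cP3cP2cP1c, GGCGGCGG, N5cN4cN3cN2cN1c, with the random linker omitted. This order is inferred from the description and is an assumption, as are the function names; the composition count itself does not depend on the segment order.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Nonidentity, noncomplementarity mapping N_i -> P_i given in the text.
N_TO_P = {"A": "C", "C": "T", "G": "A", "T": "G"}

STEM = "CCGCCGCC"  # fixed GC-rich segment on the 3' side of the 5mer
LOOP = "TCCT"      # central loop sequence


def reverse_complement(seq: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def hairpin(n5: str) -> str:
    """Hairpin design for one experimental 5mer (segment order assumed)."""
    p5 = "".join(N_TO_P[b] for b in n5)
    arm = n5 + STEM + p5
    # reverse_complement(arm) = P5c..P1c + GGCGGCGG + N5c..N1c
    return arm + LOOP + reverse_complement(arm)


def ssdna_control(n5: str) -> str:
    """Non-hybridizing control: complementary segments inverted, stem reordered."""
    p5 = "".join(N_TO_P[b] for b in n5)
    n_c = "".join(COMPLEMENT[b] for b in n5)  # N1c..N5c (not reversed)
    p_c = "".join(COMPLEMENT[b] for b in p5)  # P1c..P5c (not reversed)
    return n5 + STEM + p5 + LOOP + p_c + "GCGGCGGG" + n_c


def composition(seq: str) -> tuple:
    """Sorted (base, count) pairs, hashable so designs can be compared in a set."""
    return tuple(sorted(Counter(seq).items()))


all_5mers = ["".join(t) for t in product("ACGT", repeat=5)]
hairpin_compositions = {composition(hairpin(s)) for s in all_5mers}
control_compositions = {composition(ssdna_control(s)) for s in all_5mers}

print(len(all_5mers), hairpin_compositions, control_compositions)
```

Every design, hairpin or single-stranded control, has the same composition of 5 A, 15 C, 13 G, and 7 T, because N, its complement, P, and the complement of P together always contribute exactly five of each base.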
Data Extraction and Analysis
Fluorescent
images of
the microarrays were obtained using a GenePix 4100A scanner with a resolution
of 5 μm and with PMT voltages of 350 and 450 V for Cy3 and Cy5, respectively,
set to give similar intensity ranges for both dyes and no saturated
pixels. Dye fluorescence was excited using 532 and 635 nm
solid-state lasers for Cy3 and Cy5, respectively. Fluorescence was
collected through 550–600 nm and 655–695 nm bandpass
filters for Cy3 and Cy5, respectively. Fluorescence was collected
using a 0.68 NA objective lens with a focal length of 3.1 mm. Microarray
scanners are designed to provide intensity values that are highly
consistent across the scanned surface. This allows highly reliable
relative fluorescence comparisons between microarray features. The
presence of the microarray surface, a lossless glass–air dielectric
interface, close to the fluorophores does not influence the relative
emission intensity or wavelength.[58] In
addition to the high throughput available with microarray experiments,
a significant advantage is that the density of fluorescent groups
can be closely controlled to avoid the aggregation-induced quenching
artifacts that can occur in solution experiments with hydrophobic
dyes such as Cy3 and Cy5.

The fluorescence intensity data was
extracted from the scan image with NimbleScan v 2.1 software from
NimbleGen and further processed in Excel. For each microarray, fluorescence
intensity values were calculated as the average of the replicates
of each sequence, which were randomly located on each microarray.
For the double-stranded experiment, there were 20 sequence replicates
per array. For the overhang experiment there were 10 replicates per
array because of the inclusion of 2 experimental sets, one with a double-stranded
sequence that strongly promotes fluorescence (dye-GAAAA) and one
with a double-stranded sequence resulting in weak fluorescence (dye-CGTGG).
Error was calculated as the standard error of the mean. The consensus
sequence figures were generated by ranking the 1024 sequences by fluorescence
intensity and then dividing the sequences into 8 bins spanning equal
ranges of intensity. Consensus logos for the sequences in each of
these octiles of fluorescence intensity were generated using Weblogo
(http://weblogo.berkeley.edu/).[59] Each of the 8 consensus sequence logos per fluorescent label represents
1/8 of the intensity range; the logos are arranged together left to right
in order of decreasing intensity to compactly depict the relationship
between sequence and fluorescence for the entire data set. The relative
fluorescence intensity data for all the experimental sequences are
available as Supporting Information.
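The averaging, error estimate, and binning described above can be sketched in a few lines. The original processing was done in Excel; the Python version below is only an illustration, and the function names and the toy input values are hypothetical, not data from this study.

```python
import math
import statistics


def mean_and_sem(replicates):
    """Mean and standard error of the mean of replicate intensities."""
    mean = statistics.fmean(replicates)
    sem = statistics.stdev(replicates) / math.sqrt(len(replicates))
    return mean, sem


def equal_range_bins(mean_by_seq, n_bins=8):
    """Split sequences into n_bins bins spanning equal intensity ranges.

    Bin 0 holds the brightest sequences, matching the left-to-right
    ordering of decreasing intensity used for the logos.
    """
    lo = min(mean_by_seq.values())
    hi = max(mean_by_seq.values())
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for seq, value in sorted(mean_by_seq.items(), key=lambda kv: -kv[1]):
        if width > 0:
            # Index counted down from the top of the range; the minimum
            # value is clamped into the last bin.
            idx = min(int((hi - value) / width), n_bins - 1)
        else:
            idx = 0
        bins[idx].append(seq)
    return bins


# Toy replicate data (made-up numbers, for illustration only).
toy = {
    "GAAAA": [980.0, 1010.0, 1000.0, 1010.0],
    "ATGCA": [640.0, 600.0, 620.0, 620.0],
    "CGTGG": [410.0, 390.0, 400.0, 400.0],
}
means = {seq: mean_and_sem(vals)[0] for seq, vals in toy.items()}
bins = equal_range_bins(means)
print(bins)
```

Because the bins span equal intensity ranges rather than equal numbers of sequences, the number of sequences contributing to each logo can differ from bin to bin.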