Enrique Vázquez¹, Francisco Antequera¹
¹ Instituto de Biología Funcional y Genómica (IBFG), Consejo Superior de Investigaciones Científicas (CSIC), Universidad de Salamanca, Campus Miguel de Unamuno, Salamanca, Spain.
Abstract
The dynamics of eukaryotic DNA polymerases have been hard to establish because of the difficulty of tracking them along the chromosomes during DNA replication. Recent work has addressed this problem in the yeasts Schizosaccharomyces pombe and Saccharomyces cerevisiae by engineering replicative polymerases to render them prone to incorporating ribonucleotides at high rates. The use of these ribonucleotides as tracers of the passage of each polymerase has provided a picture of unprecedented resolution of the organization of replicons and replication origins in the two yeasts and has uncovered important differences between them. Additional studies have found an overlapping distribution of DNA polymorphisms and the junctions of Okazaki fragments along mononucleosomal DNA. This sequence instability is caused by the premature release of polymerase δ and the retention of non‐proofread DNA tracts replicated by polymerase α. The possible implementation of these new experimental approaches in multicellular organisms opens the door to the analysis of replication dynamics under a broad range of genetic backgrounds and physiological or pathological conditions.
Abbreviations: dNTPs, deoxyribonucleotides; ORC, origin recognition complex; ORI, origin of replication; rNTPs, ribonucleotides; SNP, single nucleotide polymorphism.
Introduction
In eukaryotes, the initiation of DNA replication depends on the progressive assembly of specialized protein complexes on multiple replication origins (ORIs) along chromosomes. The activation of these complexes triggers the S phase of the cell cycle by initiating the divergent and bidirectional synthesis of new DNA daughter strands by DNA polymerases, which proceed until replicons moving in opposite directions converge and fuse to generate a fully replicated genome.
Many of the essential molecules involved in this process have been conserved throughout evolution and the basic biochemical steps of DNA replication are well established 1, 2. The recent development of in vitro systems capable of sustaining plasmid replication 3, 4 and the reconstitution of replication forks using purified proteins 5, 6 will facilitate further dissection of the molecular mechanisms driving replication initiation.
In contrast to the conservation of replication proteins, ORI sequences have diversified considerably during evolution. Even within individual genomes, ORIs show a very high level of sequence degeneracy, making it impossible to identify them on the basis of sequence alone. The absence of conserved sequence elements has led to the exploitation of different biochemical features of replication with a view to mapping the distribution of ORIs in the genome. As in many other aspects of biology, local approaches to identify ORIs have been replaced by genome‐wide methods, for which genome replication is particularly well suited. These strategies have included the localization of binding sites of initiator proteins, the identification of short nascent DNA strands and the incorporation of bromodeoxyuridine to label newly synthesized DNA. The advantages and limitations of such strategies have been discussed in detail 7, 8, 9.
Defining the dynamics of eukaryotic DNA polymerases has been hampered by the lack of an efficient system to track their passage along the chromosomes during DNA replication. In this review, we focus on very recent analyses of replication dynamics in the yeasts Schizosaccharomyces pombe and Saccharomyces cerevisiae, based on the incorporation of ribonucleotides (rNTPs) into DNA followed by high‐throughput sequencing of the trail of rNMPs left behind by the passage of the DNA polymerases 10, 11, 12, 13. The combination of these analyses with the interference of nucleosomes in the maturation of Okazaki fragments 14, 15 has uncovered a functional link between DNA replication, the organization of chromatin, and the presence of regions of instability across the genome.
Engineering the incorporation of ribonucleotides in DNA as tracers of the passage of replicative polymerases
The incorporation of nucleotides in the 5′–3′ direction by DNA polymerases implies that the two antiparallel DNA strands must be replicated through different mechanisms. DNA synthesis is initiated by DNA polymerase α (Pol α) 16, whose primase activity synthesizes de novo a short RNA primer approximately 8–10 ribonucleotides long, which is then elongated through the addition of 10–30 deoxyribonucleotides. At this stage, Pol α is replaced by DNA polymerase ϵ (Pol ϵ) 17, which catalyzes the synthesis of the continuous daughter strand. On the complementary strand, Pol α is replaced by DNA polymerase δ (Pol δ) 18, which adds another 180–200 deoxyribonucleotides. These Okazaki fragments are synthesized in the direction opposite to the movement of the replication fork, and they are subsequently processed and ligated to generate the lagging DNA daughter strand 5, 6, 19.
Not surprisingly, DNA polymerases have been selected to favor the incorporation of deoxyribonucleotides (dNTPs) over ribonucleotides (rNTPs) in the face of the 36‐ to 190‐fold molar excess of rNTPs relative to dNTPs in the nucleus 20. Even so, misincorporation of rNTPs is the most common error that occurs during DNA replication and amounts to an estimated 10,000 rNMPs per genome during each round of in vivo replication in S. cerevisiae 20 and to over a million in mouse cells 21. Ribonucleotides are removed from DNA by several mechanisms, of which the ribonucleotide excision repair (RER) pathway is the most efficient 22. The RNase H2 ribonuclease generates a nick 5′ to the rNMP, which is followed by extension from the free 3′‐OH terminus by Pol δ or Pol ϵ. The displaced strand harboring the rNMP at its 5′ end is removed by the FEN1 or EXO1 nucleases and the remaining nick is sealed by DNA ligase 1 23. The decreased removal of rNMPs caused by the inactivation of RNase H2 leads to replication stress and genome instability in a wide variety of organisms because of the higher reactivity of the 2′‐hydroxyl group of the ribose 20, 21, 22, 24.
Four recent studies have used S. cerevisiae and S. pombe mutants whose replicative polymerases were engineered by introducing point mutations in their catalytic domains to enhance the misincorporation of rNTPs 10, 11, 12, 13. Additional deletion of the rnh201 gene encoding RNase H2 ensured that the incorporated rNMPs were not removed from the DNA, such that the passage of each polymerase along the DNA could be followed by the trail of rNMPs left behind. To identify the sites of rNMP incorporation, purified genomic DNA from exponential cultures was either treated with RNase H2 and denatured, or treated with alkali, to cleave the DNA molecule on the 5′ or 3′ side of the rNMPs, respectively. The resulting single‐stranded DNA fragments were size‐selected and tagged with short bar‐coded adaptors at their rNMP end prior to cloning and high‐throughput sequencing. Alignment of the sequence reads to the reference genome identified the position of the rNMPs along the two DNA strands in the individual Pol α, δ, and ϵ mutants and defined which genomic regions were preferentially replicated by each polymerase.
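The mapping step described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each sequenced fragment is available as an aligned interval (0‐based, half‐open) on its own strand, and it takes the cleavage geometry literally from the text: RNase H2 cuts 5′ of the rNMP, so the rNMP is the first nucleotide of the fragment, whereas alkali cuts 3′ of the rNMP, so the rNMP is the last nucleotide of the fragment. The function names and the tuple layout are hypothetical, and the library designs actually used in the cited studies may place the read start differently relative to the rNMP.

```python
from collections import Counter


def rnmp_position(start, end, strand, treatment):
    """Return the 0-based genomic coordinate of the rNMP for one fragment.

    start, end : 0-based, half-open alignment interval on the reference
    strand     : '+' or '-' (strand of the fragment itself)
    treatment  : 'rnaseh2' (cut 5' of the rNMP) or 'alkali' (cut 3' of it)
    """
    if treatment == "rnaseh2":
        # rNMP is the 5'-most nucleotide of the fragment
        return start if strand == "+" else end - 1
    if treatment == "alkali":
        # rNMP is the 3'-most nucleotide of the fragment
        return end - 1 if strand == "+" else start
    raise ValueError(f"unknown treatment: {treatment!r}")


def count_rnmps(fragments, treatment):
    """Tally rNMP sites as (chromosome, strand, position) -> count.

    fragments : iterable of (chrom, start, end, strand) tuples
    """
    counts = Counter()
    for chrom, start, end, strand in fragments:
        pos = rnmp_position(start, end, strand, treatment)
        counts[(chrom, strand, pos)] += 1
    return counts
```

Running `count_rnmps` separately on the libraries from each polymerase mutant would give strand‐specific rNMP counts that can then be compared between mutants.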
Replicons in S. pombe and S. cerevisiae have a different structure
The antiparallel organization of the DNA strands predicts that approximately half of the genome should be replicated by Pol δ and half by Pol ϵ and this is clearly confirmed by the division of labor between the two polymerases shown in Fig. 1. An immediately observable difference between S. pombe and S. cerevisiae replicons, however, is that the average length of the regions replicated by Pol δ and by Pol ϵ is smaller in S. pombe, implying that the ORI density along the chromosomes is higher than in S. cerevisiae.
Figure 1
Division of labor between polymerases and replicon sizes in S. cerevisiae and S. pombe. Heatmaps show the ratio between Pol δ‐ and Pol ϵ‐synthesized DNA, using the bottom (Crick) strand as a template, for the indicated genomic regions of chromosome XV of S. cerevisiae and chromosome I of S. pombe. The distribution of Pol α and Pol δ relative to Pol ϵ coincides along the lagging strand in S. cerevisiae (Pol α was not mapped in S. pombe). Red arrowheads point to sites of sharp switching between polymerases that coincide with strong replication origins in the two yeasts. Weak ORIs in S. pombe are indicated by green arrowheads. Intermediate blue/yellow color (brackets) corresponds to termination zones. Heatmaps were generated from the original sequencing data of Clausen et al. 10 (S. cerevisiae) and Daigaku et al. 13 (S. pombe).
Because not all replication origins are active in each S phase, their probability of firing is referred to as their “efficiency.” This means that the genome of individual cells in the population will be replicated from a different subset of ORIs in any given S phase, and will follow a different replication pattern, as has been shown in DNA‐combing analyses 25, 26. As a consequence, many genomic regions in the population will not appear to be replicated exclusively by Pol ϵ or by Pol δ; instead, the relative use of each polymerase will range between 0 and 1. The relative use of each polymerase switches sharply at ORIs and gradually decays away from them, converging at a point where the relative use of Pol δ and Pol ϵ approaches 0.5, in regions where the average number of forks moving in opposite directions is the same. This implies that although ORIs map to discrete genomic regions, there are no preferred sites of termination, and that the fusion of converging replicons takes place at different inter‐origin positions in different cells (Fig. 1). Inflections in the replicon profile (Fig. 2) reflect small‐scale switches between Pol δ and Pol ϵ associated with low‐efficiency ORIs firing only in a small fraction of the population. These ORIs are passively replicated in the remaining cells by replicons originating from more efficient ORIs. This phenomenon is also present in S. cerevisiae but on a much smaller scale (Fig. 2). As discussed in the next section, this is probably due to the different strategy of ORI specification in the two yeasts.
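The relative use of the two polymerases described above can be expressed as a per‐bin fraction. The sketch below is only an illustration under stated assumptions: rNMP counts from the Pol δ and Pol ϵ mutants have already been binned along one strand, and each library is scaled to sum to 1 as a crude correction for sequencing depth. The function names and the normalization are hypothetical and are not taken from the cited studies.

```python
def normalize(counts):
    """Scale per-bin counts so they sum to 1 (crude depth correction)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("library has no counts")
    return [c / total for c in counts]


def pol_delta_fraction(delta_counts, epsilon_counts):
    """Per-bin fraction of Pol delta signal on one strand.

    Returns a value between 0 and 1 for each bin, or None for bins
    without any signal in either library.
    """
    d_norm = normalize(delta_counts)
    e_norm = normalize(epsilon_counts)
    fractions = []
    for d, e in zip(d_norm, e_norm):
        fractions.append(d / (d + e) if (d + e) > 0 else None)
    return fractions


# Toy example: Pol delta use declines from left to right, and the
# middle bin behaves like a termination zone (fraction close to 0.5).
delta = [90, 80, 50, 20, 10]
epsilon = [10, 20, 50, 80, 90]
profile = pol_delta_fraction(delta, epsilon)
```

In this toy profile the values run from 0.9 to 0.1, passing through 0.5 in the central bin, which is the signature expected where forks moving in opposite directions are equally frequent.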
Figure 2
Different replicon structure in S. cerevisiae and S. pombe. Relative use of Pol δ (red) and Pol ϵ (blue) along the indicated genomic regions of the bottom (Crick) strand in chromosomes V and III of S. cerevisiae and S. pombe, respectively. As shown in Fig. 1, termination zones are narrower in S. cerevisiae than in S. pombe. For example, the 14 and 22 kb of the bottom DNA strand flanking ARS508 in S. cerevisiae are replicated almost exclusively by Pol δ and Pol ϵ, respectively, in most cells in the population. Previously identified early origins ARS508, ARS510, and ARS511 in S. cerevisiae and strong origins AT3013, AT3014, and AT3015 in S. pombe are indicated by red arrowheads. Inflections in the shape of replicons (white arrowheads) occur more often in S. pombe and are probably associated with very weak ORIs, many of which escaped previous detection by other methods.
Origin specification in S. pombe and S. cerevisiae determines different replication dynamics
An important feature of the rNTP mapping methodology is that, in addition to localizing ORIs very precisely, it gives a measure of their efficiency through the amplitude of the switch between polymerases (Fig. 3). ORIs in S. cerevisiae have a modular structure made up of an 11 bp long ARS consensus sequence (ACS) that is essential for their activity, and several short and variable auxiliary elements. They have been mapped genome wide by several methods and the efficiency and timing of firing during the S phase have been estimated for many of them 27, 28, 29, 30, 31. Ninety‐five percent of those ORIs colocalize with sites of polymerase switching and their efficiency is consistent with that previously estimated for some of them by two‐dimensional gel electrophoresis 29, 32, 33, 34, 35 (Figs. 2 and 3). In addition, the new study by Clausen et al. 10 has identified 72 sites of Pol δ/Pol ϵ switching, corresponding to previously undetected ORIs in S. cerevisiae.
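The idea of reading ORI efficiency from the amplitude of the polymerase switch can be sketched as follows. This is a hedged illustration, not the procedure of the cited studies. It assumes a list of per‐bin Pol δ fractions (values between 0 and 1, no missing bins) for nascent DNA copied from the bottom strand, so that Pol δ use is high to the left of an ORI and low to its right. The flank size, the threshold and the function names are hypothetical.

```python
def switch_amplitude(fractions, i, flank=3):
    """Mean Pol delta fraction left of boundary i minus the mean right of it.

    Boundary i lies between bins i-1 and i. Under the strand convention
    assumed here, ORIs give positive values.
    """
    left = fractions[max(0, i - flank):i]
    right = fractions[i:i + flank]
    if not left or not right:
        return 0.0
    return sum(left) / len(left) - sum(right) / len(right)


def find_switches(fractions, flank=3, min_amplitude=0.2):
    """Return (boundary, amplitude) for local maxima above the threshold."""
    n = len(fractions)
    amps = [switch_amplitude(fractions, i, flank) for i in range(n + 1)]
    hits = []
    for i in range(1, n):
        a = amps[i]
        if a >= min_amplitude and a >= amps[i - 1] and a > amps[i + 1]:
            hits.append((i, a))
    return hits


# Toy profile: gradual rise (termination-like), one sharp drop (ORI-like).
toy = [0.5, 0.6, 0.7, 0.8, 0.9, 0.1, 0.2, 0.3, 0.4, 0.5]
origins = find_switches(toy)
```

On the toy profile a single switch is called at the boundary between bins 4 and 5, with an amplitude of about 0.6. A less efficient ORI would produce a smaller step and could fall below the chosen threshold, which is why the threshold should be treated as a tunable assumption.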
Figure 3
Timing, efficiency, and base composition of replication origins. The replication profiles of genomic regions 40 kb long spanning 100 early and 136 late firing ORIs in S. cerevisiae as classified by Soriano et al. 49 were centered on the position of polymerase switching. The amplitude of the switch and the length of the replicons from early ORIs are greater than in late ORIs. In S. pombe, ORIs identified by Daigaku et al. 13 were classified according to the amplitude of the switch in two groups of 566 and 579 ORIs of different efficiency. The A + T content (measured in windows of 300 bp and step of 1 bp) is higher in the more efficient ORIs in a region of approximately 2 kb centered on the position of polymerase switching.
Replication origins in S. pombe have a different organization because they lack consensus sequence elements. Their only shared feature is a high A + T content, regardless of the primary DNA sequence 35, 36. This seems to be their only requirement, as suggested by the finding that artificial A + T‐rich DNA fragments of arbitrary sequence can initiate replication in the chromosome as efficiently as endogenous ORIs 34. As in the case of S. cerevisiae, ORIs have been mapped genome wide in S. pombe by several methods 26, 35, 37, 38, 39 and the large majority of them colocalize with sites of Pol δ/Pol ϵ switching identified by Daigaku et al. 13. The efficiency of ORIs as measured by the amplitude of the switch between Pol δ and Pol ϵ correlates with their A + T content (Fig. 3), as previously found for individual ORIs 34. This correlation reflects the fact that the origin recognition complex (ORC) is targeted to ORIs in S. pombe through the AT‐hooks of its Orc4 subunit 40. The higher density of ORIs in S. pombe is likely due to the presence of A + T‐rich sequence tracts at many intergenic regions susceptible to being targeted by the Orc4 protein. The presence of many low‐efficiency ORIs in S. pombe is probably the reason why the profile of the replicons is more irregular than in S. cerevisiae (Fig. 2).
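The A + T content profile referred to above (300 bp windows moved in steps of 1 bp, as stated in the legend of Fig. 3) can be computed with a simple sliding window. The following Python sketch is a minimal illustration; the function name and the handling of short sequences are assumptions, and bases other than A and T (including N) are simply counted as non‐A/T.

```python
def at_content(seq, window=300, step=1):
    """Fraction of A + T in sliding windows along a DNA sequence.

    Returns one value per window start (0, step, 2*step, ...), or an
    empty list if the sequence is shorter than the window.
    """
    seq = seq.upper()
    if len(seq) < window:
        return []
    is_at = [1 if base in "AT" else 0 for base in seq]
    count = sum(is_at[:window])
    counts = [count]
    # Slide by one base: add the entering base, drop the leaving one.
    for start in range(1, len(seq) - window + 1):
        count += is_at[start + window - 1] - is_at[start - 1]
        counts.append(count)
    return [c / window for c in counts[::step]]


# Toy example with a 4 bp window.
toy_profile = at_content("AATTGGCC", window=4, step=1)
```

For an ORI‐centered analysis one would apply `at_content` to the sequence of approximately 2 kb around each position of polymerase switching and average the profiles within each efficiency class.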
Nucleosome repositioning on the DNA lagging strand compromises genome stability
Another study, not directly related to the tracking of DNA polymerases during replication, found that junctions between Okazaki fragments map preferentially to the central region of the 147 bp of mononucleosomal DNA (nucleosomal dyad) in S. cerevisiae. Smith and Whitehouse 14 proposed that the rapid binding of nucleosomes to nascent Okazaki fragments could cause the collision and premature dissociation of the Pol δ engaged in the extension of the next fragment synthesized. The distribution of the nicks at the sites of collision increases towards the dyad region, where the interaction between DNA and the histone core is stronger 41.
Several studies have reported that the frequency of single nucleotide polymorphisms (SNPs) in the nucleosomal dyad region is higher than the genomic average. This feature is present in yeasts and other organisms, including humans, and suggests that different DNA regions around the histone core have a different rate of mutation 42, 43, 44, 45. Following up on these observations, Reijns et al. 12 found that the junctions between Okazaki fragments and the previously identified spectrum of SNPs in S. cerevisiae largely overlapped along mononucleosomal DNA (Fig. 4). They also used S. cerevisiae mutants harboring Pol α, δ, and ϵ polymerases engineered to incorporate rNTPs in order to test the possibility that the premature release of Pol δ might result in the retention of Pol α‐synthesized DNA at the 5′ end of mature Okazaki fragments. Measurement of the incorporation rate of rNTPs in vitro and in vivo by wild‐type and mutant DNA polymerases showed that up to 1.5% of Pol α‐synthesized DNA was retained in the fully replicated genome, where its sequence stability could be compromised by the lack of 3′ to 5′ proofreading exonuclease activity of Pol α. Since this phenomenon applies to the entire genome, it could account for the increased sequence variability at dyad regions in S. cerevisiae 42 and for the higher A + T or G + C content of nucleosomal dyads relative to the genomic averages in S. pombe and S. cerevisiae, respectively 46, 47, 48. Strikingly, in the case of mononucleosomal DNA from coding regions, these differences in base composition determine a different distribution of amino acids in the two species depending on the position of their corresponding codons relative to the dyad position 48. In S. pombe, higher instability in the dyad region is consistent with the fact that its A + T composition is higher than that of the linker DNA 46, 48. This is probably a consequence of the general bias of the S. pombe genome toward increasing its A + T content 36.
Figure 4
Phased profiles of nucleosomal occupancy, single nucleotide polymorphisms and junctions of Okazaki fragments. The profiles of nucleosome occupancy of 3,742 regions 500 bp long spanning three adjacent nucleosomes in the S. cerevisiae genome (taken from Quintales et al. 48) were aligned to the dyad position of the central nucleosome to generate the aggregated profile shown in black. The normalized distributions of polymorphisms 12 (orange) and of the 5′ ends of Okazaki fragments 14 (green) on the two DNA strands are in phase with the nucleosomal profile.
The over‐representation of Okazaki fragment termini was not limited to nucleosomal dyads but was also detectable at other sites of DNA‐protein interaction, such as the binding sites of the transcription factors Abf1, Reb1, and Rap1. As in the case of nucleosome assembly on immature Okazaki fragments, the binding of these proteins to their cognate sites after the passage of the replication fork could negatively affect the processivity of Pol δ 14. Consistent with this model, the rate of nucleotide substitution is higher in regions immediately adjacent to the binding sites of some regulatory proteins and at the edge of human DNase I footprint regions 12, 14.
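Aggregated profiles of the kind shown in Fig. 4 are obtained by aligning windows of a genomic signal on a set of reference positions, such as nucleosome dyads or protein binding sites, and averaging them. The sketch below illustrates this with plain Python lists. It assumes a per‐base signal for a single chromosome and a list of center coordinates; the function name and the decision to skip windows that run past the ends of the signal are illustrative choices, not the published procedure.

```python
def aggregate_profile(signal, centers, half_width):
    """Average a per-base signal in windows centered on given positions.

    signal     : list of numbers, one per base (e.g. SNP or nick counts)
    centers    : 0-based coordinates to align on (e.g. dyad positions)
    half_width : number of bases kept on each side of the center

    Returns a list of length 2 * half_width + 1, or an empty list if no
    window fits entirely inside the signal.
    """
    width = 2 * half_width + 1
    sums = [0.0] * width
    used = 0
    for center in centers:
        lo = center - half_width
        hi = center + half_width + 1
        if lo < 0 or hi > len(signal):
            continue  # skip windows that run off either end
        for k in range(width):
            sums[k] += signal[lo + k]
        used += 1
    return [s / used for s in sums] if used else []


# Toy example: a signal that peaks exactly at each of three centers.
toy_signal = [0, 1, 0, 0, 1, 0, 0, 1, 0]
toy_average = aggregate_profile(toy_signal, centers=[1, 4, 7], half_width=1)
```

With 500 bp regions, as in Fig. 4, `half_width` would be on the order of 250. Applying the same function to the SNP and Okazaki junction signals with the same centers makes the profiles directly comparable.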
Conclusions and outlook
The development of methods to monitor ORI activity and the dynamics of replicons at genome‐wide scale in a single experiment represents a turning point in the field of DNA replication. The reliability of these new approaches is supported by the consistency between the replication maps of S. cerevisiae generated in two independent laboratories 10, 12 and the coincidence in the localization and efficiency of ORIs with previous analyses in S. cerevisiae and S. pombe.
At present, precise quantitative analyses of replication are difficult to implement at genome‐wide scale and in many cases they can only be applied to individual ORIs. The new qualitative and quantitative analyses of DNA replication will provide a picture of the replication landscape with a resolution comparable to that afforded by RNA‐Seq analyses for transcription. For example, this technology will make it possible to monitor the dynamics of replication in different genetic backgrounds and physiological conditions and to assess the impact of stress and physical or chemical challenges on the replication program in vivo. It also opens the possibility of modifying the many non‐replicative polymerases present in eukaryotes to track their activity and specificity in maintaining genome integrity. Finally, although the required genetic manipulations will not be as straightforward as in yeasts, the implementation of these methods in multicellular organisms would contribute to defining the replication program in different cell types and to detecting abnormal or unscheduled replication patterns under pathological conditions.
The authors declare no conflict of interest.