Olga Yurieva1, Mike O'Donnell1. 1 Howard Hughes Medical Institute and DNA Replication Laboratory, The Rockefeller University, New York, NY, USA.
Abstract
Eukaryotes require 3 DNA polymerases for normal replisome operations, DNA polymerases (Pol) α, delta and epsilon. Recent biochemical and structural studies support the asymmetric use of these polymerases on the leading and lagging strands. Pol epsilon interacts with the 11-subunit CMG helicase, forming a 15-protein leading strand complex that acts processively in leading strand synthesis in vitro, but Pol epsilon is inactive on the lagging strand. The opposite results are observed for Pol delta with CMG. Pol delta is highly active on the lagging strand in vitro, but has only feeble activity with CMG on the leading strand. Pol α also functions with CMG to prime both strands, and is even capable of extending both strands with CMG present. However, extensive DNA synthesis by Pol α is sharply curtailed by the presence of either Pol epsilon or Pol delta, which limits the role of the low fidelity Pol α to the initial priming of synthesis.
Keywords:
CMG; DNA helicase; DNA polymerase; primase; replisome
E. coli and its phages, T4 and T7, utilize multiple copies of an identical DNA polymerase for leading and lagging strand synthesis. The use of identical copies of a DNA polymerase generalizes to replication fork operations in archaeal cells. However, in eukaryotes the picture is more complicated; they contain 3 essential DNA polymerases and their functions are still being sorted out. The first eukaryotic DNA polymerase to be identified was Pol α. Pol α consists of 4 subunits (Fig. 1); the largest subunit harbors the DNA polymerase activity and the 2 smallest subunits function together to make small RNA primers. The presence of DNA polymerase and primase activities in one protein complex initially suggested Pol α might replicate the genome without needing other DNA polymerases, although the lack of a proofreading 3′–5′ exonuclease was somewhat confounding. Indeed, initial biochemical studies in the simian virus 40 (SV40) replication system found that Pol α can function with SV40 T-antigen helicase to prime and extend both the leading and lagging strands. Another DNA polymerase was discovered, called Pol delta, that had a proofreading 3′–5′ exonuclease, and in vitro studies of SV40 replication demonstrated that Pol delta takes over primed sites made by Pol α and replicates both strands of the SV40 genome. Pol delta consists of 4 subunits in human and 3 subunits in yeast (Fig. 1). Pol delta, like the bacterial Pol III replicase, requires 2 accessory factors. Studies in E. coli identified these factors as a circular clamp (PCNA in eukaryotes), and an ATP driven clamp loader complex (RFC in eukaryotes). These accessory factors are now known to be essential to replication in all cells from bacteria to archaea and eukaryotes. Genetic studies in budding yeast demonstrated that a third DNA polymerase, Pol epsilon, is essential for cellular replication; it consists of 4 subunits (Fig. 1). Pol epsilon, like Pol delta, utilizes the RFC clamp loader and PCNA clamp.
Figure 1.
Replication fork proteins of Saccharomyces cerevisiae. The Table to the left lists the names of subunits, complexes, their molecular weight and biochemical function. SDS polyacrylamide gels of recombinant pure proteins are shown to the right of the Table.
The 3 DNA polymerases, Pol α, Pol delta and Pol epsilon, are members of the B-family of DNA polymerases. The largest subunit is the DNA polymerase, and in both Pol epsilon and Pol delta the large subunit also contains a 3′–5′ proofreading exonuclease, while Pol α lacks a proofreading exonuclease. The 2nd largest subunit of these polymerases, referred to as the B subunit, is essential for cell viability but the role of the B subunit is not understood for any of the 3 DNA polymerases. The smallest subunits of Pol α, Pri1-Pri2, contain the primase activity. The small Dpb3-Dpb4 subunits of Pol epsilon are non-essential and form a histone-like heterodimer fold. The function of the third subunit (Pol32) of Pol delta is unknown, but it contains a PCNA binding motif and both Pol31 and Pol32 are also subunits of Pol zeta, a translesion repair polymerase. This review provides a brief overview of the current understanding of how these eukaryotic DNA polymerases function together at the replication fork.
Genetic studies indicate pol epsilon and pol delta act on different strands
Considering that DNA consists of 2 strands, one may expect that only 2 polymerase molecules would be needed for DNA replication. The fact that there are 3 different essential DNA polymerases suggested that Pol α, with its intrinsic priming activity, probably functions as the primase while Pol epsilon and Pol delta perform bulk leading and lagging strand synthesis. Supporting this view is the fact that Pol α has no proofreading exonuclease and thus has lower fidelity compared to Pols epsilon and delta, which each contain a 3′–5′ proofreading exonuclease. Recent studies have identified mutations in the exonuclease active sites of Pol epsilon and Pol delta that are associated with colorectal cancer, consistent with a central role of these polymerases in cellular replication. It has long been thought that Pol epsilon and Pol delta function on different strands (i.e. leading vs lagging), and this presumption was supported by early studies on exonuclease mutants of Pols epsilon and delta. However, these early studies could not assign the DNA polymerases to one strand or the other.
The first strong evidence that identified the strand upon which a DNA polymerase functions came from genetic studies using a Pol epsilon active site mutant that leaves a mutation signature on the strand that it acts upon. These studies provided convincing evidence for Pol epsilon on the leading strand in budding yeast (Saccharomyces cerevisiae). Similar studies using active site mutants of Pol delta indicated that Pol delta performs lagging strand synthesis. Further genetic studies employed mutations in the active sites of Pols epsilon and delta that frequently misincorporate rNTPs, with the same conclusions. Experiments in fission yeast (Schizosaccharomyces pombe) also supported the assignments of Pol epsilon and Pol delta on the leading and lagging strands, respectively. 
Additional support was provided by DNA-protein cross-linking studies demonstrating that Pol epsilon cross-links to the leading strand and Pol delta cross-links to the lagging strand. Despite these observations, there remain caveats to the experiments, and a recent report concluded that Pol delta acts as the major replicase on both strands, similar to replication of the SV40 genome. In addition, the catalytic domain of Pol2, the polymerase subunit of Pol epsilon, can be deleted and cells survive, although the cells show severe defects in S phase progression. It was presumed that Pol delta takes over in these mutant cells, possibly similar to studies showing that Pol I takes over chromosome replication in E. coli when the replicative Pol III is deleted. In contrast to the deletion of the catalytic domain of Pol2, active site point mutants of Pol2 were lethal, indicating that Pol epsilon is normally used for replication. Considering that the issue of polymerase assignment to particular strands is not fully resolved, additional information from a different line of investigation may help clarify the issue. Recent experiments have reconstituted the eukaryotic replisome from pure proteins in vitro, and the biochemical studies have shed new light onto this question.
The eukaryotic cellular helicase
Biochemical reconstitution studies of eukaryotic cellular replication have lagged far behind the bacterial in vitro replication field. This has in part been due to the difficulty in identification of the eukaryotic cellular helicase. Bacterial, archaeal and many phage and viral systems utilize a homohexameric helicase to unwind DNA at the replication fork. The eukaryotic replicative helicase was believed for many years to be the heterohexameric Mcm2-7 complex. Indeed, it has long been known that a double hexamer of the Mcm2-7 complex is loaded onto an origin in a G1 phase specific licensing step. The SV40 large T-antigen helicase is also assembled as a double hexamer at the viral origin, and thus Mcm2-7 was expected to perform as a helicase in similar fashion to SV40 large T-antigen. In addition, the archaeal Mcm homohexamer is an active helicase, supporting the hypothesis that Mcm2-7 may be the eukaryotic cellular helicase. While the isolated Mcm2-7 complex is nearly devoid of helicase activity, a Mcm4,6,7 subassembly is a robust 3′–5′ helicase. But more recent studies discovered that the true cellular helicase requires 5 additional proteins.
Identification and purification of the active eukaryotic helicase was accomplished in the Drosophila system. The active helicase is a complex of 11 different subunits: Mcm2-7, Cdc45 protein, and the heterotetramer GINS (go-ichi-ni-san). This helicase complex, referred to as CMG (Cdc45, Mcm2-7, GINS), displays robust helicase activity. The Cdc45 and GINS accessory subunits do not interact with ATP, and thus are proposed to promote helicase activity by holding the Mcm2-7 complex in an active conformation. Numerous cell biology and biochemical studies have shown that Cdc45 and GINS are chaperoned to the Mcm2-7 complex in a multistep reaction at an origin that involves several initiation factors and 2 cell cycle kinases (reviewed in ). 
Hence, activation of origins of replication appears to be intricately intertwined with activation of the CMG helicase, and possibly even defined by this process.
Identification of the cellular helicase provided the missing link necessary to reconstitute the eukaryotic replication fork in vitro. However, CMG is present at very low concentrations in the cell and amounts needed for intensive biochemical studies required expression by recombinant means. Recombinant Drosophila CMG has been expressed in the baculovirus system, and CMG of budding yeast and human have also been expressed by recombinant means. In all cases, the isolated recombinant CMG displays ATP dependent 3′–5′ DNA helicase activity. The 3′–5′ polarity of unwinding places CMG on the leading strand of a replication fork. Studies of bacterial and viral homohexameric helicases suggest that they encircle single-strand DNA and use ATP to track along it while excluding the other strand of DNA from inside the ring. In this “steric exclusion” model of unwinding, the helicase unwinds dsDNA as it tracks along the strand that it encircles, acting as a wedge to separate the strand that it sterically excludes (i.e., the strand that it does not encircle).
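The subunit arithmetic behind the "11-subunit CMG" and the "15-protein" leading strand complex mentioned in the abstract can be checked with a short tally. This is only a bookkeeping sketch: the individual subunit names (Sld5, Psf1-3 for GINS; Dpb2 for the Pol epsilon B subunit) are the standard budding yeast names and are supplied here for illustration, and the variable and function names are hypothetical.

```python
# Subunit tally for the complexes named in the text.
# Individual subunit names are standard budding yeast names, added for illustration.
CMG = {
    "Cdc45": ["Cdc45"],
    "Mcm2-7": ["Mcm2", "Mcm3", "Mcm4", "Mcm5", "Mcm6", "Mcm7"],
    "GINS": ["Sld5", "Psf1", "Psf2", "Psf3"],  # heterotetramer
}
POL_EPSILON = ["Pol2", "Dpb2", "Dpb3", "Dpb4"]  # 4 subunits


def count_subunits(complex_parts):
    """Count polypeptides across all components of a complex."""
    return sum(len(subunits) for subunits in complex_parts.values())


cmg_total = count_subunits(CMG)                                  # Mcm2-7 + Cdc45 + GINS
accessory_total = len(CMG["Cdc45"]) + len(CMG["GINS"])           # proteins added to Mcm2-7
cmge_total = cmg_total + len(POL_EPSILON)                        # CMG plus Pol epsilon

print(cmg_total, accessory_total, cmge_total)
```

The three totals correspond to the 11-subunit helicase, the 5 proteins that Mcm2-7 needs in addition to itself, and the 15-protein complex formed when Pol epsilon joins CMG.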
Reconstitution of the leading strand replisome
Pull-outs of CMG subunits from extracts of budding yeast, followed by mass spectrometry analysis, have identified several additional factors that travel with CMG, referred to as the replisome progression complex (RPC). In addition to CMG, the RPC contains Pol α, Ctf4, Mcm10, the Mrc1-Tof1-Csm3 heterotrimer involved in checkpoint control, the FACT complex for nucleosome handling and Topoisomerase I. The RPC is presumed to function at a moving replication fork, along with Pol epsilon, Pol delta, RFC, PCNA and the RPA single-strand binding protein.
The first biochemical reconstitution of a functional eukaryotic leading strand replication fork entailed use of 27 different recombinant polypeptides (Fig. 1 and ), and reconstitution of a coupled leading-lagging strand replisome in vitro utilized 31 different proteins (Fig. 2). Initiation at a yeast origin has also been accomplished by recombinant proteins in the absence of Pol delta, RFC and PCNA. Studies of the 3 DNA polymerases in the reconstituted replication fork system, and studies of individual DNA polymerases, have provided insight into the mechanisms by which DNA polymerases are assigned to their respective DNA strands at a replication fork.
Figure 2.
Example of leading-lagging strand replication in vitro. (A) The substrate used in the in vitro assays is a 3 kb forked DNA that has no dG residues on the leading strand and no dC residues on the lagging strand. Therefore 32P-dCTP specifically labels the leading strand product and 32P-dGTP labels lagging strand products. (B) Leading strand reaction time course using 32P-dCTP and either Pols α + epsilon, or Pols α + epsilon + delta. (C) While a low level of lagging strand products are observed with only Pols α + epsilon, significantly more lagging strand fragments are formed by addition of Pol delta. (D) Quantitation of DNA synthesis shows similar extents of leading and lagging strand replication. The panels of the figure are adapted with permission from. Specifically, panels a, b are from Supplemental Figure 6a,b, panel c is from Figure 5b, and panel d is from Figure 7b of .
Biochemical study of pure CMG helicase, DNA polymerases, PCNA clamp, RFC clamp loader and RPA single-strand binding protein used a pre-formed DNA fork substrate. The forked DNA lacks dC residues on one strand and dG residues on the other (Fig. 2a). This enables specific labeling of leading or lagging strand synthesis depending on whether 32P-dCTP or 32P-dGTP is used (Fig. 2b,c). Processive leading strand synthesis was observed using CMG and Pol epsilon, along with RFC, PCNA and RPA. Substituting Pol delta for Pol epsilon resulted in comparatively little synthesis, indicating that CMG stabilizes Pol epsilon but not Pol delta. Hence, asymmetric polymerase use on the leading strand is recapitulated in vitro. Pol epsilon function on the leading strand is consistent with the bulk of genetic studies. However, it is important to note that future studies that include other proteins that travel with forks could alter these conclusions.
Protein interaction studies reveal that Pol epsilon binds directly to CMG, forming a 15-protein “CMGE” complex. 
Recent single-particle EM 3D reconstruction studies of CMGE reveal that Pol epsilon sits on the GINS-Cdc45 accessory factors, and on the C-terminal side of the Mcm2-7 motor subunits. Keeping in mind that CMG encircles the leading strand, the interaction of Pol epsilon with CMG may confine Pol epsilon function to the leading strand. In this view, the asymmetric use of Pol epsilon over Pol delta in leading strand synthesis is facilitated by a direct connection between CMG and Pol epsilon.
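The strand-specific labeling logic of the forked substrate (Fig. 2) can be illustrated with a toy sequence model. This is a sketch under stated assumptions, not the actual substrate: the short sequences are invented, the function names are hypothetical, and the Figure 2 caption's "no dG on the leading strand" is read here as describing the newly synthesized leading strand, which is the reading under which 32P-dCTP labels only the leading strand product.

```python
# Toy model of strand-specific labeling. Sequences are invented and are
# NOT the real 3 kb substrate; only the base-composition logic matters.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def nascent_strand(template):
    """Return the strand synthesized on a template, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def labels_incorporated(template):
    """Report which of the two radiolabeled dNTPs can enter the product."""
    product = nascent_strand(template)
    labels = set()
    if "C" in product:
        labels.add("32P-dCTP")
    if "G" in product:
        labels.add("32P-dGTP")
    return labels


# Hypothetical templates: the leading template carries no dC, so its product
# has no dG; the complementary lagging template carries no dG, so its
# product has no dC.
leading_template = "GATTGAGTTAGGATG"
lagging_template = nascent_strand(leading_template)

leading_labels = labels_incorporated(leading_template)
lagging_labels = labels_incorporated(lagging_template)
print(leading_labels, lagging_labels)
```

Because each template lacks one of the two bases, each labeled nucleotide can be incorporated into only one of the two products, which is what allows leading and lagging strand synthesis to be followed separately in the same reaction design.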
Lagging strand synthesis
Pol delta has been studied for many years, and characterization of its biochemical properties indicated that it is well suited to the actions required on the lagging strand. For instance, Pol delta is highly efficient in switching with Pol α on a primed site. The Pol delta-PCNA complex is uniquely capable of functioning with Fen1 nuclease in the removal of RNA primers to produce a ligatable nick for sealing Okazaki fragments, while Pol epsilon-PCNA does not fulfill this function. Pol delta-PCNA also contains a limited strand displacement activity that provides an efficient substrate for Fen1 nuclease. In addition, the strand displacement activity of Pol delta-PCNA has recently been shown to be processive during the 1–2 seconds required for action with Fen1 during Okazaki fragment repair. On a primed RPA coated single-strand DNA, Pol delta-PCNA is processive but quickly self-ejects from the PCNA clamp shortly after completing replication (i.e. it only stays with completed DNA the few seconds needed to function with Fen1). This self-ejection feature of Pol delta-PCNA may underlie the observation that Pol delta-PCNA has only feeble activity on the leading strand with CMG. Specifically, Pol delta-PCNA extends DNA faster than CMG unwinds DNA, and thus Pol delta-PCNA will “bump” into CMG which may trigger the self-ejection of Pol delta from PCNA resulting in the observed distributive synthesis with CMG. Biochemical studies also show that Pol delta rapidly switches with Pol epsilon on PCNA-primed ssDNA, a lagging strand mimic. Extrapolation of these actions to a moving replication fork suggests that even if Pol epsilon were to assemble with PCNA at a lagging strand primed site, Pol delta would soon switch with it and take over.
Reconstitution of an efficient leading-lagging strand replication system required the addition of all 3 DNA polymerases, along with CMG, RFC, PCNA and RPA. 
In the presence of all 3 DNA polymerases, the leading and lagging strands were synthesized in equal amounts, and the Okazaki fragments were the length expected from in vivo studies (Fig. 2c,d). Experiments using individual polymerases, or combinations of 2 DNA polymerases with CMG, provided insight into the different functions of Pols α, epsilon and delta on the leading and lagging strands (summarized in Fig. 3). One unexpected finding was that Pol epsilon could not extend lagging strand primers into Okazaki fragments, even in the complete absence of Pol delta. Hence, CMG-Pol epsilon is not capable of acting on the lagging strand. Furthermore, in the absence of both Pol epsilon and Pol delta, Pol α is capable of priming and extending both the leading and lagging strands with CMG (explained further in the next section). However, the presence of Pol epsilon results in a Pol epsilon switch with Pol α, which stimulates the leading strand but inhibits the lagging strand (Fig. 3a,b). The mechanism that prevents Pol epsilon function on the lagging strand is not yet understood. One possibility is that Pol epsilon is inefficient in extending short primed sites made by Pol α, as suggested from studies of isolated Pol epsilon. In summary, Pol epsilon and Pol delta are only efficient in synthesis when they are on the “right” strand. Specifically, Pol epsilon is most active on the leading strand and Pol delta is most active on the lagging strand (Fig. 3). The in vitro results indicate that asymmetric activity of Pol epsilon and Pol delta on the leading and lagging strands is an inherent property of the core helicase/polymerase components of the replisome.
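The qualitative outcomes of the polymerase combinations summarized in Fig. 3 can be collected into a small lookup table. This is a reading aid only: the outcome strings paraphrase the text and the Figure 3 caption, no quantitative values are implied, and the names `OUTCOMES` and `outcome` are hypothetical.

```python
# Qualitative summary of the six Fig. 3 panels; each polymerase set is the
# one named in the caption for that panel, and all reactions include CMG.
# Outcome strings paraphrase the text and are not measured values.
OUTCOMES = {
    ("leading", frozenset({"alpha"})):
        "Pol alpha primes and extends",
    ("leading", frozenset({"alpha", "delta"})):
        "Pol delta switches with Pol alpha, synthesis is lowered",
    ("leading", frozenset({"alpha", "delta", "epsilon"})):
        "Pol epsilon takes over, most synthesis",
    ("lagging", frozenset({"alpha"})):
        "Pol alpha primes and extends",
    ("lagging", frozenset({"alpha", "epsilon"})):
        "Pol epsilon takes over from Pol alpha but is not active",
    ("lagging", frozenset({"alpha", "delta", "epsilon"})):
        "Pol delta extends Okazaki fragments",
}


def outcome(strand, polymerases):
    """Look up the qualitative result for a strand and a set of polymerases."""
    key = (strand, frozenset(polymerases))
    return OUTCOMES.get(key, "not described in the text")


print(outcome("leading", {"alpha", "delta", "epsilon"}))
print(outcome("lagging", {"alpha", "epsilon"}))
```

Using a frozenset as part of the key makes the lookup independent of the order in which the polymerases are listed, and combinations that the text does not discuss fall through to an explicit "not described" result rather than a guess.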
Figure 3.
Asymmetric use of DNA polymerases at a replication fork. The diagrams refer to conclusions from in vitro replisome reconstitution reactions demonstrating that Pol epsilon is the dominant enzyme on the leading strand and Pol delta is the dominant enzyme on the lagging strand. (A) Pol α functions with CMG on the leading strand, (B) Pol delta switches with Pol α on the leading strand but lowers synthesis, (C) Pol epsilon takes over from both Pols α and delta to provide the most synthesis on the leading strand. (D) Pol α can function on the lagging strand in the absence of other polymerases, (E) Pol epsilon takes over from Pol α on both strands, but is only active on the leading strand, (F) Pol delta extends Okazaki fragments in the presence of Pol epsilon and Pol α.
Priming of DNA synthesis
One may question why Pol α contains a DNA polymerase activity, considering that the cell has 2 other DNA polymerases for the leading and lagging strands. Conceptually, only the RNA priming activity of Pol α should be required for the replication fork. The relevance of DNA synthesis by Pol α has not garnered much attention. Archaea do not have a 4-subunit Pol α and instead contain only the heterodimer primase (i.e., Pri1-Pri2). Archaea differ from eukaryotes in having circular chromosomes and perhaps the DNA polymerase of Pol α is important to convert the product of telomerase to double-strand DNA (i.e. telomere second strand synthesis). Regardless of the polymerase function, all eukaryotes studied thus far have a 4-subunit Pol α. In vitro studies show that Pol α requires CMG for priming activity in the presence of the RPA single-strand binding protein. The dependence of priming activity on the interaction of primase with its cognate helicase is a common theme in bacterial and phage systems. Studies in the SV40 system also demonstrate that Pol α requires interaction with the SV40 T-antigen helicase for priming activity on plasmid DNA containing the SV40 origin.
As described above, Pol α can prime and extend both the leading and lagging strands in combination with CMG helicase, but the presence of Pol epsilon or Pol delta stops Pol α activity. This remains the case even when the Pol epsilon or Pol delta is functioning on the “wrong” strand. For example, Pol α lagging strand products are inhibited by Pol epsilon, and conversely, leading strand extension by Pol α is inhibited by Pol delta. These observations suggest that in the cell, where all 3 DNA polymerases are present simultaneously, Pol α DNA polymerase activity will be sharply curtailed by DNA polymerases epsilon and delta. However, the priming function of Pol α is not curtailed.
Conclusion
Recent studies demonstrate that a functional eukaryotic replisome can be reconstituted from pure proteins, thus laying a foundation for future mechanistic studies. The initial studies reveal asymmetric functional properties of Pol epsilon and Pol delta on the leading and lagging strands, respectively, yet much remains to be understood. The current stage of development has only used naked DNA of limited sequence context. In the cell, the replisome will encounter a vast combinatorial landscape of nucleotide sequence, various types of DNA lesions, transcribing RNA polymerase and many other protein blocks. Additionally, eukaryotic DNA is packaged into nucleosomes, some of which condense DNA into highly packed heterochromatic regions. There also exist cohesin rings that hold the 2 daughter chromosomes together, possibly a consequence of a replisome that replicates through the giant cohesin rings. The replisome copies every nucleotide of the entire genome, and thus must have mechanisms to cope with and traverse all impediments to replication. How the replisome deals with the many biological impediments it encounters is completely unknown. Furthermore, the replisome is regulated by a variety of posttranslational modifications. For example, induction of the DNA damage checkpoint response results in phosphorylation of Mcm subunits, and an initial study indicates that CMG helicase activity is altered in response to DNA damage induced phosphorylation. DNA damage also leads to ubiquitinylation of PCNA, and this is associated with translesion bypass performed by a variety of specialized translesion DNA polymerases that function with ubiquitinated PCNA. The Mcm10 subunit of the RPC is an essential protein, yet little is known about its function, and the Ctf4 homotrimer is known to cross-link Pol α to CMG, but the functional consequence of this interaction is still unknown. Interestingly, the Mcm2 subunit binds an H3-H4 heterotetramer. 
Whether the Mcm2 subunit, in the context of CMG, can bind histones is not yet known, but the results with pure Mcm2 indicate that the interaction facilitates nucleosome assembly onto DNA. The FACT nucleosome handling factor is also a member of the RPC. Thus it seems likely that the replisome machinery is directly involved in nucleosome handling, and might play a direct role in the transfer of epigenetic information to the next generation.