
The Interplay between G-quadruplex and Transcription.

Nayun Kim

Abstract

G4 DNA is a non-canonical DNA structure consisting of a stacked array of G-quartets held together by base pairing between guanine bases. The formation of G4 DNA requires a cluster of guanine runs (G-runs) within a single DNA strand. Although the chemistry of this remarkable DNA structure has been under investigation for decades, evidence supporting the biological relevance of G4 DNA has only begun to emerge, and it points to important and conserved biological functions. This review focuses specifically on the interplay between transcription and G4 DNA and discusses two alternative but interconnected perspectives. The first part describes the evidence substantiating the intriguing idea that a shift in DNA structural conformation could be an additional layer of non-genetic, or epigenetic, regulation of gene expression and thereby an important determinant of cell fate. The second part describes recent genetic studies showing that genomic loci containing G4 DNA-forming guanine-rich sequences are potential hotspots of genome instability, and that the level and orientation of transcription are critical for the genome instability associated with these sequences to materialize. Copyright © Bentham Science Publishers.

Keywords:  G4 DNA; R-loops; Top1; genome stability; supercoiling; transcription.

Year: 2019; PMID: 29284393; PMCID: PMC6026074; DOI: 10.2174/0929867325666171229132619

Source DB: PubMed; Journal: Curr Med Chem; ISSN: 0929-8673; Impact factor: 4.530


INTRODUCTION

Free guanine bases in solution readily interact with each other via Hoogsteen hydrogen bonds to form a four-membered ring-like structure referred to as a G-quartet [1]. G-quadruplex, or G4 DNA, which comprises multiple stacked G-quartets, also readily forms from single-stranded oligonucleotides in solution. Cations such as K⁺, Na⁺, Ca²⁺, and Sr²⁺ facilitate G4 DNA formation, with K⁺ having the most stabilizing effect. Nucleotides between G-runs are incorporated into the structure as loops between G-quartets, and the size of the loops can determine the relative stability of the various G4 DNA configurations. Sequences with the potential to form G4 DNA, or “G4 motifs”, were first noted at telomeres, ribosomal DNA arrays, immunoglobulin (Ig) heavy chain loci, chromosomal fragile sites (CFSs), and G/C-rich micro- and minisatellites. More recently, by searching for sequences with at least four G-runs separated by loops of 7 nt or less, computational analyses identified ~1,400 and ~370,000 putative G4 motifs in the Saccharomyces cerevisiae nuclear genome and the human genome, respectively [2, 3]. A possible biological function of G4 DNA was first inferred when the guanine-rich telomeric sequences were shown to adopt this secondary structure through guanine-guanine base pairing and when many telomere-localized proteins were shown to bind G4 DNA with high affinity [4, 5]. Whether G4 motifs other than the telomeric repeats form G4 DNA in vivo that is stable enough to carry out a particular cellular function has long been debated. Here, I discuss how transcription critically influences the conversion of genomic G4 motifs into the G4 DNA structure and, reciprocally, how G4 DNA both positively and negatively regulates the level of transcription.
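The motif definition used in these genome-wide searches (at least four runs of three or more guanines separated by loops of 1 to 7 nt) can be written as a regular expression. The following is a minimal sketch of such a scanner, written for illustration only: the function names `g4_regex` and `find_g4_motifs`, their parameters, and the choice to report C-run matches as motifs on the complementary strand are assumptions, not the exact procedure of the cited studies [2, 3].

```python
import re


def g4_regex(base="G", min_run=3, max_loop=7):
    """Compile a pattern for >= 4 runs of `base` (each >= min_run long)
    separated by loops of 1..max_loop nucleotides."""
    run = f"{base}{{{min_run},}}"         # e.g. G{3,}
    loop = f"[ACGTN]{{1,{max_loop}}}"     # e.g. [ACGTN]{1,7}
    return re.compile(f"{run}(?:{loop}{run}){{3,}}")


def find_g4_motifs(seq, min_run=3, max_loop=7):
    """Return sorted (start, end, strand) tuples for putative G4 motifs.

    G-runs on the given strand are reported as '+'. C-runs on the given
    strand are G-runs on the complementary strand and are reported as '-'.
    Matches are non-overlapping, following re.finditer semantics.
    """
    seq = seq.upper()
    hits = []
    for base, strand in (("G", "+"), ("C", "-")):
        for m in g4_regex(base, min_run, max_loop).finditer(seq):
            hits.append((m.start(), m.end(), strand))
    return sorted(hits)


# Four copies of the human telomeric G-run motif, flanked by A's.
telomeric = "AAAA" + "GGGTTA" * 3 + "GGG" + "AAAA"
print(find_g4_motifs(telomeric))  # [(4, 25, '+')]
```

Raising `max_loop` reproduces looser definitions such as the 50-nt loop limit used in the yeast conservation study discussed below; the loop class deliberately includes G, as in simple Quadparser-style searches, so long G-tracts can be split into several runs.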

TRANSCRIPTION REGULATION BY G4 DNA

In silico Evidence of G4 as a Regulator of Transcription

Analysis of the putative G4 motifs identified in silico in the human genome revealed that G4 motifs are highly enriched in the promoter regions of ~20,000 human genes compared with their genome-wide distribution [6]. In fact, 42.7% of the genes surveyed contained at least one G4 motif within 1 kb upstream of the transcription start site (TSS). Additionally, the enrichment of G4 motifs correlated with previously characterized gene regulatory elements, including enhancers, conserved transcription factor binding sites, and nuclease hypersensitive sites (NHS) [6, 7]. NHS indicate the more accessible regions of the genome and broadly mark regulatory sequences. G4 motifs are also enriched within 500 nt downstream of the TSS of human genes [8]. In yeast, where the extensive annotation of ORFs and regulatory regions allows a more comprehensive characterization of the distribution pattern of G4 motifs, the correlation between gene regulatory regions and G4 motifs emerged as the most notable feature [9]. Although the yeast genome is relatively low in GC content (38 to 39%) compared with mammalian genomes (46% in human and 51% in mouse), G4 motifs were enriched 6-fold in promoter regions (defined as -850 to -50 relative to the TSS) compared with the genome-wide distribution. A second, smaller peak of enrichment was present within 400 nt downstream of the TSS. When yeast cells were treated with the G4 ligand N-methyl mesoporphyrin IX (NMM), expression was significantly up-regulated for many of the genes with G4 motifs in their promoter regions. If G4 motifs serve an important cellular function such as transcriptional regulation, they would be expected to be evolutionarily conserved. Capra et al. compared the genomes of S. cerevisiae and six other yeast species to determine the level of conservation of G4 motifs in these closely related species [10].
Allowing loops of up to 50 nt, the G4 motifs identified in these genomes showed significantly higher conservation than expected by chance, with 34 of 552 motifs conserved in all seven yeast species surveyed. In addition, within a G4 motif, nucleotides at positions where mutation would disrupt the G4 DNA structure were more highly conserved than those at non-disruptive positions, indicating that these sequences have evolved to retain their capability to form the secondary structure. The G4 motifs identified in this study were strongly associated with gene promoters. Overall, these findings suggest that the regulation of transcription could be a function of G4 DNA conserved from yeast to human.
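The fold-enrichment figures quoted above (for example, the 6-fold enrichment of G4 motifs in yeast promoters) compare motif density inside promoter windows with the genome-wide density. A minimal sketch of that arithmetic follows, assuming that motif start coordinates have already been computed. The -850 to -50 window is taken from the yeast analysis above; the function names, the 0-based half-open coordinates, and the merging of overlapping windows are illustrative assumptions rather than the procedure of refs [6, 9].

```python
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # half-open [start, end), 0-based coordinates


def promoter_windows(tss: Sequence[int], strands: Sequence[str], genome_len: int,
                     rel: Interval = (-850, -50)) -> List[Interval]:
    """Turn TSS coordinates into promoter windows oriented by gene strand,
    clipped to the genome and merged where they overlap."""
    lo, hi = rel
    raw: List[Interval] = []
    for t, s in zip(tss, strands):
        if s == "+":
            start, end = t + lo, t + hi   # upstream lies to the left
        else:
            start, end = t - hi, t - lo   # '-' gene: upstream lies to the right
        start, end = max(0, start), min(genome_len, end)
        if start < end:
            raw.append((start, end))
    merged: List[Interval] = []
    for start, end in sorted(raw):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fold_enrichment(motif_starts: Sequence[int], genome_len: int,
                    windows: Sequence[Interval]) -> float:
    """(motifs per bp inside windows) / (motifs per bp genome-wide)."""
    if not motif_starts or not windows:
        raise ValueError("need at least one motif and one window")
    in_windows = sum(any(a <= p < b for a, b in windows) for p in motif_starts)
    window_bp = sum(b - a for a, b in windows)
    genome_density = len(motif_starts) / genome_len
    return (in_windows / window_bp) / genome_density


# Toy example: one '+' gene with its TSS at position 1,000 in a 10-kb "genome".
windows = promoter_windows([1000], ["+"], genome_len=10_000)
print(windows)  # [(150, 950)]
print(round(fold_enrichment([200, 900, 5000], 10_000, windows), 2))  # 8.33
```

In the toy case, two of the three motifs fall in 800 bp of promoter sequence, so the promoter density (2/800) is about 8.3 times the genome-wide density (3/10,000). Merging the windows keeps shared promoters from being counted twice.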

Confirmation of In Vivo G4 DNA Formation at Promoters

For many of the G4 motifs identified near promoter regions, the potential to fold into G4 DNA has been verified by in vitro approaches, including dimethyl sulfate (DMS) footprinting, circular dichroism (CD), and electrophoretic mobility shift assay (EMSA). However, critical proof of in vivo G4 DNA formation at these same sequences was absent until the recent identification and characterization of antibodies specific to G4 DNA [11-14]. Among these, two recombinant antibodies, HF2 and BG4, were isolated by screening the Tomlinson J library of phage-displayed single-chain variable fragment (scFv) antibodies. Both scFv antibodies bind specifically to G4 DNA in vitro and have been successfully used to detect cellular G4 DNA by immunohistochemistry (IHC) and/or chromatin immunoprecipitation (ChIP) [13-15]. ChIP with the HF2 antibody followed by deep sequencing identified 175 HF2-bound peaks in human breast cancer cells that contain the consensus G4 motif with a maximum loop size of 7 nt [14]. To determine the biological consequence of G4 DNA, the authors examined the effect of the G4-stabilizing ligand pyridostatin (PDS) on the expression of 8 genes found by HF2-ChIP-seq to have in vivo G4 DNA formation in their promoter regions. Significant changes were observed for 6 of these genes, supporting the idea that G4 DNA regulates transcription from these promoters. Interestingly, PDS treatment caused both significant down-regulation and significant up-regulation of gene expression, suggesting that G4 DNA-mediated regulation involves both activation and repression of transcription. The BG4 antibody binds G4 DNA with very high affinity (Kd = 1 to 2 nM), with no detectable binding to single-stranded RNA or DNA [13].
Human cells incubated with BG4 exhibit discrete foci that disappear upon DNase treatment or competition with G4 DNA-forming oligonucleotides. In mitotic chromosome spreads, BG4 foci are visible at both ends of each chromosome, as expected given that human telomeres consist of G-rich TTAGGG repeats. However, the majority of BG4 foci were located outside the telomeric regions. The identity of these non-telomeric BG4 foci was recently clarified by ChIP-seq, in which fragments of protein- and nucleosome-crosslinked human chromatin were immunoprecipitated with bead-immobilized BG4 and then deep-sequenced [15]. Of the ~10,000 BG4-bound peaks identified in immortalized human keratinocytes, 87% conformed to a G4-forming sequence. However, the number of BG4 peaks was significantly lower than predicted by in silico analysis, indicating that the presence of a G-run-containing sequence is not sufficient for G4 DNA folding in vivo and that not all G4 motifs adopt the G4 DNA conformation. This also implies that the number and distribution of G4 DNA structures could vary with the epigenetic state determined by cell type and status. Surprisingly, only 21% of the BG4 peaks fell within the strictly canonical G4 consensus sequence of G3+N1-7G3+N1-7G3+N1-7G3+, indicating that a broader definition of G4 motifs needs to be considered. Similar results were obtained when a different scFv-type antibody, D1, specific for parallel G4 DNA, was expressed from a plasmid construct in cultured human cervical carcinoma SiHa cells [16]. The stringent G4 consensus of G3+N1-7G3+N1-7G3+N1-7G3+ accounted for a minority of the D1 binding sites identified by ChIP-seq in this study. More of the D1 binding sites fell into a broader consensus allowing for longer loop lengths (G N G N G N G).
One interpretation of the overrepresentation of non-canonical G4-forming sequences is that binding of the BG4 or D1 antibody may stabilize very transient G4 DNA in vivo. Such forced stabilization of G4 structures by antibody binding could lead to a high rate of false positives in future studies using this or similar approaches. Therefore, biochemical probing to clarify the relative thermal stability of the G4 DNA/antibody complex compared with unbound G4 DNA is a critical future direction. An alternative explanation for the non-canonical G4-forming sequences identified as BG4- or D1-binding sites is that, depending on the context, loops longer than 7 nt can be stably accommodated in G4 DNA configurations in vivo and are therefore biologically relevant. In contrast to a report indicating that shorter loops (<4 nt) result in greater thermal stability of G4 DNA in vitro and a higher risk of genome instability in vivo [17], others have reported stable G4 DNA with significantly longer loops that do not conform to the definition of G4 sequence widely used in computational analyses (G3N1-7G3N1-7G3N1-7G3) [18]. Additionally, one of the two GC-rich sequence motifs identified in the promoter of the human BCL2 gene formed a stable parallel G4 DNA with a 13-nt loop [19].
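The distinction drawn above between peaks that match the strict consensus (loops of 1 to 7 nt) and peaks that fit only a looser definition can be made explicit by scanning each peak sequence twice. In the hypothetical sketch below, the relaxed loop limit of 15 nt is an arbitrary illustrative choice (large enough to admit a 13-nt loop like the one in the BCL2 promoter G4 mentioned above), and the peak sequences are synthetic. They are not data from refs [15, 16].

```python
import re
from collections import Counter


def has_g4(seq: str, max_loop: int, min_run: int = 3) -> bool:
    """True if `seq` has >= 4 G-runs (or C-runs, i.e. G-runs on the other
    strand) separated by loops of 1..max_loop nt."""
    seq = seq.upper()
    for b in "GC":
        run = f"{b}{{{min_run},}}"
        if re.search(f"{run}(?:[ACGTN]{{1,{max_loop}}}{run}){{3,}}", seq):
            return True
    return False


def classify_peak(seq: str, relaxed_loop: int = 15) -> str:
    """'strict' if the canonical G3+N1-7 definition matches, 'relaxed' if
    only the hypothetical longer-loop definition matches, else 'none'."""
    if has_g4(seq, max_loop=7):
        return "strict"
    if has_g4(seq, max_loop=relaxed_loop):
        return "relaxed"
    return "none"


# Synthetic peaks: canonical loops, one 13-nt loop, and no motif at all.
peaks = [
    "GGGTTAGGGTTAGGGTTAGGG",
    "GGG" + "ACTTACATTACAT" + "GGGAGGGAGGG",
    "ATATATATATATATATATAT",
]
counts = Counter(classify_peak(p) for p in peaks)
print({k: round(counts[k] / len(peaks), 2) for k in ("strict", "relaxed", "none")})
```

Applied to a real peak set, the "strict" fraction would correspond to the share of BG4 or D1 peaks matching G3+N1-7G3+N1-7G3+N1-7G3+, and the result depends directly on the chosen relaxed limit.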

How does the Presence of G4 DNA Regulate Transcription?

G4 DNA as the Physical Obstacle to the RNA Polymerase Complex

Compared with the large body of indirect, mostly computational, evidence implicating G4 DNA in the regulation of gene expression, studies of the mechanistic basis of G4 DNA-dependent activation or repression of transcription have been sparse. G4 DNA as a block to DNA replication has been well documented through multiple in vitro primer extension experiments or polymerase stop assays involving only the minimal components required for DNA polymerase activity [20-22]. Accordingly, as discussed below, G4 DNA-forming sequences appear as genome instability hotspots in vivo. It has therefore been postulated that G4 DNA can also block transcription by becoming a physical obstacle in the path of RNA polymerase. In vitro transcription experiments with only the minimal required proteins showed that a G4 motif located in the transcribed region can indeed impede the movement of the RNA polymerase complex [23-26]. When G4 motifs are present on either strand within the transcribed region, intramolecular G4 DNA on the template strand, as well as intermolecular G4 DNA composed of the non-template DNA strand and the nascent RNA, could form co-transcriptionally [27] and physically interfere with subsequent rounds of RNA polymerase movement (Fig. and ). Alternatively, intramolecular G4 DNA on the non-template strand could interfere with transcription by first impeding the reannealing of the two DNA strands behind the RNA polymerase complex, thereby favoring the formation and stabilization of RNA:DNA hybrids involving the template DNA strand (Fig. ) [26]. In support of this mode of transcription block, removing RNA:DNA hybrids by adding recombinant RNase H to the transcription reaction allowed efficient RNA polymerase elongation past the G4 sequence in an in vitro transcription assay.
Another piece of evidence supporting a biological function for G4 DNA near the TSS comes from the analysis of transcription pause sites in human T cells. RNA Pol II-mediated transcription proceeds through phases of initiation, elongation, and termination [28]. Between the initiation and elongation phases, Pol II undergoes pausing, which might serve as a checkpoint. Such “promoter-proximal pausing” can occur within 50 nt of the TSS in mammalian cells, and release from this paused state into a fully elongating RNA polymerase complex is considered the rate-limiting step in transcription. Corroborating in vitro transcription experiments in which G-runs blocked transcription by T7 RNA polymerase [24, 26], Eddy et al. described a correlation between G4 motifs near the TSS and promoter-proximal transcriptional pausing [29]. Another report by Du et al., however, posited a somewhat contradictory view of the role of G4 DNA located within transcribed regions [8]. Considering ~8,000 human genes identified to contain G4 motifs near the TSS, Du et al. determined that >5,000 genes contained at least one G4 motif within 500 nt downstream of the TSS. More of these downstream G4 motifs were present on the coding (non-template) strand, and they correlated with significantly higher expression levels, leading the authors to suggest a transcription-activating role for these G4 motifs. One possible mechanistic model for this correlation is that G4 folding of the non-template strand keeps the template strand single-stranded, thereby facilitating subsequent rounds of RNA polymerization [8]. However, the argument for a transcription-stimulatory role of G4 DNA in this study was partly based on increased RNA polymerase occupancy near the TSS that correlated with G4 motifs, which could alternatively be interpreted as evidence of RNA polymerase pausing.

G4 DNA as the Docking Site for Transcription Factors

As described above, a transcription block caused by a structural barrier can account for the transcriptional repression by G4 DNA located in the first intron, downstream of the TSS. However, this model does not explain how G4 DNA located 5’ to the TSS in promoter regions blocks or enhances transcription. An entirely different perspective is required to understand the function of G4 DNA as a transcription enhancer. One possible model is that G4 DNA serves as a high-affinity binding site for certain transcription factors (Fig. ). A well-established example is human SP1, a ubiquitously expressed zinc-finger transcription factor that controls the expression of many housekeeping genes. SP1 was initially characterized as a typical sequence-specific double-stranded DNA-binding protein with the minimal consensus 5’-GGGCGG-3’. However, an empirically determined SP1 binding site in the promoter of the oncogene c-KIT was shown to lack the consensus binding sequence and instead to form a G4 structure [30]. In vitro binding assays demonstrated that SP1 binds with high affinity to the G4 DNA formed by a single-stranded oligonucleotide representing the SP1 binding site in the c-KIT promoter. An analysis of genome-wide SP1-binding sites determined by ChIP-seq showed that ~36% of the SP1-occupied sites did not contain the 5’-GGGCGG-3’ consensus, whereas a majority of such sites contained one or more G4 motifs. Thus, SP1-dependent transactivation is not simply genetically determined (i.e., DNA sequence-dependent) but can depend significantly on DNA conformation and hence on the factors that affect it. Another example of a G4-binding transcription factor is the Myc-associated zinc-finger protein (MAZ), which binds the promoter of the KRAS gene [31].
The promoter of the KRAS gene contains a G4-forming sequence that coincides with a nuclease hypersensitive site and overlaps the binding site for the known transcription factor MAZ [32]. Stabilization of the G4 DNA structure favored MAZ binding at the KRAS promoter, leading to transcriptional activation, whereas point mutations disrupting the G4 DNA conformation down-regulated KRAS expression [31]. Recently, the essential general transcription factor PC4 (hSub1), previously characterized as a single-stranded DNA-binding protein [33], was also identified as a high-affinity G4 DNA-binding protein. Other structure-specific transcriptional transactivators may yet be discovered, which would shed more light on how the binding of such proteins could mediate G4 DNA-dependent activation of transcription. Nucleolin (NCL), a highly abundant and conserved protein localized largely in the nucleolus [34], is another highly specific G4 DNA-binding protein that can serve as a transcription activator. NCL is a multifunctional protein with a major role in ribosomal RNA maturation. In hematopoietic cells, it is implicated in a wide range of cellular activities, including B lymphocyte maintenance and the regulation of apoptosis and inflammation [35]. Overexpression and mislocalization of NCL are common biomarkers of a variety of cancers [36, 37]. NCL contains tandem RNA recognition motifs (RRMs) as well as multiple RGG (arginine/glycine/glycine) boxes in its C-terminal domain, both of which contribute to its high-affinity interaction with G4 DNA (Kd ≅ 1 nM) [38]. Together with hnRNPD, NCL forms the lymphocyte-specific complex LR1 (lipopolysaccharide responsive factor 1), which binds the G4 DNA-forming immunoglobulin heavy chain (IgH) switch regions [39]. When bound to the G4 DNA in the promoter of the human VEGF gene, NCL functions as a transcription activator [40], in a manner similar to SP1 and MAZ described above.
However, the significance of the NCL-G4 DNA interaction in transcription is complicated by apparently incongruous reports. NCL binds a G4 motif recently identified in the long terminal repeat (LTR) promoter of human immunodeficiency virus-1 (HIV-1) and acts as a molecular chaperone that facilitates G4 DNA folding [41, 42]. Rather than activating transcription, as in the case of its binding to the VEGF promoter, NCL binding to the HIV-1 LTR promoter represses transcription. Overexpression of NCL in human breast epithelial cells transfected with an LTR promoter reporter construct led to transcriptional silencing of the LTR promoter [42]. Conversely, siRNA-mediated depletion or aptamer-mediated disruption of NCL activated transcription from the LTR promoter. NCL similarly functions as a transcriptional repressor at the c-MYC promoter [43]. According to in vitro assays carried out with HeLa nuclear extract, binding of NCL to the G4 motif near the P1 promoter of c-MYC inhibits transcription, and NCL represses expression of a reporter gene under P1 promoter control in vivo [44, 45]. Although the mechanism by which the NCL-G4 DNA complex inhibits transcription from the P1 promoter both in vivo and in vitro is not entirely clear, the very close proximity of the G4 motif to the TSS (-142 to -115) has led to speculation that this alteration in DNA structure might interfere with the binding of other necessary transcriptional activators or impede the assembly of the transcription initiation complex.

G4 DNA and Nucleosome-free Region

A large majority of actively transcribed genes in yeast and human share a distinct nucleosome-depleted region just upstream of the TSS (reviewed in [46]). In addition, other regulatory regions of active genes (e.g., transcription factor binding sites) are relatively nucleosome-free. Comparison of nucleosome-occupied sequences in the genomes of the nematode C. elegans and human with computationally identified G4 motifs showed that potential G4 sequences were frequently found outside nucleosome-bound regions [47]. For both C. elegans and human, this correlation between G4 motifs and nucleosome-depleted regions was stronger for the subset of G4 motifs with higher expected stability (i.e., those with shorter loops between G-runs). Similar results were obtained when nucleosome occupancy was compared with the G4 motifs identified in the yeast genome [48]. Nucleosome-free regions can be separated from the protein/nucleosome-dense regions of the genome by phenol-chloroform extraction of formaldehyde-crosslinked chromatin and mapped using a technique called “formaldehyde-assisted isolation of regulatory elements with sequencing (FAIRE-seq)” [49]. In a normal human keratinocyte cell line, 98% of the G4 DNA peaks determined by ChIP with the G4-specific BG4 antibody overlapped with nucleosome-free regions identified by FAIRE-seq [15]. These results suggest that the stable formation of G4 DNA, and possibly of other non-canonical DNA secondary structures, precludes the sharp bending of DNA required for nucleosome assembly and thus serves to locally exclude nucleosomes and maintain an open conformation (Fig. ). Nucleosome exclusion by G4 DNA could be a mechanism for enhancing transcription initiation and the overall transcription rate. Alternatively, the correlation between G4 DNA and nucleosome-depleted regions could indicate that the unwinding of double-stranded DNA and its folding into the G4 conformation are hindered when the DNA is tightly wrapped around a nucleosome.

The Functional Relevance of the G4 Motifs to the Transcriptional Regulation

G4 DNA Resolution and Transcription

The ChIP-seq experiments with the G4 structure-specific antibodies HF2 and BG4 indicate that only a subset of the G4 motifs identified in silico adopts the G4 DNA conformation in vivo. In contrast to in vitro experiments with naked single-stranded oligonucleotides in buffer, in cells factors other than sequence must contribute to G4 DNA folding at a given G4 motif. If the structural transition from double-stranded B DNA to tetrahelical G4 DNA with G-G base pairing is required for transcriptional regulation, any protein that shifts the equilibrium of this transition is expected to affect the expression of genes with G4 motifs in their promoters. Several DNA helicases have been shown to unwind G4 DNA more efficiently than canonical B DNA (reviewed in [5]). In human cells, a defect in BLM, a RecQ-family 3’-to-5’ DNA helicase that recognizes and unwinds G4 structures [50], led to significant changes in the expression levels of genes with G4 motifs near the TSS [51]. This result not only reveals that G4 DNA located in promoter regions could have a regulatory role but also suggests that G4 DNA-dependent transcriptional dysregulation could be a pathogenic feature. Patients with Bloom syndrome, which is caused by autosomal recessive mutations in the BLM gene, present with symptoms such as gross developmental defects and a predisposition to leukemia and lymphoma. Similarly, in yeast cells lacking Sgs1, the homolog of BLM helicase (sgs1∆), many genes with promoter-proximal G4 motifs were selectively down-regulated [9]. WRN is another RecQ-family 3’-to-5’ DNA helicase with a biochemically demonstrated capability to unwind G4 DNA structures. Loss-of-function mutations in WRN are responsible for the premature aging disease Werner syndrome.
Comparison of gene expression patterns in fibroblasts from Werner syndrome patients with those from healthy donors showed that G4 motifs were highly enriched in genes down-regulated in patient cells [52]. These G4 motifs were enriched on both the template and non-template strands, both upstream of the TSS and in the first intron. On the other hand, genes up-regulated in Werner syndrome patient fibroblasts were generally G4-depleted upstream of the TSS. These results were corroborated by a more recent report [53]. In an independent experiment, a highly significant correlation between promoter G4 motifs and differential gene expression was also observed in fibroblasts from patients with Bloom or Werner syndrome but not from patients with Rothmund-Thomson syndrome [54]. Rothmund-Thomson syndrome results from a defect in RECQ4, a BLM/WRN-related RecQ-family DNA helicase that, unlike BLM or WRN, lacks the RQC domain that directs helicase activity to G4 DNA.

G4 DNA in Vertebrate Embryonic Development

Despite the relatively low GC content of the zebrafish genome (38.6%) compared with mammalian species such as human (46.1%) or mouse (51.2%) [55], a computational search showed significant enrichment of G4-forming sequences within 1,000 bp upstream of the TSS of zebrafish genes [56]. According to this study, 73% and 60% of the zebrafish genes containing promoter-proximal G4 motifs were shared with human and mouse, respectively, suggesting that the G4 motifs in these genes were maintained by positive selection for their functional roles. Of the 926 zebrafish genes identified with promoter-proximal G4 motifs, 120 had previously been characterized as involved in various developmental pathways. When the G4 motifs were cloned into a reporter plasmid, the G4 sequence stimulated expression of the reporter gene, in contrast to the G4-dependent transcriptional repression proposed by earlier work. Injection of zebrafish embryos with the G4-binding ligand TMPyP4 lowered the expression of the selected G4-containing genes tested. According to in vitro circular dichroism spectroscopy, the interaction with TMPyP4, which can stabilize or destabilize G4 structures depending on the target [57], made these particular G4 motifs less stable [56]. This contrasts with many other studies described in this review, in which TMPyP4 regulates transcription of genes with promoter G4 motifs by stabilizing the G4 DNA conformation. Similar transcriptional down-regulation was observed when embryos were injected with antisense oligonucleotides specific to the promoter-proximal G4 motifs of the col2a1, fzd5, and nog3 genes. These antisense oligonucleotides (ASOs) were designed to base-pair with the G-run-containing strand and thereby inhibit G4 DNA folding. Overall, these results point to G4 DNA-mediated transcriptional activation in zebrafish.
Underscoring the significance of the G4 DNA-mediated gene regulation in the embryonic development of the fish, each ASO targeting the G4 motif present at col2a1, fzd5 or nog3 gene led to a defect in body morphology, eye development, or head morphology, respectively.

G4 DNA in Stress Response in Plants

In plants, G4 DNA is thought to be involved in the transcriptional response to various environmental stress conditions (reviewed in [58]). In Arabidopsis thaliana, a large number of genes are differentially expressed under drought conditions, and 45% of these drought-responsive genes contain at least one G4 motif [59]. When G4 motifs in the genome of Zea mays (maize) were identified using the stringent definition (G3+N1-7G3+N1-7G3+N1-7G3+) with a maximum loop size of 7 nt, ~24% of all expressed genes in this species contained at least one G4 motif [60]. The distribution pattern of G4 motifs within genes was similar to that described for yeast and metazoan species; regions near the TSS were significantly enriched for G4 motifs, with the highest density within 1 to 100 nt downstream of the TSS. The G4 motifs downstream of the TSS were strongly biased toward the template (antisense) strand, suggesting that they could regulate transcription by blocking RNA polymerase movement (Fig. ). In addition, the presence of multiple G4 motifs near the TSS correlated with maize genes that are highly conserved in four other related pan-grass species, indicating that these G4 motifs were evolutionarily conserved because of their functional role. Many of the G4 motif-containing genes fell into categories such as energy signaling, sugar metabolism, hypoxia, and response to DNA damage. A more comprehensive survey of the genomes of 15 plant species found that consensus G4-forming sequences with runs of three or more consecutive guanines are very frequent in the genomes of monocot species, including Setaria italica (millet) and Oryza sativa (rice), two of the most important food crops [61]. The distribution pattern was consistent with that found in A. thaliana [59]; G4 motifs were enriched in promoter regions and relatively depleted in coding regions [61].
Moreover, gene ontology analysis showed that many of the orthologous genes containing conserved G4 motifs were involved in several distinct biological pathways, namely reproductive development, ion transmembrane transport, and regulation of gene expression. Although empirical evidence is still lacking, the computational analyses described above suggest that G4-mediated transcriptional regulation is likely to be conserved in these plant species.
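Whether a G4 motif lies on the template or the non-template strand depends on two things: the reference strand that carries the G-runs and the orientation of the gene. The bookkeeping behind the maize strand-bias result can be sketched as follows, assuming that motifs are reported on the reference (top) strand as either G-run ('G') or C-run ('C') hits, each with a single representative coordinate. The function names are hypothetical, and the default 1 to 100 nt window simply mirrors the maize analysis above.

```python
from collections import Counter
from typing import Iterable, Tuple


def g4_strand_role(gene_strand: str, motif_base: str) -> str:
    """Return 'non-template' or 'template' for a motif detected as G-runs
    ('G') or C-runs ('C') on the reference (top) strand.

    For a '+' gene the top strand is the non-template (coding) strand, so
    top-strand G-runs lie on it; for a '-' gene the roles are swapped.
    """
    if gene_strand not in ("+", "-") or motif_base not in ("G", "C"):
        raise ValueError("gene_strand must be '+'/'-', motif_base 'G'/'C'")
    g4_on_top = motif_base == "G"        # G4 would fold from the top strand
    coding_is_top = gene_strand == "+"
    return "non-template" if g4_on_top == coding_is_top else "template"


def position_from_tss(pos: int, tss: int, gene_strand: str) -> int:
    """Signed distance along the direction of transcription (positive = downstream)."""
    return pos - tss if gene_strand == "+" else tss - pos


def downstream_strand_bias(motifs: Iterable[Tuple[int, str]], tss: int,
                           gene_strand: str,
                           window: Tuple[int, int] = (1, 100)) -> Counter:
    """Count template vs non-template motifs located within `window`
    (inclusive) nt downstream of the TSS."""
    lo, hi = window
    counts: Counter = Counter()
    for pos, base in motifs:
        if lo <= position_from_tss(pos, tss, gene_strand) <= hi:
            counts[g4_strand_role(gene_strand, base)] += 1
    return counts


# Toy gene on '+' with its TSS at 1,000; the motif at 900 is upstream and ignored.
motifs = [(1020, "C"), (1080, "C"), (1050, "G"), (900, "G")]
print(downstream_strand_bias(motifs, tss=1000, gene_strand="+"))
# Counter({'template': 2, 'non-template': 1})
```

Summing these per-gene counts over all expressed genes would give the kind of template-strand bias reported for maize. The single-coordinate simplification ignores motifs that straddle the window boundary.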

Significance of G4-mediated Gene Regulation and Proto-oncogenes

G4 motifs have also been identified in the promoter regions of many important proto-oncogenes (reviewed in [62]). One example is c-MYC, which, as described above, contains a G4 motif near the P1 promoter, from which the majority of transcription is initiated [43, 63]. This 27-nt sequence conforms to the strict definition of a G4-forming sequence and supports the formation of very stable G4 structures with three stacked G-tetrads. Concurrently with the identification of the G4 motif at the P1 promoter, Siddiqui-Jain et al. demonstrated G4-dependent regulation of the P1 promoter using two Burkitt’s lymphoma cell lines with different c-MYC translocations [43]. Stabilization of G4 DNA by TMPyP4 treatment significantly down-regulated c-MYC transcription in the cell line with an intact P1 promoter but not in the cell line in which the P1 promoter was lost through translocation. Conversely, a single G-to-A mutation in this P1-associated G4 motif that disrupts the G4 conformation elevated c-MYC transcription. Together, these data support a model in which stable G4 DNA silences the P1 promoter of c-MYC, possibly by excluding certain transcription factors from binding to this region. Multiple putative G4 motifs have been identified in the promoter of KRAS, whose expression can be repressed by treatment with the G4-stabilizing ligand TMPyP4 [64]. Another proto-oncogene, BCL2, contains multiple G4 motifs in its promoter region [65, 66] as well as in the region corresponding to the 3’ UTR [67], which coincides with the major breakpoint region (MBR) where a large number of leukemia-associated interchromosomal translocations are found. Putative G4 DNA-forming sequences have also been identified in the promoters of VEGF, HIF1α, RET, and c-KIT [62]. When the density of G4 motifs was compared between 55 tumor suppressor genes and 95 proto-oncogenes, a distinct pattern was uncovered.
The G4-forming potential is significantly lower at tumor suppressor genes and higher at proto-oncogenes, which suggests possible evolutionary selection of G4 motifs according to gene function [68].
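A density comparison of this kind rests on a sequence rule for what counts as a putative G4 motif. The sketch below is a hypothetical illustration, not the method used in [68]: it assumes the widely used pattern of four runs of at least three guanines separated by loops of 1 to 7 nt, counts non-overlapping matches on both strands (a C-run motif on the given strand is a G-run motif on the complementary strand), and reports motifs per kilobase. The function names are illustrative, and the two example sequences are the pilE-associated motif and the BCL2 MBR sequence quoted elsewhere in this review.

```python
import re

# Assumed motif rule (illustrative, not taken from [68]): four runs of >= 3 G
# separated by loops of 1-7 nt of any base; lazy loops keep matches short.
G4_MOTIF = re.compile(r"G{3,}[ACGTN]{1,7}?G{3,}[ACGTN]{1,7}?G{3,}[ACGTN]{1,7}?G{3,}")
# The same rule on the complementary strand appears as C-runs on this strand.
C4_MOTIF = re.compile(r"C{3,}[ACGTN]{1,7}?C{3,}[ACGTN]{1,7}?C{3,}[ACGTN]{1,7}?C{3,}")


def count_g4_motifs(seq: str) -> tuple[int, int]:
    """Count non-overlapping putative G4 motifs as (given strand, complementary strand)."""
    seq = seq.upper()
    on_given = sum(1 for _ in G4_MOTIF.finditer(seq))
    on_complement = sum(1 for _ in C4_MOTIF.finditer(seq))
    return on_given, on_complement


def motifs_per_kb(seq: str) -> float:
    """Putative G4 motifs on both strands per 1,000 nt of sequence."""
    if not seq:
        raise ValueError("empty sequence")
    return 1000 * sum(count_g4_motifs(seq)) / len(seq)


pile_motif = "GGGTGGGTTGGGTGGG"                    # N. gonorrhoeae pilE motif
bcl2_mbr = "GAAGGAGGGCAGGAGGGCTCTGGGTGGGTCTGT"    # BCL2 MBR G-run sequence
print(count_g4_motifs(pile_motif))  # (1, 0)
print(count_g4_motifs(bcl2_mbr))    # (1, 0)
print(motifs_per_kb(pile_motif))    # 62.5
```

Because matching is non-overlapping, a cluster of closely spaced G-runs counts as a single motif here; published tools differ in how they treat overlaps and loop lengths, so absolute densities depend on the rule chosen.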

The Clinical Application of G4 DNA

Overall, the expression of a large number of oncogenes appears to be regulated by G4 DNA. Moreover, transcription of these oncogenes is primarily repressed by the presence of a stable G4 DNA either at the promoter or at the 5’ end of the transcribed region. This finding advanced the idea that small-molecule G4-stabilizing ligands could be promising tools to modulate oncogene expression and control cancer growth. TMPyP4 is a cationic porphyrin with a macrocyclic ring structure that binds to and stabilizes G4 DNA [57, 69, 70]. As already described, this compound has been used extensively to show that genes with G4 motifs in their promoters can be modulated by a G4 ligand that stabilizes the otherwise transient G4 structure. Although its photosensitivity and acute cytotoxicity make TMPyP4 unsuitable for clinical application, derivatives of this macrocyclic porphyrin are being developed as drugs targeting cancers that overexpress c-MYC [71]. Another G4 ligand, telomestatin, is a natural compound that binds selectively to G4 DNA and whose anti-proliferative activity has been demonstrated in several studies (reviewed in [57]). The potential clinical application of G4-binding chemicals is illustrated by the recent finding that the G4 ligand pyridostatin (PDS) down-regulates expression of the important DNA repair gene Brca1 in post-mitotic rat neurons [72]. PDS treatment also sharply increased DNA breaks and, compounded by the diminished repair capacity due to Brca1 down-regulation, induced severe neurotoxicity. Once future research reveals the precise mechanisms and targets of G4-mediated gene regulation, the genotoxic and transcriptional-repression effects of various G4 ligands could be exploited synergistically to eliminate cancer cells. Another way G4 DNA is being targeted in cancer treatment is through G4 DNA-binding proteins.
Nucleolin (NCL), whose interaction with G4 DNA at the c-MYC promoter is described above, is often overexpressed and mislocalized in a variety of cancers [36]. In addition to c-MYC, high-affinity binding of NCL to the G4 motifs present at other oncogenes, including VEGF and RET, has been verified in vitro [73]. The prominence of NCL in cancer biology has led to multiple approaches to targeting this protein in anti-cancer therapeutics, some of which are currently in clinical trials [74, 75]. AS1411, a 26-nt oligonucleotide (5’-GGTGGTGGTGGTTGTGGTGGTGGTGG-3’), is an aptamer that folds into a very stable G4 DNA structure in vitro and interacts with high affinity with the RNA-binding domain and the RGG repeats of NCL [74, 76]. AS1411 has been shown to limit the growth of cancer cells in cell culture experiments, animal models, and clinical trials. The mechanism by which the interaction between AS1411 and NCL leads to a cytotoxic effect in cancer cells is not yet clear. Reports indicate that AS1411 treatment inhibits the proper formation of protein complexes involving NCL and protein arginine methyltransferase 5 [77]. Interference with NCL by AS1411 or siRNA reduced BCL2 protein levels in breast or glial cancer cell lines overexpressing NCL [78, 79]. However, the effect on BCL2 levels in these cases appears to be independent of transcriptional regulation by the G4 motifs in the BCL2 promoter and instead related to the normal binding of NCL to BCL2 mRNA, which increases the mRNA half-life. A further complication in the story of AS1411 is that NCL can function as either an enhancer or a repressor of transcription. When AS1411 was added to human breast cancer cells transfected with a construct designed to measure transcription from the HIV-1 LTR promoter, transcription from the LTR promoter increased sharply, consistent with the report that the LTR promoter is repressed by the binding of NCL to G4 DNA [42].
Interfering with the interaction between NCL and G4 DNA in the promoters of several oncogenes would be expected to derepress transcription of these genes, which seems counter to the goal of limiting cancer cell proliferation. Uncovering the precise mechanism by which AS1411-mediated dysfunction of NCL leads to cancer cell death will require further work, including a comprehensive examination of its effect on oncogenes with promoter-proximal G4 motifs.

TRANSCRIPTION-ASSOCIATED GENOME INSTABILITY AND G4 DNA

In the human genome, the number of G4 DNA-forming loci in vivo, as determined by chromatin immunoprecipitation (ChIP) with the G4-specific antibody BG4, was far below the number predicted by computational sequence analysis [15]. The number of BG4-binding genomic sites also varied with cell type and conditions, with significantly more BG4 foci in immortalized HaCaT human keratinocytes than in normal, non-immortalized keratinocytes. The number of BG4-binding sites increased when chromatin packaging was relaxed by treating cells with an HDAC inhibitor. The elevation in genome-wide transcription due to either immortalization or HDAC-inhibitor treatment apparently shifted the equilibrium toward G4 DNA. Taken together, these data indicate that the level of transcription is an important factor regulating the transition from canonical double-stranded DNA to G-G base-paired G4 DNA. When folded into the G4 DNA configuration, the G-rich strand becomes a replication block in vitro and endangers genome stability. G4 DNA-associated genome instability, then, should be closely tied not only to the presence of a G4 motif but also to the level and orientation of transcription through that sequence. DNA secondary structures in general, and G4 DNA in particular, have been extensively reported as causes of genome instability [80]. Research into the mechanisms underlying the genome instability induced by triplex- or H-DNA-forming GAA trinucleotide repeats, hairpin-forming CAG repeats, and G4 DNA-forming G-rich sequences has advanced considerably through genetic experiments in the model eukaryote S. cerevisiae [81-85]. The cytotoxic effects of DNA strand breaks and genome rearrangements induced by G4-stabilizing ligands such as TMPyP4, pyridostatin or PhenDC3 have also been extensively reported [86-88].
In this section, I focus specifically on the role transcription plays in converting relatively innocuous G-run-containing DNA sequences into highly unstable genomic loci.

Transcription-associated Genome Instability and non-B DNA Secondary Structures

Highly transcribed regions pose obstacles to DNA replication and are, thus, hotspots of genome instability [89, 90]. In yeast, the natural pause sites of DNA polymerase complexes coincide with actively transcribed loci [91], and elevating the level of transcription leads to a corresponding increase in mutation and recombination rates [92, 93]. The formation of non-B DNA secondary structures is a prominent element of the multi-pronged mechanism underlying this transcription-associated genome instability. Melting of the DNA duplex and the ensuing negative supercoiling, both obligatory elements of transcription, create conditions conducive to the folding of DNA strands into secondary structures. For G4 DNA-forming sequences, in vitro experiments using a G4 sequence incorporated into a circular plasmid demonstrated that transcription-associated negative superhelical tension can facilitate the transition of a guanine-rich DNA strand into the G4 conformation [94]. Direct evidence of co-transcriptional G4 DNA formation came from a study of G-rich immunoglobulin (Ig) switch-region sequences cloned into a plasmid: a distinct bubble-like structure containing G4 DNA was visible under the electron microscope only when the plasmid was transcribed in vitro by T7 RNA polymerase [95]. These bubbles consisted of an RNA:DNA hybrid on one side and G4 DNA on the other. The presence of the RNA:DNA hybrids and of G4 DNA was verified by sensitivity to RNase H treatment and by binding of a recombinant, truncated version of the high-affinity G4-binding protein NCL, respectively.
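The negative supercoiling invoked in this section can be stated quantitatively with the standard linking-number description of topologically constrained DNA. The block below is a textbook sketch, not an analysis taken from [94] or [95]. It assumes the usual helical repeat of relaxed B-DNA, h ≈ 10.5 bp per turn; the domain size N = 3000 bp and the linking deficit ΔLk = -17 are hypothetical values chosen only to illustrate the arithmetic.

```latex
\begin{aligned}
\mathrm{Lk} &= \mathrm{Tw} + \mathrm{Wr},
  \qquad \mathrm{Lk}_0 \approx \frac{N}{h}, \quad h \approx 10.5\ \text{bp/turn} \\
\sigma &= \frac{\mathrm{Lk} - \mathrm{Lk}_0}{\mathrm{Lk}_0} = \frac{\Delta\mathrm{Lk}}{\mathrm{Lk}_0} \\
\mathrm{Lk}_0 &\approx \frac{3000\ \text{bp}}{10.5\ \text{bp/turn}} \approx 285.7
  \qquad (\text{hypothetical } N = 3000\ \text{bp}) \\
\sigma &\approx \frac{-17}{285.7} \approx -0.06
  \qquad (\text{hypothetical } \Delta\mathrm{Lk} = -17)
\end{aligned}
```

In the twin-supercoiled-domain picture, DNA behind an advancing RNA polymerase acquires σ < 0 and DNA ahead of it acquires σ > 0. Within a constrained domain, a linking deficit that no topoisomerase removes can be absorbed partly by local unwinding (a decrease in Tw), which is why negative supercoiling favors strand separation and, with it, G4 folding on a G-rich strand or R-loop formation.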

G4 DNA as the Genome Instability Hotspot

Despite the plethora of in vitro studies of G4 DNA folding and stability, whether G4 DNA forms in vivo stably enough to disrupt DNA replication has long been debated, but supporting evidence is fast emerging. In 2002, it was reported that a mutation in the C. elegans dog-1 gene, which encodes a DEAH DNA helicase related to mammalian FANCJ, resulted in greatly elevated deletions throughout the genome [96]. The loci affected by the dog-1 mutation uniformly contained guanine runs capable of folding into G4 DNA, suggesting a model in which failure to resolve G4 DNA structures in vivo has significant consequences for genome maintenance. In the human genome, besides the telomeres, ribosomal DNA arrays, chromosomal fragile sites, and G/C-rich micro- or mini-satellites, G4 motifs are enriched at the chromosomal translocation breakpoints associated with various types of cancer. A recent survey of 19,947 translocations and 46,365 deletions in cancer genomes showed a highly significant enrichment of G4 sequences within 500 bases of the breakpoints [97], supporting the earlier conclusion that G4 DNA is a major contributing factor to oncogenic transformation [98]. For blood cancers, nearly 70% of the genes involved in recurrent chromosomal rearrangements contain potential G4 DNA sequences. The BCL2 gene is involved in several types of chromosomal translocation, the most prominent of which is t(14;18), frequently associated with follicular lymphoma. A large majority of the breaks in the BCL2 gene cluster within a 150-nt region corresponding to the 3’ UTR, referred to as the major breakpoint region (MBR). The G-run-containing sequence identified within the MBR (5’-GAAGGAGGGCAGGAGGGCTCTGGGTGGGTCTGT-3’) assumes the G4 DNA conformation in vitro and, when cloned into a plasmid, stalls Taq DNA polymerase [67].
For the HOX11 gene, which is involved in the t(10;14) translocations found in T-cell leukemia, two G4 DNA-forming sequences capable of stalling replication and transcription in vitro (5’-GCGCGAGGGAGGGGAGGGGAGGGGGAGAGG-3’ and 5’-AGAAGGGGGAGGGGAGGGAGAGAGGGGGCGCCG-3’) were confirmed at the breakpoint cluster regions [99]. Within the TCF3 (E2A) gene, involved in the common t(1;19) translocations associated with acute lymphoblastic leukemia, the translocation breakpoints coincide with a high density of G4 motifs [21]. The multiple G4 motifs identified at this breakpoint region vary in putative loop size and thus in their potential to form stable G4 DNA. Several oligonucleotides representing the TCF3 G4 motifs were shown to form G4 DNA and to block primer extension by a purified recombinant DNA polymerase in vitro. When a 470-bp fragment of the TCF3 translocation breakpoint region encompassing multiple G4 motifs was integrated into the yeast genome, it induced chromosomal rearrangements in a transcription-dependent manner.
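The breakpoint survey cited above [97] asks whether G4 motifs lie within 500 nt of translocation and deletion breakpoints more often than expected. A minimal sketch of that kind of test follows; it is not the published pipeline. It assumes an illustrative motif rule (four runs of at least three guanines separated by 1 to 7 nt loops), checks both strands by also searching for the equivalent C-run pattern, and compares breakpoints with uniformly random background positions. The function names and the synthetic sequence are hypothetical.

```python
import random
import re

# Illustrative motif rule (an assumption): four G-runs of >= 3 separated by 1-7 nt loops.
G4_MOTIF = re.compile(r"G{3,}[ACGTN]{1,7}?G{3,}[ACGTN]{1,7}?G{3,}[ACGTN]{1,7}?G{3,}")
# The same motif on the opposite strand shows up as C-runs on this strand.
C4_MOTIF = re.compile(r"C{3,}[ACGTN]{1,7}?C{3,}[ACGTN]{1,7}?C{3,}[ACGTN]{1,7}?C{3,}")


def has_g4_near(seq: str, pos: int, flank: int = 500) -> bool:
    """True if a putative G4 motif (either strand) lies entirely within pos +/- flank."""
    window = seq[max(0, pos - flank): pos + flank].upper()
    return bool(G4_MOTIF.search(window) or C4_MOTIF.search(window))


def fraction_near_g4(seq: str, positions: list[int], flank: int = 500) -> float:
    """Fraction of positions that have a putative G4 motif within the flanking window."""
    return sum(has_g4_near(seq, p, flank) for p in positions) / len(positions)


def background_fraction(seq: str, n: int, flank: int = 500, seed: int = 0) -> float:
    """Same statistic for n uniformly random positions, as a naive null expectation."""
    rng = random.Random(seed)
    positions = [rng.randrange(len(seq)) for _ in range(n)]
    return fraction_near_g4(seq, positions, flank)


# Synthetic example: one G4 motif embedded in otherwise G-poor sequence.
genome = "A" * 5000 + "GGGTGGGTTGGGTGGG" + "A" * 5000
breakpoints = [5008, 5300, 1000]
print(fraction_near_g4(genome, breakpoints))   # 0.6666666666666666
print(background_fraction(genome, n=1000))     # background rate for comparison
```

A uniform random background is only a naive baseline; a real analysis would also need to account for local base composition and other confounders before calling an enrichment significant.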

Instability at a G-rich Sequence is Elevated in a Transcription-dependent Manner

Unlike the single-stranded oligonucleotides in solution used to demonstrate G4 DNA formation, DNA in the genome is in a stable double-stranded configuration maintained by hydrogen bonds between the two complementary strands. The most significant obstacle to the G-G base pairing that holds the G-quartets together in G4 DNA is the G-C base pairing of double-stranded DNA. The predominant model has been that G4 DNA folding is favored during DNA replication, when the necessary single-stranded DNA becomes transiently available through the strand separation that precedes synthesis (Fig. ) [13]. Other studies, including my own, have focused on whether transcription can provide the opportunity for G4 DNA formation and thereby exacerbate genome instability (Fig. ). The guanine-rich non-transcribed strand (NTS) of the repetitive, G/C-rich immunoglobulin heavy-chain (IgH) switch region in mammalian genomes assembles into G4 DNA when transcribed [95]. A 770-bp fragment of the mouse Ig switch Mu region (Sμ) containing ~20 repeats of the (GAGCT)n GGGGT motif was integrated into the yeast genome, and the rate of gene conversion initiating at this construct was measured when it was transcribed or silenced [100]. This fragment was inserted either in the physiological orientation (-GTOP) or in the inverted orientation (-GBTM), placing the guanine runs on the NTS or on the transcribed strand (TS), respectively. Activated transcription stimulated gene conversion events initiating at the Sμ fragment up to 7-fold, with a clear strand bias. A significantly greater increase in recombination rate was observed when the guanine-rich strand was the NTS, as in the GTOP orientation, and was therefore transiently single-stranded during transcription. When located on the TS, the guanine runs would be engaged in base pairing with the nascent RNA rather than available for the G-G interactions necessary for G4 DNA folding.
Similar genome instability dependent on the level and orientation of transcription was observed when a 470-bp fragment of the TCF3 translocation breakpoint region containing multiple G4 motifs was integrated into the yeast genome [21]. This strand bias and the dependence on active transcription strongly indicate that transcription stimulates genome instability at G4 motifs by generating single-stranded DNA predisposed to fold into secondary structures.
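The strand-bias logic in these reporter experiments depends only on which strand carries the G-runs relative to the direction of transcription. The sketch below makes that bookkeeping explicit under stated assumptions: the input is the non-transcribed (coding) strand written 5'→3' in the direction of transcription, the transcribed strand is taken as its reverse complement, a "G-run" is any run of three or more guanines, and the toy insert built from the (GAGCT)n GGGGT repeat unit is hypothetical, not the 770-bp Sμ fragment used in [100]. All function names are illustrative.

```python
import re

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
_G_RUN = re.compile(r"G{3,}")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T/N only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def count_g_runs(seq: str) -> int:
    """Number of runs of three or more guanines on the strand as written."""
    return len(_G_RUN.findall(seq.upper()))


def g_rich_strand(nts: str) -> str:
    """Report which strand carries more G-runs.

    `nts` is the non-transcribed (coding) strand, written 5'->3' in the
    direction of transcription; the transcribed (template) strand is its
    reverse complement.
    """
    on_nts = count_g_runs(nts)
    on_ts = count_g_runs(reverse_complement(nts))
    if on_nts > on_ts:
        return "NTS"
    if on_ts > on_nts:
        return "TS"
    return "neither"


# Hypothetical toy insert assembled from the switch-region repeat unit.
toy_insert = ("GAGCT" * 2 + "GGGGT") * 4

# Physiological (GTOP-like) orientation: G-runs on the NTS.
print(g_rich_strand(toy_insert))                      # NTS
# Inverted (GBTM-like) orientation: same insert flipped, G-runs on the TS.
print(g_rich_strand(reverse_complement(toy_insert)))  # TS
```

Counting G-runs is only a crude proxy for G4-forming potential, but it captures the point of the GTOP/GBTM comparison: inverting the insert leaves the sequence unchanged while moving the G-runs from the strand that is transiently single-stranded to the strand paired with the nascent RNA.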

Top1 and G4 DNA-associated Genome Instability

The impact of transcription on the dynamics of G4 DNA folding is illustrated by the importance of Top1 in the locus-specific elevation of genome instability at G4 motifs. Top1 is a type IB topoisomerase with a highly conserved role in removing the topological stress produced during transcription [101, 102]. Transcription requires strand separation, which generates positive supercoils ahead of and negative supercoils behind the RNA polymerase complex [103]. Eukaryotic Top1 is largely responsible for relieving such transcription-associated torsional stress. When torsional stress associated with transcription accumulates, the equilibrium between DNA conformations can shift toward alternative structures such as G4 DNA and other non-B DNA, including hairpins, cruciform DNA, and triplex DNA (H-DNA) (reviewed in [104]). Although several reports noted that genome stability in general is not severely affected in yeast cells lacking Top1 [105-107], the stability of genomic loci prone to forming DNA secondary structures is acutely compromised by the loss of Top1 activity. In addition to promoting DNA secondary structures, the accumulation of negative torsional stress behind the RNA polymerase complex can lead to prolonged and stable association of the nascent RNA with the template DNA strand, forming RNA:DNA hybrids or R-loops [108, 109]. While extensive and stable R-loop formation affords an opportunity for the non-transcribed strand to assume non-B secondary structures, sequestration of the non-transcribed strand in a stable non-B DNA conformation makes it unavailable for pairing with the complementary transcribed strand and thus stabilizes the R-loop. R-loops and non-B DNA are each favored by the negative helical tension produced by transcription and, in turn, mutually stabilize each other [89].
Accordingly, Top1-regulated maintenance of topological homeostasis in transcribed regions is most consequential where there is potential to form non-B structures such as G4 DNA. When a sequence with multiple guanine runs is transcribed, failure to remove the transcription-associated helical stress is expected to shift the equilibrium toward G4 DNA formation and thereby elevate genome instability. In budding yeast, recombination initiating at a guanine-run-containing sequence was elevated when the sequence was transcribed [100] and significantly exacerbated in the absence of Top1, supporting the model of topology-driven, transcription-associated instability [21, 110, 111]. Transcription activation and Top1 disruption stimulated recombination only when the guanine runs were on the non-transcribed (non-template) strand; the effect of Top1 loss was minimal when the direction of transcription was reversed so that the guanine runs were on the transcribed (template) strand, likely in stable base pairing with the nascent transcript [21, 100]. At the actively transcribed G4 sequence, overexpression of E. coli topA, but not of gyrase, reduced recombination [111]. Because bacterial topA removes only negative supercoils and gyrase removes only positive supercoils, these results indicate that it is the Top1-catalyzed removal of negative supercoils that prevents G4 DNA formation. Further supporting a model in which Top1 reduces the adverse effect of G4 DNA on genome maintenance, loss of Top1 significantly increased yeast cell sensitivity to the G4-stabilizing ligand TMPyP4 [112]. In addition to indirectly suppressing G4 DNA formation by relieving transcription-associated negative supercoils, Top1 can bind to G4 DNA with high affinity [113-115].
This property of Top1 may explain why a mutation of the catalytic tyrosine (Y727F in yeast Top1) elevates G4-associated genome instability more than complete deletion of the TOP1 gene does [111]. Because the Top1-Y727F protein retains its DNA-binding function, its unexpectedly deleterious effect on genome stability at G4 motifs may be due to the high-affinity interaction between Top1 and G4 DNA. Top1 is a historically important target of anti-cancer therapy, so defining how the G4-binding properties of Top1 affect genome instability is an important issue to address. One interesting direction for further research, suggested by the specific interaction of Top1 with G4 DNA, is the feasibility of developing G4-forming oligonucleotides as Top1 inhibitors that might be effective in combination therapy with G4 ligands. Co-transcriptionally formed G4 DNA is subject to both positive and negative regulation through interaction with additional factors. First, factors that resolve the G4 DNA structure to restore the double-stranded B-DNA configuration will effectively suppress genome instability induced by co-transcriptionally formed G4 DNA. DNA helicases that can efficiently unwind G4 DNA are canonical examples of such negative regulators [5]. Sub1, a general transcription factor and single-stranded DNA-binding protein, is a newly identified G4 DNA-binding factor in yeast that specifically interacts with co-transcriptionally formed G4 DNA and helps to suppress genome instability [33, 112]. The human homolog of Sub1, formerly referred to as PC4 and now as hSub1, functionally complements the loss of yeast Sub1 by suppressing the elevated recombination at the highly transcribed G4 sequence [112].
Physical interaction with G4 DNA, as well as interaction with the G4 DNA-unwinding helicase Pif1, is required for Sub1 to suppress the genome instability associated with the actively transcribed G4 sequence, leading us to hypothesize that the main role of Sub1 is to promote recruitment of Pif1 helicase to co-transcriptionally formed G4 DNA. For the aforementioned Top1-Y727F mutant protein, interaction with co-transcriptionally formed G4 DNA has the opposite effect to that of Sub1 and aggravates genome instability, possibly by stabilizing the G4 structure and creating a strong protein-DNA complex that acts as a roadblock to DNA polymerases [111]. Similarly, interaction of the ubiquitous G4 DNA-binding protein nucleolin with G4 DNA aggravates genome instability at the actively transcribed G4 motif (S. Singh and N. Kim, unpublished results).

G4 DNA and Immunoglobulin Class Switch Recombination

The transformation of G-rich sequences into G4 DNA plays an important role in critical molecular changes required for B lymphocyte development. Immunoglobulin (Ig) genes encoding antigen receptor proteins or antibodies undergo several molecular rearrangements during B lymphocyte development: V(D)J Recombination, Somatic Hypermutation (SHM) and Class Switch Recombination (CSR) [116]. CSR is activated at the cellular level by encounter with a cognate antigen and is initiated at the molecular level by AID (Activation-Induced Cytidine Deaminase)-catalyzed conversion of cytosine to uracil in switch-region sequences (e.g., Sµ, Sγ, Sα) [117]. The overall outcome of CSR is recombination between two switch sequences with deletion of the intervening sequence. Functionally, CSR changes the constant-region domain of the antibody, which interacts with downstream effectors, while the antigen-recognizing variable region remains unaltered. A defect in CSR (e.g., due to mutated AID) manifests as a hyper-IgM syndrome characterized by heightened susceptibility to opportunistic infections [118, 119]. In addition to being required for CSR, Ig switch-region sequences are involved in recurrent chromosomal translocations observed in a large fraction of human multiple myelomas and Burkitt’s lymphomas [120, 121]. A germline promoter lies upstream of each switch-region sequence [122], and its activation produces sterile (non-protein-coding) transcripts required for CSR. Transcription through the switch regions, rather than the specific germline promoters, is what matters for CSR, since transcription from heterologous promoters is sufficient to support CSR [123].
Elevated topological strain due to activated transcription through the switch regions serves two putative functions: first, to target AID, a single-stranded DNA-specific deaminase, more efficiently to cytosines on both strands of the switch region; and second, to increase R-loop formation and stabilize the single-stranded character of the non-transcribed strand. Extensive R-loop formation at activated switch regions in B lymphocytes, as well as during in vitro transcription, has been reported [124-126]. Switch-region sequences, although unique to each region, are invariably guanine-rich and have the potential to form G4 DNA, with the G-rich sequences located asymmetrically on the non-transcribed strand [95, 127]. When these sequences are transcribed either in vitro or in E. coli cells, complex structures consisting of non-transcribed-strand G4 DNA and a hybrid between the transcribed strand and the nascent RNA are visible by electron microscopy [95]. Transcription of the switch regions in antigen-activated B cells will likely also promote both R-loop and G4 DNA formation. Recent reports of the involvement of Top1 in regulating CSR corroborate the model in which transcription-driven topological stress is necessary for CSR. These studies showed that Top1 expression is down-regulated in B lymphocytes undergoing CSR [128] and that further depletion of Top1 using siRNA significantly elevates CSR [129]. As described earlier, disruption of Top1 activity, combined with active transcription from the switch-region germline promoters, would lead to even greater accumulation of negative torsional stress, stimulating R-loop and G4 DNA formation. However, whether co-transcriptionally formed G4 DNA at the switch region simply stabilizes the R-loop or plays a more significant role in CSR is still debated.
One proposed function for G4 DNA in the CSR process is to serve as a platform for proteins necessary for the synapsis of two switch regions [130]. The MutSα complex, which normally functions in the recognition of mismatched DNA base pairs, associates with the switch region in activated B cells by binding G4 DNA and can bridge two G4 DNA-containing DNA fragments, holding them close to each other.

G4 DNA and the Antigenic Variation in Microbial Pathogens

Co-transcriptional formation of G4 DNA is thought to play an important role in host-pathogen interactions. Several microorganisms have been shown to take advantage of programmed recombination to continually change the amino acid sequence of certain surface-exposed immunogenic proteins. This process, termed antigenic variation (Av), is reminiscent of the recombination and mutagenesis at Ig loci that diversify antibodies, and it allows the pathogen to evade recognition by the host adaptive immune system. In the human pathogen Neisseria gonorrhoeae, the single expressed cassette encoding the pilin protein (pilE) is modified through RecA-dependent gene-conversion recombination using multiple silent pilin cassettes located upstream [131, 132]. The genetic tractability of this organism allowed the identification of a 16-nt G4 motif (5’-GGGTGGGTTGGGTGGG-3’) located just upstream of the pilE transcription start site that is required for Av recombination to proceed. Mutations that would disrupt the putative G4 DNA structure severely diminished Av efficiency, indicating that the ability to fold into G4 DNA is required for the recombination to occur. Further work showed that transcription of the region containing this G4 motif, specifically in the orientation that places the G-runs on the non-transcribed strand, is also necessary for efficient Av [132]. The proposed mechanism is very similar to that of Ig CSR: transcription in this orientation would generate an RNA:DNA hybrid on the C-rich transcribed strand, which in turn facilitates folding of the G-rich non-transcribed strand into G4 DNA. Recombination is then thought to initiate through a yet uncharacterized step that could involve a G4 DNA-specific endonuclease.
In Borrelia burgdorferi, the bacterium that causes Lyme disease, the sequence of VlsE, a surface-exposed protein of unknown function, is modified while the bacteria are maintained in the bloodstream of the mammalian host [133]. This modification and the resulting immune evasion appear to be very similar in molecular mechanism to those uncovered in N. gonorrhoeae: a single expressed cassette is modified through one or more rounds of gene-conversion recombination using one of many silent cassettes as the template [134, 135]. Furthermore, putative G4 motifs were identified within the transcribed VlsE gene in the regions flanking the recombination-prone variable region [136]. G4 motifs are also present in the VlsE genes of two other related Borrelia species, with a distinct strand bias placing the G-runs on the non-transcribed strand. Although genetic or molecular evidence demonstrating their requirement in the recombination process has yet to come forward, the conservation and strand bias of the G4 motifs in the VlsE gene strongly suggest a role for co-transcriptionally formed G4 DNA in mediating Av-driving recombination in Borrelia species.

CONCLUSION AND PERSPECTIVES

Among the various non-B DNA structures recently investigated as significant instigators of genome instability, G4 DNA is uniquely implicated as a key element of deliberate, regulated recombination programs such as vertebrate Ig CSR and bacterial Av. As reviewed here, findings from yeast genetic assays using G4-forming sequences embedded in recombination reporter constructs, together with the molecular details uncovered about the mechanisms of CSR and Av recombination, revealed the critical role of transcription in transforming G4 motifs into genome instability hotspots. G-run-containing sequences manifest as genome instability hotspots when local topological stress, generated by transcription and exacerbated by disruption of Top1 activity, stimulates the folding of G4 DNA as well as the accumulation of R-loops. These steps, from transcription activation to changes in the genome that are deleterious or advantageous to cells, are likely to have significant clinical implications. For the genetic changes underlying the transformation of cells into tumors, the role of transcription in initiating chromosomal translocations at recurrent breakpoints containing G4 motifs has not yet been clarified, except at the Ig switch regions, and should be investigated further. The link between transcription and genome instability is also relevant to neurological disorders linked to unstable repetitive DNA sequences. For Huntington’s disease, the role of transcription in promoting the pathogenic expansion of hairpin-forming (CAG)n trinucleotide repeats is strongly supported by experiments in cultured cells [137, 138]. A similar mechanism is expected for the expansion of (CGG)n trinucleotide repeats associated with Fragile X syndrome [139] or (GGGGCC)n hexanucleotide repeats associated with Amyotrophic Lateral Sclerosis (ALS) [140].
Both (CGG)n and (GGGGCC)n repeats presumably form G4 DNA in vivo. Independent of the key role played by transcription in G4 DNA-mediated genome instability, the role of G4 DNA as a significant regulator of transcription has been under intense scrutiny. Until recently, the concept that gene expression is differentially regulated when double-stranded, guanine-run-containing DNA transitions to G-G base-paired G4 DNA was based chiefly on computational analyses of genome sequences. In silico analyses consistently demonstrated that G4-forming sequences are notably enriched near gene promoters and transcription start sites. Additional support for the concept came from studies of G4 ligands or G4-resolving helicases that modulate transcription. Many questions about the mechanistic details of how a non-genetic change in DNA conformation alters gene expression remain unresolved and are complicated by some inconsistent results. Important remaining questions with clinical implications include whether G4 DNA regulates transcription directly (e.g., via nucleosome exclusion) or indirectly (e.g., via interaction with transcriptional repressors or activators). In this review, two distinct approaches to exploiting the role of G4 DNA in transcriptional regulation for clinical application were described. First, the many small-molecule G4-binding ligands variably affect the genome stability and transcription of G4 motif-containing loci. Better understanding of the exact targets of these chemicals could lead to improved anti-cancer therapy that synergistically combines down-regulation of DNA repair with elevated DNA breakage.
Second, targeting proteins such as nucleolin and topoisomerase I with G-rich aptamers exploits the high-affinity binding of these factors to G4 DNA and will benefit from further study of the binding specificities and downstream functions of G4-bound nucleolin or topoisomerase I. Recent advances in the field, such as new tools to investigate G4 DNA in vivo (e.g., G4-specific antibodies) and the identification of novel G4 DNA-interacting proteins, are expected to soon provide more answers and further advances in the clinical application of G4 DNA.