Literature DB >> 22016335

CpG deamination creates transcription factor-binding sites with high efficiency.

Tomasz Zemojtel1, Szymon M Kielbasa, Peter F Arndt, Sarah Behrens, Guillaume Bourque, Martin Vingron.   

Abstract

The formation of new transcription factor-binding sites (TFBSs) has a major impact on the evolution of gene regulatory networks. Clearly, single nucleotide mutations arising within genomic DNA can lead to the creation of TFBSs. Are molecular processes inducing single nucleotide mutations contributing equally to the creation of TFBSs? In the human genome, a spontaneous deamination of methylated cytosine in the context of CpG dinucleotides results in the creation of thymine (C → T), and this mutation has the highest rate among all base substitutions. CpG deamination has been ascribed a role in silencing of transposons and induction of variation in regional methylation. We have previously shown that CpG deamination created thousands of p53-binding sites within genomic sequences of Alu transposons. Interestingly, we have defined a ∼30 bp region in Alu sequence, which, depending on a pattern of CpG deamination, can be converted to functional p53-, PAX-6-, and Myc-binding sites. Here, we have studied single nucleotide mutational events leading to creation of TFBSs in promoters of human genes and in genomic regions bound by such key transcription factors as Oct4, NANOG, and c-Myc. We document that CpG deamination events can create TFBSs with much higher efficiency than other types of mutational events. Our findings add a new role to CpG methylation: We propose that deamination of methylated CpGs constitutes one of the evolutionary forces acting on mutational trajectories of TFBSs formation contributing to variability in gene regulation.

Entities:  

Mesh:

Substances:

Year:  2011        PMID: 22016335      PMCID: PMC3228489          DOI: 10.1093/gbe/evr107

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Introduction

CpG Deamination and Evolution of Transcription Factor–Binding Sites

The creation of transcription factor–binding sites (TFBSs) is a fundamental evolutionary process shaping gene regulatory networks. Two mechanisms have been recognized to facilitate generation of new TFBSs: insertional activity of transposable elements resulting in spreading of TFBSs (Bourque 2009) and mutational events. Some insight into the robustness of the latter mechanism has been gained in the context of evolutionary studies reporting widespread turnover of TFBSs in mammalian genomes (Dermitzakis and Clark 2002; Horvath et al. 2007; Odom et al. 2007; Kasowski et al. 2010). However, up to this point, the unequal contribution of molecular processes inducing single nucleotide mutational events in DNA has not been taken into account when deciphering evolutionary trajectories leading to creation of TFBSs. In vertebrates, the mutation of cytosine to thymine (C → T) in the context of CpG dinucleotides has the highest rate among all base substitutions and is reflected in the ongoing genomic depletion of CpG dinucleotides. On the molecular level, this is triggered by methylation of cytosine followed by a spontaneous deamination event creating TpG or CpA dinucleotides (Razin and Riggs 1980). We have previously shown that methylation and deamination of CpGs embedded within Alu transposons in the human genome resulted in generation of thousands of p53-binding sites with the preferred core motif composed of CpA and TpG dinucleotides (Zemojtel et al. 2009). We reasoned that this phenomenon should not be limited to Alu sequences and thus were expecting to gather evidence that CpG deamination has created multiple binding sites in multiple regions of the human genome. Second, as CpG deamination–driven mutagenesis is characterized by the highest rate among all substitutions, we wanted to learn if it is more efficient in creating binding sites than the nondeamination-driven mutagenesis. How can we define a proper creation efficiency measure in this context? Here, we address these issues and provide evidence that CpG deamination is indeed the important evolutionary process impacting on TFBS formation.

Materials and Methods

Ancestral Sequence Reconstruction

We reconstructed human and chimp ancestral sequences extending a model developed by Arndt and Hwa (2005). Initially, we downloaded multiple alignments of vertebrate genomes to hg18 human genome from the University of California–Santa Cruz (UCSC) Genome Bioinformatics Site. In the analysis, we considered only alignments which contained aligned sequences of all three species: human, chimp, and rhesus. We extracted alignment fragments overlapping the putative promoter regions and the in -vivo bound regions. We inferred the nucleotide substitution frequencies along the two terminal branches from the humanchimp ancestor to human and chimp as well as the trinucleotide distribution for the humanchimp ancestor using the maximum likelihood-based method developed by Duret and Arndt (2008). The nucleotide substitution frequencies were then used to compute the transition probabilities of any trinucleotide at the humanchimp ancestor into a trinucleotide in human and likewise in chimp. Given the states of 5′ and 3′ adjacent nucleotides in human and their orthologous partners in chimp and rhesus, we could then infer the ancestral nucleotide distribution of the nucleotide in the middle position. We reconstructed ancestral nucleotides according to this distribution.

Prediction of TFBS

Position frequency matrices (PFMs) available in the Transfac 9.4 database as well as de novo profiles constructed from ChIP-Seq data (Celniker et al. 2009; Kunarso et al. 2010) were used to model binding of transcription factors.

Predictions with PFMs

We predicted binding sites using the method described by Rahmann et al. (2003). First, based on a positional frequency matrix, we calculated a corresponding score matrix and a cutoff threshold. Next, we scanned both strands of a DNA sequence and checked whether it contained a motif of a score at least equal to the threshold. We normalized the thresholds for each matrix expecting the same number of false positives within studied sequence regions.

Predictions with IUPAC Consensus Sequence

In order to identify binding sites of c-Myc transcription factor in genome-wide ChIP-Seq regions (Zeller et al. 2006), we used a consensus-match strategy. For a site prediction, we demanded a perfect match to one of the two c-Myc consensuses: CACGTG (canonical motif) and CACATG or CATGTG (noncanonical).

Identification of Potential Creation Events

We scanned the ancestral sequences in order to identify “potential creation events,” that is, all sequence locations where a single nucleotide mutation could lead to creation of a new TFBS. Specifically, the latter is achieved by enumerating all possible single nucleotide mutations in the ancestral sequence and then checking whether a new binding site is predicted for the mutated sequence.

Estimation of Number of TF Recognizing Products of Deamination

Based on PFMs, we estimated the fraction of TFs recognizing products of deamination. For each of 217 Transfac PFMs, we calculated how many CA or TG dinucleotides will be on average present in the predicted motifs, using the formula: ∑i {f[ i,C] * f[ i + 1, A] + f[ i,T] * f[i + 1, G]}, where f(i, N) denotes the frequency of observation of nucleotide N at position i within the PFM. We observed that on average at least 85% of PFMs recognize a motif with CA or TG dinucleotides.

Classification of Putative Promoter Regions

We defined putative promoter regions based on the gene coordinates provided by EnsEMBL database (version 47). After Weber et al. (2007), we extracted regions from 700 bp upstream to 200 bp downstream around transcription start sites of the genes. We classified the promoters into two groups: high CpG set (HCP) and low CpG set (LCP) depending on their sequence composition. We adopted a classification introduced by Weber et al. (2007). That is, a promoter is added to the HCP if it contains a fragment of 500 bp with GC content larger than 0.55 and ratio of observed to expected CpG dinucleotides higher than 0.75. Moreover, if the ratio is never larger than 0.48 for any 500 bp fragment of a promoter, it is classified in the LCP.

Annotation of Known TF-Binding Motifs in Putative Promoter Regions

Due to redundancies observed in the Transfac database, we used a subset of available vertebrate PFMs. For each transcription factor which has more than a single PFM assigned, we chose only the PFM with the highest information content. This selection process resulted in a set of 217 PFMs used in our study. We annotated this set of TF-binding motifs in the putative promoter regions.

Predicted Creation Events in the LCP and HCP Promoters

We provide two text files listing predicted TFBS creation events in the LCP and HCP promoters (supplementary text file S1 and S2, Supplementary Material online). The data columns contain the following information: name of a transcription factor, genomic location of the new site in the human genome (hg18), the reconstructed ancestral sequence at the location, the corresponding sequence in human hg18 genome, a list of differences/mutations between the ancestral sequence and the corresponding human sequence, and the list of the differences that are interpreted as deamination events. Finally, the last column describes potential single mutation events, which when occurred in the ancestral sequence could result in a binding site. Events marked with a star are interpreted as deamination events.

Analyses of In Vivo Bound Sequences of Oct-4, Nanog, Ctcf, and c-Myc

Genome-wide regions bound in human embryonic stem cells by Oct-4, Nanog, and Ctcf TFs have been identified in the ChIP-Seq experiments described in Kunarso et al. (2010). In total, respectively, 2.1 Mbase, 7.6 Mbase, and 7.5 Mbase of sequence alignable to chimp and rhesus genomes have been studied (supplementary table 2, Supplementary Material online). Genome-wide regions bound by the c-Myc TF in a model human B cell line, P493 (Zeller et al. 2006) have been downloaded from the UCSC database (table wgEncodeGisChipPetMycP493 for the human genome version hg18). We studied only regions identified by at least three independent ChIP-PET tags. It has been demonstrated in 29 ChIP-qPCR assays that clusters associated with at least three ChIP-PET tags (PET-3+) always corresponded to Myc binding (Zeller et al. 2006). In total, we analyzed 1168 regions, which represented 1.26 Mbp of sequence alignable to chimp and rhesus genomes.

Analyses of In Vivo Bound Sequences of TFs from ENCODE Consortium Data Set

We downloaded from the UCSC genome browser database the genome wide–binding locations of TFs identified in conducted by the ENCODE consortium ChIP-Seq experiments (in human cell lines) (Celniker et al. 2009). We used our pipeline to calculate binding site creation efficiencies for these TFs. In order to reduce estimation error of creation efficiencies, we selected only those experiments for which the number of identified deamination-driven creation events or nondeamination-driven creation events was at least 5. This resulted in evolutionary analysis of nine TF-binding data sets (wgEncodeHudsonalphaChipSeqPeaks tracks: Rep2Gm12878Irf4, Rep1Gm12878Nrsf, Rep2Gm12878Pax5n19, Rep2Gm12878Pbx3, Rep2Hepg 2RxraPcr1x, Rep1K562Usf1, and wgEncodeYaleChIPseqPeaks tracks: K562Nfya, K562Nfyb, and K562bYy1). The characteristics of the ENCODE ChIP-Seq regions are reported in (supplementary table 2, Supplementary Material online). We used MEME algorithm to identify motifs recognized by the TFs. The identified motifs were matched by these previously published in the Transfac database (Transfac ID numbers: Irf4: M00972, Nfy: M00209, NRSF: M01028, Pax5: M00143, Pbx3: M00998, Rxra: M00518, Usf1: M00121, and YY1: M00069).

Results/Discussion

Identification of TFBS Creation Events

In order to reveal the role of CpG deamination in TFBS formation, we were interested in tracing the mutational events that led to the creation of TFBSs in the human promoters after the split of human and chimp. Thus, we reconstructed the sequences of the humanchimp ancestor promoters employing multiple sequence alignments of human, chimp, and rhesus (see Material and Methods). In this approach, rhesus constitutes an outgroup to infer the directionality of mutations. Using 217 TF matrices obtained from the TRANSFAC database (constituting a diverse repertoire of motifs recognized by transcription factors, see Materials and Methods), we annotated binding sites in the human promoters and in the reconstructed humanchimp ancestor promoters (see Material and Methods). Of all annotated TFBSs in the human promoters, ∼96.5% also get annotated at the orthologous positions in the ancestral sequence. The remaining ∼3.5% (48861) of sites comprises a group of candidates for TFBSs created in the human promoters since the split of human and chimp lineages. Out of those putative creation events, the vast majority (∼96%) was due to a single nucleotide mutation event and thus we focused on such creation events in our further analysis. We further subdivided the single nucleotide creation events into two groups. These are, likely “CpG deamination–driven creation events” encompassing CpG → CpA or TpG mutational events, accounting for ∼22% of all single nucleotide creation events and “nondeamination-driven creation events” harboring all other mutational events.

High Efficiency of CpG Deamination–Driven Creation Events in Human Promoters

On the first look, the CpG deamination–driven creation events might seem to constitute a rather modest fraction of all creation events specific to the human lineage. However, in order to interpret this result, the following has to be taken into account. The number of TFBSs created by CpG deamination events after the split of human and chimp lineages depends primarily on the following two factors: The mutational rate of CpGs. The number and the composition of the CpG-containing motifs residing in the humanchimp ancestral sequence. In connection with point 2, it has to be emphasized that CpG dinucleotides are underrepresented in the mammalian genomes, including promoter regions (Saxonov et al. 2006). We have thus set out to derive a new measure of a propensity of CpG deamination events to create binding sites that account for above factors. First, building on our observation that the vast majority (96%) of creation events in the human lineage were single nucleotide mutation events and point 2 above, we annotated all single nucleotide mutation events that could potentially lead to creation of a TFBS within the reconstructed humanchimp ancestral sequence. Any such mutation event, we call a “potential TFBS creation event.” Depending on their origin, we distinguish two classes of potential TFBS creation events: potential CpG deamination–driven creation events (defined as mutations of Cs within CpGs) and potential nonCpG deamination–driven creation events (encompassing all the remaining mutations). Clearly, only a fraction of all annotated potential creation events in the ancestral sequence occurred after the split of human and chimp lineages. We compute this fraction by introducing the measure of creation efficiency (CE). We define CEdeam for CpG deamination–driven creation events as the ratio of the number of CpG deamination–driven creation events that occurred after the split of human and chimp to the number of all potential CpG deamination–driven creation events in the ancestral sequence. Analogously, we define CEnondeam for nonCpG deamination–driven creation events. How efficient are CpG deamination–driven creation events in comparison with nondeamination–driven creation events? In order to answer this question, we compute for each of 217 TFs (represented by a set of 217 TF matrices) two types of CE: CEdeam and CEnondeam. In addition, we determine average CEdeam and CEnondeam values over all 217 TFs. We performed the calculation separately on two distinct classes of human promoters (Saxonov et al. 2006) characterized by low (LCPs) and high contents of CpG dinucleotide (HCPs). Importantly, LCPs were shown to be mostly methylated, whereas the vast majority of HCPs are hypomethylated in somatic cells (Weber et al. 2007). As depicted in figure 1, the average CEdeam value in LCPs was ∼0.04 meaning that on average 4 of 100 potential CpG deamination–driven creation events took place in LCPs. This was ∼9-fold higher than in HCPs where the average CEdeam value was ∼0.005. In contrast, the average CEnondeam value was approximately equal in HCPs and LCPs (fig. 1). In LCPs, CpG deamination–driven creation events occurred ∼28-fold more frequently than nondeamination-driven creation events, and in HCPs, we obtained a value of ∼3.4-fold enrichment. These data are compatible with a ∼20.4-fold higher rate of CpG mutation (into products of deamination CpA or TpG) in LCPs when compared with HCPs (not shown). In line with these observations, it has been established that in germ line cells (the cells contributing genetic material to the next generation), CpGs are constitutively methylated in LCPs and, in contrast, CpGs are largely unmethylated in HCPs (Weber et al. 2007).
F

CpG deamination creates TFBSs with high efficiency in human promoters. Each point represents a CE value calculated for each of 217 TFs. The horizontal axis represents CE values recorded in HCPs, and the vertical axis represents CE values obtained in LCPs. Average CE values for CpG deamination–driven creation events (CEdeam) and nondeamination–driven creation events (CEnondeam) are highlighted in bold red and blue digits, respectively.

CpG deamination creates TFBSs with high efficiency in human promoters. Each point represents a CE value calculated for each of 217 TFs. The horizontal axis represents CE values recorded in HCPs, and the vertical axis represents CE values obtained in LCPs. Average CE values for CpG deamination–driven creation events (CEdeam) and nondeamination–driven creation events (CEnondeam) are highlighted in bold red and blue digits, respectively. These results strongly indicate that CpG deamination–driven depletion of CpG dinucleotide, which has been recognized to contribute to variation in regional methylation (Feinberg and Irizarry; Kerkel et al. 2008), serves as an efficient mechanism for generation of new TFBSs.

CpG Deamination Creates In Vivo Binding Sites

Since the evolutionary analysis of binding sites annotated in the human promoter regions revealed a remarkable efficiency of CpG deamination–driven creation events, we were interested if the same could be observed for TFBSs detected in “genome-wide” experiments. Interestingly, we found that >85% of 217 TF-binding matrices recognize on average at least one product of deamination, CpA or TpG (Materials and Methods) and reasoned that a large number of TFs is capable of interacting with these dineuclotides in vivo. We identified 13 TF-binding data sets generated in genome-wide ChIP-Seq experiments in human cells (in large part produced by the ENCODE consortium; Celniker et al. 2009, table 1) for which we could reliably estimate creation efficiencies (see Materials and Methods). The 13 TFs belong to the following major classes: helix-loop-helix/leucine zipper, helix-turn-helix/homeodomain/Paired box/Tryptophan clusters, Cys2His2 zinc finger domain, Cys4 zinc finger of nuclear type, and histone fold class. The values of CEdeam for the 13 TFs were ∼4-fold to ∼25-fold higher (and on average ∼11-fold higher) than the corresponding values of CEnondeam (table 1). The latter values fall in the range observed for the annotated binding sites in the promoter regions (fig. 1). On average, per TF, CpG deamination–driven creation events comprised ∼25% of all binding site creation events (table 1). This all indicated that CpG deamination events strongly contributed to creation of in vivo TFBSs.
Table 1

Numbers of Created In Vivo Binding Sites and Corresponding Creation Efficiencies

Factord+d−Nond+Nond−CEdeamCEnondeamCEdeam/CEnondeamP value% of d+
c-Myc_canonical51871340500.0260420.0032008.09.4E−427.8
c-Myc_noncanonical2697621117060.0259480.00179114.53.4E−1755.3
Ctcf9033568321995480.0261170.0041526.35.2E−409.8
Irf4247693271361330.0302650.00239612.61.9E−186.8
Nanog70972246917940.0671780.00267225.12.2E−11122.2
Nfya21120750279060.0171010.0017889.67.1E−1329.6
Nfyb22141770342350.0152880.0020417.59.4E−1223.9
Nrsf51951042070.0250000.00237110.53.5E−433.3
OCT43338299241230.0795180.00403919.72.4E−2925.0
Pax58393212441466970.0088260.0016615.31.4E−2925.4
Pbx31060935152170.0161550.0022947.06.1E−622.2
Rxra26129983552980.0196220.00149813.14.0E−1923.8
Usf152506272177010.0931900.0151346.22.1E−2316.0
YY18135427192030.0058740.0014044.21.3E−322.9

NOTE.—The following values are listed: the numbers of observed new site creations due to CpG deamination events (d+) and nondeamination events (nond+); the numbers of potential deamination and nondeamination-driven creation events which could lead to site creation but have not been observed (d− and nond−, respectively); creation efficiencies for deamination-driven creation events (CEdeam), nondeamination-driven creation events (CEnondeam), and percentages of deamination-driven creation events (% of d+). The P value was calculated with Fisher’s exact test.

Numbers of Created In Vivo Binding Sites and Corresponding Creation Efficiencies NOTE.—The following values are listed: the numbers of observed new site creations due to CpG deamination events (d+) and nondeamination events (nond+); the numbers of potential deamination and nondeamination-driven creation events which could lead to site creation but have not been observed (d− and nond−, respectively); creation efficiencies for deamination-driven creation events (CEdeam), nondeamination-driven creation events (CEnondeam), and percentages of deamination-driven creation events (% of d+). The P value was calculated with Fisher’s exact test. In the following, we illustrate the significance of CpG deamination–driven creation events with an evolutionary analysis of in vivo TFBSs of such key transcription factors as c-Myc, Nanog, Oct4, and Ctcf. It is important to note that these factors contain CpA and TpG dinucleotides as part of their binding motifs (fig. 2; supplementary figs. 1 and 2, Supplementary Material online). c-Myc is known to bind to two different E-box motifs composed of CpA, TpG, and CpG dinucleotides: the so called canonical Myc E-box 5′-CACGTG-3′ and noncanonical Myc E-box 5′-CACATG-3′/5′-CATGTG-3′ (Zeller et al. 2006). Our analysis of in vivo c-Myc sites detected in a chromatin immunoprecipitation with the paired-end ditag experiment (ChIP-PET) (Zeller et al. 2006) revealed that CpG deamination–driven creation events make up as much as 56% of noncanonical and 28% of canonical c-Myc site creation events, and as expected, they are localized at positions 2, 3, 4, and 5 within c-Myc motifs (fig. 2 and table 1). The values of CEdeam for canonical and noncanonical c-Myc sites were 8- and 14-fold higher, respectively, than the values of CEnondeam. Likewise, the strong contribution of CpG deamination events was also observed for c-Myc sites annotated in the two classes of human promoters (supplementary table 1, Supplementary Material online). In LCPs, CpG deamination–driven creation events constituted as much as 78% of all noncanonical c-Myc site creation events, and the ratio of the CEdeam and CEnondeam values was ∼51.
F

CpG deamination drives creation of in vivo TFBSs. (A) Upper panel: Histogram of c-Myc- and Oct4-binding sites created via single nucleotide mutation events. The presence of the CpA and TpG dinucleotides within individual binding sites is highlighted by a violet background. Lower panel: Histogram of single nucleotide mutation events leading to creation of TFBSs. CpG deamination–driven creation events and non-CpG deamination–driven creation events are depicted in red and blue, respectively. (B) PAX-6-, c-Myc-, and p53-binding sites are created via deamination of methylated CpGs in Alu transposons. R-AluS/J_cons: consensus sequence of the CpG-containing region in the right arm of AluS and AluJ subfamilies.

CpG deamination drives creation of in vivo TFBSs. (A) Upper panel: Histogram of c-Myc- and Oct4-binding sites created via single nucleotide mutation events. The presence of the CpA and TpG dinucleotides within individual binding sites is highlighted by a violet background. Lower panel: Histogram of single nucleotide mutation events leading to creation of TFBSs. CpG deamination–driven creation events and non-CpG deamination–driven creation events are depicted in red and blue, respectively. (B) PAX-6-, c-Myc-, and p53-binding sites are created via deamination of methylated CpGs in Alu transposons. R-AluS/J_cons: consensus sequence of the CpG-containing region in the right arm of AluS and AluJ subfamilies. Moreover, we found further evidence suggesting that CpG deamination created Myc E-box sites. Interestingly, a recent study reported the identification of a canonical Myc E-box site in AluS sequences (Wang et al. 2009). Likewise, we identified noncanonical Myc E-box sites to reside in sequences of AluS transposons (fig. 2). Our literature searches identified experimental studies in which two canonical Myc E-box sites within AluJ and AluS elements inserted in the second intron of the CDC25A gene (Galaktionov et al. 1996) and in the distal promoter region of the KIR gene (Cichocki et al. 2009), respectively, were shown to function as Myc-responsive elements. Provocatively, we found that all these canonical and noncanonical c-Myc sites are located in one particular location in AluS/J, which overlaps with the p53-binding site previously described by us (Zemojtel et al. 2009) (fig. 2). The reconstructed consensus sequences of AluS and AluJ subfamilies contain the CGCGCG sequence at the location corresponding to the Myc-binding site. Thus, we conclude that like p53 sites, they were also created via CpG deamination after the insertion of Alu transposons into the genome. Specifically, two and three CpG deaminations are required to create from a CGCGCG template the canonical and the noncanonical c-Myc sites, respectively. Interestingly, it has been reported that methylation of the CpG dinucleotide present in the canonical E-box inhibits Myc binding both in vitro (Prendergast et al. 1991) and in vivo (Perini et al. 2005). In light of this, spontaneous deamination of the methylated CpG in a canonical Myc-binding site would result in the creation of a noncanonical-binding site, which is desensitized to methylation. Likewise, evolutionary analysis of TFBSs of Oct4, Nanog, and Ctcf detected in human ES cells (Kunarso et al. 2010) also pointed to a strong contribution of CpG deamination–driven creation events. For Oct4, Nanog, and Ctcf, we obtained, respectively, a 24-, 22-, and 6-fold enrichment of CpG deamination–driven creation events when compared with nondeamination–driven creation events (table 1). CpG deamination events constituted 25%, 22%, and 10% of all creation events for Oct4, Nanog, and Ctcf, respectively (table 1 and fig. 2; supplementary fig. 3, Supplementary Material online). For example, it can be seen that CpG deamination–driven creation events were abundant at positions corresponding to nucleotides 4, 8, 9, and 12, where CA and TG dinucleotides are present within the Oct4-binding motif (fig. 2, supplementary fig. 1, Supplementary Material online). We provide here experimental evidence supporting the notion that single nucleotide mutations from CpG to TpG (resulting from CpG deamination) such as seen at position 9 in figure 2 (bottom panel) can create functional Oct4-binding motifs. Recently, an SNP at position 9 in a putative Oct4 motif (as referenced in fig. 2) was discovered in a patient with Beckwith–Wiedemann syndrome (Demars et al. 2010). The mutation from T to C occurred in the context of a TpG dinucleotide and resulted in the creation of a CpG dinucleotide in the patient sequence; WT: GTTTGAGATGCTAAT → P: GTTTGAGACGCTAAT. The study used in vitro GEMSA (Gel Electrophoretic Mobility Shift Assay) with nuclear extract from Oct4-overexpressing cells to provide evidence that Oct4 did not bind to the sequence variant containing CpG (P) but only to the one having a TpG instead of CpG. Together, these results highlight the efficiency of CpG deamination events in the creation of TFBS. Originally, it has been postulated that a key role of CpG methylation and deamination is in the inactivation of transposons and thus in protecting mammalian genomes from their insertional activity (Yoder et al. 1997; Zemach and Zilberman 2010). One-third of all CpGs in the human genome are located within Alu transposons. Over time, CpG deamination permanently inactivates initially CpG-rich Alus. As a side effect of this process, decaying Alu sequences give birth to new regulatory elements. In particular, the CpG-rich ∼20 nt long template sequence residing in Alu elements (fig. 2), can be converted via means of deamination into p53 (Zemojtel et al. 2009), PAX-6 (Zhou et al. 2002), and c-Myc TFBSs as documented here (fig. 2). This phenomenon is not limited to these three TFs and Alu transposons. By analyzing mutational events leading to creation of TFBSs in human promoters (217 TFBSs) and in genome-wide regions bound in vivo (14 TFBSs), we document that CpG deamination events create TFBSs with much higher efficiency than other single nucleotide mutational events. In a recent study reporting genome-wide locations of Ctcf sites in human genome, it has been noticed that orthologous Ctcf-binding sequences in vertebrate genomes accumulated at a very high rate C → T mutations at the position where the CpG dineuclotide is located in the Ctcf-binding motif (position 15 as referenced in the supplementary fig. 2, Supplementary Material online) (Kim et al. 2007). This observation is compatible with CpG deamination as a driving process. In light of this, it is tempting to speculate that CpG deamination might in fact constitute a double-edged sword involved not only in creating but also in inactivating binding sites. In this context, we propose that CpG deamination, which is known to induce variable regional methylation in human populations, constitutes an evolutionary benefit facilitating innovation in gene regulation.

Supplementary Material

Supplementary figures 1–3, Supplementary tables 1 and 2, and Supplementary files 1 and 2 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).
  27 in total

1.  Methylation and deamination of CpGs generate p53-binding sites on a genomic scale.

Authors:  Tomasz Zemojtel; Szymon M Kielbasa; Peter F Arndt; Ho-Ryun Chung; Martin Vingron
Journal:  Trends Genet       Date:  2008-12-26       Impact factor: 11.639

2.  Genomic surveys by methylation-sensitive SNP analysis identify sequence-dependent allele-specific DNA methylation.

Authors:  Kristi Kerkel; Alexandra Spadola; Eric Yuan; Jolanta Kosek; Le Jiang; Eldad Hod; Kerry Li; Vundavalli V Murty; Nicole Schupf; Eric Vilain; Mitzi Morris; Fatemeh Haghighi; Benjamin Tycko
Journal:  Nat Genet       Date:  2008-06-22       Impact factor: 38.330

3.  Novel PAX6 binding sites in the human genome and the role of repetitive elements in the evolution of gene regulation.

Authors:  Yi-Hong Zhou; Jessica B Zheng; Xun Gu; Grady F Saunders; W-K Alfred Yung
Journal:  Genome Res       Date:  2002-11       Impact factor: 9.043

Review 4.  Cytosine methylation and the ecology of intragenomic parasites.

Authors:  J A Yoder; C P Walsh; T H Bestor
Journal:  Trends Genet       Date:  1997-08       Impact factor: 11.639

5.  Analysis of the IGF2/H19 imprinting control region uncovers new genetic defects, including mutations of OCT-binding sequences, in patients with 11p15 fetal growth disorders.

Authors:  Julie Demars; Mansur Ennuri Shmela; Sylvie Rossignol; Jun Okabe; Irène Netchine; Salah Azzi; Sylvie Cabrol; Cédric Le Caignec; Albert David; Yves Le Bouc; Assam El-Osta; Christine Gicquel
Journal:  Hum Mol Genet       Date:  2009-12-09       Impact factor: 6.150

6.  Association of Myn, the murine homolog of max, with c-Myc stimulates methylation-sensitive DNA binding and ras cotransformation.

Authors:  G C Prendergast; D Lawe; E B Ziff
Journal:  Cell       Date:  1991-05-03       Impact factor: 41.582

7.  A c-Myc regulatory subnetwork from human transposable element sequences.

Authors:  Jianrong Wang; Nathan J Bowen; Leonardo Mariño-Ramírez; I King Jordan
Journal:  Mol Biosyst       Date:  2009-07-21

8.  Evolution of transcription factor binding sites in Mammalian gene regulatory regions: conservation and turnover.

Authors:  Emmanouil T Dermitzakis; Andrew G Clark
Journal:  Mol Biol Evol       Date:  2002-07       Impact factor: 16.240

9.  Divergent evolution of human p53 binding sites: cell cycle versus apoptosis.

Authors:  Monica M Horvath; Xuting Wang; Michael A Resnick; Douglas A Bell
Journal:  PLoS Genet       Date:  2007-06-15       Impact factor: 5.917

10.  The impact of recombination on nucleotide substitutions in the human genome.

Authors:  Laurent Duret; Peter F Arndt
Journal:  PLoS Genet       Date:  2008-05-09       Impact factor: 5.917

View more
  14 in total

1.  Cytosine Methylation Affects the Mutability of Neighboring Nucleotides in Germline and Soma.

Authors:  Vassili Kusmartsev; Magdalena Drożdż; Benjamin Schuster-Böckler; Tobias Warnecke
Journal:  Genetics       Date:  2020-02-20       Impact factor: 4.562

Review 2.  Nuclear function of Alus.

Authors:  Chen Wang; Sui Huang
Journal:  Nucleus       Date:  2014-02-04       Impact factor: 4.197

Review 3.  Early developmental conditioning of later health and disease: physiology or pathophysiology?

Authors:  M A Hanson; P D Gluckman
Journal:  Physiol Rev       Date:  2014-10       Impact factor: 37.312

Review 4.  The theory of RNA-mediated gene evolution.

Authors:  Kevin V Morris
Journal:  Epigenetics       Date:  2015-01-27       Impact factor: 4.528

Review 5.  Roles of transposable elements in the regulation of mammalian transcription.

Authors:  Raquel Fueyo; Julius Judd; Cedric Feschotte; Joanna Wysocka
Journal:  Nat Rev Mol Cell Biol       Date:  2022-02-28       Impact factor: 113.915

6.  Interactions of chromatin context, binding site sequence content, and sequence evolution in stress-induced p53 occupancy and transactivation.

Authors:  Dan Su; Xuting Wang; Michelle R Campbell; Lingyun Song; Alexias Safi; Gregory E Crawford; Douglas A Bell
Journal:  PLoS Genet       Date:  2015-01-08       Impact factor: 5.917

7.  L1Base 2: more retrotransposition-active LINE-1s, more mammalian genomes.

Authors:  Tobias Penzkofer; Marten Jäger; Marek Figlerowicz; Richard Badge; Stefan Mundlos; Peter N Robinson; Tomasz Zemojtel
Journal:  Nucleic Acids Res       Date:  2016-10-18       Impact factor: 16.971

8.  Transposable Elements and DNA Methylation Create in Embryonic Stem Cells Human-Specific Regulatory Sequences Associated with Distal Enhancers and Noncoding RNAs.

Authors:  Gennadi V Glinsky
Journal:  Genome Biol Evol       Date:  2015-05-07       Impact factor: 3.416

9.  The destiny of the resistance/susceptibility against GCRV is controlled by epigenetic mechanisms in CIK cells.

Authors:  Xueying Shang; Chunrong Yang; Quanyuan Wan; Youliang Rao; Jianguo Su
Journal:  Sci Rep       Date:  2017-07-03       Impact factor: 4.379

10.  Functional characterization of enhancer evolution in the primate lineage.

Authors:  Jason C Klein; Aidan Keith; Vikram Agarwal; Timothy Durham; Jay Shendure
Journal:  Genome Biol       Date:  2018-07-25       Impact factor: 13.583

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.