Shi-Qing Mao1, Sergio Martínez Cuesta1,2, David Tannahill1, Shankar Balasubramanian1,2,3. 1. Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Cambridge CB2 0RE, U.K. 2. Department of Chemistry, University of Cambridge, Cambridge CB2 1EW, U.K. 3. School of Clinical Medicine, University of Cambridge, Cambridge CB2 0SP, U.K.
Abstract
Cytosine methylation is an important epigenetic mark, but how the distinctive patterns of DNA methylation arise remains elusive. For the first time, we systematically investigated how these patterns can be imparted by the inherent enzymatic preferences of mammalian de novo DNA methyltransferases in vitro and the extent to which this applies in cells. In a biochemical experiment, we subjected a wide variety of DNA sequences to methylation by DNMT3A or DNMT3B and then applied deep bisulfite sequencing to quantitatively determine the sequence preferences for methylation. The data show that DNMT3A prefers CpG and non-CpG sites followed by a 3'-pyrimidine, whereas DNMT3B favors a 3'-purine. Overall, we show that DNMT3A has a sequence preference for a TNC[G/A]CC context, while DNMT3B prefers TAC[G/A]GC. We extended our finding using publicly available data from mouse Dnmt1/3a/3b triple-knockout cells in which reintroduction of either DNMT3A or DNMT3B expression results in the acquisition of the same enzyme specific signature sequences observed in vitro. Furthermore, loss of DNMT3A or DNMT3B in human embryonic stem cells leads to a loss of methylation at the corresponding enzyme specific signatures. Therefore, the global DNA methylation landscape of the mammalian genome can be fundamentally determined by the inherent sequence preference of de novo methyltransferases.
In mammals,
most DNA methylation
occurs at C-5 of cytosine bases. Cytosine methylation is a well-established
epigenetic mark and is involved in the regulation of key biological
processes, including tissue specific gene expression patterns, X-chromosome
inactivation, transposon silencing, and genomic imprinting.[1−3] There are three main DNA methyltransferase enzymes, DNMT3A, DNMT3B,
and DNMT1. DNMT3A and DNMT3B are de novo methylases
that operate on both unmethylated and hemimethylated DNA.[4,5] In contrast, DNMT1 is a maintenance methylase that preserves methylation
patterns during replication due to an inherent requirement for hemimethylated
DNA.[6−8]

Cytosine methylation is often described in terms of CpG and
non-CpG
contexts (i.e., CH, where H = A, T, or C), with the latter extended
to include CHG and CHH categories based on symmetry.[9] Overall, this leads to palindromic (CpG), partially palindromic
(CHG), and nonpalindromic (CHH) methylation sites. Most CpG sites
gain methylation on both DNA strands in early mammalian embryonic
development[1,10] and then remain highly methylated
throughout development. Cytosines in CpG islands (genomic regions
with a high frequency of CpG sites) associated with promoters are
dynamically regulated and closely linked with gene expression.[11−14] Different tissues have distinct profiles of non-CpG methylation,
and the highest levels are found in pluripotent stem cells and in
the central nervous system.[11,15−18] In human embryonic stem cells (hESCs), ∼25% of total cytosine
methylation occurs in a non-CpG context with 71% at CHG and 29% at
CHH sites, while in human neurons, 53% of total methylated cytosines
are at non-CpGs, of which >80% are at CHH sites.[18] As the maintenance methylation enzyme DNMT1 has no reported
activity at non-CpG sites,[15] this raises
basic questions about how distinctive DNA methylation landscapes are
established and maintained.

Several factors shape the DNA methylome,
including nucleosome positioning[19,20] and histone
modifications.[21,22] The methylation-deficient
DNMT family member DNMT3L has been reported to stimulate DNMT3A/B
activity by enhancing the stability of enzyme complex recruitment
to DNA or by an increased level of cofactor S-adenosyl-l-methionine binding.[23−25] Genome engineering experiments
of inserted artificial sequences in mouse stem cells have begun to
uncover the contribution to methylation of the underlying genomic
sequence, namely, CpG density and GC content.[26,27] Furthermore, transcription factor (TF) binding[27−29] and G-quadruplex
DNA secondary structures[30] are implicated
in protecting TF-bound regions and certain CpG islands from methylation,
respectively. While such factors are critical for regulating DNA methyltransferases
and influencing the distribution of DNA methylation, a key unanswered
question that remains is how differential methylation patterns are
imparted to different genomic sequences in the first place.

The preferential methylation of unmethylated CpG by DNMT3A/B and
of hemimethylated CpG sites by DNMT1 has been studied biochemically
and structurally.[7,31] Early work attempted to determine
the flanking sequence preferences of DNMT3A/B at CpGs using four synthetic
oligos with no assessment of non-CpG methylation.[32] Notably, no studies have considered a large and unbiased
pool of competing substrates as a fair test of methylation preferences.
It has also not been established whether CpG methylation is installed
in a sequence specific manner by DNMT3A/B in the mammalian genome
under physiological conditions. Recent DNA methylation maps show non-CpG
methylation in nearly all human tissues,[16,33] but the question of whether DNMT3A or DNMT3B establishes these non-CpG
methylation signatures is still elusive.

Herein, we describe
a novel assay and systematic analyses that
quantitatively interrogate DNMT3A and -3B enzyme specificity on a
large and diverse set of cytosine contexts using unmethylated Escherichia coli genomic DNA as the substrate coupled with
high-depth bisulfite sequencing analysis. We find that each enzyme
shows distinct target sequence signatures that are unchanged upon
boosted methylation activity by the inactive cofactor DNMT3L. We find
that these signatures are naturally observed within the mouse and
human DNA methylomes, demonstrating that the intrinsic substrate preferences
of DNMT3A/B are critical for determining the distribution of DNA methylation
in mammalian genomes.
Materials and Methods
In Vitro Methylation Assay
Full-length
recombinant human DNMT3A (Abcam, ab170408), DNMT3B (Abcam, ab170410),
and DNMT3L (Active Motif, catalog no. 31414) were purchased from commercial
providers; 100 ng of unmethylated E. coli genomic
DNA (D5016, Zymo Research) was incubated at 37 °C with 500 ng
of DNMT3A, DNMT3B, or DNMT3L and 160 μM S-adenosylmethionine
(SAM, catalog no. B9003S, NEB) in reaction buffer (50 mM Tris-HCl,
1 mM EDTA, 1 mM dithiothreitol, 5% glycerol, and 100 μg/mL bovine
serum albumin) for 30, 120, and 240 min. For DNMT3L stimulation experiments,
200 ng of DNMT3A or DNMT3B and 200 ng of DNMT3L were incubated with
100 ng of E. coli DNA for 120 min. For comparison,
1 unit of bacterial CpG methyltransferase M.SssI
(New England Biolabs), which has high methylation activity in vitro, was also incubated with DNA for 10, 30, and 240
min. After incubation, the reaction was terminated by heating
the mixture at 65 °C for 20 min. DNA was then purified using
a DNA Clean & Concentrator Kit (D4030, Zymo Research) and processed
for high-throughput bisulfite sequencing.
Bisulfite Sequencing
Bisulfite libraries were prepared
using a Pico Methyl-Seq Library Prep Kit (D5456, Zymo Research) by
following the manufacturer’s protocol. Briefly, DNA was treated
with bisulfite conversion reagent at 98 °C for 8 min and then
at 54 °C for 60 min. Converted DNA was purified and amplified
using random priming. Amplified DNA was purified, adapted, and indexed.
Libraries were pooled and sequenced on an Illumina NextSeq-500 platform
using High-Output Kit ver. 2.5 (75 cycles) in single-end mode. The
nonconversion rate was estimated to be 0.5% using E. coli DNA incubated with the inactive DNMT3L.
The observed numbers of unique sequence
contexts (flanking cytosine,
CG, or CA dinucleotides) present in the forward and reverse strands
of the λ, E. coli, and human reference genomes
were obtained using bedtools ver. 2.27.0[34] and custom Python scripts. For contexts with
n flanking bases, the observed number of unique sequences was compared to the total number
(t = 4^n) of all possible sequence contexts, e.g., NCGN
(n = 2; t = 16), NNCGNN (n = 4; t = 256), and NNNCGNNN (n = 6; t = 4096), and is represented in Figure S1.
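The coverage calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function names are hypothetical, `flank` is the number of bases on each side of the core dinucleotide (so n = 2 × `flank` in the notation above), and the genome is assumed to be available as a plain string.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def observed_contexts(genome, core, flank):
    """Collect the unique contexts of `core` (e.g. "CG" or "CA") with
    `flank` bases on each side, scanning both strands."""
    width = len(core) + 2 * flank
    forward = genome.upper()
    found = set()
    for strand in (forward, reverse_complement(forward)):
        for i in range(len(strand) - width + 1):
            window = strand[i:i + width]
            # Keep windows centred on the core and free of ambiguous bases.
            if window[flank:flank + len(core)] == core and set(window) <= set("ACGT"):
                found.add(window)
    return found


def context_coverage(genome, core, flank):
    """Fraction of the t = 4**(2 * flank) possible contexts that occur."""
    total = 4 ** (2 * flank)
    return len(observed_contexts(genome, core, flank)) / total
```

For the NNNNCGNNNN case, `flank` would be 4 and the denominator 4^8 = 65536. A pure-Python scan of a 4.6 Mbp genome is slow but workable for a one-off count.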
Processing and Analysis
of E. coli Bisulfite
Sequencing Data
The quality of raw sequencing reads was evaluated
using FastQC ver. 0.11.3 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).
Low-quality base calls were filtered, and Illumina TruSeq adapters
were trimmed from the read’s 3′ end using cutadapt ver.
1.123.[35] No reads smaller than 10 bp were
kept (after adapter and base quality trimming). Following the read
quality assessment, the first six bases of every read were also trimmed.

Bisulfite-converted reads were aligned to the E. coli K-12 MG1655 ASM584v2 reference genome (Ensembl Genomes release 41)
using bismark ver. 0.19.0[36] with the options --non_directional and --unmapped, and duplicated alignments were removed using deduplicate_bismark.
Methylation calls were obtained using bismark_methylation_extractor
with the option --CX_context. The sequence
context for each cytosine in the E. coli genome was
obtained using bedtools slop and bedtools getfasta. Only cytosines
with at least 10 aligned sequencing reads were considered for further
analysis.

To visualize methylation levels, boxplots and sequence
logos were
generated in different sequence contexts using the libraries data.table
v1.10.4, ggplot2 v2.2.1[37] and ggseqlogo
v0.1[38] in the R programming language (https://www.r-project.org/).
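As a rough illustration of the per-cytosine steps in this section (strand-aware context extraction and the minimum coverage of 10 reads), the following Python sketch operates on an in-memory genome string. It is not the pipeline used here, which relied on bedtools and bismark; the `CytosineCall` record and function names are hypothetical, and positions are assumed to be 0-based forward-strand coordinates.

```python
from typing import NamedTuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


class CytosineCall(NamedTuple):
    pos: int           # 0-based position on the forward strand
    strand: str        # "+" or "-"
    methylated: int    # reads supporting a methylated cytosine
    unmethylated: int  # reads supporting an unmethylated cytosine


def context_at(genome, call, upstream, downstream):
    """Return the sequence around a cytosine, read 5' to 3' on its own strand.

    Returns None if the window would run off either end of the genome.
    """
    if call.strand == "+":
        start, end = call.pos - upstream, call.pos + downstream + 1
    else:
        # On the reverse strand, upstream and downstream swap sides.
        start, end = call.pos - downstream, call.pos + upstream + 1
    if start < 0 or end > len(genome):
        return None
    window = genome[start:end].upper()
    if call.strand == "-":
        window = window.translate(COMPLEMENT)[::-1]
    return window


def filtered_levels(genome, calls, upstream, downstream, min_depth=10):
    """Yield (context, percent methylation) for sufficiently covered sites."""
    for call in calls:
        depth = call.methylated + call.unmethylated
        context = context_at(genome, call, upstream, downstream)
        if depth >= min_depth and context is not None:
            yield context, 100.0 * call.methylated / depth
```

For a 10-mer CpG context (four bases either side of the CG dinucleotide), `upstream` would be 4 and `downstream` 5, counted from the cytosine.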
Processing and Analysis of Mouse and Human Bisulfite Sequencing
Data Sets
Public whole genome bisulfite sequencing (WGBS)
and reduced representation bisulfite sequencing (RRBS) data sets used
in this study are listed in Table S1. Raw
WGBS data sets from GEO were processed like the E. coli libraries, whereas RRBS data sets were quality trimmed and further
clipped by three bases from the 5′ end using Trim Galore ver.
0.6.4_dev (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/). The GENCODE reference genomes[39] used
were human release 28 (GRCh38.p12) and mouse release M18 (GRCm38.p6).
For WGBS data sets, cleaned-up fastq files from human and mouse were
aligned against GRCh38.p12 and GRCm38.p6, respectively, and subsequent
reads were processed on a chromosome-by-chromosome basis. Unless otherwise
stated, methylation on chromosome 1 of both mouse and human data sets
was reported. For RRBS data sets, non-deduplicated aligned reads were
processed for all chromosomes simultaneously, and methylation counting
and visualization were performed as for the E. coli libraries.[40]

For details about the bioinformatics data analysis, see https://github.com/sblab-bioinformatics/dnmt3a-dnmt3b.
Results
DNMT3A and -3B Enzyme Sequence Preferences
Revealed by a High-Throughput
Biochemical Methylation Assay
For a comprehensive study of
methyltransferase enzyme sequence preferences, we aimed to biochemically
capture a wide range of substrates that display sufficient sequence
diversity and coverage to provide a fair and systematic collection
of possible sequence targets. The 4.6 million bp E. coli genome is 51% G/C rich and contains 346670 CpG sites, which represents
96.6% of all possible NNNNCGNNNN (N = A, T, C, or G) sequences
(63295 of 4^8 = 65536 total combinations), 96.6% of all
NNNNCANNNN, and 99% of all NNNNCNNNN (Figure S1). Thus, unlike previous studies using a limited
range of CpG substrates (275 CpG sites altogether),[32] the E. coli genome has sufficient sequence
context diversity to serve as an essentially unbiased substrate to
investigate the sequence preferences of different methylases.

We then developed a biochemical assay to evaluate the methylation
activity of recombinant full-length human DNMT3A or DNMT3B using unmethylated E. coli genomic DNA as the substrate, followed by methylation
assessment through whole genome bisulfite sequencing and subsequent
computational analysis. We sought assay conditions (10–60%
total methylation) that avoided saturated methylation that would mask
any differential activity. Either DNMT3A or DNMT3B was incubated with E. coli DNA for different time ranges (30, 120, and 240
min) to provide a range of methylation levels for subsequent analysis.
After bisulfite sequencing, average methylation at CpG and non-CpG
contexts was calculated. The level of methylation at CpG sites increased
with incubation time and ranged from 11% to 20% at 30 min and from
40% to 46% at 240 min for DNMT3A and DNMT3B, respectively (Figure 1A). After 240 min,
the level of non-CpG methylation was 2.7% at CHGs and 2.8% at CHHs
for DNMT3A and 10.7% at CHGs and 5.3% at CHHs for DNMT3B. These results
show that while DNMT3A and DNMT3B show a broadly similar level (41%
and 47%, respectively) of CpG methylation after incubation for 240
min, DNMT3B has a relatively greater methylation activity for non-CpG
sites (∼1.9–4-fold) compared to DNMT3A. To rule out biases
introduced by bisulfite conversion, we also showed that the nonconversion
level was 0.5% for both CpG and non-CpG contexts after incubating
inactive DNMT3L with E. coli genomic DNA for 240
min (Figure 1A). Furthermore,
our results show that DNMT3B, but not DNMT3A, has >2-fold higher
methylation
activity at CHG than at CHH sequences (Figure 1A), which implies an inherent sequence-dependent
preference of DNMT3B.
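The CpG, CHG, and CHH classes used for these averages follow directly from the two bases 3′ of the cytosine. A minimal Python sketch is shown below; the function names are hypothetical, and averaging per-site percentages (rather than pooling read counts) is an assumption, since the text does not specify which was used.

```python
from collections import defaultdict
from statistics import mean


def classify_context(downstream):
    """Classify a cytosine from the two bases on its 3' side.

    "CG" if the next base is G, "CHG" if the base after next is G,
    otherwise "CHH" (H = A, T, or C).
    """
    bases = downstream.upper()
    if len(bases) != 2 or set(bases) - set("ACGT"):
        raise ValueError("expected two unambiguous bases, got %r" % downstream)
    if bases[0] == "G":
        return "CG"
    if bases[1] == "G":
        return "CHG"
    return "CHH"


def average_by_class(sites):
    """Mean percent methylation per class.

    `sites` is an iterable of (two_bases_3prime, percent_methylation).
    """
    grouped = defaultdict(list)
    for downstream, level in sites:
        grouped[classify_context(downstream)].append(level)
    return {name: mean(levels) for name, levels in grouped.items()}


def fold_change(numerator, denominator):
    """Ratio of two average methylation levels."""
    return numerator / denominator
```

With the 240 min values reported above, `fold_change(10.7, 2.7)` is about 4.0 (CHG) and `fold_change(5.3, 2.8)` about 1.9 (CHH) for DNMT3B relative to DNMT3A, and `fold_change(10.7, 5.3)` is about 2.0 for DNMT3B at CHG relative to CHH.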
Figure 1
In vitro methylation assay using recombinant
full-length
human DNMT3A and -3B. (A) Average level of methylation introduced
by DNMT3A, DNMT3B, or DNMT3L in CpG and non-CpG sites. Box plots of
methylation levels at NCN sequences after incubation with (B) DNMT3A
or (C) DNMT3B for 240 min, ranked by median methylation level. Sequences
are written in 5′ to 3′ order.
To investigate the influence of sequence context on cytosine methylation
by DNMT3A and DNMT3B, we ranked the median cytosine methylation levels
for trinucleotide sequences with cytosine as the middle base [i.e.,
NCN (Figure 1B,C)].
Both DNMT3A and DNMT3B showed a strong preference for CpG dinucleotides,
resulting in more than 30% and 37% methylation, respectively. The
next most methylated sequence context was CpA; methylation
at all non-CpG sites was less than 4% and 15% after incubation
for 240 min with DNMT3A and DNMT3B, respectively (Figure 1B,C). DNMT3B also showed more
variable methylation than DNMT3A at CpG or CpA sites. The preference for
CpG sites is independent of incubation time (Figure S2A,B). An important control using M.SssI
showed high methylation activity, and all trinucleotide sequence contexts
are equally available for methylation without any preference (Figure S2C), which also rules out any biases
caused by sample processing and data analysis. Altogether, the results
reveal the importance of bases flanking the substrate cytosine and
the already known preference for CpG over non-CpG sites.
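The ranking of sequence contexts by median methylation can be expressed compactly. The sketch below is an illustration in Python rather than the R code used for the figures; the function name and the input format, an iterable of (context, percent methylation) pairs, are assumptions.

```python
from collections import defaultdict
from statistics import median


def rank_contexts_by_median(sites):
    """Group per-site methylation levels by sequence context (e.g. NCN
    trinucleotides) and rank contexts by median level, highest first."""
    grouped = defaultdict(list)
    for context, level in sites:
        grouped[context].append(level)
    medians = {context: median(levels) for context, levels in grouped.items()}
    # Sort by median (descending), breaking ties alphabetically by context.
    return sorted(medians.items(), key=lambda item: (-item[1], item[0]))
```

The same function applies unchanged to longer contexts such as NCGN or NNCGNN, since the context is treated as an opaque key.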
Distinct DNMT3A
and DNMT3B Sequence Preferences Are Directed
by Flanking Sequences for both CpG and Non-CpG Contexts
To
further investigate the sequence preferences of DNMT3A and -3B, we
then explored the influence of both the 5′ and 3′ flanking
bases for CpG sites by ranking the median methylation level at all
known NCGN sequences. Notably, DNMT3A generally favors a pyrimidine
(C or T) as the 3′ adjacent base with NCGC and NCGT sequences
gaining the most (44%) and second most (38%) methylations, whereas
conversely, DNMT3B prefers a 3′ purine base (G or A), with
NCGG and NCGA sequences being most methylated (62% and 58%, respectively).
We also observed that DNMT3B showed a preference for sequences with
a T or A at the 5′ position in NCGG or NCGA contexts, respectively,
whereas DNMT3A favors sites with C/A at the 5′ position (Figure 2A,B). DNMT3B also
showed a greater spread in median methylation levels, from 17% to
67%, across different sequence contexts, while DNMT3A was more restricted
ranging from 25% to 40% (Figure 2A,B). When longer flanking sequences (NNCGNN) were
considered, a clear pattern of sequence preferences and differences
between the two enzymes emerged (Figure S4). For example, DNMT3A prefers TACGCC sequences (N = 3206; median level of methylation of 66.7%) and disfavors AGCGGG
sequences (N = 2585; 12.9%), whereas DNMT3B prefers
GTCGGC sequences (N = 2641; 73.9%) and disfavors
GCCGTG sequences (N = 2570; 8.3%) (Figure S4A,B). The differences in methylation range and sequence
preference were independent of incubation time. There was no observed
flanking sequence preference for the M.SssI control
methylase (Figure S3 and Figure C).
Figure 2
DNMT3A and -3B have different
preferences for flanking sequences
for CpG and non-CpG sites. Box plot of methylation levels in NCGN
context after incubation with (A) DNMT3A or (B) DNMT3B for 240 min,
ranked by median methylation level. Sequence logo of the 1000 most
methylated 10-mer CG sequences after incubation with (C) DNMT3A or
(D) DNMT3B for 30 min. Sequence logo of the 1000 most methylated 10-mer
non-CpG sequences after incubation with (E) DNMT3A or (F) DNMT3B for
120 min.
To determine whether additional
flanking bases have an influence
on preference, we extended our analyses to include four bases 5′
and 3′ of the CpG (i.e., NNNNCGNNNN) by calculating the
consensus sequence logo of the top 1000 most methylated sequences
after incubation for 30 min (Figure 2C,D). In addition to the strong enzymatic preference
at the adjacent 3′ position for CpG substrates highlighted
above, DNMT3A also showed a strong preference for a T at the −2
position 5′, with NNTNCGNNNN representing
75% of all methylated sequences (Figure 2C). In contrast, DNMT3B had a preference
for T or A in both the −1 and −2 positions 5′,
with NN[T/A]NCGNNNN or NNN[T/A]CGNNNN
sequences representing >75% (Figure 2D). Both DNMT3A and DNMT3B showed similar preferences
for C at the +2 position 3′, and DNMT3A also showed a preference
for A at the +3 position 3′ (Figure 2C,D). Longer incubation times ultimately
led to full methylation at a wide range of sequence contexts (Figure S4), obscuring intrinsic sequence preferences
(Figure S5A,B).

For non-CpG dinucleotides,
DNMT3A and DNMT3B showed higher activity
at CpA than at CpC or CpT sites, with NNNNCANNNN
representing 97% of all methylated sequences (Figure 2E,F). The sequence preference of DNMT3A/B
at non-CpG sites is similar to that at CpG sites, which was also independent
of the incubation time before reaching the saturation level (Figure S5C,D). Furthermore, DNMT3A and -3B each
showed a similar preference for flanking sequences at the less methylated
CT or CC dinucleotide sites compared to that of CA or CG sequences
(Figure S6). Overall, these in
vitro methylation analyses unveil distinctive methylation
signatures for human de novo methyltransferases in
both CpG and non-CpG contexts, which reveals intrinsic enzymatic substrate
specificities.

To examine the possible asymmetry of sequence
preferences within
a duplex context, we identified the 10-mer CpG sites that were both
>60% methylated at the C (forward strand) and G (reverse strand)
position.
This identified 1748 heavily methylated duplex sites after incubation
with DNMT3A and 18062 sites after incubation with DNMT3B. Sequence logo analysis reveals
a core [A/G]CG[T/C] signature for DNMT3A and a [C/T]CG[G/A] signature
for DNMT3B (Figure S7). We found no evidence
to support asymmetry in sequence preference. These signatures were
self-complementary, in concordance with the flanking sequence signature
of DNMT3A/B.
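Self-complementarity of the two duplex signatures can be checked mechanically. In IUPAC nucleotide codes, [A/G]CG[T/C] is written RCGY and [C/T]CG[G/A] is written YCGR (R = A or G, Y = C or T). The short Python sketch below, with hypothetical function names, reverse-complements a motif and compares it with itself.

```python
# Complements for the IUPAC codes needed here (R = A/G, Y = C/T,
# W = A/T, S = C/G, K = G/T, M = A/C, N = any base).
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "W": "W", "S": "S",
    "K": "M", "M": "K", "N": "N",
}


def reverse_complement_motif(motif):
    """Reverse complement of a motif written in IUPAC codes."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(motif.upper()))


def is_self_complementary(motif):
    """True if the motif reads the same on both strands (5' to 3')."""
    return reverse_complement_motif(motif) == motif.upper()
```

Both `is_self_complementary("RCGY")` and `is_self_complementary("YCGR")` hold, whereas a single-strand context such as TACGCC is not its own reverse complement (GGCGTA).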
DNMT3L Stimulates DNMT3A/B Activity without
Altering Sequence
Preference
DNMT3L is highly related to DNA methyltransferases,
and though it does not have any methyltransferase activity per se,
it is a key factor that stimulates de novo methylation.[23−25] Early work on DNMT3L suggested that it can modulate DNMT3A/B activity
without changing the sequence preferences of DNMT3A/B.[32] However, this study focused on only a limited
number of CpG sites and used near-saturation levels of methylation;
thus, an unbiased and accurate assessment of the effects of DNMT3L
remains open.

To further investigate how DNMT3L may affect DNMT3A/B
sequence preferences, we added full-length human recombinant DNMT3L
to the methylation reaction together with DNMT3A or DNMT3B. To avoid
methylation saturation due to increased overall methylation levels,
200 ng instead of 500 ng of DNMT3A or DNMT3B was used, which resulted
in 14% CG methylation for DNMT3A and 3.2% for DNMT3B (Figure 3A). DNMT3L increased
DNMT3A methylation activity by 3-fold and DNMT3B methylation activity
by 11-fold in a CpG context (Figure 3A), which is consistent with previous reports.[23−25] Methylation at non-CpG sites was also enhanced (Figure 3B). Sequence logo analysis
shows an unaltered sequence preference for DNMT3A/3B after addition
of DNMT3L in both CpG and non-CpG contexts (Figure 3C–F; see also Figure 2C–F and Figures S5 and S8). This suggests that the stimulatory effect of DNMT3L
does not alter the flanking sequence preference for DNMT3A/B, which
is consistent with the absence of any direct interaction between DNMT3L
and DNA within a DNMT3A–DNMT3L tetramer complex.[41,42]
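The sequence logos compared here summarize per-position base frequencies among the most methylated 10-mers. One way to make an "unaltered preference" comparison quantitative is sketched below in Python. This is an illustrative assumption, not the analysis performed in this work, which compared ggseqlogo plots visually; the function names are hypothetical and no small-sample correction is applied to the information content.

```python
from collections import Counter
from math import log2

BASES = "ACGT"


def position_frequencies(seqs):
    """Per-position base frequencies for equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        matrix.append({b: counts[b] / len(seqs) for b in BASES})
    return matrix


def information_content(column):
    """Information at one position in bits (0 = uniform, 2 = invariant)."""
    entropy = -sum(p * log2(p) for p in column.values() if p > 0)
    return 2.0 - entropy


def max_frequency_difference(matrix_a, matrix_b):
    """Largest absolute difference in base frequency between two matrices
    of the same length; 0 means identical position frequencies."""
    return max(
        abs(col_a[b] - col_b[b])
        for col_a, col_b in zip(matrix_a, matrix_b)
        for b in BASES
    )
```

Applied to the top 1000 sequences with and without DNMT3L, a small `max_frequency_difference` would be consistent with an unchanged flanking preference; what counts as "small" is a judgment call that this sketch does not settle.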
Figure 3
DNMT3L
stimulates DNMT3A and DNMT3B activity without altering their
sequence preference. Average methylation by DNMT3A and DNMT3B in (A)
CpG and (B) non-CpG sites. Sequence logo of the 1000 most methylated
10-mer CG sequences after incubation with (C) DNMT3A/3L or (D) DNMT3B/3L
for 120 min. Sequence logo of the 1000 most methylated 10-mer non-CpG
sequences after incubation with (E) DNMT3A/3L or (F) DNMT3B/3L for
120 min.
Methylation Signatures
of DNMT3A and DNMT3B in Mammalian Cells
To further expand
our in vitro findings that revealed
DNMT3A/B sequence preferences, we asked if the observed patterns hold
true in cellular and physiological conditions. Subsequently, we explored
the extent to which endogenous mammalian DNA methylomes are explained
by the distinct specificities of DNMT3A and DNMT3B.

Mammalian
DNMT3 protein sequences are highly conserved with 96% of amino acids
(875 of 912) being identical between mouse and human DNMT3A, including
100% identical C-terminal residues and catalytic domains (508–912).
Human and mouse DNMT3B protein sequences are 88% identical (717 of
817). Due to this level of conservation, we anticipate that human
and mouse DNMT3 enzymes will show equivalent sequence preferences,
and therefore, we used human or mouse methylation data sets interchangeably
in the following analyses.

We examined the patterns of highly
methylated sequences in wild-type
(WT) J1 mouse embryonic stem cells (mESCs) compared to mESCs in which
Dnmt1, -3a, and -3b have been genetically deleted (Dnmt triple-knockout
or TKO cells)[43] with either Dnmt3a or Dnmt3b
subsequently reintroduced ectopically.[44] In WT stem cells, no obvious pattern was observed in the top methylated
10-mer sequences (i.e., CpG ± 4 bases), which most likely reflects
the close-to saturation levels of CG methylation (81.7%). In contrast,
in TKO cells with reintroduced Dnmt3a or Dnmt3b, there is an average
methylation level of 7.1% and 2.8%, respectively, in CpG contexts,
which are nonsaturating and therefore allow the further analysis of
sequence preferences (see Materials and Methods). The preferred sequence contexts were readily apparent at the methylated
sites in TKO cells expressing either Dnmt3a or Dnmt3b (Figure 4A,B). These signatures correspond
closely to the patterns identified in the in vitro assay (Figure 2C–F);
namely, CpG and non-CpG sites followed by a 3′ pyrimidine gained
more methylation when Dnmt3a was expressed, and those followed by a 3′ purine gained more when
Dnmt3b was expressed (Figure 4A,B).
Figure 4
Methylation signatures in the mouse and human genome depend
on
the presence of DNMT3A or DNMT3B. Sequence logos of the most methylated
sequence contexts in chromosome 1 of mouse stem cells. (A) Top methylated
10-mer CG sequences: 100% methylation for WT (left; N = 325622), >60% methylation for TKO with Dnmt3a (middle; N = 628), and >40% for TKO with Dnmt3b (right; N = 399). (B) Top methylated 10-mer CH sequences: >40%
methylation
for WT (left; N = 16392), >30% for TKO with Dnmt3a
(middle; N = 533), and >20% for TKO with Dnmt3b
(right; N = 177). Sequence logos of the most methylated
(>20% methylation)
CH sequence contexts in chromosome 1 of hESCs. (C) WT hESCs (N = 163050). (D) DNMT3A-KO hESCs at early passage 7 (left; N = 28101) and late passage 22 (middle; N = 59323). (E) DNMT3B-KO hESCs at passage 6 (left; N = 24327) and passage 22 (middle; N = 20603). (F)
DNMT3A/B double-knockout hESCs at passage 7 (left; N = 12736) and passage 22 (right; N = 6859). (G)
Ratio between CAC and CAG methylation in mouse Dnmt-TKO cells with
Dnmt3a/Dnmt3b reintroduced and hESCs with either DNMT3A or DNMT3B
knocked out.
To explore if the same preferences
persist in human cells, we then
profiled methylation signatures in HUES64 human ESCs (hESCs) with
either wild-type or DNMT knockout genotypes.[45] WT hESCs displayed a mixture of DNMT3A-type and DNMT3B-type methylation
signatures (Figure 4C), which was not observed in mouse WT cells. We attributed this
to the higher level of expression of DNMT3A/B in human HUES64 cells
compared to mouse cells (Figure S9A). Moreover,
we observed that the DNMT3B-type signature emerges when DNMT3A is
depleted, with later cell culture passages leading to a more prominent
effect (Figure 4D).
Similarly, removal of DNMT3B leads to the loss of the DNMT3B signature
in early passages with the subsequent appearance of the DNMT3A signature,
which suggests the slow dilution of DNMT3B-type methylation and accumulation
of DNMT3A-type methylation over a period of 15 passages (Figure 4E). Finally, DNMT3A and DNMT3B double knockout
leads to a substantial loss of CA methylation (from 1.8% to 0.2%)
and loss of DNMT3 signatures (Figure 4F).

The clear difference in sequence preferences
between DNMT3A and
DNMT3B is at the 3′ base directly adjacent to the substrate
dinucleotides. To further infer whether the methylation levels in
CAC and CAG contexts are a good representation of the DNMT3A and DNMT3B
methylation signatures, we calculated the average methylation at trinucleotides
CAN (N = A, T, C, or G) in mouse and human stem cells and found that
CAC gained more methylation compared to other trinucleotides when
Dnmt3a was reintroduced. Conversely, more methylation at CAG was
observed when Dnmt3b was reintroduced, which is consistent with the
preferences discovered before (Figure S9B). In both human and mouse WT ESCs, the ratio between CAC and CAG
methylation is close to 1, suggesting a balancing act between DNMT3A
and -3B (Figure 4G).
Additionally, introduction of DNMT3A into mouse TKO cells (or removal
of DNMT3B in human WT cells) led to ∼2–3-fold more CAC
methylation; however, introduction of DNMT3B into mouse TKO cells
(or removal of DNMT3A in human WT cells) led to more CAG methylation.
In line with the inherent sequence preferences flanking CpG sites
revealed in vitro, we also noted that TKO cells gain
more CGC/CGT methylation after reintroduction of Dnmt3a and more CGG/CGA
methylation after reintroduction of Dnmt3b (Figure S9C,D). No significant change was observed in WT or DNMT3A-
and DNMT3B-depleted hESCs in sequence contexts adjacent to CpG dinucleotides,
which may be due to saturation levels (Figure S9C,D).
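The CAC/CAG ratio used as a compact readout of the two signatures is straightforward to compute from per-site methylation levels. The Python sketch below uses a hypothetical function name and assumes the ratio is taken between mean methylation levels at CAC and CAG sites.

```python
from statistics import mean


def cac_cag_ratio(sites):
    """Ratio of mean methylation at CAC sites to mean methylation at CAG sites.

    `sites` is an iterable of (trinucleotide, percent_methylation) pairs;
    trinucleotides other than CAC and CAG are ignored.
    """
    sites = list(sites)
    cac = [level for context, level in sites if context == "CAC"]
    cag = [level for context, level in sites if context == "CAG"]
    if not cac or not cag:
        raise ValueError("need at least one CAC and one CAG site")
    mean_cag = mean(cag)
    if mean_cag == 0:
        raise ValueError("mean CAG methylation is zero; ratio undefined")
    return mean(cac) / mean_cag
```

Under this definition, a ratio near 1 corresponds to the balanced WT pattern described above, a ratio above 1 to a DNMT3A-type (CAC-rich) pattern, and a ratio below 1 to a DNMT3B-type (CAG-rich) pattern.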
The DNMT3A N-Terminal Domain Imparts Sequence
Preferences
The distinct patterns of flanking sequence preferences
for DNMT3A
or DNMT3B at both CpG and non-CpG sites suggest that there are intrinsic
enzyme structural features determining their specificity. To determine
whether the N-terminal or the catalytic domain is a determinant for
the sequence preferences of DNMT3A, we analyzed publicly available
RRBS data sets generated in Dnmt3a/b double-knockout, Dnmt1-knockdown
mESCs (DKO-zero) expressing either full-length (FL) or
the catalytic domain (CD) of Dnmt3a.[46] The
expression of either FL- or CD-Dnmt3a reinstated CpG methylation levels
similar to that of WT cells.[46] The most
methylated non-CpG sites in WT cells revealed a TNCA[C/G]C
methylation signature that combines the DNMT3A and DNMT3B methylation
signatures observed in vitro (Figure A, and also Figures E and 4B). The knockout
of Dnmt3a/b and knockdown of Dnmt1 abrogated methylation in DKO-zero
cells, which resulted in no methylation signature (Figure B). The reintroduction of full-length
DNMT3A (but not the DNMT3A catalytic domain) restored the characteristic
DNMT3A methylation signature observed in WT cells (Figure C,D). Overall, this suggests
that the N-terminal domain is a determinant for the sequence preference
of DNMT3A.
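As an illustration of how a signature can be read from the most methylated sequences, the sketch below ranks k-mers by methylation level, keeps the top N, and computes per-position base frequencies, which are the usual input to a sequence logo. The input format, function names, and toy 6-mers are assumptions made for this example; the analysis in this work used 10-mers with N = 5000, and the actual logo-drawing software is not reproduced here.

```python
from collections import Counter


def top_methylated(kmers, n=5000):
    """Return the n sequences with the highest methylation level.

    `kmers` is a hypothetical iterable of (sequence, methylation_level).
    """
    ranked = sorted(kmers, key=lambda item: item[1], reverse=True)
    return [seq for seq, _ in ranked[:n]]


def position_frequency_matrix(seqs, alphabet="ACGT"):
    """Per-position base frequencies for equal-length sequences."""
    if not seqs:
        raise ValueError("no sequences given")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all sequences must have the same length")
    matrix = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        matrix.append({b: counts.get(b, 0) / len(seqs) for b in alphabet})
    return matrix


def consensus(matrix):
    """Most frequent base at each position (ties resolved arbitrarily)."""
    return "".join(max(col, key=col.get) for col in matrix)


# Toy 6-mers for illustration only; these are not measured values.
toy_kmers = [
    ("TACACC", 0.9), ("TGCACC", 0.8), ("TTCACC", 0.7), ("AAGTGG", 0.1),
]

pfm = position_frequency_matrix(top_methylated(toy_kmers, n=3))
print(consensus(pfm))
```

In this toy input the second position is evenly split between A, G, and T, which is the kind of uninformative position written as N in a signature such as TNCACC.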
Figure 5
N-Terminal domain accounting for the sequence preferences of DNMT3A.
Sequence logos of the most methylated 10-mer non-CpG sequences (N = 5000) in (A) WT cells (WT), (B) DKO-zero cells, (C)
DKO-zero cells expressing full-length DNMT3A (DNMT3A-FL), or (D) DKO-zero
cells expressing the DNMT3A catalytic domain (DNMT3A-CD).
Discussion
A key challenge is to build an understanding
of how the de novo methyltransferases DNMT3A and
DNMT3B cooperate to
establish the mammalian DNA methylome in early embryonic development.
Evidence suggests that the underlying primary genomic sequence could
be involved in the dynamic and recurring deposition of cytosine methylation
in regulatory regions by sequence specific recruitment of transcription
factors.[29,47] However, how sequence context affects the
activity of de novo DNA methyltransferases is elusive.
By quantitatively examining the in vitro methylation
activity of full-length human recombinant DNA methyltransferases on
a diverse set of sequence contexts present in a small bacterial genome,
we have uncovered the inherent enzymatic preferences for sequences
flanking the substrate dinucleotides. DNMT3A favors a TNC[G/A]CC signature,
while DNMT3B prefers TAC[G/A]GC. Our observations are corroborated
by our findings of similar Dnmt3a or Dnmt3b methylation signatures
in mouse Dnmt-TKO cells that express either Dnmt3a or Dnmt3b ectopically.
Furthermore, depletion of DNMT3A in human HUES64 cells enhances a
DNMT3B-type methylation pattern, especially in a CA context, while
removal of DNMT3B leads to the appearance of a DNMT3A-type signature.
Taken together, we propose that the intrinsic sequence preferences
of DNMT3A/B should be taken into consideration when studying the establishment
of tissue specific methylation patterns.
From our analysis of
mouse TKO stem cells and human DNMT3 knockout
cells, it is evident that DNMT3A and DNMT3B impose methylation patterns
in cells that resemble those seen in vitro from the
corresponding purified recombinant enzymes in the absence of additional
factors. This suggests that while the interaction with DNMT3L,[48−50] histone modifications,[21,22,44] or transcription factors[29] could modulate
or guide the methylation capacity of DNMT3s at certain regions, the
inherent enzyme sequence preferences shape a substantial part of the
underlying methylation patterns globally.
While human DNMT3A
and DNMT3B share ∼45% amino acid conservation across
the whole protein, ∼80% of amino acids are conserved in the
catalytic domain. This points to regulatory features outside the catalytic
domain having evolved to provide each protein selectivity to methylate
distinct genomic loci in different tissues and developmental stages.
Epigenetic enzymes such as DNMTs and TETs are being deployed in a
range of epigenetic engineering and biotechnological setups with potential
clinical utility, and our examination of the intrinsic sequence preference
of these enzymes could help guide the selection of DNMT3s for optimal
activity.
Conclusions
In summary, we provide a comprehensive
and robust quantitative
analysis of the intrinsic sequence preferences for the enzymatic activities
of de novo DNA methyltransferases on CpG and non-CpG
target sites in vitro and in mammalian stem cells.
The accurate determination of sequence preferences of de novo methyltransferases provides a new understanding of the origin of
specific DNA methylation patterns in different cell lineages and regulatory
regions.