
The emergence of the brain non-CpG methylation system in vertebrates.

Alex de Mendoza1,2,3, Daniel Poppe1,2, Sam Buckberry1,2, Jahnvi Pflueger1,2, Caroline B Albertin4,5, Tasman Daish6, Stephanie Bertrand7, Elisa de la Calle-Mustienes8, José Luis Gómez-Skarmeta8, Joseph R Nery9, Joseph R Ecker9,10, Boris Baer11, Clifton W Ragsdale5, Frank Grützner6, Hector Escriva7, Byrappa Venkatesh12, Ozren Bogdanovic1,2,13,14, Ryan Lister15,16.   

Abstract

Mammalian brains feature exceptionally high levels of non-CpG DNA methylation alongside the canonical form of CpG methylation. Non-CpG methylation plays a critical regulatory role in cognitive function, which is mediated by the binding of MeCP2, the transcriptional regulator that when mutated causes Rett syndrome. However, it is unclear whether the non-CpG neural methylation system is restricted to mammalian species with complex cognitive abilities or has deeper evolutionary origins. To test this, we investigated brain DNA methylation across 12 distantly related animal lineages, revealing that non-CpG methylation is restricted to vertebrates. We discovered that in vertebrates, non-CpG methylation is enriched within a highly conserved set of developmental genes transcriptionally repressed in adult brains, indicating that it demarcates a deeply conserved regulatory program. We also found that the writer of non-CpG methylation, DNMT3A, and the reader, MeCP2, originated at the onset of vertebrates as a result of the ancestral vertebrate whole-genome duplication. Together, we demonstrate how this novel layer of epigenetic information assembled at the root of vertebrates and gained new regulatory roles independent of the ancestral form of the canonical CpG methylation. This suggests that the emergence of non-CpG methylation may have fostered the evolution of sophisticated cognitive abilities found in the vertebrate lineage.

Year:  2021        PMID: 33462491      PMCID: PMC7116863          DOI: 10.1038/s41559-020-01371-2

Source DB:  PubMed          Journal:  Nat Ecol Evol        ISSN: 2397-334X            Impact factor:   15.460


Introduction

Cytosine DNA methylation (mC) is the most abundant base modification in animal genomes[1,2]. In vertebrates, most of the CpG dinucleotides (> 80%) in the genome are methylated[3]. In contrast, most invertebrates show sparse methylation, where most CpG methylation accumulates on transcribed gene bodies[4,5]. However, cytosine methylation can also occur in the CpH (where H is C, A, or T) dinucleotide context. In mammals, CpH methylation is mostly restricted to a few tissues and cell types[6], such as embryonic stem cells, neurons, and muscle. Embryonic stem cells display CpH methylation enriched on transcribed gene bodies, while neural tissues accumulate high levels of CpH methylation on transcriptionally silent genes[7-12]. CpH methylation is deposited de novo by the DNMT3A or DNMT3B methyltransferases, and unlike CpG methylation, is not maintained after genome replication by the DNA methyltransferase DNMT1[11]. Thus, post-mitotic neurons can accumulate CpH methylation since they do not undergo genome replication. In contrast to CpG methylation, CpH methylation is accumulated in the brain after birth, coinciding with synaptogenesis and synaptic pruning[7,13]. Furthermore, CpH methylation shows cell-type specific patterns in distinct neurons and glia[7,8,14], and is the most abundant form of DNA methylation in neurons. Most importantly, CpH methylation is bound by MeCP2, a highly expressed transcriptional regulator that can cause Rett syndrome, a strong autistic phenotype, when mutated[15,16]. Similarly, mutations in DNMT3A and abnormal cytosine methylation are also linked to neurological diseases[17]. Therefore, the role of DNA methylation and CpH methylation in neural maturation and cognitive functions is well established in mammals. To date, CpH methylation has been observed in the brain of human, mouse, and a songbird[7,18,19], thus the roles of this unique epigenomic feature could potentially be linked to complex brain functions. 
However, neither the evolutionary origin of CpH methylation nor the molecular basis that allowed this new methylation context to emerge has so far been unraveled. The morphology of the vertebrate brain is highly conserved, with a tripartite organization that is found from lampreys to mammals[20]. However, the homology between the vertebrate brain and that of distantly related invertebrates remains uncertain[21,22]. Notwithstanding this, all animal brains are mainly composed of neurons and glia, ectodermal-derived neural cell types that have deep evolutionary roots[23]. Thus, to understand the evolution of neural CpG and CpH methylation and its relationship to cognitive complexity, here we study the evolution of neural methylation within and outside the vertebrate lineage.

Results

Brain CpG methylation recapitulates differences between vertebrates and invertebrates

To investigate the evolution of neural DNA methylation, we gathered forebrain samples from representative species of major vertebrate lineages. We generated whole genome bisulfite sequencing (WGBS) data from adult forebrain regions for six vertebrate species (Fig. 1a), including opossum, platypus, chicken, zebrafish, elephant shark and arctic lamprey, and we reanalysed previously published datasets from another four[7,18,24]. For invertebrates, we generated new data for two lineages with highly complex brains and behaviours. As representatives of insects, we generated WGBS data for honeybee whole brains from a queen. As a cephalopod representative, we obtained material from the California two-spot octopus, for which we sampled and performed WGBS for both the supraesophageal and the subesophageal brains. As an out-group to vertebrates, we generated new data for neural tube material from the European amphioxus. The anterior neural tube is homologous to and shares many epigenomic similarities with the vertebrate brain[25,26]. Therefore, this dataset comprises the broadest assessment of adult neural DNA methylation to date, encompassing major animal phyla with highly complex brains.
Fig. 1

Brain methylomes reflect the vertebrate-invertebrate CG methylation boundary.

a, Global brain CpG methylation, genome size, and CpG genome content across animal species. Schematic representation of established animal phylogeny on the left-hand side. Newly generated WGBS datasets marked with a blue circle, WGBS samples from non-neural tissue marked with a red circle. The Ciona intestinalis sample corresponds to muscle tissue[73], and sea anemone Nematostella vectensis sample corresponds to a gastrula sample[64]. Genome size represents the genome assembly size. b, Proportion of CpG sites classified according to methylation levels (mC/C). Only sites with coverage ≥ 10x were considered. Silhouettes of human, platypus, octopus and honeybee obtained from phylopic.org.
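
As an illustration of how the two quantities shown in Fig. 1 can be derived from per-site read counts, the following Python sketch computes a global mCpG/CpG level and bins sites by methylation level. Only the ≥ 10x coverage filter comes from the caption. The 0.2 and 0.8 cut-offs, the function names, the input layout and the choice to pool reads over all sites for the global level are assumptions made for this example, not the parameters used in the study.

```python
from collections import Counter

MIN_COVERAGE = 10  # coverage threshold stated in the Fig. 1b caption


def global_level(sites):
    """Global mC/C: methylated reads pooled over all sites / total reads.

    sites: iterable of (methylated_reads, total_reads) tuples.
    Pooling over all sites is an assumption of this sketch.
    """
    sites = list(sites)
    methylated = sum(m for m, _ in sites)
    covered = sum(c for _, c in sites)
    return methylated / covered if covered else float("nan")


def classify_sites(sites, low=0.2, high=0.8):
    """Proportion of sites with low, intermediate or high methylation.

    The low/high cut-offs are hypothetical; sites below MIN_COVERAGE
    are ignored, as in the caption.
    """
    counts = Counter()
    for methylated, coverage in sites:
        if coverage < MIN_COVERAGE:
            continue
        level = methylated / coverage
        if level <= low:
            counts["low"] += 1
        elif level >= high:
            counts["high"] += 1
        else:
            counts["intermediate"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: counts[k] / total for k in ("low", "intermediate", "high")}
```

Under these assumptions, a genome dominated by the "intermediate" class would resemble the lamprey and avian profiles described in the text, whereas a genome dominated by the "low" class would resemble the invertebrate profiles.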

To understand major differences in methylation across species, we first analysed CpG methylation, since it is the preferred context for animal DNA methyltransferases[27]. As previously reported, vertebrates show higher CpG methylation levels than invertebrates (Fig. 1a)[1,4]. The high global levels of CpG methylation in vertebrate genomes have been proposed to correlate with the size of the genome or its high level of repetitive content[4,28]. However, the octopus genome is larger and has comparable repeat content to some vertebrate species[29]. Still, the octopus genome shows typical invertebrate global methylation levels (~10% mCpG/CpG) and most CpGs in the genome are unmethylated (Fig. 1a,b), thus contradicting previous hypotheses regarding the evolutionary origin of hypermethylation in vertebrates. Additionally, hypermethylation (global mCpG/CpG > 70%) is not found in all vertebrate samples. The arctic lamprey and both bird species show lower levels of global methylation than other vertebrate species (Fig. 1a). These lower global methylation levels are explained by an overwhelming majority of intermediately methylated CpG positions (Fig. 1b). Intermediate methylation observed in the arctic lamprey brain coincides with previous observations from sperm, muscle and heart methylation levels in another species of lamprey[30]. Interestingly, intermediate methylation levels correspond to very heterogeneous methylation at the read level, suggesting noisy inheritance of methylation after cell division (Extended Data 1). Given the phylogenetic position of lampreys, the intermediate methylation levels in this lineage might represent a middle step in the transition from the mosaic methylomes of invertebrates to the fully methylated genomes of jawed vertebrates[30]. However, avian intermediate methylomes represent a secondary reduction since all earlier-splitting lineages show hypermethylation.
The evolutionary causes of such reduction in methylation are unclear, since genome size does not explain methylation levels, even within vertebrates, given that elephant shark has higher methylation and a smaller genome than birds (Fig. 1a). Surprisingly, lampreys and other cyclostomes have genomes enriched in CpG dinucleotides, unlike any other vertebrate (Fig. 1a, Extended Data 2). In sum, the CpG methylation landscape in the brain reflects known differences between vertebrate and invertebrate genomes, yet challenges prior assumptions about the evolution of hypermethylation in vertebrates.
Extended Data Fig. 1

Locally disordered methylation characterises the lamprey epigenome

Proportion of Discordant Reads (PDR) values for a subset of CpGs (100,000) of each species (See Methods). Boxplot centre lines are medians, box limits are quartiles 1 (Q1) and 3 (Q3), whiskers are 1.5 × interquartile range (IQR).
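
For readers unfamiliar with the PDR metric, the sketch below shows one common way to compute it in Python. A read is treated as discordant when it carries both methylated and unmethylated CpGs, and the PDR of a CpG is the fraction of reads covering it that are discordant. The minimum of four CpGs per read is a widely used convention that is assumed here; the parameters actually used are described in the Methods, which are not reproduced in this section. The input layout and function names are hypothetical.

```python
def read_is_discordant(calls):
    """True if a read mixes methylated (True) and unmethylated (False) CpGs."""
    return any(calls) and not all(calls)


def pdr_per_cpg(reads, min_cpgs=4):
    """Proportion of Discordant Reads for every CpG position.

    reads: list of dicts mapping CpG position -> methylation call (bool).
    min_cpgs: reads with fewer CpGs are skipped (assumed threshold).
    """
    discordant = {}
    total = {}
    for read in reads:
        if len(read) < min_cpgs:
            continue
        flag = read_is_discordant(list(read.values()))
        for position in read:
            total[position] = total.get(position, 0) + 1
            discordant[position] = discordant.get(position, 0) + int(flag)
    return {pos: discordant[pos] / total[pos] for pos in total}
```

A fully concordant locus yields PDR values near 0, whereas the heterogeneous read-level methylation described for the lamprey would yield high values.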

Extended Data Fig. 2

CpG hypermutability is widespread in vertebrates except the lamprey

Percentage of Single Nucleotide Variants identified from the WGBS libraries from the total number of dinucleotides in the reference genome. In pale blue are those proportions that are equal or lower than the expected (total number of SNVs / total number of dinucleotides), and in dark blue are those that are overrepresented. Note that the mouse has very few SNVs as it is a laboratory isogenic line, however it still shows a slightly higher enrichment for SNVs in CpG dinucleotides, whereas birds have very high SNV rates on CpG dinucleotides despite having intermediate levels of CpG methylation.
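
The comparison described in this caption can be sketched as follows in Python. The expected proportion is the total number of SNVs divided by the total number of dinucleotides, as stated above; a dinucleotide is flagged as overrepresented when its own SNV proportion exceeds that value. The function name, the input dictionaries and the counts in the usage note are hypothetical and serve only to illustrate the calculation.

```python
def snv_enrichment(snv_counts, dinucleotide_counts):
    """Compare per-dinucleotide SNV proportions with the genome-wide expectation.

    snv_counts: dict dinucleotide -> number of SNVs falling on it.
    dinucleotide_counts: dict dinucleotide -> occurrences in the reference.
    Returns (expected, {dinucleotide: (observed, overrepresented)}).
    """
    total_snvs = sum(snv_counts.values())
    total_dinucleotides = sum(dinucleotide_counts.values())
    expected = total_snvs / total_dinucleotides
    result = {}
    for dinucleotide, occurrences in dinucleotide_counts.items():
        observed = snv_counts.get(dinucleotide, 0) / occurrences
        result[dinucleotide] = (observed, observed > expected)
    return expected, result
```

With made-up counts of 30 SNVs on 100 CG sites, 10 on 400 CA sites and 10 on 500 TA sites, the expected proportion is 0.05 and only CG is flagged as overrepresented.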

Lamprey genomes are not affected by methyl-CpG hypermutability

Methylated cytosines are known to be prone to deaminate into thymines[31]. This tendency towards deamination makes CpG sites hotspots of mutability and genetic variation[32]. Furthermore, methylated CpG mutability is believed to be responsible for the global depletion of CpG sites in vertebrate genomes[1,4]. To explore these observations, we first gathered global CpG dinucleotide content in the sampled species (Fig. 1a). Whereas all jawed vertebrates show strong depletions of CpG dinucleotides, lampreys and other cyclostomes do not show such depletions. In fact, the ratio of CpG dinucleotides in lamprey genomes is similar to that of species that lack cytosine DNA methylation (Fig. 1a). To further investigate this anomaly, we used WGBS data to identify Single Nucleotide Variants (SNVs) in all sampled species (Extended Data 2). All jawed vertebrates showed a higher frequency of variants at CpG dinucleotides with respect to other dinucleotides. However, the arctic lamprey did not show such an enrichment. The intermediate methylation levels found in lamprey genomes could explain why CpG dinucleotides are not disproportionately affected by mutagenesis and depleted as seen in other vertebrate lineages. However, avian genomes also have intermediate methylation levels and still show archetypal global CpG depletion and disproportionate variants on CpG sites. Therefore, how lampreys avoid or compensate for methylation-derived mutagenesis remains unclear, yet could be linked to somatic DNA elimination in this lineage[33].
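
CpG depletion of the kind discussed here is often summarised with the observed/expected CpG ratio, which compares the number of CG dinucleotides with the number expected from the C and G content of the sequence. The Python sketch below uses that standard ratio as a stand-in; the text does not state that this exact measure underlies Fig. 1a, so it should be read as an assumption, and the function name is hypothetical.

```python
def cpg_observed_expected(sequence):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Values well below 1 indicate CpG depletion; values near 1 indicate
    that CpGs occur about as often as base composition predicts.
    """
    sequence = sequence.upper()
    n_c = sequence.count("C")
    n_g = sequence.count("G")
    n_cg = sequence.count("CG")
    if n_c == 0 or n_g == 0:
        return float("nan")
    return n_cg * len(sequence) / (n_c * n_g)
```

Applied to whole assemblies, a ratio computed this way would be expected to be low in jawed vertebrates and closer to 1 in lampreys and in species lacking cytosine methylation, in line with the pattern described above.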

Brain CpH methylation is restricted to vertebrates

To avoid methylation mutability confounding our measurements of CpH methylation, we first discarded all CpH positions that showed evidence of being CpG dinucleotide variants in the sequenced WGBS reads. We then measured global genomic methylation levels at CpA, CpT and CpC dinucleotides for each species and compared these to the bisulfite non-conversion rates in the unmethylated lambda DNA spike-in control for each WGBS experiment (Fig. 2a). All vertebrates showed CpA and CpT global methylation above non-conversion levels, whereas invertebrates did not. As previously reported in mammals, CpA is the preferred context for non-CpG methylation in all vertebrates, while CpC is rarely methylated[7,34]. We next asked whether there is a wider sequence context in which CpH methylation is preferentially deposited, as occurs in mammals[7,35]. We gathered the neighbouring positions from the 10,000 most highly methylated CpH sites in each species, finding that the trinucleotide CAC and additional bases form an overrepresented motif conserved across vertebrates (Fig. 2b). The flanking bases surrounding the CpH sites coincide with the flanking sequence preference reported for DNMT3A[36]. This CpH flanking motif was not detectable in non-neural samples for elephant shark, zebrafish or Xenopus, confirming that mCpH is not a bisulfite sequencing bias and mCpH neural specificity extends beyond mammals (Extended Data 3). Similarly, the CpH flanking motif was not detectable in invertebrates (Fig. 2b). Furthermore, methylation levels on the most highly methylated CpH sites were lower in amphioxus, honeybee, and octopus (mC/C < 20%) compared to any vertebrate brain (Fig. 2c). Thus, invertebrate CpH methylation is likely to be a rare off-target consequence of DNMT activity. In contrast, the robust mammalian neural CpH methylation levels are conserved across the vertebrate lineage.
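
The motif analysis described above can be sketched in Python as a two-step procedure: select the most highly methylated CpH sites that pass a coverage filter, then tabulate base frequencies at each flanking position. The window layout (equal-length sequences with the methylated cytosine at the same index), the input tuples and the function names are assumptions of this example; the 10,000-site cut-off is taken from the text and the ≥ 10x coverage threshold from the figure captions.

```python
def top_sites(sites, n=10000, min_coverage=10):
    """Return the n sites with the highest mC/C among sufficiently covered sites.

    sites: list of (flanking_sequence, methylated_reads, total_reads).
    """
    eligible = [s for s in sites if s[2] >= min_coverage]
    eligible.sort(key=lambda s: s[1] / s[2], reverse=True)
    return eligible[:n]


def position_frequencies(sequences):
    """Per-position base frequencies for equal-length sequences.

    Bases other than A, C, G and T (for example N) are ignored.
    """
    matrix = []
    for i in range(len(sequences[0])):
        counts = {base: 0 for base in "ACGT"}
        for sequence in sequences:
            if sequence[i] in counts:
                counts[sequence[i]] += 1
        total = sum(counts.values())
        matrix.append(
            {base: (counts[base] / total if total else 0.0) for base in "ACGT"}
        )
    return matrix
```

A matrix in which the positions following the central cytosine are dominated by A and then C would correspond to the CAC preference reported here; a flat matrix would correspond to the absence of a motif seen in the invertebrate samples.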
Fig. 2

Neural CpH methylation is restricted to vertebrate brains.

a, Global methylation levels in brain samples classified per dinucleotide context. Dark blue represents the global methylation level on the nuclear chromosomes (excluding mitochondrial genome) and pale blue represents the bisulfite reaction non-conversion rate for each library, calculated as the methylation levels on an unmethylated lambda phage DNA spike-in. b, Sequence motifs found surrounding the most highly methylated CpH positions in each brain sample. Only CpH positions with coverage ≥ 10x were considered. c, Methylation level (mC/C) for the top mCpH positions depicted in panel b. Boxplot centre lines are medians, box limits are quartiles 1 (Q1) and 3 (Q3), whiskers are 1.5 × interquartile range (IQR). Silhouettes of human, platypus, octopus and honeybee obtained from phylopic.org.

Extended Data Fig. 3

CpH methylation is specific to brain tissues across vertebrates.

(a) Sequence motifs found surrounding the highest methylated CpH positions in each sample. CpH positions were required to have a coverage ≥ 10x. hpf = embryo hours post fertilization. Sox10+ cells correspond to developmental neural crest cells in zebrafish. (b) Gene Ontology enrichments for genes showing the highest and lowest gene body methylation levels in the CpA context, as defined by belonging to the top and bottom deciles in each species and tissue. (c) Gene Ontology enrichments for genes showing the highest and lowest methylated levels in the CpG context.

CpH methylation is functionally decoupled from CpG methylation across vertebrates

In mammalian brains, CpH methylation deposition does not fully recapitulate CpG methylation[6,7]. While CpG methylation is found on transcribed and silent gene bodies alike, CpH methylation is depleted on transcriptionally active gene bodies in neurons. To test whether brain CpH methylation anti-correlates with transcription, we classified genes in deciles of expression for each species and assessed the corresponding gene body CpG and CpA methylation levels (Extended Data 4). A clear anti-correlation pattern between transcription and CpA methylation, as measured by Spearman’s correlation, was observed for mammals, birds and the frog. However, this anti-correlation was not evident in opossum, zebrafish, elephant shark and lamprey. The lack of anti-correlation in these species might reflect different cell-type compositions biasing the measurements. The proportion of neurons versus glia depends on the exact brain region and varies in a species-specific manner[37], and species with smaller brains might display higher cell-type heterogeneity in similarly sized samples, as for instance birds have higher neuron densities than mammals[38]. In fact, all four species not showing CpA methylation anti-correlation with transcription show lower levels of CpH methylation on the highest methylated CpH sites (Fig. 2c), which suggests a lower ratio of neurons to glia. Another possible explanation is that the anti-correlation with transcription evolved in tetrapods, and was secondarily lost in opossum. Notably, CpG methylation also shows some degree of anti-correlation with expression levels in most vertebrate brain samples, whereas invertebrates show the typical positive correlation between CpG methylation and transcription.
Extended Data Fig. 4

Anticorrelation between CpG and CpA methylation and transcription is restricted to a subset of vertebrate samples

Distribution of gene body methylation levels on genes separated by expression level on brain tissue. “No expression” category includes all genes with TPM < 1, whereas the rest of genes were classified in 10 deciles of expression (lower expression left, higher expression right). Positive correlation between expression and CpG methylation is restricted to invertebrate brain samples. Boxplot centre lines are medians, box limits are quartiles 1 (Q1) and 3 (Q3), whiskers are 1.5 × interquartile range (IQR).
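
The gene classification described in this caption, together with a rank correlation between expression and gene body methylation, can be sketched in Python as follows. The TPM < 1 "No expression" class and the 10 expression deciles come from the caption; the tie handling, function names and input layout are assumptions of this example, and the Spearman coefficient is implemented from its textbook definition (Pearson correlation of average ranks) rather than taken from the study's code.

```python
def expression_bins(tpm_by_gene, n_bins=10, min_tpm=1.0):
    """Assign bin 0 to genes with TPM < min_tpm and bins 1..n_bins to the rest."""
    bins = {gene: 0 for gene, tpm in tpm_by_gene.items() if tpm < min_tpm}
    expressed = sorted(
        (gene for gene, tpm in tpm_by_gene.items() if tpm >= min_tpm),
        key=tpm_by_gene.get,
    )
    for rank, gene in enumerate(expressed):
        bins[gene] = 1 + rank * n_bins // len(expressed)
    return bins


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation; undefined (division by zero) if x or y is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

A negative coefficient between expression and CpA methylation would correspond to the anti-correlation reported for a subset of vertebrate samples, and a positive coefficient for CpG methylation to the pattern reported for invertebrates.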

Despite the existence of differences in cell type composition across the brains of different species, we reasoned that common methylation patterns should be observable across species if similar pathways are regulated in a similar manner across most neural cells. In fact, distinct brain regions show similar CpH methylation patterns in mammals[39]. Consistently, transcriptional and enhancer landscapes at the organ and tissue level are conserved across vertebrates[40,41]. To test if methylation patterns are conserved, we classified all genes in each species into 10 deciles based on the weighted average of CpG and CpA methylation along the gene body (Extended Data 5). For each hypermethylated and hypomethylated gene subset (top and bottom decile), we obtained Gene Ontology (GO) enrichments (Fig. 3, Supplementary Table 1). Hypomethylated genes in the CpG context largely represent developmental genes, predominantly transcription factors. Such genes are found in methylation canyons or valleys, where lack of methylation in the gene body and surrounding regions is mediated by histone modifications such as H3K27me3 and H3K4me3[42,43]. These same GOs appear enriched in non-brain samples, suggesting that CpG methylation valleys are shared across tissues (Extended Data 3). In contrast, highly methylated genes in the CpG context did not show deeply conserved GO patterns, and the few GOs that appear in more than one species have housekeeping functions. On the contrary, hypermethylated genes in the CpA context belong to developmental functions across all vertebrates (Fig. 3), and many are related to signaling pathways, cell adhesion, or cell differentiation. On the other hand, genes with the lowest levels of CpA methylation have housekeeping functions. Unlike with CpG methylation, non-brain samples do not recapitulate any of these CpA enrichments (Extended Data 3). 
Nevertheless, CpA and CpG methylation patterns are not completely unlinked: genes in the lowly methylated categories of both contexts overlap extensively (Extended Data 5), which suggests that hypomethylated genes are protected from methylation by restricted access of DNA methyltransferases[44,45]. In contrast, the developmental genes that are CpG hypomethylated and CpA hypermethylated show very little overlap, which is indicative of differential removal or deposition of methylated cytosines occurring in these regions. Invertebrates do not exhibit conservation of these patterns. Surprisingly, birds show higher conservation of GOs for genes methylated in the CpA context than for the CpG context (Fig. 3), suggesting that CpG methylation state is not maintained yet CpA methylation is deposited in a conserved set of genes.
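
The decile classification used in this analysis can be sketched in Python as follows. The gene body level is computed here as a read-weighted average, that is, methylated reads summed over all sites in the gene divided by total reads over those sites. This is one common reading of "weighted average" and is an assumption of this example, as are the function names and input layout.

```python
def weighted_level(sites):
    """Read-weighted gene body methylation level.

    sites: list of (methylated_reads, total_reads) for one gene body.
    Returns None when the gene has no covered sites.
    """
    methylated = sum(m for m, _ in sites)
    covered = sum(c for _, c in sites)
    return methylated / covered if covered else None


def methylation_deciles(level_by_gene, n_bins=10):
    """Rank genes by methylation level and assign bins 1 (lowest) to n_bins (highest)."""
    genes = sorted(
        (gene for gene, level in level_by_gene.items() if level is not None),
        key=level_by_gene.get,
    )
    return {gene: 1 + i * n_bins // len(genes) for i, gene in enumerate(genes)}


def genes_in_decile(deciles, decile):
    """Set of genes assigned to the requested decile."""
    return {gene for gene, d in deciles.items() if d == decile}
```

Running the same procedure once with CpG counts and once with CpA counts gives the hypomethylated (decile 1) and hypermethylated (decile 10) gene sets that are compared between contexts and submitted to GO enrichment in the text.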
Extended Data Fig. 5

Gene classification by CpA and CpG methylation levels

(a) Distribution of gene body methylation levels on genes classified in deciles from lower to higher methylation levels. Few genes are CpG methylated in the honeybee (only 3 top deciles). The dynamic range of CpG gene body methylation of lampreys and birds differs from the rest of vertebrates, in which a vast majority of genes are highly methylated (>50%). Boxplot centre lines are medians, box limits are quartiles 1 (Q1) and 3 (Q3), whiskers are 1.5 × interquartile range (IQR). (b) Overlap between top and bottom decile genes classified by CpA and CpG gene body methylation levels. All deciles have the same size, thus overlap % captures the relative differences between categories in a comparable manner. (c) Level of conservation of gene sets classified by CpA and CpG gene body methylation levels. If a given orthologue is present in one subset of genes in only one species it is classified as a Singleton (1), whereas if it is found in the nine vertebrate species analyzed it is classified as 9. Each orthologue is counted once per species (e.g. if lamprey has 2 species-specific paralogues of one gene, it is only counted as 1).
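
Panels b and c of this figure rest on two simple set operations, sketched below in Python. The overlap is expressed as a percentage of one decile (deciles have equal size, as noted above), and conservation is the number of species in which an orthologue falls in a given gene set, counting each orthologue once per species even when several paralogues are present. The orthogroup identifiers, species labels and function names are hypothetical.

```python
from collections import Counter


def overlap_percent(genes_a, genes_b):
    """Percentage of genes in genes_a that are also in genes_b."""
    genes_a, genes_b = set(genes_a), set(genes_b)
    return 100 * len(genes_a & genes_b) / len(genes_a)


def conservation_counts(gene_set_by_species):
    """Number of species in which each orthologue appears in the gene set.

    gene_set_by_species: dict species -> iterable of orthogroup identifiers;
    repeated identifiers (species-specific paralogues) count only once.
    Returns (species count per orthologue, histogram of those counts).
    """
    per_orthologue = Counter()
    for orthogroups in gene_set_by_species.values():
        for orthogroup in set(orthogroups):
            per_orthologue[orthogroup] += 1
    histogram = Counter(per_orthologue.values())
    return per_orthologue, histogram
```

In the histogram, the key 1 corresponds to the Singleton class and the key equal to the number of species analysed corresponds to orthologues shared by all of them.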

Fig. 3

Conserved non-overlapping programs are associated with CpH and CpG methylation.

a, Gene Ontology enrichments for genes showing the highest and lowest gene body methylation levels in the CpA context, as defined by belonging to the top and bottom deciles in each species. b, Gene Ontology enrichments for genes showing the highest and lowest methylated levels in the CpG context. Q-values were obtained using the g:SCS algorithm implemented in the gProfiler2 R package.

To corroborate the functional patterns gathered by GO analysis, we measured the CpG and CpA methylation levels of genes classified by gene family or function. Methylation levels on transcription factors, signaling molecules, synaptic genes and ribosomal proteins (Supplementary Fig. 1), showed overall consistent patterns with the GO analysis approach. Among the orthologues found in the highly methylated CpA category across species (≥7 species, Supplementary Table 2) there are signaling molecules (WNT16, BMP7) and transcription factors (FOXP2, EOMES/TBR2, GLI3, PROX1, SOX6, SALL1) that have been previously shown to be involved in neural progenitor cell maintenance and differentiation. Furthermore, these sets of conserved CpA methylated genes show declining gene expression in adult stages in the brains of mammals and birds compared to earlier developmental stages (Extended Data 6). Therefore, CpA methylation accumulates on a conserved subset of developmental genes across the vertebrate lineage, likely marking and contributing to silencing genes no longer required in the fully developed adult brain.
Extended Data Fig. 6

Expression level of highly conserved CpA methylated genes

Standardized expression level for genes conserved in at least 7 vertebrate species as belonging to the top decile of CpA methylated genes. Boxplot centre lines are medians, box limits are quartiles 1 (Q1) and 3 (Q3), whiskers are 1.5 × interquartile range (IQR).

DNMT3A is the ancestral writer of neural CpH methylation in vertebrates

Given that the establishment of CpH methylation coincided with the origin of vertebrates, a new “writer” able to deposit CpH methylation should have also evolved concomitantly. In mammals, DNMT3A is responsible for neural CpH methylation[10,13], whereas CpH methylation in stem cells is mediated by DNMT3B[46]. To gain an evolutionary perspective on the distribution and origin of these genes, we performed a phylogenetic analysis of DNMT3 enzymes in animals (Fig. 4a, Extended Data 7). While invertebrate genomes typically contain a single DNMT3 gene, DNMT3A and DNMT3B evolved at the root of vertebrates. DNMT3A and DNMT3B are located in syntenic regions (Supplementary Fig. 2), confirming that they represent ohnologues: paralogues produced by the two ancestral rounds of whole genome duplication (WGD) in vertebrates, as previously reported[47,48]. More unexpectedly, we found that DNMT3L, a degenerate paralogue with a non-catalytic methyltransferase domain[49], is present in two lamprey genomes and non-avian reptiles, suggesting it might be the third ohnologue derived from the WGD (Fig. 4a, Extended Data 7). However, not all DNMT3 ohnologues are widely retained across vertebrates; lampreys and amphibians do not encode a DNMT3B copy (Fig. 4c). Given that both lineages have neural CpH methylation, only DNMT3A orthologues can have a role as writers of CpH methylation in these lineages. This, in turn, would support an ancestral role of DNMT3A in neural CpH methylation. Consistently, zebrafish DNMT3A orthologues have been shown to be expressed in brain tissues[50], and we detect DNMT3A transcripts in all vertebrate brain samples (Extended Data 7). Furthermore, the differential deposition patterns of CpH methylation in neural and stem cells seem to have been mediated by changes in the PWWP domain in DNMT3A and DNMT3B ohnologues after gene duplication (Supplementary Fig. 3).
In summary, the phylogeny and distribution of DNMT3 paralogues suggest that DNMT3A was the ancestral “writer” of neural CpH methylation in vertebrates.
Fig. 4

Vertebrate origins of MECP2 and DNMT3A.

a, Maximum likelihood phylogenetic tree of DNMT3 genes in animals. Nodal supports represent 100 bootstrap nonparametric replications. Schematic protein domain configurations shown for each clade. PWWP, Pro-Trp-Trp-Pro motif domain (PF00855). AAD ATRX, DNMT3, DNMT3L domain. MT, cytosine Methyltransferase domain (PF00145). CH, Calponin Homology domain (PF00307). Asterisk highlights arctic lamprey sequences. Broken domains indicate that the domain has large deletions in the given clade. b, Maximum likelihood phylogenetic tree of the Methyl-CpG Binding Domain family in animals. Nodal supports represent 100 bootstrap nonparametric replications. On the right, protein domain structure of each clade, as defined by Pfam domains. MBD, Methyl Binding Domain (PF01429). HhH-GPD, Thymine glycosylase (PF00730). MBDa, p55-binding region of MBD2/3 (PF16564). MBD_C, MBD2/3 C-terminal domain (PF14048). zf-CXXC, zinc finger (PF02008). CTD, MECP2 C-Terminal Domain. TRD, MECP2 Transcriptional Repression Domain. Asterisks highlight vertebrate sequences, percentages are shown for amino acid MBD identity between lamprey and human orthologues. c, Distribution of MECP2/MBD4 and DNMT3 genes across animal lineages. Absence of a dot indicates gene absence. Numbers indicate those species/lineages that have multiple copies of a given gene. Dnmt3c in rodents and dnmt3ba/bb.1/bb.2 are lineage-specific duplications of DNMT3B that have diverged in their function or domain architecture. “x3” indicates lineage-specific duplications. On the right, the phylogenetic relationships among animal lineages. d, Stepwise evolution of the MeCP2 and MBD4 protein domains in vertebrates, amphioxus, and non-chordates. NID stands for the N-CoR/SMRT interacting amino acids. e, Genome browser snapshot of amphioxus MBD4 locus. The longer isoform with the capacity to repair DNA has higher expression in embryonic samples, see further detail in Extended Data 10.

Extended Data Fig. 7

Phylogeny and expression of DNMT3 enzymes

(a) Maximum likelihood phylogenetic tree of DNMT3 orthologues across animals, representing the full version of that presented in Figure 4a. Nodal supports represent 100 bootstrap nonparametric replications. Schematic protein domain configurations shown for each clade. PWWP, Pro-Trp-Trp-Pro motif domain (PF00855). AAD ATRX, DNMT3, DNMT3L domain. MT, cytosine Methyltransferase domain (PF00145). CH, Calponin Homology domain (PF00307). Asterisk highlights arctic lamprey sequences. Broken domains indicate that the domain has large deletions in the given clade. (b) Table with the steady-state transcriptional level of DNMT3A in vertebrate samples, and DNMT3 in invertebrate samples. In contrast to previous analyses of the DNMT3 family, here we describe for the first time the presence of DNMT3L in non-mammalian genomes. These include non-avian reptiles (turtles, crocodiles and squamates) and two lamprey genomes. This indicates that DNMT3L was one of the ancestral ohnologues produced by the ancestral vertebrate WGD. Interestingly, both lamprey and tetrapod sequences show a truncated cytosine methyltransferase domain, which might indicate that DNMT3L has been conserved despite its lack of catalytic activity.

MeCP2 evolved as CpH reader from an ancestral DNA repair protein

In mammals, the silencing capacity of CpH methylation has been attributed to the methylation “reader” MeCP2[34]. MeCP2 is a Methyl-CpG Binding Domain (MBD) containing protein, capable of binding both methylated CpG and CpA dinucleotides[34,51]. Furthermore, MeCP2 has been shown to bind methylated CAC in vitro and in vivo, the most common context of CA methylation in the brain[51,52]. To better understand if CpH methylation co-evolved with MeCP2, we performed a phylogenetic analysis of MBD proteins in animals (Fig. 4b, Extended Data 8). We found that MeCP2 is deeply conserved in all vertebrates, including lampreys and chondrichthyans. MeCP2 branches as a sister group to the MBD4 family, as reported previously[53]. MBD4 is conserved across vertebrates; however, it is associated with DNA repair and not gene regulation[54], implying that MeCP2 evolved through duplication of an ancestral MBD4-like gene.
Extended Data Fig. 8

Phylogeny and conservation of MBD4/MECP2

(a) Maximum likelihood phylogenetic tree of the Methyl-CpG Binding Domain family in animals, representing the full-version of Figure 4b. Nodal supports represent 100 bootstrap nonparametric replications. On the right, protein domain structure of each clade, as defined by Pfam domains. MBD, Methyl Binding Domain (PF01429). HhH-GPD, Thymine glycosylase (PF00730). MBDa, p55-binding region of MBD2/3 (PF16564). MBD_C, MBD2/3 C-terminal domain (PF14048). zf-CXXC, zinc finger (PF02008). CTD, MECP2 C-Terminal Domain. TRD, MECP2 Transcriptional Repression Domain. (b) Domain presence in MBD4/MECP2 orthologues in several invertebrate genomes. Lack of the Thymine glycosylase domain is likely due to incomplete gene annotation or genome assembly gaps.

Besides the conserved MBD domain, MeCP2 has vastly diverged from the ancestral invertebrate MBD4-like family. Whereas MBD4 contains a C-terminal glycosylase domain, involved in mismatch repair of CpG dinucleotides, MeCP2 harbors a transcriptional repression domain (TRD) and a C-terminal domain. The TRD is known to interact with multiple histone modifying complexes associated with transcriptional silencing, such as Sin3, CoREST and N-CoR[55-57]. Most surprisingly, we found that many parts of the TRD are conserved beyond vertebrates, being found in the amphioxus MBD4/MECP2 orthologue (Fig. 4d, Extended Data 9), which represents an intermediate step between MBD4 and MeCP2. Moreover, we found that amphioxus transcribes a longer MBD4/MECP2 isoform that includes the glycosylase domain involved in DNA repair and a shorter isoform lacking this domain. When assessing the isoform usage across developmental stages and tissues in amphioxus, we found that the longer MBD4/MECP2 isoform is preferentially expressed in developmental samples, whereas the short version is predominant in adult tissues (Fig. 4e, Extended Data 10). This suggests that MBD4/MECP2 in amphioxus has DNA repair functions predominantly during development, and gene regulatory activities in adult tissues, and thus a dual function achieved using alternative isoforms. In vertebrates, MeCP2 could have evolved and specialised as a consequence of gene duplication linked to WGD, in which one of the MBD4-like duplicated loci lost the glycosylase domain and gained a new C-terminal domain restricting it to gene regulation, whereas the other copy lost the TRD and maintained the glycosylase domain, reverting to the pre-chordate MBD4 domain architecture specialised in DNA repair.
Extended Data Fig. 9

Conservation of the MeCP2 protein domains

(a) Amino acid multi-sequence alignment (MAFFT E-INS-i mode) of the Methyl-CpG Binding domain (MBD) from MeCP2, MBD4 and invertebrate MECP2/MBD4 sequences. The black square highlights the MBD domain as defined by Pfam. The red triangles indicate positions mutated in the human MECP2 gene that cause Rett Syndrome phenotypes[52]. (b) Amino acid multi-sequence alignment of the Transcriptional Repression Domain (TRD) from MeCP2, MBD4 and the homologous region (C-terminal of the MBD) of invertebrate MBD4/MECP2 proteins. NID stands for the N-CoR/SMRT interacting amino acids. Additional black squares highlight the AT-hook domains. Alignment visualised using Geneious software.

Extended Data Fig. 10

MBD4/MECP2 isoform expression in the European amphioxus

Diagram representing the sequences used to uniquely map RNA-seq reads to each isoform across different tissues and developmental stages. Quantification of each isoform in each sample, normalised by gene length (TPM as per Kallisto quantification).

These changes in protein structure and function must have imposed new functional constraints on MeCP2. Since the MeCP2 protein is expressed at histone-like levels and is proposed to partially substitute for H1 in neurons[58], high levels of conservation in MeCP2 would be expected. Consistently, we found that the MBD had 70% identity between lamprey and human MeCP2 orthologues, but only ~40% identity between MBD4 orthologues (Fig. 4b). In contrast, the MBD in amphioxus MBD4/MECP2 is quite divergent from both MBD4 and MeCP2 (Extended Data 9), suggesting that it does not have the capacity to bind CpH methylation like MeCP2, which is consistent with the lack of CpH methylation in the amphioxus neural tube (Fig. 2). Also influencing DNA binding specificity, MeCP2 harbours two AT-hook motifs[51,59], which are conserved across vertebrates and amphioxus MBD4/MECP2 (Extended Data 9). Thus, the binding specificities of MeCP2 evolved in a stepwise manner, first gaining the AT-hooks in the MBD4-like chordate ancestor, and then acquiring the vertebrate MBD capacity to bind CpH methylation, which became fixed after the subfunctionalization of MeCP2.

Discussion

Here we show how a functionally conserved new layer of epigenomic regulation was assembled at the origin of the vertebrate lineage (Fig. 5). Neural CpH methylation evolved from gene machinery ancestrally involved in CpG methylation. Although CpH methylation has distribution patterns that do not overlap with those of CpG methylation, it is not fully independent of CpG methylation, as it is deposited by DNMT3 enzymes able to methylate both sequence contexts. Furthermore, CpH methylation is read by MeCP2, which also binds CpG methylation. This scenario contrasts with that of plants, in which the different contexts of cytosine methylation are fully uncoupled. In plants, specialised DNMTs are responsible for CpG and CpH methylation deposition and maintenance, and CpH methylation is largely restricted to transposable elements[60]. Nevertheless, there is extensive cross-talk between CpH and CpG methylation in plants, since CpG gene body methylation is lost in species that have lost CMT3, a DNMT that methylates the CHG context[61]. In contrast, such a dual readout of CpG and CpH methylation seems to be absent from invertebrate genomes, as CpH methylation is very scarce. Here we show that the brain methylomes of amphioxus, honeybee and octopus are depleted of CpH methylation, as their low levels of CpH methylation cannot be distinguished from non-conversion rates. Furthermore, invertebrates lack the functionally consistent pattern of CpH methylation deposition on gene bodies that is observed in vertebrates. Therefore, it is likely that previous reports of CpH methylation in invertebrate genomes are due to off-target activity of DNMT3[62,63], suggesting that CpH methylation in invertebrates is not fully constituted as an autonomous epigenomic layer.
Fig. 5

The assembly of neural-CpH methylation.

Cladogram representing the evolutionary scenario of neural CpH methylation acquisition in vertebrates. Silhouettes of octopus and honeybee obtained from phylopic.org.

We hypothesize that the evolution of MeCP2 was instrumental in the fixation of CpH methylation as a regulatory mark in the brain. CpH methylation could have originally accumulated in neurons simply as a by-product of the lack of DNA replication. However, the capacity of MeCP2 to specifically read CpH methylation could have enabled and reinforced the silencing roles of CpH methylation as a hub for chromatin silencing in a pathway partially independent of CpG methylation. In fact, mice that preserve neural CpG methylation patterns but lack CpH methylation recapitulate the transcriptional deregulation caused by MeCP2 loss[39], suggesting that CpH methylation is what drives the specific roles of MeCP2 in the brain. Furthermore, mice encoding a modified MeCP2 version lacking the ability to bind to methylated CpA (while still preserving the capacity to bind to methylated CpGs) show Rett syndrome-like phenotypes[52]. Our finding that CpH methylation and MeCP2 evolved concomitantly argues in favour of a key role of this epigenomic layer in neural functions across the whole vertebrate lineage. Although we do not know at which developmental time point CpH methylation is deposited in most vertebrate lineages, or the MeCP2 binding patterns in most species, we speculate that the CpH roles in neural maturation and memory formation described in mammals could extend to all vertebrates. Recent evidence suggests that the ancestral whole genome duplication may not have had an impact on the evolution of CpG hypermethylation in vertebrates[64]; however, it allowed the emergence of neural CpH methylation. DNMT3 paralogues that are specialised in different functions emerged after duplication, as exemplified by DNMT3A methylating CAC trinucleotides in neural tissues whereas DNMT3B methylates CAG trinucleotides in stem cells[46]. 
In the case of MeCP2 and MBD4, the duplication allowed the specialisation of both copies to perform unique functions, which was only partially attained in amphioxus through differential usage of isoforms, as previously observed for a vertebrate neural-specific splicing factor[65]. Therefore, our work unveils the stepwise assembly of a critical regulatory novelty in vertebrate brains. This novelty likely had an impact on the complexity of behaviours and cognitive processes found across the vertebrate lineage.

Methods

Brain DNA collection

Arctic lamprey (Lethenteron camtschaticum) and elephant shark (Callorhinchus milii) forebrains were collected from frozen samples belonging to adult animals collected in Hokkaido, Japan and Queenscliff, Victoria, Australia, respectively. Chicken (Gallus gallus) and zebrafish (Danio rerio) forebrains were collected from adult individuals reared in the CABD, Spain, with approval from the Ethical Committees of the University Pablo de Olavide, CSIC and the Andalucían government. The platypus (Ornithorhynchus anatinus) frontal lobe cortex and the gray short-tailed opossum (Monodelphis domestica) brain samples were obtained from adult male frozen samples according to the University of Adelaide biosafety and ethics committee regulations (Institutional Biosafety Committee, Dealing ID 12713, permits ID1111998.2, NPWS A193 and ID1814535.1). Mediterranean amphioxus (Branchiostoma lanceolatum) neural tubes were dissected from 6 adults collected in Argeles-sur-Mer, France with special permission provided by the Prefect of Region Provence Alpes Côte d’Azur. For the honeybee (Apis mellifera), a whole brain from an adult egg-laying queen was collected at the University of Western Australia. California two-spot octopus (Octopus bimaculoides) samples were obtained from a single adult female octopus in compliance with the EU Directive 2010/63/EU guidelines on cephalopod use and the University of Chicago Animal Care and Use Committee. Both the supraesophageal and subesophageal brains from the octopus were dissected as previously described[29]. Genomic DNA was purified using the DNeasy Blood & Tissue Kit (Qiagen) and phenol-chloroform DNA extraction methods.

Whole Genome Bisulfite Sequencing

We followed the MethylC-seq protocol for library preparation[66]. In brief, for each species, 500 ng to 1 μg of brain genomic DNA was mixed with 0.1% to 0.5% (w/w) of unmethylated lambda phage genomic DNA. The mixed DNA was sheared into 200 bp fragments using a Covaris Sonicator S220. Methylated Illumina adaptors (Nextflex Bisulfite-seq adaptors, BIOO Scientific) were then ligated to the sheared DNA, and bisulfite conversion was performed using the EZ DNA Methylation-Gold kit (Zymo Research) following the manufacturer’s instructions. After bisulfite treatment, DNA was purified and amplified using universal Illumina primers and KAPA HiFi HotStart Uracil+ DNA polymerase (Kapa Biosystems). The honeybee library was obtained using the same protocol with minor modifications: the MethylCode Bisulfite Conversion Kit (Thermo Fisher) was used for bisulfite conversion and the PfuTurbo Cx Hotstart DNA Polymerase (Agilent) was used for library amplification. All libraries except the honeybee and amphioxus samples were sequenced on an Illumina HiSeq 1500 instrument in single-end mode, with reads spanning 100 bp. The honeybee samples were sequenced on an Illumina Genome Analyzer IIx in single-end mode, with reads spanning 84 bp, and the amphioxus samples were sequenced on a NovaSeq 6000 in a paired-end 28-87 bp format.

Methylation analysis

The newly generated WGBS datasets were complemented with available data from previous studies[7,18,24,67], corresponding to the NCBI Sequence Read Archive (SRA) accessions SRX314948 for 6-week-old mouse frontal cortex, SRX306585 for 25-year-old human frontal cortex, SRX1002603 for zebrafish (Danio rerio) adult brain, SRX1162705 for Xenopus tropicalis adult brain, SRX2645741 for elephant shark liver and SRX1064224 for great tit (Parus major) adult whole brain. All WGBS reads were trimmed using fastp[68] with default parameters and mapped to the reference genomes using BS-Seeker2[69], specifying Bowtie 2[70] as the aligner in end-to-end mode. Duplicated reads were discarded using Sambamba[71], unconverted reads were filtered out using the XS:i:1 SAM tag from BS-Seeker2, and methylation calls were obtained using CGmapTools[72]. Previously processed WGBS datasets for Ciona intestinalis and the sea anemone Nematostella vectensis were obtained from Gene Expression Omnibus (GEO) GSE19824[73] and GSE124016[64]. Methylated CpG sites are prone to deamination, and after deamination a symmetric CpG site becomes a non-symmetric CpA site. Therefore, some CpA positions in the reference genomes are likely to represent genetic variants for which some individuals carry CpG dinucleotides. Distinguishing those sites is crucial to accurately measure CpH methylation and to avoid mistaking hypermethylated variant CpG sites for CpA positions. Therefore, the ATCGmap file resulting from CGmapTools was parsed with AWK to identify CpH sites with ≥20% of reads supporting a guanine in the downstream position of a methylated cytosine. Those positions were discarded from the final CGmap file. Single Nucleotide Variants were obtained using the CGmapTools ‘snv’ function (-m bayes --bayes-dynamicP parameters) from the WGBS ATCGmap file. For each SNV position, the upstream and downstream dinucleotides based on the reference genome were obtained using BEDTools[74]. 
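The variant filter described above can be illustrated with a minimal Python sketch. It assumes that, for each CpH site, the reads supporting each base at the position immediately downstream of the cytosine have already been tallied; the `CpHSite` record and its field names are hypothetical and do not reflect the ATCGmap column layout or the original AWK script. Only the ≥20% guanine threshold is taken from the text.

```python
from dataclasses import dataclass


@dataclass
class CpHSite:
    """Hypothetical record: one CpH cytosine plus read counts at the next base."""
    chrom: str
    pos: int
    downstream_counts: dict  # base -> number of reads supporting it downstream


def is_likely_cpg_variant(site, min_g_fraction=0.20):
    """True if at least min_g_fraction of downstream reads support a guanine."""
    total = sum(site.downstream_counts.values())
    if total == 0:
        return False  # no read evidence either way, so the site is kept
    return site.downstream_counts.get("G", 0) / total >= min_g_fraction


def filter_cph_sites(sites):
    """Drop CpH sites that look like segregating CpG variants."""
    return [s for s in sites if not is_likely_cpg_variant(s)]


sites = [
    CpHSite("chr1", 100, {"A": 18, "G": 2}),  # 10% G, kept
    CpHSite("chr1", 250, {"A": 12, "G": 8}),  # 40% G, discarded
    CpHSite("chr2", 75, {"A": 16, "G": 4}),   # 20% G, discarded (threshold is inclusive)
]
kept = filter_cph_sites(sites)
print([(s.chrom, s.pos) for s in kept])
```

Treating sites without downstream coverage as retained is a choice made for this sketch; the text does not say how such sites were handled.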
To estimate methylation heterogeneity in each sample, we followed the Proportion of Discordant Reads (PDR) measure previously proposed for heterogeneous tumour samples[75]. We first selected CpG positions for which coverage was ≥10, and filtered for those that had at least 3 CpGs within ± 40 bp. Then we randomly selected 100,000 of these CpGs in every genome (sample function in R) and obtained the per-read methylation levels of the reads that overlapped these positions. We only retained CpGs that had at least 5 reads covering ≥4 CpGs. Fully methylated and fully unmethylated reads were counted as concordant, whereas reads with intermediate methylation were counted as discordant. CGmap files were imported into R using the bsseq package[76], and all methylation calculations were performed using the in-built functions getCoverage and getMeth. CpH methylation was initially calculated for each dinucleotide context to obtain the global levels (mC/C); however, gene body level calculations were restricted to CpA dinucleotides, since CpA is the predominant context. For each species, CpH positions were sorted by methylation level (mC/C), and the top 10,000 were selected to have a comparable number across species. The neighbouring regions were obtained using BEDTools in a strand-specific manner, and collapsed into sequence motifs with ggseqlogo in R[77]. Protein-coding genes were classified into deciles according to CpA and CpG methylation levels along the gene body. Gene body methylation levels were obtained as a weighted average, dividing the sum of all methylated cytosine calls in a given region by the total coverage at the C positions. Genes that did not have enough covered CpG positions (≥30) and mean coverage (≥4x) were discarded.
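As an illustration of the two calculations described above, the following Python sketch computes a PDR value for a single CpG and a coverage-weighted methylation level for a region. It is a simplified reading of the text rather than the original R code: reads are assumed to be already reduced to lists of per-CpG calls, and the function names and input layout are hypothetical.

```python
def read_is_discordant(calls):
    """calls: per-CpG methylation calls on one read (True = methylated).

    A read is concordant if fully methylated or fully unmethylated,
    and discordant if it carries a mixture of both states.
    """
    return 0 < sum(calls) < len(calls)


def pdr(reads, min_cpgs_per_read=4, min_reads=5):
    """Proportion of discordant reads among the reads overlapping one CpG.

    Only reads covering at least min_cpgs_per_read CpGs are informative;
    returns None when fewer than min_reads such reads are available.
    """
    informative = [r for r in reads if len(r) >= min_cpgs_per_read]
    if len(informative) < min_reads:
        return None
    return sum(read_is_discordant(r) for r in informative) / len(informative)


def weighted_methylation(sites):
    """sites: (methylated_reads, total_reads) per cytosine in a region.

    Returns sum(mC) / sum(coverage), so deeply covered positions weigh more.
    """
    sites = list(sites)
    coverage = sum(total for _, total in sites)
    if coverage == 0:
        return None
    return sum(meth for meth, _ in sites) / coverage


reads = [
    [True, True, True, True],      # concordant (fully methylated)
    [False, False, False, False],  # concordant (fully unmethylated)
    [True, True, False, True],     # discordant
    [False] * 5,                   # concordant
    [True, False, False, False],   # discordant
    [True, True],                  # ignored: covers fewer than 4 CpGs
]
print(pdr(reads))
print(weighted_methylation([(8, 10), (1, 10), (0, 20)]))
```

In this toy input, two of the five informative reads are discordant, and the region has 9 methylated calls over a total coverage of 40.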

Gene Ontology enrichments

Gene Ontology (GO) enrichments were obtained with g:Profiler[78] through the gProfileR R package, using Ensembl gene ids. For the arctic lamprey and the elephant shark, which were not present in the g:Profiler database, OrthoFinder2[79] was used to obtain orthology relationships with human genes. Then, gene ids from both species and each decile were converted to human gene ids, which were used to obtain GO enrichments using g:Profiler with ‘hsapiens’, limiting the background to all the human genes detected in each orthology search. Significance was corrected with the g:Profiler in-built g:SCS algorithm. The final set of GOs shown in Fig. 3 represents GOs that are enriched in the maximum number of species and are non-redundant. The full lists of GOs and KEGG pathways for each species and comparison are found in Supplementary Table 1.

RNA-seq analysis

Brain RNA-seq reads from previous publications[7,18,25,29,40,65,80,81] were downloaded from SRA. SRX314972 was used for human adult frontal cortex, SRX314992 for mouse adult prefrontal cortex, SRX081894 for opossum brain, SRX081882 for the platypus brain, SRX081869 for the chicken brain, SRX904626 for the great tit brain, SRX191164 for Xenopus tropicalis brain, SRX4184230 for zebrafish adult forebrain, SRX154851 for elephant shark brain, SRX2267405 for the Arctic lamprey brain, SRX1045432 for the octopus supraesophageal brain, and PRJNA416866 for all amphioxus tissues. For the honeybee brain, we extracted matched DNA and RNA samples from workers and queens using a Trizol extraction protocol and prepared Illumina stranded TruSeq RNA-seq libraries, which were sequenced on an Illumina Genome Analyzer IIx. Kallisto[82] was then used to quantify gene expression, based on the canonical isoform for each gene as per ENSEMBL annotations. For genomes without ENSEMBL annotation, we used the isoform that encoded the longest open reading frame. Developmental time-series from human, mouse, opossum and chicken were downloaded from https://apps.kaessmannlab.org/evodevoapp/ [83]. Gene expression was standardized for each gene by dividing the RPKM value by the maximum expression level of that gene. To determine isoform usage in the amphioxus MBD4 locus, we gathered the non-overlapping regions between the short and the long MBD4 isoforms, added 100 padding N bases (to allow mapping of paired-end reads) and built a transcriptome index using Kallisto[82], which was also used to quantify isoform abundance without using reads from the sequence common to both isoforms.
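The per-gene standardization of the developmental time-series can be sketched in a few lines of Python. This is an illustrative example only: the dictionary layout and gene names are hypothetical, and setting genes with no detectable expression to zero is an assumption of the sketch, since the text does not describe that case.

```python
def standardise_by_max(expression):
    """Scale each gene's RPKM values by that gene's maximum across samples.

    expression: dict mapping gene id -> list of RPKM values (one per stage).
    After scaling, the stage with peak expression has value 1.0.
    """
    scaled = {}
    for gene, values in expression.items():
        peak = max(values)
        if peak > 0:
            scaled[gene] = [v / peak for v in values]
        else:
            # Assumption: genes never expressed are kept as all zeros.
            scaled[gene] = [0.0 for _ in values]
    return scaled


rpkm = {
    "geneA": [2.0, 8.0, 4.0],  # peaks at the second stage
    "geneB": [0.0, 0.0, 0.0],  # never expressed
}
print(standardise_by_max(rpkm))
```

Scaling by the maximum puts genes with very different absolute expression on a common 0 to 1 range, so that temporal profiles can be compared across genes and species.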

Gene search and phylogeny

MBD family genes were searched using HMMER3[84] with the Pfam PF01429 model against the proteomes of a representative subset of animal genomes (Supplementary Table 3). Hits were extracted and aligned with MAFFT[85] in L-INS-i mode, and an initial pruning of the alignment was performed to remove members of the SETDB1/2 and BAZ2A/B families, since the MBD in these families is derived and accumulates an excess of mutations. The resulting alignment was then trimmed manually to maximize the number of positions in the MBD and to avoid spuriously aligned regions. The trimmed alignment was then used in IQ-TREE[86] to obtain a maximum likelihood phylogenetic reconstruction, letting the software choose the best-fitting substitution model (-m TEST) and obtaining 100 non-parametric bootstrap replicates to compute nodal supports. Protein domain architectures for each sequence were obtained using HMMER3 with the Pfam-A database using the “hmmscan” program. MECP2 domains not defined in Pfam were obtained from previous publications describing the TRD and CTD[15,55]. TRD and CTD alignments spanning all major vertebrate lineages were used to generate HMM models with the HMMER3 hmmbuild program, and were searched using hmmsearch against the selected animal proteomes. To obtain DNMT3 sequences, we used a BLASTP search with human DNMT3A as the query against the proteomes of all species, selecting the best hits for each species. For species where we could not find a specific ohnologue, we searched in NCBI against the whole clade using BLASTP (e.g. DNMT3B in amphibians) to confirm that the absence was not due to genome assembly incompleteness. Similarly, DNMT3L was searched using BLASTP in NCBI against all lineages except mammals, which detected ohnologues in all reptilian lineages (turtles, crocodilians and squamates) except birds. The resulting sequences were aligned with MAFFT in E-INS-i mode and trimmed using trimAl (-automated1). The phylogenetic tree was computed as for MBDs. 
PWWP alignments were obtained from a subset of full-length DNMT3 sequences, using one representative species for each lineage. The sequences were aligned using MAFFT in L-INS-i mode and the sequence logos were obtained using ggseqlogo in R. The alignments were visualised using Geneious software.
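The search, alignment, trimming and tree-building steps in this section correspond roughly to the command lines assembled in the Python sketch below. It only builds and prints the commands; nothing is executed. All file names are hypothetical placeholders, and apart from `-m TEST`, the 100 bootstrap replicates and `-automated1`, which are stated in the text, the options shown are an assumption about how such a run might be set up, not a record of the commands actually used. Note that the MBD alignment was trimmed manually, so the trimAl step applies to the DNMT3 alignment only.

```python
import shlex


def hmmsearch_cmd(hmm, proteome, table):
    # hmmsearch [options] <hmmfile> <seqdb>; --tblout saves a per-sequence hit table
    return ["hmmsearch", "--tblout", table, hmm, proteome]


def mafft_linsi_cmd(fasta):
    # L-INS-i corresponds to --localpair --maxiterate 1000;
    # MAFFT writes the alignment to standard output
    return ["mafft", "--localpair", "--maxiterate", "1000", fasta]


def trimal_cmd(alignment, trimmed):
    # Automated trimming heuristic, as named in the text
    return ["trimal", "-in", alignment, "-out", trimmed, "-automated1"]


def iqtree_cmd(alignment, bootstraps=100):
    # -m TEST selects the best-fitting substitution model;
    # -b sets the number of non-parametric bootstrap replicates
    return ["iqtree", "-s", alignment, "-m", "TEST", "-b", str(bootstraps)]


# Hypothetical file names, for illustration only
commands = [
    hmmsearch_cmd("PF01429.hmm", "proteomes.fasta", "mbd_hits.tbl"),
    mafft_linsi_cmd("mbd_hits.fasta"),
    trimal_cmd("dnmt3_aligned.fasta", "dnmt3_trimmed.fasta"),
    iqtree_cmd("mbd_trimmed.fasta"),
]
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping each command as an argument list makes it straightforward to pass to `subprocess.run` later without shell quoting problems.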

Locally disordered methylation characterises the lamprey epigenome

Proportion of Discordant Reads (PDR) values for a subset of CpGs (100,000) of each species (See Methods). Boxplot centre lines are medians, box limits are quartiles 1 (Q1) and 3 (Q3), whiskers are 1.5 × interquartile range (IQR).

CpG hypermutability is widespread in vertebrates except the lamprey

Percentage of Single Nucleotide Variants identified from the WGBS libraries relative to the total number of dinucleotides in the reference genome. In pale blue are those proportions that are equal to or lower than expected (total number of SNVs / total number of dinucleotides), and in dark blue are those that are overrepresented. Note that the mouse has very few SNVs because it is an isogenic laboratory line; however, it still shows a slightly higher enrichment for SNVs in CpG dinucleotides, whereas birds have very high SNV rates in CpG dinucleotides despite having intermediate levels of CpG methylation.

CpH methylation is specific to brain tissues across vertebrates.

(a) Sequence motifs found surrounding the most highly methylated CpH positions in each sample. CpH positions were required to have a coverage ≥ 10x. hpf = embryo hours post fertilization. Sox10+ cells correspond to developmental neural crest cells in zebrafish. (b) Gene Ontology enrichments for genes showing the highest and lowest gene body methylation levels in the CpA context, as defined by belonging to the top and bottom deciles in each species and tissue. (c) Gene Ontology enrichments for genes showing the highest and lowest gene body methylation levels in the CpG context.

Anticorrelation between CpG and CpA methylation and transcription is restricted to a subset of vertebrate samples

Distribution of gene body methylation levels on genes separated by expression level on brain tissue. “No expression” category includes all genes with TPM < 1, whereas the rest of genes were classified in 10 deciles of expression (lower expression left, higher expression right). Positive correlation between expression and CpG methylation is restricted to invertebrate brain samples. Boxplot centre lines are medians, box limits are quartiles 1 (Q1) and 3 (Q3), whiskers are 1.5 × interquartile range (IQR).

Gene classification by CpA and CpG methylation levels

(a) Distribution of gene body methylation levels on genes classified in deciles from lower to higher methylation levels. Few genes are CpG methylated in the honeybee (only 3 top deciles). The dynamic range of CpG gene body methylation of lampreys and birds differs from the rest of vertebrates, in which a vast majority of genes are highly methylated (>50%). Boxplot centre lines are medians, box limits are quartiles 1 (Q1) and 3 (Q3), whiskers are 1.5 × interquartile range (IQR). (b) Overlap between top and bottom decile genes classified by CpA and CpG gene body methylation levels. All deciles have the same size, thus overlap % captures the relative differences between categories in a comparable manner. (c) Level of conservation of gene sets classified by CpA and CpG gene body methylation levels. If a given orthologue is present in one subset of genes in only one species it is classified as a Singleton (1), whereas if it is found in the nine vertebrate species analyzed it is classified as 9. Each orthologue is counted once per species (e.g. if lamprey has 2 species-specific paralogues of one gene, it is only counted as 1).

Expression level of highly conserved CpA methylated genes

Standardized expression level for genes conserved in at least 7 vertebrate species as belonging to the top decile of CpA methylated genes. Boxplot centre lines are medians, box limits are quartiles 1 (Q1) and 3 (Q3), whiskers are 1.5 × interquartile range (IQR).

Phylogeny and expression of DNMT3 enzymes

(a) Maximum likelihood phylogenetic tree of DNMT3 orthologues across animals, representing the full version of that presented in Figure 4a. Nodal supports represent 100 nonparametric bootstrap replicates. Schematic protein domain configurations are shown for each clade. PWWP, Pro-Trp-Trp-Pro motif domain (PF00855). ADD, ATRX-DNMT3-DNMT3L domain. MT, cytosine Methyltransferase domain (PF00145). CH, Calponin Homology domain (PF00307). Asterisk highlights arctic lamprey sequences. Broken domains indicate that the domain has large deletions in the given clade. (b) Table with the steady-state transcriptional level of DNMT3A in vertebrate samples, and DNMT3 in invertebrate samples. In contrast to previous analyses of the DNMT3 family, here we describe for the first time the presence of DNMT3L in non-mammalian genomes. These include non-avian reptiles (turtles, crocodiles and squamates) and two lamprey genomes. This indicates that DNMT3L was one of the ancestral ohnologues produced by the ancestral vertebrate WGD. Interestingly, both lamprey and tetrapod sequences show a truncated cytosine methyltransferase domain, which might indicate that DNMT3L has been conserved despite its lack of catalytic activity.


Review 1.  DNA methylation patterns and epigenetic memory.

Authors:  Adrian Bird
Journal:  Genes Dev       Date:  2002-01-01       Impact factor: 11.361

Review 2.  DNA methylation landscapes: provocative insights from epigenomics.

Authors:  Miho M Suzuki; Adrian Bird
Journal:  Nat Rev Genet       Date:  2008-06       Impact factor: 53.242

Review 3.  Function and information content of DNA methylation.

Authors:  Dirk Schübeler
Journal:  Nature       Date:  2015-01-15       Impact factor: 49.962

Review 4.  Non-CG Methylation in the Human Genome.

Authors:  Yupeng He; Joseph R Ecker
Journal:  Annu Rev Genomics Hum Genet       Date:  2015-06-04       Impact factor: 8.929

5.  Hotspots of aberrant epigenomic reprogramming in human induced pluripotent stem cells.

Authors:  Ryan Lister; Mattia Pelizzola; Yasuyuki S Kida; R David Hawkins; Joseph R Nery; Gary Hon; Jessica Antosiewicz-Bourget; Ronan O'Malley; Rosa Castanon; Sarit Klugman; Michael Downes; Ruth Yu; Ron Stewart; Bing Ren; James A Thomson; Ronald M Evans; Joseph R Ecker
Journal:  Nature       Date:  2011-02-02       Impact factor: 49.962

Review 6.  Dynamic DNA methylation: In the right place at the right time.

Authors:  Chongyuan Luo; Petra Hajkova; Joseph R Ecker
Journal:  Science       Date:  2018-09-28       Impact factor: 47.728

7.  Global epigenomic reconfiguration during mammalian brain development.

Authors:  Ryan Lister; Eran A Mukamel; Joseph R Nery; Mark Urich; Clare A Puddifoot; Nicholas D Johnson; Jacinta Lucero; Yun Huang; Andrew J Dwork; Matthew D Schultz; Miao Yu; Julian Tonti-Filippini; Holger Heyn; Shijun Hu; Joseph C Wu; Anjana Rao; Manel Esteller; Chuan He; Fatemeh G Haghighi; Terrence J Sejnowski; M Margarita Behrens; Joseph R Ecker
Journal:  Science       Date:  2013-07-04       Impact factor: 47.728

8.  Genomic distribution and inter-sample variation of non-CpG methylation across human cell types.

Authors:  Michael J Ziller; Fabian Müller; Jing Liao; Yingying Zhang; Hongcang Gu; Christoph Bock; Patrick Boyle; Charles B Epstein; Bradley E Bernstein; Thomas Lengauer; Andreas Gnirke; Alexander Meissner
Journal:  PLoS Genet       Date:  2011-12-08       Impact factor: 5.917

9.  Epigenomic Signatures of Neuronal Diversity in the Mammalian Brain.

Authors:  Alisa Mo; Eran A Mukamel; Fred P Davis; Chongyuan Luo; Gilbert L Henry; Serge Picard; Mark A Urich; Joseph R Nery; Terrence J Sejnowski; Ryan Lister; Sean R Eddy; Joseph R Ecker; Jeremy Nathans
Journal:  Neuron       Date:  2015-06-17       Impact factor: 17.173

10.  Distribution, recognition and regulation of non-CpG methylation in the adult mammalian brain.

Authors:  Junjie U Guo; Yijing Su; Joo Heon Shin; Jaehoon Shin; Hongda Li; Bin Xie; Chun Zhong; Shaohui Hu; Thuc Le; Guoping Fan; Heng Zhu; Qiang Chang; Yuan Gao; Guo-li Ming; Hongjun Song
Journal:  Nat Neurosci       Date:  2013-12-22       Impact factor: 24.884


