Mapping recently identified nucleotide variants in the genome and transcriptome.

Abstract

Nucleotide variants, especially those related to epigenetic functions, provide critical regulatory information beyond simple genomic sequence, and they define cell status in higher organisms. 5-Methylcytosine, which is found in DNA, was until recently the only nucleotide variant studied in terms of epigenetics in eukaryotes. However, 5-methylcytosine has turned out to be just one component of a dynamic DNA epigenetic regulatory network that also includes 5-hydroxymethylcytosine, 5-formylcytosine and 5-carboxylcytosine. Recently, reversible methylation of N6-methyladenosine in RNA has also been demonstrated. The discovery of these new nucleotide variants triggered an explosion of new information in the epigenetics field. This rapid research progress has benefited significantly from timely developments of new technologies that specifically recognize, enrich and sequence nucleotide modifications, as evidenced by the wide application of the bisulfite sequencing of 5-methylcytosine and very recent modifications of bisulfite sequencing to resolve 5-hydroxymethylcytosine from 5-methylcytosine with base-resolution information.

Substances：
Nucleotides

Year: 2012 PMID： 23138310 PMCID： PMC3537840 DOI： 10.1038/nbt.2398

Source DB: PubMed Journal: Nat Biotechnol ISSN： 1087-0156 Impact factor: 54.908

The mammalian genome possesses much more information than a sequence of nucleotides. Each adult human body contains over 200 distinct cell types; yet despite their marked differences in phenotype and function, these cell types share an almost identical genome sequence. Epigenetic modifications play a major role in this diversity. An important epigenetic modification in mammalian genomic DNA is the nucleotide variant 5-methylcytosine (5mC); 5mC regulates gene expression, determines cell development, and affects disease pathogenesis[1,2]. But 5mC is not the only nucleotide variant. During the past three years, three additional cytosine variants were identified in the mammalian genome. In 2009, 5-hydroxymethylcytosine (5hmC) was shown to exist in relatively high abundance in certain mammalian cells and tissues[3,4]. Following this discovery, 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) were revealed in mouse embryonic stem cells (ESCs) and mouse tissues[5-7]. These cytosine derivatives are produced from a stepwise oxidation of 5mC by the ten-eleven translocation (TET) family dioxygenases (Fig. 1, Table 1)[4,6-8]. These new DNA base modifications immediately drew broad attention from the research community and have been extensively reviewed[9-13].

Figure 1

New DNA nucleotide variants, including 5hmC, 5fC, and 5caC. The pattern of DNA methylation is established and maintained by DNA methyltransferases. Demethylation can be passive (e.g. during replication) or active. TET family proteins can oxidize 5mC to 5hmC, 5hmC to 5fC, and then 5fC to 5caC. The oxidation products 5fC and 5caC can be removed by TDG to generate an abasic site. This abasic site can be repaired to a cytosine by the base excision repair (BER) pathway. Alternatively, 5hmC may be deaminated by AID or APOBEC to 5hmU, which can subsequently be removed and repaired by TDG or SMUG1 and then BER, respectively. 5caC may also be removed in a decarboxylation pathway. Solid arrows indicate biochemically validated pathways whereas dotted arrows are pathways yet to be confirmed biochemically. 5hmU has not been detected in the mammalian genome so far.

Table 1

Proteins that deposit, bind to, modify or remove nucleotide variants, and the known genomic locations of some of these nucleotide variants.

Modification	Proteins that deposit the modification	Proteins that modify, remove or bind the modification	Genomic or transcriptomic location
5hmC	TET1-3[4,8]	TET1-3[6,7]	With affinity-based profiling, it is shown to be enriched at TSSs, promoters, exons, CTCF-binding sites and enhancers[18-20,22,61,62,65,66]. With single-base resolution sequencing, it shows highest enrichment at distal regulatory regions, near but not on transcription factor-binding sites[69].
5fC	TET1-3[6]	TET1-3[6], TDG[23,24]	Unknown
5caC	TET1-3[6,7]	TDG[7,23,24]	Unknown
m⁶A in mRNA	MT-A70 (A 70 kD subunit protein in a 200 kD protein complex)[120]	FTO[37], YTHDF2-3 and ELAVL1 (binding proteins)[101]	Enriched around stop codons, in 3′ UTRs and within long internal exons[101,102]
5mC in RNA	NSUN2[92]	Unknown	Enriched in untranslated regions (both 5′ and 3′ UTRs) and near Argonaute-binding regions within mRNA[92]

5mC is generally viewed as a ‘silencing’ epigenetic mark because of the hydrophobic recruitment of methyl-CpG-binding proteins[2]. 5hmC, although carrying a hydrophilic modification, is not simply an ‘activating’ epigenetic mark; it is regarded as an intermediate in an active demethylation pathway[4,6,7,14-16] (Fig. 1) and appears to play complex roles in gene regulation[17-22]. In certain cells or tissues in which 5hmC accumulates to relatively high abundance, it may also have unique functions of its own that directly affect gene expression. Currently, 5fC and 5caC are thought to be strictly demethylation intermediates for two reasons. First, they exist in much lower abundance than 5hmC in mouse ESCs. Second, they are recognized and removed by the DNA glycosylase TDG to yield abasic sites, which are subsequently converted to cytosine through base excision repair (BER) (Fig. 1)[7,23,24]. However, the generation and removal of these further oxidized cytosine derivatives could be regulated to affect gene expression. Box 1 summarizes our current knowledge of the biological functions of these new DNA epigenetic nucleotide variants. The availability of next-generation sequencing technologies that allow for high-throughput and affordable sequencing has significantly accelerated research on nucleic acid modifications[25]. In less than three years, tremendous progress has been made in understanding the biological function of 5hmC as a direct result of the rapid development of 5hmC-detection and sequencing methods[26]. In this review, we discuss methods for detecting nucleotide modifications of potential functional importance, such as 5hmC, 5fC and 5caC in DNA, and we briefly summarize other interesting modifications, such as m6A and 5mC in RNA. To fully describe new technologies and the biological insights revealed through their application to the study of a single nucleotide variant, we focus on 5hmC. 
This review will not cover 5mC in DNA, which is the subject of many other articles[27,28]. Although 5-hydroxymethyluracil (5hmU) is a suggested intermediate in active demethylation mediated through deamination and then BER (Fig. 1)[29,30], it is essentially undetectable in the mammalian genome[15], and a recent biochemistry and cell-based investigation raised questions about the feasibility of the deamination step[31]. Nevertheless, 5hmU behaves similarly to 5hmC in certain aspects. Therefore, many 5hmC detection/profiling methods that we discuss can potentially be applied to future 5hmU detection and profiling, if it does indeed play functional roles in certain biological pathways[32,33].

New nucleotide variants bring technological challenges

The nucleotide variants discussed here generally exist in very low abundance in the genome, ranging from several ppm (parts per million; 1 ppm equals 0.0001%) to less than 1% of the regular nucleotides (A, T, C, G) (Table 2). Therefore, highly selective and sensitive methods with low background noise are required to detect, profile, and sequence these variants. Antibody-based immunoprecipitation approaches typically show bias towards densely modified regions, whereas an ideal method would recognize or enrich every single modification in the genome without bias, thus achieving high sensitivity for scarce modifications.

Table 2

Relative abundance and known tissue locations of new nucleotide modifications.

Modification	Tissues and cell lines	Relative abundance	Genome- or transcriptome-wide profiling methods applied
5hmC in DNA	Mouse ESC	0.1% of cytosine[6]	hMeDIP[18-20,61]; GLIB[22]; anti-CMS[22]; TAB-Seq[69] (single-base resolution)
	Human ESC	Not available	hMeDIP[62]; hMe-Seal[66]; TAB-Seq[69] (single-base resolution)
	Mouse brain tissue	0.4–0.7% of cytosine[6,15]	hMe-Seal[39,65]
	Human brain tissue	Not available	hMeDIP[35]; hMe-Seal[39]
	Other mouse tissue	0.02–0.3% of cytosine[6,15]	Not available
	Human cancer cells	0.03–0.1% of guanine[116]	Not available
	Mouse P19 and 3T3-L1 cells	Not available	hMeDIP[63]
5fC in DNA	Mouse ESC	20 ppm of cytosine[6]	Not available
	Mouse tissues	3-20 ppm of cytosine[6]	Not available
5caC in DNA	Mouse ESC	3 ppm of cytosine[6]	Not available
m⁶A in mRNA	Human HepG2 cells	Not available	anti-m⁶A antibody[101]
	Mouse liver	Not available	anti-m⁶A antibody[101]
	Mouse brain	Not available	anti-m⁶A antibody[102]
	Human HEK293T cells	Not available	anti-m⁶A antibody[102]
5mC in RNA	HeLa cell	Not available	Bisulfite sequencing[92]
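
To put the abundances in Table 2 on a common scale, the short Python sketch below converts between percent and ppm and estimates how many modified bases one would expect among a given number of cytosines. The function names and the example cytosine count are illustrative assumptions, not part of any published pipeline; only the mouse ESC abundance values are taken from Table 2.

```python
# Minimal sketch: unit conversions for the relative abundances in Table 2.
# Function names are hypothetical; abundance values are the mouse ESC
# figures quoted in Table 2 (relative to cytosine).

PPM_PER_PERCENT = 10_000  # 1% = 10,000 ppm, so 1 ppm = 0.0001%


def percent_to_ppm(percent: float) -> float:
    """Convert a percentage to parts per million."""
    return percent * PPM_PER_PERCENT


def ppm_to_percent(ppm: float) -> float:
    """Convert parts per million to a percentage."""
    return ppm / PPM_PER_PERCENT


def expected_modified(ppm: float, n_cytosines: int) -> float:
    """Expected number of modified bases among n_cytosines cytosines."""
    return ppm * n_cytosines / 1_000_000


mouse_esc_ppm = {
    "5hmC": percent_to_ppm(0.1),  # 0.1% of cytosine
    "5fC": 20.0,                  # 20 ppm of cytosine
    "5caC": 3.0,                  # 3 ppm of cytosine
}

if __name__ == "__main__":
    # Illustrative only: expected counts per one million cytosines.
    for name, ppm in mouse_esc_ppm.items():
        print(name, ppm, "ppm ->", expected_modified(ppm, 1_000_000))
```

On this scale, 5caC in mouse ESCs is roughly three modified bases per million cytosines, which illustrates why enrichment or detection methods with low background are needed.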

Since the discovery of the potential epigenetic role of 5hmC[3,4], the field has progressed through three stages of technology development and implementation. When 5hmC was first discovered, methods to accurately detect and quantify this base variant in the genome were required. Next, genome-wide affinity-based profiling methods to enrich 5hmC-containing genomic regions, which could then be subjected to next-generation sequencing, were required to determine the genomic distribution of 5hmC. Although powerful enough to reveal initial biological insights into 5hmC, the sequencing data obtained from these methods do not provide single-base resolution maps with the relative abundance at each modification site; this quantitative information is crucial to understanding the biology associated with 5hmC. Most recently, we have seen the emergence of single-base resolution sequencing technologies that are capable of quantifying the relative abundance of 5hmC at each modification site. In addition, these methods can further refine base-resolution maps of 5mC both genome-wide and at specific loci in combination with the conventional bisulfite sequencing approach, which by itself cannot differentiate 5hmC from 5mC and gives the sum of 5mC+5hmC. Therefore, these new methods are expected to have a transformative impact on DNA epigenetic research in general.

Detection and quantification

Thin layer chromatography (TLC) analysis of hydrolyzed nucleotides is perhaps the oldest method for studying DNA or RNA modifications[34]. Combined with radioactive labeling, TLC can be quite sensitive; in fact many new nucleotide variants, such as 5hmC[3,4] and 5caC[6,7] were initially discovered by TLC with genomic samples isolated from mammalian cells. TLC can be performed in either one or two dimensions, with the latter providing more resolution power[6]. A chemical reaction with the base modification can change the physical properties of the base and thus induce a shift of its migration on TLC for enhanced separation or validation of the chemical properties of the modification. For instance, 5fC and 5caC can react with O-ethylhydroxylamine hydrochloride and 1-ethyl-3-(3-dimethylaminopropyl) carbodiimide (EDC), respectively, thus resulting in dramatic shifts of their migrations on TLC for easy detection and validation[6]. The TLC method, however, can be tedious and its overall sensitivity is limited by radioactive labeling. Antibody-based detection has also been implemented[17,35-37]. However, as a result of its nonlinear and density-biased response, this approach tends to be limited to semiquantitative information[22,38]. Nevertheless, antibodies can directly stain nucleotide modifications inside cells for cell-based visualization[8,38-42]. The chemical properties of a modification can also be varied in order to achieve much enhanced antibody-based recognition. Rao, Agarwal and colleagues demonstrated this by treating 5hmC with sodium bisulfite to generate cytosine 5-methylenesulfonate (CMS), a chemically modified 5hmC derivative that is highly immunogenic (Fig. 2a)[38]. The resulting anti-CMS antibody is very specific with much less density bias compared with anti-5hmC antibodies[22,38].

Figure 2

Summary of genome-wide affinity-based 5hmC-profiling methods. (a) 5hmC in genomic DNA can be enriched by anti-5hmC antibodies or, after treatment with sodium bisulfite, by anti-CMS antibody. (b) 5hmC can be labeled with glucose by βGT. The resulting glucosylated 5hmC can be enriched with JBP-1. Alternatively, βGT-treated 5hmC can undergo glucosylation, periodate oxidation, biotinylation (GLIB); in this reaction sodium periodate cleaves the vicinal hydroxyl groups in the glucose to generate reactive aldehyde groups, which can be biotinylated using an aldehyde-reactive hydroxylamine-biotin probe. Alternatively, an azide-modified glucose can be introduced to 5hmC by βGT and subsequently biotinylated via click chemistry in selective chemical labeling (hMe-Seal). Biotinylated 5hmC residues can be enriched using streptavidin beads, and all the affinity-enriched 5hmC DNA can be subjected to high-throughput sequencing, or to SMRT sequencing in the case of hMe-Seal, to determine the genomic distribution of 5hmC.

Besides antibodies, enzymes that specifically recognize and react with the nucleotide of interest have proved to be extremely valuable. In this regard, the T4 bacteriophage enzyme β-glucosyltransferase (βGT) has become a critical tool for specifically modifying 5hmC for subsequent detection and sequencing (Fig. 2b)[43]. This bacteriophage enzyme has long been known to transfer a glucose moiety from UDP-glucose to 5hmC, which also exists in the T4 phage genome[44]. Since the glucosylation reaction is specific to 5hmC, it was used to transfer a radioactively labeled glucose to 5hmC for quantification, which is more sensitive and accurate than antibody-based detection[32,45]. Restriction endonucleases sensitive to methylation have long been used to detect DNA methylation[46]. Recently, several restriction endonucleases including MspI[47,48], TaqαI[49], MspJI[50,51], PvuRts1I[52,53], and SauUSI[54] have been employed to detect 5hmC at specific loci and potentially for genome-wide analysis as well. These enzymes are either selective for 5hmC or blocked by glucose-modified 5hmC (from βGT-catalyzed glucosylation), thus providing a sequence-dependent interrogation of 5hmC. A methyltransferase[55] and an exonuclease[56] have also been used for the detection of 5hmC. Equivalent methods have yet to be reported for 5fC and 5caC. For a more accurate method, researchers turn to the gold standard in quantifying low levels of modified nucleotides: liquid chromatography (LC)-mass spectrometry (MS), which separates and identifies hydrolyzed nucleotides. Carell and colleagues coupled LC with high-resolution MS and used isotope-labeled internal standards to achieve accurate quantification of 5hmC[57]. Later, they applied the same approach to the detection of 5fC in genomic DNA isolated from mouse ESCs[5]. They reacted the formyl group of 5fC with biotin-hydroxylamine for enhanced signal and validation[5]. 
Recent advances in applying triple quadrupole mass spectrometers to LC–tandem mass spectrometry (MS/MS) detection of rare base modifications, such as 5hmC[58], 5fC[6], 5caC[6,7], m6A[37] and other nucleotide variants[59], further improve the detection limits and allow for their quantification. In fact, LC-MS/MS is the only reported method so far that can quantify 5caC, the scarcest cytosine derivative in ESC genomic DNA, at the level of ∼3 ppm of that of cytosine[6]. The relative abundance of these new nucleotide variants determined by LC-MS is summarized in Table 2.
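
The arithmetic behind quantification with an isotope-labeled internal standard can be sketched as follows. This is a generic isotope-dilution calculation, not the specific procedure of the cited studies: it assumes that the analyte and its labeled standard give equal detector response, and the function names, peak areas and amounts are hypothetical.

```python
# Minimal sketch of isotope-dilution quantification (generic technique;
# assumes equal response for the analyte and its isotope-labeled standard).
# All names and numbers here are illustrative assumptions.


def quantify_by_isotope_dilution(area_analyte: float,
                                 area_standard: float,
                                 standard_amount_pmol: float) -> float:
    """Amount of analyte (pmol) from the analyte/standard peak-area ratio."""
    if area_standard <= 0:
        raise ValueError("internal standard peak area must be positive")
    return area_analyte / area_standard * standard_amount_pmol


def relative_abundance_ppm(modified_pmol: float, reference_pmol: float) -> float:
    """Modified nucleoside relative to a reference nucleoside, in ppm."""
    if reference_pmol <= 0:
        raise ValueError("reference amount must be positive")
    return modified_pmol / reference_pmol * 1_000_000


if __name__ == "__main__":
    # Hypothetical peak areas with 2 pmol of labeled standard spiked in.
    amount = quantify_by_isotope_dilution(50.0, 100.0, 2.0)
    print(amount, "pmol")
```

Reporting the result relative to a reference nucleoside such as cytosine yields values on the ppm scale used in Table 2.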

Genome-wide profiling methods

Compared to simple detection and quantification, genome-wide profiling methods that combine affinity-based enrichment and high-throughput sequencing to yield a genome-wide distribution map of the modified base can provide much-needed biological insights. Figure 2 and Table 3 summarize and compare all reported profiling methods for 5hmC. Antibody-based enrichment is usually the first method that comes to mind (Fig. 2a). Traditional affinity-based captures, such as methylated DNA immunoprecipitation-sequencing (MeDIP-seq) and the related methyl-binding protein-sequencing (MBD-seq), have been used extensively to map methylomes[60]. Similarly, several groups have simultaneously developed hydroxymethyl-DNA immunoprecipitation-sequencing (hMeDIP)[18-20,35,61-63] using antibodies raised against 5hmC. However, careful analyses reveal the tendency of these anti-5hmC antibodies to recognize modification-dense regions[22] as well as CA repeats[64]. Such biases, together with the high background noise and inferior reproducibility when using antibodies from different lots, can pose problems in data analysis[22,38,64]. Nevertheless, valuable information on the genome-wide distribution of 5hmC has been gained[11-13]. As mentioned above, the anti-CMS antibody showed substantially improved performance over anti-5hmC antibodies in genome-wide pull-down and sequencing, with less bias and lower background noise[22]. Based on this body of work, 5hmC has been shown to be enriched at transcription start sites (TSSs), promoters, gene bodies (exons), CCCTC-binding factor (CTCF)-binding sites and enhancers in ESCs, thus suggesting potential roles for 5hmC in DNA methylation fidelity, pluripotency and lineage commitment balance[11-13] (Table 1).

Table 3

Advantages and disadvantages of current 5hmC sequencing methods

Methods	Advantages	Disadvantages
Affinity-based methods
hMeDIP[18-20,35,61-63]	Antibody readily available; relatively straightforward procedure	Biased to heavily modified regions and CA-repeats; high background; antibody lot-to-lot inconsistency
Anti-CMS[22]	Less bias and lower background compared to anti-5hmC antibody	Prolonged procedure; PCR bias after bisulfite treatment
hMe-Seal[39,65-67]	Highly efficient, specific and unbiased labeling; built-in disulfide bond for easy pull-down	Requires synthesis of azide-modified glucose (now commercially available)
GLIB[22]	Highly specific biotin-based pull-down; readily available materials	Sodium periodate oxidation introduces high background; comparison to a non-trivial negative control is necessary
JBP-1[33,68]	Highly efficient one-step βGT labeling; readily available materials	Takes one week to prepare the JBP-1 beads; no published genome-wide profiling data for comparison
Single-base resolution methods
SMRT[67]	Single-molecule sequencing, no PCR amplification required; strand-specific 5hmC sequencing	Loss of quantitative information due to prior enrichment; higher sequencing capacity needed
oxBS-Seq[78]	Low-cost and readily available materials; simple procedure	Oxidative degradation of DNA; repeated bisulfite treatments needed to fully deaminate 5fC; potentially increased error owing to the comparative nature of the method
TAB-Seq[69]	Measures 5hmC directly; 5caC deaminates readily under traditional bisulfite treatment	Requires highly active TET enzymes for a high conversion rate of 5mC to 5caC

We took advantage of the βGT-catalyzed 5hmC glucosylation reaction and developed a selective chemical labeling-based method we named hMe-Seal (Fig. 2b)[65]. Like unmodified glucose, azide-modified glucose is well tolerated by βGT and efficiently transferred to 5hmC. A biotin can be subsequently installed onto the azido group. Relying on the extremely tight and specific binding between biotin and streptavidin, which has virtually no modification density bias[22], we can in principle label every 5hmC and perform selective pull-down for genome-wide profiling or locus-specific analysis of 5hmC distribution[39,65,66]. Thanks to the use of a disulfide linker, the enriched product can be readily released from streptavidin via reduction with dithiothreitol (DTT)[67]. hMe-Seal is robust, with extremely low background and no bias[64]. It should also be noted that the glucose modification on the enriched DNA fragments does not interfere with the polymerases regularly employed for library preparation in Illumina sequencing. Only in rare cases do we observe pausing with Taq polymerase at modification sites[65]. Using hMe-Seal, we have performed whole-genome profiling of 5hmC in mouse and human brain tissues. We found a distinct age-dependent distribution of 5hmC in brain tissues as compared with ESCs. Specifically, we saw enrichment within gene bodies of expressed genes and upstream of the TSS, but we observed depletion at the TSS, suggesting a unique function of 5hmC in neurodevelopment[39,65]. A related biotin-based 5hmC-profiling method is referred to as glucosylation, periodate oxidation, biotinylation (GLIB). It utilizes βGT to transfer an unmodified glucose to 5hmC, followed by cleavage of the vicinal hydroxyl groups in the glucose by sodium periodate to generate reactive aldehyde groups, which can then be biotinylated using an aldehyde-reactive hydroxylamine-biotin probe for further enrichment[22]. 
However, the sodium periodate oxidation may cause DNA damage and introduce high background. Nevertheless, with appropriate controls the GLIB method provides an alternative approach. Applying this method and the anti-CMS antibody-based enrichment, the Rao group revealed the distribution of 5hmC in ESCs as described above[22]. After treating 5hmC with βGT, Klungland and colleagues showed that the J-binding protein 1 (JBP-1), which is known to interact with glucosylated 5hmU in certain kinetoplastids, can also bind and therefore enrich glucosylated 5hmC for specific 5hmC profiling[33,68]. Thus, JBP-1 works as a naturally existing ‘antibody’ for glucosylated 5hmC.

Single-base resolution sequencing methods

Although valuable, affinity-based genome-wide profiling methods have several disadvantages. First, these methods generate distribution maps with poor resolution as a result of the size limitation of the nucleic acid fragmentation and capture technology. Second, enrichment renders it impossible to measure the absolute abundance of the nucleotide modification. Third, the affinity-based methods' propensity to amplify frequent but weak signals may impose biases[69]. In contrast, a single-base resolution mapping method, especially a whole-genome sequencing method without prior enrichment, could provide the most accurate and quantitative information regarding the modification. The simplest way to achieve single-base resolution sequencing of a nucleotide variant would be to recognize its physical size or properties directly during sequencing. Unfortunately, the current second-generation sequencing technologies involve sample pre-amplification, which leads to the loss of base modification information. Third-generation sequencing technologies that feature single-molecule sequencing and do not require sample pre-amplification may offer a solution[70]. The single-molecule, real-time (SMRT) sequencing developed by Pacific Biosciences records the incorporation of phospholinked nucleotides by an individual DNA polymerase in real time[71]. By further monitoring the polymerase kinetics during replication, SMRT can directly detect DNA base modifications including 5mC and 5hmC, albeit with low confidence[72]. Through collaboration with Pacific Biosciences we have successfully integrated hMe-Seal (Fig. 2) and SMRT sequencing to improve the polymerase kinetics for confident detection of 5hmC at single-base resolution[67]. Further technological advances are needed before this approach can be applied to whole mammalian genome 5hmC sequencing. 
Other third-generation sequencing approaches, such as nanopore sequencing[73], also have the potential to detect 5mC[74] and 5hmC[75,76] at the single-base level, but these applications are still in the early stages of development. Bisulfite sequencing, the gold standard for single-base resolution sequencing of 5mC, can be adapted to essentially any sequencing platform. In this approach the distinct chemical reactions of cytosine and 5mC with sodium bisulfite (NaHSO3) (cytosine deaminates to uracil, whereas 5mC remains intact) are exploited to achieve single-base resolution differentiation of cytosine from 5mC[77]. Complications arise, however, with all of the newly discovered cytosine derivatives. Under bisulfite conditions, cytosine, 5fC (which requires harsher conditions to achieve complete deamination)[78], and 5caC[7,69] undergo deamination to read as thymine, whereas 5mC and 5hmC resist deamination and thus will read as cytosine[79-81]. Therefore, traditional bisulfite sequencing cannot differentiate 5hmC from 5mC, nor can it differentiate 5fC or 5caC from unmodified cytosine. Two groups independently designed modified bisulfite sequencing for quantitative single-nucleotide resolution mapping of 5hmC and 5mC in mammalian DNA by taking advantage of different properties of modified cytosines[69,78]. In the first approach, termed oxidative bisulfite sequencing (oxBS-Seq), Balasubramanian, Reik and colleagues explored the chemical properties of 5hmC and discovered that potassium perruthenate (KRuO4) specifically oxidizes 5hmC to 5fC, which subsequently deaminates under repeated bisulfite treatments (Fig. 3a)[78]. Therefore, in a KRuO4- and bisulfite-treated DNA sample, 5hmC would read as thymine while 5mC still reads as cytosine. This method directly reads out 5mC. To reveal base-resolution information of 5hmC, traditional bisulfite sequencing of the KRuO4-untreated DNA sample can be performed to reveal both 5mC and 5hmC as cytosine. 
A subtraction yields the abundance of 5hmC (Fig. 3a). The authors then applied oxBS-Seq to reduced representation bisulfite sequencing (RRBS, which selects a fraction of restriction enzyme digested fragments to generate a ‘reduced representation’ of the genome) in order to sequence a subset of genomic regions enriched with CpG islands (CGIs) in mouse ESCs[78]. Potential limitations of this method include that genomic DNA can be damaged and degraded by chemical oxidation conditions and by the repeated bisulfite treatments needed to fully deaminate 5fC (generated from 5hmC). However, this simple method, with further optimization to avoid extensive DNA degradation and achieve high yields of 5fC deamination, could be very attractive in practical sequencing of 5hmC in genomic samples.
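
The read-out logic of traditional bisulfite sequencing and oxBS-Seq described above, and the subtraction that yields 5hmC, can be sketched in Python as follows. This is an illustrative model rather than the authors' analysis code: the names are hypothetical, the per-site levels are taken to be the fraction of reads called as cytosine, and clamping negative differences to zero is a simplifying assumption to handle sampling noise.

```python
# Minimal sketch of the BS-Seq / oxBS-Seq read-out described in the text.
# "C" = resists deamination (read as cytosine);
# "T" = deaminated (read as thymine after PCR).
READOUT = {
    "BS":   {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "T", "5caC": "T"},
    "oxBS": {"C": "T", "5mC": "C", "5hmC": "T", "5fC": "T", "5caC": "T"},
}


def c_fraction(c_reads: int, t_reads: int) -> float:
    """Fraction of reads at a site that are called as cytosine."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no coverage at this site")
    return c_reads / total


def estimate_5hmc(bs_c_fraction: float, oxbs_c_fraction: float) -> float:
    """5hmC level = (5mC + 5hmC from BS-Seq) - (5mC from oxBS-Seq).

    Negative differences are clamped to zero (an assumption of this sketch).
    """
    return max(0.0, bs_c_fraction - oxbs_c_fraction)


if __name__ == "__main__":
    bs = c_fraction(80, 20)    # hypothetical counts, KRuO4-untreated sample
    oxbs = c_fraction(60, 40)  # hypothetical counts, KRuO4-treated sample
    print("5mC:", oxbs, "5hmC:", estimate_5hmc(bs, oxbs))
```

Because the 5hmC estimate is a difference between two independently sampled measurements, its uncertainty combines the sampling error of both experiments, which is the "comparative" limitation noted in Table 3.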

Figure 3

OxBS-Seq and TAB-Seq for single-base resolution sequencing of 5hmC. (a) OxBS-Seq requires two bisulfite sequencing runs. In the first run, 5hmC in genomic DNA is oxidized to 5fC by KRuO4, and subsequently converted into T by bisulfite treatment and PCR. In the second run, genomic DNA is subjected to bisulfite treatment and PCR without KRuO4 treatment. The first run provides genuine sites of 5mC; this information is subtracted from the 5mC plus 5hmC sites provided by the second, traditional bisulfite sequencing run. (b) TAB-Seq directly reads out 5hmC in one bisulfite sequencing run. 5hmC is protected from TET-mediated oxidation and bisulfite conversion by βGT-catalyzed glucosylation. Next, 5mC is oxidized by TET to 5caC, and subsequently converted into T after bisulfite treatment and PCR. Therefore, TAB-Seq provides genuine sites of 5hmC in genomic DNA with absolute abundance at each modification site.

As an approach to directly read out 5hmC, we and our collaborators developed TET-assisted bisulfite sequencing, which we have termed TAB-Seq (Fig. 3b)[69]. In TAB-Seq, we first protect 5hmC from TET-mediated oxidation by blocking it with glucose using βGT. Next, all the 5mC are oxidized by the mTet1 enzyme to 5caC, which subsequently undergo deamination in bisulfite treatment. When the DNA is sequenced, these bases are read as thymine. The only remaining cytosine signals after TAB-Seq stem from the protected 5hmCs (Fig. 3b). To obtain base-resolution information of 5mC, the results of TAB-Seq can be compared with those of traditional bisulfite sequencing, which reveals the sum of 5mC+5hmC. A subtraction yields the base-resolution map of 5mC. A current limitation to this method is the requirement of highly active TET enzymes. An oxidation conversion rate over 96% of 5mC to 5caC is desirable to reduce sequencing costs[69]. Currently, only mTet1 expressed from insect cells and carefully purified can achieve this level of activity[69]. Neither oxBS-Seq nor TAB-Seq requires an enrichment step. Therefore, quantitative information of 5mC and 5hmC within the genome can be obtained. The availability of base-resolution methods for 5hmC sequencing is potentially transformative for studies of 5hmC biology. We have applied TAB-Seq to provide the first full maps of 5hmC in human and mouse ESCs and uncovered new features of 5hmC, including its significant enrichment at distal functional regulatory elements such as enhancers, its distribution near but not on transcription factor–binding sites, and the sequence bias and strand asymmetry associated with 5hmC sites, suggesting that active demethylation occurs at regulatory elements through 5hmC[69]. However, the depletion of 5hmC at transcription factor binding sites could also be attributed to less methylation and/or the steric exclusion of TET proteins by transcription factors at these sites.
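
The TAB-Seq read-out and its combination with traditional bisulfite sequencing can be sketched in the same way. The code below is an illustrative model, not the published analysis: function names are hypothetical, per-site levels are fractions of reads called as cytosine, and the background estimate assumes that any 5mC escaping TET oxidation reads as cytosine and is therefore indistinguishable from 5hmC.

```python
# Minimal sketch of the TAB-Seq read-out described in the text.
# Glucosylated 5hmC is protected and reads as "C"; 5mC is oxidized by TET
# to 5caC and, like C, 5fC and 5caC, reads as "T" after bisulfite and PCR.
TAB_READOUT = {"C": "T", "5mC": "T", "5hmC": "C", "5fC": "T", "5caC": "T"}


def estimate_5mc(bs_c_fraction: float, tab_c_fraction: float) -> float:
    """5mC level = (5mC + 5hmC from BS-Seq) - (5hmC from TAB-Seq).

    Negative differences are clamped to zero (an assumption of this sketch).
    """
    return max(0.0, bs_c_fraction - tab_c_fraction)


def unconverted_5mc_background(level_5mc: float,
                               conversion_rate: float = 0.96) -> float:
    """Apparent 5hmC signal expected from 5mC that escaped TET oxidation.

    The default of 0.96 mirrors the conversion rate quoted in the text;
    the linear model itself is an assumption of this sketch.
    """
    if not 0.0 <= conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be between 0 and 1")
    return level_5mc * (1.0 - conversion_rate)


if __name__ == "__main__":
    print("5mC:", estimate_5mc(0.9, 0.2))
    print("background at a fully methylated site:",
          unconverted_5mc_background(1.0))
```

Under this simple model, a fully methylated site with no 5hmC would still show about 4% cytosine reads at a 96% conversion rate, which suggests why high TET activity and sufficient sequencing depth matter for calling low-abundance 5hmC sites.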

RNA modifications

Chemical modifications (e.g. methylation) on DNA and histones have been widely accepted as key processes that regulate gene expression. In contrast to the limited types of modifications found in DNA, cellular RNAs, including mRNA and non-coding RNA, contain more than a hundred structurally distinct post-transcriptional modifications at thousands of sites (http://rna-mdb.cas.albany.edu/RNAmods/)[82]. We have hypothesized that some of these RNA modifications can also be dynamic and reversible and may play regulatory roles analogous to reversible DNA and protein modifications[83,84]. Traditional methods to determine the localization of RNA modifications such as TLC[85], primer extension[86], ligation[87], microarray[88], or mass spectrometry[89] are low-throughput, laborious, time-consuming, and especially difficult to apply to low-abundance cellular RNAs such as mRNA. As a result, the functions of potentially dynamic RNA modifications, especially those on the low-abundance mRNA and non-coding RNA discussed in this review, have remained largely unexplored owing to the lack of large-scale sequencing methods and of RNA demodification enzymes[84]. In fact, prior to 2011, there were no known reversible chemical modifications on RNA that could affect gene expression. Several recently developed high-throughput sequencing methods specific for RNA modifications have rekindled interest in the functional dynamics of RNA modifications, in particular those in mRNA and non-coding RNA. For example, bisulfite sequencing was applied to map transcriptome-wide 5mC in RNA and revealed that 5mC exists not only in tRNA and rRNA as previously known[90,91], but also in mRNA and certain non-coding RNAs[92] (Fig. 4a, Tables 1,2). A chemical method, termed inosine chemical erasing (ICE), which involves cyanoethylation combined with reverse transcription, was developed to sequence inosine (I) in mammalian transcriptomes[93] (Fig. 4b). 
RNA editing converts A to I and C to U, and I may play regulatory roles[94]. Although sequencing of RNA editing events is straightforward using current RNA-Seq technology, caution should be exercised when analyzing sequencing data[95] so as to avoid errors that arise from copy number variants or sequencing errors[96-99]. More comprehensive analyses and orthogonal approaches such as ICE should facilitate the discovery of additional RNA-editing events[93,100].
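
The comparison at the heart of ICE, as summarized in the Figure 4 legend, can be sketched as a simple per-site test: inosine reads as G in the untreated control, and that G signal is erased when cyanoethylation blocks reverse transcription. The function name and both thresholds below are hypothetical choices for illustration, not values from the cited work.

```python
# Minimal sketch of the ICE comparison (illustrative only).
# At a genomic A position, an inosine appears as G in control cDNA; after
# acrylonitrile treatment, reverse transcription is blocked at the modified
# inosine, so the G signal is lost. Thresholds are assumptions.


def is_inosine_candidate(control_g_fraction: float,
                         treated_g_fraction: float,
                         min_control: float = 0.1,
                         min_erasure: float = 0.5) -> bool:
    """Flag a site whose G signal in the control is erased by treatment."""
    if control_g_fraction < min_control:
        return False  # too little A-to-G signal to evaluate
    erased = (control_g_fraction - treated_g_fraction) / control_g_fraction
    return erased >= min_erasure


if __name__ == "__main__":
    print(is_inosine_candidate(0.40, 0.05))  # G signal largely erased
    print(is_inosine_candidate(0.40, 0.38))  # G persists, e.g. a SNP
```

A G signal that persists after treatment is more consistent with a genomic variant or sequencing error than with inosine, which is the kind of artifact the text cautions against.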

Figure 4

New sequencing methods for RNA modifications. (a) Bisulfite sequencing can be used to map transcriptome-wide 5mC in RNA. (b) Inosine chemical erasing (ICE) can be used to sequence inosine (I) in the mammalian transcriptome. In the control group, inosine is converted into G by reverse transcription and PCR amplification. In the acrylonitrile treatment group, reverse transcription is blocked at the modified inosine site, which leads to identification of inosines on RNA. (c) m6A as a reversible RNA modification. m6A is generated by RNA methyltransferase(s) and removed by demethylases such as FTO. It further interacts with binding proteins and may regulate various biological functions. Its transcriptome-wide distribution can be determined by antibody-based immunoprecipitation.

In 2011, our laboratory showed that m6A, the most prevalent internal mRNA modification, is a major substrate of the fat mass and obesity-associated protein FTO both in vitro and inside cells (mRNA was isolated by poly(T) oligo with subsequent removal of rRNA)[37], raising the possibility that this reversible RNA nucleotide modification could serve as an epigenetic mark to tune gene expression, analogous to the methylated nucleotides observed in DNA[83]. Recently, antibodies raised against m6A were used to enrich m6A-containing RNA fragments for high-throughput sequencing (Fig. 4c). This m6A-Seq approach was applied to human and mouse samples and revealed that the transcriptome-wide m6A distribution was dynamically modulated and preferentially enriched around stop codons, in 3′-UTRs, and within long internal exons[101,102] (Tables 1,2). In addition, several m6A-binding proteins have been identified, suggesting a function for m6A in regulating cellular dynamics. This field of reversible RNA modifications holds great promise in uncovering new biology associated with RNA metabolism, localization, and translation.
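The idea behind antibody-based enrichment sequencing can be sketched as a comparison of read density between the immunoprecipitated (IP) library and the input library. The example below computes a library-size-normalized fold enrichment per window; the window names, read counts, pseudocount, and `min_fold` cutoff are all hypothetical, and the published m6A-Seq studies use their own peak-calling statistics rather than this simplified ratio.

```python
def enriched_windows(ip_counts, input_counts, min_fold=4.0, pseudocount=1.0):
    """Return windows whose IP read density exceeds input by at least min_fold.

    ip_counts, input_counts: dicts mapping a window label to a read count.
    Counts are scaled by library size so the two libraries are comparable;
    the pseudocount avoids division by zero in windows without input reads.
    """
    ip_total = sum(ip_counts.values())
    input_total = sum(input_counts.values())
    if ip_total == 0 or input_total == 0:
        raise ValueError("both libraries need at least one read")

    peaks = {}
    for window, ip_reads in ip_counts.items():
        ip_rate = (ip_reads + pseudocount) / ip_total
        input_rate = (input_counts.get(window, 0) + pseudocount) / input_total
        fold = ip_rate / input_rate
        if fold >= min_fold:
            peaks[window] = fold
    return peaks


if __name__ == "__main__":
    # Toy counts for four windows of one hypothetical transcript.
    ip_reads = {"5UTR": 10, "CDS": 40, "stop_codon": 300, "3UTR": 150}
    input_reads = {"5UTR": 100, "CDS": 250, "stop_codon": 50, "3UTR": 100}
    for window, fold in enriched_windows(ip_reads, input_reads).items():
        print(window, round(fold, 2))
```

Because the antibody pulls down fragments rather than single bases, a window called this way localizes the modification only to within the fragment length.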

Perspective

The rapid progress of research on 5hmC has benefited from the rapid development of methods for 5hmC detection, profiling, and now quantitative base-resolution mapping. These advances may guide studies of other nucleotide variants, especially the recently discovered 5fC and 5caC in mammalian DNA. The current lack of methods to reliably profile and quantitatively assess the location and abundance of these further oxidized 5mC derivatives substantially limits research on these nucleotide variants. Antibodies against 5fC and 5caC are available for immunostaining[42], but given the low levels of 5fC and 5caC in mammalian genomic DNA (only ppm levels compared to cytosine in mouse ESC[6]; comparable to the levels of DNA damage), it can be very challenging to apply an antibody-based capture strategy, which tends to favor densely populated modifications. Even if the antibodies can pull down certain genomic regions, such an approach will still have very limited coverage. One potential solution to this problem is to selectively label 5fC or 5caC with biotin. The high-affinity interaction between biotin and streptavidin can in principle capture every modification with no density- or sequence-dependent bias, which is extremely important for reliable enrichment of scarce modifications. Chemical transformations that can be used to introduce a biotin group are available for the aldehyde group in 5fC and the carboxylate group in 5caC, such as hydroxylamine–aldehyde condensation for 5fC[5,6,22] (at the time this paper was accepted, a method for hydroxylamine-based profiling of 5fC was published online, showing enrichment of 5fC in CGIs of promoters and exons[103]) and EDC-mediated coupling for 5caC[6]. However, both approaches can introduce high background noise as a result of side reactions of hydroxylamine and EDC with other functionalities on DNA[22,104,105]. Therefore, careful tuning of the reaction conditions and appropriate controls are necessary.

Besides chemical transformation, enzymatic approaches are also attractive if enzymes selective for 5fC and 5caC can be developed. TDG is a good starting point because it can remove 5fC and 5caC and generate abasic sites for further labeling. TDG also recognizes T/G and U/G mismatches[23,24], which have to be repaired first. Another possibility is to evolve an engineered βGT that can selectively label these modifications, especially 5caC[106].

Compared with the limited set of DNA modifications, the hundreds of RNA modifications present an even greater technological challenge owing to their structural and functional diversity. For instance, although m6A has been known for decades as an internal mRNA modification[107], it has only recently been recognized as another reversible nucleotide modification[37]. Although the distribution of m6A has been determined by the antibody-based affinity enrichment approach, high-resolution sequencing to assess the exact location and relative abundance at each modification site is highly desired. For other RNA modifications, antibody-based approaches can be a good start towards revealing their distributions.

Quantitative mapping of 5fC, 5caC, and m6A (and many other modifications in RNA) at single-base resolution remains challenging. The application of third-generation sequencing to 5fC and 5caC in DNA and m6A in RNA may seem feasible but has yet to be adequately exploited. Future innovations in sequencing technology, such as SMRT and nanopore analysis, that yield truly high-throughput, high-capacity platforms able to discern modifications are highly desirable. On the other hand, base-resolution sequencing for 5fC and 5caC analogous to TAB-Seq and oxBS-Seq can be developed if specific chemical transformations alter the behavior of these nucleotides in bisulfite sequencing. Similarly, chemical or enzymatic transformations for m6A or other RNA modifications[108] that can affect base reading in PCR followed by sequencing would be required to develop base-resolution sequencing methods.

In addition to high-resolution sequencing, methods are needed to analyze nucleotide variants in rare cells and in living cells. 5hmC and related nucleotide variants may play roles in the development of cancer[38,109,110] and early zygotes[111-113], where sample amounts can be very limited. Therefore, sequencing methods that can deal with hundreds to thousands of cells, or even single cells, will have a profound impact on fundamental biological understanding as well as diagnostics. Understanding the dynamics of these modifications in living biological systems would benefit from methods for high-resolution, single-molecule imaging.

In summary, recent discoveries of new nucleotide variants with epigenetic functions have stimulated the development of methods to detect, profile, and sequence these base modifications in the genome and transcriptome. In turn, the technological advances accelerate research to understand the biology of these nucleotide variants. This trend will continue as refined or completely new methods are developed.
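The reasoning behind base-resolution methods built on bisulfite sequencing, as discussed in this Perspective, can be made concrete with a read-out table. In standard bisulfite sequencing, C, 5fC, and 5caC read as T while 5mC and 5hmC read as C; oxBS-Seq additionally converts 5hmC so that only 5mC reads as C; and TAB-Seq leaves only 5hmC reading as C. The sketch below encodes these read-outs and adds a purely hypothetical treatment, labeled `fC_protect`, in which 5fC is chemically protected and reads as C, to show how a level could then be obtained by subtraction. The treatment name and the example composition are assumptions for illustration, not an existing method or measured data.

```python
# Base read after each treatment: "C" = protected, "T" = converted.
READOUT = {
    "BS":   {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "T", "5caC": "T"},
    "oxBS": {"C": "T", "5mC": "C", "5hmC": "T", "5fC": "T", "5caC": "T"},
    "TAB":  {"C": "T", "5mC": "T", "5hmC": "C", "5fC": "T", "5caC": "T"},
    # Hypothetical chemistry that protects 5fC from bisulfite conversion.
    "fC_protect": {"C": "T", "5mC": "C", "5hmC": "C", "5fC": "C", "5caC": "T"},
}


def fraction_read_as_c(composition, treatment):
    """Expected fraction of reads showing C at one site.

    composition: dict mapping each cytosine form to its fraction at the site.
    """
    return sum(
        fraction
        for form, fraction in composition.items()
        if READOUT[treatment][form] == "C"
    )


def level_by_subtraction(frac_c_high, frac_c_low):
    """Level of the form that reads as C in one treatment but not the other."""
    return max(0.0, frac_c_high - frac_c_low)


if __name__ == "__main__":
    # Hypothetical composition of one cytosine site across a cell population.
    site = {"C": 0.50, "5mC": 0.30, "5hmC": 0.15, "5fC": 0.05, "5caC": 0.0}
    bs = fraction_read_as_c(site, "BS")
    oxbs = fraction_read_as_c(site, "oxBS")
    protected = fraction_read_as_c(site, "fC_protect")
    print("5mC  (oxBS):", oxbs)
    print("5hmC (BS - oxBS):", round(level_by_subtraction(bs, oxbs), 3))
    print("5fC  (fC_protect - BS):", round(level_by_subtraction(protected, bs), 3))
```

Subtractive estimates of this kind accumulate sampling noise from both libraries, which is one reason rare modifications such as 5fC and 5caC would demand very high sequencing depth.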
