Brian P Anton1, Alexey Fomenkov1, Victoria Wu1,2, Richard J Roberts1.
Abstract
Single-molecule Real-Time (SMRT) sequencing can easily identify sites of N6-methyladenine and N4-methylcytosine within DNA sequences, but similar identification of 5-methylcytosine sites is not as straightforward. In prokaryotic DNA, methylation typically occurs within specific sequence contexts, or motifs, that are a property of the methyltransferases that "write" these epigenetic marks. We present here a straightforward, cost-effective alternative to both SMRT and bisulfite sequencing for the determination of prokaryotic 5-methylcytosine methylation motifs. The method, called MFRE-Seq, relies on excision and isolation of fully methylated fragments of predictable size using MspJI-Family Restriction Enzymes (MFREs), which depend on the presence of 5-methylcytosine for cleavage. We demonstrate that MFRE-Seq is compatible with both Illumina and Ion Torrent sequencing platforms and requires only a digestion step and simple column purification of size-selected digest fragments prior to standard library preparation procedures. We applied MFRE-Seq to numerous bacterial and archaeal genomic DNA preparations and successfully confirmed known motifs and identified novel ones. This method should be a useful complement to existing methodologies for studying prokaryotic methylomes and characterizing the contributing methyltransferases.
Year: 2021 PMID: 33974631 PMCID: PMC8112702 DOI: 10.1371/journal.pone.0247541
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. MFRE cleavage and formation of library inserts.
A) Recognition sites (blue) and cleavage positions (arrows) of three commercially available MFREs. B) Product of MspJI (recognition site in blue) cleavage of the fully methylated motif C (boxed), before and after end repair. The m5C residue on each strand is the 17th base from the 3’ end.
| Type | True Motif(s) | Apparent Motif |
|---|---|---|
| Multiple motifs | ||
| Non-palindromic | ||
| Base dependency | ||
| True degeneracy | ||
Fig 2. Overview of MFRE-Seq.
Genomic DNA containing motifs that are fully methylated (red dots) or hemi-methylated (open red circles) is digested with one or more MFREs. Size selection enriches for the small fragments that result from MFRE cleavage of fully methylated sites, and sequencing libraries are prepared from these fragments (adapters in green). Sequence reads are then mined for motifs. The computational method described in this work bins reads by length, enriches for CCMD reads by base-filtering, aligns them, and examines the base distribution at each position. Base distributions can also be represented as a sequence logo, as shown here.
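The read-mining steps named in this caption (bin by length, base-filter, tabulate bases per position) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: all function names and the `min_fraction` threshold are hypothetical, and the filter rule (C at the 17th base from the 3' end, G at the 17th base from the 5' end) is an assumption derived from the Fig 1 statement that the m5C on each strand is the 17th base from the 3' end.

```python
from collections import Counter, defaultdict


def bin_by_length(reads):
    """Group reads into bins keyed by read length."""
    bins = defaultdict(list)
    for read in reads:
        bins[len(read)].append(read.upper())
    return bins


def passes_base_filter(read, offset=17):
    """Assumed filter: C at the 17th base from the 3' end (top-strand m5C)
    and G at the 17th base from the 5' end (pairs with the bottom-strand m5C)."""
    if len(read) < offset:
        return False
    return read[-offset] == "C" and read[offset - 1] == "G"


def base_distribution(reads):
    """Per-position base counts for a list of equal-length reads."""
    counts = [Counter() for _ in range(len(reads[0]))]
    for read in reads:
        for pos, base in enumerate(read):
            counts[pos][base] += 1
    return counts


def consensus(counts, min_fraction=0.9):
    """Call a base where one nucleotide dominates the column, else N.
    The 0.9 threshold is an arbitrary illustrative choice."""
    called = []
    for column in counts:
        base, n = column.most_common(1)[0]
        called.append(base if n / sum(column.values()) >= min_fraction else "N")
    return "".join(called)
```

Within one length bin, reads that pass the filter share the same coordinates relative to the methylated bases, so a conserved motif shows up as a run of called bases in the consensus while the flanks stay degenerate.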
Replicate experiment statistics, E. coli DHB4 genomic DNA.
| Replicate | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| MFRE | FspEI | FspEI | FspEI | MspJI |
| Platform | MiSeq | NextSeq | NextSeq | NextSeq |
| Multiplex | 2 | 9 | 9 | 9 |
| Pairs Merged | 7,925,782 | 15,288,614 | 19,851,338 | 13,396,381 |
| Pairs with Adapters | 7,118,719 | 14,941,064 | 19,027,975 | 13,258,050 |
| Pairs Discarded | 270,585 | 312,512 | 600,062 | 550,688 |
| Reads Matching Reference | 7,657,629 | 14,944,470 | 19,419,069 | 13,099,298 |
| Fraction Matching Reference | 0.966 | 0.977 | 0.978 | 0.978 |
| Unique Reads Matching Ref. | 191,145 | 230,260 | 243,416 | 876,198 |
| Mean Redundancy | 40 | 65 | 80 | 15 |
| Unrepresented CCWGG sites | 696 | 684 | 663 | 636 |
a Output from SeqPrep.
b Exact matches, no polymorphisms or indels.
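The two derived rows in this table appear to follow directly from the count rows. The sketch below assumes that Fraction Matching Reference = Reads Matching Reference / Pairs Merged and that Mean Redundancy = Reads Matching Reference / Unique Reads Matching Ref.; these relationships are inferred from the numbers rather than stated in the table, and the function names are illustrative.

```python
def fraction_matching(reads_matching, pairs_merged):
    """Share of merged read pairs that exactly match the reference."""
    return reads_matching / pairs_merged


def mean_redundancy(reads_matching, unique_reads):
    """Average number of copies per unique reference-matching read."""
    return reads_matching / unique_reads


# Count rows copied from the table above, keyed by replicate number.
replicates = {
    1: {"pairs_merged": 7_925_782, "reads_matching": 7_657_629, "unique": 191_145},
    2: {"pairs_merged": 15_288_614, "reads_matching": 14_944_470, "unique": 230_260},
    3: {"pairs_merged": 19_851_338, "reads_matching": 19_419_069, "unique": 243_416},
    4: {"pairs_merged": 13_396_381, "reads_matching": 13_099_298, "unique": 876_198},
}

for rep, row in replicates.items():
    frac = fraction_matching(row["reads_matching"], row["pairs_merged"])
    redundancy = mean_redundancy(row["reads_matching"], row["unique"])
    print(f"replicate {rep}: fraction={frac:.3f}, redundancy={redundancy:.0f}")
```

Under these assumptions the MspJI replicate (4) has far more unique reads and correspondingly lower redundancy than the three FspEI replicates.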
Flank length analysis of all reference-matching, motif-containing reads derived from Illumina DHB4 run.
| 0 | 9 | 229 | 24,276 | 4,645 | 158 | 15,079 | |
| 87 | 90 | 10,295 | 1,563 | 16 | 658 | ||
| 737 | 120,429 | 19,919 | 335 | 2,382 | |||
| 4,161,371 | 1,440,996 | 27,914 | 148,368 | ||||
| 130,421 | 5,278 | 29,356 | |||||
| 40 | 910 | ||||||
| 7,306 | |||||||
Flank length analysis of base-filtered reference-matching, motif-containing reads derived from Illumina DHB4 run.
| 0 | 0 | 55 | 7,948 | 0 | 0 | 1,466 | |
| 0 | 15 | 1,160 | 0 | 0 | 2 | ||
| 737 | 120,429 | 0 | 0 | 425 | |||
| 4,161,371 | 0 | 0 | 38,066 | ||||
| 0 | 0 | 0 | |||||
| 0 | 0 | ||||||
| 372 | |||||||
Fig 3. Example of theoretical fragment types generated by MFRE digestion.
For simplicity, DNA is drawn as a single line, methylated motifs as colored dots, and cut sites on either side as triangles with color corresponding to that of the motif. Fragments were classified as one of six categories: “motif-cleaved” (when exactly cut, these are CCMD fragments), “interstitial” (regions between motif-cleaved fragments), “overlap-short” and “overlap-long” (created by cutting CCWGG sites less than 30 bp apart), “concatenated” (reads spanning an expected cut site, which most often consist of a motif-containing CCMD fragment joined to an interstitial fragment), and “other” (created by more complicated situations such as 3 or more clustered motifs). (A) Examples of motif-cleaved, interstitial, and concatenated fragments. (B) and (C) Examples of different types of overlap fragments, depending on whether any cleavage occurs between the two nearby motifs.
Comparison of real sequence reads with theoretical digest fragments of E. coli DHB4.
| Exact | Approximate | One-Cut | Neither | |
|---|---|---|---|---|
| 11,695 | 50,573 | 736 | 0 | |
| (356x) | (34x) | (3.9x) | (n/a) | |
| 0 | 5,687 | 57,922 | 20,555 | |
| (n/a) | (211x) | (3.2x) | (1.2x) | |
| 0 | 0 | 0 | 0 | |
| (143x) | (35x) | (n/a) | (n/a) | |
| 236 | 369 | 86 | 1 | |
| (12x) | (4.3x) | (7.0x) | (1.0x) | |
| 0 | 7,198 | 1,696 | 54 | |
| (n/a) | (33x) | (4.1x) | (2.3x) | |
| 0 | 0 | 14,525 | 19,812 | |
| (n/a) | (n/a) | (6.0x) | (1.3x) |
a For each category, the top line (Roman type) shows the number of unique sequence reads, the middle line (italic) shows the number of all sequence reads, and the bottom line (in parentheses) shows the copy number of the reads in this category (all/unique).
Fig 4. Diagnostic statistics from Illumina sequencing of FspEI-digested E. coli K-12 DHB4.
This strain is methylated by Dcm at CCWGG sites, resulting in 31 nt CCMD reads (dotted vertical line). All numbers are for reference-matched reads. (A) Total number of reads of each length. (B) Mean read copy number of each length. (C) Fraction of reads of each length that passed base filtering.
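The three per-length statistics plotted in panels A to C can be computed from a table of unique reads and their copy numbers. The following is a rough sketch under stated assumptions: `length_diagnostics` is a hypothetical helper, the base filter is passed in as any function returning True or False, and the passing fraction is taken over all reads (weighted by copy number), which the caption does not specify.

```python
from collections import defaultdict


def length_diagnostics(read_counts, base_filter):
    """Summarise reads by length.

    read_counts maps each unique read sequence to its copy number.
    base_filter is a caller-supplied predicate on a read sequence.
    """
    total = defaultdict(int)   # all reads of each length (panel A)
    unique = defaultdict(int)  # unique sequences of each length
    passed = defaultdict(int)  # reads of each length passing the filter
    for seq, copies in read_counts.items():
        n = len(seq)
        total[n] += copies
        unique[n] += 1
        if base_filter(seq):
            passed[n] += copies
    return {
        n: {
            "total_reads": total[n],
            "mean_copy_number": total[n] / unique[n],   # panel B
            "fraction_passing": passed[n] / total[n],   # panel C
        }
        for n in sorted(total)
    }
```

A length at which all three values peak together, such as 31 nt for this strain, is the natural place to look for motif-containing reads.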
Motifs determined using the Illumina MiSeq platform.
| Sample | Enz | Plex | Merged Ref | Motif(s) | Sites Detected | % Detected |
|---|---|---|---|---|---|---|
| F | 2 | 7,657,629 | 11,625/12,321 | 94.3 | ||
| M | 8 | 39,168 | 1,351/2,503 | 54.0 | ||
| 821/11,706 (736/2,719) | 7.0 (27.1) | |||||
| M | 6 | 279,772 | 533/560 | 95.2 | ||
| 452/1,029 | 43.9 | |||||
| M.HhaI clone | M | 9 | 1,259,223 | 10,021/11,936 | 84.0 | |
| 6,219/32,532 (4,626/8,173) | 19.1 (56.6) | |||||
| M | 9 | 432,800 | 1,168/1,311 | 89.1 | ||
| F | 9 | 585,880 | 240/6,354 (4/4) | 3.8 (100) | ||
| M.AvaII clone | M | 9 | 2,845 | 273/2,792 (103/303) | 9.8 (100) |
a Enz = enzyme used for digestion (M = MspJI, F = FspEI, L = LpnPI). Plex = number of multiplexed samples in this run. Merged ref = total number of merged reads exactly matching the reference. Sites detected = fraction of all sites in the genome for which (16,16) reads were detected in the sequence data.
b Additional bases called outside recognition sequence due to cutting constraints. Sites and % detected are reported for the site as written, followed by the results for the “constrained” site (e.g., YTCGAR is the “constrained” version of TCGA for an MspJI-cleaved library) in parentheses.
c The E. coli strain used for this clone was Dcm+, resulting in the discovery of both the Dcm and M.HhaI motifs.
d With only 4 cleavable sites in this genome, this motif was identified only by manual inspection.
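The "% Detected" column can be recomputed from the "Sites Detected" cells, including the parenthesised values for constrained sites. The helper below is a small illustrative sketch (the name `percent_detected` is hypothetical); it rounds to one decimal place, so a recomputed value may differ from a reported one in the last digit.

```python
import re


def percent_detected(cell):
    """Turn a cell such as '821/11,706 (736/2,719)' into percentages.

    Returns one value per detected/total pair found in the cell, in order:
    first the site as written, then the constrained site if present.
    """
    pairs = re.findall(r"([\d,]+)/([\d,]+)", cell)
    return [
        round(100 * int(found.replace(",", "")) / int(total.replace(",", "")), 1)
        for found, total in pairs
    ]


print(percent_detected("821/11,706 (736/2,719)"))
print(percent_detected("533/560"))
```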
Fig 5. Bar graph of random read analysis.
For each combination of G+C content and fraction of non-CCMD reads (horizontal axes), we determined the largest number of reads at which the motif was inaccurately called and added one to this value. The number of reads required to accurately call the motif (vertical axis) was calculated as the mean of 25 replicate determinations.
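The calculation described in this caption can be written out as follows. This is only a sketch of the stated rule: the input format (a list of read-count and correct-call pairs per replicate) and both function names are hypothetical, and returning 1 when no wrong call is observed is an assumed convention that the caption does not address.

```python
from statistics import mean


def reads_required(results):
    """Reads needed for an accurate call in one replicate.

    results is an iterable of (n_reads, called_correctly) pairs.
    Follows the caption: largest read count with a wrong call, plus one.
    If no call was wrong, 0 is used as the largest failing count (assumption).
    """
    wrong = [n for n, ok in results if not ok]
    return max(wrong, default=0) + 1


def mean_reads_required(replicates):
    """Average the per-replicate values (the caption uses 25 replicates)."""
    return mean(reads_required(results) for results in replicates)


# Two made-up replicates, purely to show the arithmetic.
example = [
    [(10, False), (20, False), (30, True), (40, False), (50, True)],
    [(10, False), (20, True)],
]
print(mean_reads_required(example))
```

Taking the largest failing read count, rather than the first success, guards against a motif that happens to be called correctly at a low read count and then lost again at a higher one.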
Motifs determined using the Ion Torrent platform.
| Sample | Enz | Plex | Merged Ref | Motif(s) | Sites Detected | % Detected |
|---|---|---|---|---|---|---|
| M | 8 | 39400 | 1,201/3,757 | 32.0 | ||
| M | 6 | 84413 | 260/956 (71/98) | 27.2 (72.4) | ||
| MFL | 8 | 10023 | 1,628/2,676 | 60.8 | ||
| MFL | 8 | 127746 | 5,897/6,751 | 87.4 | ||
| MF | 8 | 302550 | 7,411/9,288 | 79.8 | ||
| MF | 8 | 40618 | 184/2,152 (132/144) | 8.6 (91.7) | ||
| MF | 8 | 14435 | 147/2,154 (113/144) | 6.8 (78.5) | ||
| M | 8 | 235919 | 1,208/3,048 | 39.6 | ||
| 405/1,762 (306/501) | 23.0 (61.1) | |||||
| 587/768 | 76.4 | |||||
| 841/6,971 | 12.1 | |||||
| F | 8 | 195751 | 648/768 | 84.4 | ||
| M | 8 | 66713 | 350/413 | 84.7 | ||
| 685/748 | 91.6 | |||||
| 395/1,846 (260/389) | 21.4 (66.8) | |||||
| 47/157 (14/42) | 29.9 (33.3) | |||||
| F | 8 | 56919 | 133/157 | 84.7 | ||
| M | 8 | 124281 | 3,601/5,878 | 61.3 | ||
| F | 8 | 99275 | 1,908/5,878 (252/262) | 32.5 (96.2) | ||
| M | 11 | 484558 | 3,050/8,133 | 37.5 | ||
| M | 11 | 56627 | 552/1,897 (303/409) | 29.1 (74.1) | ||
| 743/1,039 | 71.5 | |||||
| F | 11 | 36167 | 1,141/1,897 | 60.1 | ||
| M | 11 | 271315 | 1,145/1,266 | 90.4 | ||
| F | 11 | 119489 | 5,805/10,467 | 55.5 | ||
| M | 11 | 291742 | 2,348/20,379 | 11.5 | ||
| F | 11 | 377335 | 1,937/20,379 | 9.5 | ||
| M | 11 | 44019 | 253/8,622 (153/1,695) | 2.9 (9.0) | ||
| M | 11 | 166554 | 2,545/3,412 | 74.6 | ||
| F | 11 | 174303 | 3,073/3,412 | 90.1 | ||
| M | 11 | 228420 | 2,423/18,633 (362/803) | 13.0 (45.1) | ||
| 405/563 | 71.9 | |||||
| F | 11 | 3457 | 125/6,991 | 1.8 | ||
| M | 5 | 592703 | 3,295/4,587 | 71.8 | ||
| 494/2,037 | 24.3 | |||||
| F | 5 | 270012 | 271/4,587 (133/174) | 5.9 (76.4) | ||
| MF | 4 | 352337 | 320/8,549 (304/1,324) | 3.7 (23.0) | ||
| M | 4 | 13718 | 290/13,771 (263/5,523) | 2.1 (5.2) |
a Enz = enzyme used for digestion (M = MspJI, F = FspEI, L = LpnPI; some digests were performed with more than one enzyme in combination). Plex = number of multiplexed samples in this run. Merged ref = total number of merged reads exactly matching the reference. Sites detected = fraction of all sites in the genome for which (16,16) or (16,17) reads were detected in the sequence data.
b Additional bases called outside recognition sequence due to cutting constraints. Sites and % detected are reported for the site as written, followed by the results for the “constrained” site (e.g., YT is the “constrained” version of T for an MspJI-cleaved library) in parentheses.
c Requires off-target cleavage by the MFRE.
d This motif appears as the combination C and C.
e Due to its non-palindromic nature, this motif appears as S, with methylation exclusively at C sites. The extra C appears due to cleavage constraints by FspEI and MspJI.
Read data for M.AvaVIII.
| Motif | Sites in genome | Fraction of Motif Sites | Sites with Reads | Fraction of Sites with Reads |
|---|---|---|---|---|
| 6354 | 1.000 | 302 | 0.048 | |
| 2 | 0.000 | 0 | 0.000 | |
| 3 | 0.000 | 1 | 0.333 | |
| 64 | 0.010 | 2 | 0.031 | |
| 0 | 0.000 | 0 | 0.000 | |
| 300 | 0.047 | 9 | 0.030 | |
| 165 | 0.026 | 68 | 0.412 | |
| 5268 | 0.829 | 149 | 0.028 | |
| 60 | 0.009 | 2 | 0.033 | |
| 3 | 0.000 | 1 | 0.333 | |
| 4 | 0.001 | 4 | 1.000 | |
| 157 | 0.025 | 61 | 0.389 | |
| 1 | 0.000 | 0 | 0.000 | |
| 4 | 0.001 | 0 | 0.000 | |
| 1 | 0.000 | 0 | 0.000 | |
| 317 | 0.050 | 5 | 0.016 | |
| 5 | 0.001 | 0 | 0.000 | |
| 6354 | 1.000 | 302 | 0.048 |
a Fraction of motif sites = fraction of the 6354 NCGATCGN sites that each sequence represents. Sites with reads = number of sites for which at least one (16,16), (16,17), or (15,16) read was identified. Fraction of sites with reads = sites with reads / sites in genome.