Satoshi Hiraoka, Yusuke Okazaki, Mizue Anda, Atsushi Toyoda, Shin-Ichi Nakano, Wataru Iwasaki.
Abstract
DNA methylation plays important roles in prokaryotes, and its genomic landscapes (prokaryotic epigenomes) have recently begun to be disclosed. However, our knowledge of prokaryotic methylation systems is focused on those of culturable microbes, which are rare in nature. Here, we used single-molecule real-time (SMRT) and circular consensus sequencing (CCS) techniques to reveal the 'metaepigenomes' of a microbial community in the largest lake in Japan, Lake Biwa. We reconstructed 19 draft genomes from diverse bacterial and archaeal groups, most of which are yet to be cultured. The analysis of DNA chemical modifications in those genomes revealed 22 methylated motifs, nine of which were novel. We identified methyltransferase genes likely responsible for methylation of the novel motifs, and confirmed the catalytic specificities of four of them via transformation experiments using synthetic genes. Our study highlights metaepigenomics as a powerful approach for identification of the vast unexplored variety of prokaryotic DNA methylation systems in nature.
Year: 2019 PMID: 30635580 PMCID: PMC6329791 DOI: 10.1038/s41467-018-08103-y
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Statistics of SMRT sequencing and CCS-read analysis
| Sample | biwa_5m | biwa_65m |
|---|---|---|
| Sequenced reads | 850,494 | 688,436 |
| Total base pairs (bp) | 9,570,723,004 | 6,419,717,083 |
| CCS reads | 168,599 | 117,802 |
| CCS read length (bp) | 4474 ± 931 | 4394 ± 587 |
| Total CCS bases (bp) | 754,416,328 | 517,663,806 |
| 16S rRNA genes | 170 | 106 |
| 16S rRNA gene length (bp) | 1491 ± 64 | 1468 ± 104 |
Fig. 1 Phylogenetic distribution of CCS reads. Estimated relative abundances at the (a) domain, (b) phylum, and (c) class levels are shown. Eukaryotic and viral reads are ignored, and groups with <1% abundance are grouped as ‘Other’ in (b) and (c).
Fig. 2 Genome binning of the assembled contigs. Each circle represents a contig, where the color and size represent its assigned bin and total sequence length, respectively. Contigs not assigned to any bin are indicated in gray (named ‘NA’). The x-axis and y-axis represent GC% and genome coverage, respectively.
Statistics for draft genomes
| Genome ID | Lineage | Estimated genome size (Mbp) | Contigs | N50 (bp) | GC content (%) | Completeness (%) | Contamination (%) | 16S rRNA | CDSs | CCS-read coverage | Methylated motifs | MTases |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BS1 | Bacteria; Chloroflexi^a | 2.24 | 21 | 64,528 | 59.5 | 30.6 | 0.0 | 0 | 751 | 5.79 | 3 | 0 |
| BS2 | Bacteria; Actinobacteria^a | 1.57 | 13 | 28,617 | 40.6 | 16.9 | 0.0 | 0 | 363 | 5.13 | 0 | 0 |
| BS3 | Bacteria; Chloroflexi; Anaerolineae; Anaerolineales; Anaerolineaceae; uncultured; uncultured Crater Lake bacterium CL500-11 | 3.35 | 36 | 58,996 | 61.8 | 49.1 | 0.0 | 1 | 1646 | 6.91 | 3 | 3 |
| BS4 | Bacteria; Actinobacteria; Acidimicrobiia; Acidimicrobiales; Acidimicrobiaceae; CL500-29 marine group | 2.31 | 40 | 61,750 | 49.8 | 76.8 | 1.3 | 1 | 2066 | 6.67 | 0 | 0 |
| BS5 | Bacteria; Actinobacteria; Actinobacteria; Frankiales; Sporichthyaceae; hgcI clade; uncultured | 1.51 | 8 | 190,417 | 44.2 | 71.6 | 0.0 | 1 | 1209 | 10.02 | 0 | 0 |
| BS6 | Bacteria; Verrucomicrobia; Opitutae; Opitutae vadinHA64; uncultured bacterium | 2.27 | 37 | 100,045 | 63.4 | 89.2 | 0.7 | 1 | 1889 | 6.85 | 0 | 1 |
| BS7 | Bacteria; Actinobacteria; Actinobacteria; Frankiales; Sporichthyaceae; hgcI clade; uncultured | 1.49 | 6 | 470,028 | 42.1 | 58.4 | 0.6 | 1 | 948 | 9.26 | 0 | 0 |
| BS8 | Bacteria; Verrucomicrobia^b | 2.71 | 34 | 102,020 | 61.2 | 82.5 | 2.0 | 0 | 2121 | 7.34 | 1 | 1 |
| BS9 | Bacteria; Actinobacteria^b | 1.65 | 3 | 315,861 | 45.5 | 37.6 | 0.0 | 0 | 677 | 12.09 | 0 | 0 |
| BS10 | Bacteria; Verrucomicrobia; Opitutae; Opitutae vadinHA64; uncultured bacterium | 2.55 | 24 | 1,672,582 | 68.4 | 95.9 | 2.7 | 1 | 2165 | 17.93 | 1 | 1 |
| BS11 | Bacteria; Actinobacteria; Actinobacteria; Frankiales; Sporichthyaceae; hgcI clade; uncultured actinobacterium | 1.03 | 3 | 365,154 | 46.3 | 62.1 | 0.0 | 1 | 675 | 10.28 | 0 | 0 |
| BS12 | Bacteria; Proteobacteria; Betaproteobacteria; Methylophilales; Methylophilaceae | 1.40 | 10 | 169,468 | 37.3 | 80.7 | 0.4 | 1 | 1289 | 8.37 | 1 | 0 |
| BS13 | Bacteria; Actinobacteria; Actinobacteria^a | 1.49 | 5 | 47,968 | 41.3 | 19.0 | 0.0 | 0 | 351 | 7.56 | 0 | 0 |
| BS14 | Bacteria; Proteobacteria; Alphaproteobacteria; Pelagibacterales^a | 1.02 | 6 | 222,441 | 29.4 | 88.6 | 0.0 | 0 | 1075 | 20.45 | 1 | 1 |
| BS15 | Bacteria; Bacteroidetes; Sphingobacteriia; Sphingobacteriales; Chitinophagaceae; Filimonas; uncultured bacterium | 4.08 | 44 | 45,979 | 42.4 | 43.1 | 0.1 | 1 | 1908 | 5.57 | 6 | 6 |
| BD1 | Bacteria; Chloroflexi^a | 2.89 | 30 | 157,947 | 60.9 | 90.9 | 0.9 | 0 | 2429 | 45.74 | 3 | 3 |
| BD2 | Bacteria; Nitrospirae^a | 1.92 | 11 | 313,929 | 57.6 | 93.9 | 0.9 | 0 | 1890 | 8.01 | 1 | 2 |
| BD3 | Archaea; Thaumarchaeota; Marine Group I; Unknown Order; Unknown Family | 1.48 | 10 | 250,506 | 33.0 | 98.5 | 1.9 | 1 | 1869 | 13.93 | 2 | 2 |
| BD4 | Bacteria; Verrucomicrobia^b | 2.09 | 49 | 46,663 | 65.9 | 81.5 | 0.7 | 0 | 1705 | 5.98 | 0 | 0 |
^a Estimated using CAT
^b Estimated using Kaiju
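The N50 column in the table above follows the usual assembly definition: the length of the contig at which the running total of contig lengths, sorted from longest to shortest, first reaches half of the total assembly size. A minimal Python sketch of that definition is given below; the function name `n50` and the example contig lengths are illustrative assumptions and are not taken from the study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Standard definition: sort contigs from longest to shortest and return
    the length of the contig at which the cumulative sum first reaches
    half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical contig lengths, for illustration only (not data from the paper).
example_contigs = [100, 80, 60, 40, 20]
example_n50 = n50(example_contigs)
```

For the hypothetical lengths above, the total is 300 bp, and the cumulative sum reaches 150 bp at the second-longest contig, so the function returns 80.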
Detected methylated motifs
| Genome ID | Detected methylated motif | Modification type | Motif in REBASE | Number of methylated sites | Number of motif sequences | Methylation ratio (%) | Mean modification QV | Mean subread coverage |
|---|---|---|---|---|---|---|---|---|
| BS1 | G | m6A | Yes | 1813 | 2070 | 87.6 | 58.0 | 35.2 |
| | TTA | m6A | Yes | 1264 | 1522 | 83.0 | 55.5 | 34.1 |
| | G | m4C | Yes | 3026 | 15,948 | 19.0 | 38.4 | 40.6 |
| BS3 | | m6A | Yes | 3724 | 4014 | 92.8 | 66.1 | 41.3 |
| | TTA | m6A | Yes | 3036 | 3338 | 91.0 | 62.4 | 40.4 |
| | G | m4C | Yes | 13,821 | 54,026 | 25.6 | 39.5 | 46.4 |
| BS8 | | m6A | No | 80 | 276 | 29.0 | 39.6 | 65.8 |
| BS10 | ACG | m6A | No | 1986 | 7185 | 27.6 | 45.0 | 171.4 |
| BS12 | GMAG | m4C | No | 169 | 220 | 76.8 | 50.9 | 83.5 |
| | HCAG | m4C | No | 124 | 293 | 42.3 | 46.8 | 79.0 |
| | BGMAG | m4C | No | 78 | 185 | 42.2 | 46.3 | 76.3 |
| BS14 | G | m6A | Yes | 2856 | 2880 | 99.2 | 190.6 | 166.9 |
| BS15 | G | m6A | Yes | 1309 | 1472 | 88.9 | 55.6 | 30.9 |
| | | m6A | No | 642 | 726 | 88.4 | 56.0 | 29.4 |
| | | m6A | No | 619 | 726 | 85.3 | 52.0 | 29.8 |
| | | m6A | No | 311 | 349 | 89.1 | 56.9 | 30.4 |
| | C | m6A | No | 293 | 349 | 84.0 | 53.3 | 30.9 |
| | CA | m6A | No | 205 | 256 | 80.1 | 49.4 | 29.1 |
| | CA | m6A | No | 164 | 214 | 76.6 | 48.7 | 28.7 |
| | TT | m6A | No | 87 | 99 | 87.9 | 51.3 | 29.8 |
| | | m6A | No | 77 | 99 | 77.8 | 49.4 | 29.7 |
| | GYT | m6A | No | 76 | 89 | 85.4 | 56.0 | 31.3 |
| | CYA | m6A | No | 59 | 127 | 46.5 | 53.5 | 32.6 |
| BD1 | G | m4C | Yes | 72,730 | 77,932 | 93.3 | 140.2 | 297.3 |
| | G | m6A | Yes | 6754 | 6844 | 98.7 | 346.3 | 281.7 |
| | TTA | m6A | Yes | 5475 | 5564 | 98.4 | 325.3 | 270.9 |
| BD2 | TANGG | m6A | No | 1276 | 1367 | 93.3 | 64.4 | 48.5 |
| BD3 | G | m6A | Yes | 9446 | 9618 | 98.2 | 122.1 | 93.7 |
| | AG | m4C | Yes | 5974 | 6224 | 96.0 | 84.0 | 92.1 |
R = A/G, M = A/C, W = A/T, S = C/G, Y = C/T, K = G/T, H = A/C/T, B = C/G/T, D = A/G/T, V = A/C/G, N = A/C/G/T
Underlined bold face indicates methylation sites
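The degenerate-base codes in the footnote and the "Methylation ratio (%)" column can be made concrete with a short sketch. The Python code below translates an IUPAC motif into a regular expression, counts its occurrences on one strand of a sequence, and recomputes a methylation ratio as methylated sites divided by motif sequences. The function names and the toy sequence are assumptions made for illustration, and this is not the authors' motif-detection pipeline. The motif GCWGC is taken from the TseI target site quoted in the Fig. 3 caption.

```python
import re

# IUPAC degenerate-base codes, as listed in the table footnote.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "M": "AC", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "H": "ACT", "B": "CGT", "D": "AGT", "V": "ACG", "N": "ACGT",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif into a regex.

    The pattern is wrapped in a lookahead so that overlapping
    occurrences are also reported by finditer().
    """
    body = "".join("[" + IUPAC[base] + "]" for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def count_motif(sequence, motif):
    """Count motif occurrences on the given strand only (overlaps included)."""
    return sum(1 for _ in motif_to_regex(motif).finditer(sequence.upper()))


def methylation_ratio(methylated_sites, motif_sequences):
    """Percentage of motif occurrences that were called as methylated."""
    return round(100.0 * methylated_sites / motif_sequences, 1)


# Toy sequence, for illustration only (not data from the paper).
toy_hits = count_motif("GCAGCTGCTGC", "GCWGC")

# Counts taken from the first BS1 row of the table above.
bs1_ratio = methylation_ratio(1813, 2070)
```

With the counts from the first BS1 row (1813 methylated sites out of 2070 motif sequences), the helper reproduces the tabulated 87.6%. A full analysis would also scan the reverse-complement strand, which this sketch omits.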
Detected MTases, REases, and specificity subunit genes
| Genome ID | CDS ID | Gene type | Top-hit protein in REBASE | Identity (%) | Recognition motif of the closest-match MTase | Modification type | RM type | RM system | TRD divergence | Motif detected | MTase name | Confirmed recognition motif |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BS3 | EMGBS3_04270 | M | M.SstE37II | 58.9 | G | m6A | II | No | No | Yes | | |
| | EMGBS3_09240 | M | M.Sth20745I | 71.4 | TTA | m6A | II | No | No | Yes | | |
| | EMGBS3_12600 | M | M1.BceSIII | 22.9 | A | m4C | II | No | Yes | No | M.AbaBS3I | G |
| BS6 | EMGBS6_08960 | M | M.SinI | 57.0 | GGW | m5C | II | No | No | No | | |
| BS8 | EMGBS8_10720 | R | DvuI | 36.3 | ? | – | I | – | – | – | | |
| | EMGBS8_10740 | S | S.PveNS15I | 32.4 | ? | – | I | – | Yes | – | | |
| | EMGBS8_10750 | M | M.RbaNRL2II | 55.6 | ACG | m6A | I | Yes | – | No | | |
| BS10 | EMGBS10_10070 | RM | CjeFIII | 23.7 | GCA | m6A | II | Yes | Yes | No | M.ObaBS10I | ACG |
| BS14 | EMGBS14_10020 | M | M.Bsp460I | 56.7 | G | m6A | II | No | No | Yes | | |
| BS15 | EMGBS15_02830 | M | M.Bli37I | 56.6 | G | m6A | I | Yes | – | No | | |
| | EMGBS15_02840 | M | M.EcoNIH1III | 59.2 | G | m6A | I | Yes | – | No | | |
| | EMGBS15_02870 | S | S.PveNS15I | 47.2 | ? | – | I | – | Yes | – | | |
| | EMGBS15_02930 | R | DvuI | 38.4 | ? | – | I | – | – | – | | |
| | EMGBS15_03820 | M | M.EcoGI | 25.8 | Nonspecific | m6A | II | Yes | Yes | No | M.FspBS16I | G |
| | EMGBS15_03830 | R | XmnI | 34.0 | GAANNNNTTC | – | II | – | – | – | | |
| | EMGBS15_04560 | R | GmeII | 33.8 | TCCAGG | – | III | – | – | – | | |
| | EMGBS15_04600 | M | M.FpsJII | 53.4 | CGC | m6A | III | Yes | No | No | | |
| | EMGBS15_05670 | M | M.FnuDI | 59.8 | GGCC^a | m4C | II | Yes | No | No | | |
| | EMGBS15_05690 | R | BhaII | 45.6 | GGCC | – | II | – | – | – | | |
| | EMGBS15_12460 | M | M.Mva1261III | 37.1 | CT | m6A | I | No | No | No | | |
| BD1 | EMGBD1_08400 | M | M.Sth20745I | 71.0 | TTA | m6A | II | No | No | Yes | | |
| | EMGBD1_09320 | M | M1.BceSIII | 22.9 | A | m4C | II | No | Yes | No | M.AbaBS3I | G |
| | EMGBD1_19510 | M | M.SstE37II | 58.9 | G | m6A | II | No | No | Yes | | |
| BD2 | EMGBD2_08760 | M | M.HgiDII | 55.0 | GTCGAC^a | m5C | II | Yes | No | No | | |
| | EMGBD2_08790 | RM | AquIV | 28.5 | GRGGA | m6A | II | Yes | Yes | No | M.NbaBD2I | TAHGG |
| | EMGBD2_08800 | R | LpnPI | 56.3 | CCDG | – | II | – | – | – | | |
| BD3 | EMGBD3_00670 | M | M.Mma5219II | 45.9 | AG | m4C | II | No | No | Yes | | |
| | EMGBD3_01960 | M | M.AvaVI | 50.3 | G | m6A | II | No | No | Yes | | |
Underlined bold face indicates methylation sites
M: methyltransferase, R: restriction endonuclease, S: specificity subunit
^a Modified base undetermined
Fig. 3 REase digestion assays. (a) Assay of the EMGBS3_12600 gene (and EMGBD1_09320, which has the same amino-acid sequence). BceAI and TseI were used, where the plasmid contained 12 (ACGGC) and 21 (GCWGC) target sites, respectively. Plasmid DNAs were linearized using SalI before the assay. An NEB 2-log DNA ladder was employed as a size marker. (b) Assay of the EMGBS15_03820 gene. DpnII and XmnI were used, where the plasmid contained 27 (GATC) and 2 (GAANNNNTTC) target sites, respectively.