Metaepigenomic analysis reveals the unexplored diversity of DNA methylation in an environmental prokaryotic community.

Satoshi Hiraoka1,2, Yusuke Okazaki3, Mizue Anda4, Atsushi Toyoda5, Shin-Ichi Nakano3, Wataru Iwasaki6,7,8.   

Abstract

DNA methylation plays important roles in prokaryotes, and their genomic landscapes (prokaryotic epigenomes) have recently begun to be disclosed. However, our knowledge of prokaryotic methylation systems is focused on those of culturable microbes, which are rare in nature. Here, we used single-molecule real-time and circular consensus sequencing techniques to reveal the 'metaepigenomes' of a microbial community in the largest lake in Japan, Lake Biwa. We reconstructed 19 draft genomes from diverse bacterial and archaeal groups, most of which are yet to be cultured. The analysis of DNA chemical modifications in those genomes revealed 22 methylated motifs, nine of which were novel. We identified methyltransferase genes likely responsible for methylation of the novel motifs, and confirmed the catalytic specificities of four of them via transformation experiments using synthetic genes. Our study highlights metaepigenomics as a powerful approach for identification of the vast unexplored variety of prokaryotic DNA methylation systems in nature.

Year:  2019        PMID: 30635580      PMCID: PMC6329791          DOI: 10.1038/s41467-018-08103-y

Source DB:  PubMed          Journal:  Nat Commun        ISSN: 2041-1723            Impact factor:   14.919


Introduction

DNA methylation is a major class of epigenetic modification that is found in diverse prokaryotes, in addition to eukaryotes[1]. For example, prokaryotic DNA methylation by sequence-specific restriction-modification (RM) systems that protect host cells from invasion by phages or extracellular DNA has been well characterized and is utilized as a key tool in biotechnology[2-4]. In addition, recent studies have revealed that prokaryotic DNA methylation plays additional roles, performing various biological functions, including regulation of gene expression, mismatch DNA repair, and cell cycle functions[5-9]. Research interest in the diversity of prokaryotic methylation systems is therefore growing due to their importance in microbial physiology, genetics, evolution, and disease pathogenicity[7,10]. However, our knowledge of the diversity of prokaryotic methylation systems has been severely limited thus far because most studies focus only on the rare prokaryotes that are cultivable in laboratories. The recent development of single-molecule real-time (SMRT) sequencing technology provides us with another tool for observing DNA methylation. An array of DNA methylomes of cultivable prokaryotic strains, including N6-methyladenine (m6A), 5-methylcytosine (m5C), and N4-methylcytosine (m4C) modifications, have been revealed by this technology[11-14]. Despite its high rates of base-calling and modification detection errors per raw read[15,16], SMRT sequencing technology can produce ultralong reads of up to 60 kbp with few context-specific biases (e.g., GC bias)[17]. This characteristic enables SMRT sequencing to achieve high accuracy by merging data from many erroneous raw reads originating from clonal DNA molecules, typically from cultivated prokaryotic populations[18]. 
Alternatively, in an approach referred to as circular consensus sequencing (CCS), a circular DNA library is prepared as a sequence template to allow the generation of a single ultralong raw read containing multiple sequences (‘subreads’) that correspond to the same stretch on the template[19,20]; therefore, a cultivated clonal population is not required[21]. However, CCS has thus far been applied in only a few shotgun metagenomics studies[22] and, to the best of our knowledge, has not yet been applied to ‘metaepigenomics’, that is, direct methylome analysis of environmental microbial communities, which usually consist of uncultured prokaryotes. Here, we applied CCS to shotgun metagenomic and metaepigenomic analyses of freshwater microbial communities in Lake Biwa, the largest lake in Japan, to reveal the genomic and epigenomic characteristics of the environmental microbial communities using the PacBio Sequel platform (Supplementary Fig. 1a). Freshwater lakes are of economic and social importance, and microbes constitute the bases of their ecosystems[23]. In addition, freshwater habitats are rich in phage–prokaryote interactions[24-27], which can affect prokaryotic DNA methylation. We report that our CCS analyses of the environmental microbial samples allowed reconstruction of draft genomes and the identification of their methylated motifs, at least nine of which were novel. Furthermore, we computationally predicted and experimentally confirmed four methyltransferases (MTases) responsible for the detected methylated motifs. Importantly, two of the four MTases were revealed to recognize novel motif sequences.

Results and Discussion

Water sampling, SMRT sequencing, and circular consensus analysis

Water samples were collected at a pelagic site in Lake Biwa, Japan, at 5 m (biwa_5m) and 65 m depths (biwa_65m), from which PacBio Sequel produced a total of 2.6 million (9.6 Gbp) and 2.0 million (6.4 Gbp) subreads, respectively (Table 1). The circular consensus analysis produced 168,599 and 117,802 CCS reads, with lengths of 4474 ± 931 and 4394 ± 587 bp, respectively (Table 1 and Supplementary Fig. 2). In the shallow sample data, at least 90% of the CCS reads showed high quality (Phred quality scores > 20) at each base position, except for the 5′-terminal five bases and 3′-terminal bases after the 5638th base. In the deep sample data, the same was true, except for the 5′-terminal four bases and 3′-terminal bases after the 5356th base (Supplementary Fig. 3).
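The Phred threshold used above maps to a per-base error probability through the standard relation Q = −10·log10(p), so a score above 20 corresponds to an expected error rate below 1%. The following is a minimal Python sketch of that conversion; the function names are illustrative and are not taken from the authors' pipeline.

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score corresponding to a per-base error probability p."""
    return -10 * math.log10(p)


# Print the error probability for a few common thresholds.
for q in (10, 20, 30):
    print(f"Q{q}: error probability {phred_to_error_prob(q):.4f}")
```

Under this relation, the statement that at least 90% of CCS reads exceed Q20 at a position means that at least 90% of reads have an estimated error probability below 0.01 at that position.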
Table 1

Statistics of SMRT sequencing and CCS-read analysis

Sample | biwa_5m | biwa_65m
Sequenced reads | 850,494 | 688,436
  Total base pairs (bp) | 9,570,723,004 | 6,419,717,083
CCS reads | 168,599 | 117,802
  Read length (bp) | 4474 ± 931 | 4394 ± 587
  Total base (bp) | 754,416,328 | 517,663,806
16S rRNA | 170 | 106
  Length (bp) | 1491 ± 64 | 1468 ± 104

Taxonomic analysis

Taxonomic assignment of the CCS reads was performed using Kaiju[28] and the National Center for Biotechnology Information non-redundant (NCBI nr) database[29] (Fig. 1). The assignment ratios were >88% and >56% at the phylum and genus levels, respectively, which were higher than those for the Illumina-based shotgun metagenomic analysis of lake freshwater and other environments using the same computational method[28]. Kraken[30] with complete prokaryotic and viral genomes in RefSeq[31] (Supplementary Fig. 4a–c) provided similar results but resulted in much lower assignment ratios (30% and 27%, respectively), likely due to the lack of genomic data for freshwater microbes in RefSeq. The 16S ribosomal RNA (rRNA) sequence-based taxonomic assignment via blastn searches against the SILVA database[32] also provided consistent results (Supplementary Fig. 4d–f). It should be noted that 16S rRNA-based and CDS-based taxonomic assignments can be affected by 16S rRNA gene copy numbers and genome sizes, respectively.
Fig. 1

Phylogenetic distribution of CCS reads. Estimated relative abundances at the a domain, b phylum, and c class levels are shown. Eukaryotic and viral reads are ignored, and groups with <1% abundance are grouped as ‘Other’ in b, c

At the phylum level, Proteobacteria dominated both samples, followed by Actinobacteria, Verrucomicrobia, and Bacteroidetes (Fig. 1). Chloroflexi and Thaumarchaeota were especially abundant in the deep water sample, consistent with previous findings[33,34]. The proportion of Archaea was particularly low in the shallow sample (0.6 and 6.9% in biwa_5m and biwa_65m, respectively). Although the filter pore-size range (5–0.2 μm) was not suitable for most viruses and eukaryotic cells, non-negligible proportions of reads from both were observed in the shallow sample. The dominant eukaryotic phylum was Opisthokonta (2.68 and 0.92%), followed by Alveolata (1.67 and 0.45%) and Stramenopiles (1.45 and 0.15%). Among viruses, Caudovirales and Phycodnaviridae were the most abundant families in both samples. Caudovirales are known to act as bacteriophages, while Phycodnaviridae primarily infect eukaryotic algae. The third most abundant viral family was Mimiviridae, whose members are also known as ‘Megavirales’ due to their large genome size (0.6–1.3 Mbp)[35,36]. Viruses without double-stranded DNA (i.e., single-stranded DNA and RNA viruses) were not observed because of the experimental method employed. Overall, the taxonomic composition was consistent with those obtained in previous studies on microbial communities in freshwater lake environments, indicating that SMRT sequencing provides taxonomic compositions consistent with those obtained using short-read technologies, such as the Illumina MiSeq and HiSeq platforms[37,38].

Metagenomic assembly and genome binning

The CCS reads from the shallow and deep samples were assembled into 599 and 429 contigs, respectively, using Canu[18]. After removing 45 (7.5%) and 84 (19.6%) repetitive contigs, we retrieved 554 and 345 contigs, respectively (Supplementary Table 3). The corresponding N50 values were 83 and 76 kbp, and the longest contigs had lengths of 481 and 740 kbp, respectively. Notably, the contigs were much longer than those obtained in a previous study that applied CCS for shotgun metagenomics analysis of an activated sludge microbial community[22]. We also used Mira[39] for metagenomic assembly, but this resulted in shorter maximum contig lengths (148 and 151 kbp, respectively) and lower N50 values (19 and 18 kbp, respectively). The contigs were binned into genomes using MetaBAT[40], a reference-independent binning tool, based on CCS-read coverage and tetranucleotide frequency (Fig. 2 and Table 2). Among a total of 554 and 345 contigs, 290 (52.3%) and 100 (29.0%) were assigned to 15 and 4 bins from the shallow and deep samples, respectively. In total, 46.9 and 44.8% of the CCS reads could be mapped to the draft genomes for the shallow and deep samples, respectively. We obtained a draft genome for each bin, where the completeness of the genome ranged from 17 to 99% (67% on average). Estimated contamination levels were low (<3% in each draft genome). Based on the total contig size and estimated genome completeness of each draft genome, the genome sizes were estimated to range from 1.0 to 5.6 Mbp. The GC content ranged from 29 to 68%, and the N50 was 24 kbp on average, with a maximum of 1.67 Mbp.
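Two of the quantities reported above can be illustrated with a short calculation: the N50 of a set of contigs, and a genome size estimated from the total contig size and the estimated completeness. The sketch below uses one common definition of N50 (the length of the contig at which the running total of the sorted lengths first reaches half the assembly size) and assumes the genome size estimate is simply total contig size divided by completeness. The function names and the toy contig lengths are hypothetical and are not the authors' code or data.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative sum of lengths,
    sorted from longest to shortest, first reaches half the total."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


def estimated_genome_size(total_contig_bp, completeness_pct):
    """Scale the binned contig total up by the estimated completeness (%)."""
    if not 0 < completeness_pct <= 100:
        raise ValueError("completeness must be in (0, 100]")
    return total_contig_bp * 100 / completeness_pct


# Hypothetical bin: four contigs totalling 1,000 kbp at 80% completeness.
toy_contigs = [400_000, 300_000, 200_000, 100_000]
print(n50(toy_contigs))
print(estimated_genome_size(sum(toy_contigs), 80))
```

Because the estimate divides by completeness, bins with low completeness yield genome size estimates that are sensitive to small errors in the completeness value.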
Fig. 2

Genome binning of the assembled contigs. Each circle represents a contig, where the color and size represent its assigned bin and total sequence length, respectively. Contigs not assigned to any bin are indicated in gray (named ‘NA’). The x-axis and y-axis represent GC% and genome coverage, respectively

Table 2

Statistics for draft genomes

Genome ID | Lineage | Estimated genome size (Mbp) | Contigs | N50 (bp) | GC content (%) | Completeness (%) | Contamination (%) | 16S rRNA | CDSs | CCS-read coverage | Methylated motifs | MTases
BS1 | Bacteria; Chloroflexi [a] | 2.24 | 21 | 64,528 | 59.5 | 30.6 | 0.0 | 0 | 751 | 5.79 | 3 | 0
BS2 | Bacteria; Actinobacteria [a] | 1.57 | 13 | 28,617 | 40.6 | 16.9 | 0.0 | 0 | 363 | 5.13 | 0 | 0
BS3 | Bacteria; Chloroflexi; Anaerolineae; Anaerolineales; Anaerolineaceae; uncultured; uncultured Crater Lake bacterium CL500-11 | 3.35 | 36 | 58,996 | 61.8 | 49.1 | 0.0 | 1 | 1646 | 6.91 | 3 | 3
BS4 | Bacteria; Actinobacteria; Acidimicrobiia; Acidimicrobiales; Acidimicrobiaceae; CL500-29 marine group | 2.31 | 40 | 61,750 | 49.8 | 76.8 | 1.3 | 1 | 2066 | 6.67 | 0 | 0
BS5 | Bacteria; Actinobacteria; Actinobacteria; Frankiales; Sporichthyaceae; hgcI clade; uncultured Clavibacter sp. | 1.51 | 8 | 190,417 | 44.2 | 71.6 | 0.0 | 1 | 1209 | 10.02 | 0 | 0
BS6 | Bacteria; Verrucomicrobia; Opitutae; Opitutae vadinHA64; uncultured bacterium | 2.27 | 37 | 100,045 | 63.4 | 89.2 | 0.7 | 1 | 1889 | 6.85 | 0 | 1
BS7 | Bacteria; Actinobacteria; Actinobacteria; Frankiales; Sporichthyaceae; hgcI clade; uncultured Candidatus Planktophila sp. | 1.49 | 6 | 470,028 | 42.1 | 58.4 | 0.6 | 1 | 948 | 9.26 | 0 | 0
BS8 | Bacteria; Verrucomicrobia [b] | 2.71 | 34 | 102,020 | 61.2 | 82.5 | 2.0 | 0 | 2121 | 7.34 | 1 | 1
BS9 | Bacteria; Actinobacteria [b] | 1.65 | 3 | 315,861 | 45.5 | 37.6 | 0.0 | 0 | 677 | 12.09 | 0 | 0
BS10 | Bacteria; Verrucomicrobia; Opitutae; Opitutae vadinHA64; uncultured bacterium | 2.55 | 24 | 1,672,582 | 68.4 | 95.9 | 2.7 | 1 | 2165 | 17.93 | 1 | 1
BS11 | Bacteria; Actinobacteria; Actinobacteria; Frankiales; Sporichthyaceae; hgcI clade; uncultured actinobacterium | 1.03 | 3 | 365,154 | 46.3 | 62.1 | 0.0 | 1 | 675 | 10.28 | 0 | 0
BS12 | Bacteria; Proteobacteria; Betaproteobacteria; Methylophilales; Methylophilaceae; Candidatus Methylopumilus; uncultured bacterium | 1.40 | 10 | 169,468 | 37.3 | 80.7 | 0.4 | 1 | 1289 | 8.37 | 1 | 0
BS13 | Bacteria; Actinobacteria; Actinobacteria [a] | 1.49 | 5 | 47,968 | 41.3 | 19.0 | 0.0 | 0 | 351 | 7.56 | 0 | 0
BS14 | Proteobacteria; Alphaproteobacteria; Pelagibacterales [a] | 1.02 | 6 | 222,441 | 29.4 | 88.6 | 0.0 | 0 | 1075 | 20.45 | 1 | 1
BS15 | Bacteria; Bacteroidetes; Sphingobacteriia; Sphingobacteriales; Chitinophagaceae; Filimonas; uncultured bacterium | 4.08 | 44 | 45,979 | 42.4 | 43.1 | 0.1 | 1 | 1908 | 5.57 | 6 | 6
BD1 | Bacteria; Chloroflexi [a] | 2.89 | 30 | 157,947 | 60.9 | 90.9 | 0.9 | 0 | 2429 | 45.74 | 3 | 3
BD2 | Bacteria; Nitrospirae [a] | 1.92 | 11 | 313,929 | 57.6 | 93.9 | 0.9 | 0 | 1890 | 8.01 | 1 | 2
BD3 | Archaea; Thaumarchaeota; Marine Group I; Unknown Order; Unknown Family; Candidatus Nitrosoarchaeum | 1.48 | 10 | 250,506 | 33.0 | 98.5 | 1.9 | 1 | 1869 | 13.93 | 2 | 2
BD4 | Bacteria; Verrucomicrobia [b] | 2.09 | 49 | 46,663 | 65.9 | 81.5 | 0.7 | 0 | 1705 | 5.98 | 0 | 0

[a] Estimated using CAT

[b] Estimated using Kaiju

The 19 draft genomes belonged to 7 phyla (Table 2 and Supplementary Fig. 5). Among these draft genomes, 10 contained 16S rRNA genes, and many of them showed top hits to uncultured clades; thus, our CCS-based approach appears to have captured multiple uncultured prokaryotes. Seven draft genomes were predicted to belong to the phylum Actinobacteria, including Candidatus Planktophila (BS7), one of the most dominant bacterioplankton lineages in freshwater systems[23,41]. Draft genomes affiliated with other dominant freshwater lineages were also recovered, including Candidatus Methylopumilus (BS12)[42], the freshwater lineage (LD12) of Pelagibacterales (BS14)[43,44], and Nitrospirae (BD2) and Candidatus Nitrosoarchaeum (BD3), the predominant nitrifying bacteria and archaea in the hypolimnion[33,34]. Four draft genomes were affiliated with the phylum Verrucomicrobia (BS6, BS8, BS10, and BD4), in line with a previous study[45]. The BS3 and BD1 draft genomes likely represent members of the CL500-11 group (class Anaerolineae) of the Chloroflexi phylum, where BD1 presented the highest coverage of >45×. This group is dominant in the hypolimnion of Lake Biwa and is frequently found in deep oligotrophic freshwater environments worldwide[46]. Although Proteobacteria was the most dominant phylum, only two draft genomes were retrieved from the shallow sample and none from the deep sample. 
Regarding the shallow sample, approximately one-fourth of the Proteobacteria CCS reads could be mapped to the two draft genomes, which means three-fourths of them likely originated from minor and diverse Proteobacteria clades. Overall, the phylogeny of the reconstructed genomes likely reflects the major lineages that are yet to be cultured but are dominantly present in the water of Lake Biwa.

Metaepigenomic analysis

A total of 29 candidate methylated motifs were detected in 10 draft genomes (Table 3). Their methylation ratios ranged from 19 to 99%; because these ratios depend on modification detection power, they are likely lower than the true methylation levels. The mapped subread coverages of the methylated motifs ranged from 28.7 to 297.3×. Three motifs from the Proteobacteria BS12 genome contained similar sequences (HCAGCTKC, BGMAGCTGD, and GMAGCTKC, where B: C/G/T, D: A/G/T, H: A/C/T, K: G/T, and M: A/C), likely due to incomplete detection of a single methylated motif or heterogeneous motif sequences between closely related lineages contained within that genome. A palindromic motif and five complementary motif pairs that likely reflect double-strand methylation were observed in the Bacteroidetes BS15 genome (e.g., a pair of AGCNNNNNNCAT and ATGNNNNNNGCT). It may also be notable that three draft genomes from the Chloroflexi phylum (BS1, BS3, and BD1) shared the same motif sequence set (GANTC, TTAA, and GCWGC, where W: A/T), likely due to evolutionarily shared methylation systems. Contigs in each draft genome showed a similar methylation pattern in general, providing additional epigenomic support for the quality of the genome binning (Supplementary Fig. 6).
Table 3

Detected methylated motifs

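Genome ID | Detected methylated motif | Modification type | Motif in REBASE | Number of methylated sites | Number of motif sequences | Methylation ratio (%) | Mean modification QV | Mean subread coverage
BS1 | GANTC | m6A | Yes | 1813 | 2070 | 87.6 | 58.0 | 35.2
BS1 | TTAA | m6A | Yes | 1264 | 1522 | 83.0 | 55.5 | 34.1
BS1 | GCWGC | m4C | Yes | 3026 | 15,948 | 19.0 | 38.4 | 40.6
BS3 | GANTC | m6A | Yes | 3724 | 4014 | 92.8 | 66.1 | 41.3
BS3 | TTAA | m6A | Yes | 3036 | 3338 | 91.0 | 62.4 | 40.4
BS3 | GCWGC | m4C | Yes | 13,821 | 54,026 | 25.6 | 39.5 | 46.4
BS8 | AGGNNNNNRTTT | m6A | No | 80 | 276 | 29.0 | 39.6 | 65.8
BS10 | ACGAG | m6A | No | 1986 | 7185 | 27.6 | 45.0 | 171.4
BS12 | GMAGCTKC | m4C | No | 169 | 220 | 76.8 | 50.9 | 83.5
BS12 | HCAGCTKC | m4C | No | 124 | 293 | 42.3 | 46.8 | 79.0
BS12 | BGMAGCTGD | m4C | No | 78 | 185 | 42.2 | 46.3 | 76.3
BS14 | GANTC | m6A | Yes | 2856 | 2880 | 99.2 | 190.6 | 166.9
BS15 | GAANNNNTTC | m6A | Yes | 1309 | 1472 | 88.9 | 55.6 | 30.9
BS15 | AGCNNNNNNCAT | m6A | No | 642 | 726 | 88.4 | 56.0 | 29.4
BS15 | ATGNNNNNNGCT | m6A | No | 619 | 726 | 85.3 | 52.0 | 29.8
BS15 | AGCNNNNNNGTG | m6A | No | 311 | 349 | 89.1 | 56.9 | 30.4
BS15 | CACNNNNNNGCT | m6A | No | 293 | 349 | 84.0 | 53.3 | 30.9
BS15 | CAANNNNNNNNCTTG | m6A | No | 205 | 256 | 80.1 | 49.4 | 29.1
BS15 | CAAGNNNNNNNDTTG | m6A | No | 164 | 214 | 76.6 | 48.7 | 28.7
BS15 | TTAGNNNNNCCT | m6A | No | 87 | 99 | 87.9 | 51.3 | 29.8
BS15 | AGGNNNNNCTAA | m6A | No | 77 | 99 | 77.8 | 49.4 | 29.7
BS15 | GYTANNNNNNNTTRG | m6A | No | 76 | 89 | 85.4 | 56.0 | 31.3
BS15 | CYAANNNNNNNTAVCH | m6A | No | 59 | 127 | 46.5 | 53.5 | 32.6
BD1 | GCWGC | m4C | Yes | 72,730 | 77,932 | 93.3 | 140.2 | 297.3
BD1 | GANTC | m6A | Yes | 6754 | 6844 | 98.7 | 346.3 | 281.7
BD1 | TTAA | m6A | Yes | 5475 | 5564 | 98.4 | 325.3 | 270.9
BD2 | TANGGAB | m6A | No | 1276 | 1367 | 93.3 | 64.4 | 48.5
BD3 | GATC | m6A | Yes | 9446 | 9618 | 98.2 | 122.1 | 93.7
BD3 | AGCT | m4C | Yes | 5974 | 6224 | 96.0 | 84.0 | 92.1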

R = A/G, M = A/C, W = A/T, S = C/G, Y = C/T,  K = G/T, H = A/C/T, B = C/G/T, D = A/G/T, V = A/C/G,  N = A/C/G/T

Underlined bold face indicates methylation sites

Overall, even when such similar, complementary, and shared motif sequences are taken into account, at least 9 of the 22 identified motifs presented no match to existing recognition sequences in the REBASE repository. This result demonstrates the existence of unexplored diversity of DNA methylation systems in environmental prokaryotes, which include many uncultured strains.
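The notions of palindromic and complementary motif pairs used in this section can be checked mechanically by reverse-complementing motifs written in IUPAC degenerate codes. The sketch below is a minimal illustration in Python using motifs and site counts listed in Table 3; the helper names are hypothetical and this is not the motif-calling software used in the study.

```python
# Complement of each IUPAC nucleotide code (degenerate codes map to the
# code that covers the complements of their member bases).
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W", "K": "M", "M": "K",
    "B": "V", "V": "B", "D": "H", "H": "D", "N": "N",
}


def reverse_complement(motif):
    """Reverse complement of a motif written with IUPAC codes."""
    return "".join(IUPAC_COMPLEMENT[base] for base in reversed(motif.upper()))


def is_palindromic(motif):
    """True if the motif reads the same on both strands."""
    return motif.upper() == reverse_complement(motif)


def are_complementary(motif_a, motif_b):
    """True if motif_b is the opposite-strand reading of motif_a."""
    return motif_a.upper() == reverse_complement(motif_b)


def methylation_ratio(methylated_sites, motif_sequences):
    """Percentage of motif occurrences detected as methylated."""
    return 100 * methylated_sites / motif_sequences


print(is_palindromic("GAANNNNTTC"))                       # BS15 palindromic motif
print(are_complementary("AGCNNNNNNCAT", "ATGNNNNNNGCT"))  # BS15 motif pair
print(round(methylation_ratio(1813, 2070), 1))            # BS1 GANTC row
```

A pair that passes `are_complementary` describes the same double-stranded site read from opposite strands, which is why such pairs are counted once when the 29 candidate motifs are collapsed into 22.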

Known MTases that correspond to detected methylated motifs

To identify MTases that can catalyze the methylation reactions of the detected methylated motifs, systematic annotation of MTase genes was performed. Sequence similarity searches against known genes identified 20 MTase genes in nine draft genomes (sequence identities ranged from 23 to 71%) (Table 4). The most abundant group was Type II MTases, followed by Type I and Type III MTases, a trend that is consistent with the general MTase distribution[13,47]. Several genes encoding REases and DNA sequence-recognition proteins were also detected, and 9 of the 20 MTases (45%) were estimated to constitute RM systems (Table 4). The known motifs of 7 of the 20 MTases matched those identified in our metaepigenomic analysis (Table 3). For example, the Thaumarchaeota BD3 genome contained two MTases that showed the best sequence similarities to those that recognize AGCT and GATC motif sequences, which were perfectly congruent with the two motifs detected in our metaepigenomic analysis. It may be notable that these two motifs were also reported in an enrichment-culture study of the closely related Candidatus Nitrosomarinus catalina[48] and are therefore likely evolutionarily conserved within their group. In the Proteobacteria BS14 genome, a similar one-to-one perfect match was also observed. The two genomes Chloroflexi BS3 and Chloroflexi BD1 were characterized by the same set of three methylated motifs, and each genome contained three MTases. No MTase gene was found in the other Chloroflexi genome BS1, likely due to its low estimated genome completeness of 31% (Table 2). Among these MTases, two were most similar to those possessing methylation specificities that were congruent with two of the detected motifs, GANTC and TTAA (the other MTase and motif will be discussed in the next section). Collectively, these observations suggest that metaepigenomic analysis is an effective tool for identifying the methylation systems of environmental prokaryotes.
Table 4

Detected MTases, REases, and specificity subunit genes

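Genome ID | CDS ID | Gene type | Top-hit protein in REBASE | Identity (%) | Recognition motif of the closest-match MTase | Modification type | RM type | RM system | TRD divergence | Motif detected | MTase name | Confirmed recognition motif
BS3 | EMGBS3_04270 | M | M.SstE37II | 58.9 | GANTC | m6A | II | No | No | Yes |  | 
BS3 | EMGBS3_09240 | M | M.Sth20745I | 71.4 | TTAA | m6A | II | No | No | Yes |  | 
BS3 | EMGBS3_12600 | M | M1.BceSIII | 22.9 | ACGGC | m4C | II | No | Yes | No | M.AbaBS3I | GCWGC
BS6 | EMGBS6_08960 | M | M.SinI | 57.0 | GGWCC | m5C | II | No | No | No |  | 
BS8 | EMGBS8_10720 | R | DvuI | 36.3 | ? |  | I |  |  |  |  | 
BS8 | EMGBS8_10740 | S | S.PveNS15I | 32.4 | ? |  | I |  | Yes |  |  | 
BS8 | EMGBS8_10750 | M | M.RbaNRL2II | 55.6 | ACGANNNNNNGRTC | m6A | I | Yes |  | No |  | 
BS10 | EMGBS10_10070 | RM | CjeFIII | 23.7 | GCAAGG | m6A | II | Yes | Yes | No | M.ObaBS10I | ACGAG
BS14 | EMGBS14_10020 | M | M.Bsp460I | 56.7 | GANTC | m6A | II | No | No | Yes |  | 
BS15 | EMGBS15_02830 | M | M.Bli37I | 56.6 | GAYNNNNNRTC | m6A | I | Yes |  | No |  | 
BS15 | EMGBS15_02840 | M | M.EcoNIH1III | 59.2 | GATGNNNNNNTAC | m6A | I | Yes |  | No |  | 
BS15 | EMGBS15_02870 | S | S.PveNS15I | 47.2 | ? |  | I |  | Yes |  |  | 
BS15 | EMGBS15_02930 | R | DvuI | 38.4 | ? |  | I |  |  |  |  | 
BS15 | EMGBS15_03820 | M | M.EcoGI | 25.8 | Nonspecific | m6A | II | Yes | Yes | No | M.FspBS15I | GAANNNNTTC
BS15 | EMGBS15_03830 | R | XmnI | 34.0 | GAANNNNTTC |  | II |  |  |  |  | 
BS15 | EMGBS15_04560 | R | GmeII | 33.8 | TCCAGG |  | III |  |  |  |  | 
BS15 | EMGBS15_04600 | M | M.FpsJII | 53.4 | CGCAG | m6A | III | Yes | No | No |  | 
BS15 | EMGBS15_05670 | M | M.FnuDI | 59.8 | GGCC [a] | m4C | II | Yes | No | No |  | 
BS15 | EMGBS15_05690 | R | BhaII | 45.6 | GGCC |  | II |  |  |  |  | 
BS15 | EMGBS15_12460 | M | M.Mva1261III | 37.1 | CTANNNNNNRTTC | m6A | I | No | No | No |  | 
BD1 | EMGBD1_08400 | M | M.Sth20745I | 71.0 | TTAA | m6A | II | No | No | Yes |  | 
BD1 | EMGBD1_09320 | M | M1.BceSIII | 22.9 | ACGGC | m4C | II | No | Yes | No | M.AbaBS3I | GCWGC
BD1 | EMGBD1_19510 | M | M.SstE37II | 58.9 | GANTC | m6A | II | No | No | Yes |  | 
BD2 | EMGBD2_08760 | M | M.HgiDII | 55.0 | GTCGAC [a] | m5C | II | Yes | No | No |  | 
BD2 | EMGBD2_08790 | RM | AquIV | 28.5 | GRGGAAG | m6A | II | Yes | Yes | No | M.NbaBD2I | TAHGGAB
BD2 | EMGBD2_08800 | R | LpnPI | 56.3 | CCDG |  | II |  |  |  |  | 
BD3 | EMGBD3_00670 | M | M.Mma5219II | 45.9 | AGCT | m4C | II | No | No | Yes |  | 
BD3 | EMGBD3_01960 | M | M.AvaVI | 50.3 | GATC | m6A | II | No | No | Yes |  | 

Underlined bold face indicates methylation sites

M: methyltransferase, R: restriction endonuclease, S: specificity subunit

[a] Modified base undetermined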

Unexplored diversity of prokaryotic methylation systems

Among the 20 detected MTases, 13 did not show sequence similarity to MTases that recognize the motifs identified in our metaepigenomic analysis (Tables 3 and 4). Although homology search-based MTase identification and recognition motif estimation are frequently conducted in genomic and metagenomic studies, this result suggests that these approaches are not sufficient, and direct observation of DNA methylation is needed to reveal the methylation systems of diverse environmental prokaryotes. As noted earlier, the Chloroflexi BS3 and Chloroflexi BD1 genomes each had three MTase genes, two of which were congruent with two of the detected motifs. The other MTase from each genome (EMGBS3_12600 and EMGBD1_09320 in Chloroflexi BS3 and Chloroflexi BD1, respectively) showed the highest sequence similarity to an MTase that was reported to recognize ACGGC; however, the other methylated motif detected in the Chloroflexi BS3 and Chloroflexi BD1 genomes was GCWGC. In the Bacteroidetes BS15 genome, 6 MTases and 11 methylated motifs were detected, but none of the MTases and motifs matched each other. At the methylation type level, five MTases and all of the methylated motifs were of the m6A type. We predicted that EMGBS15_03820, whose closest homolog was an MTase that exhibits nonspecific m6A methylation activity, is actually a sequence-specific enzyme that recognizes the GAANNNNTTC motif detected through metaepigenomic analysis, because the adjacent gene EMGBS15_03830 encodes an REase that targets the same GAANNNNTTC sequence. In the Verrucomicrobia BS8 genome, one MTase and one methylated motif were detected; however, the reported recognition motif sequence of the closest MTase was incongruent with the detected motif (the reported and detected motifs were ACGANNNNNNGRTC and AGGNNNNNRTTT, respectively, where R: A/G). This MTase is predicted to function in an RM system because of the existence of the neighboring REase and DNA sequence-recognition protein genes. 
In the Verrucomicrobia BS10 genome, one MTase and one methylated motif were detected, and their motifs were also incongruent (GCAAGG and ACGAG, respectively). In the Nitrospirae BD2 genome, two MTases and one methylated motif were detected. The two MTases EMGBD2_08760 and EMGBD2_08790 showed the best sequence similarities to those with m5C and m6A methylation activities, respectively, while the detected motif contained an m6A site. Thus, the latter MTase (EMGBD2_08790) was predicted to catalyze the methylation reaction, although their motifs were again incongruent (GRGGAAG and TANGGAB, respectively). It should also be noted that these MTases appear to constitute a recently proposed system known as the Defense Island System Associated with Restriction-Modification (DISARM), which is a phage-infection defense system composed of MTase, helicase, phospholipase D, and DUF1998 genes[49]. To our knowledge, this is the first DISARM system identified in the phylum Nitrospirae. In the Verrucomicrobia BS6 genome, one MTase gene was found, but we could not detect any methylated motif; we therefore suspect that either this MTase is inactive or the corresponding methylated motif went undetected because of the low sensitivity of SMRT sequencing to m5C modification, as described previously[13,14]. Conversely, in the Proteobacteria BS12 genome, we detected methylated motifs but no MTase genes. We assume that the MTase genes corresponding to this genome were missed due to insufficient genome completeness (although the estimated completeness was 81%), or because these MTase genes have diverged considerably from MTase genes found in cultivable strains, or because these MTases belong to a new group.

Experimental verification of MTases with new methylated motifs

Among the MTases whose sequences showed the best similarities to MTases that recognize motifs incongruent with our metaepigenomic results, we experimentally verified the methylation specificities of four MTases: EMGBS3_12600 in Chloroflexi BS3 (and EMGBD1_09320 in Chloroflexi BD1, which has exactly the same amino-acid sequence), EMGBS15_03820 in Bacteroidetes BS15, EMGBS10_10070 in Verrucomicrobia BS10, and EMGBD2_08790 in Nitrospirae BD2 (Table 4). We constructed plasmids that each carried one of the artificially synthesized MTase genes, transformed them into Escherichia coli cells, induced their expression, and observed the methylation status of the isolated plasmid DNA by REase digestion. Although EMGBS3_12600 showed the best sequence similarity to a sequence-diverged MTase that possesses ACGGC specificity, the unaccounted-for motif sequence observed in Chloroflexi BS3 was GCWGC. Thus, we hypothesized that the true recognition sequence of EMGBS3_12600 is GCWGC. The REase digestion assay showed that TseI (GCWGC specificity) did not cleave the plasmids when EMGBS3_12600 was expressed in the cells, which clearly supports our hypothesis (Fig. 3a). Furthermore, we confirmed that BceAI (ACGGC specificity) cleaved plasmids regardless of whether EMGBS3_12600 was expressed, indicating that the EMGBS3_12600 protein does not show ACGGC sequence specificity (Fig. 3a). Accordingly, we named this protein M.AbaBS3I, as a novel MTase that possesses GCWGC specificity (Table 4).
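The logic of the REase digestion assay can be sketched computationally: locate every recognition site of the enzyme in the plasmid, then compare the fragments expected with and without protective methylation. The Python sketch below makes several simplifying assumptions that are not from the source: it searches only the given strand (sufficient for palindromic motifs such as GCWGC, but not for asymmetric ones such as ACGGC), it treats protection as all-or-nothing, and it uses a short made-up sequence rather than the actual plasmid. All names are hypothetical.

```python
import re

# Regular-expression character class for each IUPAC nucleotide code.
IUPAC_REGEX = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regular-expression pattern."""
    return "".join(IUPAC_REGEX[base] for base in motif.upper())


def find_sites(sequence, motif):
    """0-based start positions of all matches on the given strand.
    A lookahead is used so that overlapping sites are all reported."""
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]


def expected_fragments(n_sites, methylation_blocks_cleavage):
    """Fragment count for a linearized plasmid: n cuts give n + 1 pieces,
    while a fully protected plasmid stays as a single piece."""
    return 1 if methylation_blocks_cleavage else n_sites + 1


toy_plasmid = "TTGCAGCAAGCTGCTTGAAGGCCTTCAA"  # hypothetical sequence
tsei_sites = find_sites(toy_plasmid, "GCWGC")       # TseI-like specificity
xmni_sites = find_sites(toy_plasmid, "GAANNNNTTC")  # XmnI-like specificity
print(tsei_sites, xmni_sites)
print(expected_fragments(len(tsei_sites), False),
      expected_fragments(len(tsei_sites), True))
```

In this simplified picture, a plasmid that remains uncut by the enzyme whose motif matches the hypothesized MTase specificity, while still being cut by a control enzyme with a different motif, is the pattern that supports the hypothesis.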
Fig. 3

REase digestion assays. a Assay of the EMGBS3_12600 gene (and EMGBD1_09320, which has the same amino-acid sequence). BceAI and TseI were used, where the plasmid contained 12 (ACGGC) and 21 (GCWGC) target sites, respectively. Plasmid DNAs were linearized using SalI before the assay. An NEB 2-log DNA ladder was employed as a size marker. b Assay of the EMGBS15_03820 gene. DpnII and XmnI were used, where the plasmid contained 27 (GATC) and 2 (GAANNNNTTC) target sites, respectively

While the homology-based analysis showed that the closest homolog of EMGBS15_03820 was a non-sequence-specific MTase, its adjacency to an REase and the results of the metaepigenomic analysis suggested that this MTase presents GAANNNNTTC sequence specificity. The REase digestion assay showed that XmnI (GAANNNNTTC specificity) did not cleave the plasmids only when EMGBS15_03820 was expressed in the cells, which also supports our hypothesis (Fig. 3b). Furthermore, we confirmed that DpnII (GATC specificity) cleaved the plasmids regardless of whether EMGBS15_03820 was expressed, indicating that EMGBS15_03820 is not a nonspecific MTase. We named this protein M.FspBS15I, as a novel MTase that possesses GAANNNNTTC methylation specificity (Table 4). For EMGBS10_10070 in Verrucomicrobia BS10 and EMGBD2_08790 in Nitrospirae BD2, we also conducted REase digestion assays to confirm the recognition motif sequences. Based on the results of the metaepigenomic analysis, their motifs were predicted to be ACGAG and TANGGAB, respectively. Expression of each gene altered the electrophoresis patterns of the digested plasmids to contain fragments that resulted from inhibition of REase cleavage at the estimated methylation sites (Supplementary Fig. 7). In addition, we conducted SMRT sequencing analysis using the PacBio RSII platform to examine the methylation status of the chromosomal DNA of the E. coli transformed with each of the two MTase genes. 
The results were basically consistent (Supplementary Table 4): ACGAG was detected as the methylated motif in E. coli transformed with EMGBS10_10070, and we named the protein M.ObaBS10I. In the case of EMGBD2_08790, the detected TAHGGAB motif was a subset of the predicted TANGGAB motif (i.e., TAGGGAB was excluded); this difference could be due to E. coli-specific conditions (e.g., cofactors and sequence biases), insufficient data, or inaccuracy of the methylated motif detection method. Regardless of this minor difference, we concluded that EMGBD2_08790 is a novel MTase gene responsible for methylation of the TAHGGAB motif, and we named the protein M.NbaBD2I accordingly.

Metaepigenomics for exploring prokaryotic methylation systems in nature

The present study demonstrated the effectiveness of the metaepigenomic approach powered by SMRT sequencing and CCS, showing clear advantages over sequence similarity-based and culture-based methylation system analyses and short-read metagenomics. The CCS reads facilitated metagenomic assembly, binning, and protein sequence-based taxonomic assignment from an environmental sample that contained dominant uncultured prokaryotes. Most importantly, this approach revealed several methylated motifs, including novel ones in environmental prokaryotes, and subsequent experiments identified four MTases responsible for those reactions. The current throughput of SMRT sequencing may still be insufficient to apply the metaepigenomic approach to more diverse and complex samples. Because deep sequencing coverage is required for the reliable detection of DNA methylation (for example, >25× subread coverage per DNA strand is recommended according to the official instructions), it is still difficult to obtain sufficient sequencing reads to recover long contigs and detect methylated motifs for ‘rare’ species (typically those with <1% relative abundance). In addition to rapid and ongoing technological advances in SMRT sequencing, the emergence of Oxford Nanopore Technology may provide another long-read, single-molecule technology capable of detecting methylation[50,51]. Another problem is that the detectable types of DNA modifications are limited (i.e., m4C, m5C, and m6A) with the currently available SMRT sequencing technology, while many other DNA chemical modifications occur in nature[52]. In addition to advances in sequencing methods, novel bioinformatic tools will be critical for metaepigenomic analyses of environmental prokaryotes. A recent study showed that sets of methylated motifs and MTases can vary widely, even between closely related strains[53]; metaepigenomics is therefore expected to enable differential methylation analyses between populations. 
It should be noted that metaepigenomic data may be adopted for various bioinformatic applications. For example, because reads and contigs from the same genome are expected to have the same methylation patterns, metaepigenomic information may be used for improving metagenomic assembly and binning[54]. In addition, genus-level conservation of MTases that are not associated with REases is sometimes observed, which suggests that MTases play unexplored adaptive roles, in addition to their functions in combating phages[13,55]. Novel MTases may be adopted for biotechnological uses, such as DNA recombination and methylation analyses[56]. It is envisioned that metaepigenomics of environmental prokaryotes under different sampling conditions and environments will significantly deepen our understanding of the ecological impacts of DNA methylation on prokaryotes and of the enigmatic evolution of prokaryotic methylation systems, and will broaden their application potential.

Methods

Sample collection

Water samples were collected at a pelagic long-term survey station (Ie-1) (35°13′09.5″N 135°59′44.7″E) of the Center for Ecological Research, Kyoto University in Lake Biwa, Japan, on 26 December 2016 (Supplementary Fig. 1a). The sampling site was located approximately 3 km from the nearest shore and had a depth of 73 m. The lake has a permanently oxygenated hypolimnion and was thermally stratified during sampling (Supplementary Fig. 1b). Water was sampled into prewashed 5-L Niskin bottles at depths of 5 m and 65 m, above and below the thermally stratified layer, respectively, to collect prokaryotic communities with different structures[34]. The vertical profiles of temperature, dissolved oxygen concentrations, and chlorophyll a concentrations were measured in situ using a conductivity, temperature, and depth probe. Equipment that could come into direct contact with the water samples in the following steps was either sterilized by autoclaving or disinfected with a hypochlorous acid solution. The water samples were transferred to sterile bottles, kept cool by contact with ice packs in a dark cool box, and immediately transported to the laboratory. Water samples with a total volume of approximately 30 L were prefiltered through 5 μm membrane PC filters (Whatman). Microbial cells were collected using 0.22 μm Sterivex filters (Millipore) and immediately stored at −20 °C in a freezer until analysis.

DNA extraction and SMRT sequencing

The microbial DNA was retrieved using a PowerSoil DNA Isolation Kit (QIAGEN) according to the supplier’s protocol with slight modifications as described below. The filters were removed from the container, cut into 3 mm fragments, and directly suspended in the extraction solution from the kit for cell lysis. The bead-beating time was extended to 20 min to yield sufficient quantities of DNA for SMRT sequencing, with reference to Albertsen et al.[57]. SMRT sequencing was conducted using a PacBio Sequel system (Pacific Biosciences) in two independent runs according to the manufacturer’s standard protocols. SMRT libraries for CCS were prepared with a 4 kbp insert length, and two SMRT cells were used for each sample. Briefly, 3–5 kbp DNA fragments from each genomic DNA sample were extracted using the BluePippin size-selection system (Sage Science). Two sequencing libraries for CCS analysis were prepared using the SMRTbell Template Prep Kit 1.0-SPv3 according to the manufacturer’s protocol (Pacific Biosciences). The final libraries were sequenced using a PacBio Sequel sequencer with Sequel SMRT Cell 1M v2 and Sequel Binding/Sequencing Kits 2.0.

Bioinformatic analysis of CCS reads

Reads that contained at least three full-pass subreads on each polymerase read were retained to generate CCS reads using the standard PacBio SMRT software package with the default settings. Only CCS reads with >97% average base-call accuracy were retained. For taxonomic assignment of the CCS reads, Kaiju[28] in Greedy-5 mode with the NCBI nr database[29] and Kraken[30] with the default parameters and complete prokaryotic genomes from RefSeq[31] were used. CCS reads that potentially encoded 16S rRNA genes were extracted using SortMeRNA[58] with the default settings, and the 16S rRNA sequences were predicted by RNAmmer[59] with the default settings. The 16S rRNA sequences were taxonomically assigned using blastn[60] searches against the SILVA database release 128[61], where the top-hit sequences with e-values ≤ 1E−15 were retrieved. CCS reads were de novo assembled using Canu[18] with the -pacbio-corrected setting and Mira[39] with the settings for PacBio CCS reads, according to the provided instructions. The Canu assembler provides information on repetitive contigs based on the graph topology and read-overlap analyses. Because such contigs tend to contain misassemblies, which can reduce the accuracy of downstream analyses, we removed them. The remaining contigs were binned into genomes using MetaBAT[40] based on genome coverage and tetranucleotide frequencies as genomic signatures, where the genome coverage was calculated by mapping the CCS reads to the assembled contigs using BLASR[62] with the settings for PacBio CCS reads. The quality of all genomes was assessed using CheckM[63] with the default settings, which estimates completeness and contamination based on taxonomic collocation of prokaryotic marker genes. Sequence extraction and taxonomic assignment of 16S rRNA genes in each draft genome were conducted using RNAmmer[59] with the default settings. 
Taxonomic assignment of the draft genomes was based on the 16S rRNA genes if found or on the taxonomic groups most frequently estimated by CAT[64] otherwise (and Kaiju[28] if CAT did not provide an estimation). Coding sequences (CDSs) in each draft genome were predicted using Prodigal[65] with the default settings. Functional annotations were achieved through GHOSTZ[66] searches against the eggNOG[67] and Swiss-Prot[68] databases, with a cut-off e-value ≤ 1E−5, and HMMER[69] searches against the Pfam database[70], with a cut-off e-value ≤ 1E−5. A maximum-likelihood tree of the draft genomes was constructed on the basis of the set of 400 conserved prokaryotic marker genes using PhyloPhlAn[71] with the default settings.
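The tetranucleotide-frequency signature used as one input to binning can be illustrated with a minimal sketch. This is not MetaBAT's implementation; the function names are hypothetical, and the choice to merge each tetramer with its reverse complement (giving 136 canonical tetramers) is a common convention that we assume here for illustration.

```python
from collections import Counter
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Represent a k-mer and its reverse complement by the lexicographically smaller one."""
    return min(kmer, reverse_complement(kmer))


# All canonical tetramers, so that every contig yields a vector of the same length.
CANONICAL_TETRAMERS = sorted({canonical("".join(p)) for p in product("ACGT", repeat=4)})


def tetranucleotide_frequencies(contig):
    """Return canonical tetranucleotide frequencies for one contig as a dict."""
    contig = contig.upper()
    counts = Counter()
    for i in range(len(contig) - 3):
        kmer = contig[i:i + 4]
        # Skip windows containing ambiguous bases such as N.
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in CANONICAL_TETRAMERS}
```

Contigs originating from the same genome are expected to yield similar frequency vectors, which is why such signatures are combined with coverage when grouping contigs into bins.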

Metaepigenomic and RM system analyses

DNA modification detection and motif analysis were performed according to BaseMod (https://github.com/ben-lerch/BaseMod-3.0). Briefly, the subreads were mapped to the assembled contigs using BLASR[62], and interpulse duration ratios were calculated. Candidate motifs with scores higher than the default threshold value were retrieved as methylated motifs. Those with infrequent occurrences (<50) or very low methylation fractions (<1%) in each draft genome were excluded from further analysis. The methylated ratios of all detected motifs on each contig were calculated using Seqkit[72]. The sequence divergences of target recognition domains (TRDs) from those of the closest-match MTases were investigated using amino-acid alignments of BLASTP[60]. Genes encoding MTases, restriction endonucleases (REases), and DNA sequence-recognition proteins were detected by BLASTP[60] searches against an experimentally confirmed gold-standard dataset from the Restriction Enzyme Database (REBASE)[73] (downloaded on 2 October 2017), with a cut-off e-value of ≤ 1E−15. Sequence specificity information for each hit MTase gene was also retrieved from REBASE. The flanking regions of the MTase genes were investigated to search for REase genes and examine whether they constitute RM systems.
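The motif filtering step described above can be sketched as follows. This is a simplified illustration rather than the pipeline used in the study: it scans only the forward strand, takes the set of modified positions as given, and all function names and the `mod_offset` parameter are hypothetical. The thresholds (at least 50 occurrences and a methylation fraction of at least 1%) follow the exclusion criteria stated in the text.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_sites(seq, motif, mod_offset):
    """Return 0-based positions of the modified base for each forward-strand motif match."""
    # A lookahead is used so that overlapping matches are also reported.
    pattern = re.compile("(?=(" + "".join(IUPAC[b] for b in motif.upper()) + "))")
    return [m.start() + mod_offset for m in pattern.finditer(seq.upper())]


def methylated_fraction(seq, motif, mod_offset, modified_positions):
    """Return (number of motif sites, fraction of those sites called as modified)."""
    sites = motif_sites(seq, motif, mod_offset)
    if not sites:
        return 0, 0.0
    hits = sum(1 for pos in sites if pos in modified_positions)
    return len(sites), hits / len(sites)


def keep_motif(n_sites, fraction, min_sites=50, min_fraction=0.01):
    """Apply the occurrence and methylation-fraction cut-offs."""
    return n_sites >= min_sites and fraction >= min_fraction
```

For example, `methylated_fraction(contig, "GATC", 1, called_positions)` would report how many GATC sites on the forward strand have a modification call at the adenine, given a hypothetical set `called_positions` of 0-based coordinates.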

Experimental verification of MTase activities

For verification of the estimated methylation specificities, all four estimated Type II MTase genes (EMGBS3_12600, EMGBS15_03820, EMGBS10_10070, and EMGBD2_08790) that satisfied the following two criteria were selected: (1) their novel methylation motifs were uniquely predicted and (2) additional proteins were not required in evaluating their enzyme activities. The four MTase genes were artificially synthesized with codon optimization and cloned into the pUC57 cloning vector by Genewiz (Supplementary Data 1). The genes were subcloned into the pCold III expression vector (Takara Bio) using an In-Fusion HD Cloning Kit (Takara Bio). The gene-specific oligonucleotide primers used for polymerase chain reaction and recombination are described in Supplementary Table 1. For verification of the EMGBS10_10070 gene function, the 5′-ACGAGTC-3′ sequence was inserted downstream of the termination codon for the sake of the methylation assay (the first five-base ACGAG sequence was the estimated methylated motif, and the last five-base GAGTC is recognized by the restriction enzyme PleI) (Supplementary Data 1). The constructs were transformed into E. coli HST04 dam−/dcm− (Takara Bio), which lacks the dam and dcm MTase genes. The E. coli strains were cultured in LB broth medium supplemented with ampicillin. MTase expression was induced according to the supplier’s protocol. Plasmid DNAs were isolated using the FastGene Xpress Plasmid PLUS Kit (Nippon Genetics). SalI was employed to linearize the plasmid DNAs encoding EMGBS3_12600 and EMGBS15_03820 and then inactivated by heat. Methylation statuses were assayed by enzymatic digestion using the following restriction enzymes: BceAI and TseI for EMGBS3_12600, DpnII and XmnI for EMGBS15_03820, PleI for EMGBS10_10070, and FokI for EMGBD2_08790. All restriction enzymes were purchased from New England BioLabs. All digestion reactions were performed at 37 °C for 1 h, except for those involving TseI (8 h) and FokI (20 min). 
Notably, although TseI digestion is conducted at 65 °C in the manufacturer’s protocol, we adopted a temperature of 37 °C to avoid cleavage of methylated DNA. We further verified the methylated motifs that were newly estimated in this study, i.e., those of EMGBS10_10070 and EMGBD2_08790. Chromosomal DNA was extracted from cultures of the transformed E. coli strains using a PowerSoil DNA Isolation Kit (QIAGEN) according to the supplier’s protocol. SMRT sequencing was conducted using PacBio RSII (Pacific Biosciences), and methylated motifs were detected via the same method described above.
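The design of the EMGBS10_10070 assay construct can be checked with a short sketch: the seven-base insert 5′-ACGAGTC-3′ carries the PleI recognition sequence GAGTC as its last five bases, overlapping its first five bases by three. The helper names and the flanking sequence below are hypothetical and serve only to show how recognition sites on either strand can be located in a plasmid sequence.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq, site):
    """Return sorted 0-based start positions where `site` occurs on either strand."""
    seq, site = seq.upper(), site.upper()
    # A site on the reverse strand appears as its reverse complement on the forward strand.
    targets = {site, reverse_complement(site)}
    width = len(site)
    return sorted(i for i in range(len(seq) - width + 1) if seq[i:i + width] in targets)


ASSAY_INSERT = "ACGAGTC"   # sequence inserted downstream of the termination codon
PLEI_SITE = "GAGTC"        # PleI recognition sequence, as stated in the text

if __name__ == "__main__":
    # Hypothetical flanking bases; the real construct is given in Supplementary Data 1.
    construct = "GCTTAA" + ASSAY_INSERT + "CCATGG"
    print("first five bases of insert:", ASSAY_INSERT[:5])
    print("last five bases of insert:", ASSAY_INSERT[-5:])
    print("PleI site start positions:", find_sites(construct, PLEI_SITE))
```

Because the two five-base sequences share the bases GAG, methylation within the overlapping region can be read out as protection of the construct from PleI cleavage.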