Yoshikazu Furuta1, Hiroe Namba-Fukuyo2, Tomoko F Shibata3, Tomoaki Nishiyama4, Shuji Shigenobu5, Yutaka Suzuki2, Sumio Sugano2, Mitsuyasu Hasebe5, Ichizo Kobayashi1.
Abstract
Epigenetic modifications such as DNA methylation have large effects on gene expression and genome maintenance. Helicobacter pylori, a human gastric pathogen, has a large number of DNA methyltransferase genes, with different strains having unique repertoires. Previous genome comparisons suggested that these methyltransferases often change DNA sequence specificity through domain movement--the movement between and within genes of coding sequences of target recognition domains. Using single-molecule real-time sequencing technology, which detects N6-methyladenines and N4-methylcytosines with single-base resolution, we studied methylated DNA sites throughout the H. pylori genome for several closely related strains. Overall, the methylome was highly variable among closely related strains. Hypermethylated regions were found, for example, in the rpoB gene for RNA polymerase. We identified DNA sequence motifs for methylation and then assigned each of them to a specific homology group of the target recognition domains in the specificity-determining genes for Type I and other restriction-modification systems. These results supported proposed mechanisms for sequence-specificity changes in DNA methyltransferases. Knocking out one of the Type I specificity genes led to transcriptome changes, which suggested its role in gene expression. These results are consistent with the concept of evolution driven by DNA methylation, in which changes in the methylome lead to changes in the transcriptome and potentially to changes in phenotype, providing targets for natural or artificial selection.
Year: 2014 PMID: 24722038 PMCID: PMC3983042 DOI: 10.1371/journal.pgen.1004272
Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 5.917
Figure 1. Type I RM system.
(A) A Type I RM system consists of restriction (R), modification (M) and specificity (S) genes. A specificity gene typically encodes two target recognition domains (TRD) in tandem. Each domain recognizes half of a bipartite target sequence. Copy number of repeat sequences between the two domains in some S genes determines the number of Ns (N = A, T, G or C) in the middle of their DNA target sequence. A TRD that recognizes a particular target sequence will recognize the reverse complement sequence when moved to the other TRD site. A TRD sequence that recognizes 5′-TAG-3′/3′-ATC-5′ duplex at the TRD1 site recognizes 5′-CTA-3′/3′-GAT-5′ when moved to the TRD2 site. (B) Domain Movement (DoMo). Amino acid sequences move between TRD1 and TRD2 in the same gene (locus) likely through recombination at repeat sequences flanking TRD1 and TRD2.
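The reverse-complement relationship between the TRD1 and TRD2 sites described in (A) can be checked with a few lines of Python. This is a minimal sketch; the function name `reverse_complement` is illustrative and does not come from the paper.

```python
# Complement table for the four DNA bases (N maps to itself).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Half-site recognized at TRD1 versus the same TRD moved to the TRD2 site.
print(reverse_complement("TAG"))  # CTA
```

Applying the function twice returns the original half-site, which mirrors a TRD moving back to its original position.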
Figure 2. Assignment of a methylation motif to target recognition domains of Type I specificity genes.
(A) Group 2 Type I S genes. (B) Group 3 Type I S genes. Amino acid sequences in target recognition domains were classified into homology groups represented by different colors (a, b, c, d, e, f, h, k, m, n, o) [18]. Symbols are as in Figure 1. The number in the middle of the gene is the copy number of tandem repeats. Underline, adenine nucleotides within motifs detected as methylated; white circle, start codon in an unusual position; black circle, stop codon in an unusual position.
Fraction of methylated copies along a genome for each known Type II methyltransferase (M) gene.
| Motif | Gene name | P12 locus tag | P12 % | F16 locus tag | F16 % | F30 locus tag | F30 % | F32 locus tag | F32 % | F57 locus tag | F57 % |
| | M.hpyAVII | HPP12_0488 | | HPF16_0459 | | HPF30_0843 | | HPF32_0853 | | HPF57_0507 | |
| 5′- | - | HPP12_1052 | | HPF16_1028 | | HPF30_0302 | | HPF32_1023 | | HPF57_1048 | |
| | M.hpyAI | HPP12_1173 | | HPF16_1145 | | HPF30_0185 | | HPF32_1140 | | HPF57_1170 | |
| | M.hpyAIII | HPP12_0095 | | HPF16_0105 | | HPF30_1171 | | HPF32_0104 | | (HPF57_0142), (HPF57_0143) | |
| | M.hpyAXII | HPP12_0510 | | HPF16_0851 | | - | | HPF32_0484 | | HPF57_0534 | |
| 5′- | M.hpyAIV | (HPP12_1318) | | HPF16_0047 | | - | | HPF32_0049 | | HPF57_0049 | |
| | - | HPP12_1389 | | - | | HPF30_1285 | | - | | - | |
| | M.hpyAV | HPP12_0048 | | - | | (HPF30_1247) | | - | | - | |
| | - | - | | HPF16_1396 | | HPF30_1367 | | HPF32_1386 | | - | |
| 5′- | M.hpyHI | - | | HPF16_0320 | | HPF30_0982 | | - | | HPF57_0366 | |
| 5′- | M.hpy99IV | - | | HPF16_0699 | | - | | - | | - | |
| 5′- | M1.hpyAII | - | | - | | - | | HPF32_0034 | | - | |
| | M2.hpyAII | - | | - | | - | | HPF32_0033 | | - | |
| 5′- | - | - | | - | | - | | - | | HPF57_1130 | |
| | M1.hpyAVI | HPP12_0044 | | | | HPF30_1250 | | HPF32_0059 | | HPF57_0060 | |
| 5′- | M.hpyAIX | | | HPF16_0891 | | HPF30_0429 | | HPF32_0444 | | | |
| | M.hpyAX | | | HPF16_0267 | | HPF30_1036 | | HPF32_0269 | | (HPF57_0278), (HPF57_0312) | |
| | - | | | HPF16_1447 | | HPF30_1424 | | HPF32_1440 | | HPF57_1467 | |
| | - | | | HPF16_0270 | | | | HPF32_0272 | | HPF57_0316 | |
Methylated base is underlined.
The locus tag of a truncated gene (shorter than 80% of the intact ORF) is shown in parentheses. The locus tag of a gene with rare methylated sites is in italics.
Average in two biological replicates.
Methylated copies overlap with 5′-TCNNGA and 5′-GAYN6TTC.
Methylated copies overlap with 5′-GTNNAC.
Methylated copies overlap with 5′-GNGRGA.
Methylated copies overlap with 5′-TCG.
Methylated copies overlap with 5′-CGRAG.
Methylated copies overlap with 5′-CNNGNAG.
Methylated copies overlap with 5′-GAAN6RTC.
Methylated copies overlap with 5′-.
Methylated copies overlap with 5′-G.
Methylated copies overlap with 5′-G.
Methylated copies overlap with 5′-GANTC.
Methylated copies overlap with 5′-G.
Methylated copies overlap with 5′-TGC and 5′-GT.
Methylated copies overlap with 5′-GCRGA.
Methylated copies overlap with 5′-C
Assignment of a target sequence to each TRD in Type I specificity (S) genes.
| Group | TRD homology group | TRD length (aa) | Locus tag | Recognition sequence |
| Group 2 | a | 140 | HPP12_0797, HPF16_0513 | 5′-CT |
| | | | HPF57_0810 | 5′-CC |
| | b | 146 | HPF30_0484 | N/D |
| | c | 123 | HPP12_0797, HPF57_0810 | 5′-GA |
| | d | 126 | HPF32_0814, HPF32_0757 | 5′-RT |
| | | | HPF16_0572 | 5′-VT |
| | e | 119 | HPF32_0814, HPF57_0869 | 5′-G |
| | f | 122 | HPF32_0757 | 5′-AC |
| | h | 120 | HPF16_0513, HPF16_0572 | 5′-CA |
| Group 3 | k | 125 | HPF30_1299, HPF32_1419 | 5′-CRT |
| | m | 142 | HPP12_1508, HPF30_1299 | 5′-GRNA |
| | n | 121 | HPP12_1508 | 5′-GRT |
| | o | 113 | HPF32_1419 | 5′-G |
Type I S genes were classified according to orthologous groups [18].
Classification based on amino acid sequence similarity [18].
Adenine detected as methylated is underlined.
Y, C or T; R, A or G; V, A or C or G; B, C or G or T.
Locus tag of genes that include the TRD.
N/D, methylation motif not detected.
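The degenerate-base codes listed above (Y, R, V, B, plus N) map directly onto regular-expression character classes, which makes it straightforward to locate copies of a motif in a sequence. The following is a minimal sketch under that assumption; `motif_to_regex` and `find_motif` are hypothetical helper names, and the example motif 5′-GAAN6RTC is taken from the footnotes of the Type II methyltransferase table.

```python
import re

# IUPAC codes from the footnote, plus N and the four plain bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]", "R": "[AG]", "V": "[ACG]", "B": "[CGT]",
    "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> str:
    """Translate a degenerate motif such as 'GAAN6RTC' into a regex string.

    A digit run after a symbol repeats that symbol (N6 -> six N positions).
    """
    parts = []
    for symbol, count in re.findall(r"([A-Z])(\d*)", motif.upper()):
        pattern = IUPAC[symbol]
        parts.append(pattern + (f"{{{count}}}" if count else ""))
    return "".join(parts)


def find_motif(motif: str, sequence: str) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead lets overlapping copies of the motif be reported.
    regex = re.compile(f"(?={motif_to_regex(motif)})")
    return [m.start() for m in regex.finditer(sequence.upper())]


print(motif_to_regex("GAAN6RTC"))  # GAA[ACGT]{6}[AG]TC
```

This scans one strand only; scanning the reverse complement of the sequence as well would cover non-palindromic motifs on the opposite strand.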
Top 5 hypermethylated regions in each strain.
| Strain | Methylated base # | Start | End | Strand | Locus tag | Orientation | Annotation |
| P12 | 36 | 1251001 | 1252000 | - | HPP12_1163 | sense | DNA-directed RNA polymerase subunit beta rpoB |
| | 35 | 8501 | 9500 | - | HPP12_0008 | sense | chaperonin groEL |
| | 34 | 463501 | 464500 | + | HPP12_0447 | sense | DNA methylase |
| | 32 | 320501 | 321500 | + | HPP12_0305 | sense | glutamate-1-semialdehyde aminotransferase hemL |
| | 32 | 1251501 | 1252500 | - | HPP12_1163 | sense | DNA-directed RNA polymerase subunit beta rpoB |
| F16 | 55 | 1209501 | 1210500 | - | HPF16_1134 | sense | DNA-directed RNA polymerase subunit beta rpoB |
| | 52 | 501001 | 502000 | + | HPF16_0494 | sense | flagellar hook protein flgE |
| | 52 | 1043001 | 1044000 | - | HPF16_0987 | sense | biotin sulfoxide reductase bisC fragment |
| | 50 | 1042501 | 1043500 | - | HPF16_0986 | sense | biotin sulfoxide reductase bisC fragment |
| | 49 | 86001 | 87000 | - | HPF16_0084 | sense | urease subunit alpha ureC |
| F30 | 50 | 603501 | 604500 | + | HPF30_0567 | sense | hypothetical protein |
| | 50 | 185501 | 186500 | + | HPF30_0196 | sense | DNA-directed RNA polymerase subunit beta rpoB |
| | 48 | 186001 | 187000 | + | HPF30_0196 | sense | DNA-directed RNA polymerase subunit beta rpoB |
| | 46 | 8501 | 9500 | - | HPF30_0008 | sense | chaperonin groEL |
| | 44 | 1411501 | 1412500 | - | HPF30_1330 | sense | hypothetical protein |
| F32 | 51 | 685001 | 686000 | - | HPF32_0628 | sense | N-methylhydantoinase |
| | 49 | 1204501 | 1205500 | - | HPF32_1126 | sense | elongation factor G fusA |
| | 47 | 546501 | 547500 | - | HPF32_0506 | sense | cag pathogenicity island protein cagY |
| | 46 | 8501 | 9500 | - | HPF32_0008 | sense | chaperonin groEL |
| | 46 | 33501 | 34500 | + | HPF32_0030 | sense | rod shape-determining protein mreB |
| F57 | 44 | 1351001 | 1352000 | - | HPF57_1280 | sense | hypothetical protein |
| | 42 | 85001 | 86000 | - | HPF57_0083 | sense | urease subunit alpha ureC |
| | 41 | 1176501 | 1177500 | - | HPF57_r03 | sense | 16S ribosomal RNA |
| | 41 | 8501 | 9500 | - | HPF57_0008 | sense | chaperonin groEL |
| | 39 | 1351501 | 1352500 | - | HPF57_1281 | sense | fumarate hydratase fumC |
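The start and end coordinates in this table are consistent with counting methylated bases in 1,000-bp windows offset by 500 bp (for example 1251001-1252000 and 1251501-1252500). The sketch below shows one way such a ranking could be computed. The window size and step are inferred from the coordinates rather than stated here, and the function names and input format are hypothetical.

```python
from bisect import bisect_left, bisect_right


def window_counts(positions, genome_length, window=1000, step=500):
    """Count methylated bases in overlapping windows on one strand.

    positions: 1-based coordinates of methylated bases.
    Returns a list of (start, end, count) with 1-based inclusive bounds.
    window and step are assumptions inferred from the table coordinates.
    """
    sorted_pos = sorted(positions)
    results = []
    for start in range(1, genome_length - window + 2, step):
        end = start + window - 1
        # Number of positions p with start <= p <= end.
        count = bisect_right(sorted_pos, end) - bisect_left(sorted_pos, start)
        results.append((start, end, count))
    return results


def top_windows(counts, n=5, densest=True):
    """Return the n windows with the most (or fewest) methylated bases."""
    return sorted(counts, key=lambda w: w[2], reverse=densest)[:n]
```

Calling `top_windows(counts, densest=False)` gives the sparsest windows instead, which corresponds to a hypomethylated-region ranking.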
Top 5 hypomethylated regions in each strain.
| Strain | Methylated base # | Start | End | Strand | Locus tag | Orientation | Annotation | Genomic islands |
| P12 | 1 | 1419501 | 1420500 | + | HPP12_1351 | antisense | integrase/recombinase xercD family protein | TnPZ |
| | 1 | 1397001 | 1398000 | - | HPP12_1325 | antisense | hypothetical protein | TnPZ |
| | 1 | 478001 | 479000 | + | HPP12_0456 | antisense | hypothetical protein | TnPZ |
| | 2 | 1474501 | 1475500 | + | HPP12_1394 | antisense | type IV secretion system ATPase virB11-4 | |
| | 2 | 1421501 | 1422500 | + | HPP12_1353 | antisense | relaxase virD2-2 | TnPZ |
| F16 | 7 | 400001 | 401000 | - | HPF16_0396 | antisense | hypothetical protein | |
| | 9 | 1373001 | 1374000 | - | HPF16_1309 | antisense | hypothetical protein | |
| | 9 | 1080501 | 1081500 | + | HPF16_1024 | antisense | outer membrane protein homC | |
| | 9 | 1027001 | 1028000 | - | HPF16_0974 | antisense | hypothetical protein | |
| | 9 | 908001 | 909000 | + | HPF16_0870 | sense | virulence factor mviN | |
| F30 | 2 | 546501 | 547500 | - | HPF30_0507 | sense | hypothetical protein | |
| | 3 | 546501 | 547500 | + | HPF30_0507 | antisense | hypothetical protein | |
| | 5 | 846001 | 847000 | + | - | - | Intergenic region | |
| | 6 | 301501 | 302500 | - | HPF30_0293 | sense | putative outer membrane protein | |
| | 6 | 587001 | 588000 | + | HPF30_0549 | antisense | hypothetical protein | |
| F32 | 3 | 1082001 | 1083000 | - | HPF32_1015 | antisense | hypothetical protein | TnPZ |
| | 5 | 1076501 | 1077500 | - | HPF32_1009 | antisense | hypothetical protein | TnPZ |
| | 6 | 518001 | 519000 | + | HPF32_0484 | sense | Type II modification enzyme | |
| | 7 | 1085501 | 1086500 | + | HPF32_1019 | antisense | outer membrane protein homC | |
| | 7 | 1082501 | 1083500 | - | HPF32_1015 | antisense | hypothetical protein | TnPZ |
| F57 | 4 | 1384501 | 1385500 | - | HPF57_1314 | antisense | hypothetical protein | |
| | 5 | 305001 | 306000 | + | HPF57_0301 | antisense | hypothetical protein | TnPZ |
| | 5 | 305501 | 306500 | + | HPF57_0302 | antisense | parA | TnPZ |
| | 5 | 574501 | 575500 | + | HPF57_0558 | antisense | cag pathogenicity island protein cagX | cagPAI |
| | 5 | 680001 | 681000 | - | HPF57_0648 | sense | UDP-N-acetylmuramate--L-alanine ligase murC | |
Figure 3. Distribution of methylated bases in genomes and hypermethylated genes.
(A) Density of methylated bases on each strand of the P12 genome. (B) Density of methylated bases on each strand of the F30 genome. See Figure S2 for the other strains. (C) Distribution of methylated bases in rpoB of each strain. (D) Distribution of methylated bases in groEL of each strain. Red box, regions with the densest methylation within the gene. Bar color indicates a specific methylation motif. See Figure S3 for color and distribution of each methylation motif.
Microevolution in methylation motifs.
| Fraction of methylated copies in the genome (%) | |||||
| Sequence | P12 | F16 | F30 | F32 | F57 |
| 5′- | 1 | 91 | 43 | 87 | 89 |
| | 0 | 92 | 0 | 86 | 87 |
| | 2 | 91 | 54 | 88 | 90 |
| | 0 | 89 | 36 | 86 | 87 |
| | 1 | 91 | 62 | 87 | 89 |
N = A, T, C, G.
Figure 4. TRD sequences within a homology group differing in the recognition sequence.
(A) Homology group a. (B) Homology group d. Locus tag is followed by TRD homology group name (half recognition sequence). The SNPs co-segregating with the recognition sequence are indicated by fonts in color. N/D, not detected.
Figure 5. Relationship between TRD structure and recognition sequence in Type I S genes.
(A) Plot of the length of TRD1/2 versus the length of half recognition sequence. Circle, Group 2; cross, Group 3. (B) Plot of the copy number of tandem repeats between TRD1 and TRD2 versus the number of base pairs separating the two methylated bases in the full recognition sequences in Group 2.
Unassigned methylation motifs in each strain.
| Strain | Methylation motif | Fraction of methylated copies (Sample 1) | Fraction of methylated copies (Sample 2) | Motif # | Comments |
| A. P12 | | 21 | 23 | 506 | |
| | 5′-GNGRG | 94 | 97 | 4,015 | |
| | | 94 | 97 | 2,604 | |
| B. F16 | 5′-HG | 56 | 63 | 196 | |
| | | 99 | 100 | 2,236 | |
| | 5′-CGR | 99 | 99 | 1,512 | |
| | 5′-GCRG | 99 | 99 | 2,327 | |
| | | 99 | 100 | 7,175 | |
| C. F30 | 5′-C | 95 | 95 | 879 | |
| | 5′-G | 94 | 95 | 879 | Complement with 5′-C |
| | 5′-CAAGW | 46 | 51 | 508 | Part of 5′-CNNGNAG? |
| | 5′-CRTGH | 75 | 77 | 502 | Part of 5′-CNNGNAG? |
| | 5′-CTNGN | 95 | 97 | 1,239 | Part of 5′-CNNGNAG? |
| | 5′-CCDGN | 92 | 95 | 733 | Part of 5′-CNNGNAG? |
| | | 66 | 60 | 453 | |
| | | 95 | 98 | 1,456 | |
| D. F32 | 5′-RG | 99 | 99 | 1,854 | Palindromic Type I-like methylation motif |
| | 5′-RG | 44 | 44 | 2,112 | |
| | 5′-RT | 28 | 28 | 2,112 | Complement with 5′-RG |
| | 5′-CCTMC | 45 | 54 | 837 | |
| | 5′-CCR | 98 | 99 | 3,019 | |
| E. F57 | 5′-CC | 96 | 94 | 1,229 | |
| | 5′-TT | 96 | 93 | 1,229 | Complement with 5′-CC |
| | 5′-RCT | 37 | 34 | 668 | |
| | 5′-TT | 36 | 32 | 668 | Complement with 5′-RCT |
| | | 86 | 95 | 97 | |
| | 5′-GA | 98 | 98 | 4,188 | |
Methylated base is underlined.
Transcriptome affected by knockout of a Type I specificity gene.
| Gene (locus tag) | Product annotation | Read count in RNA-seq, S− Exp. 1 | Read count in RNA-seq, S− Exp. 2 | Read count in RNA-seq, S+ Exp. 1 | Read count in RNA-seq, S+ Exp. 2 | p-value | qPCR (S−/S+) |
| HPP12_0008 | Chaperone and heat shock protein (GroEL) | 22459 | 52845 | 25301 | 46382 | 0.05 | (Control) |
| HPP12_0797 | Type I R-M system S protein | 1 | 0 | 55 | 32 | 4.42E-09 | - |
| HPP12_0959 | ATP/GTP-binding protein | 27 | 8 | 1 | 0 | 0.0013 | 3.1 |
| HPP12_0960 | ATP/GTP-binding protein | 325 | 414 | 5 | 29 | 8.52E-10 | 2.3 |
| HPP12_0961 | Hypothetical protein | 493 | 694 | 6 | 24 | 3.80E-13 | 48 |
| HPP12_0962 | Hypothetical protein | 164 | 213 | 4 | 5 | 6.53E-14 | 25 |
S-, HPYF1; S+, HPYF2. HPP12_0797 gene was knocked out.
Read counts were normalized by the number of non-rRNA mapped reads.
An average of two experiments.
No specific amplification in S- strain.
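The normalization described in the footnotes, dividing read counts by the number of non-rRNA mapped reads, can be sketched as follows. The per-million scale factor and both function names are assumptions made for illustration; the footnotes do not state which scale was used. The ratio helper simply compares mean read counts between the S− and S+ strains and is not the statistical test behind the reported p-values.

```python
def normalize_counts(raw_counts, non_rrna_mapped_reads, scale=1_000_000):
    """Scale raw per-gene read counts by library size.

    raw_counts: dict of locus tag -> raw read count.
    non_rrna_mapped_reads: total mapped reads excluding rRNA.
    scale: assumed per-million scaling (not specified in the source).
    """
    return {gene: count * scale / non_rrna_mapped_reads
            for gene, count in raw_counts.items()}


def mean_ratio(s_minus, s_plus):
    """Ratio of mean read counts, S- over S+, across replicate experiments."""
    mean_minus = sum(s_minus) / len(s_minus)
    mean_plus = sum(s_plus) / len(s_plus)
    # Avoid division by zero when no reads were observed in S+.
    return float("inf") if mean_plus == 0 else mean_minus / mean_plus


# Read counts for HPP12_0962 from the table (Exp. 1 and Exp. 2).
ratio = mean_ratio([164, 213], [4, 5])
```

Ratios from read counts and from qPCR are measured differently, so the two columns of the table need not agree numerically.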
Figure 6. Gene cluster with transcripts decreased by specific Type I methylation.
(A) Map. Black arrow, gene whose transcript was decreased by methylation; black bar, position of methylation motif (5′-GAAN8TAG). TSS, transcription start site. (B) Sequence around a methylation site. Box, half of the recognition sequence; arrows, palindrome. The leftmost bp is coordinate 1022181 of the P12 genome. Me, methyl group.