Lingxiang Zhu, Jun Zhong, Xinmiao Jia, Guan Liu, Yu Kang, Mengxing Dong, Xiuli Zhang, Qian Li, Liya Yue, Cuidan Li, Jing Fu, Jingfa Xiao, Jiangwei Yan, Bing Zhang, Meng Lei, Suting Chen, Lingna Lv, Baoli Zhu, Hairong Huang, Fei Chen.
Abstract
Tuberculosis (TB), caused by the Mycobacterium tuberculosis complex (MTBC), remains one of the most common infectious diseases. To analyze MTBC genomic methylation comprehensively, we completed the genomes of 12 MTBC strains (Mycobacterium bovis; M. bovis BCG; M. microti; M. africanum; M. tuberculosis H37Rv; H37Ra; and 6 M. tuberculosis clinical isolates) belonging to different lineages and characterized their methylomes using single-molecule real-time (SMRT) technology. We identified three m6A sequence motifs and their corresponding methyltransferase (MTase) genes: the previously reported mamA and hsdM, and a newly discovered mamB. We also experimentally verified the methylated motifs and functions of HsdM and MamB. Our analysis indicated that MTase activities varied among the 12 strains due to mutations/deletions. Furthermore, by measuring 'the methylated-motif-site ratio' and 'the methylated-read ratio', we explored the methylation status of each modified site and sequence read to obtain the 'precision methylome' of the MTBC strains, which enabled detailed analysis of MTase activity at whole-genome scale. Most unmodified sites overlapped with transcription-factor binding regions, which might protect these sites from methylation. Overall, our findings show the enormous potential of the SMRT platform for investigating the precise character of the methylome, and significantly enhance our understanding of the function of DNA MTases.
Year: 2015 PMID: 26704977 PMCID: PMC4737169 DOI: 10.1093/nar/gkv1498
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
The general genome information of 12 MTBC strains
| Strain No. | Species | ATCC No./Lineage | Average read size (kb) | Sequencing coverage (x) | Completed genome size (Mb) | GC content (%) | Gene number | Average gene size (bp) |
|---|---|---|---|---|---|---|---|---|
| F1 | | 27294/L4 | 4.39 | 120 | 4.43 | 65.61 | 4320 | 927 |
| F28 | | 25177/L4 | 4.03 | 94 | 4.42 | 65.60 | 4179 | 959 |
| 30 | | 19210/L8 | 4.39 | 94 | 4.34 | 65.60 | 4198 | 932 |
| 26 | | 35735/L8 | 5.37 | 99 | 4.35 | 65.61 | 4780 | 809 |
| 12 | | 19422/L8 | 6.81 | 128 | 4.37 | 65.63 | 4323 | 910 |
| 25 | | 35711/L6 | 5.30 | 95 | 4.39 | 65.58 | 4801 | 813 |
| 2242 | | L2 | 4.90 | 89 | 4.42 | 65.60 | 4467 | 888 |
| 2279 | | L2 | 5.13 | 97 | 4.41 | 65.59 | 4601 | 858 |
| 22115 | | L4 | 5.74 | 123 | 4.40 | 65.57 | 4213 | 946 |
| 37004 | | L4 | 6.66 | 125 | 4.42 | 65.60 | 4231 | 943 |
| 22103 | | L4 | 4.37 | 101 | 4.40 | 65.61 | 4186 | 952 |
| 26105 | | L3 | 4.97 | 101 | 4.43 | 65.63 | 4200 | 952 |
Figure 1. Phylogenetic analysis of Mycobacterium tuberculosis complex (MTBC) strains. Blue indicates strains belonging to lineage 4 (L4, Euro-American lineage). Pink indicates strains belonging to L3 (East African-Indian lineage). Red indicates strains belonging to L2 (East Asian lineage). Orange indicates strains belonging to L1 (Indo-Oceanic lineage). Purple indicates strains belonging to L5 and L6 (West African 1 and West African 2 lineages). Dark blue indicates strains belonging to L7 (Ethiopia lineage). Green indicates strains belonging to the animal-adapted lineage (L8). The strains in red boxes represent the 12 MTBC genomes completed in this study.
Comparison of genome-wide methylation patterns among 12 MTBC strains
| Strain No. | CTCCAG^b: No. of motifs^a | CTCCAG: % modified motifs | CACGCAG: No. of motifs | CACGCAG: % modified motifs | GATN4RTAC: No. of motifs | GATN4RTAC: % modified motifs |
|---|---|---|---|---|---|---|
| Mtb F1 | 3902 | 99.67 | 825 | 0 | 724 | 0 |
| Mtb F28 | 3904 | 99.72 | 821 | 0 | 730 | 0 |
| | 3828 | 99.76 | 803 | 100 | 686 | 87.03 |
| | 3842 | 99.53 | 806 | 100 | 678 | 77.73 |
| | 3860 | 99.87 | 817 | 100 | 698 | 97.56 |
| | 3872 | 99.5 | 813 | 100 | 706 | 90.65 |
| Mtb 2242 | 3918 | 0 | 829 | 100 | 730 | 96.85 |
| Mtb 2279 | 3902 | 0 | 824 | 100 | 722 | 97.23 |
| Mtb 22115 | 3890 | 99.59 | 817 | 100 | 716 | 0 |
| Mtb 37004 | 3894 | 96.92 | 820 | 100 | 714 | 0 |
| Mtb 22103 | 3876 | 99.66 | 815 | 100 | 700 | 96 |
| Mtb 26105 | 3908 | 99.56 | 824 | 100 | 724 | 0 |
^a The number of motifs includes those on the plus and minus strands.
^b The methylated nucleotide in each sequence motif is shown as a bold, underlined letter. An underlined letter represents the thymine pairing with the methylated adenine on the complementary strand.
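The motif counts above cover both strands. As a minimal sketch of how such a count could be made (not the authors' pipeline), the Python below scans a sequence and its reverse complement for a motif written with IUPAC codes. The function names, the toy sequence, and the expansion of N4 into `NNNN` are illustrative assumptions.

```python
import re

# IUPAC codes used by the motifs in this record, as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}
# Complement table; R and Y swap, N stays N.
COMPLEMENT = str.maketrans("ACGTRYN", "TGCAYRN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence or IUPAC motif."""
    return seq.translate(COMPLEMENT)[::-1]


def motif_regex(motif: str) -> re.Pattern:
    """Compile a motif into a regex; the lookahead reports overlapping hits."""
    body = "".join(IUPAC[base] for base in motif)
    return re.compile(f"(?=({body}))")


def count_motif_sites(genome: str, motif: str) -> int:
    """Count motif occurrences on the plus strand plus the minus strand."""
    pattern = motif_regex(motif)
    plus = sum(1 for _ in pattern.finditer(genome))
    minus = sum(1 for _ in pattern.finditer(reverse_complement(genome)))
    return plus + minus


# Hypothetical toy sequence: one CTCCAG on the plus strand, one on the
# minus strand (seen as CTGGAG on the plus strand), and one GATN4RTAC.
toy = "AACTCCAGTTCTGGAGTTGATCCCCGTACTT"
ctccag_sites = count_motif_sites(toy, "CTCCAG")
hsdm_sites = count_motif_sites(toy, "GATNNNNRTAC")
print(ctccag_sites, hsdm_sites)
```

A real analysis would read the completed genome from a FASTA file and would also need to handle the circular chromosome junction, which this sketch ignores.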
The unmethylated sites among 12 MTBC strains
| Strain No. | GATN4RTAC: Total | GATN4RTAC: GR | GATN4RTAC: IGR | GATN4RTAC: % in IGR | CTCCAG: Total | CTCCAG: GR | CTCCAG: IGR | CTCCAG: % in IGR |
|---|---|---|---|---|---|---|---|---|
| Mtb F1 | / | / | / | / | 11(5) | 10(4) | 1(1) | 9.09 |
| Mtb F28 | / | / | / | / | 11(3) | 10(2) | 1(1) | 9.09 |
| | 89(11)^a | 73(9) | 16(2) | 17.98 | 9(3) | 9(3) | 0 | 0.00 |
| | 151(21) | 124(20) | 27(1) | 17.88 | 18(12) | 15(11) | 3(1) | 16.67 |
| | 17(5) | 9(3) | 8(2) | 47.06 | 5(3) | 4(2) | 1(1) | 20.00 |
| | 66(14) | 54(10) | 12(4) | 18.18 | 20(12) | 18(10) | 2(2) | 10.00 |
| Mtb 2242 | 23(3) | 14(2) | 9(1) | 39.13 | / | / | / | / |
| Mtb 2279 | 20(4) | 12(4) | 8(0) | 40.00 | / | / | / | / |
| Mtb 22115 | / | / | / | / | 16(8) | 15(7) | 1(1) | 6.25 |
| Mtb 37004 | / | / | / | / | 122(62) | 113(59) | 9(3) | 7.38 |
| Mtb 22103 | 28(6) | 22(6) | 6(0) | 21.43 | 13(5) | 13(5) | 0(0) | 0.00 |
| Mtb 26105 | / | / | / | / | 17(5) | 14(4) | 3(1) | 17.65 |
| Average | | | | 22.99 | | | | 9.09 |
GR: Gene Region; IGR: Intergenic Region.
^a The number in parentheses indicates the number of hemi-methylated sites.
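The "% in IGR" column appears to be the share of unmethylated sites that fall in intergenic regions, that is IGR / Total × 100. The short Python sketch below recomputes a few rows from the table above under that assumption; the function name and the row selection are illustrative.

```python
def percent_in_igr(total_sites: int, igr_sites: int) -> float:
    """Share of unmethylated sites located in intergenic regions, in percent."""
    if total_sites <= 0:
        raise ValueError("total_sites must be positive")
    return round(100.0 * igr_sites / total_sites, 2)


# (strain, total unmethylated sites, sites in IGR), copied from the table.
rows = [
    ("Mtb 2242", 23, 9),    # first motif block
    ("Mtb 2279", 20, 8),    # first motif block
    ("Mtb 22103", 28, 6),   # first motif block
    ("Mtb F1", 11, 1),      # second motif block
    ("Mtb 37004", 122, 9),  # second motif block
]

recomputed = {strain: percent_in_igr(total, igr) for strain, total, igr in rows}
for strain, value in recomputed.items():
    print(f"{strain}: {value:.2f}% in IGR")
```

The recomputed values match the tabulated 39.13, 40.00, 21.43, 9.09 and 7.38, which supports this reading of the column.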
Figure 2. Distribution of the methylated read ratio of the three motifs in the MTBC strains. The horizontal axis shows the strain name. The vertical axis shows the number of motif sites in each methylated-read-ratio class (30–60%, 60–90%, 90–100%). The methylated read ratio is the percentage of reads containing the methylated base among all reads mapped to the site. For example, if a methylated site has a methylated read ratio of 60% and is covered by 100 mapped reads, then 60 reads contain the methylated base and the other 40 reads are unmethylated. (A) Distribution of the methylated read ratio of the CTCCAG motif. (B) Distribution of the methylated read ratio of the GATN4RTAC motif. (C) Distribution of the methylated read ratio of the CACGCAG motif.
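The methylated read ratio defined in the Figure 2 caption can be sketched in a few lines of Python. The function names, the toy read counts, and the treatment of bin boundaries (for example, placing exactly 60% in the 60–90% class) are assumptions made for illustration; the source does not state how boundary values were assigned.

```python
from collections import Counter


def methylated_read_ratio(methylated_reads: int, total_reads: int) -> float:
    """Percentage of mapped reads at a site that carry the methylated base."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * methylated_reads / total_reads


def ratio_bin(ratio: float) -> str:
    """Assign a ratio to the classes named in the caption (edges assumed)."""
    if 90 <= ratio <= 100:
        return "90-100%"
    if 60 <= ratio < 90:
        return "60-90%"
    if 30 <= ratio < 60:
        return "30-60%"
    return "below 30%"


# Hypothetical sites as (reads with the methylated base, total mapped reads).
sites = [(60, 100), (95, 100), (45, 90)]
ratios = [methylated_read_ratio(m, t) for m, t in sites]
bin_counts = Counter(ratio_bin(r) for r in ratios)
print(ratios, dict(bin_counts))
```

The first toy site reproduces the caption's example: 60 methylated reads out of 100 mapped reads gives a ratio of 60%.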
The average methylated read ratio of 12 MTBC strains
| Strain No. | CTCCAG | CACGCAG | GATN4RTAC |
|---|---|---|---|
| Mtb F1 | 96.2 | / | / |
| Mtb F28 | 96.17 | / | / |
| | 94.5 | 97.2 | 88.81 |
| | 95.7 | 97.29 | 82.43 |
| | 95.86 | 97.6 | 95.51 |
| | 94.28 | 97.02 | 90.41 |
| Mtb 2242 | / | 97.99 | 95.73 |
| Mtb 2279 | / | 97.1 | 94.05 |
| Mtb 22115 | 95.69 | 98.52 | / |
| Mtb 37004 | 82.06 | 98.38 | / |
| Mtb 22103 | 94.47 | 97.7 | 93.82 |
| Mtb 26105 | 95.88 | 98.49 | / |
Figure 3. Three MTase genes and corresponding methylated sequence motifs in 12 MTBC strains. The M. bovis 30 strain is located in the central position and the 11 other MTBC strains are arranged around it. Only active MTase genes are shown in the figure. The red circles mark the change in order of the two MTase genes due to a large-scale inversion (about 1.8 Mbp) in Mtb 2242.
Distribution of SNPs/Indels in three MTBC MTase genes among 12 MTBC strains
The summary of top 10 frequent genes with unmethylated sites shared in 12 MTBC strains
| Motif | Synonym | E/N | Gene annotation | F1 | F28 | 26 | 30 | 12 | 25 | 2242 | 2279 | 22115 | 37004 | 22103 | 26105 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GATN4RTAC | Rv1187 | Essential | pyrroline-5-carboxylate dehydrogenase RocA | NA | NA | -/- | -/- | -/+ | -/- | -/- | -/- | NA | NA | -/- | NA |
| GATN4RTAC | Rv2070c | Non-essential | precorrin-6A reductase | NA | NA | -/- | -/- | -/- | -/- | -/+ | -/- | NA | NA | -/- | NA |
| GATN4RTAC | Rv1753c | Essential | PPE family protein PPE24 | NA | NA | -/- | -/- | +/+ | -/- | -/- | -/- | NA | NA | -/- | NA |
| GATN4RTAC | Rv0112 | Essential | GDP-mannose 4,6-dehydratase | NA | NA | -/- | -/- | -/+ | -/- | -/- | +/+ | NA | NA | -/+ | NA |
| GATN4RTAC | Rv2963 | Non-essential | integral membrane protein | NA | NA | -/- | -/- | +/+ | -/- | -/- | -/- | NA | NA | -/- | NA |
| GATN4RTAC | Rv3341 | Essential | homoserine O-acetyltransferase | NA | NA | -/- | -/- | +/+ | -/- | -/- | -/+ | NA | NA | -/- | NA |
| GATN4RTAC | Rv0405 | Non-essential | membrane bound polyketide synthase | NA | NA | +/+ | -/- | -/- | -/- | -/- | -/+ | NA | NA | -/- | NA |
| GATN4RTAC | Rv1461 | Essential | Fe-S cluster assembly protein SufB | NA | NA | -/- | -/- | -/+ | -/- | +/+ | -/+ | NA | NA | -/- | NA |
| GATN4RTAC | Rv2130c | Essential | D-glucopyranoside ligase | NA | NA | -/- | -/- | +/+ | -/- | +/+ | +/+ | NA | NA | +/+ | NA |
| GATN4RTAC | Rv2984 | No-data | polyphosphate kinase | NA | NA | -/- | -/- | +/+ | -/- | +/+ | +/+ | NA | NA | +/+ | NA |
| CTCCAG | Rv0450c | Essential | transmembrane transport protein MmpL4 | -/- | -/- | -/- | -/- | -/- | -/- | NA | NA | -/- | -/- | -/- | -/- |
| CTCCAG | Rv1562c | No-data | malto-oligosyltrehalose trehalohydrolase | -/- | -/- | +/+ | -/+ | +/+ | -/- | NA | NA | -/+ | -/- | -/- | -/- |
| CTCCAG | Rv1461 | Essential | Fe-S cluster assembly protein SufB | -/- | -/- | +/+ | -/+ | +/+ | -/- | NA | NA | -/- | -/- | +/+ | -/- |
| CTCCAG | Rv2501c | No-data | acetyl-/propionyl-CoA carboxylase subunit alpha | +/+ | +/+ | -/+ | -/+ | +/+ | -/+ | NA | NA | -/- | -/- | -/- | -/+ |
| CTCCAG | Rv1917c | Non-essential | PPE family protein, PPE34 | -/+ | -/+ | +/+ | +/+ | +/+ | +/+ | NA | NA | -/- | -/- | -/- | -/- |
| CTCCAG | Rv3282 | Essential | Maf-like protein | +/+ | -/+ | +/+ | +/+ | +/+ | +/+ | NA | NA | -/- | -/- | -/+ | -/+ |
| CTCCAG | Rv0400c | Essential | acyl-CoA dehydrogenase FadE7 | +/+ | -/- | +/+ | +/+ | -/+ | +/+ | NA | NA | -/- | -/+ | +/+ | -/- |
| CTCCAG | Rv1664 | Non-essential | polyketide synthase | +/+ | +/+ | -/+ | +/+ | +/+ | +/+ | NA | NA | -/- | -/+ | -/+ | +/+ |
| CTCCAG | Rv2174 | Essential | alpha-(1->6)-mannopyranosyltransferase A | +/+ | +/+ | +/+ | +/+ | +/+ | +/+ | NA | NA | -/- | -/- | -/+ | -/+ |
| CTCCAG | Rv1552 | No-data | fumarate reductase flavoprotein subunit | +/+ | +/+ | +/+ | +/+ | +/+ | +/+ | NA | NA | -/- | -/+ | +/+ | +/+ |
E/N: essential / nonessential genes. NA: Not applicable. ‘-/-’ denotes the gene with unmethylated motifs on both strands. ‘-/+’ indicates gene with hemimethylated motif. ‘+/+’ represents the gene with methylated motifs on both strands.