Weihong Qi, Michael Käser, Katharina Röltgen, Dorothy Yeboah-Manu, Gerd Pluschke.
Abstract
Mycobacterium ulcerans is the causative agent of Buruli ulcer, the third most common mycobacterial disease after tuberculosis and leprosy. It is an emerging infectious disease that afflicts mainly children and youths in West Africa. Little is known about the evolution and transmission mode of M. ulcerans, partially due to the lack of known genetic polymorphisms among isolates, limiting the application of genetic epidemiology. To systematically profile single nucleotide polymorphisms (SNPs), we sequenced the genomes of three M. ulcerans strains using 454 and Solexa technologies. Comparison with the reference genome of the Ghanaian classical lineage isolate Agy99 revealed 26,564 SNPs in a Japanese strain representing the ancestral lineage. Only 173 SNPs were found when comparing Agy99 with two other Ghanaian isolates, which belong to the two other types previously distinguished in Ghana by variable number tandem repeat typing. We further analyzed a collection of Ghanaian strains using the SNPs discovered. With 68 SNP loci, we were able to differentiate 54 strains into 13 distinct SNP haplotypes. The average SNP nucleotide diversity was low (average 0.06-0.09 across 68 SNP loci), and 96% of the SNP locus pairs were in complete linkage disequilibrium. We estimated that the divergence of the M. ulcerans Ghanaian clade from the Japanese strain occurred 394 to 529 thousand years ago. The Ghanaian subtypes diverged about 1000 to 3000 years ago, or even much more recently, because we found evidence that they evolved significantly faster than average. Our results offer significant insight into the evolution of M. ulcerans and provide a comprehensive report on genetic diversity within a highly clonal M. ulcerans population from a Buruli ulcer endemic region, which can facilitate further epidemiological studies of this pathogen through the development of high-resolution tools.
Year: 2009 PMID: 19806175 PMCID: PMC2736377 DOI: 10.1371/journal.ppat.1000580
Source DB: PubMed Journal: PLoS Pathog ISSN: 1553-7366 Impact factor: 6.823
Mycobacterium ulcerans strains sequenced in this study.
| Strain | Year of isolation | Place of origin | MIRU 1 allele | ST1 allele |
| NM20/02 | 2002 | Ga District, Greater Accra region, Ghana | B | BD |
| NM31/04 | 2004 | Amansie West District, Ashanti region, Ghana | BAA | C |
| Jp8756 | 1980 | Japan | nd | CF |
| Agy99 | 1999 | Ga District, Greater Accra region, Ghana | BAA | BD |
Agy99: reference strain.
Hilty et al., 2006 [7]
nd: not determined.
Summary of next generation sequencing results.
| Strain | NM20/02 | NM31/04 | Jp8756 |
| Sequencing method | Roche 454 GS FLX | Illumina Solexa GA | Illumina Solexa GA |
| Total no. reads | 424,494 | 2,538,429 | 2,651,276 |
| Average read length (nt) | 213 | 35 | 35 |
| Total sequence (nt) | 90,299,836 | 88,845,015 | 92,794,660 |
| Agy99 chromosome: total no. reads mapped (%) | 382,116 (90.01) | 2,343,269 (92.31) | 2,279,741 (85.99) |
| Agy99 chromosome: % genome mapped | 93.72 | 99.99 | 94.47 |
| Agy99 chromosome: average depth of mapped regions | 14.5 | 14.1 | 13.6 |
| Agy99 plasmid pMUM001: total no. reads mapped (%) | 5,269 (1.24) | 326,541 (12.86) | 89,325 (3.37) |
| Agy99 plasmid pMUM001: % genome mapped | 32.56 | 100 | 20.35 |
| Agy99 plasmid pMUM001: average depth of mapped regions | 19.8 | 63.4 | 17.2 |
Figure 1. Venn diagram of single nucleotide polymorphisms in M. ulcerans strains.
Figure 2. Summary of single nucleotide polymorphisms in M. ulcerans strains.
The number in parentheses represents the total number of SNPs in each sequenced strain. The color bars show the distributions of different SNP categories in each strain.
Figure 3. Distribution of number of SNPs per gene.
Genes potentially under selection ordered by SNP density.
| Locus_tag | Locus | Product | COG | SNP density (bp per SNP) | Selection |
| MUL_3769 | - | hypothetical protein | - | 39 | |
| MUL_4312 | - | hypothetical protein | - | 41 | − |
| MUL_4235 | - | hypothetical protein | - | 45 | |
| MUL_5054 | | ESAT-6 like protein EsxE | COG1314U | 46 | + |
| MUL_3425 | | multidrug-transport integral membrane protein Mmr | - | 46 | + |
| MUL_1135 | - | hypothetical protein | COG1902C | 46 | |
| MUL_5072 | | glucose-inhibited division protein B Gid | - | 47 | |
| MUL_4359 | - | PE family protein | COG0357M | 48 | |
| MUL_3746 | | globin (oxygen-binding protein) GlbO | - | 48 | − |
| MUL_4906 | - | hypothetical protein | - | 50 | |
| MUL_0630 | - | hypothetical protein | COG0500QR | 51 | |
| MUL_5017 | - | hypothetical protein | - | 52 | |
| MUL_4764 | - | hypothetical protein | COG0500QR | 54 | |
| MUL_2201 | - | hypothetical protein | - | 55 | |
| MUL_0760 | - | hypothetical protein | - | 56 | |
| MUL_3596 | - | hypothetical protein | - | 56 | |
| MUL_1662 | | lactoylglutathione lyase, GloA | - | 56 | |
| MUL_2106 | - | hypothetical protein | COG0315H | 57 | + |
| MUL_4509 | - | hypothetical protein | - | 57 | |
| MUL_4133 | | lipoprotein DsbF | - | 58 | |
| MUL_0355 | - | PE-PGRS family protein | COG2346R | 60 | + |
| MUL_3885 | | enoyl-CoA hydratase, EchA4_2 | COG1773C | 60 | |
| MUL_5108 | - | transposase | COG1119P | 60 | |
| MUL_0384 | - | hypothetical protein | - | 61 | |
| MUL_5010 | - | phosphoglycerate mutase | COG0406G | 62 | |
| MUL_0161 | - | hypothetical protein | COG2076P | 63 | + |
| MUL_4655 | | serine acetyltransferase CysE_1 | COG1278K | 63 | |
| MUL_2839 | - | hypothetical protein | - | 64 | |
| MUL_2263 | - | hypothetical protein | COG0793M | 64 | |
| MUL_1479 | | thioredoxin TrxB1 | COG0526OC | 66 | |
| MUL_4386 | - | hypothetical protein | COG0620E,COG1309K | 67 | |
| MUL_5055 | | ESAT-6 like protein EsxF | COG4842S | 67 | |
| MUL_0327 | - | oxidoreductase | COG1028IQR | 67 | |
| MUL_3206 | - | hypothetical protein | - | 67 | |
| MUL_3717 | - | hypothetical protein | - | 68 | |
| MUL_0010 | - | hypothetical protein | - | 70 | |
| MUL_3264 | | ferredoxin FdxA_1 | COG1146C | 70 | |
| MUL_1435 | - | exported protein | COG0704P | 70 | − |
| MUL_3277 | - | hypothetical protein | - | 71 | |
| MUL_3581 | - | phage-related integrase | COG0582L | 71 | |
| MUL_1216 | | trans-aconitate methyltransferase Tam | COG1522K | 72 | |
| MUL_2917 | - | hypothetical protein | - | 72 | |
| MUL_1771 | - | hypothetical protein | COG1670J | 72 | |
| MUL_0951 | - | hypothetical protein | - | 73 | |
| MUL_0889 | - | hypothetical protein | - | 73 | |
| MUL_0993 | - | transcriptional regulatory protein | COG0236IQ | 73 | |
| MUL_4846 | - | hypothetical protein | - | 73 | − |
| MUL_0366 | | methylmalonyl-CoA mutase alpha subunit, McmA2b | - | 73 | − |
| MUL_1003 | - | hypothetical protein | - | 74 | |
| MUL_4682 | - | hypothetical protein | COG3391S | 74 | |
| MUL_4870 | - | short chain dehydrogenase | COG1028IQR | 74 | |
| MUL_0457 | - | hypothetical protein | COG2351R | 74 | |
| MUL_5109 | - | hypothetical protein | - | 75 | |
| MUL_0761 | - | hypothetical protein | - | 76 | |
| MUL_1274 | | lipoprotein LppN | - | 76 | |
| MUL_2490 | - | hypothetical protein | - | 76 | |
| MUL_0820 | - | methyltransferase | COG0500QR,COG2226H | 76 | |
| MUL_2645 | - | AsnC family transcriptional regulator | COG0526OC | 76 | − |
| MUL_3440 | - | hypothetical protein | COG2185I | 77 | |
| MUL_3194 | - | hypothetical protein | COG2261S | 77 | |
| MUL_0424 | - | hypothetical protein | - | 77 | |
| MUL_5032 | | immunogenic protein Mpt64 | COG0425O | 77 | |
| MUL_4365 | - | hypothetical protein | COG0393S | 77 | |
| MUL_4394 | - | hypothetical protein | COG0526OC | 78 | |
| MUL_3305 | | hypothetical protein | COG1985H | 78 | |
| MUL_3524 | - | diphosphomevalonate decarboxylase | COG3407I | 78 | |
| MUL_0217 | | lipoprotein LpqV | - | 78 | |
| MUL_0241 | | 8-amino-7-oxononanoate synthase BioF2_1 | COG0156H | 79 | |
| MUL_5058 | - | hypothetical protein | - | 79 | |
| MUL_4336 | - | PE family protein | - | 79 | |
| MUL_4899 | - | hypothetical protein | - | 80 | − |
| MUL_4670 | - | hypothetical protein | - | 82 | − |
| MUL_2937 | - | ArsR-type repressor | COG1846K | 82 | − |
| MUL_0017 | | para-aminobenzoate synthase component II | COG1695K | 86 | − |
| MUL_5067 | | thioredoxin TrxC | COG1522K | 89 | − |
| MUL_2060 | - | hypothetical protein | - | 92 | − |
| MUL_0670 | | acetyltransferase, RimL | COG0664T | 92 | − |
| MUL_4330 | - | hypothetical protein | - | 96 | − |
| MUL_0430 | - | hypothetical protein | COG2608P | 99 | − |
| MUL_1434 | - | hypothetical protein | - | 99 | − |
| MUL_0058 | - | transcriptional regulatory protein | COG1670J | 101 | − |
| MUL_1897 | - | ABC transporter ATP-binding protein | - | 105 | − |
| MUL_5123 | - | hypothetical protein | COG3576R | 105 | − |
| MUL_4776 | - | hypothetical protein | COG1309K | 107 | − |
| MUL_2966 | - | hypothetical protein | - | 110 | − |
| MUL_5035 | - | hypothetical protein | COG0792L | 111 | − |
| MUL_0441 | | phosphate-transport system regulatory protein, PhoY2 | - | 112 | − |
| MUL_1835 | | preprotein translocase subunit SecG | - | 117 | − |
| MUL_0825 | - | hypothetical protein | COG1359S | 122 | − |
| MUL_4918 | | MCE-family protein Mce6B | COG1463Q | 129 | − |
| MUL_2369 | - | hypothetical protein | - | 132 | − |
| MUL_0065 | - | hypothetical protein | COG2353S | 136 | − |
| MUL_0035 | - | DNA-binding protein | COG1045E | 136 | − |
| MUL_3556 | - | integral membrane protein | COG0454KR | 140 | − |
| MUL_0015 | - | putative septation inhibitor protein | COG4842S | 141 | − |
| MUL_2243 | | ESAT-6 family protein | - | 146 | − |
| MUL_4627 | - | hypothetical protein | - | 147 | − |
| MUL_2232 | - | molecular chaperone (small heat shock protein) | COG3585H | 150 | − |
| MUL_3337 | - | hypothetical protein | - | 152 | − |
| MUL_3714 | | 50S ribosomal protein L21 | COG0261J | 156 | − |
| MUL_3030 | | urease beta subunit UreB | COG0832E | 156 | − |
| MUL_4051 | - | hypothetical protein | - | 165 | − |
| MUL_4256 | | transcriptional regulatory protein Whib-like WhiB4 | - | 183 | − |
“+” indicates potential diversifying selection (p value higher than the mean + 3× standard deviation); “−” indicates potential negative selection (p value lower than the mean − 3× standard deviation).
Figure 4. Compatibility matrix of parsimony informative SNPs.
The genome positions are numbered to the left of the matrix. Black squares indicate incompatible sites, where nucleotide changes are inferred to have occurred multiple times either due to recombination or repeated mutation. White squares represent compatible sites, at which all nucleotide changes can be inferred to have occurred only once in a phylogeny.
Figure 5. NeighborNet network of the M. ulcerans strains based on the parsimony informative SNPs.
Bootstrap values shown close to branches are based on 1000 bootstrap replicates.
Figure 6. Minimum evolution tree based on 1,032,790 allelic codons of the M. ulcerans and M. marinum strains.
The scale shows the divergence time frame and the number of synonymous substitutions per nucleotide site. The rate of synonymous substitution used for time calibration was 5.8×10⁻⁹ substitutions per site per year.
Estimation of time of divergence between M. ulcerans Agy99 and other M. ulcerans and M. marinum strains.
| Strain | Synonymous substitutions per site (mean ± standard error) | Est. divergence time (years), at 5.8×10⁻⁹ synonymous substitutions per site per year | Est. divergence time (years), at 7.8×10⁻⁹ synonymous substitutions per site per year |
| NM31/04 | 0.000017±0.000004 | 1,466±345 | 1,090±256 |
| NM20/02 | 0.00003±0.000006 | 2,586±517 | 1,923±385 |
| Jp8756 | 0.006141±0.000081 | 529,397±6,983 | 393,654±5,192 |
| M. marinum | 0.017642±0.000138 | 1,520,862±11,897 | 1,130,897±8,846 |
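The tabulated divergence times are numerically consistent with dividing the distance in the first numeric column by twice the rate of synonymous substitution, since substitutions accumulate along both lineages after a split. The Python sketch below reproduces the reported values under that assumption; the function name `divergence_time_years` and the dictionary layout are illustrative and are not taken from the paper.

```python
# Assumption: divergence time T = d / (2 * r), where d is the pairwise
# synonymous distance to Agy99 and r is the substitution rate per site per year.
# This relation is inferred from the tabulated numbers, not quoted from the paper.

def divergence_time_years(distance, rate_per_site_per_year):
    """Return the estimated time (years) since two lineages diverged."""
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate_per_site_per_year)

# Distances to Agy99 as listed in the table above.
DISTANCE_TO_AGY99 = {
    "NM31/04": 0.000017,
    "NM20/02": 0.00003,
    "Jp8756": 0.006141,
}

# The two calibration rates given in the table.
RATES = (5.8e-9, 7.8e-9)

if __name__ == "__main__":
    for strain, d in DISTANCE_TO_AGY99.items():
        times = [divergence_time_years(d, r) for r in RATES]
        print(strain, ", ".join(f"{t:,.0f}" for t in times))
```

The standard errors in the table scale the same way: dividing 0.000004 by twice 5.8×10⁻⁹ gives roughly 345 years.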
Figure 7. Nucleotide diversity among SNPs identified through genome comparison of three Ghanaian strains, for which complete SNP data have been collected in 54 Ghanaian M. ulcerans strains.
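As an illustration of how a per-locus diversity value can be obtained from typed strains, the sketch below uses the common unbiased estimator n/(n−1) × (1 − Σ pᵢ²), where pᵢ are the allele frequencies at one SNP locus among n strains. This is an assumption: the estimator actually used for the figure is not stated in this record, and `locus_diversity` is a hypothetical helper name.

```python
from collections import Counter

def locus_diversity(alleles):
    """Unbiased diversity at one locus from a list of observed alleles.

    Assumed estimator: n / (n - 1) * (1 - sum of squared allele frequencies).
    """
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two typed strains")
    freqs = [count / n for count in Counter(alleles).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))

if __name__ == "__main__":
    # Hypothetical locus: 53 strains carry A and one strain carries G.
    singleton = ["A"] * 53 + ["G"]
    print(f"{locus_diversity(singleton):.4f}")
```

Under this estimator, a locus at which a single strain out of 54 carries the minor allele has a diversity of 2/54, about 0.037, so an average in the range of 0.06 to 0.09 is compatible with most minor alleles being rare.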
Figure 8. Linkage disequilibrium among study loci.
A) The distribution of the linkage disequilibrium coefficient (D) for 1,176 pairwise comparisons of alleles at 54 sSNP and intergenic SNP loci. A total of 751 (64%) of these comparisons are significant by a chi-squared test, and 600 (51%) remained significant after a Bonferroni correction for multiple tests. B) The distribution of the standardized coefficient of linkage disequilibrium (D'). Ninety-six percent of the locus pairs are in complete linkage disequilibrium.
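The two coefficients in this caption can be sketched for a pair of biallelic loci as follows, using the textbook definitions D = p(AB) − p(A)p(B) and Lewontin's D' = |D| / D_max. The 0/1 allele coding, the function name `linkage_disequilibrium`, and the example haplotypes are assumptions made for illustration; the paper's own software and settings are not described in this record.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, D') for two biallelic loci.

    haplotypes: list of (a, b) pairs, each allele coded 0 or 1.
    D' is reported as an absolute value and is NaN when a locus is monomorphic.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes given")
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at locus 1
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at locus 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    # The largest |D| attainable given the allele frequencies depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else float("nan")
    return d, d_prime

if __name__ == "__main__":
    # Hypothetical pair of loci with only three of the four possible haplotypes.
    example = [(1, 1), (1, 1), (0, 0), (1, 0)]
    print(linkage_disequilibrium(example))
```

With these definitions, D' equals 1 whenever at most three of the four possible two-locus haplotypes are observed, which is the pattern expected in a clonal population without recombination.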