Chung-Yi Hsu, Chia-Wei Wu, Adel M Talaat.
Abstract
Mycobacterium avium subspecies paratuberculosis (M. ap), the causative agent of Johne's disease, infects many farmed ruminants and wildlife animals and has recently been isolated from humans. To better understand the molecular pathogenesis of these infections, we analyzed the whole-genome sequences of several M. ap and M. avium subspecies avium (M. avium) isolates to gain insights into genomic diversity associated with variable hosts and environments. Using next-generation sequencing technology, we found that all six M. ap isolates showed a high percentage of similarity (98%) to the reference genome sequence of M. ap K-10, isolated from cattle. However, two M. avium isolates (DT 78 and Env 77) showed significant sequence diversity (only 87 and 40% similarity, respectively) compared to the reference strain M. avium 104, a reflection of the wide environmental niches of this group of mycobacteria. Within the M. ap isolates, genomic rearrangements (insertions/deletions) were not detected, and only unique single nucleotide polymorphisms (SNPs) were observed. While most of the SNPs (~100) in M. ap genomes were non-synonymous, a total of ~6,000 SNPs were detected among M. avium genomes, most of them synonymous, suggesting differential selective pressure between M. ap and M. avium isolates. In addition, SNP-based phylogenomics had enough discriminatory power to differentiate between isolates from different hosts, yet suggested a bovine source of infection for the other animals examined in this study. Interestingly, the human isolate (M. ap 4B) was closely related to an M. ap isolate from a dairy facility, suggesting a common source of infection. Overall, the identified phylogenomes further supported the idea of a common ancestor of both M. ap and M. avium isolates.
The genome-wide analysis described here could provide a strong foundation for a population genetic structure that could be useful for the analysis of mycobacterial evolution and for the tracking of Johne's disease transmission among animals.
Keywords: Johne’s disease; Mycobacteria; genomics; paratuberculosis; pathogenesis; whole-genome sequencing
Year: 2011 PMID: 22163226 PMCID: PMC3234532 DOI: 10.3389/fmicb.2011.00236
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
A list of mycobacterial isolates used in this study.
| Strain | Organism | Genes used to verify identity | Host | Sample origin |
|---|---|---|---|---|
| ATCC 19698 | M. ap | 16S rRNA, IS1311 | Cow | Feces |
| JTC 1281 | M. ap | 16S rRNA, IS1311 | Oryx | Lymph node |
| JTC 1285 | M. ap | 16S rRNA, IS1311 | Goat | Feces |
| 4B | M. ap | 16S rRNA, IS1311 | Human | Ileum |
| DT 3 | M. ap | 16S rRNA, IS1311 | British red deer | Feces |
| Env 210 | M. ap | 16S rRNA, IS1311 | Dairy farm | Environment |
| DT 78 | M. avium | 16S rRNA, Hsp65 | Water buffalo | Ileum |
| Env 77 | M. avium | 16S rRNA, Hsp65 | Dairy farm | Environment |
A summary report for the CLC Bio reference assembly.
| | ATCC 19698 | JTC 1281 | JTC 1285 | 4B | DT 3 | Env 210 | DT 78 |
|---|---|---|---|---|---|---|---|
| Reference organism | M. ap K-10 | M. ap K-10 | M. ap K-10 | M. ap K-10 | M. ap K-10 | M. ap K-10 | M. avium 104 |
| Reference length | 4,832,589 | 4,832,589 | 4,832,589 | 4,832,589 | 4,832,589 | 4,832,589 | 5,475,491 |
| Total read count | 5,994,312 | 6,729,396 | 4,645,230 | 5,985,952 | 6,374,242 | 6,294,162 | 6,978,706 |
| Matched read count | 5,417,459 | 6,522,333 | 4,164,731 | 5,391,674 | 6,177,155 | 6,080,493 | 5,637,136 |
| Non-specific match read count | 53,145 | 56,051 | 39,879 | 54,951 | 50,700 | 53,340 | 61,192 |
| Consensus length | 4,822,328 | 4,815,985 | 4,823,742 | 4,823,165 | 4,815,376 | 4,817,334 | 4,808,427 |
| Homology (%) | 99.79 | 99.66 | 99.82 | 99.80 | 99.64 | 99.68 | 87.82 |
| Average coverage | 55.77 | 68.71 | 42.87 | 55.50 | 65.07 | 64.05 | 51.16 |
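The Homology (%) row in the table above matches consensus length divided by reference length to two decimals. This is an inference from the reported numbers, not a formula stated in the source. A minimal Python sketch, with illustrative names (`percent_of`, `assemblies`) that are assumptions, checks that ratio and also derives the share of matched reads:

```python
def percent_of(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 2)


# Values copied from the first data column (ATCC 19698) and the last data
# column (the assembly mapped to the 5,475,491 bp reference) of the table.
assemblies = {
    "first column": {
        "reference": 4_832_589, "consensus": 4_822_328,
        "total_reads": 5_994_312, "matched_reads": 5_417_459,
    },
    "last column": {
        "reference": 5_475_491, "consensus": 4_808_427,
        "total_reads": 6_978_706, "matched_reads": 5_637_136,
    },
}

for label, a in assemblies.items():
    # Assumed definition: homology = consensus length / reference length.
    homology = percent_of(a["consensus"], a["reference"])
    matched = percent_of(a["matched_reads"], a["total_reads"])
    print(f"{label}: homology {homology}%, matched reads {matched}%")
```

Under that assumed definition, the computed homology values agree with the 99.79 and 87.82 entries reported in the table.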
Figure 1. A whole-genome alignment. The MAUVE algorithm (Darling et al., 2010) was used to align the three genomes. White areas indicate low-coverage gaps in the sequence of the M. avium DT 78 genome, in which about seven large indel regions were identified. Regions with the same color indicate high similarity and are connected by bars of the same color. The genomes were drawn to scale based on the reference M. avium 104 genome.
Figure 2. Genome composition of M. avium Env 77. The MegaBLAST algorithm was used to identify the bacteria most closely related to each contig sequence from the M. avium Env 77 isolate. Genomes with <10% homology were excluded from representation. Members of the M. tuberculosis complex included M. tuberculosis and M. bovis, with sequence divergence <5%. The same criteria were used to formulate the M. avium and M. ap groups.
A list of accession numbers of genomes deposited in GenBank.
| Organisms | Accession number |
|---|---|
| | AGAR00000000 |
| | AGAK00000000 |
| | AGAL00000000 |
| | AGAM00000000 |
| | AGAN00000000 |
| | AGAO00000000 |
| | AGAP00000000 |
| | AGAQ00000000 |
Figure 3. Comparative genome analysis. The gapped consensus sequence of each strain was used for comparison by MAUVE version 2.3.1. (A) A close-up depiction of a breakpoint in the alignment of the six M. ap genomes with the M. ap K-10 reference genome. White areas indicate low or zero read coverage. In this example, the sequences flanking the breakpoint have high GC content but are not repetitive. (B) Indels among M. ap and M. avium genomes. Note that genome rearrangements usually surround the origin of replication.
A list of genes in the 11 kb island which is absent in .
| New annotation (Wynne et al.) | Old annotation (Li et al.) | Length (bp) | Function |
|---|---|---|---|
| MAPK 2038 | MAP 1730c | 1,023 | Hypothetical protein |
| MAPK 2039 | MAP 1729c | 828 | Hypothetical protein |
| MAPK 2040 | MAP 1728c | 723 | YfnB-hydrolase |
| MAPK 2041 | MAP 1727 | 906 | Hypothetical protein |
| MAPK 2042 | MAP 1726c | 585 | Hypothetical protein |
| MAPK 2043 | MAP 1725c | 1,029 | Hypothetical protein |
| MAPK 2044 | MAP 1724c | 558 | Hypothetical protein |
| MAPK 2045 | MAP 1723 | 666 | Hypothetical protein |
| MAPK 2046 | MAP 1722 | 1,221 | Hypothetical protein |
| MAPK 2047 | MAP 1721c | 672 | Hypothetical protein |
| MAPK 2048 | MAP 1720 | 1,020 | Hypothetical protein |
| MAPK 2049 | MAP 1719c | 615 | Hypothetical protein |
| MAPK 2050 | MAP 1718c | 456 | MAP specific protein |
Figure 4. The total number of single nucleotide polymorphisms (SNPs). The numbers of nSNPs (non-synonymous), sSNPs (synonymous), and SNPs in intergenic regions are color coded as indicated. SNPs were detected using the reference-assembled sequence of each strain. About 60–130 SNPs were detected in M. ap isolates. The percentage of nSNPs is generally higher than that of sSNPs, which indicates a high selective pressure in these strains.
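The nSNP/sSNP distinction in this caption can be illustrated with a short Python sketch. It assumes the standard genetic code and a SNP located inside a known coding codon; the function name `classify_snp` and the example codons are hypothetical and are not taken from the study.

```python
BASES = "TCAG"
# Standard genetic code in the conventional TCAG ordering ("*" marks a stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every codon to its one-letter amino acid.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snp(codon: str, offset: int, new_base: str) -> str:
    """Return 'sSNP' if the substitution keeps the amino acid, else 'nSNP'.

    codon    -- reference codon on the coding strand, e.g. "ATA"
    offset   -- 0-based position of the SNP within the codon
    new_base -- variant base at that position
    """
    if offset not in (0, 1, 2):
        raise ValueError("offset must be 0, 1, or 2")
    mutated = codon[:offset] + new_base + codon[offset + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutated]
    return "sSNP" if same else "nSNP"


# Hypothetical examples: ATA (Ile) -> ATG (Met) changes the amino acid,
# while GCT (Ala) -> GCC (Ala) does not.
print(classify_snp("ATA", 2, "G"))
print(classify_snp("GCT", 2, "C"))
```

A real pipeline would also need gene coordinates and strand orientation to locate each SNP within its codon; those steps are left out of this sketch.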
A list of non-synonymous SNPs.
| No. | Strains | K-10 position | K-10 allele | Variation | Gene | Function |
|---|---|---|---|---|---|---|
| 1 | All 6 strains | 3,259,329 | C | T | MAPK 2850 | Trypsin-like serine protease |
| 2 | All 6 strains | 4,394,282 | A | G | MAPK 3393 | Fucose permease |
| 3 | All 6 strains | 2,041,445 | T | C | glnE | Glutamine synthase |
| 4 | ATCC 19698, JTC 1281, JTC 1285, DT 3, Env 210 | 1,169,976 | A | C | MAPK 1064 | Hemolysin-like protein |
| 5 | ATCC 19698, JTC 1281, JTC 1285, DT 3, Env 210 | 91,310 | A | G | nirB | Nitrate reductase |
| 6 | JTC 1281, JTC 1285, | 3,133,871 | G | A | speE | Spermidine synthase |
| 7 | JTC 1281, | 2,806,612 | G | T | cydD | ATP-binding protein ABC transporter CydD |
| 8 | ATCC 19698, JTC 1281, JTC 1285, DT 3 | 3,278,891 | A | T | pyrH | Uridylate kinase PyrH |
| 9 | ATCC 19698, JTC 1281, DT 3 | 1,204,735 | T | C | bpoB | Peroxidase BpoB |
| 10 | JTC 1281, JTC 1285, Env 210 | 4,206,587 | C | T | pks2 | Polyketide synthase Pks2 |
| 11 | | 150,857 | G | C | lipW | Esterase LipW |
| 12 | | 225,551 | C | T | fctA | Transferase |
| 13 | | 647,971 | C | A | nuoL | NADH dehydrogenase subunit L |
| 14 | | 2,353,857 | C | A | MAPK 2071, hspR | Heat shock regulator protein |
| 15 | | 3,981,515 | G | A | pks13 | Polyketide synthase Pks13 |
| 16 | | 4,262,844 | T | G | MAPK 3814 | Lipoprotein |
| 17 | ATCC 19698, DT 3 | 1,363,662 | A | C | MAPK 1234 | Arabinose efflux permease |
A list of genes that harbored more than one SNP and their SNP densities.
| No. | K-10 position | N/S | Annotations | Size of gene | SNP density |
|---|---|---|---|---|---|
| 1 | 38,870 | N | MAPK 0028 | 11,206 | 5,603 |
| | 46,975 | N | MAPK 0028 | | |
| 2 | 142,825 | N | MAPK 0106 | 1,481 | 740.5 |
| | 142,835 | N | MAPK 0106 | | |
| 3 | 157,866 | N | fadD4 | 1,517 | 758.5 |
| | 157,867 | S | fadD4 | | |
| 4 | 205,752 | S | mecD | 1,613 | 806.5 |
| | 205,753 | N | mecD | | |
| 5 | 509,508 | S | MAPK 0430 | 524 | 262 |
| | 509,633 | N | MAPK 0430 | | |
| 6 | 693,293 | S | MAPK 0603 | 1,166 | 583 |
| | 693,483 | N | MAPK 0603 | | |
| 7 | 1,233,762 | N | MAPK 1125 | 500 | 250 |
| | 1,233,871 | N | MAPK 1125 | | |
| 8 | 1,603,684 | N | MAPK 1444 | 2,354 | 1,177 |
| | 1,604,357 | S | MAPK 1444 | | |
| 9 | 1,910,469 | S | MAPK 1687 | 1,730 | 865 |
| | 1,910,564 | N | MAPK 1687 | | |
| 10 | 1,994,322 | S | MAPK 1761 | 1,061 | 530.5 |
| | 1,994,370 | S | MAPK 1761 | | |
| 11 | 2,040,806 | N | GlnE | 2,996 | 998.7 |
| | 2,040,946 | N | GlnE | | |
| | 2,041,445 | N | GlnE | | |
| 12 | 2,150,741 | N | MAPK 1898 | 4,377 | 2,188.5 |
| | 2,151,370 | S | MAPK 1898 | | |
| 13 | 2,353,857 | N | MAPK 2071 | 380 | 190 |
| | 2,353,858 | S | MAPK 2071 | | |
| 14 | 2,367,415 | N | MAPK 2083 | 1,577 | 788.5 |
| | 2,367,416 | N | MAPK 2083 | | |
| 15 | 2,605,368 | S | MAPK 2303 | 1,454 | 727 |
| | 2,605,372 | N | MAPK 2303 | | |
| 16 | 2,664,521 | N | MAPK 2348 | 19,154 | 9,577 |
| | 2,676,784 | S | MAPK 2348 | | |
| 17 | 2,785,781 | N | MAPK 2437 | 2,600 | 1,300 |
| | 2,786,976 | S | MAPK 2437 | | |
| 18 | 2,92,000 | N | MAPK 2539 | 1,196 | 598 |
| | 2,920,661 | S | MAPK 2539 | | |
| 19 | 3,888,659 | N | MAPK 3467 | 689 | 344.5 |
| | 3,888,879 | N | MAPK 3467 | | |
| 20 | 4,040,757 | S | MAPK 3602 | 1,604 | 802 |
| | 4,041,118 | N | MAPK 3602 | | |
| 21 | 4,079,226 | N | MAPK 3645 | 905 | 452.5 |
| | 4,079,231 | S | MAPK 3645 | | |
| 22 | 4,206,587 | N | pks2 | 6,296 | 3,148 |
| | 4,211,613 | S | pks2 | | |
| 23 | 4,773,398 | N | MAPK 4304 | 1,208 | 402.7 |
| | 4,774,040 | N | MAPK 4304 | | |
| | 4,774,041 | N | MAPK 4304 | | |
N, non-synonymous SNP; S, synonymous SNP.
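The SNP density column above matches gene size divided by the number of SNPs listed for that gene, that is, base pairs per SNP. This reading is inferred from the table values rather than stated in the source. A small Python sketch under that assumption, with a hypothetical helper name:

```python
def snp_density(gene_size_bp: int, snp_count: int) -> float:
    """Base pairs per SNP, rounded to one decimal place as in the table."""
    if snp_count <= 0:
        raise ValueError("snp_count must be positive")
    return round(gene_size_bp / snp_count, 1)


# Gene sizes and SNP counts copied from three rows of the table.
rows = {
    "MAPK 0028": (11_206, 2),
    "GlnE": (2_996, 3),
    "MAPK 4304": (1_208, 3),
}

for gene, (size, count) in rows.items():
    print(f"{gene}: {snp_density(size, count)} bp per SNP")
```

By this definition a larger value means fewer SNPs per unit length, so the genes with the smallest values carry the most concentrated variation.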
A list of nSNPs in cytochrome P450 proteins.
| Strains | K-10 position | K-10 allele | Variation | Gene | Amino acid change (functional consequence) |
|---|---|---|---|---|---|
| Env 210 | 1,227,540 | A | G | MAPK 1119 | Ile → Met (non-polar) |
| JTC 1285 | 1,301,615 | C | T | MAPK 1184 | Glu → Lys (polar acidic → polar basic) |
| JTC 1285 | 2,024,939 | G | A | MAPK 1789 | Ala → Val (non-polar) |
| JTC 1281 | 1,973,792 | A | G | MAPK 1738 | Val → Ala (non-polar) |
| JTC 1281 | 3,841,168 | G | C | MAPK 3424 | Arg → Pro (polar basic → non-polar) |
A list of 10 SNPs picked for Sanger sequencing.
| No. | Gene name | K-10 position | K-10 allele | ATCC 19698 | JTC 1285 | JTC 1281 | ||
|---|---|---|---|---|---|---|---|---|
| 1 | MAP 2578 | 2,900,072 | G | C | C | N/D | C | C |
| 2 | MAP 2578 | 2,900,076 | A | T | T | N/D | T | T |
| 3 | MAP 3165 | 3,517,478 | C | A | C | N/D | C | C |
| 4 | MAP 3165 | 3,517,668 | C | G | G | G | G | G |
| 5 | MAP 3391c | 3,767,913 | T | G | G | G | G | G |
| 6 | MAP 3391c | 3,767,939 | G | C | C | C | C | C |
| 7 | Neighbor of rpoC | 4,606,712 | G | G | C | N/D | G | G |
| 8 | rpoC | 4,607,283 | G | A | G | N/D | G | G |
| 9 | MAP 4302c | 4,771,441 | A | T | T | T | T | T |
| 10 | MAP 4302c | 4,771,588 | G | A | A | A | A | A |
N/D, not detected.
Figure 5. Phylogenomic analysis. (A) A dendrogram displaying an unrooted neighbor-joining tree of the concatenated SNPs from all eight mycobacterial isolates under study. (B) A rooted neighbor-joining tree using the M. ah 104 genome as the outgroup. The bootstrap consensus tree inferred from 1,000 replicates is taken to represent the evolutionary history of the taxa analyzed. Branches corresponding to partitions reproduced in fewer than 50% of bootstrap replicates were collapsed. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test is shown next to the branches.
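Neighbor-joining trees are built from a pairwise distance matrix. The sketch below shows only that input step for concatenated SNP strings, using the proportion of differing sites (p-distance). The source does not state which distance measure was used, so this choice is an assumption, and the isolate names and sequences are hypothetical toy data, not results from the study.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of sites at which two equal-length SNP strings differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = sum(x != y for x, y in zip(a, b))
    return diffs / len(a)


def distance_matrix(seqs: dict) -> dict:
    """Pairwise p-distances keyed by alphabetically ordered name pairs."""
    return {
        (s, t): p_distance(seqs[s], seqs[t])
        for s, t in combinations(sorted(seqs), 2)
    }


# Hypothetical concatenated SNP strings (one character per SNP site).
snps = {
    "isolate_A": "ACGTAC",
    "isolate_B": "ACGTAT",
    "isolate_C": "GCTTAT",
}

for pair, d in distance_matrix(snps).items():
    print(pair, round(d, 3))
```

Tree construction and the bootstrap test (resampling SNP columns with replacement and rebuilding the tree for each replicate) are not implemented here.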