Md Mamun Monir¹, Talal Hossain¹, Masatomo Morita², Makoto Ohnishi², Fatema-Tuz Johura¹, Marzia Sultana¹, Shirajum Monira¹, Tahmeed Ahmed¹, Nicholas Thomson³, Haruo Watanabe², Anwar Huq⁴, Rita R Colwell⁴,⁵, Kimberley Seed⁶, Munirul Alam¹.
Abstract
Comparative genomic analysis of Vibrio cholerae El Tor associated with endemic cholera in Asia revealed two distinct lineages, one dominant in Bangladesh and the other in India. This in-depth whole-genome study of V. cholerae El Tor strains isolated during endemic cholera in Bangladesh (1991 to 2017) also included reference genome sequence data obtained online. Core genome phylogeny based on single nucleotide polymorphisms (SNPs) showed that the V. cholerae El Tor strains comprised two lineages, BD-1 and BD-2, which, according to Bayesian phylodynamic analysis, originated from the paraphyletic group BD-0 around 1981. The BD-1 and BD-2 lineages overlapped temporally but were negatively associated as causative agents of cholera from 2004 to 2017. A genome-wide association study (GWAS) identified 140 SNPs and 31 indels that produced gene alleles unique to either BD-1 or BD-2. Regression of root-to-tip distance on year of isolation placed the early BD-0 strains at the base, whereas BD-1 and BD-2 emerged subsequently and diverged by accumulating SNPs. Pangenome analysis provided evidence of gene acquisition by both BD-1 and BD-2, including six crucial proteins of known function that were predominant in BD-2. BD-1 and BD-2 have diverged and show distinctly different genomic traits, namely, heterogeneity in VSP-2, VPI-1, mobile elements, toxin-encoding elements, and total gene abundance. In addition, the phage-inducible chromosomal island-like element (PLE1) and the SXT ICE element (ICETET) observed in BD-2 presumably provided a fitness advantage that allowed this lineage to outcompete BD-1 as the etiological agent of endemic cholera in Bangladesh, with implications for global cholera epidemiology.
IMPORTANCE Cholera is a global disease of particular importance in the Ganges Delta of the Bay of Bengal, where Vibrio cholerae O1 El Tor, the causative agent of the disease, has two circulating lineages, one dominant in Bangladesh and the other in India. An in-depth genomic study of V. cholerae associated with endemic cholera over the past 27 years (1991 to 2017) indicates the emergence and succession of two lineages, BD-1 and BD-2, which arose from a common ancestral paraphyletic group, BD-0, comprising the early strains, and reflects the short-term evolution of the bacterium in Bangladesh. Of the two V. cholerae lineages, BD-2 has superseded BD-1 and predominates in the most recent endemic cholera in Bangladesh. The BD-2 lineage contained significantly more SNPs and indels and showed greater gene abundance, including antimicrobial resistance genes, gene cassettes, and PLE, which defends against bacteriophage infection, all acquired over time. These findings have important epidemiological implications on a global scale.
Keywords: Vibrio cholerae lineages; antimicrobial resistance; bacterial evolution; comparative genomics; phage-inducible chromosomal island-like elements (PLE)
Year: 2022 PMID: 35315699 PMCID: PMC9045249 DOI: 10.1128/spectrum.00391-22
Source DB: PubMed Journal: Microbiol Spectr ISSN: 2165-0497
FIG 1 Phylogenetic analyses of strains showing their genomic features and year of isolation. (A) Maximum likelihood phylogenetic tree generated from whole-genome SNPs, rooted with the out-group reference strain Vibrio cholerae N16961, and the number of V. cholerae O1 El Tor isolates belonging to lineages BD-0, BD-1, and BD-2. Rings show features of the isolates according to the color scheme on the left. Tree branches are colored blue, green, and red for lineages BD-0, BD-1, and BD-2, respectively. (B) Unrooted tree showing the independent evolution of BD-1 and BD-2 strains, with the number of core genome SNPs of strains in each lineage relative to the N16961 reference strain. (C) Percentage of isolates per year for the three lineages. Circle size indicates the percentage of strains belonging to each lineage, according to the scheme shown.
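Panel B reports, for each strain, the number of core genome SNPs relative to N16961. The sketch below shows only that counting step. It assumes each core genome is already aligned to the reference and that gaps (`-`) and ambiguous bases (`N`) are skipped. The function names and toy sequences are hypothetical, and the study's own variant-calling pipeline and maximum likelihood tree inference are not reproduced here.

```python
def snp_distance(ref: str, query: str) -> int:
    """Count positions where two aligned sequences carry different unambiguous bases."""
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for r, q in zip(ref.upper(), query.upper())
        if r in valid and q in valid and r != q  # skip gaps and N
    )


def distances_to_reference(ref: str, strains: dict[str, str]) -> dict[str, int]:
    """SNP count of every strain relative to the reference alignment."""
    return {name: snp_distance(ref, seq) for name, seq in strains.items()}


# Toy alignment (hypothetical data, not from the study).
reference = "ACGTACGTAC"
strains = {
    "strain_A": "ACGTACGTAC",  # identical to the reference
    "strain_B": "ACTTACGTAN",  # one SNP; the N is ignored
    "strain_C": "ATTTAC-TAG",  # three SNPs; the gap is ignored
}
print(distances_to_reference(reference, strains))
# {'strain_A': 0, 'strain_B': 1, 'strain_C': 3}
```

Skipping gaps and ambiguous calls instead of counting them as differences keeps poorly covered positions from inflating a strain's SNP count.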
FIG 2 Box plots of SNP distribution and indel types in each of the three lineage groups. (A) Distribution of 337 synonymous SNP variants. BD-2 strains accumulated more synonymous SNP variants than BD-0 and BD-1 strains. Synonymous SNP variants do not change the amino acid sequence of the encoded protein. (B) Distribution of 613 nonsynonymous SNP variants, comprising 570 missense variants, 38 stop-gained variants, 2 splice-region and stop-retained variants, 2 stop-lost and splice-region variants, and 1 initiator codon variant. (C) Distribution of 348 upstream/downstream SNP variants. (D) Distribution of 238 frameshift indel variants. (E) Distribution of 107 upstream/downstream indel variants. (F) Distribution of 68 indel variants, including 13 conservative in-frame insertions, 14 disruptive in-frame insertions, 11 frameshift and stop-gained variants, 10 disruptive in-frame deletions, 10 conservative in-frame deletions, 1 stop-gained and disruptive in-frame deletion, 2 feature elongations, 1 frameshift, stop-lost, and splice-region variant, 1 stop-gained and disruptive in-frame insertion, 2 frameshift and splice-region variants, 2 frameshift and start-lost variants, and 1 stop-gained and conservative in-frame insertion.
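The variant classes in this caption resemble Sequence Ontology effect terms. The sketch below illustrates how some of those classes arise. A single-base change in a coding codon is synonymous if the amino acid is unchanged (standard genetic code, NCBI table 1). Otherwise it is missense, stop-gained, or stop-lost. An indel whose length is not a multiple of 3 shifts the reading frame. Both functions are hypothetical simplifications. In particular, the in-frame rule only checks whether the indel starts at a codon boundary. Real annotators also handle strand, splice regions, start codons, and flanking-codon changes, and the study's annotation tool is not reproduced here.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); bases ordered T, C, A, G,
# with the third codon position varying fastest ("*" = stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(ref_codon: str, pos: int, alt_base: str) -> str:
    """Classify a single-base change at position `pos` (0-2) of a coding-strand codon."""
    alt_codon = ref_codon[:pos] + alt_base + ref_codon[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "stop_retained" if ref_aa == "*" else "synonymous"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense"


def classify_indel(length: int, codon_offset: int) -> str:
    """Simplified indel class; `codon_offset` is the indel's position within its codon (0-2)."""
    if length % 3 != 0:
        return "frameshift"
    # Hypothetical simplification: in-frame indels at a codon boundary are
    # treated as conservative, all others as disruptive.
    return "conservative_inframe" if codon_offset == 0 else "disruptive_inframe"


print(classify_snv("CGT", 0, "T"))  # CGT (Arg) -> TGT (Cys): missense
print(classify_snv("GCT", 2, "C"))  # GCT (Ala) -> GCC (Ala): synonymous
print(classify_snv("TGG", 1, "A"))  # TGG (Trp) -> TAG (stop): stop_gained
print(classify_indel(2, 1))         # 2-bp indel: frameshift
```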
SNPs resulting in unique mutant proteins in BD-1 and BD-2
| SNP | REF | ALT | Freq_BD1 | Freq_BD2 | P value | Gene | AA change | Product |
|---|---|---|---|---|---|---|---|---|
| S1_2609994 | G | A | 0 | 105 | 5.61E−53 | nudF_1 | Arg109Cys | ADP-ribose pyrophosphatase |
| S2_266019 | A | G | 0 | 105 | 5.61E−53 | ulaA | Ile354Thr | Ascorbate-specific permease IIC component UlaA |
| S2_1024884 | G | A | 0 | 105 | 5.61E−53 | putA | Ala600Val | Bifunctional protein PutA |
| S2_989172 | C | T | 0 | 105 | 5.61E−53 | yecS | Pro191Ser | YecS |
| S1_798976 | T | C | 0 | 105 | 5.61E−53 | suhB | Glu217Gly | Inositol-1-monophosphatase |
| S1_994229 | G | A | 0 | 105 | 5.61E−53 | stcE_2 | Gly201Asp | Metalloprotease StcE precursor |
| S2_921045 | A | C | 0 | 105 | 5.61E−53 | ctpH_6 | Ile161Ser | Methyl-accepting chemotaxis protein CtpH |
| S1_1622584 | G | A | 0 | 105 | 5.61E−53 | cobB | Pro50Leu | NAD-dependent protein deacetylase |
| S2_773493 | T | A | 0 | 105 | 5.61E−53 | phhA | Gln19Leu | Phenylalanine-4-hydroxylase |
| S1_681574 | G | T | 0 | 105 | 5.61E−53 | glmM | Arg196Leu | Phosphoglucosamine mutase |
| S2_161094 | T | G | 0 | 105 | 5.61E−53 | siaT_5 | Ser241Ala | Sialic acid TRAP transporter permease protein SiaT |
| S1_1452755 | T | C | 0 | 105 | 5.61E−53 | cysG_1 | Val38Ala | Siroheme synthase |
| S1_2731709 | G | A | 0 | 105 | 5.61E−53 | tamA | Thr266Ile | Translocation and assembly module TamA precursor |
| S1_545919 | T | G | 0 | 104 | 4.32E−51 | pctB_1 | Leu249Trp | Methyl-accepting chemotaxis protein PctB |
| S1_2814292 | T | C | 0 | 102 | 4.43E−48 | argG | Thr283Ala | Argininosuccinate synthase |
| S1_1332186 | T | G | 0 | 99 | 1.96E−44 | gyrA | Asp660Glu | DNA gyrase subunit A |
| S1_149686 | G | T | 0 | 99 | 1.96E−44 | murI | Ala137Ser | Glutamate racemase |
| S2_562858 | A | T | 0 | 99 | 1.96E−44 | VCA0627 | Thr6Ser | rRNA methylase |
| S1_628646 | C | T | 0 | 85 | 1.32E−32 | hrpB_1 | Ala782Val | ATP-dependent RNA helicase HrpB |
| S1_673206 | A | G | 0 | 85 | 1.32E−32 | tyrS_2 | Thr393Ala | Tyrosine—tRNA ligase |
| S1_2357516 | G | A | 0 | 79 | 7.24E−29 | angR | Leu227Phe | Anguibactin system regulator |
| S1_2483236 | G | A | 66 | 0 | 4.18E−39 | lysX | Ala150Thr | Alpha-aminoadipate—LysW ligase LysX |
| S1_1682925 | C | T | 67 | 0 | 3.63E−40 | appC | Ala226Thr | Cytochrome bd-II ubiquinol oxidase subunit 1 |
| S1_368119 | T | C | 67 | 0 | 3.63E−40 | mutL | Cys350Arg | DNA mismatch repair protein MutL |
| S1_1359179 | G | A | 67 | 0 | 3.63E−40 | licH | Ala56Thr | putative 6-phospho-beta-glucosidase |
| S1_1060408 | C | T | 71 | 0 | 6.86E−45 | nagA_1 | Asp150Asn | N-acetylglucosamine-6-phosphate deacetylase |
| S1_276112 | G | A | 76 | 0 | 5.61E−53 | mak | Gly116Arg | Fructokinase |
| S1_1782501 | G | A | 76 | 0 | 5.61E−53 | cph2_4 | Leu79Phe | Phytochrome-like protein cph2 |
The SNPs listed are those whose alternative alleles were found uniquely in more than 80% of BD-1 or BD-2 strains, lie within proteins of known function, and alter the amino acid sequence. SNPs are named according to their chromosomal position. For example, in “S1_2609994”, “S” stands for the site and “2609994” is the site's base-pair location. REF, reference allele; ALT, alternative allele; AA change, amino acid change. Freq_BD1 and Freq_BD2 are the frequencies (numbers of strains) of the alternative allele in BD-1 and BD-2, respectively. The alternative-allele frequencies of these SNPs are zero in BD-0. P values are from Fisher's exact test.
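The P values in this table come from Fisher's exact test. The sketch below parses a site name and computes an exact two-sided Fisher P value for a 2×2 table: rows are lineages (BD-1, BD-2), columns are strains carrying ALT versus REF. Several inputs are assumptions. Group sizes of 76 BD-1 and 105 BD-2 strains are inferred from the largest counts in the table, not stated in it. The reading of the digit after "S" as the chromosome number is also a guess, since the note only says "S" stands for site. The function names are hypothetical. In practice `scipy.stats.fisher_exact` would normally be used.

```python
from fractions import Fraction
from math import comb


def parse_snp_id(snp_id: str) -> tuple[int, int]:
    """Split an ID such as "S1_2609994" into (number after "S", base-pair position).

    Assumption: the number after "S" identifies the chromosome (V. cholerae has two).
    """
    prefix, position = snp_id.split("_")
    if not prefix.startswith("S"):
        raise ValueError(f"unexpected SNP ID: {snp_id}")
    return int(prefix[1:]), int(position)


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table (exact rational arithmetic).
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> Fraction:
        # Probability that the top-left cell equals x, given fixed margins.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), total)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return float(sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs))


# Row nudF_1: 0 of 76 BD-1 strains and 105 of 105 BD-2 strains carry ALT
# (group sizes assumed, see above).
chrom, pos = parse_snp_id("S1_2609994")
p = fisher_exact_two_sided(0, 76, 105, 0)
print(chrom, pos, f"{p:.2e}")  # 1 2609994 5.61e-53
```

Under the assumed group sizes, a perfectly separated table gives 1/C(181, 105) ≈ 5.61 × 10⁻⁵³. This matches the tabulated value for the SNPs present in all BD-2 and no BD-1 strains.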
FIG 3 SNP analysis of genetic diversity. (A) Phylogenetic tree of the strains with a heat map of genotypes at the 140 SNPs significantly associated with the different lineages. Colors denote the four nucleotides; white represents a missing genotype. The heat map shows clear differences among the lineages. (B) Number of core genome SNPs by year of isolation, showing the steady accumulation of SNPs over time in strains of each lineage. (C) Regression of root-to-tip distance on year of isolation for strains of each lineage, showing the diversity of strains within the lineages. (D) Miami plot of alternative allele frequencies of SNPs in the dominant lineages BD-1 and BD-2, showing clear differences in SNP accumulation between them.
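Panel C regresses root-to-tip distance on year of isolation, a common check for temporal (clock-like) signal. Tools such as TempEst run this check on real trees. The least-squares sketch below gives three quantities. The slope approximates the substitution rate in the tree's distance units per year. The x-intercept gives a crude date for the root. R² summarizes how clock-like the data are. The input values are hypothetical toy numbers. This sketch is not the Bayesian phylodynamic analysis the study used to date BD-0 to around 1981.

```python
def root_to_tip_regression(years: list[float], distances: list[float]) -> dict[str, float]:
    """Ordinary least-squares fit of root-to-tip distance against sampling year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, distances))
    slope = sxy / sxx                     # distance units per year (clock rate)
    intercept = mean_y - slope * mean_x
    return {
        "rate": slope,
        "root_date": -intercept / slope,  # x-intercept: crude estimate of the root date
        "r_squared": sxy ** 2 / (sxx * syy),
    }


# Hypothetical toy data: distance grows by 0.0005 per year from a root at 1985.
years = [1991, 2000, 2010, 2017]
distances = [0.003, 0.0075, 0.0125, 0.016]
print(root_to_tip_regression(years, distances))
# rate ≈ 0.0005, root_date ≈ 1985, r_squared ≈ 1
```

With real strains the points scatter around the line. A low R² or a negative slope suggests too little temporal signal for clock-based dating.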
FIG 4 Pangenome analysis showing differences in the abundance of gene clusters among the lineages. (A) Relative gene abundance of the lineages, identified by Roary. Sequence features are shown as bars; details are listed in Table S1. (B) BLAST coverage of SXT regions of BD-1 isolates compared with ICE-GEN. Rings represent BD-1 strains, sequentially outward, following Table S1; the outermost ring shows the genes of ICE-GEN. (C) BLAST coverage of SXT regions of BD-2 isolates compared with ICE-TET. Rings represent BD-2 strains, sequentially outward, following Table S1; the outermost ring shows the genes of ICE-TET.
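Panel A compares gene-cluster abundance across lineages from a Roary pangenome. The sketch below computes, for each gene cluster, the fraction of genomes in each lineage that carry it. It assumes a gene-by-genome 0/1 matrix in the tab-delimited layout of Roary's `gene_presence_absence.Rtab` output. The genome names, genes, and lineage labels are hypothetical. The paper's own abundance metric may be defined differently.

```python
import csv
import io

# Hypothetical excerpt in the layout of Roary's gene_presence_absence.Rtab
# (tab-separated; one row per gene cluster, one 0/1 column per genome).
RTAB = """Gene\tiso1\tiso2\tiso3\tiso4
geneX\t1\t1\t0\t0
geneY\t0\t0\t1\t1
geneZ\t1\t1\t1\t1
"""

# Hypothetical lineage assignments for the toy genomes.
LINEAGE = {"iso1": "BD-1", "iso2": "BD-1", "iso3": "BD-2", "iso4": "BD-2"}


def presence_by_lineage(rtab_text: str, lineage: dict[str, str]) -> dict[str, dict[str, float]]:
    """Fraction of genomes in each lineage that carry each gene cluster."""
    reader = csv.reader(io.StringIO(rtab_text), delimiter="\t")
    genomes = next(reader)[1:]  # header: "Gene", then genome names
    groups = sorted({lineage[g] for g in genomes})
    result: dict[str, dict[str, float]] = {}
    for row in reader:
        gene, calls = row[0], [int(v) for v in row[1:]]
        result[gene] = {}
        for group in groups:
            members = [call for g, call in zip(genomes, calls) if lineage[g] == group]
            result[gene][group] = sum(members) / len(members)
    return result


for gene, freqs in presence_by_lineage(RTAB, LINEAGE).items():
    print(gene, freqs)
```

Genes with a value near 1 in one lineage and near 0 in the other, like `geneX` and `geneY` here, are candidates for lineage-specific acquisitions. Genes present in every genome, like `geneZ`, belong to the core genome.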
FIG 5 Schematic diagram of VSP-II. Schematic alignment of the VSP-II regions of the isolates. Arrows indicate the direction of gene transcription, and gene shading represents functional annotation. Six types were identified, and all BD-0 strains carried wild-type VSP-II. Two major types, var-2 and var-3, were observed in most BD-2 strains, and one major type, var-4, in most BD-1 strains.