Katherine E Bowden, Michael R Weigand, Yanhui Peng, Pamela K Cassiday, Scott Sammons, Kristen Knipe, Lori A Rowe, Vladimir Loparev, Mili Sheth, Keeley Weening, M Lucia Tondella, Margaret M Williams.
Abstract
During 2010 and 2012, California and Vermont, respectively, experienced statewide epidemics of pertussis with differences seen in the demographic affected, case clinical presentation, and molecular epidemiology of the circulating strains. To overcome limitations of the current molecular typing methods for pertussis, we utilized whole-genome sequencing to gain a broader understanding of how current circulating strains are causing large epidemics. Through the use of combined next-generation sequencing technologies, this study compared de novo, single-contig genome assemblies from 31 out of 33 Bordetella pertussis isolates collected during two separate pertussis statewide epidemics and 2 resequenced vaccine strains. Final genome architecture assemblies were verified with whole-genome optical mapping. Sixteen distinct genome rearrangement profiles were observed in epidemic isolate genomes, all of which were distinct from the genome structures of the two resequenced vaccine strains. These rearrangements appear to be mediated by repetitive sequence elements, such as high-copy-number mobile genetic elements and rRNA operons. Additionally, novel and previously identified single nucleotide polymorphisms were detected in 10 virulence-related genes in the epidemic isolates. Whole-genome variation analysis identified state-specific variants, and coding regions bearing nonsynonymous mutations were classified into functional annotated orthologous groups. Comprehensive studies on whole genomes are needed to understand the resurgence of pertussis and develop novel tools to better characterize the molecular epidemiology of evolving B. pertussis populations.
IMPORTANCE Pertussis, or whooping cough, is the most poorly controlled vaccine-preventable bacterial disease in the United States, which has experienced a resurgence for more than a decade. Once viewed as a monomorphic pathogen, B. pertussis strains circulating during epidemics exhibit diversity visible on a genome structural level, previously undetectable by traditional sequence analysis using short-read technologies. For the first time, we combine short- and long-read sequencing platforms with restriction optical mapping for single-contig, de novo assembly of 31 isolates to investigate two geographically and temporally independent U.S. pertussis epidemics. These complete genomes reshape our understanding of B. pertussis evolution and strengthen molecular epidemiology toward one day understanding the resurgence of pertussis.
Keywords: Bordetella pertussis; genome rearrangements; optical mapping; pertactin; whole-genome sequencing
Year: 2016 PMID: 27303739 PMCID: PMC4888882 DOI: 10.1128/mSphere.00036-16
Source DB: PubMed Journal: mSphere ISSN: 2379-5042 Impact factor: 4.389
Metadata for B. pertussis isolates collected during the California 2010 and Vermont 2012 statewide epidemics and vaccine strains that were selected for whole-genome sequencing
| Isolate | Location | Yr | Age group | Vaccination | PFGE profile | MLVA type | prn | ptxP | ptxA | fimH | Prn mutation | Prn production |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| H374 | CA | 2010 | Infant | UV | CDC002 | 16 | 2 | 3 | 1 | 1 | WT | + |
| H375 | CA | 2010 | Infant | UV | CDC268 | 186 | 1 | 1 | 2 | 1 | Del nt 26-109 | − |
| H378 | CA | 2010 | Infant | UV | CDC253 | 27 | 2 | 3 | 1 | 1 | IS | − |
| H379 | CA | 2010 | Infant | UV | CDC046 | 27 | 2 | 3 | 1 | 2 | WT | + |
| H380 | CA | 2010 | Infant | UV | CDC013 | 27 | 2 | 3 | 1 | 2 | WT | + |
| H489 | CA | 2010 | Infant | UV | CDC082 | 27 | 2 | 3 | 1 | 2 | WT | + |
| H542 | CA | 2010 | Infant | UV | CDC269 | 27 | 2 | 3 | 1 | 1 | WT | + |
| H559 | CA | 2010 | Infant | UV | CDC253 | 27 | 2 | 3 | 1 | 1 | IS | − |
| H561 | CA | 2010 | Infant | UV | CDC170 | 16 | 2 | 3 | 1 | 2 | WT | + |
| H563 | CA | 2010 | Infant | ≥1 dose | CDC271 | 27 | 2 | 3 | 1 | 2 | WT | + |
| H564 | CA | 2010 | Child | UV | CDC013 | 27 | 2 | 3 | 1 | 2 | WT | + |
| H622 | CA | 2010 | Infant | UV | CDC217 | 27 | 2 | 3 | 1 | 1 | WT | + |
| H627 | CA | 2010 | Infant | UV | CDC217 | 27 | 2 | 3 | 1 | 1 | WT | + |
| H788 | VT | 2011 | Infant | UV | CDC046 | 128 | 2 | 3 | 1 | 2 | WT | + |
| I669 | VT | 2011 | Adult | UV | CDC013 | 27 | 2 | 3 | 1 | 2 | WT | + |
| I468 | VT | 2012 | Child | UV | CDC002 | 27 | 2 | 3 | 1 | 1 | SC @ nt 1273 | − |
| I469 | VT | 2012 | Child | ≥1 dose | CDC342 | 27 | 2 | 3 | 1 | 2 | IS | − |
| I472 | VT | 2012 | Adolescent | UV | CDC046 | 27 | 2 | 3 | 1 | 2 | IS | − |
| I475 | VT | 2012 | Adult | UV | CDC237 | 27 | 2 | 3 | 1 | 1 | IS | − |
| I476 | VT | 2012 | Adolescent | UTD | CDC300 | 27 | 2 | 3 | 1 | 1 | IS | − |
| I480 | VT | 2012 | Child | ≥1 dose | CDC217 | 27 | 2 | 3 | 1 | 1 | WT | + |
| I483 | VT | 2012 | Infant | UTD | CDC237 | 27 | 2 | 3 | 1 | 1 | IS | − |
| I488 | VT | 2012 | Child | ≥1 dose | CDC343 | 27 | 2 | 3 | 1 | 1 | IS | − |
| I496 | VT | 2012 | Child | UTD | CDC343 | 27 | 2 | 3 | 1 | 1 | IS | − |
| I498 | VT | 2012 | Adult | ≥1 dose | CDC253 | 27 | 2 | 3 | 1 | 1 | IS | − |
| I517 | VT | 2012 | Child | ≥1 dose | CDC344 | 27 | 2 | 3 | 1 | 1 | IS | − |
| I518 | VT | 2012 | Child | UTD | CDC002 | 27 | 2 | 3 | 1 | 1 | SC @ nt 1273 | − |
| I521 | VT | 2012 | Child | ≥1 dose | CDC237 | 27 | 2 | 3 | 1 | 1 | IS | − |
| I538 | VT | 2012 | Child | UNK | CDC002 | 27 | 2 | 3 | 1 | 1 | | − |
| I539 | VT | 2012 | Child | UTD | CDC002 | 27 | 2 | 3 | 1 | 1 | | − |
| I646 | VT | 2012 | Child | UV | CDC274 | 27 | 2 | 3 | 1 | 1 | WT | + |
| I656 | VT | 2012 | Child | UTD | CDC010 | 27 | 2 | 3 | 1 | 1 | WT | + |
| I707 | VT | 2012 | Child | UTD | CDC253 | 27 | 2 | 3 | 1 | 1 | WT | + |
| C393 | China | 1951 | UNK | NA | CDC052 | UNK | 1 | 1 | 2 | 1 | WT | + |
| E476 | Japan | 1954 | UNK | NA | CDC232 | 38 | 1 | 1 | 2 | 1 | WT | + |
MLVA type is defined by the repeat counts for VNTR1, VNTR3, VNTR4, VNTR5, and VNTR6 (21).
Single-copy locus (21).
Formerly referred to as fim3.
Based on enzyme-linked immunosorbent assay.
Fatal cases.
Signal sequence deletion (nt 26 to 109).
IS481 forward insertion at nt 240.
IS481 reverse insertion at nt 1613.
IS481 forward insertion at nt 1613.
IS481 reverse insertion at nt 240.
Promoter disruption (–74 nt), previously identified as promoter inversion (11, 15).
Abbreviations: MLVA, multilocus variable-number tandem-repeat analysis; CA, California; VT, Vermont; nt, nucleotide; UTD, up to date; UV, unvaccinated; UNK, unknown; WT, wild-type; SC, stop codon; NA, not applicable.
FIG 1 Large-scale genome rearrangements within a subset of California (H561 and H563, blue) and Vermont (I496 and I707, red) epidemic isolates compared to vaccine strains E476 and C393 as visualized by whole-genome restriction mapping and alignment with MapSolver (A) and genome sequence alignment with progressiveMauve (B). Connecting lines indicate either conserved restriction fragments (A) or homologous sequence blocks (B). Sequence maps begin at the replication origin, and the approximate replication terminus is indicated with an arrow.
FIG 2 Maximum likelihood for gene order (MLGO) clustering of epidemic strains from the 2010 California (blue) and 2012 Vermont (red) statewide pertussis epidemics. Publicly available genomes (Tohama I, CS, B1917, and B1920) and two resequenced vaccine strains (C393 and E476) were also included (black). Pertactin production and deficiency of each strain are indicated by pink and black boxes, respectively. Clades referenced within the text are indicated by letters. The inset displays banding patterns of the PFGE profiles of all epidemic and vaccine strains sequenced in this study. The dendrogram of PFGE profiles was created using the unweighted pair group method using average linkages (UPGMA) with 1% band tolerance and optimization settings. Genomes for the epidemic isolates I488 and I517 were excluded from the analysis (see Materials and Methods and Data Set S1 in the supplemental material).
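The FIG 2 caption names UPGMA as the method behind the PFGE dendrogram. As a rough illustration of that clustering rule only, the Python sketch below merges the two closest clusters at each step and averages distances weighted by cluster size. The profile labels `P1` to `P4` and their distances are hypothetical, and the band-matching step (1% band tolerance and optimization settings) that converts gel patterns into distances is not reproduced here.

```python
def upgma(dist):
    """Cluster leaves by UPGMA.

    dist maps frozenset({a, b}) -> distance for every pair of leaf labels.
    Returns a list of (cluster_a, cluster_b, merge_height) in merge order.
    """
    d = dict(dist)
    size = {leaf: 1 for pair in dist for leaf in pair}
    merges = []
    while len(size) > 1:
        # Closest pair of current clusters; ties broken by label for determinism.
        pair = min(d, key=lambda p: (d[p], sorted(p)))
        a, b = sorted(pair)
        merges.append((a, b, d[pair] / 2))
        merged = f"({a},{b})"
        # Keep distances that do not involve the two merged clusters.
        updated = {p: v for p, v in d.items() if a not in p and b not in p}
        for c in size:
            if c not in (a, b):
                # Size-weighted average distance from the new cluster to c.
                updated[frozenset((merged, c))] = (
                    size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
                ) / (size[a] + size[b])
        size[merged] = size.pop(a) + size.pop(b)
        d = updated
    return merges


# Hypothetical pairwise distances between four PFGE profiles.
example = {
    frozenset(("P1", "P2")): 2.0,
    frozenset(("P3", "P4")): 4.0,
    frozenset(("P1", "P3")): 8.0,
    frozenset(("P1", "P4")): 8.0,
    frozenset(("P2", "P3")): 8.0,
    frozenset(("P2", "P4")): 8.0,
}

for left, right, height in upgma(example):
    print(left, right, height)
```

Each merge height is half the distance between the joined clusters, which is what places the branch point in an ultrametric dendrogram.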
Characteristics of rearrangement boundaries
| Characteristic | Value for epidemic strains | Value for epidemic and vaccine strains |
|---|---|---|
| No. of genomes | 31 | 33 |
| No. of rearrangement boundaries | 25 | 44 |
| No. of copies/genome | | |
| IS | 20 | 39 |
| rRNA (3) | 3 | 3 |
| IS | 1 | 1 |
| IS | 1 | 1 |
I488 and I517 excluded from analysis.
Unique SNPs in virulence-related genes of B. pertussis epidemic isolates compared to E476 (Tohama I)
| Gene | Isolate(s) | SNP location | Region characteristic | Amino acid change | Protein expressed? | Reference(s) |
|---|---|---|---|---|---|---|
| | H374 | 655 A→G | String of 2 G’s | 219 Asp→Gly | Unknown | This study |
| | H564 | 1634 G→A | GC-rich region | 545 Ser→Asn | Unknown | This study |
| | H563 | 1070 G→T | GC-rich region | 357 Arg→Leu | Unknown | This study |
| | I646 | 640 C→T | GC-rich region | 214 Pro→Ser | Unknown | This study |
| | H375 | 1677 G→A | String of 2 A’s | NA | Yes | This study; |
| | C393 | 36 C→T | String of 3 T’s | NA | Unknown | This study; |
| | All epidemic isolates, C393 | 2113 A→G | GC-rich region | 705 Lys→Glu | Yes | |
| | All epidemic isolates, C393 | 356 T→C | GC-rich region | 119 Phe→Ser | Yes | |
| | All epidemic isolates | 133 G→A | GC-rich region | 45 Gly→Ser | Yes | |
| | All epidemic isolates | 681 C→T | String of 2 T’s | NA | Yes | |
Does not include previously identified SNPs associating with MLST typing loci.
NA, not applicable.
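The residue numbers in the amino acid change column follow from the SNP locations by simple arithmetic if the location is read as a 1-based nucleotide position within the coding sequence. That reading is an assumption; the table does not state its coordinate convention. The Python sketch below (the function name `codon_number` is illustrative) checks it against the nonsynonymous rows of the table.

```python
def codon_number(nt_pos: int) -> int:
    """Return the 1-based codon containing a 1-based coding-sequence position."""
    if nt_pos < 1:
        raise ValueError("nucleotide positions are 1-based")
    return (nt_pos - 1) // 3 + 1


# (SNP location, reported residue number) pairs copied from the table rows
# that list an amino acid change.
reported = [
    (655, 219),
    (1634, 545),
    (1070, 357),
    (640, 214),
    (2113, 705),
    (356, 119),
    (133, 45),
]

for nt_pos, residue in reported:
    status = "matches" if codon_number(nt_pos) == residue else "differs"
    print(f"nt {nt_pos} -> codon {codon_number(nt_pos)} ({status} reported {residue})")
```

Under this assumption, every listed pair agrees with the reported residue number.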
FIG 3 Maximum likelihood phylogenetic reconstruction from the concatenated alignment of 408 variable sites (SNVs and MNVs). Vaccine strains resequenced in this study are indicated in black, California isolates are indicated in blue, and Vermont isolates are indicated in red. Yellow circles denote pertactin-deficient strains, and black circles denote strains recovered from fatal cases. The scale bar indicates number of substitutions per site and has been corrected for ascertainment bias based on the nucleotide composition of invariant sites in E476. All epidemic genomes were included in this analysis.
FIG 4 Sequence variants within epidemic genomes. (A) Distribution of noncoding (green), synonymous (yellow), and nonsynonymous (blue) variants. (B) Functional classification of predicted protein sequences with nonsynonymous mutations observed in at least 26 epidemic genomes, categorized according to EggNOG v.4. Variants were statistically significant by Fisher's exact test.
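The caption cites Fisher's exact test without giving the contingency tables behind it. As a hedged illustration of how such a test behaves, the Python sketch below computes a two-sided P value for a 2x2 table of variant presence by state. The example counts assume a variant found in all 20 Vermont isolates and in none of the 13 California isolates listed in the isolate metadata table; that table layout is an assumption for illustration, not the authors' stated analysis.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(k: int) -> float:
        # Probability that the top-left cell equals k given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    p_value = sum(
        p for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_observed * tolerance
    )
    return min(1.0, p_value)


# Rows: Vermont, California. Columns: variant present, variant absent.
# Assumed counts for a perfectly state-specific variant.
p = fisher_exact_two_sided(20, 0, 0, 13)
print(f"P = {p:.3g}")
```

With margins this unbalanced toward one state, only the observed table is as extreme as itself, so the P value equals the probability of that single table.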
State-specific nonsynonymous variants in California and Vermont epidemic B. pertussis genomes
| Nonsynonymous variant in state | Functional category | Location in E476 | Variant | Amino acid change | No. of genomes |
|---|---|---|---|---|---|
| California | | | | | |
| Transposase for IS | Replication, recombination, | 116940 | 947 T→A | 316 Leu→Gln | 13 |
| Transposase for IS | Replication, recombination, | 527395 | 805 T→C | 269 Tyr→His | 12 |
| Transposase for IS | Replication, recombination, | 1080959 | 947 T→A | 316 Leu→Gln | 9 |
| Transposase for IS | Replication, recombination, | 1090325 | 947 T→A | 316 Leu→Gln | 10 |
| Transposase for IS | Replication, recombination, | 1447860 | 947 T→A | 316 Leu→Gln | 12 |
| Transposase | Replication, recombination, | 3457234 | 947 A→T | 316 Gln→Leu | 12 |
| Regulator (hypothetical protein) | Signal transduction mechanisms | 3614341 | 11 G→A | 4 Gly→Glu | 12 |
| Vermont | | | | | |
| Zinc transporter ZupT | Inorganic ion transport | 198405 | 194 T→C | 65 Val→Ala | 20 |
| FAD/FMN-containing dehydrogenase | Energy production | 216761 | 19 T→C | 7 Ser→Pro | 20 |
| RNase E | Translation, ribosomal structure, | 489152 | 2480 C→T | 827 Pro→Leu | 20 |
| Endoribonuclease | Function unknown | 528567 | 186 C→G | 62 Asp→Glu | 20 |
| Diacylglycerol kinase | Lipid transport and metabolism | 868389 | 97 G→A | 33 Asp→Asn | 19 |
| Glutamate-ammonia-ligase | Posttranslational modification, | 1333208 | 2519 G→A | 840 Gly→Asp | 20 |
| Aldo-ketoreductase | Function unknown | 1388597 | 625 G→C | 209 Ala→Pro | 19 |
| Multidrug efflux pump subunit AcrB | Inorganic ion transport | 2208591 | 3221 A→G | 1074 Asn→Ser | 19 |
| Acyltransferase | Function unknown | 2271460 | 24 G→C | 8 Gln→His | 17 |
| Hypothetical protein | Function unknown | 2276856 | 158 T→C | 53 Val→Ala | 20 |
| Transposase | Replication, recombination, | 2405180 | 899 C→G | 300 Thr→Ser | 19 |
| Potassium-transporting | Inorganic ion transport | 2640489 | 245 T→C | 82 Ile→Thr | 19 |
| 5-Hydroxyvalerate dehydrogenase | Energy production | 3011513 | 1016 C→T | 339 Ala→Val | 20 |
| NADPH dehydrogenase | Energy production | 3072916 | 55 A→G | 19 Lys→Glu | 20 |
| ADP-dependent (S)-NAD(P)H- | Carbohydrate transport and | 3538308 | 166 G→A | 56 Gly→Arg | 11 |
| Putative oligopeptide | Function unknown | 1189451-1189452 | 1765_GC_1766 | 589 Gly fs | 19 |
| ATP-cobalamin adenosyltransferase | Coenzyme transport and | 3304784-3304785 | 545_CCGG_546 | 183 Ala fs | 17 |
| Hypothetical protein | Function unknown | 3490857-3490858 | 1130_CTA_1131 | 377 Glu delins Asp* | 20 |
Abbreviations: fs, frameshift; delins, deletion/insertion; FAD, flavin adenine dinucleotide; FMN, flavin mononucleotide; PSP, perchloric acid-soluble protein; *, produces premature stop codon.