Jason W Sahl, Jeticia R Sistrunk, Nabilah Ibnat Baby, Yasmin Begum, Qingwei Luo, Alaullah Sheikh, Firdausi Qadri, James M Fleckenstein, David A Rasko.
Abstract
Enterotoxigenic Escherichia coli (ETEC) cause more than 500,000 deaths each year in the developing world and are characterized on a molecular level by the presence of genes that encode the heat-stable (ST) and/or heat-labile (LT) enterotoxins, as well as surface structures, known as colonization factors (CFs). Genome sequencing and comparative genomic analyses of 94 previously uncharacterized ETEC isolates demonstrated remarkable genomic diversity, with 28 distinct sequence types identified in three phylogenomic groups. Interestingly, there is a correlation between the genomic sequence type and virulence factor profiles based on prevalence of the isolate, suggesting that there is an optimal combination of genetic factors required for survival, virulence and transmission in the most successful clones. A large-scale BLAST score ratio (LS-BSR) analysis was further applied to identify ETEC-specific genomic regions when compared to non-ETEC genomes, as well as genes that are more associated with clinical presentations or other genotypic markers. Of the strains examined, 21 of 94 ETEC isolates lacked any previously identified CF. Homology searches with the structural subunits of known CFs identified 6 new putative CF variants. These studies provide a roadmap to exploit genomic analyses by directing investigations of pathogenesis, virulence regulation and vaccine development.
Year: 2017 PMID: 28611468 PMCID: PMC5469772 DOI: 10.1038/s41598-017-03631-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. A core genome single nucleotide polymorphism (SNP) phylogeny of ETEC genomes sequenced in this study as well as reference E. coli genomes. SNPs were identified by NUCmer[59] alignments of query genomes against the reference genome, K-12 W3110; these methods were wrapped by the NASP pipeline[62]. A phylogeny was inferred on the concatenated SNP alignment using RAxML v8[60], including 100 bootstrap replicates. ETEC genomes sequenced in this study were assigned to disease groups based on clinical observations at the site of isolation or as presented in the literature.
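
To illustrate what a "concatenated SNP alignment" is, here is a minimal Python sketch. It is not the NASP or NUCmer implementation. It assumes the genomes have already been aligned to the reference so that every sequence has the same length, and the function name `core_snp_alignment` and the toy sequences are hypothetical. A column is kept only if every genome has an unambiguous base there (core) and at least two different bases occur (SNP).

```python
from typing import Dict, List, Tuple


def core_snp_alignment(aligned: Dict[str, str]) -> Tuple[List[int], Dict[str, str]]:
    """Return the variable core positions and one concatenated SNP string per genome.

    `aligned` maps a genome name to its reference-aligned sequence
    (hypothetical input format; all sequences must have equal length).
    """
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("all sequences must have the same aligned length")

    valid_bases = set("ACGT")
    positions: List[int] = []
    for i in range(length):
        column = {aligned[name][i].upper() for name in names}
        # Core: only unambiguous bases in this column. SNP: more than one base.
        if column <= valid_bases and len(column) > 1:
            positions.append(i)

    # Concatenate the retained columns, in order, for each genome.
    snps = {
        name: "".join(aligned[name][i].upper() for i in positions)
        for name in names
    }
    return positions, snps


if __name__ == "__main__":
    toy = {"ref": "ACGTACGT", "g1": "ACGTTCGT", "g2": "AC-TACGA"}
    print(core_snp_alignment(toy))
```

The resulting per-genome strings are the kind of input a tree-building program takes. Column 2 of the toy data is dropped because one genome has a gap there, so it is not part of the core.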
Figure 2. A core genome single nucleotide polymorphism (SNP) phylogeny of ETEC genomes sequenced in this study as well as reference E. coli genomes, associated with a heatmap of BSR values of previously characterized virulence and colonization factors (Table S3). Disease categories were assigned based on clinical observations. Orange brackets around genomes indicate lineages (Groups 1–3) compared to identify coding regions associated with the observed clinical presentations. The heatmap was associated with the phylogeny using the interactive tree of life (47).
Features associated with clinical presentation or phylogenomic groupings.
| Condition | Number of features (FDR p value < 0.05) |
|---|---|
| Symptomatic | 11 |
| Asymptomatic | 26 |
| Group 1 symptomatic | 7 |
| Group 1 asymptomatic | 7 |
| Group 2 symptomatic | 0 |
| Group 2 asymptomatic | 0 |
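
The counts above are features that pass a false discovery rate (FDR) threshold of 0.05. The record does not say which FDR procedure was used, so the following Python sketch assumes the common Benjamini-Hochberg adjustment. The function names and example p values are hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(pvalues: List[float], alpha: float = 0.05) -> int:
    """Count features whose adjusted p value is below alpha."""
    return sum(q < alpha for q in benjamini_hochberg(pvalues))


if __name__ == "__main__":
    example = [0.5, 0.001, 0.2, 0.02, 0.01]  # made-up p values
    print(benjamini_hochberg(example))
    print(count_significant(example))
```

Under this assumption, a row such as "Group 2 symptomatic: 0" would mean that no feature had an adjusted p value below 0.05 in that comparison.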
Top 20 genes identified as ETEC-specific or non-ETEC-specific.
| Centroid ID | Annotation (a) | Average BSR (ETEC) | Average BSR (non-ETEC) |
|---|---|---|---|
| centroid_109500 | methyltransferase small domain protein | | 0.325454545 |
| centroid_185863 | putative plasmid maintenance protein | | 0.166758893 |
| centroid_491002 | type IV secretion protein Rhs | | 0.287193676 |
| centroid_286834 | heat-stable enterotoxin | | 0 |
| centroid_401212 | plasmid segregation protein ParM | | 0.174703557 |
| centroid_352933 | hypothetical protein pEntH10407_p04 | | 0.097905138 |
| centroid_2111416 | diguanylate cyclase domain protein | | 0.086007905 |
| centroid_1195229 | heat-labile enterotoxin subunit A | | 0 |
| centroid_1149275 | heat-labile enterotoxin subunit A | | 0 |
| centroid_976050 | heat-labile enterotoxin B chain | | 0 |
| centroid_584031 | protein StbB | | 0.163754941 |
| centroid_146011 | serine protease EatA | | 0.025059289 |
| centroid_957787 | insA N-terminal domain protein | | 0.091857708 |
| centroid_174198 | CFA/I fimbrial subunit D | | 0.002648221 |
| centroid_1146026 | plasmid stability family protein | | 0.027549407 |
| centroid_31447 | plasmid segregation protein ParM | | 0.030671937 |
| centroid_208934 | putative transposase domain protein | | 0.072687747 |
| centroid_33993 | POTRA domain, ShlB-type family protein | | 0 |
| centroid_372954 | heat-stable enterotoxin | | 0 |
| centroid_740634 | putative transporter protein AatB | | 0.002173913 |
| centroid_1853128 | bacterial type II/III secretion system short domain protein | 0.003280632 | |
| centroid_1844289 | LEE encoded regulator | 0.003596838 | |
| centroid_1836929 | type III secretion apparatus protein, YscR/HrcR family | 0.003952569 | |
| centroid_1405762 | type III secretion apparatus protein SpaR/YscT/HrcT | 0.003952569 | |
| centroid_1742164 | type III secretion, HrpO family protein | 0.003952569 | |
| centroid_1726625 | tir chaperone | 0.003952569 | |
| centroid_1827912 | type III secretion effector delivery regulator, TyeA family | 0.003952569 | |
| centroid_1613860 | type III secretion system regulator family protein | 0.003952569 | |
| centroid_1754850 | type III secretion apparatus needle protein | 0.003952569 | |
| centroid_1761547 | type III secretion low calcium response chaperone LcrH/SycD | 0.003952569 | |
| centroid_1737915 | secretion system apparatus protein SsaV | 0.004466403 | |
| centroid_1228577 | N(2)-citryl-N(6)-acetyl-N(6)-hydroxylysine synthase | 0.005098814 | |
| centroid_1614162 | aerobactin synthase | 0.00541502 | |
| centroid_1222308 | N(6)-hydroxylysine O-acetyltransferase | 0.007351779 | |
| centroid_1399399 | serine/threonine-protein phosphatase | 0.067549407 | |
| centroid_818055 | calcineurin-like phosphoesterase superfamily domain protein | 0.229644269 | |
| centroid_676376 | gnsA/GnsB family protein | 0.261304348 | |
| centroid_1832824 | cold shock protein CspA | 0.350632411 | |
| centroid_269217 | cold shock-like protein CspG | 0.354743083 | |
| centroid_408267 | L-fucose-proton symporter domain protein | 0.617747036 | |
(a) Genes annotated as hypothetical or conserved hypothetical have been removed from the table. The complete gene list is present in Supplemental Table S4.
The bold values in the table highlight which genes have an average LS-BSR suggesting ETEC (top of table) or non-ETEC (bottom of table) prevalence.
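
As background for the average BSR columns, here is a minimal Python sketch of the calculation. A BLAST score ratio is a gene's alignment bit score in a given genome divided by the bit score of the gene aligned against itself, so values run from 0 (absent) to 1 (identical). The sketch is not the LS-BSR code. The function names and toy data are hypothetical, and ranking genes by the difference between group means is an assumption made for illustration only.

```python
from statistics import mean
from typing import Dict, List, Tuple


def blast_score_ratio(query_bits: float, self_bits: float) -> float:
    """Bit score in a genome divided by the gene's self-alignment bit score."""
    if self_bits <= 0:
        raise ValueError("self bit score must be positive")
    return query_bits / self_bits


def group_averages(
    bsr: Dict[str, Dict[str, float]],
    etec: List[str],
    non_etec: List[str],
) -> Dict[str, Tuple[float, float]]:
    """Map each gene to (mean BSR in ETEC genomes, mean BSR in non-ETEC genomes).

    `bsr[gene][genome]` holds one BSR value (hypothetical input layout).
    """
    return {
        gene: (
            mean(values[g] for g in etec),
            mean(values[g] for g in non_etec),
        )
        for gene, values in bsr.items()
    }


def rank_by_difference(averages: Dict[str, Tuple[float, float]]) -> List[str]:
    """Order genes from most ETEC-associated to most non-ETEC-associated."""
    return sorted(averages, key=lambda g: averages[g][0] - averages[g][1], reverse=True)


if __name__ == "__main__":
    toy = {
        "toxin": {"e1": 1.0, "e2": 0.9, "n1": 0.0, "n2": 0.0},
        "t3ss": {"e1": 0.0, "e2": 0.0, "n1": 1.0, "n2": 0.5},
    }
    averages = group_averages(toy, ["e1", "e2"], ["n1", "n2"])
    print(averages)
    print(rank_by_difference(averages))
```

In the toy data, "toxin" behaves like the enterotoxin rows (a non-ETEC average of 0), while "t3ss" behaves like the type III secretion rows (a near-zero ETEC average).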
Figure 3. A comparison of BSR values[69] between ETEC (n = 253) and non-ETEC (n = 253) in phylogroups A and B1. A total of 118 outlier genes, as defined by the MASS package in R, are shown in black. A functional breakdown of these genes is listed in Table S4.
Figure 4. Analysis of novel putative colonization factors (CFs) identified in isolates sequenced in the current study. (A) A phylogenetic tree inferred from an alignment of peptide sequences from previously described CF major structural subunits, shown in black, and sequences from new putative CFs, shown in red. Sequences were aligned with MUSCLE[72] and a phylogeny was inferred with RAxML[60] with 100 bootstrap replicates. (B) Structural organization of novel putative CFs. Reference CFs were used to organize novel putative CFs. Numbers indicate the percent BLAST identity of protein sequences. The structure of the novel putative CFs was identified from the Prokka[71] annotation.
Six Novel Colonization Factor Gene Clusters Identified and Prevalence in Isolates.
| Name | Accession | Positive genomes |
|---|---|---|
| pcf_b01 | WP_001493678 | P0302293_3, P0302308_2 |
| pcf_b02 | WP_001701908 | BCE001_MS16, BCE002_MS12 |
| pcf_b03 | WP_001377911 | 178850, C_34666, P0299917_2, 2864350, P0299917_1, P02997067_6 |
| pcf_b04 | WP_004026086 | 2780750, 174900, MP020980_1, MP020980_2, MP021561_2 |
| pcf_b05 | EMW44189 | 2785200, 2788150 |
| pcf_b06 | WP_001741098 | 180050 |