J R Clark, A M Maresso.
Abstract
Comparative genomics of bacterial pathogens has been useful for revealing potential virulence factors. Escherichia coli is a significant cause of human morbidity and mortality worldwide but can also exist as a commensal in the human gastrointestinal tract. With many sequenced genomes, it has served as a model organism for comparative genomic studies to understand the link between genetic content and potential for virulence. To date, however, no comprehensive analysis of its complete "virulome" has been performed for the purpose of identifying universal or pathotype-specific targets for vaccine development. Here, we describe the construction of a pathotype database of 107 well-characterized completely sequenced pathogenic and nonpathogenic E. coli strains, which we annotated for major virulence factors (VFs). The data are cross-referenced for patterns against pathotype, phylogroup, and sequence type, and the results were verified against all 1,348 complete E. coli chromosomes in the NCBI RefSeq database. Our results demonstrate that phylogroup drives many of the "pathotype-associated" VFs, and ExPEC-associated VFs are found predominantly within the B2/D/F/G phylogenetic clade, suggesting that these phylogroups are better adapted to infect human hosts. Finally, we used this information to propose polyvalent vaccine targets with specificity toward extraintestinal strains, targeting key invasive strategies, including immune evasion (group 2 capsule), iron acquisition (FyuA, IutA, and Sit), adherence (SinH, Afa, Pap, Sfa, and Iha), and toxins (Usp, Sat, Vat, Cdt, Cnf1, and HlyA). While many of these targets have been proposed before, this work is the first to examine their pathotype and phylogroup distribution and how they may be targeted together to prevent disease.
Keywords: Escherichia coli; ExPEC; InPEC; comparative genomics; enteric pathogens; genomics; pathogenesis; pathogenomics; vaccine development; vaccines; virulence factors
Year: 2021 PMID: 33941580 PMCID: PMC8281228 DOI: 10.1128/IAI.00115-21
Source DB: PubMed Journal: Infect Immun ISSN: 0019-9567 Impact factor: 3.441
E. coli pathotypes, acronyms, disease presentations, and associated VFs
| Pathotype | Acronym | Disease presentation(s) | Associated VFs (references) |
|---|---|---|---|
| Extraintestinal pathogenic E. coli | ExPEC | Disease outside the intestines | See below |
| Uropathogenic E. coli | UPEC | Urinary tract infections | |
| Neonatal meningitis-causing E. coli | NMEC | Bacterial meningitis | |
| Avian pathogenic E. coli | APEC | Multiple ExPEC diseases in avian species | |
| Intestinal pathogenic E. coli | InPEC | Disease of the intestines | See below |
| Adherent-invasive E. coli | AIEC | Associated with intestinal inflammation | |
| Enterohemorrhagic E. coli | EHEC | Bloody diarrhea, hemorrhagic colitis, hemolytic-uremic syndrome | LEE pathogenicity island; |
| Enteroaggregative E. coli | EAEC | Acute and chronic watery diarrhea | |
| Enteroaggregative hemorrhagic E. coli | EAHEC | Similar to EHEC, with increased adherence and antibiotic resistance | |
| Enterotoxigenic E. coli | ETEC | Mild to severe watery diarrhea | |
| Enteropathogenic E. coli | EPEC | Severe acute watery diarrhea | LEE pathogenicity island, |
FIG 1 Pathotype distribution of E. coli capsule. (A and B) Heatmap showing nonpathogenic E. coli and ExPECs (A) and InPECs (B). Columns are organized first by pathotype, then by phylogroup, and finally by sequence type (sequence type not shown). The wzi gene used as a reference is specific for G1C. G2C and G3C both use the kpsFEDUCS and kpsTM operons for export. The biosynthetic operons for the most widely distributed and studied K-types, K1 (neu) and K5 (kfi), are also shown. G4C is synthesized by yccZ, ept, etk, and ymcABCD genes. The percent identity was determined using megaBLAST with reference genes found in Data Set S1 in the supplemental material.
FIG 2 Phylogroup distribution of E. coli capsule. (A) Distribution of wzi from strain HS, which is specific for G1C. (B) Distribution of kpsMTII. (C) Distribution of kpsMIII (using AAC38078.1 as a reference). (D) Distribution of ymcDCBA, yccZ, ept, and etk, which are specific for G4C. Distributions were determined using megaBLAST to bin strains from each phylogroup into hit versus no hit.
FIG 3 Alignment of KpsT shows strong association between kpsT allele and K-antigen. (A) Geneious alignment of the amino acid sequence of KpsT. Sequences were sorted by their differences from the K1 strain KpsT. An identity histogram is shown at the top, and colors represent amino acid differences from the majority consensus. (B) Unrooted phylogenetic tree built using Geneious TreeMaker and the alignment shown in panel A, with bootstrap support from 1,000 replicates. Branch labels indicate percent consensus support. Branches were transformed proportionally to allow for clearer comparison.
FIG 4 Pathotype distribution of adherence fimbriae. (A) megaBLAST percent identity results for nonpathogenic and ExPEC strains. (B) megaBLAST percent identity results for InPEC strains. The percent identity was determined using megaBLAST with the reference genes in Data Set S1 in the supplemental material.
FIG 5 Phylogroup distribution of select fimbriae. (A) Distribution of CFA/I fimbriae, which have until now been associated with ETEC strains. (B) Distribution of the ELF fimbrial genes. (C and D) Distributions of UPEC-associated P and S fimbriae, respectively. Note that only papCDEFH genes were used to differentiate strains that contained a full array of pap genes from those that contained a disrupted pap operon. Distributions were determined using megaBLAST to bin strains from each phylogroup into hit versus no hit.
FIG 6 Pathotype distribution of iron acquisition genes. (A) Nonpathogenic and ExPEC strains. (B) InPEC strains. The chu operon is responsible for heme uptake. Yersiniabactin, aerobactin, and salmochelin are virulence-associated iron-binding siderophores, whereas enterobactin is a ubiquitous iron-binding siderophore. The percent identity was determined using megaBLAST with the reference genes in Data Set S1 in the supplemental material.
FIG 7 Distribution of plasmid-encoded iron acquisition genes. (A) Nonpathogenic and ExPEC strains. (B) InPEC strains. Plasmids for each strain in the pathotype database were obtained and analyzed using megaBLAST against the iron acquisition data set. The percent identity was determined using megaBLAST with the reference genes in Data Set S1 in the supplemental material.
FIG 8 Phylogroup distribution of iron acquisition genes. (A) Distribution of the heme uptake genes: chuASTUWXY. This is an important control for our phylogroup database and shows that the B2/D/F/G clade and E phylogroup are correctly distinguished from the A/B1/C clade. (B) Distribution of strains carrying any of the yersiniabactin genes: fyuA, irp1, irp2, and ybtAEPQSTUX. (C) Distribution of strains carrying any of the aerobactin genes: iucABCD or iutA. (D) Distribution of strains carrying the iron-regulated gene ireA. (E) Distribution of strains carrying any of the iron/manganese transporter genes: sitABCD. (F) Distribution of strains carrying any of the salmochelin genes: iroBCDEN. Distributions were determined using megaBLAST to bin strains from each phylogroup into hit versus no hit.
FIG 9 Pathotype distribution of toxins. (A and B) Percent identity results from megaBLAST alignments of nonpathogenic and ExPEC strains (A) and InPEC strains (B). The percent identity was determined using megaBLAST with the reference genes in Data Set S1 in the supplemental material.
FIG 10 Phylogroup distribution of toxins. (A) Distribution of cytolethal distending toxin (cdt). (B) Distribution of cytotoxic necrotizing factor 1 (cnf1). (C) Distribution of chromosomal α-hemolysin (hlyA). (D) Distribution of chromosomal enterotoxin 1 (setA1). (E) Distribution of uropathogenic-specific protein (usp). (F) Distribution of phage-encoded Shiga toxin (stx1 or stx2). Distributions were determined using megaBLAST to bin strains from each phylogroup into hit versus no hit.
FIG 11 Pathotype distribution of autotransporters. (A and B) Percent identity results from megaBLAST alignments for nonpathogenic and ExPEC strains (A) and InPEC strains (B). SPATEs, serine protease autotransporters of Enterobacteriaceae. The percent identity was determined using megaBLAST with the reference genes in Data Set S1 in the supplemental material.
FIG 12 Phylogroup distribution of autotransporters. (A) Distribution of the gene encoding vacuolating autotransporter toxin (vat). (B) Distribution of the gene encoding secreted autotransporter toxin (sat). (C) Distribution of the gene encoding accessory colonization protein (sslE). (D) Distribution of the gene encoding invasion-like protein (sinH). Distributions were determined using megaBLAST to bin strains from each phylogroup into hit versus no hit.
FIG 13 Pathotype distribution of other chromosomal virulence factors. (A and B) Percent identity results from megaBLAST alignments for nonpathogenic and ExPEC strains (A) and InPEC strains (B). The percent identity was determined using megaBLAST with the reference genes found in Data Set S1 in the supplemental material.
FIG 14 Phylogroup distribution of other virulence factors. (A) Distribution of the gene encoding invasion of brain endothelium A (ibeA). (B) Distribution of the gene encoding iron-regulated homologous adhesin (iha). (C) Distribution of the gene encoding the Hek adhesin (hek). (D) Distribution of the genes encoding increased serum survival protein or bor (iss/bor).
FIG 15 Genetic difference in pathogenicity. (A) Schematic for the loss of pathogenicity in ST73 and ST131 using examples from strains examined in the present study. (B) Gain and loss of virulence factors at various points in E. coli phylogroups. Labels were assigned under the presumption that gaining genetic factors is rare, while loss is common, i.e., the map was built to reflect the lowest gain of genes possible based on the results presented here. ExPEC, extraintestinal pathogenic E. coli; UPEC, uropathogenic E. coli; NMEC, neonatal meningitis-causing E. coli; APEC, avian pathogenic E. coli; InPEC, intestinal pathogenic E. coli; AIEC, adherent-invasive E. coli; EHEC, enterohemorrhagic E. coli; EAEC, enteroaggregative E. coli; ETEC, enterotoxigenic E. coli; EPEC, enteropathogenic E. coli. Light blue, commensals (or those likely to be commensal); dark blue, InPEC strains; red, ExPEC strains; gray, phylogroups of uncertain intrinsic virulence. This illustration was made using BioRender.
FIG 16 Potential polyvalent vaccine targets. Phylogroup distributions of potential targets were determined by using megaBLAST to cross-reference all 1,348 complete E. coli chromosomes in the RefSeq database that could be categorized by phylogroup against potential targets. (A) Schematic overview of targets and their virulence mechanisms. Made using BioRender. (B) Targets with high disease impact and wide distributions through ExPEC-associated phylogroups. (C) Targets with limited distribution but still high disease impact. Terms: uropathogenic protein, usp; Salmonella-like invasion H, sinH; secreted autotransporter toxin, sat; vacuolating autotransporter toxin, vat; G2C-specific Kapsel polysaccharide genes M and T, kpsTMII; ferric yersiniabactin uptake receptor, fyuA; aerobactin uptake receptor, iutA; iron/manganese transporter, sitA; cytolethal distending toxin, cdt; cytotoxic necrotizing factor-1, cnf1; α-hemolysin, hlyA; afimbrial adhesin, afa; F1C fimbriae, foc; P fimbriae, pap; S fimbriae, sfa; IrgA homologue adhesin, iha.
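The phylogroup panels described in the captions above all rest on the same step: each strain is binned as hit or no hit for a reference gene, and the counts are tallied per phylogroup. A minimal Python sketch of that tally follows. It assumes BLAST tabular output (`-outfmt 6`), where the second column is the subject ID and the third is percent identity. The function name, the 90% identity cutoff, and the toy hit lines are illustrative assumptions; the source does not state the thresholds or scripts actually used.

```python
from collections import Counter


def bin_hits_by_phylogroup(blast_lines, strain_to_phylogroup, min_identity=90.0):
    """Count strains per phylogroup with and without a qualifying hit.

    blast_lines: tab-separated lines in BLAST tabular format (-outfmt 6);
        column 2 is the subject (strain) ID, column 3 is percent identity.
    strain_to_phylogroup: dict mapping every screened strain ID to a phylogroup.
    min_identity: hypothetical cutoff; the source does not specify one.
    """
    strains_with_hit = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        subject, pident = fields[1], float(fields[2])
        if subject in strain_to_phylogroup and pident >= min_identity:
            strains_with_hit.add(subject)

    hit, no_hit = Counter(), Counter()
    for strain, group in strain_to_phylogroup.items():
        if strain in strains_with_hit:
            hit[group] += 1
        else:
            no_hit[group] += 1

    groups = sorted(set(strain_to_phylogroup.values()))
    return {g: {"hit": hit[g], "no_hit": no_hit[g]} for g in groups}


# Toy input: phylogroups are taken from the strain table, but the hit lines
# (identities, coordinates, scores) are invented for illustration only.
phylogroups = {"CFT073": "B2", "UTI89": "B2", "MG1655": "A", "Sakai": "E"}
rows = [
    "chuA\tCFT073\t99.8\t1983\t4\t0\t1\t1983\t100\t2082\t0.0\t3640",
    "chuA\tUTI89\t99.5\t1983\t9\t0\t1\t1983\t100\t2082\t0.0\t3610",
    "chuA\tSakai\t97.2\t1983\t55\t0\t1\t1983\t100\t2082\t0.0\t3350",
]
result = bin_hits_by_phylogroup(rows, phylogroups)
```

Strains absent from the BLAST output are counted as no hit, so the mapping passed in should list every strain that was screened, not only those that matched.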
Strains, accession numbers, phylogroups, STs, antigens, and references
| Pathotype | Strain | Accession no. | Isolation source | Phylogroup | ST | O antigen | H antigen | K antigen | FimH type | Reference(s) |
|---|---|---|---|---|---|---|---|---|---|---|
| Nonpathogenic | HS | CP000802 | Feces, healthy subject | A | ST46 | O9 | | | 35 | |
| | MG1655 | CP014225 | Feces, healthy subject | A | ST10 | O16 | H48 | | 27 | |
| | REL606 | CP000819 | Lab strain | A | ST93 | O7 | | | 427/608 | |
| | BG1 | MOAH00000000 | Bovine intestines | B1 | ST58 | O159 | H21 | | 53 | |
| | SE11 | AP009240 | Feces, healthy subject | B1 | ST156 | O152 | H28 | | 38 | |
| | IAI1 | CU928160 | Feces, healthy subject | B1 | ST1128 | O8 | H4 | | 32 | |
| | Nissle 1917 | CP007799 | Feces, healthy subject | B2 | ST73 | O6 | H1 | K5 | 30 | |
| | SE15 | AP009378 | Feces, healthy subject | B2 | ST131 | O150 | H5 | K+ | 41 | |
| | ED1a | CU928162 | Feces, healthy subject | B2 | ST452 | O81 | H27 | K+ | 225/580 | |
| ExPEC | SF-088 | CP012635 | Bloodstream infection | B2 | ST95 | O1 | H7 | K1 | 30 | |
| | SF-166 | CP012633 | Bloodstream infection | B2 | ST95 | O1 | H7 | K1 | 41 | |
| | SF-173 | CP012631 | Bloodstream infection | B2 | ST95 | O18 | H7 | K1 | 18 | |
| | SF-468 | CP012625 | Bloodstream infection | B2 | ST95 | O25 | H4 | K1 | 27 | |
| | CD306 | CP013831 | Feces, cat | B2 | ST131 | O25 | H4 | K5 | 30 | |
| | Ecol_448 | CP015076 | Infected urine | B2 | ST131 | O16 | H5 | K100? | 41 | |
| | Ecol_743 | CP015069 | Infected urine | B2 | ST131 | O16 | H5 | K+ | 1426 | |
| | Ecol_745 | CP015074 | Infected urine | B2 | ST131 | OR | H5 | K100? | 41 | |
| | G749 | CP014488 | | B2 | ST131 | O25 | H4 | K1 | 22 | |
| | JJ1887 | CP014316 | Recurrent cystitis | B2 | ST131 | O25 | H4 | K14 | 30 | |
| | JJ2434 | CP013835 | | B2 | ST131 | O25 | H4 | K5 | 30 | |
| | MNCRE44 | CP010876 | Sputum | B2 | ST131 | O25 | H4 | K+ | 30 | |
| | MVAST0167 | CP014492 | | B2 | ST131 | O16 | H5 | K100? | 41 | |
| | SaT040 | CP014495 | | B2 | ST131 | O25 | H4 | K5 | 22 | |
| | ZH063 | CP014522 | | B2 | ST131 | O25 | H4 | K5 | 22 | |
| | ZH193 | CP014497 | | B2 | ST131 | O25 | H4 | K5 | 30 | |
| | UMN026 | CU928163 | Urine, acute cystitis | D | ST597 | O17 | H18 | K52 | 27 | |
| | IAI39 | CU928164 | Infected urine | F | ST62 | O7 | H45 | K1 | 44 | |
| UPEC | VR50 | CP011134 | Urine, ABU | A | ST10 | OR | H- | K1 | 27 | |
| | CI5 | CP011018 | Pyelonephritis | B1 | ST5082 | O- | H14 | | 1155 | |
| | ABU 83972 | CP001671 | Urine, ABU | B2 | ST73 | O25 | H1 | K5 | 12 | |
| | CFT073 | AE014075 | Blood, pyelonephritis | B2 | ST73 | O6 | H1 | K2 | 10 | |
| | UTI89 | CP000243 | Urine, cystitis | B2 | ST95 | O18 | H7 | K1 | 18 | |
| | 536 | CP000247 | Pyelonephritis | B2 | ST127 | O6 | H31 | K15 | 1467 | |
| | K-15KW01 | CP016358 | Fecal, ABU | B2 | ST127 | O6 | H100 | K+ | 310 | |
| | EC958 | HG941718 | Infected urine | B2 | ST131 | O25b | H4 | K100 | 30 | |
| | NA114 | CP002797 | Urine, UTI | B2 | ST131 | O25 | H4 | K+ | 30 | |
| | MS6198 | CP015834 | Urine | F | ST648 | O1 | H6 | K+ | 27 | |
| | ST648 | CP008697 | Pleural effusion | F | ST648 | O51 | H4 | K+ | 18 | |
| NMEC | IHE3034 | CP001969 | | B2 | ST95 | O18 | H7 | K1 | 18 | |
| | RS218 | CP007149 | Cerebrospinal fluid | B2 | ST95 | O17 | H7 | K1 | 18 | |
| | S88 | CU928161 | Cerebrospinal fluid | B2 | ST95 | O45 | H7 | K1 | 54 | |
| | NMEC O18 | CP007275 | Cerebrospinal fluid | B2 | ST416 | O18 | H7 | K1 | 244 | |
| | MCJCHV-1 | CP030111 | Cerebrospinal fluid | B2 | ST1193 | O75 | H5 | K1 | 64 | |
| | CE10 | CP003034 | Cerebrospinal fluid | F | ST62 | O7 | | K1 | 44 | |
| APEC | ACN001 | CP007442 | Liver, chicken | C | ST23 | O78 | H9 | | 35 | |
| | ACN002 | CP007491 | | C | ST23 | O79 | H9 | | 35 | |
| | APEC O78 | CP004009 | Lung, turkey | C | ST23 | O79 | | | 35 | |
| | 789 | CP010315 | Avian colisepticemia | C | ST88 | O78 | H19 | | 27 | |
| | APEC O1 | CP000468 | Colibacillosis, turkey | B2 | ST95 | O1 | H7 | K1 | 15 | |
| | O18 | CP006830 | Pericardium/lung, chicken | B2 | ST95 | O18 | H7 | K1 | 15 | |
| | O2-211 | CP006834 | Air sac, chicken | G | ST117 | O2 | H4 | | 97 | |
| | IMT5155 | CP005930 | Colisepticemia, chicken | B2 | ST140 | O2 | H5 | K1 | 15 | |
| AIEC | LF82 | CU651637 | Ileum, Crohn’s disease | B2 | ST135 | O83 | H1 | K+ | 436 | |
| | NRG 857C | CP001855 | Ileum, Crohn’s disease | B2 | ST135 | O83 | H1 | K+ | 2 | |
| | UM146 | CP002167 | Ileum, Crohn’s disease | B2 | ST643 | O18 | H7 | K1 | 18 | |
| | NC101 | AEFA01000000 | Feces, mouse | B2 | ST998 | O2 | H6 | K1 | 1477 | |
| EHEC/STEC | 11128 | AP010960 | Bloody diarrhea | B1 | ST16 | O111 | H- | | 86 | |
| | 12009 | AP010958 | Bloody diarrhea | B1 | ST17 | O103 | H2 | | 25 | |
| | 11368 | AP010953 | Bloody diarrhea | B1 | ST21 | O26 | H11 | | 440 | |
| | CFSAN027343 | CP037943 | Stool | B1 | ST21 | O26 | H11 | | 440 | |
| | E2865 | AP018808 | Cattle | B1 | ST21 | O26 | H11 | | 440 | |
| | FORC_028 | CP012693 | Stool | B1 | ST21 | O26 | H11 | | 440 | |
| | RM8426 | CP028116 | Creek | B1 | ST21 | O26 | H11 | | 440 | |
| | 2011C-3911 | CP015240 | Stool | B1 | ST1727 | O79 | H7 | | 31 | |
| | RM9387 | CP009104 | Feces, cattle | B1 | ST2773 | O104 | H7 | | 32 | |
| | 150 | CP028592 | Cattle | E | ST11 | O157 | H7 | | 82 | |
| | 180-PT54 | CP015832 | Outbreak isolate | E | ST11 | O157 | H7 | | 82 | |
| | 1130 | CP017434 | Cattle hide | E | ST11 | O157 | | | 82 | |
| | 28RC1 | CP015020 | Bovine carcass | E | ST11 | O157 | H7 | | 82 | |
| | ATCC 43889 | CP015854 | Feces, HUS | E | ST11 | O157 | H7 | | 82 | |
| | EC4115 | CP001164 | Spinach outbreak | E | ST11 | O157 | H7 | | 82 | |
| | EDL933 | CP008957 | Ground beef | E | ST11 | O157 | H7 | | 36 | |
| | FRIK944 | CP016625 | Calf feces | E | ST11 | O157 | H7 | | 82 | |
| | FRIK2069 | CP015846 | Feces | E | ST11 | O157 | H7 | | 82 | |
| | FRIK2455 | CP015844 | Feces, steer | E | ST11 | O157 | H7 | | 82 | |
| | JEONG-1266 | CP014314 | Feces, steer | E | ST11 | O157 | H7 | | 82 | |
| | Sakai | BA000007 | Stool | E | ST11 | O157 | H7 | | 36 | |
| | SRCC 1675 | CP015023 | Apple cider | E | ST11 | O157 | H7 | | 36 | |
| | SS52 | CP010304 | Feces, cattle | E | ST11 | O157 | H7 | | 82 | |
| | TW14359 | CP001368 | Spinach outbreak | E | ST11 | O157 | H7 | | 82 | |
| | Xuzhou21 | CP001925 | Feces | E | ST11 | O157 | H7 | | 36 | |
| | RM13514 | CP006027 | Lettuce outbreak | E | ST32 | O145 | H28 | | 331 | |
| | 2013C-4465 | CP015241 | Stool | E | ST335 | O55 | H7 | | 82 | |
| | RM13516 | CP006262 | Ice cream outbreak | E | ST6130 | O145 | H28 | | 54 | |
| EAHEC | 2009EL-2050 | CP003297 | Bloody diarrhea | B1 | ST678 | O104 | H4 | | | |
| | 2009EL-2071 | CP003301 | Bloody diarrhea | B1 | ST678 | O104 | H4 | | | |
| | 2011C-3493 | CP003289 | Stool, HUS | B1 | ST678 | O104 | H4 | | | |
| | CC227-11 | CP011331 | Bloody diarrhea | B1 | ST678 | O104 | H4 | | | |
| | HUSEC2011 | HF572917 | Stool, HUS | B1 | ST678 | O104 | H4 | | | |
| EAEC | 55989 | CU928145 | Watery diarrhea | B1 | ST678 | O104 | H4 | | | |
| | 042 | FN554766 | Diarrhea | D | ST414 | O44 | H18 | K+ | | |
| ETEC | 90-9272 | CP024239 | Diarrhea | A | ST48 | O15 | H11 | | 41 | |
| | H10407 | FN649414 | Diarrhea | A | ST48 | O78 | H11 | K80? | 41 | |
| | UMNK88 | CP002729 | Porcine neonatal diarrhea | A | ST100 | O149 | H10 | | | |
| | 214-4 | CP025840 | Stool | A | ST398 | Nontypable | | | 25 | |
| | 90-9269 | CP024661 | Diarrhea | B1 | ST761* | OUND | H4 | | | |
| | ATCC 43886 | CP024256 | Feces | B1 | ST1312 | O25 | H16 | | 198 | |
| | 90-9281 | CP024243 | Diarrhea | B1 | ST58 | O128 | H27 | | | |
| | 103605 | CP025920 | Stool | B1 | ST443 | O115 | H5 | | 24 | |
| | E24377A | CP000800 | Diarrhea | B1 | ST1132 | O139 | H28 | | 54 | |
| | 90-9276 | CP024299 | Diarrhea | B1 | ST5305 | O114 | H49 | | 32 | |
| | 90-9280 | CP024240 | Diarrhea | B1 | ST5305 | O114 | H49 | | 32 | |
| | D181 | CP024252 | Diarrhea | B1 | ST4493 | O182 | H21 | | 24 | |
| | 2014EL-1345-2 | CP024223 | | E | ST182 | O169 | H41 | | 30 | |
| EPEC | E2348/69 | FM180568 | Diarrhea | B2 | ST15 | O127 | H6 | | 57 | |
| | CB9615 | CP001846 | Diarrhea, infant | E | ST335 | O55 | H7 | | 82 | |
| | RM12579 | CP003109 | Urine | E | ST335 | O55 | H7 | | 82 | |
K+ indicates a strain that contained kps genes without additional information. If the kps genes were identical to those for strains that had experimentally determined K-types or if the K-type was described with some uncertainty, the K-type is followed by a “?” (for example “K100?”).
FimH types were determined using CGE’s FimTyper (226).
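The K antigen notation defined in the footnote above can be decoded mechanically when working with the table. The short Python sketch below shows one way to do so. The function name and the returned field names are hypothetical, and treating an empty cell as "not reported" is an assumption that the footnote does not address.

```python
def parse_k_antigen(cell):
    """Interpret one K antigen cell following the table footnote.

    Returns None for an empty cell (assumed here to mean "not reported").
    """
    text = cell.strip()
    if not text:
        return None
    if text == "K+":
        # kps genes were found, but no K-type information is available
        return {"k_type": None, "uncertain": False, "kps_only": True}
    # A trailing "?" marks a K-type that was inferred or described with uncertainty
    uncertain = text.endswith("?")
    return {"k_type": text.rstrip("?"), "uncertain": uncertain, "kps_only": False}


# Cell values as they appear in the strain table
examples = {cell: parse_k_antigen(cell) for cell in ["K1", "K+", "K100?", ""]}
```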