Structure and genetics of Escherichia coli O antigens.

Bin Liu1,2,3, Axel Furevi4, Andrei V Perepelov5, Xi Guo1,2,3, Hengchun Cao1,2,3, Quan Wang1,2,3, Peter R Reeves6, Yuriy A Knirel5, Lei Wang1,2,3, Göran Widmalm4.   

Abstract

Escherichia coli includes clonal groups of both commensal and pathogenic strains, with some of the latter causing serious infectious diseases. O antigen variation is the current standard in defining strains for taxonomy and epidemiology, providing the basis for many serotyping schemes for Gram-negative bacteria. This review covers the diversity in E. coli O antigen structures and gene clusters, and the genetic basis for the structural diversity. Of the 187 formally defined O antigens, six (O31, O47, O67, O72, O94 and O122) have since been removed, and the strains of three (O34, O89 and O144) do not produce any O antigen. Therefore, structures are presented for 176 of the 181 E. coli O antigens, some of which include subgroups. Most (93%) of these O antigens are synthesized via the Wzx/Wzy pathway and 11 via the ABC transporter pathway, with O20, O57 and O60 still uncharacterized due to failure to find their O antigen gene clusters. Biosynthetic pathways are given for 38 of the 49 sugars found in E. coli O antigens, and several pairs or groups of the E. coli antigens that have related structures show close relationships of the O antigen gene clusters within clades, thereby highlighting the genetic basis of the evolution of diversity.
© The Author(s) 2019. Published by Oxford University Press on behalf of FEMS.

Keywords:  Escherichia coli; O antigen; diversity; gene cluster; serogroup; structure

Year:  2020        PMID: 31778182      PMCID: PMC7685785          DOI: 10.1093/femsre/fuz028

Source DB:  PubMed          Journal:  FEMS Microbiol Rev        ISSN: 0168-6445            Impact factor:   16.408


INTRODUCTION

Escherichia coli is the predominant facultative anaerobe of the human colonic flora (Kaper, Nataro and Mobley 2004). It typically colonizes the gastrointestinal tract of human infants within a few hours after birth and co-exists with its human host in good health with mutual benefit for decades (Bäckhed et al. 2015; Dicks et al. 2018). However, E. coli includes both commensal and pathogenic clones. Although the commensal E. coli rarely cause disease, except in immunocompromized hosts or where the normal gastrointestinal barriers are breached, some E. coli clones are adapted to pathogenic niches and cause a broad spectrum of diseases by acquiring specific virulence attributes (Kaper, Nataro and Mobley 2004). Eight E. coli pathotypes causing disease in humans have been described, which include six extensively studied intestinal pathotypes and two pathotypes causing extraintestinal infections commonly named extraintestinal pathogenic E. coli (ExPEC) (Russo and Johnson 2000). The six intestinal pathotypes include enteropathogenic E. coli (EPEC), enterohaemorrhagic E. coli (EHEC), enterotoxigenic E. coli (ETEC), enteroaggregative E. coli (EAEC), enteroinvasive E. coli (EIEC) and diffusely adherent E. coli (DAEC) (Nataro and Kaper 1998); ExPEC include uropathogenic E. coli (UPEC) and meningitis-associated E. coli (NMEC). Pathogenic E. coli are responsible for numerous large outbreaks of infant diarrhea, bloody diarrhea, cystitis, pyelonephritis, meningitis and so on and E. coli are increasingly identified as the pathogens causing infectious diseases, such as the novel Shiga toxin-producing E. coli strain O104:H4, which resulted in a widespread and severe foodborne disease outbreak in Germany in 2011 (Bielaszewska, Mellmann, Zhang et al. 2011; Muniesa et al. 2012). Lipopolysaccharide (LPS) is found exclusively in the outer leaflet of the Gram-negative outer membrane and performs various biological functions. 
LPS typically consists of three structural regions: lipid A, a hydrophobic glycolipid which anchors LPS in the bacterial membrane; core oligosaccharide, a non-repeating oligosaccharide which commonly contains sugars such as heptose and keto-deoxyoctulosonate (Kdo); and the O antigen, a polysaccharide composed of multiple oligosaccharide repeating units (O units) each with two to seven residues from a broad range of common or rare sugars and their derivatives (Valvano 2003; Merino, Gonzalez and Tomás 2016). The O antigen is one of the most variable cell constituents, due to variation in the sugars present in the O unit and the linkages within and between O units. The diversity allows the various clones in the species to each present a surface that offers a selective advantage in its specific niche, which probably accounts for the maintenance of O antigen diversity (Reeves 1992; Wang, Wang and Reeves 2010). For instance, a serovar of Salmonella enterica may better escape intestinal amoebae predators in a particular intestinal environment by virtue of the specific O-antigen it possesses (Wildschutte and Lawrence 2007). The O antigen is exposed on the cell surface and is usually highly immunogenic, as well as being subject to intense selection by the host immune system and bacteriophages (Reeves and Wang 2002). The O antigen is also an important virulence factor (Pluschke et al. 1983; Achtman and Pluschke 1986), and loss of O antigen makes many pathogens serum sensitive or otherwise seriously impaired in virulence, which has been shown for many species, (Pluschke et al. 1983; Bengoechea, Najdenski and Skurnik 2004; West et al. 2005; Plainvert et al. 2007; Raynaud et al. 2007; March et al. 2013; Sarkar et al. 2014; Caboni et al. 2015). 
For example, O antigens could promote immune suppression by enabling UPEC to dampen the induction of cytokines and chemokines in epithelial cells and contribute to the protection against phagocytosis and killing by neutrophils and monocytes (Lüthje and Brauner 2014; Vila et al. 2016). Plainvert et al. showed that O antigen plays a key role in the virulence of NMEC (Plainvert et al. 2007). A recent study also showed that the O antigen could influence the inhibitory ability of LPS on enzymatic and bactericidal activity of lysozyme, a key element of innate immunity characterized by antibacterial activity against ExPEC (Bao et al. 2018). The variability in O antigen structure provides the major basis for the serotyping schemes of many Gram-negative bacteria and is the most widely used method for identifying strains for epidemiological purposes, making O serotyping one of the most important components in typing organisms for taxonomy and epidemiology, and a basic tool used in outbreak investigations and surveillance. In the 1940s, Fritz Kauffmann established the serotyping scheme for E. coli which now includes O antigens numbered 1 to 187 (Kauffman 1947; Scheutz et al. 2004), but note that O31, O47, O67, O72, O94 and O122 are no longer recognized, some being duplicate names for an O antigen and others were in organisms that were reclassified into other genera. This gives 181 O antigens in E. coli at present. Also, a few O antigens contain subgroups, such as O9 and O9a in what was the O9 O antigen, O18ab and O18ac in the O18 O antigen, O28ab and O28ac in the O28 O antigen and O112ab and O112ac in the O112 O antigen. As each subgroup possesses a unique O antigen and gene cluster, they, in fact, represent independent serogroups. Thus, we discuss 185 E. coli O serogroups here. The genes for O antigen synthesis are usually present as a gene cluster at a specific locus. In the majority of E. 
coli and Shigella, the O antigen gene cluster is located between two housekeeping genes, galF and gnd. There are three major classes of genes commonly found in O antigen gene clusters: nucleotide sugar precursor synthesis genes for sugars that are specific to the particular polysaccharide, glycosyltransferase genes that are specific for the donor and acceptor sugars and generate a specific linkage between them, and O unit processing genes for O unit translocation and polymerization. It should be mentioned that a few synthesis genes for sugars also found in other structures or used in metabolism, are usually at other loci, such as glmU which is responsible for the synthesis of UDP-GlcNAc from glucosamine-1-phosphate (Guan et al. 2009). There are three pathways known for assembly and synthesis of an O antigen, the Wzx/Wzy, ABC transporter and synthase pathways each involving a specific gene or set of genes for key steps in processing the O units (Bronner, Clarke and Whitfield 1994; Keenleyside and Whitefield 1996; Daniels, Vindurampulle and Morona 1998; Linton and Higgins 1998; Samuel and Reeves 2003). All three O antigen synthesis pathways are initiated by the transfer of a sugar phosphate from an NDP-sugar to undecaprenyl phosphate (Und-P) (Reeves and Wang 2002) and in most species the gene cluster includes an initial transferase (IT) gene for this step; cf. polyprenol phosphate glycosyltransferases (GTs) (Valvano 2003; Lehrer et al. 2007; Eichler and Imperiali 2018). In most E. coli and Shigella, and most other Enterobacteriaceae, WecA, encoded by the wecA gene in the enterobacterial common antigen (ECA) gene cluster, is the IT, transferring a GlcNAc-P residue from UDP-GlcNAc to Und-P (Alexander and Valvano 1994; Rick, Hubbard and Barr 1994; Yao and Valvano 1994). WecA belongs to the Polyprenol phosphate: N-acetylhexosamine-1-phosphate transferases family (Hakulinen et al. 2017) and a predicted topological model for WecA (Hakulinen et al. 
2017) suggests that it contains 11 transmembrane segments, five cytosolic loops and five periplasmic loops (Lehrer et al. 2007). The reaction mechanism of "WecA" from Thermotoga maritima was characterized: a conserved aspartate residue (D72) ensures the deprotonation of the hydroxyl group of the Und-P substrate, which is required to allow the nucleophilic attack of the phosphate oxyanion of Und-P on the β-phosphate of UDP-GlcNAc, leading to the formation of Und-PP-GlcNAc and the release of UMP (Al-Dabbagh et al. 2016). The Gnu epimerase, with its gene located upstream of the O antigen cluster, converts Und-PP-GlcNAc to Und-PP-GalNAc when GalNAc is the initial sugar (Rush et al. 2010; Cunneen et al. 2013).

Most E. coli O antigens are synthesized by the Wzx/Wzy pathway, in which sugars are sequentially transferred by GTs from the respective sugar nucleotides, initially to Und-PP-GlcNAc or Und-PP-GalNAc and subsequently to the growing chain, to form the O units. In the ABC transporter pathway, the O antigen is built directly on an Und-PP-GlcNAc residue. An adaptor is first generated on Und-PP-GlcNAc by one or more GTs, followed by the sugars for an O unit being added sequentially to the adaptor by other GTs, and this process for O-unit addition is repeated many times to give the complete polymer.

In the Wzx/Wzy pathway, the Und-PP-linked O units are translocated by the flippase protein, Wzx, across the inner membrane to its periplasmic face, where the O unit is polymerized by the polymerase protein, Wzy, to generate the polymer, with the number of O units in the final O antigen being regulated by the chain length determinant, Wzz (Mulford and Osborn 1983; McGrath and Osborn 1991; Reeves and Wang 2002). Wzx belongs to the polysaccharide transport (PST) family, with 12 or 14 transmembrane segments (TMS) (Paulsen, Beness and Saier 1997). To date, no Wzx tertiary structure has been elucidated; however, studies of Wzx from E. coli O157 and P.
aeruginosa PAO1 identified charged residues essential for translocation, some of which are located in the TMS (Marolda et al. 2010; Islam and Lam 2013). Crystal structure analysis of the MOP (multidrug/oligosaccharidyl-lipid/polysaccharide) flippase MurJ revealed several regions and sites important for function and suggested that MurJ-mediated flipping occurs via an alternating access mechanism, which is informative for understanding the mechanism of Wzx-mediated flipping (Kuk, Mashalidis and Lee 2017). The sequence variability of Wzx is believed to be required to accommodate the wide range of available O-unit structures. Early experiments by Marolda et al. indicated that Wzx is specific only for the first sugar of an O-unit substrate (Marolda, Vicarioli and Valvano 2004). However, this finding was later shown to be due to over-expression of Wzx, and Wzx does have a preference for its own O unit rather than just the first sugar (Hong and Reeves 2014). Further work is needed to determine whether the Wzx substrate preference is always strict, as there are several instances of structurally diverse O antigens having the same Wzx, for example in the P. aeruginosa O2/O5/O16/O18/O20 serogroups (Lam et al. 2011).

The Wzy polymerase also exhibits low sequence conservation both between and within bacterial species, and the number of TMS predicted in Wzy proteins ranges from 10 to 14. There are no published X-ray crystal structures of Wzy to enhance our understanding of how this protein functions.

The number of O units in each LPS molecule is variable, with a “modal” distribution in which most O antigen polymers have rather similar lengths. This distribution is determined by the Wzz protein. A detailed assessment of chain lengths in E. coli O111 showed that most of the molecules had from 9 to 16 O units, with few having more or fewer than that (Bastin et al. 1993). Collins et al. (2017) recently proposed a model for a Wzx/Wzy complex based on cryo-electron microscopy.
They also proposed a mechanism for chain length determination based on modeling using the new structure, in which the growing O antigen chain is bound to the surface of Wzz. In their words, polysaccharide chain extension by Wzy would continue until either the polysaccharide-binding capacity of Wzz was reached, or until the Wzy escaped from the Wzz association. The latter ‘escape event’ would become much less likely (with time) as the size of the polysaccharide chain increases and association of the polysaccharide chain with Wzz becomes stabilized. They noted that this mechanism would incorporate aspects of both the molecular ruler and molecular stopwatch mechanisms proposed by Bastin et al. in 1993 as alternative mechanisms for chain-length determination (Collins et al. 2017).

In the ABC transporter pathway, translocation of the Und-PP-O antigen is carried out by an ABC transporter. A typical O antigen ABC transporter is composed of two transmembrane domains (TMDs, also designated Wzm) and two nucleotide-binding domains (NBDs, also designated Wzt) encoded by wzm and wzt, respectively, with the TMDs forming the translocation channel and the NBDs hydrolyzing ATP to drive the transport cycle (Cuthbertson, Kimber and Whitfield 2007; Whitfield and Trent 2014; Simpson et al. 2015). A recent structural and functional analysis showed a novel processive O antigen translocation mechanism mediated by the ABC transporter. In this process, Wzm/Wzt forms a transmembrane channel sufficiently wide to accommodate a linear polysaccharide, and the NBD and a periplasmic loop form “gate helices” at the cytosolic and periplasmic membrane interfaces, likely serving as substrate entry and exit points (Bi et al. 2018). In some systems, the ABC transporter recognizes the modified terminus via a carbohydrate-binding domain (CBD) fused to the C terminus of its NBD to accomplish transport. This is the case for E. coli O9a and K.
pneumoniae O12, whose O antigen chains are capped at their non-reducing (growing) termini by methyl-phosphate and Kdo residues, respectively (Clarke et al. 2009; Mann et al. 2016; Liston, Mann and Whitfield 2017). In other systems, represented by K. pneumoniae serogroup O2a, which lacks a capping residue, export of the O antigen is obligatorily coupled to chain elongation, and the ABC transporter is essential for establishing the O antigen (Kos, Cuthbertson and Whitfield 2009; Liston, Mann and Whitfield 2017).

The synthase pathway has only been reported in bacteria for the plasmid-encoded S. enterica O54 O antigen (Keenleyside et al. 1994). The S. enterica O54 O antigen is a homopolymer of N-acetylmannosamine (ManNAc). As in the ABC transporter pathway, the synthesis is initiated by WecA, and the first ManNAc residue is added to the Und-PP-GlcNAc primer by the non-processive transferase WbbE, forming the adaptor. Next, the second transferase, WbbF, is proposed to perform the chain-extension steps. However, the exact mechanism of export mediated by WbbF, as well as the process of chain termination, remains unclear.

After translocation and polymerization mediated by one of the three pathways above, the resultant O polysaccharides are transferred from the Und-PP moiety and attached to lipid A-core by the ligase WaaL (Han et al. 2011; Ruan et al. 2012) to give an LPS molecule ready for transport to the outer membrane via the Lpt pathway (Okuda et al. 2016).

A review in 2006 described the structures of E. coli O-polysaccharide antigens based on the 75 E. coli O antigen structures published at that time (Stenutz, Weintraub and Widmalm 2006), and the currently known structures are included in the database ECODAB (www.casper.organ.su.se/ECODAB/) (Lundborg, Modhukur and Widmalm 2010; Rojas-Macias et al. 2015). Two recent reviews cover most of the O antigen gene cluster sequences (DebRoy et al. 2016; Iguchi et al. 2015a).
This review covers the diversity in both structures and gene clusters of E. coli, and the genetic basis of the structural diversity. The structures are shown in Table 1 and the gene clusters in Fig. S1 (Supporting Information).
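
The serogroup counts quoted in the Introduction can be checked with a few lines of set arithmetic. The sketch below is only an illustrative tally, assuming that each of the four O antigens named as having subgroups (O9, O18, O28 and O112) is replaced by exactly two serogroups; the variable names are hypothetical and not part of the review.

```python
# Illustrative bookkeeping of the serogroup counts given in the Introduction.
defined = {f"O{n}" for n in range(1, 188)}             # O1 .. O187
removed = {"O31", "O47", "O67", "O72", "O94", "O122"}  # no longer recognized
current = defined - removed

# O antigens that the text describes as split into two subgroups each
# (assumption: exactly the four listed in the Introduction).
subgroups = {
    "O9": ["O9", "O9a"],
    "O18": ["O18ab", "O18ac"],
    "O28": ["O28ab", "O28ac"],
    "O112": ["O112ab", "O112ac"],
}

# Replace each parent O antigen by its two subgroups.
split_names = {name for pair in subgroups.values() for name in pair}
serogroups = (current - set(subgroups)) | split_names

print(len(current))     # 181 O antigens
print(len(serogroups))  # 185 serogroups
```

The two printed values match the 181 O antigens and 185 serogroups stated in the text.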
Table 1.

O antigen structures of E. coli serogroups.


E. coli O antigen structures

The structures for 197 E. coli O antigens (including 21 subgroups) have been elucidated to date (Table 1). Note that the “O antigen” includes only the polysaccharide composed of O units (or single O unit if the polymerase is missing) and (in some cases) the terminating residue for the ABC transporter pathway. More than half of the O units are branched (58%) and the remaining are linear (42%). The backbones contain two to six residues. The most common (51%) has four residues in the backbone, and most others have three or five residues (41%), while only a few backbones contain two or six residues (Table 2). All seven two-residue backbones (O20, O52, O92, O95, O97, O101 and O162) have O units that use the ABC transporter dependent pathway, except O20 whose gene cluster is not at the galF-gnd locus and has not yet been characterized. Most branched O units have a one-residue side-branch (85%), but a few have two residues, viz. O29, O36, O55, O102, O105, O124, O164 and O183. The most common E. coli O unit contains a four-residue backbone with a single residue side-branch.
Table 2.

Topology of the O antigen repeating units.

There are 49 different sugars found in the E. coli O antigens (Table 3), and the 38 hexoses account for 97% of the sugar occurrences. There are four kinds of nonose (Neu, 8eLeg, Leg and Pse) present in one or more of the O antigens of O24, O56, O104, O145, O108, O161 and O136, two kinds of pentose (d-Ribf and d-Xulf) present in one or more of O5, O20, O105, O114, O153, O178 and O97, and one heptose (6d-d-manHep) present in O52. Most of the monosaccharides are in the pyranose form (97%), with only d-Gal and d-Fuc also found in the furanose form; d-Rib and d-Xul occur only in the furanose form. Most (79%) of the sugars have the d-configuration and the others (21%) have the l-configuration. Among the hexoses, the most common are d-GlcNAc, d-Gal, d-Glc, l-Rha, d-GalNAc and d-Man, which account for 18%, 13%, 13%, 12%, 11% and 8%, respectively, of the sugars. In addition, l-Fuc, l-FucNAc, d-GlcA and d-GalA are present more than 10 times, and d-Galf, d-Qui3NAc, d-ManNAc, d-Qui4NAc, d-GalNAcA, d-Rha, 6d-l-Tal, Col, l-QuiNAc and d-Fuc3NAc are present 3 to 9 times. Seven kinds of hexoses, viz. d-FucNAc, l-RhaNAc, d-Fuc4NAc, d-Fucf, d-Rha4NAc, d-ManNAc3NAcA and l-IdoA, are rare and present only in O45, O3, O10, O52, O157, O180 and O112ab, respectively. It is also notable that some sugars, including Col, l-RhaNAc, d-Fuc4NAc and l-RhaNAc3NAc (or l-RhaNAc3NFo), are only present in side-branches.

A total of 19 non-sugar constituents are found in E. coli O antigens, of which the acetyl group is the most common (70%). Others include amide-linked amino acids such as glycine (O91), N-acetyl-glycine (O121), l-alanine (O167), N-acetyl-l-serine (O114), d-allo-threonine (O110) and N-acetyl-d-allo-threonine (O63); pyruvate groups substituting positions 4 and 6 of N-acetyl-d-glucosamine (O112ac and O149) or d-galactose (O156); and lactyl groups substituting l-rhamnose (O58), d-glucose (O124 and O146) or N-acetyl-d-glucosamine (O69 and O150).
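
As a quick arithmetic check on the composition figures above, the percentages quoted for the six most common hexoses can be summed. This is a minimal sketch using only the rounded values given in the text; the dictionary name is hypothetical, and the rounding in the source means the total is approximate.

```python
# Rounded shares (percent of all sugar occurrences) as quoted in the text.
common_hexose_share = {
    "d-GlcNAc": 18,
    "d-Gal": 13,
    "d-Glc": 13,
    "l-Rha": 12,
    "d-GalNAc": 11,
    "d-Man": 8,
}

top_six_total = sum(common_hexose_share.values())
other_sugar_types = 49 - len(common_hexose_share)  # 49 sugars in total

print(top_six_total)      # 75
print(other_sugar_types)  # 43
```

On these figures, six sugars account for roughly three quarters of all occurrences, and the remaining quarter is spread over the other 43 sugar types.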
Table 3.

Sugar composition of the E. coli O antigens.

Structure analysis usually determines the linkages present in the repeating unit, including that made during polymerization, but commonly does not distinguish the initial sugar, as neither the terminal sugar nor the linkage to the core are identified because they are present only once. Thus, the structure as presented is sometimes a frameshift of the repeat unit. In the absence of an alternative IT in the gene cluster, and given the near-universal presence of d-GlcNAc and/or d-GalNAc, which is transferred as the 1-phosphate to Und-P by the IT to initiate O-antigen synthesis, it is usually assumed that one or the other is the first sugar. According to this and based on the 195 known structures (excluding O101 and O162, as discussed below), d-GlcNAc can be unambiguously confirmed as the initial sugar for 101 (52%) of the known structures, as there are no d-GalNAc residues. For 37 of them there is more than one d-GlcNAc, so the specific initial d-GlcNAc cannot be decided by considering just the primary structure of the repeating unit. d-GalNAc is confirmed to be the initial sugar for 59 (30%) of the O antigens as no d-GlcNAc is present, but the initial residue cannot be determined in 23 O antigens with more than one d-GalNAc. Among the 16 O antigens that contain just one of each, d-GlcNAc is determined to be the initial sugar in 12 of them (O6, O23, O35, O48, O49, O65, O80, O98, O113, O138, O143 and O181), as the gne gene, whose product catalyzes the isomerization from UDP-d-GlcNAc to UDP-d-GalNAc, is present in the corresponding gene cluster, and d-GalNAc can be deduced as the initial sugar of the remaining four (O33, O55, O160 and O178), as the gnu gene, whose product converts Und-PP-GlcNAc to Und-PP-GalNAc when GalNAc is the initial sugar, is located just upstream of galF.
For O103, O107 and O116, with structures possessing both d-GlcNAc and d-GalNAc simultaneously, it is proposed that d-GlcNAc should act as the initial sugar, as each gene cluster contains the gne gene but the gnu gene was not present upstream of galF, as is common for O antigens with GlcNAc as first sugar (Wang et al. 2002; Cunneen et al. 2013). However, theoretically, the possibility of having d-GalNAc as the first sugar of the O-unit could not be excluded, as the gnu homolog was not screened at the whole genome level. The O101 and O162 O units both have two structural forms. The O101 unit contains →6)-d-GlcNAc-(α1→4)-d-GalNAc-(α1→ (form 1) and →4)-d-GalNAc-(α1→4)-d-GalNAc-(β1→ (form 2). The O antigen of O162 has an additional sugar, 4d-d-araHex attached to the GlcNAc of form 1, and to the second GalNAc of form 2. Although there is a gne gene within each gene cluster, it is hard to infer which sugar is the first, as these two serogroups utilize the ABC transporter pathway for the O antigen synthesis. Neither d-GlcNAc nor d-GalNAc are present in the O8, O9, O9a, O20, O45, O52, O60, O62, O92, O95, O97 and O99 O units. All except O20, O45 and O60 are known to be synthesized by the ABC transporter dependent pathway; O20 and O60 are discussed below and the O45 O antigen contains d-Glc, 6d-l-Tal and d-FucNAc, with d-FucNAc proposed as the initial sugar, as is the case for Pseudomonas aeruginosa (Rocchetta et al. 1998; Dean et al. 1999; Raymond et al. 2002). The O45-antigen is unique in E. coli in having an IT gene in the gene cluster. The wbhQ gene encodes a product with 53% identity to WbpL of P. aeruginosa, and is proposed to have the same function using d-FucNAc as first sugar (Rocchetta et al. 1998).
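
The reasoning used in this section to deduce the initial sugar can be summarized as a small decision rule. The sketch below is a simplification under stated assumptions, not a method from the review: the function name and parameters are hypothetical, ABC transporter-dependent O antigens are treated as undecided (as the text does for O101 and O162), and O units with neither sugar (for example O45, which has its own IT gene) are also returned as undecided.

```python
from typing import Optional


def infer_initial_sugar(
    n_glcnac: int,
    n_galnac: int,
    has_gne: bool,
    gnu_upstream_of_galF: bool,
    pathway: str = "wzx/wzy",
) -> Optional[str]:
    """Return 'd-GlcNAc', 'd-GalNAc', or None when the rule gives no answer.

    n_glcnac / n_galnac: number of each residue in the O unit.
    has_gne: gne (UDP-GlcNAc to UDP-GalNAc) is present in the gene cluster.
    gnu_upstream_of_galF: gnu (Und-PP-GlcNAc to Und-PP-GalNAc) lies upstream of galF.
    """
    if pathway != "wzx/wzy":
        return None  # assumption: not inferred for ABC transporter O antigens
    if n_glcnac > 0 and n_galnac == 0:
        return "d-GlcNAc"
    if n_galnac > 0 and n_glcnac == 0:
        return "d-GalNAc"
    if n_glcnac > 0 and n_galnac > 0:
        if gnu_upstream_of_galF:
            return "d-GalNAc"  # Gnu makes Und-PP-GalNAc as the first residue
        if has_gne:
            return "d-GlcNAc"  # GalNAc is supplied as UDP-GalNAc by Gne instead
    return None  # neither sugar present, or no genetic evidence either way


# One GlcNAc and one GalNAc, gne in the cluster, no gnu upstream of galF.
print(infer_initial_sugar(1, 1, has_gne=True, gnu_upstream_of_galF=False))
```

The rule identifies which sugar type starts the O unit, but, as noted above, it cannot say which residue is first when that sugar occurs more than once in the repeating unit.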

General features of E. coli O antigen gene clusters

Escherichia coli O antigen gene clusters have between 5 and 18 genes and most are located between the galF and gnd genes. However, in O8 and O9/O9a, the O antigen gene clusters are located between the gnd and hisI genes (Kido et al. 1995; Amor and Whitfield 1997; Sugiyama et al. 1998) which is atypical in E. coli, but normal in Klebsiella. The O62 O antigen gene cluster is far from galF and gnd, and is proposed to originate from E. coli O68 by the acquisition of a novel O antigen gene cluster from E. aerogene, followed by an IS event resulting in repression of O68 O antigen synthesis (Hou et al. 2017). Three O serogroups (O34, O89 and O144) are rough (lacking a typical O antigen). However, in each case, an O antigen gene cluster was reported and has a full set of genes (Iguchi et al. 2015a) but is presumably not expressed. The O14 type strain has the O antigen gene cluster deleted apparently by homologous recombination between the manB genes of the colanic acid and O antigen gene clusters, thereby removing the genes between them (Jensen and Reeves 2004) but the LPS includes an O antigen that has the structure of the ECA. The same situation was observed in other O14 strains that have the R1, R4 or K-12 outer core (Jensen and Reeves 2004; Kuhn, Meier-Dieter and Mayer 1988). It appears that the R1, R4 and K-12 WaaL ligases will add ECA repeat units to the outer core if the O antigen is not present. There is no O antigen gene cluster between galF and gnd in O57, although this serogroup is not rough (Naumenko et al. 2018b). For O20 and O60, their O-unit structures do not fit the genetic region flanked by galF and gnd. We therefore anticipate that the functional gene clusters for O20, O57 and O60 are elsewhere on the chromosome. In this review, we discuss the O antigen gene clusters of 178 serogroups, excluding O14, O20, O34, O57, O60, O89 and O144. The GC% contents of the gene clusters are lower than that of the whole genome. 
Among the three classes of genes, the GC% for O unit processing genes and glycosyltransferase genes is lower than for nucleotide sugar synthesis genes. Most (93%) of the E. coli O antigens use the Wzx/Wzy-dependent process (gene clusters contain wzx and wzy genes) for the synthesis and translocation of O antigens and a few, including O8, O9/O9a, O52, O62, O92, O95, O97, O99, O101 and O162 use the ABC transporter pathway (gene clusters contain wzm and wzt genes). Many O antigens of the latter group are linear homopolysaccharides such as O8, O9/O9a and O92 or have a homopolysaccharide backbone with two monosaccharide side-branches per O unit (O62, O97 and O99). The others contain a short heterodisaccharide O unit (O52 and O95) or O-units of two types (O101 and O162) with a single monosaccharide side-branch in O162. Genes for the biosynthesis of several monosaccharides including UDP-d-Glc, UDP-d-Gal, UDP-d-GlcNAc, UDP-d-GlcA, UDP-d-GalA, NDP-d-Ribf and NDP-d-Xulf are not found in the O antigen gene clusters: these sugars are also present in other E. coli structures, and the genes are elsewhere in the chromosome. The genes for biosynthesis of the sugars that are present in the gene clusters are described below. For most of the Wzx/Wzy-dependent O antigens the number of glycosyltransferase genes in the gene cluster is one less than the number of residues in the O unit, as expected because the initial GlcNAc or GalNAc is added by WecA. For the ABC transporter-dependent O antigens, the initial GlcNAc is present only once as a primer, followed by an adaptor region and then the O antigen (O8/O9/O9a). In some cases, the presence of fewer GTs than expected can be accounted for by the presence of d-Glc side-branch residues, as these are commonly added when the O antigen is in the periplasm by the Gtr process, involving three genes (gtrA/B/C). The genes are commonly clustered in a prophage genome. 
GtrA and GtrB are highly conserved among serogroups, whereas GtrC appears to be unique to each serogroup and is referred to as the serogroup-specific glucosyltransferase. For S. flexneri, a model has been proposed showing how the O antigen glucosylation occurs. GtrB first transfers glucose from UDP-Glc to the bactoprenol carrier to yield the Und-P-Glc precursor in the cytoplasm, then GtrA is thought to flip Und-P-Glc from the cytoplasm to the periplasm. In the final step, GtrC transfers the glucose from Und-P-Glc specifically to the appropriate sugar of the O unit, and the lipid carrier is recycled to the cytoplasm (Allison and Verma 2000; Korres and Verma 2004; Korres et al. 2005). Only 10 of the 20 E. coli O units with a d-Glc side-branch have a potential d-Glc glycosyltransferase gene in the O antigen gene cluster, and the others are either known or proposed to be attached by the Gtr process involving genes elsewhere in the chromosome. In addition, O units with two d-Man residues have a glycosyltransferase that can transfer both, a situation found in the O9/O9a (Kido and Kobayashi 2000), O17/O44/O73/O77 (Wang et al. 2007a) and O176 (Shashkov et al. 2018) gene clusters. Similarly, in O units with more than one l-Rha residue, one glycosyltransferase is probably responsible for the formation of different linkages between two l-Rha residues, a situation found in the O1, O2, O13, O19, O35, O50, O99, O129, O135, O139 and O150 gene clusters (Perepelov et al. 2007b; Perepelov et al. 2009b; Perepelov et al. 2010a; Perepelov et al. 2011b; Rundlöf, Weintraub and Widmalm 1998; Yang et al. 2018). However, several other O antigen gene clusters also have more or fewer GTs than expected with no obvious explanation: serogroups O5, O108, O111, O120, O147, O153 and O178 have one fewer glycosyltransferase than expected, while the O52, O143 and O167 O antigen gene clusters have one extra glycosyltransferase.
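
The expectation used above for Wzx/Wzy-dependent gene clusters (one glycosyltransferase gene fewer than the number of residues in the O unit, because WecA adds the initial sugar) can be written as a one-line rule. The following is a minimal sketch with hypothetical function names; the example O unit is invented for illustration and does not correspond to a particular serogroup.

```python
def expected_gt_genes(o_unit_residues: int) -> int:
    """GT genes expected in a Wzx/Wzy gene cluster: one per residue, minus the
    initial GlcNAc/GalNAc, which is added by WecA encoded outside the cluster."""
    if o_unit_residues < 1:
        raise ValueError("an O unit needs at least one residue")
    return o_unit_residues - 1


def gt_discrepancy(o_unit_residues: int, gt_genes_in_cluster: int) -> int:
    """Observed minus expected GT genes; negative means fewer than expected."""
    return gt_genes_in_cluster - expected_gt_genes(o_unit_residues)


# Hypothetical five-residue O unit (four-residue backbone plus one side-branch).
print(expected_gt_genes(5))  # 4
print(gt_discrepancy(5, 3))  # -1
```

A negative discrepancy is what the text attributes, where an explanation exists, to a d-Glc side-branch added by the Gtr process or to a single glycosyltransferase forming two linkages; a positive value corresponds to an extra glycosyltransferase gene, as reported for O52, O143 and O167.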
For the ABC transporter-dependent pathway, capping of the non-reducing terminal residue of the polysaccharide is important for chain termination (Mann, Kimber and Whitfield 2019); the E. coli gene clusters of O52, O60, O89, O97, O101 and O162 have a putative methyltransferase gene, which is proposed to be responsible for the termination of the O polysaccharide (Jansson et al. 1985; Senchenkova et al. 1996; Vinogradov et al. 2002; Feng et al. 2004), as there is no methyl group in the O-unit structure. In the LPS of Pseudomonas syringae pv. phaseolicola GSPB 1552, the backbone homopolysaccharide consisting of d-rhamnopyranosyl residues is partially O-methylated at O3 of one of the sugar residues in the repeating unit and, in addition, at the terminal sugar residue of the polymer (Zdorovenko et al. 2001). Whether the O-methylation at the terminal residue is important for chain termination in the same way as for the E. coli O8 O antigen remains to be elucidated. Besides the chemical degradation approach complemented by methylation analysis and NMR spectroscopy used in that study, one should also be able to differentiate internal and terminal O-methylation patterns by 13C NMR relaxation studies, which can reveal differential dynamics between terminal and internal residues (Lycknert and Widmalm 2004).

Most of the ORFs in the O antigen gene clusters could be assigned functions based on similarity to genes in available databases (Fig. S1, Supporting Information). Gene names are used as previously assigned, or given according to the bacterial polysaccharide gene nomenclature (BPGN) system; see also the Bacterial Surface Polysaccharide database (BSPdb.nankai.edu.cn).
A total of 68 anomalies are found in the 178 serogroups: 17 unusual locations of sugar biosynthesis genes, five unusual locations or transcriptional directions of other genes, 23 mobile elements, nine gene remnants and 14 non-coding regions. These anomalies usually indicate that the gene cluster involved has undergone a relatively recent genetic event (Liu et al. 2008). The proportion of gene clusters with anomalies is similar to that in Salmonella (32%), but lower than in Shigella (52%), indicating that Shigella is undergoing more rapid change in O antigens than Salmonella and typical E. coli. For details of these anomalies and the O antigen structures of Shigella (Table S1; Knirel et al. 2016a; Knirel et al. 2018), see the supplementary information (SI).

Biosynthesis pathways for the sugars in E. coli O antigens. RmlA, glucose-1-phosphate thymidylyltransferase (Zuccotti et al. 2001); RmlB, dTDP-d-glucose 4,6-dehydratase (Allard et al. 2001); RmlC, dTDP-4-keto-6-deoxy-d-glucose 3,5-epimerase (Giraud et al. 1999a); RmlD, dTDP-6-deoxy-l-mannose-dehydrogenase (Giraud et al. 1999b); ManA, phosphomannose isomerase; ManB, phosphomannomutase; ManC, mannose-1-phosphate guanylyltransferase (Samuel and Reeves 2003); Psb1, C6 dehydratase/C5 epimerase; Psb2, aminotransferase; Psb4, nucleotidase; Psb6, condensase; Psb3, cytidylyltransferase; FnlA, 4,6-dehydratase, 3- and 5-epimerase; FnlB, reductase; FnlC, C-2 epimerase (Kneidinger et al. 2003b); GalU, UTP-glucose-1-phosphate uridylyltransferase (Bonofiglio, Garcia and Mollerach 2005); GlmU, UDP-N-acetyl-glucosamine pyrophosphorylase (Mengin-Lecreulx and van Heijenoort 1993); GalE, UDP-glucose-4-epimerase (Samuel and Reeves 2003); Glf, UDP-galactopyranose mutase (Nassau et al. 1996); Gna, UDP-GalNAcA synthetase (Zhao et al. 2000); Gne, UDP-N-acetylglucosamine-4-epimerase (Bengoechea et al. 2002); Gae, C5-epimerase; Gla, UDP-galacturonatenase (Munoz et al.
1999); Ugd, UDP-glucose 6-dehydrogenase (Stevenson et al. 1996); QnlA, dTDP-4-dehydrorhamnose reductase; QnlB, C-2 epimerase (Kneidinger et al. 2003); Gmd, GDP-mannose-4,6-dehydratase (Somoza et al. 2000; Kneidinger et al. 2001); Fcl, GDP-l-fucose synthetase (Rosano et al. 2000); VioA, aminotransferase (Wang et al. 2007); VioB, N-acetyltransferase (Wang et al. 2007); ColA, GDP-4-keto-6-deoxy-d-mannose 3-dehydrase (Alam, Beyer and Liu 2004); ColB, GDP-colitose synthase (Alam, Beyer and Liu 2004); PerA, GDP-perosamine synthetase (Zhao et al. 2007b; Albermann and Beuttler 2008); PerB, GDP-perosamine N-acetyltransferase (Albermann and Beuttler 2008); Rib, ribulose 5-phosphate reductase/CDP-ribitol pyrophosphorylase (Follens et al. 1999); FdtA, dTDP-6-deoxy-hex-4-ulose isomerase; FdtB, dTDP-6-deoxy-d-xylo-hex-3-ulose aminase; FdtC, dTDP-d-Fuc3N acetylase (Pfoestl et al. 2003); QdtA, dTDP-4-oxo-6-deoxy-d-glucose3,4-oxoisomerase; QdtB, dTDP-3-oxo-6-deoxy-d-glucose aminase (Pfostl et al. 2008); QdtC, dTDP-d-Qui3N acetylase (Pfostl et al. 2008); MnaA, UDP-N-acetylglucosamine-2-epimerase (Campbell et al. 
2000); FcfA, C-4 reductase; FcfB, mutase; GmhA, Glyceromannoheptose-7-P isomerase; HddA, d-Heptose 7-phosphate kinase; GmhB, d-Heptose 1,7-biphosphate phosphatase; HddC, d-Heptose 1-phosphate guanosyltransferase; DmhA, NAD-dependentepimerase/dehydratase; DmhB, NAD-dependent epimerase/dehydratase; FdfA, aminotransferase; FdfB, acetyltransferase; Rmd, oxidoreductase; WbpS, aminotransferase; WbuX, aminotransferase; wbhS, UDP-d-GlcNAc 4,6-dehydratase; wbhP, NAD-dependent epimerase/dehydratase; GnaA, UDP-GlcNAc 6-dehydrogenase; MndA, oxidoreductase; MndB, acetyltransferase; MndC, aminotransferase; MndD, UDP-N-acetylglucosamine 2-epimerase; WeiP/WekE, aminotransferase; WeiQ/WekF, aminotransferase; WeiS/WekG, dehydratase/epimerase; WeiO, acetyltransferase; WekD, formyltransferase; NnaA, UDP-N-acetylglucosamine-2-epimerase; NnaB, N-acetylneuraminic acid synthetase; NnaC, CMP-N-acetylneuraminic acid synthetase; NnaD, N-acetylneuraminic acid synthetase; Elg1, UDP-N-acetylglucosamine 4,6-dehydratase; Elg2, aminotransferase; Elg3, UDP-N-acetylglucosamine 2-epimerase; Elg4, N-acetyl-neuraminic acid synthetase; Elg5, acetyltransferase; Elg6, sugar-phosphate nucleotide transferase, Elg7, cytidylyl-transferase; Lea1, C6 dehydratase; Lea2, aminotransferase; Lea3, alanyltransferase; Lea4, N-acetylneuraminic acid synthetase; Lea5, UDP-N-acetylglucosamine 2-epimerase; Lea6, sugar-phosphate nucleotide transferase; Lea7, CMP-N-acetylneuraminic acid synthetase; TarI, cytidylyltransferase. aThe enzyme is encoded by the gene which is not located in the O antigen gene cluster. bThe enzyme could not be assigned functionally.
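The anomaly tally quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the counts are copied from the text, while the dictionary keys and variable names are labels chosen here. Note that the total counts anomalies, not affected gene clusters, since a single cluster may carry more than one anomaly.

```python
# Anomaly counts per category, copied from the paragraph above.
# The category labels are shorthand chosen for this sketch.
anomaly_counts = {
    "unusual sugar biosynthesis gene location": 17,
    "unusual location or direction of other genes": 5,
    "mobile elements": 23,
    "gene remnants": 9,
    "non-coding regions": 14,
}

# Total number of anomalies across all categories.
total = sum(anomaly_counts.values())

# Share of each category among all anomalies, in percent.
shares = {name: round(100 * n / total, 1) for name, n in anomaly_counts.items()}

# Category contributing the most anomalies.
most_common = max(anomaly_counts, key=anomaly_counts.get)

print(f"total anomalies: {total}")
for name, n in sorted(anomaly_counts.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {n} ({shares[name]}%)")
```

The category counts sum to the stated total of 68, with mobile elements forming the largest category.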

Sugar biosynthetic pathways

The proposed or characterized biosynthetic pathways of 38 different sugars found in E. coli O antigens are shown in Scheme 1. d-GlcNAc, d-Gal, d-Glc, d-GlcA, d-GalA, d-Ribf and d-Xulf are also found in other structures in E. coli, and their biosynthesis genes are at conserved loci elsewhere in the chromosome, whereas the genes responsible for synthesis of the other sugars are located in the O antigen gene clusters. The sugar residues in the O unit are all transferred from nucleoside diphosphate (NDP) derivatives (dTDP, GDP, UDP or CDP) or from the nucleoside monophosphate derivative CMP.
Scheme 1.

Biosynthesis pathways for the sugars in E. coli O antigens. RmlA, glucose-1-phosphate thymidylyltransferase (Zuccotti et al. 2001); RmlB, dTDP-d-glucose 4,6-dehydratase (Allard et al. 2001); RmlC, dTDP-4-keto-6-deoxy-d-glucose 3,5-epimerase (Giraud et al. 1999a); RmlD, dTDP-6-deoxy-l-mannose-dehydrogenase (Giraud et al. 1999b); ManA, phosphomannose isomerase; ManB, phosphomannomutase; ManC, mannose-1-phosphate guanylyltransferase (Samuel and Reeves 2003); Psb1, C6 dehydratase/C5 epimerase; Psb2, aminotransferase; Psb4, nucleotidase; Psb6, condensase; Psb3, cytidylyltransferase; FnlA, 4,6-dehydratase, 3- and 5-epimerase; FnlB, reductase; FnlC, C-2 epimerase (Kneidinger et al. 2003b); GalU, UTP-glucose-1-phosphate uridylyltransferase (Bonofiglio, Garcia and Mollerach 2005); GlmU, UDP-N-acetyl-glucosamine pyrophosphorylase (Mengin-Lecreulx and van Heijenoort 1993); GalE, UDP-glucose-4-epimerase (Samuel and Reeves 2003); Glf, UDP-galactopyranose mutase (Nassau et al. 1996); Gna, UDP-GalNAcA synthetase (Zhao et al. 2000); Gne, UDP-N-acetylglucosamine-4-epimerase (Bengoechea et al. 2002); Gae, C5-epimerase; Gla, UDP-galacturonatenase (Munoz et al. 1999); Ugd, UDP-glucose 6-dehydrogenase (Stevenson et al. 1996); QnlA, dTDP-4-dehydrorhamnose reductase; QnlB, C-2 epimerase (Kneidinger et al. 2003); Gmd, GDP-mannose-4,6-dehydratase (Somoza et al. 2000; Kneidinger et al. 2001); Fcl, GDP-l-fucose synthetase (Rosano et al. 2000); VioA, aminotransferase (Wang et al. 2007); VioB, N-acetyltransferase (Wang et al. 2007); ColA, GDP-4-keto-6-deoxy-d-mannose 3-dehydrase (Alam, Beyer and Liu 2004); ColB, GDP-colitose synthase (Alam, Beyer and Liu 2004); PerA, GDP-perosamine synthetase (Zhao et al. 2007b; Albermann and Beuttler 2008); PerB, GDP-perosamine N-acetyltransferase (Albermann and Beuttler 2008); Rib, ribulose 5-phosphate reductase/CDP-ribitol pyrophosphorylase (Follens et al. 1999); FdtA, dTDP-6-deoxy-hex-4-ulose isomerase; FdtB, dTDP-6-deoxy-d-xylo-hex-3-ulose aminase; FdtC, dTDP-d-Fuc3N acetylase (Pfoestl et al. 2003); QdtA, dTDP-4-oxo-6-deoxy-d-glucose 3,4-oxoisomerase; QdtB, dTDP-3-oxo-6-deoxy-d-glucose aminase (Pfostl et al. 2008); QdtC, dTDP-d-Qui3N acetylase (Pfostl et al. 2008); MnaA, UDP-N-acetylglucosamine-2-epimerase (Campbell et al. 2000); FcfA, C-4 reductase; FcfB, mutase; GmhA, Glyceromannoheptose-7-P isomerase; HddA, d-Heptose 7-phosphate kinase; GmhB, d-Heptose 1,7-bisphosphate phosphatase; HddC, d-Heptose 1-phosphate guanosyltransferase; DmhA, NAD-dependent epimerase/dehydratase; DmhB, NAD-dependent epimerase/dehydratase; FdfA, aminotransferase; FdfB, acetyltransferase; Rmd, oxidoreductase; WbpS, aminotransferase; WbuX, aminotransferase; wbhS, UDP-d-GlcNAc 4,6-dehydratase; wbhP, NAD-dependent epimerase/dehydratase; GnaA, UDP-GlcNAc 6-dehydrogenase; MndA, oxidoreductase; MndB, acetyltransferase; MndC, aminotransferase; MndD, UDP-N-acetylglucosamine 2-epimerase; WeiP/WekE, aminotransferase; WeiQ/WekF, aminotransferase; WeiS/WekG, dehydratase/epimerase; WeiO, acetyltransferase; WekD, formyltransferase; NnaA, UDP-N-acetylglucosamine-2-epimerase; NnaB, N-acetylneuraminic acid synthetase; NnaC, CMP-N-acetylneuraminic acid synthetase; NnaD, N-acetylneuraminic acid synthetase; Elg1, UDP-N-acetylglucosamine 4,6-dehydratase; Elg2, aminotransferase; Elg3, UDP-N-acetylglucosamine 2-epimerase; Elg4, N-acetyl-neuraminic acid synthetase; Elg5, acetyltransferase; Elg6, sugar-phosphate nucleotide transferase; Elg7, cytidylyl-transferase; Lea1, C6 dehydratase; Lea2, aminotransferase; Lea3, alanyltransferase; Lea4, N-acetylneuraminic acid synthetase; Lea5, UDP-N-acetylglucosamine 2-epimerase; Lea6, sugar-phosphate nucleotide transferase; Lea7, CMP-N-acetylneuraminic acid synthetase; TarI, cytidylyltransferase. (a) The enzyme is encoded by a gene that is not located in the O antigen gene cluster. (b) The enzyme could not be assigned functionally.

(i) dTDP sugar biosynthesis. Seven dTDP sugars are found in E. coli O antigens: the dTDP derivatives of l-Rha, d-Qui3NAc, d-Qui4NAc, 6d-l-Tal, d-Fuc3NAc, d-Fuc4NAc and d-Fucf. All but the dTDP-d-Fuc4NAc pathway have been determined experimentally.

(ii) UDP sugar biosynthesis. Twenty UDP sugars are found in E. coli O antigens. Some proposed pathways, including those for UDP-l-IdoA, UDP-l-FucNAm, UDP-d-FucNAc, UDP-d-GalNAcAN, UDP-l-RhaNAc3NAc, UDP-d-GalA6N and UDP-l-RhaNAc3NFo, are not based on experimental data but are the only likely pathways for the enzymes encoded in the gene cluster; those for UDP-d-GalNAcAN and UDP-d-GalA6N are proposed in this study for the first time.

(iii) GDP sugar biosynthesis. Five GDP sugars, the GDP derivatives of d-Man, l-Fuc, d-Rha, Col and d-Rha4NAc, are found in E. coli O antigens. All of these GDP-sugar biosynthetic pathways have been identified, and all involve the enzymes encoded by manABC and gmd genes. However, manA is not located in the O antigen gene cluster (Samuel and Reeves 2003).

(iv) NDP-d-Ribf and NDP-d-Xulf sugar and ribitol-5-phosphate biosynthesis. d-Ribf is present in nine serogroups (O5, O20, O54, O105, O114, O153, O178, O183 and O185) and d-Xulf is present in O95 and O97; however, the existence of a nucleotide-activated Ribf has not been demonstrated and genes responsible for its nucleotide precursor synthesis are not found in either O antigen gene cluster. The rib gene, responsible for the synthesis of ribitol-5-phosphate from ribulose-5-phosphate, is present in the O118 and O151 O antigen gene clusters, and both of these O antigens contain ribitol-5-phosphate.

(v) Biosynthesis of CMP-Neu5Ac and derivatives. Neu5Ac and derivatives such as Neu5Ac,7Ac,9Ac are present in five O antigens (O24, O56, O104, O145 and O171), and the nnaABCD genes responsible for CMP-Neu5Ac biosynthesis are present in all five gene clusters.

(vi) CMP-8eLeg5Ac7Ac, CMP-Leg5Ac7Ala and CMP-Pse5Ac7Ac biosynthesis. 8eLeg5Ac7Ac, Leg5Ac7Ala and Pse5Ac7Ac are present in the O108, O161 and O136 O antigens, respectively. Genes for biosynthesis of the first two were proposed in our previous studies (Li et al. 2010; Perepelov et al. 2010b), and those for the third are proposed in this study.

(vii) Biosynthesis of GDP-6d-d-manHep. 6d-d-manHep is a heptose present only in the O52 O antigen. Genes including dmhAB, hddAC and gmhAB in the O52 O antigen gene cluster are involved in the synthesis of its GDP derivative. However, the functions of dmhA and dmhB have not been confirmed biochemically (Valvano, Messner and Kosma 2002; Feng et al. 2004).
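As a minimal sketch of how such a pathway can be checked against the gene content of a cluster, the following Python snippet encodes the four-step dTDP-l-Rha route (RmlA to RmlD, with activities as named in the Scheme 1 legend) and reports which steps lack a gene. The lowercase gene names follow the usual gene/protein naming convention, and the example gene list is hypothetical rather than taken from any serogroup. A missing gene in the cluster does not by itself prove that the pathway is absent, since some biosynthesis genes lie elsewhere in the chromosome.

```python
# Ordered steps of the dTDP-l-rhamnose pathway:
# (gene, enzyme activity as named in the Scheme 1 legend).
RML_PATHWAY = [
    ("rmlA", "glucose-1-phosphate thymidylyltransferase"),
    ("rmlB", "dTDP-d-glucose 4,6-dehydratase"),
    ("rmlC", "dTDP-4-keto-6-deoxy-d-glucose 3,5-epimerase"),
    ("rmlD", "dTDP-6-deoxy-l-mannose dehydrogenase"),
]


def missing_steps(cluster_genes, pathway=RML_PATHWAY):
    """Return pathway genes, in pathway order, absent from the gene cluster."""
    present = {gene.lower() for gene in cluster_genes}
    return [gene for gene, _activity in pathway if gene.lower() not in present]


# Hypothetical gene cluster used only to illustrate the check.
example_cluster = ["rmlB", "rmlD", "rmlA", "wzx", "wzy"]
gaps = missing_steps(example_cluster)
print(gaps)  # ['rmlC']
```

The same structure could hold any of the other pathways in Scheme 1 by swapping in a different ordered list of steps.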

Classification of GTs

GTs can be divided into groups in various ways, as in the frequently employed Pfam (Protein Families) and CAZy (Carbohydrate-Active enZYmes) databases. However, these databases provide limited guidance for allocating a GT to a specific linkage, and only rarely is it possible to make the assignment with confidence from sequence comparisons alone (Stevenson, Dieckelmann and Reeves 2008). Biochemical analysis of GTs is critical to determine their specific functions in O antigen assembly, but this is still difficult because the nucleotide-sugar precursor is often not readily available and the acceptor is usually an incomplete O unit, which is almost never readily available (Stevenson, Dieckelmann and Reeves 2008; Czuchry, Szarek and Brockhausen 2018). Only a few of the GTs have been identified using biochemical or genetic methods, including WbbD in O7 (Riley et al. 2005), WbdA in O8 and O9/O9a (Kido and Kobayashi 2000; Greenfield et al. 2012a), WbdD in O9/O9a (Clarke et al. 2011), WbgO in O55 (Liu et al. 2009), WfaP in O56 (Brockhausen et al. 2008), WcmABCD in O86 (Yi et al. 2005), WclY in O107 and O117 (Wang et al. 2012a), WbdM and WbdN in O111 (Stevenson, Dieckelmann and Reeves 2008), WbsJ in O128 (Li et al. 2008), WbdN in O157 (Gao et al. 2012), WfgD in O152 (Brockhausen et al. 2008), WfcD in O141 (Chen et al. 2017) and WbwABC in O104 (Wang et al. 2014; Czuchry et al. 2015; Czuchry, Szarek and Brockhausen 2018). There is also limited analysis of the substrate range of the GTs found in the gene clusters. However, it is common to find clearly related GTs responsible for identical or similar reactions, and it is therefore very useful to identify the GTs that fit into such a group. The GTs in E. coli O antigen gene clusters were analyzed using OrthoMCL (Li et al. 2003) with a cut-off of 1e−50 to assemble all the GTs into homology groups (HGs); 621 GTs from 178 O serogroups were annotated, and among them, 491 could be classified into 106 HGs (Table S2, Supporting Information).
The functions of 155 of them can be predicted based on correlations between the presence of GTs with a similar protein sequence and a shared or similar structural element in the corresponding O antigens. GTs belonging to the same HG often have or are proposed to have identical or very similar functions. For example, the 18 GTs in HG5 share 51% to 99% identity in pair-wise comparisons. Of these, 13 WbuB are responsible for an l-FucpNAc-(α1→3)-d-GlcpNAc linkage, and two WbuB (O118, O145) are responsible for the similar l-FucpNAm-(α1→3)-d-GlcpNAc linkage. The remaining WbwW in O98, WbuB in O181 and WbwH in O123 are responsible for the transfer of l-QuipNAc or l-QuipNAcyl to form a linkage to d-GlcpNAc. All 18 form an α-(1→3)-linkage between sugars related to d-GlcNAc. Although the related GTs within such an HG perform identical or almost identical functions, they were given different names. We therefore recommend that the genes within each such HG be given a single name based on established function and/or publication priority. We noted above that having most of the O antigen gene clusters at a single locus facilitates transfer of the gene clusters between strains by homologous recombination in the flanking genes. The capacity for serotype substitution has the advantage that in one step a strain can disable an immune response to these immunodominant O antigens, and also gain a replacement O antigen to avoid the problems faced by a rough organism. It has also often been observed that GT genes are commonly present in the inverse of their order of function, with the gene for the last sugar to be added being the first GT gene in the cluster, and so on. We used the large number of E. coli GTs with a putative function for a quantitative analysis, and found that 69 (39%) of the 178 suitable gene clusters have their GTs in reverse order to function. Kenyon, Cunneen and Reeves (2017) observed that the gene order also generally keeps the strain-specific genes in any pairwise combination in a single block, which maintains divergence-level symmetry. A detailed description is given in supplementary information including Fig. S2 (Supporting Information).
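The reverse-order observation can be made concrete with a small sketch. The function below compares the order of GT genes along a cluster with the order in which their products act, and labels the relationship. The function name, the labels and the gene names gtA, gtB and gtC are hypothetical and chosen only for this illustration; the analysis reported above may have used different criteria.

```python
def gt_order_relation(cluster_order, function_order):
    """Classify GT gene order in a cluster relative to the order of function.

    cluster_order:  GT genes as they occur along the gene cluster.
    function_order: the same genes in the order their products act.
    """
    if sorted(cluster_order) != sorted(function_order):
        raise ValueError("both lists must contain the same GT genes")
    if len(cluster_order) < 2:
        return "undetermined"  # a single GT has no meaningful order
    if cluster_order == function_order[::-1]:
        return "reverse"
    if cluster_order == function_order:
        return "same"
    return "mixed"


# Hypothetical cluster: gtC acts last but is the first GT gene in the cluster.
relation = gt_order_relation(["gtC", "gtB", "gtA"], ["gtA", "gtB", "gtC"])
print(relation)  # reverse

# Proportion quoted in the text: 69 of 178 clusters in reverse order.
reverse_percent = round(100 * 69 / 178)
print(reverse_percent)  # 39
```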

Formation and expansion of O antigen variety in E. coli

Most of the O antigen gene clusters in E. coli and related species are at the galF-gnd locus, and transfer of the gene clusters between strains can occur by homologous recombination in the flanking genes, which substitutes one O antigen for another. IS elements appear to play an important role in mediating intraspecies and interspecies O antigen gene transfer that is not based on homology, and also play a role in the diversification of O antigen forms by gene inactivation. Temperate phages play an important role by carrying genes responsible for O antigen modifications that are expressed when the phage is present in the bacterial genome. In addition, there are several cases of amino acid substitutions in key genes resulting in new O antigen forms. Analysis of the E. coli O antigen gene clusters shows that such genetic events are present in several groups of closely related O antigen gene clusters (Fig. 1).
Figure 1.

O antigen gene clusters and structures that are identical or closely related within E. coli. (A) KO5 and O8; KO3, O9 and O9a; O157, O24 and O56; O169 and O183. (B) O86, O127 and O90; O107 and O117; O124 and O164; O118 and O151. (C) O2 and O50; O101 and O162; O17, O44, O73, O77 and O106. For gene key, see Supplementary Fig. S1.

(i) E. coli O8, O9/O9a. The O8, O9 and O9a O antigens are a family of related linear mannopyranose homopolymers, which have become prototypes for O antigen synthesis by the ABC transporter pathway. The O antigens of E. coli O9a and O8 are identical to those of K. pneumoniae O3 and O5 O-PSs, respectively. The genetic loci encoding the corresponding O antigen biosynthesis enzymes are highly conserved, and their close evolutionary relationships reflect lateral gene transfer between the two species (Sugiyama et al. 1998; Reeves and Wang 2002). WbdC has been shown to possess identical activities in O8 and O9a, and this is also the case for WbdB. WbdC is a GDP-Man:GlcpNAc-PP-Und-(α1→3)-mannosyltransferase and WbdB is an α-(1→3) mannosyltransferase that installs, in succession, two Man residues onto the disaccharide WbdC product α-Manp-(1→3)-β-GlcpNAc-PP-Und, forming the adaptor region. However, the WbdA proteins from O8 and O9a share only 16% identity and differ in size, and experimental evidence indicates that WbdA is the serogroup-defining mannosyltransferase. WbdA (O8) has three predicted glycosyltransferase modules and polymerizes a trisaccharide repeat unit containing single α-(1→3)-, α-(1→2)- and β-(1→2)-linked mannose residues. In contrast, WbdA (O9a) has two predicted glycosyltransferase modules and polymerizes a tetrasaccharide repeat unit containing two α-(1→2)- and two α-(1→3)-linked mannose residues (Greenfield et al. 2012b). Functional analysis of chimeric genes and site-directed mutagenesis of WbdA in the O9 isolate showed that a single amino acid substitution (R55C) is responsible for O9a having one less mannose residue in an (α1→2)-linkage (Kido and Kobayashi 2000). Similarly, the glycosyltransferases WejI in E. coli O99 and WbrX in E. coli O52, which also employ the ABC transporter pathway for O antigen synthesis, are predicted to be bifunctional enzymes (Feng et al. 2004; Perepelov et al. 2009b), but further investigation is needed.
In addition, WbdD (O9/O9a) is a bifunctional kinase-methyltransferase that adds a phosphomethyl group to the O9/O9a non-reducing terminus (Clarke et al. 2011), whereas WbdD (O8) adds only a methyl group in O8 (Clarke, Cuthbertson and Whitfield 2004); in both cases the modification halts polymerization and regulates the O antigen chain length.

(ii) E. coli O24 and O56. The O24 and O56 O antigens have similar tetrasaccharide repeat units but have d-Glc/d-Gal and d-GlcNAc/d-GalNAc substitutions. Comparison of the two O antigen gene clusters showed that the O24 O antigen gene cluster arose from that of O56 through inactivation of two genes and acquisition of two new glycosyltransferase genes from E. coli O157 and O152, respectively, which led to these sugar substitutions (Cheng et al. 2006). Several kinds of IS elements are found to play important roles in obtaining genes from other strains and in inactivating genes present in the gene cluster, but this is the only case where donor strains were identified.

(iii) E. coli O169 and O183. The O169 and O183 O units have identical backbones, but O169 has d-Glc and d-GlcA side-branch residues separately attached to the d-Gal residue in the backbone, while O183 has a two-sugar d-Ribf-(β1→4)-d-GlcpA side-branch attached to the same residue. The seven genes downstream of galF share > 97% identity, which is consistent with their common backbone and the side-chain d-GlcA residue. The wbaM gene in O183 is assigned as a putative phosphoribosyltransferase gene, and its function in the transfer of d-Ribf to d-GlcpA has been confirmed (Senchenkova et al. 2005). A glycosyltransferase gene, which occurs at the 3’-end of the O antigen gene cluster of O169 but is absent in O183, may be responsible for side-chain glucosylation of O169.

(iv) E. coli O86, O90 and O127. The O127 O unit has the structure →2)-l-Fucp-(α1→2)-d-Galp-(β1→3)-d-GalpNAc-(α1→3)-d-GalpNAc-(α1→.
The O90 O unit differs in having a (β1→4)-linkage instead of the (α1→2)-linkage between O units, and also in the acetylation position on l-Fuc. The gene clusters are almost 100% identical. However, it is noteworthy that the putative acetyltransferase WbiO of O127 shares 100% identity with the 3’-end region of its O90 homolog, WerU, and this may result in the different acetylation of l-Fuc in the two serogroups. In addition, the Wzy protein of O127 has three more amino acid residues than that of O90, possibly accounting for the difference in the Wzy polymerization linkage between the two serogroups. The O86 structure has the Wzy (β1→4)-linkage of O90, and contains an added d-Gal side-branch instead of two acetyl modifications on the l-Fuc residue. The O86 and O127/O90 gene clusters are more than 96% identical for the first seven genes, but only 47–88% identical for the five downstream genes. It is proposed that one of the O86 and O90/O127 O antigen gene clusters evolved from the other by transfer of the downstream part of the O antigen gene cluster by homologous recombination (Feng et al. 2005).

(v) E. coli O107 and O117. The 5-sugar O units differ only in the substitution of a d-GlcNAc for d-Glc. The two gene clusters have the same genes and are 98.6% identical. The difference was shown by gene replacement experiments to be due to the specificity of the two GT genes, wekI(O107) and wekI(O117), the products of which differ by only three amino acid substitutions (Wang et al. 2012a).

(vi) E. coli O124 and O164. The O124 O unit has the main chain structure →3)-d-Galp-(β1→6)-d-Galf-(β1→3)-d-GalpNAc-(β→, with a d-GlcpLA-(β1→6)-d-Glcp side chain (α1→4)-linked to the d-Galp residue. The O164 O unit has a very similar structure, with the only difference being the presence of a d-Glcp residue in place of glucolactic acid. The O124 and O164 gene clusters are nearly identical, with > 99.4% identity.
The difference is due to a frameshift mutation in O164 in the wfeP gene, which is responsible for the synthesis of UDP-d-GlcpLA (Liu et al. 2008).

(vii) E. coli O118 and O151. The O118 and O151 O antigens have the same O-unit backbone, but differ in the polymerization linkage, which is (β1→3) in O118 but (β1→2) in O151. The gene cluster sequences are nearly identical with > 99% overall identity, but the Wzy proteins differ by a single amino acid substitution, which must be responsible for the O antigen difference (Liu et al. 2010). Another possibility is that O118 or O151 could have acquired a polymerase inhibitor gene (which inhibits the Wzy encoded in the O antigen gene cluster) and a second copy of a wzy gene outside the O antigen gene cluster. A similar case has been reported in Pseudomonas aeruginosa (Newton et al. 2001; Kaluzny, Abeyrathne and Lam 2007). In addition, the O151 antigen is modified by a single side-branch GlcNAc residue. It is most likely that a glycosyltransferase gene for transfer of GlcNAc in O151 is located elsewhere in the genome.

(viii) E. coli O2 and O50. The O2 and O50 O units have the same main chain but O2 has a d-Fucp3NAc side-branch on the last sugar. The gene clusters are > 99% identical. The genetic basis for the lack of d-Fucp3NAc in O50 is evidently a point mutation in the aminotransferase gene fdtB of the d-Fucp3NAc synthesis pathway, resulting in a single amino acid change from histidine to arginine (Yang et al. 2018).

(ix) E. coli O101 and O162. The O101 and O162 O units both have two structural forms. The O101 O unit contains →6)-d-GlcpNAc-(α1→4)-d-GalpNAc-(α1→ (form 1) or →4)-d-GalpNAc-(α1→4)-d-GalpNAc-(β1→ (form 2); O162 has an additional sugar, 4d-d-araHex, attached to the GlcpNAc of form 1 and to the second GalpNAc of form 2. The O101 and O162 gene clusters are nearly identical, with an overall DNA identity of 99.8%, which is consistent with the common backbone structures of these two serogroups.
There are four GT genes in this ABC transporter gene cluster that presumably cover both forms. We screened the additional six genes between gnd and ugd that normally reside outside the O antigen cluster, and found that a very high level of DNA identity is shared between O101 and O162, except that the first gene downstream of gnd in O101 is interrupted by an IS element. The six genes were assigned as oxidoreductase, epimerase, xylose isomerase, hypothetical protein and two GTs, respectively. To our knowledge, the biosynthetic pathway of 4d-d-araHex has not yet been identified. Whether the six genes are responsible for 4d-d-araHex synthesis and transfer in O162, and whether the inserted IS element represses the homologous genes in O101 and thereby prevents 4d-d-araHex synthesis, needs further investigation.

(x) E. coli O17, O44, O73, O77 and O106. The DNA sequences of the E. coli O17, O44, O73, O77 and O106 gene clusters are > 99% identical. The structures share the same →6)-d-Manp-(α1→2)-d-Manp-(α1→2)-d-Manp-(β1→3)-d-GlcpNAc-(α1→ backbone but differ in that O77 has only a linear backbone, whereas d-Glcp side-branches with different linkages are present in the others. The O antigen gene clusters are responsible for the synthesis of the backbones, and all but O77 are proposed to have the appropriate gtr genes required for addition of their d-Glcp side-branch (Wang et al. 2007). This was confirmed for O44 by finding an appropriate prophage with the gtr genes in the genome. These O antigens are referred to as the O77-group, which has diversified by gain of different d-Glcp side-branches (Wang et al. 2007), as observed on a larger scale in Shigella flexneri (Jakhetia and Verma 2015).
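Several of the groups described above are distinguished by only a few amino acid substitutions, written in the usual notation (for example R55C: arginine at position 55 replaced by cysteine). The following Python sketch shows how such substitutions and a simple percent identity can be derived from two pre-aligned sequences of equal length. The sequences are toy strings invented for the example, not real WbdA, Wzy or WekI sequences, and percent identity is computed here over all aligned columns, which is only one of several conventions.

```python
def substitutions(ref, alt, start=1):
    """List substitutions between two aligned sequences in 'R55C' style.

    Columns containing a gap ('-') are skipped; positions count from `start`.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to equal length")
    return [
        f"{a}{pos}{b}"
        for pos, (a, b) in enumerate(zip(ref, alt), start=start)
        if a != b and "-" not in (a, b)
    ]


def percent_identity(ref, alt):
    """Percentage of aligned columns with identical residues."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(a == b for a, b in zip(ref, alt))
    return 100.0 * matches / len(ref)


# Toy aligned sequences (hypothetical) differing at one position.
reference = "MKRLTAVG"
variant = "MKCLTAVG"

changes = substitutions(reference, variant)
identity = percent_identity(reference, variant)
print(changes)   # ['R3C']
print(identity)  # 87.5
```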

CONCLUSIONS

Serotyping based on O antigen variability has been the primary method for classifying E. coli strains since the 1940s, and standardized procedures have been developed. E. coli is divided into O and H (flagellar) antigen-defined serotypes, some of which have proven to be closely associated with human disease, with high morbidity and mortality. For example, the Shiga toxin-producing E. coli (STEC) O157:H7 serotype is well known for causing life-threatening complications, such as bloody diarrhea (hemorrhagic colitis) and hemolytic-uremic syndrome (HUS) (Qin et al. 2015; Li et al. 2017). On the other hand, the most common serogroups of non-O157 STEC (O26, O45, O103, O111, O121 and O145) usually cause foodborne illnesses that are less severe than those caused by O157:H7, although severity varies among serogroups (Gould et al. 2013). O antigen serotyping provides important pathotype information and is still the prevailing standard for detection of E. coli and some other major pathogens; it is considered essential for outbreak investigation, risk management, epidemiological studies and treatment options by clinical microbiologists, infectious disease specialists and infection control agencies. Conventional serotyping is a delicate, laborious, time-consuming and expensive process. In addition, the consistent production of fully validated antisera is difficult. During the last few decades a large number of molecular and PCR-based assays have been developed to target E. coli sero-specific genes, resulting in reliable and cost-effective analysis procedures (Liu and Fratamico 2006; Lin et al. 2011; Iguchi et al. 2015b). The decline in the cost of DNA sequencing, and the ongoing progress in developing user-friendly tools, have led to several whole-genome sequencing (WGS)-based in silico serotyping approaches (Ingle et al. 2016), which offer better resolution and are superior to conventional methods for epidemiological investigation and tracing of E. coli (Joensen et al. 2015) and some other major pathogens (Ibrahim and Morin 2018; Zhang et al. 2015; Thrane et al. 2016; Wu et al. 2019). However, it is critical for these approaches to have a database containing the relevant details of sero-specific genes, which in turn requires a thorough understanding of O antigen structures in conjunction with the genetic basis for their diversity and variation. We have presented here 197 O antigens (including 21 subgroups) representing all but five (O14, O34, O89, O144 and the unpublished O93 data) of 185 E. coli O serogroups, where 178 serogroups gave an excellent correlation between the O antigen structure and the corresponding gene cluster. Anomalies for the remaining seven serogroups are summarized as follows. For O14, which lacks a typical O antigen, the O antigen gene cluster is deleted due to a recombination event, and the ECA is attached to the LPS core, suggesting a convenient mechanism for the evolution of E. coli whose typical O antigen becomes redundant (Jensen and Reeves 2004). The O34, O89 and O144 type strains are rough, but their O antigen gene clusters seem to be complete (Iguchi et al. 2015a). However, it would be difficult to identify catalytic capability using a simple sequence survey, as one or a few point mutations may lead to the inactivation of an enzyme, thus resulting in the failure of O antigen synthesis. Notably, the O antigen structures of O20, O57 and O60 have been determined, but they do not fit well with the genetic loci between galF and gnd. We propose that their functional O antigen gene clusters must be located elsewhere on the chromosome and are yet to be identified. Most E. coli structures and gene clusters show no obvious relationship to any others. However, the members within each of the 10 groups summarized herein share almost identical or closely related O antigen structures and gene clusters.
This indicates that the members within each group are evolutionarily close, and may have evolved from a common ancestor and subsequently diversified from one another. The diversification within different groups can be mediated by recombinational replacement of genes (O24/O56, O169/O183, O86/O90/O127), point mutation leading to the inactivation or functional change of certain genes (O107/O117, O124/O164, O118/O151, O2/O50), IS element insertion resulting in the inactivation of a certain gene (O101/O162), and prophages carrying gtr genes for the presence of d-Glc side-branch residues (O17/O44/O73/O77/O106). Most Shigella serogroups fall into three clusters within E. coli (Pupo, Lan and Reeves 2000), the only exception being Shigella boydii 13, which is now recognized as an Escherichia albertii serogroup (Hyma et al. 2005). Furthermore, there are 22 pairs of O antigens and associated gene clusters that are identical or closely related between E. coli and Shigella (Liu et al. 2008; Knirel et al. 2016a, 2018). Compared to that of Shigella (17 anomalies out of 33 O antigen gene clusters), the proportion of anomalies (66 of 178) is lower in E. coli. The genetic basis for the synthesis of nearly all E. coli O antigens and the documented correlations between O antigens and their gene clusters provide a greatly improved basis for understanding the evolution of E. coli, and for the development of sero-specific assays with higher discrimination as well as further improvements of vaccines for protection against pathogenic E. coli strains.
  323 in total

1.  The structure of the O-antigen of Escherichia coli O116:K+:H10.

Authors:  M R Leslie; H Parolis; L A Parolis
Journal:  Carbohydr Res       Date:  1999-10-15       Impact factor: 2.104

2.  Structural determination of the O-antigenic polysaccharide of enteropathogenic Escherichia coli O103:H2.

Authors:  Evgeny Vinogradov; Leann L Maclean; Malcolm B Perry
Journal:  Can J Microbiol       Date:  2010-05       Impact factor: 2.419

3.  Erratum to: "Structural Relationships Between Genetically Closely Related O-Antigens of Escherichia coli and Shigella spp." [Biochemistry (Moscow), 81, 600 (2016)].

Authors:  Ya A Knirel; Chengqian Qian; A S Shashkov; O V Sizova; E L Zdorovenko; O I Naumenko; S N Senchenkova; A V Perepelov; Bin Liu
Journal:  Biochemistry (Mosc)       Date:  2018-11       Impact factor: 2.487

4.  Structural elucidation of the O-antigenic polysaccharides from Escherichia coli O21 and the enteroaggregative Escherichia coli strain 105.

Authors:  M Staaf; F Urbina; A Weintraub; G Widmalm
Journal:  Eur J Biochem       Date:  1999-11

5.  Structural and genetic characterization of the Shigella boydii type 10 and type 6 O antigens.

Authors:  Sof'ya N Senchenkova; Lu Feng; Jinghua Yang; Alexander S Shashkov; Jiansong Cheng; Dan Liu; Yuriy A Knirel; Peter R Reeves; Qi Jin; Qiang Ye; Lei Wang
Journal:  J Bacteriol       Date:  2005-04       Impact factor: 3.490

Review 6.  Virulence factors of uropathogenic E. coli and their interaction with the host.

Authors:  Petra Lüthje; Annelie Brauner
Journal:  Adv Microb Physiol       Date:  2014-11-04       Impact factor: 3.517

7.  Galactofuranose biosynthesis in Escherichia coli K-12: identification and cloning of UDP-galactopyranose mutase.

Authors:  P M Nassau; S L Martin; R E Brown; A Weston; D Monsey; M R McNeil; K Duncan
Journal:  J Bacteriol       Date:  1996-02       Impact factor: 3.490

8.  Topological analysis of glucosyltransferase GtrV of Shigella flexneri by a dual reporter system and identification of a unique reentrant loop.

Authors:  Haralambos Korres; Naresh K Verma
Journal:  J Biol Chem       Date:  2004-03-17       Impact factor: 5.157

9.  Structural studies on the O-polysaccharide of Escherichia coli O57.

Authors:  Olesya I Naumenko; Jingjie Song; Sof'ya N Senchenkova; Xiaohan Jiang; Andrei V Perepelov; Alexander S Shashkov; Yuriy A Knirel
Journal:  Carbohydr Res       Date:  2018-05-19       Impact factor: 2.104

10.  Structural studies of the O-antigen polysaccharides of Klebsiella O5 and Escherichia coli O8.

Authors:  P E Jansson; J Lönngren; G Widmalm; K Leontein; K Slettengren; S B Svenson; G Wrangsell; A Dell; P R Tiller
Journal:  Carbohydr Res       Date:  1985-12-15       Impact factor: 2.104

