Comparative genomics of Chinese and international isolates of Escherichia albertii: population structure and evolution of virulence and antimicrobial resistance.

Lijuan Luo¹, Hong Wang², Michael J Payne¹, Chelsea Liang¹, Li Bai³, Han Zheng⁴, Zhengdong Zhang², Ling Zhang², Xiaomei Zhang¹, Guodong Yan², Nianli Zou², Xi Chen², Ziting Wan², Yanwen Xiong⁴, Ruiting Lan¹, Qun Li².

Abstract

Escherichia albertii is a recently recognized species in the genus Escherichia that causes diarrhoea. The population structure, genetic diversity and genomic features have not been fully examined. Here, 169 E. albertii isolates from different sources and regions in China were sequenced and combined with 312 publicly available genomes (from additional 14 countries) for genomic analyses. The E. albertii population was divided into two clades and eight lineages, with lineage 3 (L3), L5 and L8 more common in China. Clinical isolates were observed in all clades/lineages. Virulence genes were found to be distributed differently among lineages: subtypes of the intimin encoding gene eae and the cytolethal distending toxin gene cdtB were lineage associated, and the second type three secretion system (ETT2) island was truncated in L3 and L6. Seven new eae subtypes and one new cdtB subtype (cdtB-VI) were identified. Alarmingly, 85.9 % of the Chinese E. albertii isolates were predicted to be multidrug-resistant (MDR) with 35.9 % harbouring genes capable of conferring resistance to 10 to 14 different drug classes. The majority of the MDR isolates were of poultry source from China and belonged to four sequence types (STs) [ST4638, ST4479, ST4633 and ST4488]. Thirty-four plasmids with some carrying MDR and virulence genes, and 130 prophages were identified from 17 complete E. albertii genomes. The 130 intact prophages were clustered into five groups, with group five prophages harbouring more virulence genes. We further identified three E. albertii specific genes as markers for the identification of this species. Our findings provided fundamental insights into the population structure, virulence variation and drug resistance of E. albertii.

Keywords: Escherichia albertii; multidrug resistance; population structure; species-specific marker genes; transmissible elements; virulence

Substances：
Virulence Factors

Year: 2021 PMID： 34882085 PMCID： PMC8767325 DOI： 10.1099/mgen.0.000710

Source DB: PubMed Journal: Microb Genom ISSN： 2057-5858

All newly sequenced data in this work were deposited in the National Center for Biotechnology Information (NCBI) under BioProject PRJNA693666. Accession numbers of all publicly available genomes are listed in Table S1 (available in the online version of this article). Tables S2–S9 and Figures S1–S5 support the main results. The eae gene types, including the seven newly defined types, and the three newly identified E. albertii-specific genes were deposited in Figshare (https://doi.org/10.6084/m9.figshare.14846994.v3).

E. albertii is an emerging foodborne pathogen causing diarrhoea. Elucidation of its genomic features is important for the surveillance and control of E. albertii infections. In this work, 169 E. albertii genomes from different sources and regions in China were collected and sequenced, adding to the currently limited genomic data pool of E. albertii. In combination with publicly available genomes, the population structure of E. albertii was defined. The presence and subtypes of virulence genes differed significantly between lineages, indicating potential variation in pathogenicity. Additionally, the presence of MDR genes was alarmingly high in the lineages dominated by Chinese isolates. MDR-related STs and plasmid subtypes were identified, which could be used as sentinels for MDR surveillance. Moreover, the subtypes of plasmids and prophages were distributed differently across lineages and were found to contribute to the acquisition of virulence and MDR genes in E. albertii. Altogether, this work revealed the diversity of E. albertii and characterised its genomic features in unprecedented detail. The three E. albertii-specific genes identified would facilitate the identification of this emerging foodborne pathogen.

Introduction

E. albertii is a recently defined species and a recognised foodborne human pathogen [1-3]. It mainly causes diarrhoea [3, 4], although bacteraemic human infections have also been reported [5]. E. albertii has historically been misidentified as various pathogens, such as enterohemorrhagic Escherichia coli (EHEC), enteropathogenic E. coli (EPEC), Shigella boydii serotype 13, and Hafnia alvei [1, 6]. In 2003, it was confirmed to be a novel species of the genus Escherichia and named E. albertii [2, 6]. Through retrospective studies, E. albertii was found to be responsible for six human diarrhoea outbreaks in Japan from 2003 to 2015 [7, 8]. E. albertii can also cause infections in other animals: an outbreak of E. albertii infection in common redpoll finches in Alaska led to the deaths of hundreds of birds in 2004 [9]. Furthermore, E. albertii has been isolated from a variety of sources (including food products) and from wide geographic regions [9-12].

The pathogenicity of E. albertii has mainly been attributed to a type III secretion system (T3SS) encoded by the locus of enterocyte effacement (LEE) and to the cytolethal distending toxin (Cdt) encoded by the cdtABC operon, both of which are commonly found in E. albertii [1, 10, 13]. Based on the presence of the intimin gene eae, the LEE locus was found to be widely present in E. albertii [1, 10]. Non-LEE effector genes, which are mainly acquired through prophages [14], were also observed in three complete genomes [13]. A second type III secretion system (ETT2), which has major effects on the surface proteins associated with serum survival (a prerequisite for bloodstream infections) and on the motility of E. coli, has also been found in E. albertii [15]. ETT2 was predicted to be common in E. albertii based on the representative eivG gene [1, 13]. Shiga toxin (Stx) genes, stx2f and stx2a, are also sporadically observed in E. albertii [1]. However, the detailed distribution of these genes in E. albertii remained unclear, and other reported virulence factors have not been systematically investigated in E. albertii.
Antimicrobial resistance (AR), especially multidrug resistance (MDR), which is defined as resistance to three or more drug classes, is an increasing global challenge [16]. Phenotypic AR and MDR of E. albertii strains were observed in Brazil and China [17, 18]. Poultry-source E. albertii isolates in China were phenotypically resistant to up to 11 drug classes, some of which are commonly used in clinical treatment, such as cephalosporins, aminoglycosides, fluoroquinolones, and beta-lactam antibiotics [17]. However, the overall presence of AR genes in E. albertii isolates from different geographic regions and sources remains unclear.

It is well known that transmissible elements, especially plasmids and phages, are associated with the acquisition of virulence and AR genes [19]. They are key transmissible elements for the acquisition of stx genes, T3SS effector genes, and other virulence genes [19]. Multiple intact plasmids of E. albertii carrying virulence and MDR genes have been reported [1, 17, 20]. However, plasmids in draft genomes of E. albertii and their association with the acquisition of AR and virulence genes remain to be characterised [1, 13]. Prophages have been found in E. albertii, with 4–7 prophages per genome in the three complete genomes analysed [1]. However, their carriage of virulence and AR genes has not been examined.

Two clades of E. albertii have previously been defined based on whole genome sequencing analysis [1, 21], but no isolates from China were included. In this work, E. albertii isolates from different sources and regions of China were isolated and sequenced, yielding 163 draft and six complete genomes. Publicly available complete and draft genomes of E. albertii were analysed together with them to elucidate the population structure, virulence and resistance of E. albertii and the relationships between Chinese and international isolates.

Methods

Genomic sequences

A total of 169 E. albertii isolates from different sources and regions in China were collected and sequenced. The E. albertii type strain LMG20976 was also sequenced in this study. All of the isolates were sequenced using Illumina sequencing [22], except for six isolates that were sequenced using PacBio to obtain complete genomes [23]. Raw reads and assemblies of publicly available E. albertii isolates were downloaded. To identify E. albertii isolates that were potentially misidentified as E. coli, one reported E. albertii-specific gene (EAKF1_ch4033) [24] was searched against a total of 30 021 representative E. coli and Shigella genome assemblies using blastn, with coverage and identity thresholds of 50 and 70%, respectively. In summary, a total of 482 E. albertii genomic sequences were included in this study (Table S1). Of the draft genome sequences, 164 were from this study and 296 were from public databases (255 sets of raw reads from the European Nucleotide Archive and 41 assemblies from NCBI). Of the complete genomes, six were from this study and 16 were from NCBI (ten of which were sequenced using PacBio). Raw Illumina reads were assembled using Skesa v2.4.0 [25].

Phylogenetic analysis and in silico multilocus sequence typing (MLST) of E. albertii

In an initial analysis, 38 representative isolates were selected to represent the diversity of E. albertii, to obtain an overall picture and to identify the root of the phylogeny. Using the genome with Accession No. NZ_CP014583.1 as a reference, SNPs were called with snippy v4.4.0 (https://github.com/tseemann/snippy), and recombinant SNPs were detected and removed with Gubbins v2.0.0 [26]. A maximum parsimony tree based on the SNPs of the 38 isolates, rooted with an outgroup, was constructed in Mega X with 1000 bootstrap replicates [27]. To elucidate the phylogenetic relationships of the 482 isolates, a phylogenetic tree was constructed using quicktree.pl (a pipeline of SaRTree v1.2.2) with ASM287245v1 as the reference [28]. Recombinant SNP sites were removed using RecDetect v6.0 [28]. The SNP alignment of the genomes was analysed with Fastbaps v1.0.4 to identify lineages of E. albertii [29]. The lineages defined were mapped onto the phylogenetic tree using iTOL v4 [30]. In silico MLST, based on the seven housekeeping genes of the E. coli scheme, was performed on the E. albertii genomes and sequence types (STs) were assigned [31]. Clonal complexes (CCs) of the STs were called based on one allele difference using the eBURST algorithm [32].
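The grouping of STs into clonal complexes by one allele difference can be illustrated with a minimal sketch. It is an assumption-laden simplification, not the eBURST implementation cited above: it only links STs whose seven-locus profiles differ at exactly one locus and reports the connected groups. The ST names and allele numbers are hypothetical.

```python
from itertools import combinations


def allele_differences(profile_a, profile_b):
    """Count loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(profile_a, profile_b))


def clonal_complexes(profiles):
    """Group STs linked by chains of single-locus variants.

    profiles maps an ST name to a tuple of seven allele numbers.
    Returns (complexes, singletons); this is single linkage on
    one-allele differences, not the full eBURST algorithm.
    """
    parent = {st: st for st in profiles}

    def find(st):
        # Follow parent pointers to the group representative.
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    for st_a, st_b in combinations(profiles, 2):
        if allele_differences(profiles[st_a], profiles[st_b]) == 1:
            parent[find(st_a)] = find(st_b)

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), set()).add(st)
    complexes = [g for g in groups.values() if len(g) > 1]
    singletons = sorted(next(iter(g)) for g in groups.values() if len(g) == 1)
    return complexes, singletons


# Hypothetical profiles: A-B and B-C differ at one locus each, D is unrelated.
example_profiles = {
    "ST_A": (1, 1, 1, 1, 1, 1, 1),
    "ST_B": (1, 1, 1, 1, 1, 1, 2),
    "ST_C": (1, 1, 1, 1, 1, 3, 2),
    "ST_D": (9, 9, 9, 9, 9, 9, 9),
}
complexes, singletons = clonal_complexes(example_profiles)
```

In this toy input, ST_A and ST_C differ at two loci but still fall into the same complex because both are single-locus variants of ST_B.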

Screening for E. albertii-specific gene markers

To screen for E. albertii-specific gene markers, a total of 243 E. albertii genomes randomly sampled from different lineages were used together with 1898 representative E. coli and Shigella genomes from the ‘identification dataset’ defined by Xiaomei et al. [33]. These 2141 genomes were annotated with Prokka v1.13.3 [34]. The pangenome was defined using Roary v3.12.0 with a nucleotide identity threshold of 80% [35]. Genes that were significantly associated with E. albertii were identified using Scoary [36]. To evaluate sensitivity and specificity, the candidate specific gene markers were further searched with blastn (coverage >=50% and identity >=70% as cutoffs) against the 482 E. albertii genomes and against the 30 021 representative E. coli and Shigella genomes used by Xiaomei et al. [33].
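The coverage and identity cutoffs used for the blastn searches can be sketched as a filter over tabular blastn output. This is only an illustration under stated assumptions: it assumes blastn was run with the custom tabular format `-outfmt "6 qseqid sseqid pident length qlen"`, and it approximates coverage as alignment length divided by query length. The function names and the example hit lines are hypothetical.

```python
def passes_cutoffs(pident, aln_len, qlen, min_identity=70.0, min_coverage=50.0):
    """Apply identity (%) and query-coverage (%) cutoffs to one hit."""
    coverage = 100.0 * aln_len / qlen  # approximation: gaps are not excluded
    return pident >= min_identity and coverage >= min_coverage


def marker_hits(lines, min_identity=70.0, min_coverage=50.0):
    """Return (marker, subject) pairs whose hits pass both cutoffs.

    Each line is expected to hold the five tab-separated columns
    qseqid, sseqid, pident, length and qlen.
    """
    hits = set()
    for line in lines:
        if not line.strip():
            continue
        qseqid, sseqid, pident, length, qlen = line.rstrip("\n").split("\t")[:5]
        if passes_cutoffs(float(pident), int(length), int(qlen),
                          min_identity, min_coverage):
            hits.add((qseqid, sseqid))
    return hits


# Hypothetical hits for one 411 bp marker gene.
example_lines = [
    "EAKF1_ch3804\tgenomeA_contig1\t98.5\t411\t411",  # passes both cutoffs
    "EAKF1_ch3804\tgenomeB_contig7\t72.0\t150\t411",  # coverage about 36%
    "EAKF1_ch3804\tgenomeC_contig2\t65.0\t400\t411",  # identity below 70%
]
kept = marker_hits(example_lines)
```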

Virulence and antibiotic resistance analysis of E. albertii

Predicted virulence and AR genes in the E. albertii genomes were identified with Abricate v0.8.13 (https://github.com/tseemann/abricate): virulence genes were screened against the E. coli virulence factors database (Ecoli_VF) and the virulence factor database (VFDB) with an identity of >=70% and a coverage of >=50% [37]; AR genes were screened against the NCBI AMRFinder database with an identity of >=90% and a coverage of >=90% [16].

The pangenome of E. albertii and phylogeny of the eae and cdtB genes

To predict the subtypes of the eae and cdtB genes harboured by each isolate, representative sequences of each eae and cdtB type were used to search the collection of genomes using blastn with an identity of >=97% and a coverage of >=50% [38]. New eae and cdtB subtypes were defined based on the tree structure and the blastn results: a new subtype was defined if it was phylogenetically distant from the known subtypes and was present in >=5 isolates (with identity >=97%). To construct the phylogenetic trees of the eae and cdtB genes from different isolates of E. albertii, the pangenome of E. albertii was defined. High-quality assemblies were identified using the cutoffs of N50 >29 kb, total length between 4.3 Mb and 5.7 Mb, and <=450 contigs, based on the output of Quast v5.0.2 [39]. A total of 422 high-quality assemblies were included and annotated using Prokka v1.12 [34]. The pangenome of E. albertii was defined using Roary v3.11.2 with an identity threshold of 70% [35]. The eae and cdtB genes of each isolate were identified from the output of Roary and Prokka. Mega X was used to align the nucleotide sequences of the eae and cdtB genes using Muscle [27]. Neighbour-joining trees were constructed with partial deletion (90%) and 1000 bootstrap replicates using Mega X [27].
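The assembly quality cutoffs described above can be expressed as a small check. This is a sketch, not the Quast-based workflow itself: it recomputes N50 from a list of contig lengths, assumes that 29 kb means 29 000 bp, and treats the length bounds as inclusive. The function names and example assemblies are hypothetical.

```python
def n50(contig_lengths):
    """Length of the contig at which the cumulative sum of contig
    lengths, sorted from longest to shortest, reaches half the total."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def is_high_quality(contig_lengths):
    """Apply the three cutoffs: N50 > 29 kb, total length between
    4.3 Mb and 5.7 Mb (assumed inclusive), and at most 450 contigs."""
    total = sum(contig_lengths)
    return (
        n50(contig_lengths) > 29_000
        and 4_300_000 <= total <= 5_700_000
        and len(contig_lengths) <= 450
    )


# Hypothetical assemblies: 100 contigs of 50 kb versus 480 contigs of 10 kb.
good_assembly = [50_000] * 100
fragmented_assembly = [10_000] * 480
```

The first example passes all three cutoffs, while the second fails on both N50 and contig count.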

Plasmid and prophage analysis based on complete genomes of E. albertii

For the analysis of intact plasmids and prophages of E. albertii, 16 complete genomes sequenced by PacBio and one reference genome, GCA_001549955.1 (sequenced by 454 GS-FLX), were selected. To identify plasmids in the draft genomes, we used both PlasmidFinder and MOB-suite [32, 40]. Plasmid replicon genes were screened against the PlasmidFinder database with an identity of >=65% and a coverage of >=50% using Abricate v0.8.13 (https://github.com/tseemann/abricate). MOB-suite was used to identify potential plasmid sequences in the draft genomes, and MOB types were assigned if the predicted plasmids were known. To evaluate whether the invasive plasmid pINV of Shigella was present in E. albertii, the pINV-specific gene ipaH and 38 plasmid-borne virulence genes were screened in the raw reads of E. albertii using ShigEiFinder [33]. AR genes and virulence genes present on the intact plasmids and on the MOB-suite predicted plasmids were screened using the aforementioned criteria.

The complete genomes were submitted to Phaster for prophage prediction [41]. To define groups of the intact prophages, the genomic sequences of the prophages were annotated with Prokka v1.12 [34]. The gff files of the intact prophages were clustered by Roary v3.11.2 with an identity of >=70%, and a binary gene presence and absence tree was generated [35]. The concatenated prophage sequences, in the order of the binary clustering, were visualized in similarity plots by Gepard v1.3 [42]. Genes whose presence was significantly associated with prophage groups (P<=0.001) were identified using Scoary [36]. The top three to five genes with 100% specificity and sensitivity for each prophage group were identified as potential prophage-specific genes. These candidates were searched against the 482 E. albertii genomes with identity >=70% and coverage >=50% using blastn. The distribution of the prophage-specific genes was visualized in Phandango [43]. AR genes, plasmid replicon genes and virulence genes present on the intact prophages were screened using the aforementioned criteria. To compare the prophages of E. albertii with public phage clusters from the Microbe Versus Phage (MVP) database, the representative phage sequences of the different phage clusters were downloaded [44]. Each prophage sequence of E. albertii was searched against the MVP reference phage cluster sequences with an identity of 80% and a coverage of 50% using blastn [44].

Results

A dataset representing E. albertii distribution in different source types and geographic regions

A total of 169 eae gene-positive E. albertii isolates from different regions of China were collected from 2014 to 2019 and sequenced in this study. The isolates were from five provinces in China, the majority from Sichuan province in Southern China and Shandong province in Northern China (Table S1). The Chinese isolates belonged to seven different source types, with 90.5% from poultry intestine (110 isolates from chicken intestines and 43 from duck intestines). There were six human-source isolates from China (Table S2): three were from patients with diarrhoea, including one patient with bloody diarrhoea, and three were from asymptomatic poultry butchers and retailers. Two isolates were from the faecal samples of bats in Yunnan, China. Notably, as only eae-positive and lactose-nonfermenting samples were cultured for E. albertii in this study, any eae-negative or lactose-fermenting isolates would not have been isolated. To compare the genomic characteristics of E. albertii globally, a total of 312 publicly available genome sequences were included in this study. Based on the metadata available, these isolates were from six continents and 12 different source types, including humans, birds, bovine, swine, cats, water mammals, camelid, plants, soil and water. Humans (76 isolates) and birds (30 isolates) were the dominant sources (Table S3).

E. albertii lineages and their distribution in different geographic regions and source types

Previous studies showed that E. albertii is divided into two clades [1, 21]. To better define the phylogenetic lineages, we used Fastbaps to analyse the population divisions of the 482 isolates, using the alignment of non-recombinant SNPs (with recombinant SNPs removed) as input. Eight lineages of E. albertii were defined, containing 353 of the 482 isolates, while 129 isolates did not belong to any lineage (Fig. 1) [29]. Lineage 1 (L1) corresponds to the previously defined clade 1, and L2 to L7 belonged to the previously defined clade 2 [1, 21]. It is noteworthy that the isolates previously identified as Shigella boydii serotype 13 belonged to L3. Each lineage includes isolates from multiple continents. L5 and L8 were more common in Asia, while L1 (or clade 1), L3 and L6 were more common in Europe and North America (Fig. S1).

Fig. 1.

Phylogenetic structure of E. albertii. The phylogenetic tree of the 482 isolates was constructed using Quicktree with 1000 bootstrap replicates [28]. Branch colours represent bootstrap support from 10 to 100% (red to green). The innermost ring marks isolates from human sources. The next ring marks the lineages, coloured as shown in the colour legend. The outer four rings show the cdtB subtypes and the stx2f gene, coloured as shown in the colour legend.

The 85 human isolates were distributed among the eight lineages, indicating that all of these lineages are potentially pathogenic to humans (Fig. 1). Among the Chinese isolates, the six human-source isolates belonged to L4 (2), L7 (1), and L8 (1), with two not falling into any lineage (Table S2). The two bat-source isolates did not belong to any of the lineages but were most closely related to L3. There were 158 poultry-source isolates from China, 55.7% (88/158) of which belonged to L8, followed by L5 (22.8%, 36/158) (Table S3), and there were two L8 isolates from wild birds. By contrast, the majority of the bird-source isolates from other countries came from wild birds; 51.6% (16/31) of them did not belong to any of the eight lineages, while 32.3% (10/31) were from L1. These findings demonstrate that the bird-source isolates from other countries were phylogenetically different from the wild-bird and poultry-source isolates in China.

In silico MLST of E. albertii isolates

We performed in silico MLST on the E. albertii isolates using the established scheme [31]. The 482 isolates were subtyped into 98 STs, 53 of which contained >=2 isolates. With the exception of L1 and L8, each lineage was dominated by one ST: ST4633 accounted for 84.0% of the isolates in L2, ST5431 for 76.0% of L3, ST4619 for 60.0% of L4, ST4638 for 81.3% of L5, ST5390 for 100% of L6 and ST3762 for 82.1% of L7. In L8, 94.6% of the isolates belonged to four STs (ST4488, ST4634, ST4479 and ST4606). We further grouped closely related STs into CCs using a one-allele difference [45]. Nearly half of the STs (43 of 98) were grouped into nine CCs, while the remaining 55 STs were singletons (Fig. 2a). With the exception of L4 and L6, which contained only singleton STs, each lineage was dominated by one CC. CC1 represented 68.1% of the L1 isolates. CC2 to CC6 represented more than 90% of the isolates in L2, L3, L5, L7 and L8, respectively. The majority of the singletons (42 of 55) belonged to none of the eight lineages and were classified as 'other' in the lineage division above.

Fig. 2.

Region distribution and resistance profiles of clonal complexes (CCs) and sequence types (STs) of E. albertii isolates based on seven-gene multi-locus sequence typing (MLST). (a) Region distribution of STs and CCs. (b) Drug resistance profiles of STs and CCs. Each circle represents an ST, and the size of the circle reflects the number of isolates. STs and CCs belonging to different lineages are separated. STs with one allele difference are linked with solid lines as one CC. Singleton STs are shown for each lineage. Of the 42 singleton STs belonging to none of the eight lineages, only the 12 STs with AR genes are shown. The top seven countries with five or more isolates are highlighted in different colours, as shown in the colour legend. The predicted antibiotic resistance of different STs is denoted by colours for different levels of resistance (by the number of predicted drug classes, as indicated), as shown in the colour legend. The numbers in square brackets are the numbers of isolates. The pie chart within an ST denotes the proportions of isolates displaying a particular characteristic.

Thirty-three STs were found in more than one country, while 57 STs were found in only one country. The six largest CCs were found in more than one country. However, individual STs or CCs were predominant in different countries or regions. ST5390 was the most common ST in both the USA and the UK, and ST5431 was the second most common ST in the UK. In China, ST4479, ST4638 and ST4606 were the main STs, representing 54.7 % of the Chinese isolates. CC1 and CC3 were predominant in the USA and UK, while CC2, CC4, and CC6 were predominantly found in China.

E. albertii-specific gene markers

The reported E. albertii-specific gene EAKF1_ch4033 [24] was missing in four E. albertii isolates (three of ST5268 and one of ST10002), all of which belonged to L1. Additionally, EAKF1_ch4033 was found in 33 E. coli genomes of ST378 and in some E. coli assemblies in NCBI, although the estimated specificity is still 99.9% (Table 1). Therefore, we searched the genomes as described in the Methods and found three candidate specific genes (EAKF1_ch3804, EAKF1_ch4075c and EAKF1_ch0408c), which were positive in all of the E. albertii genomes and negative in all 30 021 E. coli and Shigella genomes (Table 1). We further searched the NCBI web database using blastn: EAKF1_ch3804 was present in one complete E. coli genome of ST9286 from guinea fowl [46], while the other two markers were only found in E. albertii. Therefore, EAKF1_ch4075c and EAKF1_ch0408c have 100% sensitivity and specificity, while EAKF1_ch3804 has 100% sensitivity and ~99.99% specificity. The DNA sequences of the three specific genes are deposited in Figshare.

Table 1.

Evaluation of E. albertii-specific gene markers

Locus tag of reference strain KF1	Location (Strand)	Length	Sensitivity* (No. of false negatives)	Specificity† (No. of false positives)	NCBI blastn searches‡	Ref
EAKF1_ch4033	4 243 592…4 243 984 (-)	393	99.2% (4)	99.9% (33)	Some hits to E. coli	[24]
EAKF1_ch3804	3 999 536…3 999 946 (+)	411	100% (0)	100% (0)	One E. coli ‡
EAKF1_ch4075c	4 276 067…4 276 220 (-)	154	100% (0)	100% (0)	Specific
EAKF1_ch0408c	429 916…430 443 (-)	528	100 % (0)	100 % (0)	Specific

*Sensitivity=1-no. of false-negative genomes/total no. of E. albertii genomes.

†Specificity=1-no. of false-positive genomes/total no. of non-E. albertii genomes.

‡One complete genome of E. coli (Accession No. CP053258.1) from guinea fowl of ST9286.
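The footnote formulas can be checked against the counts reported for EAKF1_ch4033 with a short calculation. The denominators are taken from the text (482 E. albertii genomes and 30 021 non-E. albertii genomes); the function names are illustrative only.

```python
def sensitivity(false_negatives, total_target_genomes):
    """Sensitivity = 1 - false negatives / total target genomes."""
    return 1 - false_negatives / total_target_genomes


def specificity(false_positives, total_non_target_genomes):
    """Specificity = 1 - false positives / total non-target genomes."""
    return 1 - false_positives / total_non_target_genomes


# EAKF1_ch4033: 4 false negatives among 482 E. albertii genomes and
# 33 false positives among 30 021 non-E. albertii genomes (Table 1).
sens_ch4033 = round(100 * sensitivity(4, 482), 1)
spec_ch4033 = round(100 * specificity(33, 30021), 1)
print(sens_ch4033, spec_ch4033)
```

With these counts the rounded values agree with the 99.2% and 99.9% listed in Table 1.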

Virulence genes and their distribution in lineages

Virulence genes from the Ecoli_VF database were screened to evaluate the potential pathogenicity of E. albertii. The LEE island, from LEE1 to LEE7, contains 41 genes [47]. The 41 genes were present in slightly different proportions, ranging from 91.1 to 99.8%, with the espF gene the least frequent, present in 439 of the 482 isolates (Table S4). The eae gene on LEE5 was harboured by 99.4% (479/482) of the isolates. Thirteen previously defined eae subtypes were observed in 387 (80.3%) of the 482 isolates, and seven new eae subtypes (each observed in >=5 isolates) were identified among the remaining 92 isolates (Fig. S2). Subtype sigma was the dominant type (37.9%), followed by rho (10.4%), iota2 (6.6%) and epsilon3 (6.2%) (Fig. S2). The eae subtypes were associated with specific lineages: epsilon3, iota2 and rho were the predominant subtypes in L2, L3 and L5, respectively, and subtype sigma was dominant in L6, L7 and L8. However, L1, L4, L5 and L7 harboured multiple eae subtypes. L1 (or clade 1) possessed eight eae subtypes, with beta3, alpha8 and the newly defined sigma2 and alpha9 as the main subtypes (Fig. S2).

Cdt facilitates bacterial survival and enhances pathogenicity [48] and is encoded by the cdtABC genes, which are widely distributed in E. albertii [1, 49]. In this study, cdtABC genes were present in 99.4 % (479/482) of the isolates. The cdtB gene had previously been divided into five subtypes (cdtB-I to cdtB-V), with cdtB-II/III/V as one group and cdtB-I/IV as another group [50]. By phylogenetic analysis of the cdtB genes in E. albertii, a new cdtB subtype was identified and named cdtB-VI. Subtype cdtB-VI was phylogenetically closer to the cdtB-II/III/V group (Fig. S3). Notably, almost all cdtB-VI-positive isolates (30.1%, 145/482) were located on the same branch, which includes the L3, L4 and L5 isolates (Fig. 1). Subtype cdtB-II, the dominant type, was present in 68.3% (329/482) of the isolates across five lineages (L1, L2, L6, L7 and L8).
Subtype cdtB-I was found in 65 (13.5%) isolates, 89.2% (58/65) of which were also positive for either cdtB-II or cdtB-VI. There were 49 isolates positive for stx2f (10.2%, 49/482), 44 of which possessed cdtB-I (Fig. 1). E. albertii isolates with cdtB-I were significantly more likely to harbour the stx2f gene (Chi-square test, P<0.001). Both cdtB-I and stx2f were observed on the same intact prophage in two complete genomes (ASM331252v2_PF4 and ASM386038v1_PF5). None of the Chinese isolates were positive for stx2f.

ETT2, which plays a role in motility and serum resistance in E. coli [15], was found to be nearly intact in 61.4 % (296/482) of the isolates, except for the ygeF gene, which was absent in all isolates [13]. Eighty-eight isolates (18.3%) harboured 29 to 31 ETT2 genes, with two to four genes missing. Interestingly, ETT2 genes were mostly deleted in L3 and L6, with only four and three genes remaining, respectively (Fig. 3). Other virulence genes were also lineage-restricted, such as the type VI secretion system (T6SS) aec genes, which were present in most of the lineages except for L1, L3 and L5. The haemolysin genes hlyABCD were present only in L3 isolates (Fig. 3). The iuc gene cluster (iucABCD and iutA), which encodes the siderophore aerobactin and the aerobactin receptor [51], was mainly present in L3 and L4 and in one isolate of L6. The high pathogenicity island (HPI), which encodes yersiniabactin (Ybt) [52], was only found in L6 isolates (100%). The lng gene cluster, which encodes the CS21 pilus (class b type IV) [53-55], was mainly observed in L5.
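The association between cdtB-I and stx2f can be reproduced approximately from the counts given in the text. The 2x2 table below is derived from those counts (65 cdtB-I-positive isolates, 44 of them stx2f-positive; 49 stx2f-positive isolates in total; 482 isolates overall). The sketch assumes a Pearson chi-square test without continuity correction, which may differ from the exact test variant the authors used.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and P value for a 2x2 table
    [[a, b], [c, d]] with one degree of freedom (no Yates correction)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: cdtB-I positive / negative; columns: stx2f positive / negative.
cdtb1_stx_pos = 44
cdtb1_stx_neg = 65 - 44          # 21
other_stx_pos = 49 - 44          # 5
other_stx_neg = 482 - 65 - 5     # 412

chi2, p_value = chi_square_2x2(cdtb1_stx_pos, cdtb1_stx_neg,
                               other_stx_pos, other_stx_neg)
print(f"chi2 = {chi2:.1f}, P < 0.001: {p_value < 0.001}")
```

Under these assumptions the statistic is far above the critical value for P = 0.001 with one degree of freedom (about 10.83), in line with the reported significance.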

Fig. 3.

Virulence genes that were significantly associated with different lineages of E. albertii. The distribution of different virulence genes in E. albertii was visualized using Phandango [43]. The lineages of E. albertii are labelled with different colours. The presence of a gene is marked with a coloured box. Only genes or gene clusters significantly associated with lineages are shown.

Other virulence genes, including paa, efa1 and the bundle-forming pilus encoding bfp genes, were found to be variably present in E. albertii and are summarised in Table S4. One genome assembly (ERR1953722) from L5 was found to harbour invasive plasmid pINV genes [56]. However, further investigation by read mapping found that this was most likely due to contamination (data not shown).

Drug resistance genes and their high prevalence in some STs of E. albertii

The presence of AR genes was screened using the NCBI AMRFinder database [16]. Among the 482 isolates, 52.3% (252/482) harboured AR genes, 41.9% (202/482) were MDR (harbouring AR genes conferring resistance to >=3 different drug classes), and 13.1% (63/482) harboured genes conferring resistance to 10 to 14 different drug classes and were regarded as highly resistant. Notably, 72.3% (146/202) of the predicted MDR isolates were from China. The Chinese isolates had an AR rate of 88.2% and an MDR rate of 85.9%, with 61 isolates (35.9%) predicted to be highly resistant. The predicted AR drug classes are shown in Fig. 4 and include sulfamethoxazole-trimethoprim, cephalosporin, streptomycin and beta-lactam antibiotics, among others. The AR genes observed in each isolate are shown in Table S5. We determined resistance profiles by ST and found that some STs contained a high proportion of MDR isolates. The predicted MDR rates in ST4638, ST4479, ST4633 and ST4488 were >=80% (Fig. 2b). Additionally, 63.2% of the isolates in ST4606 were highly resistant. Of the isolates in the top six STs in China, which represented 84.7% (144/170) of the Chinese isolates, 93.8% (135/144) were predicted to be MDR and 41.7% (60/144) were highly resistant. In contrast, isolates from the USA and UK had a relatively lower predicted MDR rate (26.2%, 39/149); the MDR isolates were mainly observed in ST5390, ST4619 and ST4638, with only one highly resistant isolate (Fig. 2). By CC, CC2, CC4 and CC6 had high MDR rates. CC1 carried hardly any resistance genes, while CC3 and CC5 had low levels of carriage of resistance genes.
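The resistance categories used above (AR, MDR at three or more drug classes, highly resistant at 10 to 14 drug classes) can be sketched as a simple classifier. The sketch assumes that each isolate's AR genes have already been mapped to drug classes; the function names, isolate names and drug-class labels are hypothetical, and the upper bound of 14 is treated as the observed maximum rather than a hard limit.

```python
def resistance_flags(drug_classes):
    """Classify one isolate from the drug classes its AR genes cover."""
    n = len(set(drug_classes))
    return {
        "n_classes": n,
        "ar": n >= 1,                 # harbours at least one AR gene
        "mdr": n >= 3,                # resistance to >= 3 drug classes
        "highly_resistant": n >= 10,  # 10 to 14 classes in the study
    }


def resistance_rates(isolates):
    """Percentage of isolates in each category, rounded to one decimal."""
    total = len(isolates)
    flags = [resistance_flags(classes) for classes in isolates.values()]
    return {
        key: round(100 * sum(f[key] for f in flags) / total, 1)
        for key in ("ar", "mdr", "highly_resistant")
    }


# Hypothetical isolates with made-up drug-class labels.
example_isolates = {
    "iso1": [],
    "iso2": ["tetracycline"],
    "iso3": ["tetracycline", "sulfonamide", "aminoglycoside"],
    "iso4": [f"class_{i}" for i in range(10)],
}
rates = resistance_rates(example_isolates)
```

Highly resistant isolates are also counted as MDR and AR here, which matches how the percentages in the text are nested.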

Fig. 4.

Predicted resistance to drug classes in E. albertii. Isolates that harboured genes conferring resistance to different drug classes are shown in purple. The two columns headed 1 and 2 denote combinations of two drugs as follows: 1 = chloramphenicol and florfenicol, 2 = phenicol and quinolone. Isolates with plasmids predicted by PlasmidFinder and MOB-suite (respectively) are also highlighted.

Plasmids and plasmid associated drug resistance and virulence genes

We first analysed the 17 complete genomes for the carriage of plasmids. There were 34 intact plasmids ranging from 19 118 bp to 266 043 bp (Table S6). Nineteen plasmids had previously been reported [1, 17, 20], while 15 plasmids were newly identified in this study. We further performed plasmid typing using PlasmidFinder and MOB-suite [32, 40]. PlasmidFinder identifies plasmids by replicon type [32]; however, a plasmid may carry more than one replicon type. MOB-suite predicts plasmids using the relaxase gene and groups the predicted plasmids into different MOB types [40]; however, some plasmids have no relaxase gene. Thus, both methods were used to predict and identify plasmids in all isolates. Among the 482 isolates, PlasmidFinder found that 86.7% (418/482) harboured plasmids, with a total of 54 replicon types detected. Thirty-four replicon types were each present in more than ten isolates, and 26 replicon types were significantly associated with lineages (P<0.001) (Table S7): for example, IncFII(29)_pUTI89 with L2; Col156 with L3; IncFII(pSE11) with L4; and IncX1, IncX9, IncHI2, IncHI2A and RepA with L5 and L8. By MOB-suite, a total of 1854 plasmid sequences were predicted in 427 of the 482 isolates, with an average of 4.3 plasmids per genome, while 55 isolates had no plasmids predicted. The vast majority (90.3%, 1674/1854) of the predicted plasmids were grouped into 170 MOB types, with the remaining 9.7% (180/1854) being novel with no MOB type. There were 47 MOB types each present in >=10 isolates, 36 of which were significantly associated with different lineages, which is concordant with the findings from replicon types (Table S7). Additionally, there were 64 isolates in which neither replicon types nor MOB types were observed, including 77.3% (17/22) of the L6 isolates (Fig. 3). However, 35.9% (23/64) of these isolates harboured AR genes, and 72.3% of L6 isolates were predicted to be MDR.
Plasmids are known to be responsible for the acquisition of MDR genes. Among the 34 intact plasmids, nine were found to harbour AR genes (Table S6). One newly identified MDR plasmid, ESA136_plas1 (Accession No. CP070297.2), which is of MOB type AA738 and replicon types IncHI2, IncHI2A and RepA, contained 15 AR genes conferring resistance to 13 drug classes. Since plasmids were not fully assembled in the draft genomes, we used statistical association to determine which plasmid types were likely to carry the MDR genes. Note that this analysis was not intended to determine whether these plasmid types were more likely to be associated with MDR in general. By PlasmidFinder, 13 replicon types were found to be significantly associated with MDR genes (P<0.001, Chi-square test) (Fig. S4). However, this analysis may be biased when the MDR genes are not located on the same plasmid as the replicon genes. This bias can be resolved by MOB-suite, which offers the predicted plasmid sequences from the draft genomes. We therefore screened the plasmid replicon genes and MDR genes on the MOB-suite predicted plasmids. Ten replicon types were confirmed to be significantly more likely to be observed in MDR isolates (P<0.001), including IncQ, IncN, ColE10, IncHI2A, IncHI2, RepA, IncFII(pSE11), IncX9, IncFII(pHN7A8) and IncX1. The predicted odds ratio (OR) values ranged from 6.1 to infinity (Fig. 5a). Further, each MOB type possessed one to eight plasmid replicon genes, indicating MOB typing is of higher resolution than replicon typing (Table S8). Five MOB types, AE928, AA860, AA738, AA334 and AA327, were significantly associated with MDR genes (P<0.001, OR 15.0 to infinity) (Fig. 5b). Moreover, the MDR associated replicon types and MOB types were mainly observed in L4, L5 and L8, which had a high proportion of MDR isolates, consistent with the inference that the MDR genes were carried by these plasmid types.
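The association measures reported above (a Chi-square P-value and an odds ratio that can reach infinity) can be illustrated with a minimal sketch on a 2x2 table of plasmid type presence against MDR status. The function names and the example counts are hypothetical and are not taken from the study; the sketch also assumes an uncorrected Pearson Chi-square test with one degree of freedom, since the text does not state whether a continuity correction was applied.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table (hypothetical layout):

                        MDR    non-MDR
        type present     a        b
        type absent      c        d
    """
    if b * c == 0:
        # A zero in the denominator gives an infinite OR when a*d > 0,
        # which is how "OR = infinity" arises for a plasmid type that is
        # never seen in non-MDR isolates.
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson Chi-square statistic and P-value (1 d.f.)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("a marginal total is zero; the test is undefined")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom the upper-tail probability of the
    # Chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts, for illustration only.
example_or = odds_ratio(40, 5, 60, 95)
example_chi2, example_p = chi_square_2x2(40, 5, 60, 95)
```

With small expected counts, an exact test would usually be preferred over the Chi-square approximation; the sketch keeps to the test named in the text.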

Fig. 5.

Multidrug resistance (MDR) associated plasmid subtypes. (a) Replicon types detected. (b) MOB types detected. Those types significantly associated with MDR are marked with *** (P-value<0.001). The proportion of drug resistance (%) for each replicon or MOB type was shown as a colour legend.

Lastly, we also assessed whether any virulence genes were carried by plasmids. Among the 34 intact plasmids, 27 harboured virulence genes. Two plasmids from bat source isolates harboured the Type II secretion system and the putative heat-stable enterotoxin gene astA [57] (Table S6). Moreover, some lineage-restricted virulence genes were observed in the MOB-suite predicted plasmids, including the LngA-lngX gene cluster, the iucA-iucD gene cluster, and the hlyABCD gene cluster.

Prophages and carriage of resistance and virulence genes

PHASTER was used to search for prophages in the 17 complete genomes [41]. A total of 207 prophages were identified: 130 were intact, 50 were incomplete and 27 were indeterminant (Table S9). The size of the intact prophage genomes ranged from 11.163 to 98.311 kb. Most of the intact prophages were integrated into the chromosomes, with 11 (8.5%) located on plasmids. We grouped the 130 intact prophages based on a tree generated from the presence/absence of prophage genes using Roary v3.11.2 [35], and a nucleotide dot plot generated using Gepard v1.3 [42]. Gepard is a useful tool for grouping diverse prophages [58]. As seen in Fig. 6, the darker the colour in the dot plot, the more similar the sequences were. There were five main squares with dense dots, corresponding to five main groups of prophages (G1-G5). G5 was more diverse and could potentially be further subdivided into subgroups. Of the prophages in G1 and G2, 50% (4/8) and 85.7% (6/7), respectively, were from the two bat source isolates.
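The idea behind a nucleotide dot plot, in which dense squares of dots indicate groups of similar sequences, can be shown with a small sketch. This is a naive exact word-match approach on the forward strand only, written for illustration; it is not Gepard's algorithm, and the function names and the default word length `k` are assumptions.

```python
def dot_plot(seq_a, seq_b, k=4):
    """Return the set of (i, j) positions where the k-mer starting at
    position i of seq_a is identical to the k-mer starting at position j
    of seq_b. Each pair corresponds to one dot in the plot."""
    # Index every k-mer of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    # Look up every k-mer of seq_a in that index.
    dots = set()
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], ()):
            dots.add((i, j))
    return dots


def dot_density(seq_a, seq_b, k=4):
    """Fraction of k-mer positions in seq_a with at least one match in
    seq_b; a crude stand-in for how 'dark' a dot-plot cell appears."""
    positions = len(seq_a) - k + 1
    if positions <= 0:
        return 0.0
    matched = {i for i, _ in dot_plot(seq_a, seq_b, k)}
    return len(matched) / positions
```

Concatenating the prophage sequences and plotting the resulting dots would give squares along the diagonal for groups of mutually similar prophages, which is the pattern used to delineate G1 to G5.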

Fig. 6.

Clustering of the intact prophages of E. albertii. (a) Accessory binary gene presence tree of the prophages constructed using Roary v3.11.2 [35]. The five main groups of prophages were labelled with different strip colours. There were 15 prophages with phage cluster types in the Microbe Versus Phage (MVP) database; the 15 MVP phage cluster types were labelled. (b) Dot plot of similarity of prophages using the nucleotide dot plot tool Gepard [42]; the five prophage groups were marked.

Based on the annotation of the 130 intact prophages, genes that were present in only one prophage group were identified using Scoary [36] and were designated as group-specific gene markers for each of the prophage groups. By screening the group-specific genes among the draft genomes, G1 was predicted to be present in 34.4% (166/482) of the isolates, with at least two specific genes of G1 identified in these genomes. G2 was predicted to be in 3.7%, G3 in 46.7%, G4 in 59.1% and G5 in 96.1% of the 482 isolates (Fig. S5). In terms of lineage distribution, G3 prophage specific genes were more likely to be observed in L5 and L8, and G4 prophages in L3, L4 and L8 (P<0.001, OR value >3.9). G1 prophage specific genes were negatively associated with L3 and L6, G3 prophages with L2, L3, L6 and L7, and G4 prophages with L2, L5, L6 and L7 (P<0.001, OR value <0). There were 27 T3SS non-LEE effector genes present in 59 of the 130 intact prophages, 64.7% of which were in G5 prophages (Table S9). Two intact G5 prophages were positive for both stx2f and cdtABC genes. Additionally, there were three intact prophages harbouring AR genes and all three prophages were located on plasmids.
The MVP database collected viral genomes and prophage sequences from bacterial and archaeal genomes [44]. Those virus and prophage genomes were clustered based on their sequence similarity, with unified cluster types assigned [44]. By nucleotide comparison with the MVP representative phage clusters database using blastn, only 13.1% (17/130) of the intact prophage sequences were previously recorded in the MVP database, belonging to 15 phage cluster types (Fig. 6a), indicating a high diversity of prophages in E. albertii that have not been recorded in the database. Interspecies transmissions of prophages were observed: among the 15 MVP phage clusters, 11 prophages were previously observed in ; cluster 12645 was previously observed in both and ; and cluster 17047 from , while five phage clusters were only observed in E. albertii. Among the five groups of prophages, MVP phage clusters were observed in G1, G3, G4 and G5, indicating that G2 is a new prophage group specific for E. albertii.
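The screening of draft genomes for prophage groups described in this section can be sketched as follows. This is a simplified illustration that finds group-specific genes by plain set difference rather than with Scoary, and all function names and inputs are hypothetical. The threshold of at least two group-specific genes follows the statement made for G1; applying the same threshold to every group is an assumption.

```python
def group_specific_genes(prophage_genes, prophage_group):
    """Find genes seen in exactly one prophage group.

    prophage_genes: dict mapping prophage id -> set of gene names
    prophage_group: dict mapping prophage id -> group label (e.g. "G1")
    Returns a dict mapping group label -> set of group-specific genes.
    """
    # Pool the genes of all prophages belonging to each group.
    genes_by_group = {}
    for prophage_id, genes in prophage_genes.items():
        group = prophage_group[prophage_id]
        genes_by_group.setdefault(group, set()).update(genes)

    # Keep only genes that do not occur in any other group.
    specific = {}
    for group, genes in genes_by_group.items():
        other_genes = set()
        for other_group, other in genes_by_group.items():
            if other_group != group:
                other_genes |= other
        specific[group] = genes - other_genes
    return specific


def predict_groups(genome_genes, specific, min_hits=2):
    """Call a prophage group present in a draft genome when at least
    min_hits of its group-specific genes are found in that genome."""
    return {
        group
        for group, markers in specific.items()
        if len(markers & genome_genes) >= min_hits
    }
```

A genome carrying a single marker of a group would not be called positive for that group under this rule, which reduces calls driven by one stray gene.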

Discussion

E. albertii is a recently defined species of Escherichia, with infections previously wrongly attributed to and owing to the lack of sufficient subtyping techniques [1, 2, 21]. The eae gene and the cdtB gene have since been used for identification [10, 24, 59]. However, neither gene was present in all isolates or unique to E. albertii. In this study, only eae positive samples were cultured for E. albertii, which would have missed any potential eae negative isolates. However, this bias is also reflected in the global collection of isolates. Among the 312 publicly available genomes, only three (1.0%) were eae negative. E. albertii as a pathogen is defined by its attaching and effacing pathogenicity [7, 9], and thus it is unsurprising that nearly all isolates carried the virulence determinant eae. Since there were no markers to identify eae negative E. albertii, it is possible that the population diversity is much larger, and that the studies to date, including this study, have only assessed the population structure of the attaching and effacing E. albertii.
Isolation and identification of E. albertii have been hampered by the difficulties of differentiating it from other species biochemically. It is now possible to use genetic markers to identify E. albertii. Lindsey et al. reported that the gene EAKF1_ch4033 is specific to E. albertii [24]. However, our analysis found that four of the 482 isolates (0.83%) were negative for the marker (false negatives) and some genomes (e.g. ST378) harboured fragments of this gene and thus were potential false positives. We found three specific gene markers that were present in all E. albertii genomes analysed in this study and absent in all 30 021 representative and genomes screened. By blastn in the NCBI web database, all markers are specific to E. albertii, except for EAKF1_ch3804 being found in one complete genome (ST9286). Therefore, there are now four markers available for the identification of E. albertii, with two markers (EAKF1_ch4075c and EAKF1_ch0408c) offering 100% sensitivity and specificity, which will facilitate future studies of the population diversity of E. albertii.
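The sensitivity and specificity figures discussed above follow from simple counts, as the short sketch below shows. It uses only the numbers stated in the text (four marker-negative isolates among 482, and 30 021 non-target genomes screened with no hits for the new markers); the function and variable names are hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of target genomes in which the marker is detected."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no target genomes supplied")
    return true_positives / total


def specificity(true_negatives, false_positives):
    """Proportion of non-target genomes in which the marker is absent."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("no non-target genomes supplied")
    return true_negatives / total


# EAKF1_ch4033: negative in 4 of the 482 isolates.
ch4033_sensitivity = sensitivity(482 - 4, 4)

# A new marker present in all 482 isolates and absent from all
# 30 021 non-target genomes screened.
new_marker_sensitivity = sensitivity(482, 0)
new_marker_specificity = specificity(30021, 0)
```

These values describe performance on the genomes screened in this study only; a marker found later in a non-target genome, as reported for EAKF1_ch3804, would lower its specificity.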

E. albertii is phylogenetically diverse

A previous study defined two clades of E. albertii [21], which is supported by this study. Further, we defined eight lineages. The previously defined clade 1 corresponds to L1, and clade 2 was further divided into seven lineages (L2 to L8). Isolates causing human infection were observed in all eight lineages, indicating that all lineages are potentially pathogenic to humans. Ooka et al. defined five groups (G1 to G5) based on 34 genomes, of which 32 genomes were included in this study [13]. All their G1 isolates belonged to L1 or clade 1, and all G2 to G5 isolates to clade 2 (Table S1). G2 isolates were divided into L7 and L8, while all G4 isolates belonged to L2. The majority of the G3 isolates belonged to none of the lineages, except for two isolates in L4 and one isolate in L5. There is only one isolate in G5, which belonged to none of the lineages in clade 2. Our lineages have also been mapped to STs and CCs by seven-gene MLST [31]; for example, ST4638 and ST5390 belonged to L5 and L6, respectively. STs or CCs can be used as hallmarks for different lineages of E. albertii when genomic information is not available, which would facilitate comparison between different studies and surveillance of global spread and MDR by MLST. Although the isolates sequenced may not be representative, lineages were of significantly different proportions in different geographic regions: L5 (represented by ST4638) and L8 (represented by four STs) were more common in China, and L3 and L6 were only observed in Europe and North America.
Hyma et al. found that some serotype 13 isolates and one serotype 7 isolate, K-1, belonged to E. albertii [60]. In this study, there were 20 isolates of serotype 13 according to ShigEiFinder [33]. These belonged to L3, except for one isolate that belonged to L1. The genome or MLST sequences of the serotype 7 isolate K-1 are unavailable. However, based on its eae (Accession No. AY696839) and cdtB (Accession No. AY696753) gene sequences, the eae subtype was tau and the cdtB type was cdtB-VI. The combination of eae subtype tau and cdtB-VI was found in eight isolates belonging to a branch phylogenetically closer to L3, but in none of the seven lineages in clade 2. Thus, serotype 7 isolate K-1 does not belong to any of the lineages but may be phylogenetically closer to L3. Overall, our study showed that the diversity of E. albertii is high and new lineages are likely to be identified as more isolates are sequenced.

Virulence gene variation in different lineages of E. albertii

The T3SS and the Cdt are the main virulence factors present in the vast majority of the isolates. However, the subtypes of eae and cdtB were phylogenetically diverse. The eae gene was more diverse than the cdtB gene, and different lineages were dominated by different eae subtypes. Thus, it is likely that multiple independent acquisitions of the eae subtypes have occurred in E. albertii. There were seven new eae subtypes identified, and these eae subtypes were phylogenetically distant from each other, indicating potential independent acquisition. It is also possible that these new eae subtypes evolved within E. albertii. For the cdtB gene, cdtB-II was dominant and present in all lineages except L3, L4 and L5, whereas the newly defined cdtB-VI was found in L3, L4 and L5. Given the phylogenetic relationship of the lineages, cdtB-VI must have replaced cdtB-II in L3-L5. However, it is unclear whether cdtB-VI evolved within E. albertii or was acquired from other species. Moreover, some subtypes of eae and cdtB were prevalent in but were rare in and vice versa. For example, cdtB-III and cdtB-V were common in Shiga toxin-producing E. coli, but were not observed in E. albertii [49, 61]; the prevalent eae subtypes were not common in [62]; and the eae iota2 was observed in serovar 13 isolates, which are in fact E. albertii [60]. The eae and cdt genes seem to have been acquired by E. albertii multiple times during its long evolutionary history. More studies are required to elucidate the intra- and inter-species transfer of eae and cdt genes in the genus Escherichia.
Some virulence genes and pathogenicity islands were found to be associated with certain lineages. ETT2, which contributes to motility and serum resistance (which is essential for invasive infections) in [15], was truncated in L3 and L6, while in the other lineages only the yqeF gene of ETT2 was absent. Experimental evaluation is required to determine whether ETT2 is functional without the yqeF gene in E. albertii.
Yersinia HPI encodes the siderophore yersiniabactin (Ybt) for iron scavenging, which causes oxidative stress in host cells and contributes to invasive extra-intestinal infections [52]. HPI comprises 11 genes, all of which were only observed in L6 isolates of E. albertii. Moreover, the iuc gene cluster, including iucABCD encoding the siderophore aerobactin and iutA encoding the ferric aerobactin receptor for iron acquisition [51, 52], was mainly present in L3, L4 and one isolate of L6. More studies are required to evaluate the pathogenicity of those lineages that were equipped with different iron uptake systems. There were other lineage-restricted virulence genes such as T6SS, hlyABCD and the lng gene cluster. Although their expression remains unknown, these lineage-restricted virulence factors may result in variation in pathogenicity and environmental survival among lineages [15, 55, 63].
Plasmid-mediated acquisition of virulence genes was observed in E. albertii. The lineage-restricted hlyABCD genes, the iuc gene cluster and the lng gene cluster were observed in MOB-suite predicted plasmids, indicating plasmid-mediated acquisition, which was supported by previous studies in [51, 55, 63]. The two isolates from bats harboured a plasmid with T2SS genes and the metalloprotease encoding stcE gene. T2SS genes are critical for the survival and pathogenicity of bacteria [64]. The stcE gene, which is located on the pO157 plasmid, contributes to the intimate adherence of EHEC and atypical 13 [65, 66]. Like plasmids, prophages were also found to have contributed to the acquisition of virulence genes in E. albertii. The non-LEE effector genes of the T3SS were observed in intact prophages and were found to be significantly associated with the G5 prophages defined in this study. A previous report that lambdoid prophages carried various T3SS secretion effectors supports this finding [14].
Altogether, plasmids and prophages play key roles in the transfer of virulence genes in E. albertii and may facilitate large changes in pathogenicity like those seen in the pathovars of [19].

Plasmid-mediated AR genes were associated with STs and geographic regions

The predicted MDR rate in Chinese isolates is astonishingly high (85.9%, 146/170), with 35.9% highly resistant isolates. These results are supported by previous phenotypic results, which found isolates resistant to up to 14 clinically relevant drugs and 11 drug classes [17]. Importantly, resistance was observed in clinically relevant drug classes including sulfamethoxazole-trimethoprim, cephalosporin, streptomycin and beta-lactam antibiotics [67]. However, it should be noted that the MDR isolates were mainly obtained from poultry in China, and more clinical isolates are required to evaluate their clinical significance. It is likely that poultry source MDR passes down the food production chain to humans, posing a threat to human health. There is an urgent need for surveillance and control of the spread of MDR E. albertii. Using MLST, we identified some STs that were associated with MDR in China. ST4638, ST4479, ST4633 and ST4488 carried proportionally more MDR isolates and were mainly from China, which should facilitate the surveillance of MDR E. albertii. MDR in North America and Europe is emerging, and the MDR associated STs from these continents were different from those of China. This may be due to the different control strategies on the use of antibiotics in different countries.
In this study, we identified plasmid types that were significantly associated with MDR using both PlasmidFinder and MOB-suite, suggesting that the drug resistance genes were carried by plasmids. As the genomes were mostly draft genomes with incomplete plasmid sequences, further studies are required to understand the structure of these plasmids and the carriage of the resistance genes. Moreover, most of the L6 isolates harboured AR/MDR genes without predicted plasmids, which indicates potential new plasmids or prophages, or other means of MDR acquisition in L6.

Conclusion

In this study, the population structure of E. albertii was elucidated based on 170 genomes from China and 312 genomes from other countries. There were eight lineages identified, seven of which (L2-L8) belonged to the previously defined clade 2. Isolates causing human infection were found in all lineages, suggesting that most E. albertii have some pathogenicity. However, the uneven distribution of many virulence factors suggests that the degree of pathogenicity may differ across the lineages. The predicted MDR rate and MDR gene profiles varied between regions, STs and CCs, with Chinese isolates and STs being predominantly MDR. Plasmid replicon and MOB types that were related to the acquisition of MDR genes were identified in Chinese isolates. E. albertii contained a large number of prophages, which were divided into five groups, with G5 prophages found to have contributed to the acquisition of the T3SS non-LEE effector genes. Therefore, prophages and plasmids played key roles in creating the virulence and MDR repertoires of E. albertii. Our findings provided fundamental insights into the population structure, virulence variation and MDR of E. albertii. Moreover, three new specific gene markers were identified to facilitate the identification of this emerging foodborne pathogen.

66 in total

Review 1. The cytolethal distending toxin family.

Authors: C L Pickett; C A Whitehouse
Journal: Trends Microbiol Date: 1999-07 Impact factor: 17.079

Review 2. Longus pilus of enterotoxigenic Escherichia coli and its relatedness to other type-4 pili--a minireview.

Authors: J A Girón; O G Gómez-Duarte; K G Jarvis; J B Kaper
Journal: Gene Date: 1997-06-11 Impact factor: 3.688

3. Isolation of Escherichia albertii from Raw Chicken Liver in Fukuoka City, Japan.

Authors: Nanami Asoshima; Masanori Matsuda; Kumiko Shigemura; Mikiko Honda; Hidehiro Yoshida; Takahiro Oda; Hiroshi Hiwaki
Journal: Jpn J Infect Dis Date: 2015 Impact factor: 1.362

4. Association of cytolethal distending toxin-II gene-positive Escherichia coli with Escherichia albertii, an emerging enteropathogen.

Authors: Atsushi Hinenoya; Noritomo Yasuda; Natsuko Mukaizawa; Sikander Sheikh; Yuko Niwa; Sharda Prasad Awasthi; Masahiro Asakura; Teizo Tsukamoto; Akira Nagita; M John Albert; Shinji Yamasaki
Journal: Int J Med Microbiol Date: 2017-09-20 Impact factor: 3.473

5. Escherichia albertii in wild and domestic birds.

Authors: J Lindsay Oaks; Thomas E Besser; Seth T Walk; David M Gordon; Kimberlee B Beckmen; Kathy A Burek; Gary J Haldorson; Dan S Bradway; Lindsey Ouellette; Fred R Rurangirwa; Margaret A Davis; Greg Dobbin; Thomas S Whittam
Journal: Emerg Infect Dis Date: 2010-04 Impact factor: 6.883

Review 6. Type Three Secretion System in Attaching and Effacing Pathogens.

Authors: Meztlli O Gaytán; Verónica I Martínez-Santos; Eduardo Soto; Bertha González-Pedrajo
Journal: Front Cell Infect Microbiol Date: 2016-10-21 Impact factor: 5.293

7. MVP: a microbe-phage interaction database.

Authors: Na L Gao; Chengwei Zhang; Zhanbing Zhang; Songnian Hu; Martin J Lercher; Xing-Ming Zhao; Peer Bork; Zhi Liu; Wei-Hua Chen
Journal: Nucleic Acids Res Date: 2018-01-04 Impact factor: 16.971

8. The Escherichia coli Type III Secretion System 2 Has a Global Effect on Cell Surface.

Authors: Alexander Shulman; Yael Yair; Dvora Biran; Thomas Sura; Andreas Otto; Uri Gophna; Dörte Becher; Michael Hecker; Eliora Z Ron
Journal: mBio Date: 2018-07-03 Impact factor: 7.867

9. An extensive repertoire of type III secretion effectors in Escherichia coli O157 and the role of lambdoid phages in their dissemination.

Authors: Toru Tobe; Scott A Beatson; Hisaaki Taniguchi; Hiroyuki Abe; Christopher M Bailey; Amanda Fivian; Rasha Younis; Sophie Matthews; Olivier Marches; Gad Frankel; Tetsuya Hayashi; Mark J Pallen
Journal: Proc Natl Acad Sci U S A Date: 2006-09-21 Impact factor: 11.205

10. VFDB 2016: hierarchical and refined dataset for big data analysis--10 years on.

Authors: Lihong Chen; Dandan Zheng; Bo Liu; Jian Yang; Qi Jin
Journal: Nucleic Acids Res Date: 2015-11-17 Impact factor: 16.971
