
Whole-genome sequence analysis of multidrug-resistant uropathogenic strains of Escherichia coli from Mexico.

G L Paniagua-Contreras¹, E Monroy-Pérez¹, C E Díaz-Velásquez², A Uribe-García¹, A Labastida³, F Peñaloza-Figueroa³, P Domínguez-Trejo⁴, L R García⁴, F Vaca-Paniagua²,⁵,⁶, S Vaca¹.

Abstract

Background: Escherichia coli is the main bacterium associated with urinary tract infections (UTIs), including cystitis and pyelonephritis. Uropathogenic E. coli (UPEC) harbors numerous genes that encode diverse virulence factors contributing to its pathogenicity. The treatment of UTIs has become complicated due to the natural selection of E. coli strains that are multiresistant to several groups of antibiotics regularly used in clinical settings such as hospitals. Genomic reports of the global composition and distribution of the antibiotic resistance and virulence genes of these pathogenic strains are lacking for the Mexican population. Purpose and methods: The aim of this study was to globally characterize the genomes of a group of UPEC strains by massive parallel sequencing to determine the prevalence and distribution of virulence and antibiotic resistance genes associated with different serotypes and phylogenetic groups.
Results: The strains exhibited 138–197 virulence genes and 29 antibiotic resistance genes related to antibiotics commonly used in clinical practice. Conclusions: These findings are relevant to the definition of new strategies for treating urinary tract infections in public hospitals and private practice. Larger studies are needed to further define the epidemiological distribution and composition of these virulence and antibiotic resistance genes.


Keywords: antibiotic resistance genes; virulence genes; whole-genome sequencing

Year: 2019 PMID: 31447566 PMCID: PMC6682767 DOI: 10.2147/IDR.S203661

Source DB: PubMed Journal: Infect Drug Resist ISSN: 1178-6973 Impact factor: 4.003

Introduction

Escherichia coli is the main bacterium associated with urinary tract infections (UTIs),1 including cystitis and pyelonephritis.2 Uropathogenic E. coli (UPEC) harbors numerous genes that encode diverse virulence factors contributing to its pathogenicity. These virulence factors include adhesins, toxins, capsule, serum resistance factors and iron uptake systems, among others.3 E. coli strains are identified serologically by their surface antigens O (lipopolysaccharide), H (flagellar) and K (capsular).4 Phylogenetic analyses group E. coli into eight main phylogenetic groups, seven belonging to E. coli sensu stricto (A, B1, B2, C, D, E and F) and one belonging to cryptic clade I.5 The treatment of UTIs has become complicated due to the selection of E. coli strains that are resistant to several groups of antibiotics regularly used in clinical settings such as hospitals.6 E. coli genome sizes vary by approximately 1 Mb, from 4.5 to 5.5 Mb.7 For example, the genome of E. coli K-12 MG1655 is 4.64 Mb,8 that of EHEC O157:H7 Sakai is 5.50 Mb,9 and that of UPEC CFT073 is 5.23 Mb.10 These genomic differences are due principally to the insertion or deletion of large chromosomal regions that encode pathogenicity-associated islands, whose deletion significantly decreases the virulence of UPEC strains in a mouse infection model.11,12 A few reports from Mexico have focused on the molecular analysis of UPEC strains;13–15 however, work addressing the genomics of these pathogens is lacking. Therefore, the purpose of this study was to globally characterize the genomes of a group of UPEC strains to determine the prevalence and distribution of virulence and antibiotic resistance genes associated with serotypes and phylogenetic groups. We found between 138 and 197 virulence genes per strain that may contribute to pathogenicity, and 29 antibiotic resistance genes overall. 
The serotypes and sequence types of the strains were mainly O25:H4-ST131 and O8:H9-ST423, belonging to the B2 and B1 phylogroups, respectively. This is the first genomic analysis of UPEC strains in Mexico and highlights the need to continue with the genotypic characterization of UPEC strains to understand genomic bacterial factors that are clinically relevant to urinary tract infections.

Materials and methods

Urinary tract E. coli strains

We used 24 UPEC strains isolated from 20 women and 4 men (age range 36–73 years) with community-acquired, nonrecurrent urinary tract infections who attended a Family Medical Unit (UMF) of the Mexican Institute of Social Security (IMSS) in the municipality of Tlalnepantla, Edo. de Mexico, from August to December 2013. The local ethics committee of the UMF approved the study, and the patients signed an informed consent letter to participate. These strains were selected for their high frequency of resistance to beta-lactam antibiotics (ampicillin, cephalothin and carbenicillin), fluoroquinolones (pefloxacin) and trimethoprim-sulfamethoxazole, and for their high frequency of virulence genes detected by PCR,16 such as fimH (type 1 fimbriae), iha (iron-regulated-gene-homologue adhesin), usp (uropathogenic-specific protein), irp2 (iron-repressible protein) and kpsMT (K antigen). Each strain was cultured on EMB (eosin methylene blue) agar and isolated as clonal colonies.

DNA extraction

Genomic bacterial DNA (gDNA) was extracted with the DNeasy Blood & Tissue Kit (Qiagen) following the manufacturer’s instructions. The DNA concentration was quantified by fluorometry with the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific, USA), and the integrity and purity of the material were verified by agarose gel electrophoresis and spectrophotometry, respectively.

Library preparation

The gDNA extracted from each UPEC clone was used for library preparation with the Nextera XT Kit (Illumina) following the manufacturer’s instructions. Briefly, 1 ng of DNA was fragmented and amplified with Nextera XT barcodes in limited-cycle PCR (12 cycles). The PCR product was purified with Agencourt AMPure XP magnetic beads (Beckman Coulter). The libraries were diluted to 4.0 nM and then pooled. Library quality was evaluated by DNA quantification with Qubit and by Bioanalyzer (Agilent) profiling with a High-Sensitivity DNA Kit. The pooled barcoded libraries were diluted to 12 pM.
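The dilution steps above rest on converting a mass concentration (as measured with Qubit) into molarity, which also requires the mean fragment size (as read from a Bioanalyzer trace). The sketch below assumes the common approximation of 660 g/mol per base pair of double-stranded DNA and uses the C1·V1 = C2·V2 relation for dilutions; the function names, the 500 bp fragment size and the 600 µL final volume are hypothetical illustrations, not values from this study.

```python
"""Hypothetical helpers for library molarity and dilution arithmetic."""

# Approximate mass of one base pair of double-stranded DNA (g/mol).
DSDNA_G_PER_MOL_PER_BP = 660.0


def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA library concentration (ng/uL) into nanomolar (nM).

    ng/uL equals mg/L; dividing by the molar mass (660 g/mol per bp times the
    mean fragment length) gives mol/L, and the factor 1e6 folds in both the
    mg-to-g (1e-3) and the mol-to-nmol (1e9) conversions.
    """
    return conc_ng_per_ul * 1e6 / (DSDNA_G_PER_MOL_PER_BP * mean_fragment_bp)


def dilution_volumes(stock_nm, target_nm, final_ul):
    """Return (stock_ul, diluent_ul) such that C1*V1 = C2*V2."""
    if target_nm > stock_nm:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_nm * final_ul / stock_nm
    return stock_ul, final_ul - stock_ul


if __name__ == "__main__":
    # Hypothetical library: 1.2 ng/uL with a 500 bp mean fragment size.
    print(f"{library_molarity_nm(1.2, 500):.2f} nM")
    # From a 4.0 nM pool to 12 pM (0.012 nM) in a hypothetical 600 uL final volume.
    stock, diluent = dilution_volumes(4.0, 0.012, 600)
    print(f"{stock:.2f} uL pool + {diluent:.2f} uL diluent")
```

This only illustrates the arithmetic; the instrument vendor's denaturation and dilution workflow includes intermediate steps that are not modeled here.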

Whole-genome sequencing

Whole-genome sequencing was performed at the Laboratorio Nacional en Salud, FES-Iztacala, UNAM, on a MiSeq (Illumina) instrument with paired-end reads (v3 chemistry, 2×300). E. coli CFT073 (GenBank: AE014075.1)10,17 was used as reference genome 1, and Escherichia coli EC958 O25b:H4 ST131 (GenBank: HG941718.1 and HG941719.1)17,18 was used as reference genome 2.

Quality filtering, alignment and de novo assembly

FASTQ files were cleaned with cutadapt 1.4 (https://cutadapt.readthedocs.org/) by trimming Illumina adapters, uncalled bases and low-quality read ends (bases below Q20). The R package Rqc v1.10.2 was used to measure Q scores and A, T, G and C content. The quality-filtered reads were aligned to the E. coli CFT073 reference with Bowtie v2.3.3.1 (Ref.19) and sorted by chromosomal coordinates with Picard Tools v2.15.0 (Broad Institute). The mapping qualities of all reads were collected with the Rsamtools package v1.28.0.20 Overlapping read pairs were merged into single reads with FLASH2 v2.2.00.21 To avoid missing genes that were absent from the reference E. coli strains, de novo genome assembly (without any guiding reference genome) was performed with SPAdes v3.11.1 (Ref.22) in single-end (SE) mode with multiple k-mer sizes (21, 33, 55, 77, 99, 107, 117 and 127), using as input the forward reads (for read pairs with no overlap) and the FLASH2-merged reads. To evaluate the quality of each genome assembly, we used QUAST v4.5 to measure its total length and N50 value.23 The obtained values were compared with the genome sizes of the E. coli model strains MG1655 and CFT073 and with the N50 values previously reported by the SPAdes developers for Illumina paired-end libraries of E. coli MG1655 (http://cab.spbu.ru/software/spades/). As an additional quality test, we aligned the reads of each of our 24 UPEC libraries against the corresponding genome assembly with Bowtie 2. The percentage of reads aligned to the genome was used as a measure of assembly completeness. The complete results of the genome assembly quality tests can be found in Supplementary materials.
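Quality trimming of read ends with a Q20 cutoff, as described above, can be illustrated with the BWA-style 3′-end algorithm that the cutadapt documentation describes: the cutoff is subtracted from each base quality, partial sums are accumulated from the 3′ end, and the read is cut where that sum is largest. The sketch below is a simplified re-implementation for illustration only, not cutadapt's code; it assumes Phred+33 encoded quality strings and ignores adapter removal and cutadapt's other options. The function names and the example read are hypothetical.

```python
def phred33_scores(quality_string):
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def quality_trim_index(scores, cutoff=20):
    """Return the index at which the read should be cut at its 3' end.

    Walking from the 3' end, accumulate (cutoff - q). The cut point is where
    this running sum is largest; the scan stops once the sum turns negative,
    i.e. once enough high-quality bases have been seen.
    """
    stop = len(scores)
    running = 0
    best = 0
    for i in range(len(scores) - 1, -1, -1):
        running += cutoff - scores[i]
        if running < 0:
            break
        if running > best:
            best = running
            stop = i
    return stop


def trim_read(sequence, quality_string, cutoff=20):
    """Trim a read and its quality string to the computed 3' cut point."""
    cut = quality_trim_index(phred33_scores(quality_string), cutoff)
    return sequence[:cut], quality_string[:cut]


if __name__ == "__main__":
    # Hypothetical read: four Q40 bases ('I') followed by Q10 ('+') and Q5 ('&') bases.
    print(trim_read("ACGTACG", "IIII++&"))
```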

Gene identification, pangenome and synteny analyses

The genome assemblies were annotated with RAST (online version Nov 2017)24,25 and Prokka v1.12,26 using the default settings for both tools. The COG triangles and OrthoMCL algorithms included in the GET_HOMOLOGUES27,28 software package were used to identify orthologous gene families among the genes annotated by Prokka and RAST in the 24 genome assemblies. GET_HOMOLOGUES was also used to count the orthologous gene families shared by different numbers of the 24 strains and, thus, to define the sizes of the core genome and pangenome of these isolates. Finally, we used GET_HOMOLOGUES to determine whether the pangenome was open or closed by measuring the increase in pangenome size as the genes of the 24 strains were added to the gene set randomly, one strain at a time.29 To study the synteny among our 24 strains and the genomes of the UPEC model strains CFT073 and EC958 ST-131, the orthologue cluster data generated by GET_HOMOLOGUES were converted into i-ADHoRe input.30 The i-ADHoRe tool was then used to identify syntenic blocks of at least 5 contiguous genes with a maximum discrepancy of 1 gene (only one gene lost or added) between each pair of strains.
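The core/soft-core/shell/cloud partition used in this analysis can be sketched from a presence/absence table of orthologous families. The thresholds below are assumptions for illustration: core means present in every genome, soft core present in at least 95% of genomes (but not all), cloud present in at most two genomes, and shell everything else. GET_HOMOLOGUES computes its own classes internally; the hypothetical `classify_families` function is not its code.

```python
from collections import Counter


def classify_families(presence, n_genomes, soft_core_fraction=0.95, cloud_max=2):
    """Assign each orthologous family to core, soft_core, shell or cloud.

    presence: dict mapping family ID -> set of genome IDs carrying it.
    Classes are mutually exclusive here (soft_core excludes core families).
    """
    classes = {}
    for family, genomes in presence.items():
        k = len(genomes)
        if k == n_genomes:
            classes[family] = "core"
        elif k >= soft_core_fraction * n_genomes:
            classes[family] = "soft_core"
        elif k <= cloud_max:
            classes[family] = "cloud"
        else:
            classes[family] = "shell"
    return classes


if __name__ == "__main__":
    strains = [f"EC{i:03d}" for i in range(24)]  # 24 placeholder strain IDs
    toy = {
        "famA": set(strains),        # all 24 strains
        "famB": set(strains[:23]),   # 23 of 24 (>= 95% of 24 = 22.8)
        "famC": set(strains[:10]),
        "famD": {strains[0]},
    }
    classes = classify_families(toy, len(strains))
    print(classes)
    print(Counter(classes.values()))
```

With 24 genomes, the 95% threshold (22.8) means a family needs 23 genomes to count as soft core.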

Serotype, MLST analysis and phylogenetic group assignation

The genome assemblies were uploaded to SeroTypeFinder v1.1 (Ref.31), which identifies E. coli serotypes from draft or complete genome assemblies. This online tool relies on a database of alleles of the O-antigen-related genes wzx, wzy, wzm and wzt and the H-antigen-related genes fliC, flkA, fllA, flmA and flnA. The assemblies were also subjected to MultiLocus Sequence Typing (MLST) with the online tool MLST v1.8 (Ref.32), which determined the sequence type (ST) of each of the 24 assembled genomes. Each ST corresponds to a specific combination of alleles of the adk, fumC, gyrB, icd, mdh, purA and recA genes. To place the 24 UPEC isolates within the known E. coli phylogroups (A, D, E, B1 or B2), we aligned the sequences of adk, fumC, gyrB, icd, mdh, purA and recA from the 24 assembled genomes and from 17 E. coli model strains of known phylogroup. We constructed a maximum likelihood phylogenetic tree from a MAFFT 7.313 (Ref.33) nucleotide alignment. The phylogroups of the 24 isolates were determined from their location in the phylogeny.
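Sequence-type assignment, as described above, reduces to looking up the combination of allele numbers at the seven housekeeping loci in a table of known profiles. The sketch below assumes that allele numbers have already been called for each locus; the profile table holds made-up allele numbers and placeholder ST labels (not real scheme entries), and `assign_sequence_type` is a hypothetical illustration rather than the MLST tool's implementation.

```python
# The seven housekeeping loci of the E. coli MLST scheme named in the text.
MLST_LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")

# Hypothetical profile table: allele-number tuples (in MLST_LOCI order) -> ST label.
# Real profiles must be taken from the scheme's curated database.
EXAMPLE_PROFILES = {
    (1, 2, 3, 4, 5, 6, 7): "ST-X",
    (1, 2, 3, 4, 5, 6, 8): "ST-Y",
}


def assign_sequence_type(alleles, profiles=EXAMPLE_PROFILES):
    """Return the ST for a dict of locus -> allele number.

    Returns None when any locus lacks an allele call, and "novel" when the
    complete profile is absent from the table.
    """
    key = tuple(alleles.get(locus) for locus in MLST_LOCI)
    if any(allele is None for allele in key):
        return None
    return profiles.get(key, "novel")


if __name__ == "__main__":
    calls = dict(zip(MLST_LOCI, (1, 2, 3, 4, 5, 6, 8)))
    print(assign_sequence_type(calls))
```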

Identification of acquired antimicrobial resistance genes

To identify antimicrobial resistance genes, the genome assemblies were uploaded to ResFinder v2.1.34 ResFinder identifies acquired antimicrobial resistance genes in fully or partially sequenced bacterial isolates using a manually curated gene database. An identity threshold of 90% and a minimum coverage of 60% were used as filters for the ResFinder search. The search was performed for the 24 assembled genomes and two additional E. coli model strains: CFT073 and EC958 ST-131.
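The identity and coverage filters applied to the ResFinder search (90% identity, 60% coverage) can be expressed as a simple predicate on each alignment hit. In this sketch, coverage is taken as the fraction of the reference resistance gene spanned by the alignment; the `ResistanceHit` record, its fields and the example values are hypothetical, and ResFinder applies these thresholds internally rather than through this code.

```python
from dataclasses import dataclass


@dataclass
class ResistanceHit:
    gene: str
    identity_pct: float      # percent identity of the alignment
    aligned_ref_bases: int   # reference-gene positions covered by the alignment
    ref_length: int          # full length of the reference gene


def coverage_pct(hit):
    """Percentage of the reference gene covered by the alignment."""
    return 100.0 * hit.aligned_ref_bases / hit.ref_length


def passes_filters(hit, min_identity=90.0, min_coverage=60.0):
    """Keep a hit only if it meets both the identity and coverage thresholds."""
    return hit.identity_pct >= min_identity and coverage_pct(hit) >= min_coverage


if __name__ == "__main__":
    hits = [
        ResistanceHit("geneA", 99.5, 800, 800),  # full-length, high identity
        ResistanceHit("geneB", 97.0, 400, 800),  # only 50% coverage
        ResistanceHit("geneC", 85.0, 800, 800),  # identity below 90%
    ]
    print([h.gene for h in hits if passes_filters(h)])
```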

Identification of virulence genes

To identify the virulence genes of the 24 UPEC genomes, we first obtained the FASTA sequences of the genes included in the Virulence Factor Database (VFDB),35 consulted on December 7, 2017. This VFDB version included 30,178 genes that encode 1,796 different virulence (or virulence-related) factors (VFs) in 30 bacterial genera. We used nucleotide-nucleotide BLAST (blastn) v2.7.1+ to align the complete set of VFDB genes with each of the 24 genome assemblies and the genomes of CFT073 and EC958 ST-131. Only alignments that covered 100% of the VFDB query gene with a sequence identity ≥90% were retained. When several VFDB genes aligned to the same locus of an assembly, only the VFDB gene that aligned successfully with the greatest number of the 26 genomes was considered. Finally, the positions of the VFs found with the VFDB gene alignments were compared with the Prokka and RAST annotations to unify the feature names and to identify putative VF loci that were not detected by the genome annotators. The 349 VFDB genes whose orthologues were identified in the 26 studied genomes were manually classified into several functional categories based on the gene descriptions annotated in the VFDB.
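The two VFDB filtering rules above (full-length query coverage with at least 90% identity, then keeping one VFDB gene per overlapping locus, chosen by the number of genomes in which it was found) can be sketched over BLAST tabular output. The sketch assumes blastn was run with `-outfmt "6 qseqid sseqid pident qstart qend qlen sstart send"`; the helper names and the example hits are hypothetical, and grouping overlapping hits into one locus by interval overlap is a simplifying assumption about how loci were defined.

```python
import csv
from collections import defaultdict


def parse_full_length_hits(lines, min_identity=90.0):
    """Parse blastn outfmt 6 rows and keep full-length, high-identity hits.

    Expected columns: qseqid sseqid pident qstart qend qlen sstart send.
    Returns tuples (vfdb_gene, contig, start, end) with start <= end.
    """
    kept = []
    for row in csv.reader(lines, delimiter="\t"):
        gene, contig, pident, qstart, qend, qlen, sstart, send = row
        qstart, qend, qlen = int(qstart), int(qend), int(qlen)
        covers_whole_query = min(qstart, qend) == 1 and max(qstart, qend) == qlen
        if covers_whole_query and float(pident) >= min_identity:
            lo, hi = sorted((int(sstart), int(send)))
            kept.append((gene, contig, lo, hi))
    return kept


def _pick_best(cluster, contig, genomes_per_gene):
    """Within one locus, keep the VFDB gene detected in the most genomes."""
    lo, hi, gene = max(cluster, key=lambda item: genomes_per_gene.get(item[2], 0))
    return (gene, contig, lo, hi)


def resolve_overlaps(hits, genomes_per_gene):
    """Collapse overlapping hits on the same contig into one gene per locus."""
    by_contig = defaultdict(list)
    for gene, contig, lo, hi in hits:
        by_contig[contig].append((lo, hi, gene))
    resolved = []
    for contig, items in sorted(by_contig.items()):
        items.sort()
        cluster, cluster_end = [items[0]], items[0][1]
        for item in items[1:]:
            if item[0] <= cluster_end:  # overlaps the current locus
                cluster.append(item)
                cluster_end = max(cluster_end, item[1])
            else:
                resolved.append(_pick_best(cluster, contig, genomes_per_gene))
                cluster, cluster_end = [item], item[1]
        resolved.append(_pick_best(cluster, contig, genomes_per_gene))
    return resolved


if __name__ == "__main__":
    blast_rows = [
        "vfA\tcontig1\t99.0\t1\t300\t300\t1000\t1299",
        "vfB\tcontig1\t95.0\t1\t300\t300\t1309\t1010",  # reverse strand, same locus
        "vfC\tcontig1\t80.0\t1\t300\t300\t5000\t5299",  # identity too low
        "vfD\tcontig2\t99.0\t1\t250\t300\t1\t250",      # partial query coverage
    ]
    hits = parse_full_length_hits(blast_rows)
    # Hypothetical counts of genomes (out of 26) in which each gene was found.
    print(resolve_overlaps(hits, {"vfA": 20, "vfB": 26}))
```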

Results

Resistance to antibiotics in UPEC strains

We first evaluated the antibiotic resistance phenotypes of the strains. In these assays, 100% of the E. coli strains analyzed (n=24) were resistant to cephalothin, ampicillin and carbenicillin (Table 1), 79% (n=19) to pefloxacin, 75% (n=18) to cefotaxime, 71% (n=17) to trimethoprim-sulfamethoxazole, 54% (n=13) to gentamicin, 50% (n=12) to ceftriaxone, 37.5% (n=9) to netilmicin, 29% (n=7) to chloramphenicol, and 25% (n=6 each) to nitrofurantoin and amikacin.

Table 1

Serotypes (O and H antigens), sequence types (STs), phylogenetic group assignations and antibiotic resistance phenotypes. The strains are grouped by their specific combination of serotype and ST

Sample	Serotype H	Serotype O	Sequence type	Phylogroup	Antibiotic resistance phenotype (no. of resistant strains)
					CF (n=24)	CRO (n=12)	AM (n=24)	SXT (n=17)	CTX (n=18)	NET (n=9)	PEF (n=19)	NF (n=6)	CL (n=7)	AK (n=6)	GE (n=13)	CB (n=24)
EC022	H4	O25	ST-131	B2	R	S	R	R	R	S	R	R	S	S	S	R
EC153	H4	O25	ST-131	B2	R	S	R	S	R	R	R	S	S	S	R	R
EC037	H4	O25	ST-131	B2	R	S	R	R	R	S	R	S	R	S	S	R
EC131	H4	O25	ST-131	B2	R	R	R	R	R	R	R	R	R	S	R	R
EC101	H4	O25	ST-131	B2	R	R	R	R	R	R	R	S	S	S	R	R
EC144	H4	O25	ST-131	B2	R	S	R	S	S	S	R	R	S	S	R	R
EC307	H4	O25	ST-131	B2	R	R	R	S	R	S	R	S	S	S	S	R
EC199	H4	O25	ST-131	B2	R	S	R	R	S	S	S	S	S	S	S	R
EC025	H4	O25	ST-131	B2	R	S	R	S	R	R	R	S	S	S	R	R
EC053	H5	O16	ST-131	B2	R	S	R	R	R	S	R	S	S	R	S	R
EC010	H5	O75	ST-1193	B2	R	S	R	R	R	S	S	R	S	S	S	R
EC102	H5	O75	ST-1193	B2	R	R	R	R	R	R	R	S	R	R	R	R
EC084	H5	O75	ST-14	B2	R	S	R	S	S	S	R	S	R	S	R	R
EC112	H9	O8	ST-423	B1	R	S	R	R	S	S	R	S	S	S	S	R
EC319	H9	O8	ST-423	B1	R	R	R	R	R	R	R	S	S	R	R	R
EC160	H9	O8	ST-423	B1	R	R	R	R	R	R	R	R	R	S	R	R
EC167	H9	O8	ST-423	B1	R	S	R	R	S	S	S	S	S	S	S	R
EC136	H9	O8	ST-423	B1	R	R	R	R	R	R	R	R	S	R	R	R
EC306	H11	O45	ST-297	B1	R	S	R	R	S	S	S	S	S	S	S	R
EC151	H48	O116	ST-3519	B1	R	R	R	S	R	S	S	S	S	S	S	R
EC013	H4	O116	ST-10	A	R	R	R	R	R	S	R	S	S	R	R	R
EC047	H2	NA	ST-69	D	R	R	R	R	R	S	R	S	R	S	S	R
EC067	H18	O17/O44	ST-69	D	R	R	R	R	R	R	R	S	R	R	R	R
EC011	H6	O1	ST-648	NA	R	R	R	S	R	S	R	S	S	S	R	R

Abbreviations: R, resistant; S, sensitive; NA, not available. Antibiotics: CF, cephalothin; CRO, ceftriaxone; AM, ampicillin; SXT, trimethoprim-sulfamethoxazole; CTX, cefotaxime; NET, netilmicin; PEF, pefloxacin; NF, nitrofurantoin; CL, chloramphenicol; AK, amikacin; GE, gentamicin; CB, carbenicillin.
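The resistance percentages reported above follow directly from the R/S calls in Table 1 (for example, 19 of 24 pefloxacin-resistant strains is 79%). The sketch below tallies per-antibiotic prevalence from such rows and flags multidrug resistance using the widely used criterion of resistance to at least one agent in three or more antimicrobial categories. The category map is a simplified assumption for illustration (for instance, all cephalosporins are pooled), not the classification used by the authors, and the function names are hypothetical.

```python
ANTIBIOTICS = ("CF", "CRO", "AM", "SXT", "CTX", "NET", "PEF", "NF", "CL", "AK", "GE", "CB")

# Simplified, assumed mapping of each antibiotic to an antimicrobial category.
CATEGORY = {
    "CF": "cephalosporins", "CRO": "cephalosporins", "CTX": "cephalosporins",
    "AM": "penicillins", "CB": "penicillins",
    "SXT": "folate pathway inhibitors",
    "NET": "aminoglycosides", "AK": "aminoglycosides", "GE": "aminoglycosides",
    "PEF": "fluoroquinolones", "NF": "nitrofurans", "CL": "phenicols",
}


def parse_calls(calls):
    """Turn a whitespace-separated R/S string (Table 1 column order) into a dict."""
    values = calls.split()
    if len(values) != len(ANTIBIOTICS):
        raise ValueError("expected one call per antibiotic")
    return dict(zip(ANTIBIOTICS, values))


def resistance_prevalence(rows):
    """Percentage of strains resistant to each antibiotic."""
    n = len(rows)
    return {ab: 100.0 * sum(row[ab] == "R" for row in rows) / n for ab in ANTIBIOTICS}


def is_multidrug_resistant(row, min_categories=3):
    """True if the strain is resistant to agents in >= min_categories categories."""
    categories = {CATEGORY[ab] for ab, call in row.items() if call == "R"}
    return len(categories) >= min_categories


if __name__ == "__main__":
    # Calls for EC022 and EC199 copied from Table 1.
    rows = [parse_calls("R S R R R S R R S S S R"), parse_calls("R S R R S S S S S S S R")]
    print(resistance_prevalence(rows))
    print([is_multidrug_resistant(r) for r in rows])
```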


Massive parallel sequencing metrics

To capture the genomic diversity of UPEC, we sequenced strains with different patterns of PCR-detected virulence genes and a high frequency of resistance to multiple antibiotics, which is representative of the diversity of these infections.16 To assess sequencing quality, we first evaluated the Phred scores and the mapping parameters of the trimmed sequences. The per-base quality showed a typical Illumina pattern, with >80% of bases at Q30 or above overall, and all libraries had at least 100,000 reads and at least 25% high-quality mapped reads. The assemblies presented a median N50 of 177,458 bp (SD: 51,745 bp) and an average of 123 contigs (SD: 62), suggesting relatively low assembly fragmentation. As a second test of the quality and completeness of our genome assemblies, the forward reads of each library were aligned with Bowtie 2 against the corresponding whole-genome assembly. The percentages of aligned reads were 97.76% (PE; SD: 1.45%) and 99.49% (SE; SD: 0.47%), indicating highly complete assemblies. The most abundant serotype-ST combinations were O25:H4 (ST-131) and O8:H9 (ST-423), with 9 and 5 representatives among the 24 UPEC strains, respectively. The serotypes and sequence types (STs) of the 24 studied strains are indicated in Table 1. The 24 UPEC isolates belonged to 4 E. coli phylogroups (as determined from their position in the E. coli phylogeny) and presented 10 specific serotype-ST combinations. Only the serotypes of strains EC047 and EC067 were undetermined or ambiguous, respectively. EC067 can be assigned to either the O17 or the O44 antigen because of antigen cross-reactions; in addition, SeroTypeFinder can erroneously report the O17 antigen instead of O44.31
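N50, reported above as an assembly fragmentation metric, is the contig length at which contigs of that length or longer contain at least half of the total assembly length. The sketch below is a minimal illustration of that definition and of the aligned-read percentage, not QUAST's or Bowtie 2's implementation; the function names and the contig lengths and read counts in the example are made up.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


def percent_aligned(aligned_reads, total_reads):
    """Share of reads that map back to their own assembly, as a percentage."""
    return 100.0 * aligned_reads / total_reads


if __name__ == "__main__":
    print(n50([400_000, 300_000, 200_000, 100_000]))
    print(f"{percent_aligned(97_760, 100_000):.2f}%")
```

For the four hypothetical contigs, the two longest already reach 700 kb of the 1 Mb total, so the N50 is the length of the second contig, 300 kb.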

Gene content comparison and pangenome composition

The mean number of protein-coding genes detected by Prokka in the 24 genome assemblies was 4,853 (range: 4,418–5,113), while RAST detected 5,064 (range: 4,574–5,368). In comparison, the UPEC model strain CFT073 and EC958 ST-131 genomes harbor 5,379 and 5,100 genes, respectively (Table 2).10,17,18

Table 2

Numbers and types of genes detected by Prokka and RAST

Classification	Sample	Prokka CDS	Prokka rRNA	Prokka tRNA	RAST CDS	RAST rRNA	RAST tRNA
O25:H4 (ST-131) B2	EC153	4951	10	81	5193	18	80
	EC037	4948	10	81	5190	18	80
	EC022	4982	9	81	5221	16	80
	EC131	5113	10	84	5368	16	83
	EC101	4962	9	83	5164	19	83
	EC144	5091	9	81	5328	16	80
	EC307	5084	9	81	5321	16	80
	EC199	5010	10	81	5239	18	80
	EC025	4890	10	80	5102	17	80
O16:H5 (ST-131) B2	EC053	4651	11	79	4852	18	79
O75:H5 (ST-1193) B2	EC010	4832	8	84	5085	13	84
	EC102	4756	8	83	5002	12	84
O75:H5 (ST-14) B2	EC084	4874	9	84	5080	11	84
O8:H9 (ST-423) B1	EC112	4736	8	83	4922	18	81
	EC319	4737	8	85	4919	20	83
	EC160	4737	8	83	4923	19	81
	EC167	4755	8	84	4940	20	82
	EC136	4728	8	86	4919	20	84
O45:H11 (ST-297) B1	EC306	4467	9	78	4643	17	77
O116:H48 (ST-3519) B1	EC151	4418	10	82	4574	20	80
O16:H4 (ST-10) A	EC013	4912	8	89	5119	14	87
NA:H2 (ST-69) D	EC047	5015	7	83	5236	16	80
O17/O44:H18 (ST-69) D	EC067	4826	8	80	5013	16	78
O1:H6 (ST-648) NA	EC011	5010	8	91	5201	19	92

Abbreviation: NA, not available.

Orthologous families were clustered into the following groups: (1) core (present in all samples), (2) soft core (present in at least 95% of the samples), (3) shell (present in an intermediate number of genomes) and (4) cloud (present in only a few genomes). A total of 8,383 and 9,408 orthologous clusters were detected by GET_HOMOLOGUES in the Prokka and RAST annotations, respectively (Figure 1). The number of orthologous clusters present in all the strains was 3,136 for the Prokka annotations and 3,105 for the RAST annotations. Depending on the annotation method (Prokka or RAST), an average of 4,853 or 5,064 genes was detected in the 24 UPEC strains, which approaches the number of genes in the UPEC model strains CFT073 and EC958 (5,379 and 5,100, respectively). The difference between the numbers of genes detected by Prokka and RAST reflects accuracy differences between the two annotation algorithms that have been reported in the literature.26

Figure 1

Occurrence of homologous gene families detected in the 24 UPEC strains. The number of homologous gene families (y-axis) present in increasing numbers of strains (from at least 1 strain to all 24 strains) is shown. The numbers are displayed separately for the (A) Prokka and (B) RAST annotations. The genes of the cloud, shell, soft-core and core genome are indicated by color.

The total number of orthologous gene families detected in the 24 UPEC strains (their pangenome) ranges from 8,383 to 9,408, depending on the annotation. Among these pangenome gene families, we identified a total of 307 virulence factors. After clustering homologous gene families, we built a distance tree based on gene presence/absence with GET_HOMOLOGUES, which groups samples with similar gene content. The clustering results for the 24 UPEC isolates calculated from the Prokka and RAST annotations were very similar (Figure 2). This clustering strictly mirrors the distribution of the 24 UPEC isolates among E. coli phylogroups and their grouping by serotype and ST.
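A gene presence/absence distance of the kind used to build the clustering tree can be illustrated with the Jaccard distance between the sets of gene families carried by two strains (1 minus the number of shared families divided by the number of families found in either strain). This is an assumed choice of distance for illustration; GET_HOMOLOGUES may use a different metric, and the function names, strain names and family sets below are hypothetical.

```python
from itertools import combinations


def jaccard_distance(families_a, families_b):
    """1 - |A & B| / |A | B|; 0 means identical gene content."""
    union = families_a | families_b
    if not union:
        return 0.0
    return 1.0 - len(families_a & families_b) / len(union)


def distance_matrix(gene_sets):
    """Symmetric dict-of-pairs distance matrix for strain -> family-set mappings."""
    names = sorted(gene_sets)
    matrix = {(name, name): 0.0 for name in names}
    for a, b in combinations(names, 2):
        d = jaccard_distance(gene_sets[a], gene_sets[b])
        matrix[(a, b)] = matrix[(b, a)] = d
    return matrix


if __name__ == "__main__":
    toy = {
        "strain1": {"fam1", "fam2", "fam3"},
        "strain2": {"fam1", "fam2", "fam4"},
        "strain3": {"fam5"},
    }
    for pair, d in sorted(distance_matrix(toy).items()):
        print(pair, round(d, 2))
```

The resulting matrix could then be passed to any hierarchical clustering routine (for example, average linkage) to obtain a tree.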

Figure 2

Hierarchical clustering of the strains by gene presence/absence with genes annotated by Prokka and RAST. The pangenome was used for clustering with (A) Prokka and (B) RAST. A distance tree was constructed based on the presence or absence of genes. Samples with similar gene content clustered together.

We also explored whether the pangenome of the 24 UPEC strains was open (ie, comprising an orthologous family set that would grow in number if more UPEC strains were included in the analysis). To this end, we followed the approach developed by Tettelin et al.29 Briefly, the increase in pangenome size was measured as the genes of the 24 strains and those of UPEC model strains CFT073 and EC958 ST-131 were added to the gene set randomly, one strain at a time. The generated data were used to estimate a mathematical function that describes the growth tendency of the pangenome. The pangenomes of the 24 UPEC strains showed a continuous growth tendency, such that we predict their size would reach ~10,000 or ~15,000 total genes if 39 or 97 total strains, respectively, were included in the calculations.
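The accumulation procedure described above (adding genomes one at a time in random order and recording the pangenome size) can be sketched as follows. To describe growth, this sketch fits a power law P(N) = κN^γ (Heaps' law) by least squares on log-transformed values, where γ > 0 indicates an open pangenome. This is a swapped-in, simplified model chosen for illustration; the function fitted by GET_HOMOLOGUES following Tettelin et al. may differ, and the function names, toy gene sets and number of permutations are hypothetical.

```python
import math
import random


def accumulation_curve(gene_sets, n_permutations=100, seed=0):
    """Mean pangenome size after adding 1..N genomes in random order.

    gene_sets: dict mapping genome ID -> set of gene-family IDs.
    """
    rng = random.Random(seed)
    genomes = list(gene_sets.values())
    totals = [0.0] * len(genomes)
    for _ in range(n_permutations):
        rng.shuffle(genomes)
        seen = set()
        for k, families in enumerate(genomes):
            seen |= families
            totals[k] += len(seen)
    return [total / n_permutations for total in totals]


def fit_power_law(sizes):
    """Least-squares fit of log P = log(kappa) + gamma * log N, with N = 1..len(sizes)."""
    xs = [math.log(n) for n in range(1, len(sizes) + 1)]
    ys = [math.log(p) for p in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gamma = sxy / sxx
    kappa = math.exp(mean_y - gamma * mean_x)
    return kappa, gamma


def predict_size(kappa, gamma, n_genomes):
    """Extrapolated pangenome size for n_genomes under the fitted power law."""
    return kappa * n_genomes ** gamma


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical genomes: 3,000 shared families plus random accessory families.
    toy = {f"g{i}": set(range(3000)) | {rng.randrange(3000, 20000) for _ in range(500)}
           for i in range(10)}
    curve = accumulation_curve(toy)
    kappa, gamma = fit_power_law(curve)
    print(f"gamma = {gamma:.3f} ({'open' if gamma > 0 else 'closed'})")
    print(f"predicted size for 40 genomes: {predict_size(kappa, gamma, 40):.0f}")
```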

Synteny conservation of the 24 strains and 2 UPEC model strains

We used the i-ADHoRe software to identify syntenic genomic regions between each pair of strains.30 The median number of contiguous genes (located on the same contig) belonging to these syntenic regions ranged from 42 to 281 across strain comparisons. The assemblies with the highest N50, which tend to have larger contigs, also presented larger syntenic regions (represented in one or more fragments in other assemblies) (Supplementary materials). For each pair (a, b) of strain assemblies, the average percentage of the genes in the contigs of a that presented the same synteny in one or more regions of b varied from 71.3 to 100%. Hierarchical clustering of the 24 UPEC strains and strains CFT073 and EC958 ST-131 according to this metric reproduced the distribution of the strains among serotypes and E. coli phylogroups (Figure 3).
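The synteny criterion used with i-ADHoRe (blocks of at least 5 genes in the same order, tolerating one gene gained or lost) can be approximated with a simple collinearity scan over the ordered orthologue IDs of two contigs. This sketch is a simplification for illustration, not i-ADHoRe's algorithm: it assumes single-copy orthologues, detects only same-orientation blocks, and the function names and contig contents in the example are hypothetical.

```python
def collinear_blocks(contig_a, contig_b, min_genes=5, max_gap=1):
    """Find runs of orthologues that appear in the same order in both contigs.

    contig_a, contig_b: lists of orthologue-family IDs in chromosomal order.
    Consecutive block members may be separated by at most `max_gap` extra
    genes in either contig. Inverted blocks are not detected.
    """
    position_in_b = {family: j for j, family in enumerate(contig_b)}
    blocks, run = [], []  # run holds (index_in_a, index_in_b) pairs
    for i, family in enumerate(contig_a):
        j = position_in_b.get(family)
        if j is None:
            continue  # gene absent from contig_b; the gap check below handles it
        if run:
            prev_i, prev_j = run[-1]
            if 1 <= i - prev_i <= max_gap + 1 and 1 <= j - prev_j <= max_gap + 1:
                run.append((i, j))
                continue
            if len(run) >= min_genes:
                blocks.append([contig_a[k] for k, _ in run])
        run = [(i, j)]
    if len(run) >= min_genes:
        blocks.append([contig_a[k] for k, _ in run])
    return blocks


def syntenic_percentage(contig_a, contig_b, **kwargs):
    """Percentage of genes in contig_a that fall inside a collinear block."""
    in_blocks = sum(len(block) for block in collinear_blocks(contig_a, contig_b, **kwargs))
    return 100.0 * in_blocks / len(contig_a)


if __name__ == "__main__":
    a = ["f1", "f2", "f3", "f4", "f5", "f6", "f7"]
    b = ["f1", "f2", "f3", "x", "f4", "f5", "f6", "f7"]  # one inserted gene in b
    print(collinear_blocks(a, b))
    print(syntenic_percentage(a, b))
```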

Figure 3

Average percentage of the contigs (of their genes) of each strain a (column labels) that appear in one or more fragments with the same synteny in strain b (row labels). The values for all the comparisons between the 24 UPEC strains and the model strains CFT073 and EC958 ST-131 are shown. The strains were hierarchically clustered according to these values. The phylogroups and serotypes of the strains are indicated at the heatmap margins and in the row and column labels, respectively.

The percentage of the total genes of a given strain that showed the same synteny in another strain ranged from 73.2 to 98.9%, but the values were higher (92 to 100%) when only the genes shared by each pair of strains were taken into account. The clustering of the strains according to these statistics also reflected the phylogroup and serotype distribution (Supplementary materials).

We used the ResFinder v2.1 online tool to identify antibiotic resistance genes in our 24 genome assemblies.34 Antibiotic resistance genes were widely distributed, comprising 29 genes conferring resistance to aminoglycosides, fluoroquinolones, β-lactams, chloramphenicol, sulfonamides, tetracycline and trimethoprim. On average, each strain carried 6 antibiotic resistance genes, and the strain with the most carried 11. The genes with the highest prevalence among the strains were strA, strB, blaTEM-1B and sul2 (Table 3). Overall, the antibiotic resistance genotypes were concordant with the observed phenotypes.

Table 3

Distribution of antibiotic resistance genes

Serotype		O6:H1:K2	O25:H4	O25:H4	O25:H4	O25:H4	O25:H4	O25:H4	O25:H4	O25:H4	O25:H4	O25:H4	O16:H5	O75:H5
Resistance	Gene	CFT073	EC958	EC153	EC037	EC022	EC131	EC101	EC144	EC307	EC199	EC025	EC053	EC010
Aminoglycoside	aac(3)-IIa	–		X	X	X	X	–	X	X	X	X	–	–
Aminoglycoside	aac(3)-IId	–		–	–	–	–	–	–	–	–	–	–	–
Aminoglycoside	aadA1	–		–	–	–	–	–	–	–	–	–	–	–
Aminoglycoside	aadA2	–		–	–	–	–	–	–	–	–	–	–	–
Aminoglycoside	aadA5	–	X	X	X	X	X	–	–	–	–	–	X	–
Aminoglycoside	aph(3')-Ia	–		–	–	–	–	–	–	–	–	–	–	–
Aminoglycoside	strA	–		–	–	–	–	–	–	–	X	–	X	–
Aminoglycoside	strB	–		X	X	X	X	X	–	X	X	X	X	–
Aminoglycoside + fluoroquinolone	aac(6')-Ib-cr	–		–	–	–	–	X	–	–	–	–	–	X
Beta-lactamase	blaCMY-2	–		–	–	–	–	X	–	–	–	–	–	X
Beta-lactamase	blaCMY-23	–	X	–	–	–	–	–	–	–	–	–	–	–
Beta-lactamase	blaCTX-M-15	–	X	X	X	X	X	–	X	X	X	X	–	–
Beta-lactamase	blaOXA-1	–	X	X	X	X	X	–	X	X	X	X	–	–
Beta-lactamase	blaTEM-1B	–	X	–	–	–	–	–	–	–	X	–	–	–
Macrolide	mph(A)	–	X	X	X	X	X	–	X	X	–	–	X	–
Phenicol	catA1	–		–	–	–	–	–	–	–	–	–	–	–
Phenicol	catB4	–	X	X	X	X	X	–	X	X	X	X	–	–
Phenicol	floR	–		–	–	–	–	–	–	–	–	–	–	–
Sulphonamide	sul1	–	X	X	X	X	X	–	X	X	–	–	X	–
Sulphonamide	sul2	–		–	–	–	–	–	–	–	X	–	X	–
Sulphonamide	sul3	–		–	–	–	–	–	–	–	–	–	–	–
Tetracycline	tet(A)	–	X	–	–	–	X	–	–	–	X	X	X	–
Tetracycline	tet(B)	–		–	–	–	–	–	–	–	–	–	–	–
Trimethoprim	dfrA1	–		–	–	–	–	–	–	–	–	–	–	–
Trimethoprim	dfrA12	–		–	–	–	–	–	–	–	–	–	–	–
Trimethoprim	dfrA17	–	X	X	X	X	X	–	–	–	–	–	X	–
Trimethoprim	dfrA5	–		–	–	–	–	–	–	–	–	–	–	–
Trimethoprim	dfrA8	–		–	–	–	–	–	–	–	–	–	–	–
Trimethoprim	dfrB4	–		–	–	–	–	–	X	X	–	–	–	–


Detection of virulence genes

We identified virulence factor genes by BLAST alignment of the complete set of genes from the Virulence Factor Database (VFDB) against our 24 genome assemblies. We detected between 138 and 197 virulence factor genes in the 24 genome assemblies; 90 of them were shared by at least 23 of the UPEC isolates (and by the reference strains CFT073 and EC958 ST-131) (Figure 4). These conserved genes included the E. coli common pilus (ECP), curli and type I fimbriae genes; the che, flg, fli and mot genes related to motility and chemotaxis; the enterobactin siderophore genes; and the ompA gene, among others. Adherence, iron uptake, motility and chemotaxis, and secretion systems were the most represented functional categories of VFs in all strains. The distribution of genes across functional categories was characteristic of each phylogroup, serotype and ST. For instance, only the O25:H4 ST-131 and ST-14 isolates exhibited pap adherence genes. On the other hand, the yersiniabactin siderophore genes were found in all the isolates except for O8:H9 (ST-423), whereas the salmochelin siderophore genes occurred only in O8:H9 (ST-423) strains.

Figure 4

Number of virulence factor genes per functional category for the 24 assembled UPEC genomes. The strains are ordered by the phylogroup-ST. Each functional category is color coded.

Discussion

In this work, we characterized the complete genomes of 24 UPEC strains by massive parallel sequencing to investigate their content of antibiotic resistance and virulence factor genes. These UPEC strains are of clinical relevance given that, worldwide, 150 million people develop urinary tract infections each year, and E. coli is the primary infectious agent.36 In Mexico, the incidence of UPEC infections is high, reaching 3,000 per 100,000 inhabitants in 2018.37 We previously characterized UPEC strains from Mexican patients via nongenomic approaches, which are limited in terms of gene detection, and identified the collective participation of a large repertoire of expressed virulence genes during in vitro infection; this work therefore extends those studies.16,38 Overall, our sequencing and assembly quality metrics are similar to those in published works and were considered valid for the purpose of this study.39,40 The size of our 24-strain UPEC pangenome is smaller than the previously reported E. coli species pangenome (~18,000 clusters of orthologues), which was calculated with a strain set of comparable size (20 isolates) that included commensal and pathogenic strains from different E. coli phylogroups.41 Although factors such as the completeness of the assemblies or the orthologue search method can contribute to this difference, we expect that several genes characteristic of other E. coli lifestyles or phenotypes (for example, intestinal pathogens) are present in the species pangenome but absent from the UPEC pangenome. The 24 UPEC-strain pangenome shows a pronounced growth tendency: when the genes of the 24 isolates (along with strains CFT073 and EC958 ST-131) are added to the gene set one strain at a time, the pangenome continues to grow without reaching a saturation point, and we expect it to keep increasing steeply if more UPEC strains were taken into account. 
We therefore conclude that the pangenome of these UPEC strains is open, reflecting the behavior previously reported for the E. coli species, whose members frequently gain and lose genes by horizontal gene transfer.42 The 24 UPEC-strain core genome (the group of gene families shared by all the strains) includes 3,105 to 3,136 gene families, depending on the annotation, and is therefore larger than the E. coli species core genome (~2,000 orthologous gene families).41 This can be explained by the UPEC strains sharing a group of genes related to their specific pathogenic lifestyle that may be absent in other members of the E. coli species. Among the genes shared by all the strains (or absent in just one strain), we found 90 virulence factors. These conserved genes include well-known UPEC virulence determinants, such as the E. coli common pilus (ECP), curli and type I fimbriae genes, as well as the che, flg, fli and mot genes, which are related to motility and chemotaxis.43 When the 24 UPEC strains were clustered according to their gene family content, isolates from the same E. coli phylogroups and/or serotypes clearly grouped together. The same was true when the strains were clustered according to the number of syntenic contig regions they shared (Figures 2 and 3). The highest levels of both shared synteny and gene content similarity were observed between strains of the same serotype. The values were lower when two strains of the same phylogroup with different serotypes were compared, and lowest between strains of different phylogroups. Together, these results indicate that, among these 24 UPEC strains, different serotypes or phylogroups have distinct gene contents and that these distinctive genes tend to share the same synteny. 
The percentage of the genes of a given strain that exhibit the same synteny in another strain is between 73 and 99% for different strain pairs, but it is much higher (92 to 100%) if only the genes shared by both strains are considered. This suggests that the gain or loss of groups of genes, rather than gene rearrangement, is the main factor contributing to the differences in genomic structure between these 24 strains. This agrees with previous studies of E. coli genome dynamics, which suggested that members of the species share a common genomic backbone with conserved synteny (very limited rearrangements) composed of core-genome genes. Accessory genes, such as those related to pathogenicity, are inserted at prophages, pathogenicity islands (PAIs) or insertion sequences (IS) at different positions in the backbone.10,41 To better characterize the strains, we defined their serotypes and sequence types in silico; all but one of the strains were successfully assigned. The phylogroup was also assigned in silico. We then used the gene annotation data to hierarchically cluster the strains. The strains clustered in the same fashion as the distribution of phylogroups, serotypes and sequence types, which supports the validity of our genomic classification. The O25:H4 (ST-131) strains have been extensively studied as a group of globally widespread, multiresistant UPEC clones. The most common ST-131 isolates belong to lineage C, which has the O25:H4 serotype and is best known for its fluoroquinolone resistance (acquired through mutations in gyrA and parC) and for spreading the CTX-M-15 extended-spectrum β-lactamase gene.44,45 In most strains, 90 genes involved in adherence, motility and chemotaxis, iron uptake and secretion systems were identified. 
In addition, 87.5% (n=21) of the genomes carried antibiotic resistance genes, including genes related to aminoglycoside, fluoroquinolone, beta-lactam, macrolide, phenicol, sulfonamide, tetracycline and trimethoprim resistance (Table 3), and the phenotypic and genotypic profiles showed multidrug resistance (MDR) to β-lactams, sulfonamides and trimethoprim. Recently, in a large study of uropathogenic E. coli O25b-B2-ST131 (n=248), the most frequently identified virulence genes were kpsM2, sat, iucD, iutA, iha, fimA, fyuA, ompT, csgA and traT, and multidrug resistance to ceftazidime, cefotaxime, cefazolin, co-amoxiclav, amoxicillin, erythromycin, tetracycline, tigecycline, gentamicin and ciprofloxacin was also observed.46 In another study of UPEC strains (n=167), the most frequent ST was ST-131 (n=20), mainly corresponding to phylogenetic group B2; that study also identified ST-1193, ST-14, ST-10 and ST-69, which were also found in the present study (Table 1).47 In this study, group O25:H4-ST-131, belonging to phylogenetic group B2 of E. coli, was the most prevalent group among the patients studied. It is therefore important to establish epidemiological monitoring programs in the region focused specifically on controlling bacterial resistance to antimicrobials. A recent epidemiological study conducted in Mexico showed that E. coli O25:H4-ST-131 strains isolated from cystitis and prostatitis patients carried the fimH, papD, sfa, uspA, ipaH, hofB and hofC genes. Most of the cystitis strains were resistant to trimethoprim-sulfamethoxazole, ticarcillin and ciprofloxacin, whereas most strains associated with prostatitis were resistant to tetracycline, azithromycin, doxycycline, ticarcillin and ciprofloxacin.14 The composition and distribution of antibiotic resistance genes are among the major challenges in the clinical treatment of UPEC infections. 
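The gene-to-class reasoning behind the MDR calls can be sketched as a lookup from each acquired resistance gene to the antibiotic class it affects, followed by a count of distinct classes per strain. The lookup table below is a small, assumed subset of well-known gene/class associations, including sul1, dfrA17 and aadA5, which are discussed in this section. The strains are hypothetical. The cutoff of three or more classes is a common rule of thumb assumed here for illustration. The authors' exact MDR criterion is not stated in this section, and formal definitions group some classes differently (for example, sulfonamides together with trimethoprim).

```python
# Assumed, partial lookup: acquired resistance gene -> antibiotic class.
GENE_CLASS = {
    "blaCTX-M-15": "beta-lactams",
    "blaTEM-1": "beta-lactams",
    "aadA5": "aminoglycosides",
    "sul1": "sulfonamides",
    "sul2": "sulfonamides",
    "sul3": "sulfonamides",
    "dfrA17": "trimethoprim",
    "tet(A)": "tetracyclines",
    "catA1": "phenicols",
    "mph(A)": "macrolides",
    "qnrS1": "fluoroquinolones",
}

MDR_MIN_CLASSES = 3  # assumed cutoff for illustration only


def resistance_classes(genes):
    """Distinct antibiotic classes covered by a strain's acquired genes.
    Genes missing from the lookup table are ignored."""
    return {GENE_CLASS[g] for g in genes if g in GENE_CLASS}


def is_mdr(genes, min_classes=MDR_MIN_CLASSES):
    """Genotype-based MDR flag: resistance genes in >= min_classes classes."""
    return len(resistance_classes(genes)) >= min_classes


# Hypothetical genotypes (NOT data from this study).
genotypes = {
    "strain_1": {"blaCTX-M-15", "sul1", "dfrA17", "aadA5", "tet(A)"},
    "strain_2": {"sul1", "sul2"},
    "strain_3": set(),  # no acquired resistance genes detected
}

for name, genes in genotypes.items():
    print(name, sorted(resistance_classes(genes)), is_mdr(genes))
```

A genotype call like this only approximates phenotype. For instance, the fluoroquinolone resistance of ST-131 lineage C comes from point mutations in gyrA and parC. A gene-presence lookup cannot detect these mutations.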
We detected acquired antibiotic resistance genes in all the studied genomes except for CFT073. We found the CTX-M-15 extended-spectrum beta-lactamase gene (characteristic of E. coli O25:H4 ST-131) in EC958 ST-131 and in eight of the nine O25:H4 ST-131 UPEC isolates studied here. Identifying the gyrA1AB and parC1aAB alleles, which are responsible for the fluoroquinolone resistance of the O25:H4 ST-131 strains, would require a search for resistance-conferring mutations in the corresponding genes.44,45 The O25:H4 ST-131 and O8:H9 ST-423 strains exhibit characteristic antibiotic resistance gene content patterns. One limitation of this work is that we did not analyze the genetic context of the antibiotic resistance and virulence genes. However, the antibiotic resistance genes sul1 (sulfamethoxazole), dfrA17 (trimethoprim) and aadA5 (aminoglycosides) identified in the genomes of the UPEC strains (Table 3) have been found in the variable-region gene cassettes of class 1 integrons, which can be mobilized by plasmids or transposons.48 Furthermore, several of the virulence genes detected in the analyzed UPEC genomes have been found in pathogenicity islands (PAIs), such as PAI IV536 (yersiniabactin siderophore system), PAI I536 (α-hemolysin, CS12 fimbriae and F17-like fimbrial adhesin) and PAI IIJ96 (α-hemolysin, Prs fimbriae and cytotoxic necrotizing factor).49 The detection of strains belonging to distinct serogroups that are resistant to four or five unrelated families of antibiotics is of great concern. This high prevalence of antibiotic resistance is probably related to several types of inappropriate antibiotic use in Mexico.50 For example, antibiotics were sold without a medical prescription in Mexican drugstores until 2010. 
Eighty-three percent of the strains were resistant to sulfonamides (sul1, sul2 or sul3), and most of them were resistant to both trimethoprim and sulfonamides. This antibiotic combination is commonly prescribed for treating UTIs in Mexico, and its continued use may be selecting for resistant strains. Most strains were resistant to 6–11 antibiotics, and only one was resistant to a single antibiotic. This is epidemiologically relevant because the prevalence of such multiresistant strains in Mexico is currently unknown. Further studies involving a larger number of isolates are needed to better establish the distribution and composition of antibiotic resistance genes and virulence factors in UPEC strains from Mexico.14
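Prevalence figures are simple proportions over the strain set. Examples are the 83% of strains resistant to sulfonamides (sul1, sul2 or sul3) and the co-occurrence of sulfonamide and trimethoprim resistance. A minimal sketch follows, assuming each strain is represented by the set of acquired resistance genes detected in its genome. The four strains are hypothetical. Counting any gene whose name starts with "dfrA" as a trimethoprim resistance gene is a simplifying assumption.

```python
SUL_GENES = {"sul1", "sul2", "sul3"}


def has_sul(genes):
    """True if the strain carries any sulfonamide resistance gene."""
    return not genes.isdisjoint(SUL_GENES)


def has_dfr(genes):
    """Simplifying assumption: any dfrA* gene counts as trimethoprim resistance."""
    return any(g.startswith("dfrA") for g in genes)


def prevalence_pct(strains, predicate):
    """Percentage of strains for which predicate(genes) is true."""
    hits = sum(1 for genes in strains.values() if predicate(genes))
    return 100.0 * hits / len(strains)


# Hypothetical genotypes (NOT data from this study).
strains = {
    "s1": {"sul1", "dfrA17", "aadA5"},
    "s2": {"sul2", "dfrA12"},
    "s3": {"sul3"},
    "s4": {"tet(A)"},
}

sul_pct = prevalence_pct(strains, has_sul)
co_pct = prevalence_pct(strains, lambda g: has_sul(g) and has_dfr(g))
print(sul_pct, co_pct)
```

The reported 83% would correspond to 20 strains (100 × 20 / 24 ≈ 83.3%). This assumes the denominator is the same 24 genomes used for the 87.5% (n=21) figure.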

Conclusion

This is the first genomic analysis of UPEC strains to be carried out in Mexico. The characterized strains exhibited a multidrug-resistant phenotype and genotype, harbored a large number of virulence genes, and carried resistance genes to antibiotics commonly used in clinical practice. Group O25:H4-ST-131, belonging to phylogenetic group B2 of E. coli, was the most prevalent among the patients studied. These findings are relevant to the definition of new strategies for treating urinary tract infections in public hospitals and in private practice. To further define the epidemiological distribution and composition of the virulence and antibiotic resistance genes, larger studies are needed.

References

1. [Antibiotic use in Mexico: review of problems and policies].

Authors: Anahí Dreser; Veronika J Wirtz; Kitty K Corbett; Gabriela Echániz
Journal: Salud Publica Mex Date: 2008

2. A multiplex PCR method to detect 14 Escherichia coli serogroups associated with urinary tract infections.

Authors: Dan Li; Bin Liu; Min Chen; Dan Guo; Xi Guo; Fenxia Liu; Lu Feng; Lei Wang
Journal: J Microbiol Methods Date: 2010-04-28 Impact factor: 2.363

3. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli.

Authors: R A Welch; V Burland; G Plunkett; P Redford; P Roesch; D Rasko; E L Buckles; S-R Liou; A Boutin; J Hackett; D Stroud; G F Mayhew; D J Rose; S Zhou; D C Schwartz; N T Perna; H L T Mobley; M S Donnenberg; F R Blattner
Journal: Proc Natl Acad Sci U S A Date: 2002-12-05 Impact factor: 11.205

4. How to become a uropathogen: comparative genomic analysis of extraintestinal pathogenic Escherichia coli strains.

Authors: Elzbieta Brzuszkiewicz; Holger Brüggemann; Heiko Liesegang; Melanie Emmerth; Tobias Olschläger; Gábor Nagy; Kaj Albermann; Christian Wagner; Carmen Buchrieser; Levente Emody; Gerhard Gottschalk; Jörg Hacker; Ulrich Dobrindt
Journal: Proc Natl Acad Sci U S A Date: 2006-08-15 Impact factor: 11.205

5. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial "pan-genome".

Authors: Hervé Tettelin; Vega Masignani; Michael J Cieslewicz; Claudio Donati; Duccio Medini; Naomi L Ward; Samuel V Angiuoli; Jonathan Crabtree; Amanda L Jones; A Scott Durkin; Robert T Deboy; Tanja M Davidsen; Marirosa Mora; Maria Scarselli; Immaculada Margarit y Ros; Jeremy D Peterson; Christopher R Hauser; Jaideep P Sundaram; William C Nelson; Ramana Madupu; Lauren M Brinkac; Robert J Dodson; Mary J Rosovitz; Steven A Sullivan; Sean C Daugherty; Daniel H Haft; Jeremy Selengut; Michelle L Gwinn; Liwei Zhou; Nikhat Zafar; Hoda Khouri; Diana Radune; George Dimitrov; Kisha Watkins; Kevin J B O'Connor; Shannon Smith; Teresa R Utterback; Owen White; Craig E Rubens; Guido Grandi; Lawrence C Madoff; Dennis L Kasper; John L Telford; Michael R Wessels; Rino Rappuoli; Claire M Fraser
Journal: Proc Natl Acad Sci U S A Date: 2005-09-19 Impact factor: 11.205

6. Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12.

Authors: T Hayashi; K Makino; M Ohnishi; K Kurokawa; K Ishii; K Yokoyama; C G Han; E Ohtsubo; K Nakayama; T Murata; M Tanaka; T Tobe; T Iida; H Takami; T Honda; C Sasakawa; N Ogasawara; T Yasunaga; S Kuhara; T Shiba; M Hattori; H Shinagawa
Journal: DNA Res Date: 2001-02-28 Impact factor: 4.458

7. Prevalence and diversity of integrons and associated resistance genes in faecal Escherichia coli isolates of healthy humans in Spain.

Authors: Laura Vinué; Yolanda Sáenz; Sergio Somalo; Esther Escudero; Miguel Angel Moreno; Fernanda Ruiz-Larrea; Carmen Torres
Journal: J Antimicrob Chemother Date: 2008-08-15 Impact factor: 5.790

8. Fast and accurate long-read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2010-01-15 Impact factor: 6.937

9. Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths.

Authors: Marie Touchon; Claire Hoede; Olivier Tenaillon; Valérie Barbe; Simon Baeriswyl; Philippe Bidet; Edouard Bingen; Stéphane Bonacorsi; Christiane Bouchier; Odile Bouvet; Alexandra Calteau; Hélène Chiapello; Olivier Clermont; Stéphane Cruveiller; Antoine Danchin; Médéric Diard; Carole Dossat; Meriem El Karoui; Eric Frapy; Louis Garry; Jean Marc Ghigo; Anne Marie Gilles; James Johnson; Chantal Le Bouguénec; Mathilde Lescat; Sophie Mangenot; Vanessa Martinez-Jéhanne; Ivan Matic; Xavier Nassif; Sophie Oztas; Marie Agnès Petit; Christophe Pichon; Zoé Rouy; Claude Saint Ruf; Dominique Schneider; Jérôme Tourret; Benoit Vacherie; David Vallenet; Claudine Médigue; Eduardo P C Rocha; Erick Denamur
Journal: PLoS Genet Date: 2009-01-23 Impact factor: 5.917

10. The RAST Server: rapid annotations using subsystems technology.

Authors: Ramy K Aziz; Daniela Bartels; Aaron A Best; Matthew DeJongh; Terrence Disz; Robert A Edwards; Kevin Formsma; Svetlana Gerdes; Elizabeth M Glass; Michael Kubal; Folker Meyer; Gary J Olsen; Robert Olson; Andrei L Osterman; Ross A Overbeek; Leslie K McNeil; Daniel Paarmann; Tobias Paczian; Bruce Parrello; Gordon D Pusch; Claudia Reich; Rick Stevens; Olga Vassieva; Veronika Vonstein; Andreas Wilke; Olga Zagnitko
Journal: BMC Genomics Date: 2008-02-08 Impact factor: 3.969
