Literature DB >> 31068432

Complete Assembly of Escherichia coli Sequence Type 131 Genomes Using Long Reads Demonstrates Antibiotic Resistance Gene Variation within Diverse Plasmid and Chromosomal Contexts.

Arun Gonzales Decano¹, Catherine Ludden^2,3, Theresa Feltwell², Kim Judge², Julian Parkhill², Tim Downing⁴.

Abstract

The incidence of infections caused by extraintestinal Escherichia coli (ExPEC) is rising globally, which is a major public health concern. ExPEC strains that are resistant to antimicrobials have been associated with excess mortality, prolonged hospital stays, and higher health care costs. E. coli sequence type 131 (ST131) is a major ExPEC clonal group worldwide, with variable plasmid composition, and has an array of genes enabling antimicrobial resistance (AMR). ST131 isolates frequently encode the AMR genes bla CTX-M-14, bla CTX-M-15, and bla CTX-M-27, which are often rearranged, amplified, and translocated by mobile genetic elements (MGEs). Short DNA reads do not fully resolve the architecture of repetitive elements on plasmids to allow MGE structures encoding bla CTX-M genes to be fully determined. Here, we performed long-read sequencing to decipher the genome structures of six E. coli ST131 isolates from six patients. Most long-read assemblies generated entire chromosomes and plasmids as single contigs, in contrast to more fragmented assemblies created with short reads alone. The long-read assemblies highlighted diverse accessory genomes with bla CTX-M-15, bla CTX-M-14, and bla CTX-M-27 genes identified in three, one, and one isolates, respectively. One sample had no bla CTX-M gene. Two samples had chromosomal bla CTX-M-14 and bla CTX-M-15 genes, and the latter was at three distinct locations, likely transposed by the adjacent MGEs: ISEcp1, IS903B, and Tn2 This study showed that AMR genes exist in multiple different chromosomal and plasmid contexts, even between closely related isolates within a clonal group such as E. coli ST131.IMPORTANCE Drug-resistant bacteria are a major cause of illness worldwide, and a specific subtype called Escherichia coli ST131 causes a significant number of these infections. ST131 bacteria become resistant to treatments by modifying their DNA and by transferring genes among one another via large packages of genes called plasmids, like a game of pass-the-parcel. Tackling infections more effectively requires a better understanding of what plasmids are being exchanged and their exact contents. To achieve this, we applied new high-resolution DNA sequencing technology to six ST131 samples from infected patients and compared the output to that of an existing approach. A combination of methods shows that drug resistance genes on plasmids are highly mobile because they can jump into ST131's chromosomes. We found that the plasmids are very elastic and undergo extensive rearrangements even in closely related samples. This application of DNA sequencing technologies illustrates at a new level the highly dynamic nature of ST131 genomes.

Entities: Chemical

Keywords: MGE; antibiotic resistance; genome assembly; plasmid; sequencing

Mesh：

Year: 2019 PMID： 31068432 PMCID： PMC6506616 DOI： 10.1128/mSphere.00130-19

Source DB: PubMed Journal: mSphere ISSN： 2379-5042 Impact factor: 4.389

INTRODUCTION

Reported cases of bloodstream and urinary tract infections caused by extraintestinal pathogenic Escherichia coli (ExPEC) are increasing globally at an alarming rate (1). As a key source of ExPEC isolates worldwide, E. coli sequence type 131 (ST131) is regarded as a serious threat to public health, given its high level of antimicrobial resistance (AMR), as well as the broad spectrum of infections it causes in community and hospital settings (2, 3). E. coli ST131 is virulent (4) and has an expansive range of virulence factors (5, 6), especially those linked to uropathogenic E. coli (UPEC) (3, 7, 8). AMR and virulence genes allow ST131 to adapt to drug selection pressure and to survive in extraintestinal niches and are often encoded on mobile genetic elements (MGEs) (9), which means that the exact set of virulence and AMR genes in a single ST131 isolate may vary (8, 10). ST131 encodes a range of extended-spectrum β-lactamases (ESBLs) that hydrolyze third-line drugs, including cephalosporins, the most common of which is cefotaximase blaCTX-M-15 . Within ST131, clade C2 has more AMR genes than other clades and is typically blaCTX-M-15 positive, differentiating it from clade C1, which can be blaCTX-M-14 or blaCTX-M-27 positive (3, 8). Most ST131 AMR genes are reported to be carried on plasmids, circular self-replicating double-stranded DNA molecules that constitute part of the bacterial accessory genome (11–13). Plasmids can reduce bacterial cell fitness, but a number of postsegregation killing and stable plasmid inheritance mechanisms allow the stable maintenance of IncF plasmids in ST131 (14–16). The chromosomal integration of plasmid genes is most commonly facilitated by transposons, which can ensure the acquisition and conservation of such elements if there is no subsequent local recombination (17, 18). Identifying plasmid conjugation, recombination, and transposition could have value in tracking AMR genes associated with disease outbreaks and antibiotic treatment failures. Plasmids may be classified using incompatibility (Inc), relaxase (MOB), and mating pair formation system typing (19), but difficulties in plasmid genetic analysis and reconstruction arise with short-read data due to rearrangements driven by recombination, dense arrays of repetitive elements including transposable elements (TEs), changes in gene copy numbers, and high sequence variation. Methods using short reads alone may fail to detect genomic segments exchanged between plasmids and the chromosome, limiting evaluation of the core and accessory genomes. Whole-genome sequencing has provided a high resolution of the genomic epidemiology of ST131 and plasmid-mediated AMR outbreaks (20). However, short reads alone are insufficient to resolve plasmids that often have numerous small MGEs of ∼1 kb or less in size, e.g., TEs and insertion sequences (ISs) (21). Complex transposable units (TUs) consisting of multiple TEs or ISs can mobilize AMR genes by transposition, and this can sometimes be followed by recombination within the TU between one of the inverted repeats (IRs) flanking the TE and the IR of another local TE or an adjacent homologous sequence, resulting in different TU structures, locations, and copy numbers. At present, the exact resolution of complex structural rearrangements of repetitive TUs containing AMR genes may be impossible with short reads (22). Consequently, plasmid assembly is a challenge, requiring accurate long reads and sufficient coverage to distinguish between independent plasmids with regions of sequence identity (21, 23). Long reads, such as those generated using Oxford Nanopore Technologies (ONT) or Pacific Biosciences platforms, can provide a solution to this plasmid assembly problem (24–26). Here, we sequenced six ST131 using the ONT GridION X5 platform. Using the resulting high-coverage sequence data, we reconstructed and annotated the plasmids and chromosomal regions carrying blaCTX-M genes, as well as their genetic context and copy numbers.

RESULTS

Oxford Nanopore long-read quality control and filtering.

High-molecular-weight DNA from six E. coli ST131 isolates was sequenced using long Oxford Nanopore reads and short Illumina reads to assemble their genomes, allowing for plasmid reconstruction and resolution of AMR genes, MGEs, and associated rearrangements. The ONT GridION X5 sequencing generated 8.9 Gbases in total across 1,406,087 reads (mean length of 6.3 kb) (Table 1). The number of reads generated per hour, total yield of bases over time, read length distribution, and read Q score distribution were examined (see Fig. S2 at https://ndownloader.figshare.com/files/14983688). Half of the bases with Q values of ≥7 were on reads of 18 kb or longer (Fig. S3). These metrics indicated sufficient GridION data in terms of quantity and quality. Initial screening removed reads with Q values of <7, leaving 1,142,067 reads with 8.2 Gbases with a mean Q score of 10.2 and a mean length of 7.2 kb (Table 1) for analysis. This included 81 reads longer than 100 kb, including one of 155,312 bases. This corresponded to a 257-fold theoretical coverage for six 5.3-Mb genomes.

TABLE 1

Quality parameters indicate high-quality read libraries for the six ST131 samples from GridION X5 sequence data

Parameter	Value for:
Parameter	All reads	Reads with a Q of ≥7
Total bases	8,908,946	8,193,921
Total reads	1,406,087	1,142,067
Mean length (bp)	6,336	7,175
Median length (bp)	2,273	2,897
Mean Q score	9.1	10.2
Median Q score	10.0	10.5
Reads of >100 kb	85	81

A total of 264,020 low-quality reads (with a Q of <7) totaling 715,024,800 bases were excluded.

Quality parameters indicate high-quality read libraries for the six ST131 samples from GridION X5 sequence data A total of 264,020 low-quality reads (with a Q of <7) totaling 715,024,800 bases were excluded. The initial number of reads per library ranged from 127,118 to 510,253, and these were filtered using a series of steps to ensure that the reads used for each of the six assemblies had high quality. Bases were successfully called at an average of 97.9% of reads (Table 2). Identifying the consensus demultiplexed, duplicate-free, and adapter-free reads from Porechop v0.2.4 eliminated a further 2.9% of the base-called reads, yielding 120,123 to 487,482 reads per library (Table 2).

TABLE 2

Number of reads generated from GridION X5 sequencing data per library that passed filtering during base calling with Albacore v2.0 and those that were adapter-free (using Porechop v0.2.4)

Strain	No. of initial reads (fast5)	No. of base-called reads (fastq)	No. of adapter-free reads (fastq)	Avg length (bp)
VRES1160	358,829	351,636	345,033	7,037
VREC0693	208,478	204,904	194,413	8,982
VRES0739	163,349	160,693	155,900	9,171
VREC1013	510,253	497,646	487,482	6,657
VREC1073	313,627	304,218	298,658	7,256
VREC1428	127,118	124,539	120,123	9,301

The adapter-free reads totaling 1,601,609 were used for downstream analyses. A total of 80,045 reads were excluded during base calling or adapter trimming.

Number of reads generated from GridION X5 sequencing data per library that passed filtering during base calling with Albacore v2.0 and those that were adapter-free (using Porechop v0.2.4) The adapter-free reads totaling 1,601,609 were used for downstream analyses. A total of 80,045 reads were excluded during base calling or adapter trimming.

Long-read genome assembly illuminates highly diverse accessory genomes.

All six genome assemblies produced chromosomes of 4.81 to 5.38 Mb, with differing numbers of plasmids with lengths spanning 4 to 156 kb (see Fig. S4 at https://ndownloader.figshare.com/files/14983688; Table 3). The numbers of contigs produced by long-read assemblies of two samples (VREC0693, VRES0739) corresponded exactly to the chromosome and plasmids. The others had either one (VREC1073, VRES1160, VREC1013) or two (VREC1428) additional chromosomal contigs (Table S2).

TABLE 3

Total sizes of assemblies, chromosomes, and plasmids found in each strain based on their optimal whole-genome assemblies using the GridION X5 long reads

Strain	Genome length (bp)	No. of contigs		N₅₀	Chromosome size (Mb)	No. of plasmids	Plasmid size(s) (kb)
Strain	Genome length (bp)	Assembled	Minimum possible	N₅₀	Chromosome size (Mb)	No. of plasmids	Plasmid size(s) (kb)
VRES1160	5,326,801	6	5	5,126,679	5.23	4	62, 16, 5, 4
VREC0693	5,260,741	3	3	5,039,909	5.04	2	132, 89
VRES0739	4,806,912	3	3	4,797,749	4.81	2	5, 4
VREC1013	5,223,433	3	2	3,699,451	5.14	1	90
VREC1073	5,539,158	3	2	5,286,804	5.38	1	156
VREC1428	5,236,419	7	5	4,924,536	5.13	4	92, 5, 5, 4

Each assembly had seven or fewer contigs, and in three cases, no fewer contigs were possible, consistent with full genome assembly (for VREC0693, VRES0739, and VREC1073). The optimal assembly with Unicycler used long reads alone (in bold mode), with the exception of VREC1013, for which a hybrid combining short Illumina reads with long Oxford Nanopore reads was best, with minor manual screening (see supplemental results at https://ndownloader.figshare.com/files/14983688).

Total sizes of assemblies, chromosomes, and plasmids found in each strain based on their optimal whole-genome assemblies using the GridION X5 long reads Each assembly had seven or fewer contigs, and in three cases, no fewer contigs were possible, consistent with full genome assembly (for VREC0693, VRES0739, and VREC1073). The optimal assembly with Unicycler used long reads alone (in bold mode), with the exception of VREC1013, for which a hybrid combining short Illumina reads with long Oxford Nanopore reads was best, with minor manual screening (see supplemental results at https://ndownloader.figshare.com/files/14983688). Contigs were classified as chromosomal or plasmid derived using mlplasmids given a probability threshold of 60% (22), with further screening for plasmid-related gene content using the Multiple Antibiotic Resistance Annotator (MARA), the Comprehensive Antibiotic Resistance Database (CARD), and PlasmidFinder (Table S2). The largest plasmid was a 156.3-kb IncFIA plasmid in VREC1073, its sole plasmid. VREC1428 and VRES1160 had 92.8- and 61.9-kb IncFIA plasmids, respectively, along with three small Col plasmids each (Table 3). VREC0693 had a 132.0-kb IncFIB plasmid and an 88.8-kb IncB plasmid; IncB plasmids have the same Rep domains as IncFII plasmids (27). VREC3013 had one 89.9-kb IncFII plasmid. VRES0739 alone had no large plasmid, which was verified with the short-read data. By mapping the long reads to the optimal assemblies, the read coverage of each chromosome and plasmid was estimated (see Table S2 at https://ndownloader.figshare.com/files/14983688). Each chromosome had 126- to 310-fold median coverage, and the median coverage levels of large plasmids ranged from 85- to 282-fold, except for VREC1013’s IncFII plasmid, which had 1,015-fold coverage and a normalized depth of 3.3-fold. The normalized depth of plasmids compared to chromosomes suggested that some cells in VREC1428 and VREC1073 may have lost their IncFIA plasmid, and the same was true for VREC0693 and its IncFIB plasmid. However, the IncFIA plasmid in VRES1160 and the IncB plasmid in VREC0693 had higher than expected copy numbers (by 9% after normalization), potentially indicating stable plasmid retention. Across five assemblies in the Unicycler normal mode, the median insertion/deletion (indel) error rates for short reads and hybrid assemblies were similar (0.21 and 0.28 per 100 kb, respectively) but were much higher for long-read assemblies (265.0 per 100 kb) (Table S3). Likewise, the median mismatch error rates for short reads and hybrid assemblies were comparable (4.25 and 2.28 per 100 kb, respectively), but the error rate was much higher for long-read assemblies (332.8 per 100 kb) (Table S3). These rates excluded VREC1073, for which some Quast metrics were zero values. Similarly, the recovery of conserved BUSCO genes was far higher for hybrid assemblies (>99.5%) than for long-read ones (>82.3%).

The dynamic locations and genomic contexts of blaCTX-M genes in long-read assemblies.

The optimized assemblies provided an improved view of the genomic context of each blaCTX-M allele, whose effectiveness as a marker for ST131 clade classification and origin (8) were explored in this study. The deeper resolution of genome architecture revealed surprising differences in blaCTX-M gene context (Fig. 1; see Table S2 at https://ndownloader.figshare.com/files/14983688), including the discovery of chromosomal blaCTX-M genes in VREC0693 (three copies of blaCTX-M-15) and VREC1073 (one copy of blaCTX-M-14). All blaCTX-M genes were complete (876 bp) with adjacent ISEcp1 (1,658 bp with flanking IRs of 14 to 16 bp) and Tn2 (5.8-kb) elements; ISEcp1 and Tn2 can transpose blaCTX-M and other ESBL genes (28, 29). The VRES0739 genome did not contain any region homologous to blaCTX-M, most likely because it had lost an IncF plasmid, unlike the other isolates.

FIG 1

The structure of contigs with blaCTX-M genes. Two of the ST131 blaCTX-M genes are on chromosomal contigs (VREC0693 and VREC1073). VRES1160 and VREC1013 have IncFIA and IncFII plasmids, respectively, both of which have blaCTX-M-15 genes. VREC1428 has an IncFIA plasmid with the blaCTX-M-27 gene. VRES0739 is not shown because it was blaCTX-M negative and had no large plasmid. The contigs were classified as chromosomal or plasmid derived by mlplasmids so that the blaCTX-M genes and their genetic flanking context could be examined. Annotation was derived from GalileoAMR based on the Multiple Antibiotic Resistance Annotator (MARA) and database. The blaCTX-M variants are labeled and circled in red (blaCTX-M-15), purple (blaCTX-M-14), or green (blaCTX-M-27). VRES1160, VREC0693, and VREC1013 all had blaCTX-M-15 genes linked to isoforms of ISEcp1, IS26, and Tn2, implicating them in driving the transposition of the TU (Fig. S5). Each was similar to the ST131 clade C2 ISEcp1-bla-M-15-orf477Δ TU (8, 30) but with distinct structural differences. VRES1160’s single blaCTX-M-15 gene was at bp 2296 on its IncFIA plasmid and was flanked by ISEcp1 to its 5′ end and by Tn2 followed by IS26 at its 3′ end, with another Tn2 5′ of ISEcp1. VREC0693’s three chromosomal blaCTX-M-15 genes were not tandem repeats (chromosomal locations at positions 2781074 bp, 3696068 bp, and 3970927 bp), but each of these TUs were identical: all had ISEcp1 at the 5′ ends and truncated Tn2’s at the 3′ ends. VREC1013’s sole blaCTX-M-15 gene was located at bp 13226 on its IncFII plasmid and was flanked by a truncated ISEcp1 at its 5′ end and Tn2 at its 3′ end, with IS26 copies 5′ and 3′ of these segments. VREC1428’s single blaCTX-M-27 gene was on its IncFIA plasmid at position 6018, and VREC1073’s single chromosomal blaCTX-M-14 gene started at contig position 19746 (see Fig. S5 at https://ndownloader.figshare.com/files/14983688). Both their blaCTX-M genes were flanked by a truncated ISEcp1 at the 5′ ends and a shortened IS903B at the 3′ ends, suggesting that ISEcp1 and IS903B may have facilitated the transposition of the TU from the plasmid. Similar blaCTX-M gene transposition events have been observed in ST131 clade C1 (8). Alignment of the plasmid-derived contigs of VRES1160 (IncFIA) to VREC1013 (IncFIB) showed that the blaCTX-M-15-positive plasmids were much more similar (>83% identity) than VREC1428’s blaCTX-M-27-positive IncFIA plasmid, which was more distinct (Fig. 2). In addition, VREC1428’s plasmid had traI and traD genes, indicating conjugation machinery (Table 4), as well as high homology to at least one published plasmid, unlike VRES1160’s and VREC1013’s plasmids (see supplemental results at https://ndownloader.figshare.com/files/14983688). This suggested that the VRES1160 and VREC1013 plasmids had homology corresponding well with blaCTX-M gene and subclade classification and that they were structurally different from published plasmids due to recombination.

FIG 2

Pairwise comparisons of the three blaCTX-M-positive plasmid-associated contigs show high sequence identity for the two from subclade C2 (VREC1013 and VRES1160) relative to that of one from C1 (VREC1428, top). The BLAST result was visualized with EasyFig v2.2.2 such that the blocks connecting the regions of the contigs represent nucleotide homology: blue for homologous regions in the same direction, and yellow for inversions. Gaps or white spaces denote unique loci or regions present in one contig but not in the other. Gene models are in green, with the direction of transcription shown by arrows. Genes of interest are labeled above each arrow. The blaCTX-M-27 gene (top) is in mauve, and the two blaCTX-M-15 genes (middle and bottom) are in red. The table at the bottom shows the contig size, plasmid type, and number of genes per strain. The list and products of the annotated genes are given in Table 5.

TABLE 4

Genes in the plasmid contigs of VREC1013, VRES1160, and VREC1428

Genes in plasmid contigs of indicated isolate
VREC1428	VREC1013	VRES1160
pemI	xerD_1	bla_1
pemK	ccdA	tmrB
bla	vapC_1	cat
nqrC	vapC_2	bla_2
lolD	kdgT_1	aacA4
agp	kdgT_2	tetA_1
hemR	ridA	tetA_2
repB_1	yagE	pinR
repB_2	ugpA	sopB_1
mmuM	cpdA	sopB_2
rocC	tnpA	repE
ccdA	yknY	ccdB
ccdB	tpd	ccdA
sopB_1	xerD_2	pifC_1
sopB_2	dhfrI_1	pifC_2
klcA	ant1_1	pifC_3
traD_1	folP	repB_1
traD_2	srpC	repB_2
traD_3	bla	xerD
traD_4	xerD_3	chrA
traD_5	xerC	folP
traD_6	dhfrI_2	mdtJ
traI_1	ant1_2	xerC
traI_2	umuC	tetA_3
traI_3	lexA	tetR
traI_4	klcA	neo
traI_5		tnpR
traI_6
traI_7
finO

Count numbers are indicated by an underline followed by the number of counts (e.g., _1). The blaCTX-M (bla), traI, and traD genes are in bold. Only isolate VREC1428 had traI and traD genes, indicating conjugative capacity. VREC0693, VRES0739, and VREC1073 contigs did not have tra genes.

TABLE 5

Protein products encoded by genes found in plasmids of strains VREC1013, VRES1160, and VREC1428

Gene	Protein product
agp	Glucose-1-phosphatase
ccDA	Antitoxin (plasmid maintenance)
chrA	Response regulator
finO	Fertility inhibition protein
folP	Dihydropteroate synthase
hemR	Hemin TonB-dependent receptor
klcA	Antirestriction protein
lolD	Lipoprotein-releasing system ATP-binding
neo	Aminoglycoside 3′-phosphotransferase
nqrC	Na(+)-translocating NADH-quinone reductase
pemK	mRNA interferase
pifC	Transcriptional repressor protein
pinR	Serine recombinase protein
repB	Replication protein
repE	Replication initiation protein
rocC	Amino acid permease
sopB	Inositol phosphate phosphatase
tetA	Tetracycline resistance protein
tmrB	Tunicamycin resistance protein
tnpR	Transposon gamma-delta resolvase
traD	Coupling protein
traI	Multifunctional conjugation protein
xerC	Tyrosine recombinase protein
xerD	Tyrosine recombinase protein

See Fig. 2.

Genes in the plasmid contigs of VREC1013, VRES1160, and VREC1428 Count numbers are indicated by an underline followed by the number of counts (e.g., _1). The blaCTX-M (bla), traI, and traD genes are in bold. Only isolate VREC1428 had traI and traD genes, indicating conjugative capacity. VREC0693, VRES0739, and VREC1073 contigs did not have tra genes. Protein products encoded by genes found in plasmids of strains VREC1013, VRES1160, and VREC1428 See Fig. 2.

Phylogenetic context of analyzed isolates.

A comparison of these six samples with 119 published ST131 isolates (8, 31) as short-read assemblies scaffolded using reference genome NCTC13441 showed that all clustered in ST131 clade C (Fig. S6). There was sufficient resolution across 4,457 core genome single-nucleotide polymorphisms (SNPs) to confidently assign them to subclade C1 (n = 1) or C2 (n = 5) (Fig. 3). VRES1160, VREC0693, VREC1013, VRES0739, and VREC1073 clustered with C2, whereas the blaCTX-M-27-positive VREC1428 was in C1. VRES1160, VREC0693, and VREC1013 all had IncF plasmids (IncFIA, IncFIB, IncFII) and blaCTX-M-15 genes, consistent with C2 isolates, which are typically blaCTX-M-15 positive; 77% of the C2 isolates here (48 out of 62) were observed to be blaCTX-M-15 positive. However, VREC1073 was in subclade C2 but had a blaCTX-M-14 gene, contradicting this pattern, and was the sole blaCTX-M-14-positive C2 isolate found here. The core genomes of VRES0739 and VREC0693 were identical, implying that VRES0739 has very recently lost its (blaCTX-M-positive IncF) plasmid. The sole isolate clustering with C1 was VREC1428, which had an IncFIA plasmid with a blaCTX-M-27 gene and so may belong to the emerging subclade C1-M27, as evidenced by the presence of prophage-like regions like M27PP1/2 (31).

FIG 3

The phylogenetic context of the six ST131 genomes (names in large bold font) shows that all except that of VREC1428 are in ST131 subclade C2 (red inner ring, VRES1160, VREC1073, VRES0739, VREC0693, and VREC1013). VREC1428 clustered in subclade C1 (purple inner ring). No new isolate clustered in C0 (green inner ring), B (blue inner ring), or an intermediate cluster (gray inner ring). Clade classification was based on phylogenetic analysis (8) by including the reference NCTC13441, 63 isolates from reference 8, and 56 isolates from reference 31 with associated classification and blaCTX-M allele data. VREC1073 and VREC0693 had chromosomal blaCTX-M genes. The outer ring shows blaCTX-M-15 (red), blaCTX-M-14 (purple), and blaCTX-M-27 alleles (green). The phylogeny was built with RAxML v8.2.11 using 4,457 SNPs from a core genome alignment generated with Roary v3.11.2 and was visualized with iTOL v4.3. Branch support was performed by 100 bootstrap replicates, and the scale bar indicates the number of substitutions per site. This midpoint-rooted phylogeny includes reference genome isolates EC958 and NCTC13441 (both in C2).

DISCUSSION

Our study resolved the plasmid architecture of several recent E. coli ST131 isolates, allowing investigation of AMR gene location, copy number, and potential transposon-driven rearrangements. This advance was facilitated by careful DNA handling during extraction to produce large volumes of high-molecular-weight DNA that was pure and free from contamination, which was avoided by performing separate extraction steps to obtain small plasmids (32), overcoming a limitation for MinION sequencing (21). The long-read genome assemblies illuminated significant variations in plasmids, MGEs, and blaCTX-M gene composition that were not captured by short reads. ST131 is a globally pandemic E. coli clonal group (15) with diverse sources of transmission (25). Phylogenetic comparison with published genomes (8, 31) showed that five out of six isolates were from subclade C2, with one from C1. The emergence of clade C has been associated with IncF plasmids, and subclade C2 has been associated with ISEcp1 and Tn2 elements flanking blaCTX-M-15 genes (33, 34). Our long-read assemblies showed the excision of the entire TU from the IncFIB plasmid and chromosomal integration at three distinct locations for VREC0693 and, similarly, chromosomal translocation of the blaCTX-M-14 gene from an IncFIA plasmid for VREC1073, mediated by ISEcp1 and IS903B based on previous work (8). These transposition events were likely driven by recombination at adjacent transposable elements. This highlights the value of long-read sequencing to resolve the location of blaCTX-M genes and that chromosomal translocations are not rare in ST131. A high resolution of the AMR gene structure, context, and copy number is highly predictive of AMR phenotypes (35) and could lead to new insights into AMR mechanisms. However, the high indel and mismatch errors in long Oxford Nanopore reads (32, 36–38) limit the power to identify AMR isoforms that could permit genome-based antimicrobial susceptibility testing (27, 39, 40). Here, the five ONT assemblies together had an average of 447-fold higher indel and 48-fold higher mismatch error rates than those for the corresponding Illumina reads, similar to previous work with MinION reads (23), and this impacted gene identification. Consequently, short reads and assembly polishing methods remain important for SNP identification and error detection until long-read error rates can be reduced (41). Our findings illustrate the diversity of AMR gene context even within recently emerged clones such as ExPEC ST131. The detection of multiple instances of chromosomally integrated ESBL genes using long reads here for blaCTX-M-15 in E. coli has parallels elsewhere for blaOXA-181 in blaCTX-M-15-positive Klebsiella pneumoniae (42) and so highlights chromosomal ESBL gene ISEcp1-mediated transposition as a potential adaptive mechanism in Enterobacteriaceae. Further studies with larger sample sizes are needed to identify the rates and mechanisms of these dynamic changes.

MATERIALS AND METHODS

Sample collection.

Six ESBL-producing E. coli ST131 clinical strains were isolated in June to October 2015 from patients at Addenbrooke’s Hospital, Cambridge, United Kingdom, as part of a study on antibiotic resistance (see Table S1 at https://ndownloader.figshare.com/files/14983688). Five samples were from feces, and one was from blood. These were short-read libraries in a multiplex run on an Illumina HiSeq 2500 platform and were processed as previously outlined (43).

High-molecular-weight DNA extraction.

Frozen stocks of the six isolates were streaked onto LB agar plates and grown overnight at 37°C. Single colonies were subcultured onto LB agar plates and incubated overnight at 37°C. DNA was extracted using a Lucigen MasterPure complete DNA and RNA purification kit. For each sample, a swab was used to sweep half a plate of pure colonies, which was suspended in 1× phosphate buffer solution (PBS). Samples were processed according to the manufacturer’s instructions, with elution in 70 μl of nuclease-free water. Pipetting was minimized to reduce shearing of the DNA prior to sequencing.

Oxford Nanopore library preparation and sequencing.

DNA was quantified using a Quant-iT HS (high-sensitivity) kit (Invitrogen). DNA purity was checked using a NanoDrop (Thermo Fisher), and fragment size was confirmed by a Femto Pulse system (Nano Life Quest). The sequencing libraries were prepared using 1 µg DNA per sample and ligation sequencing kit 1D SQK-LSK109 with the barcoding extension kit EXP-NPB104 in accordance with the ONT protocols. The samples were combined using equimolar pooling and loaded onto a single 9.4.1 MIN-106 flow cell and sequenced on the GridION X5 platform under standard conditions.

Illumina library preparation and sequencing.

The short reads used in this study were created as follows: bacterial genomic DNA was extracted using a QIAxtractor (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. Library preparation was conducted according to the Illumina protocol and sequenced (96-plex) on an Illumina HiSeq 2500 platform (Illumina, San Diego, CA, USA) using 100-bp paired-end reads.

Oxford Nanopore base calling and adapter trimming.

The resulting fast5 read files (available at https://www.ncbi.nlm.nih.gov/sra/PRJEB30511 under accession numbers ERR3284704 to ERR3284709) were transferred to a separate Linux server 4.4.0 (Ubuntu 16.04.4) for analysis. Base calling was performed during the GridION run using ONT Guppy v0.5.1, and the resulting fast5 files from the initial run were converted to fastq format with Albacore v2.0 (ONT). The statistical data of the sequencing run was processed with MinIONQC v1.3.5 (44) based on the default Q score cutoff of 7. Adapters and chimeric reads were removed from fastq files using Porechop v0.2.4 (45) with demultiplex settings (see Fig. S1 at https://ndownloader.figshare.com/files/14983688). Standard outputs were saved as log files and then parsed. The quality of the final fastq files was assessed using FastQC v0.11.8 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and MultiQC v1.4 (46).

Genome assembly and improvement.

We assembled the genomes using the conservative, normal, and bold modes of the long-read-only assembly pipeline in Unicycler v4.6. Previous work has suggested that Unicycler outperforms alternatives (21) that struggle to resolve plasmids (36). This workflow included the assembly polisher, Racon, which ran iteratively to minimize error rates of called bases (45). For comparison, short read-only and hybrid assemblies were also created using Unicycler v4.6. Briefly, during short-read-only assembly, Unicycler v4.6 employed SPAdes v3.12 to assemble short reads and then used Pilon v1.22 to polish the assembly. In hybrid assemblies, Unicycler v4.6 used Miniasm to piece together long reads first and applied SPAdes v3.12 to incorporate short reads and bridge gaps. Pilon was run 3 to 10 times for short-read assemblies and 5 to 10 times for hybrid ones, until no further changes were required to achieve the most contiguous and complete genome assemblies. The average numbers of changes by Pilon were 74.3, 100.2, and 125.3 for short-read assemblies and 234.5, 257.7, and 377.0 across conservative, normal, and bold modes, respectively.

Genome assembly assessment and error rate quantification.

The quality of resulting assemblies was assessed using Quast 3.0 (47) according to the total assembly length, number of contigs, N50, GC content, and degree of replicon circularization. Assembly graphs were visualized with Bandage (48). The resulting contigs in each assembly were classified as chromosomal or plasmid using machine learning algorithms implemented in mlplasmids (22). Genome completeness was examined using the numbers of single-copy universal orthologous genes using BUSCO (Benchmarking Universal Single-Copy Orthologs) v3 with the gammaproteobacteria_odb9 database (49).

Read depth estimation.

The read depth of each replicon was estimated by aligning the short Illumina reads and the long Oxford Nanopore reads to the completed genomes by use of Smalt v0.7.6 and BWA-MEM v0.7.17 (with the flag –x ont2d for ONT reads), respectively. SAMtools v1.7 was used to process the SAM files to BAM format, remove duplicates, and identify the coverage at each base of each assembly. The median value for each replicon was noted and was normalized using the median chromosomal depth of the same assembly.

Genome annotation.

The genomes were annotated using Prokka v1.13.3 (50). blaCTX-M alleles and their contexts were detected using the Multiple Antibiotic Resistance Annotator (MARA) (51) and by aligning the assemblies against the Comprehensive Antibiotic Resistance Database (CARD v3.0) to screen for matches with 100% identification only. Information on the detected AMR features and MGEs are retrieved from Galileo AMR (https://galileoamr.arcbio.com/mara/feature/list). Plasmid identification and typing were carried out using PlasmidFinder v2.0 (52). The plasmid-derived contigs from the assembled genomes were compared using BLAST v2.6.0 with a database of 10,892 complete plasmids (53). Their homology and annotation were visualized using EasyFig v2.2.2 (54).

Phylogenetic analysis.

To provide a phylogenetic context for these six isolates, the Illumina read libraries of 63 published ST131 samples from references 8 and 56 from reference 31 were cleaned and trimmed using Fastp v0.12.3 (55), as were the short-read libraries of the six isolates from this study. These 125 libraries were de novo assembled with Unicycler v4.6 using NCTC13441 as a reference and annotated using Prokka. The 126 genomes were processed using Roary v3.11.2 (56) with a 95% BLAST v2.6.0 identity threshold to create a core genome alignment containing 4,457 SNPs using MAFFT v7.310 (57) spanning 3,250,343 bases and 3,350 genes of the NCTC13441 chromosome (a length similar to that described previously [20]). This core genome was used to construct a maximum likelihood phylogeny using RAxML v8.2.11 with the general time-reversible (GTR) model with gamma rate heterogeneity (58). Clade classification of the six isolates was based on a published ST131 phylogenetic analysis (8) with associated classification and blaCTX-M allele data from references 8 and 31.

Ethical approval.

The study protocol was approved by the National Research Ethics Service (reference 14/EE/1123) and the Cambridge University Hospitals NHS Foundation Trust Research and Development Department (reference A093285).

Data availability.

Illumina reads were deposited under accession numbers ERR2138475, ERR2138200, ERR2138591, ERR1878196, ERR2137889, and ERR1878359 in the European Nucleotide Archive (ENA) under BioProjects PRJEB21499 and PRJEB19918. ONT reads were deposited under accession numbers ERR3284704, ERR328470, ERR3284706, ERR3284707, ERR3284708, and ERR3284709 (see https://www.ebi.ac.uk/ena/data/view/PRJEB30511 in the ENA or https://www.ncbi.nlm.nih.gov/sra/PRJEB30511 in the SRA under BioProject PRJEB30511; see also FigShare at https://doi.org/10.6084/m9.figshare.7554293.v1). Unicycler assemblies were deposited at FigShare (https://doi.org/10.6084/m9.figshare.7560458.v2).

57 in total

Review 1. Mobile genetic elements: the agents of open source evolution.

Authors: Laura S Frost; Raphael Leplae; Anne O Summers; Ariane Toussaint
Journal: Nat Rev Microbiol Date: 2005-09 Impact factor: 60.633

2. In vitro analysis of ISEcp1B-mediated mobilization of naturally occurring beta-lactamase gene blaCTX-M of Kluyvera ascorbata.

Authors: Marie-Frédérique Lartigue; Laurent Poirel; Daniel Aubert; Patrice Nordmann
Journal: Antimicrob Agents Chemother Date: 2006-04 Impact factor: 5.191

Review 3. Linear plasmids and chromosomes in bacteria.

Authors: J Hinnebusch; K Tilly
Journal: Mol Microbiol Date: 1993-12 Impact factor: 3.501

4. Insights into a multidrug resistant Escherichia coli pathogen of the globally disseminated ST131 lineage: genome analysis and virulence mechanisms.

Authors: Makrina Totsika; Scott A Beatson; Sohinee Sarkar; Minh-Duy Phan; Nicola K Petty; Nathan Bachmann; Marek Szubert; Hanna E Sidjabat; David L Paterson; Mathew Upton; Mark A Schembri
Journal: PLoS One Date: 2011-10-28 Impact factor: 3.240

5. MinIONQC: fast and simple quality control for MinION sequencing data.

Authors: R Lanfear; M Schalamun; D Kainer; W Wang; B Schwessinger
Journal: Bioinformatics Date: 2019-02-01 Impact factor: 6.937

6. Global dissemination of a multidrug resistant Escherichia coli clone.

Authors: Nicola K Petty; Nouri L Ben Zakour; Mitchell Stanton-Cook; Elizabeth Skippington; Makrina Totsika; Brian M Forde; Minh-Duy Phan; Danilo Gomes Moriel; Kate M Peters; Mark Davies; Benjamin A Rogers; Gordon Dougan; Jesús Rodriguez-Baño; Alvaro Pascual; Johann D D Pitout; Mathew Upton; David L Paterson; Timothy R Walsh; Mark A Schembri; Scott A Beatson
Journal: Proc Natl Acad Sci U S A Date: 2014-03-31 Impact factor: 11.205

7. Combined Analysis of Variation in Core, Accessory and Regulatory Genome Regions Provides a Super-Resolution View into the Evolution of Bacterial Populations.

Authors: Alan McNally; Yaara Oren; Darren Kelly; Ben Pascoe; Steven Dunn; Tristan Sreecharan; Minna Vehkala; Niko Välimäki; Michael B Prentice; Amgad Ashour; Oren Avram; Tal Pupko; Ulrich Dobrindt; Ivan Literak; Sebastian Guenther; Katharina Schaufler; Lothar H Wieler; Zong Zhiyong; Samuel K Sheppard; James O McInerney; Jukka Corander
Journal: PLoS Genet Date: 2016-09-12 Impact factor: 5.917

8. Phenotypic and Genotypic Characterization of Enterobacteriaceae Producing Oxacillinase-48-Like Carbapenemases, United States.

Authors: Joseph D Lutgring; Wenming Zhu; Tom J B de Man; Johannetsy J Avillan; Karen F Anderson; David R Lonsway; Lori A Rowe; Dhwani Batra; J Kamile Rasheed; Brandi M Limbago
Journal: Emerg Infect Dis Date: 2018-04 Impact factor: 6.883

9. Genetic analysis of invasive Escherichia coli in Scotland reveals determinants of healthcare-associated versus community-acquired infections.

Authors: Cosmika Goswami; Stephen Fox; Matthew Holden; Martin Connor; Alistair Leanord; Thomas J Evans
Journal: Microb Genom Date: 2018-06-22

10. mlplasmids: a user-friendly tool to predict plasmid- and chromosome-derived sequences for single species.

Authors: Sergio Arredondo-Alonso; Malbert R C Rogers; Johanna C Braat; Tess D Verschuuren; Janetta Top; Jukka Corander; Rob J L Willems; Anita C Schürch
Journal: Microb Genom Date: 2018-11-01

11 in total

1. A One Health Genomic Investigation of Gentamicin Resistance in Escherichia coli from Human and Chicken Sources in Canada, 2014 to 2017.

Authors: Graham W Cox; Brent P Avery; E Jane Parmley; Rebecca J Irwin; Richard J Reid-Smith; Anne E Deckert; Rita L Finley; Danielle Daignault; George G Zhanel; Michael R Mulvey; Amrita Bharat
Journal: Antimicrob Agents Chemother Date: 2022-09-27 Impact factor: 5.938

2. Plasmids shape the diverse accessory resistomes of Escherichia coli ST131.

Authors: Arun Gonzales Decano; Nghia Tran; Hawriya Al-Foori; Buthaina Al-Awadi; Leigh Campbell; Kevin Ellison; Louisse Paolo Mirabueno; Maddy Nelson; Shane Power; Genevieve Smith; Cian Smyth; Zoe Vance; Caitriona Woods; Alexander Rahm; Tim Downing
Journal: Access Microbiol Date: 2020-11-18

3. gplas: a comprehensive tool for plasmid analysis using short-read graphs.

Authors: Sergio Arredondo-Alonso; Martin Bootsma; Yaïr Hein; Malbert R C Rogers; Jukka Corander; Rob J L Willems; Anita C Schürch
Journal: Bioinformatics Date: 2020-06-01 Impact factor: 6.937

4. Detection of extended-spectrum beta-lactamase (ESBL) genes and plasmid replicons in Enterobacteriaceae using PlasmidSPAdes assembly of short-read sequence data.

Authors: Joep J J M Stohr; Marjolein F Q Kluytmans-van den Bergh; Ronald Wedema; Alexander W Friedrich; Jan A J W Kluytmans; John W A Rossen
Journal: Microb Genom Date: 2020-06-26

Review 5. Application and Challenge of 3rd Generation Sequencing for Clinical Bacterial Studies.

Authors: Mariem Ben Khedher; Kais Ghedira; Jean-Marc Rolain; Raymond Ruimy; Olivier Croce
Journal: Int J Mol Sci Date: 2022-01-26 Impact factor: 5.923

Review 6. Portable sequencer in the fight against infectious disease.

Authors: Arthur Elia Mongan; Josef Sem Berth Tuda; Lucky Ronald Runtuwene
Journal: J Hum Genet Date: 2019-10-03 Impact factor: 3.172

7. Nanopore MinION Sequencing Reveals Possible Transfer of bla _KPC-2 Plasmid Across Bacterial Species in Two Healthcare Facilities.

Authors: Catharine Prussing; Emily A Snavely; Navjot Singh; Pascal Lapierre; Erica Lasek-Nesselquist; Kara Mitchell; Wolfgang Haas; Rita Owsiak; Elizabeth Nazarian; Kimberlee A Musser
Journal: Front Microbiol Date: 2020-08-19 Impact factor: 5.640

8. A genomic epidemiological study shows that prevalence of antimicrobial resistance in Enterobacterales is associated with the livestock host, as well as antimicrobial usage.

Authors: Manal AbuOun; Hannah Jones; Emma Stubberfield; Daniel Gilson; Liam P Shaw; Alasdair T M Hubbard; Kevin K Chau; Robert Sebra; Tim E A Peto; Derrick W Crook; Daniel S Read; H Soon Gweon; A Sarah Walker; Nicole Stoesser; Richard P Smith; Muna F Anjum
Journal: Microb Genom Date: 2021-10

9. Host Colonization as a Major Evolutionary Force Favoring the Diversity and the Emergence of the Worldwide Multidrug-Resistant Escherichia coli ST131.

Authors: Richard Bonnet; Racha Beyrouthy; Marisa Haenni; Marie-Hélène Nicolas-Chanoine; Guillaume Dalmasso; Jean-Yves Madec
Journal: mBio Date: 2021-08-24 Impact factor: 7.867

10. Insights and genetic features of extended-spectrum beta-lactamase producing Escherichia coli isolates from two hospitals in Ghana.

Authors: Samiratu Mahazu; Wakana Sato; Alafate Ayibieke; Isaac Prah; Takaya Hayashi; Toshihiko Suzuki; Shiroh Iwanaga; Anthony Ablordey; Ryoichi Saito
Journal: Sci Rep Date: 2022-02-03 Impact factor: 4.379