Marie Petitjean1, Bénédicte Condamine1, Charles Burdet1,2, Erick Denamur1,3, Etienne Ruppé1,4. 1. IAME, INSERM, Université de Paris, F-75018 Paris, France. 2. Département d'Epidémiologie, Biostatistique et Recherche Clinique, Hôpital Bichat, APHP, F-75018 Paris, France. 3. Laboratoire de Génétique Moléculaire, Hôpital Bichat, APHP, F-75018 Paris, France. 4. Laboratoire de Bactériologie, Hôpital Bichat, APHP, F-75018 Paris, France.
Abstract
Escherichia coli is a ubiquitous bacterium that has been widely exposed to antibiotics over the last 70 years. It has adapted by acquiring different antibiotic-resistance genes (ARGs), the census of which we aim to characterize here. To do so, we analysed 70 301 E. coli genomes obtained from the EnteroBase database and detected 1 027 651 ARGs using the AMRFinder, Mustard and ResfinderFG ARG databases. We observed a strong phylogroup and clonal lineage specific distribution of some ARGs, supporting the argument for epistasis between ARGs and the strain genetic background. However, each phylogroup had ARGs conferring a similar antibiotic class resistance pattern, indicating phenotypic adaptive convergence. The G+C content or the type of ARG was not associated with the frequency of the ARG in the database. In addition, we identified ARGs from anaerobic, non-Proteobacteria bacteria in four genomes of E. coli, supporting the hypothesis that the transfer between anaerobic bacteria and E. coli can spontaneously occur but remains exceptional. In conclusion, we showed that phylum barrier and intra-species phylogenetic history are major drivers of the acquisition of a resistome in E. coli.
Escherichia coli is a ubiquitous bacterium that has been widely exposed to antibiotics over the last 70 years. It has adapted by acquiring different antibiotic-resistance genes (ARGs), the census of which we aim to characterize here. To do so, we analysed 70 301 E. coli genomes obtained from the EnteroBase database and detected 1 027 651 ARGs using the AMRFinder, Mustard and ResfinderFG ARG databases. We observed a strong phylogroup and clonal lineage specific distribution of some ARGs, supporting the argument for epistasis between ARGs and the strain genetic background. However, each phylogroup had ARGs conferring a similar antibiotic class resistance pattern, indicating phenotypic adaptive convergence. The G+C content or the type of ARG was not associated with the frequency of the ARG in the database. In addition, we identified ARGs from anaerobic, non-Proteobacteria bacteria in four genomes of E. coli, supporting the hypothesis that the transfer between anaerobic bacteria and E. coli can spontaneously occur but remains exceptional. In conclusion, we showed that phylum barrier and intra-species phylogenetic history are major drivers of the acquisition of a resistome in E. coli.
All the genomes are publicly available in the EnteroBase database (https://enterobase.warwick.ac.uk/species/index/ecoli).We analysed a large set of
genomes and searched for antibiotic-resistance genes (ARGs) using various databases. We observed that ARGs distributed according to the phylogenetic background of the strains, supporting the observation that constraints were at play within
. Moreover, we identified four instances of putative transfers of ARGs from a phylum other than that of
, stressing the strong inter-phyla barrier for ARG exchange. However, the capacity of the acquired ARGs to provide resistance against the most used antibiotic families did not differ according to the phylogenetic background, stressing that the different lineages of
adapted to the antibiotic pressure with the acquisition of ARGs their genetic background could accommodate. This research is to our knowledge the first of its kind to study the acquired resistome of
, an intestinal, ubiquitous bacterium that has been exposed to antibiotics from their earliest use. In this connection, the results could help in better understanding how bacteria adapt to antibiotics by acquiring ARGs.
Introduction
is a ubiquitous bacterium found in the intestinal microbiota of vertebrates. In the human gut microbiota,
is the dominant species of the phylum
with a mean 108 c.f.u. (g faeces)–1 [1, 2]. It is also commonly found in the digestive tract of mammals and birds, including livestock and poultry [3]. Hence, during the last 70 years,
strains have been highly exposed to antibiotics used in human and animal health, as well as those used in agriculture. In response,
has adapted through the acquisition of multiple genes encoding antibiotic resistance referred to as antibiotic-resistance genes (ARGs). As a consequence, acquired resistance in
is linked to human activities [4].In the pre-next generation sequencing (NGS) era, an exhaustive characterization of ARGs in a given species was challenged by the necessity to use as many PCRs as the genes targeted and perform experiments in a high number of isolates. With the development of NGS and genomics after 2005, the number of bacterial genomes made available has increased exponentially and the identification of ARGs was made easier via the use of in silico tools. Efforts have been made to collect and organize known ARG sequences in dedicated databases, with the first antibiotic resistance database (ARDB) released as early as 2008 [5, 6]. Since then, others such as ResFinder [6], CARD [7], ARG-ANNOT [8] and more recently AMRFinder [9] have followed. Typically, these databases include thousands of ARG nucleotide and/or amino-acid sequences previously identified in cultivable and/or pathogenic bacteria. However, their content is biased as they lack the ARGs found in bacteria that are difficult to culture and those of little relevance from a medical perspective, such as commensal, strict anaerobic bacteria from the gut microbiota. Nonetheless, we and others have found these bacteria do harbour a vast diversity of ARGs, and the latter actually differ from those found in the ARG databases [10, 11]. These ARGs have been made available in specific databases, namely FARMEDB [12], ResFinderFG [13] and Mustard [11], whose content slightly overlaps with that of conventional ARG databases (Fig. S1, available with the online version of this article). Indeed, very few observations support occurrence of a transfer of the ARG from intestinal commensals to
opportunistic pathogens such as
[14]. Still, such transfer has proven to be possible. For instance, tetX, a gene encoding resistance to tetracyclines [15], was shown to be transferred from
to
. Furthermore, some transfer events may have gone unseen due to the lack of genomic monitoring of a large number of strains and because of the lack of sampling.While most of these ARGs are borne by mobile genetic elements such as integrons [16], plasmids [17, 18] or more rarely phages [19], some associations between their presence and specific
phylogroups have been evidenced in the past based on phenotypic and genetic markers [20-23]. More recently, genomic data have confirmed such associations and extended them to more specific phylogenetic lineages [24, 25]. Some of these multidrug-resistant lineages are disseminating worldwide, such as clonal group A [sequence type (ST) 69 phylogroup D] [26] and more recently ST131 (phylogroup B2) [27]. In such clonal groups, strong associations have been evidenced between the within ST sub-clade, the plasmid type and the ARG content [28]. All these data argue for a complex cross-talk between the chromosomal background, the genetic support of the ARG and the ARG itself resulting from intergenic epistasis [29].Since December 2015, EnteroBase [30], a public database including thousands of genomes from
/
and other species (
, other
species,
,
,
,
and
), has been available. Beyond genomes, EnteroBase includes (with varying degrees of completeness) metadata linked to the strain itself (name, source, location, laboratory, species, serotype, disease and entry/update date) and to its sequencing process (N50, coverage). Still, no data regarding antibiotic resistance are available. In this study, we leveraged the high number of
genomes in EnteroBase to characterize the acquired ARGs in
and, more precisely, (i) to test for specific associations between the phylogenetic background of the strains and the presence of ARGs, and (ii) to evidence ARG transfer between non-
species and
using metagenomic databases ResFinderFG [13] and Mustard [11].
Methods
Genomic database, species classification and
phylogroup determination
A total of 82 063 available genomes was downloaded from Escherichia/Shigella EnteroBase (as of 1 February 2019). The genomes were classified according to their genera and species (
,
,
,
,
or unknown). First,
and enteroinvasive
(EIEC) genomes were identified using in silico PCR with primers of the ipaH3 gene [31]. Due to their specific, obligatory intracellular pathogenic trait, they were removed from the dataset. Then, using ClermonTyper [32], a tool that provides information about phylogroups (A, B1, B2, C, D, E, F and G) for
and identifies nearest species in conjunction with Mash [33] (
,
and
clades including
), all the genomes were classified as
belonging to the aforementioned phylogroups,
,
and
clades I to V. Of note, in this study, we considered that phylogroup F included both F and G phylogroups [34]. When a discrepancy was observed between the ClermonTyper and Mash attribution (n=3734) the strain was classified according to Mash.
ARG identification, plasmid incompatibility group, chromosomal multilocus sequence type (MLST) determination and G+C content
Diamond [35] was used to identify all the ARGs in EnteroBase by aligning all genomes against the AMRFinder [9] (29 04 2019 version), Mustard [11] (30 09 2017 version) and the ResFinderFG [13] (21 12 2016 version) databases (with a minimum coverage × identity value greater than 0.64 for nucleotidic sequences corresponding to 80 % identity and 80 % coverage). All redundancies (sequences sharing 100 % identity in nucleotides) between the databases were removed. The ARG families were selected according to the Mustard website (http://mgps.eu/Mustard). All ResFinderFG and Mustard ARGs originatng from non-
were further investigated. When a genome was found to include an ARG putatively originating from a non-
, contamination (i.e. the presence of multiple sequences of non-
along with that of
) was assessed using Kraken [36]. PlasmidFinder [37] database together with Diamond [35] were used to determine the plasmid incompatibility groups found in each genome of the EnteroBase database (98 % identity and a minimum of 95 % coverage). MLST was determined using mlst software based on the Warwick University or Pasteur Institute MLST schemes available from the pubMLST database (https://github.com/tseemann/mlst) [38]. G+C content (%) deviation between
and the acquired ARG was measured using the
G+C defined previously [39]. ARG sub-family classification was obtained by clustering all the ARGs from the three databases (AMRFinder, Mustard and ResfinderFG) using cd-hit-est based on a 90 % identity threshold [40, 41].
Statistical analysis and normalization
To circumvent the sequencing biases for the richness and diversity estimation, the data were normalized so that each phylogroup would include the same number of genomes (n=10 000) by re-sampling (for C, D and F phylogroups) or sub-sampling (for A, B1, B2 and E phylogroups) while maintaining the proportionality. The same protocol was applied to the STs. For all other statistical analysis, the complete dataset was used (without normalization). We tested the correlations between phylogroup, ST, plasmid incompatibility group and ARG using the corrplot package of R v3.4.2 and the corrmat function. The preferential distribution of some ARGs within phylogroups was tested using the Kruskal–Wallis test and Benjamini–Hochberg correction. The diversity of the ARGs in some STs was measured using the Shannon index in R (v3.4.2) with the vegan package. The number of distinct ARGs in phylogroups or STs was referred to as ARG richness. Logistic regression was performed using R (v3.4.2) and the glm function. We first tested all variables in a univariate model and afterwards included in the multivariate model all variables that had shown a P value <0.01. The stepAIC function of the mass package was used to performs stepwise model selection by Akaike information criterion (AIC).
Results
Distribution of the species and
phylogroups among the E. coli/Shigella EnteroBase
During the curation step, 5144 genomes (6.3 % of genomes) were re-assigned to another genus/species. After curation, we identified 70 301
, 221
clade I, 5
clade II, 26
clade III, 17
clade IV, 102
clade V (
), 216
, 36
and 11 139
and EIEC (enteroinvasive
). Among
, the phylogroups were distributed as follows: B1 (25 265), A (12 469), B2 (12 414), E (11 965), D (4139), F (2403) and C (1646) (Fig. 1). The three major STs were ST11 (n=10 059, phylogroup E), ST131 (n=5143, phylogroup B2) and ST10 (n=3959, phylogroup A), making up 27 % of the
in the database. Of note, ST11 and ST131 include the O157:H7 Shiga-toxin-producing
(STEC) [42] and the O25b:H4 extra-intestinal pathogenic
(ExPEC) strains [27], respectively, while ST10 encompasses the laboratory-derived K-12 strain [43].
Fig. 1.
Distribution of species/phylogroups in the Escherichia/Shigella EnteroBase. Among the 82 063 genomes of the Escherichia/Shigella EnteroBase (a), we identified 623 genomes from
,
and
clades (referred to as ‘others’) (b), and 70 301 genomes of
distributed in seven phylogroups (c). Of note, the F phylogroup in the figure includes both F and G phylogroups [34].
clade III and IV are two sub-species belonging to a unique species.
clade V corresponds to
.
Distribution of species/phylogroups in the Escherichia/Shigella EnteroBase. Among the 82 063 genomes of the Escherichia/Shigella EnteroBase (a), we identified 623 genomes from
,
and
clades (referred to as ‘others’) (b), and 70 301 genomes of
distributed in seven phylogroups (c). Of note, the F phylogroup in the figure includes both F and G phylogroups [34].
clade III and IV are two sub-species belonging to a unique species.
clade V corresponds to
.
Most frequent ARGs found in
First, we used the AMRFinder database and identified a total of 314 091 ARGs in
genomes. The mean number of ARGs by genome was around 4.5 and the median was 2, and we observed a minimum of 0 ARGs and a maximum of 47 ARGs. The first part (n=164 519) included genes matching to known genes with 100 % of identity and coverage. This corresponded to 381 ARGs out of the 4955 genes included in the AMRFinder database. The second part (n=149 572) was made of variants sharing a coverage × identity value greater than 0.64 for nucleotide sequences (corresponding to 80 % identity and 80 % coverage). This comprised variants for 328 genes (including 169 genes not previously detected when the 100 % identity and coverage parameters applied) in the AMRFinder database. A total of 550 genes out of 4955 (11.1 %) of the AMRFinder database were, thus, found at least once in
genomes. The 20 most frequent ARGs sharing 100 % identity with genes from AMRFinder are depicted in Fig. 2. We predominantly found genes encoding β-lactamases and aminoglycoside-modifying enzymes (AMEs), the three most abundant genes being bla
TEM-1 (n=16 766), aph(3′′)-Ib (n=15 481) and aph(6)-Id (n=12 845).
Fig. 2.
The 20 ARGs most frequently detected in
from the EnteroBase database sharing 100 % identity with ARGs from AMRFinder. AME, aminoglycoside-modifying enzyme; Cat, chloramphenicol acetyltransferase; DfrA, dihydrofolate reductase type A; Mph, macrolide phosphotransferase; Sul, dihydropteroate synthase; Tet, tetracycline efflux pump.
The 20 ARGs most frequently detected in
from the EnteroBase database sharing 100 % identity with ARGs from AMRFinder. AME, aminoglycoside-modifying enzyme; Cat, chloramphenicol acetyltransferase; DfrA, dihydrofolate reductase type A; Mph, macrolide phosphotransferase; Sul, dihydropteroate synthase; Tet, tetracycline efflux pump.Among the 234 β-lactamases found in the AMRFinder database, 27 sub-families were identified (with 100 % identity or in variants). The five β-lactamases bla
TEM (n=18 364), bla
CTX-M (n=10 640), bla
OXA-1-like (n=4075), bla
CMY-2-like (n=1994) and bla
SHV (n=518) were the most frequent (Table S1).We also observed a high frequency of ARGs conferring resistance to antibiotics that are used to treat infections not caused by
but rather caused by Gram-positive bacteria, such as rifampicin (arr, n=394/110, respectively, 100 % identity and variants) and macrolide-lincosamide [lnu (n=494/322), mef (n=258/21), mphA (n=8661/408), erm (n=81/1405), vga (n=1/0) and msr (n=76/2)].Unexpectedly, we identified a blaZ gene commonly found in
in an
strain. However, subsequent analysis of the genome revealed that 10 % of reads were assigned to
and 90 % to
. This 10 % of reads from
evenly distributed in the genome of
strain CFSAN007851, strongly supporting the hypothesis of a contamination prior to sequencing.
Distribution of the resistance genes according to the strain phylogeny
We observed that the mean number of ARGs per genome differs according to the phylogroup: 5.3 for A, 3.6 for B1, 5.5 for B2, 7.6 for C, 7.1 for D, 2.2 for E and 7.4 for F phylogroup (P <0.001 with ANOVA test). We took an ecological approach by considering the richness (corresponding to the number of unique ARGs) and the diversity with the Shannon index (used to quantify specific biodiversity). We observed a variable distribution of the ARG richness in each phylogroup, with a richness of 254, 236, 213, 170, 178, 122 and 169 in phylogroups A, B1, B2, C, D, E and F, respectively. However, we observed an even distribution of the diversity with Shannon index equal to 3.63, 3.51, 3.15, 3.58, 3.40, 2.92 and 3.51 (Fig. S2). The three most represented ARGs in each phylogroup were bla
TEM-1, aph(3′′)-Ib and aph(6)-Id, except for in phylogroups D and F, in which mphA (a phosphotransferase conferring resistance to macrolides) and tetB (an efflux pump conferring resistance to tetracyclines) ranked second and third, respectively. However, specific ARGs were more frequently found in some phylogroups [referred to as phylogroup-predominant resistance genes (PPRGs), i.e. ARGs with a P value less than 0.001 with Kruskal–Wallis test and Benjamini–Hochberg correction; Fig. 3]. Indeed, 53.5 % (n=197/368) of the bla
OXA-48 (a widely spread carbapenemase-encoding gene) were found in phylogroup D. Likewise, 49.5 % (n=4141/8366) of bla
CTX-M-15 (encoding the extended-spectrum β-lactamase most prevalent worldwide) were found in phylogroups B2 and C (25 % each). Additionally, phylogroups B1 and E represented 25 265 and 11 965 genomes, but only included 5 and 35 PPRGs, respectively. In contrast, we observed 102 PPRGs in phylogroup A (n=12 469).
Fig. 3.
Histogram of the 10 (when available) most representative ARGs in each
phylogroup (referred to as the PPRGs). Only genes with at least 20 occurrences in the database are represented. For each phylogroup, genes are sorted by decreasing frequency.
Histogram of the 10 (when available) most representative ARGs in each
phylogroup (referred to as the PPRGs). Only genes with at least 20 occurrences in the database are represented. For each phylogroup, genes are sorted by decreasing frequency.We tested the hypothesis that even if the distribution of ARGs differed according to the phylogroup, that of their function (i.e. the antibiotic families they encode resistance to) would not. We applied the same statistical approach and indeed found no specific association between the activity spectrum of ARG families and phylogroups (Fig. 4). For instance, we looked specifically at bla
CTX-M genes, which encode resistance to third-generation cephalosporins. bla
CTX-M was widespread between all phylogroups (and particularly in the phylogroup B2) except the phylogroup E where bla
CTX-M-1 was prominent. However, the pattern of resistance they confer is very similar.
Fig. 4.
Histogram of the distribution of the targeted antibiotic families in each
phylogroup. The proportion of each targeted antibiotic family is represented as a percentage of each phylogroup.
Histogram of the distribution of the targeted antibiotic families in each
phylogroup. The proportion of each targeted antibiotic family is represented as a percentage of each phylogroup.We then searched for differential abundances with respect to the main STs of EnteroBase [ST11 (n=9391), ST131 (n=4841) and ST10 (n=3407)] (Fig. 5). Despite the fact that ST11 had the maximum number of genomes in EnteroBase, its ARG richness was 65, while it was 162 and 169 in ST131 and ST10, respectively. The Shannon entropy values for ST11, ST131 and ST10 were 2.36, 2.86 and 3.12, respectively. We also sought a specific distribution of some widely spread, clinically relevant β-lactamases. bla
TEM-1 and bla
CTX-M-15 were mostly found in ST131 [12.5 % (n=2094) and 42 % (n=2310)]. Conversely, bla
NDM-1 was mostly found in ST101 (18 %, n=38).
Fig. 5.
Histogram of the most representative ARGs in each
major ST. Only genes with at least 20 occurrences in the database were represented. For each ST, genes were sorted by decreasing frequency. ‘Others’ represents the 2430 STs with few occurrences of the indicated genes. Each gene was preferentially found in one ST with a P value less than 0.01 with Kruskal–Wallis test and Benjamini–Hochberg correction.
Histogram of the most representative ARGs in each
major ST. Only genes with at least 20 occurrences in the database were represented. For each ST, genes were sorted by decreasing frequency. ‘Others’ represents the 2430 STs with few occurrences of the indicated genes. Each gene was preferentially found in one ST with a P value less than 0.01 with Kruskal–Wallis test and Benjamini–Hochberg correction.
Global associations between strain phylogeny, plasmid type and resistance genes
We tested correlations between ARGs, phylogroups, STs and plasmid incompatibility groups. A total of 124 clusters were found with a correlation factor strictly higher than 0.30, each containing at least one ARG with other ARGs, STs, phylogroups or plasmid incompatibility groups. The bla
CTX-M-15 gene strongly correlated with aac(6′)-Ib and bla
OXA-1 (r=0.70), and also to a lesser extent correlated to aac(3)-IIa, mphA, aadA5, qacEdelta1 and ST131, as well as the plasmid incompatibility group IncFII (r=0.36). In contrast, we did not identify other ARGs, plasmid incompatibility groups or STs associated with the other common bla
CTX-M genes, bla
CTX-M-27 and bla
CTX-M-1. As for carbapenemase-encoding genes, a correlation was detected between bla
NDM-1 and aph(3′)-VI, floR, erm-42, bla
CMY-6, mphE, msrE, armA and with the plasmid incompatibility group IncA (r=0.36). Finally, bla
TEM-1 correlated with aph(3′′)-Ib, aph(6)-Id, aac(3)-IId, sul2 and the incompatibility group IncQ1 (r=0.39). We further assessed whether the ARGs that correlated were located next to each other. The ARGs associated with bla
CTX-M-15, bla
NDM-1 and bla
TEM-1 were in most instances found on the same contig (Fig. S3), supporting their acquisition via a common mobile genetic element rather than multiple acquisitions events. Last, we used a logistic regression to identify the variables associated with the presence of bla
CTX-M-15, bla
NDM-1 and bla
TEM-1. In multivariate analysis, we observed strong associations between these genes and other ARGs (including those mentioned above), but also with plasmid types and phylogroups, supporting the role of the genetic background in the presence of ARGs in
(Tables S2–S4). Of note, we did not observe any negative correlation between ARGs. However, genes tetB, aadA5, qacEdelta1, bla, bla, bla and mphA were negatively correlated to ST11.
G+C content and type of antibiotic resistance
We determined the G+C content of clusters of ARGs measured from the EnteroBase genomes and the distribution of their divergence using the mean G+C content of the
core genome (51.8 %) [39]. We observed a large panel of G+C content deviation (Fig. 6) between the mean G+C content of
and the G+C content of acquired ARGs, supporting the supposition that the G+C content was not a constraint for the acquisition of ARGs from other species or within the
species. Interestingly, we also observed that the G+C content of the most frequent ARGs found in
(with at least 1000 occurrences) did not significantly differ from that of the ARGs with low frequency (Mann–Whitney test, P=0.7, mean for the most frequent ARG 50.0, and 48.6 for low frequency).
Fig. 6.
Scatter plot of the G+C deviation of ARGs according to their occurrence in EnteroBase and the type of resistance encoded. Each dot corresponds to one cluster of ARGs (90 % nucleotide identity) created using AMRFinder, Mustard and ResfinderFG databases. The G+C content deviation corresponds to the difference between the mean G+C content of
[39] and the mean of the G+C content of each gene in the cluster.
Scatter plot of the G+C deviation of ARGs according to their occurrence in EnteroBase and the type of resistance encoded. Each dot corresponds to one cluster of ARGs (90 % nucleotide identity) created using AMRFinder, Mustard and ResfinderFG databases. The G+C content deviation corresponds to the difference between the mean G+C content of
[39] and the mean of the G+C content of each gene in the cluster.The functional class of the horizontally transferred ARGs has been shown to play a role in the fitness of the recipient strain in that the gene encoding drug modification (e.g. β-lactamase and AME) would have a minimal impact on the fitness of the recipient strain even if originating from a G+C-divergent background. Conversely, genes encoding proteins interacting with the cellular content (e.g. efflux and target modification) would impact the fitness of the recipient, especially when originating from a G+C-divergent background [44]. However, in our dataset, we did not observe a distinct pattern according to the type of resistance (Mann–Whitney test, P=0.7, mean for drug-modifying enzyme equal to 3.969, and 2.582 for ARGs with cellular interactions; Fig. 6).
Non-
ARGs can be exceptionally found in
Lastly, we assessed whether some ARGs found in non-
bacteria were present in the
genomes of EnteroBase using the specific databases ResFinderFG and Mustard. We observed a high number of hits (n=833 814) in the Mustard and ResFinderFG databases, mostly corresponding to genes predicted to originate from
. The first part (n=271 458) included genes matching to known genes with 100 % of identity and coverage. This corresponded to 37 different genes identified in the 6095 genes of the Mustard database (Table S5) and 44 different genes identified in the 2282 genes of the ResFinderFG database (Table S6). The second part (n=562 356) was made of variants sharing a coverage × identity value greater than 0.64 for nucleotide sequences (corresponding to 80 % identity and 80 % coverage) with known genes. This made up 385 673 variants for 23 additional genes in Mustard (Table S5) and 176 683 variants for 61 additional genes in ResFinderFG (Table S6).Interestingly, we identified a putative β-lactamase encoding gene from ResFinderFG (beta_lactamase, KU546399.1, faeces, AMX) and Mustard (MC3.MG60.AS1.GP1.C4251.G1) databases also found in the strain
NBRC 113350 (NCBI GenBank accession no. NZ_AP019724.1). The
bearing this gene was from phylogroup A, ST744 (Table 1), and had been isolated in Germany [45] from a patient screened for multidrug-resistant bacteria at the University Hospital of Münster. The gene was embedded in a 7600 bp contig (Fig. S4), itself sharing 100 % identity with the
genome. The strain was kindly provided by Professor Alexander Mellmann from the University of Münster and was re-sequenced in our laboratory using Illumina MiniSeq and Oxford Nanopore (Oxford Nanopore Technologies) chemistries, which confirmed the presence of the resistance gene (data not shown). Further description of this strain is underway and will be detailed in a separate study.
Table 1.
Description of the ARGs shared between
and non-
strains
E. coli strain ID
Phylogroup
ST (Warwick University/Pasteur Institute schemes)
Serotype
fimH allele
Resistance gene
Strain of origin ID
ESC_HA9845AA
A
744/2
Onovel32:H10
54
Putative β-lactamase
Bacteroides uniformis strain NBRC 113350
ESC_OA1280AA
E
753/920
O130:H9
124
erm(49)
Bifidobacterium breve strain CECT7263
ESC_JA0734AA
D
405/477
O102:H6
27
erm
Clostridioides difficile strain CDT4
ESC_FA9928AA
A
5943/999
O89:H11
41
tetM
Clostridioides difficile strain CD161
Description of the ARGs shared between
and non-
strainsstrain IDPhylogroupST (Warwick University/Pasteur Institute schemes)SerotypefimH alleleResistance geneStrain of origin IDESC_HA9845AAA744/2Onovel32:H1054Putative β-lactamasestrain NBRC 113350ESC_OA1280AAE753/920O130:H9124erm(49)strain CECT7263ESC_JA0734AAD405/477O102:H627ermstrain CDT4ESC_FA9928AAA5943/999O89:H1141tetMstrain CD161We also found two erm genes (ribosomal methylases), encoding macrolide-lincosamide resistance and originating from non-
. The first one was found in an
from phylogroup E, ST753, originating from a livestock sample from Ireland (Table 1). A part of the 12 910 bp contig containing the gene matched against a
genome (Fig. S5). We were able to identify some transposase genes in the contig bearing the erm gene and surrounding inverted repeat sequences. The second erm was found in an
from a phylogroup D, ST405 sample from The Netherlands (Table 1), and was originally identified on a plasmid from
(Fig. S6). Unfortunately, the metadata associated with those genomes was not sufficient to trace back the strains and confirm the presence of the genes in
by re-sequencing the strains. However, no evidence of wet-lab or dry-lab contamination was observed.Last, we identified a tetM gene, encoding tetracycline resistance, originating from
(Fig. S7). The gene was found in a 3291 bp contig completely matched against the
strain in an
from phylogroup A, ST5943, originating from Thailand. No evidence of contamination was observed even though the reads were not available.
Discussion
Using a large number of genomes, we were able to assess the diversity and distribution of ARGs in
. From a global perspective, we assumed that the richness of acquired ARGs in
was somewhat limited with regards to the high number of ARGs in the literature and the closeness of
to human activities (the ARGs found in
representing 11.1 % of the AMRFinder database). This suggested constraints in the path for ARG exchanges and sustainability between species (phylogenetic origin of the gene, interaction within the cell of the gene product [44, 46]), but also within the
species. We did not find any evidence of a link between the G+C content or the functional class of the transferred genes and their frequency in the database. However, some ARGs had specific associations with the genomic background (phylogroups and STs), other ARGs and plasmid incompatibility groups. Of note, we did not observe any negative correlation between ARGs, suggesting that the ARGs are not competing with each other.Such association between the phylogenetic background and the ARGs can result mainly from two evolutionary scenarios. In the ‘chance and timing scenario’, there is a limited number of acquisition events then propagated vertically (clonal inheritance). In this case, the strong gene-lineage association is only contingent upon evolutionary history. Such a rare acquisition scenario would be likely to apply to genes such as aac(3)-I, aac(6′)-Ib-cr5 or bla. Conversely, for the multiple arrivals, the maintenance and the expression of the ARGs are under selection due to epistatic interactions between the resistance determinants and the genomic background. This could concern genes such as tetB, with widespread distribution but increased prevalence in specific lineages.Similar association between the genomic background and the presence of virulence genes has been reported within the
species [47, 48] and attributed to epistasis between different parts of the genome [29]. Nonetheless, on a broader scale, we observed that these preferential genetic supports of resistance led to the same functional pattern of resistance to antibiotic classes. While
phylogroups had different ARG distributions, they harboured a full genetic armamentarium to resist the same antibiotic families. In brief, the ARGs were different but their functions were similar. Such a functional redundancy suggests that the
phylogroups were exposed to the same antibiotic pressure but acquired different ARGs to cope with it in an adaptive convergence process. Interestingly, a similar role of the genetic background influencing the genomic basis of antibiotic resistance by channelling evolution along different mutational paths has been reported following a long-term in vitro evolution experiment with
[49].The antibiotic exposure impacts not only the bacteria causing infections, but also the bacteria residing in our microbiota. Indeed, we observed a high rate of ARGs conferring resistance to antibiotics that are used to treat infections not caused by
or other Gram-negative bacteria but rather caused by Gram-positive bacteria, such as rifampicin and macrolide-lincosamides. Such antibiotics are excreted in the intestine at high concentrations, so that some bacteria with minimum inhibitory concentrations too high to be in the clinical spectrum of the antibiotic would be affected in the gut. In that respect, the high frequency of ARGs conferring resistance to those antibiotics is a strong signal stressing the impact of antibiotics on our microbiota.We observed four putative transfers of ARGs between non-
and
. In a previous work, we observed that the vast majority of ARGs found in the intestinal microbiota were very distinct from those found in cultivable bacteria (including
) and that few arguments were found to support their mobility [11]. Taking advantage of the largest
database to date, we could observe that ARGs could actually be exchanged between
and intestinal commensals, such as was observed for tetX [50]. Even if anecdotal in this dataset, with four observations, the very fact that they were detected suggests that they are not that uncommon, but unlike tetX which has met with success, their spread is very limited to date as they were only found in one genome each. The donor bacteria were strict anaerobic bacteria (
,
and
) commonly found in the gut microbiota at high abundances [51] alongside
. Considering that humans have been using antibiotics for more than 70 years, very favourable conditions have been met for ARG transfers between anaerobic bacteria and
. That such transfers have been observed so rarely supports the hypothesis that anaerobic bacteria may indeed provide ARGs to
, but that the contribution to the worldwide AMR issue seems to be minor. Indeed, the most successful ARGs found in
originate from other
species [14], e.g. bla
CTX-M progenitors are
spp. belonging to the
[52]. Besides, the observations of the ARG transfer would not have been possible if not for the use of multiple databases to cover the broadest range of ARGs. While databases such as AMRFinder may be found suitable to identify ARGs from clinically relevant bacteria, they may not be appropriate when it comes to looking for ARGs originating from other environments, such as the gut microbiota.We acknowledge some limitations of the present study. Despite including a large number of genomes, EnteroBase suffers from inclusion biases in that strains of interest (e.g. the resistant and/or the pathogenic ones) are the most sequenced. Indeed, EnteroBase includes a large number of STEC (Shiga-toxin-producing
), mainly the O157:H7 serotype, ExPEC (extra-intestinal pathogenic
) with the emerging ST131 and many strains producing extended spectrum β-lactamases. This may have led to an overestimation of the associations between ARGs, phylogenetic traits and replicons. Also, many more ARGs may have been found in
, perhaps including some from intestinal commensals that were not found of interest to be cultured and sequenced. Hence, we assume we did not cover the global picture of the acquired ARGs always found in
, but only a part. Nonetheless, we believe our findings are sound with regards to the very high number of strains included in this study. We also know that some genomes with contamination can be found in the database, which warrants the use of specific tools upstream of genome analysis.In all, we observed that ARGs were distributed in the
phylogroups/STs with a preferential fashion. In the meantime, they provided resistance to the same antibiotic families. Furthermore, we observed that the transfer of ARGs between non-
and
indeed occurred but seemed to be exceptional.Click here for additional data file.Click here for additional data file.
Authors: F R Blattner; G Plunkett; C A Bloch; N T Perna; V Burland; M Riley; J Collado-Vides; J D Glasner; C K Rode; G F Mayhew; J Gregor; N W Davis; H A Kirkpatrick; M A Goeden; D J Rose; B Mau; Y Shao Journal: Science Date: 1997-09-05 Impact factor: 47.728
Authors: Jason W Sahl; Carolyn R Morris; Jennifer Emberger; Claire M Fraser; John Benjamin Ochieng; Jane Juma; Barry Fields; Robert F Breiman; Matthew Gilmour; James P Nataro; David A Rasko Journal: J Clin Microbiol Date: 2015-01-14 Impact factor: 5.948