Preetha Shibu1,2, Frazer McCuaig3, Anne L McCartney4, Magdalena Kujawska5, Lindsay J Hall5,6, Lesley Hoyles3,7. 1. Life Sciences, University of Westminster, UK. 2. Present address: Berkshire and Surrey Pathology Services, Frimley Health NHS Trust, Wexham Park Hospital, Slough, UK. 3. Department of Biosciences, Nottingham Trent University, UK. 4. Department of Food and Nutritional Sciences, University of Reading, UK. 5. Gut Microbes & Health, Quadram Institute Bioscience, Norwich Research Park, Norwich, UK. 6. Chair of Intestinal Microbiome, ZIEL - Institute for Food & Health, Technical University of Munich, Freising, Germany. 7. Department of Surgery and Cancer, Imperial College London, UK.
Draft genome sequences for PS_Koxy1, PS_Koxy2 and PS_Koxy4 have been deposited with links to BioProject accession number PRJNA562720 and under accession numbers VTQC00000000, VTQB00000000 and VTQA00000000, respectively. Supplementary data and material associated with this article are available from figshare at https://figshare.com/projects/Kleboxymycin_biosynthetic_gene_cluster/81059.
Members of the complex are difficult to tell apart using phenotypic and chemotaxonomic methods. Consequently, many genomes deposited in public databases are misclassified as . Here we demonstrate that the current multi-locus sequence typing (MLST) system for the complex can be used to accurately distinguish many strains, which will be of use to clinical laboratories in resource-limited settings which rely on the MLST scheme for typing and epidemiological tracking of isolates. In addition, extended analyses of the genomes of spp. have revealed the kleboxymycin biosynthetic gene cluster (BGC) is restricted to species of the complex (, , and ). Species- and/or gene-specific differences in the cluster’s sequences may be relevant to virulence of and related species. The finding of the kleboxymycin BGC in the preterm infant gut microbiota may have implications for disease presentation in a subset of neonates.
Introduction
Members of the complex encode a chromosomal β-lactamase gene (blaOXY) [1]. Differences in the sequence of this gene allowed the establishment of phylogroups (Ko), which correspond to species: (Ko1, with Ko5 representing a sub-lineage), (Ko2), (Ko3), (Ko4), (Ko6) and (Ko8). Ko7 has been described on the basis of a single isolate [1]. Individual gene (rpoB, gyrA, rrs) sequences can be used to differentiate species of the complex [2], as can genome-based average nucleotide identity (ANI) and phylogenomic analyses [1, 3]. All members of the complex can be differentiated by MALDI-TOF [1], but reference databases currently in routine clinical use lack reference spectra of the different species to allow identification beyond .
Recent work has demonstrated genomic characterization of strains is inadequate, with large numbers of genomes deposited in public databases erroneously assigned to instead of or [3-6]. Consequently, and are clinically relevant but under-reported in the literature [3, 7]. Given that the blaOXY gene has diversified in parallel to housekeeping genes in the complex, it is likely that the multi-locus sequence typing (MLST) scheme [8] can be used to distinguish members of this genetically diverse group of bacteria.
Little is known about the antibiotic-resistance and virulence genes encoded by and related species. In the course of ongoing –phage work, with three GES-5-positive ST138 strains originally described as [9, 10], we sought to determine whether widely recognized virulence factors such as enterobactin, yersiniabactin and salmochelin are encoded in the strains’ genomes, and the kleboxymycin biosynthetic gene cluster (BGC), as this was until recently a little-studied BGC implicated in non- antibiotic-associated haemorrhagic colitis (AAHC) [11-14]. AAHC is caused by the overgrowth of cytotoxin-producing secondary to use of antibiotics such as penicillin or amoxicillin, resulting in the presence of diffuse mucosal oedema and haemorrhagic erosions [15, 16]. This type of colitis is distinct from the more common form of antibiotic-associated diarrhoea caused by toxin-producing Clostridioides difficile, which usually gives rise to watery diarrhoea resulting in mild to moderate disease.
Gene-based and genomic analyses of our ST138 isolates showed they were , not , and that along with common virulence genes they encoded the kleboxymycin BGC. Our findings led us to (1) determine whether the MLST scheme could be used to differentiate members of the complex, and (2) investigate the distribution of the kleboxymycin BGC in a range of and related species.
Methods
Clinical isolates
Strains PS_Koxy1 (isolated December 2014; cardiothoracic/intensive care unit), PS_Koxy2 (isolated August 2015; haematology unit) and PS_Koxy4 (isolated September 2015; haematology unit) had been recovered from a throat swab, urine and rectal swab, respectively, obtained from three different adults. The strains were from the study of Eades et al. [9], described in further detail by Ellington et al. [10] (PS_Koxy1, patient X; PS_Koxy2, patient A; PS_Koxy4, patient B; Frances Davies, personal communication). Full details of methods associated with the phenotypic and genotypic characterization of the clinical isolates can be found in Supplementary Material (available in the online version of this article).
ANI analysis of genome sequences
All annotated non-redundant genome assemblies available in the NCBI Genome database on 2 September 2019 (n=7170; Table S1) were downloaded [17]. ANI of genomes with their closest relatives and type strains of species was assessed using FastANI [18], which uses Mashmap as its MinHash-based alignment-free sequence mapping engine to provide ANI values for both complete and draft-quality genomes that are related by 80–100 % ANI.
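As a rough illustration of how FastANI results can be turned into species calls, the sketch below keeps the best reference hit for each query genome and flags whether it meets a 95 % cut-off (the lower bound of the 95–96 % range cited in the Results [24]). It assumes FastANI's tab-delimited output (query, reference, ANI, mapped fragments, total fragments). The file names, the second ANI value and the fragment counts are invented for the example, and the function name is hypothetical; this is not the authors' pipeline.

```python
import csv
import io

# Assumption: 95 % is used as the species cut-off (lower bound of the 95-96 % range).
SPECIES_ANI_THRESHOLD = 95.0


def best_ani_hits(fastani_tsv, threshold=SPECIES_ANI_THRESHOLD):
    """Return {query: (reference, ani, meets_threshold)} for the top hit per query."""
    best = {}
    for row in csv.reader(fastani_tsv, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        query, reference, ani = row[0], row[1], float(row[2])
        if query not in best or ani > best[query][1]:
            best[query] = (reference, ani, ani >= threshold)
    return best


# Hypothetical two-line FastANI output; only the 98.81 value comes from the text.
example = io.StringIO(
    "PS_Koxy1.fna\ttype_strain_A.fna\t98.81\t1800\t1950\n"
    "PS_Koxy1.fna\ttype_strain_B.fna\t92.40\t1700\t1950\n"
)
hits = best_ani_hits(example)
```

In practice the same function could be pointed at an open FastANI output file instead of the in-memory example.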
MLST analyses
Allele sequences (n=442 representing seven housekeeping genes – gapA, infB, mdh, pgi, phoE, rpoB, tonB – contributing to 354 different MLST sequence types; correct as of 19 March 2021) for the MLST scheme [8] were used to determine the MLST profiles of all complex genomes included in this study (Table S1). The allele sequences were used to create blastn databases against which the assemblies of all genomes included in this study were searched. Sequences with exact hits to one allele of each housekeeping gene were retained, allowing us to identify the sequence types of the genomes included in this study (Table S2). For those genomes that returned hits to alleles across all seven housekeeping genes, a phylogenetic tree (neighbour joining, Jukes Cantor) was generated in Geneious Prime v2019.2.1 using the aligned (clustal W) concatenated (gapA–infB–mdh–pgi–phoE–rpoB–tonB) nucleotide sequences of their housekeeping genes and those of each sequence type used in the MLST scheme [8]. Support for clustering of nodes in the tree was determined by bootstrap analysis (1000 replications).
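The exact-match allele calling described above can be sketched in a few lines. The example below is a simplified stand-in for the blastn searches: it looks for each allele as an exact substring on either strand of the assembly, then looks up the resulting profile in a sequence-type table. All sequences, allele numbers and the `ST_toy` label are invented for illustration, and the function names are hypothetical. Only two of the seven loci are shown to keep the toy data short.

```python
# Toy illustration of exact-match MLST allele calling (not the authors' pipeline).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def call_alleles(contigs, alleles):
    """Map each gene to the first allele found exactly on either strand, else None.

    alleles has the shape {gene: {allele_id: sequence}}.
    Caveat: if one allele were a substring of another, the first match would win.
    """
    strands = []
    for contig in contigs:
        contig = contig.upper()
        strands.extend([contig, reverse_complement(contig)])
    profile = {}
    for gene, variants in alleles.items():
        profile[gene] = None
        for allele_id, allele_seq in variants.items():
            if any(allele_seq.upper() in strand for strand in strands):
                profile[gene] = allele_id
                break
    return profile


def sequence_type(profile, st_table, gene_order):
    """Return the ST for a complete profile, 'novel' if unlisted, None if incomplete."""
    key = tuple(profile.get(gene) for gene in gene_order)
    if None in key:
        return None
    return st_table.get(key, "novel")


# Hypothetical data: two loci, made-up sequences and a made-up sequence type.
toy_order = ("gapA", "infB")
toy_alleles = {"gapA": {1: "ACGTTGCA", 2: "ACGTAGCA"}, "infB": {1: "GGGCCCAT"}}
toy_contigs = ["TTTACGTAGCATTT", "ATGGGCCC"]  # infB sits on the reverse strand
toy_st_table = {(2, 1): "ST_toy"}

toy_profile = call_alleles(toy_contigs, toy_alleles)
toy_st = sequence_type(toy_profile, toy_st_table, toy_order)
```

Returning `None` for incomplete profiles mirrors the text's restriction of the tree to genomes with hits across all housekeeping genes, while `"novel"` corresponds to complete profiles with no assigned sequence type.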
Characterization of the kleboxymycin BGC in genomes
The annotated reference sequence of the kleboxymycin BGC was downloaded from GenBank (accession number MF401554 [11]) and used as a BLASTP database for searches with the protein sequences encoded within the genomes of PS_Koxy1, PS_Koxy2 and PS_Koxy4. Initially, Geneious Prime v2019.2.1 was used to identify regions of the three draft genomes encoding the complete BGC, and to align them to MF401554.
The protein sequences of the annotated assemblies were searched for the kleboxymycin BGC using the reference sequence and BLASTP v2.9.0+, and the resulting hits were filtered based on >70 % identity and >70 % coverage to identify isolates potentially carrying genes from the BGC. (n=3) and (n=2) and -related metagenome-assembled genomes (MAGs) (n=25) from Chen et al. [3] were also subject to BLASTP searches. Genomes that encoded the full BGC (i.e. all 12 BGC genes on a contiguous stretch of DNA) were identified from the blast results. The protein sequences encoded in the BGC were extracted from the annotated assemblies using samtools v1.9 faidx [19] and concatenated into a single sequence (the sequence data are available as supplementary material from figshare). These concatenated sequences were used to produce a multiple-sequence alignment (MSA) in Clustal Omega v1.2.4, along with the BGC sequences of the three clinical isolates, the reference sequence [11], a recently described sequence [20] and a homologous sequence found in BZA12 (to be used as an outgroup in later phylogenetic analyses; identified as encoding the complete kleboxymycin BGC through NCBI BLASTP). Phylogenetic analyses were carried out on the MSA using the R package Phangorn v2.5.5 [21], producing a maximum-likelihood tree, which was visualised and rooted (on BZA12) using the Interactive Tree of Life (iTOL v5.5) [22]. To examine variation at the individual protein level, further within-species MSAs were produced for each of the 12 protein sequences in the BGC. Each of these alignments was used as the basis for a consensus sequence, produced using EMBOSS Cons v6.6.0.0, representing each of the four species carrying the BGC. An MSA and per cent identity matrix were then generated for each protein between the consensus sequences of , , and , along with the reference sequence [11].
The species affiliations of the genomes encoding the full kleboxymycin BGC were determined using FastANI v1.2 [18] against genomes of type strains of the and complexes [1, 23] and ATCC 13048T (assembly accession number GCA_003417445), with PhyloPhlAn 0.99 used to conduct a phylogenetic analysis to confirm species affiliations. PhyloPhlAn identifies hundreds of conserved (core) proteins from a given genomic dataset and uses them to build a complete high-resolution phylogeny.
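The hit filtering and completeness check described in this section can be sketched as follows. The example assumes BLASTP hits have already been parsed into simple records; the `Hit` type and function names are hypothetical. It also approximates "a contiguous stretch of DNA" as "all 12 genes on the same contig", which is a simplification rather than the authors' exact criterion. The gene names are those listed for the BGC in the Results.

```python
from collections import defaultdict
from typing import NamedTuple

# The 12 genes of the kleboxymycin BGC as named in the Results.
BGC_GENES = frozenset({
    "mfsX", "uvrX", "hmoX", "adsX", "icmX", "dhbX",
    "aroX", "npsA", "thdA", "npsB", "npsC", "marR",
})


class Hit(NamedTuple):
    """One parsed BLASTP hit (hypothetical record layout)."""
    bgc_gene: str
    contig: str
    identity: float  # per cent identity
    coverage: float  # per cent query coverage


def passing_hits(hits, min_identity=70.0, min_coverage=70.0):
    """Keep hits above both thresholds (>70 % identity and >70 % coverage)."""
    return [h for h in hits if h.identity > min_identity and h.coverage > min_coverage]


def complete_bgc_contigs(hits):
    """Return contigs carrying all 12 BGC genes among the passing hits.

    Simplification: co-location on one contig stands in for contiguity.
    """
    genes_by_contig = defaultdict(set)
    for h in passing_hits(hits):
        genes_by_contig[h.contig].add(h.bgc_gene)
    return sorted(c for c, genes in genes_by_contig.items() if BGC_GENES <= genes)


# Invented example: every gene hits contig_1 at 99 % identity and full coverage.
full_cluster = [Hit(g, "contig_1", 99.0, 100.0) for g in sorted(BGC_GENES)]
result = complete_bgc_contigs(full_cluster)
```

A stricter check would also compare gene coordinates and order on the contig, which requires the annotation positions as well as the BLASTP table.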
Results
Characterization of the clinical isolates
Although initial phenotypic tests (Supplementary Material) and genomic analyses [9, 10] identified PS_Koxy1, PS_Koxy2 and PS_Koxy4 as , analyses of the isolates' proteomes showed them to be ST138 (phylogroup Ko1, blaOXY1-8) (Fig. S1). Full details of phenotypic characterization and genome sequencing of the clinical isolates can be found in Supplementary Material. PS_Koxy1, PS_Koxy2 and PS_Koxy4 all shared 98.81, 98.71 and 98.71 % ANI, respectively, with the type strain of (W14T, GCA_901556995), and 99.98–100.00 % ANI with each other. Based on current recommendations, ANI of 95–96 % and above with the genome of the type strain is indicative of species affiliation [24]. Inclusion of the genomes with representatives of all six species of the complex in a phylogenetic analysis confirmed the affiliation of PS_Koxy1, PS_Koxy2 and PS_Koxy4 with (Fig. S2).
Assigning MLST sequence types to species
While annotations for genomes are improving, we have previously noted and continue to notice issues with identities attributed to genomes in public repositories [3]. Consequently, the identity of all genomes included in this work was first confirmed by ANI analysis (Table S1), with blaOXY gene and phylogenetic analyses supporting our findings (Supplementary Material). Of the 178 complex genomes identified, many had been misassigned in GenBank: seven genomes were listed as , 106 as , 51 as , 13 as sp. and one as . Our analyses of the 178 genomes showed the dataset actually represented (n=76), (n=66), (n=24), (n=6), (n=5) and (n=1).
The MLST scheme uses sequence polymorphisms among seven housekeeping genes – gapA, infB, mdh, pgi, phoE, rpoB, tonB – to generate sequence types for isolates. Currently, there are 442 allele sequences that contribute to 354 unique MLST sequence types. We first identified nucleotide sequences within the genomes with exact matches to nucleotide sequences within the allele reference dataset. One-hundred-and-twenty-nine genomes returned hits to known MLST profiles, and 10 isolates returned MLST profiles with no assigned sequence type (Table S2). Our clinical isolates returned the expected ST138 result.
Of the 66 genomes, 59 could be assigned to known sequence types (in order of abundance: ST2, ST176, ST199, ST36, ST19, ST30, ST53, ST101, ST18, ST31, ST34, ST48, ST58, ST59, ST141, ST145, ST153, ST181, ST221, ST222, ST257, ST258, ST287, ST323) and one (GCA_003937225) represented a novel sequence type. Of the 24 genomes, 13 could be assigned to known sequence types (ST172, ST216, ST104, ST186, ST236, ST263, ST316, ST319, ST350), with four (GCA_002856195, GCA_900451335, GCA_008120915, GCA_004343645) representing unique novel sequence types. Of the six genomes, three could be assigned to known sequence types (ST47, ST311, ST351) and one (GCA_901563825) represented a novel sequence type. Of the 79 genomes (including our three clinical isolates), 57 could be assigned to known sequence types (ST85, ST27, ST202, ST143, ST29, ST50, ST84, ST138, ST11, ST88, ST317, ST28, ST40, ST52, ST82, ST92, ST98, ST108, ST127, ST144, ST146, ST157, ST170, ST180, ST226, ST294, ST315), with four genomes (GCA_000783895, GCA_000735215, GCA_007097185, GCA_007097115) representing three novel sequence types. None of the or genomes returned hits to known alleles (Table S2), but the relevant individual housekeeping gene sequences are provided as Supplementary Files for use by other researchers.
For those genomes that encoded known or novel sequence types, we concatenated their housekeeping-gene sequences and used them to create an MSA with the concatenated sequences of each of the 354 recognized MLST sequence types. This MSA was used to create a phylogenetic tree, allowing us to visualize the relationships among species and sequence types (Fig. 1).
Fig. 1.
Sequence types within the MLST scheme can be used to distinguish members of the complex. The three clinical isolates (all ST138) characterized in this study are shown in white. The phylogenetic tree (neighbour joining, Jukes Cantor) was generated using concatenated nucleotide sequences of housekeeping genes (gapA–infB–mdh–pgi–phoE–rpoB–tonB) used in the MLST scheme [8]. The purple circles represent bootstrap values ≥80 % (based on 1000 replications); the larger the circle, the higher the bootstrap value. Scale bar, average number of nucleotide substitutions per position. The full list of MLST sequence types and their species affiliations are available in Table S2.
Of the 354 known MLST sequence types, 342 (96.6 %) were associated with specific members of the complex (Table S2): 115 with , 130 with , 73 with and 24 with . Eleven were associated with unspecified members of the complex. ST105 was associated with , sharing 99.73 % sequence similarity with the type strain’s MLST profile. -specific sequence types shared 98.64–100 % sequence similarity, -specific sequence types shared 96.62–100.00 % sequence similarity, -specific sequence types shared 98.20–100.00 % sequence similarity, -specific sequence types shared 99.00–100.00 % sequence similarity and -specific sequence types shared 97.09–99.7 % sequence similarity. A matrix of similarity values for the 504 sequences included in the analysis is available in Supplementary Material, along with the MSA used to generate the phylogenetic tree shown in Fig. 1.
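For readers who want to reproduce similarity values of this kind, the sketch below computes per cent identity and the Jukes–Cantor distance, d = −(3/4) ln(1 − 4p/3), for two aligned nucleotide sequences, where p is the proportion of differing sites. It assumes that columns containing a gap in either sequence are skipped; how Geneious treats gaps in its similarity matrix is not stated in the text, so this is an assumption. The sequences are invented.

```python
import math


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def percent_identity(a, b):
    """Per cent identity over the compared (ungapped) columns."""
    return 100.0 * (1.0 - p_distance(a, b))


def jukes_cantor(a, b):
    """Jukes-Cantor (JC69) distance; infinite when p reaches the 0.75 saturation point."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Invented aligned sequences differing at one of ten sites.
seq_a = "ACGTACGTAC"
seq_b = "ACGTACGTAA"
identity = percent_identity(seq_a, seq_b)
distance = jukes_cantor(seq_a, seq_b)
```

Because the correction accounts for multiple substitutions at the same site, the Jukes–Cantor distance is always at least as large as the raw proportion of differences.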
Detection of the complete kleboxymycin BGC in clinical isolates
It has long been known that gut colonization is linked with AAHC [16]. Schneditz et al. [12] showed tillivaline (TV), a pyrrolobenzodiazepine (PBD) derivative produced by , is one of the enterotoxins responsible for causing AAHC. This toxic product is encoded by the heterologous expression of the kleboxymycin (also known as tilimycin (TM) [14]) BGC comprising 12 genes [11]. Protein sequences of the reference sequence [11] were used to create a BLASTP database against which the proteins encoded in the genomes of PS_Koxy1, PS_Koxy2 and PS_Koxy4 were compared. The genomes of PS_Koxy1, PS_Koxy2 and PS_Koxy4 encoded a complete kleboxymycin BGC (Fig. 2). All genes in each of the genomes shared >98 % identity and >99 % query coverage with the genes of the reference sequence [11]: mfsX, 99.76 % identity; uvrX, 99.87 %; hmoX, 99.80 %; adsX, 99.85 %; icmX, 99.52 %; dhbX, 99.62 %; aroX, 99.74 %; npsA, 99.80 %; thdA, 98.68 %; npsB, 99.93 %; npsC, 98.47 %; marR, 99.39 %.
Fig. 2.
Alignment of the kleboxymycin BGCs from the three clinical strains with the complete cluster of MH43-1 (GenBank accession number MF401554 [11]). (a) The image (alignment view) was generated via the progressiveMauve algorithm plugin of Geneious Prime v2019.2.1 (default settings, full alignment), with gene names for the three clinical isolates assigned manually. (b) Genes corresponding to Prokka-generated annotations. Consensus identity is the mean pairwise nucleotide identity over all pairs in the column: green, 100 % identity; greeny-brown, at least 30 % and under 100 % identity; red, below 30 % identity.
Our strains were ST138, so we downloaded and assembled (from BioProject PRJEB30858) available raw sequence data from 19 ST138 strains described recently [10] and determined whether they were in fact and encoded the kleboxymycin BGC. All strains were confirmed to be on the basis of ANI analysis, and encoded the complete kleboxymycin BGC (Fig. S4).
Schneditz et al. [12] reported npsA/npsB were functionally conserved in six sequenced strains of (Table 1), based on a BLASTP analysis. Full details of the analysis are unavailable, with only a brief mention of presence being determined based on BLASTP sequence identities >90 % with no indication of sequence coverage. All the genomes included in the study of Schneditz et al. [12] were compared with those of the type strain of and related species to confirm their species affiliations (Table 1). While some strains were , others belonged to , , and . Using thresholds of 70 % identity and 70 % query coverage in our BLASTP analyses to reduce the potential for detecting false positives, we reanalysed the genomes included in the study of Schneditz et al. [12]. Our results agreed with those of Schneditz et al. [12] for all genomes, except we detected npsA/npsB (and all other genes encoded in the kleboxymycin BGC) in SA2. 10–5243, 10–5250, 11492–1, 10–5248 and M5a1 also encoded the whole kleboxymycin BGC. All genes in all matches shared greater than 90 % identity across greater than 99 % query coverage. 10–5242, E718 and KCTC 1686 did not encode homologues associated with the kleboxymycin BGC. 10–5245 encoded almost-complete homologues of four genes [EHS96696.1 (marA) 98.79 % identity, 99.39 % coverage; EHS96697.1 (npsC) 95.38 % identity, 99.23 % coverage; EHS96698.1 (mfsX) 96.68 % identity, 99.87 % coverage; EHS96699.1 (uvrX) 94.88 % identity, 99.76 % coverage] in contig JH603137.1.
Table 1.
Genomes included in analyses conducted by Schneditz et al. [12] with corrected species affiliations (originally reported as )

| Assembly accession | Strain | Species | ANI with shown genome* | npsA/npsB |
|---|---|---|---|---|
| GCA_000240325.1 | KCTC 1686 | K. michiganensis | 98.69 %, GCA_901556995.1 | – |
| GCA_000247835.1 | 10–5242 | K. michiganensis | 97.59 %, GCA_901556995.1 | – |
| GCA_000247855.1 | 10–5243 | K. oxytoca | 99.31 %, GCA_900977765.1 | + |
| GCA_000247875.1 | 10–5245 | K. oxytoca | 99.13 %, GCA_900977765.1 | – |
| GCA_000247895.1 | 10–5246 | Raoultella ornithinolytica | 99.21 %, GCA_001598295.1 | – |
| GCA_000247915.1 | 10–5250 | K. pasteurii | 99.29 %, GCA_901563825.1 | + |
| GCA_000252915.3 | 11492–1 | K. oxytoca | 99.15 %, GCA_900977765.1 | + |
| GCA_000276705.2 | E718 | K. michiganensis | 98.37 %, GCA_901556995.1 | – |
| GCA_000427015.1 | SA2 | K. grimontii | 99.33 %, GCA_900200035.1 | + |
| GCA_001078235.1 | 10–5248 | K. oxytoca | 99.25 %, GCA_900977765.1 | + |
| GCA_001633115.1 | M5a1 | K. grimontii | 99.40 %, GCA_900200035.1 | + |

*GCA_901556995.1=K. michiganensis W14T; GCA_900200035.1=K. grimontii 06D021T; GCA_900977765.1=K. oxytoca ATCC 13182T; GCA_001598295.1=R. ornithinolytica NBRC 105727T; GCA_901563825.1=K. pasteurii SB3355T.
Detection of the kleboxymycin BGC in the faecal microbiota of preterm infants
Our previous work had highlighted the preterm infant gut microbiota harbours a range of species belonging to the complex [3]. BLASTP searches of the two (P049A W, GCA_008120305; P095L Y, GCA_008120085) and three (P038I, GCA_008120465; P043G P, GCA_008120425; P079F P, GCA_008120915) strains we previously characterized showed all three strains encoded the kleboxymycin BGC (Fig. S5). All BGC genes in their genomes shared >98 % identity and >99 % query coverage with the genes of the reference sequence [11]: mfsX, 100 % identity; uvrX, 99.60–99.73 %; hmoX, 99.80 %; adsX, 99.69 %; icmX, 100 %; dhbX, 100 %; aroX, 99.74–99.94 %; npsA, 99.21–99.41 %; thdA, 98.68 %; npsB, 99.31–99.38 %; npsC, 99.23–100 %; marR, 100 %. The BGC was also detected in 8 out of 25 of the preterm-associated complex MAGs (three , five ) we described previously [3]. An MSA of the preterm-associated genomes’ BGC against the reference sequence [11] suggested species-specific clustering of the sequences (Fig. S5).
Prevalence of the kleboxymycin BGC in spp.
Given the work detailed above had detected the kleboxymycin BGC in several different but closely related species and in a range of clinical and gut-associated isolates, and Hubbard et al. [20] recently detected the BGC in a strain of , we chose to increase the scope of our analysis to include 7170 publicly available assembled genomes (including our three clinical strains, and five isolates from preterm infants [3]) (Table S1).
As mentioned above, we have noted issues with identities attributed to genomes in public repositories [3], so the identity of all non- complex genomes included in this work was first confirmed by ANI analysis (Table S1). The majority (n=6245) of the additional genomes were , followed by subsp. (n=241), subsp. (n=184), (n=168), subsp. (n=120), subsp. (n=19), ‘K. quasivariicola’ (n=11) and (n=1). Out of 7170 genomes, 110 (1.5 %) had one or more matches with the 12 genes encoded within the kleboxymycin BGC reference sequence, with all except two genomes (both ) belonging to species of the complex (Table S3). Ninety-six genomes – all belonging to the complex – encoded at least 12 genes belonging to the BGC (Table S3), and were examined further.
One genome (GCA_002856195) encoding 12 BGC genes was found to encode two stretches of the same protein with the other cluster-associated genes non-contiguous, while one (GCA_004005605) encoded 13 BGC genes (one gene duplicated) in a non-contiguous arrangement. Fifty-five out of 66 (83.3 %) genomes encoded the entire kleboxymycin BGC, as did 19 out of 24 (79.2 %) , 9 out of 79 (11.4 %) and five out of six (83.3 %) genomes (Fig. 3a). Phylogenetic analysis (Fig. 3b) confirmed findings from ANI analyses (Table S1) that showed all genomes belonged to species of the complex. The 88 genomes confirmed to encode the complete kleboxymycin BGC included the type strain of . The BGC sequences grouped according to species, and the reference sequence [11] clustered with sequences and was closely related to the type strain of that species (Fig. 3c).
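The per-species proportions reported above can be checked with a few lines of arithmetic. In the sketch below the four species are labelled only by the order in which they are listed in the text (the labels `group_1` to `group_4` are placeholders, not species names); the counts are those given in the paragraph.

```python
# (genomes encoding the complete BGC, genomes examined), in the order reported.
bgc_counts = {
    "group_1": (55, 66),
    "group_2": (19, 24),
    "group_3": (9, 79),
    "group_4": (5, 6),
}


def prevalence(positive, total):
    """Per cent of genomes carrying the complete BGC, rounded to one decimal place."""
    return round(100.0 * positive / total, 1)


percentages = {name: prevalence(k, n) for name, (k, n) in bgc_counts.items()}
total_complete = sum(k for k, _ in bgc_counts.values())
```

The summed positives match the 88 genomes stated to encode the complete BGC.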
Fig. 3.
Distribution of the kleboxymycin BGC in spp. genomes. (a) Distribution of the complex genomes encoding the entire kleboxymycin BGC. (b) Unrooted maximum-likelihood tree [generated using PhyloPhlAn v0.99 [34] and 380 protein-encoding sequences conserved across the genomes] confirming species affiliations of the 88 genomes within the complex [1] encoding the kleboxymycin BGC. Type strains are shown with coloured backgrounds corresponding to the legend. The clade associated with and has been collapsed because of space constraints. (c) Maximum-likelihood tree generated with the concatenated protein sequences for the kleboxymycin BGC of the 88 genomes found to encode all 12 genes of the BGC plus the reference sequence [11]. The tree was rooted using the kleboxymycin-encoding BGC of BZA12. Values at nodes, bootstrap values expressed as a percentage of 100 replicates. Sources of isolates, where known, are shown to the right of the assembly accession numbers. (b, c) Scale bar, average number of substitutions per position.
Species-specific consensus sequences were generated for all genes within the kleboxymycin BGC and are available as supplementary material from figshare. Similarity values for each gene within the BGC consensus sequences across the four species are available in Table S4.
Discussion
Genotypic characteristics of the three clinical strains
The three clinical strains characterized herein had previously been included in a study of outbreak strains encoding GES-5 and CTX-M-15 [9], the first report of GES-5-positive clinical isolates of ST138 in the UK. Subsequently, it has been shown that the GES-5 gene in these strains is encoded on an IncQ group plasmid [10]. The whole-genome sequence data reported on previously [9] were not available to us. API 20E (this study; Supplementary Material), MALDI-TOF and limited sequence analysis [9] had shown the strains to be . Our previous work with isolates recovered from preterm infants had shown that API 20E testing on its own was insufficient to accurately identify strains [3]. The strains described by Eades et al. [9] were characterized before the availability of MALDI-TOF databases capable of splitting species of the complex (MALDI-TOF was only able to identify as but did not have sufficient resolution to identify individual species within the complex) [1]. As we are using PS_Koxy1, PS_Koxy2, PS_Koxy4 in ongoing phage work, we generated draft genome sequences for the strains, to accurately identify them and facilitate detailed host–phage studies in the future.
ANI and phylogenetic analyses confirmed all three strains belonged to the species , not (Supplementary Material). In addition to the AMR genes GES-5 (β-lactamase with carbapenemase activity) and CTX-M-15 (an ESBL responsible for resistance to cephalosporins) reported previously [9], the strains encoded SHV-66, an ESBL not previously reported in and related species (Supplementary Material). SHV-66 has previously only been reported in a minority of β-lactamase-producing in Guangzhou, China [25]. In this study, SHV-66 (99.65 % identity, bit-score 580 – strict CARD match) was also found in strains E718 [26], GY84G39 (unpublished), K1439 (unpublished) and 2880STDY5682598 [7] (accession numbers GCA_000276705, GCA_001038305, GCA_002265195 and GCA_900083915, respectively), included in the phylogenetic analysis shown in Fig. S1. Moradigaravand et al. [7] noted in their study that 2880STDY5682598 encoded a blaSHV gene, but did not document its type nor indicate its novelty.
The three strains had identical virulence factor profiles (Fig. 3b), encoding the plasminogen activating omptin Pla, the Mg2+ transport proteins MgtBC, Hsp60, autoinducer-2 (LuxS), type I fimbriae, type three fimbriae, type six secretion system I, common pilus and enterobactin. They also encoded numerous proteins associated with capsule, regulation of capsule synthesis (RcsAB) and LPS, with several of the latter sharing identity with endotoxins (RfaD, GalU, LpxC, GmhA/LpcA, KdsA). All six proteins required for allantoin utilization were encoded in the strains’ genomes.
No capsule or O antigen types could be assigned to the strains using Kaptive, but all three strains were best matched with KL68 [PS_Koxy1, 17 out of 18 genes matched (cpsACP missing); PS_Koxy2 and PS_Koxy4, 16 out of 18 genes matched (cpsACP and KL68_18 missing)] and O1v1 [four out of seven genes (wzm, wzt, glf, wbbO) matched in all strains].
MLST sequence types can be used to distinguish members of the complex
The blaOXY gene diversified in parallel to housekeeping genes in the complex, and it is already known that rpoB – one of the seven genes included in the MLST scheme [8] – can be used to identify members of the complex [2]. Given that our three clinical strains were ST138 and belonged to , we determined whether specific sequence types within the MLST scheme could be assigned to species. We found that all species of the complex are associated with specific sequence types. In addition, we identified 10 novel MLST sequence types that can be used to identify , , and genomes (Table S2).
Herzog et al. [27], when originally describing the MLST scheme to characterize clinical isolates, showed their concatenated sequence data for 74 clinical isolates were associated with three clusters (A, B1 and B2). Comparison of their sequence types with our annotations shows that cluster A represents , cluster B1 represents and cluster B2 represents and .
The ability to use the MLST scheme to differentiate clinical isolates will be of particular interest to clinical microbiologists in resource-limited settings who rely on the MLST scheme for typing and epidemiological tracking of isolates in the absence of whole-genome sequence data. It should also be noted that ribosomal MLST [28] (rMLST) available via the Species ID portal of the PubMLST website allows those working with genome sequence data derived from complex isolates to identify species. This resource uses 53 genes encoding the bacterial ribosome protein subunits (rps genes) to rapidly characterize genomic data to the species level.
The identification of ST105 as belonging to indicates this sequence type should be withdrawn from the MLST scheme.
Distribution of the kleboxymycin BGC in spp.
As relatively little is known about the virulence factors of and related species, and the VFDB is limited with respect to the number of spp. on which it reports information, we wanted to see whether our strains encoded the kleboxymycin BGC responsible for generating microbiome-associated metabolites known to directly contribute to AAHC [11, 12]. The cytotoxic nature of a heat-stable, non-proteinaceous component of spent media from strains isolated from patients with AAHC was first reported in 1990 [29]. With respect to being a causative agent of AAHC, the bacterium has fulfilled Koch’s postulates [15]. While a commensal of the gut microbiota of some individuals, it has been suggested that cytotoxic is a transient member of the gut microbiota [29].
TV is a PBD produced by and is a causative agent of AAHC [12]. The TV biosynthesis genes are encoded on a non-ribosomal peptide synthase operon and include npsA, thdA and npsB. The genes aroX and aroB are also essential for TV production [13]. The genes npsA, thdA, npsB and aroX are located on a pathogenicity island (PAI). In clinical isolates, the PAI was present in 100 % of toxin-producing isolates, but only 13 % of non-toxin-producing isolates [12]. AAHC is characterized by disruption of epithelial barrier function resulting from apoptosis of epithelial cells lining the colon. TV exerts its apoptotic effect by binding to tubulin and stabilizing microtubules, leading to mitotic arrest [14].
A second PBD generated by the same pathway as TV has been identified [13]. TM (also called kleboxymycin [11]) has stronger cytotoxic properties than TV, having a PBD motif with a hydroxyl group at the C11 position, while TV has an indole ring. When deprived of indole by the inactivation of the indole-producing tryptophanase gene tnaA, produces TM but not TV. TV production is restored with the addition of indole, as indole spontaneously reacts with TM to produce TV. Limited interconversion between TM and TV may also occur spontaneously in vivo [11]. TM is a genotoxin and triggers apoptosis by interacting with DNA, which leads to the activation of damage repair mechanisms, causing DNA strand breakage [14]. DNA interaction is prevented in the case of TV by its indole ring, and both the molecular targets and apoptotic mechanisms of TM and TV are distinct. The kleboxymycin BGC is not native to , nor the wider . Instead, the BGC is thought to have been acquired via horizontal gene transfer from spp., which in turn acquired the BGC from bacteria of the phylum [11].
In the current study, we found that the kleboxymycin BGC was present in our isolates and common among four species of the complex, with and strains making the largest contribution and the type strains of and encoding the BGC (Fig. 3). Prior to this study, sequences from two strains (MH43-1, GenBank accession number MF401554 [11]; AHC-6, GenBank accession number HG425356 [12]) were available for the kleboxymycin BGC. Draft genome sequences do not appear to be publicly available for either of these strains. However, our analysis of the kleboxymycin BGC across the complex has shown that MH43-1 is a strain of (Fig. 3c). Hubbard et al. [20] recently reported on a strain of that encoded the BGC, based on antiSMASH analysis. Comparison of the AHC-6 sequence with that of MH43-1 and other sequences included in this study shows AHC-6 is a strain of (99.0–99.55 % nucleotide similarity with the BGCs encoded by the three MAG sequences included in Fig. S5). It is likely that as more genomes of complex species are deposited in public databases, the range of species encoding the kleboxymycin BGC will increase.
All three of our strains encoded the kleboxymycin BGC (Figs 2 and 3), as did strains of we previously isolated from preterm infants and and MAGs recovered from publicly available shotgun metagenomic data (Fig. S5).
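Nucleotide similarity values such as those quoted above for the BGC comparisons are typically derived by comparing aligned sequences position by position. The following is a minimal sketch of one such calculation, assuming the two sequences have already been aligned to equal length with '-' marking gaps. The function name and the choice to skip gapped columns are assumptions made for illustration; the tool and settings actually used for the comparisons are not stated in this section.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical bases over alignment columns without gaps.

    Both inputs must be rows of the same pairwise alignment (equal length,
    '-' for gaps). Columns with a gap in either row are skipped, which is
    one common convention but not the only one.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == "-" or base_b == "-":
            continue  # skip gapped columns
        compared += 1
        if base_a == base_b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy example with invented sequences (not taken from any BGC).
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

Counting gapped columns as mismatches instead would lower the reported value, so the convention used should be stated alongside any similarity figure.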
Whether the BGCs encoded in our clinical and infant-associated strains are functional will be the subject of future studies. The discovery of the kleboxymycin BGC in strains and MAGs recovered from preterm infants is of particular concern. Gut colonization is linked with AAHC, with disease caused by the overgrowth of cytotoxin-producing strains secondary to use of antibiotics [16]. AAHC presents as diffuse mucosal oedema and haemorrhagic erosions [16], and patients pass bloody diarrhoea [30]. The gut microbiota of preterm infants is shaped by the large quantity of antibiotics these infants are given immediately after birth to cover possible early onset infection, with ‘blooms’ of bacteria preceding onset of infection [3]. Blood in the stool is frequently associated with necrotizing enterocolitis (NEC) in preterm infants, which shares similar pathological hallmarks to AAHC – i.e. intestinal necrosis. Notably, NEC is difficult to diagnose in the early stages and is often associated with sudden serious deterioration in infant health, with treatment options limited due to emerging multi-drug-resistant bacteria associated with disease. Previous studies have linked spp. to preterm NEC (supported by corresponding clinical observations), with bacterial overgrowth in the intestine linked to pathological inflammatory cascades, facilitated by a ‘leaky’ epithelial barrier and LPS–TLR4 activation. Recent work has demonstrated complex isolates of ST173, ST246 and a novel ST [7-32] recovered from infants with NEC can produce kleboxymycin (TM) and TV [31]. Using our MLST annotation scheme (Table S2), we determine these sequence types represent , and , respectively, with rMLST analyses of the whole-genome sequence data of Paveglio et al. [31] confirming our findings (rST 124484, rST 124487 and rST 157090, respectively). Taken together with the results from our study, we suggest specific virulence factors – i.e. kleboxymycin-related metabolites encoded by atypical spp. – may also play a role in NEC, and this warrants further study.
Attempts have been made to link specific subtypes of to AAHC [32]. Cytotoxic effects were limited to , with faecal (and to a lesser extent skin) isolates of most commonly associated with cytotoxicity [32]. No genetic relationship was associated with cytotoxic strains based on pulsed-field gel electrophoresis, and 31 out of 97 strains exhibited evidence of cytotoxin production (i.e. reduced viability of Hep2 cells). Joainig et al. [32] isolated genetically distinct cytotoxin-positive and -negative strains from one AAHC patient, leading them to suggest that, when detected in faeces, should be considered an opportunistic pathogen able to produce disease upon antibiotic treatment. They also found that, in patients with acute or chronic diarrhoeal diseases, more than half of the isolates recovered were cytotoxin-positive. Given that -related species are not routinely screened for in such samples, it is possible that kleboxymycin-producing isolates may make a greater contribution to diarrhoeal diseases than currently recognized, especially in patients suffering from non--associated disease. We have shown that there are species-specific differences in the kleboxymycin BGC (Fig. 3c). These differences may have implications for virulence of strains and warrant further study. It is hoped that the identification of an increased range of strains (including type strains) encoding the kleboxymycin BGC will facilitate such studies.