William Matlock, Samuel Lipworth, Bede Constantinides, Timothy E A Peto, A Sarah Walker, Derrick Crook, Susan Hopkins, Liam P Shaw, Nicole Stoesser.
Abstract
Analysing the flanking sequences surrounding genes of interest is often highly relevant to understanding the role of mobile genetic elements (MGEs) in horizontal gene transfer, particularly for antimicrobial-resistance genes. Here, we present Flanker, a Python package that performs alignment-free clustering of gene flanking sequences in a consistent format, allowing investigation of MGEs without prior knowledge of their structure. These clusters, known as 'flank patterns' (FPs), are based on Mash distances, allowing easy comparison of similarity across sequences. Additionally, Flanker can be flexibly parameterized to fine-tune outputs, for example by characterizing upstream and downstream regions separately and by investigating variable lengths of flanking sequence. We apply Flanker to two recent datasets describing plasmid-associated carriage of important carbapenemase genes (blaOXA-48 and blaKPC-2/3) and show that it successfully identifies distinct clusters of FPs, including both known and previously uncharacterized structural variants. For example, Flanker identified four Tn4401 profiles that could not be sufficiently characterized using TETyper or MobileElementFinder, demonstrating its utility for flanking-gene characterization. Similarly, using a large (n=226) European isolate dataset, we confirm the finding of a previous smaller study that Tn1999.2 is associated with blaOXA-48 upregulation, and identify 17 FPs (compared with the 5 previously identified). More generally, our finding that FPs are associated with geographical regions and antibiotic-susceptibility phenotypes suggests that they may be useful as epidemiological markers. Flanker is freely available under an MIT license at https://github.com/wtmatlock/flanker.
Keywords: antimicrobial resistance (AMR); bioinformatics; mobile genetic element (MGE); plasmid; whole-genome sequencing
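The abstract describes grouping flanking sequences by Mash distance without alignment. The sketch below illustrates that idea under stated assumptions; it is not Flanker's implementation. It computes the exact Jaccard index J of canonical k-mer sets (Mash itself estimates J from MinHash sketches), converts it to a Mash distance D = -(1/k) ln(2J/(1+J)), and then groups flanks by single linkage at a user-chosen threshold. The function names (`kmers`, `mash_distance`, `cluster`), the choice k = 21 and the single-linkage grouping are all assumptions made for illustration.

```python
import math
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """Canonical k-mers: the lexicographically smaller of each k-mer and its reverse complement."""
    comp = str.maketrans("ACGT", "TGCA")
    seq = seq.upper()
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        out.add(min(kmer, kmer.translate(comp)[::-1]))
    return out


def mash_distance(a: str, b: str, k: int = 21) -> float:
    """Mash distance computed from the exact k-mer Jaccard index (no MinHash sketching)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    if not union:
        return 1.0
    j = len(ka & kb) / len(union)
    if j == 0:
        return 1.0  # no shared k-mers: report the maximum distance
    return -math.log(2 * j / (1 + j)) / k


def cluster(flanks: dict[str, str], threshold: float, k: int = 21) -> list[set[str]]:
    """Single-linkage clusters: connected components of the graph linking flanks with D <= threshold."""
    parent = {name: name for name in flanks}

    def find(x: str) -> str:
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(flanks, 2):
        if mash_distance(flanks[a], flanks[b], k) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in flanks:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())
```

With this design, two flanks differing by a single substitution share most of their k-mers and fall well below a small threshold, whereas unrelated flanks share essentially none and sit near the maximum distance of 1.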
Year: 2021 PMID: 34559044 PMCID: PMC8715433 DOI: 10.1099/mgen.0.000634
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Fig. 1. Schematic of Flanker’s modes and parameters. (a) Flanker uses Abricate to annotate the gene of interest in input sequences and outputs the associated flanking sequences, optionally clustering them (-cl) on a user-defined Mash distance threshold. It accepts linear or circularized sequences. (b) In this example, genes geneA and geneB have been queried (-g geneA geneB), and only the upstream flank is requested (-f upstream). The top single black arrow represents choosing a single window of length 3000 bp (-w 3000), whereas the bottom three black arrows represent stepping in 1000 bp windows from 0 to 3000 bp (-w 0 -wstep 1000 -wstop 3000). The default mode (-m default) extracts flanks for each annotated allele separately, whereas the multi-allelic mode (-m mm) extracts flanks for all alleles in parallel. (c) Flanker also has a supplementary salami mode (-m sm), which outputs non-contiguous blocks of sequence defined by a start point, step size and end point (-w 0 -wstep 1000 -wstop 3000), represented by the three black arrows.
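To make the window parameters in Fig. 1 concrete, here is a hypothetical sketch of how -w, -wstep and -wstop could translate into sequence slices; it is not Flanker's code. It assumes 0-based, half-open gene coordinates on the forward strand (reverse-strand genes, where upstream lies after the gene and must be reverse-complemented, are not handled). The stepped default mode is modelled as flanks of growing length measured from the gene (1000, 2000, 3000 bp), and salami mode as adjacent fixed-size blocks (0 to 1000, 1000 to 2000, 2000 to 3000 bp). The names `window_bounds`, `extract_flanks` and `_slice` are illustrative only.

```python
from typing import Optional


def window_bounds(w: int, wstep: Optional[int] = None, wstop: Optional[int] = None,
                  salami: bool = False) -> list[tuple[int, int]]:
    """(near, far) offsets from the gene boundary for each requested window."""
    if wstep is None:
        return [(0, w)]  # single window of length w
    if salami:
        return [(s, s + wstep) for s in range(w, wstop, wstep)]  # adjacent blocks
    return [(0, s) for s in range(w + wstep, wstop + 1, wstep)]  # growing flanks


def _slice(seq: str, start: int, end: int, circular: bool) -> str:
    """seq[start:end], wrapping around the origin for circular sequences and clipping otherwise."""
    n = len(seq)
    if circular:
        return "".join(seq[i % n] for i in range(start, end))
    lo, hi = min(max(start, 0), n), min(max(end, 0), n)
    return seq[lo:hi]


def extract_flanks(seq: str, gene_start: int, gene_end: int, flank: str, w: int,
                   wstep: Optional[int] = None, wstop: Optional[int] = None,
                   salami: bool = False, circular: bool = False) -> list[str]:
    """Return one flanking sequence per window for a forward-strand gene at [gene_start, gene_end)."""
    out = []
    for near, far in window_bounds(w, wstep, wstop, salami):
        if flank == "upstream":
            start, end = gene_start - far, gene_start - near
        elif flank == "downstream":
            start, end = gene_end + near, gene_end + far
        else:
            raise ValueError("flank must be 'upstream' or 'downstream'")
        out.append(_slice(seq, start, end, circular))
    return out
```

For example, `window_bounds(0, 1000, 3000)` gives `[(0, 1000), (0, 2000), (0, 3000)]`, matching the three arrows in panel (b), while `window_bounds(0, 1000, 3000, salami=True)` gives `[(0, 1000), (1000, 2000), (2000, 3000)]`, matching panel (c). On a circular plasmid, an upstream window that runs past position 0 wraps to the end of the sequence instead of being truncated.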
Fig. 2. Flanking regions 5000 bp upstream of blaOXA-48 in plasmids from K. pneumoniae isolates. The Tree panel is a neighbour-joining tree reconstructed from Mash distances between complete sequences of plasmids carrying the blaOXA-48 gene. The second panel indicates the presence/absence of an L/M(pOXA-48)-type plasmid. The Gene Graphical Representation panel schematically represents coding regions in the 5000 bp sequence upstream of the blaOXA-48 gene, which is shown in red. Other genes are coloured according to the FP, which considers the overall pattern of all 100 bp window clusters up to 2200 bp (the approximate upstream limit of Tn1999). The Flankergram panel shows window clusters of all groups over each 100 bp window between 0 and 5000 bp. The dotted line at 2200 bp indicates the approximate point of upstream divergence between several FPs. The MLST panel shows multilocus sequence types, with those occurring once labelled ‘other’. FPs are numbered in ascending order according to abundance in the hybrid assemblies. Data used to make this figure came from the Dutch CPE surveillance and EuSCAPE hybrid assembly datasets.
Fig. 3. Flanking regions 7200 bp upstream of blaKPC-2/3 in plasmids from isolates. The Tree panel is a neighbour-joining tree reconstructed from Mash distances between complete sequences of plasmids carrying the blaKPC-2/3 gene. The next three panels indicate the presence/absence of FIB(pQ1I)-, FII(pKP91)- and FIB(Kpn3)-type plasmids. The Gene column indicates which blaKPC allele (2 or 3) is present. The Gene Graphical Representation panel schematically represents coding regions in the 7200 bp sequence region upstream of the blaKPC-2/3 gene, which is shown in red. Other genes are coloured according to the FP, which here takes into account the overall pattern of all 100 bp window clusters (shown in the Flankergram panel) over the full 7200 bp region upstream of bla. The Flankergram shows window clusters over each 100 bp window between 0 and 7200 bp. The MLST panel shows multilocus sequence types, with those occurring once labelled ‘other’. The final two panels show the Galileo AMR and the TETyper outputs for the eight FPs, respectively. The FPs are numbered in ascending order according to abundance in the hybrid assemblies.
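The figure captions describe an FP as the overall pattern of per-window cluster labels across the 100 bp windows of the Flankergram, optionally restricted to the first part of the flank (for example, 2200 bp, or 22 windows, for blaOXA-48). A minimal sketch of that grouping step follows, assuming each isolate's Flankergram row is already available as a tuple of integer cluster labels, one per window. It treats identical label vectors as the same FP and numbers FPs so that FP1 is the most abundant; that reading of "ascending order according to abundance" and the function name `flank_patterns` are assumptions, not Flanker's API.

```python
from collections import Counter
from typing import Optional


def flank_patterns(window_labels: dict[str, tuple[int, ...]],
                   n_windows: Optional[int] = None) -> dict[str, int]:
    """Assign each isolate an FP number from its per-window cluster labels."""
    # Optionally keep only the first n_windows windows (e.g. 22 x 100 bp = 2200 bp)
    patterns = {s: tuple(labels[:n_windows]) for s, labels in window_labels.items()}
    counts = Counter(patterns.values())
    # Most abundant pattern first; ties broken by the label vector so numbering is deterministic
    ordered = sorted(counts, key=lambda p: (-counts[p], p))
    fp_number = {p: i + 1 for i, p in enumerate(ordered)}
    return {s: fp_number[p] for s, p in patterns.items()}
```

Restricting the comparison to a shorter region merges isolates whose flanks diverge only further upstream, which is why the choice of cut-off (2200 bp versus the full 5000 bp) changes how many FPs are reported.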