Charlie Higgs, Norelle L Sherry, Torsten Seemann, Kristy Horan, Hasini Walpola, Paul Kinsella, Katherine Bond, Deborah A Williamson, Caroline Marshall, Jason C Kwong, M Lindsay Grayson, Timothy P Stinear, Claire L Gorrie, Benjamin P Howden.
Abstract
Vancomycin-resistant Enterococcus faecium (VREfm) is a major nosocomial pathogen. Identifying VREfm transmission dynamics permits targeted interventions, and while genomics is increasingly being used, methods are not yet standardized or optimized for accuracy. We aimed to develop a standardized genomic method for identifying putative VREfm transmission links. Using comprehensive genomic and epidemiological data from a cohort of 308 VREfm infection or colonization cases, we compared multiple approaches for quantifying genetic relatedness. We showed that clustering by core genome multilocus sequence type (cgMLST) was more informative of population structure than traditional MLST. Pairwise genome comparisons using split k-mer analysis (SKA) provided the high-level resolution needed to infer patient-to-patient transmission. The more common approach of mapping to a reference genome was not sufficiently discriminatory, identifying more than three times as many genomic transmission events as SKA (3729 compared to 1079 events). Here, we show a standardized genomic framework for inferring VREfm transmission that can be the basis for global deployment of VREfm genomics into routine outbreak detection and investigation.
Year: 2022 PMID: 35082278 PMCID: PMC8792028 DOI: 10.1038/s41467-022-28156-4
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Comparison of genomic typing methods for E. faecium.
a Midpoint-rooted maximum-likelihood phylogenetic tree of E. faecium isolates. Tree includes all isolates used in this study not including genetic outliers (n = 343). The reference genome is identified by the red triangle tip and is sequence type (ST) 1421. Singleton STs and core genome multilocus sequence type (cgMLST) clusters with only one isolate are shown in white. b Stacked histogram of cgMLST allelic differences between all E. faecium isolate pairs (n = 346 isolates). The total number of genes in the scheme is 1423. The red dashed line represents a pairwise allelic difference threshold of 25. The inset graph shows the same data set but has been restricted to show only cgMLST pairwise allelic difference of ≤ 50.
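The pairwise allelic differences plotted in panel b can be illustrated with a minimal sketch. It assumes each isolate's cgMLST profile is a list of allele numbers in a fixed locus order, with `None` marking an uncalled locus; skipping uncalled loci is an assumption made here for illustration and is not necessarily how the pipeline used in the study treats missing data. The function name and the toy profiles are hypothetical.

```python
def allelic_difference(profile_a, profile_b):
    """Count loci whose allele calls differ between two cgMLST profiles.

    Assumption: loci that are uncalled (None) in either profile are skipped.
    """
    return sum(
        1
        for a, b in zip(profile_a, profile_b)
        if a is not None and b is not None and a != b
    )


# Hypothetical five-locus profiles (the real scheme has 1423 loci).
isolate_1 = [1, 2, 3, None, 5]
isolate_2 = [1, 4, 3, 7, 6]
difference = allelic_difference(isolate_1, isolate_2)
```

In this toy case the profiles differ at two called loci, so the pair would fall well under the 25-allele threshold marked in the figure.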
Fig. 2 Relationship between methods for determining genomic clusters for E. faecium isolates from the major multilocus sequence type (ST1421, n = 134).
Core genome multilocus sequence type (cgMLST) clusters are based on single-linkage clustering using a pairwise allelic difference threshold of ≤25 alleles. cgMLST core alignment (cgMLSTCA), cgMLST with cluster reference core alignments (cgMLSTCRCA), pairwise comparison using de novo references (PCDR) and split kmer analysis (SKA) clusters are based on a pairwise single nucleotide polymorphism (SNP) distance threshold determined based on intrapatient SNP diversity (cgMLSTCA: ≤44 SNPs, cgMLSTCRCA: ≤6 SNPs, PCDR: ≤10 SNPs, SKA: ≤7 SNPs) and are generated using single linkage clustering. The size of the nodes represents the number of isolates in each of the clusters and is relative for each multilocus sequence type (MLST) and the number of isolates in each cluster is displayed in brackets. PCDR was used as the gold standard method and so the genomic clustering pattern of the other methods should be as close as possible to PCDR.
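Single-linkage clustering under a pairwise distance threshold can be sketched as follows. This is a minimal illustration, not the software used in the study: the function name, the union-find implementation and the toy distances are all assumptions. Two isolates end up in the same cluster whenever a chain of pairs at or below the threshold connects them.

```python
from itertools import combinations


def single_linkage_clusters(names, dist, threshold):
    """Cluster isolates so that any pair with distance <= threshold is linked.

    `dist` maps frozenset({a, b}) to a pairwise distance (SNPs or alleles).
    """
    parent = {n: n for n in names}

    def find(x):
        # Follow parent pointers to the cluster representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    # Largest clusters first, members sorted for a stable result.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


# Hypothetical SNP distances between four isolates.
isolates = ["A", "B", "C", "D"]
snp_dist = {
    frozenset(("A", "B")): 3,
    frozenset(("B", "C")): 6,
    frozenset(("A", "C")): 9,
    frozenset(("A", "D")): 40,
    frozenset(("B", "D")): 38,
    frozenset(("C", "D")): 41,
}
clusters = single_linkage_clusters(isolates, snp_dist, threshold=7)
```

With a threshold of 7 SNPs, A and C share a cluster even though they are 9 SNPs apart, because both are within the threshold of B. This chaining behaviour is inherent to single linkage and is worth keeping in mind when reading the cluster sizes in the figure.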
Fig. 3 Comparison of the major attributes of each of the methods used to determine genetic similarity.
The number of isolate pairs below the intrapatient threshold was determined individually for each of the genomic methods, and epidemiological classification was performed only for isolate pairs below the threshold. The epidemiological classification process is detailed in Supplementary Fig. 10. Pairwise comparison using de novo references (PCDR) was used as the gold standard genomic comparison method.
Fig. 4 Proposed genomic analysis pipeline for identifying potential vancomycin-resistant Enterococcus faecium transmission.
Each step in the analysis pipeline is outlined, along with the software used and the required whole genome sequencing data. 1 Core genome multilocus sequence type (cgMLST) allele scheme by de Been et al. 2 The COREugate pipeline was used to assign the allelic profiles and build a matrix of the pairwise allelic differences. 3 Split kmers were generated from short read data at k = 15. 4 The threshold for determining transmission events was based on within-patient diversity. For this data set and split kmer analysis (SKA), this was found to be 7 single nucleotide polymorphisms (SNPs).
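The idea behind a split kmer comparison can be shown with a toy sketch. It assumes a split kmer consists of two flanks of length k on either side of a single middle base, and that a SNP is counted when two samples share the same flanks but differ at the middle base. This is a simplified illustration only: it works on plain sequences, not reads, and omits reverse complements, coverage filtering and everything else the SKA software does. All function names and sequences are hypothetical.

```python
def split_kmers(seq, k):
    """Map (left flank, right flank) -> middle base for each window of length 2k + 1.

    Flank pairs seen with more than one middle base are discarded as ambiguous.
    """
    table = {}
    ambiguous = set()
    for i in range(len(seq) - 2 * k):
        key = (seq[i:i + k], seq[i + k + 1:i + 2 * k + 1])
        middle = seq[i + k]
        if key in ambiguous:
            continue
        if key in table and table[key] != middle:
            del table[key]
            ambiguous.add(key)
        else:
            table[key] = middle
    return table


def split_kmer_snps(seq_a, seq_b, k):
    """Count shared flank pairs whose middle base differs between two sequences."""
    a = split_kmers(seq_a, k)
    b = split_kmers(seq_b, k)
    return sum(1 for key in a.keys() & b.keys() if a[key] != b[key])


# Toy sequences differing by one substitution (C -> T at index 5), with k = 3.
snps = split_kmer_snps("GATTACAGGCT", "GATTATAGGCT", k=3)
```

Only the window centred on the substitution has matching flanks in both sequences, so it is the only one that contributes to the count. Windows in which the substitution falls inside a flank simply fail to match.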
Fig. 5 Transmission networks identified in the hospital case study.
Network diagram of genomic transmission links among all isolates involved in the hospital case study. Each node represents one case and is coloured by its core genome multilocus sequence type (cgMLST) cluster. The shape of each node indicates whether the isolate was collected as a patient screening sample or as a clinical sample. Nodes are grouped in grey circles based on their sequence type (ST) (only STs with more than one isolate have been labelled). A line connects two nodes if the split kmer analysis (SKA) pairwise distance between the isolates is ≤7 single nucleotide polymorphisms (SNPs), and the line type represents the strength of the epidemiological link between the isolates. SKA clusters with more than 3 isolates have been labelled.
Fig. 6 Methods for investigating the genomic diversity of vancomycin-resistant E. faecium.
This flow chart provides an overview of all the genetic comparison approaches used in the study. It has been split into three sections based on the type of analysis that each approach uses: typing, core genome alignments and pairwise comparisons. Pairwise comparison using de novo references (PCDR) was used as the gold standard method for determining genomic diversity. 1 core genome multilocus sequence type (cgMLST) allele scheme by de Been et al. 2 The COREugate pipeline was used to assign the allelic profiles and build a matrix of the pairwise allelic differences. 3 Complete genomes were assembled using Unicycler/Canu (multilocus sequence type (MLST) references) or Trycycler (cgMLST references). 4 Core genome alignments were performed using short-read data and snippy. Pairwise distances were then generated from the core alignment. 5 Split kmers were generated from short read data at k = 15. 6 de novo assemblies were performed using SKESA. 7 Individual snippy alignments were performed using the de novo assemblies and short read data on each pair. Any self-single nucleotide polymorphisms identified from mapping self reads to the de novo assembly were removed. The mean of the two reciprocal alignments was then used as the pairwise distance.
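The final step of the PCDR approach, taking the mean of two reciprocal alignments after removing self-SNPs, can be sketched as below. This is one reading of the caption and not the authors' code: it assumes SNPs are represented as (contig, position) pairs in the coordinates of the assembly being mapped to, and that self-SNPs are those called when an isolate's own reads are mapped to its own de novo assembly. All names and values are hypothetical.

```python
def corrected_snp_count(pair_snps, self_snps):
    """Count SNPs from mapping one isolate's reads to the other's assembly,
    excluding positions also called when the assembly's own reads are mapped to it."""
    return len(set(pair_snps) - set(self_snps))


def pcdr_distance(a_reads_vs_b_asm, b_self_snps, b_reads_vs_a_asm, a_self_snps):
    """Mean of the two reciprocal, self-SNP-corrected SNP counts."""
    d_ab = corrected_snp_count(a_reads_vs_b_asm, b_self_snps)
    d_ba = corrected_snp_count(b_reads_vs_a_asm, a_self_snps)
    return (d_ab + d_ba) / 2


# Hypothetical SNP calls as (contig, position) pairs.
a_vs_b = {("c1", 100), ("c1", 250), ("c2", 40)}
b_self = {("c1", 250)}
b_vs_a = {("k1", 90), ("k3", 12), ("k3", 77), ("k2", 5)}
a_self = {("k2", 5)}
distance = pcdr_distance(a_vs_b, b_self, b_vs_a, a_self)
```

Here the two corrected counts are 2 and 3, giving a pairwise distance of 2.5. Averaging the reciprocal comparisons avoids favouring either isolate as the reference.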