Weihua Huang, Guiqing Wang, Changhong Yin, Donald Chen, Abhay Dhand, Melissa Chanza, Nevenka Dimitrova, John T Fallon.
Abstract
The surveillance of health care-associated infection (HAI) is an essential element of an infection control program. While whole-genome sequencing (WGS) has been widely adopted for genomic surveillance, its data processing still needs improvement. Here, we propose a three-level data processing pipeline for the precision genomic surveillance of microorganisms without prior knowledge: species identification, multi-locus sequence typing (MLST), and sub-MLST clustering. The first two levels are closely connected to methods already widely used in clinical microbiology laboratories, whereas the third provides significantly improved resolution and accuracy in genomic surveillance. For comparison with a broadly used reference-dependent alignment/mapping method and an annotation-dependent pan-/core-genome analysis, we applied our reference- and annotation-independent, k-mer-based, simplified workflow to a collection of Acinetobacter and Enterococcus clinical isolates. By taking both single nucleotide variants and genomic structural changes into account, the optimized k-mer-based pipeline rapidly provided a global view of bacterial population structure and discriminated the relatedness between bacterial isolates in more detail and with greater precision. The newly developed WGS data processing pipeline would facilitate the application of WGS to the precision genomic surveillance of HAI. In addition, the results of such a WGS-based analysis would be useful for the precision laboratory diagnosis of infectious microorganisms.
Keywords: data processing pipeline; genomic surveillance; health care-associated infection (HAI); k-mer; whole-genome sequencing (WGS)
Year: 2019 PMID: 31554234 PMCID: PMC6843764 DOI: 10.3390/microorganisms7100388
Source DB: PubMed Journal: Microorganisms ISSN: 2076-2607
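As a rough illustration of why a reference- and annotation-independent k-mer comparison, as described in the abstract, responds to both single nucleotide variants and structural changes, the following minimal Python sketch counts canonical k-mers and compares presence/absence profiles. It is not the authors' pipeline: the function names, the toy sequences, the choice of k = 5, and the use of a Jaccard distance are all assumptions made for illustration only.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmer_profile(seq: str, k: int) -> Counter:
    """Count canonical k-mers, skipping windows that contain non-ACGT characters."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            counts[canonical(window)] += 1
    return counts


def jaccard_distance(a: Counter, b: Counter) -> float:
    """1 - |shared k-mers| / |all k-mers|, using presence/absence only."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    if not union:
        return 0.0
    return 1.0 - len(keys_a & keys_b) / len(union)


# Toy "genomes" (hypothetical sequences, not data from the study).
ref = "ATGGCGTACGTTAGCCTAGGATCCGATTACA"
snv = ref[:15] + "A" + ref[16:]   # one substituted base (C -> A at index 15)
deletion = ref[:10] + ref[20:]    # a 10-bp deletion

k = 5
p_ref, p_snv, p_del = (kmer_profile(s, k) for s in (ref, snv, deletion))
d_same = jaccard_distance(p_ref, p_ref)
d_snv = jaccard_distance(p_ref, p_snv)
d_del = jaccard_distance(p_ref, p_del)
```

Both the substitution and the deletion change the set of k-mers, so both yield a non-zero distance without any alignment to a reference or any gene annotation. Real assemblies would call for a much larger k and a more memory-efficient counting structure.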
Figure 1. Schematic bioinformatics data processing workflows for whole-genome sequence analysis. Left: alignment/mapping-based analysis. Multi-locus sequence typing (MLST) is based on species determination, while the reference genome is selected based on both the species and the MLST type identified. Right: k-mer-based analysis. Contig cleaning is optional, mainly for isolates whose assembled “genome” is of outlying size. SNVs: single nucleotide variants; ARGs: antibiotic resistance genes; VFs: virulence factors.
Figure 2. Alignment/mapping-based analysis of 147 Acinetobacter clinical isolates using the PB364 genome (ST2) as reference. (A) Phylogeny tree generated from core single nucleotide variants (SNVs) analysis. Neighbor-joining method is used, and branch length is ignored. The shadowed reference is PB364. (B–D) Overview of clinical isolates from principal component analysis. (B) All 147 isolates; (C) 26 ST2 isolates with PB364 excluded; and (D) 111 ST229 isolates. Three main sub-clusters are identified in ST2 isolates: sub1, sub2 and sub3.
Figure 3. Alignment/mapping-based analysis of 147 Acinetobacter clinical isolates using the AR_0078 genome (ST229) as reference. (A) Phylogeny tree generated from core SNVs analysis. Neighbor-joining method is used, and branch length is ignored. The shadowed reference is AR_0078. (B–D) Overview of clinical isolates from principal component analysis. (B) All 147 isolates; (C) 26 ST2 isolates; and (D) 111 ST229 isolates with AR_0078 excluded. Four main sub-clusters are identified in ST2 isolates: sub1, sub2, sub3 and sub4.
Figure 4. K-mer-based kWIP (k-mer weighted inner product) analysis of 147 Acinetobacter clinical isolates. (A) Phylogeny tree from hierarchical clustering using the complete linkage method. (B–D) Plots of metric multidimensional scaling (MDS). (B) All 147 isolates; (C) 26 ST2 isolates; and (D) 111 ST229 isolates. Four main sub-clusters are identified in ST2 isolates: sub1, sub2, sub3 and sub4. The filled arrow in D marks a major sub-cluster of closely related ST229 isolates. Sub-clusters shared between patients are marked with dashed circles or arrows and annotated alongside. (E,F) Phylogeny trees of ST2 and ST229 isolates from hierarchical clustering using the complete linkage method. Three main sub-clusters are annotated as sub1, sub2 and sub3 of ST2 in E, with patients’ identities at the bottom; the remaining three isolates belong to sub4. Shadowed boxes in F indicate sub-MLST types shared between patients, as annotated at the bottom. Isolates of the major sub-cluster in D belong to multiple patients, as noted in F. The environmental isolate (Env) is marked with a grey arrow.
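The weighted-inner-product distance and the complete-linkage clustering named in this caption can be sketched in a few lines of Python. This is a simplified illustration, not the kWIP implementation: it assumes exact k-mer count dictionaries rather than the hashed count sketches kWIP operates on, it assumes that each k-mer is weighted by the Shannon entropy of its presence frequency across samples, and the function names and toy profiles are hypothetical.

```python
import math
from itertools import combinations


def entropy_weights(profiles):
    """Weight each k-mer by the Shannon entropy of its presence frequency (assumed scheme)."""
    n = len(profiles)
    weights = {}
    for kmer in set().union(*profiles):
        f = sum(1 for p in profiles if kmer in p) / n
        if 0.0 < f < 1.0:
            weights[kmer] = -(f * math.log2(f) + (1 - f) * math.log2(1 - f))
        else:
            weights[kmer] = 0.0  # k-mers present in every sample carry no signal
    return weights


def weighted_ip(a, b, w):
    """Inner product of two count profiles, with each shared k-mer scaled by its weight."""
    return sum(a[k] * b[k] * w[k] for k in a.keys() & b.keys())


def wip_distance_matrix(profiles):
    """Normalize the weighted inner products and convert them to Euclidean-style distances."""
    w = entropy_weights(profiles)
    n = len(profiles)
    self_ip = [weighted_ip(p, p, w) for p in profiles]
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        denom = math.sqrt(self_ip[i] * self_ip[j])
        sim = weighted_ip(profiles[i], profiles[j], w) / denom if denom else 0.0
        dist[i][j] = dist[j][i] = math.sqrt(max(0.0, 2.0 - 2.0 * sim))
    return dist


def complete_linkage(dist, n_clusters):
    """Agglomerative clustering; cluster distance is the largest pairwise member distance."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return [sorted(c) for c in clusters]


# Hypothetical k-mer count profiles for four isolates (not data from the study).
profiles = [
    {"AAC": 3, "ACG": 2, "CGT": 1},
    {"AAC": 3, "ACG": 2, "CGA": 1},
    {"AAC": 2, "GGT": 4, "GTC": 3},
    {"AAC": 2, "GGT": 4, "GTA": 3},
]
dist = wip_distance_matrix(profiles)
clusters = complete_linkage(dist, n_clusters=2)
```

In this toy input the k-mer shared by all four profiles receives zero weight, so the grouping is driven only by the k-mers that vary between isolates. The resulting distance matrix is also the kind of input a metric MDS plot would take.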
Figure 5. Whole-genome analysis of 11 Enterococcus sp. isolates. (A) Plot of metric multidimensional scaling (MDS) based on kWIP analysis. (B) Phylogeny tree from kWIP weighted metric hierarchical clustering. (C) Pair-wise comparisons of single nucleotide variants (SNVs) based on snippy analysis using the M190262 draft genome as reference.
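A pairwise SNV comparison of the kind shown in panel C can be sketched as follows, assuming that each isolate's variant calls against the shared reference have already been reduced to a mapping from reference position to alternate base. The isolate names, positions, and function names are hypothetical, and the sketch deliberately does not parse any variant caller's output. It also treats a position with no call as matching the reference, which ignores uncovered or masked sites.

```python
from itertools import combinations


def snv_difference(a, b):
    """Count positions where two isolates carry different alleles.

    A position absent from a mapping is treated as the reference allele.
    """
    positions = a.keys() | b.keys()
    return sum(1 for p in positions if a.get(p) != b.get(p))


def pairwise_snv_counts(calls):
    """Build a symmetric matrix (nested dict) of pairwise SNV differences."""
    names = sorted(calls)
    matrix = {x: {y: 0 for y in names} for x in names}
    for x, y in combinations(names, 2):
        matrix[x][y] = matrix[y][x] = snv_difference(calls[x], calls[y])
    return matrix


# Hypothetical SNV calls: reference position -> alternate base.
calls = {
    "iso_A": {1021: "T", 5310: "G"},
    "iso_B": {1021: "T", 5310: "G", 8842: "A"},
    "iso_C": {1021: "C", 7700: "T"},
}
snv_matrix = pairwise_snv_counts(calls)
```

Here iso_A and iso_B differ at a single position, whereas iso_C differs from each of them at several, including one site where both carry a non-reference base but not the same one.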