Eike Steinig1,2, Sebastián Duchêne1, Izzard Aglua3, Andrew Greenhill4, Rebecca Ford4, Mition Yoannes4, Jan Jaworski3, Jimmy Drekore5, Bohu Urakoko3, Harry Poka3, Clive Wurr6, Eri Ebos6, David Nangen6, Laurens Manning7,8, Moses Laman4, Cadhla Firth2, Simon Smith9, William Pomat4, Steven Y C Tong1,10, Lachlan Coin1, Emma McBryde2, Paul Horwood4,11.
Abstract
Nanopore sequencing and phylodynamic modeling have been used to reconstruct the transmission dynamics of viral epidemics, but their application to bacterial pathogens has remained challenging. Cost-effective bacterial genome sequencing and variant calling on nanopore platforms would greatly enhance surveillance and outbreak response in communities without access to sequencing infrastructure. Here, we adapt random forest models for single nucleotide polymorphism (SNP) polishing developed by Sanderson and colleagues (2020. High precision Neisseria gonorrhoeae variant and antimicrobial resistance calling from metagenomic nanopore sequencing. Genome Res. 30(9):1354-1363) to estimate divergence and effective reproduction numbers (Re) of two methicillin-resistant Staphylococcus aureus (MRSA) outbreaks from remote communities in Far North Queensland and Papua New Guinea (PNG; n = 159). Successive barcoded panels of S. aureus isolates (2 × 12 per MinION) sequenced at low coverage (>5× to 10×) provided sufficient data to accurately infer genotypes with high recall when compared with Illumina references. Random forest models achieved high resolution on ST93 outbreak sequence types (>90% accuracy and precision) and enabled phylodynamic inference of epidemiological parameters using birth-death skyline models. Our method reproduced phylogenetic topology, origin of the outbreaks, and indications of epidemic growth (Re > 1). Nextflow pipelines implement SNP polisher training, evaluation, and outbreak alignments, enabling reconstruction of within-lineage transmission dynamics for infection control of bacterial disease outbreaks on portable nanopore platforms. Our study shows that nanopore technology can be used for bacterial outbreak reconstruction at competitive costs, providing opportunities for infection control in hospitals and communities without access to sequencing infrastructure, such as in remote northern Australia and PNG.
Keywords: BEAST; bacteria; nanopore; outbreaks; phylodynamics; reproduction number
Year: 2022 PMID: 35171290 PMCID: PMC8963328 DOI: 10.1093/molbev/msac040
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Culture-based sequencing protocol and outbreak sampling locations in northern Australia and PNG. (A) Isolates were sequenced on 8 flow cells with 24 isolates per flow cell using a sequential nuclease flush protocol. (B) Sequenced data were subset to those matching Illumina sequencing of the isolates, assembled, and quality controlled. Several isolates were set aside for independent random forest classifier training used in the SNP polishing and phylogenetics pipeline.
Fig. 2. (A) Average genome coverage (R9.4.1, RBK-004) of Bonito base-called nanopore reads against the JKD6159 (ST93) reference genome (n = 159), where the dashed lines indicate the coverage thresholds chosen to evaluate genotyping (10×) and phylodynamic models (5×) in the FNQ and PNG outbreaks. SNP and indel counts across three different assembly types: uncorrected nanopore reads polished with Medaka (ont_medaka), Medaka-polished nanopore genomes Illumina-corrected with Pilon (hybrid_medaka), and hybrid assembly in Unicycler (hybrid_unicycler). (B) Assembly genotyping results are shown as the proportion of assemblies matching the reference Illumina genotype across the three types of assemblies and the 10× coverage threshold.
Fig. 3. (A) Workflow outlining computational analysis of community-associated S. aureus nanopore sequencing using successive barcode panels on ONT MinION flow cells (R9.4.1). MLST typing informs the background population genome collection from a previous study (Illumina). Outbreaks in PNG and FNQ were caused by the Australian clone (ST93-MRSA-IV). SNPs are called for the Illumina background with Snippy and for the ONT outbreak isolates with Clair. ONT SNP calls are polished using random forest SNP classifiers, trained on the outbreak reference genome (JKD6159 of ST93). (B–D) AUC scores of quality or composite features (left) used in training random forest classifiers for SNP polishing and relative feature importance of models (right) trained on (B) S. aureus mixed lineages (ST88, ST15, and ST93), (C) ST93 FNQ isolates, and (D) ST93 from PNG with matching Illumina data and Snippy reference calls (all n = 3).
Fig. 4. Evaluation of trained random forest SNP polishers. Left: accuracy, precision, and recall of Clair nanopore SNP calls against matching Illumina reference SNPs called with Snippy. Plots are split into ST93 outbreak isolates (inside left) and other sequence types (inside right) from PNG and FNQ combined. Right: the numbers of FN, FP, and TP SNP calls for each group, shown on a log scale. Models were trained on three Illumina-matched isolates: (A) between species, N. gonorrhoeae from Sanderson et al.; (B) within species, S. aureus ST88, ST93, and ST15 from PNG; (C) within lineage (ST93), using samples from FNQ; and (D) within lineage (ST93), using samples from PNG. Polishing models were evaluated on all PNG and FNQ isolates excluding those used in training (ST93: n = 55, other sequence types: n = 25, >10× coverage). Outliers in the tails of the distributions are novel multilocus sequence type variants of ST93.
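The precision and recall reported in Fig. 4 can be illustrated with a minimal sketch that compares a set of nanopore SNP calls against a set of Illumina reference calls. This is not the authors' pipeline: the `(position, alternate base)` tuple representation, the function name `compare_calls`, and the toy call sets are all assumptions made for illustration. Accuracy is left out because it also needs a count of true negatives, which this set comparison does not define.

```python
from typing import NamedTuple, Set, Tuple

# Hypothetical SNP representation: (reference position, alternate base).
Snp = Tuple[int, str]


class CallMetrics(NamedTuple):
    tp: int
    fp: int
    fn: int
    precision: float
    recall: float


def compare_calls(nanopore: Set[Snp], illumina: Set[Snp]) -> CallMetrics:
    """Score nanopore SNP calls against Illumina reference calls."""
    tp = len(nanopore & illumina)   # called by both
    fp = len(nanopore - illumina)   # nanopore only
    fn = len(illumina - nanopore)   # missed by nanopore
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return CallMetrics(tp, fp, fn, precision, recall)


# Toy data, invented for illustration only.
illumina_calls = {(101, "A"), (250, "T"), (977, "G"), (1500, "C")}
nanopore_calls = {(101, "A"), (250, "T"), (977, "C"), (2048, "G")}

metrics = compare_calls(nanopore_calls, illumina_calls)
print(metrics)
```

In this toy case a call at the right position with the wrong base counts as both a false positive and a false negative, which is one possible convention rather than necessarily the one used in the study.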
Fig. 5. Comparison of ML phylogenetic topologies of ST93. Illumina reference trees were constructed with NanoPath (A) and Snippy (B). All other trees are hybrid phylogenies including the nanopore data of the outbreaks in FNQ and PNG (>5× coverage) within the ST93 background population (Illumina, n = 531) (B) after polishing Clair SNPs using the trained random forest classifiers (C: N. gonorrhoeae; D and E: S. aureus mixed and lineage-specific). Asterisk (*) denotes two isolates with excessive branch lengths that were removed for visual clarity (supplementary fig. S6, Supplementary Material online).
Fig. 6. Posterior distributions of the effective reproduction number (Re), most recent common ancestor of the outbreak (MRCA), infectious period, and sampling proportion for the nanopore-sequenced outbreak clades in PNG (A, n = 56) and FNQ (B, n = 32). Birth–death skyline models were run on the clade subsets of the polished hybrid alignments with >5× coverage (ridge labels), including the Illumina reference alignment (illumina, bottom ridge), the between-species N. gonorrhoeae polished alignment by Sanderson and colleagues (Sanderson), as well as the S. aureus mixed lineages (saureus_mix), ST93 FNQ (saureus_fnq), and ST93 PNG (saureus_png).
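As a reading aid for the parameters in Fig. 6, the relations below sketch the usual birth-death skyline parameterisation for serially sampled data. The symbols are assumed notation, not taken from this record: λ is the transmission rate, μ the rate of becoming uninfectious without being sampled, ψ the sampling rate, δ the total rate of becoming uninfectious, and s the sampling proportion.

```latex
\begin{align}
\delta &= \mu + \psi, \\
R_e &= \frac{\lambda}{\delta}, \\
\text{infectious period} &= \frac{1}{\delta}, \\
s &= \frac{\psi}{\mu + \psi}.
\end{align}
```

Under this parameterisation, Re > 1 means that each infection gives rise to more than one new infection on average before it becomes uninfectious, which is the sense in which the abstract reads Re > 1 as an indication of epidemic growth.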