Induja Chandrakumar, Nick P G Gauthier, Cassidy Nelson, Michael B Bonsall, Kerstin Locher, Marthe Charles, Clayton MacDonald, Mel Krajden, Amee R Manges, Samuel D Chorlton.
Abstract
A large gap remains between sequencing a microbial community and characterizing all of the organisms within it. Here we develop a novel method to taxonomically bin metagenomic assemblies through alignment of contigs against a reference database. We show that this workflow, BugSplit, bins metagenome-assembled contigs to species with a 33% absolute improvement in F1-score when compared to alternative tools. We perform nanopore mNGS on patients with COVID-19 and, using a reference database predating COVID-19, demonstrate that BugSplit's taxonomic binning enables sensitive and specific detection of a novel coronavirus that is not possible with other approaches. When applied to nanopore mNGS data from cases of Klebsiella pneumoniae and Neisseria gonorrhoeae infection, BugSplit's taxonomic binning accurately separates pathogen sequences from those of the host and microbiota, and unlocks the possibility of sequence typing, in silico serotyping, and antimicrobial resistance prediction of each organism within a sample. BugSplit is available at https://bugseq.com/academic.
Year: 2022 PMID: 35194141 PMCID: PMC8864044 DOI: 10.1038/s42003-022-03114-4
Source DB: PubMed Journal: Commun Biol ISSN: 2399-3642
Fig. 1 Overview of full BugSplit workflow and example of contig classification algorithm.
a Flow of data through the BugSplit workflow. Rectangles represent data points, diamonds represent processes, and circles represent forks in analysis. b Example application of contig classification algorithm. Alignments against the reference database are first collapsed up the taxonomic tree based on absolute nucleotide identity. A base-level vote is then performed across all bases of a contig, determining the final taxonomic assignment of the contig based on rank-specific majority thresholds.
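The two steps in panel b can be illustrated with a minimal Python sketch. It is not BugSplit's implementation: the identity cut-offs, the majority thresholds, the three-rank lineage, the rule that the highest-identity alignment wins at each base, and all function names are assumptions chosen only to make the idea concrete.

```python
from collections import Counter

RANKS = ["species", "genus", "family"]  # most specific first

# Hypothetical cut-offs, chosen for illustration only.
MIN_IDENTITY = {"species": 0.95, "genus": 0.90, "family": 0.0}
MAJORITY = {"species": 0.5, "genus": 0.5, "family": 0.5}


def collapse(lineage, identity):
    """Step 1: drop the ranks that the alignment identity cannot support."""
    return {
        rank: lineage[rank] if identity >= MIN_IDENTITY[rank] else None
        for rank in RANKS
    }


def classify_contig(length, alignments):
    """Step 2: base-level vote over a contig.

    alignments: list of (start, end, lineage, identity), with 0-based
    half-open coordinates on the contig. Returns (rank, taxon), or
    (None, None) if no rank reaches its majority threshold.
    """
    best = [None] * length  # per base: (identity, collapsed lineage)
    for start, end, lineage, identity in alignments:
        collapsed = collapse(lineage, identity)
        for pos in range(start, end):
            # Assumption: the highest-identity alignment wins each base.
            if best[pos] is None or identity > best[pos][0]:
                best[pos] = (identity, collapsed)

    for rank in RANKS:
        votes = Counter(
            b[1][rank] for b in best if b is not None and b[1][rank] is not None
        )
        if votes:
            taxon, count = votes.most_common(1)[0]
            # Unaligned bases count against the majority.
            if count / length > MAJORITY[rank]:
                return rank, taxon
    return None, None
```

With these assumed thresholds, a contig whose bases are split evenly between two species of the same genus fails the species-level vote and is assigned to the genus instead.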
Fig. 2 Performance of contig taxonomic classifiers across four datasets (Zymo Even, Zymo Log, Zymo Gut, and CAMI high complexity).
a Average bin completeness across taxonomic ranks. b Average bin purity across taxonomic ranks. Shaded bands show the standard error of the metrics in a and b. c BugSplit produces more complete bins with less contamination compared with alternative taxonomic binners.
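For readers unfamiliar with these metrics, the sketch below shows one common way to compute bin purity, bin completeness, and their F1-score from base-pair counts. The definitions follow general usage in binning benchmarks and are an assumption here; the record does not give the exact formulas used in the paper, and `bin_metrics` is a hypothetical helper.

```python
def bin_metrics(contigs, taxon):
    """Purity, completeness and F1 for one taxon, weighted by base pairs.

    contigs: list of (length_bp, true_taxon, predicted_taxon).
    Purity       = correctly binned bp / all bp placed in the taxon's bin.
    Completeness = correctly binned bp / all bp truly from the taxon.
    """
    correct = sum(n for n, truth, pred in contigs if truth == taxon and pred == taxon)
    bin_size = sum(n for n, _, pred in contigs if pred == taxon)
    genome_size = sum(n for n, truth, _ in contigs if truth == taxon)

    purity = correct / bin_size if bin_size else 0.0
    completeness = correct / genome_size if genome_size else 0.0
    total = purity + completeness
    f1 = 2 * purity * completeness / total if total else 0.0
    return purity, completeness, f1
```

Under these definitions, purity plays the role of precision and completeness the role of recall, so the F1-score is their harmonic mean.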
Fig. 3 Taxonomic profiling accuracy of five tools across three mock microbial communities sequenced with a long-read sequencer.
a Greater bin completeness reflects better taxonomic profiling. b Greater bin purity reflects better taxonomic profiling. c Lower Bray-Curtis distance reflects better taxonomic profiling. Shaded bands show the standard error of the metrics in a, b, and c.
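The Bray-Curtis distance in panel c compares an expected community profile with an observed one. A minimal sketch of the standard formula, assuming each profile is a mapping from taxon name to abundance (the function name and input format are illustrative, not taken from the paper):

```python
def bray_curtis(expected, observed):
    """Bray-Curtis dissimilarity between two abundance profiles.

    expected, observed: dicts mapping taxon -> abundance.
    Returns 0.0 for identical profiles and 1.0 for profiles sharing no taxa.
    """
    taxa = set(expected) | set(observed)
    difference = sum(abs(expected.get(t, 0.0) - observed.get(t, 0.0)) for t in taxa)
    total = sum(expected.get(t, 0.0) + observed.get(t, 0.0) for t in taxa)
    return difference / total if total else 0.0
```

For example, an expected profile of 50% A and 50% B against an observed profile of 50% A, 25% B, and 25% C gives a distance of 0.25.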
Antimicrobial resistance prediction of BugSplit applied to nanopore mNGS of urine, compared with Illumina isolate sequencing, of Neisseria gonorrhoeae infections.
| | 23S (Macrolides) | 23S | gyrA (Quinolones) | gyrA | mtrR (Penicillins, Tetracyclines, Cephalosporins, Macrolides) | mtrR | mtrR | pilQ (Macrolides) | ponA (Penicillins) | rpsJ (Tetracyclines) |
|---|---|---|---|---|---|---|---|---|---|---|
| Sample/resistance variant | c.2045A>G | c.2597C>T | p.S91F | p.D95N/G | p.A39T | p.G45D | Deletion | p.E666K | p.L421P | p.V57M/L/Q |
| 202 | A | C | S | D | A | G | E | V | ||
| 206 | A | C | A | N/A | N/A | E | L | |||
| 250 | A | C | S | D | A | G | E | V | ||
| 271 | A | C | A | G | WT | E | L | |||
| 294 | A | C | A | G | WT | E | L | |||
| 301 | A | C | A | G | E | |||||
| 303 | A | C | A | G | WT | E | L | |||
| 304 | A | C | S | D | G | WT | E | L | ||
| 314 | A | C | S | D | A | G | E | V | ||
| 315 | A | C | A | G | WT | E | ||||
Bold = variant detected by BugSplit, concordant with Illumina sequencing of N. gonorrhoeae isolates.
Underline = variant missed by BugSplit, compared with Illumina sequencing of N. gonorrhoeae isolates. The mtrR gene was not assembled for the single missing variant within this region.
N/A = variant position not recovered by BugSplit; no variant present.