Ashley M Rooney, Amogelang R Raphenya, Roberto G Melano, Christine Seah, Noelle R Yee, Derek R MacFadden, Andrew G McArthur, Pierre H H Schneeberger, Bryan Coburn.
Abstract
Short-read sequencing can provide detection of multiple genomic determinants of antimicrobial resistance from single bacterial genomes and metagenomic samples. Despite its increasing application in human, animal, and environmental microbiology, including human clinical trials, the performance of short-read Illumina sequencing for antimicrobial resistance gene (ARG) detection, including resistance-conferring single nucleotide polymorphisms (SNPs), has not been systematically characterized. Using paired-end 2 × 150 bp (base pair) Illumina sequencing and an assembly-based method for ARG prediction, we determined sensitivity, positive predictive value (PPV), and sequencing depths required for ARG detection in an Escherichia coli isolate of sequence type (ST) 38 spiked into a synthetic microbial community at varying abundances. Approximately 300,000 reads or 15× genome coverage was sufficient to detect ARGs in E. coli ST38, with comparable sensitivity and PPV to ~100× genome coverage. Using metagenome assembly of mixed microbial communities, ARG detection at E. coli relative abundances of 1% would require assembly of approximately 30 million reads to achieve 15× target coverage. The minimum sequencing depths were validated using public data sets of 948 E. coli genomes and 10 metagenomic rectal swab samples. A read-based approach using k-mer alignment (KMA) for ARG prediction did not substantially improve minimum sequencing depths for ARG detection compared to assembly of the E. coli ST38 genome or the combined metagenomic samples. Analysis of sequencing depths from recent studies assessing ARG content in metagenomic samples demonstrated that sequencing depths had a median estimated detection frequency of 84% (interquartile range: 30%-92%) for a relative abundance of 1%. 
IMPORTANCE Systematically determining Illumina sequencing performance characteristics for detection of ARGs in metagenomic samples is essential to inform study design and appraisal of human, animal, and environmental metagenomic antimicrobial resistance studies. In this study, we quantified the performance characteristics of ARG detection in E. coli genomes and metagenomes and established a benchmark of ~15× coverage for ARG detection for E. coli in metagenomes. We demonstrate that for low relative abundances, sequencing depths of ~30 million reads or more may be required for adequate sensitivity for many applications.
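The depth figures in the abstract follow from a simple proportionality: if roughly 300,000 reads from the target organism suffice, a sample in which that organism makes up only a fraction of the reads needs proportionally more total reads. The sketch below illustrates that arithmetic only; it is not the authors' pipeline. The function names are hypothetical, and the coverage helper uses the generic estimate (reads × read length ÷ genome size) with a genome size the caller must supply, since the abstract does not state one or say whether "reads" counts single reads or read pairs.

```python
def total_reads_needed(target_reads: int, relative_abundance: float) -> int:
    """Total reads to sequence so that about `target_reads` come from the target.

    Assumes reads are drawn in proportion to relative abundance
    (an idealization; real libraries deviate from this).
    """
    if not 0 < relative_abundance <= 1:
        raise ValueError("relative_abundance must be in (0, 1]")
    return round(target_reads / relative_abundance)


def estimated_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Generic depth estimate: sequenced bases divided by genome size.

    `genome_size_bp` is an input assumption, not a value from the abstract.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return n_reads * read_length_bp / genome_size_bp


if __name__ == "__main__":
    # 300,000 target reads at the relative abundances used in the study.
    for abundance in (0.9, 0.5, 0.1, 0.01):
        print(f"{abundance:>5.0%}: {total_reads_needed(300_000, abundance):,} total reads")
```

At a relative abundance of 1%, this scaling reproduces the abstract's figure of approximately 30 million reads.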
Keywords: antimicrobial resistance gene detection; metagenome; microbiome; resistome; sequencing; whole genome
Year: 2022 PMID: 35642524 PMCID: PMC9238399 DOI: 10.1128/msystems.00022-22
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 7.324
FIG 1. Genomic antimicrobial resistance determinant detection. (a) ARG detection frequencies across subsamples in Escherichia coli ST38. Individual dots represent a single ARG and are connected by lines to demonstrate trends in detection across subsamples. bla, gyrA, and parC variants are highlighted as previously identified resistance determinants for this strain. (b) Histogram of the number of unique ARGs with ≥90% detection frequency summarized by categories detected across subsamples in E. coli ST38. (c–d) Performance of ARG classification across subsamples in E. coli ST38 including (c) sensitivity and false negatives, and (d) positive predictive value (PPV) and false positives (FPs). The mean and standard deviation are plotted. (e) Protein variants and associated SNP(s) detection frequencies across subsamples in E. coli ST38. (f) A distribution of the percentage of E. coli isolates (n = 948) by ARG detection performance of 300,000 reads compared to 100× genome coverage. Performance is measured by sensitivity, PPV, and F1 score. In a and e, the horizontal dotted line marks 90% detection frequency. The red vertical dashed line marks the subsample at 300,000 reads (a) and 200,000 reads (e).
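For readers unfamiliar with the performance measures named in this caption, the following is a minimal sketch of the standard definitions of sensitivity, PPV, and F1 score computed from counts of true positives, false positives, and false negatives. The function name and the example counts are hypothetical; how the study decided which ARG calls count as true or false is not described here and is not reproduced.

```python
import math


def classification_metrics(tp: int, fp: int, fn: int) -> dict:
    """Standard sensitivity, PPV, and F1 from confusion counts.

    tp: ARGs correctly detected; fp: ARGs called but not in the reference;
    fn: reference ARGs that were missed. Returns NaN where a ratio is undefined.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    ppv = tp / (tp + fp) if (tp + fp) else math.nan
    # F1 is the harmonic mean of sensitivity and PPV.
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else math.nan
    return {"sensitivity": sensitivity, "ppv": ppv, "f1": f1}


if __name__ == "__main__":
    # Illustrative counts only, not data from the study.
    print(classification_metrics(tp=8, fp=2, fn=2))
```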
FIG 2. Metagenomic antimicrobial resistance determinant detection. (a–c) Reference-based ARG (n = 20) and SNP (n = 6) detection frequencies in four metagenomic samples with Escherichia coli ST38 relative abundances of 90%, 50%, 10%, and 1%. (a–b) Individual dots represent a single ARG or SNP. Trend lines for each of the four metagenomic samples are plotted through the median detection frequency at each subsample. The horizontal dotted line marks 90% detection frequency. (c) For reference-based ARGs or SNPs in each metagenomic sample, the minimum subsample that falls within the detection frequency cutoff (x axis) is plotted. (d–e) Detection of vanA in rectal swab samples positive for vancomycin-resistant Enterococcus from a public data set. (d) Enterococcus relative abundance by total genome coverage; each rectal swab sample is represented by an icon. (e) vanA detection frequency across genome coverages for each rectal swab sample. Rectal swab sample 4 is not plotted, as vanA was not detected with the total number of sequences available. (f) Estimated ARG detection frequency by coverage of a hypothetical target organism at a range of relative abundances.
FIG 3. Performance of a read-based method (KMA) for antimicrobial resistance determinant detection in Escherichia coli ST38 and metagenomic samples, compared to assembly. (a–e) A comparison of ARG detection frequencies between subsamples using KMA or assembly. Trend lines are plotted through the median detection frequency at each subsample. Individual points represent single ARGs. A Wilcoxon matched-pairs signed-rank test was performed at each subsample; *, P < 0.05. (a) Reference-based ARG (n = 72) detection frequencies in Escherichia coli ST38. (b–e) Reference-based ARG (n = 20) detection frequencies in four metagenomic samples with E. coli ST38 relative abundances of 90% (b), 50% (c), 10% (d), and 1% (e). (f) For reference-based ARGs or SNPs in each metagenomic sample, the minimum subsample that fell within the detection frequency cutoff is plotted. ARGs or SNPs with 0% detection frequency across all subsamples within a sample are indicated as not detected (nd). (g) Reference-based SNP detection frequencies (n = 6) using KMA in E. coli ST38 (relative abundance 100%), and four metagenomic samples (f–g) with E. coli ST38 relative abundances of 90%, 50%, 10%, and 1%. (g) Trend lines are plotted through median detection frequencies at each subsample. Individual points represent individual protein variants and associated SNPs (n = 6). The horizontal dotted line marks 90% detection frequency. (h) Nonreference ARGs detected across subsamples in four metagenomic samples with E. coli ST38 relative abundances of 90%, 50%, 10%, and 1%. Solid lines or dashed lines are nonreference ARGs detected using contig assembly or KMA, respectively. Mean and standard deviation are plotted.