Michelle Su, Sarah W Satola, Timothy D Read.
Abstract
Clinical microbiology has long relied on growing bacteria in culture to determine antimicrobial susceptibility profiles, but the use of whole-genome sequencing for antibiotic susceptibility testing (WGS-AST) is now a powerful alternative. This review discusses the technologies that made this possible and presents results from recent studies to predict resistance based on genome sequences. We compare calling antibiotic resistance profiles by the simple presence or absence of previously known genes and single-nucleotide polymorphisms (SNPs) with approaches that deploy machine learning and statistical models. Often, the limits of genome-based prediction stem from the limited accuracy of culture-based AST as well as from incomplete knowledge of the genetic basis of resistance. However, we need to maintain phenotypic testing even as genome-based prediction becomes more widespread to ensure that the results do not diverge over time. We argue that standardization of WGS-AST by challenge with consistently phenotyped strain sets of defined genetic diversity is necessary to compare the efficacy of methods of prediction of antibiotic resistance based on genome sequences.
Keywords: antibiotic resistance; genome-based prediction
Year: 2019 PMID: 30381421 PMCID: PMC6425178 DOI: 10.1128/JCM.01405-18
Source DB: PubMed Journal: J Clin Microbiol ISSN: 0095-1137 Impact factor: 5.948
FIG 1 Overview of genome sequencing and how it is used in WGS-AST. DNA is extracted directly from bacteria in clinical samples (metagenomics) or, more commonly, from cultured bacterial colonies. Sequencing technologies fragment the DNA and then randomly sequence the fragments to produce a library of reads (stored in FASTQ files). The reads are assembled into genomic scaffolds in silico. Sequencing is performed using either short-read second-generation technology, which tends to produce fragmented whole-genome assemblies of high accuracy, or long-read third-generation technologies, which have higher error rates but produce more complete assemblies. WGS-AST algorithms operate on the raw reads and/or assembled contigs.
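
As a rough illustration of the "library of reads (stored in FASTQ files)" step in the caption, the sketch below reads four-line FASTQ records from a text stream. It assumes the common unwrapped layout (header, sequence, `+` separator, quality string); the names `FastqRecord`, `parse_fastq`, and `EXAMPLE_FASTQ` are hypothetical and not taken from any tool in the review.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from an unwrapped (four lines per read) FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Keep only the identifier, dropping any free-text description.
        yield FastqRecord(header[1:].split()[0], sequence, quality)


# Tiny made-up input used only to exercise the parser.
EXAMPLE_FASTQ = "@read1 sample\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!5I\n"
records = list(parse_fastq(io.StringIO(EXAMPLE_FASTQ)))
```

Production pipelines would normally rely on an established parser and handle compressed input; this sketch only shows the record structure that downstream WGS-AST tools consume.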
Genomics terms
| Term | Definition |
|---|---|
| Quality score | A measure of the probability of an inaccurate base call, typically represented by the Phred score [Q = −10 log10(P), where P is the estimated probability that the base call is incorrect]. |
| Coverage | A measure of how many times a base was sequenced, as quantified by the number of unique reads mapped to that position in the genome. A “genome coverage of 30×” means that, on average, each base of the genome has 30 reads mapped to it. |
| Sequence read | Inferred nucleotide sequence of a genome fragment. Reads range from short (≤300 bp) to long (5 to 100+ kbp). |
| Contig | A contiguous sequence created by assembling multiple overlapping sequence reads. |
| Genome assembly | A single (complete) contig or a set of contigs obtained after aligning and merging all sequence reads. Assemblies can be created |
| First-generation sequencing | Nucleotide sequencing that relied on either chain termination (Sanger) or cleavage (Maxam-Gilbert) methodology in single-tube reactions. |
| Second-generation sequencing | Nucleotide sequencing methods that shear the genome and PCR amplify individual DNA fragments to massively parallelize sequencing, detecting base identity by monitoring release of pyrophosphate (454), release of hydrogen ions (Ion Torrent), incorporation of fluorescent reversible-terminator nucleotides (Illumina), or fluorescent ligated probes (SOLiD). |
| Third-generation sequencing | Nucleotide sequencing that relies on real-time single-molecule sequencing via monitoring of fluorescently labeled nucleotide incorporation (Pacific Biosciences) or ion current after DNA is fed through a channel (Oxford Nanopore). |
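
The quality score and coverage definitions in the table can be made concrete with a short sketch. It assumes quality strings use the Phred+33 ASCII offset (an assumption, since the table does not name an encoding) and approximates mean coverage as total sequenced bases divided by genome length, which presumes every read maps once. All function names are illustrative.

```python
import math
from typing import List, Sequence


def phred_from_error_prob(p: float) -> float:
    """Q = -10 * log10(P), with P the probability of a wrong base call."""
    if not 0.0 < p <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(p)


def error_prob_from_phred(q: float) -> float:
    """Invert the Phred formula: P = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


def decode_phred33(quality_string: str) -> List[int]:
    """Decode a quality string, assuming the Phred+33 ASCII offset."""
    return [ord(char) - 33 for char in quality_string]


def mean_coverage(read_lengths: Sequence[int], genome_length: int) -> float:
    """Approximate average depth as total bases / genome length.

    Assumes every read maps exactly once to the genome.
    """
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    return sum(read_lengths) / genome_length


q30 = phred_from_error_prob(0.001)           # one error per 1,000 calls
depth = mean_coverage([150] * 1000, 5000)    # 1,000 reads of 150 bp on 5 kbp
```

Under these assumptions, a 1-in-1,000 error probability corresponds to Q30, and 1,000 reads of 150 bp over a 5-kbp target give the “30×” coverage used as the example in the table.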
Types of antibiotic resistance loci
| Locus type | Description |
|---|---|
| Gene | Presence of an intact protein-coding gene that confers resistance. For example, a strain that contains |
| Plasmid/mobile element | Presence of a known drug resistance plasmid or mobile genetic cassette (e.g., SCCmec). |
| Mutation | A particular SNP or SNV (which encompasses both SNPs and 1-bp indels) that is associated with resistance. |
| Allele | Nucleotide variant of gene caused by mutation. One sequence variant of a gene may be sensitive to a drug, while another allele may be associated with resistance. |
| Gene amplification | Increase in gene copy number due to homologous recombination. For example, a single gene in a genome may be sensitive to a drug, but a strain with two or more tandem repeats may be resistant. |
SCCmec, staphylococcal cassette chromosome mec element; SNV, single-nucleotide variant.
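
A minimal sketch of how the locus types above could feed a rule-based call: each catalog entry names a determinant, its locus type, the drug it affects, and the minimum copy number needed to call resistance (greater than 1 for gene amplification). The catalog entries (`geneA`, `geneB:S80L`, `geneC`, `drug1` to `drug3`) are placeholders invented for illustration, not real determinants, and the function is not the method of any tool discussed in the review.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Determinant:
    name: str          # hypothetical identifier of the gene or mutation
    locus_type: str    # "gene", "mutation", or "amplification"
    drug: str          # antibiotic the determinant is associated with
    min_copies: int = 1  # copies required before resistance is called


# Placeholder catalog; a real one would come from a curated database.
CATALOG: List[Determinant] = [
    Determinant("geneA", "gene", "drug1"),
    Determinant("geneB:S80L", "mutation", "drug2"),
    Determinant("geneC", "amplification", "drug3", min_copies=2),
]


def predict_resistance(
    observed: Dict[str, int], catalog: List[Determinant] = CATALOG
) -> Dict[str, str]:
    """Call "R" for a drug if any of its determinants meets its copy threshold.

    `observed` maps determinant names to the copy number detected in a genome;
    absent determinants count as zero copies. Drugs default to "S".
    """
    calls = {entry.drug: "S" for entry in catalog}
    for entry in catalog:
        if observed.get(entry.name, 0) >= entry.min_copies:
            calls[entry.drug] = "R"
    return calls


single_copy = predict_resistance({"geneA": 1, "geneC": 1})
amplified = predict_resistance({"geneC": 3})
```

The design keeps presence/absence and amplification in one rule by treating presence as "at least one copy"; allele-level rules would need sequence comparison rather than a simple count.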
Selected rule-based WGS-AST results
| Species | Antibiotic(s) | No. of genomes tested | Diversity/no. of STs | Primary database(s) | Software | Input data | Sensitivity (%) | Specificity (%) | Reference |
|---|---|---|---|---|---|---|---|---|---|
| | Amoxicillin-clavulanate | 76 | NR | Custom | Blastx, ClustalW | Assembly, FASTQ | 100 | 100 | Tyson et al. |
| | Trimethoprim | 48 | 19+ STs | ResFinder | ResFinder | Assembly or FASTQ | 100 | 100 | Zankari et al. |
| | Gentamicin | 74 | NR | >100 loci | Blastn | Assembly | 100 | 100 | Stoesser et al. |
| | Pyrazinamide | 167 | NR | PhyResSE | Stampy | FASTQ | 88.9 | 100, 94.9 (with uncharacterized) | Pankhurst et al. |
| | Isoniazid | 693 | Lineages 1–4 | TBDReaMDB, MUBII-TB-DB minus phylogenetic SNPs | TB Profiler | FASTQ | 92.8 | 100 | Coll et al. |
| | Moxifloxacin | 13,424 | NR | NR | NR | NR | 88.2 | 90 | Miotto et al. |
| | Amikacin | 667 | 7 clades | Custom | SAMtools, mpileup, Cortex | FASTQ | 91.2, 88.1 (with uncharacterized) | 99.4, 99.5 (with uncharacterized) | Walker et al. |
| | Rifampin | 1,565 | NR | Hain, Cepheid, AID | Mykrobe | FASTQ | 90.8 | 99 | Bradley et al. |
| | Ethambutol | 752 | NR | Mykrobe | Mykrobe | FASTQ | 100 | 98.5, 77.3 (with uncharacterized) | Quan et al. |
| | Fusidic acid | 491 | 61 STs | Custom | Blastn, tblastn | Assembly | 91 | 100 | Gordon et al. |
| | Vancomycin | NR | 16 CCs | Custom | Blastn, mapping software | Assembly, FASTQ | 100 | 100 | Aanensen et al. |
| | Mupirocin | 340 | 25 CCs | Modified from Gordon et al. | Mykrobe | FASTQ | 100 | 100 | Bradley et al. |
| | Ciprofloxacin, clindamycin, erythromycin, fusidic acid, gentamicin, methicillin, mupirocin, penicillin, rifampin, tetracycline, trimethoprim, vancomycin | 1,379 | 111 STs | Custom | Mykrobe, GeneFinder, Typewriter | FASTQ, Assembly | 97 | 99 | Mason et al. |
| | Chloramphenicol | 332 | ST1 and ST2 | CARD, ResFinder, literature | GeneFinder | FASTQ | 100 | 100 | Day et al. |
| Non-serovar Typhi | Ceftriaxone | 640 | NR | Tyson et al. | Blastx, ClustalW | Assembly, FASTQ | 100 | 99.8 | McDermott et al. |
| | Ciprofloxacin | 3,491 | 227 serovars | CARD, ResFinder | GeneFinder | FASTQ | 99.28 | 99.97 | Neuert et al. |
| | Erythromycin | 210 | 90 STs | SRST2 | SRST2 | FASTQ | 100 | 100 | Deng et al. |
| | Erythromycin | 32/82 | NR | Tyson et al. | Blastx, ClustalW | Assembly, FASTQ | 100 | 100 | Zhao et al. |
| | Kanamycin | 50 | 12 STs | ResFinder | ResFinder | Assembly or FASTQ | 100 | 100 | Zankari et al. |
| | Levofloxacin | 390 | 175 STs | Custom | NR | Assembly | 91.9 | 93.7 | Kos et al. |
| | Gentamicin | 69 | NR | Custom | Blastn | Assembly | 96 | 100 | Stoesser et al. |
| | Ampicillin | 341 | NR | CARD, ResFinder | GeneFinder | FASTQ | 100 | 100 | Sadouki et al. |
NR, not reported; ST, sequence type; CC, clonal complex.
AID, Autoimmun Diagnostika GmbH.
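
The sensitivity and specificity columns compare genome-based calls with culture-based phenotypes. The sketch below shows one way to compute them, assuming resistance ("R") is the positive class and the phenotype is the reference standard; individual studies may define these differently. The function name and the example labels are made up for illustration.

```python
import math
from typing import Sequence, Tuple


def sensitivity_specificity(
    phenotypes: Sequence[str], predictions: Sequence[str]
) -> Tuple[float, float]:
    """Return (sensitivity %, specificity %) with "R" as the positive class.

    Sensitivity = TP / (TP + FN): resistant isolates predicted resistant.
    Specificity = TN / (TN + FP): susceptible isolates predicted susceptible.
    A metric is NaN when its denominator is zero.
    """
    if len(phenotypes) != len(predictions):
        raise ValueError("phenotypes and predictions must be the same length")
    pairs = list(zip(phenotypes, predictions))
    tp = sum(1 for truth, call in pairs if truth == "R" and call == "R")
    fn = sum(1 for truth, call in pairs if truth == "R" and call == "S")
    tn = sum(1 for truth, call in pairs if truth == "S" and call == "S")
    fp = sum(1 for truth, call in pairs if truth == "S" and call == "R")
    sensitivity = 100.0 * tp / (tp + fn) if (tp + fn) else math.nan
    specificity = 100.0 * tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Made-up labels: four resistant and four susceptible isolates.
phenotypes = ["R", "R", "R", "S", "S", "S", "S", "R"]
predictions = ["R", "R", "S", "S", "S", "R", "S", "R"]
sens, spec = sensitivity_specificity(phenotypes, predictions)
```

Because the phenotype is treated as ground truth, any error in culture-based AST shows up here as an apparent prediction error, which is the limitation noted in the abstract.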
Selected model-based WGS-AST results
| Species | Antibiotic(s) | No. of genomes tested | Diversity | Database | ML algorithm | Input data | Sensitivity (%) | Specificity (%) | Overall accuracy (%) | Reference |
|---|---|---|---|---|---|---|---|---|---|---|
| | Amoxicillin | 329 | 7 STs | NA | Gradient-boosted trees | Pangenome, population structure matrix | 90 | 95 | | Moradigaravand et al. |
| | Ciprofloxacin | 581 | 7 STs | NA | Gradient-boosted trees | Pangenome, population structure matrix, SNPs | 81 | 99 | | Moradigaravand et al. |
| | Gentamicin | 564 | 7 STs | NA | Gradient-boosted trees | Pangenome, population structure matrix | 87 | 99 | | Moradigaravand et al. |
| | Trimethoprim | 283 | 7 STs | NA | Gradient-boosted trees | Pangenome, population structure matrix | 92 | 97 | | Moradigaravand et al. |
| | Isoniazid | 1,811 (80% train, 20% test) | 7 clades | NA | Random Forest | Variants in 23 genes | 97 | 94 | | Yang et al. |
| | Rifampin | 1,725 (80% train, 20% test) | 7 clades | NA | Class-conditional Bernoulli mixture model | Variants in 23 genes | 97 | 97 | | Yang et al. |
| | Ethambutol | 3,526 (80% train, 20% test) | 5 genetic clusters | NA | Multitask wide and deep neural networks | Variants in 32 regions | 91.9 | 90.3 | | Chen et al. |
| | Pyrazinamide | 3,147 (train), 567 (test) | 5 genetic clusters | NA | Multitask wide and deep neural networks | Variants in 32 regions | 75.2 | 90.1 | | Chen et al. |
| | Kanamycin | 162 (train), 18 (test) | NR | PATRIC, RAST | AdaBoost | Assembly | | | 88.3 (F1) | Davis et al. |
| | Beta-lactams (PEN, AMO, MER, TAX, CFT, CFX) | 2,528 (train), 1,781 (test) | 403 STs (train), 299 STs (test) | NA | Random Forest | PBP sequences | | | >97 (±1 MIC dilution), >93 (category) | Li et al. |
| | Beta-lactams | 1,350 (train), 58 (test) | NR | PATRIC, RAST | AdaBoost | Assembly | | | 87.6 (F1) | Davis et al. |
| | Azithromycin | 681 | NR | NA | Linear regression | Variants in 20 regions | 80, 99 (±1 MIC dilution) | 83, 94 (±1 MIC dilution) | 93 (±1 MIC dilution), 44 (category) | Eyre et al. |
| | Ciprofloxacin | 676 | NR | NA | Linear regression | Variants in 20 regions | 100 | 99 | 94 (±1 MIC dilution), 68 (category) | Eyre et al. |
| | Ampicillin-sulbactam | 1,668 | >99 STs | PATRIC, RAST | XGBoost | Assembly | | | 99 (F1, ±1 MIC dilution) | Nguyen et al. |
| | Levofloxacin | 1,668 | >99 STs | PATRIC, RAST | XGBoost | Assembly | | | 93 (F1) | Nguyen et al. |
| | Meropenem | 1,777 | >99 STs | PATRIC, RAST | AdaBoost | Assembly | | | 92 (F1) | Long et al. |
| | Piperacillin-tazobactam | 1,777 | >99 STs | PATRIC, RAST | AdaBoost | Assembly | | | 76 (F1) | Long et al. |
| | Ampicillin | 78 | NR | Resfams | Logistic regression | FASTA, alignments | | | 97.4 | Pesesky et al. |
| | Chloramphenicol | 78 | NR | Resfams | Logistic regression | FASTA, alignments | | | 89.7 | Pesesky et al. |
| | Methicillin | 99 (train), 11 (test) | NR | PATRIC, RAST | AdaBoost | Assembly | | | 99.5 (F1) | Davis et al. |
| | Vancomycin | 75 | 12 STs | Custom | Random Forest | Assembly | 73 | 81 | | Alam et al. |
| | Carbapenem | 99 (train), 11 (test) | NR | PATRIC, RAST | AdaBoost | Assembly | | | 95 (F1) | Davis et al. |
| Non-serovar Typhi | Ceftriaxone | 5,278 | | PATRIC, RAST | XGBoost | Assembly | | | 80, 95 (±1 MIC dilution) | Nguyen et al. |
ST, sequence type; NR, not reported.
NA, not applicable.
F1, harmonic average of the precision (positive predictive value [PPV]) and recall (sensitivity) (84).
PEN, penicillin; AMO, amoxicillin; MER, meropenem; TAX, ceftriaxone; CFT, cefotaxime; CFX, cefixime.
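
Two of the metrics in this table, F1 and accuracy within ±1 MIC dilution, can be sketched as follows. The code assumes MICs are reported on a twofold dilution series, so "within ±1 dilution" means the predicted and observed values differ by at most a factor of two; that interpretation and the function names are assumptions for illustration, not definitions taken from the cited studies.

```python
import math
from typing import Sequence


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision (PPV) and recall (sensitivity)."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


def within_one_dilution_accuracy(
    observed_mics: Sequence[float], predicted_mics: Sequence[float]
) -> float:
    """Percentage of predictions within one twofold dilution of the observed MIC."""
    if len(observed_mics) != len(predicted_mics) or not observed_mics:
        raise ValueError("need two non-empty MIC lists of equal length")
    hits = sum(
        1
        for observed, predicted in zip(observed_mics, predicted_mics)
        # Small tolerance guards against floating-point rounding in log2.
        if abs(math.log2(predicted) - math.log2(observed)) <= 1.0 + 1e-9
    )
    return 100.0 * hits / len(observed_mics)


f1 = f1_score(tp=8, fp=2, fn=2)
# Made-up MICs: the third prediction is two dilutions off.
accuracy = within_one_dilution_accuracy([0.5, 1, 2, 8], [1, 1, 8, 4])
```

The ±1 dilution tolerance is more forgiving than categorical agreement, which is consistent with rows in the table where the dilution-based figure is much higher than the category-based one.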
Some outstanding questions
| Question |
|---|
| At what price and turnaround time will WGS-AST replace culture-based AST for routine use in clinical microbiology labs? |
| How do we interpret the presence of an antimicrobial resistance determinant gene if the MIC of the strain is below the resistance breakpoint? |
| Can genome-based prediction be used to detect heteroresistance or polygenic phenotypes? |
| How important is epistasis in determining the resistance to different classes of antibiotics? |
| Can gene amplification as a mechanism of resistance be accurately determined from WGS data? |
| How efficiently can WGS-AST prediction software be ported to metagenomic-AST data? |