Bryan Naidenov, Alexander Lim, Karyn Willyerd, Nathanial J Torres, William L Johnson, Hong Jin Hwang, Peter Hoyt, John E Gustafson, Charles Chen.
Abstract
The Elizabethkingia are a genetically diverse genus of emerging pathogens that exhibit multidrug resistance to a range of common antibiotics. Two representative species, Elizabethkingia bruuniana and E. meningoseptica, were phenotypically tested to determine minimum inhibitory concentrations (MICs) for five antibiotics. Ultra-long read sequencing with Oxford Nanopore Technologies (ONT) and subsequent de novo assembly produced complete, gapless circular genomes for each strain. Alignment-based annotation with Prokka identified 5,480 features in E. bruuniana and 5,203 features in E. meningoseptica; none of the identified genes or gene combinations corresponded to the observed phenotypic resistance values. Pan-genomic analysis, performed with an additional 19 Elizabethkingia strains, identified a core-genome size of 2,658,537 bp, 32 uniquely identifiable intrinsic chromosomal antibiotic resistance core-genes, and 77 antibiotic resistance pan-genes. Using core-SNPs and pan-genes in combination with six machine learning (ML) algorithms, binary classification of clindamycin and vancomycin resistance achieved f1 scores of 0.94 and 0.84, respectively. Performance on the more challenging multiclass problem for fusidic acid, rifampin, and ciprofloxacin resulted in f1 scores of 0.70, 0.75, and 0.54, respectively. By producing two sets of quality biological predictors, pan-genome genes and core-genome SNPs, from long-read sequence data and applying an ensemble of ML techniques, our results demonstrated that accurate phenotypic inference, at multiple AMR resolutions, can be achieved.
Keywords: AMR prediction; Elizabethkingia; antimicrobial resistance; machine learning; nanopore sequencing
Year: 2019 PMID: 31333599 PMCID: PMC6622151 DOI: 10.3389/fmicb.2019.01446
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Core genome and SNP statistics for each phenotypic group formed from participating strains.
| Phenotype | Number of strains | Core genome size (bases) | Number of SNPs |
|---|---|---|---|
| | 21 | 2,658,537 | 712,703 |
| Vancomycin | 33 | 27,066 | 11,066 |
| Clindamycin | 28 | 3,488 | 1,996 |
| Fusidic acid | 25 | 20,006 | 9,851 |
| Rifampin | 29 | 3,368 | 2,044 |
| Ciprofloxacin | 28 | 3,662 | 1,931 |
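To compare the groups in the table on a common scale, one option is to divide the SNP count by the core genome size. The sketch below does that arithmetic with the table's values. The ratio is a derived figure, not a statistic reported by the authors, and the names `CORE_STATS`, `snp_density`, and `density_table` are hypothetical. The first table row carries no phenotype label, so it is keyed here with the placeholder `"unlabeled"`.

```python
# Values copied from the table above: (strains, core genome size in bases, SNPs).
# "unlabeled" is a placeholder key for the row that has no phenotype label.
CORE_STATS = {
    "unlabeled": (21, 2_658_537, 712_703),
    "Vancomycin": (33, 27_066, 11_066),
    "Clindamycin": (28, 3_488, 1_996),
    "Fusidic acid": (25, 20_006, 9_851),
    "Rifampin": (29, 3_368, 2_044),
    "Ciprofloxacin": (28, 3_662, 1_931),
}


def snp_density(core_size_bases: int, snp_count: int) -> float:
    """Return SNPs per core-genome base (illustrative helper, not from the paper)."""
    if core_size_bases <= 0:
        raise ValueError("core genome size must be positive")
    return snp_count / core_size_bases


def density_table(stats=CORE_STATS):
    """Map each group name to its SNPs-per-base ratio."""
    return {name: snp_density(size, snps) for name, (_, size, snps) in stats.items()}


if __name__ == "__main__":
    for name, density in density_table().items():
        print(f"{name:14s} {density:.3f}")
```

A ratio like this only normalizes for core genome length; it says nothing about how informative the SNPs are for classification.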
Comparison of Nanopore R9.4 2D sequencing statistics.
| Statistic | ATCC 33958 | KC1913 |
|---|---|---|
| Total quality yield (megabases) | 1,090 | 330 |
| Total number of quality reads | 166,167 | 45,786 |
| Median read length (kilobases) | 5.78 | 6.53 |
| 2D Median quality score (phred score) | 15.8 | 16.3 |
| Template median quality score (phred score) | 8.4 | 8.8 |
| Longest read (kilobases) | 32.8 | 42.4 |
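The quality scores in this table are easier to interpret as error probabilities. The sketch below applies the standard Phred relation, P = 10^(-Q/10), and derives an approximate mean read length from total yield and read count. It assumes 1 megabase = 1,000 kilobases, and both function names are hypothetical. The mean read length is a derived estimate and is not reported in the table, which gives only the median.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mean_read_length_kb(yield_megabases: float, n_reads: int) -> float:
    """Approximate mean read length in kilobases from total yield and read count."""
    if n_reads <= 0:
        raise ValueError("read count must be positive")
    return yield_megabases * 1000 / n_reads


if __name__ == "__main__":
    # Values copied from the table above: (yield in Mb, reads, 2D median Q, template median Q).
    runs = {
        "ATCC 33958": (1090, 166_167, 15.8, 8.4),
        "KC1913": (330, 45_786, 16.3, 8.8),
    }
    for strain, (yield_mb, reads, q_2d, q_template) in runs.items():
        print(
            f"{strain}: mean read ~{mean_read_length_kb(yield_mb, reads):.2f} kb, "
            f"2D error ~{phred_to_error_prob(q_2d):.3f}, "
            f"template error ~{phred_to_error_prob(q_template):.3f}"
        )
```

Because the Phred scale is logarithmic, the gap of roughly 7 to 8 points between template and 2D median quality corresponds to about a five-fold lower per-base error probability for 2D reads.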
Figure 1. (A) Read length distribution after extraction and filtering of raw read data of Elizabethkingia bruuniana. (B) Read length distribution after extraction and filtering of raw read data of E. meningoseptica. (C) Per-read quality distribution of E. bruuniana. (D) Per-read quality distribution of E. meningoseptica.
Figure 2. Completed genomes of both E. bruuniana (red) and E. meningoseptica (blue) are displayed with histograms representing average GC content (range 20–50%) in the outermost circle of each genome. The middle genome circles display a heatmap of methylation frequency (darker regions indicate higher methylation frequency). The inner genome circles indicate conserved regions belonging to the Elizabethkingia core genome found in each respective genome. Putative AMR gene clusters, identified by HMMER3, are shown on the outer edges of the core genome circles.
Figure 3. f1-micro scores for each algorithm (with their best respective hyper-parameters) for each AMR group. The mean f1-micro score over the given iterations for each group is shown by the value above each bar, and the standard deviation is shown by the error bars. The binary prediction algorithms are shown in panels (A,B). The multiclass classification is shown in panels (C–G).
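For readers unfamiliar with the metric in this caption, the following minimal sketch computes a micro-averaged f1 score and summarizes it over repeated iterations as a mean and standard deviation. It is a plain-Python illustration and not the authors' pipeline. The class labels and the helper names `f1_micro` and `summarize` are hypothetical, and the use of the sample standard deviation is an assumption, since the caption does not say which estimator was used.

```python
import statistics


def f1_micro(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, and FN over all classes, then compute F1."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp += 1
        else:
            fp += 1  # a false positive for the predicted class
            fn += 1  # a false negative for the true class
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize(scores):
    """Return (mean, sample standard deviation) of per-iteration scores."""
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the study.
    y_true = ["R", "R", "S", "I", "S", "R"]
    y_pred = ["R", "S", "S", "I", "S", "I"]
    print(f"f1-micro: {f1_micro(y_true, y_pred):.3f}")
```

When every sample receives exactly one label, each misclassification adds one false positive and one false negative, so the micro-averaged f1 equals plain accuracy.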
Figure 4. f1 micro-scores for each algorithm across different hyper-parameter values. k-NN (C) and the SVM with a radial basis function kernel (B) are most sensitive to hyper-parameter changes. The y-axis (f1 micro-score) is scaled from 0.5 to 1.0. The remaining plots (A,D–F) show minimal changes in f1 micro-score when the respective hyper-parameter is changed.