Jason C Hyun, Erol S Kavvas, Jonathan M Monk, Bernhard O Palsson.
Abstract
The evolution of antimicrobial resistance (AMR) poses a persistent threat to global public health. Sequencing efforts have already yielded genome sequences for thousands of resistant microbial isolates and require robust computational tools to systematically elucidate the
Year: 2020 PMID: 32119670 PMCID: PMC7067475 DOI: 10.1371/journal.pcbi.1007608
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Known AMR genes present in the S. aureus pan-genome.
| Antibiotic | Genes |
|---|---|
| ciprofloxacin | |
| clindamycin | |
| erythromycin | |
| gentamicin | |
| tetracycline | |
| trimethoprim | |
| sulfamethoxazole | |
Fig 1. S. aureus genomes clustered by shared genetic content compared to known subtypes and antibiotic resistance patterns.
(a) Genomes clustered using hierarchical clustering with average linkage, based on pairwise Jaccard distances between the sets of genetic features present in each genome. Clusters extracted from this hierarchy align well with (b) experimentally observed resistance patterns and (c) subtype annotations from PATRIC. Antibiotics shown are ciprofloxacin (CIP), clindamycin (CLI), erythromycin (ERY), gentamicin (GEN), sulfamethoxazole/trimethoprim (SXT), and tetracycline (TET).
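The clustering step in panel (a) can be illustrated with a minimal pure-Python sketch, assuming each genome is represented as a set of genetic feature identifiers. The function names and the toy gene sets are hypothetical, and the naive merge loop is meant for illustration only, not as the pipeline used to produce the figure.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """Jaccard distance between two sets of genetic features."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def average_linkage_clusters(feature_sets, n_clusters):
    """Agglomerative clustering with average linkage, stopped at n_clusters."""
    n = len(feature_sets)
    # Pairwise Jaccard distances between all genomes.
    dist = [[jaccard_distance(feature_sets[i], feature_sets[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster genome pairs.
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]                          # b > a, so index a stays valid
    return [sorted(c) for c in clusters]

# Toy genomes (hypothetical gene names).
genomes = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB"},
    {"geneX", "geneY"},
    {"geneX", "geneY", "geneZ"},
]
clusters = average_linkage_clusters(genomes, n_clusters=2)
```

For real pan-genomes with thousands of genomes, a library routine for hierarchical clustering would be a more practical choice than this quadratic-per-merge loop.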
Fig 2. Comparison of SVM ensemble approaches and statistical tests for detecting AMR-conferring genes and alleles in S. aureus.
(a) Workflow for SVM ensemble approaches. Beginning with genomes from PATRIC, open reading frames (ORFs) are identified and clustered by coding sequence to identify putative genes and alleles. Each genome is encoded based on the presence or absence of each gene and allele to capture genomic variation in the pan-genome as a sparse binary matrix. Genomes and/or features of this matrix are randomly sampled 500 times and used to train SVMs to predict binary AMR phenotype for a single antibiotic from genotype. Weights for each feature are averaged across all models in the ensemble and used to rank features by association to AMR. (b) Associations between known AMR-conferring genomic features and AMR phenotype, as ranked by Fisher’s Exact test, Cochran-Mantel-Haenszel test, and four different SVM ensemble types (SVM: ensemble by bootstrapping genomes, SVM-RSE: bootstrapping genomes and features; “random subspace ensemble”, SVM-RSE-O: SVM-RSE with oversampling to balance subtypes, SVM-RSE-U: SVM-RSE with undersampling to balance subtypes). Features were ranked either by p-value for statistical tests or by average feature weight for SVM ensembles. Fractional ranking was used for ties. Only features detected by at least one method are shown, colored by rank (green: in top 10, yellow: 11–50, orange: 51–100, gray: >100). Features shown are either genes or individual alleles (denoted as
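The ensemble workflow in panel (a) can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the authors' implementation: the linear SVM is trained with a plain stochastic subgradient method rather than a library solver, the ensemble has 50 models instead of 500, each feature's weight is averaged over the models that sampled it, and the genotype matrix is synthetic. All function names and parameters are hypothetical.

```python
import random

def train_linear_svm(X, y, epochs=20, lr=0.1, lam=0.01, rng=None):
    """Linear SVM via stochastic subgradient descent on the L2-regularized hinge loss."""
    if rng is None:
        rng = random.Random(0)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [wj * (1.0 - lr * lam) for wj in w]  # shrink weights (L2 penalty)
            if margin < 1.0:  # margin violated: take a hinge-loss step
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w

def rse_feature_weights(X, y, n_models=50, feature_fraction=0.5, seed=0):
    """Average SVM weights over models trained on bootstrapped genomes and sampled features."""
    rng = random.Random(seed)
    n_genomes, n_features = len(X), len(X[0])
    k = max(1, int(feature_fraction * n_features))
    totals = [0.0] * n_features
    counts = [0] * n_features
    for _ in range(n_models):
        rows = [rng.randrange(n_genomes) for _ in range(n_genomes)]  # bootstrap genomes
        cols = rng.sample(range(n_features), k)                      # random feature subspace
        X_sub = [[X[r][c] for c in cols] for r in rows]
        y_sub = [y[r] for r in rows]
        w = train_linear_svm(X_sub, y_sub, rng=rng)
        for c, wc in zip(cols, w):
            totals[c] += wc
            counts[c] += 1
    # Assumption: average only over the models in which the feature was sampled.
    return [t / n if n else 0.0 for t, n in zip(totals, counts)]

# Synthetic data: feature 0 is present exactly in resistant genomes (+1),
# features 1-7 are random noise.
data_rng = random.Random(1)
y = [1 if g % 2 == 0 else -1 for g in range(100)]
X = [[1 if label == 1 else 0] + [data_rng.randint(0, 1) for _ in range(7)]
     for label in y]
weights = rse_feature_weights(X, y)
ranking = sorted(range(len(weights)), key=lambda f: -weights[f])
```

Sorting features by descending average weight gives the ranking by association with resistance. In this toy setup the perfectly associated feature should come out on top.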
Fig 3. Predictive performance of SVM-RSE on 16 organism-antibiotic cases.
(a) Distribution of AMR phenotypes for each case. Organisms examined are S. aureus (SA), P. aeruginosa (PA), and E. coli (EC). Antibiotics examined are ciprofloxacin (CIP), clindamycin (CLI), erythromycin (ERY), gentamicin (GEN), tetracycline (TET), sulfamethoxazole/trimethoprim (SXT), amikacin (AMK), ceftazidime (CAZ), levofloxacin (LVX), meropenem (MEM), amoxicillin/clavulanic acid (AMC), imipenem (IPM), and trimethoprim (TMP). (b) SVM-RSE performance metrics, shown as averages and standard errors from 5-fold cross-validation. The left-most column, “log2(R/S)”, shows the extent of class imbalance: the log2 of the number of resistant genomes divided by the number of susceptible genomes.
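The two summary quantities in panel (b) are simple to compute. The sketch below assumes the standard error is the sample standard deviation divided by the square root of the number of folds, which the caption does not spell out. The function names and example numbers are hypothetical.

```python
import math
from statistics import mean, stdev

def log2_class_ratio(n_resistant, n_susceptible):
    """log2(R/S): positive when resistant genomes outnumber susceptible ones."""
    return math.log2(n_resistant / n_susceptible)

def mean_and_standard_error(fold_scores):
    """Mean and standard error of a metric across cross-validation folds."""
    # Assumption: sample standard deviation (n - 1 denominator) over sqrt(n).
    return mean(fold_scores), stdev(fold_scores) / math.sqrt(len(fold_scores))

# Hypothetical case: 80 resistant and 20 susceptible genomes.
imbalance = log2_class_ratio(80, 20)

# Hypothetical accuracies from 5 folds.
avg, se = mean_and_standard_error([0.90, 0.92, 0.88, 0.91, 0.89])
```

A value of 0 for log2(R/S) means balanced classes, and each unit away from 0 corresponds to a doubling of one class relative to the other.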
Known resistance-conferring genes found by SVM-RSE in S. aureus, P. aeruginosa, and E. coli.
| Organism | Drug | Features | Ranked 1–10 | Ranked 11–50 |
|---|---|---|---|---|
| CIP | 2 | - | ||
| CLI | 3 | |||
| ERY | 2 | |||
| GEN | 1 | - | ||
| SXT | 1 | - | ||
| TET | 1 | - | ||
| AMK | 0 | - | - | |
| CAZ | 1 | - | ||
| LVX | 4 | - | ||
| MEM | 2 | - | ||
| AMC | 2 | - | ||
| CAZ | 4 | |||
| CIP | 8 | |||
| GEN | 6 | |||
| IPM | 3 | - | ||
| TMP | 5 | |||
For each organism-antibiotic pair, known AMR genes among the top 50 features detected by SVM-RSE are shown. Features referring to individual alleles of a gene are underlined. In the cases of P. aeruginosa-LVX and E. coli-CIP, two and four distinct resistant gyrA alleles were found in the top 10, respectively. In cases where a gene is mentioned in both the top 10 and rank 11–50 columns, multiple resistant alleles were detected at the different ranks. Antibiotics examined are ciprofloxacin (CIP), clindamycin (CLI), erythromycin (ERY), gentamicin (GEN), sulfamethoxazole/trimethoprim (SXT), tetracycline (TET), amikacin (AMK), ceftazidime (CAZ), levofloxacin (LVX), meropenem (MEM), amoxicillin/clavulanic acid (AMC), imipenem (IPM), and trimethoprim (TMP).
Alleles of gyrA and parC associated with fluoroquinolone resistance detected by SVM-RSE.
| Organism | Feature | # Res. | # Sus. | Mutations |
|---|---|---|---|---|
| | | 119 | 0 | |
| | | 113 | 0 | |
| | | 82 | 2 | |
| | | 18 | 1 | |
| | | 78 | 1 | |
| | | 66 | 1 | |
| | | 15 | 0 | |
| | | 157 | 2 | |
| | | 27 | 0 | |
| | | 46 | 2 | |
| | | 2 | 4 | D402E, T457A, V598I, Δ815, T818E, Δ824, Δ825, E859V, E886D |
| | | 0 | 12 | F410Y |
| | | 23 | 115 | - |
| | | 4 | 39 | Δ909, Δ910 |
| | | 52 | 137 | - |
| | | 3 | 637 | D678E, A828S |
| | | 1 | 152 | D678E |
| | | 2 | 179 | - |
| | | 1 | 250 | - |
| | | 7 | 475 | D475E |
Alleles of gyrA and parC among the top 10 hits associated with either resistance or susceptibility by SVM-RSE were characterized based on mutations relative to the corresponding gene in a reference genome for each organism: NC_002745.2 for S. aureus (N315), NC_002516.2 for P. aeruginosa (PAO1), and U00096.3 for E. coli (K12 MG1655). Allele-specific mutations are shown, with known resistance-conferring mutations shown in bold and underlined. Each allele’s frequency among resistant (Res.) and susceptible (Sus.) genomes is shown.
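Mutation labels such as D678E or Δ909 can be derived by comparing an allele with the reference protein position by position. The sketch below assumes the two sequences have already been aligned to equal length with "-" marking gaps. The function name and the toy sequences are hypothetical, and insertions relative to the reference are deliberately not reported.

```python
def call_mutations(ref_aligned, allele_aligned):
    """List substitutions and deletions of an allele in reference coordinates."""
    if len(ref_aligned) != len(allele_aligned):
        raise ValueError("sequences must be aligned to the same length")
    mutations = []
    ref_pos = 0
    for ref_res, alt_res in zip(ref_aligned, allele_aligned):
        if ref_res == "-":
            continue  # insertion relative to the reference: skipped in this sketch
        ref_pos += 1  # 1-based position in the reference protein
        if alt_res == "-":
            mutations.append(f"Δ{ref_pos}")                   # deleted residue
        elif alt_res != ref_res:
            mutations.append(f"{ref_res}{ref_pos}{alt_res}")  # substitution, e.g. S2L
    return mutations

# Toy aligned sequences (not real gyrA or parC fragments).
toy_mutations = call_mutations("MSDLA", "MLD-A")
```

With these toy inputs the allele differs from the reference by one substitution at position 2 and one deletion at position 4.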
Novel resistance-conferring gene candidates predicted by SVM-RSE.
| Drug | Feature | R/S | LOR | Mutations | Protein feature or domain |
|---|---|---|---|---|---|
| ERY | | 135/0 | 8.8 | Wildtype | - |
| GEN | SA_RS03845 | 134/0 | 13.5 | S409N | ABC transporter-like domain |
| GEN | | 134/0 | 13.5 | T506N, E541K | Anticodon-binding domain |
| GEN | | 134/0 | 13.5 | S68N, N132K | ABC transporter-like domain |
| GEN | | 134/0 | 13.5 | D126Y | ComG operon protein 4 family (non-cytoplasmic) |
| GEN | | 134/0 | 13.5 | E38D, S44T, N112K, S422N, K448N | Thioredoxin-like DSF; FAD/NAD(P)-binding domain |
| TET | | 131/2 | 8.7 | G60R | C-terminus |
| TET | SA_RS11525 | 131/2 | 8.7 | H127Y | - |
| TET | SA_RS10745 | 130/2 | 8.5 | K641Q | RNA-binding domain S1 |
| TET | | 130/2 | 8.5 | P26L | P-type ATPase TM DSF |
| CAZ | PA5359 | 48/1 | 6.3 | DEL 1–24 | N-terminal signal peptide |
| CAZ | PA1414 | 48/1 | 6.3 | DEL 1–33 | N-terminus |
| CAZ | PA1942 | 48/1 | 6.3 | DEL 1–32 | N-terminus |
| CLI | WP_000664727 | 71/5 | 0.7 | | Plasmid replication protein, RepL |
| GEN | WP_000134308 | 134/1 | 11.6 | | Acyl-CoA N-acyltransferase, GNAT domain |
| TET | WP_031824444 | 123/2 | 7.8 | | Replication initiation factor |
| AMC | WP_097223430 | 26/5 | 3.5 | | Bacterial toxin RNase RnlA/LsoA |
| AMC | WP_000710826 | 26/5 | 3.5 | | Antitoxin RnlB/LsoB |
| AMC | WP_000774834 | 25/11 | 2.4 | | Plasmid stability protein StbB |
| CAZ | WP_001620093 | 13/33 | 2.1 | | NagB/RpiA transferase-like, DeoR-type HTH domain, DeoR C-terminal sensor domain |
| CAZ | WP_000243817 | 82/15 | 7.1 | | RmlC-like cupin fold metalloprotein, WbuC family |
| CIP | WP_001304218 | 262/386 | 3.9 | | Nucleoside triphosphate hydrolase, AAA domain |
| GEN | WP_001330846 | 44/1 | 8.5 | | TM protein |
| IPM | WP_001310177 | 2/25 | 2.0 | | PyrBI operon leader peptide |
| TMP | WP_000082530 | 59/9 | 4.1 | | Mercury transport protein MerC |
Selected AMR-conferring core gene alleles and accessory genes predicted by SVM-RSE, for S. aureus, P. aeruginosa, and E. coli. For core gene alleles, gene names and mutations are defined relative to the reference genomes N315 (NC_002745.2) for S. aureus, PAO1 (NC_002516.2) for P. aeruginosa, and K12 MG1655 (U00096.3) for E. coli. The numbers of resistant (R) vs. susceptible (S) genomes are shown for each feature. Log2 odds ratios (LORs) were computed using weighted pseudocounts to account for zeroes in the contingency table (see Methods for details). Protein features and domains were annotated with either InterPro (for core gene alleles) or InterProScan (for accessory genes). Abbreviations not originally in InterPro annotations are DSF (domain superfamily) and TM (transmembrane).
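To show why a pseudocount is needed when a contingency table contains zeroes, here is a minimal sketch of a log2 odds ratio. It uses a uniform pseudocount of 0.5 added to every cell as a stand-in. The weighted pseudocount scheme referred to in the legend is described in the Methods and is not reproduced here, so this sketch should not be expected to recover the LOR values in the table. The function name and the example counts are hypothetical.

```python
import math

def log2_odds_ratio(present_r, present_s, absent_r, absent_s, pseudocount=0.5):
    """Log2 odds ratio of resistance given feature presence, from a 2x2 table."""
    # Assumption: the same pseudocount is added to every cell so that
    # zero counts do not produce a division by zero or log of zero.
    a = present_r + pseudocount  # feature present, resistant
    b = present_s + pseudocount  # feature present, susceptible
    c = absent_r + pseudocount   # feature absent, resistant
    d = absent_s + pseudocount   # feature absent, susceptible
    return math.log2((a * d) / (b * c))

# Hypothetical counts: a feature seen in 134 resistant and 0 susceptible
# genomes, with 10 resistant and 300 susceptible genomes lacking it.
lor = log2_odds_ratio(134, 0, 10, 300)
```

Without the pseudocount, the zero in the "present, susceptible" cell would make the odds ratio undefined. With it, the result is large and positive but finite.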
Fig 4. Characterization of mutations in four predicted AMR-conferring alleles in S. aureus.
For each of the predicted AMR-associated genes (a) kdbB, (b) SA_RS10745, (c) oppD and (d) ahpF, the AMR phenotype distributions and locations relative to InterPro structural domains are shown for individual mutations. Mutations in the predicted AMR-associated allele are in orange, while all other mutations observed for that gene are in black (only mutations in at least 5 genomes are shown). For kdbB, the first five annotations in light blue are associated with P-type ATPase. Abbreviations include superfamily (SF), domain superfamily (DSF), nucleoside triphosphate hydrolase (NTH), ATP-binding cassette transporter (ABCt), pyridine nucleotide-diphosphate oxidoreductase (PNDOR), and alkyl hydroperoxide reductase (AHPR), in addition to those used in InterPro annotations.