James J Davis, Sébastien Boisvert, Thomas Brettin, Ronald W Kenyon, Chunhong Mao, Robert Olson, Ross Overbeek, John Santerre, Maulik Shukla, Alice R Wattam, Rebecca Will, Fangfang Xia, Rick Stevens.
Abstract
The emergence and spread of antimicrobial resistance (AMR) mechanisms in bacterial pathogens, coupled with the dwindling number of effective antibiotics, has created a global health crisis. Being able to identify the genetic mechanisms of AMR and predict the resistance phenotypes of bacterial pathogens prior to culturing could inform clinical decision-making and improve reaction time. At PATRIC (http://patricbrc.org/), we have been collecting bacterial genomes with AMR metadata for several years. To advance phenotype prediction and the identification of genomic regions relating to AMR, we have updated the PATRIC FTP server to provide access to genomes binned by their AMR phenotypes, along with metadata including minimum inhibitory concentrations. Using this infrastructure, we built custom AdaBoost (adaptive boosting) machine learning classifiers for identifying carbapenem resistance in Acinetobacter baumannii, methicillin resistance in Staphylococcus aureus, and beta-lactam and co-trimoxazole resistance in Streptococcus pneumoniae, with accuracies ranging from 88-99%. We also built classifiers for isoniazid, kanamycin, ofloxacin, rifampicin, and streptomycin resistance in Mycobacterium tuberculosis, achieving accuracies ranging from 71-88%. This set of classifiers has been used to provide an initial framework for species-specific AMR phenotype and genomic feature prediction in the RAST and PATRIC annotation services.
Year: 2016 PMID: 27297683 PMCID: PMC4906388 DOI: 10.1038/srep27930
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Bacterial species with over 100 distinct susceptible and resistant phenotypes on the PATRIC FTP site (http://ftp.patricbrc.org/).
| Organism | Antibiotic | Susceptible | Resistant | Major data sources |
|---|---|---|---|---|
| A. baumannii | carbapenem | 110 | 122 | |
| M. tuberculosis | ethambutol | 691 | 333 | |
| M. tuberculosis | ethionamide | 250 | 173 | |
| M. tuberculosis | isoniazid | 453 | 814 | |
| M. tuberculosis | kanamycin | 484 | 188 | |
| M. tuberculosis | ofloxacin | 514 | 239 | |
| M. tuberculosis | rifampicin | 509 | 666 | |
| M. tuberculosis | streptomycin | 656 | 490 | |
| S. aureus | methicillin | 115 | 491 | |
| S. pneumoniae | beta-lactam | 1504 | 1563 | |
| S. pneumoniae | co-trimoxazole | 584 | 2126 | |
*AMR data may exist for single antibiotics and families of antibiotics.
Figure 1. A typical machine learning workflow for AMR phenotype detection.
Genomes for a given species are binned according to whether they are resistant or susceptible to an antibiotic and the k-mer counts are computed for each genome. The k-mer counts are then merged to form a matrix. A machine learning algorithm searches this matrix to find the k-mers that distinguish the resistant and susceptible genomes. These distinguishing k-mers are then used as a “classifier” to predict the phenotype for a new genome.
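The counting and merging steps of this workflow can be sketched in Python as below. This is a minimal illustration, not PATRIC's pipeline: the function names `count_kmers` and `build_matrix` are hypothetical, each genome is assumed to be a single sequence string (real assemblies have many contigs), only the forward strand is counted (no reverse-complement canonicalization), and the whole matrix is held in memory, which would not scale to thousands of bacterial genomes.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in one genome (forward strand only, by assumption)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def build_matrix(genomes: dict, k: int):
    """Merge per-genome k-mer counts into a genome-by-k-mer count matrix.

    genomes maps a genome ID to one sequence string (a simplifying assumption).
    Returns (row genome IDs, column k-mers, matrix of counts).
    """
    counts = {gid: count_kmers(seq, k) for gid, seq in genomes.items()}
    kmers = sorted(set().union(*counts.values()))  # every k-mer seen in any genome
    rows = sorted(counts)
    # A Counter returns 0 for a k-mer a genome lacks, so absent k-mers become 0.
    matrix = [[counts[gid][km] for km in kmers] for gid in rows]
    return rows, kmers, matrix


rows, kmers, matrix = build_matrix({"g1": "ACGTAC", "g2": "ACGAAA"}, k=3)
```

A learner then works on `matrix` with one resistant/susceptible label per row.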
Figure 2. ROC curves for AdaBoost classifiers built for A. baumannii carbapenem resistance (red line with square symbols), S. aureus methicillin resistance (orange line with diamond symbols), S. pneumoniae beta-lactam resistance (green line with triangle symbols) and S. pneumoniae co-trimoxazole resistance (blue line with circle symbols).
Data are the results of cross validation on the set of genomes described in Table 2. Equal numbers of susceptible and resistant genomes were used in the experiment.
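An ROC curve traces the true-positive rate against the false-positive rate as the decision threshold on the classifier score is swept, and the AUC reported in Table 2 summarizes that curve. The sketch below assumes higher scores mean "more likely resistant" and labels coded 1 = resistant, 0 = susceptible; the function names are hypothetical. It computes the AUC as the probability that a randomly chosen resistant genome outscores a randomly chosen susceptible one (ties count one half), which equals the area under the empirical ROC curve.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping the threshold from high to low."""
    p = sum(1 for y in labels if y == 1)
    n = len(labels) - p
    if p == 0 or n == 0:
        raise ValueError("ROC needs at least one genome of each class")
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n, tp / p))
    return points


def roc_auc(scores, labels):
    """AUC as P(resistant score > susceptible score), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one genome of each class")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of genomes, which is still manageable for set sizes like those in Table 2; a rank-based formula would be faster for larger sets.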
Characteristics of the cross validation experiments for the Acinetobacter baumannii, Staphylococcus aureus and Streptococcus pneumoniae AdaBoost classifiers.
| Antibiotic | Available RES | Available SUS | Test set per trial | Training set per trial | AUC | F1 plot point | F1 score | Accuracy at F1 point | Accuracy at point zero |
|---|---|---|---|---|---|---|---|---|---|
| A. baumannii | |||||||||
| Carbapenem | 122 | 110 | 11 | 99 | 0.964 | 0.193 | 0.950 | 0.950 | 0.945 |
| S. aureus | |||||||||
| Methicillin | 491 | 115 | 11 | 99 | 0.991 | 2.283 | 0.995 | 0.995 | 0.995 |
| S. pneumoniae | |||||||||
| Beta-lactam | 1563 | 1504 | 150 | 1350 | 0.971 | –0.029 | 0.907 | 0.909 | 0.909 |
| Co-trimoxazole | 2124 | 584 | 58 | 522 | 0.942 | –0.189 | 0.880 | 0.878 | 0.876 |
*For each round of cross validation the depicted set size was chosen for the susceptible and resistant genomes.
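The per-trial sizes in this table (and in the M. tuberculosis table below) are consistent with 10-fold cross validation on balanced classes: each class is trimmed to the largest multiple of 10 not exceeding the smaller class (capped at 250 for M. tuberculosis), one tenth is held out for testing, and nine tenths are used for training. This is an inference from the numbers, not a procedure stated in this excerpt; the hypothetical function below simply checks that reading against the table.

```python
from typing import Optional


def per_trial_sizes(res: int, sus: int, folds: int = 10, cap: Optional[int] = None):
    """Per-class (test, training) sizes for balanced k-fold cross validation.

    Assumes both classes are subsampled to the size of the smaller class
    (optionally capped), trimmed to a multiple of `folds`.
    """
    n = min(res, sus)
    if cap is not None:
        n = min(n, cap)
    test = n // folds
    return test, test * (folds - 1)


# (RES, SUS) available genomes, copied from the table above
table2 = {
    "Carbapenem": (122, 110),
    "Methicillin": (491, 115),
    "Beta-lactam": (1563, 1504),
    "Co-trimoxazole": (2124, 584),
}
for drug, (res, sus) in table2.items():
    print(drug, per_trial_sizes(res, sus))
```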
A description of the top three k-mers found by AdaBoost and their corresponding regions in A. baumannii AB_2008-15-34-7, S. aureus 08-01059, S. pneumoniae ATCC 700669, and SMRU2064.
| Rank | α-value | k-mers with an identical pattern | Corresponding genes | PATRIC annotation |
|---|---|---|---|---|
| A. baumannii, carbapenem | ||||
| 1 | 1.21 | 1 | fig|1221255.3.peg.3516 | LysR-family transcriptional regulator clustered with PA0057* |
| 2 | 0.82 | 4 | fig|1221255.3.peg.3314 | NAD+–asparagine ADP-ribosyltransferase |
| 3 | 0.77 | 3 | fig|1221255.3.peg.631 | Dihydrodipicolinate synthase family |
| S. aureus, methicillin | ||||
| 1 | 2.37 | 3321 | fig|1413344.3.peg.2510, fig|1413344.3.peg.2511, fig|1413344.3.peg.2512 | Glycerophosphoryl diester phosphodiesterase (EC 3.1.4.46); MaoC domain protein; Penicillin-binding protein PBP2a, methicillin resistance determinant MecA, transpeptidase |
| 2 | 1.16 | 26 | fig|1413344.3.peg.1752 | hypothetical protein |
| 3 | 0.84 | 17 | fig|1413344.3.peg.1698 | Phage protein |
| S. pneumoniae, beta-lactam | ||||
| 1 | 0.74 | 17 | fig|561276.4.peg.338 | Cell division protein FtsI [Peptidoglycan synthetase] (EC 2.4.1.129) |
| 2 | 0.62 | 16 | intergenic region | between Multiple sugar ABC transporter proteins (fig|561276.4.peg.108 and fig|561276.4.peg.109) |
| 3 | 0.60 | 14 | fig|561276.4.peg.338 | Cell division protein FtsI [Peptidoglycan synthetase] (EC 2.4.1.129) |
| S. pneumoniae, co-trimoxazole | ||||
| 1 | 0.66 | 5 | intergenic region | immediately downstream of Dihydropteroate synthase (EC 2.5.1.15) (fig|1313.2194.peg.17) |
| 2 | 0.55 | 2 | fig|1313.2194.peg.1876 | Dihydrofolate reductase (EC 1.5.1.3) |
| 3 | 0.51 | 6 | fig|1313.2194.peg.1874 | Glucan-binding domain / Lysozyme M1 (1,4-beta-N-acetylmuramidase) (EC 3.2.1.17) |
Genomes were chosen as examples with exact k-mer matches. The complete list of k-mers is described in the supplementary data file.
*Occurs next to fig|1221255.3.peg.3517, Metallo-beta-lactamase superfamily protein PA0057.
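The α-value in these rankings is the weight AdaBoost assigns to the weak learner (here, a single k-mer) selected in a boosting round. In textbook discrete AdaBoost, a learner with weighted error ε receives α = ½ ln((1 − ε)/ε), so lower-error k-mers get larger weights, and the genomes it misclassifies are up-weighted for the next round. The sketch below implements that textbook variant with one-k-mer presence/absence stumps; it is an illustration under that assumption, not necessarily the exact configuration used to build these classifiers, and all function names are hypothetical.

```python
import math


def alpha_from_error(err: float) -> float:
    """Weight of a weak learner with weighted error err (0 < err < 1)."""
    return 0.5 * math.log((1.0 - err) / err)


def stump_predict(present: int, polarity: int) -> int:
    """One-k-mer stump: predict `polarity` if the k-mer is present, else the opposite."""
    return polarity if present else -polarity


def adaboost_stumps(X, y, rounds):
    """Discrete AdaBoost over binary k-mer presence features.

    X: one row per genome, each a list of 0/1 k-mer presence values.
    y: labels in {-1, +1} (+1 = resistant is an assumption of this sketch).
    Returns one (kmer_index, polarity, alpha) tuple per round.
    """
    n, m = len(X), len(X[0])
    w = [1.0 / n] * n  # start with uniform genome weights
    model = []
    for _ in range(rounds):
        best = None
        for j in range(m):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi[j], polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, polarity)
        err, j, polarity = best
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0) for a perfect stump
        alpha = alpha_from_error(err)
        model.append((j, polarity, alpha))
        # Up-weight misclassified genomes, down-weight correct ones, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi[j], polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, x):
    """Weighted vote of the chosen stumps; a positive score predicts +1."""
    return sum(alpha * stump_predict(x[j], pol) for j, pol, alpha in model)
```

Because both polarities are tried, the best stump's error never exceeds 0.5, so every α is non-negative.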
Figure 3. ROC curves for AdaBoost classifiers built for M. tuberculosis antimicrobial resistance.
Genome sets and classifier statistics are described in Table 3. Classifiers for individual antibiotics were chosen for minimal correlation between AMR patterns, and up to 250 resistant and susceptible genomes were used. Equal numbers of susceptible and resistant genomes were used in all experiments. All curves depict cross validation experiments and are for ethambutol (red line with square symbols), ethionamide (orange line with diamond symbols), isoniazid (green line with triangle symbols), kanamycin (light blue line with circle symbols), ofloxacin (dark blue line with square symbols), rifampicin (purple line with diamond symbols) and streptomycin (brown line with triangle symbols). The black line with circle plot points depicts the combined multidrug resistance classifier described in Tables 3 and Supplementary Tables S4–6.
Characteristics of the cross validation experiments for the Mycobacterium tuberculosis AdaBoost classifiers.
| Antibiotic | Available RES | Available SUS | Test set per trial | Training set per trial | AUC | F1 plot point | F1 score | Accuracy at F1 point | Accuracy at point zero |
|---|---|---|---|---|---|---|---|---|---|
| Ethambutol | 250 | 250 | 25 | 225 | 0.715 | 0.435 | 0.704 | 0.588 | 0.668 |
| Ethionamide | 173 | 250 | 17 | 153 | 0.812 | −0.136 | 0.766 | 0.768 | 0.771 |
| Isoniazid | 250 | 250 | 25 | 225 | 0.911 | −0.085 | 0.872 | 0.880 | 0.882 |
| Kanamycin | 188 | 250 | 18 | 162 | 0.898 | 0.137 | 0.871 | 0.883 | 0.872 |
| Ofloxacin | 239 | 250 | 23 | 207 | 0.833 | −0.022 | 0.761 | 0.793 | 0.791 |
| Rifampicin | 250 | 250 | 25 | 225 | 0.932 | −0.410 | 0.870 | 0.864 | 0.858 |
| Streptomycin | 250 | 250 | 25 | 225 | 0.795 | −0.485 | 0.722 | 0.642 | 0.712 |
| Combined Set | 83 | 139 | 8 | 72 | 0.969 | −0.577 | 0.950 | 0.950 | 0.928 |
*For each round of cross validation the depicted set size was chosen for the susceptible and resistant genomes.
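The last four columns report metrics at two decision thresholds. One plausible reading, not spelled out in this excerpt, is that "point zero" is the default AdaBoost threshold (call a genome resistant when its weighted vote is at least 0) and the "F1 plot point" is the score threshold at which F1 is highest. Under that assumption, the columns could be computed as below, with labels 1 = resistant and 0 = susceptible; the function names are hypothetical.

```python
def confusion(scores, labels, threshold):
    """Counts (tp, fp, fn, tn), calling a genome resistant when score >= threshold."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_and_accuracy(scores, labels, threshold):
    """F1 score and accuracy of the thresholded predictions."""
    tp, fp, fn, tn = confusion(scores, labels, threshold)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    accuracy = (tp + tn) / len(labels)
    return f1, accuracy


def best_f1_threshold(scores, labels):
    """Observed score that maximizes F1 (the lowest such score wins on ties)."""
    return max(sorted(set(scores)), key=lambda t: f1_and_accuracy(scores, labels, t)[0])
```

Picking the threshold on the same scores it is evaluated on makes the F1-point figures slightly optimistic relative to the point-zero figures.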
A description of the top three k-mers found by AdaBoost and their corresponding genomic regions in M. tuberculosis TKK_02_0002, KT-0099, TKK_02_0004 and TKK_03_0024.
| Rank | α-value | k-mers with an identical pattern | Corresponding genes | PATRIC annotation |
|---|---|---|---|---|
| Ethambutol | ||||
| 1 | 0.267 | 1 | fig|1397854.3.peg.744 | DNA-directed RNA polymerase beta subunit (EC 2.7.7.6) |
| 2 | 0.208 | 29 | fig|1400933.3.peg.3985 | Integral membrane indolylacetylinositol arabinosyltransferase EmbB (EC 2.4.2.-) |
| 3 | 0.240 | 1 | fig|1397854.3.peg.3144 | FIG00820705: hypothetical protein |
| Ethionamide | ||||
| 1 | 0.467 | 31 | fig|1295720.3.rna.14 | Small Subunit Ribosomal RNA |
| 2 | 0.292 | 13 | fig|1295720.3.peg.4188 | Partial REP13E12 repeat protein |
| 3 | 0.257 | 8 | intergenic region | Between fig|1295720.3.peg.3517 LSU ribosomal protein L2p (L8e) and fig|1295720.3.peg.3518 LSU ribosomal protein L23p (L23Ae) |
| Isoniazid | ||||
| 1 | 0.982 | 1 | fig|1397854.3.peg.2114 | Catalase (EC 1.11.1.6)/Peroxidase (EC 1.11.1.7) |
| 2 | 0.517 | 3 | fig|1400933.3.peg.1961 | PE-PGRS family protein |
| 3 | 0.244 | 2 | fig|1397854.3.peg.2292 | Polyketide synthase |
| Kanamycin | ||||
| 1 | 0.995 | 31 | fig|1397854.3.rna.19 | Small Subunit Ribosomal RNA |
| 2 | 0.494 | 27 | intergenic region | Between fig|1397854.3.peg.2690, RNA 3′-terminal phosphate cyclase (EC 6.5.1.4) and fig|1397854.3.peg.2691, CBS domain protein |
| 3 | 0.264 | 2 | fig|1397854.3.peg.9 | DNA gyrase subunit A (EC 5.99.1.3) |
| Ofloxacin | ||||
| 1 | 0.471 | 2 | fig|1397854.3.peg.9 | DNA gyrase subunit A (EC 5.99.1.3) |
| 2 | 0.373 | 18 | fig|1397854.3.peg.3738 | PPE family protein |
| 3 | 0.236 | 10 | fig|1397854.3.peg.9 | DNA gyrase subunit A (EC 5.99.1.3) |
| Rifampicin | ||||
| 1 | 0.610 | 1 | fig|1397854.3.peg.744 | DNA-directed RNA polymerase beta subunit (EC 2.7.7.6) |
| 2 | 0.785 | 2 | fig|1397854.3.peg.294 | Nitrate/nitrite transporter |
| 3 | 0.518 | 1 | fig|1397854.3.peg.2114 | Catalase (EC 1.11.1.6)/Peroxidase (EC 1.11.1.7) |
| Streptomycin | ||||
| 1 | 0.386 | 31 | fig|1448395.3.peg.756 | SSU ribosomal protein S12p (S23e) |
| 2 | 0.342 | 3 | fig|1448395.3.peg.6 | DNA gyrase subunit A (EC 5.99.1.3) |
| 3 | 0.200 | 8 | fig|1448395.3.peg.1615 | PE-PGRS family protein; COX10-CtaB |
Genomes were chosen as examples with exact k-mer matches. The complete list of k-mers is described in the supplementary data file.
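The "k-mers with an identical pattern" column appears to count k-mers whose presence/absence across the genome set matches that of the listed k-mer, so a presence/absence-based learner cannot distinguish them. Such groups arise naturally: a single nucleotide change alters every k-mer overlapping it, up to k of them. A minimal sketch of the grouping, assuming a genome-by-k-mer count matrix with k-mers as columns (the function name is hypothetical):

```python
from collections import defaultdict


def group_identical_patterns(kmers, matrix):
    """Group k-mer columns whose presence/absence vector across genomes is identical.

    kmers: column labels; matrix: one row per genome, one count per k-mer.
    Returns {pattern: [k-mers]}, where pattern is a 0/1 tuple with one entry per genome.
    """
    groups = defaultdict(list)
    for j, kmer in enumerate(kmers):
        pattern = tuple(1 if row[j] > 0 else 0 for row in matrix)
        groups[pattern].append(kmer)
    return dict(groups)


groups = group_identical_patterns(["AAA", "CCC", "GGG"], [[1, 2, 0], [0, 0, 3]])
```

Here "AAA" and "CCC" share the pattern (1, 0), so either one could represent the group in a ranked list.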