Marcus Nguyen, Thomas Brettin, S Wesley Long, James M Musser, Randall J Olsen, Robert Olson, Maulik Shukla, Rick L Stevens, Fangfang Xia, Hyunseung Yoo, James J Davis.
Abstract
Antimicrobial-resistant infections are a serious public health threat worldwide. Whole genome sequencing approaches to rapidly identify pathogens and predict antibiotic resistance phenotypes are becoming more feasible and may offer a way to reduce clinical test turnaround times compared to conventional culture-based methods and, in turn, improve patient outcomes. In this study, we use whole genome sequence data from 1668 clinical isolates of Klebsiella pneumoniae to develop an XGBoost-based machine learning model that accurately predicts minimum inhibitory concentrations (MICs) for 20 antibiotics. The overall accuracy of the model, within ±1 two-fold dilution factor, is 92%. Individual accuracies are ≥90% for 15/20 antibiotics. We show that the MICs predicted by the model correlate with known antimicrobial resistance genes. Importantly, the genome-wide approach described in this study offers a way to predict MICs for isolates without knowledge of the underlying gene content. This study shows that machine learning can be used to build a complete in silico MIC prediction panel for K. pneumoniae and provides a framework for building MIC prediction models for other pathogenic bacteria.
Year: 2018 PMID: 29323230 PMCID: PMC5765115 DOI: 10.1038/s41598-017-18972-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The pipeline used to optimize and train the XGBoost model using known data (blue), and to predict the MIC values for a new genome (yellow).
Accuracies for the entire XGBoost model and for the individual antibiotics.
| Antibiotic | Samples | Accuracy (a) | 95% C.I. (b) |
|---|---|---|---|
| All | 32705 | 0.92 | 0.92, 0.92 |
| Amikacin | 1667 | 0.97 | 0.96, 0.98 |
| Ampicillin | 1666 | 1.00 | 0.99, 1.00 |
| Ampicillin/Sulbactam | 1664 | 0.99 | 0.99, 1.00 |
| Aztreonam | 1644 | 0.89 | 0.89, 0.90 |
| Cefazolin | 1667 | 0.96 | 0.95, 0.96 |
| Cefepime | 1571 | 0.61 | 0.58, 0.64 |
| Cefoxitin | 1645 | 0.90 | 0.89, 0.91 |
| Ceftazidime | 1667 | 0.92 | 0.91, 0.93 |
| Ceftriaxone | 1667 | 0.89 | 0.87, 0.90 |
| Cefuroxime sodium | 1575 | 0.99 | 0.99, 1.00 |
| Ciprofloxacin | 1664 | 0.98 | 0.97, 0.98 |
| Gentamicin | 1667 | 0.95 | 0.93, 0.96 |
| Imipenem | 1666 | 0.94 | 0.93, 0.95 |
| Levofloxacin | 1666 | 0.97 | 0.96, 0.97 |
| Meropenem | 1660 | 0.93 | 0.91, 0.95 |
| Nitrofurantoin | 895 | 0.96 | 0.95, 0.97 |
| Piperacillin/Tazobactam | 1662 | 0.78 | 0.77, 0.79 |
| Tetracycline | 1667 | 0.89 | 0.87, 0.90 |
| Tobramycin | 1666 | 0.95 | 0.94, 0.96 |
| Trimethoprim/Sulfamethoxazole | 1667 | 0.95 | 0.94, 0.96 |
(a) Average accuracy within ±1 two-fold dilution factor, based on a 10-fold cross validation. (b) 95% confidence interval.
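The ±1 two-fold dilution criterion used for the accuracies above can be read as a tolerance of one unit on a log2 scale. The following is a minimal sketch of that reading, assuming MICs are supplied as positive numbers in μg/ml; the function names and the toy values are hypothetical and are not taken from the study's code or data.

```python
import math


def within_one_dilution(actual_mic, predicted_mic, tol=1.0):
    """Return True if the prediction is within `tol` two-fold dilution steps.

    One two-fold dilution step corresponds to a difference of 1 in log2(MIC).
    A small epsilon guards against floating-point rounding at the boundary.
    """
    diff = abs(math.log2(predicted_mic) - math.log2(actual_mic))
    return diff <= tol + 1e-9


def dilution_accuracy(actual, predicted):
    """Fraction of predictions that fall within ±1 two-fold dilution."""
    hits = sum(within_one_dilution(a, p) for a, p in zip(actual, predicted))
    return hits / len(actual)


# Toy values (hypothetical): the third prediction is two dilutions off.
actual = [0.5, 2.0, 8.0, 16.0, 64.0]
predicted = [1.0, 2.0, 2.0, 32.0, 64.0]
accuracy = dilution_accuracy(actual, predicted)
print(accuracy)
```

Under this reading, a prediction of 1 μg/ml for an actual MIC of 0.5 μg/ml counts as correct, while a prediction of 2 μg/ml for an actual MIC of 8 μg/ml does not.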
Figure 2. The accuracy of the XGBoost model for individual MICs. The X-axis of the heatmap shows the actual MIC (μg/ml) for a bin and the Y-axis lists the antibiotics. The within ±1-tier accuracy of a particular antibiotic-MIC bin is denoted by color, with red and orange being least accurate and bright yellow and green being most accurate. The number within each cell represents the number of samples (genomes with the MIC) within the bin.
Error rates for the entire XGBoost model and for the individual antibiotics.
| Antibiotic | Resistant | Susceptible | VME (a) | VME 95% C.I. (b) | ME (c) | ME 95% C.I. (b) |
|---|---|---|---|---|---|---|
| All | 21404 | 9410 | 0.031 | 0.028, 0.034 | 0.037 | 0.033, 0.041 |
| Amikacin | 103 | 1320 | 0.298 | 0.239, 0.358 | 0.000 | 0.000, 0.000 |
| Ampicillin | 1635 | 4 | 0.000 | 0.000, 0.000 | 0.000 | 0.000, 0.000 |
| Ampicillin/Sulbactam | 1455 | 90 | 0.003 | 0.000, 0.007 | 0.032 | −0.021, 0.085 |
| Aztreonam | 1407 | 216 | 0.001 | −0.001, 0.002 | 0.398 | 0.333, 0.462 |
| Cefazolin | 1570 | 97 | 0.060 | 0.047, 0.072 | 0.018 | −0.009, 0.046 |
| Cefepime | 963 | 418 | 0.007 | 0.002, 0.012 | 0.137 | 0.077, 0.197 |
| Cefoxitin | 828 | 667 | 0.077 | 0.060, 0.095 | 0.009 | −0.001, 0.019 |
| Ceftazidime | 1488 | 136 | 0.005 | 0.001, 0.008 | 0.123 | 0.069, 0.177 |
| Ceftriaxone | 1528 | 80 | 0.000 | 0.000, 0.000 | 0.188 | 0.101, 0.274 |
| Cefuroxime sodium | 1469 | 91 | 0.002 | 0.000, 0.004 | 0.010 | −0.013, 0.033 |
| Ciprofloxacin | 1424 | 201 | 0.005 | 0.000, 0.010 | 0.025 | 0.000, 0.050 |
| Gentamicin | 683 | 926 | 0.072 | 0.061, 0.082 | 0.009 | 0.001, 0.017 |
| Imipenem | 478 | 1160 | 0.040 | 0.012, 0.067 | 0.032 | 0.021, 0.043 |
| Levofloxacin | 1287 | 349 | 0.016 | 0.006, 0.025 | 0.020 | 0.006, 0.034 |
| Meropenem | 481 | 1134 | 0.048 | 0.034, 0.062 | 0.027 | 0.017, 0.038 |
| Nitrofurantoin | 719 | 55 | 0.018 | 0.009, 0.027 | 0.227 | 0.098, 0.356 |
| Piperacillin/Tazobactam | 1048 | 432 | 0.025 | 0.011, 0.038 | 0.012 | 0.000, 0.023 |
| Tetracycline | 778 | 739 | 0.114 | 0.095, 0.134 | 0.008 | 0.001, 0.015 |
| Tobramycin | 723 | 589 | 0.040 | 0.023, 0.057 | 0.012 | 0.002, 0.022 |
| Trimethoprim/Sulfamethoxazole | 1251 | 416 | 0.119 | 0.098, 0.140 | 0.108 | 0.082, 0.134 |
(a) VME: average very major error rate, defined as the proportion of resistant samples that are incorrectly predicted to be susceptible by the model.
(b) 95% confidence interval for the average VME and ME rates, respectively.
(c) ME: average major error rate, defined as the proportion of susceptible samples that are incorrectly predicted to be resistant by the model.
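The VME and ME definitions above amount to two conditional error rates, one over resistant samples and one over susceptible samples. The sketch below illustrates them on categorical calls; it assumes MICs have already been converted to "R" or "S" using clinical breakpoints (not shown here) and leaves intermediate calls out entirely. The function name and the toy labels are hypothetical.

```python
def error_rates(actual, predicted):
    """Compute (VME, ME) from paired susceptibility calls.

    VME: share of actually resistant samples predicted susceptible.
    ME:  share of actually susceptible samples predicted resistant.
    Returns NaN for a rate whose denominator is zero.
    """
    pred_for_resistant = [p for a, p in zip(actual, predicted) if a == "R"]
    pred_for_susceptible = [p for a, p in zip(actual, predicted) if a == "S"]

    vme = (
        sum(p == "S" for p in pred_for_resistant) / len(pred_for_resistant)
        if pred_for_resistant
        else float("nan")
    )
    me = (
        sum(p == "R" for p in pred_for_susceptible) / len(pred_for_susceptible)
        if pred_for_susceptible
        else float("nan")
    )
    return vme, me


# Toy calls (hypothetical): 4 resistant and 5 susceptible samples.
actual = ["R", "R", "R", "R", "S", "S", "S", "S", "S"]
predicted = ["R", "R", "S", "R", "S", "S", "R", "S", "S"]
vme, me = error_rates(actual, predicted)
print(vme, me)
```

Because each rate uses its own denominator, an antibiotic with very few susceptible samples (for example, Ampicillin with 4) yields an ME estimate based on very little data.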
The function that is most highly correlated with the MICs for each antibiotic.
| Antibiotic | PATRIC Function | PCC Actual MIC (a) | PCC Predicted MIC (b) | Top 10 Coverage (c) |
|---|---|---|---|---|
| Meropenem | Class A beta-lactamase (EC 3.5.2.6) => KPC family, carbapenem-hydrolyzing | 0.923 | 0.814 | 0.7 |
| Trimethoprim/Sulfamethoxazole | Dihydropteroate synthase type-2 (EC 2.5.1.15) @ Sulfonamide resistance protein | 0.919 | 0.758 | 0.9 |
| Imipenem | Class A beta-lactamase (EC 3.5.2.6) => KPC family, carbapenem-hydrolyzing | 0.891 | 0.905 | 0.8 |
| Cefepime | Class A beta-lactamase (EC 3.5.2.6) => CTX-M family, extended-spectrum | 0.848 | 0.648 | 0.9 |
| Tobramycin | Aminoglycoside N(6′)-acetyltransferase (EC 2.3.1.82) => AAC(6′)-Ib/AAC(6′)-II | 0.837 | 0.853 | 0.8 |
| Tetracycline | Tetracycline resistance regulatory protein TetR | 0.829 | 0.717 | 0.8 |
| Ceftriaxone | Class A beta-lactamase (EC 3.5.2.6) => CTX-M family, extended-spectrum | 0.823 | 0.700 | 0.7 |
| Gentamicin | Aminoglycoside N(3)-acetyltransferase (EC 2.3.1.81) => AAC(3)-II,III,IV,VI,VIII,IX,X | 0.818 | 0.862 | 0.6 |
| Ampicillin/Sulbactam | Class A beta-lactamase (EC 3.5.2.6) => TEM family | 0.780 | 0.787 | 0.8 |
| Ciprofloxacin | Integron integrase IntI1 | 0.715 | 0.713 | 0.8 |
| Aztreonam | Integron integrase IntI1 | 0.678 | 0.614 | 0.7 |
| Cefazolin | Class A beta-lactamase (EC 3.5.2.6) => CTX-M family, extended-spectrum | 0.676 | 0.667 | 0.9 |
| Cefuroxime sodium | Aminoglycoside N(3)-acetyltransferase (EC 2.3.1.81) => AAC(3)-II,III,IV,VI,VIII,IX,X | 0.668 | 0.616 | 0.3 |
| Ceftazidime | Integron integrase IntI1 | 0.657 | 0.623 | 0.6 |
| Levofloxacin | probable bacteriophage protein STY1063 | 0.588 | 0.584 | 0.7 |
| Piperacillin/Tazobactam | plasmid stabilization system | 0.583 | 0.501 | 0.1 |
| Amikacin | IncI1 plasmid conjugative transfer prepilin PilS | 0.577 | 0.478 | 0.2 |
| Cefoxitin | Class A beta-lactamase (EC 3.5.2.6) => KPC family, carbapenem-hydrolyzing | 0.550 | 0.571 | 0.6 |
| Nitrofurantoin | Integron integrase IntI1 | 0.433 | 0.507 | 0.6 |
| Ampicillin | Class A beta-lactamase (EC 3.5.2.6) => TEM family | 0.357 | 0.327 | 0.0 |
(a) Pearson correlation coefficient between the occurrences of the given function and the actual MICs.
(b) Pearson correlation coefficient between the occurrences of the given function and the predicted MICs.
(c) The fraction of the top 10 functions (by PCC) for the predicted MICs that occur in the top 10 for the actual MICs.
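The two quantities in this table, the Pearson correlation between function occurrences and MICs and the top-10 coverage, can be sketched as follows. This is an illustration only: it assumes function occurrences are encoded as per-genome counts, ranks functions by signed PCC as the footnote states, and uses hypothetical function names and toy values. How the study encoded MICs before correlating is not stated here, so the MIC vector is treated as plain numbers.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def top_k(scores, k):
    """Names of the k highest-scoring functions."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def top_k_coverage(pcc_actual, pcc_predicted, k=10):
    """Fraction of the top-k functions for predicted MICs that are also
    in the top-k for actual MICs."""
    return len(top_k(pcc_predicted, k) & top_k(pcc_actual, k)) / k


# Toy data (hypothetical): one function present in the two high-MIC genomes.
occurrences = [1, 1, 0, 0]
mics = [4.0, 4.0, 1.0, 1.0]
r = pearson(occurrences, mics)

# Toy PCC tables for three hypothetical functions, compared with k=2.
pcc_actual = {"f1": 0.9, "f2": 0.8, "f3": 0.1}
pcc_predicted = {"f1": 0.85, "f3": 0.7, "f2": 0.2}
coverage = top_k_coverage(pcc_actual, pcc_predicted, k=2)
print(r, coverage)
```

A coverage of 0.0, as reported for Ampicillin, means none of the ten functions most correlated with the predicted MICs appear among the ten most correlated with the actual MICs.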