
Personal clinical history predicts antibiotic resistance of urinary tract infections.

Idan Yelin¹, Olga Snitser¹, Gal Novich², Rachel Katz³, Ofir Tal⁴, Miriam Parizade⁵, Gabriel Chodick³,⁶, Gideon Koren³,⁶, Varda Shalev³,⁶, Roy Kishony⁷,⁸,⁹.

Abstract

Antibiotic resistance is prevalent among the bacterial pathogens causing urinary tract infections. However, antimicrobial treatment is often prescribed 'empirically', in the absence of antibiotic susceptibility testing, risking mismatched and therefore ineffective treatment. Here, linking a 10-year longitudinal data set of over 700,000 community-acquired urinary tract infections with over 5,000,000 individually resolved records of antibiotic purchases, we identify strong associations of antibiotic resistance with the demographics, records of past urine cultures and history of drug purchases of the patients. When combined together, these associations allow for machine-learning-based personalized drug-specific predictions of antibiotic resistance, thereby enabling drug-prescribing algorithms that match an antibiotic treatment recommendation to the expected resistance of each sample. Applying these algorithms retrospectively, over a 1-year test period, we find that they greatly reduce the risk of mismatched treatment compared with the current standard of care. The clinical application of such algorithms may help improve the effectiveness of antimicrobial treatments.

Substances:
Anti-Bacterial Agents

Year: 2019 PMID: 31273328 PMCID: PMC6962525 DOI: 10.1038/s41591-019-0503-6

Source DB: PubMed Journal: Nat Med ISSN: 1078-8956 Impact factor: 53.440

Introduction

The resistance of bacterial pathogens to commonly used antibiotics is a growing public health concern, threatening the efficacy of antibiotic drugs[1,2]. The use of antibiotics benefits resistant strains, exacerbating the problem over time[3-7]. At the single-patient level, the efficacy of antimicrobial treatment is critically dependent on correctly matching antibiotic choice to the specific susceptibilities of the pathogen[8-10]. Ideally, correct prescription should be based on direct measurement of the antibiotic susceptibilities of the infecting pathogen. In practice, though, to provide rapid clinical intervention, drugs are often prescribed empirically in the absence of culture susceptibility measurements, risking incorrect and therefore ineffective treatment. This problem is of particular importance in Urinary Tract Infections (UTIs), one of the most frequent community-acquired infections worldwide, for which the common practice of empirical treatment is jeopardized by the substantial frequency of resistant infections. UTIs are among the most common bacterial infections, with over 150 million annual cases globally[11]. One of three women will have at least one symptomatic UTI by age 24, and more than one-half will be affected during their lifetime[12]. Treatment of these infections accounts for about 8% of non-hospital usage of antibiotics, often as part of empirical prescription[13-15]. The common etiological agents of UTIs are diverse, including Escherichia coli, Klebsiella pneumoniae and Proteus mirabilis, as well as gram-positive bacteria such as Enterococcus faecalis[16-21]. These pathogens are often resistant to several antibiotics, with resistance rates of infections exceeding 20% for commonly used drugs[17,20,22], emphasizing the challenge of empirically prescribing the specific antibiotics to which the infecting pathogen is susceptible[23].
The risk of an infection being resistant to different antibiotics is associated with patient demographics and comorbidities. Known demographic factors associated with resistance include older age[24], gender[25], ethnicity[26-29], residence in a retirement home[25] and travel to developing countries[28]. Known comorbidities associated with resistance include presence of a urinary catheter[21,25,30], immunodeficiency[25] and diabetes[25]. Notably, most of these associations were identified based on small patient cohorts, typically with high frequencies of antibiotic-resistant infections, such as retirement homes, rehabilitation centers, or hospitals. Beyond the patient’s demographics and comorbidities, antibiotic resistance has also been associated with the patient’s past clinical history, including recurrent UTIs, hospitalizations and resistance of previous infections. Risk of resistance to specific drugs has been shown to increase for patients with recurrent UTIs[25,29,31] and past hospitalizations[25,32]. Studies have further shown that resistance of previous infections can be used to predict resistance in future infections[33,34]. However, the time extent of these associations is not well resolved and it is also unclear whether and how these associations vary across resistances to different antibiotics. Availability of antibiotic purchase data reveals patterns of antibiotic use[15,35] and shows that risk of resistance increases with short-term prior use of antibiotics[5,24,25,32,36-38]. Recent large-scale studies showed that, across geography, resistance levels can be correlated with past drug consumption[20,39]. Resistance to fluoroquinolones was correlated with past consumption volumes of these same drugs[20], while resistance to trimethoprim-sulfa was correlated with the volume of consumption of the same drug (cognate) as well as of other drugs of different pharmaceutical classes (noncognate)[20].
Such associations of usage of a given antibiotic with future resistance to other antibiotics can appear indirectly through co-occurrence among resistance mechanisms (for example, if resistance to drug X and resistance to drug Y are correlated, then direct selection by drug X for X-resistance may result in association of drug X with resistance to drug Y). Resolving direct and indirect selection for resistance has been challenging in the absence of resistance co-occurrence data. Negative associations, where drug use is anti-correlated with resistance, have also been observed, but it has been difficult to discern the direction of causality[20,40]. Finally, the time extent of these positive and negative associations of resistance with prior antibiotic usage is not well resolved. Here, we present an analysis of a large population of UTI patients to unravel predictive features of antibiotic resistance and test how these features can be combined to recommend optimal drugs for empirical treatment. We analyze a patient-level longitudinal dataset of community and retirement-home acquired UTI cultures collected by Maccabi Healthcare Services (MHS), Israel’s second largest Health Maintenance Organization, serving a diverse population of ~2 million patients. Analyzing demographic factors, we find strong drug-specific associations with resistance. Then, comparing resistance data of multiple infections from the same patient, we unravel a decaying long-term memory-like correlation of resistance over time. We also combine these culture records with patient-linked records of antibiotic use to quantify the extent and time of direct and indirect correlations of antibiotic use with resistance at the single-patient level. Finally, combining these demographic and historical factors for personalized predictions of resistance, we develop machine learning models which we demonstrate can substantially improve upon physician-prescribed empirical antibiotic treatment.

Results

We retrieved data of all positive urine cultures of MHS patients for the ten-year period between 01-July-2007 and 30-June-2017, as well as patient demographics and record of antibiotic purchases for these patients (Online Methods). Among all ~2 million MHS patients, there were 711,099 recorded positive urine samples from 315,047 patients total. For each positive sample, one or more bacterial species were isolated and characterized. The dataset included species-level identification of these isolates as well as resistance profiles measured by VITEK2, reinterpreted in accordance with CLSI guidelines (Sensitive, Intermediate, and Resistant). As a multi-species infection can be treated by a given drug only if none of the isolates is resistant to it, we define for each antibiotic and each sample the “sample resistance”: the maximal resistance across all isolates from the same sample (96.4% of samples were identified as single species and their resistance profile is simply defined as the resistance profile of their single isolates). All of MHS’s country-wide clinical tests are performed centrally (Online Methods), allowing reliable comparison across patients and time. In our analysis, we focus on resistance to the 6 drugs that were most commonly prescribed as part of empirical treatment of these infections (identified as the drugs commonly given on the same day samples were sent for culture; Table 1 and Supplementary Table 1; Online Methods). Resistance measurements for these antibiotics were carried out routinely over the entire ten-year period (except for cephalexin for which measurements are available only since 2014, Extended Data Fig. 1).
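The "sample resistance" defined above (the maximal resistance across all isolates of a sample) can be sketched in a few lines. This is only an illustration: the function name, the dictionary layout and the example isolates are assumptions, not the study's data format.

```python
# Ordering of the CLSI categories used to take the "maximum" resistance.
LEVELS = {"S": 0, "I": 1, "R": 2}


def sample_resistance(isolate_profiles):
    """Return, per drug, the highest resistance level seen in any isolate.

    isolate_profiles: list of dicts mapping drug name -> "S", "I" or "R"
    (hypothetical layout chosen for this sketch).
    """
    result = {}
    for profile in isolate_profiles:
        for drug, level in profile.items():
            current = result.get(drug)
            if current is None or LEVELS[level] > LEVELS[current]:
                result[drug] = level
    return result


# Hypothetical two-species sample (made-up values, not from the dataset).
two_isolates = [
    {"ciprofloxacin": "S", "nitrofurantoin": "R"},
    {"ciprofloxacin": "I", "nitrofurantoin": "S"},
]
profile = sample_resistance(two_isolates)
print(profile)
```

For a single-isolate sample the function simply returns that isolate's profile, matching the definition in the text.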

Table 1:

List of antibiotic resistances analyzed in the study

Antibiotics	Class
Trimethoprim-Sulfa	DHFR inhibitor
Ciprofloxacin	Fluoroquinolones
Nitrofurantoin	Nitrofuran
Amoxicillin-CA	Penicillin-β-lactamase inhibitor
Cefuroxime axetil	Cephalosporin
Cephalexin	Cephalosporin

Extended Data Figure 1:

Availability of resistance measurements over time.

For each of the 6 antibiotics, the fraction of urine samples for which resistance was measured, overall (black) and for each of the three most common species (colors), is plotted across the 10-year sampling period. Also indicated are the time ranges used for model Training (green horizontal bars) and Testing (red bars). Time periods during which measurements of resistance to cephalexin were scarce were removed from analysis (gray bar).

Three species, E. coli, K. pneumoniae and P. mirabilis, account for 85% of isolates (70%, 10%, 5%, respectively; Fig. 1a). These pathogens varied in their resistance profiles (Fig. 1b). Notably, for all 6 antibiotics, the chance of resistant infection is significant, indicating that antibiotic treatment efficacy could often be undermined. These population-level frequencies of resistance were fairly static over time (e.g. trimethoprim-sulfa or nitrofurantoin) with only mild changes observed in certain antibiotics and specific species (Fig. 1c and Extended Data Fig. 2). The diversity of pathogens and resistance patterns underscores that antibiotic prescriptions must be tailored to match the resistance profile of the infection[41], motivating the development of methods to better predict resistance[23].

Figure 1:

Frequency of bacterial species and antibiotic resistance in urinary tract infections.

(a) Species abundance across the entire UTI dataset (July 2007-June 2017, 711,099 samples). (b) The frequency of resistance and intermediate resistance to the 6 focal antibiotic drugs for the three most common bacterial species and for the urine sample as a whole (“sample”, defined as the highest resistance measured across the isolates in the sample). Dark to light shades represent resistant, intermediate and sensitive, respectively. (c) Frequencies of resistance for each of the three common species (colored lines) and the sample resistance (black lines) over the 10-year sampling time, for two representative antibiotics: trimethoprim-sulfa (top) and ciprofloxacin (bottom; see Extended Data Fig. 2 for all antibiotics). Data points represent quarterly averages.

Extended Data Figure 2:

Frequency of resistance over time.

Frequencies of resistance for each of the three common species (colored lines) and the overall sample (black lines) over the 10 year dataset. Empty time intervals correspond to periods during which resistance was not frequently measured (matching the gray horizontal bar of Extended Data Fig. 1).

Strong antibiotic-specific correlations of resistance with demographic factors

Consistent with previous studies, UTIs were much more common for females than males (~88% females)[11,26] and had qualitatively different age distributions (Fig. 2a)[11,18,26,42,43]. For each antibiotic, we performed multivariate logistic regression for the odds of resistance $\eta = P_R/(P_S + P_I)$ as a function of age, gender, retirement home residence, pregnancy, date of sampling (time since July 2007) and season of sampling (Online Methods: Logistic regression “Demographics” model; Intermediate levels of resistance were classified as sensitive since they do not exclude prescription of an antibiotic, especially given the higher efficacy of antibiotics in urine infections[44]). We also calculated, for each of the 6 antibiotics, the frequencies of resistance of the urine samples across age, separated by gender, pregnancy and retirement home residence (Fig. 2c and Extended Data Fig. 3a).
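To make the link between a logistic regression coefficient and an adjusted odds ratio concrete, the following minimal sketch computes the odds of resistance for two hypothetical patients who differ in a single binary variable. The intercept, the coefficients and the feature names are invented for illustration and are not the fitted values of the study.

```python
import math


def odds(p_resistant):
    """Odds of resistance: probability of resistance over its complement."""
    return p_resistant / (1.0 - p_resistant)


def predicted_probability(intercept, coefficients, features):
    """Logistic model: the log-odds are a linear function of the features."""
    log_odds = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients (assumptions for this sketch only).
intercept = -2.0
coefficients = {"female": -0.5, "retirement_home": 0.8}

p_male = predicted_probability(
    intercept, coefficients, {"female": 0, "retirement_home": 0}
)
p_female = predicted_probability(
    intercept, coefficients, {"female": 1, "retirement_home": 0}
)

# With all other features held fixed, the odds ratio equals exp(coefficient).
adjusted_or_female = odds(p_female) / odds(p_male)
print(round(adjusted_or_female, 3), round(math.exp(coefficients["female"]), 3))
```

An odds ratio below 1, as in this made-up example, corresponds to lower odds of resistance for the group coded as 1.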

Figure 2:

Antibiotic-specific associations of resistance with demographic factors.

(a) Distribution of urine cultures across major demographic factors: age, gender (top, females; bottom, males), pregnancy (red) and retirement home residence (dark). (b) Adjusted odds ratios of resistance for each demographic variable (see Logistic regression – demographics in the Online Methods, and see Supplementary Table 2 for all adjusted and unadjusted regression coefficients). Asterisks indicate statistical significance and non-significant odds ratios (P>0.01) are shown as blank. (c) Frequency of resistance as a function of age showing qualitatively distinct patterns for three representative antibiotics. UTI samples are separated into five non-overlapping categories: men not residing in retirement homes (blue), men residing in retirement homes (dotted blue), women not pregnant and not residing in retirement homes (magenta), women in retirement homes (magenta dotted), and pregnant women (red). See Extended Data Fig. 3 for all antibiotics.

Extended Data Figure 3:

Odds of resistance as a function of age for different demographic groups.

Frequency of resistance to each of the 6 antibiotics, in each of 10 age bins (0,10,…,100 years). (a) Frequencies of resistance for five non-overlapping demographic groups: men not residing in retirement homes (blue), men residing in retirement homes (dotted blue), women not pregnant and not residing in retirement homes (magenta), women in retirement homes (magenta dotted), and pregnant women (red). (b) Comparing the overall frequency of resistance to the 6 drugs for women and men across age.

Age, gender, pregnancy and residence in retirement home had strong, yet differential, association with resistances to the 6 antibiotics. For all 6 antibiotics, risk of resistance strongly increased with age and with retirement-home residence and decreased for females and pregnancy (Fig. 2b,c; see Supplementary Table 2 for regression coefficients and 95% Confidence Intervals, CI). The odds ratio (OR) for age (the ratio between the adjusted odds of resistance in the oldest and youngest age groups; Online Methods) differed widely among the 6 measured antibiotics, ranging from 2 in trimethoprim-sulfa and amoxicillin-CA to more than 8 in ciprofloxacin (Fig. 2b and Supplementary Table 2). For some antibiotics, the risk of an infection being resistant was non-monotonic with age, having an additional peak of higher risk at infancy or childhood (e.g., nitrofurantoin; Fig. 2c). For all antibiotics, females had lower odds of resistance, yet the odds ratios varied substantially among the different antibiotics (from OR=0.95, 95% CI: 0.93–0.97 for trimethoprim-sulfa to OR=0.38, 95% CI: 0.38–0.39 for cefuroxime axetil). These lower odds of resistance for females were often lowered even further with pregnancy (as much as OR=0.48, 95% CI: 0.45–0.50 for ciprofloxacin; Supplementary Table 2). We also identified an interaction between gender and age leading to heterogeneous patterns for males and females (e.g. trimethoprim-sulfa, nitrofurantoin) and even to opposing interactions of gender with specific age groups (e.g. ciprofloxacin; Fig. 2c). While, across all antibiotics, resistance was higher for residents of retirement homes, the correlation with age within this group was reversed: the frequencies of resistance for retirement home residents did not increase, and even slightly decreased, with age (Fig. 2c and Extended Data Fig. 3a; possibly representing differential survivorship).
The date of sample had some association with resistance to specific antibiotics, most notably cefuroxime axetil, while season had a relatively weak correlation with resistance for any of the drugs (Fig. 2b). Comparing the frequencies of resistance across the different antibiotics, we found that relative resistance rates changed between age groups (Extended Data Fig. 3b). We concluded that among the different demographic factors associated with risk of resistance, age, gender and residence in retirement homes are the strongest, with resistances to different antibiotics differentially correlated with these factors and the interactions among them.

Long-term correlations of resistance among same-patient urine samples

Moving from demographics to clinical history, we analyzed correlations of resistance across same-patient infections, revealing “memory-like” long-term auto-correlations and a timeless patient-specific tendency for resistance. Analyzing all same-patient pairs of samples, we calculated for each antibiotic the risk ratio for resistance of the second sample given the resistance of the first sample ($\zeta = [N_{R\to R}/(N_{R\to R} + N_{R\to S})]/[N_{S\to R}/(N_{S\to R} + N_{S\to S})]$, where the N’s are the numbers of same-patient sample pairs with the specified resistance phenotypes; for example, $N_{R\to S}$ is the number of sample pairs in which the first sample is resistant to the antibiotic and the second sensitive; Online Methods). Calculating ζ as a function of the time difference t = t1 − t2 between the two samples in each pair, we find that, for all antibiotics, these risk ratios are highest for short time differences and decay as the time difference increases (Fig. 3; Supplementary Fig. 1). Sample pairs less than a week apart showed substantially higher risk ratios, which we interpreted as repeated measurements of the same infection (Supplementary Fig. 1). Considering only correlations between sample pairs more than a week apart, we found that the risk ratios decay and finally converge, at long time differences, to an asymptotic constant larger than 1 (the risk ratios are well fitted by the sum of an exponential and a constant, $\zeta(t) = a\,e^{-t/\tau} + c$; Fig. 3a,b and Supplementary Fig. 1). The memory-like decay time τ of correlations among samples was longer than six months for most antibiotics and even exceeded a year for ciprofloxacin resistance, which is consistent with and even longer than previously observed (Fig. 3c)[34]. The maximal risk ratios considering previous resistance reached about 8 for short time differences for some antibiotics and typically remained larger than 3 even for samples taken half a year apart (Fig 3a,b and Supplementary Fig. 1).
At much longer times, the risk ratio decayed, and ζ converged to a constant, but interestingly it did not fully diminish, but rather converged to values larger than 1 (Fig. 3a,b,d, green), representing timeless patient specific tendencies for resistance. These decaying memory-like and timeless correlations could stem from repeated same-strain infections or from correlations with other patient-specific factors. In either case, these strong memory-like and timeless correlations can potentiate predictions of resistance.
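The two quantities used in this section, the risk ratio ζ computed from same-patient pair counts and the "decaying memory plus timeless constant" shape it follows over time, can be sketched as below. The argument names (`amplitude`, `baseline`) and all numbers are illustrative assumptions, not fitted values from the study.

```python
import math


def risk_ratio(n_rr, n_rs, n_sr, n_ss):
    """Risk of a resistant second sample after a resistant first sample,
    divided by the same risk after a sensitive first sample.

    n_xy = number of same-patient pairs whose first sample has phenotype x
    and whose second sample has phenotype y (r = resistant, s = sensitive).
    """
    risk_after_resistant = n_rr / (n_rr + n_rs)
    risk_after_sensitive = n_sr / (n_sr + n_ss)
    return risk_after_resistant / risk_after_sensitive


def memory_model(dt_days, amplitude, tau_days, baseline):
    """Sum of an exponential decay (memory) and a constant (propensity)."""
    return amplitude * math.exp(-dt_days / tau_days) + baseline


# Made-up pair counts for one time-difference bin.
zeta = risk_ratio(n_rr=50, n_rs=50, n_sr=100, n_ss=300)
print(zeta)

# Made-up model parameters: the curve starts at amplitude + baseline and
# approaches the baseline (a value above 1) at long time differences.
for dt in (0, 200, 2000):
    print(dt, round(memory_model(dt, amplitude=5.0, tau_days=200.0, baseline=2.0), 3))
```

In practice the model parameters would be obtained by fitting the binned risk ratios, for example with a least-squares routine; that step is omitted here.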

Figure 3:

Long term “memory” of resistance across same-patient samples.

(a,b) Risk ratio of the resistance of a urine sample given a record of a resistant versus sensitive earlier sample from the same patient, as a function of the time difference between the two samples, for trimethoprim-sulfa (a) and ciprofloxacin (b; see Online Methods and Supplementary Fig. 1 for all antibiotics). Risk ratios are well fitted with $\zeta(t) = a\,e^{-t/\tau} + c$, representing a time-decaying correlation (“memory”, yellow) and a time-independent correlation (“patient propensity”, green) among sample pairs. The magnitudes of these terms are shown as stacked bars on the right and the memory time (τ) is indicated across the time axis (yellow arrow). Gray triangle and diamond represent trimethoprim-sulfa and ciprofloxacin respectively, linking between the different panels. (c) Time scale of the memory of resistance τ for the 6 different antibiotics (corresponding to the yellow arrows in panels (a) and (b)). (d) The magnitude of long-term and timeless memory for the different antibiotics (yellow, green bars, respectively).

Direct and indirect selection for resistance following past antibiotic purchase

Next, we linked the infection dataset with patient-resolved antibiotic purchase data. For each patient with recorded UTI samples, we retrieved all records of antibiotic purchase made during the twenty year period from 1-Jan-1998 to 30-Jun-2017. For analysis, we used the 11 most purchased drugs (Supplementary Table 1). Antibiotics identical or highly similar to the ones used for resistance measurement were assigned as cognate antibiotics of these resistance measurements (Online Methods; Supplementary Table 1). For each UTI sample, we counted the number of purchases made by the same patient of each of the 11 drugs at distinct time intervals prior to the sample (Online Methods). Then, we applied multivariate logistic regression to correlate resistance to each of the 6 antibiotics with these drug purchase counts (Online Methods: Logistic regression “Purchase history”; Fig. 4a, Extended Data Fig. 4a).

Figure 4:

Direct association of past purchase with its cognate resistance leads, through association among resistances, to indirect association of purchases with noncognate resistances.

(a) Multivariate logistic regression models for the association of resistance to trimethoprim-sulfa (left) and ciprofloxacin (right) with past purchases of the indicated drugs at the indicated time intervals prior to infection (“Total”; see Extended Data Fig. 4a for all antibiotics; Logistic regression - purchase history in Online Methods). Values represent the odds ratios for a single purchase of a specific drug at a specific time interval (color map, stars for statistical significance as indicated; non-significant values, with Bonferroni-corrected P>0.05, are blanked). A long-term association is observed between resistance and past purchase of its matching (cognate, arrows) as well as non-cognate antibiotics. (b) Logistic regression model as in (a) adjusted for cross-resistance. This adjusted model diminishes or even completely removes noncognate drug-to-resistance associations while fully preserving the cognate associations (“Direct”; see Extended Data Fig. 4b for all antibiotics; arrows; cyan, trimethoprim-sulfa; magenta, ciprofloxacin). (c,d) Association of resistance to trimethoprim-sulfa (c) and ciprofloxacin (d) with purchases of these two drugs (cyan and magenta, respectively). Note differences between total (dashed lines) and direct (solid lines) effects for cognate (thick lines) versus noncognate (thin lines) drugs.

Extended Data Figure 4:

Odds ratios of resistance to each of the antibiotics for past purchases of different drugs across a range of purchase-to-sample time intervals: adjustments for demographics and cross-resistance.

(a) Multivariate logistic regression models for the association of each antibiotic resistance with past purchases of the indicated drugs not accounting for cross-resistance (Online Methods: Logistic regression “Purchase history”. Same graphical scheme as in Fig. 4a,b). (b) Logistic regression model as in (a) adjusted for cross-resistance (Online Methods: Logistic regression “Purchase history adjusted for cross resistance”). (c) Logistic regression model as in (a) adjusted for demographics (Online Methods: Logistic regression “Purchase history adjusted for demographics”). Gray asterisks indicate statistical significance and non-significant values, with Bonferroni-corrected P>0.05, are blanked.

We identified strong long-term patient-level associations of resistance with past purchase of both cognate and noncognate antibiotics. These purchase-resistance associations peaked at time differences of one to two weeks between purchase and sample, and often lasted for months and even longer than a year (Fig. 4a, Extended Data Fig. 4a). For example, the associations between purchase of ciprofloxacin and its cognate resistance had an odds ratio of 1.5 after half a year and remained as large as 1.2 even two years past purchase (Fig. 4a). Some weak negative associations were also identified (e.g., ciprofloxacin resistance was negatively correlated with past use of amoxicillin and cefalexin, Fig. 4a). Yet, the magnitude of these negative correlations decreased after adjusting for demographics, suggesting that they stemmed indirectly from correlations of purchases and resistance with demographics (Online Methods: Logistic regression, “Purchase history adjusted for demographics”; Extended Data Fig. 4c). Notably, drug purchases were associated not only with their expected cognate resistances. Indeed, use of some first-line antibiotics, such as ciprofloxacin and ofloxacin, increased the risk of a future resistance to a wide range of mechanistically diverse antibiotics. These abundant long-term positive associations between resistances and past purchase of noncognate drugs did not stem from correlations of purchases and resistance with patient demographics; they remained strong even when adjusting for demographics (Extended Data Fig. 4c). Together, these results support strong and long-lasting patient-level associations of antibiotic resistance with past use of both cognate and noncognate antibiotics. Exposing direct drug-to-resistance associations by disentangling correlations among resistances, we found that drug usage specifically selects for its cognate resistance at the single-patient level. 
Across the sample dataset, resistances to different antibiotics within class and even resistances to antibiotics of different classes were highly correlated (cross resistance; Extended Data Fig. 5). These inherent correlations among resistances suggest that observed associations between resistance to a given drug A and past purchase of a different non-cognate drug B may arise indirectly through selection for resistance B and association between resistance to B and resistance to A. Mathematically discerning these direct from indirect effects is only possible when multiple resistances are considered[20,45]. As our dataset contained measurements of multiple resistances for each sample, we were able to disentangle direct from indirect associations by adjusting the logistic regression for other measured resistances (Online Methods: Logistic regression “Purchase history adjusted for cross-resistance”). In this cross-resistance adjusted analysis of purchase-resistance associations, the noncognate associations between drug purchases and resistance substantially diminished and even disappeared while the associations between cognate drug-to-resistance pairs persisted (Fig. 4b, Extended Data Fig. 4b). For example, considering the associations between purchases of trimethoprim-sulfa and ciprofloxacin to their cognate resistances, we observed that the unadjusted and cross-resistance adjusted associations were of similar magnitude for cognate drugs (Fig 4c,d, thick solid vs. thick dashed lines), while the total association of drugs with their noncognate resistance decreased considerably once the indirect effect was removed (Fig 4c,d, thin solid vs. thin dashed lines). Our analysis therefore identifies both direct and indirect selection for resistance at the single-patient level lasting months and even a year following drug use.
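The logic of separating direct from indirect associations can be illustrated with a toy stratified calculation. Note that this sketch swaps in simple stratification by resistance to drug B in place of the cross-resistance-adjusted logistic regression used in the study, and the counts are synthetic numbers constructed so that purchases of drug B have no direct effect on resistance to drug A.

```python
def odds_ratio(exposed_res, exposed_sens, unexposed_res, unexposed_sens):
    """Odds ratio of resistance to drug A for purchasers vs non-purchasers."""
    return (exposed_res * unexposed_sens) / (exposed_sens * unexposed_res)


# Synthetic counts, ordered as:
# (purchased B & resistant to A, purchased B & sensitive to A,
#  no purchase & resistant to A, no purchase & sensitive to A)
strata = {
    "resistant_to_B": (60, 40, 30, 20),
    "sensitive_to_B": (10, 90, 40, 360),
}

# Crude ("total") association: pool the strata and ignore resistance to B.
crude_counts = tuple(sum(column) for column in zip(*strata.values()))
crude_or = odds_ratio(*crude_counts)

# Association within each stratum of resistance to B ("direct" part).
stratum_ors = {name: odds_ratio(*counts) for name, counts in strata.items()}

print(round(crude_or, 2), stratum_ors)
```

In this constructed example the pooled odds ratio is well above 1 even though it equals 1 within each stratum: the apparent effect of purchasing drug B on resistance to drug A arises entirely through resistance to B, which is the indirect pathway described above.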

Extended Data Figure 5:

Correlations among resistances to different antibiotics.

Correlation among resistance measurements for each pair of antibiotics across all samples for which both resistances were measured. Cephalexin and cefuroxime axetil, which have a particularly high correlation (marked with ‘x’), are treated as “analogous” in the analysis of indirect effects of purchases on resistance (Online Methods: Logistic regression “Purchase history adjusted for cross-resistance”).

Predicting antibiotic resistance at the single-patient single-infection level

As resistance is strongly associated with demographics, sample history and purchase history, we wanted to determine the predictive power of these factors individually and when combined together and identify potential interactions among them. Models of Logistic Regression and Gradient Boosting Decision Trees (GBDT) were trained and tested on temporally separate periods: training period of 9 years from 1-July-2007 to 30-June-2016 and testing period of the following year, from 1-July-2016 to 30-June-2017 (for cephalexin, training period was modified to avoid a time period during which resistance to this drug was not routinely measured, Extended Data Fig. 1). This temporal separation between training and testing data emulates forecasting resistance, as would be the case in real-life implementation of such a method. Area Under the Curve (AUC) of Receiver Operating Characteristic was used as a standard measure for predictive power[46]. Logistic regression and GBDT models provided personalized drug-specific prediction of resistance. Individually considering demographics, sample history and purchase history, we find that each of these sets of features had significant predictive power, with their relative prominence varying across the different antibiotics (Extended Data Fig. 6). Combining all these feature sets in a complete logistic regression model (Online Methods: Logistic regression “Complete”) markedly increased the predictability of resistance (AUC ranged from 0.7 for amoxicillin-CA to 0.83 for ciprofloxacin; Extended Data Fig. 6). Predictability of resistance was slightly increased by the GBDT models (Online Methods). For each given antibiotic k, considering the model-assigned resistance probabilities of each sample m, we can define threshold values that allow substantial reduction in risk of resistance while allowing treatment of the vast majority of the infections (Fig. 5a).
Setting this threshold to allow treatment of 75% of samples by each of the 6 drugs, the vast majority of infections can be treated with at least one of the drugs (92%, Extended Data Fig. 7). Finally, we found that these model-assigned probabilities of resistance can markedly differentiate samples resistant to one drug and sensitive to another (Fig. 5b, odds ratio of 3.9 for nitrofurantoin versus cefuroxime axetil, $P<10^{-100}$, Fisher exact; see Supplementary Fig. 3 for all other drug pairs). In total, these results demonstrate that machine learning models can provide high and specific predictability of antibiotic resistance at the single-patient and single-infection levels, motivating the development of algorithmic drug recommendations and comparison of their performance with current standard of care.
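The thresholding idea can be sketched as follows: choose, for one drug, the probability threshold that lets a target fraction of samples be treated, then measure the risk of resistance among the samples under that threshold. The function names, the use of "less than or equal" at the threshold, and the eight example samples are assumptions made for this sketch.

```python
import math


def threshold_for_coverage(probabilities, coverage):
    """Smallest threshold such that at least `coverage` of the samples have
    a model-assigned resistance probability at or below it."""
    ranked = sorted(probabilities)
    k = max(1, math.ceil(coverage * len(ranked)))
    return ranked[k - 1]


def risk_below_threshold(probabilities, is_resistant, threshold):
    """Observed frequency of resistance among samples at or below threshold."""
    selected = [r for p, r in zip(probabilities, is_resistant) if p <= threshold]
    return sum(selected) / len(selected)


# Made-up predictions and measured outcomes (1 = resistant) for one drug.
probs = [0.05, 0.10, 0.20, 0.40, 0.60, 0.08, 0.15, 0.90]
resistant = [0, 0, 0, 1, 1, 0, 1, 1]

threshold = threshold_for_coverage(probs, coverage=0.75)
risk_treated = risk_below_threshold(probs, resistant, threshold)
risk_population = sum(resistant) / len(resistant)

print(threshold, round(risk_treated, 3), risk_population)
```

Repeating this per drug gives one threshold per antibiotic; a sample is then treatable by any drug whose predicted probability falls under that drug's threshold.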

Extended Data Figure 6:

Model performance on test and training data. Area Under Curve (AUC) for Receiver Operating Characteristic for prediction of resistance based on demographics, sample history and purchase history, individually and in a complete model combining all feature sets.

Each feature set was modelled using Logistic Regression (LR), and the complete model was modelled by both LR and Gradient Boosting Decision Trees (GBDT). To identify overfitting, model performance on the testing dataset (grey) was contrasted with model performance on the training dataset (black; Supplementary Fig. 2 for definition of training and test time periods). A mild level of overfitting is seen for all drugs except trimethoprim-sulfa, which showed no overfitting.
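For readers less familiar with the AUC, it can be read as the probability that a randomly chosen resistant sample receives a higher predicted score than a randomly chosen sensitive one. The pairwise sketch below computes it directly from that definition; it is quadratic in the number of samples and is meant only as an illustration with made-up scores, not as the evaluation code of the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (resistant, sensitive) pairs ranked correctly.

    labels: 1 for resistant, 0 for sensitive; ties count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up scores: comparing training and test AUC flags overfitting when
# the training value is clearly higher than the test value.
auc_train = roc_auc([0.1, 0.2, 0.7, 0.9], [0, 0, 1, 1])
auc_test = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc_train, auc_test)
```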

Figure 5:

Algorithmically suggesting antibiotic prescription for empirical treatments can substantially improve upon the current standard of care.

(a) For each of the 6 antibiotics, we calculated the fraction (top) of resistant (red) and sensitive (green) samples, as well as the risk of resistance (bottom), for all samples within the one-year test period whose complete-model machine-learning assigned probabilities of resistance were below a set threshold P (x-axis; see Supplementary Fig. 2 for all antibiotics and more formal definitions). At P = 1 the risk of sample resistance equals the population-wide risk of resistance (dotted red line). Setting P = 0.12 would permit treatment of 75% of these infections with a much reduced risk of resistance compared to the population-wide risk (48% reduction, down-pointing arrow). (b) Differentiation between samples resistant to cefuroxime axetil and sensitive to nitrofurantoin (red) and vice versa (blue) by their model-assigned resistance probabilities (odds ratio of 3.9 for red points below the diagonal and blue points above it; $P<10^{-100}$, Fisher exact; see Supplementary Fig. 3 for all pairs of antibiotics). (c) Physicians’ frequency of mismatched prescriptions across all SDET cases (dark bar) was slightly better than the null expectation for randomly prescribing drugs with equal probabilities (Random “dice”, magenta dashed, $P<10^{-10}$) or for randomly permuting the physicians’ prescriptions (Random permutations, cyan dashed, $P=2.5\times10^{-5}$). These mismatched treatment rates were substantially reduced by the machine-learning (ML) based recommendations (light bars), either unconstrained (magenta hatched, $P<10^{-10}$) or constrained to recommend drugs at the exact same frequencies prescribed by the physicians (cyan hatched, $P<10^{-10}$). (d) Top, distribution of the drugs prescribed by the physicians (dark bar), by the constrained algorithm (cyan-hatched light bar, constrained to be equal to the physicians’) and by the unconstrained algorithm (magenta-hatched light bar). 
Bottom, for each of these prescription models, the frequency of mismatched treatment for each of the drugs is indicated, normalized by the expected mismatch frequency for random drug prescription (the average rate of resistance to the drug across the SDET population).

Extended Data Figure 7:

The fraction of samples that can be treated by at least one drug given set thresholds on the single-drug resistance probability scores.

Given the complete-model assigned probabilities of resistance $P_{m,k}$ of each sample m to each antibiotic k, we calculated the fraction of samples, within the one-year test period, that have at least one drug with resistance score below a threshold. This fraction is calculated assuming that the threshold used to determine resistance of single drugs is either: (a) the same probability threshold P for all drugs (counting all samples for which $P_{m,k} < P$ for at least one antibiotic k), or (b) the same rank threshold r for all drugs, counting all samples for which $P_{m,k} < P_k(r)$ for at least one antibiotic k, where $P_k(r)$ is the probability threshold of drug k that includes a fraction r of the samples.
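
The two thresholding schemes in this caption can be illustrated with a short sketch. The names `fraction_treatable` and `rank_thresholds` and the nested-list layout `prob[m][k]` are assumptions made for this example; the rank cutoff is taken as the value below which roughly a fraction r of the samples fall, which is one simple way to read the definition.

```python
def fraction_treatable(prob, thresholds):
    """Fraction of samples with at least one drug whose score is below its cutoff.

    prob[m][k] is the predicted resistance probability of sample m to drug k;
    thresholds[k] is the cutoff applied to drug k.
    """
    treatable = sum(
        1 for row in prob if any(p < t for p, t in zip(row, thresholds))
    )
    return treatable / len(prob)

def rank_thresholds(prob, r):
    """Per-drug probability cutoff below which roughly a fraction r of samples fall."""
    cutoffs = []
    for k in range(len(prob[0])):
        column = sorted(row[k] for row in prob)
        idx = min(int(r * len(column)), len(column) - 1)
        cutoffs.append(column[idx])
    return cutoffs

# Hypothetical scores for 4 samples and 2 drugs.
prob = [[0.1, 0.5], [0.6, 0.2], [0.7, 0.8], [0.3, 0.9]]
same_probability = fraction_treatable(prob, [0.4, 0.4])        # scheme (a)
same_rank = fraction_treatable(prob, rank_thresholds(prob, 0.5))  # scheme (b)
```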

Algorithmic drug recommendations substantially reduce mismatched treatments

Analyzing prescriptions given by physicians as part of current standard of care, we found that these prescriptions significantly, yet not strongly, reduce the rate of mismatched treatments, compared to null random expectations. We identified all cases of “same-day empirical treatments” (SDETs), where a patient purchased an antibiotic on the same day they had a UTI sample sent for culture (11,952 cases within the one-year test period; as culture tests take 2–4 days, these prescriptions were necessarily given empirically). Retrospectively contrasting these empirically prescribed drugs with the measured resistance of their corresponding samples, we found an overall 8.5% [95% CI: 8.03–9.05] rate of mismatched treatments (the sample was resistant to the prescribed antibiotic). This rate was significantly, yet not strongly, lower than expected by chance in two different null models. First, randomly choosing for each of these SDET cases one of the 6 drugs with equal probabilities, we found an expected null mismatched treatment rate of 10.2% [95% CI: 9.88–10.52], which is 20% higher than observed in physicians’ prescriptions ($P<10^{-10}$, Bootstrapping, Online Methods; “Dice” model, Online Methods, Fig. 5c). Second, randomly permuting among the SDET cases the same pool of drugs prescribed by the physicians, we found an expected null mismatched rate of 9.4% [95% CI: 9.00–9.71], namely 10% higher than observed ($P=2.3\times10^{-5}$, Bootstrapping, Online Methods; “Random permutation” model, Online Methods, Fig. 5c). Together, these results indicate statistically significant, but mild, patient-specific optimization of treatment in standard clinical practice. Developing algorithmic drug recommendations based on the machine-learning predictions of resistance, we found that they can greatly improve upon these standard-of-care rates of mismatched empirical treatments. 
To computationally recommend drugs based on the machine-learning assigned probabilities of resistance $P_{m,k}$, we considered two algorithms, unconstrained and constrained (cost-adjusted; Extended Data Fig. 8). In the unconstrained model, we simply chose for each of the SDET cases the antibiotic for which the model-predicted risk of resistance is lowest (minimal $P_{m,k}$; “Unconstrained algorithm for drug choice”, Online Methods). Comparing these recommendations to the measured antibiotic susceptibility of the sample, we found a mismatched rate as low as 5.1% [95% CI: 4.69–5.48], namely 42% lower than observed in the physician-prescribed treatment of these exact same cases ($P<10^{-10}$, Bootstrapping, Online Methods; Fig. 5c). The chance of mismatched treatment was lower than expected not only in total, but across each of the prescribed drugs (Fig. 5d, top). Importantly though, the distribution of drugs recommended by this unconstrained algorithm was very different from the distribution of drugs prescribed by physicians (Fig. 5d, bottom). In particular, the algorithm almost entirely refrained from prescribing trimethoprim and cephalexin, for which population-level rates of resistance were high. Optimal unconstrained algorithmic recommendations can thus dramatically reduce the chance of mismatched treatments, yet do so by drastically changing the overall distribution of prescribed drugs.

Extended Data Figure 8:

Schematic diagram of ML-trained prescription models.

A set of samples with features of demographics, sample resistance history and antibiotic purchase history labelled for resistance to each antibiotic k (‘Train set’) is used to train an antibiotic resistance prediction model (Online Methods: Logistic regression, terms #1–#9). The model is applied to an SDET set of cases from the test period to calculate probabilities of resistance to each antibiotic. In an unconstrained model the antibiotic with minimal probability for resistance is suggested. The calculated probabilities of resistance together with the respective prescriptions of the SDET set of cases are used to add a “cost” term. In a constrained drug prescription model, the antibiotic with the minimal cost-adjusted probability is suggested.

A model constrained to prescribe each drug at the exact same frequency it was used by physicians can still greatly reduce the rate of mismatched treatments. The overall rate of prescription of each drug could reflect considerations other than minimizing mismatched treatment (for example, ease of use, side effects, and tendency to avoid drugs for which population-level resistance rates are low). To address these considerations, here referred to as costs, we developed a constrained, cost-adjusted algorithm (“Constrained (cost-adjusted) algorithm for drug choice”, Online Methods). To recommend drugs that best minimize the population rate of mismatched treatments while maintaining a given population-level frequency of use of each drug, the algorithm assigns an effective cost to each drug and adjusts the cost values to match the required distribution of drug use (Online Methods). When applied to the SDET cases, with the drug-specific costs adjusted such that the overall distribution of recommended drugs precisely matches the distribution of the drugs prescribed by physicians, this model gave a mismatched treatment rate of 5.9% [95% CI: 5.47–6.33], slightly above the unconstrained model but still 30% lower than the physicians’ rate ($P<10^{-10}$, Bootstrapping, Online Methods). The improvements in mismatch rate were general across the population and robust to the clinical definition of resistance (Extended Data Fig. 9). These results show that algorithmically suggested drug prescriptions can substantially reduce the risk of mismatched treatments even when allowed merely to permute the same pool of drugs among patients.

Extended Data Figure 9:

Robustness of ML-trained prescription models across age and gender and with respect to the clinical definition of resistance.

(a) Frequency of mismatched treatment across all SDET cases, comparing physician’s prescriptions (dark bar) to algorithmic recommendations by the constrained and unconstrained models (cyan and magenta hatched, respectively) for females (top) and males (bottom) separated into 3 major age groups. (b) Frequency of mismatched treatment across all SDET cases (Online Methods), when classifying “Intermediate” level of resistance as “Resistant”. Comparing mismatch frequencies of physicians’ prescriptions (dark bar) to algorithmic recommendations (light bars), either unconstrained (magenta hatched) or constrained for recommending drugs at the same ratio as physicians (cyan hatched). Also presented are the null expectations for randomly prescribing drugs with equal probabilities (Random “Dice”, magenta dashed) or for random drug permutations (Random permutations, cyan dashed).

Discussion

Analyzing a large longitudinal medical dataset, we demonstrate high predictability of antibiotic resistance in UTIs, which can guide culture-free recommendation of treatment to lower the chance of mismatched empirical treatment. The best predictive power of resistance comes from combining patient-specific data of demographics, antibiotic resistance profile of past UTIs and purchase history of antibiotic drugs. Considering demographics, we found that age, gender, pregnancy, and residence in a retirement home were strongly associated with resistance, showing complex and non-monotonic patterns specific to each of the different antibiotics. Utilizing repeated same-patient cultures in our database, we identified and characterized a personal component of memory-like correlations of resistance, lasting for many months and even over a year. These long-term correlations can represent recurrent infections with the same strain, or correlations with other patient-specific factors. Either way, we show that they further contribute to predictability of resistance. Long-term associations were also observed between resistance and past drug purchases. Resistance to a given drug had long-lasting associations not only with past usage of this same drug, but also with other, even mechanistically unrelated, drugs. Yet, adjusting for correlations among resistances exposed direct selection where drug use led specifically to its own cognate resistance at the single-patient level. These results are consistent with drug use directly selecting, at the single-patient level, for strains resistant to it and thereby selecting indirectly, likely through frequent co-occurrence, for resistance to other antibiotics. Combining these demographic, sample history and drug history data can guide algorithmic recommendations for empirical treatment which substantially improve upon current standard of care. 
Comparing empirical prescriptions given by physicians to random prescriptions, we found that physicians personalize drug prescriptions in ways that significantly reduce the chance of mismatched treatment. However, machine-learning models could still substantially improve upon these already reduced rates. Indeed, the rates of mismatched treatment would have been reduced by over 40% were the drugs with lowest machine-learning predicted chance of resistance chosen. These machine-learning recommendations are inherently biased towards recommending drugs with overall low levels of resistance, for example ciprofloxacin, which is often intentionally avoided in standard clinical practice precisely to hinder the spread of resistance. We therefore also developed a model that assigns a cost for each drug, thereby constraining the rate of recommendation of each drug to the rate at which it was prescribed by physicians. Importantly, even when constrained to merely permute among the patients the exact same pool of drugs prescribed by physicians, the model can still reduce the rate of mismatched treatment by over 30% compared to standard care. Some aspects of the data may complicate the interpretation of our results. As purchase of a drug does not fully guarantee its concurrent use, later usage of a purchased drug may bias our results towards higher odds ratios for purchases made long before infection. Conversely, we cannot exclude that some patients have used antibiotics they did not purchase through MHS, which will bias our results towards lower odds ratios for drug purchases. Additionally, past antibiotic purchase and treatment might be associated with different clinical conditions, not considered in this study, such as comorbidities, hospitalizations and catheter use. While these factors are less likely to directly affect resistance rates, they are likely associated with risk of infections. 
Also, although culture testing is routine for suspected UTIs, sending urine for a culture test is not obligatory. As a result, we assume some UTIs would be empirically treated without any culture record, and there is likely a higher propensity towards culture testing of infections suspected of being resistant. This would generate bias towards measurement of more resistant samples, resulting in overestimation of the total frequency of resistance, especially for first-line treatment, and potentially in overestimation of the general rate of mismatched treatment. Another bias due to elective culture testing would arise from cultures taken following treatment failure. Such testing can again skew measurements towards more resistant samples, and it can further contribute to the strong short-term association of drug purchases with resistance, especially for first-line antibiotics. Lastly, the extent of this bias towards culture testing specifically following treatment failure could itself depend on demographics, which can bias correlations of demographics with resistance. While we cannot exclude these biases, our analysis demonstrates that, with all of these potential biases, resistance of urine infections can be well predicted based on the specific demographics and clinical history of the patient, and that algorithmic drug recommendations can substantially reduce the chance of prescribing an antibiotic to which the infection is resistant. The substantial reduction in the rate of mismatched treatment enabled by machine learning recommendations based on the patient’s record and clinical history lays the basis for a future paradigm where clinicians will routinely consult such algorithms for prescription of patient-tailored antibiotic treatment. We expect that algorithmic approaches similar to the one described here will be implemented, either centrally or locally, in healthcare systems where vast longitudinal electronic health records are available. 
While the key factors identified here can serve as the basis of such an approach, the specific model, including the exact coefficients and relative weights of predictors, will have to be adjusted for each country or region. Indeed, these algorithms can also be dynamically and adaptively updated in real time as new data are acquired. We expect that inclusion of additional patient-specific factors, such as comorbidities and hospitalizations, as well as of real-time information on infections, resistance and drug usage in other patients in a range of geographical proximities[39], can further increase resistance predictability. These models could also be used to adjust for patient-specific drug “costs”, thereby accounting for allergies and other patient-specific drug restrictions. In the longer term, these clinical-record and epidemiological data based approaches could be integrated with genomics of the patient as well as of the pathogen[47-53]. Implemented in the clinic, machine-learning guided personalized empirical prescription can reduce treatment failure as well as lower the overall use of antibiotics, thereby assisting in the global effort of impeding the antibiotic resistance epidemic.

Online Methods

Data.

Anonymized clinical records of urine culture tests (“culture reports”) and records of antibiotic purchases (“purchase reports”) were obtained from Maccabi Healthcare Services (MHS) for the time period from July 2007 to June 2017. Randomly generated patient identifiers were used to link culture reports and antibiotic purchase reports.

Culture reports:

Antibiotic resistance profiling of bacterial pathogens isolated from urine cultures was carried out centrally (in two locations until 2010, and in one central lab since). We retrieved 711,099 culture reports of positive samples from 315,047 patients in total (positive samples indicate bacteriuria, and as samples are most often sent for patients presenting symptoms, we consider these samples as representing UTIs). Each report included: (1) Unique patient code; (2) Date of sample; (3) List of isolates cultured with species identification (typically one isolate per sample; 3.6% of samples had more than one isolate); (4) Resistance profile of the isolates from processed results of a VITEK 2 system, given as Sensitive, Intermediate or Resistant for each drug tested. We focused on resistance to the 6 antibiotics most commonly prescribed in empirical treatment of these UTIs (N = 6; Supplementary Table 1, Table 1), with empiric prescription defined as prescription on the same day the sample was taken, excluding any chance of the measurements being available (ofloxacin resistance was excluded as measurements were not available as of 2013). Resistance to these antibiotics was routinely measured across the 10-year period, except for cephalexin, which was only measured as of 2014 (Extended Data Fig. 1). (5) Demographics: age, gender, pregnancy of the patient, as well as an identifier of patients residing in retirement homes.

Antibiotic purchase reports:

All drug purchases by prescription are routinely recorded in MHS databases. We identified and retrieved all purchases made by patients with culture reports by converting internal MHS drug codes to ATC classifications of antibiotics (Supplementary Table 1). Each purchase record included: (1) Unique patient code to be linked to the code of the culture record; (2) Internal MHS product code, which was translated to an ATC drug code, (3) Date of purchase.

Choice of drugs for analysis:

We focused on the 11 antibiotic compounds most purchased in the dataset (N = 11; Supplementary Table 1).

Feature definition.

For each urine sample m, we define the following parameters used for the logistic regression and the gradient boosting decision trees:

Sample resistance profile:

For each urine sample m, we define $Y_{m,k}$ as 0 for sensitive and intermediate and 1 for resistant to antibiotic k (1 ≤ k ≤ N). If the sample had multiple isolates, $Y_{m,k}$ was assigned 1 if at least one isolate was resistant. Missing resistance measurements are defined as N/A, and for each antibiotic k only samples which have defined resistance to it are used when training or testing its Logistic Regression or Gradient Boosting Decision Trees (GBDT).
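
As a small illustration of this labelling rule, the sketch below collapses per-isolate calls into one sample-level label for a single drug. The function name and the `'S'`/`'I'`/`'R'`/`None` encoding are assumptions made here; the handling of a sample with some unmeasured isolates (ignoring them) is also an assumption, since the text does not specify it.

```python
def sample_label(isolate_calls):
    """Collapse per-isolate calls for one drug into a sample-level label.

    isolate_calls: list of 'S', 'I', 'R', or None (not measured), one per isolate.
    Returns 1 if any measured isolate is resistant, 0 if all measured isolates
    are sensitive or intermediate, and None (N/A) if nothing was measured.
    """
    measured = [call for call in isolate_calls if call is not None]
    if not measured:
        return None  # excluded from training and testing for this drug
    return 1 if any(call == "R" for call in measured) else 0
```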

Demographics:

Gender: 0/1 for males/females; Pregnancy: 0/1 indicating pregnancy; Retirement Home: 0/1 indicating residence in retirement homes; Age (group j): 0/1 indicating patient age at time of UTI sampling in group j = 1,2,…,10 standing for 0–10,11–20,…,91–100 years; Date: date of sample in units of annual quarters starting 2007; Season (quarter j): 0/1 indicating the quarter of the sample within the calendar year, with j = 1,2,3,4.

Sample history:

For a given sample, we consider all earlier samples of the same patient (if any). We bin the time difference between any such earlier sample and the current sample, $\Delta t = t_{\mathrm{earlier}} - t_{\mathrm{current}}$ ($\Delta t$ is negative, designating past events), into one of 16 time bins (i = 1,2,…,16). A bin i is defined by $t_i \le \Delta t < t_{i-1}$, with $\{t_0,\ldots,t_{16}\} = -\{1,2,4,8,16,24,32,\ldots,112\}$ weeks. Choosing boundaries at an integer number of weeks is important to avoid effects of weekends and of patient preference for a specific weekday. Previous samples within one week of the current sample were not included, as they likely represent data on the same infection which might not have been available yet to the physician at the time of the second sample. We then calculated, for each time bin i, two counts: the number of prior cultures within the bin whose resistance equals 1 (Resistant) and the number whose resistance equals 0 (Sensitive).
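
A minimal sketch of this binning follows, working in whole days before the current sample. It assumes the bin edges are read as "more than $7 w_{i-1}$ and at most $7 w_i$ days ago" for the week boundaries $w$ listed above, so that a sample taken exactly one week earlier is excluded; the function names and the `(days_ago, label)` layout are illustrative choices and not taken from the paper.

```python
from bisect import bisect_left

# Week boundaries 1, 2, 4, 8, 16, 24, ..., 112 (17 edges, hence 16 bins).
WEEKS = [1, 2, 4, 8] + list(range(16, 113, 8))
EDGES_DAYS = [7 * w for w in WEEKS]

def history_bin(days_ago):
    """Return the bin index 1..16 for an earlier sample, or None if out of range."""
    if days_ago <= EDGES_DAYS[0] or days_ago > EDGES_DAYS[-1]:
        return None
    # bisect_left gives i with EDGES_DAYS[i-1] < days_ago <= EDGES_DAYS[i].
    return bisect_left(EDGES_DAYS, days_ago)

def history_counts(prior_samples, n_bins=16):
    """Count resistant and sensitive prior cultures per time bin.

    prior_samples: list of (days_ago, label) with label 1, 0, or None (N/A).
    """
    resistant = [0] * n_bins
    sensitive = [0] * n_bins
    for days_ago, label in prior_samples:
        b = history_bin(days_ago)
        if b is None or label is None:
            continue
        (resistant if label == 1 else sensitive)[b - 1] += 1
    return resistant, sensitive
```

The same pattern would apply to the purchase-history bins by swapping in their boundaries.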

Drug purchase history:

For each urine sample, we consider all earlier drug purchases made by the same patient. We bin the time difference between the urine sample date and a given past purchase, $\Delta t = t_{\mathrm{purchase}} - t_{\mathrm{sample}}$, into 8 logarithmically spaced time bins (i = 1,2,…,8). A bin i is defined by $t_i \le \Delta t < t_{i-1}$, where the boundaries of these time bins are {t0,…,t8} = −{1,2,4,8,…,128} weeks (the logarithmic binning was chosen to increase statistical power at large time differences where purchase density is lower). For each sample, we then calculate the number of purchases of a given drug j (1 ≤ j ≤ N, Supplementary Table 1) made by the patient during time bin i. For the distribution of purchases across these logarithmically spaced bins, see Supplementary Fig. 5.

Cross-resistance:

To resolve direct versus indirect associations of drug purchase and resistance, we adjusted the logistic regression of resistance to a given antibiotic k as a function of past drug purchases by the resistances to all other drugs j which are non-analogous to k. We define $A_{j,k}$ as a binary variable equal to 0 for analogous and 1 for non-analogous drug pairs. “Analogous” pairs are defined as antibiotics which have exceptionally high cross-resistance ($A_{j,k} = 0$ for pairs whose cross-resistance exceeds a threshold; we use a threshold of 0.7, which corresponds to drug pairs of the same class; see pairs labeled with ‘x’ in Extended Data Fig. 5). We then add as features, for each sample m in the regression analysis of a given antibiotic k, the resistance measurements to all antibiotics j for which $A_{j,k} = 1$. Note: These cross-resistance features provide information from the focal sample and were used only in the analysis of direct/indirect effect of purchases (Fig. 4b) and not for evaluation of resistance predictability.

Logistic regression.

Logistic regression of resistance for each antibiotic was performed via the Matlab glmfit function. For each of the resistances k = 1,2,…,6, the probability of resistance P was fit to the sample resistance Y for all urine samples which had a measurement of resistance to k either across the entire 10-year dataset (for Figs. 2,4), or across the “training period” (for the analysis of predictive power of Fig. 5; see Extended Data Fig. 1 for definition of the training period for each of the 6 antibiotics). The different logistic models included combinations of the following 10 terms: Different combinations of the above terms were used in the different regression models as follows (each row in the Table represents a logistic model that was applied to each of the 6 antibiotics):

Calculating odds ratios from logistic regression.

For each antibiotic k, odds ratios were calculated from the coefficients of above logistic regressions.

Binary variables:

For the binary variables Gender, Pregnancy and Retirement Home, odds ratios were defined as: female versus male, pregnant versus non-pregnant, and patients residing in retirement homes versus patients not residing in retirement homes.

Categorical variables:

For the categorical variables Age and Season, the odds ratio for each category relative to the reference (the age group of 0–10 years and the 4th quarter, respectively) is given by the exponential of the corresponding regression coefficient; the coefficients are reported in Supplementary Table 2. In Fig. 2, we report for Age the odds ratio for j = 10, standing for the 91–100 year group; and for Season the odds ratio for j = 2, standing for the 2nd quarter (which contrasts most with the reference, the 4th quarter).

Quadratic variables:

For Date, which is fitted quadratically, the individual regression coefficients and their CIs are reported in Supplementary Table 2. In Fig. 2b, we also report, for each antibiotic k, effective odds ratios defined as the ratio between the maximal and minimal expected odds taken across the relevant date range (0 ≤ X ≤ 10 · 4). Note that when these quadratic dependencies are monotonic within the relevant range (0 ≤ x ≤ 1), the extrema lie at the two ends of the range, so the effective odds ratio becomes simply the ratio between the expected odds at the two endpoints.
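
The effective odds ratio described here can be evaluated numerically, as in the sketch below. The coefficient names `b1` and `b2` (linear and quadratic Date coefficients on the log-odds scale) are placeholders introduced for this example, and the date is assumed to run over 0 to 40 quarters without rescaling; if the fitted variable was rescaled, `x_max` would change accordingly.

```python
import math

def effective_odds_ratio(b1, b2, x_max=40.0, steps=4000):
    """Ratio of maximal to minimal expected odds for log-odds b1*x + b2*x**2.

    The range 0..x_max is scanned on an even grid; because odds are the
    exponential of the log-odds, the ratio is exp(max - min) of the log-odds.
    """
    log_odds = [
        b1 * x + b2 * x * x
        for x in (x_max * i / steps for i in range(steps + 1))
    ]
    return math.exp(max(log_odds) - min(log_odds))

# Monotonic case: extrema sit at the endpoints, so the ratio is exp(b1 * x_max).
monotonic = effective_odds_ratio(0.01, 0.0)
# Non-monotonic case: the maximum sits at the vertex x = 20, inside the range.
peaked = effective_odds_ratio(0.04, -0.001)
```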

Analysis of “memory” across sample pairs.

To analyze “memory” of resistance across samples, we considered all pairs of samples from the same patient (across all patients with 2–10 samples) and binned them according to their time difference $\Delta t = t_1 - t_2$ (where $t_1$ and $t_2$ are the sample dates of the early and late sample; $\Delta t$ is always negative, indicating information on the current sample from past samples) into time bins as indicated by the bars in Fig. 3. In each time bin and for each antibiotic, we counted $N_{R \to R}$, $N_{R \to S}$, $N_{S \to R}$ and $N_{S \to S}$ as the number of urine sample pairs where the early and late samples are Resistant or Sensitive (for example, $N_{R \to S}$ is the number of same-patient sample pairs, within the time difference bin, where the first sample is Resistant and the second Sensitive to the given focal antibiotic; for each antibiotic, only samples for which resistance was measured were considered). We then calculated for each time difference bin the risk ratio $\zeta = [N_{R \to R}/(N_{R \to R} + N_{R \to S})]/[N_{S \to R}/(N_{S \to R} + N_{S \to S})]$.
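
The risk ratio for one time bin can be computed from the four pair counts as sketched below: it compares the risk that the later sample is resistant when the earlier one was resistant with the same risk when the earlier one was sensitive. The function name and the `(early_label, late_label)` pair encoding are assumptions made for this illustration.

```python
def memory_risk_ratio(pairs):
    """Risk ratio of late resistance given early resistance versus early sensitivity.

    pairs: list of (early_label, late_label) for one time bin and one drug,
    with 1 = Resistant and 0 = Sensitive.
    """
    counts = {(e, l): 0 for e in (0, 1) for l in (0, 1)}
    for early, late in pairs:
        counts[(early, late)] += 1
    risk_after_resistant = counts[(1, 1)] / (counts[(1, 1)] + counts[(1, 0)])
    risk_after_sensitive = counts[(0, 1)] / (counts[(0, 1)] + counts[(0, 0)])
    return risk_after_resistant / risk_after_sensitive

# Hypothetical bin: 5 R->R, 5 R->S, 2 S->R, 6 S->S pairs.
pairs = [(1, 1)] * 5 + [(1, 0)] * 5 + [(0, 1)] * 2 + [(0, 0)] * 6
ratio = memory_risk_ratio(pairs)  # (5/10) / (2/8)
```

A bin with no early-resistant or no early-sensitive pairs would divide by zero here; a production version would need to skip or flag such bins.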

Gradient Boosting Decision Trees (GBDT).

GBDT is an ensemble method combining regression trees with weak individual predictive performances into a single high-performance model. This is done by iteratively fitting decision trees, each iteration targeting the prediction residuals of the preceding tree. The final model is built by combining weighted individual tree contributions, with weights proportional to their performances. For each of the 6 antibiotics, a boosted decision tree ensemble was fitted using all features as defined above (demographics, sample history and drug purchase history) on the training set as defined by the training time period (Extended Data Fig. 1, green bars). This training dataset was sampled to balance resistant/sensitive label frequency. For parameter tuning, a validation dataset was sampled from the training set to be used for model selection (20%). For the estimator of the i-th iteration, a decreasing learning rate $\eta_i$ was used such that $\eta_i = \eta_0 \alpha^i$, with an annealing rate α = 0.99 and an initial learning rate η0 = 0.1. To further promote a diverse ensemble of individual estimators, feature-sampling and observation-sampling rates of 0.9 were used. Fitting of interaction effects is controlled by varying the size of the individual regression trees, with tree estimators of depth k producing models with up to k-way interactions. The model was tuned to match data complexity by iteratively increasing the tree depth limit of all ensemble estimators while evaluating performance on the validation set, selecting the best depth for each antibiotic.
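
Reading the annealed learning rate as a geometric schedule, a short sketch is given below. Whether the iteration index starts at 0 or 1 is not stated in the text; starting at 0 is an assumption made here, as is the function name.

```python
def learning_rate(i, eta0=0.1, alpha=0.99):
    """Learning rate for boosting iteration i under geometric annealing."""
    return eta0 * alpha ** i

# After 100 iterations the rate has dropped to roughly a third of its start value.
late_rate = learning_rate(100)
```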

Unconstrained algorithm for drug choice.

Given the complete-model machine-learning assigned probabilities of resistance $P_{m,k}$ of each same-day empirically treated infection $m = 1,2,\ldots,N_{\mathrm{SDET}}$ to each of the antibiotics $k = 1,\ldots,N$, the unconstrained model simply recommends, for each infection, the antibiotic for which the model-predicted probability of resistance is lowest. Namely, $k^{\mathrm{rec}}_m$ is defined by $k^{\mathrm{rec}}_m = \arg\min_k P_{m,k}$.
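
The unconstrained rule is a per-infection argmin over the predicted resistance probabilities, and its retrospective mismatch rate follows from the measured susceptibilities. In the sketch below the names `recommend_unconstrained` and `mismatch_rate`, and the nested-list layouts `prob[m][k]` and `resistant[m][k]`, are illustrative assumptions.

```python
def recommend_unconstrained(prob):
    """For each infection, pick the drug index with the lowest predicted resistance."""
    return [min(range(len(row)), key=row.__getitem__) for row in prob]

def mismatch_rate(choices, resistant):
    """Fraction of infections whose chosen drug was measured as resistant (1)."""
    hits = sum(resistant[m][k] for m, k in enumerate(choices))
    return hits / len(choices)

# Hypothetical predictions and lab results for 2 infections and 3 drugs.
prob = [[0.2, 0.1, 0.4], [0.05, 0.3, 0.3]]
resistant = [[0, 1, 0], [0, 0, 1]]
choices = recommend_unconstrained(prob)
rate = mismatch_rate(choices, resistant)
```

Ties are broken in favour of the lowest drug index, which is an arbitrary choice of this sketch.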

Constrained (cost-adjusted) algorithm for drug choice.

The constrained, cost-adjusted algorithm for drug choice takes as input the complete-model machine-learning assigned probabilities of resistance $P_{m,k}$ of each same-day empirically treated infection $m = 1,2,\ldots,N_{\mathrm{SDET}}$ to each of the antibiotics $k = 1,\ldots,N$, as well as the target total number of uses of each drug, $n^{\mathrm{target}}_k$ (with $\sum_k n^{\mathrm{target}}_k = N_{\mathrm{SDET}}$). The algorithm needs to return as output the optimal recommended drug treatment $k^{\mathrm{rec}}_m$ for each infection m such that the overall expected rate of mismatched treatment is minimized while the overall usage of each drug, $n_k = \sum_m \delta(k^{\mathrm{rec}}_m, k)$ (where δ(i, j) = 1 for i = j and 0 otherwise), satisfies $n_k = n^{\mathrm{target}}_k$ for all the antibiotics k. This constrained optimization problem can be solved exactly. First, we adjust the machine-learning model probabilities of resistance to each antibiotic by an additive drug-specific value $C_k$ accounting for an assigned “cost” of using this drug: $\tilde{P}_{m,k} = P_{m,k} + C_k$. Then, given a set of cost values for all the antibiotics $\{C_k\}$, the recommended antibiotic for each infection m is defined by $k^{\mathrm{rec}}_m = \arg\min_k \tilde{P}_{m,k}$, and given these drug choices for all the infections, we then calculate the overall drug distribution $n_k = \sum_m \delta(k^{\mathrm{rec}}_m, k)$. These drug distribution counts are therefore a function of the cost values, $n_k = n_k(\{C_k\})$. We then numerically solve for the set of cost values $\{C_k\}$ for which the drug distribution satisfies $n_k(\{C_k\}) = n^{\mathrm{target}}_k$. For N = 6, this amounts to numerically solving 6 equations with the 6 $C_k$'s as variables (the degeneracy due to $\sum_k n_k = N_{\mathrm{SDET}}$ is offset by an added normalization $\sum_k C_k = 0$). Once we have solved for the cost values $\{C_k\}$, the specific drug recommendations for each infection are defined by $k^{\mathrm{rec}}_m = \arg\min_k \tilde{P}_{m,k}$ with $\tilde{P}_{m,k} = P_{m,k} + C_k$. It is easy to prove mathematically that this solution optimally minimizes the risk of resistance given the constraints on total usage of each drug. Let us assume that there exists an alternative solution $k^{\mathrm{alt}}_m$ which has the same distribution of drug usage but a lower predicted chance of resistance, $\sum_m P_{m,k^{\mathrm{alt}}_m} < \sum_m P_{m,k^{\mathrm{rec}}_m}$. 
As the two solutions have the same overall number of uses of each drug, there must exist a set of pairwise swapping steps that transforms the “rec” solution to the “alt” solution, where each step consists of taking two infections $m_1$ and $m_2$ and swapping their recommended prescriptions $k^{\mathrm{rec}}_{m_1}$ and $k^{\mathrm{rec}}_{m_2}$ (an operation that maintains the same overall use of the drugs). But, given that the recommended prescriptions $k^{\mathrm{rec}}_{m_1}$ and $k^{\mathrm{rec}}_{m_2}$ are defined by $k^{\mathrm{rec}}_{m_1} = \arg\min_k (P_{m_1,k} + C_k)$ and $k^{\mathrm{rec}}_{m_2} = \arg\min_k (P_{m_2,k} + C_k)$, swapping them necessarily leads to an equal or higher overall probability of mismatched treatment (adding the two argmin inequalities, the cost terms cancel): $P_{m_1,k^{\mathrm{rec}}_{m_2}} + P_{m_2,k^{\mathrm{rec}}_{m_1}} \ge P_{m_1,k^{\mathrm{rec}}_{m_1}} + P_{m_2,k^{\mathrm{rec}}_{m_2}}$. Therefore, any swap among the set of infections of the drugs recommended by the algorithm leads to an equal or increased predicted rate of mismatched treatment. The solution we provide is therefore optimal. Finally, we note that an important added value of this approach is that it also provides the cost values $C_k$ for each of the antibiotics. Namely, given the distribution of antibiotics prescribed by physicians, we can deduce effective cost values that account for the different global considerations physicians take into account, such as ease of use and a tendency to avoid drugs of last resort. Once these cost values are determined, for example based on the one-year test period, they can be used for future algorithmic recommendations of drug prescriptions. Namely, for a given new case with machine-learning probability of resistance $P_k$ for each of the antibiotics k, the algorithm will simply recommend the antibiotic K for which $K = \arg\min_k \tilde{P}_k$, where $\tilde{P}_k = P_k + C_k$.
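
The cost-adjusted rule and the pairwise-swap property used in the optimality argument can be illustrated with a small sketch. It takes the cost values as given and does not implement the numerical solver for them, which the text does not specify; all names and the toy numbers are assumptions made for this example.

```python
def recommend_with_costs(prob, costs):
    """Per infection, pick the drug minimizing predicted resistance plus drug cost."""
    choices = []
    for row in prob:
        adjusted = [p + c for p, c in zip(row, costs)]
        choices.append(min(range(len(adjusted)), key=adjusted.__getitem__))
    return choices

def usage_counts(choices, n_drugs):
    """Number of infections assigned to each drug."""
    counts = [0] * n_drugs
    for k in choices:
        counts[k] += 1
    return counts

def swap_never_helps(prob, choices, tol=1e-12):
    """Check that swapping the drugs of any two infections does not lower total risk."""
    n = len(choices)
    for a in range(n):
        for b in range(a + 1, n):
            kept = prob[a][choices[a]] + prob[b][choices[b]]
            swapped = prob[a][choices[b]] + prob[b][choices[a]]
            if swapped < kept - tol:
                return False
    return True

# Hypothetical predictions for 4 infections and 2 drugs.
prob = [[0.1, 0.3], [0.2, 0.25], [0.6, 0.1], [0.15, 0.5]]
free = usage_counts(recommend_with_costs(prob, [0.0, 0.0]), 2)  # drug 0 overused
# Costs summing to zero that shift one infection from drug 0 to drug 1.
balanced = recommend_with_costs(prob, [0.05, -0.05])
```

With zero costs the rule reduces to the unconstrained argmin; raising the cost of an overused drug moves the infections for which it has the smallest advantage to other drugs first, which is why matching a target usage in this way keeps the total predicted risk as low as possible.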

Analysis of “Same-Day Empirical Treatments” (SDET).

We identified all cases across the one-year test period where patients purchased one (and only one) of the 6 antibiotics on the same day they had a sample sent for culture and for which resistances to all 6 antibiotics were measured (Same-Day Empirical Treatments, SDET). We then retrospectively annotated each SDET prescription as “matched”, or “unmatched” according to whether the sample was sensitive or resistant to the prescribed antibiotic, respectively. The rate of mismatched treatment was then defined across all of these SDET cases (Fig. 5c), as well as separately across all of the cases treated with a given drug (Fig. 5d, top). A similar analysis was done for the drugs recommended by either the unconstrained or the constrained (cost-adjusted) models (Fig. 5c,d). Mismatch rates were also compared with two models of null expectations. In the “Dice” model, we randomly chose, for each SDET case, one of the 6 drugs with equal probability. In the “Random permutation” model, we randomly permuted across the SDET cases the same overall pool of drugs prescribed by the physicians (thereby maintaining the exact same frequency of use of each of the 6 drugs). For each of these models, we repeated 1000 random simulations and calculated the average mismatched treatment rate (Fig. 5c, horizontal lines).
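
The two null models can be simulated along the following lines. This is a sketch under assumed data layouts (`resistant[m][k]` as 0/1 lab results and `prescribed` as a list of drug indices), with illustrative function names and a fixed seed for reproducibility; it is not the authors' implementation.

```python
import random

def mismatch_rate(choices, resistant):
    """Fraction of cases whose chosen drug was measured as resistant (1)."""
    return sum(resistant[m][k] for m, k in enumerate(choices)) / len(choices)

def dice_choices(n_cases, n_drugs, rng):
    """'Dice' null: each case gets one of the drugs with equal probability."""
    return [rng.randrange(n_drugs) for _ in range(n_cases)]

def permuted_choices(prescribed, rng):
    """'Random permutation' null: shuffle the physicians' own pool of drugs."""
    shuffled = list(prescribed)
    rng.shuffle(shuffled)
    return shuffled

def null_mean(resistant, draw, n_sim=1000, seed=0):
    """Average mismatch rate over n_sim random draws from a null model."""
    rng = random.Random(seed)
    rates = [mismatch_rate(draw(rng), resistant) for _ in range(n_sim)]
    return sum(rates) / n_sim
```

A call such as `null_mean(resistant, lambda rng: permuted_choices(prescribed, rng))` gives the permutation null, which by construction preserves the frequency of use of each drug.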

Statistical significance of mismatched treatment rates.

We performed 10,000 bootstrapping simulations in which we randomly sampled, with replacement, 11,952 cases from the 11,952 SDET cases and calculated for each of these 10,000 simulations the mismatch rate for the prescriptions given by Physicians, the Constrained Machine Learning model (CML), the Unconstrained Machine Learning model (UCML), the Random Permutation model (RP) and the Random Dice model (RD). For each of these 5 models, we report the 95% Confidence Interval of the mismatched treatment rate based on the 2.5th and 97.5th percentile values of the mismatched treatment rate of the specified model across the 10,000 bootstrapping simulations. When comparing two models, we consider the difference between the mismatched treatment rates of the two models for each of the 10,000 simulations. For all reported model comparisons (Physicians-RD, Physicians-RP, UCML-Physicians, and CML-Physicians), the mismatch rate in the first model was lower than the mismatch rate in the second model in virtually all 10,000 bootstrapping simulations (representing P-values lower than 10⁻⁴). As an estimate for the P-value, we report the error function based on the average and standard deviation of the difference in mismatch rate between the two models across the 10,000 bootstrapping simulations.
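A minimal sketch of this bootstrap, assuming each model is summarized by a per-case 0/1 mismatch indicator over the same cases. The function name `bootstrap_compare`, the index-based percentile lookup and the exact normal-tail formula used for the P-value are assumptions made for illustration; the text only states that an error function of the mean and standard deviation of the difference is reported.

```python
import math
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_compare(
    mis_a: Sequence[int], mis_b: Sequence[int], n_boot: int = 10_000, seed: int = 0
) -> Tuple[Tuple[float, float], float]:
    """Bootstrap the mismatch rate of model A and its difference from model B.

    Returns (95% CI of A's mismatch rate, estimated P-value that A is not
    lower than B), using a normal approximation for the difference.
    """
    rng = random.Random(seed)
    n = len(mis_a)
    rates_a, diffs = [], []
    for _ in range(n_boot):
        # Resample n cases with replacement; both models share the resample.
        idx = [rng.randrange(n) for _ in range(n)]
        rate_a = sum(mis_a[i] for i in idx) / n
        rate_b = sum(mis_b[i] for i in idx) / n
        rates_a.append(rate_a)
        diffs.append(rate_a - rate_b)
    rates_a.sort()
    # Approximate 2.5th and 97.5th percentiles by index into the sorted rates.
    ci = (rates_a[int(0.025 * n_boot)], rates_a[int(0.975 * n_boot) - 1])
    mu, sd = statistics.mean(diffs), statistics.stdev(diffs)
    if sd == 0:
        return ci, float(mu >= 0)
    # Normal tail probability that the difference (A - B) is >= 0.
    p_value = 0.5 * math.erfc(-mu / (sd * math.sqrt(2)))
    return ci, p_value


if __name__ == "__main__":
    # Hypothetical indicators for 100 cases (1 = mismatched treatment).
    model_a = [0] * 80 + [1] * 20
    model_b = [1] * 50 + [0] * 50
    print(bootstrap_compare(model_a, model_b, n_boot=2000))
```

The normal approximation is useful here because, when the first model wins in every bootstrap replicate, the empirical P-value can only be bounded by one over the number of replicates.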

Data availability.

The data that support the findings of this study are available from Maccabi Healthcare Services, but restrictions apply to the availability of these data, which were used under license for the current study and so are not publicly available. Access to the data is, however, available upon reasonable request and upon signing a material transfer agreement (MTA) with Maccabi Healthcare Services.

Code availability.

Code used for data analysis is available upon request.

Ethical approval.

The study protocol was approved by the ethics committee of Assuta Medical Center, Tel-Aviv, Israel.

Extended Data Fig. 1. Availability of resistance measurements over time.

Extended Data Fig. 2. Frequency of resistance over time.

Extended Data Fig. 3. Odds of resistance as a function of age for different demographic groups.

Extended Data Fig. 4. Odds ratios of resistance to each of the antibiotics for past purchases of different drugs across a range of purchase-to-sample time intervals: adjustments for demographics and cross-resistance.

Extended Data Fig. 5. Correlations among resistances to different antibiotics.

Extended Data Fig. 6. Model performance on test and training data. Area Under Curve (AUC) for Receiver Operator Characteristic for prediction of resistance based on demographics, sample history and purchase history, individually and in a complete model combining all feature sets.

Extended Data Fig. 7. The fraction of samples that can be treated by at least one drug given set thresholds on the single-drug resistance probability scores.

Extended Data Fig. 8. Schematic diagram of ML-trained prescription models.

Extended Data Fig. 9. Robustness of ML-trained prescription models across age and gender and with respect to the clinical definition of resistance.

Display item	Model name	Regression terms	Fit data
Fig. 2b; Sup. Table 2	Demographics	#1–#6	All data
Sup. Table 2	Gender, unadjusted	#1	All data
Sup. Table 2	Pregnancy, adjusted for gender	#2 and #1	All data
Sup. Table 2	Ret. Home, unadjusted	#3	All data
Sup. Table 2	Age, unadjusted	#4	All data
Sup. Table 2	Date, unadjusted	#5	All data
Sup. Table 2	Season, unadjusted	#6	All data
Fig. 4a,c,d; Ext. Data Fig. 4a	Purchase history	#8	All data
Ext. Data Fig. 4c	Purchase history, adjusted for demographics	#8 and #1–#6	All data
Fig. 4b,c,d; Ext. Data Fig. 4b	Purchase history, adjusted for cross-resistance	#8 and #10	All data
Ext. Data Fig. 6	Demographics	#1–#6	Training range*
Ext. Data Fig. 6	Sample history	#7	Training range*
Ext. Data Fig. 6	Purchase history	#8	Training range*
Ext. Data Fig. 6; Fig. 5; Ext. Data Fig. 9	Complete	#1–#9	Training range*

* See Extended Data Fig. 1 for the training range of each of the 6 resistances.
