Shuen-Lin Jeng, Zi-Jing Huang, Deng-Chi Yang, Ching-Hao Teng, Ming-Cheng Wang.
Abstract
Recurrent urinary tract infection (RUTI) can damage renal function and affects healthcare costs and patients' quality of life. Prediction models for RUTI were developed in 2 stages. The first stage modeled the scenario of a clinical visit. The second stage modeled the scenario after hospitalization for urinary tract infection caused by Escherichia coli. Three machine learning models, logistic regression (LR), decision tree (DT), and random forest (RF), were built for RUTI prediction. The RF model had higher prediction accuracy than LR and DT (RF 0.700, LR 0.604, and DT 0.654 in stage 1; RF 0.709, LR 0.604, and DT 0.635 in stage 2). The decision rules constructed by the DT model provided high classification accuracy (up to 0.92 in stage 1 and 0.94 in stage 2) in certain patient subgroups in the different scenarios. In conclusion, this study provided validated machine learning models, and RF gave the best accuracy in predicting the development of single-uropathogen (E. coli) RUTI. Host characteristics and bacterial characteristics made important contributions to predicting the development of RUTI in the first and second clinical scenarios, respectively. Based on these results, physicians could take action to prevent the development of RUTI.
Year: 2022 PMID: 36241875 PMCID: PMC9568612 DOI: 10.1038/s41598-022-18920-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Patient characteristics related to UTI and RUTI (sample size = 963) used in the first-stage analysis. The name in parentheses is the label of the factor used in the machine learning models. Data are presented as median (interquartile range) or number (percentage). Abbreviations: UTI, urinary tract infection; RUTI, recurrent urinary tract infection; ED, emergency department; WBC, white blood cell; RBC, red blood cell; HPF, high power field.
| Characteristic | UTI (n = 826) | RUTI (n = 137) | P value |
|---|---|---|---|
| Age (year) | 67 (45–78) | 75 (62–81) | < 0.0001 |
| Gender (male) | 208 (25) | 35 (26) | 0.9157 |
| Place of urine sample collection (ED) (Place_of_collection) | 781 (95) | 123 (90) | 0.0513 |
| Diabetes mellitus (Dis1) | 230 (28) | 63 (46) | < 0.0001 |
| Malignancy, exclusion of urogenital malignancy (Dis2) | 117 (14) | 19 (14) | 0.9999 |
| Autoimmune disease (Dis3) | 15 (2) | 1 (1) | 0.7146 |
| Liver cirrhosis (Dis4) | 24 (3) | 12 (9) | 0.0025 |
| Indwelling Foley catheter (Dis5) | 35 (4) | 13 (9) | 0.0172 |
| Obstructive uropathy (Dis6) | 100 (12) | 23 (17) | 0.1299 |
| Urolithiasis (Dis7) | 20 (2) | 4 (3) | 0.7653 |
| Urogenital malignancy (Dis8) | 19 (2) | 6 (4) | 0.1519 |
| Neurogenic bladder (Dis9) | 35 (5) | 22 (16) | < 0.0001 |
| Disease group (four_disease_group) | 154 (18) | 51 (37) | < 0.0001 |
| End stage renal disease (Dis10) | 18 (2) | 4 (3) | 0.5394 |
| Transplantation (Dis11) | 5 (1) | 1 (1) | 0.9999 |
| Stroke (Dis12) | 65 (8) | 13 (9) | 0.5004 |
| Frequency of hospitalization within 2 years (Pre_hos_2y) | 0 (0–2) | 1 (0–3) | < 0.0001 |
| Frequency of ED visit within 2 years (Pre_UTI_ER_2y) | 0 (0–0) | 0 (0–1) | 0.0004 |
| Frequency of UTI within 2 years (Pre_UTI_hos_2y) | 0 (0–1) | 0 (0–2) | < 0.0001 |
| | 441 (53) | 59 (43) | 0.0268 |
| Fever (UTI_symptom1) | 375 (45) | 50 (36) | 0.0629 |
| Dysuria (UTI_symptom2) | 74 (9) | 12 (9) | 0.9999 |
| Painful urination (UTI_symptom3) | 0 (0) | 1 (1) | 0.1423 |
| Frequency (UTI_symptom4) | 72 (9) | 7 (5) | 0.1800 |
| Burning sensation (UTI_symptom5) | 31 (4) | 4 (3) | 0.8070 |
| Low abdominal pain (UTI_symptom6) | 15 (2) | 1 (1) | 0.7146 |
| Flank/back pain (UTI_symptom7) | 63 (8) | 5 (4) | 0.1053 |
| Gross hematuria (UTI_symptom8) | 36 (4) | 5 (4) | 0.8228 |
| Serum creatinine (mg/dL) | 0.8 (0.6–1.2) | 0.96 (0.6–2.0) | 0.0101 |
| Peak blood WBC count (10⁹/L) (BloodWBC) | 10.9 (8.1–14.4) | 10.2 (7.3–13.4) | 0.0761 |
| Urinary bacterial count (0 ~ 4) (UBact) | 2 (1–3) | 2 (1–3) | 0.4160 |
| Urinary WBC/HPF (UWBC_level) | 52 (15–178) | 41 (12–122) | 0.3491 |
| Urinary RBC/HPF (URBC_level) | 5 (1–20) | 5 (1–11) | 0.3540 |
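The table does not state which test produced the P values. As a rough illustration only, the sketch below assumes a two-sided Fisher exact test on the 2×2 counts for diabetes mellitus (63 of 137 RUTI patients vs 230 of 826 UTI patients, where 826 = 963 − 137 is inferred from the stated sample sizes). The function name `fisher_exact_two_sided` is hypothetical, and the result need not match the published value digit for digit.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= observed * (1 + 1e-9))

# Diabetes mellitus (Dis1): rows are RUTI and UTI, columns are with/without diabetes.
n_ruti, n_uti = 137, 963 - 137  # assumption: UTI group size inferred by subtraction
p_diabetes = fisher_exact_two_sided(63, n_ruti - 63, 230, n_uti - 230)
print(f"Fisher exact P for diabetes: {p_diabetes:.2e}")
```

If the authors used a chi-square or another test instead, the numerical value would differ slightly, but a difference of 46% vs 28% at these group sizes stays far below 0.05 under either test.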
Bacterial characteristics related to UTI and RUTI (sample size = 809) used in the second-stage analysis. The name in parentheses is the label of the factor used in the machine learning models. Data are presented as number (percentage). Abbreviations: UTI, urinary tract infection; RUTI, recurrent urinary tract infection.
| Characteristic | UTI | RUTI | P value |
|---|---|---|---|
| | | | 0.1308 |
| A (1) | 57 (13) | 12 (14) | |
| B1 (2) | 39 (9) | 11 (13) | |
| B2 (3) | 251 (58) | 38 (45) | |
| D (4) | 89 (20) | 24 (28) | |
| | 154 (33) | 11 (13) | 0.0004 |
| | 62 (13) | 12 (14) | 0.8645 |
| | 197 (43) | 22 (26) | 0.0028 |
| | 25 (5) | 5 (6) | 0.7995 |
| | 36 (8) | 2 (2) | 0.0674 |
| | 74 (16) | 9 (10) | 0.2505 |
| | 299 (65) | 53 (62) | 0.6245 |
| | 282 (61) | 41 (48) | 0.0234 |
| | 183 (40) | 25 (29) | 0.0697 |
| | 346 (75) | 53 (62) | 0.0169 |
| | 272 (59) | 48 (56) | 0.6344 |
| | 173 (37) | 23 (27) | 0.0660 |
| | 436 (94) | 79 (92) | 0.3323 |
| | 98 (21) | 11 (13) | 0.0782 |
| | 159 (37) | 21 (25) | 0.0341 |
| | 120 (26) | 16 (19) | 0.1741 |
Antimicrobial susceptibility of bacterial pathogens related to UTI and RUTI (sample size = 809) used in the second-stage analysis. The name in parentheses is the label of the factor used in the machine learning models. Data are presented as number (percentage). Abbreviations: UTI, urinary tract infection; RUTI, recurrent urinary tract infection; S, susceptible; I, intermediate; R, resistant.
| Antimicrobial susceptibility | UTI | RUTI | P value |
|---|---|---|---|
| | | | 0.2392 |
| S | 126 (20) | 13 (13) | |
| I | 3 (0) | 0 (0) | |
| R | 509 (80) | 85 (87) | |
| | | | 0.0838 |
| S | 277 (61) | 35 (50) | |
| I | 52 (11) | 6 (9) | |
| R | 125 (28) | 28 (41) | |
| | | | 0.0003 |
| S | 407 (64) | 44 (45) | |
| I | 18 (3) | 1 (1) | |
| R | 213 (33) | 53 (54) | |
| | | | 0.0058 |
| S | 430 (68) | 50 (52) | |
| I | 22 (4) | 4 (4) | |
| R | 176 (28) | 42 (44) | |
| | | | 0.4653 |
| S | 350 (77) | 49 (72) | |
| I | 16 (4) | 2 (3) | |
| R | 84 (19) | 17 (25) | |
| | | | 0.0004 |
| S | 435 (68) | 45 (46) | |
| I | 5 (1) | 1 (1) | |
| R | 197 (31) | 52 (53) | |
| | | | 0.0897 |
| S | 311 (69) | 38 (56) | |
| I | 62 (14) | 12 (18) | |
| R | 78 (17) | 18 (26) | |
| | | | 0.0807 |
| S | 559 (88) | 78 (80) | |
| I | 14 (2) | 3 (3) | |
| R | 64 (10) | 17 (17) | |
| | | | 0.1509 |
| S | 621 (100) | 93 (98) | |
| I | 0 (0) | 0 (0) | |
| R | 2 (0) | 2 (2) | |
| | | | 0.0994 |
| S | 420 (66) | 54 (55) | |
| I | 13 (2) | 2 (2) | |
| R | 203 (32) | 42 (43) | |
| | | | 0.3417 |
| S | 622 (98) | 94 (97) | |
| I | 8 (1) | 1 (1) | |
| R | 4 (1) | 2 (2) | |
| | | | 0.0010 |
| S | 403 (63) | 43 (44) | |
| I | 6 (1) | 2 (2) | |
| R | 227 (36) | 53 (54) | |
| | | | 0.0840 |
| S | 281 (45) | 33 (35) | |
| I | 7 (1) | 0 (0) | |
| R | 337 (54) | 62 (65) |
Performance of the RUTI prediction models in the clinical visit, estimated by fivefold cross-validation (sample size = 963). Abbreviation: RUTI, recurrent urinary tract infection.
| Algorithm | Accuracy (mean) | Accuracy (SD) | Sensitivity (mean) | Sensitivity (SD) | Specificity (mean) | Specificity (SD) |
|---|---|---|---|---|---|---|
| Logistic regression | 0.604 | 0.044 | 0.648 | 0.117 | 0.597 | 0.044 |
| Decision tree | 0.654 | 0.020 | 0.618 | 0.058 | 0.660 | 0.023 |
| Random forest | 0.700 | 0.039 | 0.626 | 0.131 | 0.712 | 0.046 |
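The three metrics in this table are linked: overall accuracy is the prevalence-weighted average of sensitivity and specificity. The sketch below checks the reported means against that identity, assuming a RUTI prevalence of 137/963 (the RUTI group size and total sample size stated for the first stage) and assuming each fold has roughly that prevalence. The identity is exact within a single fold and only approximate for means across folds.

```python
# Reported fivefold means for the clinical-visit stage: (accuracy, sensitivity, specificity).
reported = {
    "Logistic regression": (0.604, 0.648, 0.597),
    "Decision tree": (0.654, 0.618, 0.660),
    "Random forest": (0.700, 0.626, 0.712),
}

prevalence = 137 / 963  # RUTI cases / all patients in the first stage

def implied_accuracy(sensitivity, specificity, prevalence):
    """Accuracy = P(RUTI) * sensitivity + P(non-RUTI) * specificity."""
    return prevalence * sensitivity + (1 - prevalence) * specificity

implied = {name: implied_accuracy(sens, spec, prevalence)
           for name, (_, sens, spec) in reported.items()}

for name, (acc, _, _) in reported.items():
    print(f"{name}: reported {acc:.3f}, implied {implied[name]:.3f}")
```

Because non-RUTI patients make up about 86% of the sample, accuracy tracks specificity closely, which is why RF leads on accuracy even though LR has the highest mean sensitivity.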
Figure 1. Variable importance plot of the first-stage RF analysis, expressed as the percentage mean decrease in accuracy for each factor. Age, cirrhosis (Dis4), diabetes mellitus (Dis1), and disease group (four_disease_group) are the 4 most important factors for predicting recurrence in the clinical visit (sample size = 963).
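Mean decrease in accuracy is a permutation-based importance: shuffle one factor, re-score the model, and record how much accuracy drops. The sketch below illustrates the idea on synthetic data, with a fixed toy rule standing in for the fitted RF. The data, the rule, and the helper names are all hypothetical, and RF implementations typically measure the drop on out-of-bag samples tree by tree rather than on the full data set.

```python
import random

def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_features, rng):
    """Accuracy drop after shuffling each feature column in turn."""
    baseline = accuracy(model, rows, labels)
    drops = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the label
        shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(model, shuffled, labels))
    return drops

rng = random.Random(0)
# Synthetic data: feature 0 (a stand-in for age) drives the label, feature 1 is noise.
rows = [(rng.uniform(20, 90), rng.random()) for _ in range(500)]
labels = [int(age > 65) for age, _ in rows]

def toy_model(row):
    # Hypothetical fixed rule, not the study's random forest.
    return int(row[0] > 65)

importance = permutation_importance(toy_model, rows, labels, n_features=2, rng=rng)
print(importance)
```

A factor the model never uses shows zero drop, while a factor the model relies on shows a large one; the figure ranks the study's factors on the same scale.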
Figure 2. Decision rules of the DT analysis for development of RUTI in the clinical visit (sample size = 963). The 2 green boxes and 1 red box mark the decision-rule nodes with an accuracy above 0.85 for non-RUTI classification and above 0.70 for RUTI classification, respectively.
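The accuracy attached to a node is the share of patients reaching that node whose true outcome matches the node's predicted class. The sketch below computes it for one rule over a handful of made-up records. The rule (age above 70 with diabetes), the threshold, and the records are hypothetical and are not taken from the figure.

```python
# Hypothetical patient records: (age, has_diabetes, developed_ruti).
records = [
    (82, True, True), (75, True, True), (71, True, False), (79, True, True),
    (45, False, False), (66, True, False), (58, False, False), (90, False, True),
]

def rule(age, has_diabetes):
    """Illustrative decision path: age > 70 and diabetes mellitus present."""
    return age > 70 and has_diabetes

def node_accuracy(records, rule, predicted_class):
    """Return (accuracy, node size) for the records that satisfy the rule."""
    reached = [outcome for age, dm, outcome in records if rule(age, dm)]
    if not reached:
        return None, 0
    hits = sum(outcome == predicted_class for outcome in reached)
    return hits / len(reached), len(reached)

acc, n = node_accuracy(records, rule, predicted_class=True)
print(f"node size = {n}, node accuracy = {acc:.2f}")
```

Node size matters when reading such rules: a high accuracy on a small subgroup is less stable than the same accuracy on a large one.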
Performance of the RUTI prediction models after hospitalization for UTI, estimated by fivefold cross-validation (sample size = 809). Abbreviations: UTI, urinary tract infection; RUTI, recurrent urinary tract infection.
| Algorithm | Accuracy (mean) | Accuracy (SD) | Sensitivity (mean) | Sensitivity (SD) | Specificity (mean) | Specificity (SD) |
|---|---|---|---|---|---|---|
| Logistic regression | 0.604 | 0.026 | 0.590 | 0.065 | 0.605 | 0.034 |
| Decision tree | 0.635 | 0.052 | 0.600 | 0.061 | 0.640 | 0.057 |
| Random forest | 0.709 | 0.047 | 0.620 | 0.057 | 0.722 | 0.058 |
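Each row reports the mean and standard deviation of a metric over the five validation folds. The sketch below shows one way to build five folds and summarize fold-level accuracy. It assumes stratified splitting (the source does not say whether folds were stratified), uses a made-up class split of 100 positive and 709 negative samples, and scores a trivial always-negative classifier rather than any of the study's models. `stratified_folds` is a hypothetical helper.

```python
import random
from statistics import mean, stdev

def stratified_folds(labels, k, rng):
    """Assign each sample index to one of k folds, balancing classes across folds."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices round-robin within each class
    return folds

rng = random.Random(0)
# Hypothetical labels: 809 samples, 100 positive (not the study's actual split).
labels = [1] * 100 + [0] * 709
folds = stratified_folds(labels, k=5, rng=rng)

def always_negative(_index):
    # Trivial stand-in classifier; the study's models are not reproduced here.
    return 0

fold_accuracy = [
    sum(always_negative(i) == labels[i] for i in fold) / len(fold) for fold in folds
]
# stdev is the sample standard deviation; the source does not say which form was used.
print(f"mean = {mean(fold_accuracy):.3f}, SD = {stdev(fold_accuracy):.3f}")
```

With imbalanced classes even a trivial classifier can reach high accuracy, which is one reason to read sensitivity and specificity alongside the accuracy column.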
Figure 3. Variable importance plot of the second-stage RF analysis, expressed as the percentage mean decrease in accuracy for each factor. Cefixime (Anti7), afa (Gene11), usp (Gene8), and cefazolin (Anti5) are important factors for predicting recurrence after hospitalization (sample size = 809).
Figure 4. Decision rules of the DT analysis for development of RUTI after hospitalization (sample size = 809). The 4 green boxes and 3 red boxes mark the decision-rule nodes with an accuracy above 0.85 for non-RUTI classification and above 0.70 for RUTI classification, respectively.