Yuhui He1, Panxin Peng2, Wenwei Ying1, Qinwei Wang3, Yan Wang3, Xiankui Liu4, Wenhui Song5, Yue Gao5, Peizhe Li2, Jie Wang1, Weijie Zhu1, Wenzhi Gao1, Xiaofeng Zhou2, Xuesong Li1, Liqun Zhou1.
Abstract
Background: Quick and accurate identification of urinary calculi patients with positive urinary cultures is critical to the choice of the treatment strategy. Predictive models based on machine learning algorithms provide a new way to solve this problem. This study aims to determine the predictive value of machine learning algorithms using a urine culture predictive model based on patients with urinary calculi.
Keywords: Logistic regression; bacteriuria; early diagnosis; machine learning; urinary calculi
Year: 2022 PMID: 35280663 PMCID: PMC8899151 DOI: 10.21037/tau-21-780
Source DB: PubMed Journal: Transl Androl Urol ISSN: 2223-4683
Figure 1 Flowchart of the study. CJFH, China-Japan Friendship Hospital; TFAHZU, The First Affiliated Hospital of Zhengzhou University; TFHCMU, The First Affiliated Hospital of China Medical University; TFCH, Tianjin First Central Hospital; GBDT, gradient boosting decision tree. *, for midstream urine cultures growing two or more bacterial species or <10⁴ CFU/mL, the physician judged whether the result was contaminated.
Correlation between urine culture results and main clinical characteristics in 2,054 urinary calculi patients from four clinical centers
| Main clinical characteristic | Number of patients | Urine culture positive (%) | Urine culture negative (%) | P |
|---|---|---|---|---|
| Total | 2,054 | 456 (22.2%) | 1,598 (77.8%) | |
| General information | ||||
| Sex | ||||
| Male | 1,355 | 194 (14.3%) | 1,161 (85.7%) | <0.01* |
| Age | ||||
| <60 | 1,454 | 275 (18.9%) | 1,179 (81.1%) | <0.01* |
| BMI# | ||||
| <23 | 454 | 87 (19.2%) | 367 (80.8%) | 0.08 |
| Past history | ||||
| Hypertension | ||||
| Yes | 599 | 159 (26.5%) | 440 (73.5%) | <0.01* |
| Diabetes | ||||
| Yes | 300 | 81 (27.0%) | 219 (73.0%) | 0.03* |
| Coronary heart disease | ||||
| Yes | 98 | 34 (34.7%) | 64 (65.3%) | <0.01* |
| History of abdominal/pelvic surgery | ||||
| Yes | 260 | 69 (26.5%) | 191 (73.5%) | 0.07 |
| History of cerebrovascular disease | ||||
| Yes | 43 | 9 (20.9%) | 34 (79.1%) | 0.84 |
| Malformation of urinary system | ||||
| Yes | 120 | 30 (25.0%) | 90 (75.0%) | 0.45 |
| Personal history | ||||
| Smoking | ||||
| Yes | 552 | 146 (26.4%) | 406 (73.6%) | <0.01* |
| Drinking | ||||
| Yes | 130 | 27 (20.8%) | 103 (79.2%) | 0.69 |
*, P<0.05; #, BMI was categorized according to national survey cut-offs to fit the Chinese population (20). BMI, body mass index. P values were calculated with chi-square tests.
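The chi-square tests behind the P column can be reproduced from the table itself. A minimal pure-Python sketch for the Sex row follows; the male counts come straight from the table, while the female counts are not printed in this extract and are derived here by subtraction from the totals (699 patients, 262 positive, 437 negative).

```python
# Pearson chi-square test on a 2x2 contingency table (no continuity correction).

def chi_square_2x2(table):
    """Return the Pearson chi-square statistic for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # under independence
            stat += (observed - expected) ** 2 / expected
    return stat

# Rows: male, female; columns: culture positive, culture negative.
sex_table = [[194, 1161], [262, 437]]
stat = chi_square_2x2(sex_table)
# With 1 degree of freedom, the critical value for P<0.01 is 6.635, so a
# statistic of this size is consistent with the reported P<0.01.
print(f"chi-square = {stat:.1f}")  # ~143.3
```

The same function applies to any other binary row of the table once the complementary counts are recovered from the totals.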
Figure 2 Visualization of the models included in this study. (A) Nomogram predicting the risk of a positive urine culture, based on the logistic regression algorithm. (B) Schematic of the AdaBoost algorithm. (C,D) Partial visualizations of leaves from the random forest and GBDT models. Note that each visualization shows only part of the leaves or decision-tree logic and does not represent the entire model. GBDT, gradient boosting decision tree; BACT, bacteriuria; WBC, white blood cell; NIT, nitrite; Cr, creatinine; UA, uric acid; eGFR, estimated glomerular filtration rate; SG, specific gravity.
Figure 3 Important features in the machine learning models and typical examples of prediction performance. (A) Top 10 important features of each machine learning model. (B) Typical receiver operating characteristic curves of the four models in test No. 3 with a ratio of 6:4. (C) Confusion matrices of the four models in test No. 3 with a ratio of 6:4. No., number.
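The AUC values summarized from curves like those in Figure 3B can also be computed without plotting, since AUC equals the probability that a randomly chosen culture-positive patient receives a higher predicted score than a randomly chosen culture-negative one (the Mann-Whitney formulation). A minimal sketch with invented scores, not the authors' code:

```python
# AUC as the fraction of positive/negative score pairs ranked correctly.

def auc_from_scores(pos_scores, neg_scores):
    """Pairwise AUC: a correctly ordered pair counts 1, a tie counts 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Toy predicted probabilities of a positive urine culture (hypothetical).
pos = [0.9, 0.4]  # culture-positive patients
neg = [0.6, 0.2]  # culture-negative patients
print(auc_from_scores(pos, neg))  # 0.75: 3 of the 4 pairs are ranked correctly
```

The quadratic pairwise loop is fine for illustration; production code would use a rank-based formula or a library routine instead.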
Performance of each model on validation data
| Models | AUC, average | AUC, 95% CI | MCC, average | MCC, 95% CI | F1-score, average | F1-score, 95% CI | Additive NRI*, average | Additive NRI*, 95% CI | Absolute NRI*, average | Absolute NRI*, 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|
| Logistic regression | 0.761 | 0.753–0.770 | 0.335 | 0.322–0.348 | 0.501 | 0.490–0.512 | – | – | – | – |
| Random forest | 0.790 | 0.782–0.798 | 0.406 | 0.395–0.417 | 0.530 | 0.521–0.540 | 0.020 | 0.004–0.035 | 0.065 | 0.057–0.065 |
| Adaboost | 0.779 | 0.766–0.791 | 0.385 | 0.369–0.400 | 0.536 | 0.523–0.549 | 0.051 | 0.036–0.159 | 0.023 | 0.016–0.030 |
| GBDT | 0.831 | 0.823–0.840 | 0.460 | 0.446–0.475 | 0.588 | 0.575–0.601 | 0.124 | 0.106–0.142 | 0.065 | 0.060–0.069 |
*, additive NRI and absolute NRI were calculated by comparing each machine learning model to the logistic regression model. AUC, area under the curve; 95% CI, 95% confidence interval; MCC, Matthews correlation coefficient; NRI, net reclassification index; GBDT, gradient boosting decision tree.
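Both the MCC and the F1-score reported above are derived directly from a confusion matrix such as those shown in Figure 3C. A minimal sketch of the standard formulas; the counts below are invented for illustration and are not taken from the paper:

```python
import math

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; defined as 0 when any margin is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def f1_score(tp, fp, fn):
    """F1-score: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion-matrix counts (positive class = positive urine culture).
tp, fp, fn, tn = 60, 40, 30, 280
print(f"MCC = {mcc(tp, fp, fn, tn):.3f}, F1 = {f1_score(tp, fp, fn):.3f}")
```

Unlike the F1-score, the MCC uses all four cells of the confusion matrix, which makes it more informative on imbalanced data like this cohort (22.2% positive cultures) and is presumably why both metrics are reported side by side.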