Kyoung Hwa Lee1, Jae June Dong2, Subin Kim1, Dayeong Kim1, Jong Hoon Hyun1, Myeong-Hun Chae3, Byeong Soo Lee3, Young Goo Song1.
Abstract
Early detection of bacteremia is important to prevent antibiotic abuse. Therefore, we aimed to develop a clinically applicable bacteremia prediction model using machine learning technology. Data from two tertiary medical centers' electronic medical records during a 12-year period were extracted. Multi-layer perceptron (MLP), random forest, and gradient boosting algorithms were applied for machine learning analysis. Clinical data within 12 and 24 hours of blood culture were analyzed and compared. Out of 622,771 blood cultures, 38,752 episodes of bacteremia were identified. In MLP with 128 hidden layer nodes, the area under the receiver operating characteristic curve (AUROC) of the prediction performance in 12- and 24-h data models was 0.762 (95% confidence interval (CI); 0.7617-0.7623) and 0.753 (95% CI; 0.7520-0.7529), respectively. In the causative-pathogen subgroup analysis, the AUROC for Acinetobacter baumannii bacteremia was the highest at 0.839 (95% CI; 0.8388-0.8394). Compared to primary bacteremia, AUROC of sepsis caused by pneumonia was highest. Predictive performance of bacteremia was superior in younger age groups. Bacteremia prediction using machine learning technology appeared possible for acute infectious diseases. The model was especially suitable for pneumonia caused by Acinetobacter baumannii. From the 24-h blood culture data, bacteremia was predictable by substituting only the continuously variable values.
Keywords: bacteremia; data extraction time; machine learning; prediction
Year: 2022 PMID: 35054269 PMCID: PMC8774637 DOI: 10.3390/diagnostics12010102
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Figure 1. Flow chart of study population.
Performance of predicting bacteremia using various machine learning methods.
| Type of Data | Model | AUROC (95% CI) | Sensitivity | Specificity |
|---|---|---|---|---|
| 12 h | MLP | 0.762 (0.7617–0.7623) | 0.695 | 0.706 |
| 12 h | Random Forest | 0.758 (0.7572–0.7591) | 0.664 | 0.723 |
| 12 h | XGBoost (Gbtree) | 0.745 (0.7446–0.7455) | 0.629 | 0.747 |
| 12 h | XGBoost (DART) | 0.744 (0.7439–0.7446) | 0.638 | 0.747 |
| 24 h | MLP | 0.753 (0.7520–0.7529) | 0.602 | 0.730 |
| 24 h | Random Forest | 0.738 (0.7383–0.7401) | 0.643 | 0.729 |
| 24 h | XGBoost (Gbtree) | 0.730 (0.7300–0.7304) | 0.607 | 0.729 |
| 24 h | XGBoost (DART) | 0.727 (0.7256–0.7275) | 0.602 | 0.702 |
Clinical data were extracted within 12 or 24 h of onset of blood culture. Abbreviations: AUROC, area under the receiver operating characteristic curve; CI, confidence interval; MLP, multi-layer perceptron; DART, Dropouts meet multiple Additive Regression Trees.
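To make the three reported metrics concrete, the following is a minimal sketch of how AUROC, sensitivity, and specificity can be computed from predicted scores. It uses the pairwise (Mann-Whitney) definition of AUROC; the function names, the toy labels and scores, and the 0.5 threshold are illustrative assumptions and are not taken from the study, whose actual evaluation code and cut-offs are not given here.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive case receives the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(labels, scores, threshold):
    """Classify score >= threshold as positive and return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1:
            if predicted_positive:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data: 1 = bacteremia, 0 = no bacteremia.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.35, 0.7, 0.4, 0.3, 0.2, 0.1]

if __name__ == "__main__":
    print("AUROC:", auroc(toy_labels, toy_scores))
    print("sens, spec:", sensitivity_specificity(toy_labels, toy_scores, 0.5))
```

AUROC does not depend on a threshold, whereas sensitivity and specificity do, which is why two models with similar AUROC in the table can show different sensitivity and specificity pairs.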
Influence ranking of clinical variables to bacteremia prediction.
| Rank | Data Fusion within 12-h | Data Fusion within 24-h |
|---|---|---|
| 1 | Monocyte | Monocyte |
| 2 | Platelet | Neutrophil |
| 3 | Hospital stay * | Platelet |
| 4 | Neutrophil | Albumin |
| 5 | T. bilirubin | ALP |
| 6 | BUN | T. bilirubin |
| 7 | Albumin | tCO2 |
| 8 | tCO2 | BUN |
| 9 | AST | Hospital stay * |
| 10 | ALP | CRP |
| 11 | ALT | Total Protein |
| 12 | White blood cell count | Creatinine |
| 13 | Chloride | ALT |
| 14 | aPTT | Pulse rate |
| 15 | Total Protein | Prothrombin time |
| 16 | Pulse rate | Hemoglobin |
| 17 | Respiratory rate | AST |
| 18 | DBP | Sodium |
| 19 | Creatinine | Chloride |
| 20 | CRP | ESR |
* Hospital stay refers to the length of hospitalization from first day of admission to the time of blood culture tests of the study subjects. Clinical data were integrated based on blood culture time points. Abbreviations: T. bilirubin, total bilirubin; ALP, alkaline phosphatase; BUN, blood urea nitrogen; AST, aspartate transaminase; CRP, C-reactive protein; ALT, alanine transaminase; aPTT, activated partial thromboplastin time; DBP, diastolic blood pressure; ESR, erythrocyte sedimentation rate.
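The two rankings can be compared mechanically. The sketch below transcribes both columns of the table and reports which variables are shared, which appear in only one model, and how far each shared variable moves between the 12-h and 24-h rankings. The helper name `compare_rankings` is hypothetical, and the comparison is plain bookkeeping on the published ranks, not an analysis performed in the source.

```python
# Top-20 variables, in rank order, transcribed from the table above.
RANK_12H = [
    "Monocyte", "Platelet", "Hospital stay", "Neutrophil", "T. bilirubin",
    "BUN", "Albumin", "tCO2", "AST", "ALP", "ALT", "White blood cell count",
    "Chloride", "aPTT", "Total Protein", "Pulse rate", "Respiratory rate",
    "DBP", "Creatinine", "CRP",
]
RANK_24H = [
    "Monocyte", "Neutrophil", "Platelet", "Albumin", "ALP", "T. bilirubin",
    "tCO2", "BUN", "Hospital stay", "CRP", "Total Protein", "Creatinine",
    "ALT", "Pulse rate", "Prothrombin time", "Hemoglobin", "AST", "Sodium",
    "Chloride", "ESR",
]


def compare_rankings(first, second):
    """Return (shared, only_first, only_second, shifts).

    shifts maps each shared variable to (rank in second) - (rank in first),
    so a negative value means the variable ranks higher in the second list.
    """
    pos_first = {name: i + 1 for i, name in enumerate(first)}
    pos_second = {name: i + 1 for i, name in enumerate(second)}
    shared = [name for name in first if name in pos_second]
    only_first = [name for name in first if name not in pos_second]
    only_second = [name for name in second if name not in pos_first]
    shifts = {name: pos_second[name] - pos_first[name] for name in shared}
    return shared, only_first, only_second, shifts


if __name__ == "__main__":
    shared, only_12h, only_24h, shifts = compare_rankings(RANK_12H, RANK_24H)
    print("shared:", len(shared))
    print("12-h only:", only_12h)
    print("24-h only:", only_24h)
    for name, delta in sorted(shifts.items(), key=lambda kv: kv[1]):
        print(f"{name}: {delta:+d}")
```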
Subgroup analysis of bacteremia prediction according to causative pathogen, infection site, age, and sex.
| Type of Data | Subgroup | Category | With Bacteremia | Without Bacteremia | AUROC (95% CI) | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| 12 h | Pathogen | E. coli | 1805 | 14,068 | 0.794 (0.7928–0.7946) | 0.693 | 0.766 |
| 12 h | Pathogen | S. aureus | 1827 | 14,068 | 0.672 (0.6717–0.6741) | 0.656 | 0.618 |
| 12 h | Pathogen | K. pneumoniae | 1518 | 14,068 | 0.763 (0.7616–0.7658) | 0.677 | 0.716 |
| 12 h | Pathogen | A. baumannii | 1727 | 14,068 | 0.839 (0.8388–0.8394) | 0.789 | 0.750 |
| 12 h | Pathogen | P. aeruginosa | 855 | 14,068 | 0.729 (0.7278–0.7331) | 0.611 | 0.706 |
| 12 h | Infection site | Urine | 2202 | 21,540 | 0.749 (0.7485–0.7504) | 0.642 | 0.725 |
| 12 h | Infection site | Sputum | 3051 | 21,540 | 0.822 (0.8217–0.8229) | 0.792 | 0.715 |
| 12 h | Infection site | Bile | 650 | 21,540 | 0.775 (0.7742–0.7764) | 0.739 | 0.684 |
| 12 h | Infection site | Primary bacteremia | 557 | 21,540 | 0.561 (0.5572–0.5651) | 0.636 | 0.473 |
| 12 h | Age | 18–39 years | 1377 | 3885 | 0.781 (0.7780–0.7868) | 0.3923 | 0.8898 |
| 12 h | Age | 40–59 years | 5141 | 8359 | 0.718 (0.7152–0.7190) | 0.5818 | 0.7382 |
| 12 h | Age | 60–80 years | 8936 | 14,862 | 0.761 (0.7594–0.7622) | 0.7405 | 0.6597 |
| 12 h | Sex | Male | 9416 | 117,535 | 0.763 (0.7630–0.7636) | 0.710 | 0.694 |
| 12 h | Sex | Female | 6038 | 81,236 | 0.759 (0.7586–0.7600) | 0.670 | 0.724 |
| 24 h | Pathogen | E. coli | 2771 | 22,114 | 0.753 (0.7523–0.7353) | 0.639 | 0.738 |
| 24 h | Pathogen | S. aureus | 2949 | 22,114 | 0.737 (0.7354–0.7376) | 0.668 | 0.684 |
| 24 h | Pathogen | K. pneumoniae | 2376 | 22,114 | 0.778 (0.7777–0.7795) | 0.706 | 0.730 |
| 24 h | Pathogen | A. baumannii | 2621 | 22,114 | 0.840 (0.8400–0.8407) | 0.817 | 0.747 |
| 24 h | Pathogen | P. aeruginosa | 1280 | 22,114 | 0.718 (0.7170–0.7209) | 0.688 | 0.661 |
| 24 h | Infection site | Urine | 3489 | 34,502 | 0.787 (0.7860–0.7878) | 0.751 | 0.689 |
| 24 h | Infection site | Sputum | 4700 | 34,502 | 0.805 (0.8041–0.8052) | 0.670 | 0.756 |
| 24 h | Infection site | Bile | 1022 | 34,502 | 0.792 (0.7910–0.7936) | 0.659 | 0.740 |
| 24 h | Infection site | Primary bacteremia | 987 | 7169 | 0.583 (0.5823–0.5855) | 0.517 | 0.616 |
| 24 h | Age | 18–39 years | 413 | 5712 | 0.758 (0.7554–0.7648) | 0.463 | 0.845 |
| 24 h | Age | 40–59 years | 1415 | 12,415 | 0.719 (0.7176–0.7214) | 0.566 | 0.749 |
| 24 h | Age | 60–80 years | 3026 | 21,154 | 0.723 (0.7221–0.7238) | 0.560 | 0.713 |
| 24 h | Sex | Male | 14,585 | 241,478 | 0.748 (0.7473–0.7481) | 0.673 | 0.707 |
| 24 h | Sex | Female | 10,030 | 164,521 | 0.760 (0.7597–0.7608) | 0.619 | 0.760 |
| 24 h | Merge | Under 12 h | 323,949 | 20,608 | 0.726 (0.7261–0.7266) | 0.631 | 0.705 |
| 24 h | Merge | Over 12 h | 82,050 | 3957 | 0.726 (0.7253–0.7258) | 0.653 | 0.679 |
Clinical data were integrated based on blood culture time points. Abbreviations: AUROC, area under the receiver operating characteristic curve; CI, confidence interval; E. coli, Escherichia coli; S. aureus, Staphylococcus aureus; K. pneumoniae, Klebsiella pneumoniae; A. baumannii, Acinetobacter baumannii; P. aeruginosa, Pseudomonas aeruginosa.
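Sensitivity and specificity alone do not say how often a positive prediction is correct; that also depends on how common bacteremia is in the subgroup. As a rough worked example, the sketch below derives prevalence, positive predictive value (PPV), and negative predictive value (NPV) from one row of the table: the 12-h pathogen row with AUROC 0.839, which the abstract attributes to Acinetobacter baumannii. It assumes the reported sensitivity and specificity apply directly to that row's case counts. The function name is hypothetical, and the resulting PPV and NPV are derived illustrations, not figures reported by the source.

```python
def predictive_values(sensitivity, specificity, n_with, n_without):
    """Derive (prevalence, PPV, NPV) from sensitivity, specificity, and group sizes.

    Expected cell counts are fractional because they are reconstructed from
    rounded rates rather than read from a confusion matrix.
    """
    true_pos = sensitivity * n_with
    false_neg = n_with - true_pos
    true_neg = specificity * n_without
    false_pos = n_without - true_neg
    prevalence = n_with / (n_with + n_without)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return prevalence, ppv, npv


if __name__ == "__main__":
    # Values from the 12-h pathogen row with AUROC 0.839:
    # 1727 with bacteremia, 14,068 without, sensitivity 0.789, specificity 0.750.
    prevalence, ppv, npv = predictive_values(0.789, 0.750, 1727, 14068)
    print(f"prevalence={prevalence:.3f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

Under these assumptions the PPV comes out far lower than the NPV, which is the expected pattern when positives make up only about a tenth of the subgroup.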
Figure 2. Area under the receiver operating characteristic curve of the bacteremia prediction. (A) Type of pathogen (12-h vs. 24-h model); (B) Site of infection (12-h vs. 24-h model); (C) Age (12-h vs. 24-h model); (D) Sex (12-h vs. 24-h model); (E) Merge hour (24-h model).