Zhixiao Xu1, Kun Guo1, Weiwei Chu1, Jingwen Lou1, Chengshui Chen1,2.
Abstract
Background: The ability to assess adverse outcomes in patients with community-acquired pneumonia (CAP) could improve clinical decision-making, but studies remain scarce, and few machine learning (ML) models have been developed. Objective: We aimed to explore how effectively ML models can predict adverse outcomes in CAP.
Keywords: CAP; XGBoost; adverse outcomes; community-acquired pneumonia; machine learning
Year: 2022 PMID: 35845426 PMCID: PMC9278327 DOI: 10.3389/fbioe.2022.903426
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Characteristics of the dataset used for machine learning before imputation.
| Characteristic | All | N |
|---|---|---|
| Clinical signs | | |
| Cough (%) | | 2,302 |
| No | 146 (6.34%) | |
| Yes | 2,156 (93.7%) | |
| Dyspnea, tachypnoea, or hypoxemia (%) | | 2,302 |
| No | 543 (23.6%) | |
| Yes | 1,759 (76.4%) | |
| Fever or hypothermia (%) | | 2,302 |
| No | 805 (35.0%) | |
| Yes | 1,497 (65.0%) | |
| Clinical characteristics | | |
| Age (year) (mean (SD)) | 63.3 (19.7) | 2,302 |
| Respiratory frequency (/min) (mean (SD)) | 28.0 (20.3) | 2,100 |
| Heart rate (/min) (mean (SD)) | 91.1 (16.5) | 2,125 |
| Systolic blood pressure (SBP) (mmHg) (mean (SD)) | 120 (24.5) | 2,134 |
| Diastolic blood pressure (DBP) (mmHg) (mean (SD)) | 73.6 (16.5) | 2,133 |
| CURB-65 (%) | | 1,449 |
| Uncertain/unknown | 130 (8.97%) | |
| 1 | 280 (19.3%) | |
| 2 | 562 (38.8%) | |
| 3 | 386 (26.6%) | |
| 4 | 58 (4.00%) | |
| 5 | 33 (2.28%) | |
| Laboratory tests | | |
| Hematocrit values (%) (mean (SD)) | 37.4 (11.0) | 1,790 |
| Hemoglobin values (g/dl) (mean (SD)) | 12.6 (7.02) | 1,765 |
| Leukocytes values (10^9/L) (mean (SD)) | 14.6 (76.2) | 1,811 |
| Segmented neutrophils values (%) (mean (SD)) | 79.1 (12.8) | 1,673 |
| Platelet values (10^9/L) (mean (SD)) | 879 (23,412) | 1,377 |
| Creatinine (mg/dl) (mean (SD)) | 1.56 (4.55) | 1,434 |
| BUN (mg/dl) (mean (SD)) | 53.0 (38.8) | 1,535 |
| Glucose (mg/dl) (mean (SD)) | 137 (76.0) | 1,377 |
| Comorbidity | | |
| Chronic obstructive pulmonary disease (COPD) (%) | | 2,302 |
| Uncertain/unknown | 59 (2.56%) | |
| No | 1,898 (82.5%) | |
| Yes | 345 (15.0%) | |
| Heart disease (%) | | 2,302 |
| Uncertain/unknown | 25 (1.09%) | |
| No | 1,288 (56.0%) | |
| Yes | 989 (43.0%) | |
| Diabetes (%) | | 2,302 |
| Uncertain/unknown | 16 (0.70%) | |
| No | 1,924 (83.6%) | |
| Yes | 362 (15.7%) | |
| Immunosuppression (%) | | 2,302 |
| Uncertain/unknown | 12 (0.52%) | |
| No | 2,151 (93.4%) | |
| Yes | 139 (6.04%) | |
| Malignancy (%) | | 2,302 |
| Uncertain/unknown | 12 (0.52%) | |
| No | 2,171 (94.3%) | |
| Yes | 119 (5.17%) | |
| Cerebrovascular disease (CBVD) (%) | | 2,302 |
| Uncertain/unknown | 15 (0.65%) | |
| No | 2,126 (92.4%) | |
| Yes | 161 (6.99%) | |
| Kidney disease (%) | | 2,302 |
| Uncertain/unknown | 15 (0.65%) | |
| No | 2,130 (92.5%) | |
| Yes | 157 (6.82%) | |
| Liver disease (%) | | 2,302 |
| Uncertain/unknown | 9 (0.39%) | |
| No | 2,234 (97.0%) | |
| Yes | 59 (2.56%) | |
| Intravenous drug use (%) | | 2,302 |
| Uncertain/unknown | 8 (0.35%) | |
| No | 2,284 (99.2%) | |
| Yes | 10 (0.43%) | |
| Alcoholism (%) | | 2,302 |
| Uncertain/unknown | 30 (1.30%) | |
| No | 2,134 (92.7%) | |
| Yes | 138 (5.99%) | |
| Neurological psychiatric disorder (%) | | 2,302 |
| Uncertain/unknown | 33 (1.43%) | |
| No | 1,927 (83.7%) | |
| Yes | 342 (14.9%) | |
| Suspected aspiration (%) | | 2,302 |
| Uncertain/unknown | 20 (0.87%) | |
| No | 2,223 (96.6%) | |
| Yes | 59 (2.56%) | |
| Hospitalization due to CAP in previous year (%) | | 2,302 |
| Uncertain/unknown | 9 (0.39%) | |
| No | 2,004 (87.1%) | |
| Yes | 289 (12.6%) | |
| Overcrowding (%) | | 2,302 |
| Uncertain/unknown | 39 (1.69%) | |
| No | 2,211 (96.0%) | |
| Yes | 52 (2.26%) | |
| Smoking (%) | | 2,302 |
| Uncertain/unknown | 175 (7.60%) | |
| No | 1,284 (55.8%) | |
| Yes | 843 (36.6%) | |
| Received flu shot in the last 12 months (%) | | 2,302 |
| Uncertain/unknown | 34 (1.48%) | |
| No | 1,559 (67.7%) | |
| Yes | 709 (30.8%) | |
| Received antipneumococcic vaccine at any given time (%) | | 2,302 |
| Uncertain/unknown | 30 (1.30%) | |
| No | 1,869 (81.2%) | |
| Yes | 403 (17.5%) | |
| Outcomes | | |
| Hospital admission (%) | | 2,302 |
| Uncertain/unknown | 2 (0.09%) | |
| No | 735 (31.9%) | |
| Yes | 1,565 (68.0%) | |
| Death (%) | | 2,302 |
| Uncertain/unknown | 21 (0.91%) | |
| No | 2,004 (87.1%) | |
| Yes | 277 (12.0%) | |
| ICU admission (%) | | 2,302 |
| Uncertain/unknown | 72 (3.13%) | |
| No | 1,887 (82.0%) | |
| Yes | 343 (14.9%) | |
| One-year post-enrollment status (%) | | 2,302 |
| Uncertain/unknown | 144 (6.26%) | |
| No | 537 (23.3%) | |
| Yes | 1,621 (70.4%) | |
Patient characteristics according to hospital admission stratification.
| Characteristic | Hospital admission: No | Hospital admission: Yes | p-value |
|---|---|---|---|
| Cough (%) | | | 0.001 |
| No | 28 (3.81%) | 118 (7.54%) | |
| Yes | 707 (96.2%) | 1,447 (92.5%) | |
| Dyspnea, tachypnoea, or hypoxemia (%) | | | <0.001 |
| No | 250 (34.0%) | 292 (18.7%) | |
| Yes | 485 (66.0%) | 1,273 (81.3%) | |
| Fever or hypothermia (%) | | | <0.001 |
| No | 205 (27.9%) | 600 (38.3%) | |
| Yes | 530 (72.1%) | 965 (61.7%) | |
| Age (year) (mean (SD)) | 54.7 (19.6) | 67.4 (18.4) | <0.001 |
| Respiratory frequency (/min) (mean (SD)) | 29.6 (23.9) | 27.3 (16.7) | 0.019 |
| Heart rate (/min) (mean (SD)) | 89.0 (12.1) | 92.0 (17.3) | <0.001 |
| SBP (mmHg) (mean (SD)) | 120 (18.2) | 120 (25.7) | 0.944 |
| DBP (mmHg) (mean (SD)) | 77.0 (16.6) | 72.0 (15.2) | <0.001 |
| CURB-65 (mean (SD)) | 2.16 (0.36) | 2.28 (0.80) | <0.001 |
| Hematocrit values (%) (mean (SD)) | 38.1 (3.23) | 37.1 (11.5) | 0.001 |
| Hemoglobin values (g/dl) (mean (SD)) | 13.2 (7.72) | 12.4 (5.23) | 0.005 |
| Leukocytes values (10^9/L) (mean (SD)) | 13.5 (3.37) | 15.1 (82.0) | 0.425 |
| Segmented neutrophils values (%) (mean (SD)) | 79.2 (5.72) | 79.1 (12.6) | 0.796 |
| Platelet values (10^9/L) (mean (SD)) | 1,861 (32,030) | 418 (296) | 0.222 |
| Creatinine (mg/dl) (mean (SD)) | 1.48 (0.54) | 1.60 (4.34) | 0.257 |
| BUN (mg/dl) (mean (SD)) | 50.1 (15.0) | 54.3 (37.0) | <0.001 |
| Glucose (mg/dl) (mean (SD)) | 134 (30.5) | 138 (68.1) | 0.105 |
| COPD (%) | | | <0.001 |
| Uncertain/unknown | 19 (2.59%) | 39 (2.49%) | |
| No | 645 (87.8%) | 1,252 (80.0%) | |
| Yes | 71 (9.66%) | 274 (17.5%) | |
| Heart disease (%) | | | <0.001 |
| Uncertain/unknown | 2 (0.27%) | 23 (1.47%) | |
| No | 508 (69.1%) | 780 (49.8%) | |
| Yes | 225 (30.6%) | 762 (48.7%) | |
| Diabetes (%) | | | <0.001 |
| Uncertain/unknown | 6 (0.82%) | 10 (0.64%) | |
| No | 662 (90.1%) | 1,260 (80.5%) | |
| Yes | 67 (9.12%) | 295 (18.8%) | |
| Immunosuppression (%) | | | 0.001 |
| Uncertain/unknown | 2 (0.27%) | 10 (0.64%) | |
| No | 707 (96.2%) | 1,442 (92.1%) | |
| Yes | 26 (3.54%) | 113 (7.22%) | |
| Malignancy (%) | | | <0.001 |
| Uncertain/unknown | 5 (0.68%) | 7 (0.45%) | |
| No | 717 (97.6%) | 1,452 (92.8%) | |
| Yes | 13 (1.77%) | 106 (6.77%) | |
| CBVD (%) | | | <0.001 |
| Uncertain/unknown | 3 (0.41%) | 12 (0.77%) | |
| No | 712 (96.9%) | 1,412 (90.2%) | |
| Yes | 20 (2.72%) | 141 (9.01%) | |
| Kidney disease (%) | | | <0.001 |
| Uncertain/unknown | 6 (0.82%) | 9 (0.58%) | |
| No | 708 (96.3%) | 1,421 (90.8%) | |
| Yes | 21 (2.86%) | 135 (8.63%) | |
| Liver disease (%) | | | 0.009 |
| Uncertain/unknown | 5 (0.68%) | 4 (0.26%) | |
| No | 720 (98.0%) | 1,512 (96.6%) | |
| Yes | 10 (1.36%) | 49 (3.13%) | |
| Intravenous drug use (%) | | | 0.252 |
| Uncertain/unknown | 4 (0.54%) | 3 (0.19%) | |
| No | 729 (99.2%) | 1,554 (99.3%) | |
| Yes | 2 (0.27%) | 8 (0.51%) | |
| Alcoholism (%) | | | 0.001 |
| Uncertain/unknown | 7 (0.95%) | 23 (1.47%) | |
| No | 703 (95.6%) | 1,429 (91.3%) | |
| Yes | 25 (3.40%) | 113 (7.22%) | |
| Neurological psychiatric disorder (%) | | | <0.001 |
| Uncertain/unknown | 5 (0.68%) | 28 (1.79%) | |
| No | 699 (95.1%) | 1,227 (78.4%) | |
| Yes | 31 (4.22%) | 310 (19.8%) | |
| Suspected aspiration (%) | | | <0.001 |
| Uncertain/unknown | 4 (0.54%) | 16 (1.02%) | |
| No | 729 (99.2%) | 1,492 (95.3%) | |
| Yes | 2 (0.27%) | 57 (3.64%) | |
| Hospitalization due to CAP in previous year (%) | | | <0.001 |
| Uncertain/unknown | 3 (0.41%) | 6 (0.38%) | |
| No | 685 (93.2%) | 1,318 (84.2%) | |
| Yes | 47 (6.39%) | 241 (15.4%) | |
| Overcrowding (%) | | | 0.797 |
| Uncertain/unknown | 14 (1.90%) | 24 (1.53%) | |
| No | 705 (95.9%) | 1,505 (96.2%) | |
| Yes | 16 (2.18%) | 36 (2.30%) | |
| Smoking (%) | | | 0.009 |
| Uncertain/unknown | 39 (5.31%) | 136 (8.69%) | |
| No | 432 (58.8%) | 850 (54.3%) | |
| Yes | 264 (35.9%) | 579 (37.0%) | |
| Received flu shot in the last 12 months (%) | | | 0.092 |
| Uncertain/unknown | 5 (0.68%) | 29 (1.85%) | |
| No | 504 (68.6%) | 1,054 (67.3%) | |
| Yes | 226 (30.7%) | 482 (30.8%) | |
| Received antipneumococcic vaccine at any given time (%) | | | <0.001 |
| Uncertain/unknown | 4 (0.54%) | 26 (1.66%) | |
| No | 633 (86.1%) | 1,234 (78.8%) | |
| Yes | 98 (13.3%) | 305 (19.5%) | |
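
The table does not say which statistical test produced its p-values. As a rough cross-check only, and assuming a Pearson chi-square test without continuity correction for the categorical rows (an assumption, not something stated here), the cough row can be recomputed from its four counts. The helper name `chi_square_2x2` is illustrative.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Cough (rows: No, Yes) by hospital admission (columns: No, Yes), counts from the table.
cough_by_admission = [[28, 118], [707, 1447]]
chi2, p_value = chi_square_2x2(cough_by_admission)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Under this assumption the p-value falls well below 0.05, in line with the 0.001 reported for cough. An exact match is not expected if the authors used a continuity correction or an exact test.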
Diagnostic accuracy for the nine machine learning algorithms with the test dataset for the prediction of hospital admission and ICU admission.
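| Model | Hospital admission: Accuracy score | Hospital admission: Precision score | Hospital admission: Recall score | Hospital admission: F1-score | Hospital admission: AUC | ICU admission: Accuracy score | ICU admission: Precision score | ICU admission: Recall score | ICU admission: F1-score | ICU admission: AUC |
|---|---|---|---|---|---|---|---|---|---|---|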
| Ridge | 0.799 | 0.816 | 0.902 | 0.857 | 0.836 | 0.846 | 0.444 | 0.039 | 0.072 | 0.711 |
| DT | 0.854 | 0.857 | 0.937 | 0.895 | 0.892 | 0.848 | 0.500 | 0.098 | 0.164 | 0.745 |
| RF | 0.857 | 0.848 | 0.957 | 0.899 | 0.912 | 0.846 | 0.455 | 0.049 | 0.088 | 0.793 |
| XGB | 0.877 | 0.879 | 0.946 | 0.911 | 0.921 | 0.846 | 0.478 | 0.108 | 0.176 | 0.801 |
| KNN | 0.761 | 0.900 | 0.722 | 0.801 | 0.871 | 0.854 | 0.667 | 0.078 | 0.140 | 0.660 |
| NN | 0.832 | 0.856 | 0.900 | 0.877 | 0.883 | 0.839 | 0.312 | 0.049 | 0.085 | 0.694 |
| SVM | 0.810 | 0.833 | 0.896 | 0.863 | 0.861 | 0.854 | 0.833 | 0.049 | 0.093 | 0.759 |
| NB | 0.716 | 0.884 | 0.662 | 0.757 | 0.851 | 0.205 | 0.157 | 0.961 | 0.269 | 0.707 |
| LR | 0.778 | 0.822 | 0.852 | 0.837 | 0.817 | 0.831 | 0.359 | 0.137 | 0.199 | 0.686 |
DT, KNN, LR, NB, NN, RF, Ridge, SVM, and XGB denote decision tree, K-nearest neighbors, logistic regression without penalization, naive Bayes, neural network, random forest, ridge regression, support vector machine, and eXtreme gradient boosting, respectively.
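
The F1-score is the harmonic mean of precision and recall, so the F1 columns can be cross-checked against the other two. The sketch below does this for a few hospital-admission rows copied from the table above. The helper name `f1_from` is illustrative, and small gaps are expected because the inputs are rounded to three decimals.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) for hospital admission, copied from the table.
hospital_admission = {
    "Ridge": (0.816, 0.902, 0.857),
    "DT": (0.857, 0.937, 0.895),
    "RF": (0.848, 0.957, 0.899),
    "XGB": (0.879, 0.946, 0.911),
    "KNN": (0.900, 0.722, 0.801),
}

max_gap = 0.0
for model, (precision, recall, reported) in hospital_admission.items():
    recomputed = f1_from(precision, recall)
    # Track the largest disagreement between recomputed and reported F1.
    max_gap = max(max_gap, abs(recomputed - reported))
    print(f"{model}: recomputed {recomputed:.3f}, reported {reported:.3f}")
```

The same check explains the very low F1 values in the ICU admission columns: when recall is close to zero, the harmonic mean stays close to zero regardless of precision.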
FIGURE 1. Test accuracy of the nine machine learning algorithms for the prediction of hospital admission, ICU admission, death, and one-year post-enrollment status.
Diagnostic accuracy for the nine machine learning algorithms with the test dataset for the prediction of death and one-year post-enrollment status.
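| Model | Death: Accuracy score | Death: Precision score | Death: Recall score | Death: F1-score | Death: AUC | One-year post-enrollment status: Accuracy score | One-year post-enrollment status: Precision score | One-year post-enrollment status: Recall score | One-year post-enrollment status: F1-score | One-year post-enrollment status: AUC |
|---|---|---|---|---|---|---|---|---|---|---|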
| Ridge | 0.892 | 0.447 | 0.243 | 0.315 | 0.817 | 0.802 | 0.814 | 0.955 | 0.879 | 0.802 |
| DT | 0.876 | 0.368 | 0.300 | 0.331 | 0.807 | 0.775 | 0.801 | 0.930 | 0.861 | 0.749 |
| RF | 0.899 | 0.524 | 0.157 | 0.242 | 0.791 | 0.802 | 0.819 | 0.944 | 0.877 | 0.825 |
| XGB | 0.896 | 0.480 | 0.171 | 0.253 | 0.825 | 0.816 | 0.844 | 0.926 | 0.883 | 0.837 |
| KNN | 0.892 | 0.375 | 0.086 | 0.140 | 0.701 | 0.787 | 0.798 | 0.959 | 0.871 | 0.749 |
| NN | 0.893 | 0.465 | 0.286 | 0.354 | 0.831 | 0.810 | 0.843 | 0.918 | 0.879 | 0.810 |
| SVM | 0.895 | 0.469 | 0.214 | 0.294 | 0.763 | 0.804 | 0.827 | 0.934 | 0.877 | 0.802 |
| NB | 0.705 | 0.203 | 0.643 | 0.308 | 0.723 | 0.736 | 0.858 | 0.775 | 0.815 | 0.771 |
| LR | 0.896 | 0.488 | 0.286 | 0.360 | 0.789 | 0.804 | 0.842 | 0.909 | 0.874 | 0.808 |
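
Most models reach an accuracy near 0.89 for death while recall stays at or below 0.30, so it helps to compare accuracy with what a classifier would score by always predicting the more common class. The sketch below is a minimal illustration. It assumes that the whole-cohort outcome counts from the dataset characteristics table, with the Uncertain/unknown rows excluded, approximate the class balance of the test set, which is not stated here. The helper name `majority_baseline` is illustrative.

```python
# Outcome counts from the dataset characteristics table (Uncertain/unknown excluded).
outcome_counts = {
    "Hospital admission": {"No": 735, "Yes": 1565},
    "ICU admission": {"No": 1887, "Yes": 343},
    "Death": {"No": 2004, "Yes": 277},
    "One-year post-enrollment status": {"No": 537, "Yes": 1621},
}

def majority_baseline(counts):
    """Accuracy obtained by always predicting the most frequent class."""
    total = sum(counts.values())
    return max(counts.values()) / total

for outcome, counts in outcome_counts.items():
    print(f"{outcome}: majority-class accuracy = {majority_baseline(counts):.3f}")
```

Under this assumption, accuracy adds little for the imbalanced outcomes (ICU admission and death), where recall, F1-score, and AUC are the more informative columns.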
FIGURE 2. Feature importance plot for the eXtreme gradient boosting algorithm using test data for the prediction of hospital admission, ICU admission, death, and one-year post-enrollment status.