Jiru Ye¹, Meng Hua², Feng Zhu².
Abstract
BACKGROUND: Determining the risk that a patient will develop severe or critical COVID-19 is very important, but most existing risk prediction models are built with conventional regression methods. We aim to use machine learning algorithms to develop predictive models and to compare their predictive performance with that of logistic regression models.
Keywords: COVID-19; high sensitivity C-reactive protein; machine learning; prediction model; procalcitonin
Year: 2021 PMID: 34349576 PMCID: PMC8328384 DOI: 10.2147/RMHP.S318265
Source DB: PubMed Journal: Risk Manag Healthc Policy ISSN: 1179-1594
Comparison of the General Data Between the Two Groups
| Classification | Asymptomatic-Moderate Group | Severe and Above Group | P value |
|---|---|---|---|
| N | 132 | 29 | |
| Age (years) | 43.8 ± 15.0 | 59.6 ± 14.8 | <0.001 |
| Age group | | | <0.001 |
| <60 | 114 (86.4%) | 17 (58.6%) | |
| ≥60 | 18 (13.6%) | 12 (41.4%) | |
| Gender | | | 0.249 |
| Female | 61 (46.2%) | 10 (34.5%) | |
| Male | 71 (53.8%) | 19 (65.5%) | |
| Hypertension | 31 (23.5%) | 12 (41.4%) | 0.049 |
| Diabetes | 14 (10.6%) | 10 (34.5%) | 0.001 |
| Coronary heart disease | 3 (2.3%) | 2 (6.9%) | 0.194 |
| Cerebrovascular diseases | 0 (0.0%) | 2 (6.9%) | 0.002 |
| Tumor | 1 (0.8%) | 2 (6.9%) | 0.084 |
| HBV | 1 (0.8%) | 0 (0.0%) | 0.638 |
| Chronic renal disease | 1 (0.8%) | 1 (3.4%) | 0.239 |
| Chronic liver disease/cirrhosis | 3 (2.3%) | 0 (0.0%) | 0.412 |
| Alcoholism | 4 (3.0%) | 1 (3.4%) | 0.895 |
| Smoking | 5 (3.8%) | 2 (6.9%) | 0.639 |
Note: Values are mean ± SD or n (%).
Abbreviation: HBV, Hepatitis B virus.
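Several of the categorical P values above appear consistent with a Pearson chi-square test without continuity correction; for example, it reproduces the hypertension (0.049) and gender (0.249) rows. The source does not name the test, so this is an assumption, and rows with very small counts (such as tumor) may have been tested differently, for instance with Fisher's exact test. The sketch below uses only the Python standard library; the helper `pearson_chi2_2x2` is hypothetical.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test for a 2x2 table, without continuity correction.

    Layout (rows = groups, columns = characteristic present / absent):
        [[a, b],
         [c, d]]
    Returns (chi-square statistic, P value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


N_MILD, N_SEVERE = 132, 29  # group sizes from the table

# Hypertension: 31 of 132 vs 12 of 29
chi2_htn, p_htn = pearson_chi2_2x2(31, N_MILD - 31, 12, N_SEVERE - 12)
# Gender (female): 61 of 132 vs 10 of 29
chi2_sex, p_sex = pearson_chi2_2x2(61, N_MILD - 61, 10, N_SEVERE - 10)

print(f"Hypertension: chi2 = {chi2_htn:.3f}, P = {p_htn:.3f}")
print(f"Gender:       chi2 = {chi2_sex:.3f}, P = {p_sex:.3f}")
```

Run as-is, the two printed P values round to the table's 0.049 and 0.249.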
Comparison of Laboratory Indexes Between the Asymptomatic-Moderate Group and the Severe and Above Group
| Classification | Asymptomatic-Moderate Group | Severe and Above Group | P value |
|---|---|---|---|
| N | 132 | 29 | |
| WBC (10⁹/L) | 4.95 ± 1.55 | 5.50 ± 2.72 | 0.143 |
| Neur (%) | 59.84 ± 11.08 | 70.46 ± 16.12 | <0.001 |
| Lymr (%) | 29.38 ± 10.28 | 21.15 ± 12.56 | <0.001 |
| Monr (%) | 9.67 ± 3.26 | 7.51 ± 3.67 | 0.002 |
| Neuc (10⁹/L) | 3.00 ± 1.28 | 4.12 ± 2.84 | 0.121 |
| Lymc (10⁹/L) | 1.42 ± 0.62 | 0.97 ± 0.53 | <0.001 |
| Monc (10⁹/L) | 0.47 ± 0.19 | 0.37 ± 0.18 | 0.009 |
| RBC (10¹²/L) | 4.73 ± 0.68 | 4.27 ± 0.57 | <0.001 |
| HGB (g/L) | 139.71 ± 18.39 | 130.10 ± 21.25 | 0.015 |
| HCT (%) | 41.21 ± 4.79 | 37.72 ± 5.53 | <0.001 |
| PLT (10⁹/L) | 179.65 ± 55.38 | 173.93 ± 56.50 | 0.617 |
| RDW (%) | 13.18 ± 1.91 | 13.55 ± 2.79 | 0.526 |
| MPV (fL) | 11.06 ± 1.23 | 10.89 ± 1.17 | 0.514 |
| PDW (fL) | 14.98 ± 3.17 | 15.34 ± 2.08 | 0.559 |
| PTC (L/L) | 0.20 ± 0.05 | 0.18 ± 0.06 | 0.180 |
| Hs-CRP (mg/L) | 7.50 (2.30–23.90) | 71.77 (25.90–108.10) | <0.001 |
| PCT (ng/mL) | 0.12 (0.02–0.20) | 0.20 (0.17–0.34) | <0.001 |
Note: Values are mean ± SD or median (Q1–Q3).
Abbreviations: WBC, white blood cell; Neur, percentage of neutrophils; Lymr, percentage of lymphocytes; Monr, percentage of monocytes; Neuc, neutrophil count; Lymc, lymphocyte count; Monc, monocyte count; RBC, red blood cell count; HGB, hemoglobin; HCT, hematocrit; PLT, platelet count; RDW, red blood cell distribution width; MPV, mean platelet volume; PDW, platelet distribution width; PTC, plateletcrit; Hs-CRP, high sensitivity C-reactive protein; PCT, procalcitonin.
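The laboratory table mixes two summary formats: mean ± SD for most indexes and median (Q1–Q3) for hs-CRP and PCT, presumably because those two markers are strongly right-skewed. The sketch below contrasts the two formats on made-up values (not study data); the helper names `mean_sd` and `median_q1_q3` are hypothetical.

```python
import statistics


def mean_sd(values: list[float]) -> str:
    """Format values as 'mean ± SD' (sample SD, n - 1 denominator)."""
    return f"{statistics.mean(values):.2f} ± {statistics.stdev(values):.2f}"


def median_q1_q3(values: list[float]) -> str:
    """Format values as 'median (Q1–Q3)'.

    statistics.quantiles uses the 'exclusive' method by default; other
    software may define quartiles differently and report slightly different Q1/Q3.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    return f"{statistics.median(values):.2f} ({q1:.2f}–{q3:.2f})"


# Hypothetical right-skewed marker values (NOT data from the study).
skewed = [1.2, 2.3, 3.1, 4.8, 7.5, 9.0, 15.2, 23.9, 60.4, 110.0]

print("mean ± SD:     ", mean_sd(skewed))       # mean is pulled up by the two largest values
print("median (Q1–Q3):", median_q1_q3(skewed))  # median is robust to those extreme values
```

On these toy values the mean (23.74) is nearly three times the median (8.25), which illustrates why a median with quartiles is the more faithful summary for skewed markers.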
The Importance of Variables
| Feature | Gain | Cover | Frequency |
|---|---|---|---|
| hs-CRP | 0.40746360 | 0.47782767 | 0.31578947 |
| PCT | 0.24389755 | 0.19765398 | 0.26315789 |
| Age | 0.19192806 | 0.20253600 | 0.21052632 |
| Neuc | 0.04459240 | 0.03011219 | 0.05263158 |
| HGB | 0.04254448 | 0.04640159 | 0.05263158 |
| Neur | 0.04108720 | 0.01675670 | 0.05263158 |
| PDW | 0.02848671 | 0.02871187 | 0.05263158 |
Abbreviations: hs-CRP, high sensitivity C-reactive protein; PCT, procalcitonin; Neuc, neutrophil count; HGB, hemoglobin; Neur, percentage of neutrophils; PDW, platelet distribution width.
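The Gain, Cover and Frequency columns appear to match the output of `xgb.importance` in the R xgboost package, where Gain is a feature's fractional contribution to the total gain of all splits, Cover its relative share of observations at its splits, and Frequency its relative share of all splits. Each column should therefore sum to 1. The reported Frequency values also equal whole numbers of splits out of 19; that total of 19 is back-calculated here and is an assumption, not a figure from the source.

```python
# Gain, Cover and Frequency values copied from the variable-importance table.
importance = {
    # feature: (gain, cover, frequency)
    "hs-CRP": (0.40746360, 0.47782767, 0.31578947),
    "PCT":    (0.24389755, 0.19765398, 0.26315789),
    "Age":    (0.19192806, 0.20253600, 0.21052632),
    "Neuc":   (0.04459240, 0.03011219, 0.05263158),
    "HGB":    (0.04254448, 0.04640159, 0.05263158),
    "Neur":   (0.04108720, 0.01675670, 0.05263158),
    "PDW":    (0.02848671, 0.02871187, 0.05263158),
}

# Each column is a share of the model total, so each should sum to ~1.
column_sums = [sum(row[i] for row in importance.values()) for i in range(3)]
for name, total in zip(("Gain", "Cover", "Frequency"), column_sums):
    print(f"sum of {name:9s} = {total:.6f}")

# Assumed (back-calculated, not reported): 19 splits in total across all trees,
# with Frequency = splits on the feature / total splits.
TOTAL_SPLITS = 19
split_counts = {f: round(row[2] * TOTAL_SPLITS) for f, row in importance.items()}
for feature, count in split_counts.items():
    # Check that count / 19 reproduces the reported Frequency to table precision.
    assert abs(count / TOTAL_SPLITS - importance[feature][2]) < 1e-7
    print(f"{feature:7s} {count} of {TOTAL_SPLITS} splits")
```

Under this reading, hs-CRP, PCT and Age account for 15 of the 19 splits, in line with their dominance in the Gain column.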
Figure 1 Importance of the predictor variables in the XGBoost model, scaled to a maximum of 100.
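The caption states that importance is scaled to a maximum of 100 but not how. Assuming the figure plots Gain divided by the largest Gain, hs-CRP maps to 100 and PCT to about 59.9. Some tools instead rescale to a 0–100 range (min–max), which also sets the weakest feature to 0. The sketch below computes both readings; which one the authors used is not stated, and the helper names are hypothetical.

```python
# Gain values copied from the variable-importance table.
# Assumption: Figure 1 plots Gain; the caption does not say which measure.
gain = {"hs-CRP": 0.40746360, "PCT": 0.24389755, "Age": 0.19192806,
        "Neuc": 0.04459240, "HGB": 0.04254448, "Neur": 0.04108720,
        "PDW": 0.02848671}


def scale_to_max(scores: dict[str, float]) -> dict[str, float]:
    """Divide every score by the largest, so the top feature becomes 100."""
    top = max(scores.values())
    return {k: 100 * v / top for k, v in scores.items()}


def scale_min_max(scores: dict[str, float]) -> dict[str, float]:
    """Alternative: rescale to the 0-100 range, so the weakest feature becomes 0."""
    lo, hi = min(scores.values()), max(scores.values())
    return {k: 100 * (v - lo) / (hi - lo) for k, v in scores.items()}


by_max = scale_to_max(gain)
by_range = scale_min_max(gain)
for feature in gain:
    print(f"{feature:7s} max-scaled {by_max[feature]:6.2f}   min-max {by_range[feature]:6.2f}")
```

Both readings keep the same ranking; they differ only in how far the low-importance features sit from zero.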
Figure 2 Receiver operating characteristic (ROC) curves for estimating the predictive efficacy of the logistic regression model and the machine learning model.
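The ROC curves and the AUC column can be illustrated on toy scores. The AUC equals the probability that a randomly chosen severe case receives a higher score than a randomly chosen non-severe case (the Mann–Whitney formulation). The source does not say how the cutoffs in the diagnostic-efficiency table were chosen; maximizing Youden's J is one common choice and is used here only as an assumption. All function names and data below are hypothetical.

```python
def roc_points(pos: list[float], neg: list[float]) -> list[tuple[float, float, float]]:
    """Return (threshold, sensitivity, specificity) for each candidate cutoff.

    A case is called positive when its score >= threshold.
    """
    points = []
    for t in sorted(set(pos) | set(neg), reverse=True):
        sensitivity = sum(s >= t for s in pos) / len(pos)
        specificity = sum(s < t for s in neg) / len(neg)
        points.append((t, sensitivity, specificity))
    return points


def auc_mann_whitney(pos: list[float], neg: list[float]) -> float:
    """AUC as P(random positive outscores random negative); ties count 1/2."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def youden_cutoff(pos: list[float], neg: list[float]) -> tuple[float, float, float]:
    """Cutoff maximizing Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(pos, neg), key=lambda point: point[1] + point[2] - 1)


# Hypothetical predicted probabilities (NOT study data).
severe = [0.90, 0.80, 0.40, 0.35]
mild = [0.70, 0.30, 0.20, 0.10, 0.05]

auc = auc_mann_whitney(severe, mild)
cutoff, sens, spec = youden_cutoff(severe, mild)
print(f"AUC = {auc:.3f}; cutoff = {cutoff}, sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Plotting the (1 − specificity, sensitivity) pairs from `roc_points` traces the ROC curve; here 18 of the 20 severe/mild pairs are correctly ordered, giving an AUC of 0.9.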
Comparison of the Diagnostic Efficiency Between the Machine Learning Model and the Logistic Regression Model
| Model | AUC (95% CI) | Cutoff | Specificity | Sensitivity | Accuracy | Positive-LR | Negative-LR |
|---|---|---|---|---|---|---|---|
| Machine learning model | 0.978 (0.960–0.996) | 0.1743 | 0.909 | 0.966 | 0.919 | 10.621 | 0.038 |
| Logistic regression model | 0.827 (0.724–0.930) | −1.3998 | 0.808 | 0.786 | 0.804 | 4.092 | 0.265 |
Abbreviations: AUC, area under the receiver operating characteristic curve; CI, confidence interval; Positive-LR, positive likelihood ratio; Negative-LR, negative likelihood ratio.
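Every column after the AUC follows from the 2×2 counts at the chosen cutoff: positive LR = sensitivity / (1 − specificity) and negative LR = (1 − sensitivity) / specificity. Counts of 28 of 29 severe cases and 120 of 132 non-severe cases correctly classified reproduce the machine learning row exactly; these counts are back-calculated and are not reported by the source. The logistic regression cutoff of −1.3998 cannot be a probability, so it is presumably on the log-odds scale, where it would correspond to a predicted probability of about 0.198; this too is an assumption. The helper names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Confusion:
    """2x2 confusion counts at a fixed cutoff (severe = positive)."""
    tp: int  # severe, predicted severe
    fn: int  # severe, predicted non-severe
    tn: int  # non-severe, predicted non-severe
    fp: int  # non-severe, predicted severe


def diagnostic_metrics(c: Confusion) -> dict[str, float]:
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    accuracy = (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)
    # Positive LR = sens / (1 - spec); infinite when there are no false positives.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else math.inf
    # Negative LR = (1 - sens) / spec.
    lr_neg = (1 - sensitivity) / specificity
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "positive_lr": lr_pos, "negative_lr": lr_neg}


def logit_to_probability(x: float) -> float:
    """Convert a log-odds value to a probability."""
    return 1 / (1 + math.exp(-x))


# Counts back-calculated from the machine learning row (29 severe, 132 non-severe);
# they are NOT reported in the source.
ml = Confusion(tp=28, fn=1, tn=120, fp=12)
for name, value in diagnostic_metrics(ml).items():
    print(f"{name:12s} {value:.3f}")

# Assumption: the logistic regression cutoff is on the log-odds scale.
print(f"logit -1.3998 -> probability {logit_to_probability(-1.3998):.3f}")
```

The printed metrics match the machine learning row to three decimals (sensitivity 0.966, specificity 0.909, accuracy 0.919, positive LR 10.621, negative LR 0.038).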