Victor Muniz De Freitas1, Daniela Mendes Chiloff1, Giulia Gabriella Bosso1, Janaina Oliveira Pires Teixeira1, Isabele Cristina de Godói Hernandes1, Maira do Patrocínio Padilha1, Giovanna Corrêa Moura1, Luis Gustavo Modelli De Andrade2, Frederico Mancuso3, Francisco Estivallet Finamor3, Aluísio Marçal de Barros Serodio4, Jaquelina Sonoe Ota Arakaki5, Marair Gracio Ferreira Sartori6, Paulo Roberto Abrão Ferreira7, Érika Bevilaqua Rangel8.
Abstract
Machine learning is a useful tool for risk-stratifying patients with respiratory symptoms during the still-evolving COVID-19 pandemic. We aimed to verify the capacity of a gradient boosting decision tree algorithm (XGBoost) to select the most important predictors of hospitalization, including clinical and demographic parameters, in patients who sought medical support due to respiratory signs and symptoms (RAPID RISK COVID-19). A total of 7336 patients were enrolled in the study: 6596 who did not require hospitalization and 740 who did. Patients with respiratory signs and symptoms required hospitalization more often when they presented with lower oxyhemoglobin saturation by pulse oximetry (SpO2), higher respiratory rate, fever, higher heart rate, and lower blood pressure, in association with older age, male sex, and the underlying conditions of diabetes mellitus and hypertension. The predictive model yielded a ROC curve with an area under the curve (AUC) of 0.9181 (95% CI, 0.9001 to 0.9361). In conclusion, our model had high discriminatory value, which enabled the identification of a predictive, preventive, and personalized clinical and demographic profile of COVID-19 severity.
Keywords: COVID-19; hospitalization; machine learning; predictive model
Year: 2022 PMID: 35956189 PMCID: PMC9369854 DOI: 10.3390/jcm11154574
Source DB: PubMed Journal: J Clin Med ISSN: 2077-0383 Impact factor: 4.964
Figure 1. The total number of patients with respiratory signs and symptoms, by need for hospitalization, from March to August 2020, at São Paulo Hospital, São Paulo, Brazil.
Comparison of demography, signs, and symptoms between patients who required hospitalization and those who did not require hospitalization from March to August 2020 at São Paulo Hospital, Brazil.
| Variables | No Hospitalization (n = 6596) | Hospitalization (n = 740) | p-value |
|---|---|---|---|
| Females | 3570 (54%) | 324 (44%) | <0.001 |
| Age (years) | 39 (28, 51) | 58 (47, 69) | <0.001 |
| Duration of symptoms (days) | 4 (2, 8) | 7 (4, 10) | <0.001 |
| Systolic blood pressure (mmHg) | 133 (121, 146) | 129 (114, 145) | <0.001 |
| Diastolic blood pressure (mmHg) | 84 (75, 94) | 80 (70, 90) | <0.001 |
| Heart rate (beats/min) | 90 (80, 101) | 96 (85, 110) | <0.001 |
| Temperature (°C) | 36.50 (36.00, 36.80) | 36.50 (36.00, 36.90) | 0.007 |
| Respiratory rate (breaths/min) | 18 (16, 20) | 24 (20, 28) | <0.001 |
| SpO2 (%) | 97.00 (96.00, 98.00) | 94.00 (90.00, 96.00) | <0.001 |
| Influenza vaccine | 1897 (39%) | 149 (47%) | 0.007 |
| Fever | 2500 (42%) | 289 (57%) | <0.001 |
| Fatigue | 1628 (27%) | 184 (37%) | <0.001 |
| Sneezing | 496 (8.3%) | 20 (4.0%) | <0.001 |
| Dry cough | 2702 (45%) | 262 (52%) | 0.004 |
| Productive cough | 720 (12%) | 75 (15%) | 0.065 |
| Runny nose | 1304 (22%) | 45 (8.9%) | <0.001 |
| Sore throat | 1428 (24%) | 38 (7.6%) | <0.001 |
| Diarrhea | 851 (14%) | 88 (17%) | 0.051 |
| Breathing difficulty | 1955 (33%) | 302 (60%) | <0.001 |
| Anorexia | 614 (10%) | 94 (19%) | <0.001 |
| Headache | 2219 (37%) | 93 (18%) | <0.001 |
| Myalgia | 1845 (31%) | 138 (27%) | 0.10 |
| Nausea/vomiting | 735 (12%) | 91 (18%) | <0.001 |
| Wheezing | 136 (2.3%) | 11 (2.2%) | 0.9 |
| Thoracic pain | 1059 (18%) | 67 (13%) | 0.012 |
| Abdominal pain | 320 (5.4%) | 34 (6.7%) | 0.2 |
| Anosmia | 1133 (19%) | 80 (16%) | 0.084 |
| Dysgeusia | 1127 (19%) | 80 (16%) | 0.10 |
| Chills | 672 (11%) | 46 (9.1%) | 0.14 |
Continuous values are presented as median (interquartile range); categorical values as n (%). SpO2: oxyhemoglobin saturation by pulse oximetry.
Comparison of comorbidities between patients who required hospitalization and those who did not require hospitalization from March to August 2020 at São Paulo Hospital, Brazil.
| Variables | No Hospitalization | Hospitalization | p-value |
|---|---|---|---|
| Hypertension | 1227 (21%) | 232 (46%) | <0.001 |
| Cardiac disease | 227 (3.8%) | 68 (14%) | <0.001 |
| Diabetes mellitus | 469 (7.8%) | 136 (27%) | <0.001 |
| Cerebrovascular disease | 42 (0.7%) | 17 (3.4%) | <0.001 |
| Chronic kidney disease | 162 (2.7%) | 66 (13%) | <0.001 |
| Immunosuppression | 230 (3.8%) | 55 (11%) | <0.001 |
| COPD | 95 (1.6%) | 13 (2.6%) | 0.10 |
| Asthma | 355 (5.9%) | 19 (3.8%) | 0.044 |
| Tuberculosis | 39 (0.7%) | 6 (1.2%) | 0.2 |
| Other respiratory diseases | 87 (1.5%) | 12 (2.4%) | 0.10 |
| Neoplasia | 89 (1.5%) | 30 (6.0%) | <0.001 |
| Solid organ transplant | 145 (2.4%) | 63 (12%) | <0.001 |
| Obesity | 305 (5.1%) | 56 (11%) | <0.001 |
| Smoking | 621 (10%) | 45 (8.9%) | 0.3 |
| Pregnancy | 79 (1.3%) | 3 (0.6%) | 0.2 |
| Previous hospitalization | 37 (0.6%) | 23 (4.5%) | <0.001 |
COPD: Chronic obstructive pulmonary disease.
Datasets (train and test sets) used in predicting hospitalization in patients with respiratory symptoms during the COVID-19 pandemic.
| Characteristic | Train | Test |
|---|---|---|
| Female sex | 3102 (53%) | 792 (54%) |
| Age | 41 (29, 53) | 40 (30, 54) |
| Symptom duration | 4 (2, 8) | 4 (2, 8) |
| Systolic blood pressure | 133 (121, 147) | 132 (120, 145) |
| Diastolic blood pressure | 84 (74, 94) | 82 (74, 92) |
| Heart rate | 90 (80, 102) | 89 (80, 100) |
| Temperature | 36.50 (36.00, 36.80) | 36.50 (36.00, 36.70) |
| Respiratory frequency | 18 (17, 20) | 18 (17, 20) |
| SpO2 | 97.00 (96.00, 98.00) | 97.00 (96.00, 98.00) |
| Influenza vaccine | 1632 (40%) | 414 (39%) |
| Fever | 2258 (44%) | 531 (41%) |
| Fatigue | 1456 (28%) | 356 (27%) |
| Occasional Cough | 420 (8.1%) | 96 (7.4%) |
| Dry cough | 2370 (46%) | 594 (46%) |
| Phlegm cough | 641 (12%) | 154 (12%) |
| Runny nose | 1059 (20%) | 290 (22%) |
| Sore throat | 1173 (23%) | 293 (23%) |
| Diarrhea | 733 (14%) | 206 (16%) |
| Dyspnea | 1822 (35%) | 435 (34%) |
| Anorexia | 580 (11%) | 128 (9.9%) |
| Headache | 1851 (36%) | 461 (36%) |
| Myalgia | 1597 (31%) | 386 (30%) |
| Nausea and vomiting | 652 (13%) | 174 (13%) |
| Chest wheezing | 120 (2.3%) | 27 (2.1%) |
| Chest pain | 889 (17%) | 237 (18%) |
| Abdominal pain | 295 (5.7%) | 59 (4.6%) |
| Anosmia | 959 (19%) | 254 (20%) |
| Dysgeusia | 960 (19%) | 247 (19%) |
| Chills | 570 (11%) | 148 (11%) |
| Hypertension | 1193 (23%) | 266 (20%) |
| Heart disease | 238 (4.6%) | 57 (4.4%) |
| Diabetes mellitus | 466 (9.0%) | 139 (11%) |
| Cerebrovascular disease | 47 (0.9%) | 12 (0.9%) |
| Chronic kidney disease | 188 (3.6%) | 40 (3.1%) |
| Immunosuppression | 231 (4.5%) | 54 (4.2%) |
| COPD | 84 (1.6%) | 24 (1.8%) |
| Asthma | 307 (5.9%) | 67 (5.2%) |
| Tuberculosis | 29 (0.6%) | 16 (1.2%) |
| Respiratory disease | 79 (1.5%) | 20 (1.5%) |
| Neoplasia | 97 (1.9%) | 22 (1.7%) |
| Transplant | 171 (3.3%) | 37 (2.8%) |
| Obesity | 284 (5.5%) | 77 (5.9%) |
| Smoking | 535 (10%) | 131 (10%) |
| Pregnant | 61 (1.2%) | 21 (1.6%) |
| Prior hospitalization | 47 (0.9%) | 13 (1.0%) |
| Hospitalization | 592 (10%) | 148 (10%) |
Continuous values are presented as medians with 25th and 75th percentiles. SpO2: oxyhemoglobin saturation by pulse oximetry; COPD: chronic obstructive pulmonary disease.
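The hospitalization counts in the table above (592 in the train set, 148 in the test set) are what an outcome-stratified 80/20 split of the 740 hospitalized patients would give. The sketch below illustrates that idea in plain Python. The stratification itself, the 0.2 test fraction, the seed, and the function name `stratified_split` are assumptions made for illustration, not the authors' documented procedure.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split indices into train/test while keeping the class ratio in both parts.

    Hypothetical helper: shuffles each class separately and sends a fixed
    fraction of every class to the test set.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)  # per-class test size
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Cohort sizes from the abstract: 6596 not hospitalized (0), 740 hospitalized (1).
labels = [0] * 6596 + [1] * 740
train_idx, test_idx = stratified_split(labels)

train_pos = sum(labels[i] for i in train_idx)  # hospitalized patients in train
test_pos = sum(labels[i] for i in test_idx)    # hospitalized patients in test
```

With these inputs the hospitalized counts come out as 592 and 148. The total test size in this sketch is 1467, one short of the reported n = 1468, so the authors' exact procedure evidently differed in some detail.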
Model performance in predicting hospitalization in patients with respiratory symptoms during the COVID-19 pandemic. Predictions were obtained on the test set (n = 1468).
| Model | AUC-ROC | Accuracy | Balanced Accuracy | F1-Score | Precision |
|---|---|---|---|---|---|
| XGBoost Full 1 | 0.927 | 0.905 | 0.800 | 0.946 | 0.962 |
| Random Forest 1 | 0.930 | 0.905 | 0.779 | 0.947 | 0.957 |
| Lasso 1 | 0.899 | 0.844 | 0.823 | 0.907 | 0.974 |
| LightGBM 1 | 0.925 | 0.896 | 0.822 | 0.941 | 0.968 |
| XGBoost Reduced 2 | 0.917 | 0.886 | 0.793 | 0.935 | 0.962 |
1 Predictors of the full model (n = 45): sex, age, duration of symptoms, systolic blood pressure, diastolic blood pressure, heart rate, temperature, respiratory frequency, SpO2, influenza vaccine, fever, fatigue, occasional cough, dry cough, phlegm cough, runny nose, sore throat, diarrhea, dyspnea, anorexia, headache, myalgia, nausea and vomiting, chest wheezing, chest pain, abdominal pain, anosmia, dysgeusia, chills, hypertension, heart disease, diabetes mellitus, cerebrovascular disease, chronic kidney disease, immunosuppression, chronic obstructive pulmonary disease, asthma, tuberculosis, respiratory disease, neoplasia, transplant, obesity, smoking, pregnancy, and prior hospitalization.
2 Predictors of the reduced model (n = 11): SpO2, respiratory rate, age, sex, duration of symptoms, presence of hypertension, temperature at admission, presence of diabetes mellitus, heart rate, and systolic and diastolic blood pressure at admission.
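For readers less familiar with the column headings in the performance table, the following minimal sketch computes each metric in its standard sense from a set of labels and predicted scores. The toy labels, the scores, the 0.5 threshold, and all function names are hypothetical and are not taken from the study; the record here also does not state which outcome the authors treated as the positive class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary outcome."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def summary_metrics(tp, fp, tn, fn):
    """Accuracy, balanced accuracy, precision, and F1 from confusion counts."""
    sensitivity = tp / (tp + fn)   # recall of the positive class
    specificity = tn / (tn + fp)   # recall of the negative class
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


def auc_rank(y_true, scores, positive=1):
    """AUC-ROC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 4 positive and 6 negative cases with model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.35, 0.7, 0.3, 0.2, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

metrics = summary_metrics(*confusion_counts(y_true, y_pred))
auc = auc_rank(y_true, scores)
```

Balanced accuracy averages the recall of both classes, which is why it can sit well below plain accuracy when one outcome is much rarer than the other, as hospitalization is in this cohort.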
Figure 2. (A) Variables in order of importance in the reduced predictive model, shown as a SHAP (SHapley Additive exPlanations) plot. (B) ROC curve analysis, with an area under the curve of 0.9181 (95% CI, 0.9001 to 0.9361).
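SHAP ranks variables by Shapley values, which attribute a prediction to each feature by averaging that feature's marginal contribution over every order in which features could be added. The brute-force sketch below shows only the definition on a toy scoring function. The feature names, the weights, and the functions `toy_risk` and `shapley_values` are hypothetical and do not come from the study's model; practical SHAP implementations use much faster algorithms for tree ensembles.

```python
from itertools import permutations


def shapley_values(value_fn, features):
    """Exact Shapley values: average marginal contribution over all orderings.

    value_fn takes a frozenset of 'present' features and returns a score.
    Cost grows factorially, so this is only usable for a handful of features.
    """
    phi = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(frozenset(present))
            present.add(f)
            after = value_fn(frozenset(present))
            phi[f] += (after - before) / len(orderings)
    return phi


def toy_risk(present):
    """Hypothetical additive risk score with one interaction term."""
    score = 0.0
    if "low_spo2" in present:
        score += 0.30
    if "high_resp_rate" in present:
        score += 0.20
    if "older_age" in present:
        score += 0.10
    if "low_spo2" in present and "high_resp_rate" in present:
        score += 0.10  # interaction, shared equally between the two features
    return score


features = ["low_spo2", "high_resp_rate", "older_age"]
phi = shapley_values(toy_risk, features)
total = sum(phi.values())  # equals toy_risk(all features) - toy_risk(none)
```

In this toy case the interaction term is split evenly between the two features involved, and the attributions sum to the difference between the full score and the empty baseline, which is the additivity property that gives SHAP its name.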
Figure 3. Summary of the study design and findings.