Andrea Ramírez Varela1, Sergio Moreno López1, Sandra Contreras-Arrieta1, Guillermo Tamayo-Cabeza1, Silvia Restrepo-Restrepo1, Ignacio Sarmiento-Barbieri1, Yuldor Caballero-Díaz1, Luis Jorge Hernandez-Florez1, John Mario González1, Leonardo Salas-Zapata2, Rachid Laajaj1, Giancarlo Buitrago-Gutierrez3, Fernando de la Hoz-Restrepo4, Martha Vives Florez1, Elkin Osorio2, Diana Sofía Ríos-Oliveros2, Eduardo Behrentz1.
Abstract
Symptoms-based models for predicting SARS-CoV-2 infection may improve clinical decision-making and offer an alternative for allocating resources in under-resourced settings. In this study we aimed to test a symptoms-based model to predict a positive SARS-CoV-2 test result during the COVID-19 pandemic in Bogotá, Colombia, using logistic regression and a machine-learning approach. Participants from the CoVIDA project were included. The logistic regression model was chosen based on biological plausibility and the Akaike information criterion. We also performed a machine-learning analysis with random forest, support vector machine, and extreme gradient boosting. The study included 58,577 participants with a positivity rate of 5.7%. The logistic regression showed that anosmia (aOR = 7.76, 95% CI [6.19, 9.73]), fever (aOR = 4.29, 95% CI [3.07, 6.02]), headache (aOR = 3.29, 95% CI [1.78, 6.07]), dry cough (aOR = 2.96, 95% CI [2.44, 3.58]), and fatigue (aOR = 1.93, 95% CI [1.57, 2.93]) were independently associated with SARS-CoV-2 infection. Our final model had an area under the curve of 0.73. The symptoms-based model correctly classified over 85% of participants. This model can be used to prioritize resource allocation for COVID-19 diagnosis and to decide on early isolation and contact-tracing strategies for individuals with a high probability of infection before they receive a confirmatory test result. This strategy has public health and clinical decision-making significance in low- and middle-income settings such as Latin America.
Keywords: Anosmia; COVID-19; Logistic model; Machine learning; SARS-CoV-2; Symptoms
Year: 2022 PMID: 35469291 PMCID: PMC9020649 DOI: 10.1016/j.pmedr.2022.101798
Source DB: PubMed Journal: Prev Med Rep ISSN: 2211-3355
Fig. 1. Modeling framework for the analysis of symptoms association and SARS-CoV-2 prediction. a) Variable selection for the final logistic regression model; b) Sampling procedure and hold-out validation for the logistic regression model and the ML approach.
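The sampling and hold-out steps named in the Fig. 1 caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the `ratio` of negatives kept per positive, the 30% test fraction, the toy records, and the order of the two steps (resample first, then split) are all assumptions made for illustration.

```python
import random


def undersample(records, label_key="positive", ratio=1.0, seed=0):
    """Keep every positive record and a random subset of negatives.

    `ratio` is the number of negatives kept per positive (hypothetical value).
    """
    rng = random.Random(seed)
    positives = [r for r in records if r[label_key]]
    negatives = [r for r in records if not r[label_key]]
    n_keep = min(len(negatives), round(ratio * len(positives)))
    sampled = positives + rng.sample(negatives, n_keep)
    rng.shuffle(sampled)
    return sampled


def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(test_fraction * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


# Toy data only: 1,000 records with 57 positives (5.7% positivity).
records = [{"id": i, "positive": i < 57} for i in range(1000)]
balanced = undersample(records, ratio=3.0)
train, test = holdout_split(balanced, test_fraction=0.3)
print(len(balanced), len(train), len(test))
```

Because undersampling discards only negatives, the proportion of positives in the resampled set is much higher than in the full cohort, which affects any metric that depends on prevalence.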
Sociodemographic characteristics and symptoms related to SARS-CoV-2 infection (N = 58,556).
| Variable | Total | Positive | Negative | p value |
|---|---|---|---|---|
| Age (years) median (IQR) | 36 (28–48) | 35 (27–47) | 36 (28–49) | < ·001* |
| Age (years), n (%) | | | | ·001* |
| <18 | 235 (0·4) | 30 (0·90) | 205 (0·4) | |
| 18–29 | 17,718 (30·3) | 1,096 (32·9) | 16,622 (30·1) | |
| 30–59 | 35,119 (59·9) | 1,880 (56·5) | 33,239 (60·2) | |
| >60 | 5,484 (9·4) | 319 (9·6) | 5,165 (9·4) | |
| Sex, n (%) | | | | ·183 |
| Female | 29,736 (50·8) | 1,652 (5·6) | 28,084 (94·4) | |
| Male | 28,794 (49·2) | 1,673 (5·8) | 27,121 (94·2) | |
| Contact with a COVID-19 confirmed case, n (%) | | | | < ·001* |
| Yes | 10,758 (18·4) | 796 (7·4) | 9,962 (92·6) | |
| No | 47,798 (81·6) | 2,529 (5·3) | 45,269 (94·7) | |
| Comorbidities, n (%) | | | | |
| Arterial hypertension | 4,352 (7·4) | 228 (5·2) | 4,124 (94·8) | ·193 |
| Smoking | 2,969 (5·1) | 133 (4·5) | 2,836 (95·5) | ·004* |
| Obesity | 2,551 (4·4) | 139 (5·5) | 2,412 (94·6) | ·609 |
| Asthma | 2,311 (3·9) | 104 (4·5) | 2,207 (95·5) | ·007* |
| Diabetes mellitus | 1,379 (2·4) | 71 (5·2) | 1,308 (94·9) | ·478 |
| Chronic obstructive lung disease | 295 (0·5) | 19 (6·4) | 276 (93·6) | ·652 |
| Symptoms related to SARS-CoV-2, n (%) | | | | |
| Asymptomatic | 51,254 (87·5) | 1,852 (3·6) | 49,402 (96·4) | < ·001* |
| Symptomatic | 7,302 (12·5) | 1,473 (20·2) | 5,829 (79·8) | < ·001* |
| Sore throat | 4,230 (7·2) | 790 (18·7) | 3,440 (81·3) | < ·001* |
| Dry cough | 3,407 (5·8) | 893 (26·2) | 2,514 (73·8) | < ·001* |
| Fatigue | 2,713 (4·6) | 707 (26·1) | 2,006 (73·9) | < ·001* |
| Anosmia | 1,802 (3·1) | 749 (41·6) | 1,053 (58·4) | < ·001* |
| Diarrhea | 1,871 (3·2) | 413 (22·1) | 1,458 (77·9) | < ·001* |
| Fever | 1,169 (2·0) | 444 (37·9) | 725 (62·0) | < ·001* |
| Dyspnea | 1,376 (2·6) | 383 (27·8) | 993 (72·2) | < ·001* |
| Confusion | 413 (0·7) | 119 (28·8) | 294 (71·2) | < ·001* |
| Headache | 254 (0·4) | 50 (19·7) | 204 (80·3) | < ·001* |
| Myalgias | 68 (0·1) | 15 (22·1) | 53 (77·9) | < ·001* |
| Dysgeusia | 16 (0·03) | 13 (81·3) | 3 (18·8) | < ·001*† |
| Chills | 25 (0·04) | 2 (8·0) | 23 (92·0) | ·616† |
| Vomiting/nausea | 25 (0·04) | 3 (12·0) | 22 (88·0) | ·172† |
| Rhinorrhea | 13 (0·02) | 0 (0) | 13 (100·0) | ·376† |
*p value <·05.
†Fisher exact test.
Logistic regression with undersampling dataset (n = 14,475).
| Variable | Unadjusted OR | 95% CI | p value | Adjusted OR | 95% CI | p value |
|---|---|---|---|---|---|---|
| Age | 0·99 | [0·99, 1·00] | ·108 | 1·00 | [0·99, 1·01] | ·052 |
| Socioeconomic strata | ||||||
| Low-low | 3·71 | [2·42, 5·70] | < ·001* | 3·16 | [1·98, 5·02] | < ·001 |
| Low | 3·13 | [2·16, 4·52] | < ·001* | 2·70 | [1·82, 5·02] | < ·001 |
| Middle-low | 2·74 | [1·91, 3·95] | < ·001* | 2·27 | [1·81, 4·01] | < ·001 |
| Middle | 1·37 | [0·93, 2·02] | ·103 | 1·22 | [0·81, 1·84] | ·337 |
| Middle-high | 1·30 | [0·84, 2·02] | ·231 | 1·27 | [0·79, 2·01] | ·332 |
| Contact with confirmed COVID-19 | 1·62 | [1·45, 1·82] | < ·001* | 1·27 | [1·12, 1·46] | < ·001* |
| Arterial hypertension | 0·78 | [0·64, 0·96] | ·020* | 0·79 | [1·12, 1·46] | ·058 |
| Symptoms related to SARS-CoV-2 | ||||||
| Fever | 15·75 | [11·81, 21·01] | < ·001* | 4·29 | [3·07, 6·02] | < ·001* |
| Anosmia | 17·19 | [13·97, 20·91] | < ·001* | 7·76 | [6·19, 9·73] | < ·001* |
| Dry cough | 8·02 | [6·90, 9·33] | < ·001* | 2·96 | [2·44, 3·58] | < ·001* |
| Fatigue | 7·09 | [6·03, 8·33] | < ·001* | 1·93 | [1·57, 2·93] | < ·001* |
| Headache | 4·85 | [2·80, 8·37] | < ·001* | 3·29 | [1·78, 6·07] | < ·001* |
*p value < 0·05.
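To make the adjusted odds ratios easier to read, the following Python sketch shows how an odds ratio rescales a baseline probability. The adjusted ORs are taken from the table above; the baseline probability of 0.05 is hypothetical (the model intercept is not reported here), and multiplying several ORs together assumes a model without interaction terms.

```python
def probability_to_odds(p):
    """Convert a probability in (0, 1) to odds."""
    return p / (1.0 - p)


def odds_to_probability(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def apply_odds_ratios(baseline_probability, odds_ratios):
    """Multiply the baseline odds by each odds ratio and return a probability."""
    odds = probability_to_odds(baseline_probability)
    for ratio in odds_ratios:
        odds *= ratio
    return odds_to_probability(odds)


# Adjusted odds ratios reported in the logistic regression table.
ADJUSTED_OR = {
    "anosmia": 7.76,
    "fever": 4.29,
    "headache": 3.29,
    "dry_cough": 2.96,
    "fatigue": 1.93,
}

# Hypothetical baseline probability for a participant with no listed symptom.
BASELINE = 0.05

p_anosmia = apply_odds_ratios(BASELINE, [ADJUSTED_OR["anosmia"]])
p_anosmia_fever = apply_odds_ratios(
    BASELINE, [ADJUSTED_OR["anosmia"], ADJUSTED_OR["fever"]]
)
print(f"anosmia only: {p_anosmia:.3f}")
print(f"anosmia and fever: {p_anosmia_fever:.3f}")
```

An odds ratio multiplies odds, not probabilities, so an aOR of 7.76 does not mean a 7.76-fold higher probability of a positive test.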
Fig. 2. Logistic regression model obtained under the BIC criterion. a) Forest plot for association between sociodemographic characteristics, COVID-19 related symptoms and SARS-CoV-2 positive RT-PCR test; b) ROC curve for prediction of SARS-CoV-2 positive RT-PCR test result.
Diagnostic performance of prediction models.
| Dataset | Model | AUC | Sensitivity (SE) | Specificity (SP) | PPV | NPV | Correctly classified |
|---|---|---|---|---|---|---|---|
| Complete data set | Logistic regression | ·71 | ·10 | ·99 | ·57 | ·95 | ·95 |
| | RF | ·66 | ·04 | ·99 | ·54 | ·94 | ·95 |
| | SVM | ·59 | ·02 | ·99 | ·48 | ·95 | ·94 |
| | XG boosting | ·73 | ·07 | ·99 | ·55 | ·95 | ·95 |
| Undersampling | Logistic regression | ·73 | ·26 | ·98 | ·73 | ·86 | ·85 |
| | RF | ·81 | ·28 | ·98 | ·72 | ·86 | ·85 |
| | SVM | ·73 | ·29 | ·98 | ·74 | ·86 | ·85 |
| | XG boosting | ·77 | ·09 | ·99 | ·88 | ·83 | ·84 |
| SMOTE | Logistic regression | ·72 | ·22 | ·94 | ·71 | ·66 | ·67 |
| | RF | ·87 | ·69 | ·96 | ·91 | ·83 | ·86 |
| | SVM | ·66 | ·24 | ·94 | ·71 | ·66 | ·67 |
| | XG boosting | ·90 | ·98 | ·99 | ·97 | ·81 | ·86 |
| ROSE | Logistic regression | ·74 | ·47 | ·88 | ·79 | ·62 | ·67 |
| | RF | ·81 | ·65 | ·84 | ·80 | ·71 | ·75 |
| | SVM | ·73 | ·47 | ·88 | ·79 | ·62 | ·67 |
| | XG boosting | ·77 | ·57 | ·81 | ·76 | ·65 | ·69 |
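The columns of the diagnostic performance table follow the standard confusion-matrix definitions, sketched below in Python. The counts in the example are hypothetical and are not taken from the study; the second function is the usual Bayes relation between PPV, sensitivity, specificity, and prevalence.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return standard diagnostic metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # positives correctly flagged
        "specificity": tn / (tn + fp),   # negatives correctly cleared
        "ppv": tp / (tp + fp),           # flagged cases that are truly positive
        "npv": tn / (tn + fn),           # cleared cases that are truly negative
        "accuracy": (tp + tn) / total,   # proportion correctly classified
    }


def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value implied by SE, SP, and prevalence."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical counts: 1,000 tested, 100 truly positive.
metrics = diagnostic_metrics(tp=30, fp=10, fn=70, tn=890)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Same test characteristics at a lower prevalence give a lower PPV.
print(f"{ppv_from_prevalence(0.30, 890 / 900, 0.05):.3f}")
```

At a positivity rate of 5.7%, a rule that labels every participant negative would already classify 94.3% of them correctly, so the "Correctly classified" column for the complete data set is best read alongside sensitivity.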