Jesús M Antoñanzas, Aida Perramon, Cayetana López, Mireia Boneta, Cristina Aguilera, Ramon Capdevila, Anna Gatell, Pepe Serrano, Miriam Poblet, Dolors Canadell, Mònica Vilà, Georgina Catasús, Cinta Valldepérez, Martí Català, Pere Soler-Palacín, Clara Prats, Antoni Soriano-Arandes.
Abstract
BACKGROUND: Testing for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection is not always accessible or easy to perform in children. We aimed to propose a machine learning model to assess the need for a SARS-CoV-2 test in children (<16 years old) based on their clinical symptoms.
Keywords: COVID-19; SARS-CoV-2; deep learning; epidemiology; machine learning; microbiology; paediatrics
Year: 2021 PMID: 35062267 PMCID: PMC8779426 DOI: 10.3390/v14010063
Source DB: PubMed Journal: Viruses ISSN: 1999-4915 Impact factor: 5.048
Baseline epidemiological and diagnostic characteristics of the cases included in the dataset.
| Characteristic | N (%) | COVID-19 | No COVID-19 | p-value |
|---|---|---|---|---|
| Sex | | | | |
| Male | 2461 (55.4) | 428 (55.9) | 2033 (55.3) | 0.78 |
| Female | 1984 (44.6) | 338 (44.1) | 1646 (44.7) | |
| Age (years) | | | | |
| 0–5 | 1872 (42.4) | 315 (42.1) | 1557 (42.5) | 0.87 |
| 6–17 | 2540 (57.6) | 433 (57.9) | 2107 (57.5) | |
| | | | | |
| Yes | 4434 (99.5) | 764 (99.6) | 3670 (99.5) | 0.99 |
| No | 22 (0.5) | 3 (0.4) | 19 (0.5) | |
| | | | | |
| Positive | 321 (38.2) | 321 (100.0) | 0 (0.0) | <0.001 |
| Negative | 519 (61.8) | 0 (0.0) | 494 (100.0) | |
| | | | | |
| Positive | 463 (11.8) | 463 (100.0) | 0 (0.0) | <0.001 |
| Negative | 3453 (88.2) | 0 (0.0) | 3453 (100.0) | |
| | | | | |
| Yes | 77 (1.8) | 15 (2.0) | 62 (1.7) | 0.54 |
| No | 4251 (98.2) | 731 (98.0) | 3520 (98.3) | |
| | | | | |
| Yes | 1 (0.02) | 1 (0.1) | 0 (0.0) | 0.17 |
| No | 4262 (99.98) | 730 (99.9) | 3532 (100.0) | |
| | | | | |
| Yes | 121 (3.6) | 32 (5.5) | 89 (3.2) | 0.009 |
| No | 3274 (96.4) | 548 (94.5) | 2726 (96.8) | |
| | | | | |
| Yes | 376 (11.7) | 96 (17.1) | 280 (10.5) | <0.001 |
| No | 2851 (88.3) | 465 (82.9) | 2386 (89.5) | |
| | | | | |
| Yes | 1232 (29.6) | 197 (29.8) | 1035 (29.6) | 0.93 |
| No | 2929 (70.4) | 465 (70.2) | 2464 (70.4) | |
| | | | | |
| ≤4 | 1360 (32.2) | 215 (29.9) | 1145 (32.6) | 0.17 |
| >4 | 2870 (67.8) | 503 (70.1) | 2367 (67.4) | |
| | | | | |
| Yes | 956 (21.4) | 494 (64.4) | 462 (12.5) | <0.001 |
| No | 3500 (78.6) | 273 (35.6) | 3227 (87.5) | |
| | | | | |
| Yes | 548 (12.3) | 451 (58.8) | 97 (2.6) | <0.001 |
| No | 3908 (87.7) | 316 (41.2) | 3592 (97.4) | |
| | | | | |
| Yes | 338 (7.6) | 124 (16.2) | 214 (5.8) | <0.001 |
| No | 4118 (92.4) | 643 (83.8) | 3475 (94.2) | |
| | | | | |
| Yes | 291 (6.5) | 125 (16.3) | 166 (4.5) | <0.001 |
| No | 4165 (93.5) | 642 (83.7) | 3523 (95.5) | |
| | | | | |
| Yes | 14 (0.6) | 4 (2.3) | 10 (0.5) | 0.02 |
| No | 2292 (99.4) | 171 (97.7) | 2121 (99.5) | |
| | | | | |
| Yes | 60 (2.5) | 11 (2.7) | 49 (2.5) | 0.73 |
| No | 2303 (97.5) | 390 (97.3) | 1913 (97.5) | |
| | | | | |
| Yes | 688 (15.4) | 129 (16.8) | 559 (15.2) | 0.25 |
| No | 3768 (84.6) | 638 (83.2) | 3130 (84.8) | |
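The last column of the table compares the COVID-19 and non-COVID-19 groups for each characteristic. The record does not say which test produced those values, so the following is only a minimal sketch, assuming a Pearson chi-square test on each 2×2 table. The function name `chi2_2x2` and the optional Yates correction are illustrative choices, not the authors' method.

```python
import math

def chi2_2x2(a, b, c, d, yates=False):
    """Pearson chi-square test for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value). With one degree of freedom the
    survival function is erfc(sqrt(x / 2)), so SciPy is not needed.
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction (assumed option), floored at zero.
        diff = max(diff - n / 2.0, 0.0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * diff ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Male/Female rows: counts copied from the table,
# ordered as (COVID-19, no COVID-19) per row.
stat_sex, p_sex = chi2_2x2(428, 2033, 338, 1646)

# A Yes/No row with a large difference between groups (451/97 vs 316/3592).
stat_sym, p_sym = chi2_2x2(451, 97, 316, 3592)

print(f"sex:     chi2={stat_sex:.3f}, p={p_sex:.2f}")
print(f"symptom: chi2={stat_sym:.1f}, p={p_sym:.3g}")
```

Small differences from the tabulated p-values are expected if the authors used a different test or correction.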
Specifications of the dataset used to train the predictive model of COVID-19 in paediatric-aged patients.
| Characteristic | Total | COVID-19 | No COVID-19 |
|---|---|---|---|
Figure 1. Modelling and data pipeline: a classifier f_s and its quality metrics are obtained for each dataset X_s.
Average CV scores for the models trained with the data subset including all ages (architecture 1), with the data subset for ages 0 to 5 years (architecture 2) and with the data subset for ages 6 to 14 years (architecture 3).
| Model | AUROC | Precision | Sensitivity | Specificity | F1 |
|---|---|---|---|---|---|
| All ages (architecture 1) | | | | | |
| XGB | 0.645 | 0.273 | 0.631 | 0.66 | 0.38 |
| RF | 0.644 | 0.278 | 0.607 | 0.680 | 0.381 |
| SVM | 0.627 | 0.289 | 0.507 | 0.747 | 0.367 |
| MLP | 0.567 | 0.253 | 0.496 | 0.637 | 0.264 |
| LR | 0.633 | 0.267 | 0.597 | 0.669 | 0.369 |
| Ages 0 to 5 years (architecture 2) | | | | | |
| XGB | 0.577 | 0.141 | 0.558 | 0.596 | 0.225 |
| RF | 0.58 | 0.14 | 0.603 | 0.556 | 0.226 |
| SVM | 0.58 | 0.145 | 0.576 | 0.593 | 0.231 |
| MLP | 0.542 | 0.128 | 0.465 | 0.619 | 0.198 |
| LR | 0.564 | 0.134 | 0.562 | 0.567 | 0.216 |
| Ages 6 to 14 years (architecture 3) | | | | | |
| XGB | 0.649 | 0.38 | 0.632 | 0.667 | 0.474 |
| RF | 0.652 | 0.377 | 0.663 | 0.641 | 0.479 |
| SVM | 0.651 | 0.378 | 0.653 | 0.65 | 0.477 |
| MLP | 0.596 | 0.33 | 0.588 | 0.605 | 0.419 |
| LR | 0.635 | 0.365 | 0.626 | 0.645 | 0.46 |
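The scores in each row are linked: F1 is the harmonic mean of precision and sensitivity. As a rough sketch, assuming the second and third numeric columns are precision and sensitivity, the XGB row of the first block (0.273 and 0.631) reproduces its last value. The confusion-matrix counts below are hypothetical and serve only to show how the metrics are defined.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)      # positive predictive value
    sensitivity = tp / (tp + fn)    # recall on infected children
    specificity = tn / (tn + fp)    # recall on non-infected children
    f1 = f1_from(precision, sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

def f1_from(precision, sensitivity):
    """Harmonic mean of precision and sensitivity."""
    return 2 * precision * sensitivity / (precision + sensitivity)

# Hypothetical counts, not taken from the study.
m = classification_metrics(tp=60, fp=160, tn=300, fn=40)

# Precision and sensitivity copied from the first XGB row of the table.
f1_xgb = f1_from(0.273, 0.631)
print(round(f1_xgb, 3))
```

With a low prevalence of positives, precision stays low even at moderate specificity, which is why the F1 values sit well below the sensitivity values.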
Test scores for each of the fine-tuned best performing models with 95% CI.
| Subset | Architecture | AUROC (95% CI) | Precision (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | F1 (95% CI) |
|---|---|---|---|---|---|---|
| | XGB | 0.65 | 0.66 | 0.3 | 0.65 | 0.41 |
| | RF | 0.63 | 0.15 | 0.65 | 0.59 | 0.25 |
| | RF | 0.67 | 0.36 | 0.66 | 0.68 | 0.46 |
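The caption mentions 95% confidence intervals, but this record does not describe how they were obtained. One common option is a percentile bootstrap over the test set. The sketch below assumes that approach; the helper `bootstrap_ci`, the labels and the predictions are all hypothetical.

```python
import random

def sensitivity(y_true, y_pred):
    """Fraction of true positives that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    pos = sum(y_true)
    return tp / pos if pos else 0.0

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for `metric` (assumed method)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample observations with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx],
                            [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical test set: 40 infected and 160 non-infected children.
y_true = [1] * 40 + [0] * 160
y_pred = [1] * 26 + [0] * 14 + [1] * 56 + [0] * 104

point = sensitivity(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred, sensitivity)
print(f"sensitivity = {point:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```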
Figure 2. Impact of each variable on the model output for the general model. The features are ordered from top to bottom by decreasing overall importance. For each feature, the SHAP value of each test observation is shown as a point. A red point means the symptom is present and a blue point means it is absent. The further right a point lies, the more strongly the output is associated with a SARS-CoV-2 infection.
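SHAP values are based on Shapley values, which split the difference between a prediction and a reference prediction among the input features. The following sketch computes exact Shapley values by enumerating feature coalitions for a toy scoring function. The function `toy_model`, its coefficients and the single all-absent baseline are assumptions for illustration; SHAP libraries typically average over a background dataset and use faster model-specific algorithms.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their observed value, others the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for combo in combinations(others, k):
                s = set(combo)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(z):
    # Hypothetical score: symptoms 0 and 1 interact, symptom 2 lowers it.
    return 0.1 + 0.3 * z[0] + 0.2 * z[1] + 0.15 * z[0] * z[1] - 0.05 * z[2]

x = [1, 1, 0]          # symptoms 0 and 1 present, symptom 2 absent
baseline = [0, 0, 0]   # reference child with no symptoms
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The values sum to the difference between the score for `x` and the score for the baseline, which is the property that lets each point in a SHAP plot be read as a signed contribution.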
Figure 3. (A) Absolute mean impact and (B) absolute maximum impact of each variable for the general model.
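Both panels are per-feature summaries of a matrix of SHAP values with one row per test observation. A minimal sketch of that aggregation, assuming the mean and the maximum are taken over absolute values; the feature names and numbers are hypothetical.

```python
def summarise_impacts(shap_matrix, feature_names):
    """Return (name, mean |SHAP|, max |SHAP|) per feature,
    sorted by decreasing mean absolute impact."""
    n_obs = len(shap_matrix)
    summary = []
    for j, name in enumerate(feature_names):
        column = [abs(row[j]) for row in shap_matrix]
        summary.append((name, sum(column) / n_obs, max(column)))
    return sorted(summary, key=lambda item: item[1], reverse=True)

# Hypothetical SHAP values: rows are observations, columns are features.
feature_names = ["symptom_a", "symptom_b", "symptom_c"]
shap_matrix = [
    [0.30, -0.05, 0.00],
    [0.10, 0.20, -0.02],
    [-0.20, 0.05, 0.01],
]

impacts = summarise_impacts(shap_matrix, feature_names)
for name, mean_abs, max_abs in impacts:
    print(f"{name}: mean={mean_abs:.3f}, max={max_abs:.3f}")
```

The mean highlights features that matter across many children, while the maximum highlights features that are rarely present but decisive when they are.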
Figure 4. Impact of each variable on the model output for (A) the model for children 0 to 5 years old and (B) the model for children 6 to 15 years old. The features are ordered from top to bottom by decreasing overall importance. For each feature, the SHAP value of each test observation is shown as a point. A red point means the symptom is present and a blue point means it is absent. The further right a point lies, the more strongly the output is associated with a SARS-CoV-2 infection.