Chak Foon Tso, Carson Lam, Jacob Calvert, Qingqing Mao.
Abstract
Respiratory syncytial virus (RSV) causes millions of infections among children in the US each year and can cause severe disease or death. Infections that are not promptly detected can cause outbreaks that put other hospitalized patients at risk. No tools besides diagnostic testing are available to rapidly and reliably predict RSV infections among hospitalized patients. We conducted a retrospective study using pediatric electronic health record (EHR) data and built a machine learning model to predict whether a patient will test positive for RSV by nucleic acid amplification test during their stay. Our model demonstrated excellent discrimination, with an area under the receiver operating characteristic curve of 0.919, a sensitivity of 0.802, and a specificity of 0.876. Our model can help clinicians identify patients who may have RSV infections rapidly and cost-effectively. Successfully integrating this model into routine pediatric inpatient care may assist efforts in patient care and infection control.
Keywords: XGBoost; algorithm; diagnosis; machine learning; pediatric infection; respiratory syncytial virus
Year: 2022 PMID: 35989982 PMCID: PMC9385995 DOI: 10.3389/fped.2022.886212
Source DB: PubMed Journal: Front Pediatr ISSN: 2296-2360 Impact factor: 3.569
FIGURE 1. Inclusion criteria for training and testing datasets of patient hospital encounters for algorithm development.
Demographic data of non-RSV positive and RSV positive patients with hospital encounters included in the holdout test set.
| Demographics | Training set | Testing set: non-RSV positive | Testing set: RSV positive | p-value |
| Below 1 year old | 21,204 (48.7%) | 5,244 (49.1%) | 61 (31.0%) | 0.002 |
| 1–3 years old | 12,552 (28.8%) | 3,024 (28.3%) | 120 (60.9%) | |
| 4–5 years old | 9,772 (22.4%) | 2,418 (22.6%) | 16 (8.1%) | |
| Unknown age | 2 (0.0%) | 0 (0.0%) | 0 (0.0%) | 1 |
| Male | 23,634 (54.3%) | 5,791 (54.2%) | 107 (54.3%) | 1 |
| Female | 19,743 (45.4%) | 4,860 (45.5%) | 90 (45.7%) | 1 |
| Unknown sex | 153 (0.4%) | 35 (0.3%) | 0 (0.0%) | 1 |
| White | 24,065 (55.3%) | 5,985 (56.0%) | 118 (59.9%) | 0.594 |
| Hispanic | 5,184 (11.9%) | 1,234 (11.5%) | 23 (11.7%) | 0.911 |
| Black | 5,997 (13.8%) | 1,441 (13.5%) | 21 (10.7%) | 0.342 |
| Asian | 1,142 (2.6%) | 257 (2.4%) | 8 (4.1%) | 0.158 |
| Other/unknown | 7,142 (16.4%) | 1,769 (16.6%) | 27 (13.7%) | 0.439 |
| Preterm birth | 2,786 (6.4%) | 713 (6.7%) | 1 (0.5%) | |
| Smoking exposure | 240 (0.6%) | 78 (0.7%) | 0 (0.0%) | 0.407 |
| Congenital heart defects | 485 (1.1%) | 140 (1.3%) | 0 (0.0%) | 0.185 |
| Neuromuscular disorders | 7 (0.0%) | 1 (0.0%) | 0 (0.0%) | 1 |
| Down syndrome | 77 (0.2%) | 20 (0.2%) | 1 (0.5%) | 0.320 |
| Cystic fibrosis | 11 (0.0%) | 7 (0.1%) | 1 (0.5%) | 0.137 |
| Chronic lung disease | 266 (0.6%) | 80 (0.7%) | 1 (0.5%) | 1 |
| Pediatric immunodeficiency | 70 (0.2%) | 15 (0.1%) | 0 (0.0%) | 1 |
| RSV PCR test performed | 4,175 (9.6%) | 851 (8.0%) | 197 (100.0%) | |
| RSV PCR test positive | 719 (1.7%) | 0 (0.0%) | 197 (100.0%) | |
FIGURE 2. Algorithm discrimination and precision in identifying hospital encounters with future positive RSV tests. The receiver operating characteristic (ROC) curve for the XGBoost model, showing superiority to random chance (gray) in discrimination between RSV-positive and non-RSV-positive encounters.
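The area under the ROC curve can be read as the probability that a randomly chosen RSV-positive encounter receives a higher score than a randomly chosen non-RSV-positive one. The following is a minimal sketch of that pairwise definition, not the authors' evaluation code; the function name `auroc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): 3 positives, 4 negatives.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
toy_auc = auroc(toy_labels, toy_scores)  # 10 of 12 pairs ranked correctly
```

A model that scores at random would rank about half of the pairs correctly, which corresponds to the gray chance line in the figure.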
Summary of algorithm performance metrics.
| Performance metric | Value (95% CI) |
| AUROC | 0.919 (0.906–0.932) |
| Sensitivity | 0.802 (0.746–0.858) |
| Specificity | 0.876 (0.870–0.882) |
AUROC, area under the receiver operating characteristic curve (no-skill baseline = 0.50). The decision threshold was chosen to maximize specificity subject to a minimum sensitivity of 0.8.
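The threshold rule in the footnote, and the width of the reported intervals, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the record does not say how the intervals were computed, but a normal-approximation (Wald) interval with the test-set counts from the demographics table (197 RSV-positive encounters; 5,244 + 3,024 + 2,418 non-RSV-positive encounters) happens to reproduce the reported sensitivity and specificity bounds. The helper names `wald_interval` and `pick_threshold` and the toy scores are hypothetical.

```python
import math


def wald_interval(p, n, z=1.96):
    """Normal-approximation 95% interval for a proportion p observed on n cases."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


def pick_threshold(labels, scores, min_sensitivity=0.8):
    """Return (threshold, sensitivity, specificity) with the highest specificity
    among thresholds whose sensitivity is at least min_sensitivity.
    A case is predicted positive when its score is >= threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        if sens >= min_sensitivity and (best is None or spec > best[2]):
            best = (t, sens, spec)
    return best


# Counts taken from the demographics table (test set).
n_rsv_positive = 197
n_non_rsv_positive = 5244 + 3024 + 2418

sens_lo, sens_hi = wald_interval(0.802, n_rsv_positive)
spec_lo, spec_hi = wald_interval(0.876, n_non_rsv_positive)

# Hypothetical toy scores (not study data) to exercise the threshold rule.
toy_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.65, 0.5, 0.4, 0.3, 0.1]
toy_choice = pick_threshold(toy_labels, toy_scores)
```

The sensitivity interval is much wider than the specificity interval because it rests on 197 positive encounters rather than more than ten thousand negative ones.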
FIGURE 3. Shapley value plot showing the degree of the model’s dependence on specific features. Features are ranked from top to bottom by relative importance. Red dots represent relatively high values of a feature and blue dots represent relatively low values. The x-axis shows the SHAP value (impact on model output). If most of the red dots for a feature lie on the right side of the x-axis, high values of that feature (e.g., mean DiasABP in this figure) contribute substantially to a positive prediction. SysABP, systolic arterial blood pressure; DiasABP, diastolic arterial blood pressure; RespRate, respiratory rate; HR, heart rate; SpO2, oxygen saturation; Temp, body temperature.
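To make the idea of a per-feature "impact on model output" concrete, the sketch below computes exact Shapley values by brute force for a tiny hypothetical model. It is not the method behind the figure: tools for tree ensembles use far more efficient algorithms and average over background data, whereas this example uses a single all-zero baseline and enumerates every feature ordering. The model `toy_model`, its coefficients, and the inputs are assumptions made purely for illustration and are not fitted to any clinical data.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Each feature's value is its average marginal contribution over all
    orderings in which features are switched from baseline to x."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous_output = model(current)
        for i in order:
            current[i] = x[i]
            new_output = model(current)
            phi[i] += new_output - previous_output
            previous_output = new_output
    return [total / len(orderings) for total in phi]


def toy_model(features):
    """Hypothetical score: two additive terms plus one interaction term."""
    a, b, c = features
    return 2.0 * a + 1.0 * b + a * c


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the model output at `x` and at the baseline, and the interaction term is split evenly between the two features involved. A plot like the one described above collects such per-feature values across many encounters.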