Tzong-Shiann Ho, Ting-Chia Weng, Jung-Der Wang, Hsieh-Cheng Han, Hao-Chien Cheng, Chun-Chieh Yang, Chih-Hen Yu, Yen-Jung Liu, Chien Hsiang Hu, Chun-Yu Huang, Ming-Hong Chen, Chwan-Chuen King, Yen-Jen Oyang, Ching-Chuan Liu.
Abstract
In recent decades, the global incidence of dengue has increased. Affected countries have responded with more effective surveillance strategies to detect outbreaks early, monitor trends, and implement prevention and control measures. We applied newly developed machine learning approaches to identify laboratory-confirmed dengue cases among 4,894 emergency department patients with dengue-like illness (DLI) who received laboratory tests. Among them, 60.11% (2,942 cases) were confirmed to have dengue. Using just four input variables [age, body temperature, white blood cell counts (WBCs), and platelet counts], not only the state-of-the-art deep neural network (DNN) prediction models but also the conventional decision tree (DT) and logistic regression (LR) models delivered areas under the receiver operating characteristic (ROC) curve (AUCs) ranging from 83.75% to 85.87% [for DT, DNN and LR: 84.60% ± 0.03%, 85.87% ± 0.54%, 83.75% ± 0.17%, respectively]. Subgroup analyses found that all the models were very sensitive, particularly in the pre-epidemic period: pre-peak sensitivities (<35 weeks) were 92.6%, 92.9%, and 93.1% for DT, DNN, and LR, respectively. Adjusted odds ratios estimated with LR for low WBCs [≤ 3.2 (×10³/μL)], fever (≥38°C), low platelet counts [< 100 (×10³/μL)], and elderly age (≥ 65 years) were 5.17 [95% confidence interval (CI): 3.96-6.76], 3.17 [95% CI: 2.74-3.66], 3.10 [95% CI: 2.44-3.94], and 1.77 [95% CI: 1.50-2.10], respectively. Our prediction models can readily be used in resource-poor countries where viral/serologic tests are inconvenient. They can also be applied to real-time syndromic surveillance to monitor trends in dengue cases, and even be integrated with mosquito/environment surveillance for early warning and immediate prevention/control measures. In other words, a local community hospital/clinic equipped with a complete blood count instrument (including platelets) can provide sentinel screening during outbreaks.
In conclusion, the machine learning approach can facilitate medical and public health efforts to minimize the health threat of dengue epidemics. However, laboratory confirmation remains the primary goal of surveillance and outbreak investigation.
Year: 2020 PMID: 33170848 PMCID: PMC7654779 DOI: 10.1371/journal.pntd.0008843
Source DB: PubMed Journal: PLoS Negl Trop Dis ISSN: 1935-2727
Fig 1. Flow diagram for extracting 2942 laboratory-confirmed dengue cases (case group) and 1952 non-dengue cases (control group) from the source study population.
Demographic and clinical characteristics of the included study subjects (laboratory-confirmed dengue and non-dengue cases) and the excluded ED patients at NCKU Hospital, Jan. 1 to Dec. 31, 2015.
| Characteristic | Laboratory-confirmed dengue | Non-dengue patients | p-value* | Excluded |
|---|---|---|---|---|
| Number of patients | 2942 | 1952 | | 1474 |
| Age < 18 | 174 (5.91%) | 183 (9.38%) | <0.001 | 161 (10.92%) |
| 18 ≤ age < 65 | 1871 (63.6%) | 1382 (70.8%) | | 935 (63.43%) |
| 65 ≤ age | 897 (30.49%) | 387 (19.83%) | | 378 (25.64%) |
| Age (mean±SD) | 50.24±21.47 | 41.68±23.14 | <0.001 | 45.8±22.58 |
| Female | 1473 (50.07%) | 945 (48.41%) | 0.2565 | 783 (53.12%) |
| Male | 1469 (49.93%) | 1007 (51.59%) | | 691 (46.88%) |
| Pre-peak: wks ≤ 35 | 399 (13.56%) | 150 (7.68%) | <0.001 | 361 (24.49%) |
| Peak: 35 < wks ≤ 40 | 1912 (64.99%) | 1077 (55.17%) | <0.001 | 810 (54.95%) |
| Post-peak: 40 < wks | 631 (21.45%) | 725 (37.14%) | <0.001 | 303 (20.56%) |
| Non-Hospitalized | 2443 (83.04%) | 1495 (76.59%) | <0.001 | 1148 (77.88%) |
| Hospitalized | 420 (14.28%) | 389 (20.08%) | <0.001 | 270 (18.32%) |
| ICU | 36 (1.22%) | 29 (1.49%) | 0.5115 | 24 (1.63%) |
| Death | 43 (1.46%) | 36 (1.84%) | 0.2983 | 32 (2.17%) |
| | (mean±SD) | (mean±SD) | | |
| Temperature (°C) | 38.33±0.98 | 37.97±0.99 | <0.001 | 37.64±1.43 |
| Systolic BP (mmHg) | 135±22 | 133±22 | <0.001 | 127±21 |
| Diastolic BP (mmHg) | 82±15 | 82±15 | 0.4068 | 79±20 |
| Heart Rate (BPM) | 100±19 | 102±20 | 0.0039 | 92±20 |
| Respiratory Rate (/min) | 20±3 | 20±3 | 0.0001 | 20±2 |
| | (mean±SD) | (mean±SD) | | |
| WBCs (10³/μL) | 5.25±2.70 | 9.18±4.29 | <0.001 | 4.81±3.33 |
| Platelets (10³/μL) | 148.08±65.85 | 205.79±76.48 | <0.001 | 114.63±79.80 |
| Hemoglobin | 13.39±1.71 | 13.03±2.07 | <0.001 | 13.57±1.93 |
| Heart Disease | 332 (11.44%) | 213 (11.06%) | 0.6850 | 153 (10.38%) |
| CVA | 147 (5.06%) | 118 (6.13%) | 0.1122 | 55 (3.37%) |
| CKD | 653 (22.49%) | 436 (22.64%) | 0.9069 | 288 (10.54%) |
| Severe Liver Disease | 250 (8.61%) | 185 (9.61%) | 0.2376 | 144 (9.77%) |
| DM | 532 (18.33%) | 348 (18.07%) | 0.8206 | 253 (17.16%) |
| Hypertension | 584 (20.12%) | 354 (18.38%) | 0.1352 | 264 (17.91%) |
| Cancer | 528 (17.95%) | 398 (20.39%) | 0.0327 | 208 (14.11%) |
Pre-peak: Before Epidemic Peak in the Epidemic Curve; SD: Standard Deviation; ICU: Intensive Care Units; BP: Blood Pressure; BPM: Heart Rate as Beats per Minute; WBCs: White Blood Cells; CVA: cerebral vascular accident; CKD: Chronic Kidney Disease; DM: Diabetes Mellitus
Crude and adjusted odds ratios for the 4-variable set and the 6-variable set.
| Variables | Crude OR | Adjusted OR (4-variable set) | 95% CI (4-variable set) | Adjusted OR (6-variable set) | 95% CI (6-variable set) |
|---|---|---|---|---|---|
| Young/Adult | 0.70 | 0.59 | (0.46,0.76) | 0.65 | (0.50,0.84) |
| Elder/Adult | 1.71 | 1.77 | (1.50,2.10) | 2.19 | (1.84,2.62) |
| Fever | 1.92 | 3.17 | (2.74,3.66) | 3.28 | (2.83,3.79) |
| Low_PLTs | 3.95 | 3.10 | (2.44,3.94) | 3.03 | (2.38,3.87) |
| Low_WBCs | 4.49 | 5.17 | (3.96,6.76) | 5.41 | (4.13,7.10) |
| High_WBCs | 0.10 | 0.08 | | 0.08 | (0.07,0.10) |
| Male/Female | 0.94 | | | 1.22 | (1.11,1.41) |
| Low_Hb | 0.67 | | | 0.49 | (0.42,0.58) |
| High_Hb | 1.13 | | | 1.13 | (0.86,1.49) |
OR: Odds Ratios; 95% CI: 95% Confidence Intervals in parentheses
4-variable set: Age, Body Temperature, Counts of white blood cells (WBCs) and platelets (PLTs)
6-variable set: Age, Body Temperature, Counts of WBCs, PLTs, and hemoglobin (Hb), and Gender
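As a rough illustration of how a crude odds ratio is obtained, the sketch below recomputes the two age-group ratios from the case and control counts reported in the demographic table, taking adults aged 18 to 64 as the reference group. The function names and the Wald-type confidence interval are illustrative assumptions, not the authors' code; the adjusted ratios come from the logistic regression models and cannot be reproduced from these marginal counts alone.

```python
import math


def crude_odds_ratio(exposed_cases, ref_cases, exposed_controls, ref_controls):
    """Crude OR from a 2x2 table: (a * d) / (b * c)."""
    return (exposed_cases * ref_controls) / (ref_cases * exposed_controls)


def wald_ci(exposed_cases, ref_cases, exposed_controls, ref_controls, z=1.96):
    """Approximate 95% CI on the log-odds scale.

    The Wald method is an assumption made for this sketch; the source does
    not state how its intervals were computed.
    """
    log_or = math.log(
        crude_odds_ratio(exposed_cases, ref_cases, exposed_controls, ref_controls)
    )
    se = math.sqrt(
        1 / exposed_cases + 1 / ref_cases + 1 / exposed_controls + 1 / ref_controls
    )
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Counts taken from the demographic table: (dengue cases, non-dengue controls).
ADULT = (1871, 1382)  # 18 <= age < 65, reference group
ELDER = (897, 387)    # age >= 65
YOUNG = (174, 183)    # age < 18

or_elder = crude_odds_ratio(ELDER[0], ADULT[0], ELDER[1], ADULT[1])
or_young = crude_odds_ratio(YOUNG[0], ADULT[0], YOUNG[1], ADULT[1])
ci_elder = wald_ci(ELDER[0], ADULT[0], ELDER[1], ADULT[1])

print(f"Elder/Adult crude OR: {or_elder:.2f}")  # 1.71
print(f"Young/Adult crude OR: {or_young:.2f}")  # 0.70
```

Both values agree with the Crude OR column for the Elder/Adult and Young/Adult rows.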
Fig 2. Performance evaluation procedure based on 10-fold cross validation.
In each iteration of the 10-fold cross validation procedure, 90% of the patients’ records in the cohort were used to build the prediction model. The remaining 10% of the records, with the end results withheld, were then fed into the model, and its predictions were compared with the end results recorded in the cohort to evaluate how accurately the model performed. The iteration was repeated 10 times, with each of the 10 subsets used for performance evaluation once and only once [24].
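The partitioning described above can be sketched in a few lines of standard-library Python. This only shows how the record indices might be divided so that every record is held out exactly once; model fitting and scoring are omitted. The function name, the random shuffle, and the seed are assumptions for illustration, since the source does not say how the 10 subsets were formed.

```python
import random


def k_fold_splits(n_records, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # assumed: random assignment to folds
    # Striding by k gives fold sizes that differ by at most one record.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


N_PATIENTS = 4894  # cohort size reported in the abstract
splits = list(k_fold_splits(N_PATIENTS))

for fold_number, (train, test) in enumerate(splits, start=1):
    # A model would be fitted on `train` and evaluated on `test` here.
    print(fold_number, len(train), len(test))
```

With 4,894 records, each held-out fold contains 489 or 490 records, so roughly 90% of the cohort is available for model building in every iteration.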
Fig 3. Performance delivered by two machine learning methods (decision tree, deep neural network) and logistic regression models with 4, 6, 11, and 18 input variables.
AUCs with 4 input variables: DT 83.75%±0.17%, DNN 85.87%±0.54%, LR 84.60%±0.03%; AUCs with 6 input variables: DT 84.49%±0.11%, DNN 86.95%±0.45%, LR 85.69%±0.09%; AUCs with 11 input variables: DT 84.49%±0.14%, DNN 86.40%±0.64%, LR 84.04%±0.07%; AUCs with 18 input variables: DT 84.47%±0.14%, DNN 86.35%±0.63%, LR 84.07%±0.07%. The reported sensitivities/specificities for determining dengue based on the 1997 and 2009 definitions were 95.4%/36.0% and 79.9%/57.0%, respectively [34].
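For readers less familiar with these metrics, the sketch below shows one common way to compute sensitivity, specificity, and AUC from a set of predictions. It is a minimal illustration on made-up toy data, not the study's evaluation code: the function names, the 0.5 decision threshold, and the four example records are all assumptions.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = dengue."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive record scores higher
    than a random negative record (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up toy data, not taken from the study.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]  # hypothetical model outputs
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

sens, spec = sensitivity_specificity(labels, predictions)
toy_auc = auc(labels, scores)
print(sens, spec, toy_auc)  # 0.5 0.5 0.75
```

The pairwise definition of AUC is quadratic in the number of records; it is used here only because it makes the meaning of the metric explicit.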