Shihab Uddin Chowdhury¹, Sanjana Sayeed¹, Iktisad Rashid¹, Md Golam Rabiul Alam¹, Abdul Kadar Muhammad Masum², M Ali Akber Dewan³.
Abstract
Dengue is a viral disease that primarily affects tropical and subtropical regions and is especially prevalent in South-East Asia. This mosquito-borne disease sometimes triggers nationwide epidemics that result in a large number of fatalities. Most of these cases involve the development of Dengue Haemorrhagic Fever (DHF), and a large proportion of them are detected among children under the age of ten; severe conditions often progress to a critical state known as Dengue Shock Syndrome (DSS). In this study, we analysed two separate datasets from two countries, Vietnam and Bangladesh, which we refer to as VDengue and BDengue, respectively. Because the VDengue dataset was structured, supervised learning models were effective for predictive analysis; among them, the decision-tree-based ensemble classifier XGBoost produced the best outcome. Shapley Additive Explanation (SHAP) was then applied to the XGBoost model to assess the significance of individual attributes of the dataset. For the significant attributes, we used SHAP dependence plots to identify the range of each attribute against the number of DHF or DSS cases. In parallel, because the dataset from Bangladesh was unstructured, we applied an unsupervised learning technique, hierarchical clustering, to find clusters of vital blood components of the patients according to their complete blood count reports. The clusters were further analysed to find the attributes in the dataset that led to DSS or DHF.
Keywords: Dengue Haemorrhagic Fever; Dengue Shock Syndrome; Shapley Additive Explanation; XGBoosting; clinical data; dengue; hierarchical clustering; supervised; unsupervised
Year: 2022 PMID: 36135395 PMCID: PMC9506144 DOI: 10.3390/jimaging8090229
Source DB: PubMed Journal: J Imaging ISSN: 2313-433X
Figure 1. Top-level overview of the dengue prediction model (VDengue dataset).
Attributes of the VDengue dataset.
| Name of the Attributes | Description |
|---|---|
| st_no | Patient study number |
| age | Age at enrolment (years) |
| sex | Gender (Female, Male) |
| wt | Weight at enrolment (kg) |
| day_ill | Day of illness at enrolment |
| his_tired | History of tiredness at enrolment (Yes, No) |
| his_vomit | History of vomit at enrolment (Yes, No) |
| ttest | Tourniquet test result at enrolment (Positive, Equivocal, Negative) |
| temp | Temperature at enrolment (°C) |
| pulse | Pulse rate at enrolment (beats per minute) |
| sys_bp | Systolic blood pressure at enrolment (mmHg) |
| mucosal_bleed | Mucosal bleeding at enrolment (Yes, No) |
| abdominal_pain | Abdominal pain at enrolment (Yes, No) |
| liver | Liver size at enrolment (cm) |
| hct_bsl | Haematocrit concentration at enrolment (%) |
| plt_bsl | Platelet count at enrolment (cells/mm³) |
| serotype2 | Serotype determined by PCR (DENV1, DENV2, DENV3, Mixed, Negative) |
| serology | Immune status determined by ELISA (Primary, Secondary, Possible Primary, Unclassifiable) |
| to_PICU | Referred to PICU (Yes, No) |
| shock | Dengue shock syndrome (Yes, No) |
| doi_shock | Day of illness at shock (days) |
| bleed_hos | Bleeding during hospitalisation (No, Skin, Mucose, Other) |
| minPLT_3to8 | Platelet nadir (cells/mm³) |
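Tree-based classifiers such as XGBoost need numeric inputs, so the categorical attributes listed above have to be encoded before training. The sketch below is a minimal illustration only: the mapping tables (`BINARY`, `SEX`, `TTEST_ORDER`) and the function `encode_record` are hypothetical, and treating the tourniquet test as ordinal (Negative < Equivocal < Positive) is an assumption, not the study's documented preprocessing.

```python
from typing import Dict, Union

# Hypothetical encodings; the source does not state how categories were mapped.
BINARY = {"No": 0, "Yes": 1}
SEX = {"Female": 0, "Male": 1}
# Assumed ordinal order for the tourniquet test result.
TTEST_ORDER = {"Negative": 0, "Equivocal": 1, "Positive": 2}

NUMERIC_FIELDS = ("age", "wt", "day_ill", "temp", "pulse", "sys_bp", "liver", "hct_bsl", "plt_bsl")
BINARY_FIELDS = ("his_tired", "his_vomit", "mucosal_bleed", "abdominal_pain")

def encode_record(rec: Dict[str, Union[str, float]]) -> Dict[str, float]:
    """Convert one raw VDengue-style record into numeric features (hypothetical helper)."""
    out: Dict[str, float] = {}
    for key in NUMERIC_FIELDS:
        if key in rec:
            out[key] = float(rec[key])  # accepts numbers or numeric strings
    if "sex" in rec:
        out["sex"] = float(SEX[str(rec["sex"])])
    for key in BINARY_FIELDS:
        if key in rec:
            out[key] = float(BINARY[str(rec[key])])
    if "ttest" in rec:
        out["ttest"] = float(TTEST_ORDER[str(rec["ttest"])])
    return out
```

Fields missing from a record are simply left out, so a downstream model (or its missing-value handling) decides how to treat them.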
Figure 2. Summary plot of the features of the VDengue dataset using SHAP values on the XGBoost classifier model.
Attributes of the dataset collected from Dr. M. R. Khan Shishu Hospital (Bangladesh).
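SHAP attributes a model's prediction for one patient to its input features using Shapley values. For a model with only a few features, Shapley values can be computed exactly by enumerating all feature subsets; the sketch below does this for a toy scoring function, replacing "absent" features with a baseline value (one common convention, assumed here). The toy model, baseline, and function name are hypothetical; for tree ensembles such as XGBoost, the `shap` library's `TreeExplainer` computes these values efficiently instead of by enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence, Set

def shapley_values(f: Callable[[List[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    n = len(x)

    def value(subset: Set[int]) -> float:
        # Features in `subset` take the patient's values, the rest take the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                # Weighted marginal contribution of feature i to coalition s.
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi
```

For the toy model `f(z) = 2*z[0] + z[1]*z[2]` at `x = [1, 2, 3]` with an all-zero baseline, the values are 2, 3 and 3: the additive term goes entirely to feature 0 and the interaction is split equally. They sum to f(x) - f(baseline), the efficiency property that makes SHAP summary plots additive.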
| Name of the Attributes | Description | Unit |
|---|---|---|
| DOA | Date of admission of patient | - |
| PatientId | ID of patients enrolled in the hospital | - |
| PatientName | Name of the patient | - |
| Gender | Gender (Female or Male) | - |
| Age | Age of the patient | years |
| Hb | Haemoglobin concentration | g/dL |
| TotalCountWBC | Total WBC count of the patient | - |
| Platelets | Platelet count of the patient | cells/mm³ |
| ESR | Erythrocyte sedimentation rate of patient | mm |
| DengueNS1 | Dengue virus antigen detection | - |
| WeightOfThePatient | Weight of the patient | kg |
| BloodPressure | Blood pressure of the patient | mmHg |
| HCT | Haematocrit concentration of patient | % |
| Lymphocytes | - | % |
| Monocytes | - | % |
| Neutrophils | - | % |
| Eosinophils | - | % |
| Basophils | - | % |
| BloodGroup | Blood group of the patient | - |
| Dengue IgM/IgG | Antibody testing in dengue diagnosis | - |
| SGPT | Serum glutamic pyruvic transaminase (SGPT) level of the patient | - |
| Albumin | - | g/dL |
| Symptoms | Symptoms of the patient | - |
Figure 3. Attributes of the Shishu Hospital dataset.
Attributes of the dataset collected from Central Hospital (Bangladesh).
| Name of the Attributes | Unit | Name of the Attributes | Unit |
|---|---|---|---|
| Date Of Arrival | dd/mm/yy | Monocytes | % |
| Patient ID | - | Basophils | % |
| Patient Name | - | HCT | % |
| Gender | 0(M)/1(F) | MCV | fl |
| Age | year | MCH | pg |
| Haemoglobin | g/dL | MCHC | g/dL |
| WBC Count | /cmm | RBC Count | million/cm |
| Platelets | K/µL | Dengue NS1 | Positive/Negative |
| Neutrophils | % | Dengue IgG | Positive/Negative |
| Lymphocytes | % | Dengue IgM | Positive/Negative |
| Eosinophils | % | - | - |
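The two Bangladeshi sources record some fields differently: the Shishu Hospital table gives platelets in cells/mm³ and gender as Female/Male, while the Central Hospital table gives platelets in thousands ("K") per microlitre (assuming the conventional K/µL), WBC per cubic millimetre ("/cmm"), and gender as 0 (M)/1 (F). If the two sources are pooled, they need a common scale. Because 1 µL equals 1 mm³, a platelet value in K/µL becomes cells/mm³ after multiplying by 1000. The helper `harmonise_central_row` below is a hypothetical sketch of that mapping, not the study's code.

```python
from typing import Dict, Union

def harmonise_central_row(row: Dict[str, Union[int, float, str]]) -> Dict[str, Union[float, str]]:
    """Map a Central Hospital row onto Shishu Hospital-style units and codes (hypothetical helper)."""
    out: Dict[str, Union[float, str]] = {}
    # Platelets: thousands per microlitre -> cells per mm^3 (1 uL == 1 mm^3).
    out["Platelets"] = float(row["Platelets"]) * 1000.0
    # WBC is already per cubic millimetre (/cmm), so no scaling is applied.
    out["TotalCountWBC"] = float(row["WBC Count"])
    # Gender is coded 0 (male) / 1 (female) in the Central Hospital table.
    out["Gender"] = "Male" if int(row["Gender"]) == 0 else "Female"
    out["HCT"] = float(row["HCT"])
    return out
```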
Figure 4. Bivariate relationships between features of the BDengue dataset.
Figure 5. Top-level overview of the dengue prediction model (BDengue dataset).
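Pairwise relationships such as those in Figure 4 can also be summarised numerically. The Pearson correlation coefficient is one common choice; it is used here only as an illustration and is not necessarily what the figure encodes. A dependency-free sketch (the function name `pearson` is our own):

```python
from math import sqrt
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Covariance and standard deviations share the same 1/n factor, so it cancels.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)
```

For example, Hb and HCT values that rise in lockstep give a coefficient of 1 (up to floating-point rounding), while perfectly opposed values give -1.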
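Agglomerative hierarchical clustering starts with each patient as its own cluster and repeatedly merges the two closest clusters until the desired number remains. The sketch below uses z-scored features, Euclidean distance, and average linkage, and stops at a fixed number of clusters `k`; all of these are assumptions, since the source does not state the study's choices. It runs in roughly cubic time and is meant only to show the mechanics; the helper names are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple

def standardise(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Z-score each column so features on different scales (e.g. platelets vs HCT) are comparable."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant column: avoid division by zero
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]

def euclid(a: Sequence[float], b: Sequence[float]) -> float:
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def agglomerative(rows: Sequence[Sequence[float]], k: int) -> List[int]:
    """Average-linkage agglomerative clustering into k clusters; returns one label per row."""
    if not 1 <= k <= len(rows):
        raise ValueError("k must be between 1 and the number of rows")
    pts = standardise(rows)
    clusters: List[List[int]] = [[i] for i in range(len(pts))]
    while len(clusters) > k:
        best: Optional[Tuple[float, int, int]] = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean distance over all cross-cluster pairs of patients.
                d = mean(euclid(pts[i], pts[j]) for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        assert best is not None
        _, a, b = best
        clusters[a].extend(clusters[b])  # merge the closest pair (b > a, so index a stays valid)
        del clusters[b]
    labels = [0] * len(pts)
    # Number clusters by their lowest row index so the labels are deterministic.
    for label, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = label
    return labels
```

In practice, SciPy's `scipy.cluster.hierarchy.linkage` (with `method="average"`) followed by `fcluster` performs the same kind of clustering far more efficiently.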
Figure 6. Bar chart showing the importance of different features for the VDengue dataset.
Figure 7. Bar chart showing the important features of the BDengue dataset.
Classification report after applying different machine learning approaches on the VDengue dataset.
| Algorithm | Misclassification Rate | Precision | F1 Score | PPV | NPV |
|---|---|---|---|---|---|
| AdaBoost | 0.02 | 0.98 | 0.96 | 0.74 | 1 |
| XGBoost | 0.01 | 0.98 | 0.96 | 0.8 | 1 |
| Random Forest | 0.02 | 0.98 | 0.96 | 0.73 | 1 |
| Decision Tree | 0.02 | 0.98 | 0.96 | 0.76 | 1 |
Training accuracy, testing accuracy, sensitivity, and specificity of different models on VDengue dataset.
| Algorithm | Training Accuracy | Test Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| AdaBoost | 0.998 | 0.981 | 0.94 | 0.98 |
| XGBoost | 1 | 0.986 | 0.94 | 0.99 |
| Random Forest | 1 | 0.979 | 0.94 | 0.98 |
| Decision Tree | 1 | 0.982 | 0.94 | 0.98 |
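The metrics in the two tables above are all derived from the binary confusion matrix (true/false positives and negatives). The sketch below shows the standard definitions; the function name and the example counts are hypothetical. Note that, by these definitions, precision and PPV are the same quantity for the positive class.

```python
from typing import Dict

def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics; assumes every denominator is non-zero."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision of the positive class)
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of precision and recall
    return {
        "accuracy": accuracy,
        "misclassification": 1 - accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }
```

With hypothetical counts `tp=8, fp=2, tn=88, fn=2`, accuracy is 0.96 and sensitivity, PPV and F1 are all 0.8; with a strongly imbalanced class distribution, PPV can stay modest even when accuracy and specificity are high.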
Figure 8. XGBoost log-loss curve.
Figure 9. XGBoost classification error curve.
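Log-loss and classification-error curves track two evaluation metrics across boosting rounds. For binary labels y and predicted probabilities p, log loss is the negative mean of y·log(p) + (1 - y)·log(1 - p), and classification error is the fraction of cases whose thresholded prediction disagrees with the label (XGBoost's `error` metric counts predictions above 0.5 as positive). A minimal sketch of both metrics for a single round; the function names are our own:

```python
from math import log
from typing import Sequence

def log_loss(y: Sequence[int], p: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        total += yi * log(pi) + (1 - yi) * log(1 - pi)
    return -total / len(y)

def classification_error(y: Sequence[int], p: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of cases where (p > threshold) disagrees with the 0/1 label."""
    wrong = sum(1 for yi, pi in zip(y, p) if (pi > threshold) != bool(yi))
    return wrong / len(y)
```

Evaluating both functions on the training and validation predictions after each round produces the two curves; a widening gap between them is the usual sign of overfitting.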
Figure 10. ROC curve and AUC.
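The AUC summarises the ROC curve as a single number: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. That equivalence gives a short way to compute it without tracing the curve (the function name is our own):

```python
from typing import Sequence

def roc_auc(y: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic; for large datasets a rank-based (Mann-Whitney) formulation gives the same value in O(n log n).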
Figure 11. SHAP dependence plot between maxHCT_3to8 and to_PICU.
Figure 12. SHAP dependence plot between maxhemo_3to8 and to_PICU.
Figure 13. SHAP dependence plot between hct_bsl and to_PICU.
Figure 14. SHAP dependence plot between maxHCT_3to8 and maxhemo_3to8.
Figure 15. SHAP dependence plot of minPLT_3to8 and to_PICU, with to_PICU on the horizontal axis and minPLT_3to8 on the vertical axis.
Figure 16. SHAP dependence plot of minPLT_3to8 and to_PICU, with minPLT_3to8 on the horizontal axis and to_PICU on the vertical axis.
Figure 17. SHAP dependence plot between minPLT_3to8 and maxhemo_3to8.
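In a standard SHAP dependence plot (as produced by the `shap` library), each point is a patient: the horizontal axis holds a feature's value, the vertical axis holds that feature's SHAP value, and a second feature is often used for colouring. One way to turn such a plot into attribute ranges, in the spirit of the study's use of dependence plots, is to collect the feature values whose SHAP values push the prediction towards the positive class (SHAP > 0). The helper below is a hypothetical sketch of that reading; it assumes feature and SHAP values have already been extracted from the fitted model.

```python
from typing import List, Optional, Sequence, Tuple

def positive_shap_ranges(values: Sequence[float],
                         shap_vals: Sequence[float]) -> List[Tuple[float, float]]:
    """Contiguous intervals of (sorted) feature values whose SHAP value is positive."""
    ranges: List[Tuple[float, float]] = []
    start: Optional[float] = None
    prev: Optional[float] = None
    for v, s in sorted(zip(values, shap_vals)):
        if s > 0:
            if start is None:
                start = v  # open a new positive run
            prev = v
        elif start is not None and prev is not None:
            ranges.append((start, prev))  # close the current run
            start = None
    if start is not None and prev is not None:
        ranges.append((start, prev))
    return ranges
```

With hypothetical platelet nadirs of 20,000, 35,000, 80,000 and 150,000 cells/mm³ and SHAP values 0.9, 0.4, -0.2 and -0.6, the helper returns a single range from 20,000 to 35,000.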
Figure 18. Percentage of each serotype (serotype2) among patients going into shock.
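Reading the serotype figure as the share of each serotype among patients who went into shock (our interpretation of the caption), the computation is a filtered frequency count. A minimal sketch using the `shock` and `serotype2` attribute names from the VDengue table; the function name and the example records are hypothetical:

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

def serotype_share_in_shock(records: Iterable[Mapping[str, str]]) -> Dict[str, float]:
    """Percentage of each serotype2 value among records with shock == 'Yes'."""
    counts = Counter(r["serotype2"] for r in records if r["shock"] == "Yes")
    n = sum(counts.values())
    if n == 0:
        return {}
    return {sero: 100.0 * c / n for sero, c in counts.items()}
```

For four shock patients with serotypes DENV1, DENV2, DENV1 and DENV1, the result is 75% DENV1 and 25% DENV2; non-shock records are ignored.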
Figure 19. Minimum platelet count of shock patients for each day since enrolment in the hospital, shown as a kernel density estimation plot.
Figure 20. Maximum haematocrit concentration of shock patients for each day since enrolment in the hospital, shown as a kernel density estimation plot.
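Kernel density estimation (KDE) smooths a sample into a continuous density by placing a kernel on each observation and averaging. With a Gaussian kernel of bandwidth h, the estimate at x is (1 / (n·h)) · Σᵢ φ((x - xᵢ) / h), where φ is the standard normal density. A dependency-free sketch follows; the default bandwidth h = s · n^(-1/5) (s the sample standard deviation) is an assumed Scott-style rule of thumb, and plotting libraries may choose the bandwidth differently.

```python
from math import exp, pi, sqrt
from statistics import stdev
from typing import List, Optional, Sequence

def gaussian_kde(sample: Sequence[float], grid: Sequence[float],
                 bandwidth: Optional[float] = None) -> List[float]:
    """Gaussian kernel density estimate of `sample`, evaluated at each point of `grid`."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    # Assumed rule of thumb when no bandwidth is given.
    h = bandwidth if bandwidth is not None else stdev(sample) * n ** (-1 / 5)
    if h <= 0:
        raise ValueError("bandwidth must be positive (is the sample constant?)")
    norm = 1.0 / (n * h * sqrt(2 * pi))
    return [norm * sum(exp(-0.5 * ((x - xi) / h) ** 2) for xi in sample) for x in grid]
```

Drawing one such curve per day since enrolment, for example over the minimum platelet counts recorded on that day, gives a per-day view of how the distribution shifts.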
Mean and standard deviation of different features for cluster 0.
| Attributes | Mean | Standard Deviation |
|---|---|---|
| Platelets (cells/mm³) | 221,085 | 63,918 |
| HCT (%) | 37.94 | 4.77 |
| Lymphocytes (%) | 28.50 | 16.26 |
| Monocytes (%) | 4 | 1.99 |
| Neutrophils (%) | 65 | 18.16 |
| WBC | 6918 | 5812 |
| Hb (g/dL) | 12 | 1.61 |
Mean and standard deviation of different features for cluster 1.
| Attributes | Mean | Standard Deviation |
|---|---|---|
| Platelets (cells/mm³) | 93,714 | 3596 |
| HCT (%) | 40.63 | 5.96 |
| Lymphocytes (%) | 40 | 18.57 |
| Monocytes (%) | 4 | 1.95 |
| Neutrophils (%) | 52 | 19.82 |
| WBC | 10,521 | 18,762 |
| Hb (g/dL) | 12 | 2.02 |
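The two tables above summarise each cluster by the mean and standard deviation of its members' values. Given cluster labels (for instance from an agglomerative clustering step), the summary is a grouped computation. In this sketch, the function name is hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the source does not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

def cluster_summary(labels: Sequence[int],
                    rows: Sequence[Sequence[float]],
                    names: Sequence[str]) -> Dict[int, Dict[str, Tuple[float, float]]]:
    """Per-cluster (mean, sample standard deviation) for each named column."""
    groups: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for label, row in zip(labels, rows):
        groups[label].append(row)
    summary: Dict[int, Dict[str, Tuple[float, float]]] = {}
    for label, members in sorted(groups.items()):
        cols = list(zip(*members))  # transpose: one tuple per attribute
        summary[label] = {
            # A single-member cluster has no sample spread, so report 0.0.
            name: (mean(col), stdev(col) if len(col) > 1 else 0.0)
            for name, col in zip(names, cols)
        }
    return summary
```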