Obvious Nchimunya Chilyabanyama1,2, Roma Chilengi2, Michelo Simuyandi2, Caroline C Chisenga2, Masuzyo Chirwa2, Kalongo Hamusonde2, Rakesh Kumar Saroj3, Najeeha Talat Iqbal4, Innocent Ngaruye5, Samuel Bosomprah2,6.
Abstract
Stunting is a global public health issue. We sought to train and evaluate machine learning (ML) classification algorithms on the Zambia Demographic Health Survey (ZDHS) dataset to predict stunting among children under the age of five in Zambia. We applied logistic regression (LR), random forest (RF), support vector classification (SVC), XGBoost (XGB) and naïve Bayes (NB) algorithms to predict the probability of stunting among children under five years of age, using the 2018 ZDHS dataset. We calibrated the predicted probabilities and plotted calibration curves to compare model performance, and computed accuracy, recall, precision and F1 score for each algorithm. Overall, 2327 (34.2%) children were stunted. Thirteen of fifty-eight features were selected for inclusion in the model using random forest. Calibrating the predicted probabilities improved the performance of the ML algorithms when evaluated using calibration curves. RF was the most accurate algorithm, with an accuracy score of 79% in the training and 61.6% in the testing data, while naïve Bayes was the worst-performing algorithm for predicting stunting among children under five in Zambia using the 2018 ZDHS dataset. ML models aid quick diagnosis of stunting and the timely development of interventions aimed at preventing stunting.
Keywords: Naïve Bayesian; ZDHS; machine learning; random forest; stunting
Year: 2022 PMID: 35884066 PMCID: PMC9320245 DOI: 10.3390/children9071082
Source DB: PubMed Journal: Children (Basel) ISSN: 2227-9067
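The pipeline the abstract describes (train a classifier, calibrate its predicted probabilities, then inspect a calibration curve) can be sketched with scikit-learn. This is an illustrative sketch, not the authors' code: the feature matrix is a random stand-in for the thirteen selected ZDHS features, and sigmoid (Platt) calibration is assumed as one common choice.

```python
# Hedged sketch of the described pipeline on synthetic data; not the study's code.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.calibration import CalibratedClassifierCV, calibration_curve

# Synthetic stand-in for the ZDHS data: 13 features, ~34% positive class,
# mirroring the study's 13 selected features and stunting prevalence.
X, y = make_classification(n_samples=2000, n_features=13,
                           weights=[0.66, 0.34], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

rf = RandomForestClassifier(n_estimators=200, random_state=42)
# Calibrate the RF's predicted probabilities via cross-validated Platt scaling.
cal = CalibratedClassifierCV(rf, method="sigmoid", cv=5).fit(X_tr, y_tr)

proba = cal.predict_proba(X_te)[:, 1]
# Points for a calibration (reliability) curve: observed frequency vs mean prediction.
frac_pos, mean_pred = calibration_curve(y_te, proba, n_bins=10)
```

Plotting `frac_pos` against `mean_pred` alongside the diagonal gives a curve like Figure 3; the closer the curve tracks the diagonal, the better calibrated the model.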
Figure 1. Workflow chart.
Figure 2. Feature importance score.
Percent of stunted children by background characteristics.
| Characteristics | Number of Children (% of Total) | Stunted, n (%) | p-Value |
|---|---|---|---|
| Child’s age | |||
| ≤12 months | 1546 (22.7) | 350 (22.6) | <0.001 |
| >12 months | 5253 (77.3) | 1977 (37.6) | |
| Gender | |||
| Male | 3421 (50.3) | 1075 (31.4) | <0.001 |
| Female | 3378 (49.7) | 1252 (37.1) | |
| Mother’s Age (years) | |||
| 15–24 | 1938 (28.5) | 697 (36) | 0.088 |
| 25–34 | 3181 (46.8) | 1084 (34.1) | |
| 35–49 | 1680 (24.7) | 546 (32.5) | |
| Region | |||
| Urban | 4875 (71.7) | 1734 (35.6) | <0.001 |
| Rural | 1924 (28.3) | 593 (30.8) | |
| Mother’s Education | |||
| No formal education | 738 (10.9) | 282 (38.2) | <0.001 |
| Primary | 3722 (54.7) | 1390 (37.3) | |
| Secondary | 2045 (30.1) | 615 (30.1) | |
| Higher | 294 (4.3) | 40 (13.6) | |
| Mother’s Current Work |
| No | 3505 (51.6) | 1172 (33.4) | 0.158 |
| Yes | 3294 (48.4) | 1155 (35.1) | |
| Wealth Index | |||
| Poor | 3261 (48) | 1250 (38.3) | <0.001 |
| Middle | 1308 (19.2) | 453 (34.6) | |
| Richer | 2230 (32.8) | 624 (28) | |
| Religion | |||
| Muslim | 39 (0.6) | 17 (43.6) | 0.183 |
| Catholic | 1093 (16.1) | 399 (36.5) | |
| Protestant | 5603 (82.4) | 1891 (33.7) | |
| Other | 64 (0.9) | 20 (31.3) | |
| Toilet type | |||
| Unhygienic | 4265 (62.7) | 1428 (33.5) | 0.094 |
| Hygienic | 2534 (37.3) | 899 (35.5) | |
| Total | 6799 (100) | 2327 (34.2) |
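The p-values in the table above are consistent with Pearson chi-square tests of association between each characteristic and stunting status. As a hedged illustration (an assumption about the test used, not the authors' code), the gender row can be checked with SciPy using the counts shown:

```python
# Chi-square test of stunting vs gender, using counts from the table.
# Assumption: the table's p-values come from chi-square tests of association.
from scipy.stats import chi2_contingency

male = [1075, 3421 - 1075]      # [stunted, not stunted]
female = [1252, 3378 - 1252]
chi2, p, dof, expected = chi2_contingency([male, female])
print(f"chi2 = {chi2:.2f}, p = {p:.2e}")  # p < 0.001, as reported
```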
Figure 3. Calibration curve for ML algorithms.
Performance scores (F1, Cohen’s kappa, PR-AUC and accuracy) for each classification model in predicting stunting.
| Model | Train F1 | Test F1 | Train Cohen’s Kappa | Test Cohen’s Kappa | Train PR-AUC | Test PR-AUC | Train Accuracy | Test Accuracy |
|---|---|---|---|---|---|---|---|---|
| Logistic Regression | 0.5298 | 0.5411 | 0.0797 | 0.0833 | 0.3728 | 0.3858 | 0.4471 | 0.4592 |
| Random Forest | 0.717 | 0.4826 | 0.5535 | 0.178 | 0.5992 | 0.4134 | 0.7921 | 0.6162 |
| SV Classification | 0.6083 | 0.523 | 0.3106 | 0.1486 | 0.4611 | 0.4051 | 0.6402 | 0.5583 |
| XG Boost | 0.6006 | 0.5381 | 0.3019 | 0.188 | 0.4566 | 0.4192 | 0.6385 | 0.5851 |
Precision and recall for each machine learning algorithm.
| Model | Precision Negative | Precision Positive | Average Precision | Recall Negative | Recall Positive | Average Recall |
|---|---|---|---|---|---|---|
| Logistic Regression | 0.78 | 0.39 | 0.585 | 0.22 | 0.89 | 0.555 |
| Random Forest | 0.71 | 0.47 | 0.59 | 0.68 | 0.5 | 0.59 |
| SV Classification | 0.73 | 0.43 | 0.58 | 0.49 | 0.67 | 0.58 |
| XG Boost | 0.75 | 0.45 | 0.6 | 0.54 | 0.67 | 0.605 |
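The metrics reported in the two tables above (accuracy, F1, Cohen's kappa, PR-AUC, and per-class precision/recall) can all be computed with scikit-learn. The labels below are toy values for illustration only, not the study's predictions:

```python
# Computing the reported metrics with scikit-learn on toy labels.
from sklearn.metrics import (accuracy_score, f1_score, cohen_kappa_score,
                             average_precision_score, precision_score,
                             recall_score)

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]   # 1 = stunted
y_pred = [0, 0, 1, 0, 1, 1, 0, 0, 1, 1]   # hard class predictions
y_prob = [0.1, 0.2, 0.6, 0.3, 0.8, 0.9, 0.4, 0.2, 0.7, 0.55]  # P(stunted)

acc   = accuracy_score(y_true, y_pred)                 # 0.7
f1    = f1_score(y_true, y_pred)                       # F1 for the positive class
kappa = cohen_kappa_score(y_true, y_pred)              # 0.4
prauc = average_precision_score(y_true, y_prob)        # PR-AUC analogue

# Per-class precision/recall, as in the Negative/Positive columns of the table.
prec_neg = precision_score(y_true, y_pred, pos_label=0)
prec_pos = precision_score(y_true, y_pred, pos_label=1)
rec_neg  = recall_score(y_true, y_pred, pos_label=0)
rec_pos  = recall_score(y_true, y_pred, pos_label=1)
```

Note that `average_precision_score` needs the predicted probabilities, not the hard labels; the other metrics here are computed from the thresholded predictions.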