Aziz Zafar1,2, Ziad Attia1,3, Mehret Tesfaye4, Sosina Walelign4, Moges Wordofa4, Dessie Abera4, Kassu Desta4, Aster Tsegaye4, Ahmet Ay1,2, Bineyam Taye2.
Abstract
BACKGROUND: Previous epidemiological studies have examined the prevalence and risk factors for a variety of parasitic illnesses, including protozoan and soil-transmitted helminth (STH, e.g., hookworms and roundworms) infections. Despite advancements in machine learning for data analysis, the majority of these studies use traditional logistic regression to identify significant risk factors.
Year: 2022 PMID: 35700192 PMCID: PMC9236253 DOI: 10.1371/journal.pntd.0010517
Source DB: PubMed Journal: PLoS Negl Trop Dis ISSN: 1935-2727
Risk factors for intestinal parasitic infections grouped by categories such as demographic, socioeconomic, health, environmental, and hematological factors.
| Demographic Factors | Socioeconomic Factors | Health Factors | Environmental Factors | Hematological Factors |
|---|---|---|---|---|
| Age | Sleeps on a bed | Cockroach skin prick test | Application of dung to farm fields | Hematocrit |
| Deworming | Household burns charcoal | Child has asthma | Cigarette smokers in the house | Hemoglobin |
| Family Size | Household burns dung | Child has hay fever | Location of cooking area | Lymphocytes count |
| Residence | Household burns gas | Child has had hay fever in last year | Family has a cat | Mean Corpuscular Hemoglobin |
| Sex | Household burns leaves | Child with rash in last year | Family has a cow | Mean Corpuscular Hemoglobin Concentration |
| | Household burns nafta | Child has wheeze in last year | Family has a dog | Mean Corpuscular Volume |
| | Household burns wood | Father with asthma | Family has a hen | Platelet count |
| | Household uses electricity | Father with hay fever | Family has a horse | Red Blood Cell count |
| | Composition of floor in the home | Father with wheeze | Family has a pig | White Blood Cell count |
| | Maternal Education | Mother with asthma | Family has a sheep | |
| | Maternal Occupation | Mother with hay fever | Source of water in household | |
| | Child’s mattress | Mother with wheeze | Type of toilet in the home | |
| | Roof on the home | | Location of household waste disposal | |
| | Composition of walls in the home | | | |
| | What the child sleeps on | | | |
Risk factors for each infection outcome based on univariate (U) and multivariate (M) logistic regression (α = 0.05) and the feature selection methods: InfoGain (IG), ReliefF (ReF), Joint Mutual Information (JMI), and Minimum Redundancy Maximum Relevance (MRMR).
Robustly selected features are those that appeared in the top 20 in 95% of 100 feature selection runs for each method. Risk factors are ranked according to their frequency of occurrence in three approaches. Upward arrows indicate a significant odds ratio greater than 1, and downward arrows indicate a significant odds ratio less than 1. Arrows with a * lost statistical significance after Benjamini-Hochberg p-value adjustment.
| STH Infection | Protozoan Infection | Parasitic Infection | Helminthic Infection | |||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Risk Factor | IG | ReF | MRMR | JMI | U | M | IG | ReF | MRMR | JMI | U | M | IG | ReF | MRMR | JMI | U | M | IG | ReF | MRMR | JMI | U | M |
| Some nafta burning | ✔ | ✔ | ↑ | ↑* | ✔ | ✔ | ↑* | ↑* | ✔ | ✔ | ↑ | ✔ | ✔ | ↑* | ||||||||||
| Frequent leaves burning | ✔ | ✔ | ✔ | ✔ | ↑ | ✔ | ✔ | ✔ | ✔ | |||||||||||||||
| Frequent nafta burning | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ↑* | |||||||||||||||
| Positive cockroach skin prick test | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||||||||||||||||
| Father with wheeze | ✔ | ✔ | ✔ | ✔ | ↑* | ↑* | ✔ | ✔ | ↑* | ✔ | ✔ | |||||||||||||
| Child with asthma in last year | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||||||||||||||||
| Water from river/stream | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||||||||||||||||
| Household has thatched roof | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||||||||||||||||
| Application of dung to farm fields | ✔ | ✔ | ✔ | ✔ | ↑* | ✔ | ✔ | ✔ | ✔ | |||||||||||||||
| Father with hay fever | ✔ | ✔ | ↑* | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ↑* | ||||||||||||||
| Family size greater than 9 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ↑* | ↑* | ✔ | ✔ | ||||||||||||||
| Mother with hay fever | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||||||||||||||||
| Dust mite skin prick test | ✔ | ✔ | ✔ | ✔ | ↑* | ✔ | ✔ | ↑* | ↑* | ✔ | ✔ | |||||||||||||
| Frequent gas burning | ✔ | ✔ | ↑* | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | |||||||||||||||
| Platelets’ count | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||||||||||||||||
| Have a pig | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ||||||||||||||||
| Urban Residence | ✔ | ↓ | ↓* | ✔ | ↓ | ✔ | ↓ | ✔ | ||||||||||||||||
| High lymphocytes count | ✔ | ✔ | ↑* | ✔ | ✔ | ✔ | ✔ | |||||||||||||||||
| Cooking inside living area | ↓* | ✔ | ✔ | ✔ | ↓ | ↓* | ✔ | ↓ | ||||||||||||||||
| Some leaves burning | ✔ | ✔ | ↑ | ✔ | ↑ | |||||||||||||||||||
| Frequent dung burning | ✔ | ✔ | ✔ | ✔ | ✔ | |||||||||||||||||||
| Mean corpuscular volume | ✔ | ✔ | ✔ | ✔ | ✔ | |||||||||||||||||||
| White blood cell count | ✔ | ✔ | ↑ | ✔ | ✔ | |||||||||||||||||||
| Hemoglobin | ✔ | ✔ | ✔ | ✔ | ||||||||||||||||||||
| Some gas burning | ↑* | ✔ | ↑ | ✔ | ↑ | ↑* | ||||||||||||||||||
| Mean corpuscular hemoglobin | ✔ | ✔ | ✔ | ✔ | ||||||||||||||||||||
| Hematocrit | ✔ | ✔ | ✔ | ↑* | ↑* | ✔ | ↑* | |||||||||||||||||
| Some wood burning | ✔ | ✔ | ↑* | ✔ | ↑* | ✔ | ||||||||||||||||||
| Red blood cell count | ✔ | ✔ | ↑* | ✔ | ✔ | |||||||||||||||||||
| Mother with wheeze | ✔ | ✔ | ✔ | ✔ | ||||||||||||||||||||
| Mean corpuscular hemoglobin concentration | ✔ | ↑* | ✔ | ✔ | ✔ | |||||||||||||||||||
| Father with asthma | ✔ | ✔ | ✔ | ✔ | ||||||||||||||||||||
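The two procedures named in the caption above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `robust_features` and `benjamini_hochberg` are hypothetical, the "at least 95% of runs" threshold is an assumed reading of the caption, and the sample p-values are invented for the example.

```python
from collections import Counter


def robust_features(runs, top_k=20, min_fraction=0.95):
    """Return features found in the top_k of at least min_fraction of runs.

    runs: list of ranked feature-name lists (best first), one per run.
    The ">=" threshold is an assumption about how "95%" was applied.
    """
    counts = Counter()
    for ranking in runs:
        # Count each feature at most once per run.
        counts.update(set(ranking[:top_k]))
    threshold = min_fraction * len(runs)
    return sorted(f for f, c in counts.items() if c >= threshold)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative p-values only (not taken from the study).
raw = [0.03, 0.20, 0.50, 0.60]
adj = benjamini_hochberg(raw)
lost = [p < 0.05 <= q for p, q in zip(raw, adj)]  # True where a * would apply
```

In this toy input the first p-value is below 0.05 before adjustment and above it afterwards, which is the situation the starred arrows mark.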
Fig 1. Heatmaps illustrating the accuracy scores for different feature selection and classifier combinations using SMOTE; for infection by (a) any parasite, (b) any helminth, (c) protozoan, and (d) any STH.
Green indicates high accuracy, while red indicates low accuracy. Feature selection options are all features, or the top 20 features selected through Joint Mutual Information (JMI-20), Minimum Redundancy Maximum Relevance (MRMR-20), InfoGain (IG-20), and ReliefF (ReF-20). Classifiers include Logistic Regression (LR), Random Forests (RF), Support Vector Machines (SVM), and XGBoost (XGB).
Fig 2. Receiver operating characteristic (ROC) curves for different classifiers using the best feature selection method and best hyperparameters for that infection outcome with SMOTE; for infection by (a) any parasite, (b) any helminth, (c) protozoan, and (d) any STH.
Classifiers include Logistic Regression (LR), Random Forests (RF), Support Vector Machines (SVM), and XGBoost (XGB). The blue dashed line represents the ROC curve for a random guess. The Area Under Curve (AUC) scores are (a) LR: 0.52, RF: 0.56, SVM: 0.50, XGB: 0.54, (b) LR: 0.47, RF: 0.51, SVM: 0.50, XGB: 0.48, (c) LR: 0.70, RF: 0.73, SVM: 0.50, XGB: 0.58, and (d) LR: 0.54, RF: 0.62, SVM: 0.50, XGB: 0.52.
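To make the AUC values in the caption easier to read, the sketch below computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions for illustration; the source does not say how the study computed AUC.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative).

    labels: iterable of 0/1 class labels; scores: matching classifier scores.
    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): three of the four positive/negative pairs
# are ordered correctly.
auc_example = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75

# A classifier that gives every case the same score sits exactly on the
# random-guess diagonal.
auc_constant = roc_auc([0, 0, 1, 1], [0.5, 0.5, 0.5, 0.5])  # 0.5
```

A constant score is one way an AUC of exactly 0.50 can arise; whether that explains the SVM values reported above is not stated in the source.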
The top five rules based on association rule learning with SMOTE for each infection outcome.
For each infection, the five rules with the highest lift values are chosen and sorted. Each combination of risk factors specified on the left leads to the given infection.
| Risk factor combination | Support | Confidence | Lift |
|---|---|---|---|
| Child’s sleeping, Smokers in the household, Sex, Family of size 6 to 9 members | 0.011 | 1 | 2 |
| Ages between six to ten, Water from open well, Waste disposal in open pit, Low White Blood cell count | 0.011 | 1 | 2 |
| Have a dog, Have a horse, Frequent wood burning, Low mean corpuscular hemoglobin concentration | 0.010 | 1 | 2 |
| Have a horse, Low hemoglobin, Sex, Low mean corpuscular hemoglobin concentration | 0.010 | 1 | 2 |
| Have a horse, Sex, Frequent wood burning, Low mean corpuscular hemoglobin concentration | 0.010 | 1 | 2 |

| Risk factor combination | Support | Confidence | Lift |
|---|---|---|---|
| Urban residence, Have a cow, Frequent nafta burning, Some wood burning | 0.014 | 1 | 2 |
| Informal maternal education, Some dung burning, Some nafta burning, Low mean corpuscular volume | 0.014 | 1 | 2 |
| Low hemoglobin, Some charcoal burning, Frequent nafta burning, Some wood burning | 0.014 | 1 | 2 |
| Urban residence, Informal maternal education, Frequent nafta burning, Some wood burning | 0.014 | 1 | 2 |
| Informal maternal education, Some charcoal burning, Frequent nafta burning, Some wood burning | 0.014 | 1 | 2 |

| Risk factor combination | Support | Confidence | Lift |
|---|---|---|---|
| Sex, Low mean corpuscular hemoglobin, High mean corpuscular hemoglobin concentration, Low mean corpuscular volume | 0.016 | 1 | 2 |
| Dewormed, Some wood burning, High mean corpuscular hemoglobin concentration, Low mean corpuscular volume | 0.016 | 1 | 2 |
| Dewormed, Sex, High mean corpuscular hemoglobin concentration, Low mean corpuscular volume | 0.016 | 1 | 2 |
| Dewormed, Low hemoglobin, High mean corpuscular hemoglobin concentration, Low mean corpuscular volume | 0.016 | 1 | 2 |
| Some gas burning, Some wood burning, Mother is a housewife, Low White Blood cell count | 0.016 | 1 | 2 |

| Risk factor combination | Support | Confidence | Lift |
|---|---|---|---|
| Have a sheep, Frequent charcoal burning, Low mean corpuscular hemoglobin, Low mean corpuscular volume | 0.013 | 1 | 2 |
| Have a sheep, Sex, Frequent charcoal burning, Low mean corpuscular hemoglobin | 0.011 | 1 | 2 |
| Child’s mattress, Child’s sleeping, Cooking in living area, Age greater than 10 | 0.009 | 1 | 2 |
| Child’s bed, Child’s mattress, Cooking in living area, Age greater than 10 | 0.009 | 1 | 2 |
| Child’s mattress, Child’s sleeping, Dewormed, Age greater than 10 | 0.009 | 1 | 2 |
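The three columns in the rule tables above follow the standard association rule definitions: support is the fraction of records containing both the risk factor combination and the infection, confidence is the fraction of records with the combination that also have the infection, and lift is confidence divided by the overall fraction of infected records. The sketch below assumes those standard definitions; the function name `rule_metrics`, the item names, and the toy records are hypothetical and not taken from the study data.

```python
def rule_metrics(records, antecedent, outcome):
    """Return (support, confidence, lift) for the rule antecedent -> outcome.

    records: list of sets of item names, one set per child.
    antecedent: iterable of item names (the risk factor combination).
    outcome: item name marking the infection.
    """
    antecedent = set(antecedent)
    n = len(records)
    n_ante = sum(1 for r in records if antecedent <= r)
    n_both = sum(1 for r in records if antecedent <= r and outcome in r)
    n_out = sum(1 for r in records if outcome in r)
    if n == 0 or n_ante == 0 or n_out == 0:
        raise ValueError("rule metrics are undefined for this data")
    support = n_both / n
    confidence = n_both / n_ante
    lift = confidence / (n_out / n)
    return support, confidence, lift


# Toy, class-balanced data (hypothetical): 50 infected and 50 uninfected
# records, with one infected record carrying the full combination.
records = (
    [{"infected", "have_horse", "low_mchc"}]
    + [{"infected"}] * 49
    + [set()] * 50
)
support, confidence, lift = rule_metrics(
    records, {"have_horse", "low_mchc"}, "infected"
)
# support = 0.01, confidence = 1.0, lift = 2.0
```

With confidence equal to 1, lift reduces to one over the share of infected records, so a lift of 2 corresponds to infected records making up half of the data. That would be consistent with classes balanced by SMOTE, although the source does not state this and the tabulated values may be rounded.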