Xi Bai¹, Zhibo Zhou¹, Mingliang Su², Yansheng Li², Liuqing Yang², Kejia Liu², Hongbo Yang¹, Huijuan Zhu¹, Shi Chen¹, Hui Pan¹.
Abstract
Background: An association between prenatal pesticide exposure and a higher incidence of small-for-gestational-age (SGA) births has been reported. However, no prediction model has yet been developed for SGA in neonates born to women exposed to pesticides before pregnancy.
Keywords: environmental pollution; exposure to pesticides; machine learning; prediction; small for gestational age
Year: 2022 PMID: 36003638 PMCID: PMC9394741 DOI: 10.3389/fpubh.2022.940182
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Figure 1. The overall process of data extraction, training, and testing. NFPHEP, National Free Preconception Health Examination Project; RFE, recursive feature elimination; SHAP, Shapley Additive Explanation.
Demographic characteristics of the subjects, stratified by small-for-gestational-age (SGA) status.
| Characteristic | Total (n = 757) | Non-SGA (n = 659) | SGA (n = 98) | P value |
|---|---|---|---|---|
| Gestational age at birth, weeks | 39.0 (39.0–40.0) | 39.0 (39.0–40.0) | 39.5 (39.0–40.0) | 0.168 |
| Male sex | 384 (50.7%) | 334 (50.7%) | 50 (51.0%) | 0.963 |
| Birth weight, kg | 3.4 (3.1–3.6) | 3.5 (3.2–3.7) | 2.5 (2.2–2.7) | <0.001 |
| Maternal age, years | 26.0 (23.0–29.0) | 26.0 (23.0–29.0) | 27.0 (23.0–30.0) | 0.487 |
| Paternal age, years | 28.0 (24.0–32.0) | 28.0 (24.0–32.0) | 28.0 (24.3–32.0) | 0.392 |
| Maternal height, cm | 158.8 ± 5.3 | 158.9 ± 5.3 | 157.8 ± 5.7 | 0.055 |
| Paternal height, cm | 170.0 (167.0–173.0) | 170.0 (167.0–173.0) | 170.0 (165.0–172.0) | 0.046 |
| Maternal BMI, kg/m² | 21.6 (19.8–23.7) | 21.6 (20.0–23.7) | 21.3 (19.5–23.5) | 0.229 |
| Paternal BMI, kg/m² | 22.2 (20.8–24.1) | 22.2 (20.8–23.9) | 22.0 (20.6–24.5) | 0.447 |
| Maternal education level | | | | |
| Junior high school or below | 714 (94.3%) | 619 (93.9%) | 95 (96.9%) | 0.442 |
| Senior high school | 39 (5.2%) | 36 (5.5%) | 3 (3.1%) | |
| Bachelor's degree or above | 4 (0.5%) | 4 (0.6%) | 0 (0.0%) | |
| Paternal education level | | | | |
| Junior high school or below | 700 (92.5%) | 608 (92.3%) | 92 (93.9%) | 0.838 |
| Senior high school | 49 (6.5%) | 44 (6.7%) | 5 (5.1%) | |
| Bachelor's degree or above | 8 (1.0%) | 7 (1.0%) | 1 (1.0%) | |
| Paternal smoking status | | | | |
| Quit smoking | 486 (64.2%) | 429 (65.1%) | 57 (58.2%) | 0.012 |
| Reduced smoking | 154 (20.3%) | 138 (20.9%) | 16 (16.3%) | |
| Same or increased smoking | 117 (15.5%) | 92 (14.0%) | 25 (25.5%) | |
| Maternal interpersonal pressure | | | | |
| None | 635 (83.9%) | 557 (84.5%) | 78 (79.6%) | 0.380 |
| Mild | 98 (12.9%) | 81 (12.3%) | 17 (17.3%) | |
| Severe | 24 (3.2%) | 21 (3.2%) | 3 (3.1%) | |
| Paternal interpersonal pressure | | | | |
| None | 632 (83.5%) | 556 (84.4%) | 76 (77.5%) | 0.045 |
| Mild | 96 (12.7%) | 82 (12.4%) | 14 (14.3%) | |
| Severe | 29 (3.8%) | 21 (3.2%) | 8 (8.2%) | |
BMI, body mass index. Data are presented as median (interquartile range), mean (standard deviation) or number (%).
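The record does not state which test produced the P values in the demographic table. For every three-level categorical row (education, paternal smoking, interpersonal pressure), a Pearson chi-square test without continuity correction reproduces the reported value, so the minimal sketch below assumes that test. It is a pure-Python illustration, not the authors' code; the helper names `pearson_chi2` and `chi2_sf` are hypothetical, and the tail probability uses closed forms that only cover df = 1 and even df.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic (no continuity correction) for an r x c table of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            stat += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and even df only (sketch limit)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df % 2 == 0:
        half = x / 2
        return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))
    raise ValueError("this sketch only handles df = 1 or even df")


# Paternal smoking status, counts taken from the table above.
smoking = [[429, 138, 92],  # non-SGA: quit, reduced, same or increased
           [57, 16, 25]]    # SGA
stat, df = pearson_chi2(smoking)
p = chi2_sf(stat, df)  # roughly 0.012, in line with the reported value
print(round(stat, 2), df, round(p, 3))
```

With two groups and three categories the test has two degrees of freedom, where the chi-square tail reduces to exp(-x/2), which is why no statistics library is needed here.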
Figure 2. Receiver operating characteristic (ROC) curves of the seven machine learning (ML) models in predicting small for gestational age (SGA) in the testing dataset. LR, logistic regression; SVM, support vector machine; RF, random forest; GBDT, gradient boosting decision tree; LGBM, light gradient boosting machine; XGBoost, extreme gradient boosting; CatBoost, category boosting.
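An ROC curve traces sensitivity (true-positive rate) against 1 − specificity (false-positive rate) as the classification threshold on a model's predicted probability is lowered, and the AUC summarizes the whole curve. The minimal pure-Python sketch below computes the AUC two ways: as the trapezoidal area under the ROC points, and as the probability that a randomly chosen positive case outscores a randomly chosen negative one (the Mann-Whitney formulation). The labels and scores are invented toy values, not study data, and the function names are hypothetical.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs obtained by lowering the threshold through each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(labels, scores):
    """P(random positive outscores random negative); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = SGA, 0 = non-SGA; scores are hypothetical predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
print(auc_trapezoid(roc_points(labels, scores)), auc_pairwise(labels, scores))  # both ≈ 0.889 (8/9)
```

The two numbers agree because, with ties handled at distinct thresholds, the trapezoidal ROC area equals the pairwise ranking probability.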
Performance of models built with different algorithms in predicting small-for-gestational-age (SGA) neonates.
| Algorithm | AUC (training) | AUC (testing) | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|
| LR | 0.841 | 0.691 | 0.733 | 0.650 | 0.186 | 0.957 |
| SVM | 0.763 | 0.752 | 0.600 | 0.869 | 0.333 | 0.952 |
| RF | 0.943 | 0.787 | 0.733 | 0.781 | 0.268 | 0.964 |
| GBDT | 0.997 | 0.831 | 0.667 | 0.956 | 0.625 | 0.963 |
| XGBoost | 0.992 | 0.791 | 0.667 | 0.818 | 0.286 | 0.957 |
| LGBM | 0.994 | 0.778 | 0.667 | 0.774 | 0.244 | 0.955 |
| CatBoost | 0.991 | 0.855 | 0.667 | 0.912 | 0.455 | 0.962 |
AUC, area under the receiver-operating-characteristic curve; PPV, positive predictive value; NPV, negative predictive value; LR, logistic regression; SVM, support vector machine; RF, random forest; GBDT, gradient boosting decision tree; XGBoost, extreme gradient boosting; LGBM, light gradient boosting machine; CatBoost, category boosting.
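Sensitivity, specificity, PPV, and NPV all come from the 2 × 2 confusion matrix at a single decision threshold, with SGA as the positive class. The record does not report the confusion matrices, so the counts in this sketch are hypothetical: they were back-calculated because they reproduce the GBDT row (tp = 10, fn = 5, tn = 131, fp = 6). The function name `diagnostic_metrics` is likewise illustrative.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Threshold-based metrics from a 2x2 confusion matrix (SGA = positive class)."""
    return {
        "sensitivity": tp / (tp + fn),  # share of SGA neonates the model flags
        "specificity": tn / (tn + fp),  # share of non-SGA neonates the model clears
        "ppv": tp / (tp + fp),          # share of flagged neonates that are truly SGA
        "npv": tn / (tn + fn),          # share of cleared neonates that are truly non-SGA
    }


# Hypothetical counts consistent with the GBDT row (not reported in the source).
gbdt = diagnostic_metrics(tp=10, fn=5, tn=131, fp=6)
print({k: round(v, 3) for k, v in gbdt.items()})
# {'sensitivity': 0.667, 'specificity': 0.956, 'ppv': 0.625, 'npv': 0.963}
```

The same function with tp = 11, fn = 4, tn = 89, fp = 48 reproduces the LR row. The contrast illustrates why PPV is low for most models when SGA is the minority class: a modest false-positive rate on the large non-SGA group yields many false alarms relative to the few true positives, while NPV stays high.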
Figure 3. Receiver operating characteristic (ROC) curves of the final machine learning (ML) model, built with the CatBoost algorithm after recursive feature elimination (RFE), in predicting small for gestational age (SGA).
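Recursive feature elimination repeatedly refits a model on the surviving features, ranks them by importance, and discards the weakest until a target number remains. The record names RFE but not its settings, so everything concrete in this sketch is an assumption: a hand-rolled logistic regression (on standardized features, importance = absolute coefficient) stands in for the study's CatBoost estimator, one feature is dropped per round, and the loop stops at a fixed `n_keep`. The data are synthetic, with features `a` and `b` driving the outcome and `c` and `d` pure noise.

```python
import math
import random


def standardize(X):
    """Z-score each column so coefficient magnitudes are comparable across features."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Stand-in estimator: logistic regression by batch gradient descent; bias is the last weight."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for row, target in zip(X, y):
            z = w[-1] + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - target
            for j, xj in enumerate(row):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def rfe(X, y, names, n_keep):
    """Recursive feature elimination: refit on surviving features, drop the weakest, repeat."""
    Xs = standardize(X)
    active = list(range(len(names)))
    while len(active) > n_keep:
        w = fit_logistic([[row[j] for j in active] for row in Xs], y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        active.pop(weakest)
    return [names[j] for j in active]


# Synthetic data: "a" and "b" drive the outcome, "c" and "d" are noise.
random.seed(0)
X, y = [], []
for _ in range(300):
    a, b, c, d = (random.gauss(0, 1) for _ in range(4))
    X.append([a, b, c, d])
    y.append(1 if 2 * a + b + random.gauss(0, 1) > 0 else 0)

selected = rfe(X, y, ["a", "b", "c", "d"], n_keep=2)
print(selected)
```

Refitting after every removal is what distinguishes RFE from ranking features once: with correlated predictors, dropping one can change the importance of the others.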
Figure 4. Shapley Additive Explanation (SHAP) values for the most important predictors of SGA in the final model. The x-axis shows the SHAP value, that is, the magnitude and direction of each feature's impact on the predicted outcome. Each dot represents one case and is colored red when the feature value is high and blue when it is low. ALT, alanine aminotransferase; TSH, thyroid stimulating hormone.
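Each dot in a SHAP summary plot is one case's Shapley value for one feature: that feature's average marginal contribution to the prediction over all orders in which features could be added. For tree ensembles such as CatBoost this is normally computed with the fast TreeSHAP algorithm; the sketch below instead brute-forces the exact Shapley formula, which is only feasible for a handful of features. It assumes a single baseline vector as the "feature absent" reference (a simplification of SHAP's background data), and `toy_model` is a hypothetical function, not the study's model.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features are set to their baseline values."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the case's values; the rest take the baseline.
        return f([x[i] if i in subset else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                s = set(S)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical model: two main effects plus an interaction between features 0 and 2.
    return 2 * z[0] - z[1] + z[0] * z[2]


x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, base)
print(phi)  # ≈ [3.5, -2.0, 1.5]; the interaction term (3) is split equally between features 0 and 2
```

The values sum to f(x) − f(baseline) (here 3.0), the additivity property that lets a SHAP plot decompose each individual prediction into per-feature contributions.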