| Literature DB >> 35840605 |
Wasif Khan, Nazar Zaki, Mohammad M. Masud, Amir Ahmad, Luqman Ali, Nasloon Ali, Luai A. Ahmed.
Abstract
Accurate prediction of a newborn's birth weight (BW) is a crucial determinant to evaluate the newborn's health and safety. Infants with low BW (LBW) are at a higher risk of serious short- and long-term health outcomes. Over the past decade, machine learning (ML) techniques have shown a successful breakthrough in the field of medical diagnostics. Various automated systems have been proposed that use maternal features for LBW prediction. However, each proposed system uses different maternal features for LBW classification and estimation. Therefore, this paper provides a detailed setup for BW estimation and LBW classification. Multiple subsets of features were combined to perform predictions with and without feature selection techniques. Furthermore, the synthetic minority oversampling technique was employed to oversample the minority class. The performance of 30 ML algorithms was evaluated for both infant BW estimation and LBW classification. Experiments were performed on a self-created dataset with 88 features. The dataset was obtained from 821 women from three hospitals in the United Arab Emirates. Different performance metrics, such as mean absolute error and mean absolute percent error, were used for BW estimation. Accuracy, precision, recall, F-scores, and confusion matrices were used for LBW classification. Extensive experiments performed using five-folds cross validation show that the best weight estimation was obtained using Random Forest algorithm with mean absolute error of 294.53 g while the best classification performance was obtained using Logistic Regression with SMOTE oversampling techniques that achieved accuracy, precision, recall and F1 score of 90.24%, 87.6%, 90.2% and 0.89, respectively. The results also suggest that features such as diabetes, hypertension, and gestational age, play a vital role in LBW classification.Entities:
Year: 2022 PMID: 35840605 PMCID: PMC9287292 DOI: 10.1038/s41598-022-14393-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Work related to LBW classification.
| Reference | Problem | Task | Preprocessing techniques/methods | ML models | Performance |
|---|---|---|---|---|---|
| Feng et al. 2019 | Fetal weight estimation | Estimation and classification | SMOTE for data balancing | SVM for classification, DBN for weight estimation | MAE of 198.55 ± 158 g, MAPE of 6.09 ± 5.06% |
| Kuhle et al. 2018 | SGA, AGA, and LGA | Classification | Data balancing | LR, EN, CT, RF, GB, and NN | An AUC of 0.6–0.70 for primiparous women and 0.7–0.8 for multiparous women for SGA and LGA prediction |
| Senthilkumar et al. 2015 | LBW prediction | Classification | – | NB, RF, NN, DT, SVM, and LR | DT classifier with an accuracy of 0.899, a sensitivity of 0.97, and a specificity and AUC of 0.72 and 0.93, respectively |
| Borson et al. 2020 | LBW prediction | Classification | Redundant feature elimination, elimination of unique features, missing-value handling, attribute transformation | LR, NB, KNN, SVM, and MLP | The best accuracy of 81.67% was achieved by SVM and MLP |
| Loreto et al. 2019 | LBW prediction | Classification | Elimination of records with missing data, normalization, oversampling techniques | KNN, Tree, NB, RF, SVM, and AdaBoost | The AdaBoost classifier showed the best classification performance, with an accuracy of 98% and a sensitivity and specificity of 0.91 and 0.99, respectively |
| Kumar et al. 2020 | LBW prediction from PAH | Classification | Women with existing health conditions, such as HIV and diabetes, were excluded | SVM, AdaBoost, NB | The SVM classifier achieved an accuracy of 81.21% and a sensitivity and specificity of 0.84 and 0.74, respectively |
| Anisha et al. 2017 | LBW prediction | Classification | Elimination of records with significant missing values | Feature ranking using RF and XGBoost, and an NB-based minimum-error-rate classifier | Bayes Minimum Error was used for classification, achieving an accuracy of 0.967 and a sensitivity and specificity of 1.0 and 0.85, respectively |
| Faruk et al. 2018 | LBW prediction | Prediction and classification | Records with missing values were deleted | RF and LR | RF achieved 93% accuracy |
| Akhtar et al. 2020 | LGA | Classification | Variable discretization; removal of instances with more than 30% missing values; missing values below the 30% threshold were replaced with the mean or mode | Feature determination, SVM, RF, LR, and NB | A precision of 0.84 and an AUC of 0.72 with the top 30 features using SVM |
| Akhtar et al. 2019 | LGA | Classification | IG, Grid Search-based RFECV + IG | SVM and DT | An accuracy of 92% using an SVM classifier with a linear kernel; a precision of 0.92, a recall of 0.87, and a specificity of 0.95 |
| Al Habashneh et al. 2012 | LBW and PB | ROC analysis | – | ROC analysis | For LBW, an AUC of 0.81 using CAL, with a sensitivity and specificity of 0.81 and 0.70, respectively, at a CAL cutoff value of 0.42 mm |
| Li et al. 2020 | SGA | Prediction | Feature discretization; missing values encoded as a separate value (0) | SVM, RF, LR, and Sparse LR | Sparse LR performed best, achieving an AUC of 0.817 |
| Desiani et al. 2019 | Birth weight in hypertensive mothers | Classification | Removal of variables with ambiguous data | NB classifier | An accuracy of 81.25% and a precision and recall of 1.00 and 0.75, respectively, for LBW |
| Ahmadi et al. 2017 | LBW prediction | Classification | – | RF and LR | An accuracy of 95% with 97% specificity and 72% sensitivity using RF |
| Hussain et al. 2020 | LBW | Classification | Missing values were replaced with the average of nearby cells | RF and Gaussian NB | RF achieved 100% accuracy with a precision, recall, and F1 score of 1.0 |
| Lu et al. 2019 | Fetal weight estimation | Estimation | Normalization | Ensemble of RF, XGBoost, and LightGBM | An MRE of 7% with an accuracy of 64.3% |
| Akbulut et al. 2018 | Health status (normal or pathological) | Classification | – | AP, BDT, BPM, DF, LR, SVM, and NN | An accuracy of 89.5% was achieved using the decision forest classifier, deployed as web and mobile applications |
| Trujillo et al. 2020 | BW estimation | Estimation | – | SVR | SVR with an RBF kernel achieved the best accuracy, with an MAE of 287.60 ± 195.86 g and an MPE of 0.364 ± 11.95% |
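Most of the estimation results above are reported as MAE (in grams) and MAPE (as a percentage of the true weight). As a reference for how these two metrics are computed, here is a minimal sketch using toy birth-weight values (illustrative numbers only, not data from any of the cited studies):

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target (here, grams)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    """Mean absolute percent error, relative to the true value."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)

# Toy birth weights in grams (hypothetical values)
actual = [3200, 2400, 3600, 2900]
predicted = [3000, 2600, 3500, 3100]
print(mae(actual, predicted))   # average absolute error in grams
print(mape(actual, predicted))  # average percent error
```

MAE keeps the error in grams, which is why the studies above can be compared directly on it, while MAPE normalizes by the true weight and so penalizes the same gram error more heavily for lighter infants.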
Figure 1. Proposed ML framework for infant weight estimation and LBW classification.
Features used in this study (each subset represents the features used in previous LBW classification studies).
| Dataset name and authors | Classification/regression task | Total features | Feature names that were used in this study | Features that were not available to us |
|---|---|---|---|---|
| Subset-1; Hussain et al. 2020 | LBW classification | 445 samples with 18 features. Binary classification | Socioeconomic condition, age, height, blood group (BGroup), parity, antenatal check, initial weight of mother, final weight of mother (last ANC), initial systolic blood pressure, initial diastolic blood pressure, final systolic blood pressure (last ANC), final diastolic blood pressure (last ANC), initial hemoglobin level, final hemoglobin level (last ANC), blood sugar (random), term/preterm (term: 37–40 weeks; preterm: < 37 weeks), sex, and weight | Socioeconomic condition, antenatal check, and blood sugar (random) |
| Subset-2; Faruk et al. 2018 | Prediction and classification | 9 features including BW | Place of residence, time zone, wealth index, mother’s education, father’s education, age of mother, job of mother, and the number of children | Time zone, wealth index, and father’s job |
| Subset-3; Kuhle et al. 2018 | SGA, AGA, and LGA classification | 30,705 pregnancy samples with complete information for all variables | Maternal age, common law/married, area-level income quintiles, urban residence, smoking before pregnancy, pre-pregnancy BMI [kg/m²], pre-existing hypertension, pre-existing diabetes, previous gestational diabetes, previous child with BW < 2500 g, previous child with BW > 4080 g, previous caesarean section, previous preterm delivery < 29 weeks, previous preterm delivery 29–32 weeks, previous preterm delivery 33–36 weeks, previous death of neonate ≥ 500 g, fetal male sex, weight gain at 26 weeks [kg], smoking during pregnancy, substance use during pregnancy, gestational diabetes, pregnancy-induced hypertension, and psychiatric disorder | Area-level income quintiles, urban residence, weight gain at 26 weeks [kg], smoking during pregnancy, substance use in pregnancy, pregnancy-induced hypertension, and psychiatric disorder |
| Subset-4; Senthilkumar et al. 2015 | LBW classification | 11 features | Age in years (AGE), weight of the mother at her last menstrual period (LWT), number of physician visits during the first trimester of pregnancy (FTV), race (RACE), lifestyle information, e.g., smoking (SMOKE), history of previous preterm delivery (PTL), presence of uterine irritability (UI), and hypertension (HT) | Race and UI |
| Subset-5; Loreto et al. 2019 | LBW classification | 9 features and 2328 instances | Multiplicity (whether the gestation is multiple), smoker, hypertension, diabetes, age, BMI, gestational age, fetus sex, and fetus weight | Multiplicity (whether the gestation is multiple) |
| Subset-6; Kader and Nirmala 2014 | LBW | 20,946 instances, 11 features | Sex, wealth status, caste/tribe, age, education, BMI, stature, anemia level, interpregnancy interval, antenatal visits, and living place | Wealth status, caste/tribe, anemia level, and living place |
Summary of the best results across all the subsets.
| Dataset | Regression model | Original/MFE features | MAE (g) | MAPE (%) |
|---|---|---|---|---|
| Subset-1 | SMOReg | D2 | 308.98 | 12.13 |
| Subset-2 | Nu-SVR | D1 | 361.74 | 14.57 |
| Subset-3 | RF | D1 | 345.08 | 13.76 |
| Subset-4 | RF | D1 | 352.91 | 14.07 |
| Subset-5 | Bagging (REP tree) | D2 | 306.02 | 11.88 |
| Subset-6 | Bagging (REP tree) | D1 | 356.61 | 14.18 |
| Combined features | Linear Regression | D2 | 299.32 | 11.23 |
| Combined features | RF | Feature selection | 294.53 | 11.49 |
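The best regression result above (RF on the combined features) comes from the fivefold cross-validation protocol described in the abstract. A minimal sketch of that protocol is shown below; the data here are synthetic stand-ins, since the study's 88 maternal features are not public, and the feature/target construction is purely illustrative:

```python
# Fivefold cross-validation of a Random Forest regressor scored with MAE,
# mirroring the paper's evaluation setup on synthetic placeholder data.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                               # placeholder "maternal features"
y = 3200 + 400 * X[:, 0] + rng.normal(scale=150, size=200)   # placeholder BW in grams

cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = -cross_val_score(
    RandomForestRegressor(random_state=0), X, y,
    cv=cv, scoring="neg_mean_absolute_error",  # sklearn negates losses, so flip the sign
)
print(f"fivefold MAE: {scores.mean():.2f} g")
```

Each fold's model is trained on 80% of the data and scored on the held-out 20%, and the reported MAE is the mean over the five folds, which is how a single figure such as 294.53 g summarizes the whole dataset.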
Summary of the best classification results across all the subsets.
| Classifier | Dataset | Class | Predicted LBW | Predicted ABW | Accuracy (%) | Precision (%) | Recall (%) | F1 score |
|---|---|---|---|---|---|---|---|---|
| Bagging (NB) | Subset-1 D2 | LBW | 4 | 14 | 89.18 | 87.1 | 89.1 | 0.87 |
| | | ABW | 4 | 142 | | | | |
| Random tree | Subset-2 D1 | LBW | 2 | 16 | 81.98 | 80.9 | 81.9 | 0.81 |
| | | ABW | 14 | 132 | | | | |
| Bagging (NB) | Subset-3 D1 | LBW | 7 | 11 | 69.88 | 81.9 | 69.8 | 0.73 |
| | | ABW | 35 | 111 | | | | |
| Bagging (NB) | Subset-4 D1 | LBW | 4 | 14 | 82.5 | 83.0 | 83.2 | 0.82 |
| | | ABW | 14 | 132 | | | | |
| Bagging (NB) | Subset-5 D1 | LBW | | | | | | |
| | | ABW | | | | | | |
| KStar | Subset-6 D1 | LBW | 4 | 14 | 87.90 | 84.38 | 87.9 | 0.85 |
| | | ABW | 6 | 140 | | | | |
| Bagging (NB) | Combined features D1 | LBW | 8 | 10 | | | | |
| | | ABW | 20 | 126 | | | | |
| MLP | Combined features D1 | LBW | | | | | | |
| | | ABW | | | | | | |
Summary of the best classification results.
| Classifier | Dataset (description) | Class | Predicted LBW | Predicted ABW | Accuracy (%) | Precision (%) | Recall (%) | F1 score |
|---|---|---|---|---|---|---|---|---|
| Bagging (NB) | D1 (Loreto Subset-1) | LBW | 7 | 11 | 89.47 | 89.1 | 89.4 | 0.89 |
| | | ABW | 7 | 139 | | | | |
| LR | Loreto (D2 with mean, mode) | LBW | 4 | 14 | 88.81 | 86.8 | 88.8 | 0.86 |
| | | ABW | 4 | 142 | | | | |
| LR | Total dataset (100% SMOTE) | LBW | 6 | 12 | 90.24 | 87.6 | 90.2 | 0.89 |
| | | ABW | 4 | 142 | | | | |
| Bagging (REP) | Total dataset (balanced) | LBW | 11 | 7 | 78.13 | 87.3 | 78.1 | 0.81 |
| | | ABW | 29 | 117 | | | | |