Ashir Javeed, Liaqat Ali, Abegaz Mohammed Seid, Arif Ali, Dilpazir Khan, Yakubu Imrana.
Abstract
Caesarean section (CS) is increasingly preferred over vaginal birth, and this trend is rising rapidly around the globe, even though CS carries serious complications such as pregnancy scar, scar dehiscence, and morbidly adherent placenta. CS should therefore be performed only when it is absolutely necessary for the mother and fetus. To avoid unnecessary CS, researchers have developed various machine-learning-based (ML-based) clinical decision support systems (CDSS) that predict CS from the electronic health records of pregnant women. However, previously proposed methods suffer from poor accuracy and from bias in the ML models. To overcome these problems, we designed a novel CDSS in which the random oversampling examples (ROSE) technique is used to address the under-representation of minority classes in the dataset. Principal component analysis (PCA) is employed for feature extraction, and a random forest (RF) model is deployed for classification. We fine-tuned the hyperparameters of the RF using a grid search algorithm for optimal classification performance. The proposed system, named ROSE-PCA-RF, is trained and tested on an online CS dataset available in the UCI repository. In the first experiment, a conventional RF model is trained and tested on the dataset; in the second experiment, the proposed model is tested. The proposed ROSE-PCA-RF model improved the performance of the conventional RF by 4.5% with reduced time complexity, while using only two features extracted through PCA. Moreover, the proposed model obtained 96.29% accuracy on training data and 97.12% accuracy on testing data.
Year: 2022 PMID: 35707186 PMCID: PMC9192258 DOI: 10.1155/2022/1901735
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. Workflow of the proposed ROSE-PCA-RF model.
Description of the dataset.
| Sr. no | Feature | Data type | Caesarean (mean ± std) | Normal (mean ± std) |
|---|---|---|---|---|
| 1 | Age | Integer | 28.02 ± 5.58 | 27.24 ± 4.00 |
| 2 | Delivery number | Integer | 1.76 ± 0.86 | 1.53 ± 0.65 |
| 3 | Delivery time | Integer | 0.52 ± 0.77 | 0.79 ± 0.83 |
| 4 | Blood pressure | Integer | 0.98 ± 0.79 | 1.03 ± 0.57 |
| 5 | Heart problem | Integer | 0.52 ± 0.50 | 0.18 ± 0.38 |
| 6 | Caesarean | Integer | 1.00 ± 0.00 | 0.00 ± 0.00 |
Figure 2. Overview of data processing. (a) Original training data distribution. (b) Training data distribution after oversampling and scaling.
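The balancing and scaling step summarized in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses plain random oversampling with replacement as a simplified stand-in for ROSE (whose actual implementation may differ), followed by per-feature standardization. The function names and the toy rows are hypothetical and are not taken from the CS dataset.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def random_oversample(rows, labels, seed=0):
    """Resample each minority class with replacement up to the majority count.

    Simplified stand-in for ROSE (assumption): existing rows are duplicated,
    no synthetic rows are generated.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(list(rng.choice(pool)))
            out_labels.append(cls)
    return out_rows, out_labels


def standardize(rows):
    """Scale each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant columns are only centred
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


# Toy data (made up): four "normal" rows (0) and two "caesarean" rows (1).
X = [[22, 1], [26, 2], [28, 1], [25, 1], [32, 3], [35, 2]]
y = [0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
X_scaled = standardize(X_bal)
print(Counter(y_bal))  # both classes now have the same number of rows
```

Oversampling is applied here before scaling so that the scaling statistics reflect the balanced distribution shown in panel (b); whether the paper uses this exact order is an assumption.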
Figure 3. Workflow of PCA for feature extraction.
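As a rough illustration of the PCA step, the sketch below centres the data, builds the sample covariance matrix, and extracts the two leading principal components by power iteration with deflation. The choice of two components follows the abstract; everything else (function names, the power-iteration approach, the toy matrix) is an assumption made for illustration, and a library routine would normally be used instead.

```python
import math


def covariance_matrix(rows):
    """Return the sample covariance of the columns and the centred rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centred


def top_eigenpair(mat, iters=500):
    """Dominant eigenvalue and eigenvector of a symmetric PSD matrix."""
    d = len(mat)
    v = [1.0 / (i + 1) for i in range(d)]  # asymmetric start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract
            break
        v = [x / norm for x in w]
    w = [sum(mat[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, w)), v  # Rayleigh quotient, vector


def pca(rows, k=2):
    """Project rows onto the first k principal components."""
    cov, centred = covariance_matrix(rows)
    d = len(cov)
    components = []
    for _ in range(k):
        lam, v = top_eigenpair(cov)
        components.append(v)
        # Deflation: subtract the component just found from the covariance.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centred]


# Toy matrix (made up): three features, the third is constant.
X = [[2, 0, 1], [0, 2, 1], [-2, 0, 1], [0, -2, 1], [3, 3, 1], [-3, -3, 1]]
Z = pca(X, k=2)  # six rows, two extracted features each
```

The two columns of `Z` are uncorrelated, and the first carries the largest share of the variance, which is the property that lets a classifier work with two extracted features instead of the original inputs.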
Performance of conventional predictive models on the imbalanced CS dataset, where Acc_train is the accuracy on training data, Acc_test is the accuracy on test data, Sen is sensitivity, Spec is specificity, and MCC is the Matthews correlation coefficient.
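The evaluation measures named in the caption above follow the standard confusion-matrix definitions. The sketch below computes them from the four counts of a binary confusion matrix; the function name and the example counts are hypothetical and do not come from the paper's results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sen = tp / (tp + fn) if tp + fn else 0.0    # true positive rate
    spec = tn / (tn + fp) if tn + fp else 0.0   # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"acc": acc, "sen": sen, "spec": spec, "mcc": mcc}


# Made-up counts for illustration only.
m = binary_metrics(tp=8, fp=2, tn=7, fn=3)
print(m)
```

MCC uses all four cells of the confusion matrix, so it stays informative on imbalanced data, where accuracy alone can look good for a model that mostly predicts the majority class.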
Performance of various state-of-the-art predictive models on the balanced CS dataset, where Acc_train is the accuracy on training data, Acc_test is the accuracy on test data, Sen is sensitivity, Spec is specificity, and MCC is the Matthews correlation coefficient.
Figure 4. Performance comparison between conventional and tuned RF. (a) ROC plot of the conventional RF. (b) ROC plot of the fine-tuned RF.
Classification accuracy of the proposed ROSE-PCA-RF model with optimal RF hyperparameters on the balanced dataset, where E is the number of estimators, D is the depth hyperparameter, Acc_train is the accuracy on training data, Acc_test is the accuracy on test data, Sen is sensitivity, Spec is specificity, and MCC is the Matthews correlation coefficient.
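The search over E and D mentioned in the caption above can be pictured as an exhaustive loop over every combination of candidate values. The sketch below is generic and hedged: `grid_search` is a hypothetical helper, and `toy_score` is a made-up stand-in for "train an RF with E estimators and depth D, then return its validation accuracy". Neither the candidate values nor the scores are taken from the paper.

```python
from itertools import product


def grid_search(score_fn, grid):
    """Score every combination in `grid` and return the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:  # keep the first combination with the top score
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(E, D):
    """Made-up scoring surface; a real run would train and validate an RF."""
    return 1.0 - abs(E - 100) / 1000 - abs(D - 5) / 100


# Hypothetical candidate values for the two hyperparameters.
best, score = grid_search(toy_score, {"E": [10, 50, 100, 200], "D": [2, 5, 10]})
print(best, score)
```

The cost grows with the product of the candidate list lengths (12 evaluations here), which is one reason a reduced feature set from PCA helps keep each evaluation cheap.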
Figure 5. ROC plot of the proposed ROSE-PCA-RF model.
Figure 6. Execution time comparison between the conventional RF and the proposed ROSE-PCA-RF.
Figure 7. ROC charts of the state-of-the-art ML models with optimized hyperparameters on the balanced dataset.
Comparison of classification accuracy with previously proposed methods for CS prediction.
| Study (year) | Method | Accuracy (%) | Balancing |
|---|---|---|---|
| Verhoeven et al. (2009) | SPSS | 76.00 | No |
| Gharehchopogh et al. (2012) | DT C4.5 | 86.25 | No |
| Vovsha et al. (2014) | LR, SVM | 65.00 | No |
| Sodsee (2014) | CPD-NN, kNN | 75.00 | No |
| Maroufizadeh et al. (2018) | LR, RF, ANN | 70.00 | No |
| Iftitah and Rulaningtyas (2018) | Naive Bayes | 90.00 | No |
| Amin and Ali (2018) | WEKA software | 95.00 | No |
| Ayyappan (2019) | SMO in PUK kernel | 75.00 | No |
| Souza et al. (2019) | LR | 88.03 | No |
| Saleem et al. (2019) | AdaBoost | 91.80 | No |
| Lee and Ahn (2019) | ANN | 91.00 | No |
| Khan et al. (2020) | AdaBoost | 88.69 | No |
| Meyer et al. (2020) | XGBoost | 85.00 | No |
| Abdillah et al. (2021) | LDA-SVM | 70.83 | No |
| Rahman et al. (2021) | SMOTE-RF | 93.00 | Yes |
| Rahman et al. (2021) | SMOTE-SVM | 94.00 | Yes |
| Proposed method (2022) | ROSE + PCA-RF | 97.12 | Yes |