| Literature DB >> 36050710 |
Sajad Khodabandelu, Zahra Basirat, Sara Khaleghi, Soraya Khafri, Hussain Montazery Kordy, Masoumeh Golsorkhtabaramiri.
Abstract
BACKGROUND: This study sought to provide machine learning-based classification models to predict the success of intrauterine insemination (IUI) therapy. Additionally, we sought to illustrate the effect of fitting models on balanced data versus the original data with imbalanced labels, using two different types of resampling methods. Finally, we compared models fit with all features against models fit with optimized feature sets obtained through various feature selection techniques.
Keywords: Cumulative live birth; Imbalanced data; Infertility; Intrauterine insemination; Machine learning
Year: 2022 PMID: 36050710 PMCID: PMC9434923 DOI: 10.1186/s12911-022-01974-8
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
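The abstract contrasts models fit on the original imbalanced labels (155 successes vs. 391 failures in the table below) with models fit on resampled, balanced data. As a minimal illustration of the idea, the sketch below balances the classes by random oversampling using only the Python standard library; the study itself used more elaborate resampling (e.g. the "Stomek"-balanced dataset reported later), so the function name here is illustrative only.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Duplicate minority-class samples until both classes are the same size.
    A minimal stand-in for the resampling step; the paper's pipeline used
    more elaborate methods (e.g. SMOTE combined with Tomek-link cleaning)."""
    rng = random.Random(seed)
    counts = Counter(y)
    majority, minority = sorted(counts, key=counts.get, reverse=True)
    deficit = counts[majority] - counts[minority]
    pool = [x for x, label in zip(X, y) if label == minority]
    extra = [rng.choice(pool) for _ in range(deficit)]
    return X + extra, y + [minority] * deficit

# Class ratio mirroring the study: 155 successes (1) vs 391 failures (0)
X = [[i] for i in range(546)]
y = [1] * 155 + [0] * 391
Xb, yb = random_oversample(X, y)  # both classes now have 391 samples
```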
Baseline parameters of the IUI candidates in both groups
| Variable | Category | Successful | Unsuccessful | P value |
|---|---|---|---|---|
| Success rateD | – | 155 (28) | 391 (72) | |
| Female age (year)T | – | (24.98 ± 4.850) | (27.34 ± 6.588) | < 0.001 |
| Male age (year)T | – | (29.85 ± 6.033) | (32.09 ± 8.037) | 0.002 |
| Duration of infertility (year)T | – | (2.91 ± 2.45) | (3.79 ± 3.15) | 0.002 |
| Cycle day of IUI (day)T | – | (15.3 ± 2.774) | (15.33 ± 6.033) | 0.912 |
| Sperm concentrationT | – | (77.59 ± 25.8) | (79.58 ± 23.354) | 0.386 |
| Sperm motility (%)T | – | (61.94 ± 11.889) | (59.57 ± 9.444) | 0.015 |
| Sperm motility grading ScoreT | – | (1.9 ± 0.33) | (1.73 ± 0.22) | < 0.001 |
| Number of Follicle on the day of HCGM | – | 315 | 256 | < 0.001 |
| Number of previous IUIM | – | 275 | 272 | 0.867 |
| Type of infertilityC | Primary | 123 (29.3) | 297 (70.7) | 0.396 |
| | Secondary | 32 (25.4) | 94 (74.6) | |
| Menstruation regularityC | Regular | 83 (26.2) | 234 (73.8) | 0.179 |
| | Irregular | 72 (31.4) | 157 (68.8) | |
| GalactorrheaC | Yes | 20 (26.3) | 56 (73.3) | 0.666 |
| | No | 135 (28.7) | 335 (71.3) | |
| HirsutismC | Yes | 58 (31.2) | 128 (68.8) | 0.298 |
| | No | 97 (26.9) | 263 (73.1) | |
| Treatment with HMGC | Yes | 1 (7.1) | 13 (92.9) | 0.074 |
| | No | 154 (28.9) | 378 (71.1) | |
| Treatment with clomiphene-HMGC | Yes | 73 (30.8) | 164 (69.2) | 0.273 |
| | No | 82 (26.5) | 227 (73.5) | |
| Treatment with clomipheneC | Yes | 81 (27.5) | 214 (72.5) | 0.601 |
| | No | 74 (29.5) | 177 (70.5) | |
| Female factorC | Yes | 53 (28.5) | 133 (71.5) | 0.968 |
| | No | 102 (28.3) | 258 (71.7) | |
| Male factorC | Yes | 34 (34) | 66 (66) | 0.168 |
| | No | 121 (27.1) | 325 (72.9) | |
| Both female and male pregnancy factorsC | Yes | 10 (11.8) | 75 (88.2) | < 0.001 |
| | No | 145 (31.5) | 316 (68.5) | |
| Unexplained pregnancy factorC | Yes | 58 (33.1) | 117 (66.9) | 0.091 |
| | No | 97 (26.1) | 274 (73.9) | |
Superscripts D, T, M, and C denote, respectively: the dependent variable, expressed as count (%); continuous variables, expressed as mean ± standard deviation; discrete variables, expressed as mean rank; and categorical variables, expressed as count (%). P values for T, M, and C variables are from the independent t-test, the Mann–Whitney test, and the chi-square test, respectively
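The chi-square p-values in the table's last column can be reproduced directly from the counts. Below is a self-contained Python sketch for a 2×2 table (Pearson chi-square, df = 1, without continuity correction; the function name is our own, not from the paper's code):

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and two-sided p-value (df = 1,
    no continuity correction) for a 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(stat / 2))  # chi-square survival function, df = 1
    return stat, p

# Type of infertility row: primary 123/297, secondary 32/94 (success/failure)
stat, p = chi2_2x2(123, 297, 32, 94)
print(round(p, 3))  # 0.396, matching the table
```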
Fig. 1 Modeling steps with Python in this study
Fig. 2 Boxplots of the G-mean index for each model. Panels a–d show plots for each feature selection method. Abbreviations: RS method, resampling method
Fig. 3 ROC curve and AUC index of each class for the different models. Rows 1–4 show graphs for each feature selection method, and columns a–c show plots for the data used for model training
Fig. 4 Reliability and predictive power of each class for the different models. Rows 1–4 correspond to the feature selection methods: 1) without feature selection (W_FS), 2) mutual information classification feature selection (MIC-FS), 3) genetic algorithm feature selection (GA-FS), and 4) random forest feature selection (RF-FS). Columns a–c show plots for the data used for model training
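Of the feature selection methods compared, MIC-FS ranks features by their mutual information with the class label. A minimal standard-library sketch of that criterion on toy discrete data (the study used library implementations; names here are illustrative):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in nats for two discrete sequences, from empirical frequencies."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(c / n * math.log((c / n) / (px[x] / n * py[y] / n))
               for (x, y), c in pxy.items())

# Toy data: feature A determines the label, feature B is noise
labels = [0, 0, 1, 1] * 25
feat_a = labels[:]                    # perfectly informative copy of the label
feat_b = [i % 3 for i in range(100)]  # uninformative cyclic noise
ranked = sorted([("A", mutual_information(feat_a, labels)),
                 ("B", mutual_information(feat_b, labels))],
                key=lambda t: t[1], reverse=True)
# feature A ranks first, with I(A; label) = H(label) = ln 2
```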
Performance values for models trained with RF-FS features on the Stomek-balanced dataset
| Classifier | AUC | Brier | G mean |
|---|---|---|---|
| LR | 0.75 | 0.202 | 0.618 |
| SVC | 0.80 | 0.183 | 0.734 |
| RF | 0.84 | 0.158 | 0.739 |
| XGBoost | 0.89 | 0.129 | 0.806 |
| Stack | 0.88 | 0.134 | 0.805 |
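The AUC, Brier, and G-mean columns summarize discrimination, calibration, and class-balanced accuracy, respectively. The latter two indices are straightforward to compute; here is a standard-library sketch on toy data (function names are our own):

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)

def brier(y_true, y_prob):
    """Mean squared error between predicted probabilities and outcomes
    (lower is better; 0 is perfect)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

# Toy imbalanced example: 3 positives, 5 negatives
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.2]
y_pred = [int(p >= 0.5) for p in y_prob]
```

The G-mean is well suited to imbalanced labels because a model that ignores the minority class gets sensitivity 0 and therefore a G-mean of 0, no matter how high its raw accuracy is.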
Fig. 5 Boxplot, calibration plot, and ROC curve for models trained with random forest-selected features from the Stomek-balanced dataset
Fig. 6 Ranking of features used in XGBoost based on their effect on model learning and prediction
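XGBoost's built-in ranking reflects how much each feature contributed during training. A model-agnostic way to obtain a comparable ranking is permutation importance: shuffle one feature column at a time and measure the accuracy drop. A plain-Python sketch on toy data (an illustrative stand-in, not the paper's method):

```python
import random

def permutation_importance(predict, X, y, seed=0):
    """Accuracy drop after shuffling each feature column in turn; a
    model-agnostic stand-in for the importance ranking XGBoost reports."""
    rng = random.Random(seed)
    def accuracy(rows):
        return sum(predict(r) == t for r, t in zip(rows, y)) / len(y)
    base = accuracy(X)
    drops = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        rng.shuffle(col)
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        drops.append(base - accuracy(shuffled))
    return drops

# Toy setting: the label equals feature 0; feature 1 is ignored noise,
# and the "model" simply reads feature 0
X = [[i % 2, i % 5] for i in range(200)]
y = [row[0] for row in X]
drops = permutation_importance(lambda row: row[0], X, y)
# drops[0] is large (shuffling the informative feature hurts accuracy);
# drops[1] is exactly 0 (the noise feature is never used by the model)
```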