Félix Nieto-Del-Amor¹, Gema Prats-Boluda¹, Javier Garcia-Casado¹, Alba Diaz-Martinez¹, Vicente Jose Diago-Almela², Rogelio Monfort-Ortiz², Dongmei Hao³, Yiyao Ye-Lin¹.
Abstract
Due to its high sensitivity, electrohysterography (EHG) has emerged as an alternative technique for predicting preterm labor. The main obstacle in designing preterm labor prediction models is the inherent preterm/term class imbalance, which can give rise to relatively low performance. Numerous studies have obtained promising preterm labor prediction results using the synthetic minority oversampling technique (SMOTE). However, these studies generally overestimate the real generalization capacity of their models by generating synthetic data before splitting the dataset, leaking information between the training and testing partitions and thus reducing the complexity of the classification task. In this work, we analyzed the effect of combining feature selection and resampling methods to overcome the class imbalance problem when predicting preterm labor from EHG. We assessed undersampling, oversampling, and hybrid methods applied to the training and validation datasets during feature selection by genetic algorithm, and analyzed the effect of resampling the training data after obtaining the optimized feature subset. The best strategy consisted of undersampling the majority class of the validation dataset to a 1:1 ratio during feature selection, with no subsequent resampling of the training data, achieving an AUC of 94.5 ± 4.6%, average precision of 84.5 ± 11.7%, maximum F1-score of 79.6 ± 13.8%, and recall of 89.8 ± 12.1%. Our results outperformed the techniques currently used in clinical practice, suggesting that EHG could be used to predict preterm labor in a clinical setting.
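The abstract's central methodological point is that resampling must happen only after the train/test split, so that synthetic or duplicated minority samples can never leak into the test partition. A minimal sketch of this leakage-safe ordering (illustrative Python only; random duplication stands in for SMOTE, and function and variable names are our own, not the authors'):

```python
import numpy as np

def split_then_oversample(X, y, test_frac=0.3, seed=0):
    """Split first, then oversample the minority class only inside the
    training partition, so no synthetic/duplicated sample leaks into
    the test set. Illustrative sketch, not the authors' pipeline."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    X_tr, y_tr = X[train], y[train]
    # oversample the minority class by random duplication to a 1:1 ratio
    minority = np.flatnonzero(y_tr == 1)
    majority = np.flatnonzero(y_tr == 0)
    if len(minority) and len(minority) < len(majority):
        extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
        keep = np.concatenate([majority, minority, extra])
        X_tr, y_tr = X_tr[keep], y_tr[keep]
    return X_tr, y_tr, X[test], y[test]
```

Doing the oversampling before the split would place near-duplicates of training samples in the test set, inflating the reported performance exactly as the abstract describes.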
Keywords: electrohysterography; genetic algorithm; imbalance data learning; machine learning; preterm labor prediction; resampling methods; uterine electromyography
Year: 2022 PMID: 35890778 PMCID: PMC9319575 DOI: 10.3390/s22145098
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1 Example of a preprocessed EHG signal recorded from a woman at 30 weeks of gestation who ultimately delivered preterm. Two EHG bursts associated with uterine contractions are clearly visible (around 150 s and 400 s), with increased amplitude and frequency content relative to the basal activity when the uterus is at rest.
EHG features and obstetrical data included as input to the classifier to discriminate preterm from term deliveries. The number of features per channel depends on the frequency bandwidths in which they were computed: 0.1–4 Hz, 0.2–0.34 Hz, 0.34–4 Hz, and 0.34–1 Hz for temporal and non-linear features, and 0.2–1 Hz for spectral features. The features were: peak-to-peak amplitude (APP); dominant frequency in the 0.2–1 Hz range (DF1) and in the 0.34–1 Hz range (DF2); normalized sub-band energy (NormEn) in 0.2–0.34 Hz, 0.34–0.6 Hz, and 0.6–1 Hz; high (0.34–1 Hz) to low (0.2–0.34 Hz) frequency energy ratio (H/L ratio); power spectrum deciles (D1, …, D9); spectral moment ratio (SpMR); binary and multistate Lempel–Ziv indices (LZBin and LZMulti); time reversibility (TimeRev); Katz fractal dimension (KFD); Poincaré ellipse metrics (minor axis (SD1), major axis (SD2), square root of variance (SDRR), and SD1/SD2 ratio); sample entropy (SampEn); fuzzy entropy (FuzEn); spectral entropy (SpEn); dispersion entropy (DispEn); and bubble entropy (BubbEn).
| EHG Temporal Features | EHG Spectral Features | EHG Non-Linear Features | Obstetrical Data |
|---|---|---|---|
| 4 per channel | 18 per channel | 52 per channel | 5 |
| APP | MeanF | LZBin | Maternal age |
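Among the non-linear features listed above, sample entropy (SampEn) is representative of the family: it quantifies signal regularity as the negative log ratio of template matches of length m+1 to matches of length m, within a tolerance proportional to the signal's standard deviation. A minimal O(N²) sketch (our own simplified implementation, not the one used in the paper):

```python
import numpy as np

def sample_entropy(x, m=2, r=0.2):
    """Sample entropy (SampEn) of a 1-D signal: -log(A/B), where B counts
    pairs of length-m templates and A pairs of length-(m+1) templates that
    match within tolerance r * std(x) (Chebyshev distance). Sketch only."""
    x = np.asarray(x, float)
    tol = r * x.std()
    n = len(x) - m  # number of templates compared at both lengths

    def matches(mm):
        # embed the signal into overlapping templates of length mm
        emb = np.lib.stride_tricks.sliding_window_view(x, mm)[:n]
        # Chebyshev distance between all template pairs (upper triangle only)
        d = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=-1)
        return (d[np.triu_indices(n, k=1)] <= tol).sum()

    b, a = matches(m), matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf
```

A regular signal (e.g., a sinusoid) yields a low SampEn, while white noise yields a high one, which is why such measures help separate contraction bursts from basal activity.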
Configuration parameters used in genetic algorithm.
| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Population size | N = 222 | Mutation | Uniform |
| Genome length | N = 222 | Mutation Probability | 0.01 |
| Number of generations | 500 | Selection scheme | Tournament of size 2 |
| Crossover | Arithmetic | Elite count | 2 |
| Crossover Probability | 0.8 | Termination condition | No fitness-function improvement for 150 consecutive iterations (differential tolerance: 10⁻⁶) |
Figure 2 Flowchart to assess the effect of combining feature selection by genetic algorithm with resampling methods to deal with the imbalanced-data problem. The training or validation partitions are resampled by oversampling (TO, VO), undersampling (TU, VU), or hybrid methods (TH, VH). The initial population of N chromosomes masks the training and validation partitions. For each chromosome, an LDA classifier is trained and evaluated on the corresponding validation partition via the fitness function. A new population of chromosomes is generated by mutation, crossover, and selection of the elite chromosomes from the previous iteration until the termination condition is satisfied, yielding the best chromosome.
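The genetic-algorithm loop of Figure 2 can be approximated in a short sketch. This is not the authors' pipeline: it scores chromosomes with a simple Fisher-like class-separation measure instead of an LDA classifier evaluated on a validation partition, and it uses single-point crossover rather than the arithmetic crossover in the configuration table; tournament size (2), elite count (2), and uniform mutation probability (0.01) do follow the table.

```python
import numpy as np

def ga_select(X, y, pop=30, gens=40, p_mut=0.01, elite=2, seed=0):
    """Toy GA feature selection: binary chromosomes mask features,
    tournament selection of size 2, single-point crossover, uniform
    bit-flip mutation, and an elite of 2 survivors per generation."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    P = rng.random((pop, n)) < 0.5  # initial population of feature masks

    def fitness(mask):
        if not mask.any():
            return -np.inf
        Xs = X[:, mask]
        m0, m1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
        s = Xs.std(0) + 1e-9
        # mean per-feature class separation, lightly penalizing subset size
        return float(np.mean(np.abs(m0 - m1) / s)) - 0.001 * mask.sum()

    for _ in range(gens):
        f = np.array([fitness(c) for c in P])
        order = np.argsort(f)[::-1]
        new = [P[i].copy() for i in order[:elite]]  # elitism: keep the best 2
        while len(new) < pop:
            a, b = rng.integers(pop, size=2), rng.integers(pop, size=2)
            p1 = P[a[np.argmax(f[a])]]  # tournament of size 2
            p2 = P[b[np.argmax(f[b])]]
            cut = rng.integers(1, n)    # single-point crossover
            child = np.concatenate([p1[:cut], p2[cut:]])
            child ^= rng.random(n) < p_mut  # uniform bit-flip mutation
            new.append(child)
        P = np.array(new)
    f = np.array([fitness(c) for c in P])
    return P[np.argmax(f)]
```

In the paper's setting, the fitness function would instead train an LDA classifier on the (possibly resampled) training partition and evaluate it on the (possibly resampled) validation partition.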
Resampling methods for predicting preterm labor.
| Approach | Resampling Technique | Abbreviation |
|---|---|---|
| No resampling | Not applicable | RN |
| Oversampling | SMOTE | RO |
| Undersampling | Neighborhood Cleaning Rule | RU |
| Hybrid | SMOTE + Neighborhood Cleaning Rule | RH |
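The oversampling row (RO) relies on SMOTE's core idea: synthesizing minority samples by interpolating between a minority point and one of its k nearest minority neighbors. A minimal numpy sketch of that interpolation step (an illustration of the idea only; the paper would use a full SMOTE implementation such as imbalanced-learn's, and the RU/RH rows use the Neighborhood Cleaning Rule, which is not shown here):

```python
import numpy as np

def smote_like(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples: each one is a random
    interpolation between a minority sample and one of its k nearest
    minority neighbours. Minimal sketch of the SMOTE idea."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # pairwise Euclidean distances among minority samples
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]            # k nearest neighbours
    base = rng.integers(n, size=n_new)           # pick a base sample
    nbr = nn[base, rng.integers(k, size=n_new)]  # pick one of its neighbours
    lam = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return X_min[base] + lam * (X_min[nbr] - X_min[base])
```

Because every synthetic point lies on a segment between two real minority points, SMOTE densifies the minority region rather than merely duplicating samples.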
Figure 3Flow diagram of the training process and evaluation of the prediction models.
AUC and AP scores on the testing datasets for each feature subset and resampling method. Homogeneous groups of resampling methods with similar performance (no statistically significant differences) are shown in different shades of grey. The bottom row shows the average value of the four resampling methods for each feature subset.
AUC (%):

| Training resampling | FSRN | FSTO | FSTU | FSTH | FSVO | FSVU | FSVH |
|---|---|---|---|---|---|---|---|
| RN | 52.1 ± 12.3 | 86.7 ± 8.2 | 90.3 ± 6.6 | 89.9 ± 6.6 | 88.8 ± 5.5 | 94.5 ± 4.6 | 93.5 ± 4.4 |
| RO | 52.8 ± 12.2 | 86.5 ± 8.0 | 90.6 ± 6.0 | 90.2 ± 6.4 | 89.0 ± 5.8 | 93.7 ± 4.8 | 92.5 ± 5.2 |
| RU | 65.6 ± 11.7 | 86.7 ± 8.3 | 90.9 ± 6.3 | 89.2 ± 7.2 | 85.8 ± 6.9 | 92.9 ± 5.3 | 91.2 ± 5.5 |
| RH | 65.1 ± 12.2 | 85.9 ± 7.9 | 91.5 ± 5.3 | 89.9 ± 6.9 | 87.4 ± 6.2 | 92.3 ± 5.7 | 89.9 ± 6.3 |
| Average | 59.2 ± 13.9 | 86.5 ± 8.1 | 90.8 ± 6.1 | 89.9 ± 6.8 | 88.2 ± 5.9 | 93.4 ± 5.2 | 91.8 ± 5.5 |

AP (%):

| Training resampling | FSRN | FSTO | FSTU | FSTH | FSVO | FSVU | FSVH |
|---|---|---|---|---|---|---|---|
| RN | 22.9 ± 8.9 | 66.5 ± 16.1 | 70.3 ± 15.4 | 70.6 ± 14.5 | 63.6 ± 14.5 | 84.8 ± 11.7 | 77.8 ± 14.4 |
| RO | 21.6 ± 7.2 | 65.3 ± 15.8 | 67.1 ± 15.6 | 70.2 ± 15.0 | 65.4 ± 14.7 | 82.9 ± 11.9 | 75.6 ± 14.3 |
| RU | 36.7 ± 15.5 | 66.7 ± 15.7 | 69.1 ± 15.5 | 69.6 ± 15.9 | 57.4 ± 14.9 | 78.4 ± 14.1 | 71.5 ± 15.6 |
| RH | 36.7 ± 14.7 | 64.9 ± 15.0 | 67.3 ± 15.0 | 70.7 ± 14.9 | 60.6 ± 15.1 | 77.7 ± 14.4 | 69.1 ± 15.9 |
| Average | 29.9 ± 14.5 | 66.0 ± 15.8 | 68.5 ± 15.3 | 71.0 ± 15.0 | 62.7 ± 14.9 | 81.0 ± 13.4 | 73.5 ± 15.4 |
Figure 4 Violin plots showing the distribution of (AUC + AP)/2 for the four resampling methods for each set of input features, with the average value of (AUC + AP)/2 as a black line. Violin color indicates homogeneous groups with similar performance and no significant differences (p > 0.05).
AUC and AP scores on the testing datasets for the optimum feature subset obtained by undersampling the validation partition with different imbalance ratios.
| Imbalance ratio |  |  |  |  |  |  |  |  | 100% |
|---|---|---|---|---|---|---|---|---|---|
| AUC (%) | 86.9 ± 8.0 | 88.9 ± 7.4 | 86.5 ± 8.5 | 89.2 ± 6.9 | 90.4 ± 7.2 | 88.5 ± 8.6 | 91.4 ± 5.7 | 92.5 ± 5.7 | 94.5 ± 4.6 |
| AP (%) | 70.7 ± 15.1 | 71.3 ± 15.1 | 72.1 ± 14.6 | 72.7 ± 14.9 | 81.6 ± 13.0 | 76.0 ± 14.2 | 75.5 ± 14.5 | 81.2 ± 12.7 | 84.8 ± 11.7 |
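Undersampling the majority class of a partition to a target imbalance ratio, as done for the validation set here, can be sketched as follows (random undersampling for illustration only; the paper's RU strategy uses the Neighborhood Cleaning Rule, and the function name is our own):

```python
import numpy as np

def undersample_to_ratio(X, y, ratio=1.0, seed=0):
    """Randomly drop majority-class (y == 0) samples until the
    minority:majority ratio reaches `ratio` (1.0 means the 1:1 split
    that performed best for the validation partition). Sketch only."""
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    n_keep = min(len(majority), int(round(len(minority) / ratio)))
    keep = np.concatenate([minority, rng.choice(majority, n_keep, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]
```

Balancing only the validation partition changes the fitness landscape seen by the genetic algorithm without discarding any training data.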
Figure 5 Distribution of (AUC + AP)/2 on the testing partition for the optimum feature subset obtained by undersampling the validation partition with different imbalance ratios, with the average value of (AUC + AP)/2 as a black line. Violin colors represent homogeneous groups with similar performance and no significant differences (p > 0.05).
Figure 6 Average ROC curve (left) and precision-recall curve (right) on the testing dataset for the best combination of feature subset and resampling method (FSVU, imbalance ratio 100%, no resampling). The red “x” and “⦿” markers show the operating points that maximize the F1-score and G-mean, respectively. The threshold level is color-coded along the curves (bluer points correspond to thresholds closer to 0, yellower points to thresholds closer to 1). The dotted lines represent the baselines of the ROC and precision-recall curves (random classifier).
Threshold-dependent metrics for the best model (FSVU, imbalance ratio 100%, no resampling).
| Maximizing Criteria | F1-Score (%) | G-Mean (%) | Precision (%) | Recall (%) | Specificity (%) |
|---|---|---|---|---|---|
| F1-score | 79.6 ± 13.8 | 87.7 ± 10.1 | 81.9 ± 14.9 | 79.6 ± 17.4 | 97.9 ± 2.7 |
| G-mean | 71.5 ± 17.8 | 91.6 ± 6.7 | 61.4 ± 21.5 | 89.8 ± 12.1 | 94.0 ± 5.4 |
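The two operating points in the table come from sweeping the decision threshold over the classifier scores and keeping the thresholds that maximize the F1-score and the G-mean (the geometric mean of recall and specificity). A self-contained sketch of that sweep (our own helper, not the authors' code):

```python
import numpy as np

def best_thresholds(y_true, scores):
    """Sweep candidate thresholds over the scores and return the ones
    maximizing F1-score and G-mean = sqrt(recall * specificity)."""
    y_true = np.asarray(y_true)
    best = {"f1": (-1.0, None), "gmean": (-1.0, None)}
    for t in np.unique(scores):
        pred = scores >= t
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        fn = np.sum(~pred & (y_true == 1))
        tn = np.sum(~pred & (y_true == 0))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        g = np.sqrt(rec * spec)
        if f1 > best["f1"][0]:
            best["f1"] = (f1, t)
        if g > best["gmean"][0]:
            best["gmean"] = (g, t)
    return best
```

The F1-maximizing threshold favors precision (fewer false alarms), while the G-mean-maximizing threshold favors recall at some cost in precision, which is exactly the trade-off visible between the two rows of the table.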