Gema Prats-Boluda1, Julio Pastor-Tronch1, Javier Garcia-Casado1, Rogelio Monfort-Ortíz2, Alfredo Perales Marín2, Vicente Diago2, Alba Roca Prats2, Yiyao Ye-Lin1.
Abstract
Preterm birth is the leading cause of death in newborns, and survivors are prone to health complications. Threatened preterm labor (TPL) is the most common cause of hospitalization in the second half of pregnancy. The methods currently used in clinical practice to diagnose preterm labor, the Bishop score and cervical length, have high negative predictive values but not high positive predictive values. In this work we analyzed the performance of computationally efficient classification algorithms based on electrohysterographic (EHG) recordings, namely random forest (RF), extreme learning machine (ELM) and K-nearest neighbors (KNN), for predicting imminent labor (<7 days) in women with TPL, using the 50th or the 10th-90th percentiles of temporal, spectral and nonlinear EHG parameters, with and without obstetric data inputs. Two criteria were assessed for the classifier design: F1-score and sensitivity. RFF1_2 and ELMF1_2 provided the highest F1-score values in the validation dataset (88.17 ± 8.34% and 90.2 ± 4.43%, respectively) with the 50th percentile of EHG and obstetric inputs. ELMF1_2 outperformed RFF1_2 in sensitivity, with values similar to those of ELMSens (optimized for sensitivity). The 10th-90th percentiles did not provide a significant improvement over the 50th percentile. KNN performance was highly sensitive to the input dataset, with a high generalization capability.
Keywords: K-nearest neighbors; electrohysterogram; extreme learning machine; imminent labor prediction; random forest; tocolytic therapy; uterine myoelectrical activity
Year: 2021 PMID: 33916679 PMCID: PMC8038321 DOI: 10.3390/s21072496
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Summary of electrohysterographic (EHG) recording features and obstetric data inputs.
| EHG Temporal Parameters | EHG Spectral Parameters | EHG Nonlinear Parameters | Obstetric Data |
|---|---|---|---|
| Peak-to-peak amplitude | DF1 | Binary Lempel-Ziv | Cervical length |
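The classifier inputs are described as the 50th or the 10th-90th percentiles of each EHG parameter. A minimal sketch of how one parameter could be reduced to those percentiles is given below. It assumes that each recording yields one parameter value per analysis window, which the text here does not state; the function name `summarize_parameter`, the sample amplitudes and the interpolation method are illustrative choices, not the authors' procedure.

```python
import statistics

def summarize_parameter(window_values):
    """Return the 10th, 50th and 90th percentiles of one EHG parameter.

    window_values: one value per analysis window (an assumption of this sketch).
    """
    # quantiles(n=10) returns the nine decile cut points;
    # the first is the 10th percentile and the last is the 90th.
    deciles = statistics.quantiles(window_values, n=10, method="inclusive")
    return {
        "P10": deciles[0],
        "P50": statistics.median(window_values),
        "P90": deciles[-1],
    }

# Hypothetical peak-to-peak amplitudes (arbitrary units), one per window.
amplitudes = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
summary = summarize_parameter(amplitudes)
print(summary)
```

Under this reading, an "EHG P50" input set would keep only `P50` for each parameter, whereas an "EHG P10-P90" input set would keep `P10` and `P90`.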
Figure 1. Scheme of the method used to train, validate and test the imminent labor prediction classifiers (time to delivery, TTD ≤ 7 days) based on EHG in women with threatened preterm labor. This was performed with two optimization criteria in the classifier design: F1-score and sensitivity.
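The two optimization criteria, together with the specificity reported in the result tables, follow from the confusion matrix. The sketch below uses the standard definitions and assumes that the positive class is imminent labor (TTD ≤ 7 days); the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute sensitivity, specificity and F1-score from confusion counts.

    The positive class is assumed to be imminent labor (TTD <= 7 days).
    """
    sensitivity = tp / (tp + fn)   # fraction of imminent labors detected
    specificity = tn / (tn + fp)   # fraction of non-imminent cases rejected
    precision = tp / (tp + fp)     # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}

# Hypothetical counts for one data partition.
metrics = classification_metrics(tp=8, fp=2, tn=18, fn=2)
print(metrics)
```

Optimizing for sensitivity alone tends to trade specificity for fewer missed cases, while the F1-score balances sensitivity against precision.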
Hyperparameters optimized for each classifier and the grid-search values tested (in brackets).
| RF | ELM | KNN |
|---|---|---|
| Number of trees (100, 200, 500, and 750) | Number of neurons in the hidden layer (100, 500, 750, 1000, 2000, and 30,000) | Number of neighbors (1, 3, 5, and 7) |
| Maximum depth of the trees (6, 10, and unlimited) | Activation function (hyperbolic tangent and sigmoid) | Kernel used for weighting the distances (triangular, biweight and Epanechnikov) |
| Cost of division based on the information gain criterion (0.001, 0.2, and 0.5) | | |
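The size of each search space follows directly from the table above. The sketch below enumerates every hyperparameter combination with `itertools.product`. The values are transcribed from the table as listed; the dictionary keys (`n_trees`, `split_cost`, and so on) are hypothetical names chosen for this illustration, and `None` stands for unlimited depth.

```python
from itertools import product

# Grid values transcribed from the table; key names are illustrative.
GRIDS = {
    "RF": {
        "n_trees": [100, 200, 500, 750],
        "max_depth": [6, 10, None],          # None = unlimited depth
        "split_cost": [0.001, 0.2, 0.5],
    },
    "ELM": {
        "n_hidden": [100, 500, 750, 1000, 2000, 30000],
        "activation": ["tanh", "sigmoid"],
    },
    "KNN": {
        "n_neighbors": [1, 3, 5, 7],
        "kernel": ["triangular", "biweight", "epanechnikov"],
    },
}

def expand_grid(grid):
    """Return every combination of the grid as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, values))
            for values in product(*(grid[name] for name in names))]

counts = {name: len(expand_grid(grid)) for name, grid in GRIDS.items()}
print(counts)
```

In a grid search, each combination would be trained and scored on the validation data, and the one with the best F1-score or sensitivity retained, depending on the criterion.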
Hyperparameter combinations for the optimal RF classifiers in validation.
Hyperparameter combinations for the optimal ELM classifiers in validation.
| Opt. Criterion | Inputs | Classifier | Number of Neurons | Activation Function |
|---|---|---|---|---|
| F1-score | EHGP10–P90 + Obs | ELMF1_1 | 500 | Sigmoid |
| | EHGP50 + Obs | ELMF1_2 | 500 | Sigmoid |
| | EHGP10–P90 | ELMF1_3 | 500 | Sigmoid |
| | EHGP50 | ELMF1_4 | 500 | Sigmoid |
| Sensitivity | EHGP10–P90 + Obs | ELMSEN_1 | 750 | Sigmoid |
| | EHGP50 + Obs | ELMSEN_2 | 1000 | Sigmoid |
| | EHGP10–P90 | ELMSEN_3 | 750 | Sigmoid |
| | EHGP50 | ELMSEN_4 | 500 | Sigmoid |
Hyperparameter combinations for the optimal KNN classifiers in validation.
| Opt. Criterion | Inputs | Classifier | Number of Neighbors | Kernel |
|---|---|---|---|---|
| F1-score | EHGP10–P90 + Obs | KNNF1_1 | 2 | Triangular |
| | EHGP50 + Obs | KNNF1_2 | 7 | Biweight |
| | EHGP10–P90 | KNNF1_3 | 2 | Triangular |
| | EHGP50 | KNNF1_4 | 7 | Biweight |
| Sensitivity | EHGP10–P90 + Obs | KNNSEN_1 | 7 | Triangular |
| | EHGP50 + Obs | KNNSEN_2 | 7 | Epanechnikov |
| | EHGP10–P90 | KNNSEN_3 | 5 | Triangular |
| | EHGP50 | KNNSEN_4 | 7 | Triangular |
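The KNN hyperparameters combine a neighbor count with a kernel that weights each neighbor by its distance. The sketch below shows one way such a kernel-weighted vote can work, using the textbook forms of the three kernels. It assumes Euclidean distance and that distances are scaled by the distance to the (k+1)-th neighbor, a convention used by some weighted KNN implementations; the source does not say which implementation or scaling was used, and the data are hypothetical.

```python
import math

# Textbook kernel functions, defined for |u| <= 1.
KERNELS = {
    "triangular": lambda u: 1 - abs(u),
    "biweight": lambda u: (15 / 16) * (1 - u * u) ** 2,
    "epanechnikov": lambda u: 0.75 * (1 - u * u),
}

def weighted_knn_predict(train_x, train_y, query, k, kernel):
    """Predict the label of `query` by a kernel-weighted vote of k neighbors."""
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    # Assumption: scale by the (k+1)-th neighbor distance so that the k
    # nearest neighbors all fall strictly inside the kernel support.
    scale = dists[k][0] if len(dists) > k else dists[-1][0]
    scale = scale or 1.0  # guard against a zero scaling distance
    votes = {}
    for dist, label in dists[:k]:
        votes[label] = votes.get(label, 0.0) + KERNELS[kernel](dist / scale)
    return max(votes, key=votes.get)

# Hypothetical one-feature data: label 1 = imminent labor, 0 = not imminent.
train_x = [(0.0,), (1.0,), (2.0,), (5.0,), (6.0,), (7.0,)]
train_y = [0, 0, 0, 1, 1, 1]
predictions = {name: weighted_knn_predict(train_x, train_y, (4.0,), 3, name)
               for name in KERNELS}
print(predictions)
```

All three kernels give closer neighbors more weight; they differ in how quickly the weight falls off toward the edge of the neighborhood.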
Summary of the classifiers developed, their input features and optimization criterion (F1-score or sensitivity). RF: random forest, ELM: extreme learning machine, KNN: K-nearest neighbors.
| Input Features | RF (F1-score) | RF (Sensitivity) | ELM (F1-score) | ELM (Sensitivity) | KNN (F1-score) | KNN (Sensitivity) |
|---|---|---|---|---|---|---|
| EHG 10th–90th percentiles + Obstetric data | RFF1_1 | RFSEN_1 | ELMF1_1 | ELMSEN_1 | KNNF1_1 | KNNSEN_1 |
| EHG 50th percentile + Obstetric data | RFF1_2 | RFSEN_2 | ELMF1_2 | ELMSEN_2 | KNNF1_2 | KNNSEN_2 |
| EHG 10th–90th percentiles | RFF1_3 | RFSEN_3 | ELMF1_3 | ELMSEN_3 | KNNF1_3 | KNNSEN_3 |
| EHG 50th percentile | RFF1_4 | RFSEN_4 | ELMF1_4 | ELMSEN_4 | KNNF1_4 | KNNSEN_4 |
Figure 2. Mean values of different RF classifier metrics for validation datasets in the 30 data partitions optimized by F1-score. The same results were obtained when optimizing by sensitivity. For each metric the significant differences (p < 0.05) for each input dataset are marked with: 10th–90th percentiles of EHG parameters + obstetric input data; 50th percentile of EHG + obstetric input data; 10th–90th percentiles of EHG parameters; 50th percentile of EHG parameters.
Mean ± standard deviation and coefficient of variation (in brackets) of RF classifier performance metrics in the test dataset for predicting imminent birth (TTD ≤ 7 days) in women with threatened preterm labor (TPL) using EHG data or a combination of EHG and obstetric data. The maximum value for each metric is shown in bold. F1: F1-score, Sens: Sensitivity, Spec: Specificity.
| Opt. Criterion | Inputs | Classifier | Test_F1 | Test_Sens | Test_Spec |
|---|---|---|---|---|---|
| F1-Score | EHGP10–P90 + Obs | RFF1_1 | 77.51 ± 7.58% (9.8%) | 66.22 ± 11.70% (17.7%) | 97.12 ± 4.13% (4.3%) |
| | EHGP50 + Obs | RFF1_2 | | | 92.25 ± 5.35% (5.8%) |
| | EHGP10–P90 | RFF1_3 | 77.81 ± 8.71% (11.2%) | 65.78 ± 11.61% (17.6%) | |
| | EHGP50 | RFF1_4 | 77.7 ± 6.6% (8.5%) | 71.44 ± 10.99% (15.4%) | 90.72 ± 4.58% (5.0%) |
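The bracketed coefficient of variation is the standard deviation expressed as a percentage of the mean. As a quick check, the sketch below recomputes it for a few entries copied from the RF table above; the function and variable names are illustrative.

```python
def coefficient_of_variation(mean, std):
    """Coefficient of variation as a percentage of the mean."""
    return 100.0 * std / mean

# (mean, standard deviation, reported CV), all in percent, from the RF table.
rows = {
    "RFF1_1 Test_F1": (77.51, 7.58, 9.8),
    "RFF1_1 Test_Sens": (66.22, 11.70, 17.7),
    "RFF1_1 Test_Spec": (97.12, 4.13, 4.3),
    "RFF1_4 Test_Spec": (90.72, 4.58, 5.0),
}

recomputed = {name: round(coefficient_of_variation(mean, std), 1)
              for name, (mean, std, _) in rows.items()}
for name, (_, _, reported) in rows.items():
    print(f"{name}: recomputed {recomputed[name]}%, reported {reported}%")
```

A lower coefficient of variation across the 30 partitions indicates a metric that depends less on how the data happened to be split.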
Figure 3. Mean values of different ELM classifier metrics for validation datasets in the 30 data partitions: (a) optimizing F1-score; (b) optimizing sensitivity. For each optimization criterion and metric, the significant differences (p < 0.05) for each input dataset are marked with 10th–90th percentiles of EHG parameters + obstetric input data; 50th percentile of EHG + obstetric input data; 10th–90th percentiles of EHG parameters; 50th percentile of EHG parameters. Significant differences between the two optimization criteria for the same input dataset are marked with *.
Mean ± standard deviation and coefficient of variation (in brackets) of ELM classifiers’ performance metrics in test dataset for predicting imminent birth (TTD ≤ 7 days) in women with TPL using EHG data or a combination of EHG and obstetric data. The maximum value for each metric and optimization criterion is shown in bold. F1: F1-score, Sens: sensitivity, Spec: specificity.
| Opt. Criterion | Inputs | Classifier | Test_F1 | Test_Sens | Test_Spec |
|---|---|---|---|---|---|
| F1-score | EHGP10–P90 + Obs | ELMF1_1 | 80.00 ± 4.98% (6.0%) | 87.56 ± 8.53% (9.7%) | 74.77 ± 7.32% (9.8%) |
| | EHGP50 + Obs | ELMF1_2 | | | |
| | EHGP10–P90 | ELMF1_3 | 78.41 ± 4.55% (5.8%) | 85.89 ± 7.91% (9.2%) | 73.24 ± 6.93% (9.5%) |
| | EHGP50 | ELMF1_4 | 79.00 ± 5.06% (6.4%) | 86.22 ± 6.65% (7.7%) | 73.87 ± 8.64% (11.7%) |
| Sensitivity | EHGP10–P90 + Obs | ELMSEN_1 | 74.83 ± 3.88% (5.2%) | 95.44 ± 4.59% (4.8%) | 51.35 ± 9.28% (18.1%) |
| | EHGP50 + Obs | ELMSEN_2 | | | |
| | EHGP10–P90 | ELMSEN_3 | 73.13 ± 3.10% (4.2%) | 94.78 ± 4.61% (4.9%) | 47.57 ± 8.83% (18.6%) |
| | EHGP50 | ELMSEN_4 | 73.83 ± 3.24% (4.4%) | 94.89 ± 5.01% (5.3%) | 49.37 ± 9.63% (19.5%) |
Figure 4. Mean values of different KNN classifier metrics for validation datasets in the 30 data partitions: (a) optimizing F1-score; (b) optimizing sensitivity. For each optimization criterion and metric, the significant differences (p < 0.05) for each input dataset are marked with 10th–90th percentiles of EHG parameters + obstetric input data; 50th percentile of EHG + obstetric input data; 10th–90th percentiles of EHG parameters; 50th percentile of EHG parameters. Significant differences between the two optimization criteria for the same input dataset are marked with *.
Mean ± standard deviation and coefficient of variation (in brackets) of KNN classifier performance metrics in test dataset for predicting imminent birth (TTD ≤ 7 days) in women with TPL using EHG characteristics or a combination of EHG and obstetric data. The maximum value for each metric and optimization criterion is in bold. F1: F1-score, Sens: Sensitivity, Spec: Specificity.
| Opt. Criterion | Inputs | Classifier | Test_F1 | Test_Sens | Test_Spec |
|---|---|---|---|---|---|
| F1-score | EHGP10–P90 + Obs | KNNF1_1 | 84.18 ± 9.47% (11.2%) | 79.33 ± 13.23% (16.7%) | |
| | EHGP50 + Obs | KNNF1_2 | 74.16 ± 5.07% (6.8%) | | 52.43 ± 9.59% (18.3%) |
| | EHGP10–P90 | KNNF1_3 | | 80.56 ± 12.57% (15.6%) | 92.70 ± 8.81% (9.5%) |
| | EHGP50 | KNNF1_4 | 74.13 ± 4.57% (6.2%) | 90.89 ± 6.55% (7.2%) | 55.77 ± 9.67% (17.3%) |
| Sensitivity | EHGP10–P90 + Obs | KNNSEN_1 | | 82.78 ± 12.13% (14.7%) | |
| | EHGP50 + Obs | KNNSEN_2 | 72.98 ± 4.00% (5.5%) | | 47.93 ± 8.98% (18.7%) |
| | EHGP10–P90 | KNNSEN_3 | 78.63 ± 8.60% (10.9%) | 83.56 ± 12.47% (14.9%) | 76.58 ± 14.2% (18.5%) |
| | EHGP50 | KNNSEN_4 | 73.19 ± 4.31% (5.9%) | 91.78 ± 7.15% (7.8%) | 52.07 ± 9.39% (18.0%) |
Figure 5. Mean values of different classifier metrics for validation datasets in the 30 data partitions obtained for the best RF, ELM and KNN classifiers. Significant differences (p < 0.05) of the classifiers and metrics with the others are marked with RFF1_2; ELMF1_2; KNNF1_2.
Figure 6. Average receiver operating characteristic (ROC) curves for the training, validation and test datasets for ELMF1_2.