Paul Fergus, Pauline Cheung, Abir Hussain, Dhiya Al-Jumeily, Chelsea Dobbins, Shamaila Iram.
Abstract
Improvements in the treatment of preterm infants have increased their chance of survival. However, the rate of premature births is still increasing globally. These infants are most at risk of developing severe medical conditions that can affect the respiratory, gastrointestinal, immune, central nervous, auditory and visual systems. In extreme cases, preterm birth can also lead to long-term conditions such as cerebral palsy, mental retardation and learning difficulties, as well as poor health and growth. In the US alone, the societal and economic cost of preterm births was estimated at $26.2 billion per annum in 2005. In the UK, the figure was close to £2.95 billion in 2009. Many believe that a better understanding of why preterm births occur, and a strategic focus on prevention, will help to improve the health of children and reduce healthcare costs. At present, most methods of preterm birth prediction are subjective. However, a strong body of evidence suggests that the analysis of uterine electrical signals (Electrohysterography) could provide a viable way of diagnosing true labour and predicting preterm deliveries. Most Electrohysterography studies focus on true labour detection during the final seven days before labour. The challenge is to utilise Electrohysterography techniques to predict preterm delivery earlier in the pregnancy. This paper explores this idea further and presents a supervised machine learning approach that classifies term and preterm records, using an open source dataset containing 300 records (38 preterm and 262 term). The synthetic minority oversampling technique is used to oversample the minority preterm class, and cross validation techniques are used to evaluate the dataset against other similar studies. Our approach shows an improvement on existing studies, with 96% sensitivity, 90% specificity, a 95% area under the curve value and 8% global error using the polynomial classifier.
Year: 2013 PMID: 24204760 PMCID: PMC3810473 DOI: 10.1371/journal.pone.0077154
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Numbers of Patients in each group.
| Recording Time | Term: Number of records | Term: Mean/Median Recording Weeks | Preterm: Number of records | Preterm: Mean/Median Recording Weeks | All: Number of records | All: Mean/Median Recording Weeks |
| | 143 | 22.7/22.86 | 19 | 23.0/23.43 | 162 | 22.73/23.0 |
| | 119 | 30.8/31.14 | 19 | 30.2/30.86 | 138 | 30.71/31.14 |
| Total | 262 | 26.75/24.36 | 38 | 27.0/25.86 | 300 | 26.78/24.43 |
Figure 1. Distribution of deliveries in TPEHG dataset.
Figure 2. PCA for features extracted from the Channel 3 0.34–1 Hz filtered signal.
Figure 3. Distribution of deliveries in TPEHG dataset after the SMOTE technique is applied.
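The synthetic minority oversampling technique (SMOTE) balances the classes by creating new minority records that lie between an existing minority record and one of its nearest minority neighbours. The following is a minimal sketch of that idea in plain Python, not the authors' implementation: the function name `smote`, the choice `k=5`, the four-feature toy vectors and the random seeds are all illustrative assumptions.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours (Euclidean)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by distance to the chosen sample.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Toy stand-in for 38 preterm feature vectors (hypothetical data, 4 features).
toy_rng = random.Random(1)
preterm = [[toy_rng.gauss(0, 1) for _ in range(4)] for _ in range(38)]

# 262 - 38 = 224 synthetic records bring the preterm class up to 262.
extra = smote(preterm, n_new=262 - 38)
balanced_preterm = preterm + extra
print(len(balanced_preterm))
```

Because every synthetic record is a convex combination of two real minority records, it always stays inside the range already covered by the minority class. Oversampling of this kind is normally applied to training data only, so that synthetic records do not leak into the evaluation set.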
Summary of Classifiers, Features, Validation Techniques and Sample Sizes used in this study.
| Classifiers | Features | Validation | Sample Sizes |
| | Root Mean Squares | Holdout Cross Validation | Original (38 preterm/262 term) |
| Linear Discriminant Classifier (LDC) | Peak Frequency | k-fold Cross Validation | SMOTE (262 preterm/262 term) |
| Quadratic Discriminant Classifier (QDC) | Median Frequency | Sensitivity/Specificity | SMOTE Clinical (150 preterm/150 term) |
| Uncorrelated Normal Density Classifier (UDC) | Sample Entropy | Receiver Operator Curve | Clinical (38 preterm/262 term) |
| | | Area Under the Curve | |
| Polynomial Classifier (POLYC) | | | |
| Logistic Classifier (LOGLC) | | | |
| Decision Tree Classifier (TREEC) | | | |
| Parzen Classifier (PARZENC) | | | |
| Support Vector Classifier (SVC) | | | |
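The four signal features listed above (root mean square, peak frequency, median frequency and sample entropy) can be illustrated on a synthetic signal. This is a sketch under stated assumptions rather than the study's feature-extraction code: the 20 Hz sampling rate, the 0.5 Hz test tone, the naive DFT, and the sample-entropy settings `m=2` and a tolerance of 0.2 times the signal's RMS are all choices made for the example.

```python
import cmath
import math

def rms(x):
    """Root mean square amplitude."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def power_spectrum(x, fs):
    """Naive DFT power spectrum over the non-negative frequencies."""
    n = len(x)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2)
    return freqs, power

def peak_frequency(freqs, power):
    """Frequency of the bin with the largest power."""
    return freqs[max(range(len(power)), key=power.__getitem__)]

def median_frequency(freqs, power):
    """Frequency that splits the total spectral power into two equal halves."""
    half, running = sum(power) / 2, 0.0
    for f, p in zip(freqs, power):
        running += p
        if running >= half:
            return f
    return freqs[-1]

def sample_entropy(x, m=2, r=0.2):
    """SampEn = -ln(A / B): B counts template pairs of length m that match
    within tolerance r, A counts matching pairs of length m + 1."""
    n = len(x)

    def count(length):
        templates = [x[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):
                if max(abs(a - b) for a, b in zip(templates[i], templates[j])) <= r:
                    matches += 1
        return matches

    b, a = count(m), count(m + 1)
    return -math.log(a / b) if a > 0 and b > 0 else float("inf")

fs = 20.0  # assumed sampling rate in Hz (not stated in the source)
signal = [math.sin(2 * math.pi * 0.5 * t / fs) for t in range(200)]  # 0.5 Hz tone
freqs, power = power_spectrum(signal, fs)

features = {
    "rms": rms(signal),
    "peak_frequency": peak_frequency(freqs, power),
    "median_frequency": median_frequency(freqs, power),
    "sample_entropy": sample_entropy(signal, m=2, r=0.2 * rms(signal)),
}
print(features)
```

For a pure tone, the peak and median frequencies coincide at the tone's frequency and the sample entropy is low because the signal is highly regular. Real recordings would first be band-pass filtered, for example to the 0.34–1 Hz band used in the tables.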
Classifier Performance Results for the 0.34–1 Hz Filter.
| Classifier | Sensitivity | Specificity | AUC |
| LDC | 0.0000 | 0.9807 | 53% |
| QDC | 0.0000 | 0.9807 | 53% |
| UDC | 0.0000 | 1.0000 | 52% |
| POLYC | 0.0000 | 0.9807 | 61% |
| LOGLC | 0.0000 | 0.9807 | 60% |
| | 0.0000 | 0.9230 | 53% |
| TREEC | 0.2857 | 0.8653 | 60% |
| PARZENC | 0.0000 | 1.0000 | 50% |
| SVC | 0.0000 | 1.0000 | 61% |
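Sensitivity is the fraction of preterm records correctly identified, TP / (TP + FN), and specificity is the fraction of term records correctly identified, TN / (TN + FP). The sketch below computes both from label lists. The counts are hypothetical: 2 of 7 preterm and 45 of 52 term records classified correctly were chosen only because they give ratios close to the 0.2857 and 0.8653 reported in the table. The source does not give the underlying confusion matrix.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity

# Hypothetical test set: 1 = preterm (positive class), 0 = term.
y_true = [1] * 7 + [0] * 52
# 2 preterm found, 5 missed; 45 term correct, 7 flagged as preterm.
y_pred = [1] * 2 + [0] * 5 + [0] * 45 + [1] * 7

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.4f} specificity={spec:.4f}")
```

A sensitivity of 0.0000 paired with a specificity near 1.0000, as in most rows of the table, is what this function returns when a classifier labels almost every record as term.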
Cross Validation Results for the 0.34–1 Hz Filter.
| Classifiers | 80% Holdout, 100 Repetitions: Mean Error | 80% Holdout, 100 Repetitions: SD | Cross Val, 5 Folds, 1 Repetition: Error | Cross Val, 5 Folds, 100 Repetitions: Mean Error | Cross Val, 5 Folds, 100 Repetitions: SD |
| LDC | 0.1342 | 0.0127 | 0.1333 | 0.1349 | 0.0045 |
| QDC | 0.1355 | 0.0166 | 0.1366 | 0.1421 | 0.0088 |
| UDC | 0.1324 | 0.0142 | 0.1366 | 0.1383 | 0.0080 |
| POLYC | 0.1300 | 0.0072 | 0.1300 | 0.1300 | 0.0000 |
| LOGLC | 0.1324 | 0.0112 | 0.1333 | 0.1322 | 0.0034 |
| | 0.1707 | 0.0270 | 0.1267 | 0.1312 | 0.0081 |
| TREEC | 0.2135 | 0.0443 | 0.1995 | 0.2183 | 0.0210 |
| PARZENC | 0.1267 | 0.0000 | 0.1267 | 0.1267 | 0.0000 |
| SVC | 0.1267 | 0.0000 | 0.1267 | 0.1267 | 0.0000 |
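Repeated k-fold cross validation shuffles the records, splits them into k folds, tests on each fold in turn, and repeats the whole procedure to obtain a mean error and its spread. The sketch below applies this to a deliberately trivial baseline that always predicts the majority class of its training labels. The baseline, the function names and the pooling of errors across folds are assumptions made for illustration, not the authors' procedure. With 38 preterm and 262 term records the baseline's error is 38/300 ≈ 0.1267 with zero spread, the same value that appears in several rows of the table, which suggests (the source does not say so) that those classifiers labelled every record as term.

```python
import random
import statistics

def k_fold_indices(n, k, rng):
    """Shuffle record indices and deal them into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def repeated_cv_error(labels, fit, k=5, repetitions=100, seed=0):
    """Mean and standard deviation of the error rate over repeated k-fold CV."""
    rng = random.Random(seed)
    errors = []
    for _ in range(repetitions):
        wrong = 0
        for fold in k_fold_indices(len(labels), k, rng):
            held_out = set(fold)
            train = [labels[i] for i in range(len(labels)) if i not in held_out]
            predict = fit(train)
            wrong += sum(1 for i in fold if predict(i) != labels[i])
        errors.append(wrong / len(labels))
    return statistics.mean(errors), statistics.pstdev(errors)

def fit_majority(train_labels):
    """Baseline 'classifier': always predict the most common training label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda _record: majority

labels = [1] * 38 + [0] * 262  # 1 = preterm, 0 = term
mean_err, sd = repeated_cv_error(labels, fit_majority)
print(round(mean_err, 4), round(sd, 4))
```

An error of about 13% therefore says little on an imbalanced dataset like this one; sensitivity, specificity and AUC are needed to tell a useful classifier from the baseline.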
Figure 4. Receiver Operator Curve for the 0.34–1 Hz Filter.
Classifier Performance Table for Oversampled 0.34–1 Hz Signal.
| Classifier | Sensitivity | Specificity | AUC |
| LDC | 0.8653 | 0.8076 | 66% |
| QDC | 0.9230 | 0.8461 | 72% |
| UDC | 0.8269 | 0.8076 | 72% |
| POLYC | 0.8653 | 0.8076 | 86% |
| LOGLC | 0.8653 | 0.8269 | 86% |
| | 0.8653 | 0.8269 | 84% |
| TREEC | 0.9038 | 0.8269 | 89% |
| PARZENC | 0.5961 | 0.9615 | 72% |
| SVC | 0.8076 | 0.7692 | 78% |
Cross Validation Results for Oversampled 0.34–1 Hz Signal.
| Classifiers | 80% Holdout, 100 Repetitions: Mean Error | 80% Holdout, 100 Repetitions: SD | Cross Val, 5 Folds, 1 Repetition: Error | Cross Val, 5 Folds, 100 Repetitions: Mean Error | Cross Val, 5 Folds, 100 Repetitions: SD |
| LDC | 0.2132 | 0.0325 | 0.2116 | 0.2064 | 0.0023 |
| QDC | 0.1770 | 0.0347 | 0.1811 | 0.1806 | 0.0040 |
| UDC | 0.2035 | 0.0328 | 0.1981 | 0.2001 | 0.0018 |
| POLYC | 0.2132 | 0.0325 | 0.2116 | 0.2064 | 0.0023 |
| LOGLC | 0.2037 | 0.0315 | 0.2118 | 0.1972 | 0.0059 |
| | 0.2249 | 0.0386 | 0.2594 | 0.2340 | 0.0088 |
| TREEC | 0.1995 | 0.0387 | 0.1944 | 0.1994 | 0.0069 |
| PARZENC | 0.2499 | 0.0392 | 0.2423 | 0.2461 | 0.0124 |
| SVC | 0.2851 | 0.0383 | 0.2899 | 0.2901 | 0.0042 |
Figure 5. Receiver Operator Curve for Oversampled 0.34–1 Hz Signal.
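The area under the curve (AUC) reported alongside each ROC figure can be read as the probability that a randomly chosen preterm record receives a higher classifier score than a randomly chosen term record, with ties counted as one half. The sketch below computes it directly from that pairwise definition. The scores are invented for the example and the function name `auc` is an assumption, not part of any toolbox used in the study.

```python
def auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half a correct ranking."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier scores (higher = more likely preterm).
preterm_scores = [0.9, 0.8, 0.35]
term_scores = [0.7, 0.3, 0.2, 0.1]

# 11 of the 12 preterm/term pairs are ordered correctly.
print(auc(preterm_scores, term_scores))
```

A classifier that gives every record the same score has an AUC of 0.5 under this definition, which is why values close to 50% indicate performance no better than chance.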
Classifier Performance for Oversampled 0.34–1 Hz Signal with additional Features.
| Classifier | Sensitivity | Specificity | AUC |
| LDC | 0.9666 | 0.9000 | 70% |
| QDC | 0.9666 | 0.1666 | 83% |
| UDC | 0.9666 | 0.1333 | 78% |
| POLYC | 0.9666 | 0.9000 | 95% |
| LOGLC | 0.9666 | 0.9000 | 94% |
| | 0.9333 | 0.8000 | 90% |
| TREEC | 0.9666 | 0.9000 | 93% |
| PARZENC | 0.9666 | 0.5666 | 59% |
| SVC | 0.9666 | 0.7000 | 92% |
Cross Validation Results for Oversampled 0.34–1 Hz Signal with additional Features.
| Classifiers | 80% Holdout, 100 Repetitions: Mean Error | 80% Holdout, 100 Repetitions: SD | Cross Val, 5 Folds, 1 Repetition: Error | Cross Val, 5 Folds, 100 Repetitions: Mean Error | Cross Val, 5 Folds, 100 Repetitions: SD |
| LDC | 0.0858 | 0.0289 | 0.00800 | 0.0867 | 0.0060 |
| QDC | 0.3260 | 0.0780 | 0.0780 | 0.3344 | 0.0216 |
| UDC | 0.4162 | 0.0471 | 0.0471 | 0.4289 | 0.0124 |
| POLYC | 0.0858 | 0.0289 | 0.0289 | 0.0867 | 0.0060 |
| LOGLC | 0.0932 | 0.0301 | 0.0301 | 0.0983 | 0.0062 |
| | 0.1458 | 0.411 | 0.0411 | 0.1522 | 0.0131 |
| TREEC | 0.1127 | 0.0436 | 0.0436 | 0.1178 | 0.0149 |
| PARZENC | 0.2130 | 0.044 | 0.0444 | 0.2067 | 0.0056 |
| SVC | 0.1338 | 0.0419 | 0.0419 | 0.1233 | 0.0070 |
Figure 6. Receiver Operator Curve for Oversampled 0.34–1 Hz Signal with additional features.
Classifier Performance for Clinical Data Only.
| Classifier | Sensitivity | Specificity | AUC |
| LDC | 0.0000 | 1.0000 | 51% |
| QDC | 1.0000 | 0.0384 | 51% |
| UDC | 0.0000 | 0.9038 | 52% |
| POLYC | 0.000 | 1.0000 | 55% |
| LOGLC | 0.0000 | 1.0000 | 55% |
| | 0.0000 | 0.9230 | 50% |
| TREEC | 0.1428 | 0.8461 | 52% |
| PARZENC | 0.0000 | 1.0000 | 49% |
| SVC | 0.0000 | 1.0000 | 53% |
Cross Validation Results for Clinical Data Only.
| Classifiers | 80% Holdout, 30 Repetitions: Mean Error | 80% Holdout, 30 Repetitions: SD | Cross Val, 5 Folds, 1 Repetition: Error | Cross Val, 5 Folds, 6 Repetitions: Mean Error | Cross Val, 5 Folds, 6 Repetitions: SD |
| LDC | 0.1354 | 0.0146 | 0.1399 | 0.1355 | 0.0053 |
| QDC | 0.8443 | 0.0338 | 0.8532 | 0.8559 | 0.0073 |
| UDC | 0.1953 | 0.0364 | 0.1930 | 0.1939 | 0.0062 |
| POLYC | 0.1278 | 0.0049 | 0.1300 | 0.1272 | 0.0013 |
| LOGLC | 0.1334 | 0.0139 | 0.1300 | 0.1322 | 0.0053 |
| | 0.1652 | 0.0289 | 0.1267 | 0.1283 | 0.0028 |
| TREEC | 0.2231 | 0.493 | 0.2126 | 0.2362 | 0.0227 |
| PARZENC | 0.1267 | 0.000 | 0.1267 | 0.1267 | 0.0000 |
| SVC | 0.1267 | 0.000 | 0.1267 | 0.1267 | 0.0000 |
Figure 7. Receiver Operator Curve for Clinical Data Only.
Summary of Classifier Performance for Original TPEHG Dataset and Oversampled Dataset Using SMOTE.
| Classifier | Original TPEHG dataset: Sensitivity | Original TPEHG dataset: Specificity | Original TPEHG dataset: AUC | Oversampled using SMOTE: Sensitivity | Oversampled using SMOTE: Specificity | Oversampled using SMOTE: AUC |
| LDC | 0.0000 | 0.9807 | 53% | 0.8653 | 0.8076 | 66% |
| QDC | 0.0000 | 0.9807 | 53% | 0.9230 | 0.8461 | 72% |
| UDC | 0.0000 | 1.0000 | 52% | 0.8269 | 0.8076 | 72% |
| POLYC | 0.0000 | 0.9807 | 61% | 0.8653 | 0.8076 | 86% |
| LOGLC | 0.0000 | 0.9807 | 60% | 0.8653 | 0.8269 | 86% |
| | 0.0000 | 0.9230 | 53% | 0.8653 | 0.8269 | 84% |
| TREEC | 0.2857 | 0.8653 | 60% | 0.9038 | 0.8269 | 89% |
| PARZENC | 0.0000 | 1.0000 | 50% | 0.5961 | 0.9615 | 72% |
| SVC | 0.0000 | 1.0000 | 61% | 0.8076 | 0.7692 | 78% |
Summary of Classifier Performance for Oversampling with Additional Features and Clinical Data Only.
| Classifier | Oversampling with Additional Features: Sensitivity | Oversampling with Additional Features: Specificity | Oversampling with Additional Features: AUC | Clinical Data Only: Sensitivity | Clinical Data Only: Specificity | Clinical Data Only: AUC |
| LDC | 0.9666 | 0.9000 | 70% | 0.0000 | 1.0000 | 51% |
| QDC | 0.9666 | 0.1666 | 83% | 1.0000 | 0.0384 | 51% |
| UDC | 0.9666 | 0.1333 | 78% | 0.0000 | 0.9038 | 52% |
| POLYC | 0.9666 | 0.9000 | 95% | 0.0000 | 1.0000 | 55% |
| LOGLC | 0.9666 | 0.9000 | 94% | 0.0000 | 1.0000 | 55% |
| | 0.9333 | 0.8000 | 90% | 0.0000 | 0.9230 | 50% |
| TREEC | 0.9666 | 0.9000 | 93% | 0.1428 | 0.8461 | 52% |
| PARZENC | 0.9666 | 0.5666 | 59% | 0.0000 | 1.0000 | 49% |
| SVC | 0.9666 | 0.7000 | 92% | 0.0000 | 1.0000 | 53% |