Zahra Sharifi-Heris, Juho Laitala, Antti Airola, Amir M Rahmani, Miriam Bender
Abstract
BACKGROUND: Preterm birth (PTB), a common pregnancy complication, is responsible for 35% of the 3.1 million pregnancy-related deaths each year and significantly affects around 15 million children annually worldwide. Conventional approaches to predict PTB lack reliable predictive power, leaving >50% of cases undetected. Recently, machine learning (ML) models have shown potential as an appropriate complementary approach for PTB prediction using health records (HRs).
Keywords: artificial intelligence; machine learning approach; prediction model; preterm birth
Year: 2022 PMID: 35442214 PMCID: PMC9069277 DOI: 10.2196/33875
Source DB: PubMed Journal: JMIR Med Inform
Quality assessment.
| Study | Unmet need (existing gap) | Reproducibility: feature engineering | Reproducibility: platform package | Reproducibility: hyperparameters | Robustness: valid methods to overcome overfitting | Robustness: stability of results | Generalizability (external validation data) | Clinical significance: predictor explanation | Clinical significance: suggested clinical use |
| Weber et al, 2018 [ | Yes | Yes | Yes | No | 5-fold CVa | Minimum and maximum values reported from the CV | No | Logistic regression coefficients and odds ratios | No |
| Rawashdeh et al, 2020 [ | Yes | Yes | Yes | Number of neighbors for KNNb, number of hidden layers for ANNc, number of trees for RFd | Train-test split (training set: 237, with 19 positives; test set: 37, with 7 positives) | No | No | No | Yes |
| Gao et al, 2019 [ | Yes | Medical concepts represented as a bag of words and as word embeddings, TF-IDFe, discretization of continuous features | No | No | Train-test split (training set: 17,607, with 132 positives; test set: 8082, with 85 positives) | Minimum and maximum values and CIs | No | Feature importance, odds ratio | Yes |
| Lee and Ahn, 2019 [ | Yes | No | Yes | Only neural network architecture described | Train-test split (298 participants in each set) | No | No | Feature importance (RF and ANN) | No |
| Woolery and Grzymala-Busse, 1994 [ | Yes | No | Yes | No | 3 data sets used in isolation, each with a 50-50 train-test split | No | No | No | No |
| Grzymala-Busse and Woolery, 1994 [ | Yes | No | Yes | No | 3 data sets used in isolation, each with a 50-50 train-test split | No | No | No | No |
| Vovsha et al, 2014 [ | Yes | No | Yes | No | Data separated by gestational week into 3 data sets, each with an 80-20 train-test split; 5-fold CV for model selection | No | No | Feature importance (linear SVMf) | No |
| Esty et al, 2018 [ | Yes | No | Yes | No | No | No | No | No | No |
| Frize et al, 2011 [ | Yes | No | Yes | No | Division into 3 data sets (parous and nulliparous); train-test-verification splits | SDs of the metrics reported | No | No | No |
| Goodwin and Maher, 2000 [ | Yes | No | Yes | No | Train-test split (75%-25%) | No | No | Feature importance | No |
| Tran et al, 2016 [ | Yes | Unigrams created from free-text fields after stop-word removal | No | No | Train-test split (66%-33%) | No | No | Feature importance | Yes |
| Koivu and Sairanen, 2020 [ | Yes | New features created; continuous features standardized and nominal features one-hot encoded | Yes | All hyperparameters described | Data set partitioned into 4 stratified parts for feature selection, training, validation, and testing (10%-70%-10%-10%) | 95% CIs for metrics | Yes | Feature importance | Yes |
| Khatibi et al, 2019 [ | Yes | Imputation with the mode for categorical features and the median for continuous features | No | No | Train-test split | No | No | Feature importance | No |
aCV: cross-validation.
bKNN: K-nearest neighbor.
cANN: artificial neural network.
dRF: random forest.
eTF-IDF: term frequency-inverse document frequency.
fSVM: support vector machine.
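Several studies in the quality table guard against overfitting with k-fold cross-validation (Weber et al, Vovsha et al) or with stratified partitions (Koivu and Sairanen: 10%-70%-10%-10%). Stratification keeps the rare PTB class represented in every part, which matters when, for example, a test set holds only 7 positives (Rawashdeh et al). The following is a minimal standard-library Python sketch, assuming binary labels (1 = PTB) held in a list; the function names `stratified_partition` and `stratified_kfold` are hypothetical and do not reproduce any reviewed study's code.

```python
import random
from collections import defaultdict


def stratified_partition(labels, fractions, seed=0):
    """Split sample indices into len(fractions) disjoint parts while keeping
    each class's share of every part close to the requested fraction."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    parts = [[] for _ in fractions]
    for indices in by_class.values():
        rng.shuffle(indices)
        n = len(indices)
        start, cumulative = 0, 0.0
        for p, frac in enumerate(fractions):
            cumulative += frac
            # The last part takes the remainder so no index is lost to rounding.
            end = n if p == len(fractions) - 1 else round(cumulative * n)
            parts[p].extend(indices[start:end])
            start = end
    return [sorted(part) for part in parts]


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train, test) index lists for stratified k-fold cross-validation."""
    folds = stratified_partition(labels, [1.0 / k] * k, seed)
    for i, test in enumerate(folds):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


if __name__ == "__main__":
    # Hypothetical labels: 1 = preterm birth, 0 = term birth.
    labels = [1] * 20 + [0] * 80
    for train, test in stratified_kfold(labels, k=5):
        print(len(train), len(test), sum(labels[i] for i in test))  # 80 20 4
    # A 4-way split in the proportions reported by Koivu and Sairanen.
    parts = stratified_partition(labels, [0.1, 0.7, 0.1, 0.1])
    print([len(p) for p in parts])  # [10, 70, 10, 10]
```

Any preprocessing that learns from the data (imputation values, feature selection, resampling) would need to be fitted inside each training fold for the held-out estimate to stay honest.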
Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) chart.
Descriptive characteristics of studies and feature selection.
| Study, country, and type of study | Population characteristics | Data source (number of features) | Population (births), n | Study group (PTBa), control group, and type of PTB | Feature selection process and gestational week to which the selected features relate | Number of selected features | Date |
| Weber et al, 2018 [ | Nulliparous women with a singleton birth (≥20 and <32 weeks, or ≥37 weeks); non-Hispanic Black (n=54,084) and White (n=282,130) | Birth certificate and hospital discharge records: >1000 features | 336,214 | PTB (early spontaneous): ≥20 and <32 weeks; control: ≥37 weeks | Features with uncertain or ambiguous values or with no variation were excluded, and highly correlated features were collapsed; —b | 20 | 2007 to 2011 |
| Rawashdeh et al, 2020 [ | Australian; pregnancies with cervical cerclage | Data from a fetal medicine unit in a tertiary hospital in NSWc: 19 features | 274 | PTB (spontaneous): <26 weeks; control: >26 weeks | Unnecessary features (eg, medical record numbers) were excluded | 19 | 2003 to 2014 |
| Gao et al, 2019 [ | Caucasian (>68%), Black (16%-21%), and other (10%-13%) | EHRd of Vanderbilt University Medical Center: 150 features | 25,689 | PTB: <28 weeks; control: ≥28 weeks; type of PTB was not distinguished | Features were ranked by information gain, and the top 150 features were retained; — | 150 | 2005 to 2017 |
| Lee and Ahn, 2019 [ | Korean; induced labors were excluded | Anam Hospital in Seoul | 596 | PTB (spontaneous): >20 and <37 weeks; control: ≥37 weeks | — | 14 | 2014 to 2018 |
| Woolery and Grzymala-Busse, 1994 [ | — | 3 data sets: 214 features in total | 18,890 | PTB: <37 weeks; control: ≥37 weeks; type of PTB was not distinguished | — | Data set 1 (n=52), data set 2 (n=77), and data set 3 (n=85) | 1994 |
| Grzymala-Busse and Woolery, 1994 [ | — | 3 data sets: 153 features in total | 9480 | PTB: <36 weeks; control: ≥36 weeks; type of PTB was not distinguished | — | Data set 1 (n=13), data set 2 (n=73), and data set 3 (n=67) | 1994 |
| Vovsha et al, 2014 [ | — | NICHDe-MFMUf data set: >400 features | 2929 | PTB (spontaneous and induced): <32, <35, and <37 weeks; control: ≥37 weeks | Logistic regression with forward selection, stepwise selection, LASSOg, and elastic net; — | 24th week (n=50), 26th week (n=205), and 28th week (n=316) | 1992 to 1994 |
| Esty et al, 2018 [ | — | BORNh and PRAMSi: 520 features | 782,000 | PTB: <37 weeks; control: ≥37 weeks; type of PTB was not distinguished | Features with >50% missing values were removed before missing value imputation; features come from before the 23rd gestational week | 520 | — |
| Frize et al, 2011 [ | — | PRAMS: >300 features | >113,000 | PTB: <37 weeks; control: ≥37 weeks; type of PTB was not distinguished | Decision tree (to establish consistency between data sets, features specific to the United States were excluded, eg, Medicaid and the Women, Infants, and Children Program); features come from before the 23rd gestational week | 19 for parous and 16 for nulliparous | 2002 to 2004 |
| Goodwin and Maher, 2000 [ | — | Duke University Medical Center TMR™ perinatal data: 4000 to 5000 features | 63,167 | PTB: <37 weeks; control: ≥37 weeks; type of PTB was not distinguished | Heuristic techniques (features related to week <37 were included); — | 32 demographic and 393 clinical | 1988 to 1997 |
| Tran et al, 2016 [ | Australian | RNSj, NSW | 15,814 births | PTB (spontaneous and elective): <34 and <37 weeks; control: ≥37 weeks | Features kept based on their importance (top | 10 | 2011 to 2015 |
| Koivu and Sairanen, 2020 [ | White, Black, American Indian or Alaska Native, and Asian or Pacific Islander individuals | CDCk and NYCl data sets | 13,150,017 | PTB: <37 weeks; control: ≥37 weeks; type of PTB was not distinguished | Highly correlated features were excluded using Pearson correlation analysis; — | 26 | CDC: 2013 to 2016; NYC: 2014 to 2016 |
| Khatibi et al, 2019 [ | Iranian | National maternal and neonatal records (IMaNm registry): 112 features | >1,400,000 | PTB (spontaneous and medically indicated): >28 and <37 weeks; control: ≥37 weeks | Parallel feature selection and classification methods including MR-PB-PFS (features with nonzero scores are selected as top features); — | 112 | 2016 to 2017 |
aPTB: preterm birth.
bNot reported in the study.
cNSW: New South Wales.
dEHR: electronic health record.
eNICHD: National Institute of Child Health and Human Development.
fMFMU: Maternal-Fetal Medicine Units Network.
gLASSO: least absolute shrinkage and selection operator.
hBORN: Better Outcomes Registry Network.
iPRAMS: Pregnancy Risk Assessment Monitoring System.
jRNS: Royal North Shore.
kCDC: Centers for Disease Control and Prevention.
lNYC: New York City.
mIMaN: Iranian Maternal and Neonatal Network.
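The feature selection column combines a few simple filters: Esty et al removed features with >50% missing values before imputation, Weber et al excluded features with no variation and collapsed highly correlated ones, and Koivu and Sairanen excluded highly correlated features using Pearson correlation; Khatibi et al imputed the mode for categorical and the median for continuous features. A minimal Python sketch chaining these steps is below. The 0.9 correlation threshold, the rule of keeping the first feature of a correlated pair, and all function names are hypothetical assumptions; dropping a correlated feature is also a simplification of "collapsing" it.

```python
import math
import statistics


def is_numeric(values):
    return all(isinstance(v, (int, float)) for v in values)


def impute(values):
    """Fill None with the median (numeric feature) or the mode (categorical)."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed) if is_numeric(observed) else statistics.mode(observed)
    return [fill if v is None else v for v in values]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Constant features are dropped before this is called, so sxx, syy > 0.
    return sxy / math.sqrt(sxx * syy)


def select_features(table, max_missing=0.5, max_abs_corr=0.9):
    """table maps feature name -> list of values, with None marking a missing
    value. Returns the retained (imputed) features in their original order."""
    selected = {}
    for name, values in table.items():
        observed = [v for v in values if v is not None]
        if (len(values) - len(observed)) / len(values) > max_missing:
            continue  # drop: missing share above max_missing, checked before imputing
        if len(set(observed)) < 2:
            continue  # drop: no variation
        filled = impute(values)
        if is_numeric(filled) and any(
            is_numeric(kept) and abs(pearson(filled, kept)) > max_abs_corr
            for kept in selected.values()
        ):
            continue  # drop: highly correlated with a feature already kept
        selected[name] = filled
    return selected


if __name__ == "__main__":
    # Hypothetical toy records for five pregnancies.
    table = {
        "age": [25, 30, 35, 40, 28],
        "age_months": [300, 360, 420, 480, 336],  # redundant rescaling of age
        "bmi": [22, 31, 24, 29, None],
        "parity": ["0", "1", None, "1", "2"],
        "constant": [1, 1, 1, 1, 1],
        "mostly_missing": [None, None, None, 1, 2],
    }
    print(list(select_features(table)))  # ['age', 'bmi', 'parity']
```

In a real pipeline the imputation values and correlation decisions would be computed on training data only and then applied unchanged to the test data.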
Data processing and machine learning modeling.
| Study | Preprocessing: missing data management | Preprocessing: class imbalance | Model | Dominant model | Evaluation metrics | Analysis software and package | Findings |
| Weber et al, 2018 [ | MICEa | —b | Super learning approach using logistic regression, random forest, | No difference between models | Sensitivity, specificity, PVPe, PVNf, and AUCg | RStudio (version 3.3.2), SuperLearner package | AUC=0.67, sensitivity=0.61, specificity=0.64 |
| Rawashdeh et al, 2020 [ | Instances with missing values removed manually | SMOTEh | Locally weighted learning, Gaussian process, K-star classifier, linear regression, | Random forest | Accuracy, sensitivity, specificity, AUC, and G-mean | WEKAi (version 3.9) | Random forest: G-mean=0.96, sensitivity=1.00, specificity=0.94, accuracy=0.95, AUC=0.98 (oversampling ratio of 200%) |
| Gao et al, 2019 [ | — | Control group undersampled | RNNsj, long short-term memory network, logistic regression, SVMk, gradient boosting | Ensemble of RNN models on balanced data | Sensitivity, specificity, PVP, and AUC | — | AUC=0.827, sensitivity=0.965, specificity=0.698, PVP=0.033 |
| Lee and Ahn, 2019 [ | — | — | ANNl, logistic regression, decision tree, naïve Bayes, random forest, SVM | No difference between models | Accuracy | Python (version 3.5.2) | No difference in accuracy between ANN (0.9115), logistic regression (0.9180), and random forest (0.8918) |
| Woolery and Grzymala-Busse, 1994 [ | — | — | LERSm | — | Accuracy | ID3n, LERS CONCLUS | Data set 1: accuracy=88.8% for both low-risk and high-risk pregnancies; data set 2: accuracy=59.2% in high-risk pregnant women; data set 3: accuracy=53.4% |
| Grzymala-Busse and Woolery, 1994 [ | — | — | LERS based on the | — | Accuracy | LERS | Accuracy=68% to 90% |
| Vovsha et al, 2014 [ | — | Oversampling (ADASYN) | SVMs with linear and nonlinear kernels; LR (forward selection, stepwise selection, L1 LASSO regression, and elastic net regression) | — | Sensitivity, specificity, and G-mean | RStudio, glmnet package | SVM: sensitivity=0.404 to 0.594, specificity=0.621 to 0.84, G-mean=0.575 to 0.652; LR: sensitivity=0.502 to 0.591, specificity=0.587 to 0.731, G-mean=0.586 to 0.604 |
| Esty et al, 2018 [ | Imputation with the | Not clear | Hybrid C5.0 decision tree-ANN classifier | — | Sensitivity, specificity, and ROCo | R software, missForest package, FANNp library | Sensitivity: 84.1% to 93.4%, specificity: 70.6% to 76.9%, AUC: 78.5% to 89.4% |
| Frize et al, 2011 [ | Decision tree | — | Hybrid decision tree-ANN | — | Sensitivity, specificity, ROC for Pq and NPr cases | See5, MATLAB Neural Ware tool | Training (P: sensitivity=66%, specificity=83%, AUC=0.81; NP: sensitivity=62.8%, specificity=71.7%, AUC=0.72), test (P: sensitivity=66.3%, specificity=83.9%, AUC=0.80; NP: sensitivity=65%, specificity=71.3%, AUC=0.73), and verification (P: sensitivity=61.4%, specificity=83.3%, AUC=0.79; NP: sensitivity=65.5%, specificity=71.1%, AUC=0.73) |
| Goodwin and Maher, 2000 [ | PVRuleMinerl or FactMiner | — | Neural networks, LR, CARTs, and software programs called PVRuleMiner and FactMiner | No difference between models | ROC | Custom data mining software (Clinical Miner, PVRuleMiner, and FactMiner) | No significant difference between techniques: neural network (AUC=0.68), stepwise LR (AUC=0.66), CART (AUC=0.65), FactMiner (demographic features only; AUC=0.725), FactMiner (demographic plus other indicator features; AUC=0.757) |
| Tran et al, 2016 [ | — | Undersampling of the majority class | SSLRt, RGBu | — | Sensitivity, specificity, NPVv, PVP, F-measure, and AUC | — | SSLR: sensitivity=0.698 to 0.734, specificity=0.643 to 0.732, F-measure=0.70 to 0.73, AUC=0.764 to 0.791, NPV=0.96 to 0.719, PVP=0.679 to 0.731; RGB: sensitivity=0.621 to 0.720, specificity=0.74 to 0.841, F-measure=0.693 to 0.732, NPV=0.675 to 0.717, PVP=0.783 to 0.743, AUC=0.782 to 0.807 |
| Koivu and Sairanen, 2020 [ | — | — | LR, ANN, LGBMw, deep neural network, SELUx network, average ensemble, and weighted average (WAy) ensemble | — | AUC | RStudio (version 3.5.1) and Python (version 3.6.9) | AUC: LR=0.62 to 0.64; deep neural network=0.63 to 0.66; SELU network=0.64 to 0.67; LGBM=0.64 to 0.67; average ensemble=0.63 to 0.67; WA ensemble=0.63 to 0.67 |
| Khatibi et al, 2019 [ | Map phase module | — | Decision trees, SVMs, random forests, and ensemble classifiers | — | Accuracy and AUC | — | Accuracy=81% and AUC=68% |
aMICE: Multiple Imputation by Chained Equations.
bNot reported in the study.
cLR: logistic regression.
dLASSO: least absolute shrinkage and selection operator.
ePVP: predictive value positive.
fPVN: predictive value negative.
gAUC: area under the ROC curve.
hSMOTE: Synthetic Minority Oversampling Technique.
iWEKA: Waikato Environment for Knowledge Analysis.
jRNN: recurrent neural network.
kSVM: support vector machine.
lANN: artificial neural network.
mLERS: learning from examples based on rough sets.
nID3: iterative dichotomiser 3.
oROC: receiver operating characteristic.
pFANN: Fast Artificial Neural Network.
qP: parous.
rNP: nulliparous.
sCART: classification and regression tree.
tSSLR: stabilized sparse logistic regression.
uRGB: Randomized Gradient Boosting.
vNPV: negative predictive value.
wLGBM: Light Gradient Boosting Machine.
xSELU: scaled exponential linear unit.
yWA: weighted average.
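Most of the evaluation metrics in this table derive from the 2×2 confusion matrix of a thresholded classifier, while AUC summarizes ranking quality over all thresholds. The Python sketch below computes them from scratch; the function names are illustrative, and AUC is implemented with the Mann-Whitney (pairwise comparison) formulation rather than by integrating an ROC curve, which gives the same value.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 = preterm birth."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def ratio(num, den):
    return num / den if den else float("nan")


def threshold_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = ratio(tn, tn + fp)  # true negative rate
    pvp = ratio(tp, tp + fp)          # predictive value positive
    pvn = ratio(tn, tn + fn)          # predictive value negative
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "PVP": pvp,
        "PVN": pvn,
        "G-mean": math.sqrt(sensitivity * specificity),
        "F-measure": ratio(2 * pvp * sensitivity, pvp + sensitivity),
    }


def auc(y_true, scores):
    """Probability that a random PTB case scores above a random control,
    counting ties as 1/2 (Mann-Whitney formulation). O(n_pos * n_neg)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1, 0.3, 0.05]
    print(threshold_metrics(y_true, y_pred))
    print(auc(y_true, scores))  # 0.9
```

Sensitivity, specificity, and AUC do not depend on how common PTB is, but PVP does: with only 85 positives among 8082 test cases (Gao et al), a model can reach sensitivity=0.965 while its PVP is 0.033, so reports that omit PVP can look stronger than the model would be in practice.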
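For class imbalance, the studies either undersampled the control (majority) class (Gao et al, Tran et al) or oversampled PTB cases with SMOTE (Rawashdeh et al, at a 200% oversampling ratio) or ADASYN (Vovsha et al). The Python sketch below shows random undersampling to a 1:1 ratio (the target ratio is an assumption) and a simplified SMOTE-style interpolation. It is not WEKA's SMOTE filter, ADASYN is not shown, and the function names are hypothetical.

```python
import math
import random


def random_undersample(X, y, majority=0, seed=0):
    """Keep all minority rows and an equal-sized random subset of majority rows."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label != majority]
    majority_idx = [i for i, label in enumerate(y) if label == majority]
    kept = rng.sample(majority_idx, min(len(minority_idx), len(majority_idx)))
    idx = sorted(minority_idx + kept)
    return [X[i] for i in idx], [y[i] for i in idx]


def smote_like(minority, percent=200, k=5, seed=0):
    """Create percent // 100 synthetic rows per minority row by interpolating
    toward one of its k nearest minority neighbours (Euclidean distance).
    Simplification: percent is floored to a multiple of 100, so values below
    100 produce no rows."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows")
    rng = random.Random(seed)
    synthetic = []
    for i, x in enumerate(minority):
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        for _ in range(percent // 100):
            nn = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Hypothetical two-feature rows; label 1 = preterm birth (minority class).
    X = [[float(i), float(i % 3)] for i in range(12)]
    y = [1, 1, 1, 1] + [0] * 8
    Xu, yu = random_undersample(X, y)
    print(yu.count(1), yu.count(0))  # 4 4
    minority = [x for x, label in zip(X, y) if label == 1]
    print(len(smote_like(minority, percent=200, k=2)))  # 8
```

Resampling belongs on the training portion only. If synthetic cases are generated before the train-test split, points interpolated from test cases leak into training and the reported metrics become optimistic.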
Frequency of potential risk factors in the studies (n=13).
| Potential risk factors | Studies, n (%) |
| Previous PTBa | 10 (77) |
| Hypertensive disorders | 9 (69) |
| Maternal age | 7 (54) |
| Cervical or uterine disorders (cerclage, myoma, or inconsistency) | 7 (54) |
| Ethnicity and race | 6 (46) |
| Diabetes (eg, gestational, mellitus) | 6 (46) |
| Smoking or substance abuse | 5 (38) |
| Multiple pregnancy | 5 (38) |
| Education | 4 (31) |
| Physical characteristics (BMI, weight, and height) | 4 (31) |
| Parity | 4 (31) |
| Marital status | 3 (23) |
| Other chronic diseases (thyroid, asthma, systemic lupus erythematosus, or cardiovascular) | 3 (23) |
| PTB symptoms (bleeding, contractions, premature rupture of membranes, etc) | 3 (23) |
| Insurance | 2 (15) |
| Income | 2 (15) |
| In vitro fertilization | 2 (15) |
| Stress or domestic violence | 2 (15) |
| Infections (gonorrhea, syphilis, chlamydia, or hepatitis C) | 1 (8) |
| Biopsy | 1 (8) |
aPTB: preterm birth.
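Each percentage in this table is the number of studies reporting a factor divided by the 13 included studies. A small Python helper makes the column reproducible; it assumes percentages are rounded half up to whole numbers (the rounding rule is not stated in the source), and the function name is hypothetical. Integer arithmetic avoids floating-point rounding surprises.

```python
def count_with_percent(count, total):
    """Format 'n (%)' with the percentage rounded half up to a whole number:
    floor(100 * count / total + 1/2) computed as (200 * count + total) // (2 * total)."""
    pct = (200 * count + total) // (2 * total)
    return f"{count} ({pct})"


if __name__ == "__main__":
    total_studies = 13
    for n in (10, 9, 7, 6, 5, 4, 3, 2, 1):
        print(count_with_percent(n, total_studies))
```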