Jong Hyun Jhee, SungHee Lee, Yejin Park, Sang Eun Lee, Young Ah Kim, Shin-Wook Kang, Ja-Young Kwon, Jung Tak Park.
Abstract
Preeclampsia is one of the leading causes of maternal and fetal morbidity and mortality. Due to the lack of effective preventive measures, its prediction is essential to its prompt management. This study aimed to develop models using machine learning to predict late-onset preeclampsia using hospital electronic medical record data. The performance of the machine learning-based models was also compared with that of models using conventional statistical methods. A total of 11,006 pregnant women who received antenatal care at Yonsei University Hospital were included. Maternal data were retrieved from electronic medical records during the early second trimester to 34 weeks. The prediction outcome was late-onset preeclampsia occurrence after 34 weeks' gestation. Pattern recognition and cluster analysis were used to select the parameters included in the prediction models. Logistic regression, decision tree model, naïve Bayes classification, support vector machine, random forest algorithm, and stochastic gradient boosting method were used to construct the prediction models. C-statistics were used to assess the performance of each model. The overall preeclampsia development rate was 4.7% (474 patients). Systolic blood pressure, serum blood urea nitrogen and creatinine levels, platelet counts, serum potassium level, white blood cell count, serum calcium level, and urinary protein were the most influential variables included in the prediction models. C-statistics for the decision tree model, naïve Bayes classification, support vector machine, random forest algorithm, stochastic gradient boosting method, and logistic regression models were 0.857, 0.776, 0.573, 0.894, 0.924, and 0.806, respectively. The stochastic gradient boosting model had the best prediction performance with an accuracy and false positive rate of 0.973 and 0.009, respectively.
The combined use of maternal factors and common antenatal laboratory data of the early second trimester through early third trimester could effectively predict late-onset preeclampsia using machine learning algorithms. Future prospective studies are needed to verify the clinical applicability of these algorithms.
Year: 2019 PMID: 31442238 PMCID: PMC6707607 DOI: 10.1371/journal.pone.0221202
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Flow chart of pattern recognition and cluster analysis based variable selection process for late-onset preeclampsia prediction.
Maternal characteristics and laboratory parameters at early second trimester.
| Variable | No preeclampsia | Preeclampsia | P value |
|---|---|---|---|
| Maternal age, years | 38.9 ± 5.0 | 44.1 ± 20.2 | <0.001 |
| Parity number | 1.9 ± 1.1 | 2.0 ± 1.1 | 0.07 |
| Height, cm | 160.9 ± 7.1 | 159.8 ± 7.8 | <0.001 |
| Maternal weight at pregnancy, kg | 57.8 ± 10.0 | 60.1 ± 11.8 | <0.001 |
| SBP, mmHg | 111.73 ± 8.7 | 116.7 ± 12.3 | <0.001 |
| DBP, mmHg | 67.8 ± 6.5 | 71.6 ± 9.2 | <0.001 |
| Maternal history, n (%) | |||
| Smoking, n (%) | 36 (0.3) | 4 (0.9) | 0.05 |
| Alcohol, n (%) | 108 (1.0) | 7 (1.6) | 0.26 |
| Hypertension | 154 (1.4) | 75 (16.8) | <0.001 |
| Diabetes | 425 (4.0) | 18 (4.0) | 0.98 |
| Preeclampsia | 5 (0.1) | 6 (1.3) | <0.001 |
| Laboratory data | |||
| WBC, 10³/µL | 9.01 ± 4.17 | 11.04 ± 3.45 | <0.001 |
| Hemoglobin, g/dL | 11.6 ± 1.4 | 13.5 ± 12.4 | <0.001 |
| Platelet counts, 10⁹/L | 200.1 ± 57.0 | 195.5 ± 63.3 | <0.001 |
| BUN, mg/dL | 5.7 ± 3.2 | 9.9 ± 8.2 | <0.001 |
| Creatinine, mg/dL | 0.4 ± 0.2 | 0.7 ± 0.7 | <0.001 |
| Total bilirubin, mg/dL | 0.3 ± 0.2 | 0.4 ± 0.4 | <0.001 |
| AST, IU/L | 15.8 ± 18.4 | 24.8 ± 41.5 | <0.001 |
| ALT, IU/L | 12.8 ± 16.7 | 20.0 ± 29.7 | <0.001 |
| Potassium, mEq/L | 4.1 ± 0.3 | 4.3 ± 0.2 | 0.36 |
| TCO2, mEq/L | 21.9 ± 2.1 | 20.6 ± 2.6 | 0.17 |
| Calcium, mg/dL | 8.5 ± 0.5 | 8.7 ± 1.1 | 0.43 |
| Magnesium, mg/dL | 1.2 ± 0.1 | 1.6 ± 0.0 | 0.56 |
| UPCR, g/gCr | 0.09 [0.02–0.12] | 0.20 [0.08–0.26] | 0.87 |
Data are presented as mean ± standard deviation or number (%)
SBP, systolic blood pressure; DBP, diastolic blood pressure; UPCR, urine protein to creatinine ratio
Clinical characteristics and laboratory parameters at delivery.
| Variable | No preeclampsia | Preeclampsia | P value |
|---|---|---|---|
| Maternal weight at delivery, kg | 62.8 ± 9.3 | 64.0 ± 10.9 | <0.001 |
| SBP, mmHg | 113.5 ± 12.0 | 145.0 ± 22.6 | <0.001 |
| DBP, mmHg | 68.4 ± 8.9 | 89.8 ± 15.4 | <0.001 |
| Laboratory data | |||
| WBC, 10³/µL | 9.4 ± 2.8 | 11.0 ± 4.7 | <0.001 |
| Hemoglobin, g/dL | 11.8 ± 1.3 | 11.9 ± 1.8 | 0.15 |
| Platelet counts, 10⁹/L | 228.8 ± 63.9 | 207.3 ± 79.3 | <0.001 |
| BUN, mg/dL | 7.8 ± 2.5 | 12.4 ± 5.3 | <0.001 |
| Creatinine, mg/dL | 0.5 ± 0.2 | 0.7 ± 0.3 | <0.001 |
| Total bilirubin, mg/dL | 0.5 ± 0.2 | 0.5 ± 0.8 | 0.51 |
| AST, IU/L | 18.3 ± 0.5 | 54.7 ± 20.7 | <0.001 |
| ALT, IU/L | 12.9 ± 0.4 | 37.8 ± 9.9 | <0.001 |
| Potassium, mEq/L | 4.1 ± 0.3 | 4.3 ± 0.3 | <0.001 |
| TCO2, mEq/L | 21.8 ± 2.2 | 20.9 ± 2.8 | <0.001 |
| Calcium, mg/dL | 8.7 ± 0.4 | 8.4 ± 0.7 | <0.001 |
| Magnesium, mg/dL | 1.5 ± 0.5 | 1.6 ± 0.7 | 0.85 |
| UPCR, g/gCr | 0.15 [0.08–0.21] | 0.36 [0.20–0.40] | <0.001 |
Data are presented as mean ± standard deviation or number (%)
SBP, systolic blood pressure; DBP, diastolic blood pressure; UPCR, urine protein to creatinine ratio
Fig 2. Normalized importance of the selected variables for late-onset preeclampsia prediction models.
The plot shows the relative importance of the variables in the random forest model. IncNodePurity reflects the reduction in entropy (that is, in uncertainty) achieved by splitting on the attribute. SBP, systolic blood pressure; WBC, white blood cell; UPCR, urine protein to creatinine ratio; UACT, urine albumin to creatinine ratio.
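The idea of an entropy reduction from splitting on one attribute can be illustrated with a minimal Python sketch. The data, the threshold, and the function names `entropy` and `entropy_reduction` are made up for illustration; this is not the authors' code, and it is not necessarily the exact quantity their software reports as IncNodePurity.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def entropy_reduction(values, labels, threshold):
    """Drop in entropy when samples are split into value <= threshold and value > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    # Entropy after the split is the size-weighted mean of the two child nodes.
    after = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - after

# Hypothetical toy data (NOT study data): SBP in mmHg, outcome 1 = preeclampsia.
sbp = [105, 108, 110, 112, 118, 125, 130, 135]
outcome = [0, 0, 0, 0, 0, 1, 1, 1]

gain = entropy_reduction(sbp, outcome, threshold=120)
no_gain = entropy_reduction(sbp, outcome, threshold=200)
print(f"split at 120 mmHg: {gain:.3f} bits; split at 200 mmHg: {no_gain:.3f} bits")
```

In this toy example the 120 mmHg threshold separates the two classes completely, so the reduction equals the full entropy of the unsplit sample, whereas a threshold above every observation removes none. A forest-level importance score would aggregate such reductions over all splits that use a given variable.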
Fig 3. Receiver operating characteristic curves of late-onset preeclampsia prediction models.
C-statistics for each prediction model are presented in the graph. DT, decision tree; NBC, naïve Bayes classification; SVM, support vector machine; RF, random forest; SGB, stochastic gradient boosting; LR, logistic regression.
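For readers unfamiliar with the C-statistic, it equals the area under the ROC curve and can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case. The sketch below computes it by direct pair counting; the function name `c_statistic` and the risk scores are hypothetical and are not taken from the study.

```python
def c_statistic(scores, labels):
    """Share of (case, non-case) pairs in which the case has the higher score.

    Tied pairs count as half. Labels are 1 for cases and 0 for non-cases.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Hypothetical predicted risks and outcomes (NOT study data).
risk = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.70, 0.90]
outcome = [0, 0, 0, 1, 0, 0, 1, 1]

auc = c_statistic(risk, outcome)
print(f"C-statistic: {auc:.3f}")
```

Pair counting is quadratic in the sample size, which is acceptable for a small illustration; for a cohort of several thousand women a rank-based formula or a library routine would be the more practical choice.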
Comparison of prediction performances for late-onset preeclampsia development.
| Models | Accuracy | Sensitivity | Specificity | Detection Rate |
|---|---|---|---|---|
| LR | 0.862 | 0.703 | 0.870 | 0.209 |
| DT | 0.874 | 0.648 | 0.885 | 0.215 |
| NBC | 0.899 | 0.500 | 0.918 | 0.229 |
| SVM | 0.892 | 0.137 | 0.928 | 0.085 |
| RF | 0.923 | 0.679 | 0.935 | 0.336 |
| SGB | 0.973 | 0.603 | 0.991 | 0.771 |
LR, logistic regression; DT, decision tree; NBC, naïve Bayes classification; SVM, support vector machine; RF, random forest; SGB, stochastic gradient boosting
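The accuracy, sensitivity, and specificity columns all derive from a 2×2 confusion matrix, and the false positive rate quoted in the abstract is one minus specificity (1 − 0.991 = 0.009 for SGB). A minimal sketch follows; the counts are invented for illustration and do not reproduce the study's confusion matrix, and `classification_metrics` is a hypothetical helper name.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard metrics from confusion-matrix counts.

    tp/fn: cases predicted positive/negative; fp/tn: non-cases predicted positive/negative.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),          # true positive rate
        "specificity": tn / (tn + fp),          # true negative rate
        "false_positive_rate": fp / (fp + tn),  # equals 1 - specificity
    }

# Hypothetical counts for 1,000 women with 5% prevalence (NOT study data).
m = classification_metrics(tp=30, fn=20, fp=9, tn=941)

# Accuracy of a trivial rule that predicts "no preeclampsia" for everyone.
baseline_accuracy = (941 + 9) / 1000

for name, value in m.items():
    print(f"{name}: {value:.3f}")
print(f"baseline accuracy: {baseline_accuracy:.3f}")
```

With a rare outcome, accuracy is dominated by specificity: in this toy example the trivial rule already reaches 0.950, so sensitivity and the C-statistic are worth reading alongside accuracy when comparing the models in the table.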