Seung Mi Lee, Yonghyun Nam, Eun Saem Choi, Young Mi Jung, Vivek Sriram, Jacob S Leiby, Ja Nam Koo, Ig Hwan Oh, Byoung Jae Kim, Sun Min Kim, Sang Youn Kim, Gyoung Min Kim, Sae Kyung Joo, Sue Shin, Errol R Norwitz, Chan-Wook Park, Jong Kwan Jun, Won Kim, Dokyoon Kim, Joong Shin Park.
Abstract
Clinical guidelines recommend several risk factors to identify women in early pregnancy at high risk of developing pregnancy-associated hypertension. However, these variables result in low predictive accuracy. Here, we developed a prediction model for pregnancy-associated hypertension using graph-based semi-supervised learning (SSL). This is a secondary analysis of a prospective study of healthy pregnant women. To develop the prediction model, we compared predictive performance across five machine learning methods (semi-supervised learning with both labeled and unlabeled data, semi-supervised learning with labeled data only, logistic regression, support vector machine, and random forest) using three different variable sets: [a] variables from clinical guidelines, [b] important variables selected by feature selection, and [c] all routine variables. Additionally, the proposed prediction model was compared with placental growth factor, a predictive biomarker for pregnancy-associated hypertension. The study population consisted of 1404 women, including 1347 women with complete follow-up (labeled data) and 57 women with incomplete follow-up (unlabeled data). Among the 1347 women with complete follow-up, 2.4% (33/1347) developed pregnancy-associated hypertension. Graph-based SSL using the top 11 variables achieved the best average prediction performance (mean area under the curve (AUC) of 0.89 in the training set and 0.81 in the test set), with higher sensitivity (72.7% vs 45.5% in the test set) and similar specificity (80.0% vs 80.5% in the test set) compared to risk factors from clinical guidelines. In addition, our proposed model with graph-based SSL outperformed placental growth factor in the total study population (AUC, 0.80 vs. 0.71, p < 0.001). In conclusion, we could accurately predict the development of pregnancy-associated hypertension in early pregnancy using routine clinical variables with the help of graph-based SSL.
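The idea of learning from both labeled and unlabeled women can be illustrated with a generic label-propagation sketch. This is not the authors' implementation: the graph construction, the Gaussian similarity, the `sigma` parameter, and the toy feature values below are all assumptions chosen for illustration. Each woman is a node, edge weights reflect similarity of clinical features, labeled nodes keep their known outcome, and unlabeled nodes repeatedly take the weighted average score of their neighbors.

```python
import math


def rbf_similarity(a, b, sigma=1.0):
    """Gaussian similarity between two feature vectors, used as an edge weight."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def propagate_labels(features, labels, sigma=1.0, n_iter=200):
    """Clamped label propagation on a fully connected similarity graph.

    labels[i] is 1 (case), 0 (non-case), or None (unlabeled).
    Returns one risk score in [0, 1] per node.
    """
    n = len(features)
    # Symmetric weight matrix with no self-loops.
    weights = [
        [0.0 if i == j else rbf_similarity(features[i], features[j], sigma)
         for j in range(n)]
        for i in range(n)
    ]
    # Unlabeled nodes start at a neutral score of 0.5.
    scores = [0.5 if y is None else float(y) for y in labels]
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                # Labeled nodes are clamped to their known outcome.
                updated.append(float(labels[i]))
            else:
                # Unlabeled nodes take the weighted mean of all other nodes.
                total = sum(weights[i])
                updated.append(
                    sum(weights[i][j] * scores[j] for j in range(n)) / total
                )
        scores = updated
    return scores


# Hypothetical standardized (systolic BP, diastolic BP) values, not study data.
features = [(-1.0, -1.0), (-0.8, -1.1), (-0.9, -0.7),
            (1.5, 1.4), (1.7, 1.2), (1.4, 1.6)]
labels = [0, 0, None, 1, None, None]

scores = propagate_labels(features, labels)
for label, score in zip(labels, scores):
    print(label, round(score, 3))
```

In this toy graph the unlabeled node close to the two non-cases ends up with a low score, while the two unlabeled nodes close to the case end up with high scores, which is how unlabeled records can still shape the decision boundary.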
Year: 2022 PMID: 36138035 PMCID: PMC9499925 DOI: 10.1038/s41598-022-15391-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Baseline clinical features and pregnancy outcomes of the study population.
| Characteristics | Pregnancy-associated HTN (−) (n = 1314) | Pregnancy-associated HTN (+) (n = 33) | p value |
|---|---|---|---|
| Maternal age (years) | 32.3 ± 4.0 | 33.0 ± 4.7 | 0.518 |
| Nulliparity | 671 (51.1%) | 21 (63.6%) | 0.211 |
| a. Previous history of preeclampsia | 8 (0.6%) | 4 (12.1%) | < 0.001 |
| b. Chronic hypertension | 7 (0.5%) | 4 (12.1%) | < 0.001 |
| c. Pregestational diabetes | 15 (1.1%) | 2 (6.1%) | 0.063 |
| d. Renal disease | 2 (0.2%) | 0 (0.0%) | 1.000 |
| e. Autoimmune disease | 1 (0.1%) | 0 (0.0%) | 1.000 |
| a. First pregnancy | 671 (51.1%) | 21 (63.6%) | 0.211 |
| b. Old age (≥ 35 year) | 385 (29.3%) | 12 (36.4%) | 0.493 |
| c. Obesity (BMI > 30 kg/m2) | 70 (5.3%) | 6 (18.2%) | 0.003 |
| d. African race | 0 (0%) | 0 (0%) | (−) |
| Gestational age at delivery (weeks) | 39.0 ± 1.3 | 36.3 ± 3.0 | < 0.001 |
| Gestational diabetes | 72 (5.7%) | 6 (20.0%) | 0.007 |
| Birthweight at delivery (kg) | 3.2 ± 0.4 | 2.6 ± 0.8 | < 0.001 |
| Infant sex (male) | 675 (51.4%) | 17 (51.5%) | 1.000 |
| Infant admission to NICU | 53 (4.0%) | 9 (27.3%) | < 0.001 |
Data are presented as number (%) or mean ± standard deviation.
BMI body mass index, HTN hypertension, NICU neonatal intensive care unit.
Ranks of the top 11 important variables selected by four machine learning methods: support vector machine with recursive feature elimination (SVM-RFE), logistic regression with recursive feature elimination (LR-RFE), random forest using the Gini index (RF-gini), and random forest using information entropy (RF-entropy).
| Clinical variables | Rank by SVM (RFE) | Rank by LR (RFE) | Rank by RF (gini) | Rank by RF (entropy) | Integrated ranking^a |
|---|---|---|---|---|---|
| Diastolic blood pressure (BP) in early pregnancy | 1 | 1 | 1 | 1 | 1.00 |
| Systolic BP in early pregnancy | 1 | 2 | 2 | 2 | 1.68 |
| Diastolic BP in late first trimester | 3 | 5 | 4 | 4 | 3.94 |
| Hemoglobin level measured in the first trimester | 4 | 14 | 5 | 5 | 6.17 |
| Systolic BP in late first trimester | 16 | 17 | 3 | 2 | 6.36 |
| BMI before pregnancy | 7 | 3 | 10 | 10 | 6.77 |
| Maternal age | 5 | 4 | 14 | 14 | 7.91 |
| BMI in late first trimester | 10 | 9 | 9 | 6 | 8.35 |
| History of preeclampsia in previous pregnancy | 12 | 6 | 6 | 12 | 8.49 |
| Weight in late first trimester | 8 | 7 | 12 | 11 | 9.27 |
| Weight before pregnancy | 6 | 10 | 11 | 13 | 9.62 |
Early pregnancy, measured at 7.7 ± 1.2 weeks; late first trimester, measured at 12.4 ± 0.5 weeks.
BMI body mass index, BP blood pressure.
^a To aggregate the four rankings, we apply the geometric mean, defined as $\left(\prod_{i=1}^{n} r_i\right)^{1/n}$, where $r_i$ is the variable's rank in the $i$th selection method and $n = 4$ is the number of selection methods.
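As a quick check of the integrated ranking, the geometric mean of the four per-method ranks can be computed directly. The sketch below uses rank values copied from the table above; the function name `integrated_rank` is an illustrative choice, not something defined by the authors.

```python
import math


def integrated_rank(ranks):
    """Geometric mean of a variable's ranks across selection methods."""
    return math.prod(ranks) ** (1 / len(ranks))


# Ranks in the order SVM (RFE), LR (RFE), RF (gini), RF (entropy).
table_ranks = {
    "Systolic BP in early pregnancy": [1, 2, 2, 2],
    "Diastolic BP in late first trimester": [3, 5, 4, 4],
    "History of preeclampsia in previous pregnancy": [12, 6, 6, 12],
}

for variable, ranks in table_ranks.items():
    print(variable, round(integrated_rank(ranks), 2))
```

Because the geometric mean multiplies the ranks, a variable ranked near the top by several methods keeps a low integrated rank even when one method places it much lower.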
Analysis of selected top 11 important variables in the study population.
(a) In training set
| Selected variables | Pregnancy-associated HTN (−) (n = 949) | Pregnancy-associated HTN (+) (n = 22) | p value |
|---|---|---|---|
| Maternal age | 32.2 ± 3.9 | 33.3 ± 4.9 | 0.383 |
| History of preeclampsia | 7 (0.7%) | 2 (9.1%) | 0.016 |
| Weight before pregnancy | 57.9 ± 10.1 | 69.0 ± 13.9 | < 0.001 |
| BMI before pregnancy | 22.2 ± 3.6 | 26.2 ± 5.2 | < 0.001 |
| Systolic BP in early pregnancy | 113.3 ± 11.4 | 132.7 ± 17.2 | < 0.001 |
| Diastolic BP in early pregnancy | 67.3 ± 8.4 | 80.5 ± 10.8 | < 0.001 |
| Systolic BP in late first trimester | 112.6 ± 11.4 | 126.3 ± 14.4 | < 0.001 |
| Diastolic BP in late first trimester | 67.6 ± 8.6 | 77.9 ± 9.8 | < 0.001 |
| Weight in late first trimester | 58.8 ± 10.1 | 69.7 ± 14.0 | < 0.001 |
| BMI in late first trimester | 22.5 ± 3.6 | 26.4 ± 5.2 | < 0.001 |
| Hemoglobin level in the first trimester | 12.6 ± 1.0 | 13.6 ± 1.0 | < 0.001 |
Early pregnancy, measured at 7.7 ± 1.2 weeks; late first trimester, measured at 12.4 ± 0.5 weeks.
BMI body mass index, BP blood pressure.
Performance comparison in test set.
| Models | AUROC | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|
| | 0.762 | 0.636 | 0.663 | 0.054 | 0.984 |
| | 0.701 | 0.719 | 0.795 | 0.096 | 0.989 |
| | 0.725 | 0.725 | 0.666 | 0.062 | 0.987 |
| Risk factors | – | 0.455 | 0.805 | 0.066 | 0.980 |
Risk factors: conventional risk factors recommended by the American College of Obstetricians and Gynecologists.
AUROC area under the ROC curve, PPV positive predictive value, NPV negative predictive value; [a] models with variables from clinical guidelines, [b] models with selected important variables, and [c] models with all routine variables.
The performances of the best model are in bold.
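The very low PPV next to the very high NPV follows from the low prevalence of pregnancy-associated hypertension. The sketch below recomputes both values for the "Risk factors" row from its sensitivity and specificity. It assumes the test set consists of the labeled women not in the training set (33 − 22 = 11 cases and 1314 − 949 = 365 non-cases); these counts are derived by subtraction from the table headers and are not stated directly in this record.

```python
def predictive_values(sensitivity, specificity, n_pos, n_neg):
    """Return (PPV, NPV) given sensitivity, specificity, and class counts."""
    true_pos = sensitivity * n_pos
    false_neg = n_pos - true_pos
    true_neg = specificity * n_neg
    false_pos = n_neg - true_neg
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed test-set counts, derived as total minus training set.
N_POS_TEST = 33 - 22      # women who developed pregnancy-associated HTN
N_NEG_TEST = 1314 - 949   # women who did not

# Sensitivity and specificity of the "Risk factors" row.
ppv, npv = predictive_values(0.455, 0.805, N_POS_TEST, N_NEG_TEST)
print(round(ppv, 3), round(npv, 3))
```

With only 11 cases among 376 women, even a specificity of about 0.8 produces roughly 70 false positives, so most positive predictions are false alarms while negative predictions are almost always correct.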
Figure 1. Receiver operating characteristic curve of the proposed prediction model with graph-based semi-supervised learning in the test set population (enrolled 2018–2019). Risk factors: conventional risk factors recommended by the American College of Obstetricians and Gynecologists.
Figure 2. Receiver operating characteristic curve of the proposed prediction model vs. placental growth factor (PlGF). Risk factors: conventional risk factors recommended by the American College of Obstetricians and Gynecologists. AUROC area under the ROC curve, NPV negative predictive value, PlGF placental growth factor, PPV positive predictive value, SSL semi-supervised learning.
Figure 3. The patient-derived network with 1404 pregnant women. Training set (enrolled in 2014–2017); test set (enrolled in 2018–2019).
Figure 4. Overall framework.