Guy Amit, Irena Girshovitz, Karni Marcus, Yiye Zhang, Jyotishman Pathak, Vered Bar, Pinchas Akiva.
Abstract
BACKGROUND: Postpartum depression (PPD) is a widespread disorder that adversely affects the well-being of mothers and their newborns. We aim to use machine learning to predict the risk of PPD from primary care electronic health records (EHR) data, and to evaluate the potential value of EHR-based prediction in improving the accuracy of PPD screening and in the early identification of women at risk.
Keywords: Electronic health records; Machine learning; Postpartum depression
Year: 2021 PMID: 34535116 PMCID: PMC8447665 DOI: 10.1186/s12884-021-04087-8
Source DB: PubMed Journal: BMC Pregnancy Childbirth ISSN: 1471-2393 Impact factor: 3.007
Fig. 1 Cohort generation flow, with partitioning into training and testing sets using either geographical or temporal criteria (a), and the distribution of the cohort by year and by country (b)
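The two partitioning schemes named in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record fields (`country`, `index_date`) and the toy rows are hypothetical. The country rule (train on England, test on Scotland, Wales and Northern Ireland) and the cut-off (training through April 2010, testing from May 2010) are read off the prediction results table further down; which date assigns a record to a period is not stated here, so `index_date` is an assumption.

```python
from datetime import date

# Hypothetical toy records; the field names are assumptions for illustration.
records = [
    {"id": 1, "country": "England", "index_date": date(2005, 3, 1)},
    {"id": 2, "country": "Scotland", "index_date": date(2012, 7, 15)},
    {"id": 3, "country": "Wales", "index_date": date(2009, 11, 30)},
    {"id": 4, "country": "England", "index_date": date(2015, 1, 20)},
]

# First day of the test period (5/10 in the results table).
TEMPORAL_CUTOFF = date(2010, 5, 1)


def geographical_split(rows):
    """Train on England, test on every other country."""
    train = [r for r in rows if r["country"] == "England"]
    test = [r for r in rows if r["country"] != "England"]
    return train, test


def temporal_split(rows, cutoff=TEMPORAL_CUTOFF):
    """Train on records dated before the cut-off, test on the rest."""
    train = [r for r in rows if r["index_date"] < cutoff]
    test = [r for r in rows if r["index_date"] >= cutoff]
    return train, test


geo_train, geo_test = geographical_split(records)
tmp_train, tmp_test = temporal_split(records)
```

Both splits keep the test population disjoint from the training population along one axis (place or time), which is what makes them a stricter check than a random split.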
Prevalence of PPD outcome in the study cohort, by diagnosis of depression (Dx) and by treatment for depression (Tx)
| | Recorded Depression Tx | No recorded Depression Tx | Total (% patients) |
|---|---|---|---|
| Recorded Depression Dx | | | 22,153 PPD by Dx (8.3% of cohort) |
| No recorded Depression Dx | | | |
| Total (% patients) | 29,839 PPD by Tx (11.2% of cohort) | | 35,708 PPD (13.4% of cohort) |
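The percentages in this table can be checked against the cohort size of 266,544 given in the cohort characteristics table. The sketch below also derives the overlap between the two definitions by inclusion-exclusion; that step assumes the overall PPD outcome is the union of "PPD by Dx" and "PPD by Tx", which the table suggests but does not state, so the derived counts are illustrative rather than reported values.

```python
COHORT_N = 266_544   # total cohort size, from the cohort characteristics table
ppd_by_dx = 22_153   # PPD defined by a recorded depression diagnosis
ppd_by_tx = 29_839   # PPD defined by a recorded depression treatment
ppd_any = 35_708     # overall PPD outcome


def pct(n, total=COHORT_N):
    """Percentage of the cohort, rounded to one decimal place."""
    return round(100 * n / total, 1)


# Inclusion-exclusion, ASSUMING ppd_any = (PPD by Dx) OR (PPD by Tx).
both = ppd_by_dx + ppd_by_tx - ppd_any
dx_only = ppd_by_dx - both
tx_only = ppd_by_tx - both

print(pct(ppd_by_dx), pct(ppd_by_tx), pct(ppd_any))  # 8.3 11.2 13.4
print(both, dx_only, tx_only)
```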
Cohort’s characteristics
| Characteristic | All (%) | Temporal validation: train set | Temporal validation: test set | Geographical validation: train set | Geographical validation: test set (S,W,NI) | Holdout set |
|---|---|---|---|---|---|---|
| N | 266,544 | 178,017 (68) | 82,568 (32) | 177,833 (68) | 82,752 (32) | 5959 |
| Age | 30.0 ± 5.8 | 30.1 ± 5.8 | 29.8 ± 5.8 | 30.3 ± 5.7 | 29.5 ± 5.8 | 30.2 ± 5.7 |
| White | 99,971 (37.5) | 57,295 (32.1) | 40,600 (49.2) | 71,666 (40.3) | 26,229 (31.7) | 2076 (34.8) |
| Asian | 7367 (2.8) | 4095 (2.3) | 3214 (3.9) | 6667 (3.7) | 642 (0.8) | 58 (1.0) |
| Black | 3167 (1.2) | 1708 (1.0) | 1441 (1.7) | 2968 (1.7) | 181 (0.2) | 18 (0.3) |
| Other | 2412 (0.9) | 1110 (0.6) | 1276 (1.5) | 1837 (1.0) | 549 (0.7) | 26 (0.4) |
| Unknown ethnicity | 152,573 (57.2) | 113,334 (63.7) | 35,467 (43.0) | 93,737 (52.7) | 55,064 (66.5) | 3772 (63.3) |
| Single | 34,145 (12.8) | 20,612 (11.6) | 12,567 (15.2) | 16,118 (9.1) | 17,061 (20.6) | 966 (16.2) |
| Married | 62,929 (23.6) | 43,526 (24.5) | 17,460 (21.1) | 40,765 (22.9) | 20,221 (24.4) | 1943 (32.6) |
| Unknown marital status | 169,470 (63.6) | 113,879 (64.0) | 52,541 (63.6) | 120,950 (68.0) | 45,470 (54.9) | 3050 (51.2) |
| Country | ||||||
| England | 182,506 (68.5) | 129,113 (72.5) | 48,720 (59.0) | 177,833 (100) | 0 | 4673 (78.4) |
| Scotland | 42,113 (15.8) | 23,982 (13.5) | 16,984 (20.6) | 0 | 40,966 (49.5) | 1147 (19.2) |
| Wales | 26,565 (10.0) | 15,481 (8.7) | 10,963 (13.3) | 0 | 26,444 (32.0) | 121 (2.0) |
| N. Ireland | 15,360 (5.8) | 9441 (5.3) | 5901 (7.1) | 0 | 15,342 (18.5) | 18 (0.3) |
| Deprivation index quantile | 3.03 ± 1.3 | 2.96 ± 1.3 | 3.22 ± 1.2 | 2.95 ± 1.3 | 3.24 ± 1.2 | 2.7 ± 1.3 |
| Pre-pregnancy BMI | 25.0 ± 5.4 | 24.9 ± 4.2 | 25.2 ± 4.9 | 24.9 ± 4.2 | 25.2 ± 4.6 | 25.1 ± 4.5 |
| Cesarean section | 51,151 (19.2) | 30,724 (17.2) | 19,195 (23.2) | 31,531 (17.7) | 18,388 (22.2) | 1232 (20.7) |
| Smoking | 64,778 (24.3) | 44,807 (25.2) | 18,482 (22.4) | 41,792 (23.5) | 21,497 (26.0) | 1489 (25.0) |
| History of depression | 17,384 (6.5) | 12,495 (7.0) | 4379 (5.3) | 12,052 (6.8) | 4822 (5.8) | 510 (8.6) |
Predictive variables. For continuous variables, the numbers indicate average and standard deviation, with P-value of an independent t-test. For binary variables, the number of occurrences and their percentage (out of the subjects with a ‘True’ value) are given, along with the unadjusted odds ratio (OR) of PPD with 95% confidence intervals (CI)
| Variable | PPD | Non-PPD | OR (95% CI) / P-value |
|---|---|---|---|
| Age (yrs) | 28.9 ± 6.1 | 30.2 ± 5.7 | |
| Age ≤ 25 yrs | 10,634 (18.7) | 46,330 (81.3) | 1.69 (1.65,1.73) |
| Pre-pregnancy BMI | 25.7 ± 5.9 | 24.9 ± 5.3 | |
| BMI ≥ 30 | 5016 (17.3) | 24,034 (82.7) | 1.44 (1.39,1.48) |
| Deprivation index | 3.28 ± 1.28 | 3.00 ± 1.27 | |
| Deprivation index quantile ≥ 4 | 15,257 (16.7) | 76,360 (83.3) | 1.56 (1.52,1.60) |
| Resides in England (vs. non-England) | 22,714 (12.4) | 159,792 (87.6) | 0.78 (0.76,0.80) |
| White ethnicity (vs. known non-white) | 13,609 (13.6) | 86,362 (86.4) | 2.60 (2.42,2.80) |
| Married (vs. known single) | 7171 (11.4) | 55,758 (88.6) | 0.69 (0.67,0.72) |
| Smoking (currently) | 12,847 (19.8) | 51,931 (80.2) | 1.94 (1.89,1.98) |
| Alcohol abuse (10y) | 634 (31.6) | 1374 (68.4) | 3.02 (2.75,3.32) |
| Drug abuse (10y) | 714 (38.7) | 1131 (61.3) | 4.14 (3.77,4.55) |
| Anxiety | 950 (38.8) | 1497 (61.2) | 4.19 (3.86,4.54) |
| Anxiety symptoms | 831 (28.7) | 2064 (71.3) | 2.64 (2.43,2.86) |
| Depression | 1579 (54.1) | 1339 (45.9) | 7.93 (7.37,8.54) |
| Depression symptoms | 1281 (45.6) | 1526 (54.4) | 5.59 (5.19,6.03) |
| Antidepressants | 3508 (51.4) | 3322 (48.6) | 7.46 (7.11,7.83) |
| Antihistamines | 4257 (19.0) | 18,130 (81.0) | 1.59 (1.53,1.64) |
| Antibacterials | 1109 (20.0) | 4439 (80.0) | 1.63 (1.53,1.75) |
| Beta blockers | 557 (20.4) | 2178 (79.6) | 1.66 (1.51,1.83) |
| Pregnancy complications | 6033 (17.1) | 29,351 (82.9) | 1.40 (1.35,1.44) |
| Vomiting | 2354 (19.2) | 9936 (80.8) | 1.57 (1.50,1.64) |
| Anxiety | 3008 (32.5) | 6239 (67.5) | 3.31 (3.17,3.46) |
| Anxiety symptoms | 1463 (27.6) | 3831 (72.4) | 2.53 (2.38,2.69) |
| Depression | 5702 (39.4) | 8764 (60.6) | 4.82 (4.65,4.99) |
| Depression symptoms | 3902 (35.7) | 7041 (64.3) | 3.90 (3.74,4.06) |
| Antidepressants | 12,024 (35.6) | 21,797 (64.4) | 4.87 (4.74,5.00) |
| Antihistamines | 5657 (18.1) | 25,604 (81.9) | 1.51 (1.46,1.56) |
| Antibacterials | 1163 (18.2) | 5237 (81.8) | 1.45 (1.36,1.55) |
| Beta blockers | 1960 (24.9) | 5920 (75.1) | 2.21 (2.09,2.33) |
| Premenstrual syndrome (10y) | 1311 (29.3) | 3161 (70.7) | 2.74 (2.57,2.93) |
| Cesarean section | 7051 (13.8) | 44,100 (86.2) | 1.04 (1.01,1.07) |
| Gestational week | 39.66 ± 2.25 | 39.79 ± 2.12 | |
| Gestational week ≤ 37 | 2141 (14.6) | 12,569 (85.4) | 1.13 (1.08,1.19) |
| APGAR 1 min | 8.32 ± 1.6 | 8.44 ± 1.6 | |
| APGAR 5 min | 9.29 ± 1.0 | 9.36 ± 0.9 | |
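The unadjusted odds ratios in this table can be reproduced from the row counts together with the cohort totals (35,708 PPD out of 266,544). The sketch below does this for the "Age ≤ 25 yrs" row. It assumes the comparison group is everyone else in the cohort and uses the standard Wald interval on the log odds ratio; the source does not say which interval method was used, so the function name and the choice of method are illustrative.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio with a Wald 95% CI from a 2x2 table.

    a, b: PPD / non-PPD counts among subjects WITH the factor
    c, d: PPD / non-PPD counts among subjects WITHOUT the factor
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


COHORT_N = 266_544
PPD_TOTAL = 35_708

# "Age <= 25 yrs" row: 10,634 PPD and 46,330 non-PPD.
a, b = 10_634, 46_330
# ASSUMPTION: the reference group is the rest of the cohort.
c = PPD_TOTAL - a
d = (COHORT_N - PPD_TOTAL) - b

odds_ratio, lower, upper = odds_ratio_wald(a, b, c, d)
print(f"{odds_ratio:.2f} ({lower:.2f},{upper:.2f})")  # prints 1.69 (1.65,1.73)
```

The complement trick only applies to rows compared against the rest of the cohort. Rows with an explicit reference group, such as "White ethnicity (vs. known non-white)" or "Married (vs. known single)", need the counts of that reference group instead.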
Fig. 2 PPD odds ratios of single predictor variables, including socio-demographic, diagnoses, drug prescriptions and labor-related variables. Horizontal error bars indicate 95% confidence intervals
Prediction results using different prediction models and validation strategies. EHR: risk score based on Electronic Health Records; EPDS: score based on Edinburgh questionnaire; AUC: area under the ROC curve; CI: 95% confidence intervals
| Prediction model | Configuration | Test data | N (prevalence) | AUC (CI) | Sensitivity @ 0.80 specificity (CI) |
|---|---|---|---|---|---|
| EHR | Geographical (train: England) | Test set (S,W,NI) | 82,752 (0.15) | 0.715 (0.709,0.719) | 0.509 (0.499,0.518) |
| | | Holdout set | 5959 (0.20) | 0.729 (0.711,0.746) | 0.530 (0.495,0.562) |
| | | England | 4673 (0.18) | 0.729 (0.709,0.748) | 0.534 (0.498,0.571) |
| | | Non-England | 1286 (0.26) | 0.712 (0.677,0.744) | 0.489 (0.416,0.565) |
| EPDS score | | Holdout set | 5959 (0.20) | 0.805 (0.787,0.821) | 0.723 (0.695,0.750) |
| | | England | 4673 (0.18) | 0.825 (0.807,0.842) | 0.732 (0.671,0.773) |
| | | Non-England | 1286 (0.26) | 0.771 (0.736,0.805) | 0.688 (0.637,0.738) |
| EHR + EPDS | | Holdout set | 5959 (0.20) | 0.843 (0.828,0.856) | 0.764 (0.736,0.791) |
| | | England | 4673 (0.18) | 0.860 (0.846,0.876) | 0.772 (0.740,0.804) |
| | | Non-England | 1286 (0.26) | 0.815 (0.783,0.844) | 0.727 (0.677,0.776) |
| EHR | Temporal (train: 1/00–4/10) | Test set (05/10–12/17) | 82,568 (0.12) | 0.744 (0.739,0.749) | 0.558 (0.548,0.569) |
| | | Holdout set | 5959 (0.20) | 0.731 (0.714,0.748) | 0.521 (0.489,0.553) |
| | | 1/00–4/10 | 4779 (0.18) | 0.738 (0.719,0.757) | 0.534 (0.494,0.572) |
| | | 5/10–12/17 | 1296 (0.28) | 0.689 (0.655,0.723) | 0.475 (0.407,0.538) |
| EPDS score | | Holdout set | 5959 (0.20) | 0.805 (0.787,0.821) | 0.723 (0.695,0.750) |
| | | 1/00–4/10 | 4779 (0.18) | 0.775 (0.754,0.795) | 0.677 (0.646,0.71) |
| | | 5/10–12/17 | 1296 (0.28) | 0.872 (0.847,0.895) | 0.821 (0.773,0.866) |
| EHR + EPDS | | Holdout set | 5959 (0.20) | 0.844 (0.83,0.857) | 0.764 (0.735,0.791) |
| | | 1/00–4/10 | 4779 (0.18) | 0.823 (0.806,0.841) | 0.733 (0.697,0.768) |
| | | 5/10–12/17 | 1296 (0.28) | 0.886 (0.866,0.907) | 0.836 (0.797,0.876) |
| EHR | Pooled 3-fold cross-validation | Random test set | 86,862 (0.13) | 0.732 (0.729,0.735) | 0.535 (0.53,0.541) |
| EHR-Early | Geographical | Test set | 82,752 (0.15) | 0.701 (0.696,0.706) | 0.488 (0.478,0.498) |
| | | Holdout set | 5959 (0.20) | 0.708 (0.69,0.725) | 0.510 (0.475,0.544) |
| EHR-Early | Temporal | Test set | 82,568 (0.12) | 0.732 (0.727,0.737) | 0.540 (0.529,0.550) |
| | | Holdout set | 5959 (0.20) | 0.709 (0.692,0.727) | 0.504 (0.471,0.537) |
| EHR-Early | Pooled 3-fold cross-validation | Random test set | 86,862 (0.13) | 0.719 (0.716,0.722) | 0.516 (0.511,0.522) |
| EHR-New onset | Pooled 3-fold cross-validation | Random test set | 67,105 (0.10) | 0.666 (0.662,0.67) | 0.413 (0.406,0.42) |
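The two metrics reported in this table, AUC and sensitivity at 0.80 specificity, can be computed from risk scores and outcome labels as in the sketch below. It is a plain-Python illustration on hypothetical toy data, not the authors' evaluation code. The threshold rule (the smallest cut-off that classifies at least 80% of non-PPD subjects as negative) is one reasonable convention and is an assumption here, and the confidence intervals in the table are not reproduced.

```python
import math


def split_by_label(scores, labels):
    """Separate scores of positives (label 1) and negatives (label 0)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return pos, neg


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos, neg = split_by_label(scores, labels)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_at_specificity(scores, labels, specificity=0.80):
    """Sensitivity when the threshold keeps at least `specificity` of negatives below it."""
    pos, neg = split_by_label(scores, labels)
    neg_sorted = sorted(neg)
    # Number of negatives that must score at or below the threshold.
    k = math.ceil(round(specificity * len(neg_sorted), 9))
    threshold = neg_sorted[k - 1] if k > 0 else float("-inf")
    # Predict positive only for scores strictly above the threshold.
    return sum(p > threshold for p in pos) / len(pos)


# Hypothetical toy data: five non-PPD (0) and five PPD (1) subjects.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.7, 0.35, 0.5, 0.6, 0.8, 0.9]

toy_auc = auc(scores, labels)
toy_sens = sensitivity_at_specificity(scores, labels, specificity=0.80)
```

Fixing specificity at 0.80 makes models comparable at the same false positive rate of 0.2, which is how the EHR, EPDS and combined scores are contrasted in the table.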
Fig. 3 Performance of PPD prediction models with geographical split (a) and temporal split (b) of the training/testing sets. The models are compared by the area under the ROC curve (AUC) and by the sensitivity at a false positive rate of 0.2. EHR: risk score based on Electronic Health Records; EPDS: score based on Edinburgh questionnaire
Fig. 4 Feature significance using SHapley Additive exPlanations (SHAP). The top 20 contributing features from the time periods of the pregnancy (P) and the preceding two years (H). The red and blue bars indicate a positive and negative impact, respectively (a). For the age variable, the SHAP value per subject (b) indicates a stronger impact of age for younger patients
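The idea behind the per-subject SHAP values in this figure can be illustrated by computing exact Shapley values for a tiny model. The sketch below enumerates all feature subsets, which is feasible only for a handful of features. It is not the SHAP library and not the authors' model: `toy_score`, its coefficients, the three features and the single baseline subject are all hypothetical, and absent features are simply replaced by baseline values, whereas practical SHAP implementations typically average over background data or use model-specific algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to a baseline input."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the subject's values; the rest stay at baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # subset sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def toy_score(z):
    """Hypothetical risk score; coefficients are made up for illustration."""
    age, prior_depression, smoker = z
    return (0.02 * (30 - age) + 0.5 * prior_depression
            + 0.1 * smoker + 0.2 * prior_depression * smoker)


x = [22, 1, 1]          # hypothetical subject: age 22, prior depression, smoker
baseline = [30, 0, 0]   # hypothetical reference subject
phi = shapley_values(toy_score, x, baseline)
```

In this toy model the contributions sum to the difference between the subject's score and the baseline score, and the interaction term is shared equally between the two features involved. A younger age yields a positive contribution, mirroring the direction described for panel (b).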