Mei Cai, Yiming Wang, Qian Luo, Guo Wei.
Abstract
Postpartum depression (PPD), a severe form of clinical depression, is a serious social problem. Fortunately, most women with PPD are likely to recover if the symptoms are recognized and treated promptly. We designed two test data sets and six classifiers based on 586 questionnaires collected from a county in North Carolina from 2002 to 2005, and used the C4.5 decision tree (DT) algorithm to build decision trees that predict the degree of PPD. Our study established the roles of the attributes of the Postpartum Depression Screening Scale (PDSS) and devised rules for classifying PPD using factor analysis based on the participants' scores on the PDSS questionnaires. The six classifiers discard the PDSS Total and Short Total scores and make extensive use of the demographic attributes contained in the PDSS questionnaires. Our research provided some insightful results. When using the short form to detect PPD, demographic information can be instructive. An analysis of the decision trees established the preferred sequence of attributes of the short form of the PDSS, and the most important attribute set was determined, which should make PPD prediction more efficient. We hope this research improves early recognition of PPD, especially when information or time is limited, and helps mothers obtain a timely professional medical diagnosis and follow-up treatment, minimizing the harm to families and society.
Keywords: decision tree (DT); factor analysis; postpartum depression (PPD); postpartum depression screening scale (PDSS); prediction
Year: 2019 PMID: 31835547 PMCID: PMC6950650 DOI: 10.3390/ijerph16245025
Source DB: PubMed Journal: Int J Environ Res Public Health ISSN: 1660-4601 Impact factor: 3.390
Figure 1. Interpretive ranges for the Postpartum Depression Screening Scale (PDSS) Total and Short Total.
Description of the experiment.
| Test Data | Description of Test Data | Demographic Information | Questions 1–7 (Labeled A1–A7) | Questions 8–35 (Labeled A8–A35) | Test/Classifier |
|---|---|---|---|---|---|
| Test data A | Contains 209 instances which are abstracted from “PPD 2006 Jan ALL 586 records” 1 | | ✓ | | Test 1 / Classifier 1 |
| | | ✓ | ✓ | | Test 2 / Classifier 2 |
| | | | ✓ | ✓ | Test 3 / Classifier 3 |
| | | ✓ | ✓ | ✓ | Test 4 / Classifier 4 |
| Test data B | “PPD 2006 Jan ALL 586 records” | | ✓ | | Test 5 / Classifier 5 |
| | | ✓ | ✓ | | Test 6 / Classifier 6 |
1 Full scale scores of the 209 instances are all 1, which means these instances completed both the short form and the full form.
General calculation results of the C4.5 decision tree (DT).
| | Number of Attributes | Correctly Classified Instances (n) | Correctly Classified Instances (%) | Incorrectly Classified Instances (n) | Incorrectly Classified Instances (%) | Precision | Recall | ROC Area |
|---|---|---|---|---|---|---|---|---|
| Test 1 (Classifier 1) | 7 | 140 | 66.98 | 69 | 33.01 | 0.654 | 0.670 | 0.757 |
| Test 2 (Classifier 2) | 18 | 150 | 71.77 | 59 | 28.23 | 0.712 | 0.718 | 0.771 |
| Test 3 (Classifier 3) | 35 | 146 | 69.86 | 63 | 30.14 | 0.701 | 0.699 | 0.794 |
| Test 4 (Classifier 4) | 46 | 146 | 69.86 | 63 | 30.14 | 0.698 | 0.699 | 0.799 |
| Test 5 (Classifier 5) | 7 | 480 | 81.91 | 106 | 18.09 | 0.789 | 0.819 | 0.852 |
| Test 6 (Classifier 6) | 18 | 486 | 82.94 | 100 | 17.06 | 0.792 | 0.829 | 0.847 |
Confusion matrix of the four classifiers of Test Data A.
| | A | B | C | Classified as |
|---|---|---|---|---|
| Test 1 | 77 | 10 | 1 | 1 |
| | 29 | 20 | 11 | 2 |
| | 6 | 12 | 43 | 3 |
| Test 2 | 78 | 9 | 1 | 1 |
| | 21 | 29 | 10 | 2 |
| | 6 | 12 | 43 | 3 |
| Test 3 | 68 | 14 | 6 | 1 |
| | 14 | 33 | 13 | 2 |
| | 2 | 14 | 45 | 3 |
| Test 4 | 69 | 14 | 5 | 1 |
| | 15 | 32 | 13 | 2 |
| | 3 | 13 | 45 | 3 |
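As a sanity check, the headline numbers in the C4.5 results table can be recovered from these matrices. A short Python sketch (NumPy assumed available) for Test 1's matrix, with rows read as actual classes and columns as predicted classes:

```python
import numpy as np

# Confusion matrix of Test 1 (rows = actual class 1-3, columns = predicted),
# transcribed from the Test Data A table above.
cm = np.array([[77, 10,  1],
               [29, 20, 11],
               [ 6, 12, 43]])

correct = int(np.trace(cm))        # correctly classified instances (the diagonal)
total = int(cm.sum())
accuracy = correct / total         # equals the support-weighted recall

support = cm.sum(axis=1)           # actual instances per class: 88, 60, 61
precision_per_class = np.diag(cm) / cm.sum(axis=0)
weighted_precision = float((precision_per_class * support).sum() / total)

print(correct, total)              # 140 209
print(round(accuracy, 4))          # 0.6699
print(round(weighted_precision, 3))  # 0.654
```

The trace reproduces the 140 correctly classified instances, and the support-weighted precision reproduces the 0.654 reported for Test 1.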
Confusion matrix of the two classifiers of Test Data B.
| | A | B | C | Classified as |
|---|---|---|---|---|
| Test 5 | 63 | 8 | 17 | 1 |
| | 21 | 7 | 32 | 2 |
| | 17 | 11 | 410 | 3 |
| Test 6 | 66 | 8 | 14 | 1 |
| | 23 | 5 | 32 | 2 |
| | 14 | 9 | 415 | 3 |
Parameter settings.
| Pre-Process | Attribute Filters | Numerical to Nominal |
|---|---|---|
| Classify | Cross-validation | 10-fold cross-validation |
| Classify | Classifier | J48 |
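The settings above describe a Weka workflow (J48 with 10-fold cross-validation). A minimal sketch of an equivalent pipeline, assuming scikit-learn as a stand-in: its DecisionTreeClassifier implements CART rather than C4.5/J48 (the entropy criterion is the closest match), and the synthetic data below only mimics the shape of the seven A1–A7 items, since the PDSS records themselves are not public.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(209, 7))   # stand-in for Likert items A1-A7 (hypothetical values)
y = rng.integers(1, 4, size=209)        # stand-in for the three PPD severity classes

# "entropy" mirrors C4.5's information-based splitting; J48 additionally prunes by default.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
scores = cross_val_score(clf, X, y, cv=10)   # 10-fold cross-validation, as in the settings table
print(scores.mean())
```

With real PDSS data in `X` and `y`, `scores.mean()` would correspond to the "Correctly Classified Instances (%)" column of the results tables.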
Figure 2. Visualized decision trees of the six classifiers: (a) Classifier 1; (b) Classifier 2; (c) Classifier 3; (d) Classifier 4; (e) Classifier 5; (f) Classifier 6.
The preferred sequence of attributes of the short form.
| | 1st Attribute (Value) | 2nd Attribute (Value) | 3rd Attribute (Value) | 4th Attribute (Value) | 5th Attribute (Value) |
|---|---|---|---|---|---|
| Test Data A | 4 (0.314) | 5 (0.279) | 6 (0.259) | 7 (0.200) | marri (0.017) |
| Test Data B | 4 (0.388) | 7 (0.313) | 6 (0.277) | 5 (0.185) | marri (0.006) |
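C4.5 selects each split by an information-based criterion (gain ratio), which is presumably the kind of score behind the attribute ordering above. A self-contained sketch of plain information gain (gain ratio further divides by the entropy of the split itself); the toy data here is illustrative only:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(attr_values, labels):
    """Information gain from splitting `labels` on a nominal attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(attr_values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - remainder

# A perfectly predictive attribute recovers the full class entropy...
print(round(info_gain([0, 0, 1, 1, 2, 2], [1, 1, 2, 2, 3, 3]), 3))  # 1.585 = log2(3)
# ...while a constant attribute yields no gain.
print(info_gain([0, 0, 0, 0, 0, 0], [1, 1, 2, 2, 3, 3]))            # 0.0
```

Ranking the short-form items by such a score at the root of the tree yields a preferred attribute sequence of the kind tabulated above.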
General calculation results of the random forest DT.
| | Number of Attributes | Correctly Classified Instances (n) | Correctly Classified Instances (%) | Incorrectly Classified Instances (n) | Incorrectly Classified Instances (%) | Precision | Recall | ROC Area |
|---|---|---|---|---|---|---|---|---|
| Test 1 (Classifier 1) | 7 | 143 | 68.42 | 66 | 31.58 | 0.677 | 0.684 | 0.851 |
| Test 2 (Classifier 2) | 18 | 142 | 67.94 | 67 | 32.06 | 0.664 | 0.679 | 0.848 |
| Test 3 (Classifier 3) | 35 | 169 | 80.86 | 40 | 19.14 | 0.801 | 0.809 | 0.941 |
| Test 4 (Classifier 4) | 46 | 163 | 77.99 | 46 | 22.01 | 0.771 | 0.780 | 0.927 |
| Test 5 (Classifier 5) | 7 | 506 | 86.35 | 80 | 13.65 | 0.840 | 0.863 | 0.958 |
| Test 6 (Classifier 6) | 18 | 506 | 86.35 | 80 | 13.65 | 0.840 | 0.863 | 0.958 |
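The random forest experiments can be sketched analogously, again assuming scikit-learn as a stand-in for the random forest DT and synthetic data in place of the 586 PDSS records:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(586, 7))   # stand-in for items A1-A7 on the full record set
y = rng.integers(1, 4, size=586)        # stand-in severity classes

# An ensemble of entropy-based trees; averaging over trees smooths single-tree variance,
# consistent with the higher ROC Areas reported for the forest.
rf = RandomForestClassifier(n_estimators=100, criterion="entropy", random_state=0)
scores = cross_val_score(rf, X, y, cv=10)
print(round(float(scores.mean()), 3))
```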
Comparison of the indicator ROC Area 1.
| Approach | ROC Area |
|---|---|
| Simple Tree | 0.907 |
| Linear SVM | 0.912 |
| Weighted k-Nearest Neighbor (kNN) | 0.865 |
| Bagged Trees | 0.914 |
| Random forest DT | 0.914 |
| C4.5 DT | 0.803 |
SVM = support vector machine. 1 The ROC Area values for random forest DT and C4.5 DT are the averages over the six classifiers in Table 6 and Table 2, respectively. The ROC Area values for the other four approaches are the averages from Table 3 in [15].
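The two averaged values in the footnote can be reproduced directly from the ROC Area columns of the C4.5 and random forest results tables above:

```python
# ROC Area values of Tests 1-6, copied from the two results tables.
c45_roc = [0.757, 0.771, 0.794, 0.799, 0.852, 0.847]  # C4.5 DT
rf_roc = [0.851, 0.848, 0.941, 0.927, 0.958, 0.958]   # random forest DT

print(round(sum(c45_roc) / len(c45_roc), 3))  # 0.803
print(round(sum(rf_roc) / len(rf_roc), 3))    # 0.914
```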