Zhenkai Yang, Chuansheng Chen, Hanwen Li, Li Yao, Xiaojie Zhao.
Abstract
Large-scale screening for depression has relied on norms developed from a given population at a given time. Researchers have attempted to adjust the cutoff scores over time and for different populations, but such efforts are too few and far between to be sensitive to temporal and regional variations. In this study, we proposed an unsupervised machine learning approach to constructing depression classifications that overcomes the limitations of the traditional norm-based method. Data were collected from 8,063 Chinese middle and high school students. Using k-means clustering, we generated four levels of depressive symptoms to match the norm-based classifications. We then evaluated the validity of the classifications by comparing them with the norm-based method (and its variations) in terms of their robustness, model performance (accuracy, AUC, and sensitivity), and convergent construct validity (i.e., associations with known correlates). The results showed that our automatic classification system performed well compared with the norm-based method.
Keywords: clustering; depression; norm; scale data; unsupervised classification
Year: 2020 PMID: 32116859 PMCID: PMC7034392 DOI: 10.3389/fpsyt.2020.00045
Source DB: PubMed Journal: Front Psychiatry ISSN: 1664-0640 Impact factor: 4.157
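The clustering step described in the abstract can be illustrated with a minimal sketch. It runs a plain k-means (Lloyd's algorithm) with four clusters on item-level responses and then names the clusters by their mean total score. The record does not say how clusters were matched to severity levels, so that ordering rule is an assumption, as are the 13-item, 0 to 3 response format and the synthetic respondents.

```python
import random

LEVELS = ["None", "Mild", "Moderate", "Severe"]


def sq_dist(a, b):
    """Squared Euclidean distance between two item-response vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each respondent goes to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable, so the run has converged
            break
        labels = new_labels
    return labels, centroids


def order_levels(points, labels, k):
    """Assumed rule: rank clusters by mean total score, lowest = 'None'."""
    totals = {c: [] for c in range(k)}
    for p, lab in zip(points, labels):
        totals[lab].append(sum(p))
    means = {c: (sum(v) / len(v) if v else float("inf")) for c, v in totals.items()}
    ranked = sorted(range(k), key=lambda c: means[c])
    cluster_to_level = {c: LEVELS[i] for i, c in enumerate(ranked)}
    return [cluster_to_level[lab] for lab in labels]


# Synthetic respondents (made up): 4 severity groups, 13 items scored 0-3.
rng = random.Random(42)
data = [
    [min(3, max(0, g + rng.choice([-1, 0, 0, 1]))) for _ in range(13)]
    for g in range(4)
    for _ in range(50)
]

labels, centroids = kmeans(data, k=4)
levels = order_levels(data, labels, k=4)
for name in LEVELS:
    print(name, levels.count(name))
```

Because the clusters are formed on item-level patterns rather than on the total score, two respondents with the same total can land in different clusters, which is one way overlapping score ranges between levels can arise.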
Demographic variables.
| Demographic variables | Response options |
|---|---|
| Gender | Male; female |
| Academic performance | Excellent; good; medium; poor |
| Burden of school work | Very light; somewhat light; average; heavy; very heavy |
| Parents divorced | Yes; no |
| Family economic situation | Much better than average; better than average; average; poorer than average; much poorer than average |
| Only-child | Yes; no |
The cross tabulation of individuals in each level of depression based on the clustering- and norm-based classifications.
| Clustering (rows) / Norm (columns) | None | Mild | Moderate | Severe | Total |
|---|---|---|---|---|---|
| None | 3958 | 84 | 0 | 0 | 4042 |
| Mild | 25 | 1265 | 1165 | 0 | 2455 |
| Moderate | 5 | 83 | 521 | 62 | 671 |
| Severe | 0 | 0 | 144 | 383 | 527 |
| Total | 3988 | 1432 | 1830 | 445 | 7695 |
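One way to summarize the cross tabulation above is the share of individuals placed in the same level by both methods, along with a chance-corrected agreement index. The sketch below uses the counts from the table; Cohen's kappa is chosen here for illustration and is not a statistic reported in this record.

```python
# Rows: clustering-based level; columns: norm-based level.
# Order in both directions: None, Mild, Moderate, Severe.
CROSSTAB = [
    [3958, 84, 0, 0],
    [25, 1265, 1165, 0],
    [5, 83, 521, 62],
    [0, 0, 144, 383],
]


def agreement(table):
    """Return (n, observed agreement, Cohen's kappa) for a square table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Observed agreement: proportion of cases on the main diagonal.
    p_obs = sum(table[i][i] for i in range(len(table))) / n
    # Agreement expected by chance from the marginal totals alone.
    p_exp = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    return n, p_obs, kappa


n, p_obs, kappa = agreement(CROSSTAB)
print(f"N = {n}, observed agreement = {p_obs:.3f}, kappa = {kappa:.3f}")
```

Most of the disagreement sits in a single cell: 1,165 individuals classified as Moderate by the norm are classified as Mild by clustering.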
The number of people (N) and the corresponding score range (S) in each level under each of the four classification methods.
| Criterion | None S | None N | None % | Mild S | Mild N | Mild % | Moderate S | Moderate N | Moderate % | Severe S | Severe N | Severe % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Clustering | 0–6 | 4,042 | 52.2 | 4–15 | 2,455 | 31.9 | 3–18 | 671 | 8.7 | 13–39 | 527 | 6.8 |
| Norm | 0–4 | 3,988 | 51.8 | 5–7 | 1,432 | 18.6 | 8–15 | 1,830 | 23.8 | 16–39 | 445 | 5.8 |
| Criterion 1 | 0–3 | 3,438 | 44.7 | 4–6 | 1,534 | 19.9 | 7–14 | 2,155 | 28.0 | 15–39 | 568 | 7.4 |
| Criterion 2 | 0–5 | 4,514 | 58.7 | 6–8 | 906 | 11.8 | 9–16 | 1,930 | 25.1 | 17–39 | 345 | 4.5 |
S, score range; N, number of individuals for a given level of depression based on a given method of classification; %, percentage of the total sample.
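The three cutoff-based rows of the table differ only in where the boundaries fall: relative to the norm, Criterion 1 lowers each cutoff by one point and Criterion 2 raises each by one point. A minimal sketch of that mapping from total score to level follows; the function and dictionary names are hypothetical, and the score ranges are taken from the table.

```python
from bisect import bisect_left

LEVELS = ("None", "Mild", "Moderate", "Severe")

# Highest total score still belonging to None, Mild, and Moderate respectively.
UPPER_BOUNDS = {
    "Norm": (4, 7, 15),
    "Criterion 1": (3, 6, 14),  # every cutoff one point lower than the norm
    "Criterion 2": (5, 8, 16),  # every cutoff one point higher than the norm
}


def classify(score, criterion="Norm"):
    """Map a total score (0-39) to a depression level under a given criterion."""
    if not 0 <= score <= 39:
        raise ValueError("total score must be between 0 and 39")
    # bisect_left returns how many upper bounds lie strictly below the score,
    # which is the index of the level the score falls into.
    return LEVELS[bisect_left(UPPER_BOUNDS[criterion], score)]


for s in (4, 5, 8, 16):
    print(s, [classify(s, c) for c in UPPER_BOUNDS])
```

The clustering row cannot be expressed this way, because its score ranges overlap across levels.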
Figure 1. Model performance in terms of the averaged accuracy (A), AUC (B), and sensitivity (C: Imbalanced data, D: Balanced data) by classification method/criterion.
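To make two of the measures named in the Figure 1 caption concrete, the sketch below computes accuracy and per-level sensitivity (recall) from true and predicted labels. The labels are made up, and averaging sensitivity equally across levels (a macro average) is an assumption; this record does not state how the averaging was done.

```python
from collections import Counter


def accuracy_and_sensitivity(y_true, y_pred):
    """Return (accuracy, per-level sensitivity, macro-averaged sensitivity)."""
    support = Counter()  # number of true cases per level
    correct = Counter()  # correctly predicted cases per level
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            correct[t] += 1
    accuracy = sum(correct.values()) / len(y_true)
    sensitivity = {level: correct[level] / support[level] for level in support}
    macro = sum(sensitivity.values()) / len(sensitivity)
    return accuracy, sensitivity, macro


# Made-up, deliberately imbalanced example: "None" dominates the sample.
y_true = ["None"] * 6 + ["Mild"] * 2 + ["Severe"] * 2
y_pred = ["None"] * 6 + ["None", "Mild"] + ["Severe", "Mild"]

acc, sens, macro = accuracy_and_sensitivity(y_true, y_pred)
print(f"accuracy = {acc:.2f}, macro sensitivity = {macro:.2f}")
print(sens)
```

In this toy case accuracy is carried by the large "None" group while the smaller levels are detected only half the time, which is the kind of gap that comparing imbalanced and balanced data can expose.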
Figure 2. The learning curve of the LDA classifier for the 4 criteria in the imbalanced data (A: clustering, norm, criterion 1, and criterion 2 from left to right) and balanced data (B: clustering, norm, criterion 1, and criterion 2 from left to right).
The associations (Chi-squared) between demographic factors and depression levels based on different classification methods.
| Demographic variables | Clustering | Norm | Criterion 1 | Criterion 2 |
|---|---|---|---|---|
| Gender | 55.808 | 59.619 | 70.998 | 60.391 |
| Academic performance | 248.399 | 263.135 | 270.785 | 267.610 |
| Burden of school work | 377.312 | 411.159 | 349.587 | 386.253 |
| Parents divorced | 28.871 | 28.519 | 30.704 | 20.564 |
| Family economic situation | 167.901 | 161.236 | 157.392 | 152.535 |
| Only-child | 13.441 | 8.487 | 11.531 | 10.778 |
Given the sample size, all Chi-squared statistics were significant at the level of p < .001.
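Each entry in the table above is a Pearson chi-squared statistic for a contingency table of one demographic variable against the four depression levels. A minimal sketch of that computation follows; the counts are made up for illustration and are not the study's data.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom.

    Assumes every row and column total is non-zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows = gender (male, female),
# columns = depression level (None, Mild, Moderate, Severe).
hypothetical_counts = [
    [220, 90, 60, 30],
    [180, 110, 80, 30],
]

stat, dof = chi_squared(hypothetical_counts)
print(f"chi-squared = {stat:.3f}, df = {dof}")
```

The statistic grows with sample size for a fixed strength of association, which is why the note above ties significance to the size of the sample.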
Figure 3. The mean scores of the related scales including SAS (A), ASLEC (B), ISI (C), and PSS (D) by the level of depression and classification method/criterion.
The Eta-squared values from ANOVA of correlates of depression by levels of depression.
| Criterion | SAS | PSS | ISI | ASLEC |
|---|---|---|---|---|
| Clustering | 0.30 | 0.27 | 0.21 | 0.29 |
| Norm | 0.34 | 0.30 | 0.24 | 0.31 |
| Criterion 1 | 0.33 | 0.30 | 0.23 | 0.31 |
| Criterion 2 | 0.33 | 0.28 | 0.22 | 0.30 |
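Eta-squared from a one-way ANOVA is the share of total variance in a correlate that is explained by group membership, here the depression level: the between-group sum of squares divided by the total sum of squares. A minimal sketch, using made-up scores rather than the study's data:

```python
def eta_squared(groups):
    """Eta-squared = SS_between / SS_total for a one-way layout."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Each group contributes its size times the squared distance of its
    # mean from the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total


# Hypothetical scores on a correlate scale, grouped by depression level.
scores_by_level = {
    "None": [30, 32, 35, 31],
    "Mild": [38, 40, 37, 41],
    "Moderate": [45, 47, 44, 48],
    "Severe": [55, 53, 58, 56],
}

eta2 = eta_squared(list(scores_by_level.values()))
print(f"eta-squared = {eta2:.2f}")
```

Read this way, a value of 0.30 in the table means that about 30% of the variance in that scale is accounted for by the depression level under that classification method.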