Nam Hyeok Kim1, Ji Min Kim2, Da Mi Park2, Su Ryeon Ji1, Jong Woo Kim3.
Abstract
Objective: Although depression among modern people is emerging as a major social problem, the rate of mental health service use remains low. The purpose of this study was to classify sentences written by social media users according to the nine symptoms of depression in the Patient Health Questionnaire-9 (PHQ-9) using natural language processing, and to assess users' depression naturally based on the classification results.
Keywords: Depression; Patient Health Questionnaire-9; deep learning; machine learning; natural language processing; social media
Year: 2022 PMID: 35874865 PMCID: PMC9297458 DOI: 10.1177/20552076221114204
Source DB: PubMed Journal: Digit Health ISSN: 2055-2076
Figure 1. The entire process of training and using the depression classifier.
Number and proportion of sentences according to depression and the Patient Health Questionnaire-9 (PHQ-9) symptoms.
| Label (Y/N) | PHQ-9 label (0–9) | Count | Percentage within Y | Percentage of all sentences |
|---|---|---|---|---|
| Y | 0 | 2156 | 15.07% | |
| Y | 1 | 2132 | 14.90% | |
| Y | 2 | 534 | 3.73% | |
| Y | 3 | 1358 | 9.49% | |
| Y | 4 | 685 | 4.79% | |
| Y | 5 | 809 | 5.65% | |
| Y | 6 | 1986 | 13.88% | |
| Y | 7 | 2027 | 14.17% | |
| Y | 8 | 591 | 4.13% | |
| Y | 9 | 2030 | 14.19% | |
| Y | Total | 14,308 | 100.00% | 49.99% |
| N | | 14,313 | | 50.01% |
| Total | | 28,621 | | 100.00% |
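The percentages in this table follow directly from the counts. As a minimal sketch (the variable names are illustrative and not taken from the study), the shares can be recomputed as follows:

```python
# Sentence counts per PHQ-9 label (0-9) among sentences labelled Y,
# copied from the table above.
phq9_counts = {0: 2156, 1: 2132, 2: 534, 3: 1358, 4: 685,
               5: 809, 6: 1986, 7: 2027, 8: 591, 9: 2030}
n_count = 14313  # sentences labelled N

y_total = sum(phq9_counts.values())
grand_total = y_total + n_count

# Share of each PHQ-9 label within the Y sentences, in percent.
within_y = {label: round(100 * count / y_total, 2)
            for label, count in phq9_counts.items()}

# Share of Y and N sentences in the whole data set, in percent.
y_share = round(100 * y_total / grand_total, 2)
n_share = round(100 * n_count / grand_total, 2)

for label, pct in within_y.items():
    print(f"PHQ-9 label {label}: {pct:.2f}% of Y sentences")
print(f"Y: {y_share:.2f}%  N: {n_share:.2f}%  (n = {grand_total})")
```

The near-even split between Y and N sentences means the Y/N classifier was trained on an approximately balanced data set, whereas the ten PHQ-9 labels are unevenly represented.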
Figure 2. Performance comparison of the Y/N and 0–9 sentence classifiers according to algorithm.
Performance of bidirectional encoder representations from transformers (BERT)-based Y/N sentence classifier.
| Y/N sentence classifier | Precision | Recall | F1-score |
|---|---|---|---|
| N | 0.96 | 0.91 | 0.93 |
| Y | 0.92 | 0.96 | 0.94 |
| Accuracy | 0.9368 | | |
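Reading the three numeric columns as precision, recall, and F1-score (an assumption about the column order), the last column can be reproduced from the first two, because the F1-score is the harmonic mean of precision and recall. A minimal sketch, with the helper name `f1_score` chosen only for illustration:

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) per class, copied from the Y/N classifier table.
yn_rows = {"N": (0.96, 0.91), "Y": (0.92, 0.96)}

f1 = {label: round(f1_score(p, r), 2) for label, (p, r) in yn_rows.items()}
print(f1)
```

Because the inputs are themselves rounded to two decimals, a recomputed value can differ from a published one in the last digit.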
Performance of the 0–9 sentence classifier.
| 0–9 sentence classifier | Precision | Recall | F1-score |
|---|---|---|---|
| 0 | 0.73 | 0.69 | 0.71 |
| 1 | 0.77 | 0.79 | 0.78 |
| 2 | 0.67 | 0.73 | 0.79 |
| 3 | 0.90 | 0.96 | 0.93 |
| 4 | 0.94 | 0.97 | 0.95 |
| 5 | 0.80 | 0.82 | 0.81 |
| 6 | 0.89 | 0.81 | 0.85 |
| 7 | 0.86 | 0.85 | 0.86 |
| 8 | 0.82 | 0.89 | 0.85 |
| 9 | 0.90 | 0.92 | 0.91 |
| Accuracy | 0.8329 | | |
Figure 3. F1-scores of the 0–9 sentence classifier.
Logistic regression results of baseline and proposed depression classifier by fivefold cross-validation.
| Fold | Baseline variable | Baseline coefficient | Baseline std. error | Baseline p-value | Proposed variable | Proposed coefficient | Proposed std. error | Proposed p-value |
|---|---|---|---|---|---|---|---|---|
| 1 | Intercept | 0.3800 | 0.0963 | 2.71e-4 | Intercept | 0.449 | 0.079 | 1.07e-6 |
| | Ratio_D | 3.0651 | 1.6601 | 0.0712 | Ratio_1 | 15.219 | 6.328 | 0.0204 |
| | | | | | Ratio_2 | -61.892 | 37.617 | 0.1069 |
| 2 | Intercept | 0.3593 | 0.1034 | 0.0011 | Intercept | 0.413 | 0.096 | 9.6e-5 |
| | Ratio_D | 3.0429 | 1.6253 | 0.0675 | Ratio_1 | 20.172 | 7.287 | 0.0082 |
| | | | | | Ratio_2 | -56.379 | 29.164 | 0.0598 |
| | | | | | Ratio_3 | -30.373 | 16.583 | 0.0732 |
| | | | | | Ratio_6 | 20.586 | 11.849 | 0.0894 |
| 3 | Intercept | 0.5000 | 0.0729 | 1.35e-8 | Intercept | 0.444 | 0.090 | 1.26e-5 |
| | | | | | Ratio_1 | 26.240 | 9.821 | 0.0106 |
| | | | | | Ratio_2 | -78.131 | 29.953 | 0.0125 |
| | | | | | Ratio_3 | -34.988 | 17.794 | 0.0557 |
| | | | | | Ratio_6 | 24.628 | 11.798 | 0.0428 |
| 4 | Intercept | 0.5000 | 0.0729 | 1.35e-8 | Intercept | 0.362 | 0.098 | 6.5e-4 |
| | | | | | Ratio_1 | 21.768 | 7.997 | 0.0093 |
| | | | | | Ratio_2 | -63.253 | 32.416 | 0.0575 |
| | | | | | Ratio_3 | -33.660 | 16.825 | 0.0517 |
| | | | | | Ratio_6 | 27.675 | 12.397 | 0.0308 |
| 5 | Intercept | 0.3775 | 0.1021 | 5.78e-4 | Intercept | 0.412 | 0.094 | 8.19e-5 |
| | Ratio_D | 2.7240 | 1.6182 | 0.0990 | Ratio_1 | 21.261 | 7.994 | 0.0109 |
| | | | | | Ratio_2 | -67.880 | 34.380 | 0.0548 |
| | | | | | Ratio_3 | -36.262 | 18.687 | 0.0589 |
| | | | | | Ratio_6 | 25.842 | 14.083 | 0.0734 |
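The regression tables use predictors named Ratio_D and Ratio_k (for example Ratio_1 and Ratio_2), which this excerpt does not define. A minimal sketch of one plausible reading: Ratio_D is taken as the share of a user's sentences classified as Y, and Ratio_k as the share classified under PHQ-9 label k. This reading, the function name `ratio_features`, and the use of `None` to mark N sentences are illustrative assumptions, not the authors' stated procedure.

```python
from collections import Counter
from typing import Dict, List, Optional


def ratio_features(labels: List[Optional[int]]) -> Dict[str, float]:
    """Turn per-sentence labels for one user into ratio features.

    labels: one entry per sentence; None means the sentence was classified
    as N, an integer 0-9 is the PHQ-9 label of a Y sentence (assumed encoding).
    """
    total = len(labels)
    if total == 0:
        raise ValueError("need at least one sentence")
    counts = Counter(label for label in labels if label is not None)
    # Assumed definition: share of all sentences that were classified as Y.
    features = {"Ratio_D": sum(counts.values()) / total}
    # Assumed definition: share of all sentences carrying PHQ-9 label k.
    for k in range(10):
        features[f"Ratio_{k}"] = counts.get(k, 0) / total
    return features


# Hypothetical user with ten sentences: two with label 1, one with label 6.
example = ratio_features([None, None, 1, 6, None, 1, None, None, None, None])
print(example["Ratio_D"], example["Ratio_1"], example["Ratio_6"])
```

Under this reading, the baseline classifier would use only Ratio_D, while the proposed classifier would use the per-symptom ratios that remain in the fitted model for each fold.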
Figure 4. Comparison of the accuracy of the baseline and the proposed depression classifier by fivefold cross-validation.
Logistic regression results of baseline and proposed depression classifier by threefold cross-validation.
| Fold | Baseline variable | Baseline coefficient | Baseline std. error | Baseline p-value | Proposed variable | Proposed coefficient | Proposed std. error | Proposed p-value |
|---|---|---|---|---|---|---|---|---|
| 1 | Intercept | 0.500 | 0.080 | 2.37e-7 | Intercept | 0.447 | 0.085 | 6.53e-6 |
| | | | | | Ratio_1 | 15.759 | 6.064 | 0.013 |
| | | | | | Ratio_2 | -66.203 | 37.013 | 0.081 |
| 2 | Intercept | 0.326 | 0.111 | 0.005 | Intercept | 0.406 | 0.105 | 4.88e-4 |
| | Ratio_D | 3.714 | 1.706 | 0.035 | Ratio_1 | 20.623 | 7.953 | 0.013 |
| | | | | | Ratio_2 | -65.304 | 29.789 | 0.035 |
| | | | | | Ratio_3 | -29.131 | 16.752 | 0.091 |
| | | | | | Ratio_6 | 26.269 | 12.593 | 0.044 |
| 3 | Intercept | 0.500 | 0.080 | 2.37e-7 | Intercept | 0.363 | 0.111 | 0.0023 |
| | | | | | Ratio_1 | 20.257 | 11.702 | 0.0922 |
| | | | | | Ratio_2 | -81.680 | 43.931 | 0.0714 |
| | | | | | Ratio_3 | -43.838 | 21.637 | 0.0504 |
| | | | | | Ratio_6 | 37.081 | 15.827 | 0.0249 |
Accuracy of baseline and proposed depression classifier by threefold cross-validation.
| K-fold | 1 | 2 | 3 | Average |
|---|---|---|---|---|
| Baseline depression classifier | 0.50 | 0.50 | 0.50 | 0.50 |
| Proposed depression classifier | 0.70 | 0.65 | 0.70 | 0.683 |
Logistic regression results of baseline and proposed depression classifier by 10-fold cross-validation.
| Fold | Baseline variable | Baseline coefficient | Baseline std. error | Baseline p-value | Proposed variable | Proposed coefficient | Proposed std. error | Proposed p-value |
|---|---|---|---|---|---|---|---|---|
| 1 | Intercept | 0.371 | 0.094 | 2.64e-4 | Intercept | 0.408 | 0.088 | 2.65e-5 |
| | Ratio_D | 3.045 | 1.573 | 0.058 | Ratio_1 | 19.611 | 7.194 | 0.008 |
| | | | | | Ratio_2 | -56.506 | 28.866 | 0.055 |
| | | | | | Ratio_3 | -26.151 | 16.321 | 0.115 |
| | | | | | Ratio_6 | 20.434 | 11.728 | 0.087 |
| 2 | Intercept | 0.355 | 0.095 | 4.64e-4 | Intercept | 0.405 | 0.087 | 2.45e-5 |
| | Ratio_D | 3.381 | 1.599 | 0.039 | Ratio_1 | 20.959 | 7.055 | 0.004 |
| | | | | | Ratio_2 | -69.264 | 27.726 | 0.015 |
| | | | | | Ratio_3 | -36.069 | 15.941 | 0.028 |
| | | | | | Ratio_6 | 28.988 | 11.463 | 0.014 |
| 3 | Intercept | 0.388 | 0.095 | 1.59e-4 | Intercept | 0.411 | 0.088 | 2.47e-5 |
| | Ratio_D | 2.648 | 1.600 | 0.103 | Ratio_1 | 20.968 | 7.303 | 0.006 |
| | | | | | Ratio_2 | -66.677 | 32.394 | 0.044 |
| | | | | | Ratio_3 | -31.352 | 16.221 | 0.059 |
| | | | | | Ratio_6 | 24.542 | 12.778 | 0.061 |
| 4 | Intercept | 0.500 | 0.068 | 1.6e-9 | Intercept | 0.386 | 0.089 | 6.98e-5 |
| | | | | | Ratio_1 | 22.385 | 7.441 | 0.004 |
| | | | | | Ratio_2 | -64.583 | 31.222 | 0.043 |
| | | | | | Ratio_3 | -35.105 | 16.774 | 0.041 |
| | | | | | Ratio_6 | 26.204 | 12.151 | 0.035 |
| 5 | Intercept | 0.381 | 0.095 | 2.04e-4 | Intercept | 0.3696 | 0.084 | 5.87e-5 |
| | Ratio_D | 2.881 | 1.629 | 0.082 | Ratio_1 | 11.627 | 6.531 | 0.081 |
| | | | | | Ratio_2 | -58.225 | 27.590 | 0.039 |
| | | | | | Ratio_6 | 27.298 | 12.053 | 0.026 |
| 6 | Intercept | 0.366 | 0.097 | 4.26e-4 | Intercept | 0.395 | 0.088 | 4.44e-5 |
| | Ratio_D | 3.024 | 1.593 | 0.063 | Ratio_1 | 20.764 | 7.025 | 0.004 |
| | | | | | Ratio_2 | -63.907 | 27.663 | 0.025 |
| | | | | | Ratio_3 | -31.458 | 16.053 | 0.055 |
| | | | | | Ratio_6 | 25.317 | 11.328 | 0.030 |
| 7 | Intercept | 0.500 | 0.068 | 1.6e-9 | Intercept | 0.441 | 0.089 | 9.04e-6 |
| | | | | | Ratio_1 | 19.753 | 7.636 | 0.012 |
| | | | | | Ratio_2 | -70.562 | 28.502 | 0.016 |
| | | | | | Ratio_3 | -36.055 | 15.904 | 0.027 |
| | | | | | Ratio_6 | 24.206 | 11.409 | 0.038 |
| 8 | Intercept | 0.500 | 0.068 | 1.6e-9 | Intercept | 0.434 | 0.075 | 4.87e-7 |
| | | | | | Ratio_1 | 11.592 | 6.231 | 0.068 |
| 9 | Intercept | 0.500 | 0.068 | 1.6e-9 | Intercept | 0.449 | 0.075 | 2.3e-7 |
| | | | | | Ratio_1 | 15.401 | 5.892 | 0.011 |
| | | | | | Ratio_2 | -65.636 | 29.208 | 0.029 |
| 10 | Intercept | 0.500 | 0.068 | 1.6e-9 | Intercept | 0.375 | 0.088 | 9.2e-5 |
| | | | | | Ratio_1 | 14.537 | 6.835 | 0.038 |
| | | | | | Ratio_2 | -52.649 | 27.772 | 0.063 |
| | | | | | Ratio_6 | 19.591 | 11.213 | 0.086 |
Accuracy of baseline and proposed depression classifier by 10-fold cross-validation.
| K-fold | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline depression classifier | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.33 | 0.50 | 0.83 | 0.33 | 0.50 | 0.50 |
| Proposed depression classifier | 0.66 | 0.50 | 0.66 | 0.50 | 0.66 | 0.66 | 0.66 | 0.83 | 0.66 | 0.83 | 0.66 |
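The "Average" column in the threefold and 10-fold accuracy tables is the arithmetic mean of the per-fold accuracies. A minimal sketch that recomputes it from the tabulated values (the dictionary and function names are illustrative):

```python
from statistics import mean

# Per-fold accuracies copied from the threefold and 10-fold accuracy tables.
acc_3fold = {
    "baseline": [0.50, 0.50, 0.50],
    "proposed": [0.70, 0.65, 0.70],
}
acc_10fold = {
    "baseline": [0.50, 0.50, 0.50, 0.50, 0.50, 0.33, 0.50, 0.83, 0.33, 0.50],
    "proposed": [0.66, 0.50, 0.66, 0.50, 0.66, 0.66, 0.66, 0.83, 0.66, 0.83],
}


def average_accuracy(per_fold: dict, digits: int) -> dict:
    """Mean accuracy over folds for each classifier, rounded to `digits`."""
    return {name: round(mean(values), digits) for name, values in per_fold.items()}


avg_3fold = average_accuracy(acc_3fold, 3)
avg_10fold = average_accuracy(acc_10fold, 2)
print(avg_3fold)
print(avg_10fold)
```

In both settings the baseline classifier averages 0.50, which is chance level for a two-class problem, while the proposed classifier averages between 0.66 and 0.683.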