Diego Fernandez, Fidel Cacheda, Francisco J Novoa, Victor Carneiro.
Abstract
BACKGROUND: Major depressive disorder (MDD), or depression, is among the most prevalent psychiatric disorders, affecting more than 300 million people globally. Early detection is critical for rapid intervention, which can potentially reduce the escalation of the disorder.
Keywords: artificial intelligence; depression; machine learning; major depressive disorder; social media
Year: 2019 PMID: 31199323 PMCID: PMC6598420 DOI: 10.2196/12554
Source DB: PubMed Journal: J Med Internet Res ISSN: 1438-8871 Impact factor: 5.428
Table 1. Analysis of dataset statistics.
| Features | Depressed | Control | Total |
| Subjects, n | 135 | 752 | 887 |
| Posts, n | 49,557 | 481,837 | 531,394 |
| Average submissions per subject | 367.1 | 640.7 | 599.1 |
| Median (range) | 154 (10-1832) | 375 (10-2000) | 321 (10-2000) |
| Interquartile range | 562 | 1039.5 | 1006 |
| Average words per submission | 27.3 | 21.9 | 22.4 |
| Average | 586.42 | 625.02 | 619.15 |
| Median (range) | 520.95 (0.60-2249.48) | 477.12 (0.26-3067.16) | 484.88 (0.26-3067.16) |
| Interquartile range | 786.88 | 753.19 | 756.83 |
Figure 1. Relative percentage of the number of words used in the title (a), text (b), and both fields (c) for depressed and nondepressed individuals.
Figure 2. Distribution of average time gaps between writings for depressed and nondepressed subjects.
Figure 3. Time span bar plots by day of the week (a) and hour of the day (b) for depressed and nondepressed subjects.
Figure 4. Textual similarity measures. IDF: inverse document frequency; BM25: Okapi Best Matching 25.
Figure 5. Latent semantic space.
Table 2. Dataset statistics.
| Features | Training, depressed | Training, control | Test, depressed | Test, control |
| Subjects, n | 83 | 403 | 52 | 349 |
| Posts, n | 30,851 | 264,172 | 18,706 | 217,665 |
| Average submissions per subject | 371.7 | 655.5 | 359.7 | 623.7 |
| Average words per submission | 27.6 | 21.3 | 26.9 | 22.5 |
Figure 6. Early risk detection error metric. ERDE: early risk detection error.
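The ERDE metric named in Figure 6 can be illustrated with a short sketch. It assumes the usual formulation from the eRisk evaluation literature: a false positive costs c_fp, a false negative costs c_fn, a true positive costs c_tp scaled by a latency factor lc_o(k) = 1 - 1/(1 + e^(k-o)), where k is the number of writings seen before the decision, and a true negative costs nothing. The function names, the cost values (c_fn = c_tp = 1, c_fp set to the share of depressed subjects), and the toy decision list are assumptions made for illustration, not details taken from this record.

```python
import math


def latency_cost(k: int, o: int) -> float:
    """Latency factor lc_o(k) = 1 - 1 / (1 + exp(k - o)), computed stably."""
    x = k - o
    if x >= 0:
        # Equivalent form that avoids overflow when k is much larger than o.
        return 1.0 / (1.0 + math.exp(-x))
    return math.exp(x) / (1.0 + math.exp(x))


def erde(decisions, labels, delays, o, c_fp, c_fn=1.0, c_tp=1.0):
    """Mean ERDE_o over subjects.

    decisions[i]: True if subject i was flagged as depressed.
    labels[i]:    True if subject i is actually depressed.
    delays[i]:    number of writings read before the decision (k).
    Cost values are assumptions; see the text above.
    """
    total = 0.0
    for flagged, depressed, k in zip(decisions, labels, delays):
        if flagged and not depressed:
            total += c_fp                          # false positive
        elif not flagged and depressed:
            total += c_fn                          # false negative
        elif flagged and depressed:
            total += latency_cost(k, o) * c_tp     # late true positives cost more
        # true negatives add nothing
    return total / len(labels)


# Test split sizes from the dataset statistics: 52 depressed, 349 control.
labels = [True] * 52 + [False] * 349
decisions = [False] * len(labels)   # a system that never flags anyone
delays = [1] * len(labels)

score = 100 * erde(decisions, labels, delays, o=5, c_fp=52 / 401)
print(round(score, 2))  # 12.97
```

Under these assumptions, a system that never flags anyone scores 100 × 52/401 ≈ 12.97 on the test split regardless of o, which agrees with the value reported for the Nondepressed baseline.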
Table 3. Baselines used for comparison with our proposed methods.
| Method | ERDE5^a | ERDE50 | F1 | Precision | Recall |
| Random | 18.51 | 15.20 | 0.20 | 0.12 | 0.00 |
| All depressed | 21.67 | 15.03 | 0.23 | 0.13 | 1.00 |
| Nondepressed | 12.97 | 12.97 | 0.00 | 0.00 | 0.00 |
| Oracle1 | 10.38 | 3.74 | 1.00 | 1.00 | 1.00 |
| Oracle2 | 11.83 | 5.30 | 1.00 | 1.00 | 1.00 |
| Oracle3 | 12.23 | 6.73 | 1.00 | 1.00 | 1.00 |
| Oracle5 | 12.59 | 7.86 | 1.00 | 1.00 | 1.00 |
| Oracle10 | 12.97 | 12.97 | 1.00 | 1.00 | 1.00 |
| FHDOB^b | 12.70 | 10.39 | 0.55 | 0.69 | 0.46 |
| UNSLA^c | 13.66 | 9.68 | 0.59 | 0.48 | 0.79 |
^a ERDE: early risk detection error.
^b Model B presented by the University of Applied Sciences and Arts Dortmund, Germany (FHDO).
^c Model A presented by the National University of San Luis, Argentina (UNSL).
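In the baseline table above, the third numeric column of each row matches the harmonic mean of the last two columns, which is the definition of an F1 score computed from precision and recall. The following minimal sketch checks this for three rows; the helper name `f1_score` and the 0.01 tolerance (to absorb rounding in the published two-decimal values) are assumptions made for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (reported third column, precision, recall) copied from the baseline table.
rows = {
    "All depressed": (0.23, 0.13, 1.00),
    "FHDOB": (0.55, 0.69, 0.46),
    "UNSLA": (0.59, 0.48, 0.79),
}

# Allow for rounding, since the inputs are already rounded to two decimals.
TOLERANCE = 0.01

for name, (reported, precision, recall) in rows.items():
    recomputed = f1_score(precision, recall)
    consistent = abs(recomputed - reported) <= TOLERANCE
    print(f"{name}: reported={reported:.2f} recomputed={recomputed:.3f} consistent={consistent}")
```

The same check can be applied to any row of the later result tables, which share this column layout.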
Table 4. Evaluation results for the singleton model on different feature sets. WF denotes the full set of writing features presented. The values for the best early risk detection error (ERDE) with o=5 and o=50 are in italics.
| Features | ERDE5^a | ERDE50 | F1 | Precision | Recall |
| Cos^b Text^c | 15.83 | 13.22 | 0.31 | 0.23 | 0.46 |
| Cos All^d | 16.48 | 13.62 | 0.36 | 0.24 | 0.67 |
| BM25^e Text | 18.11 | 16.61 | 0.26 | 0.16 | 0.60 |
| BM25 All | 14.36 | 12.43 | 0.36 | 0.32 | 0.40 |
| LSA^f | 21.60 | 14.96 | 0.23 | 0.13 | 1.00 |
| Norm^g LSA | 21.34 | 18.02 | 0.23 | 0.13 | 1.00 |
| LSA stem^h | 23.51 | 14.70 | 0.23 | 0.13 | 1.00 |
| Norm LSA stem^i | 12.97 | 12.97 | 0.00 | 0.00 | 0.00 |
| Cos Text + WF | 14.09 | 13.60 | 0.07 | 0.33 | 0.04 |
| Cos All + WF | 13.31 | 12.31 | 0.20 | 0.67 | 0.12 |
| BM25 Text + WF | 15.59 | 14.62 | 0.29 | 0.24 | 0.37 |
| BM25 All + WF | 20.49 | 18.05 | 0.30 | 0.18 | 0.83 |
| Cos BM25 Text + WF | 14.15 | 12.97 | 0.30 | 0.38 | 0.25 |
| Cos BM25 All + WF | 13.29 | 12.97 | 0.12 | 0.29 | 0.08 |
| LSA Cos Text + WF | 17.86 | 12.92 | 0.29 | 0.18 | 0.73 |
| LSA BM25 Text + WF | 16.61 | 12.09 | 0.27 | 0.18 | 0.56 |
| LSA Cos All + WF | 19.51 | 13.46 | 0.26 | 0.15 | 0.90 |
| LSA BM25 All + WF | 20.47 | 14.08 | 0.24 | 0.14 | 0.94 |
| LSA Cos BM25 Text + WF | 18.34 | 12.85 | 0.28 | 0.17 | 0.85 |
| LSA Cos BM25 All + WF | | | 0.34 | 0.45 | 0.27 |
| Norm LSA Cos Text + WF | 13.35 | 13.35 | 0.04 | 0.20 | 0.02 |
| Norm LSA BM25 Text + WF | 13.58 | 13.33 | 0.11 | 0.17 | 0.08 |
| Norm LSA Cos All + WF | 14.70 | 14.45 | 0.11 | 0.21 | 0.08 |
| Norm LSA BM25 All + WF | 14.55 | 14.30 | 0.13 | 0.22 | 0.10 |
| Norm LSA Cos BM25 Text + WF | 14.60 | 14.60 | 0.25 | 0.25 | 0.08 |
| Norm LSA Cos BM25 All + WF | 13.73 | 13.48 | 0.20 | 0.20 | 0.08 |
^a ERDE: early risk detection error.
^b Cos: cosine.
^c Only the text part of the writing is considered.
^d The whole writing is considered.
^e BM25: Okapi Best Matching 25.
^f LSA: latent semantic analysis.
^g Normalized LSA.
^h LSA with stemming.
^i Normalized LSA with stemming.
Table 5. Evaluation results for classification on different writing features for the best singleton model from Table 4, which combines cosine and Okapi Best Matching 25 textual features for all text fields and latent semantic analysis features. The values for the best early risk detection error (ERDE) with o=5 and o=50 are in italics.
| WF^a combinations | ERDE5^b | ERDE50 | F1 | Precision | Recall |
| BSM^c + Writing, TimeGap, Hour | 17.35 | 11.39 | 0.30 | 0.18 | 0.85 |
| BSM + Writing, TimeGap, Day | | 12.12 | 0.22 | 0.31 | 0.17 |
| BSM + Writing, TimeGap, Week | 14.77 | 11.44 | 0.33 | 0.25 | 0.48 |
| BSM + Writing, LogTimeGap, Hour | 14.03 | 13.54 | 0.12 | 0.29 | 0.08 |
| BSM + Writing, LogTimeGap, Day | 18.95 | 12.53 | 0.27 | 0.16 | 0.96 |
| BSM + Writing, LogTimeGap, Week | 17.80 | 12.72 | 0.28 | 0.17 | 0.85 |
| BSM + Writing, TimeGap, Day, Hour | 16.14 | 11.55 | 0.31 | 0.21 | 0.63 |
| BSM + Writing, TimeGap, Week, Hour | 19.28 | 12.85 | 0.26 | 0.15 | 0.94 |
| BSM + Writing, LogTimeGap, Day, Hour | 16.86 | 12.28 | 0.29 | 0.18 | 0.77 |
| BSM + Writing, LogTimeGap, Week, Hour | 16.91 | 12.13 | 0.29 | 0.19 | 0.63 |
| BSM + Writing, TimeGap, LogTimeGap, Day | 17.00 | | 0.31 | 0.19 | 0.87 |
| BSM + Writing, TimeGap, LogTimeGap, Week | 17.85 | 12.62 | 0.30 | 0.18 | 0.87 |
| BSM + Writing, TimeGap, LogTimeGap, Hour | 17.71 | 12.65 | 0.28 | 0.17 | 0.83 |
| BSM + Writing, TimeGap, LogTimeGap, Hour, Week | 16.53 | 13.47 | 0.29 | 0.20 | 0.52 |
^a WF: writing feature.
^b ERDE: early risk detection error.
^c BSM: best singleton model.
Table 6. Evaluation results for classification of different feature sets for the dual model (th_w=6). The first column shows features for the positive model, and the first row shows features for the negative model. Positive feature sets are numbered and negative features follow the same numbering. The values for the best ERDE5 (early risk detection error with o=5) are in italics. Labels for the algorithms (Roman numerals) are shared for rows and columns.
| Features | I | II | III | IV | V | VI | VII | VIII | IX | X | XI | XII |
| LSA^a (I) | 13.24 | 12.99 | 12.99 | 12.99 | 12.99 | 12.99 | 13.48 | 13.24 | 13.48 | 13.24 | 29.20 | 13.24 |
| Norm LSA^b (II) | 13.22 | 12.97 | 12.97 | 12.97 | 13.47 | 12.97 | 13.47 | 13.22 | 13.47 | 13.22 | 29.43 | 13.22 |
| LSA stem^c (III) | 13.40 | 13.15 | 13.15 | 13.15 | 13.15 | 13.15 | 13.65 | 13.40 | 13.65 | 13.40 | 29.36 | 13.40 |
| Norm LSA stem^d (IV) | 13.22 | 12.97 | 12.97 | 12.97 | 13.47 | 12.97 | 13.47 | 13.22 | 13.47 | 13.22 | 29.43 | 13.22 |
| Cos^e BM25^f Text^g + WF^h (V) | 13.22 | 12.97 | 12.97 | 12.97 | 13.47 | 12.97 | 13.47 | 13.22 | 13.47 | 13.22 | 29.43 | 13.22 |
| Cos BM25 All^i + WF (VI) | 13.22 | 12.97 | 12.97 | 12.97 | 13.47 | 12.97 | 13.47 | 13.22 | 13.47 | 13.22 | 29.43 | 13.22 |
| LSA Cos Text + WF (VII) | 13.24 | 12.99 | 12.99 | 12.99 | 12.99 | 12.99 | 13.48 | 13.24 | 13.48 | 13.24 | 29.20 | 13.24 |
| LSA BM25 Text + WF (VIII) | 13.24 | 12.99 | 12.99 | 12.99 | 12.99 | 12.99 | 13.48 | 13.24 | 13.48 | 13.24 | 29.20 | 13.24 |
| LSA Cos All + WF (IX) | 12.14 | 11.89 | 11.89 | 11.89 | 11.89 | 11.89 | 12.39 | 12.14 | 12.39 | 12.14 | 28.35 | 12.14 |
| LSA BM25 All + WF (X) | 13.24 | 12.99 | 12.99 | 12.99 | 12.99 | 12.99 | 13.48 | 13.24 | 13.48 | 13.24 | 29.20 | 13.24 |
| LSA Cos BM25 Text + WF (XI) | 12.13 | | | | | | 12.38 | 12.13 | 12.38 | 12.13 | 28.34 | 12.13 |
| LSA Cos BM25 All + WF (XII) | 12.73 | 12.49^j | 12.49^j | 12.49^j | 12.73 | 12.49^j | 12.98 | 12.73 | 12.98 | 12.73 | 28.94 | 12.73 |
^a LSA: latent semantic analysis.
^b Normalized LSA.
^c LSA with stemming.
^d Normalized LSA with stemming.
^e Cos: cosine.
^f BM25: Okapi Best Matching 25.
^g Only the text part of the writing is considered.
^h WF: writing features.
^i The whole writing is considered.
^j Statistically significant performance improvements over the best singleton model in Table 4.
Table 7. Evaluation results for classification of different feature sets for the dual model (th_w=53). The first column shows features for the positive model, and the first row shows features for the negative model. Positive feature sets are numbered and negative features follow the same numbering. The values for the best ERDE50 (early risk detection error with o=50) are in italics. Labels for the algorithms (Roman numerals) are shared for rows and columns.
| Features | I | II | III | IV | V | VI | VII | VIII | IX | X | XI | XII |
| LSA^a (I) | 10.20 | 9.95 | 9.95 | 9.95 | 9.95 | 9.95 | 10.45 | 10.20 | 10.45 | 9.95 | 16.18 | 10.20 |
| Norm LSA^b (II) | 15.46 | 15.21 | 12.97 | 15.21 | 15.21 | 15.21 | 15.71 | 15.46 | 15.71 | 15.46 | 31.42 | 15.46 |
| LSA stem^c (III) | 11.19 | 10.94 | 10.94 | 10.94 | 10.94 | 10.94 | 11.44 | 11.19 | 11.44 | 11.19 | 25.15 | 11.19 |
| Norm LSA stem^d (IV) | 15.46 | 15.21 | 12.97 | 15.21 | 15.21 | 15.21 | 15.71 | 15.46 | 15.71 | 15.46 | 31.42 | 15.46 |
| Cos^e BM25^f Text^g + WF^h (V) | 15.46 | 15.21 | 12.97 | 15.21 | 15.21 | 15.21 | 15.71 | 15.46 | 15.71 | 15.46 | 31.42 | 15.46 |
| Cos BM25 All^i + WF (VI) | 15.46 | 15.21 | 12.97 | 15.21 | 15.21 | 15.21 | 15.71 | 15.46 | 15.71 | 15.46 | 31.42 | 15.46 |
| LSA Cos Text + WF (VII) | 10.20 | 9.95^j | 9.95^j | 9.95^j | 9.95^j | 9.95^j | 10.45 | 10.20 | 10.45 | 9.95^j | 16.18 | 10.20^j |
| LSA BM25 Text + WF (VIII) | 10.20 | 9.95^j | 9.95^j | 9.95^j | 9.95^j | 9.95^j | 10.45 | 10.20 | 10.45 | 9.95^j | 16.18 | 10.20^j |
| LSA Cos All + WF (IX) | 10.41 | 10.16 | 9.16 | 10.16 | 10.16 | 10.16 | 10.66 | 10.41 | 10.66 | 10.16 | 16.65 | 10.41 |
| LSA BM25 All + WF (X) | 10.32 | 10.07 | 10.07 | 10.07 | 10.07 | 10.07 | 10.57 | 10.32 | 10.57 | 10.07 | 16.30 | 10.32 |
| LSA Cos BM25 Text + WF (XI) | 10.17 | 9.93 | | 9.93 | 9.93 | 9.93 | 10.42 | 10.17 | 10.42 | 9.93 | 17.41 | 10.17 |
| LSA Cos BM25 All + WF (XII) | 13.48 | 13.23 | 10.98^j | 13.23 | 13.23 | 13.23 | 13.73 | 13.48 | 13.73 | 13.48 | 28.94 | 13.48 |
^a LSA: latent semantic analysis.
^b Normalized LSA.
^c LSA with stemming.
^d Normalized LSA with stemming.
^e Cos: cosine.
^f BM25: Okapi Best Matching 25.
^g Only the text part of the writing is considered.
^h WF: writing features.
^i The whole writing is considered.
^j Statistically significant performance improvements over the best singleton model in Table 4.