Alina Trifan, José Luis Oliveira.
Abstract
With the continuous increase in the use of social networks, social mining is steadily becoming a powerful component of digital phenotyping. In this paper we explore social mining for the classification of self-diagnosed depressed users of Reddit as a social network. We conduct a cross-evaluation study based on two public datasets in order to understand the impact of transfer learning when the data source is virtually the same. We further complement these results with a transfer-learning experiment in post-partum depression classification, using a corpus we collected for this purpose. Our findings show that transfer learning in social mining might still be at an early stage in computational research, and we thoroughly discuss its implications.
Keywords: cross-evaluation; depression; machine learning; post-partum depression; social monitoring
Year: 2021 PMID: 34013675 PMCID: PMC8238472 DOI: 10.1515/jib-2020-0051
Source DB: PubMed Journal: J Integr Bioinform ISSN: 1613-4516
Absolutist words validated by Al-Mosaiwi et al. [30].
| Absolutely | Constant | Every | Never |
| All | Constantly | Everyone | Nothing |
| Always | Definitely | Everything | Totally |
| Complete | Entire | Full | Whole |
| Completely | Ever | Must | |
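The absolutist lexicon above, together with first-person ("self-related") terms, is what feeds the per-user counts reported in the statistics tables. A minimal sketch of such lexicon counting, assuming simple regex tokenization; the first-person word list here is a hypothetical stand-in, since the paper does not reproduce its self-related list in this excerpt:

```python
import re

# Absolutist lexicon from the table above (Al-Mosaiwi et al. [30]).
ABSOLUTIST = {
    "absolutely", "all", "always", "complete", "completely", "constant",
    "constantly", "definitely", "entire", "ever", "every", "everyone",
    "everything", "full", "must", "never", "nothing", "totally", "whole",
}
# Hypothetical first-person list; the paper's actual list may differ.
SELF_RELATED = {"i", "me", "my", "mine", "myself"}

def lexicon_counts(text):
    """Return (absolutist, self-related) token counts for one user's posts."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return (sum(t in ABSOLUTIST for t in tokens),
            sum(t in SELF_RELATED for t in tokens))

abs_n, self_n = lexicon_counts("I always feel like nothing I do is ever enough.")
```

Counts like these, averaged per group, produce the "Avg. number of absolutist words" and "Avg. number of self-related words" rows below.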
Statistics of the training datasets.
| | RSDD control | RSDD depressed | eRisk control | eRisk depressed |
|---|---|---|---|---|
| Number of subjects | 36197 | 3112 | 403 | 83 |
| Avg. number of words per user | 20820 | 69556 | 21318 | 16416 |
| Avg. number of absolutist words | 189 | 701 | 153 | 154 |
| Avg. number of self-related words | 579 | 2411 | 430 | 731 |
Statistics of the test datasets.
| | RSDD control | RSDD depressed | eRisk control | eRisk depressed |
|---|---|---|---|---|
| Number of subjects | 36218 | 3112 | 352 | 54 |
| Avg. number of words per user | 21164 | 70305 | 21933 | 15370 |
| Avg. number of self-related words | 590 | 2435 | 529 | 637 |
| Avg. number of absolutist words | 189 | 709 | 167 | 145 |
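The per-user averages in the two statistics tables are plain arithmetic means over each group. A toy illustration; the per-user token counts below are invented, chosen only so the means reproduce the training table's control and depressed word averages (20820 and 69556):

```python
from statistics import mean

# Invented per-user word counts (stand-ins for real per-user totals).
control_word_counts = [18000, 24000, 20460]
depressed_word_counts = [71000, 68112]

# Group-level averages, as reported in the statistics tables.
avg_control = mean(control_word_counts)
avg_depressed = mean(depressed_word_counts)
```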
Classification results.
| Method | Prec. | Rec. | F1 | Acc |
|---|---|---|---|---|
| *RSDD* | | | | |
| Support vector machine | | 0.62 | 0.68 | |
| Multinomial Bayes | 0.61 | 0.47 | 0.53 | 0.94 |
| Passive aggressive | 0.64 | 0.64 | 0.64 | 0.94 |
| Feature union rule-based | 0.68 | | | |
| [ | 0.75 | 0.57 | 0.65 | N/A |
| [ | 0.37 | 0.70 | 0.49 | N/A |
| *eRisk* | | | | |
| Support vector machine | | 0.20 | 0.33 | |
| Multinomial Bayes | 0.52 | | | 0.87 |
| Passive aggressive | 0.70 | 0.38 | 0.50 | |
| Feature union rule-based | 0.72 | 0.14 | 0.24 | 0.87 |
| [ | 0.62 | 0.59 | 0.65 | N/A |
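The passive aggressive rows in the results tables presumably come from a library implementation trained on bag-of-words text features. As an illustration of the underlying update rule only, here is a from-scratch PA-I sketch on toy texts; the data, hyperparameters, and feature choice are all hypothetical, not the paper's pipeline:

```python
from collections import Counter

def featurize(text):
    """Bag-of-words term counts for one document."""
    return Counter(text.lower().split())

def pa_train(samples, labels, epochs=5, C=1.0):
    """PA-I: when hinge loss > 0, update w += tau * y * x,
    with tau = min(C, loss / ||x||^2)."""
    w = Counter()
    for _ in range(epochs):
        for text, y in zip(samples, labels):
            x = featurize(text)
            margin = y * sum(w[f] * v for f, v in x.items())
            loss = max(0.0, 1.0 - margin)
            if loss > 0.0:
                tau = min(C, loss / sum(v * v for v in x.values()))
                for f, v in x.items():
                    w[f] += tau * y * v
    return w

def pa_predict(w, text):
    """Sign of the linear score; +1 = depressed, -1 = control."""
    score = sum(w[f] * v for f, v in featurize(text).items())
    return 1 if score >= 0 else -1

# Toy stand-ins for per-user concatenated posts.
texts = ["i always feel empty nothing helps",
         "great run this morning with friends",
         "i must be completely worthless",
         "lovely dinner with family tonight"]
labels = [1, -1, 1, -1]
weights = pa_train(texts, labels)
```

The same bag-of-words features can feed any of the linear models in the table; only the weight-update rule differs between them.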
Cross-evaluation results.
| Method | Prec. | Rec. | F1 | Acc |
|---|---|---|---|---|
| Support vector machine | | 0.38 | 0.42 | |
| Multinomial Bayes | 0.19 | 0.19 | 0.19 | 0.75 |
| Passive aggressive | 0.40 | | | 0.80 |
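The cross-evaluation numbers are the standard precision, recall, F1 and accuracy of a model trained on one corpus and scored on the other. A small self-contained sketch of the metric computation; the label lists are toy inputs, not the paper's predictions:

```python
def evaluate(y_true, y_pred, positive=1):
    """Precision, recall, F1 and accuracy for binary label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    acc = sum(1 for t, p in pairs if t == p) / len(pairs)
    return prec, rec, f1, acc

# Toy example: two of four positives recovered, one false alarm.
metrics = evaluate([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 1, 0, 0, 0])
```

With the heavy class imbalance in these datasets, accuracy alone is misleading, which is why the tables report all four metrics.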
Comparative results on classifying PPD. These results were obtained prior to the removal of depression-related terms from the corpus.
| Method | Prec. | Rec. | F1 | Acc |
|---|---|---|---|---|
| Stochastic gradient descent (l1 = 0.95, loss = | 0.90 | 0.88 | 0.89 | |
| Multinomial naive Bayes (alpha = 1) | | 0.81 | 0.87 | 0.92 |
| Perceptron | 0.83 | 0.87 | 0.85 | 0.89 |
| Passive aggressive (loss = | 0.90 | | | |
| RSDD trained model | | 0.43 | 0.6 | 0.8 |
Comparative results on classifying PPD after removing depression-related words from the corpus.
| Method | Prec. | Rec. | F1 | Acc |
|---|---|---|---|---|
| Passive aggressive (loss = | 0.88 | | | |
| RSDD trained model | | 0.22 | 0.36 | 0.73 |
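The preprocessing step behind this last table, stripping depression-related vocabulary before training, can be sketched as a simple token filter; the term list below is a hypothetical stand-in for whichever list the paper actually used:

```python
# Hypothetical depression-related term list; the paper's actual list
# is not reproduced in this excerpt.
DEPRESSION_TERMS = {"depression", "depressed", "antidepressant", "diagnosis"}

def strip_depression_terms(text, terms=DEPRESSION_TERMS):
    """Drop any whitespace-delimited token matching the term list."""
    return " ".join(t for t in text.split() if t.lower() not in terms)

cleaned = strip_depression_terms("I was told my depression needs treatment")
```

Filtering like this tests whether the classifiers rely on explicit diagnosis vocabulary rather than more general linguistic signals, which is consistent with the performance drop between the two PPD tables.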