Yong Li, Mengsi Cai, Shuo Qin, Xin Lu.
Abstract
BACKGROUND: A large body of evidence indicates an association between depression and HIV risk among men who have sex with men (MSM), but traditional questionnaire-based methods are limited in their ability to monitor depressive emotions promptly in large samples. With the development of social media and machine learning techniques, MSM depression can be monitored online in an easy-to-use manner. We therefore adopt a machine learning algorithm for MSM depressive emotion detection and behavior analysis using online social networking data.
Keywords: Blued; Twitter; behavior analysis; depressive emotion detection; men who have sex with men
Year: 2020 PMID: 32922323 PMCID: PMC7456911 DOI: 10.3389/fpsyt.2020.00830
Source DB: PubMed Journal: Front Psychiatry ISSN: 1664-0640 Impact factor: 4.157
Description of the Blued and Twitter data sets.
| Database | Data set | Number of users | Number of posts | Time range of posts |
|---|---|---|---|---|
| Blued | B_D1 | 346 | 19,457 | Jan 2012 to Mar 2019 |
| | B_D2 | 8,552 | 155,138 | Feb 2019 to Mar 2019 |
| | B_D3 | 2,627 | 63,332 | Jan 2012 to Mar 2019 |
| | Total | 11,525 | 237,927 | Two-month time window |
| Twitter | T_D1 | 2,353 | 480,631 | Jan 2009 to Dec 2016 |
| | T_D2 | 4,990 | 3,956,077 | Dec 2016 |
| | T_D3 | 43,668 | 37,024,339 | Dec 2016 |
| | Total | 51,011 | 41,461,047 | One-month time window |
Features used in the XGBoost model.
| Level | Group | Feature name | Definition |
|---|---|---|---|
| User-level | User Profile Features | followings | The total number of users whom I followed. |
| | | followers | The total number of users who followed me. |
| | | listNum | The total number of interest groups that I participated in. |
| Post-level | Social Interaction Features | favorNum | The average number of times each post was favored by others. |
| | | mentionNum | The average number of times I was mentioned in others' posts (e.g. @me). |
| | | repostNum | The average number of times each post was reposted/retweeted by others. |
| | | topicNum | The average number of topic tags contained in each post (e.g. #fun# on Blued or #fun on Twitter). |
| | | postNum | The number of my posts in the data sets. |
| | | timeDist | The average number of my posts during 24 h. |
| | Emotion Features | posWordNum | The average number of positive words in each post. |
| | | negWordNum | The average number of negative words in each post. |
| | | emoNum | The total number of emoticons in my posts. |
| | | posEmoNum | The average number of positive emoticons contained in my posts. |
| | | negEmoNum | The average number of negative emoticons contained in my posts. |
| | Linguistic Features | LDATopicWords | The top 15 LDA topic words extracted from my posts. |
| | | antidepressNum | The average number of antidepressant drug names in each post. |
| | | depressWordNum | The average number of times the character string "depress" appeared in each post (named depressive word). |
| | | picNum | The average number of pictures in each post. |
| | | videoNum | The average number of videos in each post. |
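The post-level definitions above are mostly per-post averages, so a few of them can be sketched in a handful of lines. The following is a minimal illustration only, not the authors' pipeline: the record layout (`text`, `pictures`, `videos`) and the helper name `post_level_features` are hypothetical, and the tag pattern covers only the Twitter-style `#fun` form.

```python
import re
from statistics import mean

# Hypothetical post records for one user; the field names are assumptions
# made for this illustration and do not come from the source.
posts = [
    {"text": "Feeling depressed again #sad", "pictures": 1, "videos": 0},
    {"text": "Great day out! #fun #sun", "pictures": 2, "videos": 1},
]

def post_level_features(posts):
    """Average a few per-post counts over one user's posts."""
    return {
        # Number of posts by this user in the data set.
        "postNum": len(posts),
        # Occurrences of the character string "depress" per post.
        "depressWordNum": mean(p["text"].lower().count("depress") for p in posts),
        # Twitter-style topic tags (#fun); Blued-style tags (#fun#) would
        # need a different pattern.
        "topicNum": mean(len(re.findall(r"#\w+", p["text"])) for p in posts),
        "picNum": mean(p["pictures"] for p in posts),
        "videoNum": mean(p["videos"] for p in posts),
    }

features = post_level_features(posts)
```

Features that depend on other users' activity (for example `favorNum`, `mentionNum`, and `repostNum`) would need platform metadata rather than the post text and are left out of this sketch.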
Parameters of the XGBoost model for the Blued and Twitter data sets.
| Model | Parameter | Blued | Twitter |
|---|---|---|---|
| XGBoost | Estimators | 50 | 200 |
| | Maximum tree depth | 4 | 4 |
| | Learning rate | 0.06 | 0.06 |
| | Minimum child weight | 2 | 1 |
| | Subsample | 0.45 | 0.8 |
| | Colsample_bytree | 0.65 | 0.8 |
| | Regularization alpha | 1 | 0.001 |
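For readers who want to reuse these settings, the table can be written down as keyword arguments for the scikit-learn wrapper of the `xgboost` package. This is a sketch under two assumptions: that the second value column holds the Twitter settings, and that the table rows map onto the `XGBClassifier` arguments named below. The helper `make_classifier` is hypothetical, and any parameter not listed in the table is left at the library default.

```python
# Hyperparameters from the table, keyed by the argument names used by
# xgboost.XGBClassifier (the mapping of rows to arguments is an assumption).
XGB_PARAMS = {
    "Blued": dict(
        n_estimators=50, max_depth=4, learning_rate=0.06,
        min_child_weight=2, subsample=0.45, colsample_bytree=0.65,
        reg_alpha=1,
    ),
    "Twitter": dict(
        n_estimators=200, max_depth=4, learning_rate=0.06,
        min_child_weight=1, subsample=0.8, colsample_bytree=0.8,
        reg_alpha=0.001,
    ),
}

def make_classifier(data_set):
    """Build an untrained classifier for 'Blued' or 'Twitter'."""
    # Imported lazily so the parameter table can be inspected
    # without xgboost installed.
    from xgboost import XGBClassifier
    return XGBClassifier(**XGB_PARAMS[data_set])
```

Calling `make_classifier("Blued").fit(X, y)` would then train on a feature matrix `X` and binary labels `y` prepared elsewhere.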
Classification performance of the XGBoost algorithm on the Blued and Twitter data sets.
| Data set | Accuracy | Recall | Precision | F1 score |
|---|---|---|---|---|
| Blued | 0.9940 | 0.9648 | 0.9563 | 0.9602 |
| Twitter | 0.9671 | 0.9591 | 0.9649 | 0.9619 |
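The four reported metrics follow the usual confusion-matrix definitions, which can be sketched as below. The confusion counts are hypothetical and serve only to exercise the formulas; the source does not report them. The last two lines recompute F1 as the harmonic mean of the reported precision and recall, which lands within 0.001 of the tabulated F1 for both data sets (small gaps of this kind would be expected if the scores were averaged over folds, although the source does not say).

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, recall, precision, and F1 from binary confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}

def f1_from(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, for illustration only.
example = classification_metrics(tp=90, fp=10, tn=880, fn=20)

# Consistency check against the precision and recall in the table.
blued_f1 = f1_from(precision=0.9563, recall=0.9648)
twitter_f1 = f1_from(precision=0.9649, recall=0.9591)
```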
Figure 1. Performance comparison of different feature combinations in the Blued data sets.
Figure 2. Feature importance of the XGBoost algorithm in the Blued and Twitter data sets.
Figure 3. Online behavior characteristics of depressed and non-depressed users for the MSM population on Blued (top row) and the non-MSM population on Twitter (bottom row). (A, D): active time comparison; (B, E): posting habits per post with regard to five representative behaviors; (C, F): distributions of user profile features.
Figure 4. Salient LDA topic words for (A) depressed MSM users, (B) depressed non-MSM users, (C) non-depressed MSM users, and (D) non-depressed non-MSM users. The larger the word, the more frequently it appears in the posts.
Mean feature values for the Blued and Twitter data sets.
| Feature | MSM users (Blued): Depressed | MSM users (Blued): Non-depressed | MSM users (Blued): Total | Non-MSM users (Twitter): Depressed | Non-MSM users (Twitter): Non-depressed | Non-MSM users (Twitter): Total |
|---|---|---|---|---|---|---|
| followings | 254.491 | 306.236 | 298.2 | 676.427 | 748.8731 | 710.384 |
| followers | 514.511 | 550.949 | 545.3 | 888.671 | 1237.16 | 1052.01 |
| listNum | 1.9 | 2.758 | 2.6 | 35.313 | 52.8384 | 43.53 |
| postNum | 28.675 | 19.167 | 20.6 | 392.887 | 1214.16 | 777.834 |
| repostNum | 0.0227 | 0.0798 | 0.07 | 1201.23 | 1558.16 | 1368.53 |
| mentionNum | 0.0004 | 0.0028 | 0.002 | 0.605 | 0.6466 | 0.624 |
| topicNum | 0.0394 | 0.022 | 0.025 | 0.265 | 0.184 | 0.227 |
| picNum | 0.966 | 1.371 | 1.31 | 0.17 | 0.21 | 0.189 |
| videoNum | 0.0129 | 0.0921 | 0.08 | 0.0155 | 0.024 | 0.019 |
| antidepressNum | 0.0003 | 3.453 × 10⁻⁵ | 0.00007 | 0.013 | 0.0122 | 0.0126 |
| depressWordNum | 0.225 | 0.0019 | 0.037 | 0.0447 | 0.0039 | 0.025 |
| emoNum | 0.098 | 0.112 | 0.109 | 0.0916 | 0.112 | 0.101 |
| posEmoNum | 0.057 | 0.084 | 0.079 | 0.0533 | 0.0655 | 0.059 |
| negEmoNum | 0.039 | 0.026 | 0.028 | 0.026 | 0.0325 | 0.029 |
| posWordNum | 2.368 | 1.41 | 1.559 | 0.434 | 0.339 | 0.39 |
| negWordNum | 2.079 | 0.992 | 1.161 | 0.339 | 0.266 | 0.305 |
| favorNum | 7.618 | 23.622 | 21.135 | 1.0065 | 0.976 | 0.992 |
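One way to read the Total columns is as user-weighted means of the depressed and non-depressed columns. The source does not state this, so it is an assumption; under it, the share of depressed users implied by any row can be recovered by solving total = p × depressed + (1 − p) × non-depressed for p. The helper name `implied_depressed_share` is hypothetical.

```python
def implied_depressed_share(depressed_mean, non_depressed_mean, total_mean):
    """Solve total = p*depressed + (1 - p)*non_depressed for the share p."""
    return (non_depressed_mean - total_mean) / (non_depressed_mean - depressed_mean)

# "followings" row of the table.
blued_share = implied_depressed_share(254.491, 306.236, 298.2)
twitter_share = implied_depressed_share(676.427, 748.8731, 710.384)

# "followers" row, as a cross-check that the same share comes out.
blued_share_followers = implied_depressed_share(514.511, 550.949, 545.3)
twitter_share_followers = implied_depressed_share(888.671, 1237.16, 1052.01)
```

Under this reading both rows point to roughly 16% depressed users in the Blued sample and roughly 53% in the Twitter sample; rows whose totals are rounded to one or two digits give noisier estimates.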