Lin Li1, Ang Li1, Bibo Hao1, Zengda Guan1, Tingshao Zhu2.
Abstract
Because of its richness and availability, micro-blogging has become an ideal platform for conducting psychological research. In this paper, we proposed to predict active users' personality traits from their micro-blogging behaviors. A total of 547 active Chinese micro-blogging users participated in this study. Their personality traits were measured with the Big Five Inventory, and digital records of their micro-blogging behaviors were collected via web crawlers. After extracting 839 micro-blogging behavioral features, we first trained classification models using Support Vector Machine (SVM) to differentiate participants with high and low scores on each dimension of the Big Five Inventory [corrected]. The classification accuracy ranged from 84% to 92%. We also built regression models using the PaceRegression method to predict participants' scores on each dimension of the Big Five Inventory. The Pearson correlation coefficients between predicted and actual scores ranged from 0.48 to 0.54. The results indicate that active users' personality traits can be predicted from their micro-blogging behaviors.
Year: 2014 PMID: 24465462 PMCID: PMC3898945 DOI: 10.1371/journal.pone.0084997
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Mean Value and Standard Deviation of Scores on Five Dimensions of the Big-Five Personality Traits (n = 547).
| | Extraversion | Agreeableness | Conscientiousness | Neuroticism | Openness |
| Mean Value | 3.20 | 3.65 | 3.15 | 3.08 | 3.61 |
| Standard Deviation | 0.66 | 0.56 | 0.60 | 0.65 | 0.56 |
Notes. Each dimension of personality is scored from 1.00 to 5.00.
Details of APIs Provided by Sina Weibo.
| Categories | Description |
| Users/show | detailed information of a user's profile |
| Blog/user_timeline | list of a user's micro-blog updates |
| Trends | list of the trending topics a user has selected |
| Tag | list of the tags a user has selected |
| Friendships/friends | detailed information of friends whom a user follows |
| Friendships/friends/ids | list of registration IDs of friends whom a user follows |
| Friendships/followers/ids | list of registration IDs of a user's followers |
| Friendships/friends/bilateral | detailed information of a user's mutual friends |
Figure 1. The Distribution of Users' Total Number of Micro-Blogs Updates (n = 99,925,821).
Figure 2. The Distribution of Users' Average Count of Micro-Blogs Updates per Day (n = 5,807,999).
Figure 3. User Interface of a Weibo-Based Application Named “XinLiDiTu”.
Figure 4. Procedure of this Study.
Number of High-Scoring and Low-Scoring Participants on Each Dimension of the Big-Five Personality Traits (n = 547).
| | Agreeableness | Conscientiousness | Extraversion | Neuroticism | Openness |
| High-Scoring Group | 94 | 86 | 96 | 97 | 86 |
| Low-Scoring Group | 78 | 102 | 95 | 85 | 81 |
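This page does not state how the high- and low-scoring groups were formed. The group sizes (roughly 80 to 100 of 547 participants per group) are consistent with cutoffs about one standard deviation from the mean, so the sketch below assumes that rule purely for illustration. The function name `split_high_low`, the parameter `k`, and the sample scores are hypothetical.

```python
from statistics import mean, pstdev


def split_high_low(scores, k=1.0):
    """Label each score 'high', 'low', or None using mean +/- k * SD cutoffs.

    Assumption: the mean +/- 1 SD rule is illustrative only; the source
    page does not say how the two groups were defined.
    """
    m = mean(scores)
    sd = pstdev(scores)
    labels = []
    for s in scores:
        if s > m + k * sd:
            labels.append("high")
        elif s < m - k * sd:
            labels.append("low")
        else:
            labels.append(None)  # middle scorers are left out of both groups
    return labels


# Hypothetical scores on the 1.00-5.00 scale used by the inventory.
sample = [1.5, 2.9, 3.0, 3.1, 3.2, 3.3, 3.4, 4.9]
labels = split_high_low(sample)
```

Under this assumed rule, only participants labeled "high" or "low" would be passed to a binary classifier; everyone in between is excluded.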
Figure 5. Exporting Time Series Data from Feature Matrix.
Performance of Selected Features in PaceRegression Models (n = 547).
| | Optimal Prediction Period | Number of Selected Features | Adjusted R-Square |
| Agreeableness | 74-day | 31 | 0.22 |
| Conscientiousness | 97-day | 35 | 0.29 |
| Extraversion | 51-day | 25 | 0.26 |
| Neuroticism | 74-day | 30 | 0.26 |
| Openness | 74-day | 26 | 0.23 |
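Adjusted R-square discounts a model's R-square for the number of predictors it uses. The sketch below applies the standard formula, taking the table's n = 547 as the sample size and the "Number of Selected Features" column as the predictor count `p`. The page does not report unadjusted R-square values, so the input of 0.30 is hypothetical, as is the function name.

```python
def adjusted_r_square(r_square, n, p):
    """Standard adjusted R^2: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_square) * (n - 1) / (n - p - 1)


# Hypothetical unadjusted R^2 of 0.30, combined with the Conscientiousness
# row's n = 547 participants and p = 35 selected features.
adj = adjusted_r_square(0.30, n=547, p=35)
```

With several hundred participants and a few dozen features, the penalty is modest, which is why the adjusted value stays close to the unadjusted one.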
Figure 6. Results of Model Evaluation.
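The abstract evaluates the regression models by the Pearson correlation between predicted and actual scores. A minimal sketch of that metric follows, written with the standard library only; the function name `pearson_r` and the example scores are hypothetical, and this is not the authors' evaluation code.

```python
from math import sqrt


def pearson_r(predicted, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    # Sum of cross-deviations and sums of squared deviations from the means.
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0 or var_a == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_p * var_a)


# Hypothetical predicted and questionnaire-measured scores (1.00-5.00 scale).
predicted = [3.0, 3.2, 3.5, 3.1]
actual = [2.8, 3.4, 3.9, 3.0]
r = pearson_r(predicted, actual)
```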
PaceRegression Coefficients of Important Features for Predicting Personality Dimensions (n = 547).
| Extraversion | Agreeableness | Conscientiousness | Neuroticism | Openness |
| Feature 1 (β = 0.7551) | Feature 5 (β = −0.2434) | Feature 10 (β = 0.9968) | Feature 13 (β = −1.1922) | Feature 1 (β = 1.0274) |
| Feature 2 (β = 0.6704) | Feature 6 (β = −0.4580) | Feature 11 (β = −0.6978) | Feature 14 (β = −0.6246) | Feature 19 (β = 0.0007) |
| Feature 3 (β = 0.0610) | Feature 7 (β = 0.1997) | Feature 12 (β = −0.0075) | Feature 5 (β = 0.3756) | Feature 20 (β = 0.1189) |
| Feature 4 (β = 0.6846) | Feature 8 (β = −0.0469) | | Feature 15 (β = 0.5821) | Feature 21 (β = 0.3794) |
| | Feature 9 (β = 0.2432) | | Feature 16 (β = −0.1727) | |
| | | | Feature 17 (β = −1.4693) | |
| | | | Feature 18 (β = 1.2599) | |
Notes. Hours are coded 1–24, running from 6:00 a.m. to the next 6:00 a.m. Days of the week are coded 1–7, running from Monday to Sunday.
Feature 1 = Having certified users among mutual friends (yes = 1, no = 0).
Feature 2 = Number of friends whom a user follows.
Feature 3 = Ordinal position, within the observation period, of the date on which a user posted the most positive emoticons.
Feature 4 = Standard deviation of the hour in each 24-hour period at which a user forwards the first micro-blog whose combined number of comments and forwards exceeds 5.
Feature 5 = The hour in a 24-hour period at which a user most often sends @ mentions to friends whom the user follows.
Feature 6 = The day of the week on which a user most often forwards micro-blogs posted by friends whom the user follows.
Feature 7 = Standard deviation of the hour in each 24-hour period at which a user forwards micro-blogs posted by friends whom the user follows.
Feature 8 = Ordinal position, within the observation period, of the date on which a user forwarded the most micro-blogs posted by information-purpose apps.
Feature 9 = The hour in a 24-hour period at which a user most often forwards micro-blogs posted by organization accounts.
Feature 10 = Also having a registered Sina Blogging account (yes = 1, no = 0).
Feature 11 = Standard deviation of the daily number of negative emoticons posted.
Feature 12 = Sum of the hours in each 24-hour period at which a user posts the first micro-blog containing @ mentions.
Feature 13 = Use of first-person pronoun subjects in the self-statement (yes = 1, no = 0).
Feature 14 = Number of trending topics of interest to a user that are shared by over 10,000 users.
Feature 15 = The day of the week on which a user most often posts emoticons.
Feature 16 = The hour in a 24-hour period at which a user most often forwards micro-blogs posted by business-purpose apps.
Feature 17 = Standard deviation of the daily number of forwarded micro-blogs posted by website accounts.
Feature 18 = The hour in a 24-hour period at which a user most often posts original micro-blogs of maximum content length.
Feature 19 = Number of micro-blogs a user has saved as favorites.
Feature 20 = The hour in a 24-hour period at which a user most often uses business-purpose apps.
Feature 21 = The hour in a 24-hour period at which a user most often posts positive emoticons.
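Many of the features in the notes depend on an hour index of 1–24 that starts at 6:00 a.m. and a weekday index of 1–7 that starts on Monday. The sketch below shows one way to derive both from a timestamp. It assumes that index 1 covers 6:00 to 6:59 a.m., which the notes do not spell out, and the names `shifted_hour`, `weekday_index`, and `usual_hour` are hypothetical.

```python
from collections import Counter
from datetime import datetime


def shifted_hour(ts):
    """Hour index 1-24 for a day that starts at 6:00 a.m.

    Assumption: 1 means 6:00-6:59 a.m. and 24 means 5:00-5:59 a.m.
    """
    return (ts.hour - 6) % 24 + 1


def weekday_index(ts):
    """Day of the week, 1 = Monday through 7 = Sunday."""
    return ts.isoweekday()


def usual_hour(timestamps):
    """Most frequent shifted hour among a user's actions (ties not resolved)."""
    counts = Counter(shifted_hour(ts) for ts in timestamps)
    return counts.most_common(1)[0][0]


# Hypothetical posting times for one user.
posts = [
    datetime(2014, 1, 6, 22, 15),
    datetime(2014, 1, 7, 22, 40),
    datetime(2014, 1, 8, 9, 5),
]
hour = usual_hour(posts)
```

This sketch does not decide whether an action between midnight and 6:00 a.m. counts toward the previous calendar day when computing the weekday; the notes leave that open.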