Max Pellert, Hannah Metzler, Michael Matzenberger, David Garcia.
Abstract
Measuring sentiment in social media text has become an important practice in studying emotions at the macroscopic level. However, this approach can suffer from methodological issues like sampling biases and measurement errors. To date, it has not been validated whether social media sentiment can actually measure the temporal dynamics of mood and emotions aggregated at the level of communities. We ran a large-scale survey at an online newspaper to gather daily mood self-reports from its users, and compared these with aggregated results of sentiment analysis of user discussions. We find strong correlations between text analysis results and levels of self-reported mood, as well as between inter-day changes of both measurements. We replicate these results using sentiment data from Twitter. We show that a combination of supervised text analysis methods based on novel deep learning architectures and unsupervised dictionary-based methods has high agreement with the time series of aggregated mood measured with self-reports. Our findings indicate that macro-level dynamics of mood expressed on an online platform can be tracked with social media text, especially in situations of high mood variability.
Year: 2022 PMID: 35788626 PMCID: PMC9253324 DOI: 10.1038/s41598-022-14579-y
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. (A) Time series of the daily percentage of positive mood reported in the survey and the aggregated sentiment of user-generated text on derstandard.at. The shaded blue area corresponds to 95% bootstrapped confidence intervals. (B) Scatterplot of text sentiment and survey responses with regression line. (C) Scatterplot of the daily changes in both text sentiment and survey responses compared to the previous day, with regression line.
Figure 2. (A) Time series of the daily percentage of positive mood reported in the survey and the aggregated sentiment of user-generated text on Twitter in Austria. The shaded blue area corresponds to 95% bootstrapped confidence intervals. (B) Scatterplot of text sentiment and survey responses with regression line. (C) Scatterplot of the daily changes in both text sentiment and survey responses compared to the previous day, with regression line.
Correlation of positive mood in the survey with text sentiment measures on both platforms (Der Standard and Twitter).
| | Der Standard (no shift) | Twitter (shift 1) | Twitter (no shift) |
|---|---|---|---|
| LIWC+GS | 0.93 [0.82, 0.97] | 0.90 [0.75, 0.96] | 0.71 [0.39, 0.88] |
| LIWC | 0.74 [0.44, 0.89] | 0.85 [0.65, 0.94] | 0.66 [0.31, 0.85] |
| LIWC pos | 0.81 [0.56, 0.92] | 0.80 [0.56, 0.92] | 0.60 [0.22, 0.83] |
| LIWC neg | 0.03 [−0.42, 0.46] | −0.74 [−0.89, −0.43] | −0.63 [−0.84, −0.26] |
| GS | 0.91 [0.78, 0.96] | 0.91 [0.79, 0.96] | 0.73 [0.43, 0.89] |
| GS pos | 0.89 [0.75, 0.96] | 0.91 [0.79, 0.97] | 0.80 [0.54, 0.92] |
| GS neg | −0.57 [−0.81, −0.18] | −0.39 [−0.71, 0.06] | −0.17 [−0.57, 0.3] |
The table presents sentiment aggregates (positive minus negative emotions) as well as the positive and negative components separately. LIWC+GS denotes the average across both sentiment analysis methods; all other rows present the aggregate or a single component for one method. Shift 1 denotes a shift of one day, where survey values precede Twitter values.
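The quantities described in the table note can be illustrated with a minimal sketch. It assumes the reported correlations are Pearson coefficients, which this record does not state explicitly, and all function names (`pearson`, `aggregate`, `shifted_correlation`, `daily_changes`) are hypothetical helpers introduced here for illustration, not the authors' code.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def aggregate(pos, neg):
    """Daily sentiment aggregate: positive minus negative component."""
    return [p - n for p, n in zip(pos, neg)]


def shifted_correlation(survey, text, shift=0):
    """Correlate survey[t] with text[t + shift].

    shift=1 pairs each survey day with the text sentiment of the
    following day, i.e. survey values precede text values by one day.
    """
    if shift < 0:
        raise ValueError("shift must be non-negative")
    if shift == 0:
        return pearson(survey, text)
    return pearson(survey[:-shift], text[shift:])


def daily_changes(series):
    """Day-to-day differences relative to the previous day."""
    return [today - yesterday for yesterday, today in zip(series, series[1:])]
```

Under these assumptions, a "shift 1" entry would correspond to `shifted_correlation(survey, twitter_aggregate, shift=1)`, and the inter-day comparison to `pearson(daily_changes(survey), daily_changes(text_aggregate))`. The bracketed confidence intervals in the table are not reproduced by this sketch.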
Correlation of survey, aggregate Twitter sentiment and aggregate Der Standard sentiment with the number of new COVID-19 cases.
| | New cases |
|---|---|
| Twitter (aggregate, shift 1) | −0.60 [−0.82, −0.21] |
| Twitter (aggregate, no shift) | −0.57 [−0.81, −0.17] |
| Survey | −0.53 [−0.79, −0.12] |
| Der Standard (aggregate) | −0.33 [−0.68, 0.13] |
Figure S3 shows scatter plots for each of the variables and new COVID-19 cases.
Figure 3. Time series of the aggregate (LIWC+GS) sentiment measure for Twitter (blue) and Der Standard (red) covering the time period between 2020-09-15 and 2021-12-30.