Identifying long-term periodic cycles and memories of collective emotion in online social media.

Yukie Sano¹, Hideki Takayasu²,³, Shlomo Havlin⁴, Misako Takayasu³.

Abstract

Collective emotion has traditionally been evaluated by questionnaire surveys of a limited number of people. Recently, large volumes of written text on the Internet have become available for analyzing collective emotion at very large scales. Although the short-term relationship between collective emotion and real social phenomena has been widely studied, the long-term dynamics of collective emotion have not been studied so far because of the lack of long, persistent data sets. In this study, we extracted collective emotion over a 10-year period from 3.6 billion Japanese blog articles. First, we find that collective emotion shows clear periodic cycles, i.e., weekly and seasonal behaviors, accompanied by pulses caused by natural disasters. For example, April is characterized by high Tension, probably because the school year starts in April in Japan. We also identified long-term memory in collective emotion, characterized by a power-law decay of the autocorrelation function over several months.

Year: 2019 PMID: 30897174 PMCID: PMC6428299 DOI: 10.1371/journal.pone.0213843

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Information and Communication Technology enables large amounts of data on human behavior to be collected within milliseconds, opening a novel research area of data-driven social science [1-3]. In particular, blogs archive personal opinions and feelings that cannot be obtained directly from other sources. In the past, only a few celebrities were able to express their opinions and feelings, typically in book or magazine form. Nowadays, more and more people write articles and share content on the Internet, not only for archival purposes but also to share them in real time. Since the Internet population has already exceeded three billion and many people post their own texts online, various studies of Web-based phenomena have been conducted since the beginning of the twenty-first century. Diffusion phenomena on microblogging platforms such as Twitter have been well studied in various languages [4-6]. Bursty behaviors [7] and collective attention [8] have been quantified in the Japanese Twitter space. Furthermore, studies on predicting real-world phenomena from Internet data are growing rapidly, e.g., stock prices [9, 10], movie box office revenue [11, 12], political polls [13], public health including depressive mood [14, 15], and macroeconomic indices [16]. Studies of collective emotion on the Internet are also growing rapidly. Pioneering work measuring collective emotion on Twitter in the UK has been conducted since 2009 [17]. The diffusion of positive and negative emotions on Twitter has been investigated [18]. In one study, circadian rhythms of positive and negative moods on Twitter were measured for two years [19], and in another, emotional contagion in Facebook posts was reported in 2014 [20]. Collective emotion and methods for detecting it are discussed in detail in [21]. Collective emotion and its relation to real social phenomena have also been studied [9, 16, 22, 23].
Gilbert and Karahalios constructed an ‘Anxiety Index’ using blog data from three periods in 2008 and compared it with S&P stock market prices. They found that a one-sigma increase in the Anxiety Index corresponds to a 0.4% downturn in S&P prices [9]. Bollen et al. measured emotional mood on Twitter for nine months in 2008 and compared it with the Dow Jones Industrial Average. They found that adding the emotion of calm increased prediction accuracy [22]. A United Nations project found that increases in the emotion of confusion occurred about three months ahead of increases in the unemployment rate in Ireland [16]. Furthermore, collective emotion has been found to have considerable influence on ideology [23], and sometimes on the spread of misinformation. Extracting and tracing collective emotion on the Internet therefore seems essential for building a safe and secure society. However, most earlier studies focused on collective emotion over relatively short periods, i.e., three years or less. This is because social media penetrated our daily lives only about 10 years ago, e.g., Facebook officially launched in 2006, Twitter began to spread in early 2008, and Instagram was not released until 2010. Therefore, only a few studies on the long-term dynamics of collective emotion have been conducted [24, 25], and in particular, the possibility of long-term memory in collective emotion has attracted very little attention so far. In the present study, we analyzed 3.6 billion blog articles posted during a 10-year period in Japan, from 2006 to 2016. To the best of our knowledge, this 10-year period is the longest for which emotions have been extracted from the Internet. Our pre-built emotion dictionary was carefully tested with regard to whether the frequency of each listed word was adequate and whether the listed words were actually associated with the emotions of the blog authors.
Our paper is structured as follows. First, in Materials and methods, we define collective emotion as used here and compare it with the definitions used in earlier studies. We also introduce our data and statistical procedures in that section. We then present our results on the accumulation of collective emotion from blogs. Next, we show the existence of periodic cycles in collective emotion. After removing these periodic cycles, we observe sharp spikes attributed to external events such as natural disasters. Finally, we discuss the long-term memory of collective emotion, which we found using basic statistical methods.

Materials and methods

To quantify collective emotion over the long term, we examined the Japanese blog space, which has been widely used since around 2006. Unlike Twitter, which is currently in widespread use, blogs generally have no character limit and can include long texts. For long texts, dictionary-based methods have been found to classify emotions robustly and accurately [26]. Therefore, we applied a dictionary-based method to 10 years of blog data to determine long-term collective emotion.

Blog data

We used data from the Japanese blog space between November 1, 2006 and October 31, 2016, retrieved on December 1, 2016 from a fee-charging service called ‘Kuchikomi@kakaricho’ (https://kakaricho.jp/: Accessed August 24, 2018). Via its API, this service provides the daily number of blog articles that include any given target word at least once, with a built-in spam filter. Here we set the spam filter to a high level. As of October 2016, the full database contained more than 3.6 billion blog articles from 43 million independent accounts. The database contains public blog articles posted on major blogging platforms, tweets on Twitter, and posts on a textboard system in Japanese. Here we use only public blog articles, in accordance with the terms and conditions of the service. In principle, the database can be used by anyone who contracts with the company (https://www.hottolink.co.jp/: Accessed August 24, 2018). In fact, various studies have been conducted based on the database so far [27-29]. Owing to the system specification, if one blog article contained the same word multiple times, it was counted once. On the other hand, if one blog article contained two different words, it was counted twice, once for each word. Since we mainly used word frequencies in the blog space via the API, we could not access personally identifying information. We checked several publicly readable blog articles throughout our study, but they were anonymized, and we could not identify the authors.
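
The counting rule in the paragraph above can be illustrated with a small sketch. It is not the service's API, which is not described here: the function name `count_articles_per_word` and the toy articles are hypothetical, and word matching is assumed to be a simple substring test.

```python
def count_articles_per_word(articles, target_words):
    """Return, for each target word, the number of articles containing it.

    An article that repeats a word is counted once for that word; an
    article that contains two different target words adds one to each.
    """
    counts = {word: 0 for word in target_words}
    for text in articles:
        for word in target_words:
            if word in text:  # presence test, so repeats are not double counted
                counts[word] += 1
    return counts


# Hypothetical toy articles (not taken from the real database).
articles = [
    "tired tired tired after the trip",  # 'tired' three times, counted once
    "tired and sad today",               # two target words, one count each
    "nothing special",
]
counts = count_articles_per_word(articles, ["tired", "sad"])
total_hits = sum(counts.values())
```

Under this rule, the summed count for an emotion can exceed the number of distinct articles, because one article may contribute to several dictionary words.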

POMS and emotion dictionary

To extract collective emotion from the Internet, one popular method is to categorize articles as expressing either positive or negative emotion, and then to extend these categories into more dimensions with more complex emotions [25]. The aim of the present study is to analyze long-term periodic cycles and memories of collective emotion extracted from blog texts on the Internet. Here we categorize emotions into six dimensions based on the well-established psychological literature [30]. Because some emotions are difficult to categorize as either positive or negative, e.g., feelings of fatigue may be classified as positive or negative depending on the context, multidimensional emotions may reveal interesting properties of collective emotion from new perspectives. Multidimensional emotions have historically been extracted by psychologists using questionnaires on relatively small groups [31]. In self-reported questionnaire surveys, participants passively answer questions. In recent years, attempts have been made to extract emotions from online texts, which are written actively and spontaneously, based on words contained in traditional question items [32]. There are various ways to extract multidimensional emotions. The Affective Norms for English Words (ANEW) is an English emotion dictionary that contains about 1,000 words [33]. ANEW has three semantic differentials, namely, good-bad, active-passive, and strong-weak. Dodds and Danforth quantified happiness in songs, blogs, and a State of the Union address using ANEW words [34]. The Positive and Negative Affect Schedule (PANAS) is also a well-established English psychometric scale that consists of two 10-item mood scales [35], including fear (negative) and joviality (positive). Recently, PANAS was expanded to extract emotions from Twitter [36]. Unlike ANEW, PANAS has been officially translated into a number of languages, including Russian and German.
However, the Japanese version of PANAS has only been validated within a limited scope. Here we study emotion based on the Profile of Mood States (POMS), a psychological rating scale [30], and we built an original emotion dictionary based on the Japanese version of POMS. POMS was originally developed to measure the effectiveness of pharmacological therapy for veterans in the U.S. POMS measures temporal mood states based on answers to 65 short questions, identifying the following six emotions: Tension-Anxiety (Tension), Depression-Dejection (Depression), Anger-Hostility (Anger), Vigor, Fatigue, and Confusion. In the following, the POMS emotions are referred to by the names given in parentheses. The POMS questions are attributed to the six emotions as follows: 9 items for Tension, 15 for Depression, 12 for Anger, 8 for Vigor, 7 for Fatigue, and 7 for Confusion. Participants answer each question with a score from zero (fully disagree) to four (fully agree). Note that there are 2 opposite (reverse-scored) question items in Tension and Confusion. For example, the question ‘feel relaxed’ measures Tension through low scores. These 2 opposite questions and 7 dummy questions were excluded from our procedure. The original purpose of POMS is to measure the temporal emotions of individuals. However, since many English POMS questions are simple, including items such as ‘sad’ and ‘angry,’ several researchers have recently used it to determine collective emotion on the Internet. Bollen et al. used POMS to extract emotions from Twitter over a period of about one year [32]. They found that POMS mood reflected social and economic phenomena such as Thanksgiving Day and elections. POMS was officially translated into Japanese in 1994 by a Japanese psychologist [37].
Since then, it has been used for various purposes, such as measuring the condition of athletes and conducting mental health checks in firms; therefore, POMS is considered reliable for Japanese as well. The Japanese version of POMS has also been used to determine collective emotion on the Japanese Twitter space over 5 months, and it was found to be related to real social phenomena such as Christmas time [38]. Here we extracted words attributed to POMS emotions to build our emotion dictionary. An overview of our dictionary-building procedure is as follows (details are described in S1 Appendix):
1. Parse one word that best expresses the emotion from each POMS question.
2. Add orthographic variants and synonyms for each parsed word.
3. Remove very low and very high frequency words.
When building the emotion dictionary, we adjusted the number of listed words so that specific words would not become dominant. As a result of this careful procedure, the number and frequency of words were comparable across emotions. Eventually, 21 words for Tension, 25 for Depression, 25 for Anger, 20 for Vigor, 22 for Fatigue, and 35 for Confusion were included in our emotion dictionary. Our original emotion dictionary and each emotion time series can be found in S2 Appendix.

Collective emotion time series

In previous literature, Bollen et al. [32] produced collective emotion by averaging the mood vectors of individual tweets, each limited to 140 characters. However, for blogs, which have no limit on the number of characters, the same method is difficult to implement. Therefore, to keep the definition simple and clear, we defined collective emotion by aggregating the time series of the frequencies of the words listed in our dictionary. We first generate the time series for word i that belongs to emotion k at day t, $x_i^{(k)}(t)$, and define the time series of emotion k as follows:

$$x^{(k)}(t) = \sum_{i=1}^{M} x_i^{(k)}(t), \qquad (1)$$

where M is the number of words that belong to emotion k. Because the appearance of a single word in the emotion dictionary can easily fluctuate due to news and external factors, summing over several words reduces the fluctuation [28]. Next, to determine each of the emotional dynamics, we calculate each emotion’s time series Z(t). First, we calculated the normalized raw dynamics as follows:

$$y^{(k)}(t) = \frac{x^{(k)}(t)}{X(t)}, \qquad (2)$$

where X(t) is the total number of blog articles posted at day t. Then, we standardized it as follows:

$$Z^{(k)}(t) = \frac{y^{(k)}(t) - \bar{y}^{(k)}}{\sigma^{(k)}}, \qquad (3)$$

where $\bar{y}^{(k)}$ and $\sigma^{(k)}$ are the temporal mean and temporal standard deviation of $y^{(k)}(t)$ over the whole period. The standardized emotional dynamics, together with the standardized total number of blog articles X(t), which does not depend on the dictionary words, are displayed on a monthly scale in Fig 1.
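
The three steps described above (summing word counts, normalizing by the total number of articles, and standardizing) can be sketched as follows. This is a minimal illustration with made-up numbers, not the authors' code; the function name `emotion_series` is hypothetical, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def emotion_series(word_counts, total_articles):
    """Build the summed, normalized and standardized series for one emotion.

    word_counts    : list of daily count lists, one list per dictionary word
    total_articles : daily total number of blog articles, X(t)
    """
    days = len(total_articles)
    # Sum over the words of the emotion for each day.
    x = [sum(counts[t] for counts in word_counts) for t in range(days)]
    # Normalize by the total number of articles posted that day.
    y = [x[t] / total_articles[t] for t in range(days)]
    # Standardize with the temporal mean and standard deviation
    # (population standard deviation is an assumption of this sketch).
    mu, sigma = mean(y), pstdev(y)
    z = [(value - mu) / sigma for value in y]
    return x, y, z


# Hypothetical toy data: two dictionary words observed over four days.
word_counts = [[10, 20, 30, 20], [5, 5, 10, 10]]
total_articles = [1000, 1000, 2000, 1500]
x, y, z = emotion_series(word_counts, total_articles)
```

Dividing by X(t) removes the effect of the changing overall volume of blog posts, so that Z(t) reflects the share of articles containing emotion words rather than their raw number.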

Fig 1

Monthly changes of blogs over 10 years.

(A) Emotional dynamics Z(t) for Tension, Depression, Anger, Vigor, Fatigue, and Confusion, shown from top to bottom. (B) Standardized sum of all emotions (top) and total number of blogs X(t) (bottom).

Calculation of periodic cycles

We describe the periodic cycles of a time series y(t), {y(t); t = t0, t0 + 1, ⋯, t0 + L, ⋯, t0 + 2L, ⋯}, by its periodicity over the phases l = (0, 1, ⋯, L − 1). Thus, weekly periodicity is l = (Mon., Tue., ⋯, Sun.) with L = 7, yearly periodicity on a monthly scale is l = (Jan., Feb., ⋯, Dec.) with L = 12, and yearly periodicity on a daily scale is l = (1, 2, ⋯, 365) with L = 365. The periodicity of the m-th cycle, $p_m(l)$, is calculated as follows:

$$p_m(l) = \frac{y(t + l)}{\bar{y}_m}, \qquad (4)$$

where $t = t_0 + mL$ and $\bar{y}_m = \frac{1}{L}\sum_{l=0}^{L-1} y(t + l)$. Then, the averaged periodicity p(l) is

$$p(l) = \frac{1}{M}\sum_{m=0}^{M-1} p_m(l), \qquad (5)$$

where M is the total number of periodic cycles in the time series y(t). The standard deviation s(l) over the M cycles is

$$s(l) = \sqrt{\frac{1}{M}\sum_{m=0}^{M-1}\bigl(p_m(l) - p(l)\bigr)^2}. \qquad (6)$$

To exclude the periodic cycle, we simply divided y(t) = y(t0 + mL + l) by p(l).
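
The procedure above can be sketched in a few lines. The sketch assumes that each cycle is expressed as a ratio to that cycle's own mean (which matches the percentage rates reported later), that the series starts at the beginning of a cycle (t0 = 0), and that any trailing incomplete cycle is ignored; the function names are hypothetical.

```python
from statistics import mean, pstdev


def periodic_profile(y, L):
    """Return the averaged periodicity p(l) and its spread s(l) for period L."""
    # Split the series into complete cycles of length L.
    cycles = [y[start:start + L] for start in range(0, len(y) - L + 1, L)]
    # Ratio of each value to the mean of its own cycle.
    ratios = [[value / mean(cycle) for value in cycle] for cycle in cycles]
    p = [mean([r[l] for r in ratios]) for l in range(L)]
    s = [pstdev([r[l] for r in ratios]) for l in range(L)]
    return p, s


def remove_cycle(y, p):
    """Divide each value by the averaged periodicity of its phase."""
    L = len(p)
    return [value / p[t % L] for t, value in enumerate(y)]


# Hypothetical toy series: a period-3 pattern whose overall level grows.
pattern = [1.0, 2.0, 3.0]
y = [level * v for level in (1.0, 2.0, 4.0) for v in pattern]
p, s = periodic_profile(y, L=3)
flat = remove_cycle(y, p)
```

In this toy case every cycle has the same shape, so s(l) is zero and dividing by p(l) leaves only the slowly changing level. Because the correction is multiplicative, it is meaningful for a positive series such as the normalized frequencies, not for a zero-mean series.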

Autocorrelation and power spectral density

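The autocovariance function Cov(τ) of a time series z(t) is calculated as follows:

$$\mathrm{Cov}(\tau) = \langle (z(t) - \mu)(z(t + \tau) - \mu) \rangle, \qquad (7)$$

where μ is the temporal mean of z(t) and 〈⋅〉 is the ensemble mean. Then the autocorrelation function ρ(τ) is

$$\rho(\tau) = \frac{\mathrm{Cov}(\tau)}{\mathrm{Cov}(0)}. \qquad (8)$$

When a stationary time series has the long-term memory property, the sum of its autocorrelations diverges,

$$\sum_{\tau=0}^{\infty} \rho(\tau) = \infty. \qquad (9)$$

This occurs when $\rho(\tau) \sim \tau^{-\alpha}$ with α < 1, which is a clear sign of the long-term memory property. By the Wiener-Khinchin theorem, the power spectral density S(f) is the Fourier transform of the corresponding autocorrelation function ρ(τ).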
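
As a rough illustration of these two quantities, the sketch below estimates the autocorrelation function from a single series (replacing the ensemble mean by a time average, which is an assumption of this sketch) and computes a plain periodogram by direct Fourier summation. The periodogram is a simpler stand-in for a spectral estimate and is not the Welch procedure used in the Results; the function names and the toy series are hypothetical.

```python
import cmath


def autocorrelation(z, max_lag):
    """Estimate rho(tau) for tau = 0..max_lag using time averages."""
    n = len(z)
    mu = sum(z) / n
    d = [value - mu for value in z]

    def cov(tau):
        # Average of the lagged products over the n - tau available pairs.
        return sum(d[t] * d[t + tau] for t in range(n - tau)) / (n - tau)

    c0 = cov(0)
    return [cov(tau) / c0 for tau in range(max_lag + 1)]


def periodogram(z):
    """Return |DFT|^2 / n of the mean-removed series at frequencies k / n."""
    n = len(z)
    mu = sum(z) / n
    d = [value - mu for value in z]
    return [
        abs(sum(d[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


# Hypothetical toy series with an exact 7-day cycle, 20 weeks long.
z = [float(day % 7) for day in range(140)]
rho = autocorrelation(z, max_lag=14)
spectrum = periodogram(z)
peak_index = max(range(len(spectrum)), key=lambda k: spectrum[k])
```

For this toy series the autocorrelation returns to 1 at lags of 7 and 14 days, and the periodogram peaks at index 20, i.e., frequency 20/140 = 1/7 per day, which is the kind of weekly signature discussed for the raw emotional dynamics.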

Results

Fig 1A shows the monthly time series of each emotional dynamics Z(t) since November 2006, before removing periodic cycles. Confusion increased during the global financial crisis in 2008. Tension increased sharply after the 3.11 earthquake in 2011. Vigor turned upward, and Anger and Fatigue turned downward, in late 2012, when the Japanese government changed and the economic situation started to improve.

Periodic cycles

Weekly periodicity

Weekly (7-day) periodicities are observed for each of the six emotional dynamics Z(t). This is clearly indicated by the autocorrelation functions ρ(τ) of each emotional dynamics before excluding the periodic cycles, which show weekly periodic correlations, and by sharp peaks in the power spectral densities S(f) (shown later in Fig 4A and 4B). To further clarify this periodicity, we averaged the daily amounts of collective emotion, excluding the week of the 3.11 earthquake (March 9 to March 15, 2011) and the 6 days at the end of the data period in October 2016. The weekly periodicity p(l) is clearly seen in Fig 2A.

Fig 4

Autocorrelation functions ρ(τ) and power spectral density S(f) of collective emotion.

Fig 2

Weekly and yearly periodicities p(l).

(A) Weekly periodicities p(l) for each of the emotional dynamics, with error bars representing standard deviations s(l) over 520 weeks. Most emotions show differences between weekdays and weekends. (B)(C) Yearly periodicities p(l) for each of the emotional dynamics on a monthly scale (bold) and on a daily scale (dotted), with shaded areas representing standard deviations s(l) over nine years. Fatigue increases during summer (July and August), while Depression and Confusion increase slightly during winter (December and January).

It can be seen, for example, that Fatigue is higher on Mondays. By checking blog articles directly, we found examples of people going out on weekends and remaining tired until Monday. Depression also increases on Mondays, probably due to a lack of motivation for work and school. Tension increases on Fridays, probably because people are worried about the weekend weather. Somewhat similar weekly periodicities of collective emotion were observed in the Twitter space in the United Kingdom in 2011 [39] and in the United States between 2009 and 2010 [40]. The U.K. study did not clarify which emotions increased on which day of the week, but it found that joy showed the clearest periodic behavior and anger the least. The U.S. study found that Saturday had the highest average happiness and Tuesday the lowest. We cannot compare these results directly with ours. Note, however, that our results show that Anger has a weekly periodicity, decreasing on weekends and increasing on weekdays. This weekly cycle may correspond to the U.S. result for happiness.

Yearly periodicity

To test for possible yearly (12-month and 365-day) periodicities of collective emotion, we calculated the average amount of collective emotion for each of the 12 months and for each day of the year over the ten years. We excluded November 2010 to October 2011 because this span surrounds the 3.11 earthquake. Note that for calculating 365-day periodicities, we also excluded February 29 in the leap years 2008, 2012, and 2016. Fig 2B and 2C show the yearly periodicities p(l) for each emotion on a monthly scale, with shaded colored areas indicating the standard deviations s(l) (see Eq (6) in Materials and methods), and on a daily scale, with points indicating the major peaks listed in Table 1.

Table 1

Major dates in which emotion increased significantly every year.

Rates are calculated from the temporal average, see also Fig 2.

Date	Emotion	Rate (%)	Event
February 14	Tension	112.3±13.7	Valentine’s Day
April 6	Tension	111.2±11.6	Entrance ceremonies
April 7	Tension	113.0±12.4
April 8	Tension	112.4±10.6
May 7	Fatigue	127.1±8.9	After GW holidays
December 30	Confusion	111.9±9.6	New Year’s Eve
December 31	Depression	115.6±12.6
December 31	Confusion	122.7±8.5

By selectively collecting blog articles and reading their content, we suggest the following reasons for the yearly periodicities on a monthly scale. Fatigue increases in July and August, which are summer months in Japan; because people suffer from hot and humid weather during the Japanese summer, they tire easily. Because new school and fiscal years start every April in Japan, and many people start at new schools or workplaces, Tension is probably high in April for this reason. Depression and Confusion tend to increase slightly in winter, particularly in December and January, which might be caused by the short day length. On the other hand, we did not detect clear monthly trends in the other emotions, Anger and Vigor. In Table 1, we list the specific dates on which the amount of an emotion increased by more than 10% from the temporal average over the 10-year study period, after excluding weekly cycles and yearly cycles on a monthly scale. To extract dates that are systematically high every year, Table 1 shows only dates for which the standard deviation of the emotion rate is less than 15%. As expected, the listed dates correspond to typical annual events such as New Year’s Eve and Valentine’s Day. Fatigue tends to be higher after the end of consecutive holidays. For example, Fatigue increases after the Golden Week (GW) holidays, a series of consecutive national holidays every spring in Japan. Although the Fatigue rate does not exceed 110%, Fatigue is also higher after the New Year’s holidays (108.8% and 108.5% for January 5 and 6, respectively) and the traditional Japanese summer holidays (108.7% for August 17). Interestingly, Depression is slightly higher on the final day of the GW holidays (108.9% for May 6). This result suggests that people feel sad about the end of the holidays. It is also interesting to note that there are some dates on which emotions consistently decrease every year.
For example, January 1 is a special day on which all emotions except Confusion decrease to below 90%. Christmas Eve is also a special day on which all emotions except Depression decrease to below 90%. During the New Year’s holidays, GW holidays, and Christmas, Tension remains below 90%. These findings suggest that people are spending time in a relaxed way (details are in S1 Appendix). Yearly periodicities of collective emotion on Twitter in the U.K. have recently been investigated over a period of four years from 2010 to 2014, excluding 2012 [41]. In the U.K., anger and sadness peak in the winter months, and anxiety peaks in the autumn and spring. Our data did not show seasonal cycles in Anger; however, we find that Tension also peaks in the spring. Dzogang et al. [41] did not suggest possible reasons for the anxiety peaks; however, since the new school year in the U.K. starts in autumn, this may be consistent with our results for Tension(-Anxiety). Furthermore, happy dates in the U.S. between 2008 and 2010 have been reported using Twitter [40], e.g., Christmas Eve and Day, New Year’s Eve and Day, Valentine’s Day, and Thanksgiving. Some of these days coincide with our outlier dates shown in Table 1, although the emotions differ greatly between the two countries. For example, New Year’s Eve is a happy day in the U.S., but Confusion and Depression increase on this day in Japan. This might be due to differences in typical character between people in the U.S. and Japan. During the New Year’s holiday, people are expected to spend time with family in both the U.S. and Japan. In Japan, however, those who spend the holiday alone tend to feel much lonelier and post their feelings on blogs, causing high Depression. Taken together, yearly periodicities exist independently of language, culture, and social media platform, but their characteristics may differ depending on these factors. There are various contexts behind collective emotion arising from cultural background and platform usage.
Since cyclic behaviors in Wikipedia editorial activities have also been observed to differ depending on cultural background [42], comparing these periodic cycles in collective emotion across countries may be of interest in future studies.

Remaining spikes

After removing the weekly and yearly periodicities, the autocorrelation functions ρ(τ) show no periodic behavior (shown later in Fig 4C), and the distributions of the daily differences of each emotional dynamics, ΔZ(t) = Z(t) − Z(t − 1), are normal for every emotion (S1 Appendix). However, we still identify several sharp spikes in each of the emotional dynamics. Table 2 lists the major spikes, with the rate of increase measured relative to the average value over the preceding seven days. We confirmed that these spikes are associated with real events. Most spikes were in Tension and coincided with natural disasters such as earthquakes and typhoons that occurred throughout the observation period. As for the duration of increased emotion, all cases except the 3.11 earthquake returned to their original baseline within a week (Fig 3).
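
The spike measure described above, a day's value relative to the average of the preceding seven days, can be sketched as follows. The input is assumed to be a positive series from which periodic cycles have already been removed; the function name, the toy series, and the 150% cut-off are hypothetical choices for illustration only.

```python
def rate_vs_previous_week(series, window=7):
    """Return (day index, rate in %) relative to the mean of the prior `window` days."""
    rates = []
    for t in range(window, len(series)):
        baseline = sum(series[t - window:t]) / window
        rates.append((t, 100.0 * series[t] / baseline))
    return rates


# Hypothetical toy series: a flat baseline with a single burst on day 7.
series = [10.0] * 7 + [60.0] + [10.0] * 3
rates = dict(rate_vs_previous_week(series))
spike_days = [t for t, rate in rates.items() if rate > 150.0]  # illustrative cut-off
```

Note that the days after a burst show rates below 100% in this sketch, because the burst itself enters their seven-day baseline.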

Table 2

Major spikes in descending order of rate of increase, estimated relative to the average of the preceding seven days.

Tension shows many spikes due to earthquakes and typhoons.

Day	Emotion	Rate (%)	Event
March 11, 2011	Tension	602.6	the 3.11 earthquake
March 12, 2011	Depression	305.1	the day after the 3.11 earthquake
March 12, 2011	Confusion	273.6	the day after the 3.11 earthquake
April 15, 2016	Tension	240.3	Kumamoto earthquake
September 21, 2011	Tension	193.4	Typhoon Roke
October 7, 2009	Tension	177.1	Typhoon Melor
June 14, 2008	Tension	167.5	Iwate earthquake
January 18, 2016	Confusion	157.9	Heavy snowfall in Tokyo metropolitan area
September 10, 2015	Tension	156.7	Heavy rain in Tokyo metropolitan area
August 11, 2009	Tension	154.5	Shizuoka earthquake
October 15, 2013	Tension	153.8	Typhoon Wipha

Fig 3

Examples of sharp spikes of collective emotion after removing weekly and yearly periodicities.

(A) Tension, Depression, and Confusion remain elevated for more than a week during the 3.11 earthquake period in 2011. (B) Tension at the Kumamoto earthquake (in southwestern Japan) in 2016, and (C) Confusion at the heavy snowfall in the Tokyo metropolitan area in 2016.

Except for the 3.11 earthquake, most sharp spikes returned to their original baseline within a week. The 3.11 earthquake was a special case in which Tension remained elevated for more than one month (37 days), followed by Depression and Confusion. It is also interesting that the peak day differs for each emotion at the 3.11 earthquake: Tension peaks one day after the earthquake, Depression two days after, and Confusion three days after. This reflects the fact that collective emotions change day by day. After the 3.11 earthquake, the social mood is regarded as having changed qualitatively. At the time, an extraordinary mood, the so-called ‘self-restraint mood,’ was prevalent in Japanese society. In relation to this mood, many people refrained from going out, for example choosing not to hold or attend annual cherry blossom viewing parties. In addition, fewer corporate TV commercials were broadcast, and movie premieres and new product launches were postponed. To the best of our knowledge, there has been no previous quantitative survey of how long this unusual mood continued. Therefore, the present study is the first attempt to measure this unusual mood quantitatively based on the Internet. The 3.11 earthquake has been found to have caused relatively low happiness in the U.S. Twitter space, as was also found for the Chilean earthquake in February 2010 [40]. We note that while events such as the bailout of the U.S. financial system and the royal wedding of Prince William caused outlier days of happiness [40], the outlier days we observed in the Japanese blog space could be attributed only to natural disasters.

Long-term memory

Long-term correlation

Fig 4 shows the autocorrelation functions ρ(τ) and power spectral densities S(f) of each daily emotional dynamics Z(t) on a log-log scale. We first split Z(t) into one-year segments and calculated ρ(τ) with a maximum lag of τ = 365 days. The average autocorrelation function ρ(τ) shown in Fig 4C is calculated using only the stationary samples (details are in S1 Appendix) so that the exponent of ρ(τ) can be seen clearly. The power spectral densities S(f) are averaged over 10 years by Welch’s method [43] after removing leap days (February 29). Compared with the raw series (Fig 4A and 4B), which show high correlations at one week and one month (peaks), we observe clear persistence, or long-term correlations, without periodic peaks after removing periodic cycles (Fig 4C and 4D).

Autocorrelation functions ρ(τ) and power spectral density S(f) of collective emotion.

ρ(τ) and S(f) are calculated for (A)(B) raw emotional dynamics, which show periodic cycles, (C)(D) emotional dynamics without periodic cycles, which show long-term memory, and (E)(F) weekly randomized dynamics, in which correlations beyond the weekly scale are diminished.

Fig 4E and 4F show the results for weekly randomized time series, averaged over 10 randomizations. We made randomized series following three patterns: monthly randomized, weekly randomized, and daily randomized. For monthly randomized series, we kept the time series within each month intact and randomly shuffled the order of the months. For weekly randomized series, we applied the same procedure but randomly shuffled the weeks, excluding the 6 days at the end of the data period in October 2016. For daily randomized series, we fully randomized the time series on a daily basis. The autocorrelation functions ρ(τ) show approximately a power-law decay, $\rho(\tau) \sim \tau^{-\alpha}$, in the real data (Fig 4C). The power-law exponent is found to be close to α ∼ 0.5 for all six emotions, and after six months, ρ(τ) ∼ 0. The long-term persistence is supported by the observation that ρ(τ) decays much more sharply in randomized samples, depending on the randomization time scale (Fig 4E and S1 Appendix). In particular, the daily randomized series indeed show ρ(τ) ∼ 0 for τ > 0, as expected. For the power spectral densities S(f), all emotions show approximately $S(f) \sim f^{-0.5}$ in the low-frequency range (Fig 4D), and white noise is observed in the daily randomized result (S1 Appendix). By the Wiener-Khinchin theorem, the power spectral density S(f) can be expressed as the Fourier transform of the autocorrelation function ρ(τ), which gives the following relation between the exponents: for $\rho(\tau) \sim \tau^{-\alpha}$, S(f) behaves as $S(f) \sim f^{-(1-\alpha)}$. Thus, α is indeed approximately 0.5 for all emotions, indicating that each emotional dynamics has long-term memory on the order of a few months.
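
The randomization described above can be sketched as a block shuffle: the order of values inside each block is kept, while the blocks themselves are permuted. This sketch assumes fixed-length blocks (7 days for weekly, 1 day for daily randomization) and drops any trailing incomplete block; real calendar months vary in length, so monthly randomization would need variable block boundaries. The function name and the `seed` parameter are hypothetical.

```python
import random


def block_shuffle(z, block, seed=None):
    """Shuffle the order of consecutive blocks of length `block`.

    Values inside a block keep their order, so correlations shorter than
    the block length survive while longer-range ones are destroyed.
    A trailing incomplete block is dropped.
    """
    rng = random.Random(seed)
    n_blocks = len(z) // block
    blocks = [z[i * block:(i + 1) * block] for i in range(n_blocks)]
    rng.shuffle(blocks)
    return [value for chunk in blocks for value in chunk]


# Hypothetical toy series of 23 days: three full weeks plus 2 leftover days.
z = list(range(23))
weekly = block_shuffle(z, block=7, seed=0)
daily = block_shuffle(z, block=1, seed=0)
```

Averaging the autocorrelation over several such shuffles, each with a different seed, gives a baseline against which the slow decay in the real data can be compared.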

Coarse-grained movement

Since the positive correlation ρ(τ) of the emotional dynamics is found to persist for roughly six months, we performed principal component analysis on the time series of each emotional dynamics Z(t) aggregated over six-month intervals. The first and second eigenvectors, together with the component scores, are shown in Fig 5. Up to the second principal component, the cumulative contribution ratio was 96.1% for the real time series (Fig 5A) and 88.6% for the weekly randomized series (Fig 5B). Thus, the first two principal components capture the dominant part of the six emotional dynamics. Note that the first principal component was mainly Vigor, and the second mainly Fatigue, for the real time series in Fig 5A. Since there are no periodic cycles in time series aggregated over six months, we cannot confirm a clear difference between before and after removing periodic cycles.
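
To make the notion of a contribution ratio concrete, the sketch below computes the first principal component of a small data matrix by power iteration on the sample covariance matrix. It is a dependency-free illustration only: it returns one component rather than the two used in Fig 5, the function name and toy data are hypothetical, and it is not the authors' implementation.

```python
def first_principal_component(rows, iterations=200):
    """Return the leading eigenvector of the covariance matrix and its
    contribution ratio (leading eigenvalue divided by total variance)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Power iteration: repeated multiplication converges to the leading eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    total_variance = sum(cov[a][a] for a in range(d))
    return v, eigenvalue / total_variance


# Hypothetical toy data: each row is one six-month average of two emotions
# that move together exactly, so one component explains all the variance.
rows = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
direction, ratio = first_principal_component(rows)
```

In the toy data the two columns are perfectly proportional, so the contribution ratio of the first component is 1; with six partly independent emotions, the ratio of the first two components together corresponds to the cumulative contribution quoted above.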

Fig 5

Results of principal component analysis for each emotion time series organized every six months.

Results are shown for (A) raw emotional dynamics, which moved gradually from one six-month period to the next, and (B) weekly randomized dynamics, which jumped from point to point.

We confirmed that the eigenvectors were almost independent. However, there was some overlap in the emotion directions (Fig 5). No word appeared in the dictionaries of more than one emotion, but the vector directions still overlapped. This may be due to the process of summarizing the time series every six months; e.g., Depression and Confusion moved in the same direction for a long time in Fig 1. For the first and second principal component scores, each point in the figure corresponds to a six-month average, and it moved gradually from point to point in the real data (Fig 5A) rather than jumping between points as in the randomized data (Fig 5B). This indicates that the emotional dynamics changed moderately over time. Thus, we could successfully capture, for the first time, evidence of the slow dynamics of collective emotion.

Discussion

People are increasingly active on the Internet, and the data now available can provide new perspectives on collective human behavior. Extracting and tracing collective emotion is a challenging new research topic because social media has only become widespread in the past decade. Here we extracted collective emotion from the Japanese blog space for 10 years between 2006 and 2016, analyzing 3.6 billion blog articles with a dictionary-based method. Firstly, periodic cycles were observed in each of the emotional dynamics after averaging over the 10 years. Weekly and yearly periodicities appeared in each of the emotional dynamics in the Japanese blog space and were connected to real phenomena. In particular, Fatigue tends to increase after consecutive holidays. In Japan, it is known that suicide numbers tend to increase after consecutive holidays. Since suicide numbers are known to be associated with Google Trends in England [44] and with Korean blogs [45], measuring collective emotion might be applied to identify earlier signals of suicides.

Secondly, after removing these periodic cycles from each of the emotional time series, we find that sharp spikes could be attributed to natural disasters. In particular, collective emotion increased largely under the influence of the 3.11 earthquake. This influence remained high for over a month in Tension, Depression, and Confusion. During the 3.11 earthquake period, many rumors spread [5]. It was argued that feelings of anxiety contribute to the spread of rumors during a disaster [46]. In addition, a psychological study involving 24 introductory psychology students reported that anxious feelings accelerate rumor spreading [47]. We obtained similar results but with much richer data from our 3.6 billion blog articles.

Finally, our study is the first to shed light on the long-term memory of collective emotion, which has attracted little attention so far. In every emotion of the real data, the autocorrelation showed a power-law decay with an exponent much less than one, which suggests the existence of long-term memory. Also, the results of the power spectral density and the principal component analysis strongly indicate long-term memory in collective emotion on time scales of several months.

There are important limitations of this research. Since there are no ground-truth data for collective emotion, our results represent a plausible estimation. We expect a broader range of similar studies and data on collective emotion to accumulate for future analysis. Also, due to the current lack of geo-located data, we cannot consider the geographical differences in collective emotion between locations. We believe that considering geographic differences will provide deeper insights and understanding, especially in the case of natural disasters.

To further develop the present research, the following points could be considered. First, we focused only on the Japanese blog space, which is not equivalent to other cultures in the world. Compared with previous studies with a limited number of participants answering questionnaires, our study used rich data from actively writing individuals. This greater variety of data is a strength of the present study. Second, our results were limited by our dictionary based on POMS [30]. Because our dictionary was built on a traditional psychology scale, the extracted emotions depended on six dimensions, with five negative emotions and one positive emotion. However, it is obvious that these emotions do not cover all dimensions of collective emotion. In particular, it is important to add new positive emotions to the analysis. For example, POMS2 [48], the second edition of POMS with the new positive emotion Friendliness, has been released and recently translated into Japanese. Additionally, we applied a naive summation of dictionary-listed words that were checked manually. Using Word2vec [49] and Doc2vec [50] could be a new direction for building the dictionary semi-automatically. Furthermore, there exist numerous other psychological measures that could be analyzed. Extracting multidimensional emotions is still a challenging task that should attract researchers in the future.
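As an illustration of what a naive summation of dictionary-listed words can look like, the following minimal Python sketch counts, for each emotion, how often its listed words occur in a tokenized text. It is only a sketch under stated assumptions: the word lists are hypothetical English placeholders (the study's actual Japanese dictionary is provided in its supporting XLSX file), tokenization is assumed to have been done beforehand, and the names `EMOTION_WORDS` and `emotion_scores` are not from the original work.

```python
from collections import Counter

# Hypothetical placeholder word lists; not the dictionary used in the study.
EMOTION_WORDS = {
    "Tension": {"nervous", "anxious"},
    "Fatigue": {"tired", "exhausted"},
    "Vigor": {"energetic", "lively"},
}

def emotion_scores(tokens, dictionary=EMOTION_WORDS):
    """Sum the occurrences of each emotion's listed words in `tokens`."""
    counts = Counter(tokens)
    return {emotion: sum(counts[word] for word in words)
            for emotion, words in dictionary.items()}

# Toy example with whitespace tokenization (an assumption of this sketch).
sample = "so tired today but tired in a lively way".split()
scores = emotion_scores(sample)
```

Applying such a function to all articles posted on a given day and summing the results would give one daily value per emotion. Any normalization, for instance by the number of articles posted that day, is left out of this sketch.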

Details of methodology, figures, and tables.

Dictionary building procedure; dates in which emotion decreased every year; emotion dynamics before removing periodic cycles; emotion dynamics after removing periodic cycles; results of monthly and daily randomized series; histogram of differences of collective emotion. (PDF)

Emotion dictionary and time series of each emotion.

(XLSX)
