Ryosuke Harakawa¹, Tsutomu Ito², Masahiro Iwahashi².
Abstract
This article presents a method for clustering trends in tweets about coronavirus disease (COVID-19) to help us objectively review the past and make decisions about future countermeasures. We aim to detect essential trends caused by the influence of COVID-19 while avoiding usual trends driven by seasonal events. To this end, we regard the daily changes in the frequency of each word in tweets as time series signals and define time series signals with a single peak as target trends. To cluster the target trends successfully, we propose graphical lasso-guided iterative principal component analysis (GLIPCA). GLIPCA enables us to remove trends whose correlations are only indirect, that is, generated by other essential trends. Moreover, GLIPCA overcomes the difficulty of quantitatively evaluating the accuracy of trend clustering; thus, its parameters are easier to determine than those of other clustering methods. We conducted experiments using Japanese tweets about COVID-19 posted from March 8, 2020, to May 7, 2020. The results show that GLIPCA successfully distinguished trends before and after the declaration of a state of emergency on April 7, 2020. In addition, the results reveal the international debate about whether the Tokyo 2020 Summer Olympics should be held. The results also suggest the tremendous social impact of the words and actions of Japanese celebrities. Furthermore, they suggest that people's attention shifted from worry and fear of an unknown novel pneumonia to the need for medical care and a new lifestyle, as well as to the scientific characteristics of COVID-19.
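The idea of an indirect correlation can be illustrated with partial correlations. Graphical lasso estimates a sparse inverse covariance (precision) matrix, and under a Gaussian model a zero entry in that matrix means a zero partial correlation between two variables given all the others. The sketch below is not GLIPCA; it is a minimal illustration with hypothetical helper names (`pearson`, `partial_corr`) and made-up toy series, showing two trends that look correlated only because both follow a third trend.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear influence of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


# Toy data (hypothetical): x and y both follow the "driving" trend z,
# plus noise terms that are uncorrelated with z and with each other.
z = [1, 2, 3, 4]
x = [2, 1, 2, 5]
y = [0.5, 3.5, 1.5, 4.5]

r_xy = pearson(x, y)  # clearly positive (about 0.527)
r_xy_given_z = partial_corr(r_xy, pearson(x, z), pearson(y, z))  # about 0
```

The marginal correlation between `x` and `y` is sizable, yet it vanishes once `z` is controlled for; this is the kind of indirect link that a sparse precision matrix exposes.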
Year: 2022 PMID: 35383245 PMCID: PMC8982667 DOI: 10.1038/s41598-022-09651-6
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Results for IPCA. The first, second, and third trend clusters are shown in red, blue, and green, respectively. B was set to 6, yielding a Q of 0.156. (a) The obtained trend clusters. The size of each node reflects the degree to which the corresponding trend belongs to its cluster. (b) Time series signals for each cluster, shown as the average of all trends in the cluster. The vertical line indicates the day of the state-of-emergency declaration (April 7, 2020).
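Panel (b) plots, for each cluster, the day-by-day average of its member trends, with a vertical line at the declaration. A minimal sketch, assuming each trend is a list of daily values over the observation window (March 8 to May 7, 2020) and that each trend is assigned to exactly one cluster; `day_index` and `cluster_average` are hypothetical helper names.

```python
from datetime import date

START = date(2020, 3, 8)        # first day of the collected tweets
END = date(2020, 5, 7)          # last day of the collected tweets
DECLARATION = date(2020, 4, 7)  # state-of-emergency declaration


def day_index(d, start=START):
    """Zero-based position of date d on the time axis."""
    return (d - start).days


def cluster_average(signals, labels, cluster):
    """Day-by-day mean of all signals whose label equals `cluster`."""
    members = [s for s, lab in zip(signals, labels) if lab == cluster]
    if not members:
        raise ValueError(f"cluster {cluster!r} has no members")
    n = len(members[0])
    return [sum(s[t] for s in members) / len(members) for t in range(n)]


n_days = day_index(END) + 1     # length of the time axis (inclusive)
vline = day_index(DECLARATION)  # x position of the vertical line
```

With this indexing the window spans 61 days and the declaration falls on day 30.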
(a) Early trend words and (b) late trend words.
| Type | Words |
|---|---|
| (a) Early trend words | Joining a company, interest rates, Bakatono (Japanese TV program and its character), the Drifters (Japanese comedian group), Spain, Italy, Korea, Kazuko Kurosawa (Japanese comedian), Japan, Hitoshi Matsumoto (Japanese comedian), immigration, interest rate cut, Hanshin Tigers (Japanese baseball team), Morisanchu (Japanese comedian group), Tom Hanks, autograph letter, Lorenzo Sanz, inspection, IOC (International Olympic Committee), president, woman, Bank of Japan, the dead, Shintaro Fujinami (baseball player in Hanshin Tigers), novel pneumonia, positive, Katsuya Maiguma (Japanese actor), player, sense of smell, graduation, returning to one's country, mourning, virus, Kozo Tashima (Japanese former soccer player), pandemic, Masataka Nashida (Japanese former baseball player), company |
| (b) Late trend words | Mask, ATARASHIICHIZU (Japanese music group), home, sumo wrestler, TV Asahi Corporation, video, Kumiko Shiratori (Japanese comedian), Takehiko Orimo (Japanese former basketball player), remaining at home, entering school, now, Tamao Akae (Japanese broadcaster), comment, broadcaster, by-election, Remdesivir, Hirofumi Yoshimura (governor of Osaka prefecture), Miki Sumiyoshi (Japanese broadcaster), Cologne, therapeutic drug, Hanamaru (alias of a Japanese TV program), Takadagawa (name of a sumo stable), Kotaro Shiga (Japanese actor), Gold's Gym, Children's Day (May 5 each year), request, Pachinko (Japanese gambling machine), Mitz Mangrove (Japanese entertainer), leaving the hospital, announcement, Gotoku Sakai (Japanese soccer player), life, support, benefit, Jun-ichi Ishida (Japanese actor), constitutional amendment, Golden Week (Japanese holidays from the last week of April to the first week of May), end, Kumiko Okae (Japanese actress), Yuta Tomikawa (Japanese broadcaster), Shinzo Abe (Japanese former prime minister), Takashi Okamura (Japanese comedian), self-restraint, Yoshio Tateishi (Japanese businessman), crisis, Mainichi Broadcasting System, person, long holidays, Atsushi Kataoka (Japanese former baseball player), Report Station (Japanese TV news program), inter-high school competition, restart, donation, crude oil, extension, constitution, business suspension, Baku Owada (Japanese actor) |
In this table, the notes in parentheses explain the words. English translations of the original Japanese words are shown.
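The labeling rule behind "early" and "late" is only described as based on Google Trends. One plausible rule, assumed here and not taken from the paper, is to compare the day on which a word's daily interest peaks with the declaration date. `label_by_peak` is a hypothetical helper; its input is any daily series starting at a known date.

```python
from datetime import date, timedelta

DECLARATION = date(2020, 4, 7)


def label_by_peak(series, start):
    """Label a word 'early' or 'late' by the date of its highest value.

    Assumed rule: a peak strictly before the declaration is 'early';
    a peak on or after the declaration day is 'late'. Ties between
    equal maxima resolve to the first occurrence.
    """
    peak_offset = max(range(len(series)), key=series.__getitem__)
    peak_day = start + timedelta(days=peak_offset)
    return "early" if peak_day < DECLARATION else "late"


start = date(2020, 3, 8)
print(label_by_peak([1, 5, 2], start))        # peak on March 9
print(label_by_peak([0] * 40 + [9], start))   # peak on April 17
```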
Figure 2. First trend cluster in Fig. 1a. The words corresponding to each trend are shown. For visualization, only the 90 words with the highest degree of membership in the cluster are shown, in descending order.
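Choosing the 90 words to display is a top-k query over membership degrees. A minimal sketch, assuming the degrees for one cluster are available as a word-to-degree dictionary (a hypothetical input format); `top_k_words` is a hypothetical helper name.

```python
import heapq


def top_k_words(memberships, k=90):
    """Return the k words with the largest membership degree, largest first.

    memberships: dict mapping word -> degree of membership in one cluster.
    heapq.nlargest avoids sorting the whole vocabulary when k is small.
    """
    best = heapq.nlargest(k, memberships.items(), key=lambda kv: kv[1])
    return [word for word, _ in best]


# Hypothetical degrees for illustration only.
print(top_k_words({"mask": 0.9, "home": 0.4, "virus": 0.7}, k=2))
```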
Figure 3. Second trend cluster in Fig. 1a. The words corresponding to each trend are shown.
Figure 4. Third trend cluster in Fig. 1a. The words corresponding to each trend are shown.
Rates of trends within each cluster that are judged to be generated from a unimodal Gaussian distribution.
| (a) Cluster 1 | (a) Cluster 2 | (a) Cluster 3 | (b) Cluster 1 | (b) Cluster 2 |
|---|---|---|---|---|
| 0.0916 | 0.0625 | 0.000 | 0.156 | 0.172 |
| 0.146 | 0.125 | 0.000 | 0.300 | 0.333 |
| 0.172 | 0.156 | 0.000 | 0.356 | 0.448 |
We checked whether the Shapiro–Wilk p value of each trend was greater than each threshold; the rows correspond to different thresholds. (a) Results for IPCA. (b) Results for GLIPCA.
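Each entry of the table is a fraction: the number of a cluster's trends whose Shapiro–Wilk p value exceeds a threshold, divided by the cluster size. The arithmetic is sketched below with made-up p values and assumed thresholds (0.1, 0.05, 0.01); the actual thresholds are not stated in this excerpt. Computing the p values themselves would need a Shapiro–Wilk implementation such as `scipy.stats.shapiro`, which this sketch does not use.

```python
def rate_above(p_values, threshold):
    """Fraction of trends whose p value is greater than `threshold`."""
    if not p_values:
        raise ValueError("empty cluster")
    return sum(p > threshold for p in p_values) / len(p_values)


# Hypothetical p values for the trends of one cluster (not from the paper).
p_values = [0.003, 0.02, 0.07, 0.2, 0.5]
rates = {t: rate_above(p_values, t) for t in (0.1, 0.05, 0.01)}
print(rates)
```

Because a lower threshold admits every trend a higher threshold admits, the rate can only stay the same or grow as the threshold decreases.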
Figure 5. Results for GLIPCA. The first and second trend clusters are shown in red and blue, respectively. B and the graphical lasso regularization parameter were set to 6 and 0.015, respectively, yielding a Q of 0.229. (a) The obtained trend clusters. The size of each node reflects the degree to which the corresponding trend belongs to its cluster. (b) Time series signals for each cluster, shown as the average of all trends in the cluster. The vertical line indicates the day of the state-of-emergency declaration (April 7, 2020).
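For reference, the standard graphical lasso estimates a sparse precision matrix Θ from a sample covariance matrix S by solving the penalized likelihood problem below. The symbol λ is our notation for the regularization parameter, presumably the parameter set to 0.015 above. Larger λ drives more off-diagonal entries of Θ to zero; under a Gaussian model, a zero entry means trends i and j are conditionally independent given all other trends, so any marginal correlation between them is indirect. The second expression gives the corresponding partial correlation.

```latex
\hat{\Theta} = \operatorname*{arg\,max}_{\Theta \succ 0}
  \Bigl[\log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \lVert\Theta\rVert_1\Bigr],
\qquad
\rho_{ij} = -\frac{\hat{\Theta}_{ij}}{\sqrt{\hat{\Theta}_{ii}\,\hat{\Theta}_{jj}}}
```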
Figure 6. First trend cluster in Fig. 5a. The words corresponding to each trend are shown.
Figure 7. Second trend cluster in Fig. 5a. The words corresponding to each trend are shown.
Rates of early trend words and late trend words, as defined by Google Trends, within the clusters.
| | (a) Cluster 1 | (a) Cluster 2 | (a) Cluster 3 | (b) Cluster 1 | (b) Cluster 2 |
|---|---|---|---|---|---|
| Early trend words | 0.600 | 0.563 | 0.706 | 0.889 | 0.368 |
| Late trend words | 0.400 | 0.438 | 0.294 | 0.111 | 0.632 |
(a) Results for IPCA. (b) Results for GLIPCA.
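The two rates in each column are the shares of early and late trend words among a cluster's labeled words, so they sum to 1. A minimal sketch, assuming the early/late labels are already available as a dictionary; the labels below are copied from the early/late word table, but the cluster membership is hypothetical, and `early_late_rates` is a hypothetical helper name.

```python
from collections import Counter


def early_late_rates(words, labels):
    """Share of early and late trend words among a cluster's labeled words.

    words: the words in one cluster.
    labels: dict mapping word -> "early" or "late"; unlabeled words are ignored.
    """
    counts = Counter(labels[w] for w in words if w in labels)
    total = counts["early"] + counts["late"]
    if total == 0:
        raise ValueError("no labeled words in this cluster")
    return {"early": counts["early"] / total, "late": counts["late"] / total}


labels = {"virus": "early", "pandemic": "early", "IOC": "early", "mask": "late"}
cluster_rates = early_late_rates(
    ["virus", "pandemic", "IOC", "mask", "unlabeled word"], labels
)
print(cluster_rates)
```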
Rates of trends within each cluster that are judged to be generated from a unimodal Gaussian distribution.
| (a) Cluster 1 | (a) Cluster 2 | (b) Cluster 1 | (b) Cluster 2 |
|---|---|---|---|
We checked whether the Shapiro–Wilk p value of each trend was greater than each threshold. The number of clusters and B were set to 2 and 6, respectively, as in the proposed GLIPCA. Each experiment was run 10 times with different initial values, and the average and standard deviation are reported. (a) Results for k-shape [32]. (b) Results for k-shape [32] with the graphical lasso algorithm [22], whose regularization parameter was set to 0.015 as in GLIPCA.
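k-shape [32] compares time series with a shape-based distance (SBD) that maximizes the normalized cross-correlation over all time shifts, so two single-peak trends with the same shape but different peak days come out close. Below is a minimal O(n²) sketch of SBD written from the published definition, not from the authors' code; the full k-shape algorithm also z-normalizes each series and iteratively recomputes centroids, both omitted here. `sbd` is a hypothetical helper name.

```python
import math


def sbd(x, y):
    """Shape-based distance: 1 minus the maximum normalized cross-correlation.

    Returns a value in [0, 2]; 0 means identical shapes up to a time shift.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("sequences must have equal length")
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        raise ValueError("zero-norm sequence")
    best = -math.inf
    for shift in range(-(n - 1), n):
        # Cross-correlation of x with y displaced by `shift` positions,
        # summing only over indices where the two sequences overlap.
        cc = sum(x[i] * y[i - shift] for i in range(n) if 0 <= i - shift < n)
        best = max(best, cc)
    return 1 - best / norm


# The same single peak, one day later: distance is (numerically) zero.
print(sbd([0, 1, 2, 1, 0], [0, 0, 1, 2, 1]))
```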
Figure 8. Time series signals for each cluster. The first and second trend clusters are shown in red and blue, respectively. Each signal is the average of all trends in the cluster. The vertical line indicates the day of the state-of-emergency declaration (April 7, 2020). Whereas Tables 4 and 5 report the average and standard deviation over 10 trials, this figure shows the results of a single trial. (a) Results for k-shape [32]. (b) Results for k-shape [32] with the graphical lasso algorithm [22].
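The 10-trial protocol reduces to running the clustering once per initialization and summarizing a scalar result. A minimal sketch, where `run_once` is a hypothetical callable standing in for one clustering run seeded with a given initialization. It reports the sample standard deviation (`statistics.stdev`); `statistics.pstdev` would give the population version, and this excerpt does not say which the tables use.

```python
import statistics


def run_trials(run_once, n_trials=10):
    """Run one trial per seed and return (mean, sample standard deviation).

    run_once: hypothetical callable taking an integer seed (the
    initialization) and returning a metric such as a cluster's rate.
    """
    values = [run_once(seed) for seed in range(n_trials)]
    return statistics.mean(values), statistics.stdev(values)


# Stand-in for a real clustering run: alternates between two outcomes.
mean, sd = run_trials(lambda seed: 0.5 if seed % 2 == 0 else 0.7)
print(round(mean, 3), round(sd, 4))
```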
Rates of early trend words and late trend words, as defined by Google Trends, within the clusters.
| | (a) Cluster 1 | (a) Cluster 2 | (b) Cluster 1 | (b) Cluster 2 |
|---|---|---|---|---|
| Early trend words | | | | |
| Late trend words | | | | |
Experimental conditions are the same as in Table 4. (a) Results for k-shape[32]. (b) Results for k-shape[32] with the graphical lasso algorithm[22].
Figure 9. We observe the frequencies of M words over N days. (a) The frequencies of the ith word over the N days are denoted by . (b) For the jth day, the frequencies of the M words are denoted by .
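The data layout in Fig. 9 can be sketched as an M × N matrix whose ith row is the N-day frequency series of the ith word and whose jth column holds the frequencies of all M words on day j. A minimal sketch, assuming tweets arrive already tokenized as (date, list of words) pairs and that every occurrence of a word is counted; tokenizing Japanese text is out of scope, and `frequency_matrix` and `column` are hypothetical helper names.

```python
from collections import Counter
from datetime import date, timedelta


def frequency_matrix(tweets, vocabulary, start, n_days):
    """Build an M x N list of lists of daily word frequencies.

    tweets: iterable of (day, tokens) pairs, where day is a datetime.date.
    vocabulary: the M words to track, in row order.
    Row i is the N-day series of word i; column j holds all words on day j.
    """
    index = {w: i for i, w in enumerate(vocabulary)}
    matrix = [[0] * n_days for _ in vocabulary]
    for day, tokens in tweets:
        j = (day - start).days
        if not 0 <= j < n_days:
            continue  # tweet outside the observation window
        for word, count in Counter(tokens).items():
            if word in index:
                matrix[index[word]][j] += count
    return matrix


def column(matrix, j):
    """Frequencies of all M words on day j."""
    return [row[j] for row in matrix]


start = date(2020, 3, 8)
tweets = [
    (start, ["mask", "virus", "mask"]),
    (start + timedelta(days=1), ["virus"]),
    (date(2020, 6, 1), ["mask"]),  # outside a 3-day window, so ignored
]
m = frequency_matrix(tweets, ["mask", "virus"], start, 3)
print(m)
```

Counting tweets that contain a word, instead of raw occurrences, would only change the inner loop to add 1 per tweet.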