Alberto Gayle, Motomu Shimaoka.
Abstract
BACKGROUND: In this age of social media, any news, good or bad, has the potential to spread in unpredictable ways. Changes in public sentiment can either drive or limit investment in publicly funded activities such as scientific research. As a result, understanding how reported cases of scientific misconduct shape public sentiment is becoming increasingly essential for researchers and institutions, as well as for policy makers and funders. In this study, we therefore set out to assess and define the patterns by which public sentiment may change in response to reported cases of scientific misconduct. This study focuses on the public response to the events of a recent major case of scientific misconduct that occurred in Japan in 2014: the stimulus-triggered acquisition of pluripotency (STAP) cell case.
Keywords: Japan; data mining; mass media; public opinion; public policy; publication; retraction of publication as a topic; scientific misconduct; social media; stem cells
Year: 2017 PMID: 28428163 PMCID: PMC5418527 DOI: 10.2196/publichealth.5980
Source DB: PubMed Journal: JMIR Public Health Surveill ISSN: 2369-2960
Text processing steps.
| Steps | Description |
| Tokenize | Parses every tweet into separate, single-element tokens (ie, words or word parts) |
| Transform cases | Converts all text to lowercase to facilitate data processing |
| Filter tokens by length | Removes tokens consisting of fewer than 2 characters |
| Filter English stop words | Removes common, low-information particles (eg, “the”) and punctuation marks |
| Filter tokens by content | Removes hashtags and other message-irrelevant tokens such as “http” |
| Stemming (WordNet) | Identifies tokens and groups them as lemmas, to facilitate processing |
| Generate n-grams | Generates a list of all two-, three-, or four-word token combinations (ie, phrases) |
| Word vector creation | Generates a metric indicating the importance of each word in a tweet |
| Pruning | Removes tokens that appear in less than 1% or more than 80% of documents |
This processing generated weighted word vectors representing the weighted distribution of each processed token or n-gram within a given Tweet. Word vector statistics were calculated using the term frequency-inverse document frequency (TF-IDF) weighting scheme. TF-IDF emphasizes terms that are frequent within a given document but uncommon across the corpus [36,37] and has been demonstrated to improve the performance of text-mining tasks [38]. TF-IDF is calculated as follows: TF-IDF = tf × log(N/df), where tf is the frequency of a term within a given document, df is the number of documents in which the term appears, and N is the total number of documents [39].
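The processing steps and the TF-IDF formula can be illustrated with a short, standard-library-only sketch. It is not the study's pipeline and rests on several assumptions: the stop-word and irrelevant-token sets are tiny hypothetical examples, the regular-expression tokenizer stands in for whatever tokenizer was actually used, the WordNet lemmatization step is omitted (NLTK's `WordNetLemmatizer` is one library that provides it), tf is taken as the raw term count, and the logarithm is natural. Pruning is applied before weighting here; because it neither changes N nor the df of surviving terms, the resulting weights are the same as pruning afterward.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists; the study's actual lists are not given.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "was", "on", "for", "rt"}
IRRELEVANT = {"http", "https", "co"}  # e.g. fragments left over from t.co links


def preprocess(tweet: str) -> list[str]:
    """Tokenize, lowercase, and filter one tweet (lemmatization omitted)."""
    tokens = re.findall(r"#?\w+", tweet)   # Tokenize; keep '#' so hashtags can be dropped
    tokens = [t.lower() for t in tokens]   # Transform cases
    return [
        t for t in tokens
        if len(t) >= 2                     # Filter tokens by length
        and t not in STOP_WORDS            # Filter English stop words
        and not t.startswith("#")          # Filter tokens by content: hashtags...
        and t not in IRRELEVANT            # ...and other message-irrelevant tokens
    ]


def add_ngrams(tokens: list[str], sizes=(2, 3, 4)) -> list[str]:
    """Append every contiguous 2-, 3-, and 4-token phrase, joined with '_'."""
    terms = list(tokens)
    for n in sizes:
        terms += ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return terms


def doc_freq(docs: list[list[str]]) -> Counter:
    """df: number of documents in which each term appears at least once."""
    return Counter(term for doc in docs for term in set(doc))


def prune(docs: list[list[str]], min_df=0.01, max_df=0.80) -> list[list[str]]:
    """Drop terms found in fewer than min_df or more than max_df of documents."""
    df, n_docs = doc_freq(docs), len(docs)
    keep = {t for t, c in df.items() if min_df <= c / n_docs <= max_df}
    return [[t for t in doc if t in keep] for doc in docs]


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term as tf * log(N / df), with tf taken as the raw count."""
    df, n_docs = doc_freq(docs), len(docs)
    return [
        {t: tf * math.log(n_docs / df[t]) for t, tf in Counter(doc).items()}
        for doc in docs
    ]
```

In use, each tweet would pass through `preprocess` and `add_ngrams`, and the resulting corpus through `prune` and then `tf_idf`. A term that occurs in every document receives a weight of log(1) = 0, which is how TF-IDF suppresses uninformative terms.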
Figure 1. Volume and average sentiment over time. Sentiment scores were calculated as unweighted aggregates of the SentiWordNet scores for each valid token in each Tweet; for this analysis, verbs, adjectives, and adverbs were considered valid for sentiment scoring. Volume is the number of Tweets retrieved per sampling interval. Sentiment became increasingly negative over time; one key exception corresponds to the tragedy surrounding Dr Sasai (August to October 2014). Volume is driven by major events.
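A minimal sketch of the tweet-level scoring described in the Figure 1 caption follows. It assumes tokens arrive already part-of-speech tagged with simple `VERB`/`ADJ`/`ADV` tags, and that each word's SentiWordNet entry has been reduced to a single (positive, negative) pair; how the study chose among a word's several synsets is not stated. The lexicon here is a hypothetical plain dictionary, and "unweighted aggregate" is interpreted as the plain sum of positive minus negative scores over valid tokens. NLTK's `sentiwordnet` corpus reader (`senti_synsets`, with `pos_score()` and `neg_score()`) is one way such a lexicon could be populated.

```python
from statistics import mean

VALID_POS = {"VERB", "ADJ", "ADV"}  # verbs, adjectives, and adverbs only


def tweet_score(tagged_tokens, lexicon):
    """Unweighted sum of (positive - negative) scores over valid tokens.

    tagged_tokens: list of (token, pos_tag) pairs (tag set is an assumption).
    lexicon: dict mapping (token, pos_tag) -> (pos_score, neg_score); a
             hypothetical stand-in for SentiWordNet lookups.
    """
    total = 0.0
    for token, tag in tagged_tokens:
        if tag in VALID_POS and (token, tag) in lexicon:
            pos, neg = lexicon[(token, tag)]
            total += pos - neg
    return total


def interval_average(tweets, lexicon):
    """Return (average tweet score, volume) for one sampling interval."""
    scores = [tweet_score(t, lexicon) for t in tweets]
    # An empty interval has no defined average, so None is returned for it.
    return (mean(scores) if scores else None), len(scores)
```

Tokens outside the valid parts of speech, or absent from the lexicon, contribute nothing, so a tweet made only of nouns scores 0.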
Tukey’s post hoc test for significance (1-way analysis of variance, ANOVA). Italicized values indicate significance at P<.05.
| Year | Month | 2014 | 2015 | ||||||||||||||
| Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec | Jan | Feb | Mar | |||
| 2014 | Jan | 0.000 | − | − | − | − | − | − | − | 0.015 | −0.036 | − | − | − | − | − | |
| Feb | 0.000 | −0.005 | − | −0.021 | − | −0.039 | 0.053 | 0.002 | −0.067 | − | − | − | |||||
| Mar | 0.005 | 0.000 | − | − | −0.016 | − | −0.034 | 0.008 | −0.062 | − | − | − | − | ||||
| Apr | 0.000 | − | − | 0.028 | −0.001 | −0.020 | − | − | − | ||||||||
| May | 0.000 | 0.001 | 0.029 | 0.010 | −0.025 | − | − | ||||||||||
| Jun | 0.021 | 0.016 | − | − | 0.000 | − | −0.018 | 0.024 | −0.046 | − | − | − | − | ||||
| Jul | −0.001 | 0.000 | 0.028 | 0.009 | −0.026 | − | − | ||||||||||
| Aug | 0.039 | 0.034 | −0.028 | − | 0.018 | − | 0.000 | 0.041 | −0.028 | − | − | − | − | ||||
| Sep | −0.015 | −0.053 | − | − | − | − | − | − | 0.000 | −0.051 | − | − | − | − | − | ||
| Oct | 0.036 | −0.002 | −0.008 | − | − | −0.024 | − | −0.041 | 0.051 | 0.000 | −0.070 | − | − | − | − | ||
| Nov | 0.067 | 0.062 | 0.001 | −0.029 | 0.046 | −0.028 | 0.028 | 0.070 | 0.000 | −0.019 | −0.054 | − | − | ||||
| Dec | 0.020 | −0.010 | −0.009 | 0.019 | 0.000 | −0.035 | − | − | |||||||||
| 2015 | Jan | 0.025 | 0.026 | 0.054 | 0.035 | 0.000 | − | − | |||||||||
| Feb | 0.000 | ||||||||||||||||
| Mar | − | 0.000 | |||||||||||||||
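The pairwise month comparisons above come from Tukey's post hoc test after a 1-way ANOVA. Because the monthly Tweet counts differ widely, the unequal-sample form (Tukey-Kramer) is the relevant variant; the sketch below computes its q statistic for every pair of groups. It is an illustration under assumptions, not the study's code: the critical value `q_crit` for the chosen alpha, k groups, and N - k degrees of freedom must be supplied by the caller (SciPy's `scipy.stats.studentized_range` distribution, via its `ppf` method, is one source), and the sign convention (mean of the second group minus the first) is an assumption, since the table's convention is not stated.

```python
import math
from itertools import combinations
from statistics import mean


def tukey_kramer(groups: dict[str, list[float]], q_crit: float):
    """Pairwise Tukey-Kramer comparisons after a one-way ANOVA.

    groups: label -> list of per-tweet sentiment scores (e.g. one list per month).
    q_crit: critical studentized-range value for alpha, k groups and N - k df,
            supplied by the caller rather than computed here.
    Returns {(a, b): (mean_b - mean_a, q, significant)}.
    """
    k = len(groups)
    n_total = sum(len(v) for v in groups.values())
    means = {g: mean(v) for g, v in groups.items()}
    # Within-group (error) mean square from the one-way ANOVA.
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    mse = ss_within / (n_total - k)
    results = {}
    for a, b in combinations(groups, 2):
        diff = means[b] - means[a]
        # Tukey-Kramer standard error for unequal group sizes.
        se = math.sqrt(mse / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
        q = abs(diff) / se
        results[(a, b)] = (diff, q, q > q_crit)
    return results
```

With equal group sizes the standard error reduces to sqrt(MSE / n), the classic Tukey HSD form.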
Figure 2. Month-to-month trinary sentiment or volume density chart. The density plot is based on the proportions of negative (N: top left), positive (P: top right), and objective or nonpolarized (O: bottom center) discussion volume, represented by the unlabeled data points. Volume density is calculated via isometric log ratio transformation.
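Negative, positive, and objective proportions sum to 1, so they lie on a simplex rather than in ordinary Euclidean space; the isometric log ratio (ilr) transformation maps such a 3-part composition to 2 unconstrained coordinates in which standard density estimation can be applied. The sketch below uses pivot coordinates with the parts ordered (N, P, O); both the basis and the ordering are assumptions, since the caption does not specify them. All parts must be strictly positive, so months with a zero count would need a zero-replacement step first.

```python
import math


def ilr3(neg: float, pos: float, obj: float) -> tuple[float, float]:
    """Pivot-coordinate ilr transform of a 3-part composition (N, P, O).

    Inputs may be raw counts or proportions; they are closed to sum to 1.
    """
    if min(neg, pos, obj) <= 0:
        raise ValueError("all parts must be strictly positive for a log ratio")
    total = neg + pos + obj
    x1, x2, x3 = neg / total, pos / total, obj / total  # closure to proportions
    # z1 contrasts N against the geometric mean of P and O.
    z1 = math.sqrt(2 / 3) * math.log(x1 / math.sqrt(x2 * x3))
    # z2 contrasts P against O.
    z2 = math.sqrt(1 / 2) * math.log(x2 / x3)
    return z1, z2
```

A perfectly balanced month maps to the origin, and the coordinates depend only on the relative proportions, not on total volume, so months of very different sizes can be placed on the same density plot.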
Tukey’s post hoc test for homogeneous subsets (1-way analysis of variance, ANOVA).
| Year | Month | N | Subset for alpha=.05 | ||||||||
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | |||
| 2015 | February | 1034 | −.2648 | ||||||||
| March | 187 | −.1774 | |||||||||
| January | 209 | −.0768 | |||||||||
| 2014 | May | 558 | −.0521 | −.0521 | |||||||
| July | 680 | −.0510 | −.0510 | ||||||||
| December | 1092 | −.0422 | −.0422 | −.0422 | |||||||
| November | 75 | −.0230 | −.0230 | −.0230 | |||||||
| April | 2349 | −.0224 | −.0224 | −.0224 | |||||||
| August | 395 | .0052 | .0052 | .0052 | |||||||
| June | 887 | .0230 | .0230 | ||||||||
| March | 691 | .0391 | .0391 | ||||||||
| February | 424 | .0443 | .0443 | ||||||||
| October | 136 | .0467 | .0467 | ||||||||
| January | 630 | .0829 | .0829 | ||||||||
| September | 120 | .0978 | |||||||||
| Significance | |||||||||||
Figure 3. Sentiment comparison for key actors. Month-to-month sentiment for key figures and entities corresponds to associated timeline events. Month-to-month sentiment scores were aggregated independently for Tweets mentioning Ms Obokata, Dr Sasai, or Riken. Data labels are shown where mean differences versus the total are significant.
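The independent per-actor aggregation described in the Figure 3 caption can be sketched as a keyword filter followed by a monthly mean. The keyword lists are hypothetical (a case-insensitive substring match on each name), tweet sentiment scores are assumed to be precomputed, and a Tweet mentioning several actors is counted toward each of them; the study's actual matching rules are not described.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical keyword lists; the study's actual query terms are not given.
ACTORS = {
    "Obokata": ("obokata",),
    "Sasai": ("sasai",),
    "Riken": ("riken",),
}


def actor_monthly_sentiment(tweets):
    """tweets: iterable of (month, text, score).

    Returns {actor: {month: mean score}} for each actor with at least one mention.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for month, text, score in tweets:
        lowered = text.lower()
        for actor, keywords in ACTORS.items():
            if any(k in lowered for k in keywords):
                # A tweet mentioning several actors counts toward each of them.
                buckets[actor][month].append(score)
    return {
        actor: {m: mean(s) for m, s in months.items()}
        for actor, months in buckets.items()
    }
```

Each actor's monthly means can then be compared with the overall monthly mean for all Tweets to decide which differences to label.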