
Vaccine discourse during the onset of the COVID-19 pandemic: Topical structure and source patterns informing efforts to combat vaccine hesitancy.

Juwon Hwang1, Min-Hsin Su2, Xiaoya Jiang2, Ruixue Lian3, Arina Tveleneva4, Dhavan Shah2.   

Abstract

BACKGROUND: Understanding public discourse about a COVID-19 vaccine in the early phase of the pandemic may provide key insights concerning vaccine hesitancy. However, few studies have investigated how Twitter users participate discursively in vaccine discussions.
OBJECTIVES: This study aims to investigate 1) the major topics that emerged from public conversation on Twitter concerning vaccines for COVID-19, 2) the topics that were emphasized in tweets with either positive or negative sentiment toward a COVID-19 vaccine, and 3) the type of online accounts in which tweets with either positive or negative sentiment were more likely to circulate.
METHODS: We randomly extracted a total of 349,979 COVID-19 vaccine-related tweets from the initial period of the pandemic. Out of 64,216 unique tweets, a total of 23,133 (36.03%) tweets were classified as positive and 14,051 (21.88%) as negative toward a COVID-19 vaccine. We conducted Structural Topic Modeling and Network Analysis to reveal the distinct topical structure and connection patterns that characterize positive and negative discourse toward a COVID-19 vaccine.
RESULTS: Our STM analysis revealed that the most prominent topic to emerge on Twitter about a COVID-19 vaccine was "other infectious diseases", followed by "vaccine safety concerns" and "conspiracy theory." While the positive discourse demonstrated a broad range of topics such as "vaccine development", "vaccine effectiveness", and "safety test", negative discourse was more narrowly focused on topics such as "conspiracy theory" and "safety concerns." Beyond topical differences, positive discourse was more likely to interact with verified sources such as scientists/medical sources and the media/journalists, whereas negative discourse tended to interact with politicians and online influencers.
CONCLUSIONS: Positive and negative discourse was not only structured around distinct topics but also circulated within different networks. Public health communicators need to address specific topics of public concern in varying information hubs based on audience segmentation, potentially increasing COVID-19 vaccine uptake.


Year:  2022        PMID: 35895626      PMCID: PMC9328525          DOI: 10.1371/journal.pone.0271394

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.752


1. Introduction

Development and uptake of a COVID-19 vaccine is a major step in fighting the spread of this novel disease [1], which has resulted in an unprecedented global public health burden. Herd immunity, which occurs when a sufficiently large proportion of a population has been vaccinated against or recovered from a specific infectious disease, is critical to slowing the pandemic’s spread [2, 3]. To achieve herd immunity, it is estimated that a COVID-19 vaccine should be accepted by at least 75–80% of the population [4]. However, a recent study found that a majority of the U.S. public would be uncomfortable being among the first to receive a COVID-19 vaccine and that one-third of adults would decline to accept a vaccine if offered [5], generating substantial concern from a public health standpoint [3]. To effectively address vaccine hesitancy and foster vaccine confidence [3], it is important to understand the nature of U.S. COVID-19 vaccine discourse. Social media serve as a site for widely available and accessible public discourse surrounding the COVID-19 vaccine [6]. In contrast to traditional news media or scholarly outlets, social media content does not undergo editorial processes or scientific vetting unless it violates certain rules of a platform, thereby allowing users to voice their opinions on their own terms in most cases [6]. This characteristic facilitates users’ ability to speak out on public health issues, such as vaccination, with no expertise required [7]. Indeed, vaccine discourse is frequently observed across social media platforms, with a considerable number of studies examining the different types of vaccine content on social media in contexts such as childhood vaccination schedules and HPV vaccines [8-14]. 
For example, content with negative sentiment toward vaccines was prominently present across social media platforms, with topics including vaccine safety concerns [9, 10, 14], pharmaceutical and medical skepticism [9], conspiracy-style beliefs [9, 11, 12, 14], and infringement of civil liberties [13]. As such, a great deal of research has focused on online discourse with negative sentiment toward vaccines [12, 14, 15], but little attention has been paid to its positive counterpart, which provides a useful reference point for responding to concerns and topics voiced by users who hold negative opinions on vaccines. Few studies have considered positive vaccine content alongside negative vaccine content when analyzing social media; when they have, positive content was more likely to present accurate information [9] and emphasize vaccine effectiveness [16], compared to negative content. Moreover, due to the novelty of the COVID-19 pandemic, few studies have investigated how Twitter users participate discursively in vaccine discussions about this highly politicized issue. Apart from the topical differences between the positive and negative vaccine discourse, tracking the specific groups in which the two discourses circulate not only clarifies the distinct characteristics of the entities, but also aids understanding of how positive and negative vaccine discourse is formed and shaped. Social media platforms, such as Twitter, allow users to interact with each other by following, sharing, and mentioning other accounts (i.e., @NAME). News feeds show self-selected streams for each account based on personal interests [6]. This process is further amplified by social media’s proprietary algorithms. Consequently, the self-selection of networks as well as retweeting and @mentioning can be indicative of the flow by which positive and negative vaccine discourse spreads. 
While the effects of diverse media use on vaccination behaviors are well-documented [17-19], little is known about how contrasting vaccine discourses (i.e., positive vs. negative discourses) spread through different types of mentioned accounts. This study has two aims. The first goal of the present study is to investigate the content of the Twitter discourse surrounding the COVID-19 vaccine during the four months after the declaration of the pandemic by the World Health Organization and widespread US lockdowns. We focus on this initial stage of vaccine development because public health communication during this time was inherently challenging due to high levels of scientific uncertainty, constantly changing information, and a highly partisan information environment [20]. Specifically, we compare the topical prevalence of vaccine discourse with either positive or negative sentiment toward a COVID-19 vaccine to understand distinct areas of concern with COVID-19 vaccines. Notably, we explore what individual users discuss regarding the issue, beyond the mere dissemination of elite-supplied messages. Second, we compare the types of actors with whom users who held either positive or negative sentiment toward the COVID-19 vaccine chose to interact. By doing so, our study provides important insights and practical guidance for public health communicators, who, to date, predominantly focus on disseminating vaccination guidelines or schedules rather than proactively responding to vaccine hesitancy. Thus, we suggest public health communication professionals should broaden their messages to address specific concerns as revealed in public conversations on social media, as well as leverage a diversity of actors who have been centrally situated in vaccine-hesitant communities.

2. Methods

2.1 Data collection

Given the purposes of this study, our analysis focuses on original tweets (i.e., tweets composed by ordinary users themselves). We constructed a unique dataset consisting of a 1% random sample of original tweets. To do this, we first scraped Twitter data through Synthesio, a social media monitoring tool, to randomly sample 1% of public tweets using COVID-19 related keywords (S1 Appendix). Second, we retained only tweets that were topically related to vaccines using our customized dictionary (i.e., tweets that mentioned at least one of the keywords in the main text, excluding URLs), resulting in a sample of 349,979 tweets. Finally, to obtain tweets reflecting the user’s interest and views, duplicated tweets were removed, including retweets (n = 251,025) and quote tweets (n = 34,738), following conventional practice [21], resulting in a final dataset of 64,216 user-generated COVID-19 vaccine-related tweets. Given that our analysis focuses on the different topics that positive and negative vaccine tweets emphasized when talking about a COVID-19 vaccine, we removed retweets, which merely disseminate messages without user input; quote tweets were also excluded because they present an ambiguous case for determining the user’s vaccine stance. Consider the following tweets, which convey entirely different meanings with similar language: “This is a stupid idea! RT@Dr.Science: Vaccine is safe and effective” and “Such an important point! RT @Dr.Science: Vaccine is safe and effective.” The first tweet shows negative vaccine discourse, whereas the latter endorses vaccines despite the common language such as “safe” and “effective.” Due to our focus on vaccine stance and topic detection, we opted for a more rigorous approach to avoid noise and reduce false positives. We acknowledge that user-initiated discussion constitutes only a subset of the entire Twitter discourse, albeit an important one. 
We note that the patterns observed here largely hold when replicating the procedures with the complete dataset composed of both original tweets and retweets.
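The filtering pipeline above — keyword matching on the main text with URLs excluded, followed by removal of retweets and quote tweets — can be sketched in Python. The keyword set and tweet fields below are hypothetical stand-ins for the study's customized dictionary (S1 Appendix) and the monitoring tool's export format:

```python
import re

# Hypothetical stand-in for the study's customized vaccine dictionary (S1 Appendix).
VACCINE_KEYWORDS = {"vaccine", "vaccination", "vaccinate"}

def is_vaccine_tweet(text: str) -> bool:
    """Keep a tweet if its main text (URLs stripped) mentions a vaccine keyword."""
    no_urls = re.sub(r"https?://\S+", "", text.lower())
    words = re.findall(r"[a-z#]+", no_urls)
    return any(w.lstrip("#") in VACCINE_KEYWORDS for w in words)

def is_original(tweet: dict) -> bool:
    """Drop retweets and quote tweets, keeping only user-composed originals."""
    return not tweet.get("is_retweet") and not tweet.get("is_quote")

def build_corpus(tweets):
    """Apply both filters to produce the final set of original vaccine tweets."""
    return [t for t in tweets if is_vaccine_tweet(t["text"]) and is_original(t)]
```

Note that keywords appearing only inside URLs do not trigger a match, mirroring the study's exclusion of URLs from keyword matching.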

2.2 Stance classification

This study applied a supervised machine-learning technique to classify tweets into positive and negative discourse categories. Specifically, we used a transformer-based supervised machine-learning method, which allows labeling a tweet as expressing positive or negative sentiment regarding COVID-19 vaccines and, in turn, classifying the rest of the tweets based on the labeled samples using the Bidirectional Encoder Representations from Transformers (BERT) machine learning algorithm. Our current approach has several advantages. Compared to unsupervised textual analysis techniques, supervised machine learning allows tracking discourses that are theoretically important, or comparing discourses between groups on the same criteria. In our case, our supervised machine learning approach provides a computationally efficient way to classify tweets based on target-specific sentiments (i.e., sentiment toward vaccines), rather than the more generic task of sentiment detection. By contrast, unsupervised machine learning learns the inherent structure of the data without pre-provided labels [22, 23]. Since the performance of unsupervised learning-based sentiment analysis methods heavily relies on a pre-generated lexicon and corpus, if the sentiment/emotion-relevant words do not exist in the lexicon or corpus, the classification accuracy will be low [22, 23]. Given that COVID-19 and its vaccines involve high levels of uncertainty and novelty, as well as constantly evolving information, we believe that a supervised machine-learning approach is appropriate to provide a scalable solution to text classification tasks with pre-defined criteria. 
We defined positive vaccine discourse as tweets that express favorable sentiment or attitude toward a COVID-19 vaccine or contain affirmative information about COVID-19 vaccine development; negative vaccine discourse, on the other hand, was defined as tweets espousing unfavorable sentiment, commentary, or information about COVID-19 vaccines (see S2 Appendix for more details and examples). To build a deep learning classifier, we first constructed a human-labeled set of tweets: two graduate students who were not involved in the research design, but had experience with content coding and familiarity with the subject domain, were responsible for the labeling. After receiving training with the codebook developed by the authors, each of them independently labeled a random sample of 200 positive and negative vaccine tweets (0 = absent, 1 = present). The classification problem with respect to each variable was treated independently, which allowed us to capture COVID-19 vaccine discourse types with more granularity. After achieving a sufficient level of intercoder agreement (Krippendorff’s alpha = .90), the two coders each labeled another 5,000 randomly selected tweets and continued coding until a balance between the two classes was reached. The labeled tweets were then used to train and validate the stance classifier using the BERT machine learning algorithm, with 768-dimensional embeddings and 5-fold cross-validation. The model reached a satisfactory level of accuracy: 71% and 75% for positive and negative vaccine tweets, respectively. After achieving satisfactory performance, the classifier was used to label the remaining tweets. Two coders manually verified the stance labels of a random sample of 200 tweets based on ML classification (93.5% agreement). 
After removing tweets with neutral (24,497; 38.15%) or mixed (2,526; 3.93%) sentiment, we retained 23,133 (36.03%) tweets with positive sentiment and 14,051 (21.88%) tweets with negative sentiment toward the COVID-19 vaccine for structural topic modeling (more model assessment metrics are available in S3 Appendix).
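The 5-fold cross-validation used to validate the classifier partitions the labeled tweets into five folds, training on four and validating on the held-out fold. A minimal, library-free sketch of the fold-splitting logic (the BERT training step itself is omitted; the function name and seed are illustrative):

```python
import random

def kfold_indices(n_samples: int, k: int = 5, seed: int = 42):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Distribute samples as evenly as possible across k folds.
    fold_size, remainder = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        end = start + fold_size + (1 if i < remainder else 0)
        folds.append(idx[start:end])
        start = end
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val
```

Each labeled tweet appears in exactly one validation fold, so the accuracy figures average over predictions on held-out data only.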

2.3 Structural Topic Modeling (STM) analysis

To detect the topical structure of Twitter conversation surrounding the COVID-19 vaccine, we employed structural topic modeling (STM), an automated text analysis method that incorporates metadata into topic models [24]. Building on other traditional topic modeling techniques such as the latent Dirichlet allocation model (LDA), STM infers the latent topical structure based on word co-occurrence, using a bag-of-words representation. Topics are “latent” in the sense that they emerge inductively as algorithms learn the hidden patterns underlying a collection of texts, offering the advantage of preventing researcher bias. Unlike other probabilistic topic models that treat each document as a discrete observation, STM incorporates document-level information (i.e., whether a tweet has either positive or negative sentiment toward a vaccine), allowing for partial pooling of parameters along the structure defined by the covariates. We take advantage of STM’s ability to incorporate metadata in estimating topical prevalence in the positive and negative vaccine corpora. This allows us not only to find what Twitter users discussed, but also how their vaccine sentiment affected their tweeted topics. Before conducting STM, several standard data-preprocessing procedures were taken to remove noise. These included a) removing all non-English text and non-American Standard Code for Information Interchange (ASCII) characters, b) removing Twitter handles, URLs, and common English stop words, and c) tokenization and lemmatization. As an additional data-preprocessing step, we retained only words that are nouns, verbs, and adjectives, applying the part-of-speech (POS) tagger from the Stanford Core Natural Language Processing (NLP) suite. Focusing on specific parts of speech has been found to improve topic coherence and efficiently generate more coherent and meaningful (non-ambiguous) document clusters [25]. 
In creating the document-term matrix, we further removed overly frequent features (appearing in over 90% of the documents) and infrequent ones (appearing in less than 0.005% of the documents), as their distribution patterns often do not contribute significantly to meaningful topics [26]. To obtain the optimal number of topics, we compared models across a broad range of possible k (2–100) on four commonly used metrics: coherence, exclusivity, residuals, and lower-bound. The resulting topics were labeled based on a) each topic’s most frequently occurring features, b) top exclusive words that distinguished one topic from others (or FREX words), and c) the most representative texts (tweets with the highest theta scores) [24]. Two authors took steps to validate the topic labels with a random sample of 200 tweets (92% agreement).
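The document-frequency trimming described above — dropping features that appear in more than 90% or fewer than 0.005% of documents — can be sketched as follows. The function name and the thresholds-as-arguments are illustrative; documents are represented as token lists:

```python
from collections import Counter

def trim_vocabulary(docs, max_df=0.90, min_df=0.00005):
    """Drop features appearing in more than max_df or fewer than min_df of docs."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term at most once per document
    keep = {t for t, df in doc_freq.items() if min_df <= df / n_docs <= max_df}
    return [[t for t in doc if t in keep] for doc in docs]
```

Near-universal terms add little discriminative signal to topic estimation, while extremely rare terms mostly add noise, which is why both tails are trimmed before fitting the model.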

2.4 Mention network

Beyond the distinct topical patterns, this study aims to further reveal how positive and negative vaccine discourse circulates within certain types of networks. One way to classify the mentioned accounts through which vaccine discourses circulate is to focus on the degree to which account users are expected to provide medical expertise and/or whether their posted content is associated with formal editorial processes [17, 18]. For example, a study classified health information sources as either official or unofficial based on the degree to which the source provides access to medical expertise and/or whether the content of the source undergoes vetted inspection [17, 18]. Based on this approach, we created primary categories (e.g., scientist/medical source, media/journalists, political source). We then identified the top 100 Twitter accounts that had been mentioned by other users in our sample (see S4 Appendix) and manually assigned each a category label based on the literature. In this process, we recognized that a substantial number of accounts did not fall into the primary categories; thus, two new categories, online influencer and suspended account, were added. Therefore, we classified account mentions into five categories–scientist/medical source, media/journalists, political source, online influencer, and suspended accounts–with the first two categories based on the degree to which the source provides access to medical expertise, or whether the source is linked to an organization with formal editorial processes in place. Note that, as of February 2021, when this study was conducted, former President Trump’s Twitter account, “@realdonaldtrump”, had been suspended by Twitter for violating its terms. However, since the account was active during our data collection (March to July 2020) and given his role as the President, we categorized this account as a political source rather than a suspended account. 
Based on this classification, we examined the mention network of users’ connections. In a mention network, a directed edge runs from A to B when A mentions B in a tweet. This communication practice creates a tie between two nodes, opening potential information pathways for tweeted content to flow between both users’ networks (as a tweet mentioning another person will become visible in that person’s timeline) [27]. Additionally, we conducted Welch’s t-test to statistically test whether positive and negative vaccine discourse revolves around different types of information sources.
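The mention-network construction — a directed edge from A to B whenever A @mentions B in a tweet — can be sketched in Python. Representing tweets as (author, text) pairs and filtering self-mentions are illustrative assumptions:

```python
import re
from collections import Counter

MENTION_RE = re.compile(r"@(\w+)")

def mention_edges(tweets):
    """Return directed (author -> mentioned account) edges weighted by count."""
    edges = Counter()
    for author, text in tweets:
        for target in MENTION_RE.findall(text.lower()):
            if target != author.lower():  # ignore self-mentions
                edges[(author.lower(), target)] += 1
    return edges
```

The resulting weighted edge list can be loaded into any graph library for visualization or centrality analysis; ranking targets by in-degree recovers the "most mentioned accounts" used later in the study.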

3. Results

3.1 Content

3.1.1 Volume of tweets

We start by plotting the temporal dynamics of Twitter volume over the study period, with all types of tweets included, because the overall volume of tweets and how it evolved over time provide a big picture of our dataset before we delve into the content of tweets. Overall, vaccine Twitter discussion saw a gradual uptick from March to May, with occasional fluctuations and a slight decrease after mid-May. As shown in Fig 1, public discussion about COVID-19 vaccines started to gain steam with the spread of the disease to society at large; the discussion peaked around mid-March, when the WHO declared COVID-19 a global pandemic and a potential timeline for vaccine development was released by the U.S. Department of Health and Human Services (HHS) [28]. The negative vaccine Twitter volume saw its first peak around mid-April, when President Trump criticized the WHO’s handling of the pandemic and ordered a halt to its funding [29]. This was also around the time vaccine conspiracy theories surrounding Bill Gates started trending on social media, with topics like “microchip implants” and “ID2020.”
Fig 1

Twitter discourse volume over time.

COVID-19 vaccine discussions reached a second peak when Trump announced plans on May 15 for speedy vaccine development, manufacturing, and distribution on a timeline that seemed overly ambitious [30], followed by the release of promising early results from Moderna’s vaccine trials [31]. Overall, while vaccine discourse on Twitter saw several event-driven peaks, positive and negative vaccine discourse were prominent in different time periods; positive vaccine voices remained more active in the early stage of the pandemic (i.e., March), whereas negative vaccine ones were more salient in the later stage of the study period (i.e., June to July). Our dataset of original tweets followed similar patterns.

3.1.2 Prevalence of topics

Next, we employed structural topic modeling to compare topical prevalence across positive and negative vaccine discourses. Table 1 shows the resulting 13-topic structure that characterized the U.S. COVID-19 vaccine discourse from March to July. Fig 2 shows the average gamma value (γ) for each topic (i.e., the estimated proportion of words from each document that are generated from that topic).
Table 1

The 13-topic structure that characterizes the U.S. COVID-19 vaccine discourses.

Topic | Label | Top Terms
2 | Infectious Diseases | flu, death, die, spread, herd, immunity, outbreak, mortality, rate
6 | Conspiracy Theory | push, force, mandatory, lie, chip, control, trust, implant, 5g
3 | Safety Concern | child, kill, cause, kid, body, doctor, fact, inject, harm, injury
9 | Inherent Uncertainty | flu, infection, disease, stop, risk, protect, immunity, prevent, symptom
8 | Vaccine Development | trial, develop, research, scientist, dose, support, lead, mrna, #immunotherapy, #moderna
5 | New Normal | wait, public, economy, mask, family, school, allow, reopen, normal, business, closure
7 | Consolidation and Mobilization | need, cure, help, antibody, hope, patient, medical, fight, ventilator, recover, supplies, focus, plasma
1 | Monetary Motivation | money, drug, pay, fund, million, order, save, profit, bill, cost, patent, corporation
11 | COVID Testing and Clinical Trial | testing, available, end, ready, phase, study, clinical trial, plan, approve, speed, accelerate
10 | Vaccine Effectiveness | work, effective, safe, possible, science, home, future, social distancing, proven
13 | Vaccine Safety Test & Production | test, company, create, candidate, safety, response, require, volunteer, production
4 | Vaccine Information | health, change, fear, system, #vaccineswork, check, offer, law
12 | Coping Strategies | treatment, potential, continue, global, product, result, provide, race, priority, basics
Fig 2

The average gamma value for each topic (γ).

The document-topic probability, or the gamma value, is the estimated proportion of words from a given document that are generated from that topic.

The most prominent vaccine-related topic in the initial phase of the COVID-19 pandemic in the U.S. was “other infectious diseases” (Topic 2), usually referring to comparisons of COVID-19 with other, more familiar illnesses such as seasonal influenza in terms of fatality rate and transmission. Common narratives under this topic centered around arguments that the coronavirus “is not the flu” or “is just like the flu.” The second and third most prominent topics focused, respectively, on concerns over vaccine safety and potential side effects (Topic 3), often citing scientific studies or terminology, and conspiratorial explanations for mandatory vaccines or even the origin of COVID-19, playing up conspiracies surrounding “Bill Gates”, “5G”, “microchip implant”, and “ID2020” (Topic 6). Following these three topics were discussions of the inherent uncertainty resulting from the novelty of the disease (e.g., implications for chronic health conditions) (Topic 9), news and updates about vaccine development (Topic 8), as well as disruptions to various aspects of social life (Topic 5). These daily disruptions ranged from school closures and stay-at-home orders to restrictions on other business and social events (e.g., sports, bars, and restaurants). Tweets under this topic typically acknowledged the need to adjust to the “new normal” before a safe and effective vaccine becomes widely available. Less prominent, but still constituting a sizeable proportion of Twitter discussion, were tweets that called for consolidation and mutual support (whether in spirit or in medical supplies such as PPE for medical professionals) (Topic 7), and those that cast doubt on the monetary motivations behind mandatory vaccination (Topic 1), often targeting institutions, mainstream media, and the political and financial establishment. 
A frequent reference point under this topic was the history behind the unpatented polio vaccine. There were also topics more directly associated with the vaccine itself, including safety testing and production (Topic 13), clinical trials (Topic 11), and effectiveness (Topic 10). Finally, there was a broader range of coping strategies and reactions (Topic 12), and pop culture references or other types of “soft news” (Topic 4).

3.1.3 Topic communities

To further explore the relationships among topics, we constructed a semantic network in which topics serve as nodes and their associations as edges, using pairwise cosine similarity calculated based on the theta-cosine matrix generated by STM [32]. Topics were further grouped into clusters using Spinglass, a widely used community detection algorithm [33]. The association between two topics (i.e., nodes) is reflected by the thickness of the undirected line connecting them (i.e., edge). The edges are weighted, calculated based on cosine similarity, with the weight indicating topic co-occurrence in documents. Specifically, the higher the weight, the more likely that two topics are discussed within a given document. For example, vaccine production is more likely to be talked about alongside other coping strategies, yet it has a low cosine similarity value with the topic of conspiracy [32]. Our results suggest two distinct topic communities (see Fig 3), with one associated with the COVID-19 vaccine itself, ranging from clinical trials (Topic 11), development (Topic 8), manufacturing and distribution (Topic 13), to vaccine effectiveness (Topic 10) and calls for medical support and global cooperation (Topic 7). The second cluster revolved more around public deliberations on several contentious issues, scientifically based or not, as people made attempts to understand the novel disease by comparing it to other infectious diseases (e.g., Influenza, Polio, MMR) (Topic 2), reckoned with the disease’s impact on social lives (Topic 5), and created simplified narratives for societal crises linked to the pandemic (Topic 1 and Topic 6).
Fig 3

A semantic network with nodes as topics and edges as their associations.
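The pairwise cosine similarity underlying the topic network can be sketched as follows. The topic vectors here are illustrative stand-ins for the theta-based vectors produced by STM, and simple thresholding replaces the Spinglass community detection step, which is omitted:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def topic_edges(topic_vectors, threshold=0.0):
    """Weighted undirected edges between topic pairs above a similarity threshold."""
    edges = {}
    labels = list(topic_vectors)
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            w = cosine_similarity(topic_vectors[a], topic_vectors[b])
            if w > threshold:
                edges[(a, b)] = w
    return edges
```

Topics that tend to co-occur in the same documents have similar theta profiles and hence high cosine similarity, so heavy edges in the network correspond to topics frequently discussed together.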

3.1.4 Relative volume of topics over time

Finally, to uncover trends over time, we plotted the relative tweet volume over the four-month period. This approach helped us investigate the patterns in which each topic gained (or lost) prominence relative to others, controlling for fluctuations in overall public attention to the COVID-19 vaccine. As shown in Fig 4, Topic 2 (Infectious Disease) dominated early Twitter vaccine conversation, alongside expressions of safety concerns (Topic 3). While some topics received more sustained attention over time (e.g., Topic 6, Conspiracy Theory), others showed a higher degree of fluctuation in Twitter volume (e.g., Topic 8, Vaccine Development).
Fig 4

Overtime trend in daily relative volume by topics.

Topic 1 = monetary motivation, Topic 2 = infectious disease, Topic 3 = safety concern, Topic 4 = vaccine information, Topic 5 = new normal, Topic 6 = conspiracy, Topic 7 = consolidation, Topic 8 = vaccine development, Topic 9 = inherent uncertainty, Topic 10 = vaccine effectiveness, Topic 11 = clinical trial, Topic 12 = coping strategies, Topic 13 = vaccine production.


3.1.5 Distinct topical prevalence in positive vs. negative valenced tweets

To understand the discourses of Twitter users with positive and negative perspectives on COVID-19 vaccines, we compared topical prevalence across tweets with different viewpoints on vaccination (see Table 2 and Fig 5).
Table 2

Regression analysis predicting topical prevalence by positive and negative vaccine stance.

Topic | Label | Negative vaccine sentiment (1)a
1 | Monetary Motivation | .07b
2 | Infectious Diseases | –.03b
3 | Safety Concern | .12b
4 | Vaccine Info | –.00
5 | New Normal | –.02b
6 | Conspiracy | .19b
7 | Consolidation | –.04b
8 | Development | –.07b
9 | Uncertainty | –.01b
10 | Effectiveness | –.05b
11 | Clinical Trial | –.07b
12 | Coping Strategy | –.04b
13 | Production | –.02b

aReference: positive vaccine discourse (0)

bP < .001.

Fig 5

Difference in topical prevalence.

Points to the left indicate greater prevalence in positive vaccine discourse; points to the right indicate greater prevalence in negative vaccine discourse.

Overall, positive vaccine discussions demonstrated a wider range of perspectives, with greater attention to vaccine research updates and progress (e.g., clinical trials and safety tests) and production. Additionally, compared to negative vaccine discourse, positive vaccine discussions devoted more Twitter conversation to general coping strategies and the importance of adjusting to the “new normal” before a vaccine becomes available by taking preventive measures such as social distancing, washing hands, and wearing masks. There was also a greater emphasis on supporting medical professionals (e.g., providing PPE for health care workers while waiting for vaccines), calls for global collaboration (e.g., #inThisTogether, #defeatdiseasetogether), and comparisons between the coronavirus and other infectious diseases. Finally, while positive vaccine tweets were more likely than negative vaccine ones to discuss vaccine effectiveness, they also paid significantly more attention to the inherent complexities and uncertainty associated with the COVID-19 vaccine, such as implications for those with chronic health conditions. Compared to the topical diversity in positive vaccine discourse, negative vaccine Twitter conversation was dominated by a narrower set of narratives. Negative vaccine discourse was more likely to be about the monetary motivations behind mandatory vaccines, concerns over side effects, as well as conspiratorial beliefs.

3.2 Actors

3.2.1 Mention network

The mention network analysis also revealed distinct interaction patterns among positive and negative vaccine Twitter users (see Fig 6). To construct the mention network, we first examined the top 50 most mentioned accounts in positive and negative vaccine Twitter discourses, resulting in the 100 most influential users in total (see S4 Appendix). When an account fell into more than one category (e.g., a doctor who is an online influencer), priority was placed on 1) suspended account, 2) scientist/medical source, 3) political source, 4) media/journalist, and 5) online influencer.
Fig 6

The mention network.

Overall, positive and negative vaccine discourses tended to mention distinct types of Twitter accounts. Over half (58%) of the most mentioned accounts were non-overlapping, prominent in only one group. Further, among the co-mentioned influential users, political actors were the most mentioned type (38%; @realdonaldtrump, @joebiden, @berniesanders, @whitehouse, @potus, @speakerpelosi, @janeeopie, @doritmi), followed by media accounts (23.8%; @cnn, @foxnews, @nytimes, @thehill, @realcandaceo), online influencers (19%; @mcfunny, @monstercoyliar, @frankdelia7, @beckyjohnson222), and public health institutions (9.5%; @who, @cdcgov). The top co-mentioned accounts also included Bill Gates (@billgates) and one suspended user (@jkellyca).
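The core of this step — extracting @mentions per stance group, ranking the top-N accounts (here, N = 50 per group), and checking the overlap between the two lists — can be sketched with the standard library. This is an illustrative reconstruction, not the authors' pipeline, and the sample tweets are made up.

```python
import re
from collections import Counter

# Simple @mention pattern; real handles also allow digits/underscores,
# which \w covers.
MENTION_RE = re.compile(r"@(\w+)")

def top_mentioned(tweets, n=50):
    """Rank the n most-mentioned accounts (lowercased) in a list of tweets."""
    counts = Counter(m.lower() for t in tweets for m in MENTION_RE.findall(t))
    return [account for account, _ in counts.most_common(n)]

# Toy examples standing in for the positive/negative tweet groups
positive = ["Thanks @WHO and @CDCgov for the updates", "@WHO vaccine trial news"]
negative = ["@someinfluencer is right about this", "don't trust @WHO"]

top_pos, top_neg = top_mentioned(positive), top_mentioned(negative)
overlap = set(top_pos) & set(top_neg)  # co-mentioned influential accounts
print(top_pos, overlap)
```

Comparing the two top-N sets directly yields both the non-overlapping share and the co-mentioned accounts reported above.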

3.2.2 Mention by account type

There was a noticeable difference in how much attention each account type received from positive and negative vaccine users (see Fig 7). Specifically, while media/journalists and scientists/medical sources tended to be mentioned in positive vaccine tweets, political figures, online influencers, and suspended accounts appeared more influential among users who held negative sentiment toward the COVID-19 vaccine.
Fig 7

Proportion of the top mention by account type and discourse category.

To confirm the above patterns, a series of Welch’s two-sample t-tests was performed. Welch’s t-test was chosen because it provides an alternative to the traditional t-test for samples with unequal sizes and variances [34]. Also, to account for confounding factors such as tweet length or differences in each group’s tendency to use the @ functionality, we calculated the average mention count for each source type at the tweet level, which indicates the attention a tweet devoted to a particular source type (e.g., scientists/medical sources) relative to others. Results confirm that positive vaccine discourse was more likely to mention scientists/medical sources, t(19892) = 2.54, P < .01, and media/journalists, t(20013) = 4.52, P < .001; by contrast, negative vaccine tweets tended to interact with political sources, t(18194) = –13.68, P < .001, online influencers, t(12425) = –10.21, P < .001, and suspended accounts, t(12425) = –5.58, P < .001.
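Welch's test is a minor variant of the two-sample t-test: it uses each group's own variance in the standard error and the Welch–Satterthwaite approximation for the degrees of freedom. The sketch below (with made-up per-tweet mention counts) is equivalent to `scipy.stats.ttest_ind(a, b, equal_var=False)` for the statistic and df; it is for illustration, not the authors' code.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic and approximate degrees of freedom."""
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)   # sample variances (n - 1 denominator)
    n1, n2 = len(a), len(b)
    se1, se2 = v1 / n1, v2 / n2
    t = (m1 - m2) / math.sqrt(se1 + se2)
    # Welch–Satterthwaite approximation for degrees of freedom
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df

# Toy data: average per-tweet mention counts for one source type,
# in the positive vs. negative discourse groups
pos = [0.9, 1.1, 1.0, 1.2, 0.8, 1.0]
neg = [0.4, 0.6, 0.5, 0.7, 0.5]
t, df = welch_t(pos, neg)
print(round(t, 2), round(df, 1))
```

Because the df depends on the group variances, Welch's df is generally non-integer and smaller than n1 + n2 − 2, which is why the reported df values differ across the five source-type tests.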

4. Discussion

Our study focused on positive and negative vaccine sentiments in tweets to understand the nature of public involvement as one of the key forces shaping vaccine acceptance and policy making. Specifically, we examined a) content differences in positive versus negative tweets surrounding the COVID-19 vaccine, as well as b) prominent actors frequently mentioned by the two camps. The primary findings are summarized as follows. As to content differences, while positive vaccine discussions demonstrated a wider range of perspectives informed by research and medical professionals, negative vaccine conversation was dominated by a narrower set of narratives, such as monetary motivations behind mandatory vaccines, concerns over side effects, and conspiratorial beliefs. Furthermore, while positive vaccine discourse interacted with verified agents, negative vaccine discourse tended to mention online influencers and suspended accounts, suggesting possible pathways through which misinformation was spreading. Regarding differences in prominent actors between the two camps, while positive vaccine discourse tended to be circulated by a network consisting of scientists/medical sources and media/journalists, negative vaccine sentiment tended to be spread by another network including political sources, online influencers, and suspended accounts.

4.1 Content

Our STM findings on the prominent topics surrounding a COVID-19 vaccine revealed the way in which Twitter users attempted to understand this novel illness and the vaccine being developed to treat it. That is, users made comparisons between COVID-19 and other infectious diseases such as polio, MMR, HPV, and, most notably, influenza. For example, users discussed how the development and uptake of existing vaccines have successfully controlled the spread of infectious disease (e.g., “countless people in the world were infected with A disease and died until A vaccines were developed”). Importantly, when the same terms (e.g., influenza) were used, the context of use could be entirely different [35]. For example, users with negative attitudes toward the COVID-19 vaccine, despite relatively low levels of activity in this topic, also talked about other infectious diseases, especially when they downplayed the severity of COVID-19 (e.g., “you don’t know that seasonal flu is deadlier than coronavirus”). Indeed, false comparisons with other diseases were among the most prevalent myths on Twitter in the early stage of the pandemic [20]. While positive vaccine discourse covered a wide range of topics, from vaccine narratives (e.g., vaccine development, effectiveness, safety tests) to the new normal of daily life (e.g., coping strategies for COVID-19, other preventive measures, appreciation for medical professionals), negative vaccine discourse was narrowly focused on topics such as conspiracy theories and safety concerns. The discourse suggested that users with a positive vaccine stance may adopt diverse viewpoints in navigating the COVID-19 vaccine issue, whereas users with a negative vaccine stance tend to propagate existing anti-vaccine narratives, often in the form of unmistakably false claims including conspiracy theories, unchecked rumors, false prevention methods, and dubious cures [20], while reinforcing old topics in the new context of COVID-19 [36].
Indeed, the prevalence of conspiratorial thinking related to COVID-19 is unsurprising given that rumors and conspiracy theories about other outbreaks of infectious diseases have long prevailed [36]. The conspiracy narratives surrounding COVID-19 featured secret plots connecting powerful individuals (e.g., Bill Gates) or institutions to intentional harm or prolonged surveillance. These narratives were less disease-focused and more driven by politically motivated agendas (e.g., 5G wireless, chemtrails, depopulation), all of which were designed to rouse fear and limit the public’s willingness to get a COVID-19 vaccine. These results imply that public health communicators need to design effective messages to prevent the public from building mental connections between existing “old” vaccine myths and the novel disease, especially among less receptive audiences. Another predominant topic in negative vaccine discourse was concern over vaccine safety and potential side effects. Some tweets in this topic emphasized the risk of the vaccine while downplaying the risk of the disease (e.g., “Learn the risk of vaccine! Even before the advent of the vaccine, the mortality of disease had no difference”). Whether a conscious or unconscious decision, vaccination does require weighing the risks of the disease that the vaccine is designed to prevent against the risks of vaccine side effects [37]. Due to the rapid development of a COVID-19 vaccine and uncertainty about long-term side effects, discourses emphasizing that the risks of a COVID-19 vaccine potentially outweigh the risks of the COVID-19 disease were circulated, including by legitimate sources. Yet many of these users were likely to argue that their personal perceived susceptibility to COVID-19 was quite low while the general risk of the vaccine was high. Communication professionals need to assist the public in interpreting risks by providing useful points of reference for distinguishing legitimate risks from false hysteria.

4.2 Actors

Mentioning other accounts in posts may serve an important function in spreading positive and negative vaccine discourse and potentially growing the two different networks. Our findings revealed two different networks that diffuse distinctive views on a COVID-19 vaccine. The first network, which tended to circulate positive vaccine discourse, consisted of scientists/medical sources and media/journalists. These actors are viewed as credible sources in public health due to their medical expertise or gatekeeping processes [17, 18]. By contrast, the network that tended to spread negative vaccine sentiment included political sources, online influencers, and suspended accounts, marking it as a network of unverified sources. As online influencers are people who establish online profiles and voice opinions on a topic with which they are familiar [37, 38], the content produced by online influencers is by nature subjective and oftentimes extreme. In addition, political sources are increasingly polarized regarding vaccination, usually along ideological lines [39]. Public health issues are often politicized [39], with the COVID-19 vaccine a prime example of this growing phenomenon. Finally, given Twitter’s effort to limit the spread of misleading and false health claims [40], suspended accounts may be seen as hubs of misinformation. Together, these actors show that negative vaccine discourse is circulated within a closely connected network of dubious sources lacking information gatekeeping.

4.3 Limitations and future directions

Our study has limitations in several respects. First, we used a random sample of data generated by a particular data collection platform, Synthesio. While this is an established way of collecting data, it is not certain whether and how the sample may be biased. Similarly, since we focused only on original tweets to detect prominent topics, caution should be exercised in generalizing these findings to the entire body of tweets. We, however, opted for a more rigorous approach to avoid noise and reduce false positives in vaccine stance and topic detection. Second, this study focused on the first four months of the pandemic, which limits the ability to draw conclusions about temporal patterns of topical prevalence throughout the ongoing pandemic. We chose to focus on this initial phase of vaccine development because public health communication faced unique challenges due to high levels of scientific uncertainty, constantly changing information, and a politicized environment [20]. Future research should extend the time period to draw a more complete picture of topical variance and temporal dynamics throughout the pandemic.

4.4 Practical implications for public health and vaccine communication

Our findings provide important insights for vaccine communication during the pandemic. Specifically, our results highlight the types of content that need to be addressed to reduce vaccine hesitancy, as well as the actors who serve as conduits through which negative vaccine content diffuses and who should therefore be targeted when disseminating vaccine information to increase the COVID-19 vaccine uptake rate. For content that effectively promotes acceptance of a COVID-19 vaccine, public health and vaccine communicators could harness social media with positive stories of vaccination experiences that may be able to shift vaccine perceptions among vaccine-hesitant individuals. Alternatively, false, misleading, or otherwise negative content could be detected and countered on an individual or mass scale. Indeed, a substantial number of tweets in Topic 3 (concerns over vaccine safety) cited stories of people who experienced adverse reactions to vaccines as the reason for believing in the risks or side effects of vaccines. Given that people are strongly swayed by personal narratives, and that those stories have strong power to alter perceptions of risk [41], it is necessary to make positive vaccination narratives visible on social media platforms. As for the actors who need to be targeted, our pattern of results, in which negative vaccine discourse predominantly circulated within an online influencer-centered network, suggests that effective interventions should also attempt to shift the dominant sources of information and interaction patterns among Twitter users who hold negative views on the COVID-19 vaccine. Notably, these actors are, despite their lack of medical expertise or vetting [17, 18], considered reputable sources within many networks. Thus, partnering with online influencers could help public health communicators reach their followers, though encouraging involvement from this group poses its own challenges.
Communication efforts may involve prominent information hubs by tagging or mentioning influential accounts. By doing so, users who hold negative sentiment toward vaccines are more likely to be exposed to science-based information, even if only incidentally.

5. Conclusion

While a great deal of research has focused on vaccine hesitancy online, less attention has been paid to both positive and negative vaccine discourse. In this study, we focused on both vaccine sentiments equally to understand the nature of public involvement. Our findings provide insights into which topics should be addressed to reduce vaccine hesitancy and which actors can be leveraged as conduits through which vaccine-related scientific information can be spread. Public health and vaccine communicators can be more proactive in monitoring and understanding these types of users to curb the negative influence of misinformation or misleading claims.

Supporting information

Keywords. (DOCX)

Codebook for positive and negative vaccine discourse on Twitter. (DOCX)

The fine-tuned BERT classification accuracy and Area Under the Curve (AUC) (S3 Appendix table) and the receiver operating characteristic (ROC) curve on the test set, by positive and negative vaccine discourse (S1 Fig). (DOCX)

The top 100 most mentioned accounts in positive and negative vaccine Twitter discourses. (DOCX)
Peer review history

Decision letter (2 Dec 2021)
PONE-D-21-31719
Vaccine discourse during the onset of the COVID-19 pandemic: Thematic structure and source patterns informing efforts to combat vaccine hesitancy
PLOS ONE

Dear Dr. Hwang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Jan 16 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org.

We look forward to receiving your revised manuscript.

Kind regards,
Kazutoshi Sasahara
Academic Editor, PLOS ONE
We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data.

Additional Editor Comments: Both reviewers think the manuscript is important, but they also think that it needs more improvements. Please read the comments carefully and address them in the revised paper.

Reviewers' Responses to Questions

1. Is the manuscript technically sound, and do the data support the conclusions? Reviewer #1: Yes. Reviewer #2: Partly.

2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes. Reviewer #2: No.

3. Have the authors made all data underlying the findings in their manuscript fully available? Reviewer #1: Yes. Reviewer #2: Yes.

4. Is the manuscript presented in an intelligible fashion and written in standard English? Reviewer #1: Yes. Reviewer #2: Yes.

5. Review Comments to the Author

Reviewer #1: This is a strong study that provides important insights into communication flows on Twitter during the early part of the COVID pandemic. It shows key differences between negative and positive sentiment through a comprehensive analysis (i.e., using multiple methods) of original tweets. The methods are sound, and the findings support the conclusions. There also is real value here for public health communication. My main recommendation is to tighten and revise the structure of the article in order to clearly draw out the core insights and to show why these methods are required to generate those insights. In the "Descriptive" section, it is somewhat difficult for the reader to understand how each part of this sequentially presented analysis builds to the article's findings and main argument. Why, for example, does this section of the article start by tracking the volume of tweets? What does this tell us about COVID communication flows that will be integral to the study's main argument?
Each of these sections in the "Descriptive" section should be introduced with a clear statement of why the authors are choosing to, in order, analyze: 1) Volume of tweets, 2) Prevalence of topics, 3) Topic communities, 4) Relative volume of topics over time, 5) Distinct topical prevalence, positive vs. negative valence, 6) Same as 5, but in relationship to hashtags, 7) Mention network, and 8) Mention by account type. In other words, this is a richly detailed analysis that needs to be strung together in a more coherent fashion that, in each case along the way, builds and supports a clear argument based on the findings. More generally, the findings need to be more closely stitched together with the very clear enumeration of issues of 1) content and 2) actors on pp. 27-28. Revising the article to support, step-by-step, this dual focus on content and actors will help bring more coherence to the methods and findings.

It also would be nice to see a concise, brief summary of the study's three principal takeaways in the first paragraph of the Discussion. This could be one or two sentences following the mention of topics and actors, which then would lead into the more detailed elaborations of each individual finding below. Also helpful for maximizing the article's impact would be noting present public health communication strategies designed to counter vaccine hesitancy. These could be added to the introduction's literature review, and they would help clarify what is new about the authors' conclusions and recommendations for improving these strategies and developing new strategies.

A few other issues: The authors use the terms "themes," "thematic patterns," and "topics" interchangeably. I suggest sticking to a single term, such as "topic," unless there is a distinction between themes and topics that is important to the study. If so, this distinction needs to be clarified. The methods arise in mid-paragraph on p. 5. All references to methods should be contained in the methods section.
Something is missing in the first sentence on p. 17: "To understand how the discourses" what? In Footnote 3 on p. 20, please clarify that @realdonaldtrump is former President Trump's Twitter account. As it now reads, "his" is a pronoun without a first reference. p. 11: Figure 1 has a typo in the title: "Volumn". p. 23: "Principle findings" should be "Principal findings". p. 24 and p. 29: the term "anti-vaxxers" is used here without definition, though the term implies that all of these social media users were actively opposed to vaccinations. However, we know that there is a range of negative vaccine sentiments, from hesitancy and doubt to criticism and outright opposition. Accordingly, the literature in this area tends to invoke a spectrum of vaccine confidence (see Larson, 2020; MacDonald and the SAGE Working Group on Vaccine Hesitancy, 2015), and the authors open the article by referring to this spectrum. It then is confusing why the term "anti-vaxxers" appears at this point in the article. Please clarify or revise the terminology to align with the article's introduction.

Reviewer #2: Overall, the general area of research is topical, but I don't find this paper compelling, partly due to the methods, and partly due to the execution. I am not an expert in STM, so I cannot comment on that. If they could do the paper without machine classification, I think it would be improved, as I don't think these methods are successful on Twitter data, based on their results and my personal experiences.

Major comments: I have done extensive work using machine learning and Twitter, and have never published it because the results are always disappointing. My experience is that 140 characters is not sufficient for proper classification of most Twitter content. The authors used a machine learning method that has, in my view, high error, reporting only 71% and 75% accuracy. The ROC curves in S3 Appendix confirm a fairly high combined error tradeoff between sensitivity and specificity.
The fact that 20-30% of the content is coded wrong should cause the authors to be very cautious in the interpretation of their results, and I think presents a serious challenge to publishing this paper. The use of regression models in analyzing an outcome with large measurement error (Table 2) is especially problematic. I don't know much about structural topic modelling, but these kinds of approaches are more effective when there is more text available. Again, Twitter data presents a challenge here; gathering meaning from small quantities of text is hard. However, since I am not an expert in this method, I cannot comment further. Finally, I don't see the practical implications of the paper. The authors claim that "our results highlight the type of content that needs to be addressed to improve vaccine hesitancy" but offer no actual evidence of this. This is a descriptive paper that does not drill into what impacts vaccine hesitancy. Moreover, changes in attitudes toward COVID-19 vaccines render some of these findings somewhat obsolete; what concerns people today (in late 2021) may be different from what concerned them early in vaccine uptake.

Minor comments: Page 4: "social media content does not undergo an editorial processes or scientific vetting". This statement is not true. Twitter, Facebook and YouTube implemented editorial controls specific to Covid and vaccination in 2020. Page 6: Is the use of Synthesio and web scraping of Twitter legal? Is the use of this tool covered under fair use legislation/practice? Do we know that Synthesio actually generates true and unbiased samples? Was this service paid for by the authors? Details here are important for the reader for a number of reasons.

6. Do you want your identity to be public for this peer review? Reviewer #1: No. Reviewer #2: No.

Author response (10 Jan 2022)

Dear Editor,

We appreciate the feedback from the reviewers and the opportunity to make further revisions on our manuscript for PLOS ONE. We have implemented all of the reviewers' suggestions and believe that the manuscript is much strengthened as a result. Below, we list the feedback we received from the reviewers and detail how we have responded to each concern in our revision of the manuscript.

Reviewer #1: This is a strong study that provides important insights into communication flows on Twitter during the early part of the COVID pandemic. It shows key differences between negative and positive sentiment through a comprehensive analysis (i.e., using multiple methods) of original tweets. The methods are sound, and the findings support the conclusions.
There also is real value here for public health communication. My main recommendation is to tighten and revise the structure of the article in order to clearly draw out the core insights and to show why these methods are required to generate those insights.

Response: We appreciate the reviewer’s constructive comment. This important comment led us to revise our paper in a way that delivers our main points more effectively. To address this comment, in the introduction, we have now clarified our dual research goals: to identify patterns in the “content” of vaccine discourses on the one hand, and to identify the key “actors” with whom users engaging with the vaccine conversation interacted on the other. This revision is presented on page 6.

In the "Descriptive" section, it is somewhat difficult for the reader to understand how each part of this sequentially presented analysis builds to the article's findings and main argument. Why, for example, does this section of the article start by tracking the volume of tweets? What does this tell us about COVID communication flows that will be integral to the study's main argument?

Response: Thank you for this comment. We agree with the reviewer that each finding needs to be presented in a more coherent way and strung together to show the main line of our arguments. Thus, we have now clarified why we started by presenting the volume of tweets and how it would lead readers to the next findings. This revision is presented on page 10.
“We start by plotting the temporal dynamics of twitter volume during the course of our study time, with all types of tweets included, because the overall volume of tweets and how it evolved over time provides a big picture of our dataset before we delve into the examination of content of tweets.”

Each of these sections in the "Descriptive" section should be introduced with a clear statement of why the authors are choosing to, in order, analyze: 1) Volume of tweets, 2) Prevalence of topics, 3) Topic communities, 4) Relative volume of topics over time, 5) Distinct topical prevalence, positive vs. negative valence, 6) Same as 5, but in relationship to hashtags, 7) Mention network, and 8) Mention by account type. In other words, this is a richly detailed analysis that needs to be strung together in a more coherent fashion that, in each case along the way, builds and supports a clear argument based on the findings.

Response: We appreciate this constructive feedback. The reviewer’s comment truly led us to re-organize our findings in an effective way. We have now made two changes in the results section. First, we have broken down the results section into two major sections (3.1 Content and 3.2 Actors). Second, under each section, we have assigned six subsections for 3.1 Content (3.1.1 Volume of tweets, to 3.1.6 Distinct topical prevalence in relation to hashtags) and two subsections for 3.2 Actors (3.2.1 Mention network and 3.2.2 Mention by account type). This revision allows our findings to be more clearly presented. We thank the reviewer for this very constructive comment.

More generally, the findings need to be more closely stitched together with the very clear enumeration of issues of 1) content, and 2) actors on pp. 27-28. Revising the article to support, step-by-step, this dual focus on content and actors will help bring more coherence to the methods and findings.

Response: We appreciate this feedback.
As stated in the previous responses, we have now clarified our dual purposes for this paper and consistently emphasized them throughout the introduction, methods, results, and discussion.

It also would be nice to see a concise, brief summary of the study's three principal takeaways in the first paragraph of the Discussion. This could be one or two sentences following the mention of topics and actors, which then would lead into the more detailed elaborations of each individual finding below.

Response: Thank you for this comment. We agree with the reviewer that it would be helpful to briefly summarize our study’s principal takeaways in the first paragraph of the discussion. Thus, we have now included a summary before delving into the detailed discussion of each finding: “Our study examined a) content differences in positive versus negative tweets surrounding the COVID-19 vaccine, as well as b) prominent actors frequently mentioned by the two camps. The primary findings are summarized as follows: While positive vaccine discussions demonstrated a wider range of perspectives informed by research and medical professionals, negative vaccine conversation was dominated by a narrower set of narratives, such as monetary motivations behind mandatory vaccines, concerns over side effects, and conspiratorial beliefs. Furthermore, while positive vaccine discourse interacted with verified agents, negative vaccine discourse tended to mention online influencers and suspended accounts, suggesting possible pathways through which misinformation was spreading.”

Also helpful for maximizing the article's impact would be noting present public health communication strategies designed to counter vaccine hesitancy. These could be added to the introduction's literature review, and they would help clarify what is new about the authors' conclusions and recommendations for improving these strategies and developing new ones.
Response: Thank you for the important comment. We have now created a separate paragraph that notes present public health communication strategies and shows how our study can provide practical implications based on the current approach. The paragraph at the end of the introduction on page 6 reads: “By doing so, our study provides important insights and practical guidance for public health communicators, whose efforts, to date, predominantly focus on disseminating vaccination guidelines or schedules rather than proactively responding to vaccine hesitancy. Thus, we suggest public health communication professionals should broaden their messages to address specific concerns as revealed in public conversations on social media, as well as leverage a diversity of actors who have been centrally situated in the vaccine-hesitant communities.”

A few other issues: The authors use the terms "themes," "thematic patterns," and "topics" interchangeably. I suggest sticking to a single term, such as "topic," unless there is a distinction between themes and topics that is important to the study. If so, this distinction needs to be clarified.

Response: Thank you for the important comment. Indeed, we should stick to a single term that describes our measurement accurately. We have now used “topics” consistently and removed all instances of “themes.” We have also revised our title accordingly. We appreciate the reviewer’s comment.

The methods arise mid-paragraph on p. 5. All references to methods should be contained in the methods section.

Response: We thank the reviewer for this comment. We have now removed any mention of methods from the introduction. Specifically, we removed our statement about the five categorizations of account mentions from this section and moved it to the methods section on page 10. We appreciate the reviewer’s comment.

Something is missing in the first sentence on p. 17. "To understand how the discourses" what?

Response: We have corrected this typo.

In Footnote 3 on p.
20, please clarify that @realdonaldtrump is former President Trump's Twitter account. As it now reads, "his" is a pronoun without a first reference.

Response: Thanks for this comment. We have now clarified that this account is former President Trump's in footnote 3 on page 21.

p. 11: Figure 1 has a typo in the title: "Volumn"

Response: Thanks for the attention to detail. We have corrected this typo.

p. 23: "Principle findings" should be "Principal findings"

Response: We have corrected this typo.

p. 24 and p. 29: the term "anti-vaxxers" is used here, without definition, though the term implies that all of these social media users were actively opposed to vaccinations. However, we know that there is a range of negative vaccine sentiments, from hesitancy and doubt, to criticism and outright opposition. Accordingly, the literature in this area tends to invoke a spectrum of vaccine confidence (see Larson, 2020; MacDonald & the SAGE Working Group on Vaccine Hesitancy, 2015), and the authors open the article by referring to this spectrum. It then is confusing why the term "anti-vaxxers" appears at this point in the article. Please clarify or revise the terminology to align with the article's introduction.

Response: We appreciate this important comment. We agree that the appearance of the term "anti-vaxxers" at this stage of the paper might confuse readers for the reasons mentioned by the reviewer. We also agree that it is inappropriate to mention this term without definition, since it may lead readers to overlook the dynamics of negative attitudes toward vaccines. We have now removed this term and replaced it with "users with negative attitudes toward the COVID-19 vaccine." We have also made sure that the expression is used consistently throughout the manuscript (e.g., positive/negative attitudes toward the COVID-19 vaccine) to properly reflect the nuanced nature of this spectrum of negative attitudes, as the reviewer rightly pointed out.
We thank the reviewer for this comment.

---

Reviewer #2:

Overall: The general area of research is topical. Overall, I don't find this paper compelling, partly due to the methods, and partly due to the execution. I am not an expert in STM, so I cannot comment on that. If they could do the paper without machine classification, I think it would be improved, as I don't think these methods are successful on Twitter data--based on their results and my personal experience.

Major comments: I have done extensive work using machine learning and Twitter, and have never published it because the results are always disappointing. My experience is that 140 characters is not sufficient for proper classification of most Twitter content. The authors used a machine learning method that has, in my view, high error, reporting only 71% and 75% accuracy. The ROC curves in Figure 3 confirm a fairly high combined error tradeoff between sensitivity and specificity. The fact that 20-30% of the content is coded wrong should cause the authors to be very cautious in the interpretation of their results, and I think presents a serious challenge to publishing this paper. The use of regression models in analyzing an outcome with large measurement error (Table 2) is especially problematic.

Response: Thanks for the reviewer’s important comment. We share the reviewer’s concern that applying machine learning, especially to Twitter data, is not easy. As with other methodologies, machine learning-based approaches certainly have their limitations, as we acknowledged in the manuscript. Nevertheless, a machine-learning approach provides a scalable solution to text classification tasks with consistent criteria, adding important insights to our existing knowledge on the topic. There are several advantages to our current approach.
First, compared to other unsupervised textual analysis techniques, machine learning allows tracking discourses that are theoretically important, or comparing discourses between groups on the same criteria (e.g., discourses against the COVID-19 vaccine). Similarly, when compared with other research methods such as surveys, a machine-learning approach helps to unobtrusively detect naturally occurring public expression of certain topics or sentiments. We also want to emphasize that a significant number of published papers have used machine-learning methods with Twitter data (below we list a few examples). Please also note that the classification solutions were validated with several additional procedures, including manual coding and structural topic modeling. The accuracy level (71% and 75%) also seems acceptable for our downstream application task, as indicated in other published work (e.g., 74.1% in Zhu et al., 2020). Lastly, as for the regression models in Table 2, please let us clarify that the numbers are coefficients in the regression model, not an index of measurement error.

Allen, C., Tsou, M. H., Aslam, A., Nagel, A., & Gawron, J. M. (2016). Applying GIS and machine learning methods to Twitter data for multiscale surveillance of influenza. PLoS ONE, 11(7), e0157734.

SteelFisher, G. K., Blendon, R. J., & Caporello, H. (2021). An uncertain public—encouraging acceptance of Covid-19 vaccines. New England Journal of Medicine, 384(16), 1483-1487.

Xue, H., Bai, Y., Hu, H., & Liang, H. (2019). Regional level influenza study based on Twitter and machine learning method. PLoS ONE, 14(4), e0215600.

Zhu, J. M., Sarker, A., Gollust, S., Merchant, R., & Grande, D. (2020). Characteristics of Twitter use by state Medicaid programs in the United States: Machine learning approach. Journal of Medical Internet Research, 22(8), e18401.

I don't know much about structural topic modelling, but these kinds of approaches are more effective when there is more text available.
Again, Twitter data presents a challenge here; gathering meaning from small quantities of text is hard. However, since I am not an expert in this method, I cannot comment further.

Response: Thank you for this comment. We recognize that the research community is still in the process of advancing algorithms for detecting topics in Twitter data. Nevertheless, structural topic modeling, while imperfect like any other method, is an established approach for analyzing tweets. Below are some relevant publications that use (structural) topic modeling for Twitter data analysis.

Surian, D., Nguyen, D. Q., Kennedy, G., Johnson, M., Coiera, E., & Dunn, A. G. (2016). Characterizing Twitter discussions about HPV vaccines using topic modeling and community detection. Journal of Medical Internet Research, 18(8), e6045.

Mishler, A., Crabb, E. S., Paletz, S., Hefright, B., & Golonka, E. (2015, August). Using structural topic modeling to detect events and cluster Twitter users in the Ukrainian crisis. In International Conference on Human-Computer Interaction (pp. 639-644). Springer, Cham.

Finally, I don't see the practical implications of the paper. The authors claim that "our results highlight the type of content that needs to be addressed to improve vaccine hesitancy" but offer no actual evidence of this. This is a descriptive paper that does not drill into what impacts vaccine hesitancy. Moreover, changes in attitudes toward COVID-19 vaccines render some of these findings somewhat obsolete; what concerns people today (in late 2021) may be different from what concerned them early in vaccine uptake.

Response: Thank you for the important comment. As the reviewer pointed out, this paper does not provide direct evidence that certain content can reduce vaccine hesitancy. However, our findings do identify specific areas of concern that can serve as leverage points to create more effective and tailored messages. We believe this is where our practical implications lie.
We also agree with the reviewer’s comment that our results from the early phase of the pandemic might differ from results in the current phase. Our goal in this paper, however, is to highlight vaccine discourses occurring specifically in the early stage of the pandemic. It is reasonable to expect that, when the pandemic began or when the COVID-19 vaccine first became available, the fear and uncertainty about a COVID-19 vaccine were unprecedented. It was also the time when all kinds of mis- and disinformation began to gain steam. Thus, we intended to detect topics, topic prevalence, and network communities regarding pro-vaccine and anti-vaccine discourses at this particular stage. It was also a critical point at which health professionals could intervene to shape public perception and spread science-based messages. As such, it is our belief that exploring the topics and prominent actors in this critical period has value within and beyond the COVID-19 context. Per the reviewer’s comment, we have now clarified our practical implications in the introduction on page 6: “By doing so, our study provides important insights and practical guidance for public health communicators, whose efforts, to date, predominantly focus on disseminating vaccination guidelines or schedules rather than proactively responding to vaccine hesitancy. Thus, we suggest public health communication professionals should broaden their messages to address specific concerns as revealed in public conversations on social media, as well as leverage a diversity of actors who have been centrally situated in the vaccine-hesitant communities.”

Minor comments: Page 4, "social media content does not undergo an editorial processes or scientific vetting". This statement is not true. Twitter, Facebook, and YouTube implemented editorial controls specific to Covid and vaccination in 2020.

Response: Thank you for the comment.
Here, we wanted to emphasize that social media content, as expression from individual users, does not generally go through editorial processes before entering the public sphere. Unlike other types of content, such as news articles, social media content represents individuals’ opinions more directly, and with fewer constraints. As the reviewer pointed out, some platforms implemented editorial controls, such as rules to curb the spread of misinformation. Nevertheless, social media content still primarily reflects individual expression, because the content is only subject to editorial control (i.e., removal) after, not prior to, production. More importantly, a variety of opinions still maintain a presence after editorial control. For example, though Twitter has policies in place to remove tweets containing false claims about COVID-19, there is a fine line to draw between "undisputed" scientific facts and freedom of expression. As a result, strong sentiments or opinions that contradict official recommendations or go against the best available scientific evidence were often left to thrive online. Due to this fundamental difference in epistemic production, social media content like tweets is not bound by common newsroom practice, leading to an "unedited" public sphere (Bimber & Gil de Zúñiga, 2020, p. 700).

Bimber, B., & Gil de Zúñiga, H. (2020). The unedited public sphere. New Media & Society, 22(4), 700-715.

We also referred to Twitter’s COVID-19 misleading information policy (https://help.twitter.com/en/rules-and-policies/medical-misinformation-policy). Thus, per the reviewer’s comment, we have now clarified our sentence: “In contrast to traditional news media or scholarly outlets, social media content does not undergo editorial processes or scientific vetting unless it violates certain rules of a platform, thereby allowing users to voice their opinions on their own terms in most cases” (p. 4).

Page 6.
Is the use of Synthesio and web scraping of Twitter legal? Is the use of this tool covered under fair use legislation/practice? Do we know that Synthesio actually generates true and unbiased samples? Was this service paid for by the authors? Details here are important for the reader for a number of reasons.

Response: Thank you for the comment. Synthesio is a widely used, paid platform for obtaining social media data. It is a legal third-party platform, and we were authorized to use it by paying for the service. Synthesio draws a random sample. While it is possible that data quality is compromised by restrictions such as caps on the maximum return or on the total daily queries allowed, our data is not obviously biased, since Synthesio uses random sampling. Please refer to this website for more information about Synthesio’s dashboard for social media: https://www.synthesio.com/products/social-listening/. We have clarified that it is a random sample on page 6: “We first scraped Twitter data through Synthesio, a social media monitoring tool, to randomly sample 1% of public tweets using COVID-19 related keywords.”

We appreciate all the constructive comments, and we believe we have strengthened our paper considerably by addressing them.

Submitted filename: Response letter_final_CLEAN.docx

7 Mar 2022
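To make the sampling step quoted in the response above concrete, the combination of keyword filtering and a random 1% draw can be sketched as follows. This is a minimal illustration only: the keyword list, function name, and Bernoulli-sampling mechanics are assumptions for the sketch, not Synthesio's actual implementation or the study's actual query terms.

```python
import random

# Hypothetical keyword list; the study's actual COVID-19 query terms may differ.
KEYWORDS = ("covid", "coronavirus", "vaccine")

def sample_tweets(tweets, rate=0.01, seed=42):
    """Yield roughly `rate` of the keyword-matching tweets.

    `tweets` is any iterable of tweet texts. Each keyword-matching tweet is
    kept independently with probability `rate` (Bernoulli sampling), which
    mimics a "random 1% sample of keyword-matched public tweets".
    """
    rng = random.Random(seed)  # seeded for reproducibility of the sketch
    for text in tweets:
        lowered = text.lower()
        if any(kw in lowered for kw in KEYWORDS):
            if rng.random() < rate:
                yield text
```

A stream of 100,000 matching tweets would yield about 1,000 sampled tweets under this scheme; non-matching tweets are never sampled.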
PONE-D-21-31719R1
Vaccine discourse during the onset of the COVID-19 pandemic: Topical structure and source patterns informing efforts to combat vaccine hesitancy
PLOS ONE

Dear Dr. Hwang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Apr 21 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Kazutoshi Sasahara
Academic Editor
PLOS ONE

Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript.
If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments: Both reviewers think that the manuscript has been improved, but they also think that it needs minor revisions. Please revise based on the comments.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #3: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #3: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #3: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository.
For example, in addition to summary statistics, the data points behind means, medians, and variance measures should be available. If there are restrictions on publicly sharing data—e.g., participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #3: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #3: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have improved the readability of the article by providing effective subheadings, by more strongly emphasizing the two-fold nature of the analysis (content/actors), and by including additional cites and cautions regarding the methods. I do think that these methods are appropriate for Twitter analysis, though they must be presented with caution, as the authors do. My only remaining question concerns the authors' recommendation of paying for the promotion of content among vaccination-hesitant Twitter users, as I am unclear about the ethics considerations or other potential concerns surrounding such an intervention. Are there cites that could be included to support this recommendation, or at least to point out relevant debates concerning it?

Reviewer #3: The authors explore public discourse about COVID-19 via a random sample of Twitter, analyzed using Structural Topic Modeling and Network Analysis.
I am satisfied by most of the responses provided by the authors to the reviews. However, I would argue that the authors still need to highlight/defend their selection of methods. Since this same study could be conducted using multiple methods in unsupervised learning, and there is much research available on this as well, it is important that the authors explain their strategy for selecting this method; i.e., sentiment could also be captured using other NLP methods, and the authors may have a reason to use the ML model instead.

At the same time, it would be helpful to know how you ended up with 2 coders, and whether they were experienced researchers or external to this study – this will help us gauge the validity of the codes.

I am not sure how the authors classified the mention network into several subjects – was this done by the authors? One author? Inductively or deductively?

The topical association represented in Figure 3 needs a bit more detail – how was the association between topics derived? Maybe one example to reflect the association would be more helpful to the reader.

Near Figure 4, where the authors present the temporal volume, the authors state “…While some topics received more sustained overtime attention (e.g., Topic 6 – Conspiracy Theory), others showed a higher degree of fluctuation in Twitter volume (e.g., Topic 8 – Vaccine Development).” – It is not possible to derive this statement from Figure 4; it should be explained using Figure 3, and it needs to bring the connection to Figure 4.

Figure 6 shows hashtags, not words, right? (“Words featuring most prominently in positive versus negative vaccine discourses” can be misunderstood as words in the topic, but this is not the case, right?) Besides, the word cloud does not seem to provide much context for what the authors are trying to explain here, so it takes up space without adding much value.
Overall, I also felt the need for a coherent story and clear key findings in the discussion, and the authors have somewhat addressed that, but this could still be improved. Overall, I enjoyed the implications part the most, which is the essence of this contribution, and with that I think this paper has substantial value to the journal.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
14 May 2022

Dear Editor,

We appreciate the feedback we received from the reviewers and the opportunity to make further revisions to our manuscript for PLOS ONE. We have implemented all of the reviewers’ suggestions and believe that the manuscript is much strengthened as a result. Thank you. Below, we list the reviewers’ feedback and detail how we have responded to each concern in our revision of the manuscript.

Reviewer #1: The authors have improved the readability of the article by providing effective subheadings, by more strongly emphasizing the two-fold nature of the analysis (content/actors), and by including additional cites and cautions regarding the methods. I do think that these methods are appropriate for Twitter analysis, though they must be presented with caution, as the authors do. My only remaining question concerns the authors' recommendation of paying for the promotion of content among vaccination-hesitant Twitter users, as I am unclear about the ethics considerations or other potential concerns surrounding such an intervention. Are there cites that could be included to support this recommendation, or at least to point out relevant debates concerning it?

Response: We appreciate the reviewer’s important comment. We originally included this practical suggestion after observing the increasing efforts to counter misleading information, which are very creative and proactive. This recommendation, however, should be offered with caution because it could raise potential ethical issues, as the reviewer pointed out. Thus, after careful consideration, we have removed this recommendation on page 25. We are very grateful to the reviewer for the attention to detail.

Reviewer #3: The authors explore public discourse about COVID-19 via a random sample of Twitter, analyzed using Structural Topic Modeling and Network Analysis. I am satisfied by most of the responses provided by the authors to the reviews.
However, I would argue that the authors still need to highlight/defend their selection of methods. Since this same study could be conducted using multiple methods in unsupervised learning, and there is much research available on this as well, it is important that the authors explain their strategy for selecting this method; i.e., sentiment could also be captured using other NLP methods, and the authors may have a reason to use the ML model instead.

Response: Thank you for the constructive comment. We agree with the reviewer that a justification of our selection of supervised machine learning will further strengthen our manuscript. Thus, we have added the following paragraph on pages 7-8: “Specifically, we used a supervised machine-learning method, which allows labeling a tweet as expressing positive or negative sentiment regarding COVID-19 vaccines and, in turn, classifying the rest of the tweets based on the labeled samples using the Bidirectional Encoder Representations from Transformers (BERT) machine learning algorithm. There are several advantages to our current approach. Compared to unsupervised textual analysis techniques, supervised machine learning allows tracking discourses that are theoretically important, or comparing discourses between groups on the same criteria. In our case, our supervised machine learning approach provides a computationally efficient way to classify tweets based on target-specific sentiments (i.e., sentiment toward vaccines), rather than the more generic task of sentiment detection. By contrast, unsupervised machine learning learns the inherent structure of the data without pre-provided labels [40,41]. Since the performance of unsupervised learning-based sentiment analysis methods relies heavily on a pre-generated lexicon and corpus, if the sentiment/emotion-relevant words do not exist in the lexicon or corpus, the classification accuracy will be low [40,41].
Given that COVID-19 and its vaccines involve high levels of uncertainty and novelty, as well as a constantly evolving nature, we believe that a supervised machine-learning approach is appropriate to provide a scalable solution to text classification tasks with pre-defined criteria.”

[40] Hu, T., Wang, S., Luo, W., Zhang, M., Huang, X., Yan, Y., ... & Li, Z. (2021). Revealing public opinion towards COVID-19 vaccines with Twitter data in the United States: Spatiotemporal perspective. Journal of Medical Internet Research, 23(9), e30854.

[41] Wang, S., Huang, X., Hu, T., Zhang, M., Li, Z., Ning, H., ... & Li, X. (2022). The times, they are a-changin’: Tracking shifts in mental health signals from early phase to later phase of the COVID-19 pandemic in Australia. BMJ Global Health, 7(1), e007081.

At the same time, it would be helpful to know how you ended up with 2 coders, and whether they were experienced researchers or external to this study – this will help us gauge the validity of the codes.

Response: Thank you for the comment. We exerted extra caution when inviting the two coders, in order to increase the validity of the codes as well as to reduce any bias. We had two criteria. First, we intentionally excluded the experienced researchers who conceptualized and designed this study from the coder pool, to prevent any potential bias in the codes. Second, familiarity with the primary concepts and a basic background or experience in content coding were important. Based on these criteria, two graduate assistants familiar with the subject domain and content analysis methodology, but not directly involved in designing this research, served as the coders for this study. We have now clarified this information on page 8.

I am not sure how the authors classified the mention network into several subjects – was this done by the authors? One author? Inductively or deductively?

Response: We appreciate this comment. We created the classifications of the mention network both inductively and deductively.
“On one hand, we created primary categories (e.g., scientist/medical source, media/journalists, political source) based on the literature that emphasizes the degree to which the source provides access to medical expertise (deductive process). This process was iterative: all members of the research team compared notes and validated the categories against actual sample tweets until all major categories important to this study were captured. On the other hand, we identified the top 100 Twitter accounts that had been mentioned by other users in our sample (Appendix 4) and manually assigned each a category label based on the literature. In this process, we recognized that a substantial number of accounts did not fall into the primary categories; thus, two new categories – online influencer and suspended account – were added (inductive process).” It also came to our attention that one account may be classified into more than one category (e.g., a certified doctor who is also an online influencer); thus, we created a priority ordering for classifying accounts (deductive process). All of these procedures were completed based on discussion among the authors. The following reference was added: [39] Hwang, J., & Shah, D. V. (2019). Health information sources, perceived vaccination benefits, and maintenance of childhood vaccination schedules. Health Communication, 34(11), 1279-1288. We have now clarified this information on page 11. We are grateful to the reviewer for providing us with an opportunity to clarify our procedures in the manuscript. The topical associations represented in Figure 3 need a bit more detail – how was the association between topics derived? Maybe one example reflecting the association would be more helpful to the reader. Response: Thank you for the comment. We agree with the reviewer that an additional explanation of how to interpret the association using edges is needed. Thus, we added the following paragraph on pages 16-17.
“The association between two topics (i.e., nodes) is reflected by the thickness of the undirected line connecting them (i.e., the edge). The edges are weighted based on cosine similarity, with the weight indicating topic co-occurrence in documents. Specifically, the higher the weight, the more likely the two topics are discussed within the same document. For example, vaccine production is more likely to be talked about alongside other coping strategies, yet it has a low cosine similarity value with the topic of conspiracy [28].” Near Figure 4, where the authors present the temporal volume, the authors state: “…While some topics received more sustained attention over time (e.g., Topic 6 – Conspiracy Theory), others showed a higher degree of fluctuation in Twitter volume (e.g., Topic 8 – Vaccine Development).” It is not possible to derive this statement from Figure 4 alone; it should be explained using Figure 3, and the connection to Figure 4 needs to be made. Response: Thank you for the comment. We have now added a note to Figure 4 to clarify its interpretation and to make this connection. Figure 6 shows hashtags, not words, right? (“Words” can be misunderstood as the words within a topic, but that is not the case, right? – “Words featuring most prominently in positive versus negative vaccine discourses”.) Besides, the word cloud does not seem to provide much context for what the authors are trying to explain here, so it takes up space without adding much value. Response: We appreciate your attention to detail. We originally included results about distinct topical prevalence in relation to hashtags to further clarify the difference between positive and negative vaccine discourses. However, we also feel that these results do not provide much novel information; thus, we have decided to remove them. Overall, I also felt the need for a coherent story and clear key findings in the discussion, and the authors have somewhat addressed that, but this could still be improved.
Overall, I enjoyed the implications part the most, which is the essence of this contribution, and with that I think this paper has substantial value to the journal. Response: We appreciate all the constructive comments. We have gone through the discussion to present our key findings more effectively. We have also tried our best to improve the coherence of this section. We believe we have strengthened our paper considerably by addressing the reviewers’ comments. Thank you. Submitted filename: Response letter.docx 21 Jun 2022
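As a side note on the edge-weight explanation quoted in the response letter above: cosine similarity between two topics can be computed from their per-document prevalence vectors. The following is a minimal, hedged sketch with hypothetical topic proportions (not the authors' actual data or code); topics that tend to co-occur in the same documents receive a heavier edge.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical per-document topic proportions (one value per document).
vaccine_production = [0.6, 0.5, 0.1, 0.0]
coping_strategies  = [0.5, 0.4, 0.2, 0.1]
conspiracy         = [0.0, 0.1, 0.6, 0.7]

# Topics discussed in the same documents score high; disjoint topics score low.
print(cosine_similarity(vaccine_production, coping_strategies))  # high (~0.97)
print(cosine_similarity(vaccine_production, conspiracy))         # low (~0.15)
```

In the paper's Figure 3, such pairwise values would determine edge thickness in the topic network.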
PONE-D-21-31719R2
Vaccine discourse during the onset of the COVID-19 pandemic: Topical structure and source patterns informing efforts to combat vaccine hesitancy
PLOS ONE

Dear Dr. Hwang,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.
 
Both reviewers agreed that the manuscript was improved, but one reviewer requested a few minor revisions. Please read the comments and address them accordingly. Please submit your revised manuscript by Aug 05 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file. Please include the following items when submitting your revised manuscript:
If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-emailutm_source=authorlettersutm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Kazutoshi Sasahara Academic Editor PLOS ONE Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. 
If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: (No Response) Reviewer #3: All comments have been addressed ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #3: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #3: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. 
participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #3: (No Response) ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #3: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: This article is much improved and, in my view, needs only a couple of minor revisions: On p. 10: Should "Sandford" be "Stanford"? p. 25: The authors indicated in their responses to reviewer comments that they had removed references to paying for promotion of content, yet this sentence remains in the article: "Again, paid sponsorship of influencers accounts may prove to be an effective intervention strategy." Do they still intend to include such recommendations? Reviewer #3: I thank the authors for addressing my comments in the manuscript and also for explaining in the response letter. The new changes appear much improved, with a clear description of the conduct and results and the story behind them. I think the insights will bring much value to the readers. ********** 7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? 
For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #3: Yes: Dilrukshi Gamage ********** [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
27 Jun 2022 Dear Editor, We appreciate the feedback we received from the reviewers and the opportunity to make further revisions to our manuscript for PLOS ONE. We have implemented all of the reviewers’ suggestions and believe that the manuscript is much strengthened as a result. Thank you. Below, we list the reviewers’ feedback and detail how we have responded to each concern in our revision of the manuscript. Reviewer #1: This article is much improved and, in my view, needs only a couple of minor revisions: On p. 10: Should "Sandford" be "Stanford"? Response: We appreciate your attention to detail. We have corrected this typo and proofread the manuscript again. Once again, we thank the reviewer for the comment. p. 25: The authors indicated in their responses to reviewer comments that they had removed references to paying for promotion of content, yet this sentence remains in the article: "Again, paid sponsorship of influencers accounts may prove to be an effective intervention strategy." Do they still intend to include such recommendations? Response: We appreciate the reviewer’s important comment. This recommendation, along with the references to paying for promotion of content, should be removed, as it could raise ethical issues. We apologize for not getting this right earlier. We have now removed this sentence and are very grateful to the reviewer for the comment. Reviewer #3: I thank the authors for addressing my comments in the manuscript and also for explaining in the response letter. The new changes appear much improved, with a clear description of the conduct and results and the story behind them. I think the insights will bring much value to the readers. Response: We appreciate all the constructive comments in the previous round. We believe we have strengthened our paper considerably by addressing the reviewers’ comments. Thank you. Submitted filename: Response letter.docx 
30 Jun 2022

Vaccine discourse during the onset of the COVID-19 pandemic: Topical structure and source patterns informing efforts to combat vaccine hesitancy

PONE-D-21-31719R3

Dear Dr. Hwang,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org. If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards, Kazutoshi Sasahara, Academic Editor, PLOS ONE

Additional Editor Comments (optional): Now all the reviewers think it's properly revised.

4 Jul 2022

PONE-D-21-31719R3

Vaccine discourse during the onset of the COVID-19 pandemic: Topical structure and source patterns informing efforts to combat vaccine hesitancy

Dear Dr. Hwang:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! 
Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Kazutoshi Sasahara Academic Editor PLOS ONE
  25 in total

1.  Considering Emotion in COVID-19 Vaccine Communication: Addressing Vaccine Hesitancy and Fostering Vaccine Confidence.

Authors:  Wen-Ying Sylvia Chou; Alexandra Budenz
Journal:  Health Commun       Date:  2020-10-30

2.  Health Information Sources and the Influenza Vaccination: The Mediating Roles of Perceived Vaccine Efficacy and Safety.

Authors:  Juwon Hwang
Journal:  J Health Commun       Date:  2020-11-13

3.  Coronavirus vaccine trials have delivered their first results - but their promise is still unclear.

Authors:  Ewen Callaway
Journal:  Nature       Date:  2020-05       Impact factor: 49.962

4.  When vaccines go viral: an analysis of HPV vaccine coverage on YouTube. (Review)

Authors:  Rowena Briones; Xiaoli Nan; Kelly Madden; Leah Waks
Journal:  Health Commun       Date:  2011-10-27

5.  An analysis of the Human Papilloma Virus vaccine debate on MySpace blogs.

Authors:  Jennifer Keelan; Vera Pavri; Ravin Balakrishnan; Kumanan Wilson
Journal:  Vaccine       Date:  2009-12-08       Impact factor: 3.641

6.  Health Information Sources, Perceived Vaccination Benefits, and Maintenance of Childhood Vaccination Schedules.

Authors:  Juwon Hwang; Dhavan V Shah
Journal:  Health Commun       Date:  2018-06-05

7.  Understanding the messages and motivation of vaccine hesitant or refusing social media influencers.

Authors:  Amy E Leader; Amelia Burke-Garcia; Philip M Massey; Jill B Roark
Journal:  Vaccine       Date:  2020-12-03       Impact factor: 3.641

8.  Revealing public opinion towards COVID-19 vaccines with Twitter data in the United States: a spatiotemporal perspective.

Authors:  Tao Hu; Siqin Wang; Wei Luo; Mengxi Zhang; Xiao Huang; Yingwei Yan; Regina Liu; Kelly Ly; Viraj Kacker; Bing She; Zhenlong Li
Journal:  J Med Internet Res       Date:  2021-07-26       Impact factor: 5.428

9.  The times, they are a-changin': tracking shifts in mental health signals from early phase to later phase of the COVID-19 pandemic in Australia.

Authors:  Siqin Wang; Xiao Huang; Tao Hu; Mengxi Zhang; Zhenlong Li; Huan Ning; Jonathan Corcoran; Asaduzzaman Khan; Yan Liu; Jiajia Zhang; Xiaoming Li
Journal:  BMJ Glob Health       Date:  2022-01

10.  COVID-19-Related Infodemic and Its Impact on Public Health: A Global Social Media Analysis.

Authors:  Md Saiful Islam; Tonmoy Sarkar; Sazzad Hossain Khan; Abu-Hena Mostofa Kamal; S M Murshid Hasan; Alamgir Kabir; Dalia Yeasmin; Mohammad Ariful Islam; Kamal Ibne Amin Chowdhury; Kazi Selim Anwar; Abrar Ahmad Chughtai; Holly Seale
Journal:  Am J Trop Med Hyg       Date:  2020-10       Impact factor: 3.707

