The popularity of contradictory information about COVID-19 vaccine on social media in China.

Abstract

To eliminate the impact of contradictory information on social media on vaccine hesitancy, this research developed a framework to compare the popularity of information expressing contradictory attitudes towards COVID-19 vaccines or vaccination, to mine the similarities and differences among the characteristics of contradictory information, and to determine which factors influenced popularity most. We called the Sina Weibo API to collect data. First, content analysis, sentiment computing and k-medoids clustering were used to extract multi-dimensional features from original tweets and quantify their popularity. Statistical analysis showed that anti-vaccine tweets were more popular than pro-vaccine tweets, but the difference was not significant. Then, by visualizing the features' centrality and clustering in information-feature networks, we found differences in the text characteristics, information display dimension, topic, sentiment, readability, and posters' characteristics of original tweets expressing different attitudes. Finally, we employed regression models and SHapley Additive exPlanations to explore and explain the relationship between tweets' popularity and their content and contextual features. To reduce vaccine hesitancy, we proposed suggestions for adjusting the organizational strategy of contradictory information to control its popularity along different dimensions, such as posters' influence, activity and identity, and tweets' topic, sentiment and readability.

Keywords: Attitude; COVID-19 vaccine; Content feature; Contextual feature; Information popularity; Weibo

Year: 2022 PMID: 35527790 PMCID: PMC9068608 DOI: 10.1016/j.chb.2022.107320

Source DB: PubMed Journal: Comput Human Behav ISSN: 0747-5632

Introduction

In China, as of April 8, 2020, the number of confirmed cases of COVID-19 had reached approximately 80,000. Although physical preventive measures such as wearing masks and social distancing effectively cut off the spread of the virus, long-term control of the COVID-19 pandemic hinged on the development and uptake of vaccines (Chou & Budenz, 2020). In March 2020, an anonymous cross-sectional survey conducted online among Chinese adults showed that 91.3% of participants would accept COVID-19 vaccination once a vaccine became available; 52.2% of these wanted to get vaccinated as soon as possible, while the others would delay vaccination until the vaccine's safety was confirmed (J. Wang, Jing, et al., 2020). As a preventive innovation, the diffusion and adoption of vaccines are inevitably influenced by the competing dissemination of contradictory information expressing different attitudes towards vaccines and vaccination on social media (Cohen & Head, 2013; Pan & Di Zhang, 2020). Social media, such as Twitter (Jamison et al., 2020), Facebook (Xu & Guo, 2018), Instagram (Massey et al., 2020) and YouTube (Ekram et al., 2019), is not only an important resource for obtaining health information but also a breeding ground for health misinformation (Y. Wang, McKee, et al., 2019). When information cues, such as “getting COVID-19 vaccination can effectively prevent COVID-19 infection, but meanwhile causing side effects, like fatigue, sore arms”, are insufficient or insufficiently cogent, individuals cannot accurately predict their outcomes (Mishel, 1988), which leads to confusion and negative beliefs about vaccines or vaccination in the context of health communication (Nagler et al., 2019).
Uncertainty management theory (Brashers, 2001) holds that exposure to such two-sided health information increases the ambivalence of messages and makes people reluctant to follow health recommendations, or even leads them to harmful or dangerous health decisions and behaviors (Chang, 2013), namely vaccine hesitancy in this case. Social media provides multiple modes of interaction (such as posting, liking, retweeting and commenting) that encourage “dialogue” and compete for users' limited attention (Zhu et al., 2020). It therefore plays a powerful role in popularizing pro-vaccine and anti-vaccine arguments (Jamison et al., 2020). Weibo serves as China's equivalent of Twitter (Pulido Rodríguez et al., 2020). The key to addressing vaccine hesitancy among Chinese people, which could also serve as a reference for other communities, states and countries seeking to improve immunization rates, was to help pro-vaccine messages prevail over anti-vaccine messages in competitive dissemination and dominate public opinion, increasing the consistency of online opinions. Only after making sense of which subjects about COVID-19 vaccines and vaccination were being disseminated on Weibo, how popular these subjects were among social media users, and which items contributed to their popularity can we provide urgent insights about online vaccine promotion for public health communication and education programs from the perspective of the relationship between information characteristics and popularity.

Related research

Vaccine hesitancy

Vaccine hesitancy refers to an attitude (doubts, concerns) as well as a behavior (refusing some or many vaccines, delaying vaccination) that is complex and context-specific, varying across time, place and vaccines (MacDonald, 2015). Most studies explored the scope and determinants of vaccine hesitancy using self-reported attitude and behavior data from the perspective of those being vaccinated. J. Wang et al. (2020) conducted an anonymous cross-sectional online survey to evaluate the acceptance of COVID-19 vaccine among Chinese adults in March 2020, and performed multivariate logistic regression to identify factors that individuals considered during their decision process, including the perceived risk and impact of COVID-19 and attribute preferences for vaccines (effectiveness, safety, source, cost, means of getting vaccinated). Similar research was conducted in the USA (Khubchandani et al., 2021), in Italy (Biasio et al., 2020), and among Syrians (Labban et al., 2020). Lazarus et al. (2021) expanded the research globally and concluded that the public trusted government-sourced information more and was thus more likely to accept vaccination. A limited number of studies shifted attention to health communication on social media. Elkin et al. (2020) used authors' personal profiles and posts to code their vaccine attitudes on Google, Facebook and YouTube. Jamison et al. (2020) combined manual content analysis and the Latent Dirichlet Allocation (LDA) model to mine the vaccine topics of posts on Twitter. Ittefaq et al. (2021) analyzed topics about polio vaccine in online news comments in Pakistan. Du et al. (2020), adopting the Health Belief Model (HBM) and the Theory of Planned Behavior (TPB) as a framework, used deep learning to detect and summarize topics about HPV vaccine on Twitter.

Contradictory information about vaccine

Contradictory information is defined as logically inconsistent statements (Carpenter et al., 2016). Inconsistencies can be found between true and false messages or between scientifically recognized positive and negative findings on certain issues (Pan & Di Zhang, 2020). Some studies drew on game theory to construct competitive propagation models of contradictory information, then analyzed propagation results (game equilibrium states: dominance, polarization and consensus) and identified factors influencing those results (number of initial spreaders, participation degree, and network structures) through computational experiments at a macro level (Huang et al., 2021; Sun et al., 2019; Vasconcelos et al., 2019). They focused on the interaction between contradictory information. Other studies concentrated on analyzing similarities and differences among the characteristics of online contradictory information and comparing its dissemination effectiveness at a micro level. Few studies further explored the relationship between the two. Chou and Budenz (2020) claimed that anti-vaccine messages contained stronger anger than pro-vaccine messages. Xu and Guo (2018) used word clouds and networks to visualize word usage and clustering in pro- and anti-vaccine headlines retrieved from Google and, combining text mining and sentiment analysis, found that pro-vaccine information was more positive in emotion. They then compared the headlines' popularity (the sum of shares on Facebook, Google+, LinkedIn, Pinterest, and StumbleUpon, plus reactions and comments on Facebook), finding that anti-vaccine information was more popular. Finally, using statistical analysis, they found that the number of sentiment words positively influenced the popularity of pro-vaccine information but had an insignificant effect on that of anti-vaccine information. Massey et al. (2020) analyzed the topics (coded based on HBM), sentiment, images, and social media features (links, “mention”, location in text), as well as posters' identities, of pro-vaccine and anti-vaccine (HPV) posts on Instagram. Through univariate, bivariate, and network analysis, they detected frequently used features and their clustering, indicating that pro-vaccine posts got more likes. Ekram et al. (2019) discovered that there was no significant difference in the popularity (number of views, likes, dislikes, and comments) of videos expressing different attitudes about HPV vaccines on YouTube; most videos were either negative or neutral in tone, and tone was not a predictor of popularity, but topics about side effects, safety and conspiracy theories attracted more attention. W. Wang et al. (2020) focused on messages about HPV vaccines from Chinese websites and WeChat public accounts in 2019, indicating that over 90% of messages were difficult to read and that topics about vaccine effectiveness were emphasized most. Gandhi et al. (2020) searched posts about influenza vaccine on Facebook, finding that anti-vaccine posts were shared and liked more than pro-vaccine posts, that there was no correlation between ease of reading and popularity, and that a pro-vaccine personal post by a nurse was the most popular.

Research questions

Studies evaluating vaccine hesitancy have not sufficiently mined health information on social media. Some studies set initial conditions and interaction rules to model the communication process. Although they analyzed how characteristics affected information receivers' cognitive decision-making, they inevitably oversimplified the complex communication mechanism of contradictory information, so their conclusions were not robust enough. Other studies of the characteristics and popularity of contradictory information proposed diverse features of information source and content, which need to be logically organized in a unified framework that suits information of different forms. Besides, they lacked in-depth modeling of each feature and ignored that stakeholders have different habits of creating and adapting information in a contradictory information environment. Results varied with the social media platform, the feature dimensions, and the measurement of popularity. To fill these gaps, we took the COVID-19 vaccine in China as an example, positing that the scale of vaccine hesitancy could be reflected by the popularity of information expressing contradictory attitudes on Weibo. We posed three research questions:

RQ1: Were there significant differences in the popularity of tweets expressing different attitudes towards COVID-19 vaccine or vaccination? Which attitude was generally more popular, about what topics, and from whom?

RQ2: What were the similarities and differences in the characteristics of tweets expressing different attitudes?

RQ3: How did these characteristics influence the popularity of tweets expressing different attitudes, positively or negatively?

Methods

Fig. 1 outlined the research framework.

Fig. 1

Research framework.

Data collection and preprocessing

We first called the Sina Weibo Application Program Interface (API) to crawl original tweets containing the keywords “COVID-19 vaccine (新冠疫苗)” or “COVID-19 vaccination (新冠疫苗接种)”, together with their posters' information, from January 23, 2020 to February 11, 2021. This period covered the entire process of the first outbreak and cessation of the COVID-19 epidemic in China, as well as the initial stage of vaccine development and promotion. Because interaction is time-sensitive, the interaction data (i.e., retweets, comments and likes) of an original tweet tended to stabilize one week after its release (Wang et al., 2015; Wang et al., 2019). Hence, we traversed the retweet list and comment list of each original tweet within one week of its release, and crawled the tweets and user information of each retweet and comment. The initial dataset contained 29,218 original tweets, corresponding to 50,693 retweets and 50,796 comments. Preprocessing followed. We deleted the low-influence original tweets whose number of likes, retweets or comments was 0, leaving 3062 original tweets. Next, we invited two trained professionals to annotate these 3062 original tweets. A tweet that contained the above keywords but discussed unrelated topics was coded as ‘N’; otherwise it was coded as ‘Y’. The coders conducted an intercoder reliability test (Krippendorff, 2011) on 10% of the tweets (κ = 0.958). After resolving differences and reaching agreement through discussion, they coded the remaining samples. We deleted 375 original tweets coded as ‘N’, along with their retweets, comments and user information, leaving 2687 original tweets with 40,325 retweets and 38,865 comments. As Fig. 2 shows, discussions about vaccines arose as soon as the epidemic broke out (the lockdown of Wuhan began on January 23, 2020). Even after the epidemic was brought under control, vaccine discussions continued to rise until the end of 2020.
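As a minimal sketch of the reliability check described above, the agreement between two coders on the ‘Y’/‘N’ labels could be computed as follows. It assumes that the reported κ is Cohen's kappa for two coders, which the text does not state explicitly; the function name and the sample labels are hypothetical and are not taken from the study's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items on which the coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical sample: 'Y' = relevant to COVID-19 vaccines, 'N' = unrelated.
coder_1 = ["Y", "Y", "N", "Y", "N", "Y", "Y", "N", "Y", "Y"]
coder_2 = ["Y", "Y", "N", "Y", "Y", "Y", "Y", "N", "Y", "Y"]
kappa = cohens_kappa(coder_1, coder_2)
```

Kappa corrects raw agreement for the agreement expected by chance, which matters here because most tweets are likely to be coded ‘Y’.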

Fig. 2

The number of original tweets during the period.

Text categorization

We classified original tweets into four categories according to the attitude expressed in each, based on the theory of planned behavior (TPB) (Du et al., 2020). TPB holds that attitudes, subjective norms, and perceived behavioral control drive individuals' intention to perform health behaviors (Ajzen, 1991). We focused only on an amalgamated construct of attitude because the other constructs, though they also influence vaccination behavior, had low prevalence in the data set. Two trained professionals annotated the attitude for 10% of samples and passed the intercoder reliability test (Krippendorff, 2011) (κ = 0.942). After repeated review and resolution of disagreements, they coded the remaining samples. The coding scheme is shown in Table 1.

Table 1

Definitions of key constructs of Theory of Planned Behavior (TPB) found in original tweets.

Construct	Attitudes	Examples in samples
Approving attitude	Approve of COVID-19 vaccine or vaccination	“The number of COVID-19 cases in the world has exceeded 100 million, get vaccinated quickly!”
Disapproving attitude	Disapprove of COVID-19 vaccine or vaccination	“COVID-19 Vaccination is associated with serious side effects, stay away from it!”
Querying attitude	Query COVID-19 vaccine or vaccination	“COVID-19 Vaccination price may be 200 RMB/pc, is it necessary to vaccinate COVID-19 vaccine?”
Neutral attitude	Stay neutral towards COVID-19 vaccine or vaccination	“The COVID-19 vaccine has obvious protective effect only after 35 days of inoculation”

Popularity index construction

To evaluate the effectiveness of rumor rebuttals on social media, Li et al. (2021) proposed the rumor refutation effectiveness index (REI), which is computed from four quantities: the number of likes the original tweet received, the number of retweets, the ratio of retweets by influential users (influential accounts on Sina Weibo are stamped with the letter “V”), and the number of positive comments. Likes imply that users approve of the tweet or are interested in it (Del Vicario et al., 2017; Massey et al., 2020; Schmidt et al., 2018). More retweets mean higher credibility and stronger sharing intention (Del Vicario et al., 2017; Lee & Oh, 2017; Schmidt et al., 2018; Zeng et al., 2019). Positive comments indicate audiences' support, while negative comments indicate mistrust (Wang & Song, 2020; Zeng et al., 2019). This study used the same formula to calculate the popularity index (PI) of each original tweet. The numbers of likes and retweets measured popularity by the scale of information dissemination, while the ratio of retweets by influential users and the number of positive comments measured popularity by the quality of information dissemination (Fu & Oh, 2019). To count positive comments, we first deleted irrelevant comments and converted traditional Chinese characters to simplified ones in each comment. Comments usually contained emojis, which can complement semantics and express emotions (Zhang et al., 2019), and one emoji may have different meanings in discussions of different topics, so we manually converted emojis into corresponding text according to the context of the comment. Finally, we used Baidu's AipNLP (Hong et al., 2021) to compute the sentiment positive probability of each comment and regarded a comment as positive when this probability exceeded a set threshold.
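The four quantities that feed the popularity index can be gathered per original tweet as in the sketch below. It only collects the inputs and does not combine them into PI. The record layout, the function name and the default cut-off of 0.5 for a positive comment are assumptions made for illustration, not values taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One retweet or comment on an original tweet (hypothetical record layout)."""
    by_verified_user: bool = False      # account stamped with the letter "V"
    positive_probability: float = 0.0   # sentiment score in [0, 1]; used for comments


def popularity_components(num_likes, retweets, comments, positive_threshold=0.5):
    """Return the four quantities the index is built from.

    positive_threshold is an assumed cut-off, not a value taken from the study.
    """
    num_retweets = len(retweets)
    verified = sum(1 for r in retweets if r.by_verified_user)
    # Ratio of retweets made by influential ("V") users; 0 when there are no retweets.
    verified_ratio = verified / num_retweets if num_retweets else 0.0
    num_positive = sum(
        1 for c in comments if c.positive_probability > positive_threshold
    )
    return {
        "likes": num_likes,
        "retweets": num_retweets,
        "verified_retweet_ratio": verified_ratio,
        "positive_comments": num_positive,
    }


# Hypothetical tweet with 12 likes, 4 retweets and 3 comments.
example = popularity_components(
    num_likes=12,
    retweets=[Interaction(True), Interaction(False), Interaction(True), Interaction(False)],
    comments=[
        Interaction(positive_probability=0.9),
        Interaction(positive_probability=0.2),
        Interaction(positive_probability=0.5),
    ],
)
```

Keeping the scale quantities (likes, retweets) separate from the quality quantities (verified-retweet ratio, positive comments) makes it easy to plug in whichever weighting the index formula prescribes.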

Factor extraction

Humans mainly process information in two modes: systematics and heuristics (Chaiken, 1980). From the systematic view of persuasion, social media users make behavioral decisions based on their perception of information quality displayed in the content (Ghaisani et al., 2017). From the heuristic view of persuasion, information recipients may rely on the more accessible contextual cues than content characteristics (Chaiken, 1980), because excessive online information may reduce users' motivation to scrutinize content carefully (Alsmadi & O'Brien, 2020). This research comprehensively considered the impact of content and contextual factors of information on its popularity.

Content factors

Content factors of each original tweet involved general text characteristics (Li et al., 2021; Massey et al., 2020), information display dimension (images and videos are vivid and straightforward, and links can direct readers to external webpages for more information; Fu et al., 2017; Li et al., 2021; Massey et al., 2020), topic and sentiment (Chou & Budenz, 2020; Jamison et al., 2020; Li et al., 2021; Massey et al., 2020), and readability (W. Wang, Jing, et al., 2020), as summarized in Table 3.

Table 3

Each original tweet's content and contextual factors that might affect its PI.

Factor type	Dimension	Variable	Description
Content factors	General text characteristics	text_length	the number of Chinese characters
		num_sentence	the number of sentences
		num_first_person	the number of first-person, e.g. I (“我”)
		num_number	the number of numeric
		num_noun	the number of nouns
		num_verb	the number of verbs
		num_adj	the number of adjectives
		num_adv	the number of adverbs
		num_emo	the number of emojis
		num_@	the number of “@” (mention)
		num_!	the number of “!”
		num_?	the number of “?”
		num_#	the number of “#” (hashtag)
		num_place	the number of place names
		location_included	the poster stated his/her location in the original tweet, yes or no
	Information display dimension	link_included	it contained one or more links, yes or no
		image_included	it contained one or more images, yes or no
		video_included	it contained one or more videos, yes or no
	Topic	topic	“risk”, “severity”, “effectiveness”, “adverse_effects”, “cost”, “fake_vaccine”, “security”, “conspiracy”, “means”, “dos_don'ts”, “domestic”, “foreign”, “experience”
	Sentiment	positive_probability	α∈ [0,1]
		emotional_intensity	β∈ [0,1]
		emotional_fluctuation	f∈ [0,1]
		emotional_trend	“rise”, “fall”, “rise_fall”, “stable”
	Readability	proportion_passive	the proportion of passive sentences
		aver_sentence	the average length of sentences
		proportion_prep	the proportion of prepositions
		num_term	the number of medical terms
Contextual factors	Posters' characteristics	is_V	marked with the letter “V”, yes or no
		num_tweet	the number of tweets he/she already posted.
		num_fan	the number of fans
		identity	“government”, “traditional_media”, “self_media”, “organization”, “platform”, “medical_company”, “common_company”, “campus”, “medical_personnel”, “common_personnel”

We coded the topic of each original tweet based on the health belief model (HBM). HBM holds that the motivation of individuals to adopt preventive health behaviors (e.g. vaccination) is affected by six factors: perceived susceptibility, perceived severity, perceived benefits, perceived barriers, cues to action, and self-efficacy (Champion & Skinner, 2008). Due to the low prevalence of self-efficacy in the data set, we focused on the other five constructs. The two professionals first annotated the topic for 10% of samples and passed the intercoder reliability test (Krippendorff, 2011) (κ = 0.973). After adding to and deleting from the coding scheme of Du et al. (2020), they coded the remaining samples. The final scheme is shown in Table 2.

Table 2

Definitions of key constructs of Health Belief Model (HBM) found in original tweets.

Construct	Topics	Examples in samples
Perceived susceptibility	Risk of getting COVID-19 infection.	“The number of COVID-19 cases in the world has exceeded 100 million, get vaccinated quickly!”
Perceived severity	Severity of getting COVID-19 infection or refusing COVID-19 vaccination.	“COVID-19 causes severe sequelae, not getting vaccinated is like facing death.”
Perceived benefits	Effectiveness of COVID-19 vaccination.	“COVID-19 Vaccination not only protects against infection, but also reduces contagion.”“The COVID-19 vaccine has obvious protective effect only after 35 days of inoculation”
Perceived barriers	Adverse effects of COVID-19 vaccination	“COVID-19 Vaccination is associated with serious side effects, stay away from it!”
	Cost of COVID-19 vaccination	“COVID-19 Vaccination price may be 200 RMB/pc, is it necessary to vaccinate COVID-19 vaccine”
	Fake (Counterfeit vaccines, fraudulent information)	“Some institutions use normal saline to make fake COVID-19 vaccines.”
	Safety (novelty, infectivity of the vaccine and the standardization of vaccination process)	“COVID-19 vaccine is produced with relatively new technology, and its safety performance cannot be totally guaranteed.”
	Conspiracy theory	“COVID-19 Vaccinations are a scam!”
Cues to action	Means or channels to get vaccination	“After making an appointment online for COVID-19 vaccination, you can get vaccinated in the community where you live.”
	Dos and don'ts for vaccination	“Do not eat foods that are prone to allergies, such as seafood, for a day or two after getting COVID-19 vaccine.”
	Domestic vaccine development, production and vaccination	“More than 14 million people in China have been vaccinated with COVID-19 vaccine.”
	Foreign vaccine development, production and vaccination	“1.5 million people in the UK have reportedly received at least one dose of COVID-19 vaccine.”
	Personal experience of vaccination	“On February 5, 2021, I finished the first injection of COVID-19 vaccine and made an appointment for the second injection on February 20, without discomfort.”

The emotional positivity expressed in tweets affected audiences' retweeting (Saura et al., 2019). Emotional intensity amplified the information's vividness, making the publisher's standpoint seem more extreme and more likely to trigger feedback such as comments (Huffaker, 2010). The emotional trend and fluctuation also mattered (Li et al., 2021). We used Baidu's AipNLP (Hong et al., 2021) to compute the sentiment positive probability α of each original tweet (a higher value meant more positive emotion). The emotional intensity β was defined following Zhang and Zhang (2014). To describe the emotional fluctuation f, we first split the text into separate sequential sentences and computed the positive probability of each sentence, then calculated the standard deviation of all sentences' positive probabilities (Li et al., 2021). To measure the emotional trend, we first converted each original tweet to a vector in which each component represented one sentence's positive probability. Because tweets contained different numbers of sentences, we then used Dynamic Time Warping (DTW) (Berndt & Clifford, 1994) to align the score vectors of all tweets. To classify the emotional trends of these vectors (tweets), K-means (Hartigan & Wong, 1979), K-medoids (Park & Jun, 2009) and K-shape (Paparrizos & Gravano, 2015) clustering were compared. The clusters obtained by K-medoids had the best quality and interpretability. Hence, K-medoids clustering was applied to the aligned vectors to classify the emotional trends. The optimal number of clusters was 4; that is, the emotional trends were classified into four categories, namely “rise”, “fall”, “first rise and then fall”, and “stable”. The readability of online vaccine information affected the public's willingness to be immunized (MacLean et al., 2019; W. Wang, Jing, et al., 2020; Xu et al., 2019). 
The Flesch Reading Ease formula, Flesch-Kincaid Grade Level, Fog Scale and SMOG Index can be used to measure readability (Ley & Florio, 1996). However, these measures are neither specific to health information nor suitable for the Chinese language. Therefore, we constructed four indicators to measure readability, summarized in Table 3. Compared with the active voice, the passive voice is more difficult for readers to understand in everyday Chinese (Hsu et al., 2020). Sentences written in the passive voice often use more characters and prepositional phrases, which can obscure the intended meaning (Hsu et al., 2020). The terminology used in medical consultations may contribute to insecurity and anxiety (Peters et al., 2016). COVID Term (National Population Health, 2020) contained the full names of 442 COVID-related terms in Chinese and English, covering disease, virus, symptoms and signs, infected population, epidemic prevention and control, psychological assistance, etc. THUOCL (HanShiyi et al., 2016) contained 18,749 common medical terms in Chinese derived from social media. Treating the words in these two thesauri as medical terms, we counted the number of medical terms appearing in each original tweet.
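The emotional-fluctuation and alignment steps described above can be sketched as follows. The per-sentence positive probabilities are taken as given (the sentiment scorer itself is not reproduced), the use of the population rather than the sample standard deviation is an assumption, and `dtw_distance` is a textbook dynamic-time-warping distance that a clustering step such as K-medoids could consume; it is not claimed to be the exact alignment procedure used in the study.

```python
import math
import statistics


def emotional_fluctuation(sentence_scores):
    """Standard deviation of per-sentence positive probabilities."""
    if len(sentence_scores) < 2:
        return 0.0  # a single sentence cannot fluctuate
    # Population standard deviation; the choice over the sample form is an assumption.
    return statistics.pstdev(sentence_scores)


def dtw_distance(seq_a, seq_b):
    """Classic dynamic-time-warping distance with absolute-difference cost."""
    n, m = len(seq_a), len(seq_b)
    # cost[i][j] = cheapest alignment of the first i items of seq_a
    # with the first j items of seq_b.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(seq_a[i - 1] - seq_b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # seq_b[j-1] absorbs one more item of seq_a
                cost[i][j - 1],      # seq_a[i-1] absorbs one more item of seq_b
                cost[i - 1][j - 1],  # advance both sequences together
            )
    return cost[n][m]


# Hypothetical tweets with three and four sentences that both "rise".
rising_short = [0.1, 0.5, 0.9]
rising_long = [0.1, 0.1, 0.5, 0.9]
fluctuation = emotional_fluctuation([0.2, 0.8])
distance = dtw_distance(rising_short, rising_long)
```

Because DTW lets one sentence score match several consecutive scores in the other tweet, two tweets with the same emotional shape but different sentence counts end up close to each other, which is what a trend clustering needs.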

Contextual factors

Author's influence, as heuristic cues to clarify source identity and activity, was critical for assigning credibility to a given message (Massey et al., 2020; Zareie et al., 2019). The number of tweets (Noro et al., 2013; Riquelme & González-Cantergiani, 2016) and fans (Cappelletti & Sastry, 2012) of posters, and whether their accounts were stamped with the letter “V”, derived from user profiles, were considered. In addition, researchers claimed that tweets posted by news media were retweeted more frequently than tweets posted by common users (Cha et al., 2012), and vaccine information from health accounts gained more likes than non-health ones (Massey et al., 2020). We categorized posters' stakeholder-identities by matching keywords in their personal authentication, introduction, and tags. Referring to the identity-keyword list from An and Ou (2017), firstly manually marking the identity of 10% of posters (two coders' intercoder reliability tests: κ = 0.971), we modified and expanded the list, then determined 10 categories, shown in Table 3. Using the new list, remaining posters' identities were finally coded automatically.
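The automatic identity coding described above amounts to keyword matching on profile text, which might look like the sketch below. The category names come from Table 3, but the keywords, the matching order and the fallback to “common_personnel” are hypothetical; the study's actual keyword list (adapted from An and Ou, 2017) is not reproduced here.

```python
# Hypothetical keyword list for three of the ten categories.
IDENTITY_KEYWORDS = {
    "government": ["政府", "卫健委"],
    "traditional_media": ["日报", "电视台"],
    "medical_personnel": ["医生", "护士"],
}
# Assumed fallback when no keyword matches.
DEFAULT_IDENTITY = "common_personnel"


def code_identity(profile_text, keywords=IDENTITY_KEYWORDS, default=DEFAULT_IDENTITY):
    """Return the first identity whose keyword occurs in the poster's profile text."""
    for identity, words in keywords.items():
        if any(word in profile_text for word in words):
            return identity
    return default


# profile_text would concatenate authentication, introduction and tags.
coded = [
    code_identity("某市卫健委官方微博"),
    code_identity("三甲医院护士"),
    code_identity("爱旅行"),
]
```

Dictionary order decides ties when a profile matches several categories, so in practice the categories would need to be ordered from most to least specific.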

Statistics analysis

One-Way analysis of variance is used to infer the significant differences among three or more independent groups’ averages of a variable (Bewick et al., 2004). To answer RQ1, we used it to compare the popularity indexes of original tweets with different attitudes.
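For reference, the F statistic behind a one-way analysis of variance can be computed by hand as in this sketch. The function name and the sample values are hypothetical; obtaining a p-value additionally requires the F distribution, which a statistics library such as SciPy (`scipy.stats.f_oneway`) provides.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    if k < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    n_total = sum(len(g) for g in groups)
    if n_total <= k:
        raise ValueError("need more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group variation: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variation: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical popularity indexes for three attitude groups.
f_stat, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F means the group means differ by more than the within-group scatter would explain; whether that is significant depends on the two degrees of freedom.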

Network analysis of tweets and factors

To explore the central characteristics and their clustering from original tweets with different attitudes towards vaccines to answer RQ2, this research established three affiliation networks, whose nodes contained original tweets and their characteristics (Faust, 1997). Original tweets coded as “approve” attitude were used to establish the “approve” network, coded as “disapprove” attitude were for the “disapprove” network, coded as “query” or “neutral” attitude were for the “unclear” network. Firstly, for each continuous variable (A) in Table 3, we calculated its first-quartile and third-quartile among all tweets, then we transferred A into three sub-categorical variables: A_low (A < first-quartile); A_medium (first-quartile ≤ A < third-quartile); A_high (A ≥ third-quartile). For each categorical variable (B) in Table 3, like “emotional_trend”, we transferred “emotional_trend” into 4 (number of possible values of “emotional_trend”) sub-categorical variables (emotional_trend_rise; emotional_trend_fall; emotional_trend_rise_fall; emotional_trend_stable etc.). We acquired 106 sub-features. Then, in the “approve” network, if tweet i had the sub-feature j, then there was a link from tweet i to sub-feature j. Following the same method, we established the other two networks, using Gephi (Bastian et al., 2009) to visualize these directed but unweighted networks and calculate the in-degree centrality for each sub-feature node which indicated how connected or popular a single node was (Farooq et al., 2018). Finally, Gephi's community detection algorithm (Kauffman et al., 2014) was adapted to detect frequent combination of sub-features in each network.
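The two building blocks of these networks, quartile binning of continuous variables and in-degree counting on tweet-to-sub-feature links, can be sketched as below. The quartile convention (`method="inclusive"` in Python's `statistics.quantiles`) is an assumption, since the study does not state one, and the tweet identifiers and sub-feature names are hypothetical.

```python
import statistics
from collections import Counter


def quartile_bins(values):
    """Map each value to 'low', 'medium' or 'high' using the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")

    def label(v):
        if v < q1:
            return "low"       # A < first quartile
        if v < q3:
            return "medium"    # first quartile <= A < third quartile
        return "high"          # A >= third quartile

    return [label(v) for v in values]


def sub_feature_in_degree(tweet_features):
    """Count, for each sub-feature, how many tweets link to it (its in-degree)."""
    in_degree = Counter()
    for features in tweet_features.values():
        in_degree.update(set(features))  # one directed edge per tweet/sub-feature pair
    return in_degree


bins = quartile_bins([1, 2, 3, 4, 5, 6, 7, 8])
degrees = sub_feature_in_degree({
    "tweet_1": ["text_length_high", "is_V_yes"],
    "tweet_2": ["text_length_high"],
    "tweet_3": ["is_V_yes", "text_length_high"],
})
```

Since every edge runs from a tweet to a sub-feature, a sub-feature's in-degree is simply the number of tweets in that network that carry it.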

Regression model establishment

Linear regression models, such as Lasso (Ranstam & Cook, 2018) and Ridge (McDonald, 2009), are commonly used. The Support Vector regression model (SVR) maps linearly inseparable sample points in a low-dimensional space to a linearly separable high-dimensional feature space through a nonlinear mapping, and then performs linear regression (Ahmad et al., 2020). Random Forest Regressor (Pedregosa et al., 2017), the Extreme Gradient Boosting regression model (XGBoostRegressor) (Dong et al., 2020) and the Light Gradient Boosting Machine regression model (LGBMRegressor) (Ke et al., 2017) are ensemble learning algorithms based on decision tree regression. To answer RQ3, namely to explore the relationship between PI and its possible influencing factors, the above six models were built on three data sets respectively (original tweets coded as “approve”; original tweets coded as “disapprove”; original tweets coded as “query” or “neutral”). For each data set, 75% was used as a training set to find each model's optimal function for fitting the data, and the remaining 25% was used as a testing set to evaluate the performance of the trained optimal function. We used the mean absolute error (MAE) and mean square error (MSE) to compare the performance of the six models' optimal functions. Cox and Wermuth (1992) cautioned that the correlation coefficient is not suitable for judging the effectiveness of regression models, especially linear regression, so we did not consider it in this study. Smaller MAE and MSE meant less error between the actual and predicted values of PI. SHapley Additive exPlanations (SHAP) (Lundberg & Lee, 2017) was applied to the best model to explain the regression results for each data set.
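The evaluation protocol above (a 75%/25% split, then mean absolute error and mean square error on the held-out part) can be illustrated with the library-free sketch below. The helper names, the random seed and the sample values are hypothetical, and the regression models themselves are omitted.

```python
import random


def train_test_split_75_25(rows, seed=0):
    """Shuffle rows reproducibly and return (training 75%, testing 25%)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)
    return shuffled[:cut], shuffled[cut:]


def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted PI."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """Average squared gap; penalizes large misses more than MAE does."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


train_rows, test_rows = train_test_split_75_25(range(8))

# Hypothetical actual and predicted PI values on a testing set.
actual_pi = [1.0, 2.0, 3.0]
predicted_pi = [2.0, 2.0, 5.0]
mae = mean_absolute_error(actual_pi, predicted_pi)
mse = mean_squared_error(actual_pi, predicted_pi)
```

Reporting both errors is informative because a model with a few very large misses can have an acceptable MAE and still a poor MSE.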

Results

Descriptive statistics

In Fig. 3, most tweets' attitudes were clear (“approve” or “disapprove”). Tweets about domestic status, personal experience, risk, severity, foreign status, and means of getting vaccinated mostly supported vaccination, while tweets about fake vaccines and conspiracy theories (vaccine nationalism, terrorism, stigmatization, racial discrimination, religion, monopoly, ethics, pseudoscience) mostly held a “disapprove” attitude. Tweets about adverse effects and cost were highly controversial. Most tweets from stakeholders supported vaccination.

Fig. 3

Percentages of original tweets expressing different attitudes belonged to different topics and from different stakeholders.

Fig. 4 shows that tweets about the COVID-19 vaccine were long. Some used many numbers to report the scale of infection, emphasizing the risk of infection and the urgency of vaccination, while also conveying the number of people already vaccinated at home or abroad. Multiple “#” in one tweet meant that the poster quoted several hashtags to make the tweet more searchable, and citing place names made the content more detailed and focused. Vaccine sentiment was polarized, with high emotional intensity and strong emotional fluctuation. Passive voice, prepositions and professional terms rarely appeared, which meant that the text was readable. The high numbers of tweets and fans meant that the original posters were highly active and influential on social media.

Fig. 4

The distribution of features (continuous variables) among original tweets. The three horizontal lines from top to bottom represent the maximum, median, and minimum values. The horizontal width of the shaded area represents the number of tweets whose feature takes the value at that height.

As Fig. 5 shows, few posters displayed their current location when posting. Most tweets contained external links, images, videos, and were published by traditional media, self-media, government or members of the general public marked with “V”. The emotional trends of most tweets were not stable.

Fig. 5

The distribution of features (categorical variables) among original tweets.

Comparison for popularity indexes of vaccine tweets with different attitudes

As Fig. 6 shows, tweets whose topic and attitude were “fake_vaccine-approve”, “dos_don't-approve”, “cost-query”, “conspiracy-approve”, “security-query”, or “security-approve” were more popular. Notably, there was one tweet about fake vaccines and five tweets about conspiracy that supported the vaccine, and nine tweets about domestic status that opposed it. In response to rumors and conspiracy theories, government, traditional media, and self-media actively refuted them and guided the public to establish a correct view of a great country's role in promoting the fair distribution of vaccines around the world. These positive messages spread widely among the public. However, negative evaluations of domestic vaccines by a few traditional media and self-media accounts also attracted widespread attention.

Fig. 6

The average popularity indexes of tweets with different attitudes under different topics.

As Fig. 7 shows, tweets whose source and attitude were “platform-query”, “medical_personnel-query”, “common_company-disapprove”, “traditional_media-disapprove”, “traditional_media-approve”, “self_media-query”, or “self_media-disapprove” were more popular. Medical companies and campuses expressed only definite attitudes. The former, as vaccine providers, showed support. The latter, being responsible for students' health, made careful decisions for or against vaccines. The most popular anti-vaccine tweets were created by non-medical companies, while the most popular pro-vaccine tweets came from traditional media.

Fig. 7

The average popularity indexes of tweets with different attitudes posted by users with different identities.

Although the average popularity index of tweets holding a “disapprove” attitude (5.12) was slightly higher than that of tweets holding an “approve” (5.08), “query” (5.09), or “neutral” (4.69) attitude, one-way analysis of variance showed no significant difference among the popularity indexes of tweets expressing different attitudes (“approve-disapprove”: p = 0.999; “approve-query”: p = 1.000; “approve-neutral”: p = 0.640; “disapprove-query”: p = 1.000; “disapprove-neutral”: p = 0.564; “query-neutral”: p = 0.807).
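For readers unfamiliar with one-way analysis of variance, the omnibus F statistic can be computed as in the minimal sketch below. The three groups of popularity indexes are hypothetical toy values, not the study's data, and the function name is an assumption. The sketch stops at the F statistic and degrees of freedom; turning F into a p-value requires the F distribution, and the pairwise p-values reported above would come from a post hoc comparison that is not reproduced here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical popularity indexes for three attitude groups.
toy_groups = [
    [4.0, 5.0, 6.0],  # e.g. "approve"
    [5.0, 6.0, 7.0],  # e.g. "disapprove"
    [4.0, 5.0, 6.0],  # e.g. "unclear"
]
f_stat, df_between, df_within = one_way_anova_f(toy_groups)
```

An F value close to 1, as in this toy case, means the spread between group means is about what within-group noise alone would produce.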

Characteristics of vaccine tweets with different attitudes

The number of tweets coded as “query” or “neutral” was low, and both codes denoted unclear attitudes. Therefore, we combined these two data sets. Table 4 gives an overview of the resulting networks.

Table 4

Network overview.

	Original tweet nodes	Attribute nodes	Edges
Approve network	1709	106	52,979
Disapprove network	784	106	24,304
Unclear network	194	106	6014

In Fig. 8, the “approve” network was visualized using Gephi's Fruchterman Reingold layout algorithm (Grandjean, 2015). Each node's color was determined by its feature's name (tweet nodes were set to light green), and sub-features' names were labeled. Each node's size was proportional to its in-degree, and each edge's color matched that of its target feature node. Fig. 9 and Fig. 10 followed the same settings. As Table 5 shows, tweets with different attitudes shared some commonalities: they had low numbers of passive sentences, “@”, emojis, “!”, and first-person expressions, and contained no videos. Posters with “V” tended not to state their geographic location. The differences were that “approve” tweets contained more “#”, emotional fluctuation was stronger in “disapprove” tweets, and “unclear” tweets (“query”, “neutral”) contained fewer clues (links, images).
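The in-degree ranking used in Table 5 can be illustrated with a small sketch. It assumes a bipartite network in which each directed edge points from a tweet node to one of the attribute nodes describing it; the three tweets and their attributes below are hypothetical. As a side observation, the edge counts in Table 4 appear consistent with each tweet linking to 31 attribute nodes (for example, 1709 × 31 = 52,979).

```python
from collections import Counter

# Hypothetical tweets, each mapped to the attribute nodes describing it.
tweets = {
    "t1": ["is_V", "location_not_included", "num_@_low"],
    "t2": ["is_V", "location_not_included", "video_not_included"],
    "t3": ["location_not_included", "num_@_low", "video_not_included"],
}

# Directed edges point from a tweet node to each of its attribute nodes.
edges = [(tweet, attr) for tweet, attrs in tweets.items() for attr in attrs]

# The in-degree of an attribute node is the number of tweets linking to it.
in_degree = Counter(target for _source, target in edges)

# Attribute with the highest in-degree, i.e. the most widely shared feature.
top_attribute, top_count = in_degree.most_common(1)[0]
```

In this setting an attribute's in-degree is simply the number of tweets that exhibit it, so a ranking like Table 5 reads as a list of the most common feature values in each network.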

Fig. 8

“Approve” network.

Fig. 9

“Disapprove” network.

Fig. 10

“Unclear” network.

Table 5

Top 10 in-degree centrality.

Rank	Approve network	Disapprove network	Unclear network
1	proportion_passive_low	location_not_included	proportion_passive_low
2	location_not_included	proportion_passive_low	location_not_included
3	is_V	is_V	num_@_low
4	num_?_low	num_@_low	num_!_low
5	num_@_low	num_emo_low	is_V
6	num_emo_low	video_not_included	num_emo_low
7	num_!_low	num_!_low	video_not_included
8	video_not_included	num_?_low	num_first_person_low
9	num_first_person_low	num_first_person_low	link_not_included
10	num_#_Medium	emotional_fluctuation_Medium	image_not_included

In Fig. 11, based on the detected combinations of features, we could identify the writing pattern of self-media from approve-community 1. They usually used short, easy-to-understand language to express support, preferred videos over links to increase information capacity, and adopted relatively stable emotional expression rather than large emotional swings (“stable”, low emotional fluctuation). In approve-community 2, tweets about “foreign” were long and complex (prepositions, professional terms, place nouns), with significant emotional change (“rise_fall”, medium emotional fluctuation). In approve-community 3, government accounts usually reported vaccines' effectiveness and domestic vaccination status, using professional terms but rarely expressing strong emotions. In approve-community 4, unlike self-media, the emotional trends of tweets from traditional media were “fall”. In approve-community 5, tweets about individual experience were positive (high positive probability, high emotional intensity, “rise” emotional trend).

Fig. 11

Communities in the “approve” network and “disapprove” network.

In disapprove_community 2, the posting mode of self-media when expressing a “disapprove” attitude was similar to that used for an “approve” attitude. In disapprove_community 3, tweets involving side effects were poorly readable (long, with many professional terms, prepositions, etc.). In Fig. 12, community 1, the posting mode of self-media when expressing a “query” or “unknown” attitude was similar to that used for an “approve” attitude. In community 2, non-medical companies without “V” created messages about vaccination channels. In community 3, when traditional media expressed uncertainty about vaccine prices at home and abroad, emotions were relatively negative and showed a downward trend, but the intensity was not strong. In community 4, although government and social media platform accounts did not express clear views on vaccines' effectiveness, they expressed optimistic expectations.

Fig. 12

Communities in the “unclear” network.

Features influence popularity indexes of vaccine tweets with different attitudes

As Fig. 13 shows, the trained RandomForestRegressor performed best on each data set. The following analyses are therefore based on the fitted RandomForestRegressor.

Fig. 13

Performance of models on “approve” tweets (a), “disapprove” tweets (b), “unclear” tweets (c).

SHapley Additive exPlanations (SHAP) is a game-theoretic method for explaining the output of any machine learning model (Lundberg & Lee, 2017). Fig. 14 sorts features by the sum of SHAP value magnitudes over the “approve” samples, and uses SHAP values to display the distribution of each feature's impact on the RandomForestRegressor output. Color represents the feature's value (red for high, blue for low), and features with negligible impact on the model output are omitted. Fig. 15 and Fig. 16 show the same for the “disapprove” and “unclear” samples, respectively.
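To make the idea behind SHAP values concrete, the sketch below computes exact Shapley values for a single instance by averaging each feature's marginal contribution over all feature orderings. It is not the SHAP library and not the paper's fitted model: `toy_predict`, the three feature names, and the baseline of zeros are all hypothetical, and brute-force enumeration is feasible only for a handful of features.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one instance.

    Features not yet added to a coalition keep their baseline value.
    Each feature's marginal contribution is averaged over all orderings.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_predict(x):
    """Hypothetical popularity model with one interaction term."""
    return (2.0 * x["num_fans"]
            - 1.0 * x["num_tweets"]
            + 0.5 * x["num_fans"] * x["intensity"])


baseline = {"num_fans": 0.0, "num_tweets": 0.0, "intensity": 0.0}
instance = {"num_fans": 1.0, "num_tweets": 2.0, "intensity": 2.0}

phi = shapley_values(toy_predict, instance, baseline)

# Rank features by the magnitude of their contribution for this instance.
ranking = sorted(phi, key=lambda name: abs(phi[name]), reverse=True)
```

The values sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split equally between the two features involved. A summary plot like Fig. 14 aggregates such per-instance magnitudes over all samples, whereas this sketch ranks features for one instance only.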

Fig. 14

Results of RandomForestRegressor shown by SHAP based on tweets with “approve” attitude.

Fig. 15

Results of RandomForestRegressor shown by SHAP based on tweets with “disapprove” attitude.

Fig. 16

Results of RandomForestRegressor shown by SHAP based on tweets with “unclear” attitude (“query”, “unknown”).

In Fig. 14, for “approve” tweets, the poster's number of fans, text length, the number of adjectives, emotional intensity, emotional positive probability, the number of places, the “domestic” topic category, the number of exclamation marks, and the poster's “self-media” identity had a positive impact on PI, while the number of tweets the poster had posted, the average sentence length, the number of adverbs, the proportion of prepositions, the number of nouns, the poster's “traditional media” identity, and the “foreign” topic category had a negative impact. The numbers of professional terms, verbs and numerals, and emotional fluctuation, might have either a positive or a negative effect.

In Fig. 15, for “disapprove” tweets, the poster's number of fans, the proportion of prepositions, the “conspiracy” topic category, the numbers of first-person expressions and sentences, emotional intensity, the numbers of verbs, emojis and images, and the poster's “self-media” identity had a positive impact on PI, while the number of tweets the poster had posted, the number of hashtags, the average sentence length, and the numbers of numerals, nouns and links had a negative impact. Text length, the number of places, and emotional fluctuation might have either a positive or a negative effect.

In Fig. 16, for “unclear” tweets, the poster's number of fans and “self-media” identity, the number of question marks, the proportion of prepositions, the numbers of hashtags and adjectives, emotional intensity, and the number of numerals had a positive impact on PI, while the number of tweets the poster had posted, the average sentence length, text length, the numbers of nouns and sentences, emotional fluctuation, the numbers of places and images, the poster's “medical_company” identity, and emotional positive probability had a negative impact.

Discussion

Popularity of information created by users when expressing different attitudes

First, there was more information supporting COVID-19 vaccination on Weibo than opposing it, consistent with existing research (Biasio et al., 2020; Jamison et al., 2020; Lazarus et al., 2021; Massey et al., 2020; J. Wang, Jing, et al., 2020), yet a few users did not express a clear attitude (Elkin et al., 2020). Conspiracy theories were common in anti-vaccine tweets (Jamison et al., 2020). In contrast to Gandhi et al. (2020), and to the active role of doctors in promoting children's immunization emphasized by Wheeler and Buttenheim (2013), we found that although medical companies were committed to promoting COVID-19 vaccination, medical personnel were likely to raise immunization concerns. Consistent with Xu and Guo (2018), but inconsistent with Massey et al. (2020), we found that the overall popularity of anti-vaccine tweets was higher than that of pro-vaccine tweets, though not significantly. These diverging conclusions might be due to differences in vaccine type and social media platform. In anti-vaccine tweets, vaccine safety received widespread attention; in pro-vaccine tweets, vaccination precautions were widely disseminated (Massey et al., 2020). The most popular pro-vaccine tweets were created by traditional media, while the most popular anti-vaccine tweets came from non-medical companies. Massey et al.'s (2020) research among Americans found that the tweets with the most likes, whether for or against vaccines, came from individuals, not media or institutions. This difference might result from national cultural differences between Chinese collectivism and American individualism (Huff & Kelley, 2003), or from the urgency of the epidemic, which led the government and media to engage more with the COVID-19 vaccine than with the HPV vaccine.

Characteristics of information created by users when expressing different attitudes

The information-feature networks for the different attitudes not only highlighted the frequently used features, but also visualized how multi-dimensional tweet features clustered within community subgraphs. Massey et al. (2020) found that people tended to mention others for direct communication in anti-vaccine (HPV) tweets, and to include location information in pro-vaccine tweets. In our COVID-19 vaccine samples, however, tweets of every attitude rarely contained “@” or indicated geographic location. The presence of geo-tagging is a persuasive indicator of the transparency and credibility of content creators (Wirtz & Zimbres, 2018). On the one hand, the difference might be attributed to the COVID-19 vaccine being more novel than the HPV vaccine, so that the public lacked confidence in it. On the other hand, the COVID-19 vaccine was suitable for a wider population (the HPV vaccine was mainly targeted at women), so users might have been more cautious when posting, to avoid making misleading remarks and exposing personal privacy. “V” users also created insufficiently informative tweets without a clear attitude (Rieh, 2002).

Features impact the popularity of information with different attitudes

Zhang et al. (2014) examined the impact of tweets' content and contextual features on the number of retweets and comments received on Weibo, emphasizing the significant impact of content features. In contrast, we found that contextual factors outperformed content factors in explaining the variance in tweet popularity, suggesting that heuristic strategies, rather than systematic ones, dominated users' processing of vaccine information. A high number of fans meant high exposure of an author's tweets, and a high number of tweets meant high activity (Williams et al., 2015). The former had a positive impact on popularity, while the latter had a negative impact. Excessive posting might cause recipients to doubt the content's quality, and information overload might damage the author's influence (Qiu et al., 2017). People with low education levels had poor perceptions of COVID-19 (Labban et al., 2020), failed to recognize the reasons behind medical recommendations or to foresee the outcomes of their possible actions (Biasio et al., 2020), and hence were unlikely to get vaccinated (Khubchandani et al., 2021). Therefore, high readability was conducive to increasing information's popularity. This was contrary to the findings of Gandhi et al. (2020) on the general influenza vaccine, with which the public was more familiar. Pro-vaccine tweets expressing positive emotions were more popular (Xu & Guo, 2018), unlike in Ekram et al. (2019), but tweets that questioned vaccines while showing positive emotions might be rejected. Regardless of attitude or emotional polarity, strong emotional intensity (reflected in the number of exclamation marks and the emotional intensity feature) was attractive (Gupta & Yang, 2019). However, large emotional fluctuation reduced popularity. X. Wang, Jing, et al. (2020) claimed that adults preferred emotional roller coasters when reading books; a crisis might reduce the public's emotional tolerance.
Among pro-vaccine tweets, the public paid more attention to domestic status. The credibility of traditional media was questioned, and the public trusted self-media more. In anti-vaccine tweets, conspiracy theories went viral (Jamison et al., 2020). Vaccine providers did not popularize vaccine knowledge effectively. Adjectives, emoticons and images were vivid and intuitive, and were conducive to information dissemination (Mode, 2020).

Theoretical contributions

First, research on vaccine hesitancy has relied mainly on data from questionnaire surveys, interviews or experiments, which lack objective authenticity, and has ignored how contradictory online information affects people's psychology and behavior by competing for users' limited attention. This research offered a new perspective for interpreting the scope and determinants of vaccine hesitancy by comparing the popularity of contradictory information and the factors affecting it. Second, during public health emergencies, the dissemination of disaster information and that of vaccine information interact. Few studies have investigated contradictory information in this complex context, and this research helps fill that gap. Third, this research developed a unified framework that guides the process from constructing a popularity index and extracting information characteristics to exploring the relationship between characteristics and popularity for information of different attitudes; it could be applied to other kinds of contradictory information, such as rumors and rumor rebuttals. Fourth, we extracted features according to users' systematic and heuristic information processing modes, and introduced the Health Belief Model and the Theory of Planned Behavior to code topics and attitudes, which broadens the application of these theories and provides a theoretical reference for future research on feature extraction and topic mining in the big data era. Finally, text-feature networks can visualize the posting patterns of social media stakeholders, providing a new form of data for research on user behavior classification and prediction and on user portrait construction, and extending the application of social network theory.

Practical implications

First, managers could quickly and accurately gauge citizens' willingness to get vaccinated on a large scale by evaluating the popularity of vaccine messages on the social media commonly used in a specific country or community. Especially during public health emergencies, tweets expressing concern or doubt receive wide attention because information cues are lacking. On the one hand, public health departments should publish relatively consistent information in a timely manner to avoid information vacuums, cognitive defects and narrow biases. On the other hand, it is important to improve information's readability and recipients' health literacy. Second, targeted and tailored vaccine advocacy must avoid one-size-fits-all strategies and instead consider the posting patterns that different stakeholders use when discussing different topics and expressing different opinions, because different patterns affect information popularity to different degrees. Public opinion departments should systematically monitor posting users' tweet-feature networks and the corresponding popularity among receivers in real time, so that they can promptly discover inflection points in the evolution of public opinion and carry out risk aversion and traceability work by educating targeted users about their posting behavior, helping pro-vaccine information dominate public opinion. Third, we found that users' debates on controversial topics were accompanied by strong emotional conflicts and fluctuations. Some highly active and influential users, acting as online opinion leaders, also made radical statements. These could exacerbate attitude polarization (Nan & Daily, 2015), push users into echo chambers, and render public opinion guidance ineffective (Asker & Dinas, 2019). Hence, we recommend that traditional media, which convey the government's directives in China (Guo, 2020), avoid excessive personal emotion when reporting news, to improve their credibility.
Meanwhile, self-media's influence should not be neglected. Moreover, to refute conspiracy theories, countries should on the one hand strengthen international mutual assistance on vaccines, eliminating such theories at the source; once they spread, even wide refutation only increases their exposure (Majid & Pal, 2020). On the other hand, Internet managers should strengthen restrictions on posting, such as adding real-name and location settings, to reduce rumors and low-quality or repetitive information and thereby avoid information distortion and overload (Soroya et al., 2021).

Limitations

First, the same feature had different impacts on the popularity of tweets with different attitudes. For example, the number of hashtags had a negative impact on the popularity of anti-vaccine tweets but a positive impact for tweets with unclear attitudes. The count alone is not enough: features such as hashtag length, whether a hashtag is trending, and semantic similarity between hashtags all matter (Wang et al., 2016). Second, although multi-dimensional features were extracted, their interactive impact on information popularity remains to be analyzed in more detail. Finally, our data were limited to the early stage of vaccine promotion; future work could supplement the data and slice them by event stage to study the dynamic changes in information characteristics and their impact on information popularity.

Conclusions

This research first evaluated and compared the popularity of information expressing different attitudes towards COVID-19 vaccines or vaccination, to reflect vaccine hesitancy on social media. Then, it extracted content and contextual features, and visualized and compared the feature combinations frequently used in information of different attitudes. Finally, it clarified the direction and degree of each feature's impact on information popularity. These findings could provide suggestions for adjusting the organizational strategies of contradictory information to reduce vaccine hesitancy.

Credit author statement

Dandan Wang: Conceptualization; Methodology; Software; Formal analysis; Data curation; Visualization; Writing – original draft; Writing – review & editing. Yadong Zhou: Software; Data curation; Visualization.

Funding

This work was supported by the (grant numbers 71661167007, 71420107026) and by the (grant number 2018YFC0806904-03).

35 in total

1. Parental vaccine concerns, information source, and choice of alternative immunization schedules.

Authors: Marissa Wheeler; Alison M Buttenheim
Journal: Hum Vaccin Immunother Date: 2013-07-30 Impact factor: 3.452

2. Biased assimilation and need for closure: examining the effects of mixed blogs on vaccine-related beliefs.

Authors: Xiaoli Nan; Kelly Daily
Journal: J Health Commun Date: 2015-03-09

3. Normative Mechanism of Rumor Dissemination on Twitter.

Authors: Hyegyu Lee; Hyun Jung Oh
Journal: Cyberpsychol Behav Soc Netw Date: 2017-02-21

4. Considering Emotion in COVID-19 Vaccine Communication: Addressing Vaccine Hesitancy and Fostering Vaccine Confidence.

Authors: Wen-Ying Sylvia Chou; Alexandra Budenz
Journal: Health Commun Date: 2020-10-30

5. Quality evaluation of HPV vaccine-related online messages in China: a cross-sectional study.

Authors: Wanzhou Wang; Jinliang Lyu; Mintao Li; Yunjing Zhang; Zhihu Xu; Yuanyuan Chen; Jiangjie Zhou; Shengfeng Wang
Journal: Hum Vaccin Immunother Date: 2020-10-15 Impact factor: 3.452

6. 'Should I vaccinate my child?' comparing the displayed stances of vaccine information retrieved from Google, Facebook and YouTube.

Authors: Lucy E Elkin; Susan R H Pullon; Maria H Stubbe
Journal: Vaccine Date: 2020-02-25 Impact factor: 3.641

7. Use of Deep Learning to Analyze Social Media Discussions About the Human Papillomavirus Vaccine.

Authors: Jingcheng Du; Chongliang Luo; Ross Shegog; Jiang Bian; Rachel M Cunningham; Julie A Boom; Gregory A Poland; Yong Chen; Cui Tao
Journal: JAMA Netw Open Date: 2020-11-02

8. Dimensions of Misinformation About the HPV Vaccine on Instagram: Content and Network Analysis of Social Media Characteristics.

Authors: Philip M Massey; Matthew D Kearney; Michael K Hauer; Preethi Selvan; Emmanuel Koku; Amy E Leader
Journal: J Med Internet Res Date: 2020-12-03 Impact factor: 5.428

9. Systematic Literature Review on the Spread of Health-related Misinformation on Social Media.

Authors: Yuxi Wang; Martin McKee; Aleksandra Torbica; David Stuckler
Journal: Soc Sci Med Date: 2019-09-18 Impact factor: 4.634

10. Assessing COVID-19 vaccine literacy: a preliminary online survey.

Authors: Luigi Roberto Biasio; Guglielmo Bonaccorsi; Chiara Lorini; Sergio Pecorelli
Journal: Hum Vaccin Immunother Date: 2020-10-29 Impact factor: 3.452
