
Multiplicity and dynamics of social representations of the COVID-19 pandemic on Chinese social media from 2019 to 2020.

Anfan Chen¹, Jingwen Zhang², Wang Liao³, Chen Luo⁴, Cuihua Shen³, Bo Feng³.

Abstract

Documenting the emergent social representations of COVID-19 in public communication is necessary for critically reflecting on pandemic responses and providing guidance for global pandemic recovery policies and practices. This study documents the dynamics of changing social representations of the COVID-19 pandemic on one of the largest Chinese social media, Weibo, from December 2019 to April 2020. We draw on the social representation theory (SRT) and conceptualize topics and topic networks as a form of social representation. We analyzed a dataset of 40 million COVID-19 related posts from 9.7 million users (including the general public, opinion leaders, and organizations) using machine learning methods. We identified 12 topics and found an expansion in social representations of COVID-19 from a clinical and epidemiological perspective to a broader perspective that integrated personal illness experiences with economic and sociopolitical discourses. Discussions about COVID-19 science did not take a prominent position in the representations, suggesting a lack of effective science and risk communication. Further, we found the strongest association of social representations existed between the public and opinion leaders and the organizations' representations did not align much with the other two groups, suggesting a lack of organizations' influence in public representations of COVID-19 on social media in China.


Keywords: COVID-19; Global pandemic recovery; Health communication; Machine learning; Social media; Social representation

Year: 2022 PMID: 35663909 PMCID: PMC9151658 DOI: 10.1016/j.ipm.2022.102990

Source DB: PubMed Journal: Inf Process Manag ISSN: 0306-4573 Impact factor: 7.466

Introduction

Over the past two decades, the world has experienced several major outbreaks of infectious diseases, including HIV/AIDS, SARS, Influenza A (H1N1), MERS, Ebola, Zika, and most recently, SARS-CoV-2 or COVID-19. The processes by which societies make sense of, or form social representations of, such major public health crises vary widely depending on the historical, sociocultural, and communication contexts in which those crises take place. Documenting the emergent social representations for crisis coping is necessary for critically reflecting on societal pandemic responses and for providing guidance for future policies and practices for global pandemic recovery. Despite the large amount of recent research that has documented COVID-19-related communication using social media data (Abd-Alrazaq et al., 2020; Park et al., 2020; Sarker et al., 2020; Shen et al., 2020; Zhao et al., 2020), the majority did not provide theoretical discussions illuminating higher-level dynamics in how multiple social groups collectively construct and influence the pandemic's social representations. In the current study, we draw on social representation theory (SRT) (Moscovici, 1963; Wagner et al., 1999) and conceptualize the networks of COVID-19 discourse topics as a form of social representation generated in public discourses on social media. Furthermore, we examine how such representations evolve over time and explore how the representations from three different types of actors (the general public, opinion leaders, and organizations) either align with or diverge from each other. This study provides a critical understanding of how a society collectively shapes attention and constructs meanings of a new pandemic in the social media era.
Understanding what representations are driving the discourses and what kind of actors are mediating the online space is of critical value for informing effective risk communication and intervention for ongoing local pandemic controls and global pandemic recovery. As the first country to discover and report COVID-19, China provides the earliest data on public engagement with the virus, its threat, and related sociopolitical contentions. By examining COVID-19 data from one of the most popular Chinese social media platforms, our research aims to suggest possible directions for global pandemic recovery in terms of what social representations of COVID-19 are vital to communicate and who should and can contribute more voices to these processes. Drawing on lessons from the SARS outbreak of 2002, China has constructed a more efficient network of disease control and prevention systems. However, in the very initial outbreak stage, resources and planning from the central and strong governmental system for controlling pandemics and maintaining social stability might not be sufficient (Li, 2021; Nkengasong, 2020). Social media has afforded discussion of COVID-19 in real time, making it one of the primary sources for the public to make sense of the pandemic and to cope with the unknown threat (Abdul-Baki et al., 2020). Moreover, the wide and intensive use of social media by the public, opinion leaders, and organizations in China has reflected different interests, experiences, and interpretations of the pandemic. Given these considerations, the main goal of this study is to document the dynamics of changing social representations of the COVID-19 pandemic in China during its initial outbreak phases (December 2019 to April 2020) and to critically reflect on the implied challenges for effective local and global pandemic recovery.
We curated a dataset of more than 40 million posts from a variety of users (including the public, opinion leaders, and organizations) on Sina Weibo (hereafter Weibo). By leveraging machine-learning-based content analyses and topic network analyses, we charted the transitions of social representations on this social media platform. Furthermore, we identified how the three types of actors interacted with each other's representations and discussed the shifts in their relative power in shaping the pandemic discourses.

Literature review

Social representations of public health crises

The theoretical underpinning of this work is SRT, which posits that when society is faced with a new phenomenon, widely shared ideas about it may emerge spontaneously (Wagner et al., 1999). Here, social representation refers to the collective elaboration and construction of “a social object by the community for the purpose of behaving and communicating” (Moscovici, 1963, p. 251). In essence, social representations engage with understanding, defining, and prescribing the discourse structures of a new object, complete with its attributes (Moscovici, 2001). According to SRT, the sensemaking process is the driving force behind social representations, whereas communication is the central mechanism by which representations can be envisaged, negotiated, and consolidated (Wagner et al., 1999; Wagner, 2020). Infectious disease outbreaks are both public health and social crises and have been studied in terms of their social representations in contemporary media and public discourses (Joffe & Bettega, 2003; Mayor et al., 2013). Studies have indicated that when disease outbreaks originate domestically, social representations can shift to self-help and focus on government responses. For example, Ding (2009) documented how Chinese people relied on mobile phone calls, texts, and a limited number of internet forums to exchange local risk information and seek social support during the 2002 SARS epidemic. Beyond the conceptualization of public discourses as social representations, SRT offers additional insights into examining such discourses during crises. Moscovici (1963) argued that two phenomena, dispersion and focalization, often precondition the emergence of social representations in unprecedented situations. First, when a crisis happens, diverse discourse topics from different groups may emerge and resolve over time, interacting to form new meanings and an alternate focus on the crisis (i.e., “dispersion”).
Some studies have documented the diverse attention paid to different aspects of a crisis (Dredze et al., 2016; Hagen et al., 2018; Pascual-Ferrá et al., 2020). Second, as a crisis develops, the emergence of social representations starts to selectively focus on aspects preferred by social groups (i.e., “focalization”). For example, Washer (2004) examined the representations of SARS in British newspapers and concluded that although SARS was presented as a dangerous pandemic threat, it was simultaneously portrayed as a “‘contained’ threat because SARS was reported as being unlikely to affect the UK readers because the Chinese are so different from ‘us’” (p.2564). This illustrates that while multiple representations existed, the media chose to focus on constructing the representation of a ‘contained pandemic’ by othering the origin of the threat. Together, dispersion and focalization per SRT suggest the following patterns of online communication would emerge amid social crises. Primarily, multiple representations of the object will emerge, and their relative prominence will shift during the progression of the crises. Several recent studies have applied the topic modeling of social media data and revealed that the topics focused on specific keyword combinations (e.g., ‘home, stay, lockdown’ or ‘vaccine, cure, disease’), which were largely interpreted against the corresponding events and evolving pandemic policies (Schück et al., 2021). However, keyword combinations only suggest possible topics that represent public attention or opinion; more nuanced interpretations of the topics were lacking and relationships between topics (informed by the dynamic processes of dispersion and focalization across multiple social groups) have not been considered in the literature so far. 
In the current research, we analyze the very initial set of social media data generated in response to the COVID-19 pandemic to reveal emerging discourses, and we leverage topic network analysis to discern the connectivity among the different discourses that collectively construct the social reality of the pandemic.

Social representations as topic networks

We conceptualize social representations as discussion topic networks. This follows the tradition of using associative network analysis (de Rosa, 2002) in social representation research, which was developed to record people's spontaneous reactions and their relational structures when confronted with an object stimulus. Topic networks that illuminate connectivity among topics can inform higher-level interpretations of how specific topics are constructed in close relationships to form broader themes and meanings about the object (Chen et al, 2019; Chen & Zhang, 2021). For topic networks, we conceptualize nodes as distinct topics and ties as concurrent mentions of two or more topics in texts. Consequently, various topic nodes and their ties constitute a topic network. The social representations of COVID-19 are embodied in these topics (such as symptoms and illness experiences, preventive measures, policies, or conspiracy theories, etc.) and their co-occurrence on social media, reflecting a society's changing attention to different aspects of COVID-19. More importantly, such networks of discussion topics can reveal which topics are central and how different topics are interconnected with each other across time. Accordingly, our first research question asks the following: RQ1: In terms of COVID-19 discussion topic networks, what social representations were revealed on Weibo, and how did they change over different stages of the pandemic in China from 2019 to 2020?

Social media as a space for multiple social representations of crises

Previously, public health crisis communication was primarily viewed using a top-down approach, as the dissemination of information to the public about health risks and mitigation measures emanated directly from the “top,” such as government agencies and scientists (Benoit, 2014; Coombs, 2014). However, this approach has been challenged because social media has transformed the communication landscape, empowering the general public with greater influence in shaping public attention and constructing social representations of crises (Ding & Zhang, 2010). This shift reflects a bottom-up approach: users on social media actively engage in developing awareness, creating meanings, sharing opinions, and forging connections that can be mobilized as crises unfold (Austin et al., 2012; Graham et al., 2015; Lee et al., 2021; Zhao et al., 2018). Specifically, social media has been documented to facilitate real-time first-hand reporting of social crises. Within the context of pandemics, the self-reporting of symptoms and illness experiences often precedes official accounts of infections and disease diagnoses (Shen et al., 2020). These bottom-up attempts also send a strong signal to the “top,” pushing the government for quick responses, both on pandemic control and for maintaining social stability (Zhang et al., 2020b). To address the intricate dynamics between the top-down and bottom-up influences, we considered three different actors on Weibo: the general public, opinion leaders, and organizations. The general public comprises the majority of social media users, who are ordinary people; opinion leaders are the small portion of individual accounts with a significantly larger number of followers owing to their influence in various domains. Organizations, on the other hand, communicate as institutional accounts, including central and local government agencies, news media, business sectors, and nongovernmental organizations.
Within a social media system, these three groups often hold different motivations, structural positions, credibility, and influences in online communication (Chen et al., 2019; Pan et al., 2018). For example, recent research shows opinion leaders and the general public have been strategically engaging with COVID-19 discussions (Chen et al., 2020b; Dai et al., 2021; Han et al., 2021), with opinion leaders often positioning themselves to leverage the general public's narratives to influence policy-making processes (Han & Wang, 2015; Luqiu et al., 2019). However, detailed insights on how the three actor groups collectively represented COVID-19 were lacking. Therefore, we need to examine how the three groups’ social representations of this crisis align with or diverge from each other, as asked in the second and third research questions: RQ2: What were the social representations of COVID-19 across the public, opinion leaders, and organizations on Weibo, and how did they change over the different stages of the pandemic in China from 2019 to 2020? RQ3: How did the social representations of COVID-19 from the three groups align with, diverge from, or influence each other?

The unfolding COVID-19 pandemic in Mainland China

Several crisis communication models have been proposed to guide the understanding of different crisis stages, including the three-stage model for crisis (Coombs, 2014) and the crisis and emergency risk communication (CERC) model with five stages (Reynolds & Seeger, 2005). These models in general follow a process of moving from the pre-crisis stage to the crisis stage, and to the post-crisis and resolution stage. They capture the natural trajectory of crisis development with different implications for sociopolitical reactions. Following the CERC model's five stages of pre-crisis, initial event, maintenance, resolution, and evaluation, we analyzed the important events and policy changes during the early pandemic in China (China's State Council Information Office, 2020). Based on that, we identified three distinct stages within our data frame: 1) the pre-crisis stage, 2) the initial and maintenance stage, and 3) the resolution and evaluation stage. The pre-crisis stage (Stage 1) started on December 1, 2019, when the first COVID-19 patient was identified with symptoms (Huang et al., 2020), and lasted until January 22, 2020, the day before the Wuhan lockdown, which escalated the local outbreak to a national crisis. This pre-crisis stage was marked by the highest uncertainty in society. The initial and maintenance stage (Stage 2) ran from January 23 to March 11, 2020. This stage started with the Wuhan lockdown, followed by a series of lockdowns and shutdowns of almost all social activities in cities and villages throughout China. This stage witnessed the greatest disruption to society's normal operation and people's lives. Because of the large-scale strict policies, China's outbreak was quickly contained. The resolution and evaluation stage (Stage 3) ran from March 12 to April 30, 2020 (the endpoint of our observation and data collection). During this stage, the outbreak gradually came under control, and lockdowns and social bans were eventually lifted. Society's operation was mostly back to normal.

Method

Weibo data collection

We collected COVID-19-related posts by leveraging a large Weibo user pool that tracked the posts of 250 million users (Hu et al., 2020). This pool accounted for 48.1% of all monthly active Weibo users in 2019 (Sina Weibo, 2020). Within this population, we further narrowed the pool to 20 million users who were active between December 1, 2019, and April 30, 2020. Active users were defined by two criteria: (1) the numbers of followings, followers, and posts were all greater than 50; and (2) the latest post was published within the 30 days before the data collection date. The cutoff of 50 was selected because accounts with numbers below 50 would be less likely to contribute much data on the COVID-19 discussion within our data collection period; this trimming process made near real-time data collection feasible. The 30-day criterion follows the commonly accepted definition of monthly active users as those who published a message during the last calendar month prior to the relevant date (Lien & Cao, 2014). Following the practices of previous studies (Chen et al., 2019; Chen et al., 2020a; Chen & Zhang, 2021; Shen et al., 2020; Zhang et al., 2020a), we used a comprehensive list of 179 keywords related to COVID-19 (for a complete list, see Supplementary Table S1) to retrieve all COVID-19-related posts published by the 20 million active users. New posts of every user were retrieved within three hours. After removing duplicates, 41,935,863 posts generated by 9,667,743 users were retained for analysis.
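The two activity criteria and the keyword retrieval step can be illustrated with a small sketch. It is not the authors' pipeline: the field names, the `is_active` and `is_covid_related` helpers, and the two placeholder keywords are assumptions made for illustration (the actual 179 keywords are listed in Supplementary Table S1).

```python
from datetime import date, timedelta

# Hypothetical placeholder keywords; the study's 179 keywords are in
# Supplementary Table S1 and are not reproduced here.
KEYWORDS = ["COVID-19", "coronavirus"]


def is_active(user, collection_date, min_count=50, window_days=30):
    """Apply both criteria: counts above 50 and a post within the past 30 days."""
    enough_activity = all(
        user[field] > min_count for field in ("followers", "followings", "posts")
    )
    recent = collection_date - user["last_post"] <= timedelta(days=window_days)
    return enough_activity and recent


def is_covid_related(text, keywords=KEYWORDS):
    """Case-insensitive substring match against the keyword list."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


# Toy user records (all values invented for illustration).
users = [
    {"id": "u1", "followers": 120, "followings": 80, "posts": 300,
     "last_post": date(2020, 1, 20)},
    {"id": "u2", "followers": 40, "followings": 90, "posts": 500,
     "last_post": date(2020, 1, 30)},   # too few followers
    {"id": "u3", "followers": 900, "followings": 60, "posts": 70,
     "last_post": date(2019, 11, 1)},   # last post too old
]
collection_date = date(2020, 2, 1)
active_ids = [u["id"] for u in users if is_active(u, collection_date)]
```

Only `u1` passes both criteria in this toy data. Matching on substrings is a simplification; a production retrieval step would also need tokenization suited to Chinese text.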

Three groups of actors

Following the work of previous studies (Barberá et al., 2019; Chen et al., 2019; Dai et al., 2021; Han et al., 2021; Han & Wang, 2015; Luqiu et al., 2019; Pan et al., 2018), Weibo users in our study were categorized into three groups: the general public, opinion leaders, and organizations. Opinion leaders included those verified individual influential users, who have an orange badge on their homepage. Organizations consisted of verified government-affiliated accounts including different levels and departments of the Chinese government and their officials and verified news, business, and other non-governmental organizations (NGOs). These accounts have a blue badge on their homepage. The general public included the individual users who did not have any verification badge from Weibo. Among these various actors, the majority were the general public (n = 9,091,980, 94.04%), followed by opinion leaders (n = 482,765, 5.0%), and organizations (n = 92,998, 0.96%). Table 1 presents the descriptive information on these groups and their posts’ engagement metrics. Although opinion leaders only accounted for 5% of the total accounts, they generated 7.02% of the total posts, which then generated significantly larger quantities of comments, likes, and reposts, in comparison to the other two groups. Organizational accounts constituted less than 1% of all accounts but generated 3.47% of total posts. They had more followers on average and their posts had more comments, likes, and reposts than the general public.

Table 1

Descriptive statistics of the three Weibo actor groups and engagement metrics of their posts.

	General public	Opinion leaders	Organizations
Account, n (%)	9,091,980 (94.04)	482,765 (5.00)	92,998 (0.96)
Posts in the dataset, n (%)	37,538,758 (89.51)	2,943,815 (7.02)	1,453,290 (3.47)
User's historical posts, mean (sd)	1,755.56 (4,207.52)	5,178.75 (11,716.98)	5,585.91 (14,874.10)
Followers, mean (sd)	1,258.24 (57,740.13)	147,554.96 (1,233,268.11)	243,786 (2,347,561.22)
Followings, mean (sd)	447.71 (492.62)	570.21 (859.31)	442.06 (698.79)
Comments, mean (sd)	0.52 (169.78)	33.31 (3,213.71)	5.42 (592.30)
Likes, mean (sd)	1.5 (632.93)	104.87 (6,643.84)	40.58 (5,116.55)
Reposts, mean (sd)	0.86 (350.50)	53.61 (4,599.64)	2.4 (1,796.31)
Comments, n (%)	19,258,599 (19.02)	75,930,250 (74.94)	6,090,404 (6.01)
Likes, n (%)	55,896,780 (16.41)	239,051,396 (70.19)	45,617,629 (13.40)
Reposts, n (%)	32,257,389 (19.15)	122,207,818 (72.57)	13,940,884 (8.28)


COVID-19 topic coding

To analyze COVID-19 topics from this dataset, we first conducted systematic content coding and developed a coding frame through the following steps. First, pilot content coding using semi-open coding was performed to identify topics. After consulting previous literature regarding pandemic discussions on social media (Glowacki et al., 2016; Seltzer et al., 2017), as well as recent studies on online discourses pertaining to COVID-19 (Pian et al., 2021; Han et al., 2021; Prati et al., 2021), one author and two research assistants developed a preliminary codebook with a set of 22 topics. These included COVID-19 updates, community-level disease control and prevention, personal protective equipment, and government policy. Second, we randomly selected 10,000 posts from the dataset and trained three research assistants to validate the preliminary codebook. Following the rules of open coding (Chen et al., 2019), and after several rounds of back-and-forth discussions, the preliminary set of topics was merged and/or modified until the assistants agreed that the topics were comprehensive. This process resulted in a revised codebook with 12 topics. Third, we randomly sampled 5,000 new posts, and each of the three assistants coded all of the posts independently, applying the revised codebook. The average inter-coder agreement was 0.876, and no new topic was identified. Finally, we slightly revised the names of the 12 topics, and the three coders coded another 5,000 posts independently, with a high average inter-coder reliability of 0.920.
As a result of this coding, 12 topics were identified (see Supplementary Table S2 for topic definition and post examples), including: (1) conspiracy theory: narratives promoting conspiracy ideation; (2) fact-checking and correction: refutation and fact-checking of rumors or misinformation including conspiracies; (3) help seeking: seeking help from others and/or organizations; (4) individual narratives: personal narratives documenting daily experiences and emotions; (5) policies and prevention measures: pandemic control policies and disease and outbreak prevention measures; (6) pseudo-science and misinformation: rumors, pseudoscience, and misinformation about the virus and disease outbreak; (7) reports of others’ symptoms: contents that directly specify other people's symptoms and/or illness experiences; (8) reports of own and immediate family's symptoms: contents that directly report one's own or immediate family members’ symptoms and/or illness experiences; (9) science on COVID-19: information on science and medical discussions regarding COVID-19; (10) social and economic impacts: the broader social and economic impacts of pandemic; (11) updates on the pandemic in China: updates and reports of pandemic situations in China; (12) updates on the pandemic around the world: updates and reports of pandemic situations in other countries.

Machine-learning approach to content analyze all posts

We used supervised machine learning algorithms to identify the appearance of these 12 topics in each collected post, with the following steps. First, six research assistants independently coded the same set of 1,000 randomly selected posts to determine the presence of each of the 12 topics in every post. Intercoder reliability was satisfactory, with Krippendorff's alpha ranging from 0.79 to 0.93 for the 12 topics. Then, the six research assistants coded another randomly selected 80,000 posts (each research assistant independently coded approximately 13,000 posts). Next, with the 80,000 coded posts as the ground truth, five machine learning algorithms were deployed to identify the 12 topics present in the remaining set of approximately 40 million posts. The algorithms were Logistic Regression, Linear Discriminant Analysis, Gradient Boosting, Random Forest, and Extra Trees. An optimal predictive model was selected for each topic, based on the relative predictive performance evaluations (i.e., precision and F1 score) (see Supplementary Table S3). Finally, the 12 optimal models each determined, separately, whether a post contained the corresponding topic.
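The per-topic model selection step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the selection rule (highest F1, with precision as a tie-breaker) is an assumption, since the text only says that precision and F1 were used, and the held-out labels and candidate predictions are invented toy values.

```python
def precision_f1(y_true, y_pred):
    """Return (precision, F1) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, f1


def select_best_model(y_true, predictions):
    """Pick the candidate with the highest F1; precision breaks ties (assumed rule)."""
    scores = {name: precision_f1(y_true, pred) for name, pred in predictions.items()}
    best = max(scores, key=lambda name: (scores[name][1], scores[name][0]))
    return best, scores


# Toy held-out labels for ONE topic and two candidate models' predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = {
    "logistic_regression": [1, 0, 1, 0, 0, 0, 1, 0],
    "random_forest": [1, 1, 1, 1, 0, 0, 1, 0],
}
best_model, scores = select_best_model(y_true, predictions)
```

Repeating this selection once per topic yields 12 independent binary classifiers, which matches the description of a post being allowed to carry more than one topic.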

Topic network analysis

Data from the topic coding were transformed into matrices to reflect any associations or network ties between the twelve topics. Each matrix contained 12 rows and 12 columns, with the entry in each cell representing the frequency of co-appearance of two topics. This approach was repeated to create matrices to represent the topic networks for the three different actor groups and across the three pandemic stages. The posts during Stage 1, Stage 2, and Stage 3 were separated into three sub-datasets within each actor group. For each group, three matrices of topic networks were then created, resulting in a total of nine matrices (3 actor groups × 3 stages) (see Supplementary Table S4 for an example). To address RQ1, we employed network visualization of the topic networks to illustrate the social representations of COVID-19. Then, we calculated the normalized degree centrality for each topic at the three pandemic stages. This allowed us to identify how the overall representations changed over the three different stages. Here, the degree centrality refers to a given topic's number of ties with other topics. The normalized degree centrality standardizes the degree centrality by their weights in the network (Cheng & Chan, 2015; Guo & McCombs, 2011). To address RQ2, we analyzed the topic networks for each of the actor groups. For RQ3, we first used the quadratic assignment procedure (QAP), which is commonly used in social network analysis to examine correlations between two networks (Simpson, 2001). We examined the QAP correlations among three actor groups’ topic networks. We employed a multiple regression quadratic assignment procedure (MRQAP) to examine how each group's topic network influenced the other group's topic network over different pandemic stages while controlling for the other group's previous stage's network. 
MRQAP allows researchers to evaluate the unique effect of one independent network on the dependent network while partialling out the effects of other networks (Dekker et al., 2007).
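The construction of a topic co-occurrence matrix and the centrality computation can be sketched as follows. The normalization shown (weighted degree divided by the maximum possible weighted degree, as a percentage) is an assumption about how "normalized degree centrality" was computed; the text does not give the formula. The four topic labels and the toy posts are placeholders.

```python
from itertools import combinations

# Toy subset of topic labels (placeholders; the study used 12 topics).
TOPICS = ["symptoms_others", "symptoms_own", "policies", "narratives"]


def cooccurrence_matrix(posts, topics):
    """Count, for every topic pair, the posts in which both topics appear."""
    index = {topic: i for i, topic in enumerate(topics)}
    size = len(topics)
    matrix = [[0] * size for _ in range(size)]
    for labels in posts:
        for a, b in combinations(sorted(set(labels), key=index.get), 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1  # undirected tie, so keep the matrix symmetric
    return matrix


def normalized_degree(matrix):
    """Weighted degree as a percentage of its maximum possible value.

    Assumed normalization: row sum / ((n - 1) * largest tie weight) * 100.
    """
    size = len(matrix)
    max_weight = max(max(row) for row in matrix)
    if max_weight == 0:
        return [0.0] * size
    return [100.0 * sum(row) / ((size - 1) * max_weight) for row in matrix]


# Each post is the set of topics the classifiers assigned to it.
posts = [
    {"symptoms_others", "policies"},
    {"symptoms_others", "symptoms_own"},
    {"symptoms_others", "policies", "narratives"},
    {"narratives"},  # single-topic posts add no ties
]
matrix = cooccurrence_matrix(posts, TOPICS)
centrality = normalized_degree(matrix)
```

Building one such matrix per actor group and stage would give the nine matrices described above.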

Results

COVID-19 social representations over three pandemic stages

Fig. 1 presents the appearance of the 12 topics in all posts across the three stages, with Panel A depicting the three topics that monotonically increased over stages, Panel B depicting the four topics that decreased, and Panel C depicting the five topics that fluctuated over time. Overall, reports of others’ symptoms dominated the posts, appearing in 71.75% of posts in Stage 1, 66.31% in Stage 2, and 72.83% in Stage 3. The second most prominent topic was reports of own and immediate family's symptoms, which appeared in 24.18%, 39.61%, and 37.77% of posts across the three stages. It is worth noting that even though pseudo-science and misinformation decreased from Stage 1 (20.05%) to Stage 3 (7.49%), conspiracy theory, which is a specific form of misinformation, exhibited a sharp increase from Stage 1 (2.42%) to Stage 3 (9.48%). This corresponds to recently published research that revealed an increase in conspiracy posts on Weibo from January to April 2020 (Chen et al., 2020).

Fig. 1

Frequency of 12 COVID-19 topics identified from Weibo over the three pandemic stages.

Note. Panel A includes the three topics that monotonically increased over stages. Panel B includes the four topics that decreased. Panel C includes the five topics that fluctuated over time.

Topic networks across the three stages are depicted in Fig. 2. We expected that the normalized degree centrality scores for the 12 topics would change over time, as shown in Table 2 (as a robustness check, we analyzed eigenvector centrality and the results were consistent, as shown in Supplementary Table S5). Consistent with the patterns in Fig. 1, reports of others’ symptoms, reports of own and immediate family's symptoms, policies and prevention measures, individual narratives, and help seeking were the most central topics in the networks. More importantly, three topics had increasing centrality from Stage 1 to Stage 3: conspiracy theory, social and economic impacts, and reports of own and immediate family's symptoms. By contrast, the topics with decreasing centrality scores were science on COVID-19, pseudo-science and misinformation, fact-checking and correction, updates on the pandemic around the world, and policies and prevention measures.

Fig. 2

COVID-19 topic networks identified from Weibo across the three pandemic stages.

Table 2

Normalized degree centrality for the 12 COVID-19 topics across three pandemic stages.

Topics	Stage 1, % (ranking)	Stage 2, % (ranking)	Stage 3, % (ranking)
Topics with increasing centrality
Reports of own and immediate family's symptoms	18.88 (5)	21.53 (2)	22.33 (2)
Social and economic impacts	3.95 (10)	6.69 (7)	13.05 (5)
Conspiracy theory	1.79 (12)	2.34 (11)	5.31 (8)
Topics with decreasing centrality
Policies and prevention measures	22.98 (2)	15.87 (5)	12.56 (6)
Pseudo-science and misinformation	13.52 (6)	7.29 (6)	5.78 (7)
Science on COVID-19	7.69 (7)	4.04 (10)	3.83 (10)
Fact-checking and correction	7.31 (8)	4.72 (8)	3.69 (11)
Updates on the pandemic around the world	2.62 (11)	1.73 (12)	3.42 (12)
Topics without changing centrality
Reports of others’ symptoms	47.18 (1)	40.83 (1)	39.71 (1)
Individual narratives	20.32 (3)	21.20 (3)	19.61 (3)
Help seeking	19.83 (4)	20.26 (4)	13.57 (4)
Updates on the pandemic in China	6.98 (9)	4.48 (9)	4.07 (9)

Note. % refers to the normalized degree centrality of each COVID-19 topic in the pandemic stage. R refers to the rank order of the corresponding centrality in the pandemic stage. ↑ indicates that the topic rank order in a stage increased compared to the previous stage, while ↓ indicates that the topic rank order decreased compared to the previous stage.

Note (Fig. 2). Each node represents a topic, while an edge connecting two nodes means that the two topics appear together in a post. The thickness of the edges represents the co-occurrence frequency of the two topics, with thicker edges representing a greater chance of co-occurrence between the two topics. The size of the node indicates the topic's degree centrality in the network.

COVID-19 social representations across the three actor groups

To answer RQ2, QAP was applied to measure intra-stage correlations between the topic networks of different actor groups. Fig. 3 indicates that the intra-stage correlation coefficients between the public and opinion leaders were consistently the strongest across all three stages (Stage 1: r = .95, p < .001; Stage 2: r = .98, p < .001; Stage 3: r = .98, p < .01). By comparison, correlations between opinion leaders and organizations (Stage 1: r = .83, p < .001; Stage 2: r = .73, p < .01; Stage 3: r = .62, p < .01) and between the public and organizations (Stage 1: r = .86, p < .001; Stage 2: r = .71, p < .01; Stage 3: r = .59, p < .01) were weaker (see Supplementary Table S6 for details of correlations and CIs). The results also revealed that topic network associations between the public and opinion leaders grew stronger between Stage 1 and Stage 3, whereas the associations grew steadily weaker for both the opinion leaders-organizations pair and the general public-organizations pair. This suggests that the association of social representations between the public and opinion leaders is much stronger than the association between the public and organizations or between opinion leaders and organizations.
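The QAP correlation described above can be illustrated with a minimal sketch. It assumes each topic network is stored as a square topic-by-topic co-occurrence matrix; the function name `qap_correlation`, the toy matrices, and the number of permutations are hypothetical and are not taken from the study's software or data.

```python
import numpy as np

def qap_correlation(a, b, n_perm=2000, seed=0):
    """Pearson correlation between the off-diagonal cells of two square
    matrices, with a permutation p-value obtained by relabelling the
    nodes (rows and columns together) of the second matrix."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.shape[0]
    off_diag = ~np.eye(n, dtype=bool)            # ignore self-ties
    observed = np.corrcoef(a[off_diag], b[off_diag])[0, 1]

    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        b_perm = b[np.ix_(perm, perm)]           # same shuffle for rows and columns
        r_perm = np.corrcoef(a[off_diag], b_perm[off_diag])[0, 1]
        if abs(r_perm) >= abs(observed):
            hits += 1
    p_value = (hits + 1) / (n_perm + 1)          # two-sided, with +1 correction
    return observed, p_value

# Toy data (hypothetical): two symmetric 12 x 12 co-occurrence matrices.
rng = np.random.default_rng(42)
base = rng.integers(0, 100, size=(12, 12))
public = base + base.T
noise = rng.integers(0, 10, size=(12, 12))
leaders = public + noise + noise.T

r, p = qap_correlation(public, leaders)
print(f"QAP r = {r:.3f}, permutation p = {p:.4f}")
```

Permuting rows and columns with the same shuffle keeps each network's structure intact while breaking the correspondence between topics, which is why QAP is preferred over an ordinary significance test for dyadic data.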

Fig. 3

Correlations of topic networks among three different actor groups of Weibo users across the three pandemic stages.

Note. All coefficients displayed are statistically significant. Following previous practices (Cho et al., 2005), since the sampling distribution of Pearson r was not normally distributed, r was converted to Fisher's z according to the r to z transformation formula $z = 0.5 \ln[(1 + r)/(1 - r)]$ for computing the confidence intervals of the given correlation values. The values of Fisher's z in the confidence interval were then converted back to Pearson's r using the equation $r = (e^{2z} - 1)/(e^{2z} + 1)$. If the confidence intervals of the different correlation values overlapped, there was no significant difference between them.
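The confidence-interval procedure in this note can be sketched as follows. The standard error 1/sqrt(n - 3) is the usual one for Fisher's z; the sample size n = 66 (the number of unordered pairs among 12 topics) is an assumption made only for illustration, since the note does not state the n that was used. The function names are hypothetical.

```python
import math

def r_to_z(r):
    # Fisher r-to-z transformation: z = 0.5 * ln((1 + r) / (1 - r))
    return 0.5 * math.log((1 + r) / (1 - r))

def z_to_r(z):
    # Back-transformation: r = (e^(2z) - 1) / (e^(2z) + 1)
    return (math.exp(2 * z) - 1) / (math.exp(2 * z) + 1)

def fisher_ci(r, n, z_crit=1.959963984540054):
    """Approximate 95% confidence interval for a Pearson correlation."""
    z = r_to_z(r)
    se = 1 / math.sqrt(n - 3)                    # standard error of Fisher's z
    return z_to_r(z - z_crit * se), z_to_r(z + z_crit * se)

def intervals_overlap(ci_a, ci_b):
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Assumed n: 66 unordered topic pairs (12 * 11 / 2). Not stated in the source.
N_PAIRS = 66
ci_public_leaders = fisher_ci(0.98, N_PAIRS)     # Stage 3, public vs. opinion leaders
ci_public_orgs = fisher_ci(0.59, N_PAIRS)        # Stage 3, public vs. organizations
print(ci_public_leaders, ci_public_orgs)
print("overlap:", intervals_overlap(ci_public_leaders, ci_public_orgs))
```

Working on the z scale keeps the interval symmetric there and guarantees that the back-transformed bounds stay inside (-1, 1), which matters for correlations as close to 1 as those reported here.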

Fig. 4 visually illustrates how the three actor groups represented different topics to characterize the COVID-19 discussions. Each node represents a topic, while an edge connecting two nodes means that the two topics appear together in a post. The thickness of the edges represents the co-occurrence frequency of the two topics, with thicker edges representing a greater chance of co-occurrence between the two topics. As shown in Fig. 4, the structures of topic networks among the public (4A) and those among opinion leaders (4B) were very similar, indicating that their topic networks were highly correlated across all stages. However, the topic networks among organizations (4C) shared less similarity with both. We also analyzed each topic's normalized degree centrality and their ranks in the networks for the general public, opinion leaders, and organizations across the three stages (see Supplementary Table S7).
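As an illustration of how such a topic network could be built, the sketch below counts how often two topics are assigned to the same post and then computes a degree centrality per topic. The topic labels and posts are hypothetical. The normalization (weighted degree divided by the number of other nodes times the largest edge weight) is one common choice and is an assumption here, as the exact formula used in the study is not given in this section.

```python
from itertools import combinations
import numpy as np

def cooccurrence_matrix(posts, topics):
    """Symmetric matrix whose (i, j) cell counts posts labelled with both topics."""
    index = {t: i for i, t in enumerate(topics)}
    w = np.zeros((len(topics), len(topics)))
    for labels in posts:
        for a, b in combinations(sorted(labels), 2):
            w[index[a], index[b]] += 1
            w[index[b], index[a]] += 1
    return w

def normalized_degree(w):
    """Weighted degree scaled by (n - 1) * max edge weight (assumed normalization).
    Requires at least one non-zero edge."""
    n = w.shape[0]
    return w.sum(axis=1) / ((n - 1) * w.max())

# Hypothetical multi-label posts: each post is the set of topics assigned to it.
topics = ["help_seeking", "narratives", "policy", "symptoms_others"]
posts = [
    {"symptoms_others", "help_seeking"},
    {"symptoms_others", "narratives"},
    {"symptoms_others", "help_seeking", "narratives"},
    {"policy"},
]

w = cooccurrence_matrix(posts, topics)
centrality = dict(zip(topics, normalized_degree(w)))
for topic, value in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{topic}: {value:.2f}")
```

In this toy network the topic that co-occurs with the most other topics receives the highest centrality, which mirrors how node size is read in Fig. 4.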

Fig. 4

COVID-19 topic networks identified from Weibo across the three Weibo actor groups.

Note. Each node represents a topic, while an edge connecting two nodes means that the two topics appear together in a post. The thickness of the edges represents the co-occurrence frequency of the two topics, with thicker edges representing a greater chance of co-occurrence between the two topics. The size of the node indicates the topic's degree of centrality in the network.


Influences among the three actor groups on COVID-19 social representations

To answer RQ3, MRQAP was used to test the influence of one actor group on another regarding COVID-19 social representations across the three stages of the pandemic. Six regressions were conducted such that the topic network of a given actor group at a given stage was regressed on the three actor groups’ networks in the previous stage. Table 3 summarizes the MRQAP results. First, focusing on who influenced the topic networks of the general public, we found that organizations’ topic network in Stage 1 negatively predicted the public's network in Stage 2 (β = −0.40, p < 0.001), suggesting the general public was diverging from the organizations’ social representations. From Stage 2 to Stage 3, a non-significant relation between organizations and the general public was observed (β = −0.01, p > 0.05). Further, the topic network of opinion leaders in Stage 1 had a non-significant influence on the public in Stage 2 (β = −0.07, p > 0.05). Second, when focusing on who influenced opinion leaders, we found that the topic network of the public in Stage 1 had a large significant influence on the topic network of opinion leaders in Stage 2 (β = 0.72, p < 0.001), while organizations had a significant negative influence (β = −0.39, p < 0.01). This suggests that the general public played a role in leading the opinion leaders’ topic networks and that opinion leaders, like the public, diverged from organizations. Finally, when looking at the organizations, we found that no other actor group had any significant relation with organizations.
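The lagged regression design described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: it uses the simple Y-permutation variant of MRQAP (the dependent network's nodes are shuffled), whereas published analyses often use more refined schemes such as double semi-partialing. The function names and the simulated matrices are hypothetical.

```python
import numpy as np

def _offdiag(m):
    """Flatten a square matrix to its off-diagonal cells."""
    m = np.asarray(m, dtype=float)
    return m[~np.eye(m.shape[0], dtype=bool)]

def _std_betas(y, xs):
    """Standardized OLS coefficients (all variables z-scored, so no intercept)."""
    zscore = lambda v: (v - v.mean()) / v.std()
    design = np.column_stack([zscore(x) for x in xs])
    coef, *_ = np.linalg.lstsq(design, zscore(y), rcond=None)
    return coef

def mrqap(y, predictors, n_perm=1000, seed=0):
    """Regress network y on predictor networks; p-values from Y-permutation."""
    y = np.asarray(y, dtype=float)
    xs = [_offdiag(x) for x in predictors]
    observed = _std_betas(_offdiag(y), xs)
    rng = np.random.default_rng(seed)
    exceed = np.zeros(len(xs))
    for _ in range(n_perm):
        perm = rng.permutation(y.shape[0])
        y_perm = y[np.ix_(perm, perm)]           # relabel nodes of the outcome network
        exceed += np.abs(_std_betas(_offdiag(y_perm), xs)) >= np.abs(observed)
    return observed, (exceed + 1) / (n_perm + 1)

def _random_symmetric(rng, n=12):
    m = rng.integers(0, 50, size=(n, n)).astype(float)
    return m + m.T

# Simulated Stage 1 networks (hypothetical) and a Stage 2 public network that
# depends positively on the public and negatively on organizations.
rng = np.random.default_rng(7)
public_s1, leaders_s1, orgs_s1 = (_random_symmetric(rng) for _ in range(3))
noise = rng.normal(0, 1, size=(12, 12))
public_s2 = 2 * public_s1 - orgs_s1 + noise + noise.T

betas, p_values = mrqap(public_s2, [public_s1, leaders_s1, orgs_s1])
for name, b, p in zip(["public", "opinion leaders", "organizations"], betas, p_values):
    print(f"Stage 1 {name}: beta = {b:.2f}, p = {p:.3f}")
```

Each of the six models in Table 3 corresponds to one such call, with a later-stage network as the outcome and the three prior-stage networks as predictors.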

Table 3

Coefficients from MRQAP analyses on the topic networks of the three actor groups over the three pandemic stages.

Dependent variable	Independent variable	B	β	Adjusted R²
Stage 2 topic network of the general public (Model 1)	Stage 1 - General public	16.91***	1.35***	0.93***
	Stage 1 - Opinion leaders	-15.62	-0.07
	Stage 1 - Organizations	-949.86***	-0.40***
Stage 3 topic network of the general public (Model 2)	Stage 2 - General public	0.44**	0.72**	0.90***
	Stage 2 - Opinion leaders	2.65	0.23
	Stage 2 - Organizations	-0.17	-0.01
Stage 2 topic network of opinion leaders (Model 3)	Stage 1 - General public	0.48***	0.72***	0.92***
	Stage 1 - Opinion leaders	6.44**	0.56**
	Stage 1 - Organizations	-48.89**	-0.39**
Stage 3 topic network of opinion leaders (Model 4)	Stage 2 - General public	0.01	0.12	0.88***
	Stage 2 - Opinion leaders	0.74*	0.80*
	Stage 2 - Organizations	0.09	0.02
Stage 2 topic network of organizations (Model 5)	Stage 1 - General public	0.01	0.01	0.93***
	Stage 1 - Opinion leaders	0.65	0.24
	Stage 1 - Organizations	18.28**	0.61**
Stage 3 topic network of organizations (Model 6)	Stage 2 - General public	-0.01	-0.19	0.97***
	Stage 2 - Opinion leaders	-0.02	-0.04
	Stage 2 - Organizations	2.43***	1.13***

Note. * p < 0.05, ** p < 0.01, *** p < 0.001. MRQAP = multiple regression quadratic assignment procedure. The first column lists the dependent variable, the topic network of an actor group at a later stage, and the second column lists the independent variables, the topic network of an actor group at a prior stage.


Discussion

Representations of COVID-19 expanded from health crisis to socio-political crisis

Based on the network representations of the 12 topics, we discuss the following patterns of emerging social representations of COVID-19. First, in Stage 1, with a lack of both scientific and public health understanding of the coronavirus, social media posts related to COVID-19 centered on describing and reporting symptoms and illness experiences. Hence, the social representation was primarily constructed through clinical and epidemiological lenses based on people's first-hand experiences (Han et al., 2020; Liu et al., 2020; Zhao et al., 2020). Given that COVID-19 was unknown, the increased focus on documenting symptoms and illnesses, coupled with help-seeking and the expression of personal experiences, might have heightened fear of the unknown and increased anxiety about not knowing what to expect. It is also important to point out that the reports of others’ symptoms were double the reports of one's own and immediate family's symptoms. The differential reporting of symptoms from others versus from one's own or immediate family could have been caused by inherent social stigmas attached to infectious diseases (Abdelhafiz & Alorabi, 2020; Bhanot et al., 2021; Islam et al., 2021) and by the collective social action of publicly identifying the ‘infected others’. Further, this public identification could be motivated by the desire to seek help for others (which was a related topic in our analysis) or to protect the self and community by calling for social isolation. Through the lens of SRT, these observations reflect social representations’ functions in terms of cognitive coping, social sharing of emotional experiences, and community empowerment. Stage 2 witnessed the most stringent pandemic controls. Although the predominant topics in Stage 1 still held central positions in Stage 2, we observed an increasing prominence of reports of own and immediate family's symptoms, social and economic impacts, and conspiracy theory. 
Specifically, the increase in conspiracy theory not only echoed the increasing politicization of science (Bolsen & Palm, 2021; Chu et al., 2021; Hart et al., 2020) but was also driven by heightened China-United States tensions during the pandemic (Chen et al., 2020a). We found that most conspiracy posts were posted by the general public (N = 2,114,753, 92.89%), followed by opinion leaders (N = 125,723, 5.52%) and the organizational accounts (N = 36,263, 1.59%). Within each group, conspiracy posts made up 5.63% of the public's posts, 4.27% of opinion leaders’ posts, and only 2.50% of organizations’ posts. While certain conspiracy arguments were promoted by governments and media (Thacker, 2021), our data indicate that the main force sharing conspiracy posts on social media was still public users, and that within every actor group, conspiracy posts made up less than 6% of the group's posts (see Supplementary Fig. 1 for temporal distribution of conspiracy posts). Stage 3 was similar to Stage 2 in terms of topic networks, and the topics arising in Stage 2 continued increasing their centralities in Stage 3. As the virus spread and infected more people, the social representations of COVID-19 became more concerned with the pandemic's impacts on one's own life and expanded to broader socio-economic and political domains. As SRT suggests, these shifts in social representations function to construct explanations of what had happened and what could happen next, for collective coping and action. The underlying sociopolitical infrastructures, including both domestic social tensions and international geopolitical tensions exacerbated during the pandemic, may have contributed to the findings here. In contrast, the science of COVID-19 and its related topics (Caulfield et al., 2021; O'Connor et al., 2021; Scheufele et al., 2021; Schück et al., 2021), such as pseudo-science and misinformation or fact-checking and correction, were not central to COVID-19 representations. 
During the analysis period, the topic of science on COVID-19 constituted around 10% of the total data (Stage 1: 11.32%, Stage 2: 6.34%, Stage 3: 6.78%). In contrast, pseudo-science and misinformation started with a much higher percentage of 20.05% and went down to 7.49%. Relatedly, conspiracy theory started at 2.42% and increased to 9.48%. From a science and crisis communication perspective, this observation is worrisome and suggests the scientific representations of COVID-19 were unclear and contentious on social media, which could have downstream negative impacts on public understanding of the evolving science related to COVID-19. However, we also want to point out that these topics did not take on central positions in the representation networks. Scientific discourses on contagious diseases are not purely biomedical and clinical but involve a great degree of social, economic, and cultural factors. As such, the boundary between the reified scientific and the consensual universes discussed in SRT (Moscovici & Hewstone, 1983) was not clear from the beginning and could become less and less clear unless a strong force comes in to restructure the distinction and prioritize the biomedical and public health sciences of the pandemic. Social groups are likely to develop their own interpretations of unfamiliar or threatening phenomena via communication. This may explain why, not just in China but also on a global scale, misinformation and conspiracies are generated in real time or even before scientific information is communicated. To some extent, the nature of social media in terms of decentralized channels and metric-based rewarding mechanisms may have paved the way for multiple universes that are simultaneously constructed without a clearly articulated ‘objective reality’ for all.

Different representations from the public, opinion leaders, and organizations

Opinion leaders held the most discursive power on Weibo given their follower bases and the magnitude of post engagement. In combination with our findings showing that the opinion leaders’ and the public's representations were aligned more closely, we think this suggests a strong role of opinion leaders in magnifying the public's social representations. Ordinary users who constitute the public were not strategically connected or organized on social media. However, when a social crisis happened, an enormous amount of shared experience created a common ground for discussing and constructing the representations. In contrast, the social representation processes of organizational entities (including government and media) might be reactive and relatively weak. The misalignment of COVID-19 topic networks across the groups reflects the dispersion in social representation processes when crises emerge, whereas the observed assimilation and influences on topic networks might reflect some focalization of COVID-19 representations in different groups. The results demonstrated a tight coupling between the public and opinion leaders and a clear dissimilarity between opinion leaders and organizations, and between the public and organizations, during the pandemic. Organizational actors accounted for only 1% of all users, and their COVID-19 representations differed from the other two groups across the pandemic stages. This observation indicates that in representing COVID-19, the organizational actors created and maintained a focused agenda on discussing certain aspects of the pandemic, such as policies and socioeconomic impacts. Given that these organizational actors included government agencies, media, business sectors, and NGOs (which are all governed and influenced by the central government to different extents), it is expected that their social media posts did not engage significantly with personal disclosures or narratives. 
However, the finding that organizational actors negatively predicted the public and opinion leaders from Stage 1 to Stage 2 suggests that the two types of individual actors strategically diverged from the organizational actors’ agenda. This tension has been discussed in previous research examining how China handled previous pandemics (such as SARS and H1N1). Ineffective risk communication from the government was explained by a strategic approach that prioritized social and political strategies over transparency in informing the public about the outbreak, which eroded public trust in risk communication from the government (Wishnick, 2010; Zhang et al., 2020b). We operationalized opinion leaders by leveraging Weibo's V badges. V-users could be viewed as de facto platform-assigned opinion leaders in the Chinese social media sphere because their badges signal influencer status. Prior research has indicated that opinion leaders would influence the public or gain public trust because they acted as information aggregators (or sharers) (Luqiu et al., 2019), or sometimes emerged as whistleblowers (or dissidents) who resisted the government's agenda (Zhang et al., 2020b). Interestingly, we found opinion leaders did not predict the public's representation. On the contrary, the public's representation predicted opinion leaders’ representation from Stage 1 to Stage 2. This suggests that for unprecedented emerging health crises, the roles of opinion leaders can be very limited (particularly during the early stages) because there is not much information available to share or discuss. The crucial set of information emanates from the people who directly experienced the symptoms. Accordingly, it could be the case that opinion leaders closely followed what the public was expressing and used their roles to magnify the public's voices. 
This observation was also supported by a supplementary analysis in which we checked the extent to which opinion leaders cited 1,358 official media accounts (including China Daily, Xinhua Net, CCTV, etc.) (Cyberspace Administration of China, 2021; Shen, 2021) in their posts, which can indicate whether they followed official discourses. We found significant differences in news media citation across the three groups. As shown in Supplementary Table S8 and Table S9, the citation patterns of the public and opinion leaders were similar, whereas organizations were much more likely to cite news media. This indicates that opinion leaders did not rely on official discourses more than the public did. Organizations (which already included news media), as expected, were more likely to rely on and amplify media voices and official discourses about COVID-19. Organizations were more likely to cite news media across all 12 topics. Specifically, the percentages of citing news media were largest for topics of conspiracy theory and updates on the pandemic in China and around the world. This suggests again that organizations focused more on the broader sociopolitical and global aspects of the pandemic, whereas the public and opinion leaders focused on more personally and locally relevant issues that oftentimes reflect healthcare capacities and pandemic control policies. The different patterns of constructing COVID-19 representations and promoting official media discourses shed light on the power dynamics across the three groups, suggesting organizations on social media did not obtain more power than other entities in shaping the discourses. However, more research that delves into different types of organizations’ roles is needed. Within the category of organizations, official media accounts would differ from non-government organizations in their relations with the public and opinion leaders. 
As SRT explains, such differences are generated by various socio-structural factors, and our findings may be partly explained by the structural and communicative factors conditioned by social media. The understanding of social representation processes thus can be enriched by examining how various groups gradually influence each other on social media. Beyond looking at the topic networks, future research can also construct the user networks and apply network theories such as structural hole theory to unveil and discuss the actual communication structures across the different user groups.

Implications for global recovery

A few major observations from our findings have important implications for global pandemic recovery. First, monitoring social media reports of one's own and others’ symptoms and illness experiences needs to be incorporated into both local and global pandemic surveillance systems. As people first come to understand and relate to the pandemic through clinical symptoms, public health efforts need to respond to people's reports with scientific explanations, actionable prevention suggestions, and medical resource preparation. Second, scientific discussions on COVID-19 need to take a more prominent place on social media to support science- and evidence-based pandemic responses. Otherwise, the rise of misinformation and politically incentivized conspiracies can have a detrimental impact on international collaboration for global recovery. Third, while the public and opinion leaders can leverage social media to organize community help, local and global organizations need to respond to the public's interests and work with the public and opinion leaders to develop response strategies that reflect all sectors’ interests. Ideally, these efforts would amplify scientific and evidence-based strategies for better disease control, resilience building, and pandemic recovery.

Limitations

Several limitations of the study should be noted when interpreting the data and results. First, we only investigated one social media platform, which may not represent other dynamics of social representation processes happening in other online or offline settings. Second, while we used supervised machine learning methods to categorize the topic landscape of COVID-19 discussions, we did not involve other dimensions of the data, such as sentiments, specific emotions, or metaphors, which were analyzed in prior research (Dai et al., 2021; Semino, 2021). Future research could extend computational approaches and employ qualitative and critical analyses to delve deeper into the representations of COVID-19 on other linguistic and rhetorical dimensions. For instance, beyond analyzing different groups’ topic networks, more qualitative comparisons of their respective framings of certain topics are worthy of future research. Third, we adopted Weibo's basic categorization of account types to delineate the public, opinion leaders, and organizations. Although this approach could be readily replicated in future research, we were not able to discuss the different subtypes of users in detail. For instance, scientists, medical professionals, government agencies at different levels (central health government agency vs. local police department), and non-governmental organizations have different priorities and perspectives on COVID-19 responses. Within these subgroups, more nuanced social representations, such as social and scientific bases for different pandemic policy recommendations, can be further examined. Fourth, our data frame and the distinction between the three stages made it hard to account for other exogenous factors (e.g., specific events or historical trends) that could have influenced the correlations among the three groups’ topic networks. Future research can expand the data frame to include more time periods to account for other social and structural factors’ influences. 
Finally, our data only focused on the first six months of the COVID-19 pandemic, and we did not know how much data was later censored on the platform. Retrospectively, we randomly sampled 300,000 posts from the dataset and revisited the post URLs in April 2022. We found that 1.29% of the sampled posts had since been deleted. Future research can examine the censorship patterns across topics and user groups to discuss how social representations can be influenced by sociopolitical or platform interferences.

Conclusions

Documenting the emergent social representations in public perceptions is necessary to enable a critical reflection of pandemic responses and to provide guidance for local and global pandemic recovery. Our analyses of 40 million Chinese Weibo posts highlight the expansion in social representations of COVID-19 from a clinical and epidemiological perspective to a broader perspective that integrates personal infection and illness experiences with economic and sociopolitical discourses. From a pandemic response perspective, our findings suggest that when coping with unprecedented public health crises, governments, organizational actors, and opinion leaders should listen and respond to the public's initial reports and perspectives from the beginning. Further, they should leverage social media to construct and communicate more effective representations of COVID-19, hopefully emphasizing science communication and paving ways for science and evidence-based policies.

CRediT authorship contribution statement

Anfan Chen: Data curation, Methodology, Formal analysis, Writing – original draft, Visualization, Writing – review & editing, Funding acquisition, Project administration. Jingwen Zhang: Conceptualization, Supervision, Writing – original draft, Writing – review & editing, Project administration. Wang Liao: Conceptualization, Methodology, Writing – review & editing. Chen Luo: Methodology, Formal analysis, Writing – review & editing. Cuihua Shen: Conceptualization, Methodology, Writing – review & editing. Bo Feng: Conceptualization, Writing – review & editing.

Declaration of Competing Interest

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

36 in total

1. Crisis and emergency risk communication as an integrative model.

Authors: Barbara Reynolds; Matthew W Seeger
Journal: J Health Commun Date: 2005 Jan-Feb

2. Dynamic social representations of the 2009 H1N1 pandemic: Shifting patterns of sense-making and blame.

Authors: Eric Mayor; Véronique Eicher; Adrian Bangerter; Ingrid Gilles; Alain Clémence; Eva G T Green
Journal: Public Underst Sci Date: 2012-05-22

3. Misinformation about science in the public sphere.

Authors: Dietram A Scheufele; Andrew J Hoffman; Liz Neeley; Czerne M Reid
Journal: Proc Natl Acad Sci U S A Date: 2021-04-13 Impact factor: 11.205

4. Identifying the public's concerns and the Centers for Disease Control and Prevention's reactions during a health crisis: An analysis of a Zika live Twitter chat.

Authors: Elizabeth M Glowacki; Allison J Lazard; Gary B Wilcox; Michael Mackert; Jay M Bernhardt
Journal: Am J Infect Control Date: 2016-08-17 Impact factor: 2.918

5. Sensitivity of MRQAP Tests to Collinearity and Autocorrelation Conditions.

Authors: David Dekker; David Krackhardt; Tom A B Snijders
Journal: Psychometrika Date: 2007-08-07 Impact factor: 2.500

6. Health Communication Through News Media During the Early Stage of the COVID-19 Outbreak in China: Digital Topic Modeling Approach.

Authors: Zequan Zheng; Jiabin Zheng; Qian Liu; Qiuyi Chen; Guan Liu; Sihan Chen; Bojia Chu; Hongyu Zhu; Babatunde Akinwunmi; Jian Huang; Casper J P Zhang; Wai-Kit Ming
Journal: J Med Internet Res Date: 2020-04-28 Impact factor: 5.428

7. Social Network Analysis of COVID-19 Public Discourse on Twitter: Implications for Risk Communication.

Authors: Paola Pascual-Ferrá; Neil Alperstein; Daniel J Barnett
Journal: Disaster Med Public Health Prep Date: 2020-09-10 Impact factor: 5.556

8. Representations of SARS in the British newspapers.

Authors: Peter Washer
Journal: Soc Sci Med Date: 2004-12 Impact factor: 4.634

9. Unpacking the black box: How to promote citizen engagement through government social media during the COVID-19 crisis.

Authors: Qiang Chen; Chen Min; Wei Zhang; Ge Wang; Xiaoyue Ma; Richard Evans
Journal: Comput Human Behav Date: 2020-04-12