
An entropy-based method to control COVID-19 rumors in online social networks using opinion leaders.

Abstract

In the ongoing COVID-19 pandemic, people spread various COVID-19-related rumors and hoaxes through online social networks (OSNs), with negative effects on society. This research proposes an approach to controlling COVID-19 rumors through the influence of opinion leaders (OLs) in OSNs. The process is divided into two phases. The first phase describes the novel Reputation-based Opinion Leader Identification (ROLI) algorithm, including a voting method to identify the top-T OLs in the OSN. The second phase describes the technique used to measure the aggregated polarity score of each posted tweet/post and to compute each user's reputation. The empirical reputation is used to calculate the user's trust, the post's entropy, and its veracity. If the entropy of a post is lower than the empirical threshold value, the post is likely to be categorized as a rumor. The proposed approach was validated on the Twitter, Instagram, and Reddit social networks. The ROLI algorithm achieves 91% accuracy, 93% precision, 95% recall, and a 94% F1-score, outperforming other Social Network Analysis (SNA) measures for finding OLs in OSNs. Moreover, the effectiveness and efficiency of the approach in controlling rumors are estimated with three standard metrics, affected degree, represser degree, and diffuser degree, with improvements of 26%, 22%, and 23%, respectively. The results illustrate that the influence of OLs is highly significant in controlling COVID-19 rumors.


Keywords: COVID-19 rumors; Entropy; Health misinformation; Online social network; Opinion leader; Reputation

Year: 2022 PMID: 35765463 PMCID: PMC9222031 DOI: 10.1016/j.techsoc.2022.102048

Source DB: PubMed Journal: Technol Soc ISSN: 0160-791X

Introduction

In recent times, the COVID-19 pandemic has spread at an alarming rate. The disease is considered a global crisis from both the health and the economic point of view. On February 15, 2020, World Health Organization (WHO) Director-General Tedros Adhanom Ghebreyesus stated, “We're not just fighting an epidemic; we're fighting an infodemic,” about COVID-19 at the Munich Security Conference [[1], [2], [3]]. The term “infodemic” refers to the difficulty of finding a reliable answer to a problem because of an excess of propaganda and rumors spreading online and offline. Such rumors lead to misinformation and produce fear, distress, and social disorder [4]. They also create needless anxiety about the severity of COVID-19. COVID-19 rumors spread rapidly on social media without any authentication or formal verification. Even a single unverified rumor can badly affect an entire society and distort public health awareness [5,6]. Negative rumors spread fear and anxiety in society and affect human emotions and beliefs [7]. In response, the WHO published a myth-buster that clarifies and dispels most of the myths about COVID-19. It also includes public advice, videos, and preventive measures to control the disease [8]. Social media plays a critical role in publicizing events and activities. Sometimes it shows the bright side of the picture and, unknowingly or deliberately, promotes the dark side [5]. Social media platforms thus largely decide whether a piece of information is filtered out or goes viral. They also provide access to vast amounts of information in pursuit of a shared goal [9]. Whenever an emergency happens in the real world, tweets, posts, and messages circulate on social media before their veracity is known [10]. 
Even though national government agencies are ready to face the crisis and provide all necessary services, those services are undermined and distrusted because of false rumors [11,12]. The main reason behind rumor spreading is blind trust without knowledge of the facts. Rumors spread very rapidly in communities where they align with previously held values, i.e., where people already believe other members of the community to a certain degree. When people struggle for basic needs and conditions are critical, the situation stimulates rumors. Trust is also one of the leading factors affecting human psychological decisions. Trust determines with whom a user is willing to share information and from whom they accept it [13]. Unreliable messages can quickly exploit a trusted system. In OSNs, it is challenging to identify the reliable and trusted users on whom other users can rely without uncertainty [14]. In Ref. [15], a three-valued subjective logic (3VSL) model is presented that models the uncertainties in the graph using the Dirichlet-Categorical (DC) distribution and network topologies. In Ref. [16], a trust management system is designed for a social reviewing system based on the indication theory and fuzzy logic. This research also considered time-dependent and content-dependent crowd agreement for decision making. Similarly, other researchers proposed approaches based on trust value, trust score, trust degree, trust path, and various other factors [[17], [18], [19], [20]]. The role of OLs in shaping and recognizing individuals' sentiments can be traced back to works from the 1950s. 
Researchers suggested that a person does not attain a prominent leadership position on the basis of personal characteristics alone; they must also have attributes that match the specific circumstances of the community and the common interests shared by its members. According to this theory, a person or set of people has a significantly greater capacity than others to influence the community's views on recognizable and relevant issues. Katz proposed ‘the two-step flow of communication’, which addressed the OL's power to carry sentiments and views from the media to the wider public [22,23]. They also described influence in terms of three main components: quality of values, capability, and domain of interest. The characters and decisions of leaders and followers are likely to change over time and across domains [25]. In digital advertising, whenever an OL is associated with a specific product, its sales grow faster than those of other products. An OL influences other users' behavior by passing on information about products through their insight and knowledge [26]. Thus, an OL is an individual or coalition of people who significantly affects how people in the social network accept information. OLs can distinguish valid from invalid claims through their skills and technical knowledge. They communicate effectively and have the persuasive attitude needed to convince other people about misinformation and rumors. Companies and organizations use OLs' credibility to design and shape their conceptual business model. Such a business model provides the plan and growth path for small and medium businesses and offers more value and benefits to potential customers [27]. Suppose a businessperson or marketer knows that their potential customers follow a particular opinion leader on social media. 
In that case, they can contact the opinion leader to influence those customers and advise them to choose the product manufactured by their company [28]. The significance of OLs is also evident in knowledge graphs, where databases are built from various homogeneous and heterogeneous entities interconnected by semantic relations [29]. OLs can help identify the most suitable digital marketing techniques for spreading product information to large communities. An OL is expected to act as a social media role model who motivates others and propagates largely accurate information. Companies benefit by promoting their products through brand marketing and gaining more customer attention [30]. Owing to their broad knowledge and proficiency, OLs are able to reach potential customers willing to purchase a particular product. In this sense, OLs help customers and companies make the correct decision based on current market trends. In the online learning community, OLs share their views about the quality and value of the content that users share as text, audio, images, and videos. OLs can also guide users to reliable study material relevant to a specific group of interested users [31,32]. In agriculture, owing to the lack of information and technology-enabled resources, people in villages and rural areas are often unaware of new technology-driven innovations. OLs support farmers and land workers by providing valuable agriculture-related information, which eventually strengthens the national economy. An OL is a primary user who controls and intelligently transfers appropriate information to other communities in the knowledge graph. It is therefore important to transmit only relevant information that is useful to a particular group of users. 
OLs favor and endorse those customers who recommend authentic, correct products, so that others do not face problems caused by product quality. They also help resolve the cold-start problem in recommender systems [33,34]. In the same way, the influence of OLs can change the perception and awareness of the severity of COVID-19 [35,36]. They examine veracity by assessing the information source and inspecting the rumor's authenticity. It is also essential to scrutinize the influence and information dissemination of OLs in the social network in order to prevail against COVID-19 and potential future outbreaks [37]. Thus, the following research objectives are identified based on the above discussion.

Research objectives

The leading research objectives (ROs) of the article are as follows: (1) analyzing and evaluating the power and significance of OLs in controlling COVID-19 rumors, based on various rumor controlling metrics; (2) discovering a unique and innovative approach to find the top-T OLs in an OSN based on the user's reputation; (3) identifying the veracity of COVID-19 posts by measuring entropy; and (4) inspecting and measuring the implications of reputation and trust in OL detection (see Fig. 1).

Fig. 1

Human-influenced viewpoint representation.

To achieve the stated ROs, we present a technique in which an OL behaves like a represser who controls the spread of rumors as much as possible. OLs substantially influence their supporters and believers and can hold them back from spreading rumors [38,39]. So, if OLs follow a pattern or offer an opinion on a specific matter for society's benefit, their followers will probably pursue the same guidelines, as shown in Fig. 2. When users interact with either a rumor diffuser or an OL, they may either change their viewpoint or stay consistent with their beliefs.

Fig. 2

Network representation with diffuser, represser, and oblivious node.

Initially, a large number of COVID-19 tweets were extracted from social networks. Next, we preprocessed all the tweets by handling negations, punctuation, degree modifiers, word shapes, emoticons, emojis, slang words, and sentiment-laden terms. The reputation of each node is then measured to derive trust. The central purpose of trust is to verify a tweet's truthfulness, so that if any user spreads information on the social network, the OL can confirm the tweet's authenticity [40]. The ROLI algorithm produces the list of top-T OLs responsible for controlling rumors [41]. Finally, we measure each tweet's entropy based on the estimated trust and the probability that other users respond to the tweet. If the tweet's entropy is less than the pre-defined threshold value λ, it is treated as a likely rumor and a report is generated against it; otherwise, it propagates in the network as usual, according to other users' opinions.
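The final step described above, comparing a tweet's entropy with the threshold λ, can be illustrated with a minimal sketch. The entropy formula itself is not given in this part of the article, so the code below substitutes plain Shannon entropy over trust-weighted response probabilities as a stand-in. The function names `post_entropy` and `is_likely_rumor` and the weighting scheme are assumptions made for illustration only, not the authors' method.

```python
import math


def post_entropy(trust, response_prob):
    """Shannon entropy (in bits) of trust-weighted response probabilities.

    Stand-in formula (an assumption, not the paper's definition):
    weight_i = trust_i * response_prob_i, normalised so the weights sum to 1.
    """
    weights = [t * p for t, p in zip(trust, response_prob)]
    total = sum(weights)
    if total == 0:
        # No trusted user is expected to respond: treat as zero entropy.
        return 0.0
    probs = [w / total for w in weights]
    return -sum(q * math.log2(q) for q in probs if q > 0)


def is_likely_rumor(trust, response_prob, threshold):
    """Flag the post when its entropy falls below the threshold (lambda)."""
    return post_entropy(trust, response_prob) < threshold
```

Under this stand-in, a post whose expected responses are concentrated in a single user has zero entropy and is flagged for any positive threshold, while a post with evenly spread, equally trusted responses has the maximum entropy for that number of users.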

Practical applications of research in society

In this research, we have identified the top-T OLs who can control the transmission of rumors as much as possible. The practical implications of this research are substantial because the diffusion of negative news can create confusion about COVID-19 treatment and diagnosis that affects human health and emotional well-being [42]. The influence of OLs is not limited to preventing misinformation or rumors; it is also significant in domains such as education, agriculture, healthcare, defense, marketing, promotions, consumer behavior, and more [43]. The practical implications of this research are elaborated as follows. First, the research outcomes improve on other standard SNA measures in accuracy, precision, recall, F1-score, and execution time for all three datasets. Previous studies also computed the OLs based on different parameters, but only a few used the combination of reputation, trust, polarity, and entropy to determine OLs in OSNs. The proposed rumor controlling mechanism also improves considerably on other SNA measures, as shown in the previous section. Second, three different datasets are chosen for analysis and evaluation because of their differing nature and dynamics. Twitter has a retweet facility that is not present on Instagram. Instagram is an attention-based social network without any messy content. Reddit is a forum-based social network where a user can sample all kinds of information. The Reddit network is denser than the other two networks but has fewer users. The proposed work performed well on all of these kinds of OSNs. Previous studies used only datasets suited to a specific domain or interest and did not cover a variety of OSNs. Third, this research has various real-world implications. 
It is an approach that helps organizations and industries identify their customers' sentiments and opinions about a specific product through OSNs. In healthcare, OLs can find healthcare experts, physicians, and advisers through online physician communities. In agriculture, OLs can support farmers by sharing advanced practices and pesticide information, leading to a more sustainable agricultural system. Through the proposed approach, OLs also help organizations differentiate between rumor and anti-rumor messages on OSNs by calculating each message's entropy. Similarly, the proposed system performed well for OL detection and rumor control in multiple fields and domains. The article is organized as follows. The second section discusses related work and its limitations. The third section explains the essential details of the SIR epidemic model and SNA, including the different SNA centrality measures and the epidemic model states. The fourth section elaborates the proposed approach along with the complexity of each procedure. The ROLI algorithm is proposed to identify the top-T OLs in the network; its operational behavior is based on the SIR model. A polarity-based method is proposed to compute the user's reputation and trust. Next, a trust- and probability-dependent procedure is derived for calculating the entropy of each tweet. The fifth section describes the metadata of the posts extracted from the Twitter, Instagram, and Reddit datasets, along with other characteristics and the experimental results. The practical outcome of the proposed approach is evaluated using the performance metrics affected degree, represser degree, and diffuser degree. In the sixth section, we compare the results of the ROLI algorithm with the other SNA measures w.r.t. 
accuracy, precision, recall, F1-score, and execution time. The seventh section explains the theoretical and practical implications of the proposed approach. The eighth section discusses future directions and limitations and concludes the article. In a nutshell, the research article contains the following novelties:

Analysis of related work

The role of OLs in social media is prominent. They can shield users against risks such as exposure to rumors or misinformation, anxiety, and mental illness, and create a layer of trust that provides healthy and accurate information. They can also reduce the influence of rumors through their knowledge and experience. COVID-19 has spread worldwide, and many people have lost their lives to the disease. Social media is emerging as a strong platform for communicating trustworthy health-related advisories and guidelines. Many people have come closer than ever via social media to prevail against this infection [44,45]. The responsibility of OLs in controlling the dissemination of fake news is therefore more important than ever [46,47]. In this research, we concentrate on two kinds of study: first the various OL detection techniques, and then the various rumor control strategies. We begin with a brief discussion of OL recognition mechanisms. According to Ref. [22], recognizing OLs is a two-step procedure involving information gain and followers’ adoption. OLs first gather information from various sources and, after analyzing it, share it with their associates. Over time, multiple methodologies and theories have emerged, drawing on graph theory, mathematics, data mining, machine learning, game theory, metaheuristic algorithms, behavioral science, ontologies, etc. [[48], [49], [50], [51], [52], [53]]. A concise description of OL identification approaches is given in Table 1.

Table 1

Brief Description of OL's recognition approaches.

Ref.	Dataset	Details	Supports	Constraints
[54]	Reviews of Apple's iPhone	The relationship in the user's cluster is calculated based on various characteristics, and also OLs are detected using a text mining approach.	Straight-forward; Support in the recommended system.	Limited dataset; Used in online communities and forums only.
[55]	Epinions dataset	Reliable users are extracted by eliminating unreliable users with a capacity-first maximum flow procedure.	Suitable for recommender systems; Controls authenticity.	Restricted connections used; Only trust metrics applied.
[56]	Synthesized dataset	The propagative decision, spreading speed, and the count of the number of adopters determine the efficient OLs.	Support for marketing operations; Significant in the indirect proposal system.	Need initial adopters for spreading information; Bounded centralities.
[57]	Educational blogs dataset, Parent-child forum dataset	User's status, links, behaviors, and response time are measured to find the OLs.	Reinforced online blogs; Reduced execution time.	Restricted scope and domain; Applied Topic-specific constraints.
[58]	Amazon.com dataset	Domain-sensitive key users were found based on the user's effectiveness, expertise, and transition matrix in online forums.	Assist with promotions; Support authority.	Needed initial user's concern as input; Used in bounded and regulated space.
[49]	Facebook dataset, citation network dataset (Google scholar & DBLP)	A mathematical formula is derived from creating a series of OLs' actions and influence. A universal series is also designed to normalize the result.	Found affected path; Inflated accuracy.	The entire procedure is complex and requires lots of computation, Applicable to the limited domain.
[59]	Web-based Chinese stock Forum dataset	Clustering algorithms and the posted text are used to identify users' actions. Sentiment analysis-based case study discussed to find price trends.	Straight-forward and easily implementable; Assist in marketing.	Limited to online forums; Only PageRank centrality is used for comparison.
[60]	Synthesized dataset	Popularity and competency scores are measured to find the chances of the OLs based on the specific domain.	Easily implementable; High accuracy.	Vulnerable to the topic of interest; Not focused on retweets.
[61]	Epinions dataset	A novel approach used the trust value to find the OLs. Hybrid centrality measures are used to calculate trust values.	Simple calculation; Better results for hybrid centralities.	Few trust metrics like trust spread, trust maturity, and trust penetration are used for comparison.
[62]	Mobile01 Forum	A new algorithm suggested finding the OL by reducing candidates using overlapping communities and user relationships.	More scalable; Reduced overlapping influence issues.	Limited parameters for community detection; Applicable in online forums only.
[63]	turnbackhoax.id	The proposed approach focused on edge and centrality power. Edge power is used to find the relationship, while centrality power is used to decide each one's proportion to find the OLs.	Suitable for information spreading; Presented an agreement algorithm.	Limited relationship for establishing connections; Lesser number of evaluation metrics.
[64]	Sina micro-blog dataset	The topic of interest and user's transmission features are identified to define the user's control. The IC-based propagation model simulation identifies the OLs based on the number of infected nodes.	Significant for information transmission; Modified IC-based propagation model used.	Bounded scope; Fewer metrics used; Network and node pre-knowledge required.
[65]	Synthesized dataset	A trust model based on fuzzy logic is used to measure reputation. The critical user with the highest status is selected as an OL.	Implements human reasoning; Produces results with more fidelity.	A limited number of fuzzy rules are used.
[66]	Chinese Sina BBS	An augmented, domain-sensitive algorithm is proposed for cluster identification, while temporal attributes are used to compute the OL's effect over time.	Easily implementable; Better outcomes than PageRank.	Primary domain and topic information required; Outcomes compared only with the PageRank algorithm.
[67]	Facebook, Twitter, Google+	An innovative, dynamic model is proposed based on discrete-time. A key user is selected depending on the equivalent opinion vector of the tightly connected elements.	Unique vigorous, dynamic model; Highly efficient Clustering outcome.	Complicated to understand; Not handling overlapping cliques.
[68]	Twitter	A new rank of centrality measure was invented based on user interest and specialty on some topic. OLs are selected based on maximum centrality.	New Rank centrality is defined; No need for a prior relationship.	Limited topic considered; Tweet's monitoring needed to check outcomes.
[69]	Local Motor community	Statistical components such as multi-featured are used to find the OLs. Also, the value and interest parameters are used to upgrade accuracy.	Reliable; Simple; Easy to implement.	Shortage of promotional degree, Restricted dataset.
[70]	slashdot dataset	Local and global OLs are identified using an upgraded firefly algorithm and an attractiveness score.	Acknowledge easily; Higher performance.	Static dataset; Few features are used for measurement.
[71]	travel community dataset	User's actions and their effects are used to acknowledge consumer's decisions. OLs are selected based on action-specific attributes.	Integrate SNA with Virtual travel groups; Significantly measured influencer affect.	Bounded dataset; Limited parameters are used for computation.
[72]	Wiki-vote and synthesized datasets	An upgraded whale optimization algorithm is proposed to find OLs based on various objective functions and prestige. An adjoining-nodes-based procedure is also used to identify cliques.	Highly optimized results; Unique application of a nature-inspired algorithm.	Few datasets are used; Outcomes compared with only a few algorithms.
-	Higgs Boson data from Twitter	The Louvain method is implemented for community detection. Next, a betweenness centrality-based approach is used to find OLs.	Straightforward; Easily implementable.	Limited centralities used; Implemented on few datasets.
[74]	Wiki-vote and Bitcoin OTC trust weighted network	Conditional probability-dependent groups of OLs are identified using a game theory approach. For individual users, a Shapley value is calculated to find the payoff in each group. Groups with maximum synergy are considered for selection.	Produced highly accurate outcomes; Powerful application of game theory.	The user's initial trust score is required; Specific elements are used for measuring distance.

We now turn to studies on rumor spreading that form the foundation of this research. In Ref. [75], a model based on the mean-field concept is proposed to find the critical threshold value for communities. Each user acts as a mobile agent that, with some probability, is responsible for inter-group long-range movement and disease-free clusters. The approach also determined the transmission effect of the disease in the network. In Ref. [76], a memory-based model is suggested that explains the impact of memory over time; the authors also used a few attributes and memory rates that have a strong influence on rumor transmission. According to Ref. [77], an 8-phase ICSAR model is proposed that covers different features governing rumor dissemination. The authors considered eight elements to evaluate a rumor: information interest, rumor goal, user inclination, trust scale on social media, transmission rate, augmented component, block score, and specialist influence. In Ref. [14], researchers presented a modified trust-based SIR model, with mathematical equations representing the SIR model dynamics in the network. They also stated that the trust component could reduce the affected network population by imposing a threshold that limits rumor transmission momentum. In Ref. [78], the analysts found that followers are the leading players responsible for rumor/anti-rumor diffusion or persistence, and studied users who already have faith in rumors and can influence others through their strong beliefs. In Ref. [79], researchers established a Susceptible–Hesitating–Affected–Resistant (SHAR) model that evaluates users' behaviors toward rumors and the local and global equilibrium of rumor concern. They also computed the model's reproduction number and the sensitivity of rumors to parameter changes. 
They also analyzed three different human viewpoints for experimental purposes. In Ref. [80], the authors investigated the importance of weak ties in rumor spreading. They argued that weak ties are not themselves influential for rumor propagation, but that the diffusion rate depends on identifying weak links. They also defined a probability based on a weak-tie-dependent function. In Ref. [81], analysts proposed two transmission channels: the first for point-to-point transmission and the second for global rumor transmission. A modified SIR model is also presented using the mean-field formula. A geometric function and simulation are used to validate the proposed approach and verify the impact of rumor spreading. In Ref. [82], a rumor transmission model is proposed based on the time interval and a non-linear procedure. The basic reproduction number is first calculated using the next-generation matrix. The researchers then analyzed rumor stability and the factors needed for a rumor's survival using experimental simulation. According to Ref. [83], the researchers developed an SIR-based SKIR model that integrates rumor and anti-rumor-related facts. Using a game theory approach, they identified the active factors that lead people to accept rumors and anti-rumors. Using regression analysis, they also found the internal and external user characteristics that strengthen rumors. The user's attitude and dynamic behavior are also considered in rumor propagation. In Ref. [84], a rumor repudiation effectiveness index (REI) is developed to identify the factors that support the denial of rumors. The authors also proposed four regression models to discover the dependence between the REI and the components involved in rejecting rumors. The user's interest and topic sensitivity are also considered when implementing the proposed system. In Ref. 
[85], an NLP model is presented using the deep learning-based DistilBERT and SHAP models. The authors integrated many datasets, performed back-translation, and obtained more accurate outcomes using DistilBERT. SHAP makes the model more interpretable under three different experimental settings to enhance trust. Thus, various approaches to rumor control have been put forward from the human and social perspectives. Most techniques are based on epidemic models, user response time, distribution speed, rumor transmission rate, content, behavior, attitude, and opinions [86]. The major pitfall of these approaches is that they neglect user trust, reputation, and content accountability. The common challenges in previously developed approaches, and how the proposed approach helps overcome them, are as follows:
Barely optimized result: Previous approaches rarely give optimized results for all kinds of datasets. A few produced better results in a specific domain but fell short on other kinds of OSNs. In this article, the ROLI algorithm provides highly accurate results w.r.t. precision, recall, F1-score, and accuracy. In addition, three performance metrics, diffuser degree, represser degree, and affected degree, confirm the proposed approach's effectiveness in controlling rumors.
Support few datasets: Most prior studies evaluated online forums, review communities, microblogs, Twitter, Facebook, and other online platforms. Only a few approaches used multiple datasets to validate their work and detect OLs. The proposed study supports nearly all types of OSNs because it relies on user- and network-specific features to compute users' centralities, trust, and reputation. 
Extremely convoluted system: The preceding approaches are complex, involving many exponential functions, multi-dimensional vectors, composite logical structures, inconsistent flows, and logarithmic equations and functions. Moreover, many hypotheses and procedures needed long execution times and a highly advanced machine for implementation. By contrast, the organization of the proposed approach is simple and understandable and does not involve such intricate constructions. We have used centrality measures, voting-based procedures, and mathematical equations to compute OLs and the veracity of rumors.
Leverage of data mining-based techniques: Most former approaches leverage data mining-based techniques such as supervised and unsupervised learning, cluster analysis, text mining, fuzzy logic, probabilistic methods, network structure, and user centrality to identify OLs and control rumors. Although data mining-based methods have attained good results in various real-world applications, they are less efficient and less appropriate for social networking because of the masked and dynamic nature and high volume of information in OSNs. This research does not deploy any data mining techniques. Only new innovative algorithms are designed for analysis purposes.
Insufficiency of user trust and reputation: Most approaches do not consider user trust and reputation for OL identification. Although a few researchers considered trust and reputation, they did not produce reasonable outcomes because they focused only on the methods used to compute trust. Our approach considers trust and reputation and also focuses on describing the propagation of rumors. Thus, our approach concentrates on three tasks: identifying the OLs in an OSN, validating the tweet's veracity, and controlling the spreading of tweets. 
Deficiency of tweet's entropy: Most previous approaches did not consider the tweet's veracity by computing the entropy. Few techniques used the modified SIR models, network topologies, user features, tweets, and other elements to validate the legitimacy of the tweet. Although they have made lots of efforts and experiments to authenticate their work, entropy should also be considered for checking the veracity of the tweet. In our approach, an entropy-based method is addressed to verify the veracity of the tweet using the user's trust and probability to determine the chances of a retweet. Absence of user agreements: Most proposed approaches used user's centralities, network structure, topologies, tweet response time, tweet polarity, user behavior, user communities, and many other characteristics to discover the appropriate OLs. No study has focused on user agreement and voting practice for identifying the OLs. We suggest an innovative voting-based algorithm whose computational complexity is also very low to identify the OLs in OSNs. Besides, as the number of users in the network is enhanced, it is also conceivable to get more optimized results efficiently. Although a few techniques also worked on the discussed issues but either fulfilled only a few limitations or could not demonstrate the power of OLs in rumor control. So in this research, we may overcome most of the constraints and present an approach that successfully gave the novel solution by validating the procedure through experimental outcomes.

Background

Social networks play a significant role in information dissemination and misinformation control in the current electronic era. A piece of information is considered a rumor if it circulates widely without official verification. In other words, a rumor contains unreliable information that changes rapidly as it is transmitted from one person to another. The topology and dynamics of social networks have great significance in generating rumors. Various methods have been proposed to control rumor spreading on OSNs. The behavior of a rumor diffusion model is often similar to the spreading of an epidemic [80,87]. Disease transmission in epidemic models depends on the variety of agents and the type of disease [88]. In this research, the core behavior of the rumor propagation process is based on the classical SIR model, which makes it more practical and rational. The SIR model is the elementary epidemiological model that describes disease transmission in a community.

Social Network Analysis

A social network is represented as a graph G = {V, E} comprising a collection of nodes V and links E. A node represents an individual, group, community, or organization, while a link represents a relationship among the individuals. Graph theory concepts are used to characterize the dynamics of the social network. The clustering coefficient, density, diameter, network size, structural hole, strong-weak ties, and many other related concepts are derived from graph theory. Social network analysis involves measuring the node's centralities, community identification, homophily, bridges, triangles, link prediction, node classification, clique identification, and other related components [[89], [90], [91], [92]]. The measures are defined as follows:

In-degree: The number of nodes that link to the node.
Out-degree: The number of nodes that the node links to.
Clustering coefficient: The extent to which a node tends to form a cluster with the other nodes in the network.
Network density: The ratio of the existing links to all possible links in the network.
Homophily: The tendency of nodes to connect with other similar nodes in the future.
Population size: The total number of nodes in the network.
Closeness Centrality (CC): Based on the mean distance of the node from the other nodes, i.e., a node that is near the other nodes in the network can transmit information rapidly.
Betweenness Centrality (BC): The fraction of the shortest paths among the network nodes that pass through the node.
Degree Centrality (DC): The sum of in-degree and out-degree, i.e., the total number of links incident to the node.
Eigenvector Centrality (EC): The node's influence over its connected neighbors; a value is assigned to the node based on the relatively high or low values of its neighbors. A node with higher eigenvector centrality is connected to neighbors with higher eigenvector values. A particular case of EC is the PageRank (PR) algorithm used by Google to rank web content.
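As an illustration of the simpler measures above, the following minimal sketch computes network density, degree centrality, and the local clustering coefficient. It assumes an undirected graph stored as an adjacency dictionary; the four-node edge list and all function names are hypothetical, and dividing the degree by n - 1 is one common normalization that the article does not state explicitly.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)}."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def density(adjacency):
    """Ratio of existing links to all possible links."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    links = sum(len(nbrs) for nbrs in adjacency.values()) / 2
    return links / (n * (n - 1) / 2)


def degree_centrality(adjacency):
    """Degree of each node, normalized by n - 1 (an assumed convention)."""
    n = len(adjacency)
    if n < 2:
        return {node: 0.0 for node in adjacency}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def clustering_coefficient(adjacency, node):
    """Fraction of pairs of the node's neighbors that are themselves linked."""
    nbrs = adjacency[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    closed = sum(1 for a, b in combinations(nbrs, 2) if b in adjacency[a])
    return closed / (k * (k - 1) / 2)


# Hypothetical toy network: a triangle a-b-c with a pendant node d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
adj = build_adjacency(edges)
print(density(adj))
print(degree_centrality(adj)["c"])
print(clustering_coefficient(adj, "c"))
```

Closeness, betweenness, and eigenvector centrality need shortest-path or iterative computations and are usually taken from a graph library rather than written by hand.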

SIR model

The SIR epidemic model is based on the intensity of infection, the dissemination pattern, and the network topology [88]. The SIR model has three phases: Susceptible, Infected, and Recovered. The Susceptible (S) phase covers the nodes that are not yet infected but are expected to be. The Infected (I) phase covers the infected nodes that can spread the infection to other network nodes. The Recovered (R) phase covers the nodes that have recovered from the disease; a recovered node may or may not spread the infection, depending on its interaction with neighbors. Researchers use a broad range of epidemic models such as SI, SIR, SIS, SEIR, SEIS, and SWIR for immunization, vaccination, infection measurement, information maximization, control policies, and other purposes in different domains [[93], [94], [95], [96], [97]]. A node exists in exactly one of the states at a particular time and can change its state over time. The total number of users in the social network may vary, i.e., new nodes appear and some old nodes disappear over time. The model uses the parameters o and f to denote the origin and fatality degrees. γ denotes the revival degree, while β denotes the transmission degree at which the disease spreads in a population of size N at time t. The differential equations Eq. (1), Eq. (2), and Eq. (3) describe the model. Often, the transmission degree is low relative to the revival degree, so the disease's effect diminishes after some time. If the disease's intensity is very high, the probability of infection is very high, and a node may move from the susceptible to the infected phase. In contrast, if the disease's intensity is very low, a node may shift from the infected to the recovered phase.
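To make the dynamics concrete, the sketch below integrates the classical SIR equations dS/dt = -βSI/N, dI/dt = βSI/N - γI, and dR/dt = γI with a simple forward Euler step. This is an assumption-laden illustration, not the article's Eq. (1) to Eq. (3): the origin and fatality parameters o and f are omitted, so the population stays constant, and the function name, step size, and parameter values are hypothetical.

```python
def simulate_sir(s0, i0, r0, beta, gamma, steps, dt=0.1):
    """Forward-Euler integration of the classical SIR model.

    beta is the transmission degree, gamma the revival degree.
    Demographic terms (origin o, fatality f) are deliberately left out.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # constant population size N
    history = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Transmission lower than revival: the infection dies out.
fading = simulate_sir(990, 10, 0, beta=0.2, gamma=0.5, steps=200)
# Transmission higher than revival: the infection first grows.
growing = simulate_sir(990, 10, 0, beta=0.5, gamma=0.1, steps=200)
print(fading[-1][1] < 10, max(i for _, i, _ in growing) > 10)
```

The two runs mirror the closing remark of the paragraph above: when β is small relative to γ the infected count only shrinks, whereas a larger β produces an outbreak before recovery takes over.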

Proposed methodology materials-methods

The entire network is categorized into three types of nodes: oblivious nodes, diffuser nodes, and represser nodes. A diffuser node is a user who propagates a rumor from one person to another. Diffusers connect directly with other users and are intensely involved in rumor diffusion. When an individual establishes a relationship with a diffuser node, the probability of being affected becomes high. An oblivious node is a user who behaves neutrally during rumor spreading, i.e., takes no action on the rumor and neither stops nor forwards it. A represser node is a user who never transmits a received rumor to others and tries to stop its transmission. For example, consider the network shown in Fig. 2, which contains nodes of three colors: grey nodes indicate oblivious nodes, green nodes represent diffuser nodes, and red nodes depict represser nodes. When a represser node influences a diffuser node through its knowledge, the diffuser becomes a represser node. In the same way, complete knowledge about the rumor's veracity is gradually transmitted through the network. Sometimes, a few nodes are not much influenced by the represser node and stand firm on their decisions. Thus, it is very difficult to change others' decisions through social media; still, OLs try to persuade others through their expertise [39,98]. In this research, we illustrate the complete efforts performed by OLs and explain the procedure to find the OLs. All the computations and operations are done statistically here, but in the real world, they would be performed by OLs. We initially fetched many tweets from the three social media platforms to effectively identify the population's polarity and sentiments. The ROLI algorithm is used to find the effective top-T OLs in the OSN. The VADER Python library is used to find each tweet's polarity, covering various features and sentiments [99]. Each user's reputation is computed based on the polarity scores received from the neighbors and other nodes. Next, trust is measured to validate the posts' truthfulness and determine whether a posted tweet is a rumor. Algorithm 1 shows the pseudocode, and Fig. 3 shows the flow chart of the proposed approach.

Fig. 3

Flow chart of the proposed approach.

Algorithm 1: Opinion Leader-based Rumor Detection (OLRD) Algorithm

Opinion leader identification

This research proposes a new ROLI algorithm that identifies the most critical OLs in the OSN based on the highest reputation. The concept of the ROLI algorithm comes from product awareness in product marketing. As a business strategy, a product manager promotes a product by recognizing users with many social media followers and tries to attract these information spreaders by offering rewards or benefits. Choosing spreaders that maximize the total sales of the product regardless of network topology and competing business strategies is therefore a classical problem. Hence, such users are considered OLs who can influence other nodes. Generally, most OL identification approaches are based on various distance-based centrality measures. Over the last few years, PageRank-based procedures have provided better outcomes in most cases and have received more attention for solving the influential user identification problem. A user with a higher reputation in the network has a greater chance of getting votes from its neighbors. So, identifying the user's reputation is critical, and it depends on the total number of tweets and retweets posted by the user's neighbors. The general concept of the ROLI algorithm originates from the SIR epidemic model, in which a user exists in one of three stages: Susceptible, Infected, and Recovered. Many researchers have observed that if the infection rate of a disease is high compared to the recovery rate, the disease is difficult to eliminate and a pandemic forms. An infected node attempts to spread the disease to any of its neighbors at the rate β. Similarly, a node can recover at the rate γ over time. Analogously, in the ROLI algorithm, a node votes for its neighbors based on their reputation score. Each node's reputation is initially calculated from its total degree, i.e., the total number of connected nodes.
It is also observed that most social networks follow the rich-get-richer phenomenon, i.e., the most powerful nodes tend to attract more nodes in the network. So, in the real world, a node would prefer to vote for a neighbor with which it has a higher level of trust or a more strongly weighted relationship. In this algorithm, to choose the top-T OLs, every node has a chance to vote t times. If a particular user is selected as an OL in one round, that node does not participate in further voting rounds. The main reason behind this strategy is to avoid bias among the nodes, because the spreading power of the elected node may influence the decisions of other nodes. Because the elected node no longer votes, the voting power of its neighbors, and in turn of their neighbors, also shrinks. Hence the selected nodes are separated from the rest of the process and cannot exert unnecessary control over their neighbors. Finally, after t rounds, the top-T OLs have been selected. In the ROLI algorithm, a pair of values is associated with every node: a voting score, which is the score received from the node's neighbors, and a voting ability, which is the total number of votes that the node can grant to its neighbors. Initially, all the nodes have the same voting ability in the first round. Every node can vote for its neighbors, and all neighbors can also vote for that node. The voting score of a node is the aggregate of the votes that its neighbors have given, i.e., if a node receives a total of five votes from its neighbors based on its reputation, its voting score is five. After each round, the node with the highest voting score is declared the OL. The elected OL does not participate in subsequent rounds, and its voting score is set to zero. All the nodes connected with the OL have to update their voting ability in the upcoming rounds. A node updates its voting ability by subtracting a diminishing factor until the voting ability reaches zero. The diminishing factor lies between 0 and 1. For simplicity, it is taken as the reciprocal of the average degree of the network. The same steps are iterated t times or until the required number of OLs is identified in the network. The entire structure of the ROLI algorithm is shown in Algorithm 2.

Algorithm 2: Reputation-based Opinion Leader Identification (ROLI) Algorithm

To understand the ROLI algorithm more clearly, consider a network with nine nodes. Initially, the voting ability of each node is set to one, and the reputation of each node is calculated based on the aggregated degree, i.e., the total number of links connected with the node. Fig. 4(a) depicts the first-round outcomes, in which node #4 is elected as an OL. In the next round, all the neighbor nodes connected with node #4, i.e., #1, #2, #3, #5, and #6, update their voting ability and voting score according to the mentioned rule. The average degree of the network is 2.22, so the diminishing factor becomes 1/2.22 = 0.45. Therefore, each connected node reduces its voting ability by this factor, and its new voting ability becomes 0.55. In the next round, the voting ability of node #4 becomes 0. In Fig. 4(b), the reputation of the nodes is measured again, and node #7 is selected as OL. The same procedure is carried out for the remaining rounds until the required number of OLs is identified.
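The voting rounds described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Algorithm 2 itself: a node's voting score is taken to be the sum of its neighbors' voting abilities (which equals its degree in the first round), ties go to the lowest node id, and only direct neighbors of the elected node are weakened. The nine-node edge list is hypothetical; it is chosen so that the average degree is 20/9 ≈ 2.22 as in the worked example, but it is not the network of Fig. 4.

```python
from collections import defaultdict


def roli_top_leaders(edges, top_t):
    """Elect top_t opinion leaders by repeated neighbor voting."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = sorted(neighbors)

    # Diminishing factor: reciprocal of the average degree.
    avg_degree = sum(len(neighbors[n]) for n in nodes) / len(nodes)
    decay = 1.0 / avg_degree

    ability = {n: 1.0 for n in nodes}  # every node starts with one vote
    leaders = []
    for _ in range(min(top_t, len(nodes))):
        # Voting score = votes granted by neighbors (assumed rule).
        scores = {
            n: sum(ability[m] for m in neighbors[n])
            for n in nodes
            if n not in leaders
        }
        best = max(scores, key=scores.get)  # first maximum = lowest id
        leaders.append(best)
        ability[best] = 0.0  # elected node stops voting
        for m in neighbors[best]:
            ability[m] = max(0.0, ability[m] - decay)
    return leaders


# Hypothetical nine-node network with ten edges (average degree 2.22).
edges = [(4, 1), (4, 2), (4, 3), (4, 5), (4, 6),
         (7, 6), (7, 8), (7, 9), (1, 2), (8, 9)]
print(roli_top_leaders(edges, 2))  # -> [4, 7]
```

On this toy network node 4 wins the first round with five votes; its neighbors drop to a voting ability of 0.55, and node 7 wins the second round with 0.55 + 1 + 1 = 2.55 votes.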

Fig. 4

ROLI Algorithm illustration (a) network structure after the first iteration, (b) network structure after the second iteration.

Computational complexity

In this article, two algorithms, (i) OLRD and (ii) ROLI, are proposed to identify OLs and validate their efficiency. The first algorithm, OLRD, is the decisive algorithm that determines whether a post is a rumor. This algorithm mainly involves three tasks. The first task is to find the OLs using the ROLI algorithm, the second is to compute the polarity-based reputation, and the third is to measure each user's trust value in the network. The time complexity of the ROLI algorithm is measured in three steps. The first step involves the time t1 required to measure the initial reputation and voting ability. The second step involves the time t2 needed to select the node with the highest reputation score. The third step involves the time t3 required to modify the reputation scores. There are n nodes and e edges in the network. O(n) time is needed to measure the initial reputation of each node. Similarly, the initial voting ability of each node is 1, so only O(e) time is needed to assign the initial voting ability. Thus, the total time complexity t1 is O(n) + O(e) ≈ O(e). With an optimal procedure for finding the node with the maximum reputation score, the time complexity t2 of the second step is O(n). In the third step, only the reputation scores of the nodes that are one or two hops away from the node selected as OL in the previous round are updated. The average degree of the network is ⟨k⟩ = 2e/n. Thus, the time complexity of step 3 is O(⟨k⟩²) ≈ O(e²/n²). To select t OLs, the overall complexity is O(e + tn + t⟨k⟩²). If the network is sparse, i.e., n ≫ e and n ≫ t, the time complexity of the algorithm can be O(n) for the network. A simple mathematical equation with an iterative function is used to compute each user's polarity-based reputation and trust. First-order predicate logic is used to measure reputation. Therefore, the complexity of measuring the trust and reputation of each node is O(n).

Stages of rumor transition

Diffusion stage

In this stage, the diffuser attempts to spread the rumor in the network at the rate β. The user finds the appropriate piece of information to transmit. Sometimes, the diffuser may modify some of the information according to the current circumstances and spread the resulting unclear information to its neighbors. Nodes that do not want to participate in the transmission, or that have no knowledge of the matter, ignore the transmission process.

Recognition stage

In this stage, the node decides whether to accept the rumor with a certain probability. If the degree of trust among the users is very high, the node may accept the rumor and propagate it to its neighbors, but the node rejects the rumor if the degree of trust is low.

Probability of retweeting/forwarding the tweet

In the social network, millions of users publish posts and retweets related to different events every second of the day at a certain rate. Whenever users refresh or reload the social media interface, they find various new posts and decide whether to read, forward, retweet, or ignore them. Let u1 be the total number of first-order followers and u2 the total number of second-order followers of a user x. Second-order followers are only those who read or retweet the posts read or retweeted by first-order followers. Two further quantities are defined: the total number of followers in the individual network, and the probability of finding the followers who follow, read, and retweet the tweet at level d within time t after its posting. We assume that the share of users who read and retweet the post is equal to p. The total number of followers who read or retweet the tweet at level d after time t is computed using Eq. (4). In this process, the total number of tweets and the arrival of l-order users are considered exponential. So, the Poisson distribution is used to model the number of users who randomly read and retweet the posts at time t, as represented in Eq. (5). The probability for first-order users to retweet is represented by Eq. (6). The Poisson distribution is then formulated for second-order users. One notable fact is that second-order users only retweet a post that first-order users have retweeted on or after time t. Let τ be the time on or before which the first-order user read the tweet. The probability for first-order users to retweet the post between t and τ is given by Eq. (7), and the probability for second-order users to retweet is represented by Eq. (8). The corresponding probability is shown in Eq. (9). Similarly, the probability of a retweet for third-order users is represented by Eq. (10). Thus, the general formula to calculate the retweet probability for users at level d is given by Eq. (11). Whether an individual accepts a piece of information received from another user depends on the degree of trust the individual places in that user. So, this probability measures the chance of reading and retweeting other users' posts published in the network.
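As a reminder of the building block behind these formulas, the sketch below evaluates the standard Poisson probability that exactly k users read or retweet a post within time t. It is the textbook probability mass function, not a reproduction of Eq. (5) to Eq. (11); the function names and the rate and time values are hypothetical.

```python
import math


def poisson_pmf(k, rate, t):
    """Probability of exactly k retweet events in time t.

    rate is the assumed average number of events per unit time.
    """
    mean = rate * t
    return math.exp(-mean) * mean ** k / math.factorial(k)


def prob_at_least_one(rate, t):
    """Probability that at least one follower retweets within time t."""
    return 1.0 - poisson_pmf(0, rate, t)


# Hypothetical setting: 2 retweets per hour on average, 1.5 hours observed.
for k in range(4):
    print(k, round(poisson_pmf(k, rate=2.0, t=1.5), 4))
print(round(prob_at_least_one(rate=2.0, t=1.5), 4))
```

A level-d probability would chain such terms across follower levels, conditioning each level on the retweet time of the level before it.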

Tweet preprocessing and reputation calculation

There are two types of users in the social network, active and passive, based on their actions. Most active users respond to tweets according to their domain of interest and topic. Passive users, on the other hand, rarely respond to a tweet regardless of their interest in the topic. During the analysis, we only considered the actions of active users who commented on or retweeted COVID-19-related posts. Generally, a tweet is significantly affected by the active user's response, environment, content, time interval, and domain [100]. Other users' tweets also influence a tweet's content and the division of opinion around it over time. So, we restructured our dataset so that each tweet is represented by the tuple <#user_id, content, time>. To measure each tweet's polarity, we used the VADER (Valence Aware Dictionary and sEntiment Reasoner) Python tool, which categorizes each tweet as positive, negative, or neutral. The tool also determines the strength of the positive, negative, or neutral polarity based on character case, emojis, punctuation, emoticons, and slang. The primary elements of the tool include degree modifiers, conjunctions, and n-grams. The modified reputation measure calculates a user's reputation in the social network. Thus, the reputation of node x is measured using Eq. (12). The equations use the positive, negative, and neutral polarity of the m-th tweet. One constant is the weight assigned to positive and negative polarity, and another constant is the weight assigned to neutral polarity. The constant α and the other weighting constant are non-zero, with values between 0 and 1. Two reputation terms appear: the reputation the user derives from the m-th tweet based on its positive, negative, or neutral polarity, and the reputation of the other users who retweet the m-th tweet. A moderating function guarantees that the reputation of an authenticated user remains robust against any forged post at level d. We performed various tests to find appropriate values of these parameters and found empirically that they affect a user's reputation in the network.
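The preprocessing step can be pictured with the small sketch below, which stores a tweet as the <#user_id, content, time> tuple and maps a VADER compound score to one of the three polarity classes. The ±0.05 cut-offs are the thresholds commonly recommended in the VADER documentation; whether the authors used the same cut-offs is an assumption, and the class name, function name, and sample record are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    """One record of the restructured dataset: <#user_id, content, time>."""
    user_id: int
    content: str
    time: str


def polarity_class(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a polarity class.

    The compound score itself would come from the vaderSentiment package,
    e.g. SentimentIntensityAnalyzer().polarity_scores(text)["compound"].
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


sample = Tweet(user_id=1, content="Vaccines are now available!", time="2020-03-01")
print(sample.user_id, polarity_class(0.62))
```

Keeping the scoring call outside `polarity_class` makes the classification rule easy to test without installing the sentiment library.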

Trust computation

Trust plays an essential role in rumor spreading over time and is significant for accepting or rejecting rumors. The representation of trust is complicated by variations in users' preferences and attributes. Most users have thousands of friends on social media, but only a few of them hold the user's trust. Reputation is an essential element of trust computation [65,101,102]. If a user has a high reputation, other users will most probably trust them; only in a few cases is a user's reputation independent of trust. In the proposed approach, trust is calculated from the reputation a user receives from neighbors and other users over time t in the network. We computed the trust of user y in user x from the users' reputations, as shown in Eq. (15). The equation uses the reputation scores of users x, y, and z, respectively; N(x), the set of neighbors of user x; a normalization component; and a multiplicative operation.

Tweet's entropy calculation

A tweet's entropy signifies the importance and amount of information the tweet carries during transmission. Promotional and advertisement-like tweets have low entropy, while news and informative tweets are likely to have high entropy. Generally, users retweet or forward only those tweets that carry new and unique information or originate from an authorized source. A probability determines the chance that other users retweet a tweet of user x. The overall entropy E of the i-th tweet of user x at time t is calculated using Eq. (16). The measured entropy helps to encode the hidden information in an OSN, as the network is highly flexible and changing. Thus, entropy makes it convenient to explore coalitions, synergy, and other network dynamics exhaustively.
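For intuition, the sketch below computes a plain Shannon entropy over a set of retweet probabilities and applies the thresholding rule stated in the abstract, where a post whose entropy falls below an empirical threshold is likely to be categorized as a rumor. This is the generic Shannon form, not Eq. (16), which additionally involves user trust; the function names, the probabilities, and the threshold value are hypothetical.

```python
import math


def shannon_entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def likely_rumor(entropy, threshold):
    """Flag a post whose entropy is below the empirical threshold."""
    return entropy < threshold


# Hypothetical retweet distributions over four follower groups.
spread_out = [0.25, 0.25, 0.25, 0.25]   # evenly retweeted
concentrated = [0.97, 0.01, 0.01, 0.01]  # retweeted by a single group
threshold = 1.0                          # assumed value, not from the article

print(shannon_entropy(spread_out), likely_rumor(shannon_entropy(spread_out), threshold))
print(likely_rumor(shannon_entropy(concentrated), threshold))
```

The threshold would have to be tuned empirically on labeled posts, as the article does for its own parameters.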

Parameters setting

An ideal environment is chosen to determine the optimal values of all the heuristic parameters needed to implement the proposed approach. The values of a few variables, such as network size, density, clustering coefficient, the total number of relationships, and centralities, can be calculated using statistical and graph theory-based formulas. For certification, 70% of the entire dataset is selected for the testing task, and the remaining 30% is used to validate the parameter values. Initially, random parameter values are selected subject to the specified constraints and conditions. Let m be the total number of users in the network and j the number of iterations needed to find the optimal values. The combination m*j thus defines the problem space in which the optimum values are searched. Moreover, if m*j is large, the chances of obtaining the best values of the heuristic parameters are very high. Considering this fact, we chose the Twitter dataset for this analysis, as its size is enormous. In this research, we use four heuristic variables, σ, ∂, τ, and λ. A linear model Analysis of Variance (ANOVA) approach is used to identify the optimal values of all the parameters. This approach helps discover the relationships among the variables, which supports accepting or rejecting the null hypothesis. In this experiment, four groups of size m*j are formed. In the first group, the information of 50 K users is processed over 100 iterations. Similarly, the sizes of the second, third, and fourth groups are 100 K*100, 200 K*100, and 500 K*100, respectively. Table 2 shows the outcomes of each group along with its Mean Square (MS), Sum of Squares (SS), F-statistic, and P-test values.

Table 2

Parameters values using ANOVA model.

m*j	σ	∂	τ	λ	Degree of Freedom (DF)	Mean Square (MS)	Sum of Squares (SS)	F-statistic	P-test
(50 K*100)	0.20	0.10	0.10	0.50	3	7463.56	285,633	76.82	0.000
	0.40	0.25	0.20	0.60	3	7256.73	265,358	69.76	0.000
	0.60	0.50	0.40	0.70	3	8362.90	342,198	85.64	0.001
	0.80	0.85	0.60	0.80	3	9802.21	396,429	86.72	0.002
	1.00	0.85	0.75	0.95	3	5609.33	174,432	66.97	0.005
(100 K*100)	0.20	0.10	0.10	0.50	3	2520.32	137,658	53.66	0.000
	0.40	0.25	0.20	0.60	3	2867.91	163,908	57.28	0.000
	0.60	0.50	0.40	0.70	3	2583.54	140,779	54.96	0.000
	0.80	0.85	0.60	0.80	3	2664.11	117,758	52.51	0.001
	1.00	0.85	0.75	0.95	3	2232.55	108,950	50.24	0.003
(200 K*100)	0.20	0.10	0.10	0.50	3	985.33	96,743	42.88	0.000
	0.40	0.25	0.20	0.60	3	736.70	85,351	41.17	0.001
	0.60	0.50	0.40	0.70	3	655.33	83,762	38.94	0.002
	0.80	0.85	0.60	0.80	3	715.51	92,154	43.25	0.003
	1.00	0.85	0.75	0.95	3	648.84	73,279	39.94	0.006
(500 K*100)	0.20	0.10	0.10	0.50	3	1764.42	106,529	47.31	0.000
	0.40	0.25	0.20	0.60	3	1699.37	98,538	46.15	0.000
	0.60	0.50	0.40	0.70	3	1542.21	97,244	45.86	0.001
	0.80	0.85	0.60	0.80	3	1621.58	98,271	46.77	0.002
	1.00	0.85	0.75	0.95	3	1334.94	86,345	44.16	0.005

From the above analysis, we found the optimal parameter values ∂ = 0.85, τ = 0.75, and λ = 0.95. These values are also efficient for computing the trust, reputation, and veracity of tweets on the other datasets. A few factors need to be precomputed before this computation. If the network size changes over time, these parameter values remain suitable for handling all the resulting cases.
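For readers unfamiliar with the statistic reported in Table 2, the sketch below computes a one-way ANOVA F-statistic as the ratio of the between-group mean square to the within-group mean square. It assumes a plain one-way design; the article's exact linear-model setup is not specified, and the function name and the three sample groups are hypothetical rather than taken from the experiments.

```python
def one_way_anova(groups):
    """Return sums of squares, mean squares, and F for a one-way ANOVA."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]

    # Variation explained by group membership.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation inside each group.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f": ms_between / ms_within,
    }


# Hypothetical scores obtained under three parameter settings.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result["f"])  # -> 13.0
```

A large F with a small P value indicates that the parameter setting has a measurable effect on the outcome, which is how the groups in Table 2 are compared.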

Result analysis and performance evaluation

This part of the research describes the posts and comments extracted from the Twitter, Instagram, and Reddit social networks. Various attributes are gathered while fetching tweets, such as timestamp, user_id, text, source, geo, coordinate, place, contributor, retweet count, sensitivity, favorite_count, and many more. Next, we experimented with the proposed ROLI algorithm and produced the list of top-T (=10) OLs. We also discuss the significance of the proposed approach over other SNA measures by evaluating various rumor-controlling metrics.

Dataset

Twitter dataset

Since the start of the COVID-19 pandemic, social media have carried a large volume of data and comments. Twitter has also played an essential role as a source of information about the pandemic. This research pulled many tweets from Twitter to find people's opinions and sentiments. The main reason for choosing Twitter is that its popularity and usage increased drastically during the pandemic. Some unauthorized sources have posted different kinds of rumors and misinformation. Later, Twitter added a fact-checking label to tweets to check their authenticity. The Twitter streaming API is used to pull out the COVID-19 tweets. Twitter's API supports access to tweets, users, messages, trends, links, etc. Due to memory and CPU constraints, we collected 65.3 M tweets from January 1, 2020, to March 30, 2020. We used five search keywords to obtain the tweets. Each tweet is gathered as a single tuple of the form <#user_id, content, time>.

Instagram dataset

Instagram is another widely used social medium for sharing images, videos, and views. This research also extracted the Instagram posts related to COVID-19. An Instagram API is used to collect the posts from January 1, 2020, to March 30, 2020. Initially, a token is generated, and POST and GET requests are built through HTTP. Although the total number of posts is much smaller than in the Twitter dataset, they contain fascinating facts, rumors, and misinformation about COVID-19. We collected around 9.8 K posts and 37 K comments originating from different validated user accounts.

Reddit dataset

Reddit is an online discussion medium that integrates news, posts, comments, conversations, views, images, and queries. Although Reddit is not as widely recognized, it offers richer content and information. Reddit contains many communities that provide information in a very creative and innovative way. We used the Python-based Reddit API to collect the content based on five search keywords. We assembled around 15.7 K comments derived from various user communities and specific posts. The statistical description of the datasets is shown in Table 3.

Table 3

Datasets statistical description.

Dataset statistics (January 1 - March 30, 2020)	Twitter	Instagram	Reddit
Total number of tweets	65.3 M	46.8 K	25.7 K
% of the tweets in English	71.2%	89.3%	96.4%
% of tweets in other and regional languages	28.8%	10.7%	5.6%
% of verified accounts	8.4%	23.5%	64.8%
Total number of participating countries	173	148	94
Total number of searched keyword	5 (‘Covid19’, ‘coronavirus’, ‘#2019-ncov’, ‘#covid_19’, ‘#pandemic’)	5 (‘Covid19’, ‘coronavirus’, ‘#2019-ncov’, ‘#covid_19’, ‘#pandemic’)	5 (‘Covid19’, ‘coronavirus’, ‘#2019-ncov’, ‘#covid_19’, ‘#pandemic’)
Density	0.00845	0.00591	0.0139
Clustering coefficient	0.000614	0.000472	0.000863

A Java-based network analyzer tool, Gephi, is used to analyze the network. The tool supports finding the relations among the users and measuring the network's clustering coefficient. Next, we used the VADER Python library to get the sentiment of each tweet and classified each tweet into three classes: positive, negative, and neutral. Further, each user's reputation is measured based on the aggregate polarity. Trust is also calculated to predict whether users retweet, forward, or reject a message. Next, we applied the ROLI algorithm to find the top-T OLs in the network. We found the list of users who posted the most tweets or posts and whose tweets other people commented on in the network. These tweets contain valuable information and provide a powerful direction toward controlling COVID-19. Numerous users have liked, disliked, retweeted, and shared these tweets. Further, we calculated the entropy of each tweet based on the measured trust. Moreover, a filtering operation discards the tweets of users with low reputation scores, which reduces the overall entropy processing time. Finally, we obtained the records of users whose tweets are most retweeted using SNA measures. We report the reputation scores along with the SNA measures in Table 4, Table 5, and Table 6 for the three datasets.

Table 4

List of top-10 OLs along with their reputation score and other SNA measures for the Twitter dataset.

Node id	DC	Node id	CC	Node id	BC	Node id	PR	Node id	EC	Node id	Reputation
3,782,784	0.0353824	4,882,929	0.0099734	5,781,775	0.0504885	4,938,103	0.0038952	937,482	0.0144372	758,380	0.1276728
1,636,273	0.0353823	837,321	0.0099732	184,773	0.0504883	5,062,765	0.0038951	4,287,292	0.0144372	4,791,216	0.1276728
466,738	0.0353821	3,877,392	0.0099732	2,390,202	0.0504882	3,773,291	0.0038949	837,174	0.0144371	829,252	0.1276727
1,046,730	0.0353819	174,992	0.0099731	734,218	0.0504882	638,383	0.0038948	84,983	0.0144371	940,251	0.1276726
473,721	0.0353815	1,062,525	0.0099730	78,022	0.0504881	194,482	0.0038947	2,839,290	0.014437	3,936,037	0.1276726
5,254,646	0.0353811	494,775	0.0099730	519,287	0.0504880	2,784,929	0.0038946	3,921,043	0.0144369	7385	0.1276726
904,537	0.0353809	105,829	0.0099729	3,992,801	0.0504879	574,922	0.0038944	735,622	0.0144368	84,892	0.1276725
3,029,229	0.0353808	2,593,920	0.0099728	1,820,378	0.0504877	1,383,092	0.0038942	1,588,391	0.0144367	5,827,403	0.1276725
2,372,722	0.0353804	59,201	0.0099728	3,629,048	0.0504876	292,739	0.0038941	449,293	0.0144367	1,289,504	0.1276724
375,981	0.0353804	429,322	0.0099728	947,324	0.0504876	5,417,321	0.0038941	814,871	0.0144366	683,692	0.1276724

Table 5

List of top-10 OLs along with their reputation score and other SNA measures for the Instagram dataset.

Node id	DC	Node id	CC	Node id	BC	Node id	PR	Node id	EC	Node id	Reputation
3056	0.0954829	8402	0.0703421	17,392	0.0639235	19,048	0.0418937	8592	0.0390783	4011	0.0838588
10,154	0.0954829	738	0.0703421	9387	0.0639235	7283	0.0418937	5308	0.0390783	21,802	0.0838588
32,537	0.0954828	21,481	0.0703421	23,817	0.0639234	33,891	0.0418937	23,973	0.0390783	9820	0.0838588
2098	0.0954828	31,412	0.0703420	5927	0.0639234	491	0.0418936	12,094	0.0390783	17,391	0.0838588
812	0.0954828	1184	0.0703420	10,042	0.0639234	9382	0.0418936	5909	0.0390782	11,896	0.0838587
22,904	0.0954827	9823	0.0703420	22,893	0.0639234	17,495	0.0418936	32,984	0.0390782	36,003	0.0838587
4929	0.0954827	37,192	0.0703420	387	0.0639233	30,874	0.0418936	973	0.0390782	6298	0.0838587
1090	0.0954826	26,177	0.0703419	31,903	0.0639233	5983	0.0418935	128	0.0390782	25,009	0.0838586
25,709	0.0954826	2017	0.0703419	2851	0.0639233	24,981	0.0418935	31,983	0.0390781	14,892	0.0838586
8341	0.0954826	18,451	0.0703419	7389	0.0639232	13,722	0.0418935	5582	0.0390781	439	0.0838586

Table 6

List of top-10 OLs along with their reputation score and other SNA measures for the Reddit dataset.

Node id	DC	Node id	CC	Node id	BC	Node id	PR	Node id	EC	Node id	Reputation
1906	0.1073821	369	0.0852855	4329	0.1019903	6525	0.0753870	9241	0.0726944	8033	0.1080275
5648	0.1073821	6407	0.0852855	16,132	0.1019903	12,599	0.0753870	5481	0.0726944	7075	0.1080275
21,653	0.1073821	918	0.0852855	2621	0.1019903	3745	0.0753870	5575	0.0726944	11,575	0.1080275
1842	0.1073820	8212	0.0852855	11,921	0.1019902	14,492	0.0753870	942	0.0726943	18,284	0.1080275
13,654	0.1073820	13,734	0.0852854	20,729	0.1019902	5999	0.0753869	8219	0.0726943	13,057	0.1080274
5608	0.1073820	6566	0.0852854	1022	0.1019902	20,562	0.0753869	1926	0.0726943	5402	0.1080274
4251	0.1073820	6962	0.0852854	3799	0.1019902	2224	0.0753869	21,107	0.0726942	1099	0.1080274
10,648	0.1073819	19,048	0.0852854	407	0.1019901	8974	0.0753869	1601	0.0726942	12,056	0.1080274
1149	0.1073819	1529	0.0852854	11,089	0.1019901	10,425	0.0753869	180	0.0726942	5148	0.1080273
8457	0.1073819	8979	0.0852853	931	0.1019901	19,616	0.0753868	12,621	0.0726942	9236	0.1080273

In the above results, each table consists of twelve columns arranged in pairs: the first column of each pair gives the node id, and the adjacent column gives the corresponding SNA measure of that node. For example, in Table 4, the degree centrality of the top node #3782784 is 0.0353824, and the closeness centrality of the top node #4882929 is 0.0099734. Similarly, Table 5 and Table 6 identify the top-10 OLs and their SNA measures. The last column pair of each table lists the reputation of the top nodes in descending order for the Twitter, Instagram, and Reddit datasets, respectively, as per the proposed approach. These tables show the values of the standard SNA centrality measures and indicate that different nodes rank highest under different measures: because of the complex structure of the social network, the same node does not necessarily secure the top position for every centrality measure. In the literature, various techniques and methods have been proposed to discover OLs, and most researchers used only these SNA measures for analysis. Once the top-T (= 10) OLs are identified in the network, the next step is choosing the threshold value for the entropy test. Selecting a particular threshold value λ for declaring a tweet a rumor is a critical task. After various analyses and experiments, we chose λ = 0.95: if the entropy of a tweet is less than 0.95, the tweet is reported as a rumor; otherwise, OLs may forward the post, or add their comments and then forward it, to their followers and other users in the network.

Performance metrics for rumor controlling

To evaluate the performance and effectiveness of the proposed approach for COVID-19 rumor control, we use performance metrics based on the behavior of the oblivious node, diffuser node, and represser node. As mentioned earlier, rumor spreading is similar to disease spreading in the traditional SIR epidemic model. The effectiveness of the proposed approach depends on how quickly it identifies and controls the rumors in the network. Thus, we use three metrics to measure the approach's performance: diffuser degree, represser degree, and affected degree.

Diffuser degree

The diffuser degree describes the behavior of the diffuser nodes in the network. If the rumor spreading rate β is high, most nodes might be infected or influenced by the rumors, so the correct information must be transmitted in the network as early as possible to avoid a crisis. The diffuser degree is defined as the ratio between the total number of diffuser nodes and the total count of represser nodes, oblivious nodes, and diffuser nodes in the network at time t, as shown in Eq. (17). In Fig. 5, we can observe that as the count of OLs increases, the diffuser degree reduces gradually; however, because some spreader nodes hold a strong belief in the rumor, it cannot reach zero. Also, the proposed approach reduced the number of diffusers 26% faster than other SNA measures as the number of OLs increased gradually.

Fig. 5

Visualization of diffuser degree for proposed approach and standard SNA measures for Twitter, Instagram, and Reddit.

Represser degree

The represser degree describes the control exerted by the represser nodes in the network, because the represser nodes are eventually considered the OLs in the network. If the represser nodes establish a tweet's truthfulness as soon as possible, they can help stop the rumor from spreading. Thus, the represser degree is defined as the ratio between the total number of represser nodes and the total count of represser nodes, oblivious nodes, and diffuser nodes in the network at time t, as shown in Eq. (18). In Fig. 6, we can monitor the relationship between the represser degree and the diffuser degree over time t. As the represser nodes spread the truth about the rumor at the rate γ, the represser degree increases and the diffuser degree decreases with time. Again, the proposed approach performed better, spreading the tweet's veracity around 22% faster than other SNA measures.

Fig. 6

Visualization of represser degree for proposed approach and standard SNA measures for Twitter, Instagram, and Reddit.

Affected degree

The affected degree is a scale that describes the effect of OLs in the network. It measures the total number of users whom OLs influence. The affected degree is defined as the ratio between the total number of influenced nodes and the total count of represser nodes, oblivious nodes, diffuser nodes, and affected nodes in the network at time t, as shown in Eq. (19), where the numerator is the number of users influenced by the OLs. Further, from Fig. 7, we can infer that as represser nodes spread the truth about the rumor at the rate γ, the affected degree increases with time, i.e., more users are influenced by the OLs. The proposed approach produced better outcomes and impacted the users approximately 23% faster than other SNA measures.
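The three ratios defined in this section can be written compactly as below. This is a small sketch of the verbal definitions only; the argument names (`oblivious`, `diffusers`, `repressers`, `affected`) are placeholders for the node counts at time t and are not the symbols used in Eqs. (17)-(19).

```python
def diffuser_degree(oblivious, diffusers, repressers):
    """Share of diffuser nodes among oblivious, diffuser, and represser nodes."""
    total = oblivious + diffusers + repressers
    return diffusers / total if total else 0.0


def represser_degree(oblivious, diffusers, repressers):
    """Share of represser nodes among oblivious, diffuser, and represser nodes."""
    total = oblivious + diffusers + repressers
    return repressers / total if total else 0.0


def affected_degree(oblivious, diffusers, repressers, affected):
    """Share of OL-influenced nodes among all four node groups."""
    total = oblivious + diffusers + repressers + affected
    return affected / total if total else 0.0
```

With hypothetical counts of 700 oblivious, 200 diffuser, and 100 represser nodes, the diffuser degree is 0.2 and the represser degree is 0.1; adding 250 affected nodes gives an affected degree of 250/1250 = 0.2.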

Fig. 7

Visualization of affected degree for proposed approach and standard SNA measures for Twitter, Instagram, and Reddit.

Each OL has different followers, and rumor spreaders are also present in the network. So, the control of rumors among general users varies depending on the total number of followers and the network structure. Again, we can observe how the OLs influenced and controlled the rumors in all three networks compared with other SNA measures. As the number of OLs increases progressively, the control of OLs over the other users also spreads rapidly. In Fig. 8(a), the red fragment shows the influence of OLs on the Twitter dataset, while the other color fragments (purple, blue, sky blue, green, and yellow) show the effect of OLs from the lower to the higher transition state.

Fig. 8

Visual representation of OLs impact for (a) Twitter, (b) Instagram, and (c) Reddit dataset.

Similarly, Fig. 8(b) and (c) demonstrate the impact of OLs on the other datasets. We can observe that the effect of OLs becomes significantly higher as the number of OLs increases gradually in each network. Thus, they can verify the integrity of as many tweets as possible within a limited time interval. The red fragment in each part depicts the influence of OLs in the network.

Performance comparison of ROLI algorithm

Further, we compared the results with the other SNA measures to validate the ROLI algorithm's performance. We compared the impact of the proposed ROLI algorithm with multiple centrality measures, which are widely used to find the prominent users in a network. The biggest challenge with OSNs is the lack of ground truth about the key users. The various approaches use different contexts, network-dependent parameters, and theories, so they do not follow universal principles; standard centrality measures are therefore used to match and assess the results. We used four performance metrics, accuracy, precision, recall, and F1-score, to assess the approach. These metrics require four primary outcomes: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN). All these metrics can be calculated using the following formulas:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

In Fig. 9, we compare the outcomes of the proposed ROLI algorithm with respect to these performance metrics. For the experiments, we chose β = 0.0055 and γ = 0.0085, and all results are based on these parameters. We observe that the proposed algorithm provides an average of 91% accuracy, 93% precision, 95% recall, and 94% F1-score. Looking at the three datasets individually, the proposed ROLI method produces 90% accuracy, 97% precision, 92% recall, and 94% F1-score for the Twitter dataset. Similarly, for Instagram, 91% accuracy, 94% precision, 95% recall, and 95% F1-score are obtained; for the Reddit dataset, 92% accuracy, 97% precision, 94% recall, and 95% F1-score are achieved.
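For readers who want to reproduce the four metrics from raw counts, a minimal sketch using the standard definitions is given below. The function name `classification_metrics` and the counts in the usage note are hypothetical and are not taken from the experiments reported here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1-score from confusion counts.

    Ratios with an empty denominator are reported as 0.0 by convention.
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

As an illustration, with TP = 8, TN = 85, FP = 2, and FN = 5, the sketch yields an accuracy of 0.93, a precision of 0.8, a recall of 8/13, and an F1-score of 16/23.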

Fig. 9

Analysis of Performance metrics between Proposed ROLI and standard SNA measures for (a) Twitter, (b) Instagram, and (c) Reddit dataset.

Another significant aspect that affects the algorithm's usefulness is the execution time. The previous section discussed the algorithm's complexity, which is approximately O(n). We have considered only the time needed to discover the designated OLs from the network; our approach also performs extra work to compute reputation, trust, and entropy. Each method relies on its own parameters to estimate the OLs. In Fig. 10, we show the execution time required by each SNA measure and by the proposed ROLI algorithm. The analysis shows that the proposed algorithm needs less time than the other SNA measures for all three datasets. The proposed method takes only 73 min for the Twitter dataset, 34 min for the Instagram dataset, and 17 min for the Reddit dataset. Thus, we recommend the proposed algorithm for controlling COVID-19 rumors and for supporting strong beliefs and reputation on the social network.

Fig. 10

Execution time analysis of ROLI algorithm with standard SNA measures.

Theoretical implications of the proposed approach

Nowadays, most human decision-making is influenced by social media activities. OLs are vital and impactful for managing the various kinds of rumors and misinformation transmitted over OSNs. In this research, we have explored the power of OLs to control COVID-19 rumors. Thus, this research contributes to preventing rumors and hoaxes across various domains. First, we have defined a reputation-based OL detection algorithm that calculates the polarity of each tweet. Reputation is valuable in OSNs for determining a node's influence on others and for gaining loyalty. In previous studies, only a few approaches used reputation and trust to find OLs, and only in limited domains. The essence of this study is the effective use of reputation and trust. Also, the polarity of each tweet is measured to identify the population's sentiments about the COVID-19 epidemic. Second, we did not apply any data mining approach in this study. Although data mining approaches perform well in real-world applications and produce improved outcomes, they are not appropriate for OSNs due to the dynamic nature of the network. In this approach, we have used only software tools, statistical formulas, and methods to discover OLs and validate the integrity of a post. One of the principal merits of this approach is that it is beneficial for large datasets. As the number of users steadily increases, more users exchange their opinions and views, so each user can measure other users' reputations and degree of trust more accurately and precisely. Third, the proposed system is straightforward and does not include any multifaceted structure or formulation. For OL detection, we have used a voting score-based approach in which a user assigns its vote to the user having a higher reputation. Trust and polarity-based formulas are used to measure each post's entropy to validate the post.
Thus, this approach uses an elementary strategy with low complexity, whereas most previously developed methods involve complex, composite, and lengthy computations that are difficult to implement and understand. Fourth, this study advances information science by exploring and demonstrating the power of OLs to control COVID-19 rumors in the current pandemic situation. Also, the operational behavior of the proposed ROLI approach is analogous to the SIR epidemic model, which describes the spreading of disease in the real world. Such research also helps prevent the transmission of large amounts of misinformation in OSNs. To improve the trustworthiness of OSNs, it is essential to identify the set of users who have an immense impact on their followers. This research contributes to fulfilling this objective with higher accuracy and effectiveness.

Conclusion, limitation, and future scope

In the present pandemic condition, the COVID-19 disease has drastically affected the world. Social media's role and power are vital regarding COVID-19-related rumors and misinformation. Although the World Health Organization (WHO) and other official government organizations have already circulated various guidelines and control measures to avoid the disease, numerous sources on social media continue to spread rumors and misinformation [105,106]. Thus, controlling such rumors and misinformation is essential to protect public health. Since the use of Twitter, Instagram, and other social media has increased drastically after the onset of this pandemic, consumers have posted different information without checking the source's authenticity. In this research, we have proposed an approach that can control COVID-19-related rumors on social media as far as possible. We have extracted many tweets and posts from the Twitter, Instagram, and Reddit social networks for analysis. We have also fetched each tweet's user id, country, and time information to discover the most active users along with their region. After preprocessing the tweets, we used the VADER Python library to find each tweet's polarity score. The aggregated polarity score measures each user's reputation in the network. Next, we calculated the degree of trust, which is used to calculate each tweet's entropy. So, an entropy-based approach is used to verify the reality of the tweet's source and content. We have also presented the ROLI algorithm to find the list of OLs. We matched each tweet's entropy against the pre-decided threshold value and reported whether it was a rumor. The ROLI method averages 91% accuracy, 93% precision, 95% recall, and 94% F1-score in identifying OLs. It also reduced the number of diffusers 26% faster, spread the tweet's veracity around 22% faster, and influenced users around 23% faster than other SNA measures.
Thus, the experimental analysis demonstrated that the proposed approach performs better than standard SNA measures with respect to various performance metrics. Real-time monitoring of tweets, rumors, patterns, trends, and misinformation is required to avoid trouble. Such control can ensure that only verified and trusted information is transmitted, preventing dangerous consequences. One of the biggest challenges with social networks is the large dataset volume, since millions of users frequently post millions of items [107]. In the future, we might design a model that dynamically supervises tweets and their origin under the constraints that make users more vulnerable to rumors. We would also attempt to find more advanced techniques and measuring tools to visualize and interpret OL recognition techniques that eventually help avoid rumors and misinformation by analyzing a real-time dynamic dataset. The work might be extended to find OLs using users' posts such as images, audio, and videos. Other social networks, like Facebook, YouTube, Flickr, Instagram, and others, can also be used to extract datasets for finding OLs. A new model might be designed that dynamically oversees the tweets and their origin under the supervision of OLs so that other people become more aware of COVID-19 rumors. More advanced deep-learning-based models can also be combined with users' multi-relational characteristics, such as response time, geographical location, and domain of interest, as upcoming directions for computing [108].

Author contribution

The author contributed to the study conception and design, material preparation, data collection and analysis.

Funding

The author declares that no funds, grants, or other support were received during the preparation of this manuscript.

Data availability

Not applicable.

Declaration of competing interest

The author declares that there are no known conflicts of interest associated with this publication. There has been no significant financial support for this work that could have influenced its outcome.

Input: 1. Rumor threshold λ

2. Total m number of tweets posted by n number of users

Output: Decision about rumor spreading

Steps:

1. Apply the ROLI algorithm to find the top-T OLs.

2. For ∀ t in m do

Preprocess t by removing repeated characters, hashtags, and URLs.

3. End for;

4. For ∀ j in n do

Calculate the polarity-based reputation r_j(t).

Compute the degree of trust T_j.

5. End for;

6. For ∀ t in m do

Measure the entropy E_t(x_i) of each tweet.

if (E_t(x_i) < λ)

Report the tweet as a rumor and discontinue the rumor spreading.

Else

Transmit the tweet normally.

7. End if;

8. End for;
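The preprocessing step and the threshold test in the listing above can be sketched in Python as follows. This is an illustrative sketch only: the regular expressions, the choice to shorten runs of three or more identical characters to two, and the function names `preprocess` and `is_rumor` are assumptions, and the entropy value is taken as an already computed input because its formula is defined elsewhere in the paper.

```python
import re

# Assumed patterns; the exact cleaning rules of the study are not given here.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
HASHTAG_RE = re.compile(r"#[\w-]+")
REPEAT_RE = re.compile(r"(.)\1{2,}")  # three or more identical characters


def preprocess(text):
    """Remove URLs and hashtags, shorten repeated characters, tidy spaces."""
    text = URL_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1\1", text)  # e.g. "sooooo" becomes "soo"
    return " ".join(text.split())


def is_rumor(entropy, threshold=0.95):
    """Apply the decision rule: entropy below the threshold means rumor."""
    return entropy < threshold
```

For example, `preprocess("Staaaay home!!! #covid_19 https://t.co/abc")` returns `"Staay home!!"`, and a tweet with a precomputed entropy of 0.80 would be reported as a rumor under λ = 0.95.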

Input: The total A number of users in the network

Output: Top-T OLs

Steps:

1. Identify the initial reputation r_x of each node

2. Assign (vs_x, v_x) ← (r_x, 1) to each node x

3. O [] ← Set of OLs

4. While (i ≤ T)

5. For x in A do

6. if x ∈ O:

7. set vs_x ← 0 and v_x ← 0

else if (x ∈ N(j) and j ∈ O)

8. set vs_x ← ∑_{k=1}^{N(x)} r_k and v_x ← (v_x − f)

else

9. set vs_x ← r_x and v_x ← 1

end if;

10. Find the node j with max (vs_j) and set O ← O ∪ {j} and A ← A − {j}

11. i ← i + 1

end for;

end while;

12. Return the list of top-T OLs in O

13. End;
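The listing above leaves open how the voting ability v_x enters the voting score, so the following Python sketch should be read as one possible interpretation rather than the implementation used in this study. It assumes a VoteRank-style rule: each candidate's score is the sum of its neighbours' reputations weighted by their remaining voting ability, an elected OL stops voting, and its neighbours' voting ability is reduced by f. The default f = 1 / (average degree), the undirected edge list, and the function name `roli_sketch` are all assumptions.

```python
from collections import defaultdict


def roli_sketch(edges, reputation, top_t, f=None):
    """Select top_t opinion leaders with a reputation-weighted voting rule.

    edges: iterable of (u, v) pairs, treated as undirected (assumption).
    reputation: dict mapping node id to its reputation r_x.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    nodes = set(reputation) | set(neighbors)
    if not nodes:
        return []
    if f is None:
        # Assumed decrement, borrowed from VoteRank: 1 / average degree.
        avg_degree = sum(len(neighbors[x]) for x in nodes) / len(nodes)
        f = 1.0 / avg_degree if avg_degree else 1.0

    ability = {x: 1.0 for x in nodes}  # v_x, every node starts with one vote
    leaders = []                       # O, the selected OLs in election order
    candidates = set(nodes)            # A, nodes not yet selected
    while len(leaders) < top_t and candidates:
        # vs_x: votes received from neighbours, weighted by reputation.
        score = {
            x: sum(ability[k] * reputation.get(k, 0.0) for k in neighbors[x])
            for x in candidates
        }
        # Sorting first makes ties resolve to the smallest node id.
        best = max(sorted(candidates), key=lambda x: score[x])
        leaders.append(best)
        candidates.remove(best)
        ability[best] = 0.0            # an elected OL no longer votes
        for k in neighbors[best]:
            ability[k] = max(0.0, ability[k] - f)  # weaken nearby voters
    return leaders
```

Reducing the voting ability of an elected OL's neighbours discourages the next OL from being chosen in the same neighbourhood, which is the usual motivation for this kind of voting scheme.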
