Lokesh Jain
Abstract
During the ongoing COVID-19 pandemic, people have spread various COVID-19-related rumors and hoaxes through online social networks (OSN), with a negative influence on society. This research proposes an approach to controlling COVID-19 rumors through the power of opinion leaders (OLs) in OSN. The process is divided into two phases. The first phase describes the novel Reputation-based Opinion Leader Identification (ROLI) algorithm, including a unique voting method to identify the top-T OLs in the OSN. The second phase describes the technique used to measure the aggregated polarity score of each posted tweet/post and to compute each user's reputation. The empirical reputation is used to calculate the user's trust, the post's entropy, and its veracity. If the experimental entropy of a post is lower than the empirical threshold value, the post is likely to be categorized as a rumor. The proposed approach was validated on the Twitter, Instagram, and Reddit social networks. The ROLI algorithm achieves 91% accuracy, 93% precision, 95% recall, and a 94% F1-score compared with other Social Network Analysis (SNA) measures for finding OLs in OSN. The rumor-controlling effectiveness and efficiency of the proposed approach are also estimated using three standard metrics (affected degree, represser degree, and diffuser degree), with improvements of 26%, 22%, and 23%, respectively. The results show that OLs have a significant influence in controlling COVID-19 rumors.
Keywords: COVID-19 rumors; Entropy; Health misinformation; Online social network; Opinion leader; Reputation
Year: 2022 PMID: 35765463 PMCID: PMC9222031 DOI: 10.1016/j.techsoc.2022.102048
Source DB: PubMed Journal: Technol Soc ISSN: 0160-791X
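
The precision, recall, and F1-score reported in the abstract can be cross-checked against each other, since F1 is the harmonic mean of precision and recall. The snippet below is a minimal sketch of that check; the function name `f1_score` is introduced here for illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract for the ROLI algorithm.
reported_precision = 0.93
reported_recall = 0.95

print(round(f1_score(reported_precision, reported_recall), 2))  # 0.94
```

The result agrees with the 94% F1-score stated in the abstract.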
Fig. 1 Human-influenced viewpoint representation.
Fig. 2 Network representation with diffuser, represser, and oblivious node.
Brief Description of OL's recognition approaches.
| Reference | Dataset | Details | Supports | Constraints |
|---|---|---|---|---|
| [ | Reviews of Apple's iPhone | The relationship in the user's cluster is calculated based on various characteristics, and also OLs are detected using a text mining approach. | Straight-forward; Support in the recommended system. | Limited dataset; Used in online communities and forums only. |
| [ | Epinions dataset | The reliable users are extracted by eliminating the unreliable user-based capacity-first maximum flow procedure. | Suitable for the recommended system; Control authenticity. | Restricted connections used; Only applied trust metrics. |
| [ | Synthesized dataset | The propagative decision, spreading speed, and the count of the number of adopters determine the efficient OLs. | Support for marketing operations; Significant in the indirect proposal system. | Need initial adopters for spreading information; Bounded centralities. |
| [ | Educational blogs dataset, Parent-child forum dataset | User's status, links, behaviors, and response time are measured to find the OLs. | Reinforced online blogs; Reduced execution time. | Restricted scope and domain; Applied Topic-specific constraints. |
| [ | Amazon.com dataset | Domain-sensitive key users were found based on the user's effectiveness, expertise, and transition matrix in online forums. | Assist with promotions; Support authority. | Needed initial user's concern as input; Used in bounded and regulated space. |
| [ | Facebook dataset, citation network dataset (Google scholar & DBLP) | A mathematical formula is derived from creating a series of OLs' actions and influence. A universal series is also designed to normalize the result. | Found affected path; Inflated accuracy. | The entire procedure is complex and requires lots of computation, Applicable to the limited domain. |
| [ | Web-based Chinese stock Forum dataset | Clustering algorithms and the posted text are used to identify users' actions. Sentiment analysis-based case study discussed to find price trends. | Straight-forward and easily implementable; Assist in marketing. | Limited to online forums; Only PageRank centrality is used for comparison. |
| [ | Synthesized dataset | Popularity and competency scores are measured to find the chances of the OLs based on the specific domain. | Easily implementable; High accuracy. | Vulnerable to the topic of interest; Not focused on retweets. |
| [ | Epinions dataset | A novel approach used the trust value to find the OLs. Hybrid centrality measures are used to calculate trust values. | Simple calculation; Better results for hybrid centralities. | Few trust metrics like trust spread, trust maturity, and trust penetration are used for comparison. |
| [ | Mobile01 Forum | A new algorithm suggested finding the OL by reducing candidates using overlapping communities and user relationships. | More scalable; Reduced overlapping influence issues. | Limited parameters for community detection; Applicable in online forums only. |
| [ | turnbackhoax.id | The proposed approach focused on edge and centrality power. Edge power is used to find the relationship, while centrality power is used to decide each one's proportion to find the OLs. | Suitable for information spreading; Presented an agreement algorithm. | Limited relationship for establishing connections; Lesser number of evaluation metrics. |
| [ | Sina micro-blog dataset | The topic of interest and user's transmission features are identified to define the user's control. The IC-based propagation model simulation identifies the OLs based on the number of infected nodes. | Significant for information transmission; Modified IC-based propagation model used. | Bounded scope; Fewer metrics used; Network and node pre-knowledge required. |
| [ | Synthesized dataset | A trust model is built using fuzzy logic to measure reputation. The critical user with the highest status is selected as an OL. | Implement human reasoning; Produce results with more fidelity. | A limited number of fuzzy rules are used. |
| [ | Chinese Sina BBS | An augmented algorithm based on domain sensitivity is suggested for cluster identification, while temporal attributes are used to compute the OL's effect over time. | Easily implementable; Better outcomes over PageRank. | Primary domain and topic information required; Outcomes compared with the PageRank algorithm. |
| [ | Facebook, Twitter, Google+ | An innovative, dynamic model is proposed based on discrete-time. A key user is selected depending on the equivalent opinion vector of the tightly connected elements. | Unique vigorous, dynamic model; Highly efficient Clustering outcome. | Complicated to understand; Not handling overlapping cliques. |
| [ | | A new rank of centrality measure was invented based on user interest and specialty on some topic. OLs are selected based on maximum centrality. | New Rank centrality is defined; No need for a prior relationship. | Limited topic considered; Tweet's monitoring needed to check outcomes. |
| [ | Local Motor community | Statistical components such as multi-featured are used to find the OLs. Also, the value and interest parameters are used to upgrade accuracy. | Reliable; Simple; Easy to implement. | Shortage of promotional degree, Restricted dataset. |
| [ | slashdot dataset | Local and global OLs addressed based on the upgraded firefly algorithm and attractiveness score. | Acknowledge easily; Higher performance. | Static dataset; Few features are used for measurement. |
| [ | travel community dataset | User's actions and their effects are used to acknowledge consumer's decisions. OLs are selected based on action-specific attributes. | Integrate SNA with Virtual travel groups; Significantly measured influencer affect. | Bounded dataset; Limited parameters are used for computation. |
| [ | Wiki-vote and synthesized datasets | An upgraded whale optimization algorithm addressed to find OLs based on various objective functions and prestige. Also, an adjoining nodes-based procedure is used to identify the clique. | Highly optimized results; Unique application of the nature-inspired algorithm. | Lesser number of datasets are used; Differentiate outcomes with few algorithms. |
| - | Higgs Boson data from Twitter | The Louvain method is implemented for community detection. Next, a betweenness centrality-based approach is used to find OLs. | Straightforward; Easily implementable. | Limited centralities used; Implemented on few datasets. |
| [ | Wiki-vote and Bitcoin OTC trust weighted network | Conditional probability-dependent groups of OLs are identified using a game theory approach. For individual users, a Shapley value is calculated to find the payoff in each group. Groups with maximum synergy are considered for selection. | Produced highly accurate outcomes; Powerful application of game theory. | The user's initial trust score is required; Specific elements are used for measuring distance. |
Fig. 3 Flow chart of the proposed approach.
Fig. 4 ROLI Algorithm illustration (a) network structure after the first iteration, (b) network structure after the second iteration.
Parameters values using ANOVA model.
| m*j | Parameters | | | | Degree of Freedom (DF) | Mean Square (MS) | Sum of Squares (SS) | F-statistic | P-test |
|---|---|---|---|---|---|---|---|---|---|
| (50 K*100) | 0.20 | 0.10 | 0.10 | 0.50 | 3 | 7463.56 | 285,633 | 76.82 | 0.000 |
| | 0.40 | 0.25 | 0.20 | 0.60 | 3 | 7256.73 | 265,358 | 69.76 | 0.000 |
| | 0.60 | 0.50 | 0.40 | 0.70 | 3 | 8362.90 | 342,198 | 85.64 | 0.001 |
| | 0.80 | 0.85 | 0.60 | 0.80 | 3 | 9802.21 | 396,429 | 86.72 | 0.002 |
| | | | | | 3 | 5609.33 | 174,432 | 66.97 | |
| (100 K*100) | 0.20 | 0.10 | 0.10 | 0.50 | 3 | 2520.32 | 137,658 | 53.66 | 0.000 |
| | 0.40 | 0.25 | 0.20 | 0.60 | 3 | 2867.91 | 163,908 | 57.28 | 0.000 |
| | 0.60 | 0.50 | 0.40 | 0.70 | 3 | 2583.54 | 140,779 | 54.96 | 0.000 |
| | 0.80 | 0.85 | 0.60 | 0.80 | 3 | 2664.11 | 117,758 | 52.51 | 0.001 |
| | | | | | 3 | 2232.55 | 108,950 | 50.24 | |
| (200 K*100) | 0.20 | 0.10 | 0.10 | 0.50 | 3 | 985.33 | 96,743 | 42.88 | 0.000 |
| | 0.40 | 0.25 | 0.20 | 0.60 | 3 | 736.70 | 85,351 | 41.17 | 0.001 |
| | 0.60 | 0.50 | 0.40 | 0.70 | 3 | 655.33 | 83,762 | 38.94 | 0.002 |
| | 0.80 | 0.85 | 0.60 | 0.80 | 3 | 715.51 | 92,154 | 43.25 | 0.003 |
| | | | | | 3 | 648.84 | 73,279 | 39.94 | |
| (500 K*100) | 0.20 | 0.10 | 0.10 | 0.50 | 3 | 1764.42 | 106,529 | 47.31 | 0.000 |
| | 0.40 | 0.25 | 0.20 | 0.60 | 3 | 1699.37 | 98,538 | 46.15 | 0.000 |
| | 0.60 | 0.50 | 0.40 | 0.70 | 3 | 1542.21 | 97,244 | 45.86 | 0.001 |
| | 0.80 | 0.85 | 0.60 | 0.80 | 3 | 1621.58 | 98,271 | 46.77 | 0.002 |
| | | | | | 3 | 1334.94 | 86,345 | 44.16 | |
Datasets statistical description.
| Dataset (January 1- March 30, 2020) | Statistics | ||
|---|---|---|---|
| Total number of tweets | 65.3 M | 46.8 K | 25.7 K |
| % of the tweets in English | 71.2% | 89.3% | 96.4% |
| % of tweets in other and regional languages | 28.8% | 10.7% | 5.6% |
| % of verified accounts | 8.4% | 23.5% | 64.8% |
| Total number of participating countries | 173 | 148 | 94 |
| Total number of searched keywords | 5 (‘Covid19’, ‘coronavirus’, ‘#2019-ncov’, ‘#covid_19’, ‘#pandemic’) | 5 (‘Covid19’, ‘coronavirus’, ‘#2019-ncov’, ‘#covid_19’, ‘#pandemic’) | 5 (‘Covid19’, ‘coronavirus’, ‘#2019-ncov’, ‘#covid_19’, ‘#pandemic’) |
| Density | 0.00845 | 0.00591 | 0.0139 |
| Clustering coefficient | 0.000614 | 0.000472 | 0.000863 |
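
The density and clustering coefficient rows can be illustrated on a toy graph. The sketch below assumes an undirected, unweighted network stored as a dictionary of neighbor sets, and uses the usual textbook definitions (density as the fraction of possible edges present, clustering as the average local clustering coefficient). Whether the paper uses exactly these variants is not stated in this section, and the function names and the toy graph are illustrative only.

```python
from itertools import combinations


def density(adj: dict) -> float:
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    if n < 2:
        return 0.0
    edges = sum(len(nbrs) for nbrs in adj.values()) / 2  # each edge is stored twice
    return 2 * edges / (n * (n - 1))


def average_clustering(adj: dict) -> float:
    """Mean over all nodes of the share of neighbor pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)


# Toy graph: a triangle a-b-c with one extra node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(round(density(toy), 4))
print(round(average_clustering(toy), 4))
```

Values as small as those in the table indicate very sparse networks in which a user's neighbors are rarely connected to each other.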
List of top-10 OLs along with their reputation score and other SNA measures for the Twitter dataset.
| Node id | DC | Node id | CC | Node id | BC | Node id | PR | Node id | EC | Node id | Reputation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3,782,784 | 0.0353824 | 4,882,929 | 0.0099734 | 5,781,775 | 0.0504885 | 4,938,103 | 0.0038952 | 937,482 | 0.0144372 | ||
| 1,636,273 | 0.0353823 | 837,321 | 0.0099732 | 184,773 | 0.0504883 | 5,062,765 | 0.0038951 | 4,287,292 | 0.0144372 | ||
| 466,738 | 0.0353821 | 3,877,392 | 0.0099732 | 2,390,202 | 0.0504882 | 3,773,291 | 0.0038949 | 837,174 | 0.0144371 | ||
| 1,046,730 | 0.0353819 | 174,992 | 0.0099731 | 734,218 | 0.0504882 | 638,383 | 0.0038948 | 84,983 | 0.0144371 | ||
| 473,721 | 0.0353815 | 1,062,525 | 0.0099730 | 78,022 | 0.0504881 | 194,482 | 0.0038947 | 2,839,290 | 0.014437 | ||
| 5,254,646 | 0.0353811 | 494,775 | 0.0099730 | 519,287 | 0.0504880 | 2,784,929 | 0.0038946 | 3,921,043 | 0.0144369 | ||
| 904,537 | 0.0353809 | 105,829 | 0.0099729 | 3,992,801 | 0.0504879 | 574,922 | 0.0038944 | 735,622 | 0.0144368 | ||
| 3,029,229 | 0.0353808 | 2,593,920 | 0.0099728 | 1,820,378 | 0.0504877 | 1,383,092 | 0.0038942 | 1,588,391 | 0.0144367 | ||
| 2,372,722 | 0.0353804 | 59,201 | 0.0099728 | 3,629,048 | 0.0504876 | 292,739 | 0.0038941 | 449,293 | 0.0144367 | ||
| 375,981 | 0.0353804 | 429,322 | 0.0099728 | 947,324 | 0.0504876 | 5,417,321 | 0.0038941 | 814,871 | 0.0144366 |
List of top-10 OLs along with their reputation score and other SNA measures for the Instagram dataset.
| Node id | DC | Node id | CC | Node id | BC | Node id | PR | Node id | EC | Node id | Reputation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3056 | 0.0954829 | 8402 | 0.0703421 | 17,392 | 0.0639235 | 19,048 | 0.0418937 | 8592 | 0.0390783 | ||
| 10,154 | 0.0954829 | 738 | 0.0703421 | 9387 | 0.0639235 | 7283 | 0.0418937 | 5308 | 0.0390783 | ||
| 32,537 | 0.0954828 | 21,481 | 0.0703421 | 23,817 | 0.0639234 | 33,891 | 0.0418937 | 23,973 | 0.0390783 | ||
| 2098 | 0.0954828 | 31,412 | 0.0703420 | 5927 | 0.0639234 | 491 | 0.0418936 | 12,094 | 0.0390783 | ||
| 812 | 0.0954828 | 1184 | 0.0703420 | 10,042 | 0.0639234 | 9382 | 0.0418936 | 5909 | 0.0390782 | ||
| 22,904 | 0.0954827 | 9823 | 0.0703420 | 22,893 | 0.0639234 | 17,495 | 0.0418936 | 32,984 | 0.0390782 | ||
| 4929 | 0.0954827 | 37,192 | 0.0703420 | 387 | 0.0639233 | 30,874 | 0.0418936 | 973 | 0.0390782 | ||
| 1090 | 0.0954826 | 26,177 | 0.0703419 | 31,903 | 0.0639233 | 5983 | 0.0418935 | 128 | 0.0390782 | ||
| 25,709 | 0.0954826 | 2017 | 0.0703419 | 2851 | 0.0639233 | 24,981 | 0.0418935 | 31,983 | 0.0390781 | ||
| 8341 | 0.0954826 | 18,451 | 0.0703419 | 7389 | 0.0639232 | 13,722 | 0.0418935 | 5582 | 0.0390781 |
List of top-10 OLs along with their reputation score and other SNA measures for the Reddit dataset.
| Node id | DC | Node id | CC | Node id | BC | Node id | PR | Node id | EC | Node id | Reputation |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1906 | 0.1073821 | 369 | 0.0852855 | 4329 | 0.1019903 | 6525 | 0.0753870 | 9241 | 0.0726944 | ||
| 5648 | 0.1073821 | 6407 | 0.0852855 | 16,132 | 0.1019903 | 12,599 | 0.0753870 | 5481 | 0.0726944 | ||
| 21,653 | 0.1073821 | 918 | 0.0852855 | 2621 | 0.1019903 | 3745 | 0.0753870 | 5575 | 0.0726944 | ||
| 1842 | 0.1073820 | 8212 | 0.0852855 | 11,921 | 0.1019902 | 14,492 | 0.0753870 | 942 | 0.0726943 | ||
| 13,654 | 0.1073820 | 13,734 | 0.0852854 | 20,729 | 0.1019902 | 5999 | 0.0753869 | 8219 | 0.0726943 | ||
| 5608 | 0.1073820 | 6566 | 0.0852854 | 1022 | 0.1019902 | 20,562 | 0.0753869 | 1926 | 0.0726943 | ||
| 4251 | 0.1073820 | 6962 | 0.0852854 | 3799 | 0.1019902 | 2224 | 0.0753869 | 21,107 | 0.0726942 | ||
| 10,648 | 0.1073819 | 19,048 | 0.0852854 | 407 | 0.1019901 | 8974 | 0.0753869 | 1601 | 0.0726942 | ||
| 1149 | 0.1073819 | 1529 | 0.0852854 | 11,089 | 0.1019901 | 10,425 | 0.0753869 | 180 | 0.0726942 | ||
| 8457 | 0.1073819 | 8979 | 0.0852853 | 931 | 0.1019901 | 19,616 | 0.0753868 | 12,621 | 0.0726942 |
Fig. 5 Visualization of diffuser degree for proposed approach and standard SNA measures for Twitter, Instagram, and Reddit.
Fig. 6 Visualization of represser degree for proposed approach and standard SNA measures for Twitter, Instagram, and Reddit.
Fig. 7 Visualization of affected degree for proposed approach and standard SNA measures for Twitter, Instagram, and Reddit.
Fig. 8 Visual representation of OLs impact for (a) Twitter, (b) Instagram, and (c) Reddit dataset.
Fig. 9 Analysis of performance metrics between proposed ROLI and standard SNA measures for (a) Twitter, (b) Instagram, and (c) Reddit dataset.
Fig. 10 Execution time analysis of ROLI algorithm with standard SNA measures.
| Input: 1. Rumor threshold λ |
| 2. A total of m tweets posted by n users |
| Output: Decision about rumor spreading |
| Steps: |
| 1. Apply the ROLI algorithm to find the top-T OLs. |
| 2. For ∀ t in m do |
| Preprocess t by removing repeated characters, hashtags, and URLs. |
| 3. End for; |
| 4. For ∀ j in n do |
| Calculate the polarity-based reputation |
| Compute the degree of trust |
| 5. End for; |
| 6. For ∀ t in m do |
| Measure the entropy Et of tweet t |
| if (Et < λ) |
| Report the tweet as a rumor and discontinue the rumor spreading. |
| Else |
| Transmit the tweet normally. |
| 7. End if; |
| 8. End for; |
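
The preprocessing and threshold steps above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the entropy formula is not given in this section, so the sketch takes it as a caller-supplied function (`entropy_of`, a hypothetical name), and the regular expressions used for cleaning are assumptions about what "repeated characters, hashtags, and URLs" means in practice.

```python
import re
from typing import Callable, Iterable


def preprocess(text: str) -> str:
    """Strip URLs and hashtags, then collapse runs of 3+ identical characters to 2."""
    text = re.sub(r"https?://\S+", "", text)      # assumed URL pattern
    text = re.sub(r"#\S+", "", text)              # assumed hashtag pattern
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)    # "Sooooo" -> "Soo"
    return " ".join(text.split())                 # normalize whitespace


def filter_rumors(
    tweets: Iterable[str],
    entropy_of: Callable[[str], float],  # placeholder for the paper's entropy measure
    threshold: float,                    # rumor threshold (lambda in the pseudocode)
) -> list:
    """Label each cleaned tweet 'rumor' if its entropy is below the threshold."""
    decisions = []
    for tweet in tweets:
        cleaned = preprocess(tweet)
        label = "rumor" if entropy_of(cleaned) < threshold else "transmit"
        decisions.append((cleaned, label))
    return decisions


# Hypothetical entropy scores, used only to exercise the control flow.
made_up_entropy = {"Soo scary now": 0.2, "Wash your hands": 0.9}
tweets = ["Sooooo scary #covid19 http://x.co/abc now", "Wash your hands"]
for cleaned, label in filter_rumors(tweets, made_up_entropy.get, threshold=0.5):
    print(cleaned, "->", label)
```

Passing the entropy measure in as a function keeps the decision rule (entropy below the threshold means rumor) separate from the reputation and trust calculations that feed it.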
| Input: The total A number of users in the network |
| Output: Top-T OLs |
| Steps: |
| 1. Identify the initial reputation |
| 2. Assign the value to ( |
| 3. O [] ← Set of OLs |
| 4. While (i ≤ T) |
| 5. For x in A do |
| 6. if x ∈ O: |
| 7. set |
| else if (x ∈ N(j) and j ∈ O) |
| 8. set |
| Else |
| 9. set |
| end if; |
| 10. Find the node j with max ( |
| 11. i ← i+1 |
| end for; |
| end while; |
| 12. Return the list of top-T OLs in O |
| 13. End; |
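
The iterative selection in the steps above can be sketched in code. The update rules in steps 2 and 7 to 10 are truncated in this text, so the sketch below fills them with explicit assumptions: a node already chosen as an OL gets a score of zero, a neighbor of a chosen OL has its reputation multiplied by a `discount` factor, and every other node keeps its initial reputation. These rules, the `discount` parameter, and all function and variable names are hypothetical stand-ins and should not be read as the ROLI algorithm itself.

```python
def select_opinion_leaders(adjacency: dict, reputation: dict, top_t: int,
                           discount: float = 0.5) -> list:
    """Greedy top-T selection with an assumed neighbor-discount rule."""
    leaders = []
    score = dict(reputation)  # working copy of the initial reputation
    while len(leaders) < min(top_t, len(adjacency)):
        for node in adjacency:
            if node in leaders:
                score[node] = 0.0                           # assumed: chosen OLs drop out
            elif any(nb in leaders for nb in adjacency[node]):
                score[node] = reputation[node] * discount   # assumed: neighbors are damped
            else:
                score[node] = reputation[node]              # assumed: others unchanged
        candidates = [n for n in adjacency if n not in leaders]
        leaders.append(max(candidates, key=lambda n: score[n]))
    return leaders


# Toy network: a is linked to b and c; d is linked to e.
adjacency = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": ["e"], "e": ["d"]}
reputation = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.6, "e": 0.2}

print(select_opinion_leaders(adjacency, reputation, top_t=2))  # ['a', 'd']
```

With the discount in place, the second pick moves from b (a direct neighbor of the first leader) to d, which spreads the selected leaders across the network rather than clustering them around one hub.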