Hossein Shafiei, Aresh Dadlani.
Abstract
Online social networks have attracted billions of active users over the past decade. These systems play an integral role in the everyday life of many people around the world. As such, these platforms are also attractive targets for misinformation, hoaxes, and fake news campaigns, which usually rely on social trolls and/or social bots for propagation. Detecting so-called social trolls on these platforms is challenging because of their large scale and dynamic nature: users' data are generated and collected at a rate of billions of records per hour. In this paper, we focus on fickle trolls, i.e., a special type of trolling activity in which the trolls change their identity frequently to maximize their social relations. This kind of trolling activity may irritate users and may also pose a serious threat to their privacy. To the best of our knowledge, this is the first work that introduces mechanisms to detect these trolls. In particular, we discuss and analyze troll detection mechanisms on different scales. We prove that the time complexity of the centralized single-machine detection algorithm is $O(n^3)$, which is slow and impractical for early troll detection in large-scale social platforms comprising billions of users. We also prove that the streaming approach, in which data is gradually fed to the system, is not practical in many real-world scenarios. In light of these shortcomings, we then propose a massively parallel detection approach. Rigorous evaluations confirm that our proposed method is at least six times faster than conventional parallel approaches.
Keywords: Large-scale networks; Online social networks; Troll detection
Year: 2022 PMID: 35223368 PMCID: PMC8857750 DOI: 10.1186/s40537-022-00572-9
Source DB: PubMed Journal: J Big Data ISSN: 2196-1115
Comparison of various troll and bot detection approaches discussed in the literature
| Ref. | Dataset | Technique | Limitation(s) |
|---|---|---|---|
| [ | | Troll vulnerability metrics that predict whether a post is likely to become the victim of a troll attack. | Focuses on the contents of posts and the activity history of users; does not consider trolling behaviour directly. |
| [ | | Holistic approach that considers various features such as sentiment analysis and the time and frequency of actions. | Slow, since it considers a multitude of features; also suffers from false-positive detections. |
| [ | | Multi-feature analysis that considers the timing of tweets and their contents. | Findings are specific to the dataset used, e.g., trolls' use of a formal tone instead of slang and slurs. |
| [ | | Classification based on multiple behavioural and content-based features such as wording, hashtags, or mentions. | Suffers from a high false-positive rate and concentrates only on behaviours extracted from one specific dataset. |
| [ | | Classification based on bot detection using Botometer and geolocation data. | Inaccuracy of Botometer, and the ability of trolls and bots to mask their real location. |
Fig. 1. Changes in the number of followers per week for the case study
Fig. 2. An example of an attribute graph
Fig. 3. Another example of an attribute graph
Fig. 4. The concept of the streaming approach
Fig. 5. Various examples of node placement inside machines
Fig. 6. Evaluation of the single-machine approach
Fig. 7. Evaluation of the streaming approach
Fig. 8. Evaluation of the proposed massively parallel approach
Comparison of our approach with PowerGraph. Here, M represents the number of machines.
| Approach | Metric | | | |
|---|---|---|---|---|
| PowerGraph | Exec. time (s) | 749 | 534 | 313 |
| | Max. memory (GB) | 743 | 612 | 509 |
| Proposed approach | Exec. time (s) | 107 | 88 | 59 |
| | Max. memory (GB) | 340 | 229 | 159 |
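
The ratios implied by the comparison with PowerGraph can be checked with a short calculation. The sketch below is only illustrative arithmetic on the tabulated values; the dictionary names and the `ratios` helper are hypothetical and not part of the paper. The three columns are assumed to correspond to three values of M, which are not given in this excerpt.

```python
# Values copied from the PowerGraph comparison table.
# Each list holds one entry per column (one per machine count M;
# the M values themselves are not available in this excerpt).
exec_time_s = {
    "PowerGraph": [749, 534, 313],
    "Proposed approach": [107, 88, 59],
}
max_memory_gb = {
    "PowerGraph": [743, 612, 509],
    "Proposed approach": [340, 229, 159],
}


def ratios(baseline, proposed):
    """Column-wise baseline/proposed ratio, rounded to two decimals."""
    return [round(b / p, 2) for b, p in zip(baseline, proposed)]


# How many times faster the proposed approach is in each column.
speedup = ratios(exec_time_s["PowerGraph"], exec_time_s["Proposed approach"])

# How many times less peak memory the proposed approach uses in each column.
memory_saving = ratios(max_memory_gb["PowerGraph"], max_memory_gb["Proposed approach"])

print("Execution-time ratio per column:", speedup)
print("Peak-memory ratio per column:", memory_saving)
```

For the runs listed in this table, the execution-time ratio ranges from roughly 5.3 to 7.0 and the peak-memory ratio from roughly 2.2 to 3.2.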