Mosleh Hmoud Al-Adhaileh, Theyazn H H Aldhyani, Ans D Alghamdi.
Abstract
This paper focuses on detecting trolls among reviewers and users in online discussions and link sharing on social news aggregators such as Reddit. Trolls are treated as a subset of suspicious reviewers. A troll reviewer is distinguished from an ordinary reviewer by applying sentiment analysis and deep learning techniques to identify the sentiment of their posts. Machine learning and lexicon-based approaches can also be used for sentiment analysis. The novelty of the proposed system is that it applies a convolutional neural network integrated with a bidirectional long short-term memory (CNN-BiLSTM) model to detect troll reviewers in online discussions, using a standard troll reviewer dataset collected from the Reddit social media platform. Two experiments were carried out: the first was based on text data (sentiment analysis), and the second was based on numerical data (10 attributes) extracted from the dataset. The CNN-BiLSTM model achieved 97% accuracy using text data and 100% accuracy using numerical data. In our analysis, the model gave better results than the methods it was compared against.
Year: 2022 PMID: 35747397 PMCID: PMC9213121 DOI: 10.1155/2022/4637594
Source DB: PubMed Journal: Appl Bionics Biomech ISSN: 1176-2322 Impact factor: 1.664
Figure 1. Workflow of the methodology used.
Description of the attributes of the dataset used.
| Attribute name | Description |
|---|---|
| | Class label. |
| | Text-based feature: the text written and posted by the reviewer or user on Reddit. |
| | Sentiment polarity of the comment text (-1 is negative and 1 is positive). |
| | The number of likes the reviewer has received on his/her comments and reviews on Reddit. |
| | The number of dislikes the reviewer has received on his/her posts and comments. |
| | Similar to comment karma, except that link karma reflects the karma of the user's published posts rather than of the comments. |
| | The user's karma on Reddit. Users who are rude, spam, or spread hoaxes are likely to have lower karma than those who do not engage in such behavior. |
| | Indicates whether the user or reviewer has a verified email address. An unverified address could imply that the author set up a fake profile solely to make troll posts and has since abandoned it. |
| | Takes one of two values: 1 or nil. Users with accounts worth at least $1 are eligible for premium participation. Because premium membership on this network costs money, users who have it are less likely to be trolls. |
| | Indicates that the user has previously had a post rated as controversial. Moderators on Reddit are known for flagging hoaxes and controversial posts, so they may have already flagged certain posts or comments from an account belonging to a persistent troll. |
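The sentiment polarity attribute described above ranges from -1 (negative) to 1 (positive). As a rough illustration of how a lexicon-based approach can produce such a score, here is a minimal Python sketch. The tiny word lists and the function name `lexicon_polarity` are hypothetical and are not taken from the paper or its dataset; real sentiment lexicons are far larger and usually weight each word.

```python
import re

# Hypothetical toy lexicon, assumed for illustration only.
POSITIVE = {"good", "great", "helpful", "thanks", "love"}
NEGATIVE = {"bad", "stupid", "hate", "awful", "idiot"}

def lexicon_polarity(text: str) -> float:
    """Return a polarity in [-1, 1].

    The score is (pos - neg) / (pos + neg), where pos and neg count the
    lexicon words found in the text. Text with no lexicon word scores 0.0.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(lexicon_polarity("Thanks, this was a great and helpful answer"))
print(lexicon_polarity("What a stupid, awful post"))
```

A score like this could serve as one numerical attribute alongside the karma and vote counts, whereas the text experiment feeds the comment text itself to the model.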
Splitting of the dataset used.
| Total number of samples | Training (80% split, minus validation) | Validation (10% of training split) | Testing (20%) |
|---|---|---|---|
| 16695 | 12020 | 1336 | 3339 |
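The counts in this table are consistent with a two-stage split: 20% of all samples are held out for testing, and 10% of the remaining samples are then held out for validation. The Python sketch below reproduces only the arithmetic. The two-stage order and the rounding rule are assumptions inferred from the numbers, not a procedure stated in the source, and `split_counts` is an illustrative name.

```python
def split_counts(total: int, test_frac: float = 0.20, val_frac: float = 0.10):
    """Return (train, validation, test) sizes for a two-stage hold-out split."""
    n_test = round(total * test_frac)      # stage 1: hold out the test set
    remaining = total - n_test             # the nominal 80% training split
    n_val = round(remaining * val_frac)    # stage 2: carve validation out of it
    n_train = remaining - n_val
    return n_train, n_val, n_test

print(split_counts(16695))  # (12020, 1336, 3339)
```

Under this reading the three subsets sum to the 16695 samples, and the final training set is about 72% of the data rather than a full 80%.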
Figure 2. Structure of the CNN-BiLSTM model for troll reviewer detection using sentiment analysis.
Figure 3. Confusion matrix of CNN-BiLSTM using sentiment analysis.
Figure 4. Confusion matrix of CNN-BiLSTM using numerical attributes.
Classification results of the CNN-BiLSTM model.
| Type of experiment | Precision | Sensitivity | Specificity | F1-score | Accuracy |
|---|---|---|---|---|---|
| Experiment based on text data | 0.99 | 0.922 | 0.993 | 0.955 | 0.97 |
| Experiment based on numerical data | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 |
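For reference, the metrics in this table can be computed from the four cells of a binary confusion matrix. The Python sketch below uses the standard definitions. The function name is illustrative, and the counts in the usage line are made up, not the values from Figures 3 and 4. The last lines check that the fourth value in the text-data row (0.955) matches the F1-score implied by the reported precision (0.99) and sensitivity (0.922).

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)        # share of predicted trolls that are trolls
    sensitivity = tp / (tp + fn)      # recall: share of trolls that are caught
    specificity = tn / (tn + fp)      # share of non-trolls correctly cleared
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "accuracy": accuracy,
    }

# Made-up counts, for illustration only.
print(binary_metrics(tp=90, fp=10, tn=80, fn=20))

# Consistency check against the reported text-data row.
p, r = 0.99, 0.922
print(round(2 * p * r / (p + r), 3))  # 0.955
```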
Figure 5. Performance plot of the CNN-BiLSTM model using text data.
Figure 6. Performance plot of the CNN-BiLSTM model using numerical data.
Comparison of the CNN-BiLSTM results against existing methods.
| Models | Dataset | Accuracy |
|---|---|---|
| SVM [ | Numerical data | 0.98 |
| CNN [ | Text data | 0.95 |
| Proposed CNN-BiLSTM | Numerical data | 1.0 |