Amit Neil Ramkissoon, Wayne Goodridge.
Abstract
Fake news detection continues to be a major problem affecting society today. Fake news can be classified using a variety of methods, yet predicting and detecting it has proven challenging even for machine learning algorithms. This research employs Legitimacy, a unique ensemble machine learning model, to accomplish the task of Credibility-Based Fake News Detection. The Legitimacy ensemble combines the learning potential of a Two-Class Boosted Decision Tree and a Two-Class Neural Network, following a pseudo-mixture-of-experts methodology; an instance of Two-Class Logistic Regression is implemented as the gating model. This study validates Legitimacy using a standard dataset with features relating to the credibility of news publishers to predict fake news. These features are analysed using the ensemble algorithm, and the results of the experiments are examined using four evaluation methodologies. The analysis reveals strong performance for the ensemble method, with an accuracy of 96.9%. The ensemble's performance is compared with that of its two base machine learning models and surpasses both. The performance of Legitimacy is also analysed as the size of the dataset increases, to demonstrate its scalability. Hence, based on our selected dataset, the Legitimacy ensemble model has proven to be most appropriate for Credibility-Based Fake News Detection. © Springer Japan KK, part of Springer Nature 2022. Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Keywords: Credibility-Based Fake News Detection; Decision trees; Ensemble learning; Legitimacy model; Logistic regression; Neural networks
Year: 2022 PMID: 36159389 PMCID: PMC9483524 DOI: 10.1007/s12626-022-00127-7
Source DB: PubMed Journal: Rev Socionetwork Strateg ISSN: 1867-3236
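The abstract describes Legitimacy as a pseudo-mixture-of-experts: two base experts (a boosted decision tree and a neural network) arbitrated by a logistic-regression gate. Below is a minimal, hedged sketch of that pattern using scikit-learn stand-ins (`GradientBoostingClassifier`, `MLPClassifier`) and synthetic data; the paper's Azure-style "Two-Class" components, its credibility features, and the exact wiring of its gate are not reproduced here, and the gate is implemented in a stacking-style reading, trained on the two experts' scores:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a publisher-credibility feature set (assumption).
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Two base experts: a boosted decision tree and a neural network.
tree = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000,
                    random_state=0).fit(X_tr, y_tr)

def expert_scores(X):
    # Positive-class probabilities from both experts, stacked as gate inputs.
    return np.column_stack([
        tree.predict_proba(X)[:, 1],
        net.predict_proba(X)[:, 1],
    ])

# Logistic-regression gate arbitrates between the two experts' scores.
gate = LogisticRegression().fit(expert_scores(X_tr), y_tr)

y_pred = gate.predict(expert_scores(X_te))
acc = (y_pred == y_te).mean()
print(f"ensemble accuracy: {acc:.3f}")
```

Because the gate sees both experts' probabilities, it can defer to whichever expert is more reliable in a given region of the score space, which is one way an ensemble can outperform both of its base models, as the abstract reports.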
Comparison of related works
| Related work | Key characteristics | Limitations |
|---|---|---|
| […] | Investigates ML methods alongside fake news features. Determines that Naïve Bayes and linguistic features are best suited for fake news detection | Their work, however, ignores the significant role that the credibility of the publisher plays in detecting fake news |
| […] | Investigates ML methods for fake news detection. Determines that Naïve Bayes is best for fake news detection | Their work ignores the use of ensemble learning in fake news detection and focuses only on supervised learning approaches |
| […] | Investigates ML methods for fake news detection. Determines that the highest accuracy was obtained when using unigram features and a linear SVM | This work is similar to that conducted in […] |
| […] | Proposes a supervised learning-based technique for the detection of fake reviews from online textual content | Their work investigates only the textual features of the news and utilises supervised learning alone for prediction rather than ensemble learning |
| […] | Explores different textual properties that can be used to distinguish fake content from real by training different models with the fake news features | They propose a solution to the fake news detection problem using a machine learning ensemble approach; however, they focus only on textual analysis and ignore the importance of the publisher's credibility |
| […] | Proposes an ensemble classification model for the detection of fake news that achieves better accuracy than the state of the art. The model extracts key features from the fake news datasets, and the extracted features are then classified using an ensemble of three popular machine learning models, namely Decision Tree, Random Forest, and Extra Tree Classifier | Their model does not use features that cover the full spread of those related to fake news, and it ignores the combination of Boosted Decision Trees and Neural Networks |
| […] | Finds that, among the different machine learning algorithms used, Gradient Boosting with optimised parameters performs best on a multi-class fake news dataset | Their research attempts to improve existing fake news classification using a multi-class dataset, with the motivation that it can help future researchers in this area. However, it considers no model other than gradient boosting |
| […] | Develops an ensemble-based architecture for fake news detection. The individual models are based on Convolutional Neural Networks (CNN) and Bi-directional Long Short-Term Memory (LSTM) | This ensemble employs computationally heavy supervised models to detect fake news, which may make it unsuitable for energy-constrained devices |
| Proposed Work | Legitimacy is a unique ensemble learning model for the task of Credibility-Based Fake News Detection. This pseudo-mixture-of-experts model consists of two underlying techniques, namely a Two-Class Boosted Decision Tree and a Two-Class Neural Network, with Two-Class Logistic Regression as the gating model. This research attempts to improve the performance of credibility-based fake news detection by utilising an ensemble learning prediction model | Legitimacy is based on credibility- and content-based fake news detection. Other areas of its application are yet to be explored |
Fig. 1 Experimental setup
Fig. 2 Legitimacy ROC curve
Fig. 3 Legitimacy precision/recall curve
Fig. 4 Legitimacy lift curve
Fig. 5 Legitimacy evaluation metrics
Fig. 6 Two-class boosted decision tree ROC curve
Fig. 7 Two-class boosted decision tree precision/recall curve
Fig. 8 Two-class boosted decision tree lift curve
Fig. 9 Two-class boosted decision tree
Fig. 10 Two-class neural network ROC curve
Fig. 11 Two-class neural network precision/recall curve
Fig. 12 Two-class neural network lift curve
Fig. 13 Two-class neural network
Scalability evaluation metrics
| #Nodes | TP | FP | FN | TN | Accuracy | Precision | Recall | F1 | AUC |
|---|---|---|---|---|---|---|---|---|---|
| 17,551 | 0 | 0 | 193 | 5950 | 96.9% | 100% | 0% | 0 | 0.643 |
| 100,577 | 0 | 0 | 1227 | 33,975 | 96.5% | 100% | 0% | 0 | 0.5 |
| 109,710 | 0 | 0 | 1382 | 37,016 | 96.4% | 100% | 0% | 0 | 0.5 |
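The reported metrics can be checked directly against the confusion-matrix counts in the first row. A minimal sketch (the metric formulas are standard; the 100% precision with zero true positives presumably reflects the toolkit's convention of treating zero positive predictions as perfectly precise, which is an assumption here):

```python
# Recompute evaluation metrics from the first scalability row:
# TP=0, FP=0, FN=193, TN=5950.
tp, fp, fn, tn = 0, 0, 193, 5950

accuracy = (tp + tn) / (tp + tn + fp + fn)      # 5950 / 6143
recall = tp / (tp + fn) if (tp + fn) else 0.0   # 0 / 193
# Precision is 0/0 here; the table reports 100%, consistent with
# counting "no positive predictions" as fully precise.
precision = tp / (tp + fp) if (tp + fp) else 1.0

print(f"accuracy={accuracy:.1%} precision={precision:.0%} recall={recall:.0%}")
# → accuracy=96.9% precision=100% recall=0%
```

Note that with TP = 0 in every row, the classifier labels every instance negative at these scales; the high accuracy follows from the class imbalance (negatives vastly outnumber positives), which is why the recall and AUC columns are the more informative ones in this table.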