| Literature DB >> 34975284 |
Balasubramanian Palani1, Sivasankar Elango1, Vignesh Viswanathan K2.
Abstract
The progressive growth of today's digital world has made news spread exponentially faster on social media platforms like Twitter, Facebook, and Weibo. Unverified news is often disseminated in the form of multimedia content such as text, pictures, audio, or video. The dissemination of such false news deceives the public, leads to protests, and creates trouble for both the public and the government. Hence, it is essential to verify the authenticity of news at an early stage, before it is shared with the public. Earlier fake news detection (FND) approaches combined textual and visual features, but the semantic correlations between words were not addressed and many informative visual features were lost. To address this issue, an automated fake news detection system is proposed, which fuses textual and visual features to create a multimodal feature vector with high information content. The proposed work incorporates the bidirectional encoder representations from transformers (BERT) model to extract the textual features, which preserves the semantic relationships between words. Unlike the convolutional neural network (CNN), the proposed capsule neural network (CapsNet) model captures the most informative visual features from an image. These features are combined to obtain a richer data representation that helps to determine whether the news is fake or real. We investigated the performance of our model against different baselines using two publicly accessible datasets, Politifact and Gossipcop. Our proposed model achieves significantly better classification accuracy of 93% and 92% for the Politifact and Gossipcop datasets, respectively, compared to 84.6% and 85.6% for the SpotFake+ model.
Keywords: BERT; Capsule neural network; Deep learning; Fake news detection; Routing-by-agreement
Year: 2021 PMID: 34975284 PMCID: PMC8714044 DOI: 10.1007/s11042-021-11782-3
Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.577
A summary and comparative study of existing social-context based fake news detection
| Work | Model | Dataset | Description | Limitations |
|---|---|---|---|---|
| Wu et al. [ | LSTM, RNN | – | Focused on diffusion-network information; identified propagation pathways of social media messages; addressed the data-sparsity problem | Requires domain expertise; news content was not used |
| Ma et al. [ | Recursive NN | Twitter-15, Twitter-16 | Used a propagation tree to learn representations from structural and textual properties | Difficulty predicting non-rumours; user-information features were not used |
| Liu et al. [ | RNN, CNN | Weibo, Twitter-15, Twitter-16 | Captured both local and global variations of user characteristics along propagation paths | User characteristics were not analysed to identify users' tendencies |
| Guo et al. [ | HSA-BLSTM | Weibo, Twitter | Learned the most useful information and combined it with social-context features | Low accuracy |
| Ma et al. [ | RNN, GRU | LIU, PHEME, FNC | Unified multi-task learning approach for rumour detection and stance classification; learned task-invariant and task-specific features | User trustworthiness was not evaluated |
| Li et al. [ | LSTM, attention | RumorEval, PHEME | Incorporated user-credibility information in the rumour-detection layer; also introduced an attention mechanism into the rumour-detection task | Very low accuracy |
| Ke Wu et al. [ | Hybrid SVM | Sina Weibo | Extracted propagation patterns as graphs and classified them with a hybrid SVM; a random-walk graph kernel modelled the propagation tree | Deep-learning models were not explored |
| Savyan et al. [ | UbCadet model (k-NN, ensemble) | Twitter, Yelp | Captured user-behavioural characteristics from tweet text, hashtags, post time, and geolocation | No semantic analysis of the tweet contents |
A summary and comparative study of existing textual-based fake news detection
| Work | Model | Dataset | Description | Limitations |
|---|---|---|---|---|
| Ozbay and Alatas [ | TF-IDF, ML models | ISOT | Extracted a textual feature vector using TF-IDF; twenty-three supervised classifiers were evaluated | Alternative word-embedding techniques, ensembles, and DL-based classifiers were not used |
| Faustini and Covoes [ | BoW, word2vec, RF, SVM | FakeBrCorpus, TwitterBR, btvlifestyle | Extracted textual features using word-embedding techniques | DL models were not utilized |
| Ozbay and Alatas [ | GWO, SSO | BuzzFeed, Liar | Used a meta-heuristic algorithm to preserve global search ability | Word-embedding techniques and hybrid models were not used |
| Perez-Rosas et al. [ | Linear-SVM | FakeNewsAMT, Celebrity news dataset | Used linguistic features at the lexical, syntactic, and semantic levels; also performed cross-domain classification | DL models were not employed |
| Ahmed et al. [ | TF-IDF, Linear-SVM | ISOT | TF-IDF features combined with a linear-SVM classifier achieved the best performance | DL-based methods were not utilized |
| Kumar et al. [ | PSO, ML classifiers | – | Selected an optimal feature set using PSO | Biased towards English-only tweets; PSO results were not compared with other optimization algorithms |
| Akyol et al. [ | GBT, MLP, RF | Facebook, Google+, LinkedIn | Collected datasets in four categories: Microsoft, Economic, Palestine, and Obama | Recent DL models and word-embedding methods were not used |
| Ma et al. [ | RNN, GRU, LSTM | Twitter, Weibo | Extracted feature vectors of the words in each post using TF-IDF | Other word-embedding techniques and hybrid models were not tried |
| Kaliyar et al. [ | BERT, CNN | Fakenews (2016 U.S. presidential election) | Preserved semantic and long-term dependencies in sentences and eliminated ambiguity | Hybrid features and different echo chambers were not explored |
| Asghar et al. [ | Bi-LSTM, CNN | PHEME | Explored sentences in both directions to capture contextual information | Works on English-text datasets and textual features only |
| Shu et al. [ | GRU encoder, co-attention | FakeNewsNet | Used a co-attention mechanism to discover the top-K important sentences and user reviews | Fact-checking content and user-related information were not utilized |
| Chen et al. [ | RNN, soft attention | Twitter, Weibo | Collected distinct linguistic features over time; learned latent representations from paragraph vectors | Propagation patterns of rumours were not utilized |
| Yu et al. [ | CNN | Twitter, Weibo | Extracted key features from the text and high-level interactions among those features | Low prediction accuracy; no word-embedding methods were used |
| Wang [ | CNN, Bi-LSTM | LIAR | Created a larger dataset; used CNN for textual-feature extraction and Bi-LSTM for metadata-feature extraction | Low prediction accuracy |
| Yin et al. [ | PCA, CNN, SVM | Private dataset | Extracted feature vectors using PCA and CNN | Low prediction accuracy |
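Several of the textual approaches above (Ozbay and Alatas, Ahmed et al., Ma et al.) build their feature vectors with TF-IDF. A minimal, self-contained sketch of that weighting; the function name and the exact idf variant (unsmoothed `log(N/df)`) are illustrative assumptions, not the surveyed authors' code:

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF weights for a list of tokenised documents.

    Uses raw term frequency and idf = log(N / df). A term occurring in
    every document gets weight 0, so uninformative words are suppressed.
    """
    n_docs = len(docs)
    # document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

docs = [["fake", "news", "spreads"], ["real", "news"]]
w = tfidf(docs)
# "news" appears in both documents, so its idf (and weight) is 0
```

In practice the surveyed works would use a library implementation (e.g. scikit-learn's vectorizer, which applies smoothing and normalisation), but the weighting idea is the same.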
A summary and comparative study of existing multimodal fake news detection
| Work | Model | Dataset | Description | Limitations |
|---|---|---|---|---|
| Jin et al. [ | RNN-attention, LSTM, VGG-19 | Twitter, Weibo | Extracted textual, visual, and social-context features and fused them with an attention mechanism | Very low prediction accuracy |
| Singh et al. [ | PCA, K-means, ELM | NSL-KDD | Pre-processed data with PCA and K-means; adopted an ELM model suited to a wide range of IoT applications | Alternative feature-extraction methods and DL-based models were not utilized |
| Yang K. et al. [ | Adaptive tag (AT) | Toutiao news | Extracted new tags from images and texts; based on user feedback, the AT algorithm selects the tags a user is interested in | DL models were not used |
| Yang et al. [ | TI-CNN | U.S. presidential election news (Kaggle) | Used the TI-CNN model to capture explicit and hidden features from text and images for fake news detection | User characteristics and social-network structures were not used |
| Wang et al. [ | EANN (Text-CNN, VGG-19) | Twitter, Weibo | Obtained event-invariant features via the event-discriminator component of an adversarial network | Fake-news prediction is an auxiliary task; no clear mechanism for discovering correlations across the modalities |
| Khattar et al. [ | MVAE (Encoder-Decoder) | Twitter, Weibo | Learned a shared latent representation of the multimodal information and predicted fake news from the latent vector | Fake-news prediction is a secondary task |
| Shivangi et al. [ | SpotFake (BERT, VGG-19) | Twitter, Weibo | Extracted semantically meaningful textual and visual features using BERT and VGG-19, respectively | The CNN needs long training times and large data collections; cannot handle full-length articles |
| Shivangi et al. [ | SpotFake+ (XL-Net, VGG-19) | FakeNewsNet (Politifact, Gossipcop) | Captured textual (pre-trained XL-Net) and visual (VGG-19) features | Long training time; VGG-19 does not capture the most informative visual features because its pooling layers cause information loss |
A comparison study of proposed model with existing techniques
| Feature | Existing models | Limitations | Proposed solution and its merits |
|---|---|---|---|
| Textual [ | TF-IDF, BoW, word2vec, RNN, LSTM, Bi-LSTM, CNN, Text-CNN | Fail to extract semantic relationships among words; the input sequence is processed either left-to-right or right-to-left, only one word at a time | The BERT model is used: a pre-trained transformer in which multi-head attention preserves the semantic relations among words; masked language modeling (MLM) and next sentence prediction (NSP) are its pre-training tasks |
| Visual [ | VGG-19 (CNN model) | Takes a long training time; requires a larger dataset for good generalization; fails to extract informative visual features because of the pooling operation; consumes more hyperparameters during training | The CapsNet model is used: it requires less training data and incurs less training time than a CNN; the routing-by-agreement algorithm with a squashing activation function is used; a margin loss function is introduced; the number of hyperparameters is smaller than in a CNN |
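The squashing activation and margin loss mentioned in the table above can be sketched as follows. This is a minimal NumPy illustration of the standard CapsNet formulation; the constants m+ = 0.9, m- = 0.1, λ = 0.5 are the usual CapsNet defaults and are assumed here, since the record does not list the paper's exact values:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squashing non-linearity used in routing-by-agreement:
    v = (||s||^2 / (1 + ||s||^2)) * s / ||s||.
    The output norm stays below 1, so a capsule's length can be read
    as the probability that the entity it encodes is present."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def margin_loss(lengths, targets, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Margin loss over class-capsule lengths: the target class capsule
    is pushed above m_pos, all others below m_neg (constants assumed)."""
    pos = targets * np.maximum(0.0, m_pos - lengths) ** 2
    neg = lam * (1.0 - targets) * np.maximum(0.0, lengths - m_neg) ** 2
    return np.sum(pos + neg, axis=-1).mean()
```

For example, squashing the vector (3, 4), whose norm is 5, yields a vector of norm 25/26 ≈ 0.96, and a perfectly separated two-class prediction incurs zero margin loss.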
Abbreviations used in this paper
| Abbreviation | Expansion |
|---|---|
| BERT | Bidirectional encoder representations from transformers |
| Bi-LSTM | Bidirectional long short-term memory |
| CapsNet | Capsule neural network |
| CB-Fake | CapsNet BERT – Fake |
| CCL | Class capsule layer |
| CNN | Convolutional neural network |
| COL | Convolutional layer |
| EANN | Event adversarial neural network |
| FND | Fake news detection |
| FFN | Feed forward neural network |
| GAN | Generative adversarial network |
| GRU | Gated recurrent unit |
| GWO | Grey wolf optimization |
| LSTM | Long short-term memory |
| MLM | Masked language model |
| MVAE | Multimodal variational autoencoder |
| NB | Naive Bayes |
| NLP | Natural language processing |
| NSP | Next sentence prediction |
| PCA | Principal component analysis |
| PCL | Primary capsule layer |
| PSO | Particle swarm optimization |
| RF | Random forest |
| RNN | Recurrent neural network |
| SGD | Stochastic gradient descent |
| SSO | Salp swarm optimization |
| SVM | Support vector machine |
| TF-IDF | Term frequency – Inverse document frequency |
| TI-CNN | Text Image – CNN |
| VGG-19 | Visual Geometry Group – 19 |
Fig. 1 BERT fine-tuning model [8]
Variations of original BERT model
| Parameter | BERT-base | BERT-large |
|---|---|---|
| Total number of layers | 12 | 24 |
| Hidden layer size | 768 | 1024 |
| Attention heads count | 12 | 16 |
| Total number of parameters | 110M | 340M |
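The multi-head attention that lets BERT preserve semantic relations among words reduces, per head, to scaled dot-product attention (in BERT-base, 12 heads each of size 768 / 12 = 64). A toy NumPy sketch of one head, purely illustrative and not the authors' implementation:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: every token attends to every other
    token in both directions, which is the bidirectionality BERT relies
    on. Returns the attended values and the attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # (seq, seq) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights
```

In the full model, Q, K, and V are learned linear projections of the token embeddings, and the per-head outputs are concatenated and projected back to the hidden size.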
Fig. 2 Complete flow diagram of the CapsNet model
Fig. 3 Block diagram of the proposed CB-Fake model for fake news detection
Fig. 4 A high-level diagram of textual feature representation using BERT
The statistics of the FakeNewsNet dataset (values in square brackets are the samples used in this work)
| Dataset | Politifact | Gossipcop |
|---|---|---|
| Real News | 624 [499] | 16817 [15223] |
| Fake News | 432 [376] | 5323 [4784] |
The details of training and testing data
| Details | Politifact | Gossipcop |
|---|---|---|
| Total samples (TS) | 875 | 20,007 |
| Training data (70% of TS) | 612 | 14,004 |
| Testing data (30% of TS) | 263 | 6,003 |
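The split counts in the table follow from flooring 70% of the total samples and using the remainder for testing; a one-line sketch of that arithmetic (the helper name is illustrative):

```python
def split_counts(total, train_frac=0.7):
    """Reproduce the 70/30 train/test split counts: the training count
    is the floored fraction of the total, the rest is used for testing."""
    train = int(total * train_frac)
    return train, total - train

# Politifact: 875 samples -> (612, 263); Gossipcop: 20,007 -> (14,004, 6,003)
```

This matches both rows of the table exactly (612 + 263 = 875 and 14,004 + 6,003 = 20,007).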
Hyperparameters of CapsNet layers for visual feature representation
| Layer | Num_Capsules | Num_routes | In_channels | Out_channels | Kernel_size |
|---|---|---|---|---|---|
| COL | – | – | 1 | 256 | 9 |
| PCL | 8 | – | 256 | 32 | 9 |
| CCL | 2 | 32 * 6 * 6 | 8 | 16 | – |
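The Num_routes value 32 · 6 · 6 = 1152 in the class capsule layer follows from standard convolution arithmetic, assuming a 28×28 single-channel input with no padding (an assumption; the input size is not stated in the table, but it is the one that reproduces these numbers exactly):

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Standard convolution output-size formula (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

# Assumed 28x28 input, following the kernel sizes in the table above.
after_col = conv_out(28, kernel=9, stride=1)        # convolutional layer -> 20x20
after_pcl = conv_out(after_col, kernel=9, stride=2) # primary capsules   -> 6x6
num_routes = 32 * after_pcl * after_pcl             # 32 capsule maps of 6x6
```

Each of the 1152 primary capsules (8-dimensional, per the PCL row) then routes to the two 16-dimensional class capsules via routing-by-agreement.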
Comparison of the BERT model with base classifiers on the textual features of the datasets
| Classifier | Politifact Acc. | Politifact Prec. | Politifact Rec. | Politifact F1 | Gossipcop Acc. | Gossipcop Prec. | Gossipcop Rec. | Gossipcop F1 |
|---|---|---|---|---|---|---|---|---|
| NB [ | 0.61 | 0.76 | 0.87 | 0.81 | 0.62 | 0.79 | 0.91 | 0.85 |
| SVM [ | 0.58 | 0.46 | 0.91 | 0.61 | 0.49 | 0.46 | 0.91 | 0.61 |
| RF [ | 0.84 | 0.89 | 0.84 | 0.87 | 0.85 | 0.98 | 0.85 | 0.91 |
| SGD | 0.83 | 0.87 | 0.83 | 0.85 | 0.81 | 0.88 | 0.87 | 0.87 |
| BERT | 0.89 | 0.95 | 0.92 | 0.97 | ||||
| CB-Fake | 0.92 | 0.91 | 0.87 | 0.81 | 0.84 | |||
(Maximum accuracy and F1-Score are shown in bold)
Comparison of the BERT model with decision-fusion classifiers on the textual features of the datasets
| Classifier | Politifact Acc. | Politifact Prec. | Politifact Rec. | Politifact F1 | Gossipcop Acc. | Gossipcop Prec. | Gossipcop Rec. | Gossipcop F1 |
|---|---|---|---|---|---|---|---|---|
| NB+SVM+RF | 0.81 | 0.77 | 0.92 | 0.84 | 0.86 | 0.97 | 0.86 | 0.91 |
| RF+SVM+SGD | 0.79 | 0.71 | 0.93 | 0.81 | 0.85 | 0.98 | 0.85 | 0.91 |
| NB+SVM+SGD | 0.81 | 0.83 | 0.85 | 0.84 | 0.83 | 0.90 | 0.88 | 0.89 |
| NB+RF+SGD | 0.87 | 0.95 | 0.84 | 0.89 | 0.82 | 0.88 | 0.88 | 0.88 |
| BERT | 0.89 | 0.95 | 0.92 | 0.97 | ||||
| CB-Fake | 0.92 | 0.91 | 0.87 | 0.81 | 0.84 | |||
(Maximum accuracy and F1-Score are shown in bold)
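The decision-fusion rows above (e.g. NB+SVM+RF) combine base-classifier outputs. A minimal sketch of fusion by majority vote, which is one common decision-fusion scheme; this is an illustration of the idea, not necessarily the authors' exact combination rule:

```python
from collections import Counter

def majority_vote(predictions):
    """Fuse per-classifier label lists by majority vote.

    `predictions` is a list of label sequences, one per base classifier
    (e.g. NB, SVM, RF); votes are counted per sample, ties broken by the
    first-seen label."""
    fused = []
    for votes in zip(*predictions):
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused
```

For instance, with three classifiers voting ["fake", "real"], ["fake", "fake"], and ["real", "fake"] on two samples, the fused prediction is ["fake", "fake"].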
The performance of the proposed CB-Fake model against baselines on the FakeNewsNet dataset
| Modality | Models | Politifact (accuracy) | Gossipcop (accuracy) |
|---|---|---|---|
| Textual | SVM [ | 0.58 | 0.497 |
| | LR [ | 0.642 | 0.648 |
| | NB [ | 0.617 | 0.624 |
| | CNN [ | 0.629 | 0.723 |
| | XLNet + dense layer [ | 0.74 | 0.836 |
| | XLNet + CNN [ | 0.721 | 0.84 |
| | XLNet + LSTM [ | 0.721 | 0.807 |
| Visual | VGG19 [ | 0.654 | 0.80 |
| Multimodal (Textual+Visual) | EANN [ | 0.74 | 0.86 |
| | MVAE [ | 0.673 | 0.775 |
| | SpotFake [ | 0.721 | 0.807 |
| | SpotFake+ [ | 0.846 | 0.856 |
| | CB-Fake (proposed) | 0.93 | 0.92 |
(Maximum accuracy is shown in bold)
Fig. 5 The performance of the BERT model and the base classifiers
Fig. 6 The performance of the BERT model and the decision-fusion classifiers
Fig. 7 Confusion-matrix results of the proposed CB-Fake model on the testing data
Fig. 8 The performance of the proposed CB-Fake model against state-of-the-art methods