FakeBERT: Fake news detection in social media with a BERT-based deep learning approach
Rohit Kumar Kaliyar, Anurag Goswami, Pratik Narang.
Abstract
In the modern era of computing, the news ecosystem has transformed from traditional print media to social media outlets. Social media platforms let us consume news much faster, but with less restrictive editing, which allows fake news to spread at an incredible pace and scale. In recent research, many useful methods for fake news detection employ sequential neural networks to encode news content and social-context information, analyzing the text sequence in a unidirectional way. A bidirectional training approach is therefore a priority for modelling the relevant information in fake news, as it can improve classification performance by capturing semantic and long-distance dependencies in sentences. In this paper, we propose a BERT-based (Bidirectional Encoder Representations from Transformers) deep learning approach (FakeBERT) that combines BERT with parallel blocks of a single-layer deep Convolutional Neural Network (CNN) having different kernel sizes and numbers of filters. Such a combination is useful for handling ambiguity, the greatest challenge in natural language understanding. Classification results demonstrate that our proposed model (FakeBERT) outperforms the existing models with an accuracy of 98.90%. © Springer Science+Business Media, LLC, part of Springer Nature 2021.
Keywords: BERT; Deep learning; Fake news; Neural network; Social media
Year: 2021 PMID: 33432264 PMCID: PMC7788551 DOI: 10.1007/s11042-020-10183-2
Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.757
Fig. 1 Examples of some fake news spread over social media (Source: Facebook®)
Fig. 2 Approaches for fake news detection
Fig. 3 An overview of existing word-embedding models
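As context for the embedding comparison in Fig. 3, here is a minimal sketch of how contextual token embeddings can be extracted from a pretrained BERT model with the Hugging Face transformers library. This is not the authors' code; the checkpoint name, example text, and maximum length are illustrative.

```python
# Minimal sketch: extracting contextual BERT embeddings for a news article.
# Not the authors' code; checkpoint and example text are illustrative.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

text = "Breaking: celebrity endorses miracle cure ..."  # a news snippet
inputs = tokenizer(text, return_tensors="pt",
                   truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional contextual vector per token; vectors like these
# feed the downstream convolutional blocks in a FakeBERT-style model.
token_embeddings = outputs.last_hidden_state  # shape: (1, seq_len, 768)
print(token_embeddings.shape)
```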
Parameters for BERT-Base
| Parameter Name | Value of Parameter |
|---|---|
| Number of Layers | 12 |
| Hidden Size | 768 |
| Attention Heads | 12 |
| Number of Parameters | 110M |
Parameters for BERT-Large
| Parameter Name | Value of Parameter |
|---|---|
| Number of Layers | 24 |
| Hidden Size | 1024 |
| Attention Heads | 16 |
| Number of Parameters | 340M |
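The tabulated configurations can be cross-checked against a pretrained checkpoint; a small sketch for BERT-Base (the checkpoint name is an assumption, not taken from the paper):

```python
# Sketch: cross-checking the BERT-Base configuration tabulated above.
# The checkpoint name is an assumption (uncased English base model).
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

n_params = sum(p.numel() for p in model.parameters())
print(f"parameters: {n_params / 1e6:.1f}M")   # ~110M, as in the table

cfg = model.config
print(cfg.num_hidden_layers,    # 12 layers
      cfg.hidden_size,          # hidden size 768
      cfg.num_attention_heads)  # 12 attention heads
```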
Fig. 4 CNN model
CNN layered architecture
| Layer | Input size | Output size | Param number |
|---|---|---|---|
| Embedding | 1000 | 1000 × 100 | 25187700 |
| Conv1D | 1000 × 100 | 996 × 128 | 64128 |
| Maxpool | 996 × 128 | 199 × 128 | 0 |
| Conv1D | 199 × 128 | 195 × 128 | 82048 |
| Maxpool | 195 × 128 | 39 × 128 | 0 |
| Conv1D | 39 × 128 | 35 × 128 | 82048 |
| Maxpool | 35 × 128 | 1 × 128 | 0 |
| Flatten | 1 × 128 | 128 | 0 |
| Dense | 128 | 128 | 16512 |
| Dense | 128 | 2 | 258 |
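The shapes and parameter counts in this table are consistent with a plain Keras stack. In the sketch below, the vocabulary size (251877 = 25187700 / 100) and the kernel and pool sizes are inferred from the table's arithmetic (e.g. 64128 = (5·100 + 1)·128 implies a kernel of width 5), so treat it as a reconstruction rather than the authors' code.

```python
# Sketch of the CNN baseline implied by the table above (Keras).
# Vocabulary size is inferred from the embedding parameter count;
# kernel and pool sizes follow from the output-shape arithmetic.
from tensorflow.keras import layers, models

vocab_size, seq_len, embed_dim = 251877, 1000, 100

model = models.Sequential([
    layers.Embedding(vocab_size, embed_dim, input_length=seq_len),
    layers.Conv1D(128, 5, activation="relu"),   # 1000 -> 996
    layers.MaxPooling1D(5),                     # 996  -> 199
    layers.Conv1D(128, 5, activation="relu"),   # 199  -> 195
    layers.MaxPooling1D(5),                     # 195  -> 39
    layers.Conv1D(128, 5, activation="relu"),   # 39   -> 35
    layers.MaxPooling1D(35),                    # 35   -> 1
    layers.Flatten(),                           # 128
    layers.Dense(128, activation="relu"),
    layers.Dense(2, activation="softmax"),
])
model.summary()  # parameter counts match the table (e.g. 64128, 82048)
```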
LSTM layered architecture
| Layer | Input size | Output size | Param number |
|---|---|---|---|
| Embedding | 1000 | 1000 × 100 | 25187700 |
| Dropout | 1000 × 100 | 1000 × 100 | 0 |
| Conv1D | 1000 × 100 | 1000 × 32 | 16032 |
| Maxpool | 1000 × 32 | 500 × 32 | 0 |
| Conv1D | 500 × 32 | 500 × 64 | 6208 |
| Maxpool | 500 × 64 | 250 × 64 | 0 |
| LSTM | 250 × 64 | 100 | 66000 |
| Batch-Normalization | 100 | 100 | 400 |
| Dense | 100 | 256 | 25856 |
| Dense | 256 | 128 | 32896 |
| Dense | 128 | 64 | 8256 |
| Dense | 64 | 2 | 130 |
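Likewise, this CNN-LSTM baseline can be reconstructed in Keras. Kernel widths 5 and 3 with "same" padding are inferred from the parameter counts 16032 = (5·100 + 1)·32 and 6208 = (3·32 + 1)·64; the rest follows the table directly.

```python
# Sketch of the CNN-LSTM baseline implied by the table above (Keras).
# Kernel widths and "same" padding are inferred from the parameter counts.
from tensorflow.keras import layers, models

vocab_size, seq_len, embed_dim = 251877, 1000, 100

model = models.Sequential([
    layers.Embedding(vocab_size, embed_dim, input_length=seq_len),
    layers.Dropout(0.2),
    layers.Conv1D(32, 5, padding="same", activation="relu"),  # 1000 x 32
    layers.MaxPooling1D(2),                                   # 500 x 32
    layers.Conv1D(64, 3, padding="same", activation="relu"),  # 500 x 64
    layers.MaxPooling1D(2),                                   # 250 x 64
    layers.LSTM(100),                # 4*((64+100+1)*100) = 66000 params
    layers.BatchNormalization(),     # 400 params
    layers.Dense(256, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),
])
```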
Fig. 5 FakeBERT model
FakeBERT layered architecture
| Layer | Input size | Output size | Param number |
|---|---|---|---|
| Embedding | 1000 | 1000 × 100 | 25187700 |
| Conv1D | 1000 × 100 | 998 × 128 | 38528 |
| Conv1D | 1000 × 100 | 997 × 128 | 51328 |
| Conv1D | 1000 × 100 | 996 × 128 | 64128 |
| Maxpool | 998 × 128 | 199 × 128 | 0 |
| Maxpool | 997 × 128 | 199 × 128 | 0 |
| Maxpool | 996 × 128 | 199 × 128 | 0 |
| Concatenate | 199 × 128, 199 × 128, 199 × 128 | 597 × 128 | 0 |
| Conv1D | 597 × 128 | 593 × 128 | 82048 |
| Maxpool | 593 × 128 | 118 × 128 | 0 |
| Conv1D | 118 × 128 | 114 × 128 | 82048 |
| Maxpool | 114 × 128 | 3 × 128 | 0 |
| Flatten | 3 × 128 | 384 | 0 |
| Dense | 384 | 128 | 49280 |
| Dense | 128 | 2 | 258 |
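The parallel head is what distinguishes FakeBERT: three single-layer convolution blocks with different kernel widths read the same embedded sequence, and their pooled outputs are concatenated before two further convolution blocks. A reconstruction in the Keras functional API is sketched below; kernel widths 3/4/5 follow from the parameter counts (38528, 51328, 64128), the final pool size of 38 is one choice consistent with the 114 → 3 reduction, and the 100-dimensional trainable embedding mirrors this table (in the paper proper, BERT-derived vectors feed this head).

```python
# Sketch of FakeBERT's parallel-convolution head implied by the table
# above (Keras functional API). A reconstruction, not the authors' code.
from tensorflow.keras import layers, models

vocab_size, seq_len, embed_dim = 251877, 1000, 100

inp = layers.Input(shape=(seq_len,))
emb = layers.Embedding(vocab_size, embed_dim)(inp)

# Three parallel single-layer CNN blocks with different kernel widths.
branches = []
for k in (3, 4, 5):
    x = layers.Conv1D(128, k, activation="relu")(emb)  # 998/997/996 x 128
    x = layers.MaxPooling1D(5)(x)                      # each -> 199 x 128
    branches.append(x)

x = layers.Concatenate(axis=1)(branches)          # 597 x 128
x = layers.Conv1D(128, 5, activation="relu")(x)   # 593 x 128
x = layers.MaxPooling1D(5)(x)                     # 118 x 128
x = layers.Conv1D(128, 5, activation="relu")(x)   # 114 x 128
x = layers.MaxPooling1D(38)(x)                    # 3 x 128
x = layers.Flatten()(x)                           # 384
x = layers.Dense(128, activation="relu")(x)       # 49280 params
out = layers.Dense(2, activation="softmax")(x)    # 258 params

model = models.Model(inp, out)
model.summary()
```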
Attributes in the fake news dataset
| Attribute | Number of Instances |
|---|---|
| id (unique identifier for the news article) | 20800 |
| title (main heading of the news article) | 20242 |
| author (name of the article's author) | 18843 |
| text (complete news article) | 20761 |
| label (marks the article as fake or real) | 20800 |
Fake news dataset with the class labels
| Class label | Number of Instances |
|---|---|
| True | 10540 |
| False | 10260 |
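The dataset is a plain tabular file with the five attributes listed above; a minimal pandas sketch (the file name train.csv is an assumption):

```python
# Sketch: loading the fake-news dataset described above with pandas.
# The file name is an assumption; the id/title/author/text/label
# layout matches the attribute table.
import pandas as pd

df = pd.read_csv("train.csv")
print(df.shape)                      # ~20800 rows
print(df["label"].value_counts())    # roughly balanced classes
print(df[["title", "author", "text"]].isna().sum())  # missing entries
```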
Optimal hyperparameters with CNN
| Hyperparameter | Value |
|---|---|
| Number of convolution layers | 3 |
| Number of max pooling layers | 3 |
| Number of dense layers | 2 |
| Number of Flatten layers | 1 |
| Loss function | Categorical cross-entropy |
| Activation function | ReLU |
| Learning rate | 0.001 |
| Optimizer | Adadelta |
| Number of epochs | 10 |
| Batch size | 128 |
Optimal hyperparameters with LSTM
| Hyperparameter | Value |
|---|---|
| Number of convolution layers | 2 |
| Number of max pooling layers | 2 |
| Number of dense layers | 4 |
| Dropout rate | 0.2 |
| Optimizer | Adam |
| Activation function | ReLU |
| Loss function | Binary cross-entropy |
| Number of epochs | 10 |
| Batch size | 64 |
Optimal hyperparameters with FakeBERT
| Hyperparameter | Value |
|---|---|
| Number of convolution layers | 5 |
| Number of max pooling layers | 5 |
| Number of dense layers | 2 |
| Number of Flatten layers | 1 |
| Dropout rate | 0.2 |
| Optimizer | Adadelta |
| Activation function | ReLU |
| Loss function | Categorical cross-entropy |
| Number of epochs | 10 |
| Batch size | 128 |
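Continuing the FakeBERT sketch above, training with these hyperparameters might look as follows; the random placeholder data only stands in for the encoded articles and one-hot labels.

```python
# Sketch: compiling and fitting with the hyperparameters tabulated above
# (Adadelta, categorical cross-entropy, 10 epochs, batch size 128).
# Random placeholder data stands in for the encoded articles.
import numpy as np
from tensorflow.keras.optimizers import Adadelta
from tensorflow.keras.utils import to_categorical

x_train = np.random.randint(0, 251877, size=(256, 1000))   # token ids
y_train = to_categorical(np.random.randint(0, 2, size=256), 2)

model.compile(optimizer=Adadelta(),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, validation_split=0.2,
          epochs=10, batch_size=128)
```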
Representation of confusion matrix
| | Predicted negative | Predicted positive |
|---|---|---|
| Actual negative | True negative (TN) | False positive (FP) |
| Actual positive | False negative (FN) | True positive (TP) |
Confusion matrix for MNB with GloVe
| | Predicted negative | Predicted positive |
|---|---|---|
| Actual negative | 853 (TN) | 111 (FP) |
| Actual positive | 73 (FN) | 898 (TP) |
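The FPR and FNR figures reported later are direct ratios of these cells; as a worked check, the MNB-with-GloVe matrix above reproduces the 0.1151 / 0.0752 entries of the FPR/FNR table:

```python
# Sketch: deriving FPR and FNR from a confusion matrix, using the
# MNB-with-GloVe cells above as a worked example.
tn, fp, fn, tp = 853, 111, 73, 898

fpr = fp / (fp + tn)   # 111 / 964 ~= 0.1151
fnr = fn / (fn + tp)   # 73  / 971 ~= 0.0752
acc = (tp + tn) / (tp + tn + fp + fn)

print(f"FPR={fpr:.4f}  FNR={fnr:.4f}  accuracy={acc:.4f}")
# FPR and FNR match the corresponding rows of the table further below.
```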
Confusion matrix for KNN with GloVe
| | Predicted negative | Predicted positive |
|---|---|---|
| Actual negative | 282 (TN) | 762 (FP) |
| Actual positive | 200 (FN) | 836 (TP) |
Confusion matrix for DT with GloVe
| | Predicted negative | Predicted positive |
|---|---|---|
| Actual negative | 631 (TN) | 413 (FP) |
| Actual positive | 135 (FN) | 901 (TP) |
Confusion matrix for RF with GloVe
| | Predicted negative | Predicted positive |
|---|---|---|
| Actual negative | 683 (TN) | 361 (FP) |
| Actual positive | 234 (FN) | 802 (TP) |
Classification results with BERT and GloVe
| Word embedding model | Classification model | Accuracy (%) |
|---|---|---|
| TF-IDF (using unigrams and bigrams) | Neural Network | 94.31 |
| BOW (Bag of words) | Neural Network | 89.23 |
| Word2Vec | Neural Network | 75.67 |
| GloVe | MNB | 89.97 |
| GloVe | DT | 73.65 |
| GloVe | RF | 71.34 |
| GloVe | KNN | 53.75 |
| BERT | MNB | 91.20 |
| BERT | DT | 79.25 |
| BERT | RF | 76.40 |
| BERT | KNN | 59.10 |
| GloVe | CNN | 91.50 |
| GloVe | LSTM | 97.25 |
| BERT | CNN | 92.70 |
| BERT | LSTM | 97.55 |
| BERT | FakeBERT (our proposed model) | 98.90 |
Fig. 6 Classification results with GloVe
Confusion matrix for LSTM with GloVe
| | Predicted negative | Predicted positive |
|---|---|---|
| Actual negative | 1030 (TN) | 8 (FP) |
| Actual positive | 47 (FN) | 995 (TP) |
Confusion matrix for CNN with BERT
| | Predicted negative | Predicted positive |
|---|---|---|
| Actual negative | 1004 (TN) | 63 (FP) |
| Actual positive | 90 (FN) | 942 (TP) |
Confusion matrix for LSTM with BERT
| | Predicted negative | Predicted positive |
|---|---|---|
| Actual negative | 1032 (TN) | 7 (FP) |
| Actual positive | 44 (FN) | 998 (TP) |
Confusion matrix for FakeBERT with BERT
| | Predicted negative | Predicted positive |
|---|---|---|
| Actual negative | 1045 (TN) | 6 (FP) |
| Actual positive | 17 (FN) | 1012 (TP) |
Fig. 7 Accuracy and cross-entropy loss using CNN
Fig. 8 Accuracy and cross-entropy loss using FakeBERT
Fig. 9 Classification results with BERT
False Positive Rate (FPR) and False Negative Rate (FNR)
| Word Embedding Model | Classification Model | FPR | FNR |
|---|---|---|---|
| TF-IDF (using unigrams and bigrams) | Neural Network | 0.04684 | 0.0742 |
| BOW (Bag of words) | Neural Network | 0.1040 | 0.0862 |
| Word2Vec | Neural Network | 0.1320 | 0.3416 |
| GloVe | MNB | 0.1151 | 0.0752 |
| GloVe | DT | 0.3956 | 0.1303 |
| GloVe | RF | 0.3458 | 0.2259 |
| GloVe | KNN | 0.7299 | 0.1931 |
| BERT | MNB | 0.0985 | 0.0789 |
| BERT | DT | 0.1660 | 0.2429 |
| BERT | RF | 0.1245 | 0.3318 |
| BERT | KNN | 0.4037 | 0.4110 |
| GloVe | CNN | 0.0989 | 0.0776 |
| GloVe | LSTM | 0.0080 | 0.0482 |
| BERT | CNN | 0.0590 | 0.0872 |
| BERT | LSTM | 0.0077 | 0.0451 |
| BERT | FakeBERT (our proposed model) | | |
Our proposed model vs existing benchmarks with real-world fake news dataset
| Authors | Accuracy (%) |
|---|---|
| Ghanem et al. | 48.80 |
| Singh et al. | 87.00 |
| Ahmed et al. | 89.00 |
| Ruchansky et al. | 89.20 |
| Ahmed et al. | 92.00 |
| Liu et al. | 92.10 |
| O’Brien et al. | 93.50 |
| Our proposed model (FakeBERT) | 98.90 |
Fig. 10 Cross-entropy loss with CNN, LSTM, and FakeBERT