Sakshini Hangloo, Bhavna Arora.
Abstract
The growth in the use of social media platforms such as Facebook and Twitter over the past decade has significantly improved the way people communicate with each other. However, the information available and shared online is not always credible. These platforms provide fertile ground for the rapid propagation of breaking news along with misleading information. The enormous amount of fake news present online has the potential to trigger serious problems at an individual level and in society at large. Detecting whether a given piece of information is fake is a challenging problem, and the traits of social media make the task even more complicated: they ease the generation and spread of content to the masses, leading to an enormous volume of content to analyze. The multimedia nature of fake news on online platforms has not been fully explored. This survey presents a comprehensive overview of state-of-the-art techniques for combating fake news on online media, with a prime focus on deep learning (DL) techniques that take multimodality into consideration. In addition, various DL frameworks, pre-trained models, and transfer learning approaches are covered. Because only a limited number of multimodal datasets are available for this task to date, the paper highlights various data collection strategies that can be used, along with a comparative analysis of the available multimodal fake news datasets. The paper also discusses various open areas and challenges in this direction.
Keywords: Deep learning; Fake news detection; Multimodal; Pretrained models; Rumor detection; Text embedding; Transfer learning
Year: 2022 PMID: 35818516 PMCID: PMC9261148 DOI: 10.1007/s00530-022-00966-y
Source DB: PubMed Journal: Multimed Syst ISSN: 0942-4962 Impact factor: 2.603
A relative comparison of proposed work with various related surveys
| Ref. | Discussion | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| [ | Proposes various visual and statistical features of visual content | ✔ | × | × | × | × | × | × |
| [ | Presents a comprehensive review of fake news detection techniques on social media from the data mining perspective | ✔ | × | × | × | × | ||
| [ | Provides an overview of techniques of developing a rumor classification system consisting of detection, tracking, stance classification, and veracity classification modules | ✔ | × | × | ||||
| [ | Examined and compared the relative strength of the user, linguistic, network and temporal features of rumors over time | × | × | × | × | × | × | |
| [ | Provides an extensive study of automatic rumor detection under three paradigms: hand-crafted feature-based approaches, propagation structure-based approaches, and neural network-based approaches | × | × | × | × | |||
| [ | Provides a review of techniques for manipulation and detection of face images, including DeepFake methods. In particular, facial manipulations are reviewed based on the following four types: attribute manipulation, face synthesis, identity swap (DeepFakes), and expression swap | × | × | × | × | |||
| [ | Gives an understanding of fake news creation, source identification, propagation patterns, detection and containment strategies | × | × | × | ||||
| [ | Presents a detailed review of state-of-the-art FND methods using DL, open issues along with future directions are also suggested | × | × | × | ||||
| [ | Reviews the methods for detecting fake news from four verticals: the false information, writing style, propagation patterns, and the source credibility | × | × | × | × | × | ||
| [ | Presents an overview of the state-of-the-art fake news detection methods utilizing users, content, and context features | × | × | × | ||||
| [ | Provides an overview of the different forms of fabricated content on social media ranging from text-only to multimedia content and discusses various detection techniques for the same | × | × | × | × | × | ||
| [ | Explores the problem of rumor detection using the textual content of social media on collected Twitter data | × | × | × | × | |||
| [ | Compares, reviews and provides insights into twenty-seven popular fake news detection datasets | × | × | × | × | × | ||
| Present Study | The prime focus is on various deep learning approaches to fake news detection on social media keeping the multimodal data under consideration |
Notes: 1: Overview of ML/DL-based FND; 2: Open tools and initiatives; 3: DL frameworks & tools; 4: Review of MFND frameworks; 5: Datasets; 6: Data collection; 7: Open issues. Notations: ✔: Considered; ×: Not considered
Fig. 1 Roadmap of the proposed survey
Fig. 2 Fake news trends (2012–2021) [71]
Fig. 3 Key terms related to Fake News
Fact-checking sites and online tools that are used for debunking false news online
| Name | Tool/Extension | Methodology/Action |
|---|---|---|
| AltNews | Fact-checking website | Continuously monitors social media and mainstream media platforms for identifying incorrect information related mainly to Indian politics and entertainment, and evaluates the veracity of a claim by Manual Fact-checking |
| AFP Fact Check | Fact-checking website | Uses many simple tools to verify online information. Fact-checking is carried out by editors and a worldwide network of journalists |
| BS Detector | Extension for Google Chrome and Mozilla Firefox | Identifies and marks fake and satirical news sites, as well as other suspected news sources. It places a warning label at the top of potentially dangerous websites and identifies fake links on Facebook and Twitter |
| Emergent | Fact-checking website | Emergent is a real-time rumor tracker that assesses news credibility and gives a True, False, or Unverified label |
| Fact-Checker | Fact-checking website | A project of The Washington Post that grades news articles from zero to four “Pinocchios” based on the factual accuracy of their content |
| FakerFact | A Chrome and Firefox extension | Distinguishes a fake news article from the real one and categorizes it as opinion, satire, agenda-driven, journalism, and sensationalism |
| InVid Verification Plugin | Use with Chrome, Firefox | A plugin to debunk fake images and videos. The tool uses reverse-image searching to debunk fake videos and also provides users with metadata to make an informed decision |
| PolitiFact | Fact-checking website | Checks statements made on the Internet by political analysts and politicians and rates them. Its journalists evaluate original statements, and each statement receives a “Truth-O-Meter” rating of “True”, “Mostly True”, “Half True”, “Mostly False”, “False”, or “Pants on Fire” |
| Snopes | Fact-checking website | Conducts in-depth fact-checking research on hot issues, which are frequently picked depending on reader interest. “True”, “Mostly True”, “Mostly False”, “False”, “Unproven”, “Miscaptioned”, “Misattributed” are some of the annotations used to classify the content |
| Reverse Image Search (TinEye) | Browser extension | Can be used to see if the image has been taken from somewhere online. The tool comes with a Compare feature, which can be helpful to see how your image differs from the original one |
| BOOM | Fact-checking website | Manually checks the posts, debunks fake news, and prevents further spread |
| SurfSafe | Browser Plugin | Alerts users about misinformation by scanning images and videos on the web pages they’re looking at. Performs reverse-image search by looking for the same content that appears on trusted source sites and flagging well-known doctored images |
| YouTube Data Viewer | A web-based video verification tool | Simple tool for extracting hidden data and metadata from YouTube videos which is particularly valuable for locating original content |
Fig. 4 False claims debunked by fact-checking organizations
Fig. 5 Taxonomy of Fake News Detection models
Classification of prominent state-of-the-art ML/DL FND techniques based on the proposed taxonomy
| | Level 1 | Level 2 | Level 3 | Related work |
|---|---|---|---|---|
| Fake News Detection Methods | Feature Based | Content | Single-Modal | [ |
| | | | Multi-Modal | [ |
| | | Context | Network | [ |
| | | | User | [ |
| | | | Temporal | [ |
| | Knowledge Based | Automatic | – | [ |
| | | Manual | Expert Based | [ |
| | | | Crowdsourced | [ |
| | Learning Based | ML | – | [ |
| | | DL | – | [ |
| | Detection Based | Post-level | – | [ |
| | | Event Level | – | [ |
| | Language Based | Mono-Lingual | – | [ |
| | | Multi-Lingual | – | [ |
| | Degree of Fakeness | Two-Class | – | [ |
| | | Multi-Class | – | [ |
| | Platform | Main-Stream | – | [ |
| | | Social Media | – | [ |
Fig. 6 Classification of Deep Learning Models
Fig. 7 Discriminative Models: (a) RNN, (b) LSTM, (c) CNN [126]
Fig. 8 Generative Models: (a) Auto Encoder, (b) Generative Adversarial Network, (c) Transformer [131]
Fig. 9 Transfer learning
Pretrained Image Models
| Network | Author(s), Year | Salient Features | Parameters | FLOPs | Top-5 Accuracy |
|---|---|---|---|---|---|
| AlexNet | Krizhevsky et al. (2012) | Deeper | 62 M | 1.5B | 84.70% |
| VGGNet | Simonyan et al. (2014) | Fixed-size kernel | 138 M | 19.6B | 92.30% |
| Inception | Szegedy et al. (2014) | Wider parallel kernel | 6.4 M | 2B | 93.30% |
| ResNet | He et al. (2015) | Shortcut connections | 60.3 M | 11B | 95.51% |
Fig. 10 Taxonomy of Word Embeddings
Comparison of various Text Embeddings
| Embedding | Advantage | Weakness |
|---|---|---|
| Word2Vec | Consumes much less space than one-hot encoded vectors; maintains the semantic representation of words; captures multiple degrees of similarity between words using simple vector arithmetic | Cannot handle out-of-vocabulary (OOV) words; no shared representation at the subword level; scaling to new languages requires separate embedding matrices |
| GloVe | Can handle OOV words | Assigns random vectors to OOV words, which confuses the model in the long run |
| BERT | Creates contextualized vectors; learns representations at the subword (WordPiece) level | Computationally intensive; neglects the dependency between masked positions; suffers from pretrain-finetune inconsistency |
| ELMo | Generates contextualized word embeddings; can handle OOV words | Complex Bi-LSTM structure makes training and embedding generation very slow; representing long-term context dependencies becomes difficult |
| XLNet | Provides autoregressive pretraining; enables bidirectional learning by maximizing the expected likelihood over all permutations of the factorization order | Pre-trained to capture long-term dependencies but can underperform on short sequences; generally more resource-intensive and slower to train and run inference than BERT |
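The "simple vector arithmetic" property listed for Word2Vec can be illustrated with a small sketch. The 3-dimensional vectors below are hand-made, hypothetical values chosen purely for illustration (they are not real Word2Vec embeddings), and `cosine` and `analogy` are helper names introduced here rather than part of any library.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings (NOT trained vectors).
# Axes loosely stand for (royalty, male, female).
toy_vectors = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "prince": [0.8, 0.9, 0.1],
    "apple":  [0.05, 0.1, 0.1],
}

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# "king" - "man" + "woman" lands nearest to "queen" in this toy space.
print(analogy("king", "man", "woman", toy_vectors))
```

Looking up a word that is missing from `toy_vectors` raises a `KeyError`, which mirrors the out-of-vocabulary weakness listed for Word2Vec in the table.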
Fig. 11 Overview of ML/DL frameworks and libraries
Comparison of popular Deep Learning Frameworks
| Software | Platform | Written in | Interface | OpenMP support | OpenCL support | CUDA support | RNN | CNN | Has pre-trained Models |
|---|---|---|---|---|---|---|---|---|---|
| TensorFlow | Windows, Linux, macOS, Android | Python, C++, CUDA | C/C++, R, Python (Keras), Java, JavaScript | × | via SYCL support | ||||
| PyTorch | Windows, Linux, macOS, Android | C/C++, Python, CUDA | Python, C++ | Via separately maintained package | |||||
| Caffe | Linux, macOS, Windows | C++ | Python, C++, MATLAB | Under development | |||||
| Theano | Cross-platform | Python | Python (Keras) | Under development | Through Lasagne's model zoo | ||||
| Chainer | Linux, macOS | Python | Python | × | × | ||||
| MXNet | Linux, AWS, macOS, iOS, Windows, Android, JavaScript | Small C++ core library | C++, Python, MATLAB, JavaScript, Scala, Perl, R | On roadmap | |||||
| Microsoft Cognitive Toolkit (CNTK) | Linux, Windows, macOS (via Docker on roadmap) | C++ | Python (Keras), C++, Command line | × |
Fig. 12 A general framework for Multimodal Fake News Detection
Fig. 13 Various components of Multimodal Fake News Detection
Fig. 14 General schemes for multimodal fusion: (a) Early Fusion, (b) Late Fusion, (c) Intermediate Fusion
Fig. 15 Illustration of Knowledge Distillation process [16]
Review of literature of existing multimodal fake news frameworks
| Model, Ref | Contribution | Feature(s) used | Feature extraction (text) | Feature extraction (visual) | Multimodal fusion | Activation function | Future scope |
|---|---|---|---|---|---|---|---|
| Att-RNN [25] | Fuses the textual and visual features along with the social context using an attention mechanism | M, SC | Word2Vec | VGG-19 | Attention network | Sigmoid, Softmax, ReLU, Tanh | To improve the proposed model's performance |
| EANN [15] | Uses an event discriminator to discover event-specific data to enhance the detection efficiency on new events | M, ES | Text-CNN | VGG-19 | Dense layer (Concatenation) | Softmax, ReLU | To improve the fusion network |
| SAME [98] | Fuses the multimodal information along with user sentiment using adversarial learning | M, SC | GloVe | VGG-19 | Adversarial network | Softmax, ReLU | Early detection |
| MKEMN [16] | Gathers the event-invariant features shared between different events and captures external knowledge connections for effective news verification | M, EK, ES | GloVe | VGG-19 | CNN | Softmax, Tanh | Use a memory network to exploit rumor propagation information |
| MVAE [27] | Discovers correlations across the modalities by leveraging a VAE | M | Word2Vec | VGG-19 | Concatenation | Tanh, Softmax | Utilizing tweet propagation and user characteristics |
| SpotFake [28] | The prime novelty of the proposed model is the use of the pre-trained language model BERT | M, SC | BERT | VGG-19 | Dense layer (Concatenation) | Sigmoid, ReLU | Improvement on longer articles |
| SpotFake+ [95] | The prime novelty of the proposed model is the use of the pre-trained language model XLNet | M | XLNet | VGG-19 | Dense layer (Concatenation) | Sigmoid, ReLU | Incorporate meta-level feature modalities |
| MTMN [96] | Addresses early detection by fusing features shared by various topics with global features of latent topics and modeling intra-modal and inter-modal data in a combined framework | M, ES | BERT | ResNet50 | Blended Attention network | Softmax | Explore effective ways to learn background knowledge |
| SAFE [97] | Produces a joint representation of the textual and visual features of an article and uses a cosine function to measure the similarity between them | M | Text-CNN | Text-CNN (image2sentence) | Cosine Similarity | Softmax, ReLU | Incorporate user and network information |
| - [94] | Proposes visual features extracted from multiple images; additionally, the cosine similarity of the title and image-tag embeddings is calculated to measure image-text similarity | M | BERT | VGG-16 | Attention network | Softmax | Improve the performance of the proposed model |
| MCAN [129] | Proposes multiple co-attention layers that fuse and learn inter-modality relations | M | BERT | VGG-19 | Multiple co-attention layers | ReLU | To extend the fusion with the co-attention network to fake news diffusion |
| HMCAN [26] | Proposes a multi-modal contextual attention network that takes data from different modalities which complement one another | M | BERT | ResNet50 | Attention network | Softmax | Explore an effective way to exploit visual data and utilize auxiliary information |
| KMAGCN [99] | Represents posts as graphs rather than as word sequences to capture long-range non-consecutive semantic relations and leverages knowledge concepts along with multimodal information | M, EK | Adaptive graph convolutional network | VGG-19 (128-D) | Feature-level attention mechanism | Softmax, ReLU | To improve the proposed model's performance |
| CARMN [91] | Keeps the unique properties of each modality intact while reducing the noise induced when fusing different modalities | M | Word2Vec (32-D) | VGG-19 | Multichannel CNN | Softmax, ReLU | Event-level multimodal fake news |
| FND-SCTI [4] | Fuses multi-modal data along with an image-augmented text representation in a multi-task setting | M | Word2Vec | VGG-19 | Hierarchical attention network, VAE | Tanh, Softmax | Bot detection using user characteristics |
| - [110] | The proposed system consists of four independent parallel networks with individual predictions that are merged with the max voting ensemble method | M | GloVe | Image caption (CaptionBot) | Ensemble with max voting | Softmax, Tanh, ReLU | Incorporating better image forensic techniques |
| MMCN [92] | Fuses the text and image embeddings by considering inter-modal relationships using a multi-modal cross-attention network | M | BERT | ResNet50 | Cross Attention network | Softmax | To explore effective ways to utilize background knowledge |
M: Multimodal; SC: Social Context; EK: External Knowledge; ES: Event-Specific features
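Several entries in the table above fuse modalities with a "Dense layer (Concatenation)", while others combine per-modality outputs at a later stage. The sketch below contrasts the two ideas in plain Python under simplifying assumptions: the feature vectors, weights, and function names (`concat_fusion`, `dense_sigmoid`, `late_fusion`) are hypothetical and are not taken from any of the cited models, whose actual architectures are considerably larger.

```python
import math

def concat_fusion(text_vec, image_vec):
    """Early fusion: join the text and image feature vectors into one vector."""
    return list(text_vec) + list(image_vec)

def dense_sigmoid(features, weights, bias):
    """A single dense unit with sigmoid activation; returns P(fake)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def late_fusion(p_text, p_image, w_text=0.5):
    """Late fusion: weighted average of two per-modality probabilities."""
    return w_text * p_text + (1.0 - w_text) * p_image

# Hypothetical, hand-picked features standing in for the outputs of a
# text encoder (e.g. BERT) and an image encoder (e.g. VGG-19).
text_features = [0.2, -0.4]
image_features = [0.7]

fused = concat_fusion(text_features, image_features)

# Hypothetical weights; a real model would learn these during training.
p_fake_early = dense_sigmoid(fused, weights=[1.0, 0.5, -1.0], bias=0.0)

# Late fusion instead combines probabilities from separate classifiers.
p_fake_late = late_fusion(p_text=0.8, p_image=0.4)

print(len(fused), round(p_fake_early, 3), round(p_fake_late, 3))
```

In early fusion the classifier sees both modalities at once and can weight cross-modal combinations, whereas late fusion keeps the per-modality classifiers independent and only merges their decisions.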
Experimental Setup of Multimodal Fake News Detection Models
| Model | Ref. | Dataset | Batch size | Learning rate | Dropout | Epochs | Optimizer | Loss function | Performance Evaluation |
|---|---|---|---|---|---|---|---|---|---|
| Att-RNN | [25] | Twitter16, Weibo | 128 | – | – | 100 | Stochastic gradient descent | Cross Entropy | Acc- ~78%, ~68% |
| EANN | [15] | Twitter15, Weibo | 100 | – | – | 100 | – | Cross Entropy | Acc- ~71%, ~82% |
| SAME | [98] | FakeNewsNet | 128 | 0.001 | 0.5 | – | RMSprop | Adversarial, Hybrid similarity, Cross entropy | Acc- ~77%, ~80% |
| MKEMN | [16] | Twitter15, PHEME | 128 | – | – | – | – | Cross Entropy | Acc- ~86%, ~81% |
| MVAE | [27] | Twitter15, Weibo | 128 | 0.00001 | – | 300 | Adam | VAE Loss | Acc- ~74%, ~82% |
| SpotFake | [28] | Twitter15, Weibo | 256 | 0.0005, 0.001 | 0.4 | – | Adam | – | Acc- ~72%, ~80% |
| SpotFake+ | [95] | FakeNewsNet | – | – | 0.4 | – | – | – | Acc- ~84%, ~85% |
| SAFE | [97] | FakeNewsNet | – | – | – | – | – | Cross Entropy | Acc- ~87%, ~83% |
| – | [94] | FakeNewsNet | 32 | 0.00005 | 0.2 | 60 | Adam | – | F1 score- ~76% |
| MCAN | [129] | Twitter16, Weibo | – | – | – | 100 | Adam | Cross Entropy | Acc- ~80%, ~89% |
| HMCAN | [26] | Weibo, Twitter15, PHEME | 256 | 0.001 | – | 150 | Adam | Cross Entropy | Acc- ~85%, ~89%, ~88% |
| KMAGCN | [99] | Weibo, Twitter15, PHEME | 128 | 0.01 | – | 300 | Adam | Cross Entropy | Acc- ~84%, ~78%, ~86% |
| CARMN | [91] | Twitter16, Weibo | 150 | – | – | 150 | Adam | Cross Entropy | Acc- ~74%, ~85% |
| FND-SCTI | [4] | Twitter15, Weibo | 128 | 0.00001 | – | 300 | Adam | VAE Loss | Acc- ~75%, ~83% |
| – | [110] | AllData, Kaggle datasets | 32 | – | – | 40 | Adam | Cross Entropy | Acc- ~95%, ~95%, ~95% |
| MMCN | [92] | Weibo, PHEME | 64, 256 | 0.001 | – | 150 | Adam | Cross Entropy | Acc- ~87%, ~87% |
| MTMN | [96] | Weibo, PHEME | 256 | 0.001 | – | 200 | Adam | Cross Entropy | Acc- ~88%, ~88% |
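Most models in the table above are trained with a cross-entropy loss and evaluated by accuracy. As a minimal sketch of what those two quantities compute in the two-class (fake vs. real) case, the functions below use only the standard library; the labels and predicted probabilities are made-up illustrative values, not results from any cited paper.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy; y_true holds 0/1 labels, y_prob holds P(label = 1)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    correct = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return correct / len(y_true)

# Made-up example: 1 = fake, 0 = real.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.1]

print(binary_cross_entropy(labels, probs))
print(accuracy(labels, probs))
```

The third sample (label 1, predicted 0.4) is the only misclassification at the 0.5 threshold, and it also contributes the largest term to the loss.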
Fig. 16 Requirements for fake news detection datasets defined by [44]
Fig. 17 Requirements for fake news detection datasets as defined by [70]
Multimodal Fake News datasets
| Dataset | Year of release | Statistics | Domain | Contents | Labels | Collected from | Used in |
|---|---|---|---|---|---|---|---|
| Twitter 15 [ | 2015 | 361 (I) 7032 (F) 5008 (R) | Posts related to 11 events | Text, visual | 2 | [ | |
| Twitter 16 [ | 2016 | 413 (I) 9596 (F) 6225 (R) | Posts related to 17 events | Text, visual | 2 | [ | |
| Weibo [ | 2016 | 9528 (I) 4749 (F) 4779 (R) | Verified false rumor posts crawled from May 2012 to January 2016 | Text, visual | 2 | Weibo (non-rumor posts are verified by Xinhua News Agency, an authoritative news agency in China) | [ |
| PHEME [ | 2016 | 2672 (I) 1972 (F) 3830 (R) | 9 different events, which include 5 cases of breaking news | Tweet, conversational threads | 3 | [ | |
| ALLData [ | 2018 | 20,015 (I) 11,941 (F) 8074 (R) | 2016 US Presidential elections | The title, text, image, author and website | 2 | Fake and real news scraped from 240 websites and authoritative news websites, i.e., the New York Times, Washington Post, etc. respectively | [ |
| FakeNewsNet [ | 2019 | 19,200 (I) 5367 (F) 17,222 (R) | Politics, Entertainment | Text, image url, conversational threads, location, and timestamp of engagement | 2 | Content is crawled from PolitiFact, GossipCop, E! online; For user engagements Twitter API is used | [ |
Note: I—Total Number of Images, F—Number of Fake claims, R—Number of Real claims
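The fake and real counts in the table above give a quick sense of class balance, which matters when reading reported accuracies. The sketch below computes the fake fraction and the accuracy of a trivial majority-class predictor for each dataset. The majority-class baseline is an addition made here for context, not something reported in the survey, and it assumes the F and R counts are the only classes being predicted (PHEME is listed with three labels, so its figure is only indicative).

```python
# (fake, real) claim counts copied from the datasets table.
DATASETS = {
    "Twitter 15": (7032, 5008),
    "Twitter 16": (9596, 6225),
    "Weibo": (4749, 4779),
    "PHEME": (1972, 3830),
    "ALLData": (11941, 8074),
    "FakeNewsNet": (5367, 17222),
}

def fake_fraction(fake, real):
    """Share of claims labeled fake."""
    return fake / (fake + real)

def majority_baseline(fake, real):
    """Accuracy obtained by always predicting the larger class."""
    return max(fake, real) / (fake + real)

for name, (fake, real) in DATASETS.items():
    print(f"{name:12s} fake={fake_fraction(fake, real):.3f} "
          f"baseline={majority_baseline(fake, real):.3f}")
```

A dataset close to 50/50, such as Weibo, leaves little room for a trivial predictor, whereas a skewed dataset such as FakeNewsNet makes a high raw accuracy less informative on its own.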
Fig. 18 Fake news detection challenges