Dimple Tiwari, Bharti Nagpal.
Abstract
Social media has emerged as an influential platform that allows people to share their views on global and local issues. Sentiment analysis can process these massive amounts of unstructured reviews and convert them into meaningful opinions. COVID-19 emerged as an enormous challenge across the world, one that battered humankind physically and financially. Meanwhile, farmers' protests against three pieces of legislation passed by the Indian government drew worldwide attention. Hence, an artificial intelligence-based sentiment model is needed to suggest the right direction for responding to such outbreaks. Although Deep Neural Networks (DNNs) have gained popularity in sentiment analysis applications, they are still limited by sequential training, a high-dimensional feature space, and an equal distribution of importance across features. In addition, inaccurate polarity scoring and utility-based topic modeling are other challenging aspects of sentiment analysis. This motivates us to propose a Knowledge-Enriched Attention-based Hybrid Transformer (KEAHT) model, enriched with explicit knowledge from Latent Dirichlet Allocation (LDA) topic modeling and a lexicalized domain ontology. A pre-trained Bidirectional Encoder Representation from Transformer (BERT) is employed so that the model can be trained on a minimal training corpus. It provides an attention mechanism and can solve complex text problems accurately. A comparative study with existing baselines and recent hybrid models affirms the credibility of the proposed KEAHT in the field of Natural Language Processing (NLP). This model emphasizes artificial intelligence's role in handling a global pandemic and a democratic dispute within a country. Furthermore, two benchmark datasets, namely "COVID-19-Vaccine-Labelled-Tweets" and "Indian-Farmer-Protest-Labelled-Tweets", are also constructed to help future researchers outline the essential facts associated with the outbreaks.
© Ohmsha, Ltd. and Springer Japan KK, part of Springer Nature 2022.
Keywords: Bidirectional encoder representation from transformer (BERT); COVID-19 vaccine; Indian farmer protest; Latent Dirichlet Allocation (LDA); Lexicon approach; Social networks
Year: 2022 PMID: 35855700 PMCID: PMC9275547 DOI: 10.1007/s00354-022-00182-2
Source DB: PubMed Journal: New Gener Comput ISSN: 0288-3635 Impact factor: 1.180
Summary of existing lexicon-based research for sentiment analysis
| Research | Dictionary approach | Corpus approach | Dataset type | Number of datasets |
|---|---|---|---|---|
| (Taboada et al. 2011) | ✓ | Reviews | 4 | |
| (Abdulla et al. 2014) | ✓ | Tweets & Comments | 2 | |
| (Kundi et al. 2014) | ✓ | Tweets | 1 | |
| (Saif et al. 2016) | ✓ | Tweets | 2 | |
| (Keshavarz et al. 2017) | ✓ | ✓ | 6 | |
| (Khoo et al. 2018) | ✓ | Reviews | 2 |
Summary of existing machine learning-based research for sentiment analysis
| Research | Supervised | Unsupervised | Semi-supervised | Adopted Method | Dataset type | Number of datasets |
|---|---|---|---|---|---|---|
| (Fernández-Gavilanes et al. 2016) | ✓ | Propagation | Reviews and Tweets | 3 | ||
| (Unnisa et al. 2016) | ✓ | Spectral Clustering | Tweets | 1 | ||
| (Khan2 et al. 2016) | ✓ | SentiWordNet and SVM | Reviews | 7 | ||
| (Khan1 et al. 2017) | ✓ | Information Gain and Cosine Similarity | Reviews | 1 | ||
| (Kumar2 et al. 2019) | ✓ | Dirichlet-Multinomial Distribution | Reviews | 1 | ||
| (Rintyarna et al. 2019) | ✓ | NB Multinomial, Bayesian Network, Multilayer Perceptron, J48, Logistic, Random Tree, and Random Forest | Reviews | 2 | ||
| (Kumar1 et al. 2021) | ✓ | Collaborative Filtering | Feedback | 1 |
Summary of existing deep learning and transfer learning-based research for sentiment analysis
| Research | CNN | RNN | LSTM | Transformer | Dataset type | Number of datasets |
|---|---|---|---|---|---|---|
| (Daval-Frerot et al. 2018) | ✓ | 1 | ||||
| (Rehman et al. 2019) | ✓ | ✓ | Reviews | 1 | ||
| (Rani et al. 2019) | ✓ | Reviews | 1 | |||
| (Basiri et al. 2021) | ✓ | ✓ | Reviews + Tweets | 8 | ||
| (Singh et al. 2021) | ✓ | 1 | ||||
| (Jain et al. 2021) | ✓ | ✓ | Tweets | 1 |
Fig. 1 Lexicon-based polarity calculation
Fig. 2 The procedure of LDA topic modeling
The LDA-based topic modeling calculation
| Input | Description |
|---|---|
| Corpus: | Dataset- |
| Document: | |
| Dictionary: | . |
| Latent: | . |
1. For all the topics
   • Choose a word distribution
2. For all the documents
   • Choose
   • Choose a topic distribution
   • For all the words
     • Choose a topic index
     • Choose a word
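The generative steps listed above can be sketched in a few lines of Python. This is a minimal illustration of the standard LDA generative process, not the authors' implementation: the symbols were lost from the source, so the names `alpha`, `beta`, `topic_word`, and `doc_topic` are assumptions, and document lengths are passed in as fixed values rather than sampled.

```python
import random


def sample_dirichlet(rng, concentration, size):
    """Draw one symmetric Dirichlet sample by normalising Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(num_topics, vocab_size, doc_lengths,
                    alpha=0.5, beta=0.5, seed=0):
    """Generate (topic index, word id) pairs following the LDA steps.

    alpha and beta are assumed symmetric Dirichlet concentrations.
    """
    rng = random.Random(seed)
    # Step 1: for every topic, choose a word distribution.
    topic_word = [sample_dirichlet(rng, beta, vocab_size)
                  for _ in range(num_topics)]
    corpus = []
    # Step 2: for every document, choose a topic distribution.
    for length in doc_lengths:
        doc_topic = sample_dirichlet(rng, alpha, num_topics)
        words = []
        for _ in range(length):
            # For every word position: choose a topic index, then a word.
            z = rng.choices(range(num_topics), weights=doc_topic)[0]
            w = rng.choices(range(vocab_size), weights=topic_word[z])[0]
            words.append((z, w))
        corpus.append(words)
    return topic_word, corpus
```

Fitting LDA to real tweets runs this process in reverse: the words are observed and the topic and word distributions are inferred.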
Fig. 3 An additive attention representation
Fig. 4 The architecture of the proposed KEAHT sentiment model
The proposed KEAHT algorithm for sentiment analysis
Tokenization:
Porter stemmer: (m > 1 and (*S or *T)), which tests for a stem with m > 1 ending in S or T
LDA:
The conditional probability of each word lexicon in the sentence is calculated
1. Proportionately, sentence S splits into two sections
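The Porter-stemmer condition quoted above can be made concrete with a short sketch. In Porter's published algorithm, the condition (m > 1 and (*S or *T)) guards removal of the suffix -ION; the source does not show the suffix, so that pairing is an assumption here, as are the function names `measure` and `strip_ion`. The measure m counts vowel-consonant sequences in the stem.

```python
VOWELS = "aeiou"


def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: 'y' is a consonant only when it does not follow a consonant."""
    ch = word[i]
    if ch in VOWELS:
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """Return m, the number of vowel-to-consonant transitions in the stem."""
    m = 0
    prev_is_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_is_vowel:
            m += 1
        prev_is_vowel = not cons
    return m


def strip_ion(word: str) -> str:
    """Apply the single rule (m > 1 and (*S or *T)) ION -> (nothing)."""
    word = word.lower()
    if word.endswith("ion"):
        stem = word[:-3]
        if measure(stem) > 1 and stem.endswith(("s", "t")):
            return stem
    return word
```

For example, `strip_ion("adoption")` yields `"adopt"` because the stem has m = 2 and ends in T, while `"ration"` is left unchanged because its stem `"rat"` has m = 1.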
The attributes of the tweets datasets before and after the knowledge enrichment process
| Dataset | Domain | DataType |
|---|---|---|
| COVID-19 Vaccine | User_Name | object |
| | User_Location | object |
| | Date | object |
| | Text | object |
| | Hashtags | object |
| Indian Farmer Protest | Date | object |
| | Tweets | object |
| | Source | object |
| | Location | object |
| Domains added after the knowledge enrichment | | |
| COVID-19 Vaccine & Indian Farmer Protest | Clean-Tweet | object |
| | Text_Length | int64 |
| | Word_Count | int64 |
| | Polarity | float64 |
| | Sentiment_Type | object |
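As a rough illustration of how the added attributes could be derived from a raw tweet, the sketch below computes Clean-Tweet, Text_Length, and Word_Count. The cleaning rules (dropping URLs, mentions, hashtags, and non-letters, then lowercasing) are assumptions, since the source does not state the exact preprocessing; Polarity and Sentiment_Type are left out because they depend on the lexicon.

```python
import re


def clean_tweet(text: str) -> str:
    """Assumed cleaning: remove URLs, @mentions, #hashtags and non-letters."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"[@#]\w+", " ", text)
    text = re.sub(r"[^A-Za-z\s]", " ", text)
    # Collapse runs of whitespace and lowercase the result.
    return " ".join(text.lower().split())


def enrich(text: str) -> dict:
    """Build the length-related attributes listed in the table."""
    cleaned = clean_tweet(text)
    return {
        "Clean-Tweet": cleaned,
        "Text_Length": len(cleaned),          # characters in the cleaned tweet
        "Word_Count": len(cleaned.split()),   # whitespace-separated tokens
    }
```

Calling `enrich("Vaccine works! https://t.co/x #covid @who")` gives a cleaned tweet of `"vaccine works"`, with a text length of 13 and a word count of 2.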
Two random tweets from the COVID-19 vaccine dataset with the highly positive sentiment polarity
| Tweet Number | Highly Positive Tweets |
|---|---|
| 1 | |
| 2 |
Two random tweets from the COVID-19 vaccine dataset with the highly negative sentiment polarity
| Tweet Number | Highly Negative Tweets |
|---|---|
| 1 | |
| 2 |
Two random tweets from the Indian farmer protests dataset with the highly positive sentiment polarity
| Tweet Number | Highly Positive Tweets |
|---|---|
| 1 | |
| 2 |
Two random tweets from the Indian farmer protests dataset with the highly negative sentiment polarity
| Tweet Number | Highly Negative Tweets |
|---|---|
| 1 | |
| 2 |
Fig. 5 Sentiment polarity distribution of collected tweets
Fig. 6 Word count distribution of collected tweets
Fig. 7 Word cloud calculated for the tweets
LDA topic modeling with network-based ontology graph
Corpus-2 (IFP): Indian farmer protest-associated tweets |
# LDA-based topic modeling
Consider
| Select a word distribution
| Consider the topic distribution
| | sample a topic
| | sample a word
# Network ontology representation of Topic related words
| | | TI = TI + 1;
Fig. 8 Network diagram of topic and word extracted from COVID-19 vaccine-associated tweets
Fig. 9 Network diagram of topic and word extracted from Indian farmer protests associated tweets
Sentiment extraction with lexicalized dictionary approach
analysis = WL(GW)
| | sentiment = “Positive”;
| | sentiment = “Negative”;
| | sentiment = “Neutral”
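The branching conditions of this procedure did not survive in the source. A common convention, assumed here, is to label a tweet Positive when its polarity is above zero, Negative when below zero, and Neutral otherwise. The sketch below pairs that rule with a toy lexicon; `TOY_LEXICON` and its scores are hypothetical and stand in for the lexicalized dictionary.

```python
# Hypothetical word-level polarity scores; not the dictionary used in the paper.
TOY_LEXICON = {"good": 0.7, "safe": 0.5, "bad": -0.7, "unsafe": -0.5}


def polarity_score(tokens, lexicon=TOY_LEXICON) -> float:
    """Average the scores of tokens found in the lexicon (0.0 if none match)."""
    scores = [lexicon[t] for t in tokens if t in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


def label_sentiment(polarity: float) -> str:
    """Assumed thresholds: > 0 Positive, < 0 Negative, exactly 0 Neutral."""
    if polarity > 0:
        return "Positive"
    if polarity < 0:
        return "Negative"
    return "Neutral"
```

With these assumptions, the tokens `["vaccine", "is", "safe"]` are labelled Positive, and a tweet with no lexicon matches falls through to Neutral.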
Fig. 10 Token counts of collected tweets
Confusion matrix
| | Actual: Positive | Actual: Negative |
|---|---|---|
| Predicted: Positive | TP | FP |
| Predicted: Negative | FN | TN |
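The four cells of the confusion matrix are enough to compute the usual evaluation measures. The sketch below uses the standard definitions of accuracy, precision, recall, and F1-score; the function name `binary_metrics` is illustrative, and the zero-denominator handling (returning 0.0) is a design choice rather than something stated in the source.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were predicted positive.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For a three-class task such as Positive/Neutral/Negative, the same formulas are applied per class in a one-vs-rest fashion.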
Sentiment count calculated by lexicalized dictionary
| Dataset | Positive | Neutral | Negative | Total |
|---|---|---|---|---|
| COVID-19 Vaccine | 1465 | 1249 | 274 | 2988 |
| Indian farmer protest | 1239 | 1483 | 621 | 3343 |
Aspect associated topics calculated by the LDA topic modeling
| COVID-19 Vaccine-associated tweets | |
|---|---|
| Topic-1 | Covid-vaccine, Vaccine, COVID-19, Russia, Coronavirus, Vaccines, |
| Topic-2 | Covid-vaccine, Vaccine, COVID-19, Russia, World, India, Russian, |
| Topic-3 | Russia, Vaccine, Covid-vaccine, Putin, COVID-19, Russian vaccine, World, Coronavirus, |
| Topic-4 | Covid-vaccine, Russia, Vaccine, COVID-19, World, |
| Topic-5 | Covid-vaccine, |
| Topic-6 | Vaccine, COVID-19, Russia, |
| Topic-7 | Covid-vaccine, vaccine, |
| Topic-8 | Covid-vaccine, Vaccines, COVID-19, India, |
| Topic-9 | Covid-vaccine, Russia, Russian, |
| Topic-10 | |
| Indian farmer protests associated tweets | |
| Topic-1 | Farmers, Modi, Government, AMP, |
| Topic-2 | Farmers, Protest, India, Government, Farm, Protesting |
| Topic-3 | Farmers, AMP, India, Modi, |
| Topic-4 | Farmers, Support, Day, Indian, People, Protest, Government, |
| Topic-5 | Farmers, Just, |
| Topic-6 | Farmers, People, Modi, Farm, Laws, |
| Topic-7 | Farmers, Like, Support, |
| Topic-8 | Farmers, AMP, India, Laws, Farm, New, Support, |
| Topic-9 | Farmers, Laws, Farm, Talks, Government, |
| Topic-10 | Farmers, AMP, India, BJP, |
Fig. 11 Word-frequency count of collected tweets
The accuracy (Acc) and loss obtained by the proposed KEAHT model
| Dataset | Training-Acc | Validation-Acc | Testing-Acc | Training Loss | Validation Loss |
|---|---|---|---|---|---|
| COVID-19 Vaccine | 92.84% | 89.29% | 91% | 23.68% | 34.64% |
| Indian farmer protest | 92.63% | 81.43% | 81.49% | 23.08% | 54.75% |
Fig. 12 Training history of the proposed KEAHT model
Fig. 13 Confusion matrix calculated by the proposed KEAHT model
The classification report of the proposed KEAHT model
| COVID-19 Vaccine-associated tweets | ||||
|---|---|---|---|---|
| Sentiment | Precision | Recall | F1-Score | Support |
| Positive | 0.93 | 0.98 | 0.95 | 130 |
| Neutral | 0.92 | 0.90 | 0.91 | 140 |
| Negative | 0.72 | 0.62 | 0.67 | 29 |
| Macro-average | 0.86 | 0.83 | 0.84 | 299 |
| Weighted-average | 0.90 | 0.91 | 0.90 | 299 |
| Indian farmer protest-associated tweets | ||||
| Positive | 0.84 | 0.87 | 0.85 | 141 |
| Neutral | 0.87 | 0.85 | 0.86 | 140 |
| Negative | 0.60 | 0.59 | 0.60 | 54 |
| Macro-average | 0.77 | 0.77 | 0.77 | 335 |
| Weighted-average | 0.81 | 0.81 | 0.81 | 335 |
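The two summary rows can be checked against the per-class rows. The sketch below recomputes them for the COVID-19 vaccine block: the unweighted mean over the three classes (what scikit-learn's classification report calls the macro average) reproduces the 0.86 / 0.83 / 0.84 row, and the support-weighted mean reproduces the 0.90 / 0.91 / 0.90 row. The names `COVID_ROWS`, `macro_average`, and `weighted_average` are illustrative only.

```python
# (precision, recall, f1, support) per class, copied from the table above.
COVID_ROWS = {
    "Positive": (0.93, 0.98, 0.95, 130),
    "Neutral": (0.92, 0.90, 0.91, 140),
    "Negative": (0.72, 0.62, 0.67, 29),
}


def macro_average(rows: dict) -> tuple:
    """Unweighted mean of each metric across classes."""
    n = len(rows)
    return tuple(sum(r[i] for r in rows.values()) / n for i in range(3))


def weighted_average(rows: dict) -> tuple:
    """Mean of each metric weighted by class support."""
    total = sum(r[3] for r in rows.values())
    return tuple(sum(r[i] * r[3] for r in rows.values()) / total
                 for i in range(3))
```

The gap between the two averages reflects class imbalance: the Negative class has only 29 test samples and the weakest scores, so it pulls the unweighted mean down more than the weighted one.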
Comparative results of the existing and proposed KEAHT model
| Dataset | Approach | Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|---|---|
| COVID-19 vaccine-associated tweets | Machine Learning | Ada Boost | 78% | 80 | 78 | 76 |
| | | Extra Tree | 80% | 80 | 80 | 79 |
| | | Gradient-Boost | 71% | 72 | 71 | 67 |
| | | Random Forest | 80% | 81 | 80 | 78 |
| | | Extreme-Gradient Boosting | 77% | 83 | 77 | 75 |
| | | Light GBM | 69% | 67 | 69 | 67 |
| | Deep-Learning | Sequential RELU | 72% | 74 | 72 | 73 |
| | | CNN | 82% | 67 | 82 | 74 |
| | | RNN LSTM | 82% | 68 | 81 | 74 |
| | Transfer Learning | BERT-BASIC | 76% | 70 | 77 | 73 |
| | Hybrid | KEAHT-Proposed | | | | |
| Indian farmer protest-associated tweets | Machine Learning | Ada Boost | 70% | 76 | 70 | 67 |
| | | Extra Tree | 75% | 75 | 75 | 74 |
| | | Gradient-Boost | 59% | 76 | 59 | 51 |
| | | Random Forest | 76% | 77 | 76 | 75 |
| | | Extreme-Gradient Boosting | 71% | 76 | 71 | 67 |
| | | Light GBM | 66% | 64 | 66 | 63 |
| | Deep-Learning | Sequential RELU | 68% | 65 | 66 | 65 |
| | | CNN | 65% | 70 | 65 | 67 |
| | | RNN LSTM | 67% | 78 | 67 | 54 |
| | Transfer Learning | BERT-BASIC | 76% | 77 | 77 | 77 |
| | Hybrid | KEAHT-Proposed | | | | |
Description of existing state-of-the-art hybrid models of sentiment analysis
| Author | Model | Method |
|---|---|---|
| (Asghar et al. 2017) | T-SAF | The Twitter-Sentiment Analysis Framework (T-SAF) is a hybrid model proposed for classifying tweets using emoticon, slang, and domain-specific SentiWordNet classifiers |
| (Zainuddin et al. 2017) | ABSA + SentiWordNet + PCA + SVM | A complete Aspect-based Sentiment Analysis (ABSA) hybrid model combining the capabilities of SentiWordNet, Principal Component Analysis (PCA), and SVM |
| (Ma et al. 2018) | Sentic-LSTM | A Sentic-LSTM hybrid model enhanced with explicit knowledge, proposed for targeted aspect-based sentiment analysis |
| (Liu et al. 2019) | AttDR-2DCNN | An Attention-based Bidirectional and two-Dimensional Convolutional Neural Network (AttDR-2DCNN) introduced for document-level sentiment classification |
| (Du et al. 2019) | BGRU + Capsule | A Bi-GRU and capsule-based hybrid model proposed with an implicit semantic calculation facility |
| (Meskele1 and Frasincar, 2019) | ALDONA | A Lexicalized Domain Ontology and Neural Attention Model (ALDONA) introduced to handle multiple polarity aspects in text classification |
| (Meskele2 and Frasincar, 2020) | ALDONAr | A Lexicalized Domain Ontology and Regularized Neural Attention Model (ALDONAr) built for aspect-based sentence-level sentiment analysis |
| (Pathak et al. 2021) | TLSA | A Topic-Level Sentiment Analysis (TLSA) model proposed for extracting people's opinions on different cryptocurrencies using a deep learning approach |
Fig. 14 Comparative performance (Accuracy) of the existing state-of-the-art and proposed KEAHT hybrid model