| Literature DB >> 34660170 |
Abstract
As data on social media grow rapidly through users' contributions, especially during the recent coronavirus pandemic, the need to understand user behavior is in high demand. The opinions behind posts on the pandemic are the scope of the dataset tested in this study. Finding the most suitable classification algorithms for this kind of data is challenging. Within this context, deep learning models for sentiment analysis can offer richer representation capabilities and better performance than existing feature-based techniques. In this paper, we focus on enhancing the performance of sentiment classification using a customized deep learning model with an advanced word embedding technique and a long short-term memory (LSTM) network. Furthermore, we propose an ensemble model that combines our baseline classifier with other state-of-the-art classifiers used for sentiment analysis. The contributions of this paper are twofold. (1) We establish a robust framework based on word embedding and an LSTM network that learns the contextual relations among words and handles unseen or rare words in relatively emerging situations such as the coronavirus pandemic by recognizing suffixes and prefixes from the training data. (2) We capture and exploit the significant differences among state-of-the-art methods by proposing a hybrid ensemble model for sentiment analysis. We conduct several experiments using our own Twitter coronavirus hashtag dataset as well as public review datasets from Amazon and Yelp. A concluding statistical study indicates that the performance of the proposed models surpasses other models in terms of classification accuracy. © King Fahd University of Petroleum & Minerals 2021.
Keywords: COVID-19; Coronavirus; Data mining; Deep learning; Ensemble algorithms; Machine learning; Pandemic; Sentiment analysis; Social media
Year: 2021 PMID: 34660170 PMCID: PMC8502794 DOI: 10.1007/s13369-021-06227-w
Source DB: PubMed Journal: Arab J Sci Eng ISSN: 2191-4281 Impact factor: 2.807
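The abstract notes that the embedding layer handles unseen or rare words (e.g. pandemic-era terms) by recognizing suffixes and prefixes learned from training data. A minimal sketch of fastText-style character n-gram extraction, one common way to realize such subword embeddings (the function name and n-gram range are illustrative assumptions, not the authors' implementation):

```python
def subword_ngrams(word, n_min=3, n_max=5):
    """Extract character n-grams from a word, with boundary markers
    so prefixes (<xx) and suffixes (xx>) stay distinguishable."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams

# An unseen pandemic-era word still shares subwords with words
# seen during training, so it can receive a meaningful embedding.
shared = set(subword_ngrams("covidiot")) & set(subword_ngrams("covid"))
```

An out-of-vocabulary word can then be embedded as the sum (or average) of the vectors of its known subword n-grams, which is how unseen terms inherit meaning from related training words.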
Fig. 1 Proposed deep learning ensemble model for sentiment analysis
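The hybrid ensemble of Fig. 1 combines the baseline LSTM classifier with other sentiment classifiers. One simple way such an ensemble can aggregate predictions is majority voting; the sketch below illustrates that combination rule as an assumption, not the paper's exact scheme:

```python
from collections import Counter

def ensemble_predict(classifier_outputs):
    """Majority vote over per-classifier sentiment labels
    ('pos'/'neg') for a single input text."""
    votes = Counter(classifier_outputs)
    return votes.most_common(1)[0][0]

# Hypothetical outputs from four base classifiers for one tweet:
labels = ["pos", "neg", "pos", "pos"]
print(ensemble_predict(labels))  # majority label
```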
Web 2.0 data description
| Web app | # Records | Pos/Neg distribution |
|---|---|---|
|  | 4242 | 58% / 42% |
| MySpace | 1041 | 85% / 15% |
| YouTube | 3407 | 68% / 32% |
| BBC | 1000 | 14% / 86% |
| Runners World | 1046 | 68% / 32% |
| Digg | 1077 | 27% / 73% |
Total number of records along with the distribution of positive and negative labels for Web 2.0 datasets
Fig. 2 Data preprocessing pipeline for our datasets
Translating emoticons and emojis to sentiment polarity
Table showing different combinations of characters with their corresponding meanings in terms of emotions, sentiments and polarity
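The emoticon/emoji table maps character combinations to sentiment polarity during preprocessing. A minimal sketch of this translation step (the specific mappings below are illustrative examples, not the paper's full table):

```python
# Illustrative emoticon-to-polarity lookup; the paper's table is larger.
EMOTICON_POLARITY = {
    ":)": "positive", ":-)": "positive", ":D": "positive",
    ":(": "negative", ":-(": "negative", ":'(": "negative",
}

def translate_emoticons(text):
    """Replace each known emoticon with a polarity token that the
    tokenizer can treat like an ordinary sentiment-bearing word."""
    for emo, polarity in EMOTICON_POLARITY.items():
        text = text.replace(emo, f" {polarity} ")
    return " ".join(text.split())

print(translate_emoticons("stuck at home again :("))
```

Translating emoticons before tokenization preserves sentiment signals that would otherwise be stripped out with punctuation.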
Fig. 3 Model selection process using different sets of hyperparameters for the proposed deep learning language model
Fig. 4 Model evaluation of the proposed deep learning algorithm using accuracy and loss curves during training and validation on the COVID-19 dataset
Evaluation of the customized ensemble deep learning language model on the Twitter COVID-19 dataset using different sets of hyperparameters
| # Neurons | 100 | 200 | 300 |
|---|---|---|---|
| # Hidden layers = 1 | 80.55% | 81.90% | 83.25% |
| # Hidden layers = 2 | 88.40% | 91.26% | – |
| # Hidden layers = 3 | 87.33% | 90.65% | 92.18% |
| # Hidden layers = 1 | 80.35% | 81.66% | 80.28% |
| # Hidden layers = 2 | 86.20% | 89.15% | – |
| # Hidden layers = 3 | 86.33% | 89.57% | 88.72% |
Measures in bold show the best classification accuracy for different hyperparameter settings of hidden layers and numbers of neurons in the network. For this table, experimental results are reported using Twitter COVID-19 training, validation, and testing datasets
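The table above amounts to a small grid search over hidden-layer count and neuron count. A sketch of selecting the best configuration from such results, using accuracies taken from the first block of the table (treating that block as a single validation run is an assumption):

```python
# (hidden_layers, neurons) -> accuracy (%), copied from the table's
# first block; missing cells are omitted.
results = {
    (1, 100): 80.55, (1, 200): 81.90, (1, 300): 83.25,
    (2, 100): 88.40, (2, 200): 91.26,
    (3, 100): 87.33, (3, 200): 90.65, (3, 300): 92.18,
}

# Model selection: keep the configuration with the highest accuracy.
best_config = max(results, key=results.get)
print(best_config, results[best_config])
```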
Comparative performance on sets 1 and 2 comprising Twitter, Amazon, and Yelp datasets
| Dataset | Custom DLL | Microsoft | IBM | Proposed ensemble |
|---|---|---|---|---|
| Twitter COVID-19 | 90.25% | 87.10% | 88.25% | 84.40% |
| Amazon reviews | 95.70% | 93.55% | 94.20% | 89.33% |
| Yelp reviews | 96.66% | 95.28% | 95.90% | 94.90% |
The results highlight our ensemble deep learning language model on the set 1 and set 2 datasets. Our model consistently outperformed the other existing classifiers
Statistical significance testing of algorithms for classification
| Algorithm | p value |
|---|---|
| Proposed ensemble > Google | |
| Proposed ensemble > Microsoft | |
| Proposed ensemble > IBM | |
p values were calculated by pairwise binomial tests on the Twitter COVID-19 dataset. C1 ">" C2 indicates that C1 produces statistically better results than C2
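A pairwise binomial (sign) test compares two classifiers on the same test instances: among instances where exactly one classifier is correct, it asks whether C1 wins more often than chance would allow. A self-contained sketch using an exact one-sided binomial tail probability (the counts below are hypothetical, not the paper's):

```python
from math import comb

def binomial_test_one_sided(wins, n, p=0.5):
    """P(X >= wins) for X ~ Binomial(n, p): the probability of seeing
    at least this many C1-only-correct instances if both classifiers
    were equally good on the disagreement cases."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(wins, n + 1))

# Hypothetical counts: among 100 instances where the two classifiers
# disagree, C1 is the correct one on 70.
p_value = binomial_test_one_sided(70, 100)
print(p_value < 0.05)  # True: significant at the 5% level
```

In practice `scipy.stats.binomtest` computes the same quantity; the pure-Python version above just keeps the sketch dependency-free.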
Comparative performance on set 3 comprising Web 2.0 datasets
| Dataset | Custom DLL | Microsoft | IBM | Proposed ensemble |
|---|---|---|---|---|
|  | 72.2% | 71.5% | 70.8% | 68.1% |
| MySpace | 83.5% | 84.2% | 85.8% | 80.9% |
| YouTube | 78.9% | 79.5% | 77.5% | 74.4% |
| BBC | 31.4% | 29.7% | 30.5% | 27.1% |
| Runners World | 76.6% | 78.2% | 77.4% | 71.5% |
| Digg | 46.5% | 48.2% | 46.8% | 42.4% |
The results highlight our ensemble deep learning language model on the Web 2.0 datasets. Our model consistently outperformed the other existing classifiers