Hager Saleh, Sherif Mostafa, Abdullah Alharbi, Shaker El-Sappagh, Tamim Alkhalifah.
Abstract
Sentiment analysis became a hot research topic a decade ago because of its increasing importance in analyzing people's opinions extracted from social media platforms. Although Arabic accounts for a significant share of the content on social media platforms, sentiment analysis of that content is still limited because of the language's complex morphological structures and its variety of dialects. Traditional machine learning and deep neural algorithms have been used in a variety of studies to predict sentiment, but current mechanisms need to change to increase the accuracy of sentiment prediction. This paper proposes an optimized heterogeneous stacking ensemble model for enhancing the performance of Arabic sentiment analysis. The proposed model combines three different pre-trained Deep Learning (DL) models, Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU), with three meta-learners, Logistic Regression (LR), Random Forest (RF), and Support Vector Machine (SVM), to enhance the model's performance in predicting Arabic sentiment. The performance of the proposed model is compared with RNN, LSTM, GRU, and five regular Machine Learning (ML) techniques, Decision Tree (DT), LR, K-Nearest Neighbor (KNN), RF, and Naive Bayes (NB), on three benchmark Arabic datasets. The parameters of the ML and DL models are optimized using grid search and KerasTuner, respectively. Accuracy, precision, recall, and F1-score are used to evaluate the models and validate the results. The results show that the proposed ensemble model achieves the best performance on each dataset compared with the other models.
Keywords: Arabic sentiment analysis; deep learning; ensemble learning; machine learning
Year: 2022 PMID: 35632116 PMCID: PMC9147256 DOI: 10.3390/s22103707
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.847
Figure 1. Steps of sentiment analysis prediction for Arabic data.
Figure 2. The logistic regression boundary curve.
Figure 3. An illustration of the decision tree.
Figure 4. The RF, which consists of three different decision trees, each trained on a subset of the training dataset.
Figure 5. The K-nearest neighbor diagram.
Figure 6. The architecture of the proposed stacking ensemble model.
Figure 7. Compressed representation of the RNN [62].
Figure 8. Unfolded network representation of the RNN [62].
Figure 9. Representation of the Long Short-Term Memory.
Figure 10. Illustration of the Gated Recurrent Unit.
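
The stacking idea described in the abstract and in Figure 6 (base models feed their outputs to a meta-learner) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the three base models are hypothetical fixed scoring functions standing in for the pre-trained RNN, LSTM, and GRU, the meta-learner is a toy logistic regression fitted by gradient descent, and the data are synthetic.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_base_model(weights):
    """Hypothetical stand-in for a pre-trained network: maps a feature
    vector to P(positive) using fixed weights."""
    def predict_proba(x):
        return sigmoid(sum(w * xi for w, xi in zip(weights, x)))
    return predict_proba

def meta_features(base_models, x):
    """Stack the base models' probabilities into one meta-feature vector."""
    return [model(x) for model in base_models]

def fit_logistic_meta(Z, y, lr=0.5, epochs=500):
    """Fit a logistic-regression meta-learner on the stacked outputs with
    batch gradient descent (a toy replacement for a library LR)."""
    w = [0.0] * len(Z[0])
    b = 0.0
    n = len(Z)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for z, target in zip(Z, y):
            err = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - target
            for j, zj in enumerate(z):
                grad_w[j] += err * zj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def stacked_predict(base_models, meta, x):
    """Final prediction: meta-learner applied to the base models' outputs."""
    w, b = meta
    z = meta_features(base_models, x)
    return 1 if sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) >= 0.5 else 0

# Synthetic data: the label is 1 when the two features sum to a positive number.
rng = random.Random(0)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [1 if x1 + x2 > 0 else 0 for x1, x2 in X]

# Three deliberately different base models (hypothetical weights).
base_models = [make_base_model(w) for w in ([3.0, 0.5], [0.5, 3.0], [2.0, 2.0])]
Z = [meta_features(base_models, x) for x in X]
meta = fit_logistic_meta(Z, y)
train_acc = sum(stacked_predict(base_models, meta, x) == t for x, t in zip(X, y)) / len(y)
print(f"training accuracy of the stacked model: {train_acc:.2f}")
```

Because the base models here are fixed functions, fitting the meta-learner on the same samples is harmless. When the base models are themselves trained, stacking implementations commonly fit the meta-learner on held-out or out-of-fold predictions to limit leakage; the source does not say which variant it uses.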
The best parameter values of the DL models for each dataset.
| Dataset | Models | Neurons | Dropout | reg_rate1 |
|---|---|---|---|---|
| ASTC dataset | RNN | 300 | 0.7 | 0.0001 |
| | LSTM | 150 | 0.6 | 0.01 |
| | GRU | 300 | 0.2 | 0.4 |
| ArTwitter dataset | RNN | 1000 | 0.4 | 0.7 |
| | LSTM | 950 | 0.2 | 0.2 |
| | GRU | 500 | 0.4 | 0.05 |
| AJGT dataset | RNN | 500 | 0.5 | 0.4 |
| | LSTM | 400 | 0.8 | 0.0006 |
| | GRU | 750 | 0.8 | 0.05 |
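
Values like those in the table above are the output of a hyperparameter search (the abstract names KerasTuner for the DL models). As a rough illustration of what such a search does, the sketch below enumerates a small search space exhaustively and keeps the best-scoring configuration. Everything in it is an assumption for illustration: the candidate values, the key names, and the `validation_score` placeholder are hypothetical, and KerasTuner's own strategies are not necessarily exhaustive.

```python
from itertools import product

# Hypothetical search space; the candidate values are illustrative only.
search_space = {
    "neurons": [150, 300, 500],
    "dropout": [0.2, 0.6, 0.7],
    "reg_rate": [0.0001, 0.01, 0.4],
}

def validation_score(config):
    """Placeholder objective, rigged to peak at neurons=300, dropout=0.7.
    A real search would build and train a model with `config` and return
    its validation accuracy."""
    return -abs(config["neurons"] - 300) / 1000 - abs(config["dropout"] - 0.7)

def exhaustive_search(space, score_fn):
    """Try every combination in `space` and return the best one found."""
    names = list(space)
    best_config, best_score = None, float("-inf")
    for values in product(*(space[name] for name in names)):
        config = dict(zip(names, values))
        score = score_fn(config)
        if score > best_score:  # ties keep the first configuration seen
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = exhaustive_search(search_space, validation_score)
print(best_config)
```

The cost of this approach grows with the product of the candidate-list lengths (27 configurations here), which is why tuners usually offer sampling-based strategies as well.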
The performance results of ML, DL, and the proposed models for the ASTC dataset (CV = cross-validation performance; Test = test performance).
| Approach | Model | Matrix Size | CV ACC | CV PRE | CV REC | CV F1 | Test ACC | Test PRE | Test REC | Test F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| ML models | DT | Unigram | 92.16 | 92.13 | 92.14 | 92.12 | 89.55 | 89.57 | 89.55 | 89.55 |
| | | Bi-gram | 77.54 | 82.83 | 77.56 | 76.61 | 75.97 | 81.11 | 75.97 | 74.96 |
| | | Tri-gram | 73.67 | 77.13 | 63.68 | 58.55 | 62.93 | 63.15 | 62.93 | 62.75 |
| | | Four-gram | 60.97 | 72.27 | 60.95 | 54.29 | 60.39 | 72.36 | 60.39 | 53.44 |
| | KNN | Unigram | 90.72 | 90.72 | 90.72 | 90.72 | 88.58 | 88.6 | 88.58 | 88.57 |
| | | Bi-gram | 83.45 | 83.78 | 83.45 | 83.41 | 69.79 | 74.54 | 69.79 | 68.31 |
| | | Tri-gram | 72.24 | 81.02 | 72.24 | 70.17 | 66.92 | 67.11 | 66.92 | 66.85 |
| | | Four-gram | 71.27 | 74.89 | 71.27 | 70.33 | 64.3 | 64.63 | 64.3 | 64.12 |
| | LR | Unigram | | | | | | | | |
| | | Bi-gram | 86.82 | 87.42 | 86.82 | 86.77 | 80.79 | 82.54 | 80.79 | 80.54 |
| | | Tri-gram | 79.1 | 83.38 | 79.1 | 78.43 | 70.81 | 78.64 | 70.81 | 68.71 |
| | | Four-gram | 75.23 | 82.05 | 75.23 | 73.87 | 67.08 | 78.13 | 67.08 | 63.57 |
| | RF | Unigram | 92.79 | 92.87 | 92.85 | 92.78 | 90.68 | 90.72 | 90.68 | 90.67 |
| | | Bi-gram | 78.28 | 82.86 | 78.53 | 78.26 | 78.18 | 78.19 | 78.18 | 78.18 |
| | | Tri-gram | 66.02 | 76.97 | 66.03 | 62.25 | 66.78 | 69.37 | 66.78 | 65.59 |
| | | Four-gram | 63.35 | 75.94 | 63.41 | 58.26 | 63.83 | 69.36 | 63.83 | 60.96 |
| | NB | Unigram | 87.51 | 87.63 | 87.51 | 87.5 | 86.09 | 86.13 | 86.09 | 86.09 |
| | | Bi-gram | 86.15 | 86.67 | 86.15 | 86.1 | 78.97 | 79.98 | 78.97 | 78.78 |
| | | Tri-gram | 78.95 | 82.7 | 78.95 | 78.34 | 67.91 | 68.95 | 67.91 | 67.43 |
| | | Four-gram | 74.76 | 78.02 | 74.76 | 74.03 | 64.05 | 66.79 | 64.05 | 62.45 |
| DL models | RNN | CBOW | 94.92 | 94.92 | 94.92 | 94.92 | 90.18 | 90.18 | 90.18 | 90.17 |
| | LSTM | CBOW | | | | | | | | |
| | GRU | CBOW | 94.89 | 94.87 | 94.87 | 94.87 | 88.6 | 88.61 | 88.6 | 88.6 |
| The proposed model | Stacking LR | CBOW | | | | | | | | |
| | Stacking SVM | CBOW | 98.07 | 98.07 | 98.07 | 98.07 | 92.1 | 92.11 | 92.1 | 92.1 |
| | Stacking RF | CBOW | 97.27 | 97.28 | 97.27 | 97.27 | 91.98 | 91.99 | 91.98 | 91.98 |
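
The Matrix Size column distinguishes unigram through four-gram features for the ML models. As a minimal sketch of what those feature sets contain, assuming plain contiguous word n-grams over pre-tokenized text (the source does not describe its tokenization or weighting here), the code below extracts n-grams and counts how many are shared by more than one document. The toy documents are placeholders, not samples from the datasets.

```python
def word_ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_document_frequency(documents, n):
    """Map each n-gram to the number of documents that contain it."""
    counts = {}
    for tokens in documents:
        for gram in set(word_ngrams(tokens, n)):
            counts[gram] = counts.get(gram, 0) + 1
    return counts

# Hypothetical tokenized short texts (placeholders, not dataset samples).
docs = [
    ["the", "service", "was", "very", "good"],
    ["the", "service", "was", "very", "slow"],
    ["very", "good", "food"],
]

summary = {}
for n in (1, 2, 3, 4):
    counts = ngram_document_frequency(docs, n)
    shared = sum(1 for c in counts.values() if c > 1)
    summary[n] = (len(counts), shared)
    print(f"n={n}: {len(counts)} distinct n-grams, {shared} shared across documents")
```

On these toy inputs the number of n-grams shared between documents shrinks as n grows, so short texts have less and less in common. That is one plausible reason, not one the source states, for the weaker tri-gram and four-gram rows.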
The performance results of ML, DL, and the proposed models for the ArTwitter dataset (CV = cross-validation performance; Test = test performance).
| Approach | Model | Matrix Size | CV ACC | CV PRE | CV REC | CV F1 | Test ACC | Test PRE | Test REC | Test F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| ML models | DT | Unigram | 76.79 | 77.49 | 76.28 | 77.73 | 72.63 | 73.67 | 72.63 | 72.41 |
| | | Bi-gram | 60.13 | 76.05 | 60.32 | 53.15 | 56.27 | 75.18 | 56.27 | 46.81 |
| | | Tri-gram | 53.08 | 70.64 | 53.33 | 38.82 | 52.17 | 75.34 | 52.17 | 37.12 |
| | | Four-gram | 52.69 | 70.2 | 52.63 | 38.16 | 51.92 | 75.28 | 51.92 | 36.57 |
| | KNN | Unigram | 79.42 | 79.84 | 79.42 | 79.36 | 66.5 | 69.88 | 66.5 | 64.77 |
| | | Bi-gram | 51.22 | 73.0 | 51.22 | 36.92 | 49.62 | 75.13 | 49.62 | 33.47 |
| | | Tri-gram | 48.97 | 31.17 | 48.97 | 32.61 | 51.41 | 75.14 | 51.41 | 35.46 |
| | | Four-gram | 49.36 | 35.3 | 49.36 | 35.96 | 50.64 | 50.18 | 50.64 | 45.81 |
| | LR | Unigram | | | | | | | | |
| | | Bi-gram | 68.27 | 77.41 | 68.27 | 65.6 | 56.78 | 74.04 | 56.78 | 47.97 |
| | | Tri-gram | 53.97 | 74.64 | 53.97 | 40.95 | 52.17 | 75.34 | 52.17 | 37.12 |
| | | Four-gram | 53.4 | 75.68 | 53.4 | 39.61 | 52.17 | 75.34 | 52.17 | 37.12 |
| | RF | Unigram | 78.78 | 80.44 | 78.91 | 78.6 | 73.15 | 74.0 | 73.15 | 72.97 |
| | | Bi-gram | 60.71 | 75.62 | 60.51 | 54.84 | 57.03 | 75.55 | 57.03 | 48.15 |
| | | Tri-gram | 53.33 | 75.66 | 53.46 | 39.58 | 52.17 | 75.34 | 52.17 | 37.12 |
| | | Four-gram | 53.33 | 75.63 | 53.33 | 39.23 | 52.17 | 75.34 | 52.17 | 37.12 |
| | NB | Unigram | 84.23 | 85.1 | 84.23 | 84.1 | 74.17 | 75.68 | 74.17 | 73.7 |
| | | Bi-gram | 60.83 | 75.18 | 60.83 | 53.78 | 55.5 | 66.49 | 55.5 | 45.7 |
| | | Tri-gram | 53.97 | 74.64 | 53.97 | 40.95 | 52.17 | 75.34 | 52.17 | 37.12 |
| | | Four-gram | 53.4 | 75.68 | 53.4 | 39.61 | 51.17 | 74.34 | 51.17 | 31.12 |
| DL models | RNN | CBOW | 87.12 | 87.12 | 87.12 | 87.12 | 81.86 | 81.88 | 81.86 | 81.77 |
| | LSTM | CBOW | 87.83 | 87.83 | 87.83 | 87.83 | 81.33 | 81.6 | 81.33 | 81.27 |
| | GRU | CBOW | | | | | | | | |
| The proposed model | Stacking LR | CBOW | 91.99 | 92.07 | 91.99 | 91.99 | 82.35 | 82.93 | 82.35 | 82.3 |
| | Stacking SVM | CBOW | | | | | | | | |
| | Stacking RF | CBOW | 92.12 | 92.2 | 92.12 | 92.11 | 82.86 | 83.14 | 82.86 | 82.85 |
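
Each results table separates cross-validation performance from test performance. As a reminder of how a cross-validation score is typically formed, the sketch below splits sample indices into k folds so that every sample is used for validation exactly once. The source does not state the number of folds or whether folds were shuffled or stratified, so k = 5 and contiguous, unshuffled folds are assumptions made only for illustration.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

# Assumed setup: 10 samples, k = 5 (neither value comes from the source).
folds = list(k_fold_indices(10, 5))
for fold_number, (train_idx, val_idx) in enumerate(folds, start=1):
    print(f"fold {fold_number}: validate on {val_idx}, train on {len(train_idx)} samples")
```

A model is trained once per fold on `train_idx` and scored on `val_idx`; the reported cross-validation figure is then usually the mean of the per-fold scores, while the test figure comes from data that took no part in any fold.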
The performance results of ML, DL, and the proposed models for the AJGT dataset (CV = cross-validation performance; Test = test performance).
| Approach | Model | Matrix Size | CV ACC | CV PRE | CV REC | CV F1 | Test ACC | Test PRE | Test REC | Test F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| ML models | DT | Unigram | 78.82 | 79.1 | 77.92 | 79.28 | 71.39 | 72.44 | 71.39 | 71.05 |
| | | Bi-gram | 60.76 | 75.31 | 60.97 | 53.26 | 56.39 | 69.18 | 56.39 | 47.66 |
| | | Tri-gram | 51.53 | 70.41 | 51.39 | 36.45 | 51.11 | 75.28 | 51.11 | 35.76 |
| | | Four-gram | 50.0 | 35.05 | 50.07 | 33.64 | 50.0 | 25.0 | 50.0 | 33.33 |
| | KNN | Unigram | 78.26 | 79.3 | 78.26 | 78.06 | 68.89 | 70.09 | 68.89 | 68.42 |
| | | Bi-gram | 51.18 | 60.19 | 51.18 | 39.84 | 50.28 | 51.16 | 50.28 | 38.6 |
| | | Tri-gram | 50.0 | 25.0 | 50.0 | 33.33 | 50.28 | 75.07 | 50.28 | 33.95 |
| | | Four-gram | 50.07 | 27.67 | 50.07 | 34.23 | 50.0 | 25.0 | 50.0 | 33.33 |
| | LR | Unigram | | | | | | | | |
| | | Bi-gram | 67.99 | 76.69 | 67.99 | 65.11 | 56.94 | 67.97 | 56.94 | 49.14 |
| | | Tri-gram | 53.54 | 75.21 | 53.54 | 40.79 | 51.11 | 75.28 | 51.11 | 35.76 |
| | | Four-gram | 50.56 | 53.47 | 50.56 | 34.66 | 50.0 | 25.0 | 50.0 | 33.33 |
| | RF | Unigram | 78.26 | 80.81 | 78.96 | 78.07 | 75.83 | 77.56 | 75.83 | 75.45 |
| | | Bi-gram | 60.76 | 76.34 | 60.56 | 53.59 | 56.94 | 68.83 | 56.94 | 48.88 |
| | | Tri-gram | 52.43 | 65.32 | 52.29 | 38.32 | 51.11 | 75.28 | 51.11 | 35.76 |
| | | Four-gram | 50.42 | 50.09 | 50.49 | 33.94 | 50.0 | 50.0 | 50.0 | 47.92 |
| | NB | Unigram | 83.47 | 83.85 | 83.47 | 83.43 | 76.94 | 77.01 | 76.94 | 76.93 |
| | | Bi-gram | 60.9 | 72.04 | 60.9 | 54.67 | 56.94 | 67.97 | 56.94 | 49.14 |
| | | Tri-gram | 53.54 | 75.21 | 53.54 | 40.79 | 51.11 | 75.28 | 51.11 | 35.76 |
| | | Four-gram | 50.56 | 53.47 | 50.56 | 34.66 | 50.0 | 25.0 | 50.0 | 33.33 |
| DL models | RNN | CBOW | 86.86 | 86.86 | 86.86 | 86.86 | 82.78 | 83.04 | 82.78 | 82.74 |
| | LSTM | CBOW | | | | | | | | |
| | GRU | CBOW | 89.01 | 89.01 | 89.01 | 89.01 | 84.72 | 84.9 | 84.72 | 84.7 |
| The proposed model | Stacking LR | CBOW | | | | | | | | |
| | Stacking SVM | CBOW | 93.4 | 93.48 | 93.4 | 93.4 | 86.01 | 86.01 | 86.01 | 86.01 |
| | Stacking RF | CBOW | 92.9 | 93.05 | 92.99 | 92.98 | 85.83 | 85.89 | 85.83 | 85.83 |
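
Several four-gram and tri-gram rows above read ACC 50.0, PRE 25.0, REC 50.0, F1 33.33. Those are exactly the values that support-weighted averaging produces when a classifier predicts a single class on a balanced two-class set, and weighted recall always equals accuracy, which matches the REC and ACC test columns. The source does not state its averaging mode or class balance, so both are assumptions in the sketch below, which computes the four metrics from scratch.

```python
def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1, in percent."""
    labels = sorted(set(y_true))
    n = len(y_true)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / n
    pre = rec = f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        p_c = tp / (tp + fp) if tp + fp else 0.0  # undefined precision counted as 0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0
        weight = (tp + fn) / n  # share of samples belonging to this class
        pre += weight * p_c
        rec += weight * r_c
        f1 += weight * f_c
    return tuple(round(100 * value, 2) for value in (acc, pre, rec, f1))

# Assumed scenario: balanced binary labels, classifier always predicts class 1.
y_true = [0] * 50 + [1] * 50
y_pred = [1] * 100
print(weighted_metrics(y_true, y_pred))  # (50.0, 25.0, 50.0, 33.33)
```

Read this way, such rows indicate a model that has collapsed to one class rather than one that is right half the time on each class.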
Figure 11. Performance comparison of the best models for the ASTC dataset: (a) cross-validation performance and (b) testing performance.
Figure 12. Performance comparison of the best models for the ArTwitter dataset: (a) cross-validation performance and (b) testing performance.
Figure 13. Performance comparison of the best models for the AJGT dataset: (a) cross-validation performance and (b) testing performance.
The comparison of results to previous studies.
| Paper | Algorithm | Dataset | Performance |
|---|---|---|---|
| Alayba et al. [ | CNN+LSTM | ASTD | 77% of ACC |
| | | ArTwitter | 88% of ACC |
| Hanane Elfaik [ | Bi-LSTM | ASTD | 76.83% of ACC |
| | | ArTwitter | 92.39% of ACC |
| Al-Saqqa et al. [ | voting algorithm based on | ArTwitter | 86% of ACC |
| Al-Azani et al. [ | CNN and LSTM | ArTwitter | 86.45% of ACC |
| Alomari et al. [ | SVM | AJGT | 88.72% of ACC |
| Al-Azani et al. [ | Voting, Bagging, | Arabic Tweets | 85% of F1 |
| The proposed stacking model | The pre-trained RNN, | ASTC dataset | For cross-validation, |
| | | | For testing, |
| | The pre-trained RNN, | ArTwitter | For cross-validation, |
| | | | For testing, |
| | The pre-trained RNN, | AJGT | For cross-validation, |
| | | | For testing, |