Vimala Balakrishnan, Zhongliang Shi, Chuan Liang Law, Regine Lim, Lee Leng Teh, Yue Fan.
Abstract
We present a benchmark comparison of several deep learning models, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN) and Bi-directional Long Short-Term Memory (Bi-LSTM), evaluated with various word embedding approaches, including Bi-directional Encoder Representations from Transformers (BERT) and its variants, FastText and Word2Vec. Data augmentation was performed using the Easy Data Augmentation (EDA) approach, resulting in two datasets (original versus augmented). All models were assessed in two setups, namely 5-class versus 3-class (i.e., a compressed version). Findings show the best prediction models were Neural Network-based using Word2Vec, with CNN-RNN-Bi-LSTM producing the highest accuracy (96%) and F-score (91.1%). Individually, RNN was the best model with an accuracy of 87.5% and F-score of 83.5%, while RoBERTa had the best F-score of 73.1%. The study shows that deep learning is better suited than supervised machine learning for analyzing sentiment in text, and it provides directions for future work and research.
Keywords: Customer reviews; Deep learning; Ensemble models; Sentiment rating; Word embeddings
Year: 2021 PMID: 34754140 PMCID: PMC8569508 DOI: 10.1007/s11227-021-04169-6
Source DB: PubMed Journal: J Supercomput ISSN: 0920-8542 Impact factor: 2.557
Summary of studies using deep learning approaches in review sentiment analysis
| References | Datasets | Technique/Algorithms | Result |
|---|---|---|---|
| [ | Traveloka—Indonesian | RNN – Word2Vec | Accuracy: 91.9% |
| [ | Hotel reviews—Chinese | Bi-LSTM; LSTM; RNN; CNN – Word2Vec | |
| [ | Drug reviews | CNN; LSTM; BERT-LSTM | |
| [ | Movie Reviews – Rotten Tomatoes—English | RNN; RNTN; CNN; Bi-LSTM; BERT | Accuracy: BERT Base 94.0%; BERT Large 94.7% |
| [ | Amazon and IMDB—English | LSTM-CNN-GS | Accuracy – 97.8% |
| [ | Eight different Amazon product categories: instant video, books, electronics, home and kitchen, movie review, media, kindle, and camera—English | Selective Memory-based CNN | Average accuracy – 92.85% |
| [ | JD.com—Chinese | BERT-CNN | |
| [ | JD.com—Chinese | BERT-CNN | Accuracy—95.7% |
| [ | Twitter – Italian | BERT; ALBERTo | Average |
BERT, Bi-directional Encoder Representations from Transformers; RNN, Recurrent Neural Network; CNN, Convolutional Neural Network; LSTM, Long Short-Term Memory
Fig. 1 Overall proposed methodology
Fig. 2 Word cloud overview for reviews with scores 1 and 5
Types of data augmentation used in the present study [36]
| Operations | Description | Example |
|---|---|---|
| Original text | – | The quick brown fox jumps over the lazy dog |
| Random swap | Two words are randomly selected and swapped | The |
| Random deletion | Randomly remove a word from the sentence | The quick brown jumps over the lazy dog |
| Random insertion | Randomly introduce and insert a new word | The quick |
| Synonym replacement | Randomly select a word and replace it with one of its synonyms | The quick sluggish |
Bold words refer to the changes made as per the operation listed
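The four EDA operations in the table can be sketched as simple list manipulations. A minimal illustration, assuming whitespace-tokenized input and a hand-made synonym table (not the study's actual implementation):

```python
import random

def random_swap(words, rng):
    # Randomly select two positions and swap their words
    i, j = rng.sample(range(len(words)), 2)
    out = words.copy()
    out[i], out[j] = out[j], out[i]
    return out

def random_deletion(words, rng):
    # Randomly remove one word from the sentence
    idx = rng.randrange(len(words))
    return words[:idx] + words[idx + 1:]

def random_insertion(words, new_word, rng):
    # Insert a new word at a random position
    idx = rng.randrange(len(words) + 1)
    return words[:idx] + [new_word] + words[idx:]

def synonym_replacement(words, synonyms):
    # Replace each word that has an entry in the synonym table
    return [synonyms.get(w, w) for w in words]

rng = random.Random(0)  # fixed seed so the example is repeatable
sentence = "The quick brown fox jumps over the lazy dog".split()
print(" ".join(random_swap(sentence, rng)))
print(" ".join(synonym_replacement(sentence, {"lazy": "sluggish"})))
```

In practice each operation is applied with a small probability per word, so the augmented sentence stays close to the original while still varying the training data.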
Pre-processing steps
| Pre-processing step | Before | After |
|---|---|---|
| Convert the text to lowercase | I love these dresses SO MUCH!!!! | I love these dresses so much!!!! |
| Remove leading and trailing spaces | I love these dresses so much!!!! | I love these dresses so much!!!! |
| Remove punctuation, numbers, special characters | I love these dresses so much!!!! | I love these dresses so much |
| Remove stop words | I love these dresses so much | I love dresses so much |
| Lemmatization | I love dresses so much | I love dress so much |
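The pre-processing pipeline can be sketched in a few lines. A toy illustration, in which the stop-word set and lemma table are reduced to just what the running example needs (the study's actual resources are not specified in this excerpt):

```python
import re

# Toy resources sized to the running example; a real pipeline would use
# full stop-word lists and a dictionary-based lemmatizer (e.g. WordNet's).
STOP_WORDS = {"these"}
LEMMAS = {"dresses": "dress"}

def preprocess(text):
    text = text.lower()                      # convert the text to lowercase
    text = text.strip()                      # remove leading and trailing spaces
    text = re.sub(r"[^a-z\s]", "", text)     # remove punctuation, numbers, specials
    words = [w for w in text.split() if w not in STOP_WORDS]  # remove stop words
    words = [LEMMAS.get(w, w) for w in words]                 # lemmatize
    return " ".join(words)

print(preprocess("  I love these dresses SO MUCH!!!!  "))  # i love dress so much
```

Note the order matters: lowercasing first keeps the stop-word and punctuation rules simple, and lemmatization runs last on the cleaned tokens.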
Fig. 3 General Neural Network Architecture [45]
Performance of NN models in percentage (%) for the original dataset: Word2Vec versus FastText
| Feature extraction | Model | Precision | Recall | F-score | AUC | Accuracy |
|---|---|---|---|---|---|---|
| 5-class | | | | | | |
| Word2Vec | CNN | 37.96 | 32.96 | 31.56 | 59.07 | 61.65 |
| | RNN | 43.80 | 41.47 | 41.92 | 64.76 | 62.22 |
| | Bi-LSTM | 45.95 | 42.51 | | | |
| FastText | CNN | 37.45 | 32.94 | 31.11 | 58.78 | 61.14 |
| | RNN | 43.36 | 40.34 | 41.05 | 64.16 | 62.08 |
| | Bi-LSTM | 44.93 | 40.90 | 41.33 | 64.48 | 62.83 |
| 3-class | | | | | | |
| Word2Vec | CNN | 58.56 | 50.99 | 52.30 | 65.16 | 80.79 |
| | RNN | 60.89 | 57.49 | | 70.65 | 81.42 |
| | Bi-LSTM | 60.51 | 57.99 | 58.31 | | |
| FastText | CNN | 57.14 | 49.84 | 50.36 | 64.27 | 80.52 |
| | RNN | 60.75 | 57.10 | 58.27 | 70.29 | 81.47 |
| | Bi-LSTM | 60.18 | 57.68 | 58.02 | 70.92 | 81.22 |
Best scores in bold
Performance of NN models in percentage (%) for the augmented dataset: Word2Vec versus FastText
| Feature extraction | Model | Precision | Recall | F-score | AUC | Accuracy |
|---|---|---|---|---|---|---|
| 5-class | | | | | | |
| Word2Vec | CNN | 72.52 | 68.05 | 69.77 | 80.89 | 80.00 |
| | RNN | 83.55 | 83.60 | | | |
| | Bi-LSTM | 75.32 | 76.24 | 75.62 | 85.70 | 83.20 |
| FastText | CNN | 66.52 | 59.89 | 62.03 | 76.00 | 75.25 |
| | RNN | 74.38 | 73.99 | 74.11 | 84.14 | 81.25 |
| | Bi-LSTM | 66.76 | 67.38 | 66.76 | 80.52 | 77.95 |
| 3-class | | | | | | |
| Word2Vec | CNN | 83.63 | 78.51 | 80.72 | 85.15 | 90.98 |
| | RNN | 88.85 | 90.76 | | | |
| | Bi-LSTM | 84.06 | 87.42 | 85.57 | 91.63 | 92.87 |
| FastText | CNN | 76.05 | 68.34 | 71.41 | 78.18 | 87.60 |
| | RNN | 83.72 | 85.27 | 84.45 | 90.09 | 92.36 |
| | Bi-LSTM | 81.89 | 84.11 | 82.94 | 89.51 | 91.74 |
Best scores in bold
Performance of BERT models in percentage (%) for the original dataset: 5- versus 3-class setups
| Class | Model | Precision | Recall | F-score | AUC | Accuracy |
|---|---|---|---|---|---|---|
| 5-class | BERT | 57.03 | 51.18 | 52.28 | 77.30 | 69.34 |
| | ALBERT | 52.45 | 51.36 | 51.44 | 75.96 | 67.88 |
| | RoBERTa | 55.37 | 54.79 | 54.69 | 77.74 | 69.99 |
| 3-class | BERT | 68.93 | 68.39 | 68.54 | 84.92 | 85.48 |
| | ALBERT | 66.73 | 67.84 | 67.19 | 84.95 | 84.55 |
| | RoBERTa | 70.08 | 71.49 | | | |
Best scores in bold
Performance of BERT models in percentage (%) for the augmented dataset: 5- versus 3-class setups
| Class | Model | Precision | Recall | F-score | AUC | Accuracy |
|---|---|---|---|---|---|---|
| 5-class | BERT | 57.48 | 53.32 | 53.03 | 80.45 | 73.55 |
| | ALBERT | 54.76 | 51.54 | 52.49 | 76.34 | 69.03 |
| | RoBERTa | 59.26 | 57.02 | 57.79 | 79.13 | 72.44 |
| 3-class | BERT | 71.34 | 70.89 | 71.01 | 86.31 | 86.65 |
| | ALBERT | 69.27 | 68.51 | 68.83 | 84.53 | 85.57 |
| | RoBERTa | 73.25 | 73.04 | | | |
Best scores in bold
Performance of the ensemble models in percentage (%) for the augmented dataset
| Model | Precision | Recall | F-score | AUC | Accuracy |
|---|---|---|---|---|---|
| CNN-RNN | 90.9 | 87.5 | 89.2 | 98.8 | 94.8 |
| CNN-Bi-LSTM | 88.8 | 86.2 | 87.3 | 98.5 | 93.8 |
| RNN-Bi-LSTM | 90.8 | 91.3 | 91.1 | 99.1 | 95.6 |
| CNN-RNN-Bi-LSTM | | | 91.1 | | 96.0 |
Best scores in bold
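As a quick sanity check on the table, the F-score column is close to the harmonic mean of the reported macro precision and recall (exact equality is not guaranteed, since a macro-averaged F1 is usually the mean of per-class F1 scores rather than the harmonic mean of the averaged precision and recall). For the CNN-RNN row the two agree to one decimal:

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall, in percent
    return 2 * precision * recall / (precision + recall)

# CNN-RNN row from the ensemble table: precision 90.9, recall 87.5
print(round(f1(90.9, 87.5), 1))  # 89.2, matching the reported F-score
```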
Performance of the machine learning models in percentage (%)
| Model | Precision | Recall | F-score | AUC | Accuracy |
|---|---|---|---|---|---|
| Logistic regression | 43.93 | 35.97 | 37.68 | 64.14 | 62.26 |
| Naïve Bayes | 43.76 | 38.40 | 39.90 | 66.15 | 62.12 |
| Decision tree | 43.88 | 30.27 | 30.84 | 66.30 | 60.20 |
| Random Forest | 46.15 | 26.27 | 24.80 | 55.02 | 59.17 |
| Support vector machine | 37.71 | 36.21 | 36.82 | 64.23 | 56.11 |
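The classical baselines in the table can be reproduced in outline with scikit-learn. A minimal sketch on toy data, assuming TF-IDF features (the study's actual feature pipeline and hyper-parameters are not given in this excerpt):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative review set; 1 = positive, 0 = negative
reviews = ["love this dress", "terrible fit", "great quality", "awful fabric"]
labels = [1, 0, 1, 0]

for name, clf in [("Logistic regression", LogisticRegression()),
                  ("Naive Bayes", MultinomialNB())]:
    pipe = make_pipeline(TfidfVectorizer(), clf)  # TF-IDF features -> classifier
    pipe.fit(reviews, labels)
    print(name, pipe.predict(["love the quality"])[0])
```

Swapping in `DecisionTreeClassifier`, `RandomForestClassifier`, or `LinearSVC` gives the remaining rows of the table under the same pipeline shape.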