Muhammad Zaid Naeem, Furqan Rustam, Arif Mehmood, Imran Ashraf, Gyu Sang Choi.
Abstract
The Internet Movie Database (IMDb), a popular online database for movies and personalities, provides a wide range of movie reviews from millions of users. This yields a large and diverse dataset for analyzing users' sentiments about various personalities and movies. Although these reviews offer useful critiques of movies, they cannot be read as a whole, and automated tools are required to provide insights into the sentiments they express. This study implements various machine learning models to measure the polarity of the sentiments presented in user reviews on the IMDb website. For this purpose, the reviews are first preprocessed to remove redundant information and noise, and then classification models such as support vector machines (SVM), the Naïve Bayes classifier, random forest, and gradient boosting classifiers are used to predict the sentiment of these reviews. The objective is to find the process and approach that attain the highest accuracy with the best generalization. Feature engineering approaches such as term frequency-inverse document frequency (TF-IDF), bag of words (BoW), global vectors for word representation (GloVe), and Word2Vec are applied, along with hyperparameter tuning of the classification models, to enhance the classification accuracy. Experimental results indicate that SVM obtains the highest accuracy, 89.55%, when used with TF-IDF features. The sentiment classification accuracy of the models is reduced by contradictions between the sentiments expressed in the reviews and their assigned labels. To tackle this issue, TextBlob is used to assign a sentiment to each review in the dataset before it is used for training. Experimental results on the TextBlob-assigned sentiments indicate that an accuracy of 92% can be obtained using the proposed model.
Keywords: Bag of words; Movies reviews; Sentiment classification; Supervised machine learning; Text analysis
Year: 2022 PMID: 35494818 PMCID: PMC9044332 DOI: 10.7717/peerj-cs.914
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Comprehensive summary of research works discussed in the related work.
| Reference | Approach | Model | Aim |
|---|---|---|---|
| | Lexicon-Based | SentiWordNet | Movie review classification |
| | Machine Learning | RLPI, Hybrid Features, KNN | IMDb reviews classification |
| | Deep Learning | CNN LSTM | IMDb reviews classification |
| | Machine Learning | BoW-DOUBLE and Average emotion-DOUBLE | IMDb reviews classification |
| | Deep Learning | CNN | IMDb reviews classification |
| | Deep Learning | Multilayer perceptron, CNN and LSTM | IMDb reviews classification |
| | Deep Learning | Bi-LSTM | IMDb review and Stanford sentiment treebank v2 (SST2) |
| | Deep Learning | LSTM | IMDb reviews classification |
| | Deep & Machine Learning | NN | IMDb reviews classification |
| | Deep Learning | CNN | IMDb reviews classification |
| | Machine Learning | SVM + (SVM-RFE) | IMDb reviews classification |
| | Machine Learning | NB + ARM | IMDb reviews classification |
Description of IMDb dataset variables.
| Review | Label |
|---|---|
| Gwyneth Paltrow is absolutely great in this mo… | 0 |
| I own this movie. Not by choice, I do. I was r… | 0 |
| Well I guess it supposedly not a classic becau… | 1 |
| I am, as many are, a fan of Tony Scott films… | 0 |
| I wish “that ‘70s show” would come back on tel… | 1 |
Contradiction in TextBlob and original dataset labels.
| Review | TextBlob | Original |
|---|---|---|
| Movie makers always author work mean yes things condensed sake viewer interest look Anne Green gables wonderful job combining important events cohesive whole simply delightful believe chose combine three novels together Anne Avonlea dreadful mess look missed Paul Irving little Elizabeth widows windy poplars Anne college years heaven sake delightful meet Priscilla rest redmond gang Kevin Sullivan taken things one movie time instead jumbling together combining characters events way movie good leave novels montgomery beautiful work something denied movie let seeing successful way brough Anne green gables life | Positive | Negative |
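The relabeling step can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names, the zero threshold, and the treatment of a neutral score as positive are assumptions. In practice the polarity score would come from TextBlob's `TextBlob(text).sentiment.polarity`, which lies in [-1, 1]; here it is passed in as a plain number so the sketch runs without the library.

```python
def polarity_to_label(polarity: float, threshold: float = 0.0) -> str:
    """Map a polarity score in [-1, 1] to a sentiment label.

    Assumption: scores at or above the threshold count as positive.
    """
    return "Positive" if polarity >= threshold else "Negative"


def find_contradictions(rows):
    """Return the indices of rows whose polarity-based label differs from the original label.

    Each row is a (polarity, original_label) pair; both names are hypothetical.
    """
    return [
        index
        for index, (polarity, original) in enumerate(rows)
        if polarity_to_label(polarity) != original
    ]


# Hypothetical scores: the second row mimics the contradiction shown in the table,
# where the polarity-based label is positive but the original label is negative.
sample = [(0.45, "Positive"), (0.30, "Negative"), (-0.20, "Negative")]
contradicting_rows = find_contradictions(sample)
```

Rows returned by `find_contradictions` are the ones whose training label would change under the TextBlob-based annotation.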
Hyperparameters used for optimizing the performance of models.
| Model | Hyperparameters | Values range used for tuning |
|---|---|---|
| RF | n_estimators = 300, random_state = 50, max_depth = 300 | n_estimators = {50 to 500}, random_state = {2 to 60}, max_depth = {50 to 500} |
| SVM | kernel = ‘linear’, C = 3.0, random_state = 50 | kernel = {‘linear’, ‘poly’, ‘sigmoid’}, C = {1.0 to 5.0}, random_state = {2 to 60} |
| DT | random_state = 50, max_depth = 300 | random_state = {2 to 60}, max_depth = {50 to 500} |
| GBC | n_estimators = 300, random_state = 50, max_depth = 300, learning_rate = 0.2 | n_estimators = {50 to 500}, random_state = {2 to 60}, max_depth = {50 to 500}, learning_rate = {0.1 to 0.8} |
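The selected values and tuning ranges above can be recorded as plain configuration and checked for consistency. The sketch below is illustrative only: the dictionary layout and the helper names are assumptions, and the parameter names are simply copied from the table (they happen to match scikit-learn constructor arguments, but no library is required here).

```python
# Selected hyperparameters, copied from the table.
SELECTED = {
    "RF": {"n_estimators": 300, "random_state": 50, "max_depth": 300},
    "SVM": {"kernel": "linear", "C": 3.0, "random_state": 50},
    "DT": {"random_state": 50, "max_depth": 300},
    "GBC": {"n_estimators": 300, "random_state": 50, "max_depth": 300, "learning_rate": 0.2},
}

# Tuning ranges, copied from the table: (low, high) tuples or sets of allowed options.
RANGES = {
    "RF": {"n_estimators": (50, 500), "random_state": (2, 60), "max_depth": (50, 500)},
    "SVM": {"kernel": {"linear", "poly", "sigmoid"}, "C": (1.0, 5.0), "random_state": (2, 60)},
    "DT": {"random_state": (2, 60), "max_depth": (50, 500)},
    "GBC": {
        "n_estimators": (50, 500),
        "random_state": (2, 60),
        "max_depth": (50, 500),
        "learning_rate": (0.1, 0.8),
    },
}


def within_range(value, allowed) -> bool:
    """Check a value against either a set of options or an inclusive (low, high) range."""
    if isinstance(allowed, set):
        return value in allowed
    low, high = allowed
    return low <= value <= high


def out_of_range(selected, ranges):
    """List (model, parameter) pairs whose selected value falls outside its tuning range."""
    return [
        (model, name)
        for model, params in selected.items()
        for name, value in params.items()
        if not within_range(value, ranges[model][name])
    ]
```

An empty result from `out_of_range(SELECTED, RANGES)` confirms that every reported value lies inside the range it was tuned over.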
Figure 1. The workflow of the proposed methodology for movie review classification.
Figure 2. Preprocessing steps for the movie review dataset.
Text from sample review before and after punctuation removal.
| Before punctuation removal | After punctuation removal |
|---|---|
| @Gwyneth Paltrow is absolutely… !!!great in this movie | Gwyneth Paltrow is absolutely great in this movie |
| I own this movie. This is number 1 movie… I didn’t like by choice, I do | I own this movie This is number 1 movie I didnt like by choice I do |
| I wish “that ‘70s show” would come back on tel | I wish that 70s show would come back on tel |
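The punctuation removal shown above can be approximated with a regular expression. This is a sketch under the assumption that every character that is neither a word character nor whitespace is dropped and that repeated spaces are collapsed; the function name is hypothetical and the authors' exact rule is not given.

```python
import re


def remove_punctuation(text: str) -> str:
    """Remove punctuation and symbols, then normalize whitespace."""
    # Drop every character that is neither a word character nor whitespace.
    # Note: "_" counts as a word character in Python regexes, so it is kept.
    stripped = re.sub(r"[^\w\s]", "", text)
    # Collapse runs of whitespace left behind by the removed characters.
    return " ".join(stripped.split())


cleaned = remove_punctuation("@Gwyneth Paltrow is absolutely… !!!great in this movie")
```

Because the pattern is Unicode-aware, it also removes the curly quotes and ellipsis characters that appear in the sample reviews.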
Sample text from movie reviews after removing numeric values.
| Input data | After numeric removal |
|---|---|
| Gwyneth Paltrow is absolutely great in this movie. | Gwyneth Paltrow is absolutely great in this movie |
| I own this movie This is number 1 movie I didnt like by choice I do. | I own this movie This is number movie I didnt like by choice I do |
| I wish that 70s show would come back on tel. | I wish that s show would come back on tel |
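Numeric removal can be sketched the same way. The function name is hypothetical, and the sketch assumes that only digit characters are deleted, which is why "70s" becomes "s" in the sample output.

```python
import re


def remove_numbers(text: str) -> str:
    """Delete digit runs and normalize the whitespace they leave behind."""
    without_digits = re.sub(r"\d+", "", text)
    return " ".join(without_digits.split())


no_digits = remove_numbers("I wish that 70s show would come back on tel")
```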
Sample review text after case lowering.
| Input data | After case lowering |
|---|---|
| Gwyneth Paltrow is absolutely great in this movie. | gwyneth paltrow is absolutely great in this movie |
| I own this movie This is number movie I didnt like by choice I do. | i own this movie this is number movie i didnt like by choice i do |
| I wish that s show would come back on tel. | i wish that s show would come back on tel |
Text from sample review before and after stemming.
| Input data | After stemming |
|---|---|
| gwyneth Paltrow is absolutely great in this movie. | gwyneth paltrow is absolute great in this movie |
| i own this movie this is number movie i didnt like by choice I do. | i own this movie this is number movie i didnt like by choice i do |
| i wish that s show would come back on tel. | i wish that s show would come back on tel |
Sample reviews before and after the stop words removal.
| Input data | After stopwords removal |
|---|---|
| gwyneth Paltrow is absolutely great in this movie. | gwyneth paltrow absolute great movie |
| i own this movie this is number movie i didnt like by choice I do. | own movie number movie didnt like choice do |
| i wish that s show would come back on tel. | wish show would come back tel |
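Stop word removal amounts to filtering tokens against a fixed list. The list below is hypothetical: it contains only the words needed to reproduce the three sample rows and is not the list the authors used (the samples keep "own" and "do", which some standard lists would remove).

```python
# Hypothetical stop word list, chosen only to reproduce the sample rows above.
STOP_WORDS = frozenset({"i", "is", "in", "this", "that", "s", "on", "by"})


def remove_stop_words(text: str, stop_words=STOP_WORDS) -> str:
    """Keep only the whitespace-separated tokens that are not stop words."""
    return " ".join(token for token in text.split() if token not in stop_words)


filtered = remove_stop_words("i wish that s show would come back on tel")
```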
BoW features from the preprocessed text of sample reviews.
| No. | absolute | back | choice | come | didnt | do | great | gwyneth | like |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
| 2 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 |
| 3 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |

| No. | movie | number | own | paltrow | show | tel | wish | would |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
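The BoW counts above can be reproduced with the standard library alone. This is a minimal sketch, assuming whitespace tokenization and an alphabetically sorted vocabulary; the function name is hypothetical.

```python
from collections import Counter

# The three preprocessed sample reviews (after stop word removal).
DOCS = [
    "gwyneth paltrow absolute great movie",
    "own movie number movie didnt like choice do",
    "wish show would come back tel",
]


def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count row per document."""
    vocabulary = sorted({token for doc in docs for token in doc.split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


vocabulary, bow_rows = bag_of_words(DOCS)
```

Each row has one entry per vocabulary term, so the second review gets a count of 2 for "movie" and 0 for every term it does not contain.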
TF-IDF features from the preprocessed text of sample reviews.
| No. | absolute | back | choice | come | didnt | do | great | gwyneth | like |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.467351 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.467351 | 0.467351 | 0.000000 |
| 2 | 0.000000 | 0.000000 | 0.346821 | 0.000000 | 0.346821 | 0.346821 | 0.000000 | 0.000000 | 0.346821 |
| 3 | 0.000000 | 0.408248 | 0.000000 | 0.408248 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |

| No. | movie | number | own | paltrow | show | tel | wish | would |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.355432 | 0.000000 | 0.000000 | 0.467351 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 2 | 0.527533 | 0.346821 | 0.346821 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 3 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.408248 | 0.408248 | 0.408248 | 0.408248 |
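The TF-IDF values above are consistent with a smoothed inverse document frequency, idf(t) = ln((1 + n) / (1 + df(t))) + 1, followed by L2 normalization of each row. That weighting is an assumption inferred from the numbers (it is the default behavior of scikit-learn's `TfidfVectorizer`); the source does not state which variant was used. A standard-library sketch with hypothetical function names:

```python
import math
from collections import Counter

DOCS = [
    "gwyneth paltrow absolute great movie",
    "own movie number movie didnt like choice do",
    "wish show would come back tel",
]


def tfidf(docs):
    """Return the sorted vocabulary and L2-normalized TF-IDF rows (smoothed IDF)."""
    tokenized = [doc.split() for doc in docs]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1 for t in vocabulary}
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        raw = [counts[t] * idf[t] for t in vocabulary]
        norm = math.sqrt(sum(value * value for value in raw))
        rows.append([value / norm for value in raw])
    return vocabulary, rows


terms, tfidf_rows = tfidf(DOCS)
movie_weight_doc2 = tfidf_rows[1][terms.index("movie")]
```

Terms that occur in a single review receive a higher IDF than "movie", which occurs in two, so "movie" is down-weighted in the first review even though its raw count there is the same as the other terms.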
Figure 3. Confusion matrix.
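The evaluation metrics reported in the tables that follow derive from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and do not come from the study.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=8, fp=2, fn=2, tn=8)
```

Metrics for the negative class follow from the same function by swapping the roles of the two classes, that is, calling it with `tp` and `tn` exchanged and `fp` and `fn` exchanged.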
Accuracy of the selected models with BoW features.
| Classifier | Accuracy |
|---|---|
| DT | 0.72 |
| RF | 0.86 |
| GBC | 0.85 |
| SVM | 0.87 |
Performance evaluation metrics using BoW features.
| Model | Precision (Pos.) | Precision (Neg.) | Precision (W avg.) | Recall (Pos.) | Recall (Neg.) | Recall (W avg.) | F1 score (Pos.) | F1 score (Neg.) | F1 score (W avg.) |
|---|---|---|---|---|---|---|---|---|---|
| DT | 0.71 | 0.72 | 0.72 | 0.72 | 0.71 | 0.72 | 0.72 | 0.72 | 0.72 |
| RF | 0.85 | 0.87 | 0.86 | 0.88 | 0.84 | 0.86 | 0.86 | 0.86 | 0.86 |
| GBC | 0.83 | 0.87 | 0.85 | 0.88 | 0.82 | 0.85 | 0.86 | 0.85 | 0.85 |
| SVM | 0.86 | 0.88 | 0.87 | 0.88 | 0.86 | 0.87 | 0.87 | 0.87 | 0.87 |
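The "W avg." columns can be read as support-weighted averages of the per-class scores. This reading is an assumption (it matches the "weighted avg" row of scikit-learn's classification report), and the class supports below are hypothetical because the source does not list them.

```python
def weighted_average(scores, supports):
    """Average per-class scores, weighting each class by its number of samples."""
    total = sum(supports)
    return sum(score * count for score, count in zip(scores, supports)) / total


# SVM precision with BoW features: Pos. = 0.86, Neg. = 0.88.
# Equal supports are assumed here; with balanced classes the result is the plain mean.
svm_precision_wavg = weighted_average([0.86, 0.88], [1, 1])
```

With balanced classes this gives 0.87, which agrees with the weighted average reported for SVM precision in the table.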
Accuracy of models with TF-IDF features.
| Classifier | Accuracy |
|---|---|
| DT | 0.71 |
| RF | 0.86 |
| GBC | 0.86 |
| SVM | 0.89 |
Performance evaluation metrics using TF-IDF features.
| Model | Precision (Pos.) | Precision (Neg.) | Precision (W avg.) | Recall (Pos.) | Recall (Neg.) | Recall (W avg.) | F1 score (Pos.) | F1 score (Neg.) | F1 score (W avg.) |
|---|---|---|---|---|---|---|---|---|---|
| DT | 0.72 | 0.71 | 0.71 | 0.70 | 0.72 | 0.71 | 0.71 | 0.71 | 0.71 |
| RF | 0.86 | 0.86 | 0.86 | 0.86 | 0.85 | 0.86 | 0.86 | 0.86 | 0.86 |
| GBC | 0.84 | 0.87 | 0.86 | 0.88 | 0.83 | 0.86 | 0.86 | 0.85 | 0.86 |
| SVM | 0.88 | 0.90 | 0.89 | 0.90 | 0.88 | 0.89 | 0.89 | 0.89 | 0.89 |
Performance of classifiers using GloVe features.
| Model | Accuracy | Precision (Pos.) | Precision (Neg.) | Precision (W avg.) | Recall (Pos.) | Recall (Neg.) | Recall (W avg.) | F1 Score (Pos.) | F1 Score (Neg.) | F1 Score (W avg.) |
|---|---|---|---|---|---|---|---|---|---|---|
| DT | 0.65 | 0.64 | 0.65 | 0.65 | 0.64 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 |
| RF | 0.74 | 0.75 | 0.74 | 0.74 | 0.72 | 0.77 | 0.74 | 0.73 | 0.75 | 0.74 |
| GBC | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.66 | 0.65 | 0.65 | 0.65 | 0.65 |
| SVM | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 | 0.75 |
Performance evaluation of classifiers using Word2Vec features.
| Model | Accuracy | Precision (Pos.) | Precision (Neg.) | Precision (W avg.) | Recall (Pos.) | Recall (Neg.) | Recall (W avg.) | F1 Score (Pos.) | F1 Score (Neg.) | F1 Score (W avg.) |
|---|---|---|---|---|---|---|---|---|---|---|
| DT | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 |
| RF | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 | 0.80 |
| GBC | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 | 0.65 |
| SVM | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 | 0.88 |
Figure 4. Performance comparison between machine learning models using the original dataset and BoW, TF-IDF, GloVe, and Word2Vec features.
Performance evaluation of classifiers using BoW features on the TextBlob annotated dataset.
| Model | Accuracy | Precision (Pos.) | Precision (Neg.) | Precision (W avg.) | Recall (Pos.) | Recall (Neg.) | Recall (W avg.) | F1 Score (Pos.) | F1 Score (Neg.) | F1 Score (W avg.) |
|---|---|---|---|---|---|---|---|---|---|---|
| DT | 0.79 | 0.85 | 0.61 | 0.73 | 0.87 | 0.57 | 0.72 | 0.87 | 0.59 | 0.73 |
| RF | 0.85 | 0.84 | 0.90 | 0.87 | 0.98 | 0.47 | 0.72 | 0.90 | 0.62 | 0.76 |
| GBC | 0.82 | 0.85 | 0.70 | 0.78 | 0.92 | 0.55 | 0.73 | 0.98 | 0.62 | 0.75 |
| SVM | 0.92 | 0.94 | 0.84 | 0.89 | 0.94 | 0.84 | 0.89 | 0.94 | 0.84 | 0.89 |
Performance evaluation of classifiers using TF-IDF features on the TextBlob annotated dataset.
| Model | Accuracy | Precision (Pos.) | Precision (Neg.) | Precision (W avg.) | Recall (Pos.) | Recall (Neg.) | Recall (W avg.) | F1 Score (Pos.) | F1 Score (Neg.) | F1 Score (W avg.) |
|---|---|---|---|---|---|---|---|---|---|---|
| DT | 0.79 | 0.85 | 0.62 | 0.73 | 0.87 | 0.58 | 0.72 | 0.86 | 0.60 | 0.73 |
| RF | 0.84 | 0.85 | 0.88 | 0.87 | 0.98 | 0.51 | 0.74 | 0.91 | 0.65 | 0.78 |
| GBC | 0.83 | 0.86 | 0.73 | 0.79 | 0.92 | 0.57 | 0.75 | 0.89 | 0.64 | 0.77 |
| SVM | 0.92 | 0.92 | 0.88 | 0.90 | 0.96 | 0.78 | 0.87 | 0.94 | 0.82 | 0.88 |
Performance evaluation of classifiers using GloVe features on the TextBlob annotated dataset.
| Model | Accuracy | Precision (Pos.) | Precision (Neg.) | Precision (W avg.) | Recall (Pos.) | Recall (Neg.) | Recall (W avg.) | F1 Score (Pos.) | F1 Score (Neg.) | F1 Score (W avg.) |
|---|---|---|---|---|---|---|---|---|---|---|
| DT | 0.72 | 0.81 | 0.47 | 0.64 | 0.81 | 0.48 | 0.64 | 0.81 | 0.48 | 0.64 |
| RF | 0.80 | 0.71 | 0.81 | 0.76 | 0.94 | 0.39 | 0.67 | 0.87 | 0.51 | 0.69 |
| GBC | 0.72 | 0.81 | 0.47 | 0.64 | 0.81 | 0.48 | 0.65 | 0.81 | 0.48 | 0.64 |
| SVM | 0.81 | 0.83 | 0.71 | 0.77 | 0.93 | 0.46 | 0.70 | 0.88 | 0.56 | 0.72 |
Performance evaluation of classifiers using Word2Vec features on the TextBlob annotated dataset.
| Model | Accuracy | Precision (Pos.) | Precision (Neg.) | Precision (W avg.) | Recall (Pos.) | Recall (Neg.) | Recall (W avg.) | F1 Score (Pos.) | F1 Score (Neg.) | F1 Score (W avg.) |
|---|---|---|---|---|---|---|---|---|---|---|
| DT | 0.69 | 0.78 | 0.87 | 0.83 | 0.99 | 0.24 | 0.62 | 0.87 | 0.48 | 0.62 |
| RF | 0.79 | 0.78 | 0.87 | 0.83 | 0.99 | 0.24 | 0.62 | 0.87 | 0.38 | 0.63 |
| GBC | 0.70 | 0.80 | 0.44 | 0.62 | 0.80 | 0.44 | 0.62 | 0.80 | 0.44 | 0.62 |
| SVM | 0.88 | 0.90 | 0.83 | 0.87 | 0.95 | 0.71 | 0.83 | 0.92 | 0.77 | 0.84 |
Figure 5. Performance comparison between machine learning models using the TextBlob dataset and BoW, TF-IDF, GloVe, and Word2Vec features.
Figure 6. LSTM, CNN-LSTM, and GRU architectures.
Performance analysis of deep learning models.
| Model | Accuracy | Class | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| LSTM | 0.80 | Neg. | 0.83 | 0.79 | 0.81 |
| | | Pos. | 0.93 | 0.93 | 0.94 |
| | | Avg. | 0.88 | 0.87 | 0.87 |
| CNN-LSTM | 0.90 | Neg. | 0.78 | 0.88 | 0.83 |
| | | Pos. | 0.96 | 0.91 | 0.93 |
| | | Avg. | 0.87 | 0.90 | 0.88 |
| GRU | 0.86 | Neg. | 0.84 | 0.88 | 0.86 |
| | | Pos. | 0.88 | 0.83 | 0.85 |
| | | Avg. | 0.86 | 0.86 | 0.86 |
Performance analysis of the proposed methodology.
| Year | Reference | Model | Accuracy |
|---|---|---|---|
| 2016 | | RF | 0.90 |
| 2017 | | CNN + LSTM | 0.895 |
| 2017 | | BoW-DOUBLE and Average emotion-DOUBLE | 0.83 |
| 2018 | | CNN | 0.89 |
| 2019 | | CNN + LSTM | 0.89 |
| 2019 | | LSTM + DNN | 0.885 |
| 2020 | | TF-IDF + LR | 0.891 |
| 2020 | | LSTM | 0.899 |
| 2020 | | NN | 0.91 |
| 2021 | | CNN | 0.883 |
| 2021 | | SVM + (SVM-RFE) | 0.895 |
| 2021 | | NB + ARM | 0.784 |
| 2021 | Proposed | SVM + TextBlob + BoW & TF-IDF | 0.92 |
Statistical T-test output values.
| Student's t-test | Output value |
|---|---|
| T-statistic | −0.182 |
| Critical value | 0.000 |