Gagandeep Kaur, Amit Sharma
Abstract
Reviews posted online by end users can help business owners obtain a fair evaluation of their products and services and take appropriate action. However, because of the large volume of online reviews generated over time, it is challenging for business owners to track every review. A Customer Review Summarization (CRS) model that presents summarized information and offers businesses meaningful insight into the reasons behind customers' choices and behavior would therefore be desirable. In this paper, we propose Hybrid Analysis of Sentiments (HAS) for effective CRS. HAS consists of pre-processing, feature extraction, and review classification steps. The pre-processing phase removes unwanted data from the text reviews using different Natural Language Processing (NLP) functions. For efficient feature extraction, a hybrid mechanism combining aspect-related features and review-related features is proposed to build a unique feature vector for each customer review. Review classification is performed using supervised classifiers such as Support Vector Machine (SVM), Naïve Bayes, and Random Forest. The experimental results show that HAS performs sentiment analysis efficiently and outperforms existing state-of-the-art techniques with an F1 score of 92.2%.
Keywords: Aspect category extraction; Customer review summarization; Feature extraction; Hybrid features; Sentiment analysis; Supervised classification
Year: 2022 PMID: 35222733 PMCID: PMC8858572 DOI: 10.1007/s12652-022-03748-6
Source DB: PubMed Journal: J Ambient Intell Humaniz Comput
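The record does not specify how the aspect-related (ARF) and review-related (RRF) features are computed, but one of the baselines compared below is "TF-IDF + N-gram + SVM", so TF-IDF weighting is a representative review-level feature. The following is a minimal pure-Python sketch of TF-IDF vectorization over pre-processed (tokenized) reviews; the function name and the toy documents are illustrative, not from the paper.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute TF-IDF vectors for a list of tokenized reviews.

    TF is the raw term count normalized by document length;
    IDF is log(N / df), where N is the number of documents and
    df the number of documents containing the term.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency per term
    vocab = sorted(df)
    idf = {t: math.log(n / df[t]) for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc)
        vectors.append([tf[t] / length * idf[t] for t in vocab])
    return vocab, vectors

# Toy pre-processed reviews (illustrative only)
docs = [["food", "absolute", "amaze"],
        ["disappoint", "restaurant"],
        ["food", "greasy"]]
vocab, vecs = tfidf_vectors(docs)
```

Terms that occur in every document get an IDF of log(1) = 0, so they contribute nothing to the vector; rarer terms are weighted up.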
Fig. 1 Proposed framework for hybrid analysis of sentiments
Illustration of pre-processing algorithm
| Before pre-processing | After pre-processing |
|---|---|
| “Pork shu mai was more than usually greasy and had to share a table with loud and rude family” | “Pork greasy share table loud rude family” |
| “Food was absolutely amazing” | “Food absolute amaze” |
| “I’ve asked a cart attendant for a lotus leaf wrapped rice and she replied back rice and just walked away” | “ask cart attendant lotus leaf wrap rice reply rice walk away” |
| “I was very disappointed with this restaurant” | “disappoint restaurant” |
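The before/after pairs above reflect lowercasing, stop-word removal, and stemming/lemmatization (e.g. "amazing" becomes "amaze", "wrapped" becomes "wrap"). The record does not list the exact NLP functions used, so the following is a rough pure-Python sketch assuming a tiny hand-written stop-word list and naive suffix stripping; a real pipeline would use an NLP library such as NLTK or spaCy.

```python
import re

# Illustrative stop-word list; the paper's actual list is not given.
STOPWORDS = {"i", "ive", "was", "very", "with", "this", "a", "the",
             "and", "to", "for", "she", "back", "just", "had", "of"}

def naive_stem(word):
    """Crude suffix stripping, e.g. 'wrapped' -> 'wrap', 'amazing' -> 'amaze'.
    A real system would use a proper stemmer or lemmatizer."""
    for suffix, repl in (("pped", "p"), ("zing", "ze"),
                         ("ing", ""), ("ed", ""), ("ly", "")):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + repl
    return word

def preprocess(review):
    """Lowercase, tokenize, drop stop words, and stem each token."""
    tokens = re.findall(r"[a-z]+", review.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]
```

For example, `preprocess("Food was absolutely amazing")` reproduces the second row of the table above.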
Fig. 2 Illustration of ARF
Fig. 3 F1-score analysis of the proposed feature engineering techniques using different classifiers
Fig. 4 Precision analysis of the proposed feature engineering techniques using different classifiers
Fig. 5 Recall analysis of the proposed feature engineering techniques using different classifiers
Performance analysis of different feature engineering techniques using different classifiers
| | SVM | RF | NB | Average |
|---|---|---|---|---|
| F1-Score | | | | |
| ARF | 0.781 | 0.764 | 0.793 | 0.7799 |
| RRF | 0.832 | 0.803 | 0.842 | 0.8256 |
| HAS | **0.922** | 0.891 | 0.916 | 0.9096 |
| Precision | | | | |
| ARF | 0.807 | 0.781 | 0.797 | 0.7950 |
| RRF | 0.864 | 0.819 | 0.853 | 0.8453 |
| HAS | **0.946** | 0.912 | 0.931 | 0.9296 |
| Recall | | | | |
| ARF | 0.771 | 0.764 | 0.778 | 0.7710 |
| RRF | 0.828 | 0.803 | 0.833 | 0.8213 |
| HAS | 0.912 | 0.891 | **0.919** | 0.9073 |
Bold values indicate that, among the classifiers SVM, RF, and NB, the F1-score and precision of the proposed feature engineering approaches are highest with SVM, while the recall of NB is better than that of SVM and RF
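The Average column above is the arithmetic mean of the three classifier scores, e.g. RRF precision: (0.864 + 0.819 + 0.853) / 3 ≈ 0.8453. Since every table in this record reports precision, recall, and F1-score, here is a short pure-Python sketch of how these macro-averaged metrics are computed from predictions; the label data is a toy example, not the paper's.

```python
def macro_prf(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 over the label set."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == lab)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p == lab and t != lab)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == lab and p != lab)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n

# Toy sentiment labels (illustrative only)
y_true = ["pos", "pos", "neg", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neg", "pos"]
p, r, f = macro_prf(y_true, y_pred)
```

Macro averaging weights each class equally regardless of its frequency, which is the usual choice when sentiment classes are imbalanced.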
Fig. 6 F1-score analysis with varying data size
Fig. 7 ASAT analysis with varying data size
Comparative analysis of HAS model using SemEval-2014 restaurant review dataset with state-of-the-art techniques
| SA/ABSA technique | Precision | Recall | F1-score |
|---|---|---|---|
| SABSA | 0.844 | 0.831 | 0.838 |
| SentiVec | 0.862 | 0.842 | 0.854 |
| TF-IDF + N-gram + SVM | 0.851 | 0.838 | 0.845 |
| SEML | 0.841 | 0.824 | 0.833 |
| MTMVN | 0.793 | 0.773 | 0.785 |
| HAS | | | |
Bold values indicate that the proposed HAS model shows improved performance on all three datasets for F1-score, precision, and recall
Comparative analysis of HAS model using sentiment140 dataset with state-of-the-art techniques
| SA/ABSA technique | Precision | Recall | F1-score |
|---|---|---|---|
| SABSA | 0.858 | 0.839 | 0.8485 |
| SentiVec | 0.877 | 0.858 | 0.8675 |
| TF-IDF + N-gram + SVM | 0.866 | 0.846 | 0.856 |
| SEML | 0.854 | 0.837 | 0.8455 |
| MTMVN | 0.817 | 0.789 | 0.803 |
| HAS | | | |
Bold values indicate that the proposed HAS model shows improved performance on all three datasets for F1-score, precision, and recall
Comparative analysis of HAS model using STS-gold dataset with state-of-the-art techniques
| SA/ABSA Technique | Precision | Recall | F1-score |
|---|---|---|---|
| SABSA | 0.791 | 0.784 | 0.7875 |
| SentiVec | 0.845 | 0.837 | 0.841 |
| TF-IDF + N-gram + SVM | 0.831 | 0.817 | 0.824 |
| SEML | 0.826 | 0.809 | 0.8175 |
| MTMVN | 0.776 | 0.758 | 0.767 |
| HAS | | | |
Bold values indicate that the proposed HAS model shows improved performance on all three datasets for F1-score, precision, and recall