| Literature DB >> 33954232 |
Sadaf Hussain Janjua, Ghazanfar Farooq Siddiqui, Muddassar Azam Sindhu, Umer Rashid.
Abstract
Social media is a vital source of textual data used across many research fields, and it has become an essential foundation for organizations seeking to assess users' thoughts and opinions on specific topics. Text classification is the procedure of automatically assigning text to predefined classes based on its content, and aspect-based sentiment classification of such text is challenging. Most existing sentiment analysis research operates at the document or overall sentence level rather than addressing the particularities of individual sentiments. This research uses Twitter data to perform finer-grained sentiment analysis at the aspect level, considering both explicit and implicit aspects. The study proposes a new Multi-level Hybrid Aspect-Based Sentiment Classification (MuLeHyABSC) approach that embeds a feature ranking process with an amended feature selection method for Twitter, and uses an artificial neural network, the Multi-Layer Perceptron (MLP), for sentiment classification to attain improved results. Several machine learning classifiers were also implemented for comparison, including Random Forest (RF), Support Vector Classifier (SVC), and seven others. The proposed hybrid method showed better performance, and its efficiency was validated on multiple Twitter datasets covering different domains, achieving accuracies of 78.99%, 84.09%, 80.38%, 82.37%, and 84.72%, respectively, compared to the baseline approaches. The results show that the new hybrid aspect-based text classification approach outperforms the existing baseline methods for sentiment classification.
Keywords: Aspect-based sentiment classification; Feature extraction; Feature selection; Hybrid approach; Information gain; Multi-layer perceptron; Principal component analysis
Year: 2021 PMID: 33954232 PMCID: PMC8053014 DOI: 10.7717/peerj-cs.433
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
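The pipeline described in the abstract (unigram features, a feature ranking step, feature selection via PCA, and an MLP classifier) can be sketched as a scikit-learn pipeline. This is a hedged reconstruction, not the authors' code: `build_pipeline`, `k_best`, and `n_components` are illustrative names, TF-IDF stands in for whichever term weighting the paper used, and `TruncatedSVD` is used as a sparse-friendly stand-in for PCA.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline

def build_pipeline(k_best=500, n_components=50):
    """Illustrative multi-level pipeline: feature ranking, then
    dimensionality reduction, then an MLP classifier."""
    return Pipeline([
        # unigram term features (the paper's POS-tag features would be
        # concatenated here as well)
        ("tfidf", TfidfVectorizer(ngram_range=(1, 1))),
        # information-gain-style feature ranking via mutual information
        ("rank", SelectKBest(mutual_info_classif, k=k_best)),
        # TruncatedSVD as a sparse-friendly stand-in for PCA
        ("reduce", TruncatedSVD(n_components=n_components)),
        # the MLP sentiment classifier
        ("mlp", MLPClassifier(hidden_layer_sizes=(50,), max_iter=500,
                              random_state=0)),
    ])
```

`k_best` and `n_components` would need tuning per dataset; the values here are placeholders.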
Figure 1: Proposed multi-level hybrid aspect-based sentiment classification (MuLeHyABSC) model.
Detailed number of Tweets in STC, TAS, FGD, ATC & STS datasets.
| Sr No. | Dataset | Positive Tweets | Negative Tweets | Considered Tweets |
|---|---|---|---|---|
| 1 | STC | 519 | 572 | 1,091 |
| 2 | TAS | 1,832 | 5,741 | 7,573 |
| 3 | FGD | 1,171 | 3,186 | 4,357 |
| 4 | ATC | 423 | 1,219 | 1,642 |
| 5 | STS | 180 | 177 | 357 |
| | Total | 4,125 | 10,895 | 15,020 |
Rule-based method used for detection of sentiment words:
1. Read the aspect in the sentence.
2. Get the sentiment word.
3. If a sentiment word is found, add it to the results and remove it from the sentence; then repeat from step 2.
4. If no sentiment word is found, display "no sentiment word".
5. Compute the sentiment word value.
6. Display the sentiment word value.
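A minimal sketch of the rule-based flow above, assuming a simple sentiment lexicon; `LEXICON` and the averaging used for the sentiment word value are illustrative, not the paper's exact rules.

```python
# Hypothetical sentiment lexicon; the paper's actual word list and
# scoring scheme are not reproduced here.
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0}

def detect_sentiment_words(tokens):
    """Collect lexicon hits for a sentence, then score them."""
    results = []
    remaining = list(tokens)
    for word in tokens:               # read words around the aspect
        if word in LEXICON:
            results.append(word)      # add sentiment word into results
            remaining.remove(word)    # remove sentiment word from sentence
    if not results:
        print("no sentiment word")    # nothing matched the lexicon
        return 0.0
    # compute and return the sentiment word value (mean polarity here)
    score = sum(LEXICON[w] for w in results) / len(results)
    return score
```

For example, `detect_sentiment_words(["battery", "is", "good"])` returns the polarity of "good" under this toy lexicon.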
Figure 2: Proposed architecture of the Multi-Layer Perceptron (MLP) for MuLeHyABSC.
Polarity scores with percentages of negative, positive, and neutral labels in each category of the STS dataset.
| Sr No. | Category | Negative Scores (Polarity) | Positive Scores (Polarity) | Negative% | Positive% | Neutral% |
|---|---|---|---|---|---|---|
| 1 | Company | 48.59 | 18.33 | 57.20 | 23.60 | 19.20 |
| 2 | Person | 9.6 | 26.67 | 19.60 | 64.80 | 15.60 |
| 3 | Movie | 1.69 | 8.89 | 5.30 | 78.90 | 15.80 |
| 4 | Product | 9.04 | 26.11 | 20.60 | 54 | 25.40 |
| 5 | Location | 7.91 | 1.67 | 49.00 | 21.20 | 29.80 |
| 6 | Misc. | 23.16 | 13.89 | 42.60 | 31.80 | 25.60 |
| 7 | Event | 0 | 4.4 | 0 | 82.85 | 17.15 |
Comparison of the proposed approach (MuLeHyABSC) with different classifiers on the STC test set.
The bold emphasis shows the highest results achieved by the proposed approach.
| Sr. No. | Approach | Features | Accuracy (%) | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| 1 | MuLeHyABSC+MLP | POS tags + unigram | **78.99** | | | |
| 2 | MuLeHyABSC+SVC | POS tags + unigram | 75.79 | 0.758 | 0.757 | 0.757 |
| 3 | MuLeHyABSC+LR | POS tags + unigram | 72.08 | 0.712 | 0.722 | 0.716 |
| 4 | MuLeHyABSC+DT | POS tags + unigram | 73.02 | 0.725 | 0.732 | 0.728 |
| 5 | MuLeHyABSC+KN | POS tags + unigram | 70.16 | 0.714 | 0.721 | 0.717 |
| 6 | MuLeHyABSC+RF | POS tags + unigram | 72.08 | 0.712 | 0.722 | 0.716 |
| 7 | MuLeHyABSC+AB | POS tags + unigram | 75.34 | 0.755 | 0.753 | 0.752 |
| 8 | MuLeHyABSC+ETC | POS tags + unigram | 76.02 | 0.765 | 0.762 | 0.76 |
| 9 | MuLeHyABSC+GB | POS tags + unigram | 74.71 | 0.738 | 0.727 | 0.732 |
| 10 | MuLeHyABSC+NB | POS tags + unigram | 67.57 | 0.702 | 0.675 | 0.659 |
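The Accuracy, Precision, Recall, and F-measure columns in these tables can be reproduced with scikit-learn's standard metrics. Whether the paper used weighted or macro averaging is an assumption here; the sketch below uses weighted averaging.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def summarize(y_true, y_pred):
    """Return (accuracy %, precision, recall, F-measure) rounded as in
    the tables; weighted averaging is an assumption."""
    acc = accuracy_score(y_true, y_pred) * 100
    p, r, f, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0)
    return round(acc, 2), round(p, 3), round(r, 3), round(f, 3)
```

For instance, `summarize([1, 0, 1, 1], [1, 0, 0, 1])` gives an accuracy of 75.0 with the corresponding weighted precision, recall, and F-measure.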
Comparison of the proposed approach (MuLeHyABSC) with different classifiers on the TAS test set.
The bold emphasis shows the highest results achieved by the proposed approach.
| Sr No. | Approach | Features | Accuracy (%) | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| 1 | MuLeHyABSC+MLP | POS tags + unigram | **84.09** | | | |
| 2 | MuLeHyABSC+SVC | POS tags + unigram | 78.43 | 0.773 | 0.764 | 0.768 |
| 3 | MuLeHyABSC+LR | POS tags + unigram | 80.44 | 0.811 | 0.824 | 0.805 |
| 4 | MuLeHyABSC+DT | POS tags + unigram | 78.43 | 0.773 | 0.764 | 0.768 |
| 5 | MuLeHyABSC+KN | POS tags + unigram | 82.12 | 0.815 | 0.827 | 0.816 |
| 6 | MuLeHyABSC+RF | POS tags + unigram | 80.33 | 0.792 | 0.803 | 0.795 |
| 7 | MuLeHyABSC+AB | POS tags + unigram | 80.16 | 0.815 | 0.818 | 0.816 |
| 8 | MuLeHyABSC+ETC | POS tags + unigram | 81.58 | 0.802 | 0.815 | 0.804 |
| 9 | MuLeHyABSC+GB | POS tags + unigram | 80.33 | 0.792 | 0.803 | 0.795 |
| 10 | MuLeHyABSC+NB | POS tags + unigram | 72.43 | 0.733 | 0.714 | 0.721 |
Comparison of the proposed approach (MuLeHyABSC) with different classifiers on the FGD test set.
The bold emphasis shows the highest results achieved by the proposed approach.
| Sr No. | Approach | Features | Accuracy (%) | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| 1 | MuLeHyABSC+MLP | POS tags + unigram | **80.38** | | | |
| 2 | MuLeHyABSC+SVC | POS tags + unigram | 76.06 | 0.755 | 0.77 | 0.747 |
| 3 | MuLeHyABSC+LR | POS tags + unigram | 74.65 | 0.723 | 0.746 | 0.702 |
| 4 | MuLeHyABSC+DT | POS tags + unigram | 72.86 | 0.731 | 0.724 | 0.727 |
| 5 | MuLeHyABSC+KN | POS tags + unigram | 74.31 | 0.721 | 0.743 | 0.689 |
| 6 | MuLeHyABSC+RF | POS tags + unigram | 74.31 | 0.721 | 0.743 | 0.689 |
| 7 | MuLeHyABSC+AB | POS tags + unigram | 71.06 | 0.726 | 0.702 | 0.713 |
| 8 | MuLeHyABSC+ETC | POS tags + unigram | 73.02 | 0.725 | 0.738 | 0.731 |
| 9 | MuLeHyABSC+GB | POS tags + unigram | 70.31 | 0.711 | 0.723 | 0.716 |
| 10 | MuLeHyABSC+NB | POS tags + unigram | 74.31 | 0.721 | 0.743 | 0.689 |
Comparison of the proposed approach (MuLeHyABSC) with different classifiers on the ATC test set.
The bold emphasis shows the highest results achieved by the proposed approach.
| Sr No. | Approach | Features | Accuracy (%) | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| 1 | MuLeHyABSC+MLP | POS tags + unigram | **82.37** | | | |
| 2 | MuLeHyABSC+SVC | POS tags + unigram | 76.22 | 0.751 | 0.769 | 0.759 |
| 3 | MuLeHyABSC+LR | POS tags + unigram | 80.15 | 0.793 | 0.808 | 0.795 |
| 4 | MuLeHyABSC+DT | POS tags + unigram | 79.93 | 0.781 | 0.799 | 0.785 |
| 5 | MuLeHyABSC+KN | POS tags + unigram | 79.63 | 0.779 | 0.796 | 0.784 |
| 6 | MuLeHyABSC+RF | POS tags + unigram | 76.22 | 0.751 | 0.769 | 0.759 |
| 7 | MuLeHyABSC+AB | POS tags + unigram | 71.72 | 0.731 | 0.729 | 0.729 |
| 8 | MuLeHyABSC+ETC | POS tags + unigram | 80.24 | 0.785 | 0.802 | 0.789 |
| 9 | MuLeHyABSC+GB | POS tags + unigram | 79.93 | 0.782 | 0.799 | 0.786 |
| 10 | MuLeHyABSC+NB | POS tags + unigram | 71.72 | 0.731 | 0.729 | 0.729 |
Comparison of the proposed approach (MuLeHyABSC) with different classifiers on the STS test set.
The bold emphasis shows the highest results achieved by the proposed approach.
| Sr No. | Approach | Features | Accuracy (%) | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| 1 | MuLeHyABSC+MLP | POS tags + unigram | **84.72** | | | |
| 2 | MuLeHyABSC+SVC | POS tags + unigram | 80.55 | 0.812 | 0.805 | 0.808 |
| 3 | MuLeHyABSC+LR | POS tags + unigram | 79.16 | 0.802 | 0.791 | 0.796 |
| 4 | MuLeHyABSC+DT | POS tags + unigram | 80.55 | 0.812 | 0.805 | 0.808 |
| 5 | MuLeHyABSC+KN | POS tags + unigram | 76.16 | 0.772 | 0.761 | 0.766 |
| 6 | MuLeHyABSC+RF | POS tags + unigram | 79.16 | 0.802 | 0.791 | 0.796 |
| 7 | MuLeHyABSC+AB | POS tags + unigram | 80.55 | 0.812 | 0.805 | 0.808 |
| 8 | MuLeHyABSC+ETC | POS tags + unigram | 72.55 | 0.732 | 0.725 | 0.728 |
| 9 | MuLeHyABSC+GB | POS tags + unigram | 80.55 | 0.812 | 0.805 | 0.808 |
| 10 | MuLeHyABSC+NB | POS tags + unigram | 66.66 | 0.683 | 0.666 | 0.674 |
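The nine-classifier comparison run for each dataset follows a common pattern: fit every classifier on the same features and record its test accuracy. The sketch below uses scikit-learn defaults as placeholders for the paper's hyperparameters.

```python
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# The nine comparison classifiers named in the tables; default
# hyperparameters are an assumption, not the paper's settings.
CLASSIFIERS = {
    "SVC": SVC(),
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(random_state=0),
    "KN": KNeighborsClassifier(),
    "RF": RandomForestClassifier(random_state=0),
    "AB": AdaBoostClassifier(random_state=0),
    "ETC": ExtraTreesClassifier(random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
    "NB": MultinomialNB(),
}

def compare(X_train, y_train, X_test, y_test):
    """Fit each classifier and collect its test accuracy (%)."""
    scores = {}
    for name, clf in CLASSIFIERS.items():
        clf.fit(X_train, y_train)
        scores[name] = round(
            accuracy_score(y_test, clf.predict(X_test)) * 100, 2)
    return scores
```

Note that `MultinomialNB` requires non-negative features, which term-count or TF-IDF representations satisfy.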
Experimental results for multiple iterations of activation functions in the proposed approach (MuLeHyABSC) on the STC dataset.
The bold emphasis shows the highest results achieved by the proposed approach.
| Iterations | Activation function | Neurons | Accuracy (%) | Precision | Recall | F-measure |
|---|---|---|---|---|---|---|
| 1 | Identity | 50 | 77.25 | 0.775 | 0.772 | 0.773 |
| 2 | Sigmoid | 50 | 78.99 | 0.789 | 0.789 | 0.789 |
| 3 | Tanh | 50 | 79.25 | 0.792 | 0.792 | 0.792 |
| 4 | ReLu | 50 | 76.88 | 0.769 | 0.768 | 0.768 |
| 5 | Identity | 50, 50 | | | | |
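The activation-function iterations above (identity, sigmoid, tanh, and ReLU, with 50 or 50+50 hidden neurons) map onto scikit-learn's `MLPClassifier` options, where the sigmoid activation is called `"logistic"`. This grid is a hedged reconstruction, not the authors' training script.

```python
from sklearn.neural_network import MLPClassifier

def make_mlps():
    """Build one MLP per (activation, hidden-layer) configuration
    from the iterations table; max_iter is a placeholder."""
    configs = [
        ("identity", (50,)),
        ("logistic", (50,)),   # scikit-learn's name for sigmoid
        ("tanh", (50,)),
        ("relu", (50,)),
        ("identity", (50, 50)),
    ]
    return {
        f"{act}-{layers}": MLPClassifier(activation=act,
                                         hidden_layer_sizes=layers,
                                         max_iter=300,
                                         random_state=0)
        for act, layers in configs
    }
```

Each model would then be fit and scored on the same split so the rows of the table are directly comparable.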
Results of existing benchmark 1 for the STC & STS datasets (Zainuddin, Selamat & Ibrahim, 2018).
The bold emphasis shows the highest results of existing benchmark 1 for the STC & STS datasets.
| Approach | Features | Accuracy (%) | Precision | Recall | F-measure |
|---|---|---|---|---|---|
| ABSA+Sentiwordnet+PCA+SVM (STC) | POS tags + unigram | **74.24** | 0.751 | 0.742 | 0.738 |
| ABSA+Sentiwordnet+PCA+SVM (STS) | POS tags | **76.55** | 0.779 | 0.766 | 0.76 |
Results of existing benchmark 2 for the STS dataset (Go, Bhayani & Huang, 2009).
The bold emphasis shows the highest results of existing benchmark 2 for STS dataset.
| Features | Keyword | Naive Bayes | MaxEnt | SVM |
|---|---|---|---|---|
| Unigram + Bigram | N/A | 81.6 | ||
| Unigram + POS | N/A | 79.9 | 79.9 | |
Overall comparison of the proposed system's (MuLeHyABSC) achieved results with existing benchmarks.
The bold emphasis shows the highest results achieved by the proposed approach.
| Approach | Features | Accuracy (%) | Precision | Recall | F-measure |
|---|---|---|---|---|---|
| Existing Benchmark 1 (STC dataset) | POS tags + unigram | 74.24 | 0.751 | 0.742 | 0.738 |
| MuLeHyABSC+MLP (STC dataset) | POS tags + unigram | **78.99** | | | |
| MuLeHyABSC+MLP (TAS dataset) | POS tags + unigram | **84.09** | | | |
| MuLeHyABSC+MLP (FGD dataset) | POS tags + unigram | **80.38** | | | |
| MuLeHyABSC+MLP (ATC dataset) | POS tags + unigram | **82.37** | | | |
| Existing Benchmark 1 (STS dataset) | POS tags | 76.55 | 0.779 | 0.766 | 0.76 |
| Existing Benchmark 2 (STS dataset) | unigram + bigram | 83.00 | 0.835 | 0.827 | 0.83 |
| MuLeHyABSC+MLP (STS dataset) | POS tags + unigram | **84.72** | | | |
Figure 3: General evaluation of the proposed model (MuLeHyABSC) with existing benchmark 1 (Zainuddin, Selamat & Ibrahim, 2018) and benchmark 2 (Go, Bhayani & Huang, 2009).