Muhammad Zubair Asghar, Aurangzeb Khan, Shakeel Ahmad, Maria Qasim, Imran Ali Khan.
Abstract
With the rapid growth of social networks and blogs, online communities increasingly use social media services to share their views and experiences about particular products, policies, and events. Because these reviews carry economic weight, there is a growing trend of writing user reviews to promote a product. Nowadays, users rely on online blogs and review sites when purchasing products. User reviews are therefore considered an important source of information for decision making in Sentiment Analysis (SA) applications. In this work, we exploit the wealth of user reviews available through online forums to analyze the semantic orientation of words, categorizing them into positive (+ive) and negative (-ive) classes in order to identify and classify the emoticons, modifiers, general-purpose words, and domain-specific words expressed in the public's feedback about products. The unsupervised learning approaches employed in previous studies, however, are becoming less effective: they suffer from data sparseness and from low accuracy because they do not consider emoticons, modifiers, or domain-specific words, which may result in inaccurate classification of users' reviews. Lexicon-enhanced sentiment analysis based on a rule-based classification scheme is an alternative approach for improving the sentiment classification of users' reviews in online communities. In addition to the sentiment terms used in general-purpose sentiment analysis, we integrate emoticons, modifiers, and domain-specific terms to analyze the reviews posted in online communities. To test the effectiveness of the proposed method, we considered user reviews in three domains. The results of the experiments demonstrate that the proposed method overcomes limitations of previous methods and that sentiment analysis performance improves over baseline methods once emoticons, modifiers, negations, and domain-specific terms are considered.
Year: 2017 PMID: 28231286 PMCID: PMC5322980 DOI: 10.1371/journal.pone.0171649
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Proposed System.
Sample Datasets.
| Datasets | Total # Reviews | Dataset Description |
|---|---|---|
| Dataset#1 | 350 | Drug |
| Dataset#2 | 273 | Car |
| Dataset#3 | 412 | Hotel |
Partial list of positive and negative emoticons.
| Emoticon | Meaning | Sentiment Class |
|---|---|---|
| :-D | Laughing | Positive |
| :-) | smile | Positive |
| o:)- | innocent | Positive |
| 8-) | cool | Positive |
| :$ | Happy blush | Positive |
| :( | defeated | Negative |
| :'( | Crying | Negative |
| :o | shocked | Negative |
| >( | Grumpy | Negative |
| (@) | Angry red | Negative |
| X\| | Dead | Negative |
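The emoticon list above is effectively a lookup table. The following is a minimal sketch of how such a table could be used, assuming whitespace tokenization and a simple majority vote; neither choice is confirmed by this record, and the names `EMOTICON_CLASS` and `emoticon_polarity` are hypothetical.

```python
from collections import Counter

# Emoticon -> sentiment class, copied from the partial list above.
EMOTICON_CLASS = {
    ":-D": "Positive", ":-)": "Positive", "o:)-": "Positive",
    "8-)": "Positive", ":$": "Positive",
    ":(": "Negative", ":'(": "Negative", ":o": "Negative",
    ">(": "Negative", "(@)": "Negative", "X|": "Negative",
    ":\u2019(": "Negative",  # same crying emoticon typed with a typographic apostrophe
}

def emoticon_polarity(text):
    """Return the majority emoticon class in `text`, or None on a tie or no emoticons."""
    # Assumption: emoticons are separated from words by whitespace.
    counts = Counter(
        EMOTICON_CLASS[tok] for tok in text.split() if tok in EMOTICON_CLASS
    )
    if counts["Positive"] == counts["Negative"]:
        return None
    return "Positive" if counts["Positive"] > counts["Negative"] else "Negative"

print(emoticon_polarity("This car is great :-D"))
print(emoticon_polarity("The room was dirty :( >("))
print(emoticon_polarity("No emoticons here"))
```

A dictionary keeps the lookup constant-time and makes it easy to extend the partial list with further emoticons.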
Partial list of positive modifiers (enhancers).
| Modifier | Strength | Modifier | Strength |
|---|---|---|---|
| Completely | +100% | Pretty | +20% |
| Totally | +70% | Very | +50% |
| Really | +15% | Too | +45% |
| Most | +90% | Extremely | +80% |
| Extraordinarily | +75% | | |
Partial list of negative modifiers (reducers).
| Modifier | Strength | Modifier | Strength |
|---|---|---|---|
| hardly | -70% | a little | -40% |
| less | -50% | some | -25% |
| quite | -20% | a bit | -35% |
| minor | -30% | slight | -40% |
| a few | -25% | low | -20% |
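The strengths in the two modifier lists suggest that a modifier scales the polarity of the word it qualifies. The sketch below assumes multiplicative scaling, `score * (1 + strength)`, clamped to [-1, 1]; this record does not state the exact combination rule, so both the rule and the helper name `apply_modifier` are assumptions.

```python
# Strengths from the two partial modifier lists above, expressed as fractions.
ENHANCERS = {
    "completely": 1.00, "totally": 0.70, "really": 0.15, "most": 0.90,
    "extraordinarily": 0.75, "pretty": 0.20, "very": 0.50, "too": 0.45,
    "extremely": 0.80,
}
REDUCERS = {
    "hardly": -0.70, "less": -0.50, "quite": -0.20, "minor": -0.30,
    "a few": -0.25, "a little": -0.40, "some": -0.25, "a bit": -0.35,
    "slight": -0.40, "low": -0.20,
}
MODIFIERS = {**ENHANCERS, **REDUCERS}

def apply_modifier(modifier, base_score):
    """Scale a word's polarity score by the modifier strength (assumed rule)."""
    # Unknown modifiers leave the score unchanged.
    strength = MODIFIERS.get(modifier.lower(), 0.0)
    # Assumption: multiplicative scaling, clamped to the range [-1, 1].
    return max(-1.0, min(1.0, base_score * (1.0 + strength)))

print(round(apply_modifier("very", 0.5), 2))    # enhancer raises the score
print(round(apply_modifier("hardly", 0.5), 2))  # reducer lowers the score
```

Under this assumption an enhancer such as "very" raises a positive score of 0.5 to 0.75, while a reducer such as "hardly" lowers it to roughly 0.15.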
Partial list of domain specific terms with predicted sentiment class.
| Unigram Term | Predicted Sentiment Class | Bigram Term | Predicted Sentiment Class (Using Eq.) |
|---|---|---|---|
| Acute | +ive | Fast acting | +ive |
| Abrasion | -ive | Heart-burn | -ive |
| Nausea | -ive | Covering up | +ive |
| Chronic | -ive | abdominal distension | -ive |
| Exhaust | -ive | Nervous breakdown | -ive |
| Deaden | +ive | Heart beat | neu |
| Relieve | +ive | Bring down | +ive |
| Able | +ive | Cough up | -ive |
| Psychotic | -ive | Pulse rate | neu |
| Insight | +ive | Color blindness | -ive |
Words and their Sentiment coverage.
| Term | SentiWordNet Polarity | Modified polarity and score using | Example Sentence |
|---|---|---|---|
| heart-burn | not found | negative(-0.5) (using | I do not like this medicine. It caused |
| sore throat | not found | negative(-0.4) (using | It caused |
| Growth | neutral (1) | negative (-1) (using | The abnormal |
| Relax | Neutral(0.625) | positive(+0.625) (using | It really works well and relaxes my anxiety. |
| Hospital | Neutral(0.8125) | Negative(-0.8125) (using | I am in |
| Clot | neutral (1) | negative (-1) (using | The doctor diagnosed a blood |
| Dressing | neutral (1) | Positive(+1) (using | The patient’s dressings need to be changed regularly. |
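The coverage table above shows domain-specific scores replacing or supplementing the general-purpose lexicon. One way to sketch that precedence is a two-level lookup: consult the domain lexicon first and fall back to the general lexicon otherwise. The general lexicon here is a stand-in dictionary with made-up values, not SentiWordNet data, and the names `DOMAIN_LEXICON`, `GENERAL_LEXICON`, `term_score`, and `polarity` are hypothetical.

```python
# Revised drug-domain scores, taken from the table above.
DOMAIN_LEXICON = {
    "heart-burn": -0.5, "sore throat": -0.4, "growth": -1.0,
    "relax": 0.625, "hospital": -0.8125, "clot": -1.0, "dressing": 1.0,
}

# Stand-in for a general-purpose lexicon (illustrative values, NOT SentiWordNet data).
GENERAL_LEXICON = {"good": 0.6, "bad": -0.6}

def term_score(term):
    """Prefer the domain-specific score; fall back to the general lexicon, then 0.0."""
    key = term.lower()
    if key in DOMAIN_LEXICON:
        return DOMAIN_LEXICON[key]
    return GENERAL_LEXICON.get(key, 0.0)

def polarity(score):
    """Map a signed score to a class label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

for term in ("Hospital", "relax", "good", "table"):
    print(term, term_score(term), polarity(term_score(term)))
```

Checking the domain lexicon first means a term such as "hospital" receives its domain polarity even when a general lexicon would treat it as neutral.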
Comparative results obtained for noise reduction phase.
| Datasets | Sentences | Incorrect Words Extracted | Correct Words Extracted | Accuracy (%) |
|---|---|---|---|---|
| Dataset1 | 8540 | 1431 | 1291 | 90.216 |
| Dataset2 | 2000 | 524 | 462 | 88.167 |
| Dataset3 | 2543 | 874 | 728 | 83.295 |
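The accuracy column above appears to be the ratio of correct words extracted to incorrect words extracted, expressed as a percentage and truncated to three decimals. That reading is inferred from the numbers in the table rather than stated in this record; the sketch below checks it, with `accuracy_percent` as a hypothetical helper name.

```python
import math

# (incorrect words extracted, correct words extracted) per dataset, from the table above.
NOISE_RESULTS = {
    "Dataset1": (1431, 1291),
    "Dataset2": (524, 462),
    "Dataset3": (874, 728),
}

def accuracy_percent(incorrect, correct):
    """Correct / incorrect as a percentage, truncated (not rounded) to three decimals."""
    return math.floor(correct / incorrect * 100 * 1000) / 1000

for name, (incorrect, correct) in NOISE_RESULTS.items():
    print(name, accuracy_percent(incorrect, correct))
```

Under this reading the three computed values match the reported 90.216, 88.167, and 83.295.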
Fig 2. Accuracy results of EC module.
Fig 3. Accuracy results of MNC module.
Fig 4. Accuracy results of DSC module.
Experimental Results for Dataset1 (P: Precision, R: Recall, F: F-measure).
| Study | Technique | Positive P | Positive R | Positive F | Negative P | Negative R | Negative F |
|---|---|---|---|---|---|---|---|
| Kalaivani and Shunmuganathan [ | Supervised (opinion words) | 0.80 | 0.76 | 0.69 | 0.74 | 0.64 | 0.51 |
| Kundi et al. [ | Lexicon-based unsupervised (opinion words and emoticons) | 0.81 | 0.79 | 0.80 | 0.79 | 0.82 | 0.80 |
| Kundi et al.[ | Lexicon-based Unsupervised (opinion words and emoticons) | 0.86 | 0.78 | 0.81 | 0.80 | 0.82 | 0.80 |
| Proposed | Lexicon-enhanced-Rule-based ( | 0.79 | 0.84 | 0.89 | 0.81 | ||
Experimental Results for Dataset2 (P: Precision, R: Recall, F: F-measure).
| Study | Technique | Positive P | Positive R | Positive F | Negative P | Negative R | Negative F |
|---|---|---|---|---|---|---|---|
| Kalaivani and Shunmuganathan [ | Supervised (opinion words) | 0.79 | 0.63 | 0.70 | 0.78 | 0.71 | 0.74 |
| Kundi et al. [ | Lexicon-based unsupervised (opinion words and emoticons) | 0.74 | 0.51 | 0.60 | 0.73 | 0.63 | 0.67 |
| Kundi et al.[ | Lexicon-based Unsupervised (opinion words and emoticons) | 0.82 | 0.78 | 0.79 | 0.75 | 0.73 | 0.73 |
| Proposed | Lexicon-enhanced-Rule-based ( | 0.79 | 0.77 | 0.74 | |||
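The F-measure column can be related to precision and recall. The sketch below assumes F is the balanced F-measure, the harmonic mean of P and R; this record does not define it explicitly. The three sample rows are taken from the Dataset2 table above and reproduce under that assumption; no claim is made about the remaining rows.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (P, R, reported F) for three entries of the Dataset2 table above.
rows = [
    (0.79, 0.63, 0.70),  # Kalaivani and Shunmuganathan, positive class
    (0.74, 0.51, 0.60),  # Kundi et al. (first row), positive class
    (0.78, 0.71, 0.74),  # Kalaivani and Shunmuganathan, negative class
]
for p, r, reported in rows:
    print(p, r, round(f_measure(p, r), 2), reported)
```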
Experimental Results for Dataset3 (P: Precision, R: Recall, F: F-measure).
| Study | Technique | Positive P | Positive R | Positive F | Negative P | Negative R | Negative F |
|---|---|---|---|---|---|---|---|
| Kalaivani and Shunmuganathan [ | Supervised (opinion words) | 0.52 | 0.71 | 0.59 | 0.83 | 0.76 | 0.79 |
| Kundi et al. [ | Lexicon-based unsupervised (opinion words and emoticons) | 0.74 | 0.65 | 0.72 | 0.82 | ||
| Kundi et al.[ | Lexicon-based Unsupervised (opinion words and emoticons) | 0.71 | 0.85 | 0.77 | 0.77 | 0.77 | 0.77 |
| Proposed | Lexicon-enhanced-Rule-based ( | 0.93 | 0.74 | ||||
Descriptive statistics of the proposed system on three datasets.
| Statistic | Drug | Car | Hotel |
|---|---|---|---|
| Reviews | 350 | 373 | 412 |
| Sentences | 3525 | 3553 | 3561 |
| Avg. sentences/review | 10.61 | 9.56 | 10.21 |
| Std. dev. sentences/review | 8.06 | 12.14 | 11.21 |
| Min. sentences/review | 1.00 | 1.00 | 1.00 |
| Max. sentences/review | 35.00 | 19.00 | 41.00 |
| Total no. of tokens | 52041 | 52231 | 52482 |
| Avg. tokens/sentence | 18.47 | 18.58 | 18.38 |
| Std. dev. tokens/sentence | 10.21 | 10.04 | 10.43 |
| Min. tokens/sentence | 1.00 | 1.00 | 1.00 |
| Max. tokens/sentence | 82.00 | 79.00 | 88.00 |
| Avg. stop words/sentence | 4.00 | 3.00 | 4.00 |
| Avg. negations/sentence | 2.00 | 1.00 | 1.00 |
| Avg. modifiers/sentence | 2.00 | 2.00 | 3.00 |
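Statistics of the kind listed above can be gathered with a few lines of code. The sketch below is illustrative only: it assumes sentences end at `.`, `!`, or `?`, that tokens are whitespace-separated, and that the population standard deviation is meant, none of which is confirmed by this record. The function `describe` and the sample reviews are hypothetical, and the sketch makes no attempt to reproduce the reported figures.

```python
import re
import statistics

def describe(reviews):
    """Sentence and token statistics for a list of review strings."""
    # Assumption: sentences end at ., ! or ?; tokens are whitespace-separated.
    per_review = [
        [s for s in re.split(r"[.!?]+", review) if s.strip()] for review in reviews
    ]
    sent_counts = [len(sents) for sents in per_review]
    token_counts = [len(s.split()) for sents in per_review for s in sents]
    return {
        "reviews": len(reviews),
        "sentences": sum(sent_counts),
        "avg_sentences_per_review": statistics.mean(sent_counts),
        "std_sentences_per_review": statistics.pstdev(sent_counts),
        "total_tokens": sum(token_counts),
        "avg_tokens_per_sentence": statistics.mean(token_counts),
        "max_tokens_per_sentence": max(token_counts),
    }

# Two made-up reviews, for illustration only.
sample = ["Great hotel. Very clean rooms!", "Bad service."]
print(describe(sample))
```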