Hongyu Han, Yongshi Zhang, Jianpei Zhang, Jing Yang, Xiaomei Zou.
Abstract
Sentiment analysis is widely studied as a way to extract opinions from user-generated content (UGC), and various methods have been proposed in recent literature. However, these methods are prone to sentiment bias: their classification results tend to skew toward either the positive or the negative class, especially for lexicon-based sentiment classification methods. This bias degrades the performance of sentiment analysis. To address this problem, we propose a novel sentiment bias processing strategy that can be applied to lexicon-based sentiment analysis methods. Weight and threshold parameters learned from a small training set are introduced into the lexicon-based sentiment scoring formula, and the resulting formula is then used to classify the reviews. In this paper, a complete sentiment classification framework is proposed. SentiWordNet (SWN) is used as the experimental sentiment lexicon, and review data for four products collected from Amazon are used as the experimental datasets. Experimental results show that the bias processing strategy reduces the polarity bias rate (PBR) and improves the performance of the lexicon-based sentiment analysis method.
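The abstract does not reproduce the scoring formula itself, so the following is only one plausible reading of "weight and threshold parameters learned from a small training set". It assumes each review's lexicon evidence is summarized as two hypothetical totals, `pos_sum` and `neg_sum`, that a weight `w` rescales the positive total, and that a threshold `t` replaces the usual zero decision boundary; both are picked by an exhaustive grid search for training accuracy. The function names, the toy data and the grid are illustrative assumptions, not the authors' method.

```python
from itertools import product

# Hypothetical bias-adjusted scoring rule (an assumption, not the paper's formula):
# the weight w rescales the positive lexicon evidence before it is compared
# with the negative evidence.
def adjusted_score(pos_sum: float, neg_sum: float, w: float) -> float:
    return w * pos_sum - neg_sum

def classify(pos_sum: float, neg_sum: float, w: float, t: float) -> str:
    # The threshold t replaces the usual zero decision boundary.
    return "pos" if adjusted_score(pos_sum, neg_sum, w) > t else "neg"

def learn_weight_threshold(train, weights, thresholds):
    """Exhaustive grid search for the (w, t) pair with the highest
    training accuracy. train holds (pos_sum, neg_sum, label) tuples.
    Ties keep the first pair found in grid order."""
    best_correct, best_w, best_t = -1, None, None
    for w, t in product(weights, thresholds):
        correct = sum(classify(p, n, w, t) == y for p, n, y in train)
        if correct > best_correct:
            best_correct, best_w, best_t = correct, w, t
    return best_w, best_t

# Toy training set (hypothetical numbers) whose raw scores lean positive.
train = [(2.0, 0.5, "pos"), (1.0, 1.2, "neg"), (1.5, 0.4, "pos"), (1.4, 1.1, "neg")]
w, t = learn_weight_threshold(train, [0.5, 1.0, 1.5], [-0.5, 0.0, 0.5])
print(w, t)  # 0.5 0.0
```

In the toy data, the unadjusted rule (`w = 1`, `t = 0`) labels the last negative review positive, while the learned `w = 0.5` corrects it. Grid search is used here only because it is easy to read; any procedure that tunes two scalars on a small labeled set would fit the abstract's description equally well.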
Year: 2018 PMID: 30142154 PMCID: PMC6108458 DOI: 10.1371/journal.pone.0202523
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Overview of the proposed sentiment classification framework.
Comparison table of Penn Treebank and SentiWordNet POS tags.
| Penn POS | Description | SWN POS |
|---|---|---|
| JJ | Adjective | a |
| JJR | Adjective, comparative | a |
| JJS | Adjective, superlative | a |
| RB | Adverb | r |
| RBR | Adverb, comparative | r |
| RBS | Adverb, superlative | r |
| VB | Verb, base form | v |
| VBD | Verb, past tense | v |
| VBG | Verb, gerund or present participle | v |
| VBN | Verb, past participle | v |
| VBP | Verb, non-3rd person singular present | v |
| VBZ | Verb, 3rd person singular present | v |
| NN | Noun, singular or mass | n |
| NNS | Noun, plural | n |
| NNP | Proper noun, singular | n |
| NNPS | Proper noun, plural | n |
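The table above amounts to a lookup from Penn Treebank tags to the four SWN part-of-speech letters. A minimal Python sketch, assuming the text has already been tagged by some Penn-Treebank-style tagger (not shown); the function names and the hand-tagged sentence are hypothetical.

```python
from typing import List, Optional, Tuple

# Penn Treebank tag -> SentiWordNet POS letter, as listed in the table.
PENN_TO_SWN = {
    "JJ": "a", "JJR": "a", "JJS": "a",
    "RB": "r", "RBR": "r", "RBS": "r",
    "VB": "v", "VBD": "v", "VBG": "v", "VBN": "v", "VBP": "v", "VBZ": "v",
    "NN": "n", "NNS": "n", "NNP": "n", "NNPS": "n",
}

def to_swn_pos(penn_tag: str) -> Optional[str]:
    """Return the SWN POS letter, or None for tags the table does not cover."""
    return PENN_TO_SWN.get(penn_tag)

def swn_lookup_keys(tagged_tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (word, Penn tag) pairs into (lowercased word, SWN POS) keys,
    dropping tokens whose tag has no SWN counterpart (e.g. DT, IN)."""
    keys = []
    for word, tag in tagged_tokens:
        pos = to_swn_pos(tag)
        if pos is not None:
            keys.append((word.lower(), pos))
    return keys

tagged = [("The", "DT"), ("sky", "NN"), ("cleared", "VBD"), ("quickly", "RB")]
print(swn_lookup_keys(tagged))  # [('sky', 'n'), ('cleared', 'v'), ('quickly', 'r')]
```

Lemmatization is deliberately left out of this sketch, although a real lookup would need it: SWN stores base forms such as `clear`, so `cleared` would have to be lemmatized before it could match an entry.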
Part of SentiWordNet 3.0.
| POS | ID | PosScore | NegScore | SynsetTerms | Gloss |
|---|---|---|---|---|---|
| a | 1936528 | 0 | 0.75 | notional#1 imaginary#1 fanciful#2 | not based on fact; unreal; “the falsehood about some fanciful secret treaties”- F.D.Roosevelt; “a small child’s imaginary friends”; “to create a notional world for oneself” |
| a | 1796304 | 0.5 | 0.25 | fanciful#3 | having a curiously intricate quality; “a fanciful pattern with intertwined vines and flowers” |
| a | 643598 | 0.5 | 0 | notional#3 fanciful#1 | indulging in or influenced by fancy; “a fanciful mind”; “all the notional vagaries of childhood” |
| n | 4268142 | 0 | 0 | sparker#1 spark_arrester#2 | a wire net to stop sparks from an open fireplace or smokestack |
| r | 449609 | 0.25 | 0 | sentimentally#1 | in a sentimental manner; “‘I miss the good old days,’ she added sentimentally” |
| v | 2771169 | 0.125 | 0 | light_up#3 clear_up#4 clear#3 brighten#2 | become clear; “The sky cleared after the storm” |
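Turning rows like these into word-level scores requires an aggregation rule over a word's senses, and this excerpt does not say which rule the paper uses. The sketch below assumes one common heuristic: each sense contributes PosScore minus NegScore, weighted by 1/k for sense number k (the `#k` suffix in SynsetTerms), and the weighted values are averaged. The rows are copied from the excerpt; `build_index` and `word_score` are hypothetical names.

```python
from collections import defaultdict

# Rows copied from the SentiWordNet 3.0 excerpt: (POS, ID, PosScore, NegScore, SynsetTerms).
ROWS = [
    ("a", 1936528, 0.0, 0.75, "notional#1 imaginary#1 fanciful#2"),
    ("a", 1796304, 0.5, 0.25, "fanciful#3"),
    ("a", 643598, 0.5, 0.0, "notional#3 fanciful#1"),
    ("n", 4268142, 0.0, 0.0, "sparker#1 spark_arrester#2"),
    ("r", 449609, 0.25, 0.0, "sentimentally#1"),
    ("v", 2771169, 0.125, 0.0, "light_up#3 clear_up#4 clear#3 brighten#2"),
]

def build_index(rows):
    """Map (term, POS) to a list of (sense_number, PosScore - NegScore) pairs."""
    index = defaultdict(list)
    for pos, _synset_id, pos_score, neg_score, terms in rows:
        for entry in terms.split():
            term, sense = entry.rsplit("#", 1)
            index[(term, pos)].append((int(sense), pos_score - neg_score))
    return index

def word_score(index, term, pos):
    """Assumed aggregation: average of per-sense scores weighted by 1/sense_number.
    Words missing from the lexicon score 0.0."""
    senses = index.get((term, pos), [])
    if not senses:
        return 0.0
    numerator = sum(score / rank for rank, score in senses)
    denominator = sum(1.0 / rank for rank, _ in senses)
    return numerator / denominator

index = build_index(ROWS)
print(round(word_score(index, "fanciful", "a"), 4))  # 0.1136
```

For `fanciful`, senses #1, #2 and #3 score 0.5, -0.75 and 0.25, so the result is (0.5 + (-0.75)/2 + 0.25/3) / (1 + 1/2 + 1/3) = 5/44 ≈ 0.114: mildly positive, because the first (most frequent) sense dominates.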
Performance evaluation on four datasets.
| Dataset | BAT Accuracy | BAT F-measure | BAT PBR | SO-CAL Accuracy | SO-CAL F-measure | SO-CAL PBR | Proposed Accuracy | Proposed F-measure | Proposed PBR |
|---|---|---|---|---|---|---|---|---|---|
| DVD | 64.74% | 64.78% | 0.28% | 67.41% | 68.00% | 6.40% | | | |
| Electronics | 65.58% | 65.69% | 1.99% | 66.95% | 67.20% | 4.10% | | | |
| Books | 62.74% | 62.82% | 1.36% | 65.98% | 66.90% | 7.90% | | | |
| Kitchen | 69.62% | 69.67% | 67.81% | 68.45% | 6.65% | 0.76% | | | |
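For reference, the Accuracy and F-measure columns can be computed from gold and predicted labels as follows. The table does not state how the F-measure is averaged over the two classes; this sketch assumes an unweighted (macro) mean of per-class F1 scores, and the `"pos"`/`"neg"` labels and toy predictions are illustrative.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for(label, y_true, y_pred):
    """F1 score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true, y_pred, labels=("pos", "neg")):
    # Assumption: unweighted mean of the per-class F1 scores.
    return sum(f1_for(label, y_true, y_pred) for label in labels) / len(labels)

y_true = ["pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "pos", "neg"]  # one negative review pushed to "pos"
print(accuracy(y_true, y_pred))            # 0.75
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

The toy classifier over-predicts the positive class (three positive predictions for two positive reviews), which is the kind of skew the abstract calls sentiment bias: the positive class keeps perfect recall while the negative class loses half of it.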
Fig 2. The influence of training set size on experimental results.
(A): Accuracy. (B): F-measure. (C): PBR.