A Weichselbraun, S Gindl, A Scharl.
Abstract
This paper presents a novel method for contextualizing and enriching large semantic knowledge bases for opinion mining with a focus on Web intelligence platforms and other high-throughput big data applications. The method is not only applicable to traditional sentiment lexicons, but also to more comprehensive, multi-dimensional affective resources such as SenticNet. It comprises the following steps: (i) identify ambiguous sentiment terms, (ii) provide context information extracted from a domain-specific training corpus, and (iii) ground this contextual information to structured background knowledge sources such as ConceptNet and WordNet. A quantitative evaluation shows a significant improvement when using an enriched version of SenticNet for polarity classification. Crowdsourced gold standard data in conjunction with a qualitative evaluation sheds light on the strengths and weaknesses of the concept grounding, and on the quality of the enrichment process.
Keywords: Big data; Common-sense knowledge; Concept grounding; Contextualization; Disambiguation; Knowledge extraction; Opinion mining; Sentiment analysis; Social Web; Web intelligence
Year: 2014 PMID: 25431524 PMCID: PMC4235782 DOI: 10.1016/j.knosys.2014.04.039
Source DB: PubMed Journal: Knowl Based Syst ISSN: 0950-7051 Impact factor: 8.038
Fig. 1. Screenshot of a Web intelligence portal built for the NOAA Climate Program Office, showing results for a query on “climate change” based on news media coverage between January and April 2014.
Fig. 2. Overview of the contextualization, concept grounding and enrichment framework.
Fig. 3. Computation of (i) ConceptNet candidate concepts and their similarity to the positive or negative interpretation of the ambiguous sentiment term and (ii) the maximum similarity score.
Candidate concept selection and extracted textual information for the term “approach”.
| Concept | Retrieved context terms |
|---|---|
| 1. Come | Come, toward, something, move, … |
| 2. Approach/v (move towards) | Approach, move, towards, draw, drive, … |
| 3. Movement/n (a natural event that involves a change in the position or location of something) | Movement, that, location, change, event, something, position, involve, natural, … |
| 32. Approach/n (ideas or actions intended to deal with a problem) | Idea, deal, approach, with, intend, action, situation, problem, … |
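The table above pairs each candidate concept with the context terms retrieved for it. As a minimal sketch of how such candidates could be ranked against the context of an ambiguous term, the following Python snippet scores each candidate by set overlap and keeps the one with the maximum score. The Jaccard measure, the function names, and the example context are assumptions made for illustration; the similarity measure actually used is not stated in this record, and the term lists are truncated in the table.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of terms (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_candidate(context_terms, candidates):
    """Return the candidate concept with the maximum similarity score."""
    scored = {name: jaccard(context_terms, terms)
              for name, terms in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Context terms copied from the table (the lists there are truncated with "…").
candidates = {
    "come": ["come", "toward", "something", "move"],
    "approach/v": ["approach", "move", "towards", "draw", "drive"],
    "movement/n": ["movement", "that", "location", "change", "event",
                   "something", "position", "involve", "natural"],
    "approach/n": ["idea", "deal", "approach", "with", "intend",
                   "action", "situation", "problem"],
}

# Hypothetical context of the ambiguous term in a review sentence.
context = ["new", "idea", "problem", "approach"]

best, score = best_candidate(context, candidates)
print(best, round(score, 3))
```

With this hypothetical context, the noun sense related to dealing with a problem shares the most terms and is selected; a different context (for example one containing "move" and "towards") would favor the verb sense.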
Amazon and IMDb corpus characteristics.
| Corpus | Reviews | Total sentences | Total words | Avg. sentences per review | Avg. words per review |
|---|---|---|---|---|---|
| Amazon electronics | 2000 | 19,911 | 298,622 | 10 | 149 |
| Amazon software | 2000 | 24,120 | 380,760 | 12 | 190 |
| IMDb comedy | 2000 | 25,481 | 410,874 | 13 | 205 |
| IMDb crime | 2000 | 30,155 | 494,686 | 15 | 247 |
| IMDb drama | 2000 | 27,026 | 432,820 | 14 | 216 |
10-fold cross validation of the baseline (b) versus context-aware (c) sentiment analysis.
| Corpus | Polarity | Precision (b) | Precision (c) | Recall (b) | Recall (c) | F1 (b) | F1 (c) | Accuracy (b) | Accuracy (c) |
|---|---|---|---|---|---|---|---|---|---|
| Electronics | + | 0.62 | 0.66 | 0.83 | 0.83 | 0.71 | 0.74 | 0.65 | 0.70 |
| | − | 0.74 | 0.77 | 0.48 | 0.58 | 0.58 | 0.66 | | |
| Software | + | 0.60 | 0.60 | 0.82 | 0.91 | 0.69 | 0.72 | 0.63 | 0.65 |
| | − | 0.71 | 0.81 | 0.44 | 0.40 | 0.54 | 0.53 | | |
| Comedy | + | 0.59 | 0.80 | 0.92 | 0.89 | 0.72 | 0.84 | 0.64 | 0.83 |
| | − | 0.82 | 0.87 | 0.36 | 0.77 | 0.50 | 0.82 | | |
| Crime | + | 0.60 | 0.95 | 0.80 | 0.49 | 0.68 | 0.64 | 0.63 | 0.73 |
| | − | 0.69 | 0.66 | 0.46 | 0.97 | 0.55 | 0.78 | | |
| Drama | + | 0.57 | 0.73 | 0.86 | 0.93 | 0.69 | 0.82 | 0.61 | 0.79 |
| | − | 0.72 | 0.90 | 0.36 | 0.66 | 0.48 | 0.76 | | |
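Reading the column pairs in the table above as precision, recall, F1 and accuracy, each reported for the baseline (b) and the context-aware (c) method, the figures can be cross-checked with a few lines of Python. This is a sketch under two assumptions that the record does not state explicitly: that F1 is the usual harmonic mean of precision and recall, and that each corpus holds equally many positive and negative reviews, so that accuracy equals the mean of the two recall values.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def balanced_accuracy(recall_pos, recall_neg):
    """Accuracy when both classes have the same size (an assumption here)."""
    return (recall_pos + recall_neg) / 2


# Electronics corpus, baseline (b): precision and recall taken from the table.
f1_pos_b = f1_score(0.62, 0.83)
f1_neg_b = f1_score(0.74, 0.48)

# Electronics corpus, context-aware (c).
f1_pos_c = f1_score(0.66, 0.83)
acc_c = balanced_accuracy(0.83, 0.58)

print(round(f1_pos_b, 2), round(f1_neg_b, 2), round(f1_pos_c, 2))
print(acc_c)
```

The recomputed F1 values round to the 0.71, 0.58 and 0.74 reported for the Electronics corpus, and the balanced-class accuracy of about 0.705 agrees with the reported 0.70 to within the rounding of the inputs.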
Selected ambiguous terms, their respective context terms, and the corresponding ConceptNet and WordNet grounding.
| Term | Polarity | Context terms | ConceptNet | WordNet |
|---|---|---|---|---|
| Adventure | (+) | During nostradamus diary | Activity, magical journey, fun trip | Wild and exciting undertaking |
| | (−) | Educational windvd frame | Software, band, video game | Wild and exciting undertaking |
| Development | (+) | Creating dreamweaver nduc | Progression from simpler to more complex forms | Growth |
| | (−) | Onecare paperport auction | Recent event that has some relevance for the present situation | Development |
| God | (+) | Reading cuppa hdd | One of greater rank or station or quality | Deity |
| | (−) | Folder quicklaunch netbook | An incorporeal being believed to have powers to affect the course of human events | God |
| Challenge | (+) | Maris hal role | Confrontation (call into challenge) | A call to engage in a contest or fight |
| | (−) | Skulls luke student | Invite, call into question | A call to engage in a contest or fight |
| Ridiculous | (+) | Chazz jimmy chadwick | Funny | Farcical |
| | (−) | Tremors burt shortbus | Goofy | Absurd |
| Plot | (+) | Brita ryan jai | Piece of fiction that narrates a chain of related events | Chart or map showing the movements or progress of an object |
| | (−) | Hancock redford surratt | Conspiracy | Chart or map showing the movements or progress of an object |
Enrichment statistics.
| | Amazon reviews | IMDb reviews |
|---|---|---|
| Positive context terms | 793,948 | 2,060,333 |
| Negative context terms | 549,120 | 2,608,472 |
| – Grounded concepts | 1018 | 1637 |
| – Positive | 2287 (2141 unique) | 3649 (3248 unique) |
| – Negative | 2072 (1773 unique) | 3437 (2633 unique) |
| – Senses and definitions | 519 | 857 |
| – Synonyms | 3015 (2072 unique) | 5012 (3245 unique) |
| – Antonyms | 108 (94 unique) | 159 (138 unique) |