Kia Dashtipour, Soujanya Poria, Amir Hussain, Erik Cambria, Ahmad Y A Hawalah, Alexander Gelbukh, Qiang Zhou.
Abstract
With the advent of the Internet, people actively express their opinions about products, services, events, political parties, etc., in social media, blogs, and website comments. The amount of research on sentiment analysis is growing explosively. However, the majority of research efforts are devoted to English-language data, while a great share of information is available in other languages. We present a state-of-the-art review of multilingual sentiment analysis. More importantly, we compare our own implementations of existing approaches on common data. The precision observed in our experiments is typically lower than that reported by the original authors, which we attribute to the lack of detail in the original presentation of those approaches. Thus, we compare the existing works by what they really offer to the reader, including whether they allow for accurate implementation and for reliable reproduction of the reported results.
Keywords: Artificial intelligence; Natural language processing; Opinion mining; Sentic computing; Sentiment analysis
Year: 2016 PMID: 27563360 PMCID: PMC4981629 DOI: 10.1007/s12559-016-9415-7
Source DB: PubMed Journal: Cognit Comput ISSN: 1866-9956 Impact factor: 5.418
Fig. 1 Number of publications on English sentiment analysis, per year [42]
Fig. 2 Number of publications on multilingual sentiment analysis, per year [28]
Fig. 3 Flowchart of the approach of [52]
Quantitative comparison of multilingual sentiment analysis approaches
| Paper | Approach | Machine learning techniques | Reported accuracy (%) | Accuracy in our tests: movie reviews (%) | Accuracy in our tests: product reviews (%) |
|---|---|---|---|---|---|
| Singh et al. [ | SentiWordNet | NB, SVM | 81.14 | | 65 |
| Shi and Li [ | Supervised machine learning | SVM | 85 | | 68 |
| Boiy and Moens [ | Machine learning | SVM, MNB, MaxEnt | 86.35 | | 65 |
| Tan and Zhang [ | Feature selection techniques such as document frequency, Chi-square, mutual information, and information gain | SVM, NB, K-nearest neighbour classifier, Winnow classifier | 82 | 62 | |
| Al-Ayyoub et al. [ | Lexicon-based | SVM | 86.89 | 61 | |
| Balahur and Turchi [ | Hybrid + SVM SMO | Hybrid, SVM SMO | 69.09 | 62 | |
| Mahyoub et al. [ | Lexicon-based | SVM | 96 | 61 | |
| Zagibalov and Carroll [ | Seed-word selection | SVM | 81 | 61 | |
| Zhu et al. [ | Bootstrapping | SVM | 62.09 | 57 | |
| Habernal et al. [ | Supervised machine learning | SVM, MaxEnt | 64 | | 58 |
| Mizumoto et al. [ | Bootstrapping | Bootstrapping | 45 | | 41 |
Bold values indicate best performance
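The gap between reported and reproduced accuracy that the abstract describes can be worked out from the table above. The following is a minimal sketch, not part of the original study: it pairs each reported accuracy with the single reproduced value that survives in that row, without assuming which test set the value belongs to. The names `ROWS` and `accuracy_gaps` are illustrative.

```python
# Each entry: (paper, reported accuracy %, reproduced accuracy % surviving in the table).
# Values are copied from the quantitative comparison table; nothing is estimated.
ROWS = [
    ("Singh et al.", 81.14, 65),
    ("Shi and Li", 85, 68),
    ("Boiy and Moens", 86.35, 65),
    ("Tan and Zhang", 82, 62),
    ("Al-Ayyoub et al.", 86.89, 61),
    ("Balahur and Turchi", 69.09, 62),
    ("Mahyoub et al.", 96, 61),
    ("Zagibalov and Carroll", 81, 61),
    ("Zhu et al.", 62.09, 57),
    ("Habernal et al.", 64, 58),
    ("Mizumoto et al.", 45, 41),
]


def accuracy_gaps(rows):
    """Return {paper: reported - reproduced}, in percentage points."""
    return {
        paper: round(reported - reproduced, 2)
        for paper, reported, reproduced in rows
    }


gaps = accuracy_gaps(ROWS)
mean_gap = sum(gaps.values()) / len(gaps)

# List the papers from the largest to the smallest drop in accuracy.
for paper, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{paper:<22} {gap:6.2f}")
print(f"mean gap: {mean_gap:.2f}")
```

Every gap is positive, which is consistent with the abstract's statement that the reproduced figures are typically lower than those reported by the original authors.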
Qualitative comparison of multilingual sentiment analysis approaches
| Method | Languages | Advantages | Disadvantages |
|---|---|---|---|
| Shi and Li [ | English | Very simple to implement | Feature selection is ineffective |
| Boiy and Moens [ | English | Can be easily extended to other languages | Computationally expensive |
| Singh et al. [ | English | Useful for both small and large datasets | Computationally expensive: heavy PMI calculation |
| Mizumoto et al. [ | English | Automatically produces a dictionary for stock market sentiment analysis | Only applicable to stock market sentiment analysis |
| Habernal et al. [ | Czech | Large Czech dataset created, which can be used by other researchers | Only applicable to Czech sentiment analysis; needs further development |
| Tan and Zhang [ | Chinese | Various feature selection techniques such as information gain, Chi-square test, mutual information, and document frequency | Requires more training data |
| Zagibalov and Carroll [ | Chinese | Can be extended to multilingual sentiment analysis | Computationally expensive |
| Balahur and Turchi [ | English, French, Italian, German, Spanish | Can be used for more than one language | No resources available for multilingual sentiment analysis |
| Zhu et al. [ | Chinese | Effective feature selection | Requires very large dataset |
| Mahyoub et al. [ | Arabic | Proposed Arabic SentiWordNet | Cannot handle informal words |
| Al-Ayyoub et al. [ | Arabic | Proposed Arabic linguistic tools | Cannot handle different dialects |
| Ghorbel and Jacot [ | French | Good precision; one of few works on French | Need for translation affects precision |