Vuk Batanović, Miloš Cvetanović, Boško Nikolić.
Abstract
Choosing a comprehensive and cost-effective way of articulating and annotating the sentiment of a text is not a trivial task, particularly when dealing with short texts, in which sentiment can be expressed through a wide variety of linguistic and rhetorical phenomena. This problem is especially conspicuous in resource-limited settings and languages, where design options are restricted either in terms of manpower and financial means required to produce appropriate sentiment analysis resources, or in terms of available language tools, or both. In this paper, we present a versatile approach to addressing this issue, based on multiple interpretations of sentiment labels that encode information regarding the polarity, subjectivity, and ambiguity of a text, as well as the presence of sarcasm or a mixture of sentiments. We demonstrate its use on Serbian, a resource-limited language, via the creation of a main sentiment analysis dataset focused on movie comments, and two smaller datasets belonging to the movie and book domains. In addition to measuring the quality of the annotation process, we propose a novel metric to validate its cost-effectiveness. Finally, the practicality of our approach is further validated by training, evaluating, and determining the optimal configurations of several different kinds of machine-learning models on a range of sentiment classification tasks using the produced dataset.
Year: 2020 PMID: 33180825 PMCID: PMC7660500 DOI: 10.1371/journal.pone.0242050
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Annotator agreement percentages (given as proportions) and Krippendorff’s alpha scores on the movie verification corpus. The IG, EG, and CG columns report intra-group pairwise agreement; the paired-group columns report inter-group agreement.
| Label interpretation | IG agreement | IG alpha | EG agreement | EG alpha | CG agreement | CG alpha | IG & EG alpha | IG & CG alpha | EG & CG alpha |
|---|---|---|---|---|---|---|---|---|---|
| Polarity | 0.966 | 0.929 | 0.933 | 0.861 | 0.948 | 0.887 | 0.895 | 0.874 | 0.857 |
| Subjectivity | 0.989 | 0.896 | 0.976 | 0.795 | 0.970 | 0.725 | 0.823 | 0.754 | 0.748 |
| Four-class sentiment | 0.955 | 0.934 | 0.873 | 0.814 | 0.815 | 0.697 | 0.853 | 0.724 | 0.721 |
| Six-class sentiment | 0.922 | 0.892 | 0.821 | 0.750 | 0.802 | 0.679 | 0.801 | 0.687 | 0.678 |
| Sarcasm | 0.991 | 0.829 | 0.983 | 0.628 | 0.974 | 0.131 | 0.658 | 0.391 | 0.396 |
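
The sarcasm row pairs very high raw agreement with a much lower alpha (0.974 versus 0.131 for CG). A likely reason, assuming sarcastic texts are rare in the corpus, is that raw agreement is dominated by the majority label, whereas Krippendorff's alpha discounts the agreement expected by chance. The sketch below implements the standard nominal-data form of alpha on an invented two-annotator toy sample; the function names and the data are illustrative assumptions, not the authors' code or corpus.

```python
from collections import Counter
from itertools import permutations


def percent_agreement(units):
    """Share of two-label units on which both annotators chose the same label."""
    pairs = [u for u in units if len(u) == 2]
    return sum(a == b for a, b in pairs) / len(pairs)


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` holds one list of labels per annotated text.
    Texts with fewer than two labels carry no agreement information and are skipped.
    """
    # Coincidence matrix: each ordered pair of labels within a unit
    # contributes 1 / (m - 1), where m is the number of labels in that unit.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    # Marginal totals per label and the overall number of pairable labels.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        return float("nan")  # only one label was ever used, so alpha is undefined
    return 1 - (n - 1) * observed / expected


# Invented toy sample: 20 texts, two annotators, one rare label.
toy_units = [["plain", "plain"]] * 18 + [["sarcastic", "plain"], ["plain", "sarcastic"]]
print(percent_agreement(toy_units))
print(krippendorff_alpha_nominal(toy_units))
```

On this toy sample the annotators agree on 90% of the texts, yet alpha is slightly below zero (about -0.03), because they never agree on the rare label.
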
Annotator agreement percentages (given as proportions) and Krippendorff’s alpha scores on the book verification corpus. The IG, EG, and CG columns report intra-group pairwise agreement; the paired-group columns report inter-group agreement.
| Label interpretation | IG agreement | IG alpha | EG agreement | EG alpha | CG agreement | CG alpha | IG & EG alpha | IG & CG alpha | EG & CG alpha |
|---|---|---|---|---|---|---|---|---|---|
| Polarity | 0.977 | 0.935 | 0.977 | 0.935 | 0.908 | 0.731 | 0.935 | 0.802 | 0.807 |
| Subjectivity | 0.971 | 0.929 | 0.954 | 0.889 | 0.838 | 0.520 | 0.880 | 0.661 | 0.625 |
| Four-class sentiment | 0.965 | 0.948 | 0.902 | 0.852 | 0.751 | 0.570 | 0.869 | 0.700 | 0.657 |
| Six-class sentiment | 0.948 | 0.924 | 0.884 | 0.832 | 0.711 | 0.517 | 0.848 | 0.664 | 0.623 |
| Sarcasm | 0.994 | 0.931 | 1.000 | 1.000 | 0.977 | 0.324 | 0.859 | 0.559 | 0.544 |
Averaged efficiencies of annotators in each group.
| Annotator group | Movie verification corpus (464 comments): average length / speed of annotation | Book verification corpus (173 comments): average length / speed of annotation |
|---|---|---|
| IG | ~6h / ~77 texts/h | ~2h / ~87 texts/h |
| EG | ~9h / ~52 texts/h | ~3h / ~58 texts/h |
| CG | ~3.5h / ~133 texts/h | ~1.25h / ~138 texts/h |
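
The speed figures in this table follow from dividing the corpus size by the average annotation time. A minimal check of that arithmetic is sketched below; the hours are the rounded values from the table, so the results match the reported speeds only approximately, and the function name is an illustrative choice.

```python
def texts_per_hour(corpus_size, hours):
    """Average annotation speed: number of texts divided by hours spent."""
    return corpus_size / hours


MOVIE_COMMENTS = 464
BOOK_COMMENTS = 173

# Approximate average annotation times in hours (movie corpus, book corpus), as listed in the table.
HOURS = {"IG": (6.0, 2.0), "EG": (9.0, 3.0), "CG": (3.5, 1.25)}

for group, (movie_hours, book_hours) in HOURS.items():
    movie_speed = texts_per_hour(MOVIE_COMMENTS, movie_hours)
    book_speed = texts_per_hour(BOOK_COMMENTS, book_hours)
    print(f"{group}: {movie_speed:.1f} texts/h (movie), {book_speed:.1f} texts/h (book)")
```
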
Values of the proposed annotation cost-effectiveness metric ACE on the verification corpora.
| Label interpretation | Movie verification corpus: IG vs CG | Movie verification corpus: EG vs CG | Book verification corpus: IG vs CG | Book verification corpus: EG vs CG |
|---|---|---|---|---|
| Polarity | 0.883 | -0.378 | 2.379 | 1.517 |
| Subjectivity | 1.926 | 0.975 | 2.572 | 1.592 |
| Four-class sentiment | 2.116 | 1.138 | 2.597 | 1.527 |
| Six-class sentiment | 1.975 | 0.663 | 2.564 | 1.525 |
| Sarcasm | 2.219 | 1.227 | 2.614 | 1.725 |
Fig 1. Distribution of texts in the main SentiComments.SR corpus across sentiment labels.
Best evaluation results of linear models.
| Setting | Polarity | Subjectivity | Four-class | Six-class |
|---|---|---|---|---|
| Bag-of-words features | 0.782 | 0.871 | 0.64 | 0.566 |
| Bag-of-embeddings features | 0.783 | 0.873 | 0.628 | 0.557 |
| Bag-of-words + bag-of-embeddings features | 0.783 | |||
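
This record does not describe how the feature sets in the table are built. As a rough illustration only, the sketch below assumes one common reading: bag-of-words as token counts over a fixed vocabulary, bag-of-embeddings as the mean of the word vectors in a text, and the combined setting as their concatenation. The vocabulary, the two-dimensional embeddings, and all function names are hypothetical.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Count vector over a fixed vocabulary (out-of-vocabulary tokens are ignored)."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocabulary]


def bag_of_embeddings(tokens, embeddings, dim):
    """Mean of the word vectors of all tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim  # no known token: fall back to a zero vector
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def combined_features(tokens, vocabulary, embeddings, dim):
    """Concatenate the count vector and the averaged embedding."""
    return bag_of_words(tokens, vocabulary) + bag_of_embeddings(tokens, embeddings, dim)


# Hypothetical toy inputs, not taken from the paper.
vocabulary = ["odličan", "film", "loš"]
embeddings = {"odličan": [1.0, 0.0], "film": [0.0, 1.0]}
tokens = ["odličan", "film"]

print(combined_features(tokens, vocabulary, embeddings, dim=2))
```

A vector built this way can be passed to any linear classifier; the choice of classifier and its hyperparameters is outside the scope of this sketch.
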
Evaluation results of transformer-based models.
| Model | Setting | Polarity | Subjectivity | Four-class | Six-class |
|---|---|---|---|---|---|
| BERT Base Multilingual Cased | Original texts | 0.725 | 0.862 | 0.538 | 0.493 |
| Corrected texts | 0.735 | 0.497 | |||
| Corrected texts + CR&EN | 0.867 | 0.573 | |||
| Stemmed texts + CR&EN | 0.715 | 0.864 | 0.574 | 0.478 | |
| DistilBERT Base Multilingual Cased | Original texts | 0.720 | 0.864 | 0.455 | |
| Corrected texts | 0.545 | ||||
| Corrected texts + CR&EN | 0.715 | 0.867 | 0.542 | 0.459 | |
| Stemmed texts + CR&EN | 0.713 | 0.857 | 0.538 | 0.451 | |
| XLM MLM | Original texts | 0.739 | 0.873 | 0.634 | 0.553 |
| Corrected texts | 0.873 | ||||
| Corrected texts + CR&EN | 0.779 | 0.646 | 0.547 | ||
| Stemmed texts + CR&EN | 0.760 | 0.870 | 0.618 | 0.532 | |
| Corrected texts | |||||
| BERT Base Multilingual Cased | 0.785 | 0.879 | 0.635 | 0.604 | |
| DistilBERT Base Multilingual Cased | 0.772 | 0.883 | 0.634 | 0.576 | |
| XLM MLM | |||||
Fig 2. Comparison of the best results of different model families across various sentiment analysis tasks.