Stefan Munnes, Corinna Harsch, Marcel Knobloch, Johannes S Vogel, Lena Hipp, Erik Schilling.
Abstract
Can we rely on computational methods to accurately analyze complex texts? To answer this question, we compared different dictionary and scaling methods used in predicting the sentiment of German literature reviews to the "gold standard" of human-coded sentiments. Literature reviews constitute a challenging text corpus for computational analysis, as they not only contain different text levels (for example, a summary of the work and the reviewer's appraisal) but are also characterized by subtle and ambiguous language. To take the nuanced sentiments of literature reviews into account, we worked with a metric rather than a dichotomous scale for sentiment analysis. The results of our analyses show that the sentiments predicted by prefabricated dictionaries, which are computationally efficient and require minimal adaptation, have a low to medium correlation with the human-coded sentiments (r between 0.32 and 0.39). The accuracy of self-created dictionaries based on word embeddings (both pre-trained and self-trained) was considerably lower (r between 0.10 and 0.28). Given the high coding intensity, the contingency on seed selection, and the sensitivity to the degree of pre-processing that we found with our data, we would not recommend word embeddings for complex texts without further adaptation. While fully automated approaches thus appear unable to accurately predict the sentiments of complex texts such as ours, we found relatively high correlations with a semi-automated approach (r of around 0.6), which, however, requires intensive human coding efforts for the training dataset. In addition to illustrating the benefits and limits of computational approaches in analyzing complex text corpora and the potential of metric rather than binary scales of text sentiment, we also provide a practical guide for researchers to select an appropriate method and degree of pre-processing when working with complex texts.
Keywords: German literature; automated text analysis; computer-assisted text analysis; dictionary; scaling method; sentiment analysis; word embeddings
Year: 2022 PMID: 35600329 PMCID: PMC9114298 DOI: 10.3389/fdata.2022.886362
Source DB: PubMed Journal: Front Big Data ISSN: 2624-909X
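The study scores each review on a metric (continuous) sentiment scale and validates the predictions against human-coded scores via correlation. The following is a minimal, self-contained Python sketch of that validation logic; the polarity dictionary, mini-corpus, and gold-standard scores are invented for illustration and do not come from the paper:

```python
from statistics import correlation  # Pearson's r, Python 3.10+

# Toy polarity dictionary (term -> score in [-1, 1]); illustrative only,
# not taken from SentiWS, Rauh, or GerVADER.
polarity = {"freut": 0.7, "leseglueck": 0.8, "bedauert": -0.6, "daempft": -0.4}

def sentiment(tokens):
    """Metric sentiment: mean polarity of matched tokens (0.0 if no match)."""
    hits = [polarity[t] for t in tokens if t in polarity]
    return sum(hits) / len(hits) if hits else 0.0

# Hypothetical mini-corpus and human-coded gold standard on the same scale.
reviews = [["freut", "leseglueck", "roman"],
           ["daempft", "bedauert", "roman"],
           ["freut", "bedauert"]]
human_scores = [0.8, -0.5, 0.1]

predicted = [sentiment(r) for r in reviews]
print(round(correlation(predicted, human_scores), 2))
```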
Overview of various sentiment classification methods.
| Approach | Method |  |  |
|---|---|---|---|
| Gold standard | Human-coded | ++ | ++ |
| Dictionary | Prefabricated | − | −− |
| Dictionary | Corpus-specific (e.g., word embeddings) | + | + |
| Machine learning | Supervised (e.g., wordscores) | + | ++ |
| Machine learning | Unsupervised (e.g., wordfish) | − | −− |
Figure 1: Scatter plot and ICCs of ratings between pairs of coders.
Illustration of minimal vs. maximal pre-processing on an exemplary review.
| Original review | Minimal pre-processing | Maximal pre-processing |
|---|---|---|
| “Rezensentin Christiane Pöhlmann freut sich zu früh über Literatur aus Lettland. Inga Abeles Roman dämpft ihr Leseglück doch recht schnell mit der Geschichte einer jungen Lettin zwischen dem drängenden Wunsch nach Selbstverwirklichung als Drehbuchautorin und Depression, die Pöhlmann zufolge einfach zu viel zwischen die Buchdeckel klemmen will, Perspektivwechsel, Monologe, Briefe, alternative Milieus, abstrakte Passagen über Lektüre, Exil und Russland. Die persönliche Tragödie der Protagonistin kommt darüber zu kurz, bedauert Pöhlmann.” | “Rezensentin” “Christiane” “Pöhlmann” “freut” “sich” “zu” “früh” “über” “Literatur” “aus” “Lettland” “Inga” “Abeles” “Roman” “dämpft” “ihr” “Leseglück” “doch” “recht” “schnell” “mit” “der” “Geschichte” “einer” “jungen” “Lettin” “zwischen” “dem” “drängenden” “Wunsch” “nach” “Selbstverwirklichung” “als” “Drehbuchautorin” “und” “Depression” “die” “Pöhlmann” “zufolge” “einfach” “zu” “viel” “zwischen” “die” “Buchdeckel” “klemmen” “will” “Perspektivwechsel” “Monologe” “Briefe” “alternative” “Milieus” “abstrakte” “Passagen” “über” “Lektüre” “Exil” “und” “Russland” “Die” “persönliche” “Tragödie” “der” “Protagonistin” “kommt” “darüber” “zu” “kurz” “bedauert” “Pöhlmann” | “” “” “” “freut” “” “” “frueh” “ueb” “literatur” “” “” “” “” “” “daempft” “” “leseglueck” “” “recht” “schnell” “” “” “” “” “jung” “lettin” “” “” “draengend” “wunsch” “” “selbstverwirklich” “” “drehbuchautorin” “” “depression” “” “” “zufolg” “einfach” “” “viel” “” “” “buchdeckel” “klemm” “” “perspektivwechsel” “monolog” “brief” “alternativ” “milieus” “abstrakt” “passag” “ueb” “lektu” “exil” “” “russland” “” “perso” “tragoedi” “” “protagonistin” “kommt” “darueb” “” “kurz” “bedauert” “” |
English gloss of the review: “Reviewer Christiane Pöhlmann rejoices too soon over literature from Latvia. Inga Abele's novel quickly dampens her reading pleasure with the story of a young Latvian woman caught between a pressing desire for self-realization as a screenwriter and depression; according to Pöhlmann, the novel simply tries to cram too much between its covers: shifts of perspective, monologues, letters, alternative milieus, abstract passages on reading, exile, and Russia. The protagonist's personal tragedy gets short shrift as a result, Pöhlmann regrets.”
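The sketch below reproduces these two pre-processing depths in Python, under the assumption that NLTK's German stopword list and Snowball stemmer approximate the paper's pipeline; the exact tools the authors used may differ:

```python
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer

nltk.download("stopwords", quiet=True)         # fetch German stopword list once
GERMAN_STOPS = set(stopwords.words("german"))
STEMMER = SnowballStemmer("german")
UMLAUTS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})

def minimal(text):
    """Minimal pre-processing: split into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text)

def maximal(text):
    """Maximal: lowercase, remove stopwords, transliterate umlauts, then stem."""
    tokens = [t.lower() for t in minimal(text)]
    tokens = [t for t in tokens if t not in GERMAN_STOPS]
    return [STEMMER.stem(t.translate(UMLAUTS)) for t in tokens]

sentence = "Rezensentin Christiane Pöhlmann freut sich zu früh über Literatur."
print(minimal(sentence))
print(maximal(sentence))
```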
Characteristics and results for prefabricated dictionaries.
| Dictionary | Pre-processing | Positive terms | Negative terms | Reviews (N) | r | Matched tokens, avg. (share) | Outliers |
|---|---|---|---|---|---|---|---|
| SentiWS | Minimal | 15,591 | 15,559 | 6,033 | 0.32 | 8.55 (0.07) | 552 |
| SentiWS | Maximal | 2,343 | 2,575 | 6,031 | 0.29 | 8.88 (0.15) | 540 |
| SentiWS + negation | Minimal | 31,150 | 31,150 | 6,012 | 0.38 | 6.34 (0.05) | 427 |
| SentiWS + negation | Maximal | 4,918 | 4,918 | 6,033 | 0.36 | 9.23 (0.16) | 421 |
| Rauh | Minimal | 17,330 | 19,750 | 6,038 | 0.39 | 9.38 (0.08) | 429 |
| Rauh | Maximal | 4,028 | 6,161 | 6,041 | 0.37 | 16.00 (0.27) | 439 |
| Rauh + negation | Minimal | 37,080 | 37,080 | 6,035 | 0.39 | 8.23 (0.07) | 422 |
| Rauh + negation | Maximal | 10,020 | 9,784 | 6,041 | 0.36 | 15.10 (0.26) | 483 |
| GerVADER | Minimal | 18,020 | 16,477 | 6,029 | 0.34 | – | 633 |
| GerVADER | Maximal | 4,072 | 3,331 | 6,033 | 0.32 | – | 660 |
Average number (and share of average number of tokens) of tokens matched by the dictionary.
Number of reviews that deviate more than 2 standard deviations from the human-coded results.
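The "+ negation" rows double each dictionary by adding negated counterparts of every term. A minimal sketch of the underlying idea, here implemented as inverting a matched term's polarity when a negation word directly precedes it; the terms, scores, and negation list below are invented for illustration:

```python
# Toy negation-aware dictionary scoring; not the paper's exact rule set.
NEGATIONS = {"nicht", "kein", "keine", "nie"}
polarity = {"gelungen": 1.0, "spannend": 0.8, "langweilig": -0.8}

def score(tokens):
    """Mean polarity of matched tokens, flipped under a preceding negation."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok in polarity:
            value = polarity[tok]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                value = -value  # invert polarity under negation
            hits.append(value)
    return sum(hits) / len(hits) if hits else 0.0

print(score(["der", "roman", "ist", "nicht", "gelungen"]))  # -1.0
print(score(["spannend", "aber", "nie", "langweilig"]))     # 0.8
```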
Characteristics and results for self- and pre-trained GloVe dictionaries.
| Embeddings | Seeds | Min. cosine similarity | Features | Pre-processing | Words | Reviews (N) | r | Matched tokens, avg. (share) | Outliers |
|---|---|---|---|---|---|---|---|---|---|
| Self-trained | hc | 0.25 | – | Minimal | 425 | 5,748 | 0.21 | 3.77 (0.03) | 683 |
| Self-trained | hc | 0.25 | – | Maximal | 257 | 5,779 | 0.28 | 3.76 (0.06) | 575 |
| Self-trained | hc | 0.25 | Bigram | Minimal | 1,017 | 5,585 | 0.17 | 2.97 (0.02) | 704 |
| Self-trained | hc | 0.25 | Bigram | Maximal | 269 | 5,823 | 0.24 | 4.04 (0.07) | 695 |
| Self-trained | RZ | 0.25 | – | Minimal | 317 | 6,038 | 0.17 | 11.11 (0.09) | 747 |
| Self-trained | RZ | 0.25 | – | Maximal | 252 | 6,004 | 0.26 | 6.16 (0.10) | 589 |
| Self-trained | RZ | 0.25 | Bigram | Minimal | 452 | 6,041 | −0.01 | 15.19 (0.13) | 964 |
| Self-trained | RZ | 0.25 | Bigram | Maximal | 179 | 5,919 | 0.15 | 4.78 (0.08) | 720 |
| Pre-trained | hc | 0.3 | – | Case-insensitive | 8,096 | 6,041 | 0.15 | 36.18 (0.30) | 772 |
| Pre-trained | hc | 0.4 | – | Case-insensitive | 1,916 | 6,041 | 0.23 | 16.39 (0.14) | 681 |
| Pre-trained | hc | 0.5 | – | Case-insensitive | 322 | 5,963 | 0.26 | 5.01 (0.04) | 573 |
| Pre-trained | RZ | 0.3 | – | Case-insensitive | 2,223 | 6,041 | 0.10 | 31.54 (0.26) | 886 |
| Pre-trained | RZ | 0.4 | – | Case-insensitive | 811 | 6,041 | 0.10 | 20.14 (0.17) | 828 |
| Pre-trained | RZ | 0.5 | – | Case-insensitive | 159 | 5,958 | 0.10 | 5.89 (0.05) | 803 |
Minimum cosine similarity of word vectors to each seed.
Number of positive and negative words each.
Average number (and share of average number of tokens) of tokens matched by the dictionary.
Number of reviews that deviate more than 2 standard deviations from the human-coded results.
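These self-created dictionaries are built by expanding a handful of seed words through word-embedding similarity: every vocabulary word whose cosine similarity to each seed reaches the threshold (0.25 to 0.5 in the table above) joins the dictionary. A toy sketch of that expansion step with invented 3-dimensional vectors; the paper used self- and pre-trained GloVe embeddings with far higher dimensionality:

```python
import numpy as np

embeddings = {  # word -> vector (toy values, not real GloVe vectors)
    "gelungen":    np.array([0.9, 0.1, 0.0]),
    "ueberzeugt":  np.array([0.8, 0.2, 0.1]),
    "misslungen":  np.array([-0.9, 0.0, 0.1]),
    "enttaeuscht": np.array([-0.7, 0.1, 0.2]),
}

def cosine(a, b):
    """Cosine similarity of two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def expand(seeds, threshold=0.25):
    """Vocabulary words whose similarity to every seed meets the threshold."""
    return [w for w, v in embeddings.items()
            if all(cosine(v, embeddings[s]) >= threshold for s in seeds)]

positive = expand(["gelungen"], threshold=0.5)    # -> gelungen, ueberzeugt
negative = expand(["misslungen"], threshold=0.5)  # -> misslungen, enttaeuscht
print(positive, negative)
```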
Figure 2: Sentiment of words estimated by supervised wordscores.
Characteristics and results for supervised and unsupervised methods.
| Method | Pre-processing |  |  | Reviews (N) | r | Tokens, avg. | Outliers |
|---|---|---|---|---|---|---|---|
| Wordscores | Minimal | 1,193 | 11,124 | 3,026 | 0.58 | 119.34 | 84 |
| Wordscores | Maximal | 797 | 7,760 | 3,026 | 0.61 | 58.91 | 76 |
| Wordfish | Minimal | – | – | 6,041 | −0.05 | 119.50 | 1,095 |
| Wordfish | Maximal | – | – | 6,041 | −0.01 | 58.93 | 943 |
In contrast to dictionaries, almost all tokens (reported average) are used for scaling.
Number of reviews that deviate more than 2 standard deviations from the human-coded results.
Figure 3: Sentiment of words estimated by unsupervised wordfish.
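The supervised approach in the table above builds on wordscores (Laver, Benoit, and Garry, 2003): reference texts with known scores induce a score for every word, and unseen ("virgin") texts are scaled by the frequency-weighted mean of their word scores. The sketch below implements that standard formulation with toy data; the paper trained on roughly 3,000 human-coded reviews, and its exact implementation may differ:

```python
import numpy as np

vocab = ["gelungen", "spannend", "langweilig", "roman"]
# Rows = reference texts, columns = word counts (toy data).
F_ref = np.array([[3, 2, 0, 5],    # clearly positive reference text
                  [0, 0, 4, 6]],   # clearly negative reference text
                 dtype=float)
a_ref = np.array([1.0, -1.0])      # known (human-coded) reference scores

# P(word | ref): relative frequency of each word within each reference text.
P_wr = F_ref / F_ref.sum(axis=1, keepdims=True)
# P(ref | word); a word's score is its expected reference score.
P_rw = P_wr / P_wr.sum(axis=0, keepdims=True)
word_scores = a_ref @ P_rw

# Scale an unseen review by the frequency-weighted mean of its word scores.
virgin = np.array([1, 0, 2, 4], dtype=float)
print(dict(zip(vocab, word_scores.round(2))))
print(round(float(virgin @ word_scores / virgin.sum()), 2))  # mildly negative
```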