Simon Provoost, Jeroen Ruwaard, Ward van Breda, Heleen Riper, Tibor Bosse.
Abstract
INTRODUCTION: Sentiment analysis may be a useful technique to derive a user's emotional state from free text input, allowing for more empathic automated feedback in online cognitive behavioral therapy (iCBT) interventions for psychological disorders such as depression. As guided iCBT is considered more effective than unguided iCBT, such automated feedback may help close the gap between the two. The accuracy of automated sentiment analysis is domain dependent, and it is unclear how well the technology applies to iCBT. This paper presents an empirical study in which automated sentiment analysis by an algorithm for the Dutch language is validated against human judgment.
Keywords: automated support; benchmarking and validation; cognitive behavioral therapy (CBT); depression; e-mental health; embodied conversational agent (ECA); internet interventions; sentiment analysis and opinion mining
Year: 2019 PMID: 31156504 PMCID: PMC6530336 DOI: 10.3389/fpsyg.2019.01065
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
FIGURE 1. Sentence exemplifying how the algorithm comes to an overall sentiment score for a sentence containing one sentiment word (“disturbing”) from the lexicon combined with a strengthening word (“deeply”) identified by the grammatical engine.
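The scoring step described in the Figure 1 caption can be sketched as follows. This is a minimal illustration only: the lexicon entries, the intensifier multipliers, and the clamping to the range -1 to 1 are hypothetical assumptions and are not taken from the study's algorithm.

```python
# Hypothetical toy lexicon and intensifier weights (assumptions for
# illustration; not the values used by the algorithm in the study).
LEXICON = {"disturbing": -0.5, "pleasant": 0.5}
INTENSIFIERS = {"deeply": 1.5, "slightly": 0.5}


def sentence_sentiment(sentence: str) -> float:
    """Sum lexicon scores, scaling each sentiment word by an
    intensifier that directly precedes it."""
    tokens = sentence.lower().replace(".", "").replace(",", "").split()
    total = 0.0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        score = LEXICON[token]
        # A strengthening word immediately before the sentiment word
        # multiplies its score.
        if i > 0 and tokens[i - 1] in INTENSIFIERS:
            score *= INTENSIFIERS[tokens[i - 1]]
        total += score
    # Clamp to [-1, 1] (an assumed output range for this sketch).
    return max(-1.0, min(1.0, total))


print(sentence_sentiment("I find this deeply disturbing."))
```

With these assumed weights, "deeply" strengthens the negative score of "disturbing" from -0.5 to -0.75.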
FIGURE 2. Sliders used by human judges to evaluate texts on sentiment and emotions. Translated from Dutch: (top) “How positive or negative is this text?”, with answer labels very negative; negative; neutral; positive; very positive, and (bottom) “To what extent does the text contain the following?”, with answer labels ranging from none to a whole lot, for five emotions (top to bottom): pensiveness; optimism; annoyance; acceptance; serenity.
FIGURE 3. The summed algorithm scores over all 493 texts, with respect to all 33 detectable emotions.
Summary statistics of the averaged human judgments (M = 8.1 evaluations per text) and the algorithm's judgments for all N = 493 texts.
| | Mean | Min. | Max. | SD |
|---|---|---|---|---|
| Human judges | 0.00 | −0.71 | 0.67 | 0.30 |
| Algorithm | 0.04 | −0.97 | 0.99 | 0.44 |
| Human judges | 0.47 | 0.01 | 0.78 | 0.15 |
| Algorithm | 0.47 | 0.00 | 1.00 | 0.44 |
| Human judges | 0.40 | 0.04 | 0.81 | 0.17 |
| Algorithm | 0.39 | 0.00 | 1.00 | 0.44 |
| Human judges | 0.37 | 0.01 | 0.86 | 0.20 |
| Algorithm | 0.40 | 0.00 | 1.00 | 0.42 |
| Human judges | 0.25 | 0.02 | 0.86 | 0.17 |
| Algorithm | 0.40 | 0.00 | 1.00 | 0.42 |
| Human judges | 0.38 | 0.02 | 0.79 | 0.16 |
| Algorithm | 0.37 | 0.00 | 1.00 | 0.41 |
FIGURE 4. Probability distributions of human and algorithm evaluations.
FIGURE 5. Scatterplots of the algorithm versus the mean human scores for all of the 493 patient texts, including a line representing the linear fit model.
Results of intra-class correlation on human-human agreement.
| Emotion | Intraclass correlation | 95% CI lower bound | 95% CI upper bound | F value | df1 | df2 | Significance |
|---|---|---|---|---|---|---|---|
| Sentiment | 0.58 | 0.54 | 0.61 | 71 | 492 | 25143 | 0.00 |
| Pensiveness | 0.22 | 0.20 | 0.25 | 16 | 492 | 25143 | 0.00 |
| Annoyance | 0.28 | 0.16 | 0.31 | 21 | 492 | 25143 | 0.00 |
| Optimism | 0.46 | 0.43 | 0.50 | 46 | 492 | 25143 | 0.00 |
| Acceptance | 0.34 | 0.32 | 0.38 | 28 | 492 | 25143 | 0.00 |
| Serenity | 0.24 | 0.21 | 0.26 | 17 | 492 | 25143 | 0.00 |
Krippendorff’s alpha values for human-human agreement.
| Emotion | α |
|---|---|
| Sentiment | 0.51 |
| Pensiveness | 0.17 |
| Annoyance | 0.20 |
| Optimism | 0.28 |
| Acceptance | 0.23 |
| Serenity | 0.17 |
ICC values for human-algorithm agreement.
| Emotion | Intraclass correlation | 95% CI lower bound | 95% CI upper bound | F value | df1 | df2 | Significance |
|---|---|---|---|---|---|---|---|
| Sentiment | 0.55 | 0.48 | 0.61 | 3.4 | 492 | 492 | 0.00 |
| Pensiveness | 0.12 | 0.03 | 0.21 | 1.3 | 492 | 492 | 0.00 |
| Annoyance | 0.00 | −0.09 | 0.09 | 1 | 492 | 492 | 0.5 |
| Optimism | 0.23 | 0.14 | 0.31 | 1.6 | 492 | 492 | 0.00 |
| Acceptance | 0.00 | −0.09 | 0.09 | 1 | 492 | 492 | 0.5 |
| Serenity | 0.14 | 0.06 | 0.23 | 1.3 | 492 | 492 | 0.00 |
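The test values and the ICC column in the human-algorithm table can be cross-checked against each other. The sketch below assumes a single-measure ICC of the one-way or two-way consistency form, for which ICC = (F − 1) / (F + k − 1), and assumes k = 2 raters (the averaged human score and the algorithm). Neither assumption is stated in the table itself, and the helper name `icc_from_f` is introduced here for illustration.

```python
def icc_from_f(f_value: float, k: int) -> float:
    """Single-measure ICC implied by an F ratio (between-subject mean
    square over error mean square) with k raters.

    Holds for the one-way and the two-way consistency forms; it is an
    assumption that the table uses one of them.
    """
    return (f_value - 1.0) / (f_value + k - 1.0)


# (F value, reported ICC) pairs copied from the human-algorithm table.
reported = {
    "Sentiment": (3.4, 0.55),
    "Pensiveness": (1.3, 0.12),
    "Annoyance": (1.0, 0.00),
    "Optimism": (1.6, 0.23),
    "Acceptance": (1.0, 0.00),
    "Serenity": (1.3, 0.14),
}

for emotion, (f_value, icc) in reported.items():
    implied = icc_from_f(f_value, k=2)  # k = 2 is an assumption
    print(f"{emotion}: implied ICC = {implied:.2f}, reported = {icc:.2f}")
```

Under these assumptions the implied values stay within about 0.01 of the reported ones, a difference that would be expected from the test values being rounded to one decimal.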
Cohen’s kappa values for human-algorithm agreement.
| Emotion | κ | 95% CI |
|---|---|---|
| Sentiment | 0.58 | 0.52 to 0.63 |
| Pensiveness | 0.01 | −0.01 to 0.03 |
| Annoyance | −0.01 | −0.04 to 0.02 |
| Optimism | 0.09 | 0.03 to 0.14 |
| Acceptance | −0.07 | −0.14 to 0.00 |
| Serenity | 0.03 | 0.00 to 0.06 |
Comparison of the average human judges’ and algorithm’s evaluations with respect to sentiment.
| Human average | Algorithm: negative | Algorithm: neutral | Algorithm: positive |
|---|---|---|---|
| Negative | 144 | 15 | 38 |
| Neutral | 30 | 19 | 41 |
| Positive | 25 | 19 | 162 |
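Cohen's kappa can be recomputed directly from the sentiment cross-tabulation. The following is a minimal sketch using the standard kappa formula with optional linear or quadratic disagreement weights; the function name `cohens_kappa` and the choice of weighting schemes are assumptions made here for illustration, since the source does not state whether the reported kappa is weighted.

```python
def cohens_kappa(table, weighting=None):
    """Cohen's kappa for a square contingency table.

    Rows are the first rater's categories, columns the second rater's.
    weighting: None (unweighted), "linear", or "quadratic".
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Agreement weight: 1 on the diagonal, decreasing with distance.
        if weighting == "quadratic":
            return 1.0 - ((i - j) / (k - 1)) ** 2
        if weighting == "linear":
            return 1.0 - abs(i - j) / (k - 1)
        return 1.0 if i == j else 0.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(
        weight(i, j) * row_totals[i] * col_totals[j] for i, j in cells
    ) / n**2
    return (observed - expected) / (1.0 - expected)


# Counts from the sentiment table (rows: human average, columns: algorithm).
sentiment = [
    [144, 15, 38],
    [30, 19, 41],
    [25, 19, 162],
]

print(round(cohens_kappa(sentiment), 2))
print(round(cohens_kappa(sentiment, "quadratic"), 2))
```

On these counts the unweighted value is about 0.45, while quadratic weights give about 0.58. The latter lies close to the κ reported for sentiment, but whether the study used quadratic weighting remains an assumption.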
Comparison of the average human judges’ and algorithm’s evaluations with respect to optimism.
| Human average | Algorithm: not present | Algorithm: present |
|---|---|---|
| Not present | 32 | 12 |
| Present | 211 | 238 |
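The optimism cross-tabulation illustrates how skewed marginals depress kappa: the human average marks optimism as present in 449 of 493 texts, so chance agreement is already high. A minimal sketch of that arithmetic follows; the helper name `agreement_summary` and its argument order are assumptions introduced here.

```python
def agreement_summary(both_absent, human_only_absent,
                      human_only_present, both_present):
    """Observed agreement, chance agreement, and Cohen's kappa for a
    2x2 table (rows: human average, columns: algorithm).

    Arguments follow the table row by row:
    both_absent        - human: not present, algorithm: not present
    human_only_absent  - human: not present, algorithm: present
    human_only_present - human: present,     algorithm: not present
    both_present       - human: present,     algorithm: present
    """
    n = both_absent + human_only_absent + human_only_present + both_present
    observed = (both_absent + both_present) / n

    human_absent = both_absent + human_only_absent
    human_present = human_only_present + both_present
    algo_absent = both_absent + human_only_present
    algo_present = human_only_absent + both_present

    # Agreement expected by chance from the marginal totals alone.
    expected = (human_absent * algo_absent
                + human_present * algo_present) / n**2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, expected, kappa


observed, expected, kappa = agreement_summary(32, 12, 211, 238)
print(f"observed = {observed:.3f}, chance = {expected:.3f}, kappa = {kappa:.3f}")
```

Raw agreement is roughly 55%, but chance agreement is roughly 51%, which leaves a kappa of about 0.08, in line with the low value reported for optimism.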