| Literature DB >> 34910757 |
Matthew Byrne1, Lucy O'Malley1, Anne-Marie Glenny1, Iain Pretty1, Martin Tickle1.
Abstract
BACKGROUND: Online reviews may act as a rich source of data for assessing the quality of dental practices. Assessing the content and sentiment of reviews on a large scale is time-consuming and expensive. Automating the assignment of sentiment to large samples of reviews may allow reviews to be used as Patient Reported Experience Measures for primary care dentistry. AIM: To assess the reliability of three online sentiment analysis tools (Amazon Comprehend DetectSentiment API (ACDAPI), Google and Monkeylearn) at assessing the sentiment of reviews of dental practices working on National Health Service contracts in the United Kingdom.
Year: 2021 PMID: 34910757 PMCID: PMC8673612 DOI: 10.1371/journal.pone.0259797
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Definitions of different sentiment scores.
| Sentiment assigned | Description |
|---|---|
| Positive | Sentiment of the text is positive, reflecting good experiences or actions |
| Negative | Sentiment of the text is negative, reflecting bad experiences or actions |
| Neutral | The review text contains language with little or no sentiment |
| Mixed | There is a mixture of positive and negative elements in the text |
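The four categories above correspond to the labels the sentiment tools return. As an illustrative sketch only (not the authors' code), a review could be scored with Amazon Comprehend via boto3; `to_category` is a hypothetical helper mapping the API's uppercase labels onto the table's category names:

```python
# Sketch: assign one of the four categories above to a review text.
# to_category() is our own illustrative helper, not from the paper.

def to_category(api_label: str) -> str:
    """Map an ACDAPI sentiment label (e.g. 'POSITIVE') to the study's categories."""
    mapping = {"POSITIVE": "Positive", "NEGATIVE": "Negative",
               "NEUTRAL": "Neutral", "MIXED": "Mixed"}
    return mapping[api_label.upper()]

def score_review(text: str) -> str:
    """Call Amazon Comprehend DetectSentiment (requires AWS credentials)."""
    import boto3
    client = boto3.client("comprehend")
    response = client.detect_sentiment(Text=text, LanguageCode="en")
    return to_category(response["Sentiment"])

if __name__ == "__main__":
    print(to_category("MIXED"))  # -> Mixed
```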
Distribution of assigned sentiment scores in the whole dataset and the sample.
| Sentiment | Whole dataset, proportion (n = 15800) | Sample, proportion (n = 270) |
|---|---|---|
| Positive | 0.82 (n = 12956) | 0.83 (n = 224) |
| Negative | 0.13 (n = 2054) | 0.12 (n = 32) |
| Neutral | 0.04 (n = 632) | 0.04 (n = 11) |
| Mixed | 0.01 (n = 158) | 0.01 (n = 3) |
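The reported proportions follow directly from the raw counts in the table; a quick re-derivation (counts copied from the table):

```python
# Re-derive the reported proportions from the table's raw counts.
whole = {"Positive": 12956, "Negative": 2054, "Neutral": 632, "Mixed": 158}
sample = {"Positive": 224, "Negative": 32, "Neutral": 11, "Mixed": 3}

def proportions(counts: dict) -> dict:
    """Round each category's share of the total to two decimal places."""
    total = sum(counts.values())
    return {k: round(v / total, 2) for k, v in counts.items()}

print(proportions(whole))
print(proportions(sample))
```

Both dictionaries reproduce the published proportions (e.g. 12956 / 15800 = 0.82).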
Fleiss Kappa comparing three human reviewers.
| Rating Category | Conditional Probability | Kappa | Asymptotic Standard Error | Z | P Value | Lower 95% Asymptotic CI Bound | Upper 95% Asymptotic CI Bound |
|---|---|---|---|---|---|---|---|
| Overall | N/A | 0.836 | 0.027 | 30.828 | 0 | 0.783 | 0.889 |
| Positive | 0.98 | 0.897 | 0.035 | 25.537 | 0 | 0.828 | 0.966 |
| Negative | 0.911 | 0.898 | 0.035 | 25.563 | 0 | 0.829 | 0.967 |
| Neutral | 0.5 | 0.496 | 0.035 | 14.124 | 0 | 0.427 | 0.565 |
| Mixed | 0.62 | 0.595 | 0.035 | 16.934 | 0 | 0.526 | 0.664 |
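Fleiss' kappa above measures chance-corrected agreement among the three human raters. A self-contained sketch of the standard formula (our own illustration; the paper does not publish its analysis code):

```python
# Fleiss' kappa for N items each rated by n raters into k categories.
# table[i][j] = number of raters who assigned item i to category j.

def fleiss_kappa(table):
    N = len(table)                   # number of items
    n = sum(table[0])                # raters per item (assumed constant)
    # Per-item observed agreement P_i
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in table]
    P_bar = sum(P_i) / N
    # Expected agreement from marginal category proportions
    k = len(table[0])
    p_j = [sum(row[j] for row in table) / (N * n) for j in range(k)]
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)

# Perfect agreement among 3 raters over 3 items yields kappa = 1.0
print(fleiss_kappa([[3, 0], [0, 3], [3, 0]]))  # -> 1.0
```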
Cohen’s Kappa comparing Amazon ACDAPI, Monkeylearn and Google sentiment scores to pooled human reviews.
| Comparison | Rating Category | Conditional Probability | Kappa | Asymptotic Standard Error | Z | P Value | Lower 95% Asymptotic CI Bound | Upper 95% Asymptotic CI Bound |
|---|---|---|---|---|---|---|---|---|
| Amazon vs Pooled Human reviews | Overall | N/A | 0.66 | 0.05 | 12.262 | 0 | 0.562 | 0.757 |
| | Positive | 0.964 | 0.79 | 0.061 | 12.987 | 0 | 0.671 | 0.91 |
| | Negative | 0.714 | 0.672 | 0.061 | 11.038 | 0 | 0.552 | 0.791 |
| | Neutral | 0.167 | 0.148 | 0.061 | 2.427 | 0.015 | 0.028 | 0.267 |
| | Mixed | 0.2 | 0.185 | 0.061 | 3.038 | 0.002 | 0.066 | 0.304 |
| Monkeylearn vs Pooled Human reviews | Overall | N/A | 0.728 | 0.052 | 14.101 | 0 | 0.606 | 0.83 |
| | Positive | 0.966 | 0.808 | 0.061 | 13.283 | 0 | 0.689 | 0.928 |
| | Negative | 0.842 | 0.816 | 0.061 | 13.412 | 0 | 0.697 | 0.936 |
| | Neutral | 1 | 1 | 0.061 | 16.432 | 0 | 0.881 | 1.19 |
| | Mixed | 0 | -0.033 | 0.061 | -0.534 | 0.593 | -0.152 | 0.087 |
| Google vs Pooled Human Reviews | Overall | N/A | 0.706 | 0.049 | 14.52 | 0 | 0.61 | 0.801 |
| | Positive | 0.965 | 0.828 | 0.061 | 13.599 | 0 | 0.708 | 0.947 |
| | Negative | 0.842 | 0.816 | 0.061 | 13.412 | 0 | 0.697 | 0.936 |
| | Neutral | 0 | -0.002 | 0.061 | -0.03 | 0.976 | -0.121 | 0.117 |
| | Mixed | 0.188 | 0.136 | 0.061 | 2.24 | 0.025 | 0.017 | 0.256 |
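Cohen's kappa, used above to compare each tool against the pooled human label, measures chance-corrected agreement between two raters over the same items. A minimal sketch of the unweighted formula (illustrative only; the example ratings are invented):

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Unweighted Cohen's kappa between two raters' label sequences."""
    assert len(r1) == len(r2)
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n    # observed agreement
    c1, c2 = Counter(r1), Counter(r2)
    # Chance agreement from each rater's marginal label frequencies
    p_e = sum((c1[c] / n) * (c2[c] / n) for c in set(r1) | set(r2))
    return (p_o - p_e) / (1 - p_e)

human = ["Positive", "Positive", "Negative", "Negative"]
tool  = ["Positive", "Negative", "Negative", "Negative"]
print(cohens_kappa(human, tool))  # -> 0.5
```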
Polychoric correlation matrix demonstrating agreement between each group.
| | Pooled Human raters | Amazon ACDAPI | Monkeylearn | Google |
|---|---|---|---|---|
| Pooled Human raters | 1.000 | 0.782 | 0.831 | 0.783 |
| Amazon ACDAPI | 0.782 | 1.000 | 0.775 | 0.704 |
| Monkeylearn | 0.831 | 0.775 | 1.000 | 0.775 |
| Google | 0.783 | 0.704 | 0.775 | 1.000 |
Examples of the three main error types observed between ACDAPI and the human reviewers.
Sample text synthesised to prevent identification of patients or staff.
| ACDAPI Sentiment | Human Assigned Sentiment | Review text | Error Type |
|---|---|---|---|
| Negative | Positive | My daughter was frightened about the prospect of 2 planned extractions. The dentist and nurse were kind and patient. Together they made a traumatic experience easier for my daughter | Syntax error: negative terms ("frightened", "traumatic") describe the context rather than the care received; the sentence structure resolves the review as positive, which human reviewers parse but ACDAPI misses |
| Neutral | Positive | Always has time to talk about my dental hygiene in depth. The dentist could relate dental problems to other health issues and recommend treatments. If I have any problems I can come back before the recommended 6 months. | Language used is not overtly emotive. Satisfaction is easily inferred by human reviewers. |
| Negative | Mixed | My dentist is excellent, but he is the only reason that I attend this practice. The reception staff are rude. I cannot see why this is not being addressed after I have complained. | Strength of conflicting emotions. This patient feels strongly that their dentist is good but that the other staff are poor. ACDAPI overestimates the negative compared to the human reviewers. |