Jennifer Tosti-Kharas, Caryn Conley.
Abstract
In this paper we evaluate how to effectively use the crowdsourcing service, Amazon's Mechanical Turk (MTurk), to content analyze textual data for use in psychological research. MTurk is a marketplace for discrete tasks completed by workers, typically for small amounts of money. MTurk has been used to aid psychological research in general, and content analysis in particular. In the current study, MTurk workers content analyzed personally-written textual data using coding categories previously developed and validated in psychological research. These codes were evaluated for reliability, accuracy, completion time, and cost. Results indicate that MTurk workers categorized textual data with comparable reliability and accuracy to both previously published studies and expert raters. Further, the coding tasks were performed quickly and cheaply. These data suggest that crowdsourced content analysis can help advance psychological research.
Keywords: Mechanical Turk; coding; content analysis; crowdsourcing; qualitative research methods
Year: 2016 PMID: 27303321 PMCID: PMC4884742 DOI: 10.3389/fpsyg.2016.00741
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
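Summary task information.
| | Redemption and contamination, Round 1 | Redemption and contamination, Round 2 | Emotional tone, Round 1 | Emotional tone, Round 2 | Emotional tone, Round 3 | Closure | Discrete emotions |
| --- | --- | --- | --- | --- | --- | --- | --- |
| HITs requested | 200 | 40 | 200 | 40 | 40 | 40 | 100 |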
| HITs completed | 117 (59%) | 40 (100%) | 55 (28%) | 12 (30%) | 40 (100%) | 40 (100%) | 100 (100%) |
| HITs accepted | 93 (79%) | 32 (80%) | 47 (85%) | 11 (92%) | 37 (93%) | 36 (90%) | 83 (83%) |
| Passages per HIT | 1 | 5 | 1 | 5 | 5 | 5 | 1 |
| HITs analyzed | 102 (87%) | 35 (88%) | 41 (75%) | 9 (75%) | 38 (95%) | 39 (98%) | 90 (90%) |
| Unique Turkers | 24 | 27 | 10 | 7 | 27 | 27 | 22 |
| HITs per Turker | 4.88 | 1.48 | 5.50 | 1.71 | 1.48 | 1.48 | 4.55 |
| Turkers with rejected HITs | 13 | 6 | 3 | 1 | 3 | 3 | 5 |
| Average seconds to complete (median) | 69.96 (48) | 233.5 (190.5) | 50.87 (33) | 164.17 (120.5) | 147.85 (109) | 157.08 (128.5) | 114.65 (85.5) |
| Total time task was available (hours: minutes: seconds) | 96:43:48 | 53:37:03 | 54:47:01 | 23:08:51 | 71:28:25 | 96:52:36 | 23:48:29 |
| Amount paid per HIT | $0.02 | $0.12 | $0.01 | $0.06 | $0.10 | $0.12 | $0.05 |
| Effective hourly rate | $1.03 | $1.85 | $0.71 | $1.32 | $2.43 | $2.75 | $1.57 |
| Total paid to Turkers | $1.86 | $3.84 | $0.47 | $0.66 | $3.70 | $4.32 | $4.15 |
Redemption and Emotional Tone required multiple rounds of coding to reach a 100% HIT completion rate. Between rounds, we adjusted the effective hourly rate and the number of passages coded per HIT to encourage Turkers to complete these tasks.
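The derived rows of the table can be cross-checked against the raw rows. The sketch below assumes (the source does not state the formulas) that the effective hourly rate is the amount paid per HIT divided by the mean completion time and scaled to one hour, that the total paid is the amount per HIT times the number of accepted HITs, and that HITs per Turker is completed HITs divided by unique Turkers. The function names are illustrative only. Under these assumptions, the first column's inputs give roughly $1.03 per hour, $1.86 in total, and 4.88 HITs per Turker, matching the reported values.

```python
def effective_hourly_rate(pay_per_hit, mean_seconds):
    """Dollars per hour implied by one HIT's pay and its mean completion time."""
    return pay_per_hit * 3600 / mean_seconds


def total_paid(pay_per_hit, hits_accepted):
    """Total payout, assuming only accepted HITs are compensated."""
    return pay_per_hit * hits_accepted


def hits_per_turker(hits_completed, unique_turkers):
    """Average number of completed HITs per unique worker."""
    return hits_completed / unique_turkers


# Inputs taken from the first column of the summary table
rate = effective_hourly_rate(0.02, 69.96)
total = total_paid(0.02, 93)
per_turker = hits_per_turker(117, 24)
print(f"${rate:.2f} per hour, ${total:.2f} total, {per_turker:.2f} HITs per Turker")
```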
Intraclass correlation coefficients (ICC).
| Coding category | Turkers | Experts | Turker-expert accuracy | Published studies |
| --- | --- | --- | --- | --- |
| Redemption | | 0.89 (20) | 0.95 (20) | 0.79 |
| Round 1 | 0.79 (102) | 0.78 (10) | 1.00 (10) | |
| Round 2 | 0.89 (180) | 1.00 (10) | 0.89 (10) | |
| Contamination | | 0.82 (20) | 0.89 (20) | 0.78 |
| Round 1 | 0.79 (102) | 0.72 (10) | 0.89 (10) | |
| Round 2 | 0.93 (180) | 0.90 (10) | 0.90 (10) | |
| Closure | 0.89 (194) | 0.77 (10) | 0.76 (10) | 0.72 |
| Emotional tone | | 0.90 (30) | 0.88 (30) | 0.85 |
| Round 1 | 0.81 (41) | 0.79 (10) | 0.80 (10) | |
| Round 2 | 0.92 (45) | 0.93 (10) | 0.95 (10) | |
| Round 3 | 0.95 (190) | 0.94 (10) | 0.88 (10) | |
| Positive affect | 0.72 (448) | 0.72 (100) | 0.77 (100) | 0.87 |
| Enthusiastic | 0.76 (90) | 0.66 (20) | 0.86 (20) | |
| Excited | 0.75 (90) | 0.92 (20) | 0.80 (20) | |
| Inspired | 0.67 (89) | 0.73 (20) | 0.75 (20) | |
| Proud | 0.62 (90) | 0.63 (20) | 0.72 (20) | |
| Strong | 0.48 (89) | 0.62 (20) | 0.53 (20) | |
| Negative affect | 0.78 (450) | 0.88 (100) | 0.81 (100) | 0.80 |
| Ashamed | 0.74 (90) | 0.94 (20) | 0.70 (20) | |
| Distressed | 0.79 (90) | 0.83 (20) | 0.84 (20) | |
| Guilty | 0.50 (90) | 0.65 (20) | 0.74 (20) | |
| Scared | 0.27 (90) | 0.09 (20) | 0.61 (20) | |
| Upset | 0.84 (90) | 0.95 (20) | 0.96 (20) | |
We report ICC2 for Turker ratings and ICC1 statistics for expert ratings and accuracy between Turkers and experts.
Values in parentheses indicate number of HITs.
The studies included in this analysis were: Adler and Poulin (.
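As a rough illustration of the two statistics named in the table note, the sketch below computes ICC1 and ICC2 from a one-way ANOVA on a passages-by-raters matrix. The source does not say which ICC formulation was used, so treating ICC1 as the single-rater coefficient and ICC2 as the reliability of the mean rating is an assumption, as are the function name `one_way_icc` and the toy data.

```python
def one_way_icc(ratings):
    """Return (ICC1, ICC2) for a list of rows: one row per passage, one column per rater.

    Assumes a complete matrix, i.e. every passage has the same number of ratings.
    """
    n = len(ratings)      # number of passages
    k = len(ratings[0])   # ratings per passage
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]

    # Between-passage and within-passage mean squares from a one-way ANOVA
    ss_between = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    icc1 = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    icc2 = (ms_between - ms_within) / ms_between
    return icc1, icc2


toy = [[1, 2], [3, 4], [5, 6]]  # hypothetical ratings, not data from the study
icc1, icc2 = one_way_icc(toy)
print(f"ICC1 = {icc1:.2f}, ICC2 = {icc2:.2f}")
```

A one-way model is used here because different Turkers may rate different passages; a two-way model would require the same raters across all passages.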
| How would you rate the overall emotional tone of this story? (required) |

| Coding task | Passages per HIT | HIT acceptance criterion |
| --- | --- | --- |
| Redemption and contamination | | |
| Round 1 | 1 | At least 1 rating matched the Turker majority rating |
| Round 2 | 5 | At least 6 of 10 ratings matched the Turker majority |
| Closure | 5 | At least 4 of 5 ratings were within ±1 |
| Emotional Tone | | |
| Round 1 | 1 | Ratings were within ±1 |
| Rounds 2 and 3 | 5 | At least 4 of 5 ratings were within ±1 |
| Discrete Emotions | 1 | At least 8 of 10 ratings were within ±1 |
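The majority-based criteria in the table can be sketched as a simple count of matches. This is a minimal illustration, not the authors' procedure: the function names, the data layout, and the tie-breaking rule (first rating encountered wins) are all assumptions.

```python
from collections import Counter


def majority_rating(pooled):
    """Most frequent rating among all Turkers for one item (ties: first seen wins)."""
    return Counter(pooled).most_common(1)[0][0]


def accept_by_majority(own_ratings, pooled_ratings, min_matches):
    """Accept a HIT if enough of one Turker's ratings equal the pooled majority.

    own_ratings: this Turker's rating for each item in the HIT
    pooled_ratings: for each item, the ratings given by all Turkers
    """
    matches = sum(
        own == majority_rating(pool)
        for own, pool in zip(own_ratings, pooled_ratings)
    )
    return matches >= min_matches


# Hypothetical HIT with 10 binary ratings; the majority rating is 1 for every item
pooled = [[1, 1, 0]] * 10
print(accept_by_majority([1] * 6 + [0] * 4, pooled, min_matches=6))  # 6 matches
print(accept_by_majority([1] * 5 + [0] * 5, pooled, min_matches=6))  # 5 matches
```

Because the majority depends on every Turker's ratings, such a check can only run after all ratings for an item have been collected.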
Because majority and mean Turker ratings were part of our HIT acceptance criteria, we could accept HITs, and therefore compensate Turkers, only once we had collected all ratings. Turkers generally prefer to be paid promptly. Although our approach slightly delayed HIT acceptance, the delay remained within reasonable and acceptable limits for payment.
For example, if M = 2.53 and SD = 1.45, the range for acceptance (M ± SD) was 1.08–3.98, so we accepted ratings of 1–4.
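The arithmetic in this example can be sketched as follows. The code assumes the acceptance range is the mean Turker rating plus or minus one standard deviation, and that each bound is rounded to the nearest whole rating; the source gives only the single example, so the rounding rule and all function names are assumptions.

```python
def acceptance_bounds(mean, sd):
    """Return M - SD and M + SD, rounded to two decimals for reporting."""
    return round(mean - sd, 2), round(mean + sd, 2)


def accepted_ratings(mean, sd):
    """Whole-number ratings accepted, rounding each bound to the nearest integer."""
    low, high = acceptance_bounds(mean, sd)
    return list(range(round(low), round(high) + 1))


def accept_by_range(own_ratings, means, sds, min_within):
    """Accept a HIT if enough ratings fall in the accepted range for their passage."""
    within = sum(
        rating in accepted_ratings(m, s)
        for rating, m, s in zip(own_ratings, means, sds)
    )
    return within >= min_within


print(acceptance_bounds(2.53, 1.45))
print(accepted_ratings(2.53, 1.45))
```

With `min_within=4` and five passages per HIT, `accept_by_range` corresponds to a rule of the form "at least 4 of 5 ratings were within range".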