Richard W Hass, Marisa Rivera, Paul J Silvia.
Abstract
A new system for subjective rating of responses to divergent thinking tasks was tested using raters recruited from Amazon Mechanical Turk. The rationale for the study was to determine whether such raters could provide reliable (i.e., generalizable) ratings from the perspective of generalizability theory. To promote reliability across the Alternative Uses and Consequences task prompts often used by researchers as measures of divergent thinking, two parallel scales were developed to facilitate feasibility and validity of ratings performed by laypeople. Generalizability and dependability studies were conducted separately for two scoring systems: the average-rating system and the snapshot system. Results showed that it is difficult to achieve adequate reliability using the snapshot system, while good reliability can be achieved on both task families using the average-rating system and a specific number of items and raters. Additionally, the construct validity of the average-rating system is generally good, with less validity for certain Consequences items. Recommendations for researchers wishing to adopt the new scales are discussed, along with broader issues of generalizability of subjective creativity ratings.
Keywords: consensual assessment technique; creativity; divergent thinking; generalizability theory; originality
Year: 2018 PMID: 30150952 PMCID: PMC6099101 DOI: 10.3389/fpsyg.2018.01343
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Statements that formed the rating scale for alternative uses and consequences ratings obtained from MTurk workers.
| Rating | Alternative uses | Consequences |
| 1 | Very obvious/ordinary use | Very obvious consequence |
| 2 | Somewhat obvious use | Somewhat obvious consequence |
| 3 | Non-obvious use | Non-obvious consequence |
| 4 | Somewhat imaginative use | Somewhat imaginative/detailed consequence |
| 5 | Very imaginative/re-contextualized use | Very imaginative/detailed consequence |
Estimated variance components and percent of variance accounted for from the G-study using MTurk average ratings.
| Source | Alternative uses: variance | Alternative uses: % | Consequences: variance | Consequences: % |
| Person (p) | 0.167 | 24.4 | 0.067 | 11.0 |
| Item (i) | 0.069 | 10.2 | 0.000 | 0.00 |
| Rater (r) | 0.181 | 26.6 | 0.351 | 56.9 |
| p × i | 0.139 | 20.4 | 0.088 | 14.2 |
| p × r | 0.008 | 1.2 | 0.015 | 2.5 |
| i × r | 0.021 | 3.1 | 0.008 | 1.2 |
| p × i × r, e | 0.096 | 14.1 | 0.088 | 14.2 |
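The percent-of-variance columns follow directly from the variance components: each component is divided by the sum of all seven. The sketch below recomputes the shares for the first pair of columns of the table above. The dictionary keys are an assumption: they take the seven rows to be, in order, person, item, rater, and their interactions in a fully crossed person × item × rater design. Because the inputs are the rounded values printed in the table, the recomputed percentages can differ from the printed ones in the last digit.

```python
# Variance components copied from the first value column of the G-study
# table for MTurk average ratings. The row labels (keys) are an assumption
# based on a fully crossed person x item x rater design.
components = {
    "p": 0.167,      # persons
    "i": 0.069,      # items
    "r": 0.181,      # raters
    "pi": 0.139,     # person x item
    "pr": 0.008,     # person x rater
    "ir": 0.021,     # item x rater
    "pir,e": 0.096,  # three-way interaction confounded with error
}


def percent_of_variance(components):
    """Return each component's share of the total variance, in percent."""
    total = sum(components.values())
    return {name: 100.0 * value / total for name, value in components.items()}


shares = percent_of_variance(components)
for name, share in shares.items():
    print(f"{name:>6}: {share:5.1f}%")
```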
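Estimated variance components and percent of variance accounted for from the G-study using snapshot scores.
| Source | Alternative uses: variance | Alternative uses: % | Consequences: variance | Consequences: % |
| Person (p) | 0.098 | 6.4 | 0.100 | 7.1 |
| Item (i) | 0.041 | 2.7 | 0.016 | 1.1 |
| Rater (r) | 0.286 | 18.7 | 0.161 | 11.5 |
| p × i | 0.000 | 0 | 0.034 | 2.4 |
| p × r | 0.104 | 6.8 | 0.026 | 1.9 |
| i × r | 0.027 | 1.8 | 0.000 | 0 |
| p × i × r, e | 0.971 | 63.6 | 1.065 | 75.9 |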
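D-study estimates using the MTurk average ratings.
| Source | n | Alternative uses | Consequences |
| Person (p) | 1 | 0.167 | 0.068 |
| Rater (r) | 3 | 0.060 | 0.117 |
| Item (i) | 2 | 0.035 | 0.000 |
| p × r | 3 | 0.003 | 0.005 |
| p × i | 2 | 0.069 | 0.044 |
| i × r | 6 | 0.004 | 0.001 |
| p × i × r, e | 6 | 0.016 | 0.015 |
| Relative error variance (σ²δ) | | 0.088 | 0.064 |
| Absolute error variance (σ²Δ) | | 0.187 | 0.182 |
| 𝔼ρ² | | 0.65 | 0.52 |
| Φ | | 0.47 | 0.27 |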
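Estimates of variance components, relative error variance (σ²δ), absolute error variance (σ²Δ), generalizability coefficient (𝔼ρ²), and dependability coefficient (Φ).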
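The D-study estimates above can be reproduced from the G-study variance components with the standard formulas for a fully crossed person × item × rater design with random facets. The following is a minimal sketch, not the authors' code. The function name `d_study` and the dictionary keys are hypothetical, and the mapping of table rows to sources of variance is an assumption based on that design. The inputs are the rounded G-study values for the first task column, so results agree with the table only up to rounding.

```python
def d_study(var, n_items, n_raters):
    """Project G-study variance components onto a measurement design with
    n_items items and n_raters raters (crossed p x i x r, random facets)."""
    # Relative error: only interactions involving persons affect rank order.
    rel_error = (var["pi"] / n_items
                 + var["pr"] / n_raters
                 + var["pir,e"] / (n_items * n_raters))
    # Absolute error: adds the item, rater, and item x rater main effects.
    abs_error = (rel_error
                 + var["i"] / n_items
                 + var["r"] / n_raters
                 + var["ir"] / (n_items * n_raters))
    return {
        "rel_error": rel_error,
        "abs_error": abs_error,
        "g": var["p"] / (var["p"] + rel_error),    # generalizability coefficient
        "phi": var["p"] / (var["p"] + abs_error),  # dependability coefficient
    }


# Rounded G-study components for MTurk average ratings, first task column.
average_first_column = {
    "p": 0.167, "i": 0.069, "r": 0.181,
    "pi": 0.139, "pr": 0.008, "ir": 0.021, "pir,e": 0.096,
}

result = d_study(average_first_column, n_items=2, n_raters=3)
print({name: round(value, 3) for name, value in result.items()})
```

With 2 items and 3 raters, the n column of the table corresponds to the divisors used here: 3 for rater terms, 2 for item terms, and 6 for terms involving both.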
Figure 1. Plot of changes in the generalizability coefficient (𝔼ρ²) as the number of raters varies from 1 to 10 (n items = 2) for Average Ratings (Left) and Snapshot Ratings (Right). Horizontal line indicates 𝔼ρ² = 0.80.
Figure 2. Plot of changes in the generalizability coefficient (𝔼ρ²) as the number of items varies from 1 to 10 (n raters = 3) for Average Ratings (Left) and Snapshot Ratings (Right). Horizontal line indicates 𝔼ρ² = 0.80.
Figure 3. Plot of changes in the dependability coefficient (Φ) as the number of raters varies from 1 to 10 (n items = 2) for Average Ratings (Left) and Snapshot Ratings (Right). Horizontal line indicates Φ = 0.80.
Figure 4. Plot of changes in the dependability coefficient (Φ) as the number of items varies from 1 to 10 (n raters = 3) for Average Ratings (Left) and Snapshot Ratings (Right). Horizontal line indicates Φ = 0.80.
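The kind of sweep shown in these figures can be approximated by recomputing the generalizability coefficient while one facet varies from 1 to 10 and the other is held fixed. The sketch below does this for the average ratings, using the rounded G-study components for the first task column. The function names, dictionary keys, and the row-to-source mapping are assumptions for illustration, and values computed from rounded inputs may not match the plotted curves exactly.

```python
def g_coefficient(var, n_items, n_raters):
    """Generalizability coefficient for a crossed p x i x r random design."""
    rel_error = (var["pi"] / n_items
                 + var["pr"] / n_raters
                 + var["pir,e"] / (n_items * n_raters))
    return var["p"] / (var["p"] + rel_error)


def smallest_n(var, vary, fixed_n, threshold=0.80, max_n=10):
    """Smallest n in 1..max_n of the varied facet ('raters' or 'items')
    whose coefficient reaches the threshold, or None if none does."""
    for n in range(1, max_n + 1):
        if vary == "raters":
            coef = g_coefficient(var, n_items=fixed_n, n_raters=n)
        else:
            coef = g_coefficient(var, n_items=n, n_raters=fixed_n)
        if coef >= threshold:
            return n
    return None


# Rounded G-study components for MTurk average ratings, first task column.
average_first_column = {
    "p": 0.167, "pi": 0.139, "pr": 0.008, "pir,e": 0.096,
}

# Vary raters with 2 items, then vary items with 3 raters.
print(smallest_n(average_first_column, vary="raters", fixed_n=2))
print(smallest_n(average_first_column, vary="items", fixed_n=3))
```

With these rounded inputs, adding raters alone does not reach 0.80 when only 2 items are used, because the person × item term is divided only by the number of items. Holding 3 raters fixed, the threshold is first reached at 5 items.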
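D-study estimates using the snapshot ratings.
| Source | n | Alternative uses | Consequences |
| Person (p) | 1 | 0.098 | 0.100 |
| Rater (r) | 3 | 0.095 | 0.054 |
| Item (i) | 2 | 0.020 | 0.008 |
| p × r | 3 | 0.035 | 0.009 |
| p × i | 2 | 0.000 | 0.017 |
| i × r | 6 | 0.005 | 0.000 |
| p × i × r, e | 6 | 0.162 | 0.117 |
| Relative error variance (σ²δ) | | 0.196 | 0.203 |
| Absolute error variance (σ²Δ) | | 0.317 | 0.265 |
| 𝔼ρ² | | 0.33 | 0.33 |
| Φ | | 0.24 | 0.27 |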
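Estimates of variance components, relative error variance (σ²δ), absolute error variance (σ²Δ), generalizability coefficient (𝔼ρ²), and dependability coefficient (Φ).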
Means, standard deviations, and correlations among fluency, Silvia average scores, MTurk average scores, and snapshot scores.
| Variable | M | SD | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
| 1. Silvia Brick | 1.69 | 0.30 | |||||||||||||||
| 2. Silvia Knife | 1.80 | 0.32 | 0.52 | ||||||||||||||
| 3. Silvia 12 Inches | 1.41 | 0.25 | 0.11 | 0.14 | |||||||||||||
| 4. Silvia Sleep | 1.53 | 0.27 | 0.17 | 0.24 | 0.25 | ||||||||||||
| 5. Average Brick (MTurk) | 2.32 | 0.56 | 0.80 | 0.51 | 0.06 | 0.06 | |||||||||||
| 6. Average Knife (MTurk) | 2.72 | 0.60 | 0.40 | 0.82 | 0.16 | 0.16 | 0.50 | ||||||||||
| 7. Average 12 Inches (MTurk) | 2.89 | 0.50 | 0.25 | 0.28 | 0.56 | 0.32 | 0.21 | 0.27 | |||||||||
| 8. Average Sleep (MTurk) | 2.83 | 0.37 | 0.20 | 0.21 | 0.27 | −0.10 | 0.29 | 0.27 | 0.40 | ||||||||
| 9. Snapshot Brick (MTurk) | 2.26 | 0.70 | 0.41 | 0.31 | 0.17 | 0.25 | 0.39 | 0.19 | 0.24 | 0.20 | |||||||
| 10. Snapshot Knife (MTurk) | 2.59 | 0.65 | 0.38 | 0.57 | 0.04 | 0.03 | 0.40 | 0.48 | 0.20 | 0.33 | 0.30 | ||||||
| 11. Snapshot 12 Inches (MTurk) | 2.43 | 0.80 | 0.34 | 0.33 | 0.30 | 0.37 | 0.25 | 0.26 | 0.48 | 0.14 | 0.34 | 0.16 | |||||
| 12. Snapshot Sleep (MTurk) | 2.64 | 0.60 | 0.10 | 0.19 | −0.03 | 0.27 | 0.08 | 0.09 | 0.10 | 0.06 | 0.18 | 0.22 | 0.23 | ||||
| 13. Fluency Brick | 6.84 | 2.66 | 0.12 | 0.04 | −0.07 | 0.02 | 0.20 | 0.09 | 0.15 | 0.20 | 0.26 | 0.25 | 0.13 | 0.18 | |||
| 14. Fluency Knife | 6.36 | 2.79 | 0.09 | −0.07 | −0.10 | 0.12 | 0.13 | −0.06 | 0.08 | 0.08 | 0.07 | 0.14 | 0.08 | 0.06 | 0.48 | ||
| 15. Fluency 12 Inches | 6.45 | 2.70 | 0.00 | −0.02 | −0.18 | −0.10 | 0.03 | −0.03 | 0.00 | 0.02 | −0.08 | 0.04 | 0.11 | 0.11 | 0.31 | 0.46 | |
| 16. Fluency Sleep | 6.49 | 2.71 | −0.09 | −0.16 | −0.04 | −0.22 | −0.07 | −0.20 | −0.02 | −0.08 | −0.01 | 0.00 | −0.04 | 0.06 | 0.22 | 0.25 | 0.47 |
Correlations are Pearson product-moment correlations. Fluency scores showed minimal skew. Brick and Knife were the alternative uses items. 12 Inches and Sleep were the consequences items.