Elizabeth A Necka, Stephanie Cacioppo, Greg J Norman, John T Cacioppo.
Abstract
The reliance on small samples and underpowered studies may undermine the replicability of scientific findings. Large sample sizes may be necessary to achieve adequate statistical power. Crowdsourcing sites such as Amazon's Mechanical Turk (MTurk) have been regarded as an economical means for achieving larger samples. Because MTurk participants may engage in behaviors which adversely affect data quality, much recent research has focused on assessing the quality of data obtained from MTurk samples. However, participants from traditional campus- and community-based samples may also engage in behaviors which adversely affect the quality of the data that they provide. We compare an MTurk, campus, and community sample to measure how frequently participants report engaging in problematic respondent behaviors. We report evidence that suggests that participants from all samples engage in problematic respondent behaviors at comparable rates. Because statistical power is influenced by factors beyond sample size, including data integrity, methodological controls must be refined to better identify and diminish the frequency of participant engagement in problematic respondent behaviors.
Year: 2016 PMID: 27351378 PMCID: PMC4924794 DOI: 10.1371/journal.pone.0157732
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Demographic Comparison Between Samples.
| Demographics | MTurk Sample | Campus Sample | Community Sample |
|---|---|---|---|
| Age | 35.5 (11.9) | 21.3 (3.5) | 33.7 (12.7) |
| Gender | | | |
| Male | 407 | 41 | 57 |
| Female | 300 | 43 | 41 |
| Years of Education | 15.1 (2.2) | 14.2 (1.9) | 15.6 (2.9) |
| Ethnicity | | | |
| African American | 37 | 8 | 55 |
| American Indian/Alaskan Native | 3 | 0 | 3 |
| Asian | 50 | 25 | 4 |
| Caucasian | 563 | 33 | 24 |
| Native Hawaiian/Pacific Islander | 3 | 0 | 0 |
| Hispanic | 34 | 10 | 7 |
| More than one race | 14 | 7 | 1 |
| Other | 3 | 1 | 4 |
| Marital Status | | | |
| Married | 240 | 0 | 6 |
| Cohabitating | 88 | 2 | 5 |
| Separated | 4 | 1 | 2 |
| Divorced | 50 | 0 | 10 |
| Widowed | 5 | 1 | 1 |
| Never Married | 320 | 80 | 74 |
Survey presentation error led to lost demographic information on some participants in the MTurk sample.
Mean Frequency of Engagement in Potentially Problematic Responding Behaviors.
| Reporting Practice | MTurk Sample: Other | MTurk Sample: Self | Campus Sample: Other | Campus Sample: Self | Community Sample: Other | Community Sample: Self |
|---|---|---|---|---|---|---|
| Begins studies without paying full attention to the instructions? | 31.3% (24.2%) | 10.2% (16.7%) | 33.6% (20.4%) | 13.0% (16.2%) | 28.6% (28.8%) | 12.2% (23.3%) |
| Responds without really thinking about a question? | 26.8% (21.6%) | 8.6% (14.1%) | 35.5% (17.5%) | 16.4% (14.5%) | 27.6% (25.8%) | 6.9% (17.8%) |
| Responds to questions in ways that are not entirely truthful? | 24.0% (21.4%) | 5.8% (13.4%) | 26.1% (17.5%) | 8.4% (9.4%) | 25.3% (26.8%) | 9.0% (23.6%) |
| Responds in ways that they deem to be socially acceptable? | 45.2% (26.4%) | 34.5% (36.4%) | 50.6% (29.4%) | 38.6% (34.8%) | 46.6% (34.0%) | 31.8% (39.2%) |
| Responds in a way that helps the researcher find support for his or her hypotheses? | 46.3% (31.6%) | 32.3% (37.5%) | 29.0% (28.8%) | 17.6% (30.2%) | 41.6% (31.9%) | 33.9% (38.8%) |
| Falsely reports the frequency with which they engage in certain behaviors? | 21.6% (21.0%) | 3.6% (10.4%) | 24.9% (18.0%) | 6.3% (12.4%) | 20.0% (22.1%) | 4.1% (11.1%) |
| Falsely reports one's age? | 12.3% (17.1%) | 2.2% (10.2%) | 4.5% (12.2%) | 0.3% (1.7%) | 7.7% (15.0%) | 1.9% (10.5%) |
| Falsely reports one's ethnicity? | 10.2% (17.0%) | 1.6% (9.0%) | 4.3% (9.7%) | 1.0% (6.3%) | 7.0% (13.8%) | 1.0% (7.3%) |
| Falsely reports one's gender? | 8.9% (15.5%) | 1.2% (6.6%) | 0.8% (3.6%) | 0.0% (0.0%) | 4.2% (12.9%) | 0.2% (1.6%) |
| Uses a search engine to find the answer to a survey or the key to an experimental task? | 16.7% (20.7%) | 4.5% (13.5%) | 5.0% (11.4%) | 0.0% (0.0%) | 13.8% (24.2%) | 3.6% (13.6%) |
| Spoken to other research participants to find answers to a survey or how to complete a task? | 21.1% (26.3%) | 5.8% (16.6%) | 5.9% (9.8%) | 0.5% (2.4%) | 11.4% (19.9%) | 3.0% (10.3%) |
| Provides privileged information (e.g. answers or instructions on how to complete a certain task) to other research participants? | 11.7% (18.4%) | 2.9% (12.2%) | 9.8% (14.9%) | 1.5% (6.7%) | 18.4% (27.4%) | 6.6% (21.1%) |
| Completes studies while multitasking (e.g. listening to music, checking one’s cell phone, etc.)? | 41.2% (25.5%) | 21.0% (24.0%) | 11.3% (14.2%) | 2.0% (7.0%) | 27.8% (28.6%) | 8.5% (19.0%) |
| Leaves the page of a study and returns at a later point in time? | 27.1% (22.2%) | 11.1% (15.3%) | 12.0% (13.2%) | 3.2% (12.0%) | 13.5% (20.4%) | 3.0% (10.3%) |
| Intentionally participates in the same study more than once? | 11.1% (16.9%) | 2.9% (9.9%) | 7.5% (13.1%) | 0.0% (0.0%) | 7.0% (14.7%) | 1.9% (8.0%) |
| Uses more than one [name when signing up for studies]? | 4.9% (11.8%) | 0.8% (6.7%) | 4.3% (8.1%) | 0.0% (0.0%) | 9.8% (19.8%) | 0.6% (4.4%) |
| Uses a VPN to appear to have a US IP address? | 7.7% (13.4%) | 0.9% (6.7%) | ||||
| Completes studies while completely alone? | 61.0% (21.9%) | 73.4% (23.6%) | ||||
| Completes studies while in the presence of others? | 26.8% (21.1%) | 15.8% (21.9%) | ||||
| Completes studies in a sleepy state? | 26.3% (20.4%) | 12.2% (15.8%) | 41.0% (20.2%) | 32.0% (25.1%) | 21.9% (22.8%) | 11.2% (17.7%) |
| Completes studies under the influence of alcohol or other drugs? | 12.7% (16.7%) | 3.8% (12.2%) | 4.4% (7.1%) | 2.4% (11.9%) | 13.1% (20.2%) | 3.5% (14.2%) |
| Looks for studies by a researcher that they already know? | 56.2% (25.6%) | 34.1% (29.1%) | 17.4% (20.1%) | 3.0% (8.1%) | 10.9% (20.6%) | 5.8% (17.4%) |
| Thoughtfully reads each question in a survey? | 64.2% (19.4%) | 80.2% (16.4%) | 55.7% (20.2%) | 71.2% (22.2%) | 52.6% (27.8%) | 76.8% (28.3%) |
| Contacts a researcher if there was a glitch with their survey? | 50.5% (28.0%) | 32.9% (33.5%) | 51.1% (33.4%) | 19.9% (33.1%) | 44.2% (35.1%) | 19.4% (32.5%) |
| Participates in a survey because the topic is interesting? | 49.4% (25.6%) | 43.5% (27.4%) | 38.1% (23.5%) | 43.7% (29.8%) | 44.3% (32.3%) | 43.2% (35.4%) |
Standard deviations are listed in parentheses. All frequency estimates are percentages. In all materials for the MTurk sample, we called participants “workers” and researchers “requesters” in order to adhere to the terminology used by MTurk.
a Approximately half of laboratory and community samples saw wording for these behaviors that was inconsistent with the wording presented to MTurk participants and were excluded from analyses on these behaviors.
b For MTurk participants, we clarified that this excluded online forums such as TurkOpticon or TurkerNation.
c For MTurk participants, “spoken to other research participants” was replaced with “uses TurkOpticon, TurkerNation, or another forum.”
d For MTurk participants, “to other research participants” was replaced with “on forums such as TurkOpticon or TurkerNation.”
e For MTurk participants, “watching TV” was included as an example.
f For MTurk participants, this question stated “Uses more than one MTurk worker ID account.” For campus- and community-based participants, this stated “Uses more than one name when signing up on SONA.”
g For campus- and community-based participants, these items were excluded due to their irrelevance to assessing problematic responding behaviors in a physical testing environment.
Fig 1. Estimates of the frequency of problematic respondent behaviors based on self-estimates.
Error bars represent standard errors. Behaviors for which MTurk participants report greater engagement than more traditional samples are starred. Behaviors for which campus and community samples vary are bolded. Behaviors which vary consistently in both the FO and the FS condition are outlined in a box. Significance was determined after correction for false discovery rate using the Benjamini-Hochberg procedure. Note that frequency estimates are derived in the most conservative manner possible (scoring each range as the lowest point of its range), but analyses are unaffected by this data reduction technique. For complete text of each behavior, see Table 1.
Fig 2. Estimates of the frequency of problematic respondent behaviors based on estimates of others’ behaviors.
Error bars represent standard errors. Behaviors for which MTurk participants report greater engagement than more traditional samples are starred. Behaviors for which campus and community samples vary are bolded. Behaviors which vary consistently in both the FO and the FS condition are outlined in a box. Significance was determined after correction for false discovery rate using the Benjamini-Hochberg procedure. Note that frequency estimates are derived in the most conservative manner possible (scoring each range as the lowest point of its range), but analyses are unaffected by this data reduction technique. For complete text of each behavior, see Table 1.
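The figure captions state that significance was determined after false discovery rate correction with the Benjamini-Hochberg procedure. The sketch below illustrates that step-up procedure in general terms. It is not the authors' analysis code: the function name `benjamini_hochberg`, the `alpha` level of 0.05, and the example p-values are all assumptions chosen for illustration, and none of them come from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Flag which hypotheses are rejected at false discovery rate `alpha`.

    Returns a list of booleans in the same order as `p_values`.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Step-up rule: find the largest rank k (1-based) such that
    # p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k, even if its own
    # p-value exceeded its own threshold.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, one per compared behavior (not taken from the paper).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_p))
```

With these illustrative inputs only the two smallest p-values pass, because the third-smallest (0.039) exceeds its threshold of 3/10 × 0.05 = 0.015 and no larger rank meets its threshold either.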