Andy T Woods, Carlos Velasco, Carmel A Levitan, Xiaoang Wan, Charles Spence.
Abstract
This article provides an overview of the recent literature on the use of internet-based testing to address important questions in perception research. Our goal is to provide a starting point for the perception researcher who is keen on assessing this tool for their own research goals. Internet-based testing has several advantages over in-lab research, including the ability to reach a relatively broad set of participants and to collect large amounts of empirical data quickly and inexpensively, via services such as Amazon's Mechanical Turk or Prolific Academic. In many cases, the quality of online data appears to match that collected in lab research. Generally speaking, online participants tend to be more representative of the population at large than those recruited for lab-based research. There are, though, some important caveats when it comes to collecting data online. It is obviously much more difficult to control the exact parameters of stimulus presentation (such as display characteristics) with online research. There are also some thorny ethical elements that need to be considered by experimenters. Strengths and weaknesses of the online approach, relative to others, are highlighted, and recommendations are made for those researchers who might be thinking about conducting their own studies using this increasingly popular approach to research in the psychological sciences.
Keywords: Citizen science; Haxe; Internet-based testing; Mechanical Turk; Perception; Prolific academic
Year: 2015 PMID: 26244107 PMCID: PMC4517966 DOI: 10.7717/peerj.1058
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Articles per year found on the Web of Science prior to 2015.
The number of articles found on the Web of Science prior to 2015 with the search term ‘Mechanical Turk’ within the ‘psychology’ research area (search conducted on 12th March, 2015).
Popular recruitment platforms and some of their characteristics.
| | | Mechanical Turk | Prolific Academic |
|---|---|---|---|
| Participants | Potentially available | 10k–500k | 5k |
| | Mostly originate from | USA, India | USA, UK |
| | Can specify country from which to recruit? | yes | yes |
| | Participant reputation system | yes | yes |
| Money | Fee on top of participant fee | 10%–30% | 10% |
| | Bonus payments possible | yes | yes (on request) |
| | Minimum payments | no | yes |
| | Access without US credentials? | no | yes |
| | Researcher-participant messaging | yes | yes |
Notes.
Note that some MTurkers have a “Masters” performance-based qualification (see https://www.reddit.com/r/mturk/comments/1qmaqc/how_do_i_earn_masters_qualification/). MTurk charges researchers 30% of their participant fees for recruiting from this Masters group. Be aware that when creating a task for MTurkers to do using Amazon’s own ‘web interface’ creation tool, ‘Masters’ is set as the default group from which you wish to recruit.
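As a rough illustration of how the platform fee listed in the table above adds to a study budget, the sketch below computes the total cost from the number of participants, the payment per participant, and the fee rate. The function name `total_cost` and the example figures (100 participants paid 1.50 each) are hypothetical; the 10% and 30% rates are those quoted in the table and note, and actual platform pricing may differ.

```python
def total_cost(n_participants: int, payment_each: float, fee_rate: float) -> float:
    """Total study cost: participant payments plus the platform fee on top.

    fee_rate is a fraction of the participant payments (e.g. 0.10 for 10%).
    """
    if n_participants < 0 or payment_each < 0 or fee_rate < 0:
        raise ValueError("inputs must be non-negative")
    participant_total = n_participants * payment_each
    # The fee is charged on top of what participants receive.
    return round(participant_total * (1.0 + fee_rate), 2)


if __name__ == "__main__":
    # Hypothetical study: 100 participants, each paid 1.50.
    for label, rate in [("10% fee", 0.10), ("30% fee (Masters group)", 0.30)]:
        print(label, total_cost(100, 1.50, rate))
```

Comparing the two rates for the same study makes the cost of leaving the default 'Masters' group selected easy to see before a task is published.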
Age and sex characteristics of 4 recent large internet- and phone-based sample studies.
Note that 12.5% of Mason & Suri’s (2012) participants did not report their gender.
| Recruitment platform | Sample | N | % female | Average age (SD) |
|---|---|---|---|---|
| MTurk | US | 2,737 | 40% | 29.9 (9.6) |
| TextMyBrain | World | 4,080 (study 1) | 65% | 26 (11) |
| MTurk | World | 2,896 (5 studies) | 55% | 32 |
| MTurk | World | 3,006 | 55% | 32.8 (11.5) |
Figure 2. The distribution of ages for US and Indian participants recruited via Mechanical Turk or tested in a lab-based setting in the USA and India (Woods et al., 2015).
Figure 3. The rate of experiment completion over a four-hour period (n = 360; collected February, 2015, from 8 pm onward, Eastern Standard Time; R Pechey, A Attwood, M Munafò, NE Scott-Samuel, A Woods & TM Marteau, pers. comm., 2015).
The first author suspects that the ‘long tail’ of sign-ups typically observed on MTurk results from some participants signing up and then quitting a study, which triggers a ‘time-out’ delay before a new person can take the unfinished slot.
Popular online research platforms, their main features, strengths and weaknesses, as reported by their developers.
The survey was conducted through Google Forms on 13th March, 2015. Google Forms is not listed in the table below because it is mostly questionnaire-focused and thus ‘neutral territory’ for responders.
| | | JsPsych | Inquisit | LimeSurvey | ScriptingRT | Qualtrics | SoPHIE | Tatool | Unipark | WebDMDX | Xperiment |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Open-source | | yes | no | yes | yes | no | citeware | yes | no | no | yes (in beta) |
| Yearly fee for one researcher (USD) | | | 1495 | | | ? | ? | | 138.12 | | |
| Publish directly to crowd sourcing sites | | no | with addons | no | no | MTurk | yes | yes | no | no | MTurk, ProlificAc |
| Questionnaire vs Perceptual Research focus (Q vs R) | | | | | | | | | | | |
| Coding required for | Software setup | yes | no | no | yes | no | no | no | no | no | yes |
| | Creating a study | yes | script based | no | script based | no | no | no | no | script based | script based |
| Possible trial orderings | Random | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
| | Counterbalanced | yes | yes | no | no | yes | yes | yes | yes | yes | yes |
| | Blocked | yes | yes | yes | no | yes | yes | yes | yes | yes | yes |
| Reaction times measurable in | | ms | ms | ms | ms | ms | ms | ms | ms | ms | ms |
| Image, sound & video stimuli | | yes | yes | yes | yes | yes | yes | yes | yes | yes | yes |
Notes.
Suffixed rows requested by reviewer/data not provided by developers.
Many academic institutions have licenses with Qualtrics already. Individual academic pricing was not disclosed to us and could not be found via search engines. Note also that some features (e.g., De Leeuw (2014) more advanced randomization) may require a more expensive package.
“Free as CiteWare, Commercial Hosting Service from SoPHIE Labs (950 USD/year)”.
Although all platforms let the researcher provide a URL where the participant can undertake a study, some crowd-sourcing sites need to communicate directly with the testing software in order to know, for example, whether the participant should be paid.
“None directly; but it can be used to publish on any platform that allows for custom JavaScript and HTML content”.
See http://www.millisecond.com/support/docs/v4/html/howto/interopsurveys.htm.
“Any crowd sourcing site that allows an external link to Tatool to run an experiment (no login required)”.
“It uses an HTML POST command so pretty much anything, depends how skilled you are. We provide a site running a general purpose script to gather data and email it to experimenters should people not be in a position to setup a site to gather the data”.
As discussed in the text, both these packages run outside of the browser and are thus likely to measure reaction time more reliably and more accurately.
This was a topic of contention amongst our reviewers. However, as LimeSurvey is extendable with packages such as http://www.w3.org/TR/hr-time/, timing accuracy within this framework is on par with that of other browser-based frameworks.
Figure 4. An example TurkOpticon requester profile.
An example TurkOpticon requester profile (74 MTurkers having provided feedback on the requester).
Figure 5. Likelihood of stimuli of different presentation durations appearing on screen.
Likelihood of stimuli of different presentation durations appearing on screen, or doing so with the wrong start time, end time, and/or duration (screen refresh of 16.67 ms).
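To give a feel for why short stimuli can fail to appear, here is a minimal sketch under a deliberately simplified model, which is an assumption of this example and not necessarily the procedure behind Figure 5: the screen redraws every 16.67 ms, a stimulus is requested at a random point within a refresh cycle, and it is drawn only on refreshes that fall inside the requested window. The function names `frames_shown` and `summarize` are hypothetical.

```python
import math

REFRESH_MS = 16.67  # refresh interval quoted in the Figure 5 caption (about 60 Hz)


def frames_shown(requested_ms: float, phase: float, refresh_ms: float = REFRESH_MS) -> int:
    """Count the refreshes that fall inside the requested presentation window.

    phase is the fraction of a refresh interval (0 < phase < 1) that has
    already elapsed since the last refresh when the stimulus is requested.
    """
    # Working in units of one refresh: refreshes occur at 1, 2, 3, ...
    # and the window covers [phase, phase + duration).
    return math.ceil(phase + requested_ms / refresh_ms) - 1


def summarize(requested_ms: float, n_phases: int = 1000, refresh_ms: float = REFRESH_MS):
    """Average over evenly spaced request phases (assumed equally likely).

    Returns (probability the stimulus is drawn at all, mean on-screen time in ms).
    """
    counts = [
        frames_shown(requested_ms, (i + 0.5) / n_phases, refresh_ms)
        for i in range(n_phases)
    ]
    p_visible = sum(1 for c in counts if c > 0) / n_phases
    mean_shown_ms = refresh_ms * sum(counts) / n_phases
    return p_visible, mean_shown_ms


if __name__ == "__main__":
    for duration in (5.0, 8.335, 16.67, 25.0, 50.0):
        p, shown = summarize(duration)
        print(f"requested {duration:6.2f} ms: P(visible) = {p:.3f}, mean shown = {shown:.2f} ms")
```

Under this model, a stimulus shorter than one refresh interval is sometimes never drawn, and any stimulus that is drawn stays on screen for a whole number of refreshes, so the displayed duration on a single trial can differ from the requested one even when the average matches. Real browsers and displays add further sources of error (start and end time offsets) that this sketch does not capture.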