Antonio A Arechar, Lucas Molleman, Simon Gächter.
Abstract
Online labor markets provide new opportunities for behavioral research, but conducting economic experiments online raises important methodological challenges. This particularly holds for interactive designs. In this paper, we provide a methodological discussion of the similarities and differences between interactive experiments conducted in the laboratory and online. To this end, we conduct a repeated public goods experiment with and without punishment using samples from the laboratory and the online platform Amazon Mechanical Turk. We chose to replicate this experiment because it is long and logistically complex. It therefore provides a good case study for discussing the methodological and practical challenges of online interactive experimentation. We find that basic behavioral patterns of cooperation and punishment in the laboratory are replicable online. The most important challenge of online interactive experiments is participant dropout. We discuss measures for reducing dropout and show that, for our case study, dropouts are exogenous to the experiment. We conclude that data quality for interactive experiments via the Internet is adequate and reliable, making online interactive experimentation a potentially valuable complement to laboratory studies.
Keywords: Amazon Mechanical Turk; Behavioral research; Experimental methodology; Internet experiments; Public goods game; Punishment
Year: 2017 PMID: 29449783 PMCID: PMC5807491 DOI: 10.1007/s10683-017-9527-2
Source DB: PubMed Journal: Exp Econ ISSN: 1386-4157
Methodological differences in conducting interactive experiments in the laboratory and on MTurk
| Phase/challenge | Laboratory | Online (MTurk) |
|---|---|---|
| Show-up fees | Typically a small part of total payoffs. Guaranteed when participant shows up to the session | Relatively large show-up fees promote recruitment rates, thereby facilitating prompt group formation. Experimenter can approve or reject the task submitted; if rejected no fee is paid |
| Inviting participants | Invitations sent well in advance, participants commit to a session. Recruitment often from a pre-existing database | Sessions advertised online as HITs and can be completed immediately |
| Selection into the experiment | At sign-up, participants know very little about the experiment. Details of the task are communicated once participants are in the laboratory | Experiments are typically advertised as HITs with a brief task description. ‘Workers’ browse available HITs and accept those of their preference |
| Experienced participants | Invitation conditioned on well-defined criteria of the laboratory’s records | HITs targeted at subsets of MTurk workers; experimenter can specify exclusion criteria. Many MTurk workers will have participated in many prior studies |
| Duplicate participants | Registration protocols usually prevent duplicate participation | Amazon acts against multiple worker accounts, but they exist |
| Comprehension | Participants can ask questions; comprehension questions ensure understanding | Experimenter is physically absent and cannot answer questions directly. Compulsory comprehension questions can be added but may make experiment (too) long for some participants |
| Forming groups | Easy to guess how many participants will attend; group settings can be pre-defined | Hard to guess how many participants will attend; groups can be constructed ‘on the fly’ |
| Deception | In experimental economics deception is prohibited and laboratories foster reputations for non-deception | Because all requesters use the same subject pool, some participants may have experienced deception because requesters from other disciplines may use it |
| Communication | Hardly an issue; experimenter can restrict communication between subjects | Participants may in principle collude through external channels though this is difficult in practice |
| Experimental flow | Closed form software like z-Tree specifies session progress | Scripted browser navigation specifies progress |
| Attrition (‘dropout’) | Hardly an issue; participants that start a session usually finish it | Major challenge to internal validity; if dropout rates vary with treatment, selection bias may arise |
| Payments | Cash usually paid upon completion | Automatic transfer through Amazon |
| Cost per participant | Relatively high but predictable | Relatively low but varies with attrition |
Fig. 1 Attrition throughout the course of the experiment. Colors depict the group size. We always started with groups of four but let participants continue if a member dropped out. (Color figure online)
Fig. 2 Contributions over time. Numbers in parentheses are the mean contributions in each experimental condition. Error bars indicate 95% confidence intervals (clustered at the group level)
Cooperation dynamics
| Contributions to the public good | No punishment: Laboratory | No punishment: MTurk | No punishment: Pooled | Punishment: Laboratory | Punishment: MTurk | Punishment: Pooled |
|---|---|---|---|---|---|---|
| Period | −0.900*** | −1.074*** | −1.037*** | 1.139 | 0.514* | 0.682** |
| | (0.309) | (0.187) | (0.160) | (0.710) | (0.289) | (0.282) |
| Final period | −3.400 | −2.292** | −2.512*** | −10.203** | −4.184** | −5.795*** |
| | (2.253) | (0.958) | (0.881) | (4.881) | (1.688) | (1.797) |
| MTurk | | | 5.421*** | | | 4.193 |
| | | | (1.867) | | | (4.904) |
| Constant | 10.470*** | 17.046*** | 11.402*** | 25.980*** | 35.272*** | 29.601*** |
| | (1.592) | (0.624) | (1.650) | (3.898) | (3.792) | (4.232) |
| N | 720 | 2480 | 3200 | 720 | 2480 | 3200 |
| F | 8.75 | 33.66 | 34.45 | 2.19 | 3.12 | 3.75 |
Tobit estimation with left-censoring for ‘No punishment’ and right-censoring for ‘Punishment’. ‘Period’ is period number; ‘Final period’ is a dummy for last period; ‘MTurk’ is a dummy for the MTurk sample. Robust standard errors clustered on groups
* p < 0.1; ** p < 0.05; *** p < 0.01
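To make the censoring in the table note concrete, the following is a minimal sketch of the log-likelihood that a left-censored Tobit model maximizes. It is not the authors' estimation code: the function names, the single-regressor form, and the censoring limit of 0 are illustrative assumptions, and the right-censored case used for the ‘Punishment’ columns would mirror it at the upper limit.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik_left(y, x, intercept, slope, sigma, lower=0.0):
    """Log-likelihood of a left-censored Tobit model with one regressor.

    Hypothetical helper: `lower` is the assumed censoring limit.
    Observations at or below `lower` contribute the probability that the
    latent outcome falls below the limit; all other observations
    contribute the normal density of the observed outcome.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = 0.0
    for yi, xi in zip(y, x):
        mu = intercept + slope * xi
        if yi <= lower:
            total += math.log(normal_cdf((lower - mu) / sigma))
        else:
            total += math.log(normal_pdf((yi - mu) / sigma) / sigma)
    return total


# Made-up contributions over four periods, for illustration only.
contributions = [8.0, 5.0, 0.0, 0.0]
periods = [1, 2, 3, 4]
loglik = tobit_loglik_left(contributions, periods, 10.0, -3.0, 4.0)
```

An estimator would search over `intercept`, `slope` and `sigma` for the values that maximize this quantity; the clustered standard errors reported in the table require an additional step that is not shown here.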
Cooperation dynamics (no punishment)
| Contribution to the public good (no punishment) | Laboratory | MTurk | Pooled |
|---|---|---|---|
| Period | −0.401** | −0.503*** | −0.485*** |
| | (0.204) | (0.094) | (0.085) |
| Final period | −2.826 | −1.316 | −1.600** |
| | (1.941) | (0.827) | (0.757) |
| Mean peer contribution in t−1 | 0.953*** | 1.043*** | 1.027*** |
| | (0.125) | (0.060) | (0.054) |
| MTurk | | | 0.759 |
| | | | (0.778) |
| Constant | −0.830 | −0.006 | −0.696 |
| | (1.674) | (1.237) | (0.931) |
| N | 648 | 2232 | 2880 |
| F | 29.05 | 163.74 | 177.16 |
Left-censored Tobit estimation. ‘Period’ is period number; ‘Mean peer contribution in t−1’ is the average contribution of the other members in the group in t−1; ‘MTurk’ is a dummy for the MTurk sample. Robust standard errors clustered on groups
* p < 0.1; ** p < 0.05; *** p < 0.01
Fig. 3 Frequencies of punishment over time. Frequencies are calculated by counting instances of assigning non-zero deduction points out of the total number of punishment opportunities per participant, per recipient, per period. Mean punishment frequencies in parentheses. Error bars indicate 95% confidence intervals clustered on groups
Fig. 4 Directionality and severity of punishment in our laboratory and online samples. Stacked bars show frequency distributions of punishment decisions. Each bar shows the distribution for a given difference between the punisher’s and the target’s contribution to the public good
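The frequency measure described in the caption of Fig. 3 can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name and the flat list of deduction points (one entry per participant, recipient and period) are hypothetical.

```python
def punishment_frequency(deduction_points):
    """Share of punishment opportunities in which non-zero points were assigned.

    `deduction_points` is assumed to hold one entry per punishment
    opportunity, i.e. per participant, per recipient, per period.
    """
    if not deduction_points:
        raise ValueError("at least one punishment opportunity is required")
    punished = sum(1 for points in deduction_points if points > 0)
    return punished / len(deduction_points)


# Made-up data: six opportunities, two of which carry non-zero deductions.
example_frequency = punishment_frequency([0, 0, 3, 0, 1, 0])
```

In this made-up example two of six opportunities involve punishment, so the frequency is one third. The measure ignores how many points were assigned; severity is analyzed separately.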
Determinants of peer punishment
| Decision to punish (0 = no; 1 = yes), logit estimation | Prosocial punishment: Lab | Prosocial punishment: MTurk | Prosocial punishment: Pooled | Prosocial punishment: Pooled and controls | Antisocial punishment: Lab | Antisocial punishment: MTurk | Antisocial punishment: Pooled | Antisocial punishment: Pooled and controls |
|---|---|---|---|---|---|---|---|---|
| Punisher’s contribution | 0.046 | −0.003 | 0.011 | −0.002 | −0.143*** | −0.153*** | −0.142*** | −0.130*** |
| | (0.051) | (0.023) | (0.022) | (0.026) | (0.024) | (0.026) | (0.018) | (0.020) |
| Target’s contribution | −0.059 | −0.118*** | −0.100*** | −0.097*** | −0.013 | 0.010 | 0.002 | 0.004 |
| | (0.038) | (0.029) | (0.023) | (0.023) | (0.025) | (0.021) | (0.015) | (0.020) |
| Others’ avg. contribution | 0.038 | 0.054*** | 0.047*** | 0.065*** | −0.025 | −0.019 | −0.023 | −0.017 |
| | (0.035) | (0.019) | (0.017) | (0.017) | (0.028) | (0.024) | (0.018) | (0.019) |
| Received punishment in t−1 | −0.034 | −0.025 | −0.028 | −0.076** | 0.129*** | 0.069** | 0.094*** | 0.103*** |
| | (0.027) | (0.045) | (0.023) | (0.032) | (0.031) | (0.034) | (0.023) | (0.025) |
| Period | −0.117*** | −0.111** | −0.113*** | −0.103*** | 0.024 | 0.036 | 0.031 | 0.032 |
| | (0.039) | (0.044) | (0.032) | (0.032) | (0.062) | (0.056) | (0.041) | (0.048) |
| Final period | −0.115 | −0.077 | −0.077 | −0.236 | −1.800*** | 0.595* | −0.159 | −0.014 |
| | (0.341) | (0.366) | (0.275) | (0.313) | (0.587) | (0.340) | (0.388) | (0.434) |
| MTurk | | | −0.855*** | −0.983*** | | | −1.132*** | −1.983*** |
| | | | (0.307) | (0.302) | | | (0.347) | (0.403) |
| Age | | | | 0.005 | | | | 0.025 |
| | | | | (0.017) | | | | (0.019) |
| Female | | | | −1.171*** | | | | −0.078 |
| | | | | (0.260) | | | | (0.381) |
| Single child | | | | −0.320 | | | | 0.099 |
| | | | | (0.376) | | | | (0.457) |
| Foreign | | | | 1.403* | | | | −2.380*** |
| | | | | (0.739) | | | | (0.628) |
| Membership | | | | 0.019 | | | | −0.838* |
| | | | | (0.288) | | | | (0.462) |
| Constant | 0.206 | 0.202 | 0.854* | 1.216* | −0.560 | −2.137*** | −0.878** | −1.101 |
| | (0.812) | (0.393) | (0.466) | (0.712) | (0.379) | (0.682) | (0.442) | (0.682) |
| N | 370 | 900 | 1270 | 1201 | 1790 | 6540 | 8330 | 7949 |
| Chi2 | 27.51 | 38.98 | 48.60 | 109.88 | 111.55 | 85.36 | 187.39 | 254.26 |
Logit and Multilevel mixed effects estimation, which allows for individual and group differences, as well as for treatment-specific residuals. We split the analysis into two types of punishment. Prosocial punishment includes instances where the punisher’s contribution to the public good in that round exceeded that of their target. Antisocial punishment includes instances where the target contributed at least as much as the punisher. ‘Punisher’s contribution’ is the contribution of the participant punishing; ‘Target’s contribution’ is the contribution of the punished participant; ‘Others’ avg. contribution’ is the mean contribution of the other two members of the group; ‘Received punishment in t−1’ is the punishment amount received from others in the previous period; ‘Period’ is the period number; ‘Final period’ is a dummy for the last period; ‘MTurk’ is a dummy for the MTurk sample; demographic controls are the same as in Table 6. Robust standard errors clustered on groups for the Logit model
* p < 0.10; ** p < 0.05; *** p < 0.01
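The split between the two types of punishment defined in the table note can be written as a simple rule. The sketch below follows the note's definitions; the function name and the string labels are illustrative assumptions.

```python
def classify_punishment(punisher_contribution, target_contribution):
    """Classify a punishment decision by the contributions of both parties.

    Following the table note: prosocial if the punisher contributed
    strictly more than the target in that round, antisocial if the
    target contributed at least as much as the punisher.
    """
    if punisher_contribution > target_contribution:
        return "prosocial"
    return "antisocial"


# Made-up contributions, for illustration only.
labels = [
    classify_punishment(15, 5),   # punisher contributed more
    classify_punishment(5, 15),   # target contributed more
    classify_punishment(10, 10),  # equal contributions
]
```

Note that equal contributions fall into the antisocial category under this definition, because the target contributed "at least as much" as the punisher.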
Determinants of punishment
| | Decision to punish (0 = no; 1 = yes): Laboratory | Decision to punish: MTurk | Decision to punish: Pooled | Punishment severity: Laboratory | Punishment severity: MTurk | Punishment severity: Pooled |
|---|---|---|---|---|---|---|
| Target’s contribution | −0.181*** | −0.216*** | −0.203*** | −0.505*** | −0.717*** | −0.641*** |
| | (0.034) | (0.015) | (0.016) | (0.074) | (0.051) | (0.043) |
| Punisher’s contribution | −0.014 | 0.003 | −0.002 | −0.067 | −0.011 | −0.032 |
| | (0.034) | (0.026) | (0.022) | (0.106) | (0.074) | (0.063) |
| Mean contrib. others | 0.040 | 0.065** | 0.058*** | 0.136 | 0.228*** | 0.197*** |
| | (0.028) | (0.025) | (0.019) | (0.094) | (0.071) | (0.058) |
| Rec. punishment in t−1 | 0.090*** | 0.097** | 0.092*** | 0.310*** | 0.273** | 0.284*** |
| | (0.022) | (0.045) | (0.026) | (0.074) | (0.128) | (0.069) |
| Period | −0.126** | −0.102*** | −0.111*** | −0.311** | −0.280** | −0.289*** |
| | (0.056) | (0.037) | (0.031) | (0.143) | (0.110) | (0.088) |
| Final period | −0.633* | 0.524* | 0.150 | −1.014 | 2.401*** | 1.271 |
| | (0.336) | (0.294) | (0.266) | (1.220) | (0.888) | (0.780) |
| MTurk | | | −0.965*** | | | −2.631*** |
| | | | (0.231) | | | (0.803) |
| Constant | 0.960* | −0.440 | 0.671** | 1.569 | −1.873* | 1.088 |
| | (0.496) | (0.290) | (0.310) | (1.533) | (1.105) | (1.008) |
| N | 2160 | 7440 | 9600 | 2160 | 7440 | 9600 |
| Pseudo R2 | 0.285 | 0.321 | 0.322 | 0.142 | 0.203 | 0.189 |
Values in columns 1–3 reflect estimates from logistic models fitted to the decisions to punish (0: no deduction points assigned; 1: at least one deduction point assigned). Values in columns 4–6 reflect effect estimates from left-censored Tobit models fitted to the number of deduction points assigned. ‘Target’s contribution’ is the contribution of the punished participant; ‘Punisher’s contribution’ is the contribution of the participant punishing; ‘Mean contrib. others’ is the mean contribution of the other two members of the group; ‘Rec. punishment in t−1’ is the punishment amount received from others in the previous period; ‘Period’ is the period number; ‘Final period’ is a dummy for the last period; ‘MTurk’ is a dummy for the MTurk sample. Robust standard errors clustered on groups
* p < 0.10; ** p < 0.05; *** p < 0.01
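As a reading aid for the logistic columns, a coefficient on the log-odds scale can be converted into an odds ratio by exponentiation. The sketch below applies this to the pooled ‘MTurk’ estimate of −0.965, assuming the reported values are raw logit coefficients rather than marginal effects; the function name is illustrative.

```python
import math


def odds_ratio(logit_coefficient):
    """Convert a logit coefficient (log-odds scale) into an odds ratio."""
    return math.exp(logit_coefficient)


# Pooled 'MTurk' coefficient from the decision-to-punish columns.
mturk_odds_ratio = odds_ratio(-0.965)
print(round(mturk_odds_ratio, 3))  # prints 0.381
```

Under that assumption, the odds of punishing in the MTurk sample are roughly 0.38 times the odds in the laboratory sample, holding the other regressors fixed. The Tobit columns are on a different scale and cannot be read this way.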
Determinants of attrition
| Participant’s drop out in period t (0 = no; 1 = yes) | |||||
|---|---|---|---|---|---|
| Pooled data | Without punishment | With punishment | |||
| (1) | (2) | (3) | (4) | (5) | |
| Punishment available | 0.056 | 0.362 | 0.107 | ||
| (0.598) | (0.612) | (0.611) | |||
| Period | −0.093* | −0.118** | −0.094* | −0.193*** | −0.184** |
| (0.051) | (0.053) | (0.053) | (0.066) | (0.077) | |
| First period | 2.484*** | 2.375*** | 2.554*** | ||
| (0.377) | (0.376) | (0.382) | |||
| Earnings | −0.002 | 0.011 | |||
| (0.143) | (0.143) | ||||
| Group member(s) dropped out in previous period | 1.896*** | 1.677*** | 2.024*** | ||
| (0.382) | (0.500) | (0.574) | |||
| Relative average contribution | 0.002 | −0.082 | |||
| (0.025) | (0.053) | ||||
| Relative average punishment received | −0.105 | ||||
| (0.086) | |||||
| Relative average punishment given | −0.100 | ||||
| (0.086) | |||||
| Constant | −4.064*** | −3.979*** | −4.220*** | −3.619*** | −2.321* |
| (0.317) | (0.318) | (0.328) | (0.353) | (1.191) | |
| N | 8334 | 8332 | 8332 | 3998 | 3907 |
| AIC | 893.56 | 875.23 | 860.27 | 454.61 | 303.68 |
Values reflect estimates from proportional hazards models fitted to binary events of participants staying (0) or dropping out (1) in a given round of the session, conditional on not having dropped out yet. ‘Punishment available’ is a dummy for the presence or absence of punishment; ‘Period’ is the period number; ‘First period’ is a dummy for the first period; ‘Earnings’ reflects participants’ total earnings relative to all other participants in the experiment in a given period; ‘Group member(s) dropped out in previous period’ is a dummy taking the value of 0 (1) when none (at least one) of the group members had left the session in the previous round (potentially delaying the progress within the session); ‘Relative average contribution’ is the participant’s average contribution to the public good minus the average contribution of their fellow group members in all rounds of the session so far; ‘Relative average punishment received (given)’ is the average punishment received (given) by a participant minus the average punishment received (given) by their fellow group members in all rounds so far
* p < 0.10; ** p < 0.05; *** p < 0.01
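The ‘Relative average contribution’ regressor defined in the table note can be sketched as follows. The function name and data layout are hypothetical, and one detail is an assumption: the fellow members' contributions are pooled across all rounds so far before averaging, which differs from averaging per-round means when group size changes after a dropout.

```python
def relative_average_contribution(own_contributions, peer_contributions):
    """Own average contribution minus the peers' average, over rounds so far.

    `own_contributions` holds one value per completed round.
    `peer_contributions` holds, per completed round, the list of
    contributions made by the fellow group members in that round.
    Assumption: peer values are pooled across rounds before averaging.
    """
    if not own_contributions:
        raise ValueError("at least one completed round is required")
    pooled_peers = [c for round_values in peer_contributions for c in round_values]
    if not pooled_peers:
        raise ValueError("at least one peer contribution is required")
    own_mean = sum(own_contributions) / len(own_contributions)
    peer_mean = sum(pooled_peers) / len(pooled_peers)
    return own_mean - peer_mean


# Made-up data: two completed rounds in a group of four.
relative = relative_average_contribution([10, 6], [[4, 8, 12], [2, 4, 6]])
```

Here the participant averages 8 and the fellow group members average 6, so the regressor takes the value 2. The relative punishment measures in the note follow the same pattern with deduction points in place of contributions.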
Cooperation dynamics
| Contribution to the public good | Tobit, no punishment (left censored) | Tobit, no punishment (left censored), with controls | Tobit, punishment (right censored) | Tobit, punishment (right censored), with controls | Multilevel mixed effects, no punishment | Multilevel mixed effects, no punishment, with controls | Multilevel mixed effects, punishment | Multilevel mixed effects, punishment, with controls |
|---|---|---|---|---|---|---|---|---|
| Period | −1.037*** | −1.042*** | 0.682** | 0.711** | −0.731*** | −0.737*** | 0.091*** | 0.093*** |
| | (0.160) | (0.159) | (0.282) | (0.281) | (0.043) | (0.044) | (0.024) | (0.025) |
| Final period | −2.512*** | −2.611*** | −5.795*** | −5.864*** | −1.277*** | −1.315*** | −1.119*** | −1.082*** |
| | (0.881) | (0.907) | (1.797) | (1.863) | (0.408) | (0.420) | (0.233) | (0.241) |
| MTurk | 5.421*** | 5.390** | 4.193 | 8.282 | 4.218*** | 5.084*** | 0.893 | 2.009 |
| | (1.867) | (2.130) | (4.904) | (5.919) | (1.394) | (1.466) | (1.249) | (1.320) |
| Age | | 0.120** | | −0.003 | | 0.028 | | −0.029 |
| | | (0.058) | | (0.144) | | (0.029) | | (0.025) |
| Female | | 2.370** | | −4.072 | | 0.927* | | 0.118 |
| | | (0.976) | | (2.595) | | (0.516) | | (0.438) |
| Single child | | 0.730 | | −1.096 | | 0.051 | | −0.245 |
| | | (1.262) | | (4.071) | | (0.715) | | (0.607) |
| Foreign | | −1.042 | | 6.286 | | 0.801 | | 0.401 |
| | | (2.557) | | (4.622) | | (1.196) | | (1.065) |
| Membership | | 1.242 | | 5.536 | | 1.764*** | | 1.520*** |
| | | (1.452) | | (4.661) | | (0.651) | | (0.556) |
| Constant | 11.402*** | 6.544** | 29.601*** | 26.270*** | 12.451*** | 10.043*** | 15.871*** | 15.368*** |
| | (1.650) | (2.778) | (4.232) | (6.756) | (1.246) | (1.530) | (1.109) | (1.348) |
| N | 3200 | 3050 | 3200 | 3050 | 3200 | 3050 | 3200 | 3050 |
| Chi2/F | 34.45 | 15.35 | 3.75 | 1.96 | 503.80 | 496.65 | 25.67 | 32.03 |
Tobit and Multilevel mixed effects estimation, which allows for individual and group differences, as well as for treatment-specific residuals. ‘Period’ is period number; ‘Final period’ is a dummy for last period; ‘MTurk’ is a dummy for the MTurk sample; ‘Age’ is the participant’s age; ‘Female’ is a dummy for female participants; ‘Foreign’ is a dummy for participants who grew up outside the US; ‘Membership’ is a dummy for participants who were members of a social club. Robust standard errors clustered on groups for the Tobit model
* p < 0.1; ** p < 0.05; *** p < 0.01
Cooperation dynamics (no punishment)
| Contribution to the public good (no punishment) | Tobit | Tobit, with controls | Multilevel mixed effects | Multilevel mixed effects, with controls |
|---|---|---|---|---|
| Period | −0.485*** | −0.492*** | −0.500*** | −0.499*** |
| | (0.085) | (0.090) | (0.052) | (0.054) |
| Final period | −1.600** | −1.741** | −0.812** | −0.864** |
| | (0.757) | (0.781) | (0.394) | (0.407) |
| Mean peer contribution in t−1 | 1.027*** | 1.000*** | 0.461*** | 0.460*** |
| | (0.054) | (0.058) | (0.027) | (0.027) |
| MTurk | 0.759 | 1.827 | 2.252*** | 2.980*** |
| | (0.778) | (1.142) | (0.862) | (0.985) |
| Age | | 0.033 | | 0.036 |
| | | (0.039) | | (0.031) |
| Female | | 2.022*** | | 1.230** |
| | | (0.674) | | (0.557) |
| Single child | | 0.128 | | 0.128 |
| | | (1.017) | | (0.789) |
| Foreign | | −0.321 | | 0.191 |
| | | (1.903) | | (1.315) |
| Membership | | 2.153** | | 1.692** |
| | | (0.975) | | (0.690) |
| Constant | −0.696 | −3.582** | 7.073*** | 4.412*** |
| | (0.931) | (1.599) | (0.870) | (1.296) |
| N | 2880 | 2745 | 2880 | 2745 |
| Chi2/F | 177.16 | 80.17 | 822.66 | 799.84 |
Left-censored Tobit and multilevel mixed effects estimation, which allows for individual and group differences, as well as for treatment-specific residuals. ‘Period’ is period number; ‘Final period’ is a dummy for the last period; ‘Mean peer contribution in t−1’ is the average contribution of the other members in the group in t−1; ‘MTurk’ is a dummy for the MTurk sample; demographic controls are the same as in Table 6. Robust standard errors clustered on groups for the Tobit model
* p < 0.1; ** p < 0.05; *** p < 0.01