Patrick R Heck, Christopher F Chabris, Duncan J Watts, Michelle N Meyer.
Abstract
We resolve a controversy over two competing hypotheses about why people object to randomized experiments: 1) People unsurprisingly object to experiments only when they object to a policy or treatment the experiment contains, or 2) people can paradoxically object to experiments even when they approve of implementing either condition for everyone. Using multiple measures of preference and test criteria in five preregistered within-subjects studies with 1,955 participants, we find that people often disapprove of experiments involving randomization despite approving of the policies or treatments to be tested.
Keywords: A/B tests; field experiments; pragmatic trials; randomized controlled trials; research ethics
Year: 2020 PMID: 32719133 PMCID: PMC7430984 DOI: 10.1073/pnas.2009030117
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. (Top) Percentages of participants objecting to implementing policy A, policy B, or running an A/B test (experiment). (Bottom) Mean appropriateness ratings, with SEs, for the A, B, and A/B conditions. “A/B Test (WS)” refers to the A/B condition in the present studies using a within-subjects design; “A/B Test (BS)” refers to previous A/B condition ratings from a between-subjects design (3).
Descriptive and inferential statistics for tests of the A/B Effect and “experiment aversion”
| Scenario | Variable | Mean (SD) rating | Rank: Best | Rank: Worst | Test description | Test outcome |
| --- | --- | --- | --- | --- | --- | --- |
| Hospital Safety Checklist ( | A | 3.85 (1.06) | 26% | 31% | Mean(A,B) vs. Mean(AB) | |
| | B | 4.13 (0.90) | 38% | 24% | Min(A,B) vs. Mean(AB) | |
| | AB | 3.33 (1.39) | 37% | 46% | Mean(A,B) < AB | 53%*** ± 6% |
| | Mean(A,B) | 3.99 (0.78) | | | Min(A,B) < AB | 37%*** ± 6% |
| | Min(A,B) | 3.63 (1.08) | | | AB = 1,2 & A,B = 3,4,5 | 27%*** ± 5% |
| Best Drug: Walk-In Clinic ( | A | 3.96 (1.04) | 22% | 29% | Mean(A,B) vs. Mean(AB) | |
| | B | 3.93 (1.02) | 19% | 35% | Min(A,B) vs. Mean(AB) | |
| | AB | 3.47 (1.40) | 59% | 37% | Mean(A,B) < AB | 43%*** ± 6% |
| | Mean(A,B) | 3.95 (0.99) | | | Min(A,B) < AB | 40%*** ± 6% |
| | Min(A,B) | 3.86 (1.08) | | | AB = 1,2 & A,B = 3,4,5 | 27%*** ± 5% |
| Consumer Genetic Testing ( | A | 4.00 (1.07) | 34% | 26% | Mean(A,B) vs. Mean(AB) | |
| | B | 4.06 (1.08) | 43% | 19% | Min(A,B) vs. Mean(AB) | |
| | AB | 3.17 (1.31) | 23% | 55% | Mean(A,B) < AB | 59%*** ± 5% |
| | Mean(A,B) | 4.03 (0.89) | | | Min(A,B) < AB | 46%*** ± 5% |
| | Min(A,B) | 3.65 (1.13) | | | AB = 1,2 & A,B = 3,4,5 | 27%*** ± 5% |
| Employee Retirement Plans ( | A | 4.12 (1.04) | 37% | 27% | Mean(A,B) vs. Mean(AB) | |
| | B | 4.06 (1.02) | 29% | 25% | Min(A,B) vs. Mean(AB) | |
| | AB | 3.36 (1.35) | 34% | 49% | Mean(A,B) < AB | 53%*** ± 5% |
| | Mean(A,B) | 4.09 (0.88) | | | Min(A,B) < AB | 42%*** ± 5% |
| | Min(A,B) | 3.77 (1.10) | | | AB = 1,2 & A,B = 3,4,5 | 26%*** ± 4% |
| Autonomous Vehicles ( | A | 3.67 (1.18) | 24% | 44% | Mean(A,B) vs. Mean(AB) | |
| | B | 3.94 (1.10) | 34% | 26% | Min(A,B) vs. Mean(AB) | |
| | AB | 3.62 (1.41) | 42% | 31% | Mean(A,B) < AB | 40%*** ± 4% |
| | Mean(A,B) | 3.80 (0.87) | | | Min(A,B) < AB | 22%*** ± 4% |
| | Min(A,B) | 3.31 (1.16) | | | AB = 1,2 & A,B = 3,4,5 | 16%*** ± 3% |
The “Scenario” column lists vignettes and sample sizes for studies 1–5. The next four columns display each study’s descriptive results. The last two columns report five hypothesis tests for each study, each assessing a different criterion for the A/B Effect (ABE). The first test evaluates the original criterion (3) and was always preregistered as confirmatory. The second test evaluates Mislavsky et al.’s proposed criterion (9). The remaining tests compare the observed percentage of participants meeting the stated criterion against a null hypothesis of zero, which would indicate no “experiment aversion.” For the fifth test, “AB = 1,2” indicates a rating of very or somewhat inappropriate, and “A,B = 3,4,5” indicates a rating that is not explicitly inappropriate. Symbols denote statistical significance for ABE: ***P < 0.001; **P = 0.001, *P = 0.01.
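The three percentage-based criteria in the table can be illustrated with a minimal sketch. It assumes each participant gives 1–5 appropriateness ratings for policy A, policy B, and the A/B test, and that a participant counts toward a criterion when the A/B test is rated as less appropriate than the policies (the direction suggested by the mean ratings in the table). The function name `abe_criteria`, the dictionary keys, and the sample ratings are hypothetical and are not taken from the studies.

```python
from statistics import mean


def abe_criteria(ratings):
    """Return the share of participants meeting each criterion.

    ratings: list of (a, b, ab) tuples, each a 1-5 appropriateness rating
    for policy A, policy B, and the A/B test (hypothetical data layout).
    """
    n = len(ratings)
    # A/B test rated below the mean of the two policy ratings.
    below_mean = sum(ab < mean((a, b)) for a, b, ab in ratings)
    # A/B test rated below the lower of the two policy ratings.
    below_min = sum(ab < min(a, b) for a, b, ab in ratings)
    # A/B test rated 1 or 2 while both policies are rated 3, 4, or 5.
    strict = sum(ab <= 2 and a >= 3 and b >= 3 for a, b, ab in ratings)
    return {
        "below_mean": below_mean / n,
        "below_min": below_min / n,
        "strict": strict / n,
    }


# Made-up ratings for four participants, for illustration only.
sample = [(4, 5, 2), (3, 4, 3), (5, 5, 5), (2, 4, 3)]
result = abe_criteria(sample)
print(result)
```

Under these assumptions, each proportion is what would then be compared against the null hypothesis of zero described above; the sketch does not attempt to reproduce the reported significance tests or margins.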