Michelle N Meyer, Patrick R Heck, Geoffrey S Holtzman, Stephen M Anderson, William Cai, Duncan J Watts, Christopher F Chabris.
Abstract
Randomized experiments have enormous potential to improve human welfare in many domains, including healthcare, education, finance, and public policy. However, such "A/B tests" are often criticized on ethical grounds even as similar, untested interventions are implemented without objection. We find robust evidence across 16 studies of 5,873 participants from three diverse populations spanning nine domains (from healthcare to autonomous vehicle design to poverty reduction) that people frequently rate A/B tests designed to establish the comparative effectiveness of two policies or treatments as inappropriate even when universally implementing either A or B, untested, is seen as appropriate. This "A/B effect" is as strong among those with higher educational attainment and science literacy and among relevant professionals. It persists even when there is no reason to prefer A to B and even when recipients are treated unequally and randomly in all conditions (A, B, and A/B). Several remaining explanations appear to contribute to the effect, but none dominates or fully accounts for it: a belief that consent is required to impose a policy on half of a population but not on the entire population; an aversion to controlled but not to uncontrolled experiments; and a proxy form of the illusion of knowledge (according to which randomized evaluations are unnecessary because experts already do or should know "what works"). We conclude that rigorously evaluating policies or treatments via pragmatic randomized trials may provoke greater objection than simply implementing those same policies or treatments untested.
Keywords: A/B tests; field experiments; pragmatic trials; randomized controlled trials; research ethics
Year: 2019 PMID: 31072934 PMCID: PMC6561206 DOI: 10.1073/pnas.1820701116
Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
Fig. 1. Results of safety checklist study and replications (studies 1 and 2). (A) Initial MTurk experiment; (B) direct replication; (C) replication with alternate vignette; (D) replication on Pollfish platform. Responses were made on a five-point scale but are presented here as the percentage of participants who chose "very inappropriate" or "somewhat inappropriate," to reflect the rate of disapproval.
Experiment disapproval observed in multiple domains (study 3)
| Scenario | Condition | n | % objecting | Mean | SD | SEM | A/B effect |
| Genetic Testing | A | 97 | 15.5 | 4.13 | 1.17 | 0.12 | |
| | B | 102 | 8.8 | 4.30 | 1.00 | 0.10 | |
| | A/B | 179 | 21.2 | 3.60 | 1.26 | 0.09 | |
| Autonomous Vehicles | A | 104 | 11.5 | 4.20 | 1.22 | 0.12 | |
| | B | 100 | 19.0 | 3.98 | 1.46 | 0.15 | |
| | A/B | 193 | 28.0 | 3.51 | 1.33 | 0.10 | |
| Retirement Plans | A | 98 | 19.4 | 3.76 | 1.32 | 0.13 | |
| | B | 103 | 19.4 | 3.90 | 1.35 | 0.13 | |
| | A/B | 95 | 36.8 | 3.27 | 1.32 | 0.14 | |
| Health Worker Recruitment | A | 96 | 9.4 | 4.14 | 0.98 | 0.10 | |
| | B | 101 | 9.9 | 4.10 | 1.03 | 0.10 | |
| | A/B | 96 | 16.7 | 3.70 | 1.19 | 0.12 | |
| Poverty Alleviation | A | 96 | 24.0 | 3.57 | 1.25 | 0.13 | |
| | B | 103 | 10.7 | 4.06 | 1.15 | 0.11 | |
| | A/B | 103 | 35.0 | 3.31 | 1.24 | 0.12 | |
| Teacher Well-being | A | 99 | 12.1 | 4.04 | 1.10 | 0.11 | |
| | B | 97 | 27.8 | 3.64 | 1.36 | 0.14 | |
| | A/B | 104 | 31.7 | 3.34 | 1.25 | 0.12 | |
| Basic Income | A | 102 | 20.6 | 3.73 | 1.15 | 0.11 | |
| | B | 106 | 18.9 | 3.75 | 1.14 | 0.11 | |
| | A/B | 96 | 21.9 | 3.64 | 1.21 | 0.12 | |
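The SEM values in the table above appear to be the SD divided by the square root of the group size, where the group size is the first numeric value in each row. The following minimal Python sketch checks that relationship for a handful of rows copied from the table. The names `rows`, `sem`, and `mismatches` are illustrative and do not come from the paper, and the tolerance of 0.005 is an assumption that allows for the two-decimal rounding of the reported figures.

```python
import math

# (scenario, condition, group size, SD, reported SEM), copied from the table.
rows = [
    ("Genetic Testing", "A", 97, 1.17, 0.12),
    ("Genetic Testing", "B", 102, 1.00, 0.10),
    ("Genetic Testing", "A/B", 179, 1.26, 0.09),
    ("Retirement Plans", "A/B", 95, 1.32, 0.14),
    ("Basic Income", "B", 106, 1.14, 0.11),
]


def sem(sd, n):
    """Standard error of the mean for a sample of size n with standard deviation sd."""
    return sd / math.sqrt(n)


# Collect rows whose recomputed SEM differs from the reported one by more
# than the rounding tolerance (assumed here to be half of the last digit).
mismatches = [
    (scenario, condition)
    for scenario, condition, n, sd, reported in rows
    if abs(sem(sd, n) - reported) > 0.005
]

for scenario, condition, n, sd, reported in rows:
    print(f"{scenario} {condition}: recomputed {sem(sd, n):.3f}, reported {reported:.2f}")
print("mismatches:", mismatches)
```

An empty `mismatches` list for these rows supports reading the first numeric value in each row as the group size, although the check covers only a sample of the rows.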
Selected coding results for studies 1, 2, 4, and 5: Percentage of participants in each condition who provided each of four reasons for their appropriateness rating
| Codes received | Condition | |||
| Study 1 (checklist) | Badge | Poster | A/B learn | A/B |
| Inequality | 0 | 0 | 11 | 11 |
| Consent | 0 | 0 | 11 | 7 |
| Experimentation | 0 | 1 | 34 | 22 |
| Randomization | 0 | 0 | 1 | 3 |
| Study 2a (checklist—direct replication) | Badge | Poster | A/B learn | A/B |
| Inequality | 0 | 0 | 17 | 16 |
| Consent | 0 | 0 | 7 | 13 |
| Experimentation | 0 | 0 | 32 | 21 |
| Randomization | 0 | 0 | 8 | 4 |
| Study 4 (drug effectiveness) | Drug A | Drug B | A/B learn | |
| Inequality | 0 | 0 | 4 | |
| Consent | 0 | 0 | 16 | |
| Experimentation | 0 | 0 | 18 | |
| Randomization | 0 | 0 | 6 | |
| Study 5a (drug effectiveness walk-in) | Drug A | Drug B | A/B learn | |
| Inequality | 1 | 0 | 2 | |
| Consent | 0 | 2 | 14 | |
| Experimentation | 0 | 0 | 21 | |
| Randomization | 0 | 0 | 4 | |
Fig. 2. Disapproval of experiments not explained primarily by joint evaluation or aversion to randomization. (A) Study 4, MTurk; (B) study 5, MTurk; (C) study 5, Pollfish.
Fig. 3. Results of safety checklist and drug effectiveness replications in healthcare clinicians (study 6). (A) Safety checklist; (B) drug effectiveness.