Ron Kohavi, Diane Tang, Ya Xu, Lars G Hemkens, John P A Ioannidis.
Abstract
BACKGROUND: Many technology companies, including Airbnb, Amazon, Booking.com, eBay, Facebook, Google, LinkedIn, Lyft, Microsoft, Netflix, Twitter, Uber, and Yahoo!/Oath, run online randomized controlled experiments, commonly referred to as A/B tests, at scale: hundreds of concurrent controlled experiments on millions of users each. Although they derive from the same statistical roots, randomized controlled trials (RCTs) in medicine are now criticized for being expensive and difficult, while in technology the marginal cost of such experiments is approaching zero and their value for data-driven decision-making is broadly recognized.
Keywords: A/B tests; Healthcare decision-making; Online experiments; Randomization; Trials
Year: 2020 PMID: 32033614 PMCID: PMC7007661 DOI: 10.1186/s13063-020-4084-y
Source DB: PubMed Journal: Trials ISSN: 1745-6215 Impact factor: 2.279
Example: optimizing after-visit summaries
| In the online space, we learned that small changes ranging from making the website faster to changing font colors can meaningfully affect how a user interacts with a product or service, dramatically impacting key metrics, including revenue [ | |
| In medicine, with the growing use of electronic health records, after-visit summaries (AVS) are increasingly used to provide patients with relevant and actionable information, similar to traditional patient handouts, with the goal of increasing patient compliance and understanding. | |
| Given that goal, several questions arise: | |
| • What channel should the AVS use (e.g., paper letter, email, mobile notification) to increase patient engagement? | |
| • When should the summary be sent? Is there a time of day or day of week (e.g., Friday) when the patient is more likely to engage with the AVS? | |
| • What text in the message might motivate patients to follow the link? Can we test how to reduce the friction of getting a user to sign in and view the AVS once they click on a link? How can we reduce the steps required to see the summary? | |
| • In the AVS itself, how is the information presented? Do some layouts improve engagement? Should we present checklists? Reminders? Offer tools (e.g., mobile apps) that can help with compliance? | |
| • There is an increasing focus on the importance of social determinants of health outcomes, so what can we do in terms of sharing the visit summaries with caretakers, be it family members or friends? | |
| Similar types of questions can be asked throughout the medical system, and these are exactly the types of questions that online controlled experiments are designed for and already used to answer [ |
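As an illustration of how one of the questions above (which channel to use) could be set up as an online controlled experiment, here is a minimal sketch in Python. It is not taken from the article: the variant names, the experiment label, the hash-based assignment, and the choice of a pooled two-proportion z-test are all assumptions made for the example, and the counts in the usage lines are invented purely to show the call.

```python
import hashlib
import math

# Hypothetical variant names for the AVS delivery channel (assumption).
VARIANTS = ("paper_letter", "email", "mobile_notification")


def assign_variant(patient_id: str, experiment: str, variants=VARIANTS) -> str:
    """Deterministically map a patient to one variant via a salted hash.

    The same patient always gets the same variant within an experiment,
    while different experiment labels reshuffle the assignment.
    """
    digest = hashlib.sha256(f"{experiment}:{patient_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are all 0 or all 1")
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Usage with made-up numbers: patients who opened the AVS, per channel.
print(assign_variant("patient-0001", "avs-channel-test"))
z, p = two_proportion_z_test(success_a=100, n_a=1000, success_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Hashing the patient identifier together with an experiment label keeps assignment stable across visits without storing a lookup table, which is one common design choice; a real deployment would also need to handle consent, multiple comparisons across more than two variants, and the quality checks discussed later.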
Fig. 1 Experimentation growth over the years since experimentation operated at a scale of over one new experiment per day
Lessons learned
| • The philosophy of ‘test everything with controlled experiments’, i.e., the consistent and systematic implementation and integration of evaluation into the entire development and application of treatments and innovations, is equivalent to the ‘randomize the first patient’ principle in medicine, which was introduced more than 40 years ago. However, this principle has met much more resistance in medicine | |
| • Technological advances and the availability of large-scale data make it tempting to abandon randomized trials, while randomization is precisely what has turned out to be so useful for the most successful technology companies | |
| • Rather than hindering innovation, randomized trials fostered improvements to products and revenue | |
| • The most innovative technological field has recognized that systematic series of randomized trials with numerous failures of the most promising ideas leads to sustainable improvement | |
| • Various parallels exist in the application of randomization, including the importance of selecting the best evaluation criteria (outcome measures) | |
| • Even tiny changes should ideally undergo continuous and repeated evaluations in randomized trials, and learning from their results may also be indispensable for healthcare improvement |
Methodological issues that can be overcome in online experiments to date and are difficult in traditional medical RCTs, but are potentially relevant in future large-scale medical RCTs
| There are usually many quality checks that are feasible in the online space with large-sample A/B tests. Here are a few examples: | |
| • | |
| • | |
| • |