Anne-Laure Boulesteix, Rory Wilson, Alexander Hapfelmeier.
Abstract
BACKGROUND: The goal of medical research is to develop interventions that are in some sense superior, with respect to patient outcome, to interventions currently in use. Similarly, the goal of research in methodological computational statistics is to develop data analysis tools that are themselves superior to the existing tools. The methodology of the evaluation of medical interventions continues to be discussed extensively in the literature and it is now well accepted that medicine should be at least partly "evidence-based". Although we statisticians are convinced of the importance of unbiased, well-thought-out study designs and evidence-based approaches in the context of clinical research, we tend to ignore these principles when designing our own studies for evaluating statistical methods in the context of our methodological research.

MAIN MESSAGE: In this paper, we draw an analogy between clinical trials and real-data-based benchmarking experiments in methodological statistical science, with datasets playing the role of patients and methods playing the role of medical interventions. Through this analogy, we suggest directions for improvement in the design and interpretation of studies which use real data to evaluate statistical methods, in particular with respect to dataset inclusion criteria and the reduction of various forms of bias. More generally, we discuss the concept of "evidence-based" statistical research, its limitations and its impact on the design and interpretation of real-data-based benchmark experiments.
Keywords: Clinical trial; Comparison study; Good practice; Method evaluation
Year: 2017 PMID: 28888225 PMCID: PMC5591542 DOI: 10.1186/s12874-017-0417-2
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Analogy between clinical research and computational statistical research
| | Clinical research | Computational statistical research |
|---|---|---|
| Trial type | In vitro/animal study | Simulation |
| | Clinical trial | Benchmark study |
| | Blinded | Neutral and blind analysis |
| | (Placebo) controlled | (Null-model) controlled |
| | Cross-over | Paired samples |
| | Multi-arm | Multiple methods |
| Investigators | Trialist | Researcher conducting benchmark experiment |
| | Medical researcher | Methodological researcher in computational statistics |
| | Sponsor | Methodological researcher in computational statistics |
| Observation unit | Clinical trial patients | Real datasets |
| Comparators | Therapies, interventions and controls | Statistical and machine learning methods |
| Problem | Treatment of medical condition | Answering a question using data, e.g. prediction problem |
| Context | Patient’s preference, social context | Substantive context |
| | Personalized medicine | Meta-learning |
| Objectives | Improving patient’s health | Yielding reliable answer, e.g. increasing prediction performance |
| | Selecting and applying therapy to patient | Selecting and applying methods to datasets |
| | Application by medical practitioner | Application by statistical practitioner/consultant |
| Endpoints | Relevant clinical endpoints | Error rate, AUC, computing time, etc. |
| | Missing value (e.g. dropout) | Failure to produce output |
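The analogy in the table above can be made concrete with a minimal sketch of a paired, null-model-controlled benchmark. Everything in it is an illustrative assumption rather than the authors' procedure: the two tiny hand-written datasets stand in for real datasets, `threshold_rule` is a hypothetical method under evaluation, and `null_model` plays the role of the placebo arm. Every method is applied to every dataset (paired samples), and a method that fails to produce output is recorded as a missing value instead of being silently dropped.

```python
from statistics import mean

def null_model(train, test):
    """Control arm: always predict the majority class of the training labels."""
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return [majority for _ in test]

def threshold_rule(train, test):
    """Hypothetical method: split at the midpoint between the two class means."""
    xs0 = [x for x, y in train if y == 0]
    xs1 = [x for x, y in train if y == 1]
    if not xs0 or not xs1:
        raise ValueError("a class is missing from the training data")
    cut = (mean(xs0) + mean(xs1)) / 2
    sign = 1 if mean(xs1) > mean(xs0) else -1
    return [1 if sign * (x - cut) > 0 else 0 for x, _ in test]

def error_rate(pred, test):
    """Endpoint: proportion of misclassified test observations."""
    return sum(p != y for p, (_, y) in zip(pred, test)) / len(test)

def run_benchmark(datasets, methods):
    """Apply every method to every dataset; None marks a failure to produce output."""
    results = {}
    for dname, (train, test) in datasets.items():
        row = {}
        for mname, method in methods.items():
            try:
                row[mname] = error_rate(method(train, test), test)
            except Exception:
                row[mname] = None  # analogue of a dropout / missing value
        results[dname] = row
    return results

# Toy stand-ins for real datasets: (training set, test set) of (x, label) pairs.
datasets = {
    "separable": (
        [(0.0, 0), (0.2, 0), (0.1, 0), (1.0, 1), (1.2, 1)],
        [(0.1, 0), (1.1, 1), (0.9, 1), (0.3, 0)],
    ),
    "one_class": (
        [(0.0, 0), (0.5, 0)],
        [(0.2, 0), (0.9, 1)],
    ),
}
methods = {"null_model": null_model, "threshold_rule": threshold_rule}

results = run_benchmark(datasets, methods)

# Paired differences (method minus control) on datasets where both produced output.
paired_diffs = {
    d: r["threshold_rule"] - r["null_model"]
    for d, r in results.items()
    if r["threshold_rule"] is not None and r["null_model"] is not None
}
n_failures = sum(r["threshold_rule"] is None for r in results.values())
```

Reporting `n_failures` alongside `paired_diffs` keeps the missing values visible, so a reader can judge whether excluding the failed datasets could bias the comparison.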
Some ideas for the improvement of benchmarking practice
| Clinical research | Treated in | Transfer into computational statistical research? | Example(s) |
|---|---|---|---|
| Sample size calculation | [ | Possible and desirable [ | [ |
| Strict inclusion criteria | Sec. 3 | Possible and desirable | [ |
| Trial protocol | Sec. 4.1 | Principle might be helpful in adapted form | |
| Quality control | Sec. 4.2 | Principle might be helpful in adapted form, e.g. via platforms like OpenML [ | |
| Placebo/reference | Sec. 4.3 | Principle might be helpful in adapted form | |
| Blinding | Sec. 4.4.1 | Principle might be helpful in adapted form | |
| Intention-to-treat | Sec. 4.4.2 | Adequate treatment and reporting of missing values: possible and desirable | [ |
| Levels of evidence | Sec. 4.5 | Principle might be helpful in adapted form | |
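As an illustration of the first row, the number of datasets needed for a two-method comparison can be approximated with the standard normal-approximation formula for a two-sided paired test, n = ((z_{1-α/2} + z_{1-β}) · σ / δ)². This is a minimal sketch under stated assumptions, not the calculation used in the paper: `delta` (the smallest difference in mean error rate worth detecting) and `sd_diff` (the standard deviation of the per-dataset differences) are hypothetical inputs that would have to be justified in advance, and the function name is chosen here for illustration only.

```python
import math
from statistics import NormalDist

def required_datasets(delta, sd_diff, alpha=0.05, power=0.8):
    """Approximate number of datasets for a two-sided paired z-test.

    delta   : smallest mean performance difference worth detecting (assumed)
    sd_diff : standard deviation of per-dataset differences (assumed)
    """
    if delta <= 0 or sd_diff <= 0:
        raise ValueError("delta and sd_diff must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return math.ceil(((z_alpha + z_beta) * sd_diff / delta) ** 2)

# Hypothetical planning values: detect a 0.02 difference in error rate
# when per-dataset differences have a standard deviation of 0.05.
n_needed = required_datasets(delta=0.02, sd_diff=0.05)
```

With these hypothetical values the formula asks for 50 datasets; halving `delta` roughly quadruples the requirement, which is why comparisons based on a handful of datasets can rarely support claims about small differences.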
Fig. 1 Evidence pyramid. Suggested levels of evidence for results of benchmark studies designed for the comparison of statistical methods using real data. A neutral study is conducted by researchers who do not have a preference for any particular method and are (at least as a collective) approximately equally experienced with each of the considered methods. A non-neutral study is one in which the researchers have a potential conscious or subconscious interest in demonstrating the superiority of a given method (the “preferred method”), or have greater experience with one or more of the methods (again, the “preferred method”), to the extent that it may bias the results. A non-preferred method is any statistical method in a non-neutral study other than the preferred method(s). Bias in non-neutral studies can favor preferred methods, disadvantage non-preferred methods, or both.