Anne-Laure Boulesteix, Rolf HH Groenwold, Michal Abrahamowicz, Harald Binder, Matthias Briel, Roman Hornung, Tim P Morris, Jörg Rahnenführer, Willi Sauerbrei.
Abstract
In health research, statistical methods are frequently used to address a wide variety of research questions. For almost every analytical challenge, different methods are available. But how do we choose between different methods, and how do we judge whether the chosen method is appropriate for our specific study? As in any science, experiments can be run in statistics to find out which methods should be used under which circumstances. The main objective of this paper is to demonstrate that simulation studies, that is, experiments investigating synthetic data with known properties, are an invaluable tool for addressing these questions. We aim to provide a first introduction to simulation studies for data analysts or, more generally, for researchers involved at different levels in the analysis of health data, who (1) may rely on simulation studies published in the statistical literature to choose their statistical methods and who, thus, need to understand the criteria for assessing the validity and relevance of simulation results and their interpretation; and/or (2) need to understand the basic principles of designing statistical simulations in order to collaborate efficiently with more experienced colleagues or to start learning to conduct their own simulations. We illustrate the implementation of a simulation study and the interpretation of its results through a simple example inspired by recent literature, which is completely reproducible using the R script available from online supplemental file 1.
© Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY-NC. No commercial re-use. Published by BMJ.
Keywords: epidemiology; protocols & guidelines; statistics & research methods
Year: 2020 PMID: 33318113 PMCID: PMC7737058 DOI: 10.1136/bmjopen-2020-039921
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Overview of the main criteria for evaluating statistical methods in the four examples considered
| Example | Evaluation criterion | Target value |
|---|---|---|
| A: testing and CI | Type 1 error | Close to and not greater than nominal value α |
| | Type 2 error | Low |
| | Coverage of (1–α) CI | Close to and not lower than nominal value 1–α |
| B: explaining | Mean coefficient values | Close to true values (low bias) |
| | Precision of coefficient estimation | High (low variance) |
| | Coverage of CI | Close to and not lower than nominal value 1–α |
| | Sensitivity of variable selection | High |
| | Specificity of variable selection | High |
| C: predicting | Prediction error on independent data | Low |
| | Accuracy measures | High |
| D: clustering | Agreement with true cluster structure | High |
| All settings | Stability | High |
| | Computational cost | Low |
| | Success of the computation (eg, ‘convergence’) | Yes |
The last column indicates the values the evaluation criterion should take if the investigated method performs well.
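Several of these criteria (type 1 and type 2 error and CI coverage for example A; bias and precision for example B) can be estimated directly by simulation, because the true parameter is fixed by design. The following is a minimal Python sketch, not taken from the paper (which uses R). It assumes normally distributed data with a known standard deviation, a (1–α) z-interval for the mean, and a two-sided z-test of H0: μ = 0; the function name and parameters are hypothetical. With `true_mu = 0` the rejection rate estimates the type 1 error; with `true_mu != 0` it estimates power, that is, 1 minus the type 2 error.

```python
import math
import random
import statistics


def simulate_mean_estimator(true_mu, sigma, n, n_sim, alpha=0.05, seed=1):
    """Estimate bias, empirical SE, CI coverage and z-test rejection rate.

    Hypothetical illustration: sigma is assumed known, so the CI and test
    use the standard normal critical value.
    """
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    se = sigma / math.sqrt(n)  # known-sigma standard error of the sample mean
    estimates, covered, rejected = [], 0, 0
    for _ in range(n_sim):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        est = statistics.fmean(sample)
        estimates.append(est)
        # Does the (1 - alpha) z-interval contain the true mean?
        covered += est - z * se <= true_mu <= est + z * se
        # Two-sided z-test of H0: mu = 0
        rejected += abs(est / se) > z
    return {
        "bias": statistics.fmean(estimates) - true_mu,
        "empirical_se": statistics.stdev(estimates),
        "coverage": covered / n_sim,
        "rejection_rate": rejected / n_sim,
    }
```

For example, `simulate_mean_estimator(0.0, 1.0, 30, 2000)` should give a rejection rate close to the nominal 0.05 and a coverage close to 0.95, while a nonzero `true_mu` shows how power grows with the effect size.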
Overview of the key features of a simulation study (first column), illustrated with the NHANES example described in the section "An example of a statistical simulation" (second column)
| Key features of simulation studies | NHANES example |
|---|---|
| Aims | To quantify the impact of measurement error. |
| Data generating mechanism | Take real data, add normally distributed random error to the exposure of interest (HbA1c) and/or the confounder (BMI). |
| Method of analysis | Multivariable linear regression, first on data with no measurement error, then on data with measurement error added. |
| Performance measure | Bias in regression coefficient for exposure of interest (HbA1c). |
| Number of repetitions | 1000 |
This table is inspired by the ‘ADEMP’ system (aims, data-generating mechanisms, estimands, methods and performance measures) previously introduced in the statistical literature.³
BMI, body mass index; HbA1c, glycated haemoglobin; NHANES, National Health and Nutrition Examination Survey.
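The structure summarised in this table can be sketched in code. The following Python sketch is an illustration only, not the authors' R script from online supplemental file 1. Because NHANES data are not included here, it generates a hypothetical stand-in dataset in which BMI confounds the association between HbA1c and systolic blood pressure (all coefficients and the sample size are assumptions). It then follows the steps in the table: fit a multivariable linear regression once on the error-free data to obtain a reference coefficient for HbA1c; add normally distributed error to HbA1c and/or BMI; refit; and report the average bias over the repetitions. All function names are hypothetical.

```python
import random
import statistics


def ols_two_predictors(y, x, z):
    """Least-squares coefficients of x and z in y ~ 1 + x + z (intercept handled by centring)."""
    mx, mz, my = statistics.fmean(x), statistics.fmean(z), statistics.fmean(y)
    xc = [v - mx for v in x]
    zc = [v - mz for v in z]
    yc = [v - my for v in y]
    sxx = sum(a * a for a in xc)
    szz = sum(b * b for b in zc)
    sxz = sum(a * b for a, b in zip(xc, zc))
    sxy = sum(a * c for a, c in zip(xc, yc))
    szy = sum(b * c for b, c in zip(zc, yc))
    det = sxx * szz - sxz * sxz
    # Cramer's rule on the 2x2 normal equations
    return (szz * sxy - sxz * szy) / det, (sxx * szy - sxz * sxy) / det


def make_stand_in_data(n, rng):
    """Hypothetical stand-in for the real data (standardised units, assumed coefficients)."""
    bmi = [rng.gauss(0.0, 1.0) for _ in range(n)]
    hba1c = [0.5 * b + rng.gauss(0.0, 1.0) for b in bmi]
    sbp = [0.3 * h + 1.0 * b + rng.gauss(0.0, 1.0) for h, b in zip(hba1c, bmi)]
    return sbp, hba1c, bmi


def simulate_measurement_error(sbp, hba1c, bmi, sd_exposure, sd_confounder, n_rep=1000, seed=2):
    """Mean bias of the HbA1c coefficient after adding normal error to HbA1c and/or BMI."""
    rng = random.Random(seed)
    reference, _ = ols_two_predictors(sbp, hba1c, bmi)  # fit on data without measurement error
    estimates = []
    for _ in range(n_rep):
        x_obs = [v + rng.gauss(0.0, sd_exposure) for v in hba1c]
        z_obs = [v + rng.gauss(0.0, sd_confounder) for v in bmi]
        est, _ = ols_two_predictors(sbp, x_obs, z_obs)
        estimates.append(est)
    return statistics.fmean(estimates) - reference
```

In this hypothetical setting, error in the confounder (BMI) should push the HbA1c coefficient upwards through residual confounding, whereas error in the exposure should attenuate it towards zero; the exact magnitudes depend entirely on the assumed stand-in data.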
Figure 1. Schematic illustration of the key steps of the example simulation study.
Figure 2. Estimates of the association between HbA1c levels and systolic blood pressure after adjustment for confounding by BMI under various simulation scenarios characterised by different levels of measurement error. Numbers represent effect estimates averaged over 1000 simulation repetitions. Red shading represents low (averaged) estimates. Blue shading represents high (averaged) estimates. CIs are omitted for clarity. See text for details. BMI, body mass index; HbA1c, glycated haemoglobin.
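An average over a finite number of simulation repetitions is itself subject to Monte Carlo error. One common way to quantify this uncertainty is the Monte Carlo standard error of a mean: the empirical standard deviation of the estimates divided by the square root of the number of repetitions. The minimal Python sketch below uses a hypothetical helper name and is not part of the original analysis.

```python
import math
import statistics


def mean_with_mcse(estimates):
    """Return the average of simulation estimates and its Monte Carlo standard error."""
    n_sim = len(estimates)
    mean = statistics.fmean(estimates)
    mcse = statistics.stdev(estimates) / math.sqrt(n_sim)  # SD of estimates / sqrt(repetitions)
    return mean, mcse
```

With 1000 repetitions, the Monte Carlo standard error is about 3% of the empirical standard deviation of the estimates (1/√1000 ≈ 0.032).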