| Literature DB >> 30652356 |
Tim P Morris, Ian R White, Michael J Crowther.
Abstract
Simulation studies are computer experiments that involve creating data by pseudo-random sampling. A key strength of simulation studies is the ability to understand the behavior of statistical methods because some "truth" (usually some parameter/s of interest) is known from the process of generating the data. This allows us to consider properties of methods, such as bias. While widely used, simulation studies are often poorly designed, analyzed, and reported. This tutorial outlines the rationale for using simulation studies and offers guidance for design, execution, analysis, reporting, and presentation. In particular, this tutorial provides a structured approach for planning and reporting simulation studies, which involves defining aims, data-generating mechanisms, estimands, methods, and performance measures ("ADEMP"); coherent terminology for simulation studies; guidance on coding simulation studies; a critical discussion of key performance measures and their estimation; guidance on structuring tabular and graphical presentation of results; and new graphical presentations. With a view to describing recent practice, we review 100 articles taken from Volume 34 of Statistics in Medicine, which included at least one simulation study and identify areas for improvement.Entities:
Keywords: Monte Carlo; graphics for simulation; simulation design; simulation reporting; simulation studies
Year: 2019 PMID: 30652356 PMCID: PMC6492164 DOI: 10.1002/sim.8086
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.497
Key steps and decisions in the planning, coding, analysis and reporting of simulation studies
| Section | Key steps and decisions |
|---|---|
| Aims | · Identify the specific aims of the simulation study. |
| Data‐generating mechanisms | · In relation to the aims, decide whether to use resampling or simulation from some parametric model. |
| | · For simulation from a parametric model, decide how simple or complex the model should be and whether it should be based on real data. |
| | · Determine what factors to vary and the levels of factors to use. |
| | · Decide whether factors should be varied fully factorially, partly factorially, or one‐at‐a‐time. |
| Estimand/target of analysis | · Define estimands and/or other targets of the simulation study. |
| Methods | · Identify methods to be evaluated and consider whether they are appropriate for the estimand/target identified. |
| | · For method‐comparison studies, make a careful review of the literature to ensure inclusion of relevant methods. |
| Performance measures | · List all performance measures to be estimated, justifying their relevance to estimands or other targets. |
| | · For less‐used performance measures, give explicit formulae for the avoidance of ambiguity. |
| | · Choose a value of n_sim that achieves acceptable Monte Carlo SE for key performance measures. |
| Coding and execution | · Separate scripts used to analyze simulated datasets from scripts used to analyze estimates datasets. |
| | · Start small and build up code, including plenty of checks. |
| | · Set the random number seed once per simulation. |
| | · Store the random number states at the start of each repetition. |
| | · If running chunks of the simulation in parallel, use separate streams of random numbers. |
| Analysis | · Conduct exploratory analysis of results, particularly graphical exploration. |
| | · Compute estimates of performance and Monte Carlo SEs for these estimates. |
| Reporting | · Describe the simulation study using the ADEMP structure, with sufficient rationale for choices. |
| | · Structure graphical and tabular presentations to place the performance of competing methods side‐by‐side. |
| | · Include Monte Carlo SEs as estimates of simulation uncertainty. |
| | · Publish code to execute the simulation study, including user‐written routines. |
Figure A1 Reviewer agreement on key variables for Statistics in Medicine Volume 34 review. Frequency of agreement of TPM with IRW (marker W) and MJC (marker C). For the same frequency, C is nudged left and W right to avoid visual clash.
Description of notation
| Symbol | Description |
|---|---|
| θ | An estimand (conceptually); also the true value of the estimand |
| n_obs | Sample size of a simulated dataset |
| n_sim | Number of repetitions used; the simulation sample size |
| i | Indexes the repetitions of the simulation, i = 1, …, n_sim |
| θ̂ | The estimator of θ |
| θ̂_i | The estimate of θ in the ith repetition |
| θ̄ | The mean of θ̂_i across repetitions |
| Var(θ̂) | The true variance of θ̂ |
| Var̂(θ̂_i) | An estimate of Var(θ̂) computed in the ith repetition (the squared model SE) |
| α | The nominal significance level |
| p_i | The p‐value returned by the ith repetition |
Figure A2 Results of Statistics in Medicine Volume 34 review for data‐generating mechanisms. Values are both frequency and %
Possible targets of a simulation study and relevant performance measures
| Statistical task | Target | Examples of performance measures | Example |
|---|---|---|---|
| Estimation | Estimand | Bias, empirical SE, mean‐squared error, coverage | Kuss compares a number of existing methods in terms of bias, power, and coverage. |
| Testing | Null hypothesis | Type I error rate, power | Chaurasia and Harel compare new methods in terms of type I and II error rates. |
| Model selection | Model | Correct model rate, sensitivity or specificity for covariate selection | Wu et al compare four new methods in terms of "true positive" and "false positive" rates of covariate selection. |
| Prediction | Prediction/s | Measures of predictive accuracy, calibration, discrimination | Ferrante compares four methods in terms of mean absolute prediction error, etc. |
| Design a study | Selected design | Sample size, expected sample size, power/precision | Zhang compares designs across multiple data‐generating mechanisms in terms of the number of significant test results (described as "gain") and the frequency of achieving the (near) optimal design. |
Figure A3 Results of Statistics in Medicine Volume 34 review for estimands (A) and methods (B) evaluated
Performance measures evaluated in review of Volume 34, by primary target (frequency (and %))

| Performance measure | Estimand | Null hypothesis | Selected model | Predictive performance | Other |
|---|---|---|---|---|---|
| Convergence | 10/61 (16%) | 1/15 (7%) | 1/6 (17%) | 0/2 | 0/1 |
| Bias | 59/64 (92%) | 1/9 (11%) | 0/2 | 2/3 | 1/2 |
| Empirical SE | 31/62 (50%) | 0/9 | 0/2 | 0/3 | 0/2 |
| Mean squared error | 22/62 (35%) | 2/9 (22%) | 0/2 | 1/3 | 1/2 |
| Model SE | 21/62 (34%) | 1/9 (11%) | 0/2 | 0/3 | 0/1 |
| Type I error | 8/62 (13%) | 18/21 (86%) | 4/6 | 0/3 | 1/3 |
| Power | 8/63 (13%) | 14/20 (70%) | 4/6 | 0/3 | 2/3 |
| Coverage | 39/63 (62%) | 1/9 (11%) | 0/2 | 1/3 | 1/2 |
| Conf. int. length | 9/63 (14%) | 0/10 | 0/2 | 1/3 | 1/2 |
Note: Denominator changes across performance measures because not all are applicable in all simulation studies.
The different datasets that may be involved in a simulation study

| Dataset | Description and notes |
|---|---|
| Simulated | A dataset of size n_obs generated by a data‐generating mechanism, to which one or more methods are applied to produce some quantity relating to the target (eg, θ̂_i). |
| Estimates | Dataset containing the quantities produced across repetitions (eg, θ̂_i and its model SE*), with one row per repetition for each combination of data‐generating mechanism, method, and target (eg, each estimand). |
| States | Dataset containing the random‐number start state for each simulated dataset and the final state (see Section …). |
| Performance measures | Dataset containing estimated performance and Monte Carlo standard errors for each data‐generating mechanism, method, and target. |

*Or corresponding summaries for nonestimand targets.
Software mentioned in simulation reports, review of Statistics in Medicine Volume 34. Note that there are more than 100 entries as some articles reported more than one package
| Software | Freq. |
|---|---|
| (not stated) | 38 |
| C | 1 |
| — | 1 |
| — | 1 |
| R | 41 |
| SAS | 17 |
| Sa… | 1 |
| Stata | 4 |
| Stat… | 1 |
| WinBUGS | 3 |
Figure 5 "Zip plot" of the 1600 confidence intervals for each data‐generating mechanism and analysis method. The vertical axis is the fractional centile of |z| associated with the confidence interval.
Performance measures: definitions, estimates and Monte Carlo standard errors
| Performance measure | Definition | Estimate | Monte Carlo SE of estimate |
|---|---|---|---|
| Bias | E(θ̂) − θ | (1/n_sim) Σ_i θ̂_i − θ | √[ Σ_i (θ̂_i − θ̄)² / (n_sim(n_sim − 1)) ] |
| EmpSE | √Var(θ̂) | √[ Σ_i (θ̂_i − θ̄)² / (n_sim − 1) ] | EmpSE / √[2(n_sim − 1)] |
| Relative % increase in precision (B vs A) | 100 [ Var(θ̂_A) / Var(θ̂_B) − 1 ] | 100 [ (EmpSE_A / EmpSE_B)² − 1 ] | ≈ 200 (EmpSE_A / EmpSE_B)² √[ (1 − ρ̂²) / (n_sim − 1) ], where ρ̂ is the correlation of θ̂_A,i and θ̂_B,i |
| MSE | E[(θ̂ − θ)²] | (1/n_sim) Σ_i (θ̂_i − θ)² | √[ Σ_i ((θ̂_i − θ)² − MSE)² / (n_sim(n_sim − 1)) ] |
| Average ModSE | √E[Var̂(θ̂)] | √[ (1/n_sim) Σ_i Var̂(θ̂_i) ] | ≈ √[ Var(Var̂(θ̂_i)) / (4 n_sim ModSE²) ] |
| Relative % error in ModSE | 100 [ ModSE / EmpSE − 1 ] | 100 [ ModSE / EmpSE − 1 ], using the estimates above | ≈ 100 (ModSE / EmpSE) √[ Var(Var̂(θ̂_i)) / (4 n_sim ModSE⁴) + 1 / (2(n_sim − 1)) ] |
| Coverage | Pr(θ̂_low ≤ θ ≤ θ̂_upp) | (1/n_sim) Σ_i 1(θ̂_low,i ≤ θ ≤ θ̂_upp,i) | √[ Coverage (1 − Coverage) / n_sim ] |
| Bias‐eliminated coverage | Pr(θ̂_low ≤ θ̄ ≤ θ̂_upp) | (1/n_sim) Σ_i 1(θ̂_low,i ≤ θ̄ ≤ θ̂_upp,i) | √[ Coverage (1 − Coverage) / n_sim ] |
| Rejection % (power or type I error) | Pr(p ≤ α) | (1/n_sim) Σ_i 1(p_i ≤ α) | √[ Rejection% (1 − Rejection%) / n_sim ] |

Monte Carlo SEs are approximate for relative % increase in precision, average ModSE, and relative % error in ModSE.
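As a worked illustration, a few of the estimators in the table above (bias, empirical SE, MSE, and coverage, each with its Monte Carlo SE) can be computed from an estimates dataset as follows. This is a sketch, not code from the article; the input format, a list of dicts with hypothetical `theta_hat` and `se` keys, is an assumption for the example.

```python
import math

def performance(estimates, theta, z=1.96):
    """Estimate bias, empirical SE, MSE, and coverage (nominal 95% intervals
    via theta_hat +/- z*se), each with its Monte Carlo SE, following the
    formulae tabulated above."""
    n = len(estimates)
    th = [e["theta_hat"] for e in estimates]
    theta_bar = sum(th) / n                                   # mean of estimates
    bias = theta_bar - theta
    emp_se = math.sqrt(sum((t - theta_bar) ** 2 for t in th) / (n - 1))
    mse = sum((t - theta) ** 2 for t in th) / n
    covered = sum(1 for e in estimates
                  if e["theta_hat"] - z * e["se"] <= theta <= e["theta_hat"] + z * e["se"])
    coverage = covered / n
    return {
        "bias": bias,
        "bias_mcse": emp_se / math.sqrt(n),
        "emp_se": emp_se,
        "emp_se_mcse": emp_se / math.sqrt(2 * (n - 1)),
        "mse": mse,
        "mse_mcse": math.sqrt(sum(((t - theta) ** 2 - mse) ** 2 for t in th)
                              / (n * (n - 1))),
        "coverage": coverage,
        "coverage_mcse": math.sqrt(coverage * (1 - coverage) / n),
    }
```

The Monte Carlo SEs quantify simulation uncertainty and should be reported alongside the performance estimates, as the tables below do.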
Figure 1 The impacts of bias and empirical SE on root MSE and coverage of nominal 95% confidence intervals, compared for three methods: method A is unbiased but imprecise; method B is biased (independently of n_obs) and more precise; method C is biased (with bias that diminishes as n_obs increases) and has the same precision as method B. The comparison of root MSE and coverage depends on the choice of n_obs; the constant bias of method B gives it increasingly poor MSE and coverage as n_obs increases.
Coverage conditional on size of ModSE
| Approach | Repetitions | Coverage (Monte Carlo SE) |
|---|---|---|
| All observations | 30,000 | 95.0% (0.1%) |
| Conditional: ModSE in highest third | 10,000 | 98.0% (0.1%) |
| Conditional: ModSE in middle third | 10,000 | 95.5% (0.2%) |
| Conditional: ModSE in lowest third | 10,000 | 91.5% (0.3%) |
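The conditional coverage shown in the table above can be computed by ordering repetitions by their model SE and splitting them into thirds. A sketch, again assuming an estimates dataset given as a list of dicts with hypothetical `theta_hat` and `se` keys:

```python
def coverage_by_modse_third(estimates, theta, z=1.96):
    """Coverage conditional on the size of the model SE: order repetitions
    by model SE, split into thirds, and compute the proportion of nominal
    95% intervals (theta_hat +/- z*se) containing theta within each third."""
    ordered = sorted(estimates, key=lambda e: e["se"])
    n = len(ordered)
    thirds = {"lowest": ordered[: n // 3],
              "middle": ordered[n // 3 : 2 * n // 3],
              "highest": ordered[2 * n // 3 :]}
    return {
        label: sum(1 for e in chunk
                   if e["theta_hat"] - z * e["se"] <= theta <= e["theta_hat"] + z * e["se"])
               / len(chunk)
        for label, chunk in thirds.items()
    }
```

A pattern like the one tabulated (over-coverage where the model SE is largest, under-coverage where it is smallest) suggests the model SE tracks the sampling variability poorly even when overall coverage looks nominal.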
Figure 2 Visualisation of the true hazard rate over follow‐up time in the two data‐generating mechanisms. Black (flat) lines are for the first data‐generating mechanism, where γ = 1; red curves are for the second, where γ = 1.5.
Figure 3 Plot of the 1600 estimates (left panels) and their standard errors (right panels) by data‐generating mechanism, for the three analysis methods. The vertical axis is repetition number, to provide some separation between points. The yellow pipes are sample means.
Figure 4 Comparison of estimates across methods when γ = 1.5, where each point represents one repetition. A, Pairwise scatterplots of estimates (upper and lower triangles); B, plot of difference vs mean for pairs of methods, with Weibull as the comparator.
Figure 6Lollipop plot of performance for measures of interest (Monte Carlo 95% confidence intervals in parentheses). Concerning features need not be highlighted since they are readily visible. See, also, Table 8
Estimates of performance for measures of interest (Monte Carlo SEs in parentheses). Concerning results are highlighted in bold. See also Figure 6.
| Performance measure | Data‐generating mechanism | Exponential | Weibull | Cox |
|---|---|---|---|---|
| Bias | γ = 1 | −0.003 (0.005) | −0.003 (0.005) | −0.002 (0.005) |
| | γ = 1.5 | … (0.003) | 0.005 (0.004) | 0.006 (0.004) |
| Coverage | γ = 1 | 95.4% (0.5) | 95.4% (0.5) | 95.4% (0.5) |
| | γ = 1.5 | 96.0% (0.5) | 95.6% (0.5) | 95.8% (0.5) |
| Bias‐eliminated coverage | γ = 1 | 95.6% (0.5) | 95.3% (0.5) | 95.4% (0.5) |
| | γ = 1.5 | … (0.4) | 95.7% (0.5) | 96.1% (0.5) |
| Empirical SE | γ = 1 | 0.209 (0.004) | 0.209 (0.004) | 0.209 (0.004) |
| | γ = 1.5 | 0.138 (0.002) | 0.152 (0.003) | 0.151 (0.003) |
| Relative precision gain vs Weibull | γ = 1 | 0.2% (0.1) | 0 (–) | 0.3% (0.1) |
| | γ = 1.5 | **20.5%** (0.4) | 0 (–) | 0.6% (0.2) |
| Model SE | γ = 1 | 0.208 (<0.001) | 0.208 (<0.001) | 0.208 (<0.001) |
| | γ = 1.5 | 0.154 (<0.001) | 0.154 (<0.001) | 0.154 (<0.001) |
| Relative error in Model SE | γ = 1 | −0.7% (1.8) | −0.7% (1.8) | −0.5% (1.8) |
| | γ = 1.5 | … (2.0) | 1.7% (1.8) | 2.1% (1.8) |