Maarten van Smeden, Joris A H de Groot, Karel G M Moons, Gary S Collins, Douglas G Altman, Marinus J C Eijkemans, Johannes B Reitsma.
Abstract
BACKGROUND: Ten events per variable (EPV) is a widely advocated minimal criterion for sample size considerations in logistic regression analysis. Of three previous simulation studies that examined this minimal EPV criterion, only one supports the use of a minimum of 10 EPV. In this paper, we examine the reasons for substantial differences between these extensive simulation studies.
Keywords: Bias; EPV; Logistic regression; Sample size; Separation; Simulations
Year: 2016 PMID: 27881078 PMCID: PMC5122171 DOI: 10.1186/s12874-016-0267-3
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Fig. 1 Graphical representation of separation (complete and quasi-complete), adapted from Albert and Anderson [16]. Sample points for two variables X1 and X2 by outcome (Y): open and filled circles represent the two levels of the outcome (Y = 0 or 1). (i) No separation; (ii) complete separation by variable X2; (iii) complete separation by variables X1 and X2; (iv) quasi-complete separation by variables X1 and X2
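The single-variable case in Fig. 1 can be illustrated with a minimal sketch, assuming a binary outcome coded 0/1 and one numeric covariate that is not constant. The function name `separation_by_variable` is hypothetical and not taken from the paper. Separation by a combination of variables, as in panels (iii) and (iv), would need a more general check (for example a linear program), which is not shown here.

```python
def separation_by_variable(x, y):
    """Classify how a single covariate x separates a 0/1 outcome y.

    Returns 'complete', 'quasi-complete', or 'none'.
    Hypothetical helper for illustration only.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome levels must be present")

    # Complete separation: the two outcome groups do not overlap or touch.
    if max(x0) < min(x1) or max(x1) < min(x0):
        return "complete"

    # Quasi-complete separation: the groups meet only at a shared boundary value.
    if max(x0) == min(x1) or max(x1) == min(x0):
        return "quasi-complete"

    return "none"


print(separation_by_variable([1, 2, 3, 4], [0, 0, 1, 1]))
print(separation_by_variable([1, 2, 2, 3], [0, 0, 1, 1]))
print(separation_by_variable([1, 3, 2, 4], [0, 0, 1, 1]))
```

The three calls correspond to a cleanly split covariate, a covariate with a tie on the boundary, and overlapping groups.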
Design factorial simulation studies Ia to Id
| Factors | Study Ia | Study Ib | Study Ic | Study Id |
|---|---|---|---|---|
| Sample size | | | | |
| EPV (with steps of) | 15 to 150 (5) | 15 to 150 (5) | 6 to 30 (2) | 6 to 30 (2) |
| Outcome prevalence | 1/2 | 1/2 | 1/2, 1/3, 1/4, 1/5, 1/10 | 1/4 |
| Range sample size | 30 to 300 | 60 to 1200 | 24 to 600 | 60 to 300 |
| Effect size | | | | |
| Value of | 1/4, 1/2, 1, 2, 4 | 2, 4 | 2 | 2 |
| Value of | Not applicable | | 2 | 2 |
| Covariates | | | | |
| Number | 1 | 2, 3, 4 | 2 | 2 |
| Distribution | (Multivariate) standard normal | | | |
| Correlation | Not applicable | 0 | 0 | 0.1, 0.15, 0.2, 0.25 |
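Under the usual definition of EPV (number of events divided by number of covariates), the total sample size equals EPV × covariates / prevalence. The sketch below works under that assumption; the function name `sample_size` is illustrative and not from the paper. With the settings listed for studies Ia, Ib, and Ic it gives the sample size ranges 30 to 300, 60 to 1200, and 24 to 600.

```python
from fractions import Fraction


def sample_size(epv, n_covariates, prevalence):
    """Total sample size implied by EPV, assuming EPV = events / covariates."""
    events = epv * n_covariates            # number of events required
    return events / Fraction(prevalence)   # events = prevalence * N


half = Fraction(1, 2)

# Study Ia: one covariate, prevalence 1/2, EPV 15 to 150
print(sample_size(15, 1, half), sample_size(150, 1, half))

# Study Ib: 2 to 4 covariates, prevalence 1/2, EPV 15 to 150
print(sample_size(15, 2, half), sample_size(150, 4, half))

# Study Ic: 2 covariates, prevalence 1/2 down to 1/10, EPV 6 to 30
print(sample_size(6, 2, half), sample_size(30, 2, Fraction(1, 10)))
```

Exact fractions are used so that prevalences such as 1/3 do not introduce rounding error.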
Fig. 2 Results of simulation study Ia. Accuracy as a function of EPV and true value of the log-odds ratio (β1). Left panel: maximum likelihood logistic regression; right panel: Firth’s correction
Fig. 3 Density of estimated coefficients in simulation at EPV = 20 (study Ia) for different true values of the log-odds ratio. Vertical dashed line is the true value of the regression coefficient. Solid line: maximum likelihood logistic regression; dashed line: Firth’s correction
Fig. 4 Relative bias in simulation studies Ib, Ic, and Id. Left panel: maximum likelihood logistic regression; right panel: Firth’s correction
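Firth’s correction, which the figures compare with maximum likelihood, replaces the usual score equation with a penalised one; for logistic regression the modified score is the sum of (y − p + h(1/2 − p))x over observations, where h is the leverage of each observation. The sketch below illustrates this for a one-parameter model logit(p) = βx without intercept. That model is a simplifying assumption for illustration, not the design of the simulation studies, and the names `sigmoid` and `firth_slope` are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def firth_slope(x, y, tol=1e-8, max_iter=100):
    """Firth-penalised estimate of beta in logit(p) = beta * x (no intercept)."""
    beta = 0.0
    for _ in range(max_iter):
        p = [sigmoid(beta * xi) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        info = sum(wi * xi * xi for wi, xi in zip(w, x))  # Fisher information
        # Modified score: each residual is shifted by h_i * (0.5 - p_i),
        # with leverage h_i = w_i * x_i^2 / info in this one-parameter model.
        score = sum(
            xi * (yi - pi + (wi * xi * xi / info) * (0.5 - pi))
            for xi, yi, pi, wi in zip(x, y, p, w)
        )
        step = score / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("no convergence within max_iter iterations")


# Completely separated toy data: x < 0 always gives y = 0, x > 0 always gives y = 1.
x_sep = [-2, -1, 1, 2]
y_sep = [0, 0, 1, 1]
print(firth_slope(x_sep, y_sep))
```

For these separated toy data the maximum likelihood estimate does not exist (the likelihood keeps increasing as β grows), whereas the penalised estimate is finite.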
Results simulation studies Ia to Id
| Study | Ia* and Ib | | | | | | Ic and Id | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EPV | 15 to 30 | | 35 to 50 | | 55 to 150 | | 6 to 10 | | 12 to 18 | | 20 to 30 | |
| Estimator | ML | Firth | ML | Firth | ML | Firth | ML | Firth | ML | Firth | ML | Firth |
| Bias | | | | | | | | | | | | |
| Average bias | 0.084 | 0.002 | 0.038 | 0.001 | 0.016 | 0.000 | 0.069 | 0.002 | 0.033 | 0.000 | 0.020 | 0.000 |
| max | 0.261 | 0.016 | 0.091 | 0.005 | 0.056 | 0.006 | 0.217 | 0.021 | 0.075 | 0.011 | 0.046 | 0.005 |
| min | 0.025 | -0.004 | 0.013 | -0.002 | 0.004 | -0.005 | 0.023 | -0.005 | 0.016 | -0.003 | 0.009 | -0.003 |
| Average relative bias (%) | 7.8 | 0.1 | 3.6 | 0.1 | 1.5 | 0.0 | 8.4 | 0.4 | 4.8 | 0 | 2.9 | 0 |
| max | 18.8 | 1.2 | 6.6 | 0.5 | 4.0 | 0.5 | 31.2 | 3.0 | 10.8 | 1.6 | 6.5 | 0.7 |
| min | 3.5 | -0.5 | 1.9 | -0.3 | 0.5 | -0.7 | 3.3 | -0.7 | 2.3 | -0.5 | 1.3 | -0.0 |
| >+10% relative bias (%) | 18.8 | 0 | 0 | 0 | 0 | 0 | 37.5 | 0 | 3 | 0 | 0 | 0 |
| Coverage 90% CI | | | | | | | | | | | | |
| Average coverage (%) | 90.4 | 90.1 | 90.2 | 90.2 | 90.1 | 90.0 | 90.4 | 90.3 | 90.2 | 90.2 | 90.1 | 90.2 |
| max | 92.9 | 90.8 | 91.1 | 90.7 | 91.0 | 90.7 | 92.1 | 91.2 | 90.8 | 90.6 | 90.9 | 90.8 |
| min | 89.1 | 89.4 | 89.3 | 89.6 | 89.4 | 89.2 | 89.6 | 89.6 | 89.7 | 89.6 | 89.3 | 89.6 |
| > ± 1% nominal (%) | 15.6 | 0 | 3.1 | 0 | 0.6 | 0 | 10 | 2.5 | 0 | 0 | 0 | 0 |
| Average width | 1.102 | 1.059 | 0.752 | 0.738 | 0.487 | 0.483 | 1.183 | 1.133 | 0.828 | 0.811 | 0.653 | 0.646 |
| Mean Square Error | | | | | | | | | | | | |
| Average MSE | 0.160 | 0.118 | 0.063 | 0.055 | 0.025 | 0.024 | 0.169 | 0.125 | 0.070 | 0.062 | 0.042 | 0.039 |
| Separated data sets | | | | | | | | | | | | |
| Total (%) | 0.006 | | 0 | | 0 | | 0.001 | | 0 | | 0 | |
*Only for β1 ≥ log(1)
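The performance measures reported in the table (bias, relative bias, coverage and width of the 90% confidence interval, and mean square error) can be computed from simulation replicates roughly as follows. This is a minimal sketch under stated assumptions: relative bias is taken as 100 × bias / true value, and the interval is a Wald interval (estimate ± 1.645 × standard error), which may differ from the interval type used in the paper. The function name `performance` and the toy inputs are hypothetical.

```python
from statistics import mean

# Standard normal quantile for a two-sided 90% interval (assumes Wald intervals).
Z_90 = 1.6448536269514722


def performance(estimates, std_errors, true_beta):
    """Summarise replicates of one simulation scenario (illustrative helper)."""
    bias = mean(estimates) - true_beta
    # Relative bias is undefined when the true coefficient is 0, i.e. log(1).
    rel_bias = 100.0 * bias / true_beta if true_beta != 0 else None
    mse = mean([(b - true_beta) ** 2 for b in estimates])
    covered = [
        abs(b - true_beta) <= Z_90 * se for b, se in zip(estimates, std_errors)
    ]
    coverage = 100.0 * sum(covered) / len(covered)
    width = mean([2.0 * Z_90 * se for se in std_errors])
    return {
        "bias": bias,
        "relative_bias_pct": rel_bias,
        "mse": mse,
        "coverage_pct": coverage,
        "mean_width": width,
    }


# Three made-up replicates with true coefficient 2.0, for illustration only.
print(performance([2.2, 1.8, 2.1], [0.2, 0.2, 0.05], 2.0))
```

In a real simulation the lists would hold one estimate and one standard error per generated data set.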
Results simulation study IIa, maximum likelihood logistic regression only
| EPV | 15 to 30 | | 35 to 50 | | 55 to 150 | |
|---|---|---|---|---|---|---|
| Separated data removed | Yes | No | Yes | No | Yes | No |
| Bias | | | | | | |
| Average bias | -0.097 | 2.255 | 0.083 | 0.161 | 0.051 | 0.053 |
| max | 0.091 | 7.074 | 0.127 | 0.439 | 0.084 | 0.096 |
| min | -0.556 | 0.234 | 0.050 | 0.056 | 0.048 | 0.022 |
| Average relative bias (%) | -0.087 | 2.110 | 0.079 | 0.145 | 0.048 | 0.049 |
| max | 0.091 | 5.103 | 0.095 | 0.317 | 0.061 | 0.069 |
| min | -0.401 | 0.338 | 0.069 | 0.081 | 0.032 | 0.032 |
| Coverage 90% CI | | | | | | |
| Average coverage (%) | 92.7 | 93.4 | 89.1 | 89.1 | 90.4 | 90.4 |
| max | 98.3 | 98.8 | 90.6 | 90.6 | 91.8 | 91.8 |
| min | 89.7 | 89.8 | 87.9 | 87.9 | 89.2 | 89.2 |
| > ± 1% nominal (%) | 75 | 75 | 50 | 37.5 | 25 | 25 |
| Average width | 4.087 | 4437.2 | 2.656 | 49.2 | 2.005 | 2.645 |
| Mean Square Error | | | | | | |
| Average MSE | 1.251 | 64.571 | 0.709 | 2.243 | 0.397 | 0.422 |
| Separated data sets | | | | | | |
| Total (%) | 13.2 | | 4.2 | | 0.006 | |
Fig. 5 Simulation study IIb results. Upper panel, solid line: data sets removed from analysis; upper panel, dashed line: data sets replaced by maximum non-separated effect size. Middle panel: Firth’s correction. Lower panel: percentage of separated data sets by true effect size
Results simulation study IIb
| Estimator | Firth | ML | ML | ML | ML | ML | ML |
|---|---|---|---|---|---|---|---|
| Separation detection | NA | Tracing (b) | Estimate (c) | None | None | None | None |
| Convergence criterion (a) | Default | Default | Default | Default | Type I | Type II | Type III |
| Data sets removed (%) | 0 | 8.06 | 16.64 | 5.12 | 0.34 | 6.29 | 0.09 |
| Bias | 0.012 | 0.569 | 0.186 | 1.672 | 17.5 | 0.856 | 41.3 |
| Coverage 90% CI | 0.919 | 0.949 | 0.937 | 0.944 | 0.947 | 0.944 | 0.947 |
| Mean width 90% CI | 4.32 | 4.50 | 3.64 | 5018 | 13620 | 6.03 | 1135784 |
| MSE | 1.080 | 2.681 | 0.904 | 71.563 | 11532 | 319 | 173726 |
(a) Default: tol 1e-8, max-iter 25; Type I: tol 1e-6, max-iter 25; Type II: tol 1e-10, max-iter 25; Type III: tol 1e-10, max-iter 50
(b) Criterion: re-estimation process, variance of scaled standard errors > 20 (see Appendix)
(c) Criterion: if for any parameter > log(50)
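How the convergence settings interact with separation can be illustrated with a toy Newton-Raphson fit. The sketch assumes a relative-deviance stopping rule, |dev − dev_old| / (|dev| + 0.1) < tol, which is the rule used by R's `glm` (whose defaults, tolerance 1e-8 and 25 iterations, coincide with the default setting in the table footnote); whether the paper applied exactly this rule is an assumption. The one-parameter model logit(p) = βx without intercept and all function names are likewise illustrative.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(t):
    """log(1 + exp(t)) computed without overflow."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def deviance(beta, x, y):
    """-2 * log-likelihood of the model logit(p) = beta * x."""
    # -log(p) = softplus(-z) when y = 1; -log(1 - p) = softplus(z) when y = 0.
    return 2.0 * sum(
        softplus(-beta * xi if yi == 1 else beta * xi) for xi, yi in zip(x, y)
    )


def fit_ml_slope(x, y, tol=1e-8, max_iter=25):
    """Newton-Raphson maximum likelihood fit; returns (beta, converged)."""
    beta = 0.0
    dev_old = deviance(beta, x, y)
    for _ in range(max_iter):
        p = [sigmoid(beta * xi) for xi in x]
        score = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        info = sum(xi * xi * pi * (1.0 - pi) for xi, pi in zip(x, p))
        if info == 0.0:  # fitted probabilities are numerically 0 or 1
            return beta, False
        beta += score / info
        dev = deviance(beta, x, y)
        if abs(dev - dev_old) / (abs(dev) + 0.1) < tol:
            return beta, True
        dev_old = dev
    return beta, False


# Completely separated toy data.
x_sep, y_sep = [-2, -1, 1, 2], [0, 0, 1, 1]

for label, tol, max_iter in [
    ("Default", 1e-8, 25),
    ("Type I", 1e-6, 25),
    ("Type II", 1e-10, 25),
    ("Type III", 1e-10, 50),
]:
    beta, converged = fit_ml_slope(x_sep, y_sep, tol=tol, max_iter=max_iter)
    print(label, round(beta, 2), converged)
```

In this toy example the estimate grows by roughly one unit per iteration, so a loose tolerance or a larger iteration limit lets the fit report convergence at a large and meaningless value, while the strict tolerance with 25 iterations reports non-convergence. A rule of the "Estimate" kind, flagging a fit whose coefficient exceeds log(50), would catch the converged cases.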