Emmanuel O Ogundimu, Douglas G Altman, Gary S Collins.
Abstract
OBJECTIVES: The sample size for a Cox regression analysis is generally chosen using a rule of thumb, derived from simulation studies, of a minimum of 10 events per variable (EPV). One simulation study suggested scenarios in which the 10 EPV rule can be relaxed. The effect of a range of binary predictors with varying prevalence, reflecting clinical practice, has not yet been fully investigated. STUDY DESIGN AND
Keywords: Cox model; Events per variable; External validation; Predictive modeling; Resampling study; Sample size
Year: 2016 PMID: 26964707 PMCID: PMC5045274 DOI: 10.1016/j.jclinepi.2016.02.031
Source DB: PubMed Journal: J Clin Epidemiol ISSN: 0895-4356 Impact factor: 6.437
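The events-per-variable rule mentioned in the abstract is simple arithmetic, and a minimal sketch may make it concrete. The function names, the default target of 10, and the example event rate are illustrative assumptions, not taken from the paper.

```python
import math


def events_per_variable(n_events: int, n_parameters: int) -> float:
    """EPV: number of outcome events divided by number of predictor parameters."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return n_events / n_parameters


def min_events_required(n_parameters: int, target_epv: float = 10.0) -> int:
    """Smallest number of events that meets the target EPV (assumed default: 10)."""
    return math.ceil(n_parameters * target_epv)


def min_sample_size(n_parameters: int, event_rate: float,
                    target_epv: float = 10.0) -> int:
    """Approximate number of subjects needed, given an assumed event rate in (0, 1]."""
    if not 0.0 < event_rate <= 1.0:
        raise ValueError("event_rate must be in (0, 1]")
    return math.ceil(min_events_required(n_parameters, target_epv) / event_rate)


if __name__ == "__main__":
    # Hypothetical planning scenario: 12 predictor parameters, 25% event rate.
    print(events_per_variable(120, 12))
    print(min_events_required(12))
    print(min_sample_size(12, 0.25))
```

Note that a categorical predictor with several levels contributes more than one parameter, so the parameter count can exceed the number of named variables.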
Descriptive statistics for the predictors (n = 1,973,511)
| Variable | Mean ± std. dev. | Frequency |
|---|---|---|
| Body mass index (BMI) | 26.25 ± 4.41 | |
| Age | 48.66 ± 14.09 | |
| Sex | | Male: 0.49; female: 0.51 |
| Cholesterol ratio (RATIO) | 4.04 ± 1.31 | |
| Systolic blood pressure (SBP) | 131.84 ± 20.34 | |
| Treatment of hypertension (HYPER) | | No: 0.95; yes: 0.05 |
| Type 2 diabetes (TYPE2) | | No: 0.98; yes: 0.02 |
| Smoking status (SMK) | | Nonsmoker: 0.55; former smoker: 0.18; light smoker: 0.07; moderate smoker: 0.11; heavy smoker: 0.10 |
| Family history of coronary heart disease (FHCVD) | | No: 0.96; yes: 0.04 |
| Rheumatoid arthritis (BRA) | | No: 0.99; yes: 0.01 |
| Atrial fibrillation (BAF) | | No: 0.99; yes: 0.01 |
| Renal disease (RENAL) | | No: 1.00; yes: 0.00 |
Cox model with 12 covariates fitted to the THIN data
| Predictor | Estimate | Standard error (SE) | z statistic |
|---|---|---|---|
| BMI | 0.0233 | 0.0001 | 298.85 |
| Age | 0.0725 | 0.0003 | 258.37 |
| Sex | 0.4667 | 0.0068 | 68.86 |
| RATIO | 0.0410 | 0.0010 | 40.89 |
| SBP | 0.0069 | 0.0002 | 41.48 |
| HYPER | 0.2278 | 0.0071 | 32.04 |
| TYPE2 | 0.5174 | 0.0137 | 37.75 |
| SMK | 0.3964 | 0.0181 | 21.92 |
| FHCVD | 0.8959 | 0.0391 | 22.90 |
| BRA | 0.2991 | 0.0265 | 11.27 |
| BAF | 0.5293 | 0.0490 | 10.80 |
| RENAL | 0.4919 | 0.0599 | 8.21 |
Abbreviations: THIN, The Health Improvement Network; BMI, body mass index; RATIO, cholesterol ratio; SBP, systolic blood pressure; HYPER, hypertension; TYPE2, type 2 diabetes; SMK, smoking status; FHCVD, family history of coronary heart disease; BRA, rheumatoid arthritis; BAF, atrial fibrillation; RENAL, renal disease.
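Because Cox model coefficients are log hazard ratios, the estimates in the table can be turned into hazard ratios with Wald confidence intervals. The sketch below does this for a few rows. The last table column appears consistent with a Wald statistic (estimate divided by SE); that reading is an inference from the numbers, and values recomputed from the rounded SEs will not match the published column exactly. The helper names are illustrative.

```python
import math

# (estimate, standard error) pairs copied from the table above.
cox_fit = {
    "Sex": (0.4667, 0.0068),
    "TYPE2": (0.5174, 0.0137),
    "FHCVD": (0.8959, 0.0391),
    "RENAL": (0.4919, 0.0599),
}

Z_CRIT_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wald_z(estimate: float, se: float) -> float:
    """Wald statistic: coefficient divided by its standard error."""
    return estimate / se


def hazard_ratio(estimate: float) -> float:
    """Exponentiate a log hazard ratio to get the hazard ratio."""
    return math.exp(estimate)


def hazard_ratio_ci(estimate: float, se: float, z_crit: float = Z_CRIT_95):
    """Wald interval computed on the log scale, then exponentiated."""
    return (math.exp(estimate - z_crit * se), math.exp(estimate + z_crit * se))


if __name__ == "__main__":
    for name, (beta, se) in cox_fit.items():
        low, high = hazard_ratio_ci(beta, se)
        print(f"{name}: z={wald_z(beta, se):.2f}, "
              f"HR={hazard_ratio(beta):.3f} ({low:.3f} to {high:.3f})")
```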
Number and percentage of occasions in which each variable was statistically significant at the 0.05 level of significance using the three-predictor model
| Variable | EPV = 2 | EPV = 5 | EPV = 10 | EPV = 15 | EPV = 20 | EPV = 25 | EPV = 50 |
|---|---|---|---|---|---|---|---|
| Converged | 970 | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 | 1,000 |
| BMI | 79 (8.1) | 120 (12.0) | 165 (16.5) | 202 (20.2) | 230 (23.0) | 284 (28.4) | 462 (46.2) |
| Age | 732 (75.5) | 991 (99.1) | 1,000 (100.0) | 1,000 (100.0) | 1,000 (100.0) | 1,000 (100.0) | 1,000 (100.0) |
| Sex | 36 (3.7) | 138 (13.8) | 267 (26.7) | 399 (39.9) | 478 (47.8) | 574 (57.4) | 852 (85.2) |
Abbreviations: EPV, events per variable; BMI, body mass index.
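Converged: number of models that converged out of 1,000 samples. Values for each variable are n (%), with percentages calculated relative to the number of converged models.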
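The percentages in the table can be reproduced by dividing each count by the number of converged models at that EPV, which matters only at EPV = 2, where 970 of 1,000 models converged. A short sketch using the counts from the first three EPV columns (the function name is illustrative):

```python
# Counts copied from the table above, keyed by EPV.
converged = {2: 970, 5: 1000, 10: 1000}
significant = {
    "BMI": {2: 79, 5: 120, 10: 165},
    "Age": {2: 732, 5: 991, 10: 1000},
    "Sex": {2: 36, 5: 138, 10: 267},
}


def pct_significant(count: int, n_converged: int) -> float:
    """Percentage of converged models in which the predictor was significant."""
    return round(100.0 * count / n_converged, 1)


if __name__ == "__main__":
    for variable, counts in significant.items():
        row = ", ".join(
            f"EPV={epv}: {pct_significant(n, converged[epv])}%"
            for epv, n in counts.items()
        )
        print(f"{variable}: {row}")
```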
Fig. 1. Number of events per variable and average percent relative bias for the variables in the data set.
Fig. 2. Ratio of model variance to sample variance for the variables in the data set.
Fig. 3. Proportion of simulations in which the 95% confidence interval about the simulated regression coefficient includes the “true” value for the variables in the data set.
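The three quantities named in the figure captions can be sketched as follows. The definitions used here are the conventional ones for resampling studies and are assumptions, since this record does not state the paper's exact formulas: percent relative bias as 100 × (mean estimate − true value) / true value, the variance ratio as the mean squared model-based SE divided by the empirical variance of the estimates, and coverage as the share of Wald 95% intervals containing the true value.

```python
from statistics import mean, variance

Z_CRIT_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def percent_relative_bias(estimates, true_beta):
    """Average percent relative bias of the estimates (assumed definition)."""
    return 100.0 * (mean(estimates) - true_beta) / true_beta


def variance_ratio(standard_errors, estimates):
    """Mean model-based variance over empirical variance of the estimates."""
    model_var = mean([se ** 2 for se in standard_errors])
    sample_var = variance(estimates)  # sample variance, n - 1 denominator
    return model_var / sample_var


def coverage(estimates, standard_errors, true_beta, z_crit=Z_CRIT_95):
    """Proportion of Wald intervals that contain the true coefficient."""
    hits = sum(
        1
        for beta, se in zip(estimates, standard_errors)
        if beta - z_crit * se <= true_beta <= beta + z_crit * se
    )
    return hits / len(estimates)


if __name__ == "__main__":
    # Toy values for illustration only; not results from the paper.
    toy_estimates = [0.40, 0.50, 0.60]
    toy_ses = [0.10, 0.10, 0.10]
    print(percent_relative_bias(toy_estimates, 0.5))
    print(variance_ratio(toy_ses, toy_estimates))
    print(coverage(toy_estimates, toy_ses, 0.5))
```

A variance ratio near 1 would indicate that the model-based standard errors track the actual spread of the estimates across resamples.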