Ceyhan Ceran Serdar, Murat Cihan, Doğan Yücel, Muhittin A Serdar.
Abstract
Calculating the sample size is a critical step in scientific studies because it bears directly on the scientific contribution of the study. The sample size critically affects the hypothesis and the study design, and there is no straightforward way of calculating the effective sample size for reaching an accurate conclusion. Use of a statistically incorrect sample size may lead to inadequate results in both clinical and laboratory studies, as well as to lost time, added cost, and ethical problems. This review has two main aims. The first is to explain the importance of sample size and its relationship to effect size (ES) and statistical significance. The second is to assist researchers planning to perform sample size estimations by suggesting and elucidating available alternative software, guidelines and references that will serve different scientific purposes.
Croatian Society of Medical Biochemistry and Laboratory Medicine.
Keywords: biostatistics; effect size; power analysis; sample size
Year: 2020 PMID: 33380887 PMCID: PMC7745163 DOI: 10.11613/BM.2021.010502
Source DB: PubMed Journal: Biochem Med (Zagreb) ISSN: 1330-0962 Impact factor: 2.313
Figure 1. Illustration of Type I and Type II errors.
Figure 2. Nomogram for sample size and power, for comparing two groups of equal size. Gaussian distributions are assumed. The standardized difference (effect size) and the target power are first selected on the nomogram. The line connecting these values crosses the significance level region of the nomogram. The intercept at the appropriate significance value gives the required sample size for the study. In the above example, for effect size = 1, power = 0.8 and alpha value = 0.05, the sample size is found to be 30. (Adapted from reference 16).
Sample size calculation formulas for some research methods (according to references 17-23)
| Proportion in survey type of studies | ||
| Group mean | ||
| Two means | ||
| Two proportions | ||
| Odds ratio | ||
| Correlation coefficient | ||
| Diagnostic prognostic studies (ROC) analysis | ||
| Adequate sensitivity/specificity | ||
| Questionnaire (Survey) | ||
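For orientation, two of the study types listed above have widely used textbook formulas, sketched here under the usual normal approximation. These are standard forms and not necessarily the exact expressions given in references 17-23; the symbols p, d, σ and Δ are notation introduced for this illustration.

```latex
% Proportion in a survey: p = expected proportion, d = margin of error
n = \frac{z_{1-\alpha/2}^{2}\, p\,(1-p)}{d^{2}}

% Two means, equal group sizes (n per group):
% \sigma = common standard deviation, \Delta = difference to detect
n = \frac{2\,\sigma^{2}\,\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}}{\Delta^{2}}
```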
Software and websites that can be used for calculation of sample size and/or power analysis
| G*Power | *** | *** | Yes | |
| PS | ** | *** | Yes | |
| Piface | ** | *** | Yes | |
| PASS | **** | *** | No | |
| nQuery | *** | *** | No | |
| R packages | ||||
| pwr | *** | ** | Yes | |
| TrialSize | *** | ** | Yes | |
| PowerUpR | *** | ** | Yes | |
| powerSurvEpi | *** | ** | Yes | |
| SAS (PROC POWER) | **** | *** | No | |
| SPSS (SamplePower) | *** | *** | No | |
| STATA (power) | **** | *** | No | |
| Medcalc | * | **** | No | |
| Minitab | ** | *** | No | |
| Systat | *** | **** | No | |
| Statistica | *** | *** | No | |
| Microsoft Excel | ||||
| PowerUp | ** | *** | Yes | |
| XLSTAT | *** | *** | No | |
| GenStat | ** | *** | No | |
| Websites-Online | ||||
| Power and Sample Size | ** | *** | Yes | |
| StatCalc | ** | *** | Yes | |
| Biomath | ** | ** | Yes | |
| Openepi | ** | *** | ||
| UCSF Biostatistics | ** | *** | Yes | |
| Clincalc.com | * | *** | Yes | |
| Sample Size Calculators | ** | *** | Yes | |
| Genetic Power Calculator | *** | ** | Yes | |
| OSSE, Sample Size Estimator (for SNPs) | * | *** | Yes | |
| Surveys | ** | ** | Yes |
Thresholds for interpreting the effect size
| Test | Effect size index | Small | Medium | Large |
| t-test for means | Cohen’s d | 0.2 | 0.5 | 0.8 |
| Chi-Square | Cohen’s ω | 0.1 | 0.3 | 0.5 |
| r x c frequency tables | Cramer’s V or Phi | 0.1 | 0.3 | 0.5 |
| Correlation studies | r | 0.2 | 0.5 | 0.8 |
| 2 x 2 table case control | Odds Ratio (OR) | 1.5 | 2 | 3 |
| 2 x 2 table cohort studies | Risk Ratio (RR) | 2 | 3 | 4 |
| One-way an(c)ova (regression) | Cohen’s f | 0.1 | 0.25 | 0.4 |
| ANOVA (for large sample) | Eta squared (η²) | 0.01 | 0.06 | 0.14 |
| ANOVA (for small size) | Omega squared (ω²) | |||
| Friedman test | Average Spearman rho | 0.1 | 0.3 | 0.5 |
| Multiple regression | η² | 0.02 | 0.13 | 0.26 |
| Coefficient of determination | r² | 0.04 | 0.25 | 0.64 |
| Number needed to treat | NNT | 1 / Initial risk | ||
Figure 3. Relationship between effect size and sample size. P - power. ES - effect size. SS - sample size. The required sample size increases as the effect size decreases. In all cases, power (P) is set to 0.8. The sample sizes (SS) when ES is 0.2, 1, or 2.5 are 788, 34 and 8, respectively. The graphs at the bottom represent the influence of a change in the sample size on the power.
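The inverse relationship between effect size and sample size can be sketched with the normal-approximation formula for comparing the means of two equal-sized groups. This is a minimal illustration and not the method used to draw the figure. The function name `n_per_group` is hypothetical, and because it uses z rather than t quantiles, the totals it returns fall slightly below the figure's 788, 34 and 8.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, power=0.8, alpha=0.05):
    """Approximate n per group for a two-sided comparison of two means.

    Normal approximation: n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


# Total sample size (both groups) for the effect sizes shown in the figure
for es in (0.2, 1.0, 2.5):
    print(es, 2 * n_per_group(es))
```

A t-based calculation, as performed by dedicated power software, adds a few subjects per group, which matters most when the effect size is large and the groups are small.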
Figure 4. Relationship between effect size and power. Two different cases are schematized where the sample size is kept constant either at 8 or at 30. When the sample size is kept constant, the power of the study decreases as the effect size decreases. When the effect size is 2.5, even 8 samples are sufficient to obtain power = ~0.8. When the effect size is 1, increasing the sample size from 8 to 30 markedly increases the power of the study. Yet, even 30 samples are not sufficient to reach adequate power if the effect size is as low as 0.2.
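The dependence of power on effect size at a fixed sample size can be sketched in the same spirit. The snippet below is a rough normal approximation for a two-sided, two-sample comparison; `approx_power` is a hypothetical name, and treating the totals of 8 and 30 as 4 and 15 subjects per group is an assumption. The approximation ignores the negligible opposite tail and is optimistic for very small groups, so it should not be expected to match t-based values such as the ~0.8 quoted for 8 samples.

```python
import math
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample comparison of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative hypothesis
    shift = effect_size * math.sqrt(n_per_group / 2)
    return NormalDist().cdf(shift - z_alpha)


# Assumed split: total sample sizes 8 and 30 -> 4 and 15 per group
for n in (4, 15):
    for es in (2.5, 1.0, 0.2):
        print(n, es, round(approx_power(es, n), 2))
```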
Cohen’s d for 4–34 samples per group assuming 0.8 and 0.9 power, a 0.05 significance level and a one-sided or two-sided test (Simplified from reference 40)
| 2 | 2.35 | 2.38 | 2.77 | |
| 1.72 | 2.03 | 2.02 | 2.35 | |
| 1.54 | 1.82 | 1.8 | 2.08 | |
| 1.41 | 1.66 | 1.63 | 1.89 | |
| 1.31 | 1.54 | 1.51 | 1.74 | |
| 1.23 | 1.44 | 1.41 | 1.63 | |
| 1.16 | 1.36 | 1.32 | 1.53 | |
| 1.05 | 1.23 | 1.2 | 1.39 | |
| 0.97 | 1.14 | 1.1 | 1.27 | |
| 0.9 | 1.06 | 1.02 | 1.18 | |
| 0.85 | 1 | 0.96 | 1.11 | |
| 0.8 | 0.94 | 0.91 | 1.05 | |
| 0.76 | 0.9 | 0.86 | 1 | |
| 0.73 | 0.86 | 0.83 | 0.96 | |
| 0.7 | 0.82 | 0.79 | 0.92 | |
| 0.67 | 0.79 | 0.76 | 0.88 | |
| 0.65 | 0.76 | 0.74 | 0.85 | |
| 0.63 | 0.74 | 0.71 | 0.82 | |
| 0.61 | 0.72 | 0.69 | 0.8 |
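The kind of calculation behind such a table can be sketched by solving the two-group formula for the effect size instead of the sample size. This is a normal-approximation sketch, not the procedure of reference 40; `detectable_d` is a hypothetical name. Because z quantiles replace t quantiles, the results come out somewhat smaller than t-based tabulated values, most visibly for small groups.

```python
import math
from statistics import NormalDist


def detectable_d(n_per_group, power=0.8, alpha=0.05, two_sided=True):
    """Smallest Cohen's d detectable with the given n per group (approximate)."""
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * math.sqrt(2 / n_per_group)


# One-sided and two-sided tests at power 0.8 and 0.9 for 34 subjects per group
for two_sided in (False, True):
    for power in (0.8, 0.9):
        print(two_sided, power, round(detectable_d(34, power, two_sided=two_sided), 2))
```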
Sample size formulas for different types of group comparison studies (According to reference 45)
| Group comparison (ANOVA) | = (10 / k) + 1 | = (20 / k) + 1 |
| One group, repeated measures (one within factor, repeated measures ANOVA) | = 10 / (r - 1) + 1 (a,b) | = 20 / (r - 1) + 1 (a,b) |
| Group comparison, repeated measures (one-between, one within factor, repeated measures ANOVA) | = (10 / kr) + 1 (b) | = (20 / kr) + 1 (b) |
| k - number of groups. N - number of subjects | ||
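The two group comparison rows can be evaluated directly. The sketch below assumes that the two formula columns are the minimum and maximum number of subjects per group, that r is the number of repeated measurements, and that the minimum is rounded up and the maximum rounded down. All three are assumptions for illustration, and `resource_equation_range` is a hypothetical name.

```python
import math


def resource_equation_range(k, r=None):
    """Return (minimum, maximum) subjects per group.

    k: number of groups.
    r: number of repeated measurements (assumed meaning); None for plain ANOVA.
    """
    divisor = k if r is None else k * r
    minimum = 10 / divisor + 1
    maximum = 20 / divisor + 1
    # Rounding direction is an assumption: up for the minimum, down for the maximum
    return math.ceil(minimum), math.floor(maximum)


print(resource_equation_range(k=4))       # group comparison, 4 groups
print(resource_equation_range(k=2, r=3))  # 2 groups, 3 repeated measurements
```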
Figure 5. Technical vs biological repeat.
Figure 6. Interface of Online Sample Size Estimator (OSSE) Tool.
Number of cases required to achieve 0.8 power according to the different genetic models and various odds ratios of heterozygotes/rare homozygotes (ORhet/ORhomo) in case-control studies
| Allelic | 1974 | 789 | 248 | 134 |
| Dominant | 606 | 258 | 90 | 53 |
| Co-Dominant | 2418 | 964 | 301 | 161 |
| Recessive | 20,294 | 8390 | 2776 | 1536 |
| Effective sample sizes are calculated according to the following assumptions: minor allele frequency is 5%, disease prevalence is 5%, there is complete linkage disequilibrium (D’ = 1), case-to-control ratio is 1:1, and the type I error rate is 5% for single marker analysis (57). | ||||
Figure 7. The relationship among clinical significance, statistical significance, power and effect size. In the example above, to provide a clinically significant effect, a treatment is required to trigger a decrease of at least 0.5 mmol/L in cholesterol levels. Four different scenarios are given for a candidate treatment, each having a different mean total cholesterol change and 95% confidence interval. ES - effect size. N - number of participants. Adapted from reference 65.
Proposed sample sizes for Passing Bablok regression (power at least 0.8, alpha = 0.05) (Simplified from reference 66)
| 2 | > 90 | 30 | < 30 | < 30 | < 30 | < 30 | < 30 | < 30 | |||||||
| 5 | > 90 | > 90 | 80 | 45 | 35 | < 30 | < 30 | < 30 | |||||||
| 7 | > 90 | > 90 | > 90 | 90 | 60 | 45 | 30 | < 30 | |||||||
| 10 | > 90 | > 90 | > 90 | > 90 | > 90 | 80 | 55 | 35 | |||||||
| 13 | > 90 | > 90 | > 90 | > 90 | > 90 | > 90 | 80 | 50 | |||||||
| 2 | > 90 | 90 | 40 | < 30 | < 30 | < 30 | < 30 | < 30 | |||||||
| 5 | > 90 | > 90 | > 90 | > 90 | 85 | 65 | 40 | < 30 | |||||||
| 7 | > 90 | > 90 | > 90 | > 90 | > 90 | > 90 | 80 | 45 | |||||||
| 10 | > 90 | > 90 | > 90 | > 90 | > 90 | > 90 | > 90 | 80 | |||||||
| 2 | > 90 | > 90 | > 90 | 75 | 50 | 35 | < 30 | < 30 | |||||||
| 5 | > 90 | > 90 | > 90 | > 90 | > 90 | > 90 | > 90 | 80 | |||||||
| Slope - the steepness of the regression line; intercept - the point where the line crosses the y-axis. The greater the magnitude of the slope, the steeper the line and the greater the rate of change. The formula for the regression line in a method comparison study is y = ax + b, where a is the slope of the line and b is the y-intercept. Range ratio - concentration of the upper limit / concentration of the lower limit. % CV - coefficient of variation (analytical precision). *Sample size values are proposed for respective slope ranges, e.g. for range ratio: 4, CV: 2%, slope range: 1.00–1.02 or 1.00–0.98 requires > 90 samples, whereas slope range: 1.04–1.06 or 0.96–0.94 requires 40 samples. Note: In this example, similar % CV values are assumed for the two methods compared. For methods having dissimilar % CV values, the researcher should refer to reference 66. | |||||||||||||||
Necessary sample sizes for test of slope deviation from 1 or intercept deviation from zero by Deming and Weighted regression analysis
| 5104 | 1575 | 567 | 343 | 256 | 182 | 150 | 116 | 108 | |||
| 1276 | 410 | 152 | 90 | 69 | 48 | 39 | 32 | 27 | |||
| 585 | 185 | 70 | 42 | 32 | 25 | 20 | 16 | 15 | |||
| 325 | 104 | 41 | 27 | 20 | 15 | 13 | 11 | ≤ 10 | |||
| 544 | 320 | 226 | 150 | 114 | 75 | 64 | 45 | 37 | |||
| 144 | 82 | 61 | 40 | 33 | 23 | 20 | 18 | 15 | |||
| 66 | 42 | 29 | 22 | 17 | ≤ 10 | ≤ 10 | ≤ 10 | ≤ 10 | |||
| 39 | 26 | 19 | 15 | 12 | ≤ 10 | ≤ 10 | ≤ 10 | ≤ 10 | |||
| Type I error = 0.05. Power = 0.9. Standardized Δ value for slope | |||||||||||
Sample size and power values of lot-to-lot variation studies
| Glucose | 2.77 | 0.33 | 0.055 | 0.033 | 6.0 | 0.6 | 0.6 x Cd | 1 | 0.955 |
| 8.32 | 0.83 | 0.11 | 0.08 | 7.5 | 0.75 | 0.7 x Cd | 1 | > 0.916 | |
| 16.65 | 1.66 | 0.25 | 0.19 | 6.7 | 0.78 | 0.7 x Cd | 1 | > 0.916 | |
| Cd - critical difference is the total allowable error (TAE) according to the CLIA criteria. Sr - repeatability (within-run imprecision). Swrl - within-reagent lot imprecision. Note: Sr and Swrl values should be obtained from the manufacturer. Power is calculated according to critical difference, imprecision values and sample size as explained in detail in CLSI EP26-A. If the lot-to-lot variation results obtained from three different concentrations are lower than the rejection limits when one sample is used for each concentration (meaning the method precision of the tested lots is within the acceptance limits), then the lot variation is said to remain within the acceptance range. (The actual table provided in the guideline (CLSI EP26-A) is 3 pages long. Since the primary aim of this paper is to familiarize the reader with sample size estimation methodologies in different study types, only a glucose example is included in this table for simplification. For different analytes and scenarios ( | |||||||||
Sample size estimation in method verification studies
| 20 | 85 |
| 30 | 87 |
| 40 | 90 |
| 50 | 90 |
| 100 | 91 |
| 200 | 93 |
| 500 | 93 |
| 1000 | 94 |
| N – sample size. CI – confidence interval. | |
Determining sample size in diagnostic studies
| 0.80 | 0.05 | 246 |
| 0.85 | 0.05 | 196 |
| 0.90 | 0.05 | 139 |
| 0.95 | 0.05 | 73 |
| 0.70 | 0.10 | 81 |
| 0.75 | 0.10 | 73 |
| 0.80 | 0.10 | 62 |
| 0.85 | 0.10 | 49 |
| L - desired width of one half of the confidence interval (CI), or maximum allowable error of the estimate. (95% CI for 0.05 and 90% CI for 0.10). TPF - true positive fraction. FPF - false positive fraction. Adapted from CLSI EP24-A2, reference | ||
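The link between an expected fraction, the half-width L and the required number of subjects can be sketched with the usual normal-approximation formula n = z² p (1 - p) / L². This is an illustrative sketch rather than the procedure of CLSI EP24-A2; `n_for_half_width` is a hypothetical name, and using the 95% quantile by default is an assumption. With that default the sketch returns 246 for a fraction of 0.80 at L = 0.05 and 81 for a fraction of 0.70 at L = 0.10.

```python
import math
from statistics import NormalDist


def n_for_half_width(fraction, half_width, conf=0.95):
    """Subjects needed so the CI of a fraction has the given half-width."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return math.ceil(z ** 2 * fraction * (1 - fraction) / half_width ** 2)


for fraction, half_width in ((0.80, 0.05), (0.95, 0.05), (0.70, 0.10)):
    print(fraction, half_width, n_for_half_width(fraction, half_width))
```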
Relationship between sample size and 95% CI of a test characteristic (sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), false-positive ratio (FPR), false-negative ratio (FNR), etc., all of which are ratios between 0.00 and 1.00)
| 20 | 0.00-0.25 | 0.56-0.94 |
| 60 | 0.01-0.14 | 0.68-0.90 |
| 100 | 0.02-0.11 | 0.71-0.87 |
| 500 | 0.03-0.07 | 0.76-0.83 |
| 1000 | 0.04-0.07 | 0.77-0.82 |
| 95% CI of the test characteristic ratios of 0.05 and 0.8 are selected for illustration. | ||
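How the interval narrows with sample size can be explored with an exact binomial (Clopper-Pearson) interval. Whether the table was built with this particular method is an assumption; the sketch below finds the limits by bisection on the binomial distribution using only the standard library, and the function names are hypothetical.

```python
import math


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), with terms computed in log space."""
    if x < 0:
        return 0.0
    if x >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    total = 0.0
    for k in range(x + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def _root_of_decreasing(f, lo=0.0, hi=1.0, iterations=60):
    """Bisection for the root of a function that decreases on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x, n, conf=0.95):
    """Clopper-Pearson interval for x events observed in n subjects."""
    half_alpha = (1 - conf) / 2
    # Lower limit: p at which P(X >= x) equals alpha/2
    lower = 0.0 if x == 0 else _root_of_decreasing(
        lambda p: half_alpha - (1 - binom_cdf(x - 1, n, p)))
    # Upper limit: p at which P(X <= x) equals alpha/2
    upper = 1.0 if x == n else _root_of_decreasing(
        lambda p: binom_cdf(x, n, p) - half_alpha)
    return lower, upper


# Observed ratio of 0.8 at increasing sample sizes
for n in (20, 100, 500):
    low, high = exact_ci(round(0.8 * n), n)
    print(n, round(low, 2), round(high, 2))
```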
Sample size estimation according to the population size (merely as rough estimates), margin of error (ME) and confidence interval (CI)
| 100 | 50 | 80 | 99 | 74 | 80 | 88 | ||
| 500 | 81 | 218 | 476 | 176 | 218 | 286 | ||
| 1000 | 88 | 278 | 906 | 215 | 278 | 400 | ||
| 10,000 | 96 | 370 | 4900 | 264 | 370 | 623 | ||
| 100,000 | 96 | 383 | 8763 | 270 | 383 | 660 | ||
| 1,000,000 | 97 | 384 | 9513 | 271 | 384 | 664 | ||
| Sample size estimation may be performed according to the actual population size, margin of error and confidence interval. Here the most commonly used ME (5%) and CI (95%) levels are exemplified. A variation in ME causes a more drastic change in sample size than a variation in CI. As an example, for a population of 10,000 people, a survey with a 95% CI and 5% ME would require at least 370 samples. When CI is changed from 95% to 90% or 99%, the sample size, initially 370, changes to 264 or 623, respectively. When ME is changed from 5% to 10% or 1%, the sample size, initially 370, changes to 96 or 4900, respectively. For other ME and CI levels, the researcher should refer to the equations and software provided in Table 1 and Table 2 (102). | ||||||||
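The worked example in the note can be reproduced with the standard survey formula plus a finite population correction. The sketch assumes the most conservative expected proportion (p = 0.5) and a normal approximation; both are assumptions about how the table was built, and `survey_sample_size` is a hypothetical name.

```python
import math
from statistics import NormalDist


def survey_sample_size(population, margin_of_error=0.05, conf=0.95, p=0.5):
    """Sample size for a survey with finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    # Sample size for an infinitely large population
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    # Finite population correction
    return math.ceil(n0 / (1 + (n0 - 1) / population))


# Population of 10,000: vary the margin of error, then the confidence level
print([survey_sample_size(10_000, margin_of_error=me) for me in (0.10, 0.05, 0.01)])
print([survey_sample_size(10_000, conf=c) for c in (0.90, 0.95, 0.99)])
```

Because the margin of error enters the formula squared, halving it roughly quadruples the uncorrected sample size, which is why ME moves the result far more than the confidence level does.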
The relation among prevalence, sample size and power of a study that will detect a problem after "N" number of interviews
| 0.01 | 0.05 | 0.07 | 0.1 | 0.14 | 0.18 | 0.26 | 0.39 |
| 0.02 | 0.1 | 0.13 | 0.18 | 0.26 | 0.33 | 0.45 | 0.64 |
| 0.03 | 0.14 | 0.19 | 0.26 | 0.37 | 0.46 | 0.6 | 0.78 |
| 0.04 | 0.18 | 0.25 | 0.34 | 0.46 | 0.56 | 0.71 | 0.87 |
| 0.05 | 0.23 | 0.3 | 0.4 | 0.54 | 0.64 | 0.79 | 0.92 |
| 0.10 | 0.41 | 0.52 | 0.65 | 0.79 | 0.88 | 0.96 | > 0.99 |
| 0.15 | 0.56 | 0.68 | 0.8 | 0.91 | 0.96 | > 0.99 | > 0.99 |
| 0.20 | 0.67 | 0.79 | 0.89 | 0.96 | 0.99 | > 0.99 | > 0.99 |
| 0.25 | 0.76 | 0.87 | 0.94 | 0.99 | > 0.99 | > 0.99 | > 0.99 |
| 0.30 | 0.83 | 0.92 | 0.97 | > 0.99 | > 0.99 | > 0.99 | > 0.99 |
| When prevalence is low, a larger sample size is required to reach sufficient power. E.g. for a prevalence of 0.2, even 10 interviews | |||||||
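The relation between prevalence, number of interviews and power can be sketched under a simple assumption: each interview independently reveals the problem with probability equal to its prevalence, so the chance of seeing it at least once in N interviews is 1 - (1 - prevalence)^N. This model and the function names are assumptions made for illustration.

```python
import math


def detection_power(prevalence, n_interviews):
    """Probability that a problem shows up at least once in n interviews."""
    return 1 - (1 - prevalence) ** n_interviews


def interviews_needed(prevalence, power=0.8):
    """Smallest number of interviews reaching the requested power."""
    return math.ceil(math.log(1 - power) / math.log(1 - prevalence))


print(round(detection_power(0.2, 10), 2))  # prevalence 0.2, 10 interviews
print(interviews_needed(0.05))             # interviews for power 0.8 at prevalence 0.05
```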