Jeovany Martínez-Mesa, David Alejandro González-Chica, João Luiz Bastos, Renan Rangel Bonamigo, Rodrigo Pereira Duquia.
Abstract
The importance of estimating sample sizes is rarely understood by researchers when planning a study. This paper aims to highlight the centrality of sample size estimation in health research. Examples that help in understanding the basic concepts involved in its calculation are presented. The scenarios covered are based more on epidemiological reasoning than on mathematical formulae. Proper calculation of the number of participants in a study diminishes the likelihood of errors, which are often associated with adverse economic, ethical and health consequences.
Year: 2014 PMID: 25054748 PMCID: PMC4148275 DOI: 10.1590/abd1806-4841.20143705
Source DB: PubMed Journal: An Bras Dermatol ISSN: 0365-0596 Impact factor: 1.896
FIGURE 1. Graphic representation of the concepts of population, target population and study population
Description of different parameters to be considered in the calculation of sample size for a study aiming at estimating the frequency of health outcomes, behaviors or conditions

| Parameter | Description | Observations |
|---|---|---|
| Population size | Total size of the population from which the sample will be drawn and about which researchers will draw conclusions (target population). | Information on population size may be obtained from secondary data from hospitals, health centers and census surveys (population, schools, etc.). The smaller the target population (for example, fewer than 100 individuals), the larger the sample will be as a proportion of that population. |
| Expected prevalence of the outcome or event of interest | The study outcome must be expressed as a percentage, that is, a number between 0% and 100%. | Expected prevalence should be obtained from the literature or from a pilot study. When neither is available, the value that maximizes the sample size is used (50%, for a fixed sampling error). |
| Sampling error of the estimate | The error we are willing to accept in the estimate obtained by the study (margin of error). | The smaller the sampling error, the greater the precision and the larger the sample size. In health studies, values between two and five percentage points are usually recommended. |
| Confidence level | The probability that the interval defined by the estimate and the established margin of error contains the true prevalence in the population. | The higher the confidence level, the larger the sample size. This parameter is usually fixed at 95% (corresponding to a significance level of 5%). |
| Design effect | Needed when participants are chosen through cluster selection. Instead of participants being selected individually (simple, systematic or stratified sampling), the population is first divided into groups (census tracts, neighborhoods, households, days of the week, etc.), some groups are randomly selected, and individuals are then selected within them. Respondents within a group are expected to be more similar to each other than to the general population, which reduces precision; this loss must be compensated by increasing the sample size. The greater the homogeneity within each cluster, the greater the design effect and the larger the sample size required to maintain precision. | Cluster selection increases the variance of the estimate compared with simple random sampling. The design effect may be obtained from the literature. When it is not available, a value between 1.5 and 2.0 may be assumed, and investigators should estimate the actual design effect after the study is completed and report it in their publications. In studies that do not use cluster selection (simple, systematic or stratified sampling), the design effect is 1.0, that is, no adjustment is made. |
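The chart links the design effect to how homogeneous respondents are within each cluster. One standard way to make that link concrete, not given in the source, is Kish's approximation for equal-sized clusters, DEFF = 1 + (m − 1) × ICC, where m is the number of respondents per cluster and ICC the intraclass correlation. The sketch below assumes equal cluster sizes; the function names are hypothetical.

```python
import math


def kish_design_effect(cluster_size, icc):
    """Kish approximation for equal-sized clusters: DEFF = 1 + (m - 1) * ICC."""
    if cluster_size < 1 or not 0 <= icc <= 1:
        raise ValueError("need cluster_size >= 1 and 0 <= icc <= 1")
    return 1 + (cluster_size - 1) * icc


def cluster_sample_size(n_simple, deff):
    """Inflate a simple-random-sampling size by the design effect, rounding up."""
    return math.ceil(n_simple * deff)


if __name__ == "__main__":
    # 385 is the size needed for a 50% prevalence with a 5-point error
    # in a very large population, before any clustering adjustment.
    for icc in (0.0, 0.05, 0.10):
        deff = kish_design_effect(11, icc)  # 11 respondents per cluster (assumed)
        print(f"ICC = {icc:.2f}: DEFF = {deff:.2f}, n = {cluster_sample_size(385, deff)}")
```

With 11 respondents per cluster, an ICC of 0.05 already gives DEFF = 1.5 and an ICC of 0.10 gives 2.0, the range the chart suggests when no published value exists. An ICC of zero (no within-cluster similarity) gives DEFF = 1.0, the value used without cluster selection.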
Sample size calculation to estimate the frequency (prevalence) of sunscreen use in the population, considering different scenarios but keeping the confidence level (95%) and the design effect (1.0) constant

| Scenario | 10% (±2 p.p.) | 10% (±5 p.p.) | 35% (±2 p.p.) | 35% (±5 p.p.) | 50% (±2 p.p.) | 50% (±5 p.p.) |
|---|---|---|---|---|---|---|
| Health center users investigated in a single day (N = 100) | 90 | 59 | 96 | 78 | 97 | 80 |
| All users in the area covered by a health center (N = 1,000) | 464 | 122 | 687 | 260 | 707 | 278 |
| All users from the areas covered by all health centers in a city (N = 10,000) | 796 | 137 | 1794 | 338 | 1937 | 370 |
| The entire city population (N = 40,000) | 847 | 138 | 2072 | 347 | 2265 | 381 |

Column headers give the expected prevalence and, in parentheses, the sampling error. p.p. = percentage points
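The figures in the sunscreen table are consistent with the usual formula for estimating a proportion, n₀ = z² p(1 − p) / e², followed by the finite population correction n = n₀ / (1 + (n₀ − 1) / N) and rounding up. The source does not state its formula, so this is a reconstruction under assumptions: z = 1.96 for 95% confidence, and the design effect applied as a simple multiplier. The function name is hypothetical. It reproduces the tabulated values except the 50% (±2 p.p.), N = 40,000 cell, where it returns 2266 instead of 2265, presumably a rounding difference in the software the authors used.

```python
import math


def prevalence_sample_size(p, margin, population=None, z=1.96, deff=1.0):
    """Minimum sample size to estimate a prevalence within +/- margin.

    p, margin: proportions (0.10 = 10%; 0.02 = 2 percentage points).
    population: target population size N, or None for a very large population.
    z: standard normal quantile for the confidence level (1.96 for 95%).
    deff: design effect (1.0 without cluster selection).
    """
    if not 0 < p < 1 or margin <= 0:
        raise ValueError("need 0 < p < 1 and margin > 0")
    n = z ** 2 * p * (1 - p) / margin ** 2  # size for an infinite population
    if population is not None:
        n = n / (1 + (n - 1) / population)  # finite population correction
    return math.ceil(n * deff)  # inflate for clustering, then round up


if __name__ == "__main__":
    scenarios = [("Single day", 100), ("One health center", 1000),
                 ("All health centers", 10000), ("Entire city", 40000)]
    for label, size in scenarios:
        row = [prevalence_sample_size(p, e, size)
               for p in (0.10, 0.35, 0.50) for e in (0.02, 0.05)]
        print(f"{label:>20} (N = {size}): {row}")
```

The sketch also shows why 50% is the conservative choice when no prior estimate exists: p(1 − p) peaks at p = 0.5, so for a fixed error every other prevalence gives a smaller n.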
FIGURE 2. Types of possible results when performing a hypothesis test
Description of different parameters to be considered in the calculation of sample size for a study aiming at testing the association between an exposure and a health outcome

| Parameter | Description | Observations |
|---|---|---|
| Type I (alpha) error | The probability of rejecting H0 when H0 is true in the target population. | Usually fixed at 5%; results are then considered statistically significant when p < 0.05. For sample size calculation, the confidence level (1 − alpha, usually 95%) may be used instead. The smaller the alpha error (the greater the confidence level), the larger the sample size. |
| Statistical power (1 − beta) | The probability that the test detects a difference in the sample when the difference exists in the target population (beta is the probability of a type II error). | Calculated as 1 − beta; a value between 80% and 90% is usually used. The greater the power, the larger the required sample size. |
| Ratio of non-exposed to exposed individuals in the sample | The relationship between the sizes of the non-exposed and exposed groups in the sample. | For observational studies, this value is usually obtained from the scientific literature. In intervention studies, a 1:1 ratio is frequently adopted, meaning that half of the individuals receive the intervention and the other half form the control or comparison group; some intervention studies use more controls than individuals receiving the intervention. The further this ratio is from one, the larger the required sample size. |
| Prevalenceᵃ in the non-exposedᵇ group | Proportion of individuals with the disease (outcome) among those not exposed to the risk factor (or in the control group). | Usually obtained from the literature. When this is not available but the overall prevalence/incidence in the population is known, that value may be used in the calculation (as the value for the control group in intervention studies), or the prevalence in the non-exposed may be estimated as $\mathrm{PONE} = p_O / (p_{NE} + p_E \times PR)$, where $p_O$ = overall prevalence of the outcome, $p_{NE}$ = proportion of non-exposed, $p_E$ = proportion of exposed and $PR$ = prevalence ratio. |
| Expected prevalence ratio | Ratio between the prevalenceᵃ of the outcome in the exposed group and that in the non-exposed group. | The value the investigators expect to find under HA; under H0 the ratio equals one (same prevalence in both groups). A value between 1.50 and 2.00 is usually used when the exposure is a risk factor, or between 0.50 and 0.75 when it is a protective factor. For intervention studies, the clinical relevance of this value should be considered. The closer the ratio is to one (the smaller the expected difference between the groups), the larger the required sample size. |
| Type of statistical test | The test may be one-tailed or two-tailed, depending on the type of HA. | Two-tailed tests require larger sample sizes. |

ᵃ May be prevalence, incidence or risk, depending on the type of study.
ᵇ Non-exposed or control group.
H0 = null hypothesis; HA = alternative hypothesis
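The parameters in this chart are the inputs of the usual closed-form sample size formula for comparing two proportions. The sketch below uses one common version (Fleiss's formula without continuity correction), with r = NE/E as defined in the footnote of the next table. The source does not say which formula or software the authors used, so this is an assumption and its numbers need not match their tables exactly; the function name and defaults are hypothetical.

```python
import math
from statistics import NormalDist


def two_group_sample_size(p_ne, pr, r=1.0, alpha=0.05, power=0.80, two_tailed=True):
    """Return (n_exposed, n_non_exposed) to detect a prevalence ratio `pr`.

    p_ne: prevalence of the outcome in the non-exposed (control) group.
    pr: expected prevalence ratio (exposed / non-exposed) under HA.
    r: ratio of non-exposed to exposed individuals (NE/E).
    """
    p_e = p_ne * pr  # expected prevalence in the exposed group
    if not (0 < p_ne < 1 and 0 < p_e < 1) or pr == 1 or r <= 0:
        raise ValueError("need valid prevalences, pr != 1 and r > 0")
    z = NormalDist().inv_cdf
    z_a = z(1 - alpha / 2) if two_tailed else z(1 - alpha)
    z_b = z(power)  # power = 1 - beta
    p_bar = (p_e + r * p_ne) / (1 + r)  # pooled prevalence under H0
    numerator = (z_a * math.sqrt((1 + 1 / r) * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p_e * (1 - p_e) + p_ne * (1 - p_ne) / r)) ** 2
    n_e = math.ceil(numerator / (p_e - p_ne) ** 2)
    return n_e, math.ceil(r * n_e)


if __name__ == "__main__":
    print(two_group_sample_size(0.10, 2.0))                    # 80% power, two-tailed
    print(two_group_sample_size(0.10, 2.0, power=0.90))        # more power, larger n
    print(two_group_sample_size(0.10, 2.0, two_tailed=False))  # one-tailed, smaller n
```

The usage lines mirror the chart's qualitative rules: raising the power increases n, and a one-tailed test needs fewer participants than a two-tailed one.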
Sample size calculation for comparing the outcome between exposed (E) and non-exposed (NE) groups defined by different variables
| Female: 56% (E) | n=1298 | n=388 | n=487 | n=134 | n=136 | n=28 |
| Male: 44% (NE) | n=1738 | n=519 | n=652 | n=179 | n=181 | n=38 |
| White: 82% (E) | n=2630 | n=822 | n=970 | n=276 | n=275 | n=49 |
| Other: 18% (NE) | n=3520 | n=1100 | n=1299 | n=370 | n=368 | n=66 |
| 0-4 years: 25% (E) | n=1340 | n=366 | n=488 | n=131 | n=138 | ND |
| >4 years: 75% (NE) | n=1795 | n=490 | n=654 | n=175 | n=184 | ND |
| ≤133: 50% (E) | n=1228 | n=360 | n=458 | n=124 | n=128 | n=28 |
| >133: 50% (NE) | n=1644 | n=480 | n=612 | n=166 | n=170 | n=36 |
E = exposed group; NE = non-exposed group; r = NE/E ratio; PONE = prevalence of the outcome in the non-exposed group (percentage of positives in the non-exposed group), estimated with the formula from Chart 3, considering a PR of 1.50; PR = prevalence ratio, incidence ratio or expected relative risk; n = minimum necessary sample size; ND = value could not be determined, as the prevalence of the outcome in the exposed group would be above 100% with the specified parameters.
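The PONE formula from Chart 3 follows from writing the overall prevalence as a weighted average, p_O = p_NE × PONE + p_E × PONE × PR, and solving for PONE. The sketch below computes it and flags the ND case described in the footnote, when the implied prevalence in the exposed group exceeds 100%. The function names and the overall prevalences used in the examples are hypothetical, not values from the study.

```python
def prevalence_in_non_exposed(p_outcome, p_exposed, pr):
    """PONE = pO / (pNE + pE * PR), where pNE = 1 - pE."""
    if not (0 <= p_outcome <= 1 and 0 < p_exposed < 1 and pr > 0):
        raise ValueError("need 0 <= p_outcome <= 1, 0 < p_exposed < 1 and pr > 0")
    p_non_exposed = 1 - p_exposed
    return p_outcome / (p_non_exposed + p_exposed * pr)


def prevalence_in_exposed(p_outcome, p_exposed, pr):
    """Implied prevalence in the exposed group, or None when it exceeds 100% (ND)."""
    p_e_outcome = prevalence_in_non_exposed(p_outcome, p_exposed, pr) * pr
    return p_e_outcome if p_e_outcome <= 1 else None


if __name__ == "__main__":
    # 56% female taken as exposed, PR = 1.50, hypothetical overall prevalence of 30%.
    print(prevalence_in_non_exposed(0.30, 0.56, 1.5))  # 0.30 / 1.28
    # 25% exposed with a hypothetical overall prevalence of 80%: exposed would exceed 100%.
    print(prevalence_in_exposed(0.80, 0.25, 1.5))      # None, i.e. ND
```

In the second example PONE is 0.80 / 1.125 ≈ 0.711, so the exposed group would need a prevalence of about 107%, which is impossible; no sample size can be computed for that combination of parameters.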