
Decoding the Magic Number: Everyone Can do it!


Year: 2022 PMID: 35754666 PMCID: PMC9215187 DOI: 10.4103/ijabmr.ijabmr_211_22

Source DB: PubMed Journal: Int J Appl Basic Med Res ISSN: 2229-516X


For most medical students, research work begins at the postgraduate level with the thesis. A common scenario: within the first month of joining a medical college as a postgraduate, the guide orders, “Submit your thesis plan within a week.” As students, it took us many more weeks of struggle to find out what a thesis even meant. Sitting on the other side of the table today, we can well understand the plight of 1st-year residents who discover that they must do research work. Although the whole exercise is a monstrous task, one problem is encountered by almost every student: sooner or later, they are seen running around trying to find out how many samples to include in their study. Some of them end up in the Community Medicine department asking for the magic number that will satisfy the requirements of their plan. And year after year, our answer has been, “I don’t have a magic wand to generate a figure. It needs to be calculated. So, sit down and answer a few questions.” Arriving at an appropriate sample size is a scientific process: a few prerequisite questions must be answered, and only on that basis can a figure be arrived at. At the postgraduate level, this concept has been ignored by students, their supervisors, and evaluators alike. Since the thesis is probably the first research work undertaken by most medical doctors (barring the few who do some at the undergraduate level), the basis for deciding the sample size should be clear from the start; otherwise, this gap in competence will haunt you for a lifetime. Conducting research with a calculated sample size helps ensure the reliability and generalizability of the study results. Studies with an insufficient sample size may produce erroneous results and generate evidence that has no relevance in real situations.
Conversely, using an excessive sample wastes resources, time, and effort without any added benefit. Thus, there is no magic number; the estimate has to be arrived at on a scientific basis. In this article, the authors have compiled the most basic concepts and formulas, drawing on their experience over the years of reviewing research work at the Institute.

Basic Preconcepts

As per research guidelines, the steps in the “Methodology” section of any research plan are depicted in Figure 1.

Figure 1

Methodology of research

Hence, before calculating the desired sample size, a researcher must be clear about the steps in Figure 1. The prerequisites for sample size determination are listed in Box 1.

Box 1

Prerequisites for sample size calculation

What is the type of study? (Determination of estimate OR hypothesis testing)

What is the primary outcome variable?

What is the estimated value of the primary outcome variable, and acceptable precision?

What are the acceptable type I and type II errors?

What is the desired effect size?

To understand how to answer these questions, let us go through some basic concepts in research and statistics.

Prerequisite 1: what is the type of study? (determination of estimate or hypothesis testing)

Action proposed

The study type has to be decided based on the design that best suits the desired aim. If the researcher only wants to describe or report some phenomenon or values in the study population, the study falls under the determination of an estimate, and each sampling unit in the population is examined once. The estimate needed may be, for example: What percentage of adults in community X suffer from hypertension, i.e., what is the prevalence of hypertension among adults in community X? Or, what is the mean hemoglobin (Hb) level of adolescents in community Y? If the size of the population (the denominator) to which the results are to be extrapolated is known, the study qualifies as a finite population; otherwise, an infinite population may be assumed. In experimental, correlational observational, or comparative studies, both null and alternative hypotheses need to be framed. The null hypothesis is retained (not rejected) when no significant difference is observed, whereas the alternative hypothesis is supported when one is. Such studies therefore involve two or more sets of data to compare, drawn either from the same population or from two or more different populations.

Prerequisite 2: what is the primary outcome variable?

The primary objective needs to be identified. In the examples above, each study has a single primary objective, i.e., the prevalence of hypertension or the mean Hb level. For each primary objective, the primary outcome variable needs to be chosen: the percentage of the population with raised blood pressure (BP), or the mean and standard deviation (SD) of Hb levels in the population. The data type of this variable needs to be categorized as one of the following:
• Nominal – qualitative categories with no inherent order, for example, male/female, urban/rural
• Ordinal – categories placed in a meaningful order, but the differences between categories are not equal, for example, mild/moderate/severe, 1st/2nd/3rd, Likert-type scale data
• Interval – ordered data with equal, meaningful intervals between values that measure quantities, but without an absolute zero, for example, temperature on the Celsius scale
• Ratio – has an absolute zero, so meaningful ratios exist, for example, weight in grams, BP in mmHg, and pulse rate
• Discrete – variables that can take only distinct, countable values, with nothing in between, for example, days of hospital stay
• Continuous – variables that can take any value within a range, as most biomedical parameters do, for example, BP, age, weight, and Hb.

Prerequisite 3: what is the estimated value of the primary outcome variable, and acceptable precision?

Next, estimates of these primary outcomes should be obtained from a literature review. Estimates from the most similar populations in terms of age, sex, ethnicity, etc., should be preferred; for example, a prevalence of hypertension of, say, 40%, or a mean and SD of Hb levels of, say, 10 ± 2 g/dl. If the study is novel and no estimate is available even from other countries, a pilot study taking 10% of the estimated population size needs to be conducted. The results of that pilot study can then be used as estimates for calculating the desired sample size. However, pilot samples should never be included in the main study. If there is more than one primary outcome variable, the sample size needs to be calculated for each of them, and the largest number thus calculated has to be adopted. Secondary variables need not be used to estimate the sample size. Precision should be defined either in absolute terms (conventionally 5 percentage points) or, when the estimated prevalence is low, as a relative percentage of the estimated outcome.
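As a rough sketch of the precision and "largest sample wins" rules above, the Python below applies the usual infinite-population formula for a proportion, n = z²p(1 − p)/d², to two primary outcomes: the 40% hypertension prevalence mentioned above with an absolute precision of 5 percentage points, and a hypothetical rare outcome (2% prevalence) with a relative precision of 20% of the estimate. The 2% prevalence and the 20% relative precision are illustrative assumptions, not values from this article.

```python
import math
from statistics import NormalDist


def n_for_proportion(p: float, d: float, alpha: float = 0.05) -> int:
    """Infinite-population sample size for estimating a proportion p
    with margin of error d (both as fractions), rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return math.ceil(z ** 2 * p * (1 - p) / d ** 2)


# Outcome 1: common outcome, absolute precision of 5 percentage points.
n_common = n_for_proportion(p=0.40, d=0.05)

# Outcome 2 (hypothetical): rare outcome. An absolute 5% margin would exceed
# the prevalence itself, so use a relative precision of 20% of p instead.
p_rare = 0.02
n_rare = n_for_proportion(p=p_rare, d=0.20 * p_rare)

# With more than one primary outcome, adopt the largest calculated size.
n_required = max(n_common, n_rare)
print(n_common, n_rare, n_required)
```

Note how the rare outcome, even with a looser relative margin, drives the final sample size; this is why every primary outcome has to be checked.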

Prerequisite 4: what is acceptable type I and II error?

The level of significance may be decided based on the needs of the study. By convention, a 95% confidence interval (CI), i.e., a 5% level of significance, is taken as standard; it may be raised or lowered depending on the researcher’s requirements. The chosen level affects whether the null hypothesis is rejected. A 5% level of significance means accepting a 5% probability of rejecting the null hypothesis when it is in fact true, i.e., of concluding that a difference exists when the observed result is due to chance alone. Separately, the level of precision commonly accepted is 5%, i.e., the results obtained have a margin of ±5%. Type I error (α) is conventionally set at 5% (a 95% CI), giving z₁-α/2 = 1.96, and type II error (β) is conventionally set at 20%, giving a power (1 − β) of 80%. A power of 80% means an 80% probability of detecting a difference of the specified size if it truly exists; there still remains a 20% chance of missing a difference that really exists.
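The z values quoted above come from the standard normal distribution. A minimal sketch, using only Python's standard `statistics.NormalDist`, shows how z₁-α/2 follows from the chosen type I error and z₁-β from the chosen power; the function names `z_alpha` and `z_beta` are our own, introduced only for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1


def z_alpha(alpha: float) -> float:
    """Two-sided critical value z(1 - alpha/2) for a type I error of alpha."""
    return std_normal.inv_cdf(1 - alpha / 2)


def z_beta(power: float) -> float:
    """Standard normal value z(1 - beta) for the desired power = 1 - beta."""
    return std_normal.inv_cdf(power)


for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: z = {z_alpha(alpha):.2f}")
for power in (0.80, 0.90):
    print(f"power = {power:.0%}: z = {z_beta(power):.2f}")
```

These reproduce the conventional values 1.96 and 2.58 (for 95% and 99% confidence) and 0.84 and 1.28 (for 80% and 90% power) used in the tables below.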

Prerequisite 5: what is the desired effect size?

The desired effect size, again, needs to be decided based on the type of study design. It indicates the magnitude of the relationship (or difference) between the variables in the study. By Cohen’s guidelines for a correlation-type effect size, about 0.1 is considered small, about 0.3 medium, and 0.5 or more large. Effect size and sample size are inversely related: the smaller the effect the study must detect, the larger the sample required. Hence, deciding on an appropriate, clinically significant effect size again affects the calculated sample size. Many free and paid software packages and online calculators now compute sample size at the click of a button; however, which calculator to use must be decided by the researcher, again based on the answers to the five questions above. The calculator will ask you to fill in these values and will return the sample size and the formula used, which can be quoted to justify the calculation.
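To make the inverse relationship concrete, the sketch below assumes the effect size is a standardized mean difference (Cohen's d, the difference in means divided by the common SD) for two equal groups, and uses the standard normal-approximation formula n per group = 2(z₁-α/2 + z₁-β)²/d² with α = 0.05 and 80% power. Using Cohen's d here, rather than the Cohen benchmarks quoted above, is our assumption for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group size for comparing two means, given a standardized
    effect size d (difference in means / common SD); normal approximation."""
    z = NormalDist()
    c = (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2
    return math.ceil(2 * c / d ** 2)


for d in (0.8, 0.5, 0.2):  # Cohen's conventional large, medium, small d
    print(d, n_per_group(d))
```

Because n scales with 1/d², halving the effect size roughly quadruples the required sample, which is why the clinically meaningful effect must be chosen deliberately.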

Sample Size Calculations

Let us now set out the step-wise formulas for calculating sample sizes manually.[1-13] Working through them will also help you check whether an online calculator is giving you the right numbers. Sample size estimation for cross-sectional or descriptive studies, case–control studies, cohort studies, and comparative studies is given in Tables 1-4, respectively, and the constant values used in some of these formulas are given in Table 5.

Table 1

Sample size estimation for Cross-sectional or descriptive studies

Population	Primary objective	Data type	Explanation
Infinite	Calculating proportion/prevalence	Nominal/ordinal	z₁-α/2=Critical value of the standard normal distribution for the chosen level of confidence (1.96 at 95% CI or a 5% level of significance [type I error], and 2.58 at 99% CI) p=Expected prevalence, based on previous research q=1-p d=Margin of error or precision of the expected prevalence (commonly taken as 5%)
Infinite	Calculating a mean value	Interval/ratio	z₁-α/2=Standardized value for the chosen level of confidence (1.96 at 95% CI, and 2.58 at 99% CI or a 1% type I error) d=Margin of error, i.e., the acceptable precision of the estimated mean, in the same units as σ σ=SD, based on a previous study or pilot study
Finite			N=Total population d=Margin of error or precision

CI=Confidence interval, SD=Standard deviation
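The formula column of Table 1 did not survive in this copy. The sketch below assumes the standard textbook forms that match the symbols defined there: n = z²pq/d² for a proportion (as in the first row) and n = z²σ²/d² for a mean. For the finite-population row, which defines only N and d, it assumes Yamane's simplified formula n = N/(1 + Nd²); that choice is our assumption, not confirmed by the article. The Hb SD of 2 g/dl comes from the text above; the 0.5 g/dl precision and the population of 1,000 are hypothetical.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # z(1 - alpha/2), about 1.96 at 95% confidence


def n_mean(sigma: float, d: float, z: float = Z_95) -> int:
    """Infinite population, estimating a mean: n = z^2 * sigma^2 / d^2.
    sigma and d must be in the same units (e.g., g/dl)."""
    return math.ceil(z ** 2 * sigma ** 2 / d ** 2)


def n_finite_yamane(N: int, d: float) -> int:
    """Finite population of size N, ASSUMING Yamane's n = N / (1 + N d^2),
    which implicitly fixes 95% confidence and p = 0.5."""
    return math.ceil(N / (1 + N * d ** 2))


print(n_mean(sigma=2.0, d=0.5))          # Hb SD 2 g/dl, hypothetical +/-0.5 g/dl
print(n_finite_yamane(N=1000, d=0.05))   # hypothetical population of 1,000
```

If the finite-population formula intended in the original is a different one (for example, a finite population correction applied to the infinite-population result), the second function should be swapped accordingly.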

Table 5

Constant values

C	α = 0.05 (z₁-α/2 = 1.96)	α = 0.01 (z₁-α/2 = 2.58)
Power 80% (z₁-β = 0.84)	7.85	11.68
Power 90% (z₁-β = 1.28)	10.51	14.88

C: Constant value
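The constants in Table 5 match C = (z₁-α/2 + z₁-β)². That relationship is our reconstruction, checked against the tabulated values. The short sketch below recomputes the table from the standard normal distribution, which is a quick way to verify a constant before plugging it into a formula.

```python
from statistics import NormalDist


def constant_c(alpha: float, power: float) -> float:
    """C = (z(1 - alpha/2) + z(1 - beta))^2, where power = 1 - beta."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


for power in (0.80, 0.90):
    row = [round(constant_c(alpha, power), 2) for alpha in (0.05, 0.01)]
    print(f"power {power:.0%}: {row}")
# power 80%: [7.85, 11.68]
# power 90%: [10.51, 14.88]
```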

A few of the constant values used in these formulas are given in Table 5. Many thesis projects also involve evaluating a diagnostic test, where the study outcome is the estimate of sensitivity and specificity. Manual calculation of the sample size for these parameters is too elaborate and complex to be taken up in this article; however, a link to an online calculator for this purpose is provided below (Box 2).

Quick Finger Resources

Some sample size calculators, software packages, and Internet links are listed in Box 2 for easy use by beginners.[14-20] The one word of caution in using these tools is that the machine calculates only what you feed into it; the inputs must be correct to get the right answers.

Box 2

Electronic resources

Software name	Link	Paid/free
OpenEpi	www.openepi.com	Free
IBM SPSS	https://www.ibm.com/in-en/analytics/spss-statistics-software	Paid
Raosoft software	http://www.raosoft.com/samplesize.html	Free
P value: A statistical tool app	https://play.google.com/store/apps/details?id=com.drkusumgaur.pvalue	Free on the play store for Android
Sample size calculators for designing clinical research	https://sample-size.net/	Free
Statulator – online statistical calculator	https://statulator.com/	Free
Sample size calculator by Wan Nor Arifin for diagnostic tests	http://wnarifin.github.io	Free

We have tried to simplify sample size calculation for beginner researchers as well as early-career faculty researchers. Going step by step will enable the researcher to reach a scientifically appropriate sample size and to quote it when justifying the number of samples achieved. Conducting the study with a scientifically valid sample size will strengthen the results of the research work.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

Table 2

Sample size estimation for Case-control studies

Parameter of study	Data type	Desired number of samples per group	Explanation
Proportion	Nominal/ordinal data		r=Ratio of controls to cases (1 if both groups have the same number of subjects) p=Average proportion=(P₁+P₂)/2 z₁-β=Standard normal value for the desired power (0.84 for 80% power and 1.28 for 90% power) z₁-α/2=Critical value for the chosen level of confidence (1.96 at 95% CI or a 5% type I error, and 2.58 at 99% CI or a 1% type I error) P₁=Proportion in cases P₂=Proportion in controls
Mean	Interval/ratio		r=Ratio of controls to cases z₁-β=Standard normal value for the desired power (0.84 for 80% power and 1.28 for 90% power) z₁-α/2=Critical value for the chosen level of confidence (1.96 at 95% CI, and 2.58 at 99% CI or a 1% type I error) σ=SD, based on a previous study or pilot study d=Effect size (difference in means, from previous studies or a pilot study)

CI=Confidence interval, SD=Standard deviation
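Table 2's formula column is empty in this copy. The sketch below assumes the commonly quoted forms that use exactly the symbols defined there: n = ((r + 1)/r) × p(1 − p)(z₁-β + z₁-α/2)²/(P₁ − P₂)² for proportions and n = ((r + 1)/r) × σ²(z₁-β + z₁-α/2)²/d² for means, where n is the number of cases and the number of controls is r × n. The example inputs (40% vs. 20% exposure, SD 2 and a difference of 1) are hypothetical.

```python
import math
from statistics import NormalDist


def _c(alpha: float, power: float) -> float:
    """(z(1 - alpha/2) + z(1 - beta))^2."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


def n_cases_proportion(p1, p2, r=1.0, alpha=0.05, power=0.80) -> int:
    """Cases needed when comparing proportions p1 (cases) and p2 (controls);
    r = controls per case; average proportion p = (p1 + p2) / 2 as in Table 2."""
    p_bar = (p1 + p2) / 2
    n = (r + 1) / r * p_bar * (1 - p_bar) * _c(alpha, power) / (p1 - p2) ** 2
    return math.ceil(n)


def n_cases_mean(sigma, d, r=1.0, alpha=0.05, power=0.80) -> int:
    """Cases needed when comparing means; sigma = SD, d = difference in means."""
    return math.ceil((r + 1) / r * sigma ** 2 * _c(alpha, power) / d ** 2)


print(n_cases_proportion(p1=0.40, p2=0.20))        # 1 control per case
print(n_cases_proportion(p1=0.40, p2=0.20, r=2))   # 2 controls per case
print(n_cases_mean(sigma=2.0, d=1.0))
```

Increasing r reduces the number of cases needed (at the cost of more controls in total), which is useful when cases are scarce.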

Table 3

Sample size estimation for Cohort studies

Desired number of samples per group	Explanation
	m=Number of control subjects per experimental subject z₁-β=Standard normal value for the desired power (0.84 for 80% power and 1.28 for 90% power) z₁-α/2=Critical value for the chosen level of confidence (1.96 at 95% CI, and 2.58 at 99% CI or a 1% type I error) p₀=Probability of the event in controls p₁=Probability of the event in experimental subjects p=(p₁+m p₀)/(m+1)

CI=Confidence interval
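The formula in Table 3 is also missing here. Assuming the widely used normal-approximation formula that fits exactly these symbols, n = [z₁-α/2 √((1 + 1/m) p(1 − p)) + z₁-β √(p₀(1 − p₀)/m + p₁(1 − p₁))]² / (p₀ − p₁)², with p = (p₁ + m p₀)/(m + 1), a minimal sketch looks like this. The event probabilities of 10% and 20% are hypothetical.

```python
import math
from statistics import NormalDist


def n_exposed_cohort(p0, p1, m=1.0, alpha=0.05, power=0.80) -> int:
    """Experimental (exposed) subjects needed; m controls per experimental
    subject; p0, p1 = event probabilities in controls and experimental group."""
    z = NormalDist()
    z_a = z.inv_cdf(1 - alpha / 2)
    z_b = z.inv_cdf(power)
    p_bar = (p1 + m * p0) / (m + 1)  # weighted average event probability
    term_a = z_a * math.sqrt((1 + 1 / m) * p_bar * (1 - p_bar))
    term_b = z_b * math.sqrt(p0 * (1 - p0) / m + p1 * (1 - p1))
    return math.ceil((term_a + term_b) ** 2 / (p0 - p1) ** 2)


n = n_exposed_cohort(p0=0.10, p1=0.20)
print(n, "experimental and", n, "control subjects")  # m = 1
```

For m other than 1, the control group size is m times the returned number.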

Table 4

Sample size estimation for Comparative studies

Parameter of study	Data type	Sample size per group	Explanation
Proportion	Nominal/ordinal		p₁ and p₂=Proportions in the two groups C=Constant for the levels of α and β selected for the study, given in Table 5
Mean	Interval/ratio		d=Difference in means between the two groups (effect size) σ₁=SD of group 1 σ₂=SD of group 2 z₁-β=Standard normal value for the desired power z₁-α/2=Critical value for the chosen level of confidence (1.96 at 95% CI, and 2.58 at 99% CI or a 1% type I error)
Mean	Continuous variable	1+2C(σ/d)²	d=Difference in means to be detected between the two groups (effect size) σ=Common SD C=Constant that depends on the values of α and β selected for the study, given in Table 5

SD=Standard deviation, CI=Confidence interval
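Only the third row of Table 4 (n = 1 + 2C(σ/d)²) survived. For the other two rows, the sketch below assumes the standard per-group forms consistent with the listed symbols: n = C[p₁(1 − p₁) + p₂(1 − p₂)]/(p₁ − p₂)² for proportions and n = (σ₁² + σ₂²)C/d² for means, with C = (z₁-α/2 + z₁-β)² as in Table 5. All example inputs are hypothetical.

```python
import math
from statistics import NormalDist


def constant_c(alpha=0.05, power=0.80) -> float:
    """C = (z(1 - alpha/2) + z(1 - beta))^2, matching Table 5."""
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2


def n_two_proportions(p1, p2, c=None) -> int:
    """Per-group n = C [p1(1 - p1) + p2(1 - p2)] / (p1 - p2)^2 (assumed form)."""
    c = constant_c() if c is None else c
    return math.ceil(c * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2)


def n_two_means(d, sd1, sd2, c=None) -> int:
    """Per-group n = (sd1^2 + sd2^2) C / d^2 (assumed form)."""
    c = constant_c() if c is None else c
    return math.ceil((sd1 ** 2 + sd2 ** 2) * c / d ** 2)


def n_two_means_common_sd(d, sd, c=None) -> int:
    """Per-group n = 1 + 2C (sd / d)^2, the third row of Table 4."""
    c = constant_c() if c is None else c
    return math.ceil(1 + 2 * c * (sd / d) ** 2)


print(n_two_proportions(0.40, 0.20))
print(n_two_means(d=1.0, sd1=2.0, sd2=2.0))
print(n_two_means_common_sd(d=1.0, sd=2.0))
```

Passing `c=7.85` instead of computing C reproduces a hand calculation that reads the constant straight from Table 5.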


1. Sample size for ophthalmology studies.

2. Determining the sample size in a clinical trial.

3. Sample size determination.

4. How to calculate sample size in randomized controlled trial?

5. Sample Size in Qualitative Interview Studies: Guided by Information Power.

6. Sample size calculations: basic principles and common pitfalls.

7. Requirements for Minimum Sample Size for Sensitivity and Specificity Analysis.

8. Sample size estimation in clinical trial.

9. How to calculate sample size for different study designs in medical research?

10. How to calculate sample size in animal studies?