Abstract
I explore the sample size in qualitative research that is required to reach theoretical saturation. I conceptualize a population as consisting of sub-populations that contain different types of information sources that hold a number of codes. Theoretical saturation is reached once every code in the population has been observed at least once in the sample. I delineate three different scenarios to sample information sources: "random chance," which is based on probability sampling, "minimal information," which yields at least one new code per sampling step, and "maximal information," which yields the largest number of new codes per sampling step. Next, I use simulations to assess the minimum sample size for each scenario for systematically varying hypothetical populations. I show that theoretical saturation is more dependent on the mean probability of observing codes than on the number of codes in a population. Moreover, the minimal and maximal information scenarios are significantly more efficient than random chance, but yield fewer repetitions per code to validate the findings. I formulate guidelines for purposive sampling and recommend that researchers follow a minimal information scenario.
Year: 2017 PMID: 28746358 PMCID: PMC5528901 DOI: 10.1371/journal.pone.0181689
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
An overview of the main concepts, definitions and symbols.
| Concept | Definition | Symbol |
|---|---|---|
| Information source | The unit from which information is gathered | |
| Population | The total set of information sources that are potentially relevant to answering the research question | |
| Sub-population | A subset of information sources that are potentially relevant to answering the research question | |
| Sampling step | The number of information sources sampled so far | |
| Code | A unique piece of information in the population relevant to the research | |
| Number of codes | The number of unique pieces of information relevant to the research in the population | |
| Theoretical saturation | All codes are observed at least once. | |
| Probability of reaching theoretical saturation | The probability that each code is observed at least once | |
| Sampling steps to reach theoretical saturation | The number of sampling steps needed to observe each code at least once | |
| Mean probability of observing codes | The mean probability that a code is observed at an information source | |
| Repetitive codes | Codes that are observed more than once. | - |
| Minimum number of repetitive codes | The minimum number of times that a code needs to be observed | |
| Sampling strategy | How the researcher selects the information sources; commonly empirically based. | - |
| Sampling scenario | Three theory based scenarios on how the sampling process proceeds: random chance, minimal information, maximal information | - |
| Efficiency | The fewer sampling steps that a scenario requires to reach theoretical saturation, the more efficient it is | - |
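The concepts in the table can be made concrete with a small simulation of the random chance scenario. What follows is a minimal sketch under simplifying assumptions that are not taken from the source: every code is held by each information source independently with the same probability `p` (standing in for the mean probability of observing codes), there are no sub-populations, and the function names are hypothetical. It counts the sampling steps needed until all `k` codes have been observed at least once and reports the 95th percentile over repeated runs.

```python
import random


def steps_to_saturation(k, p, rng):
    """Sample sources until all k codes have been observed at least once.

    Assumption (not from the source): each source holds each code
    independently with the same probability p.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    seen = set()
    n = 0  # sampling step: number of information sources sampled so far
    while len(seen) < k:
        n += 1
        for code in range(k):
            if rng.random() < p:
                seen.add(code)
    return n


def percentile_95(values):
    """Nearest-rank 95th percentile, using integer arithmetic."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * len)
    return ordered[rank - 1]


def simulate(k, p, runs=500, seed=0):
    """Repeat the random chance scenario and collect steps to saturation."""
    rng = random.Random(seed)
    return [steps_to_saturation(k, p, rng) for _ in range(runs)]


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        n95 = percentile_95(simulate(k=11, p=p))
        print(f"k=11, p={p}: 95th percentile of n = {n95}")
```

Running the script for a few values of `p` gives a feel for how strongly the required number of sampling steps reacts to the probability of observing codes. The minimal and maximal information scenarios would need a selection rule over the remaining sources and are not covered by this sketch.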
Fig 1. A schematic overview of how the algorithms for each scenario operate.
Fig 2. The 95th percentile of n against for the values of k between 11 and 101.
Note that the y-axis is logarithmic. The solid black line indicates the calculated random chance’s value of n based on F11. The blue dots represent random chance, the green diamonds represent minimal information, and the red triangles represent maximal information.
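For orientation, a closed-form value of n for random chance can be worked out under a simplified model. This is a sketch only and is not claimed to be the formula labelled F11: it assumes that each of the k codes is observed independently at every information source with the same probability p (a hypothetical symbol used here for the mean probability of observing codes), and q denotes the target probability of reaching theoretical saturation, for example 0.95.

```latex
% Probability that all k codes have been observed at least once after n steps,
% assuming independent codes with a common observation probability p:
P(\text{saturation after } n \text{ steps}) = \left(1 - (1 - p)^{n}\right)^{k}

% Smallest n for which this probability is at least q:
n \;\ge\; \frac{\ln\!\left(1 - q^{1/k}\right)}{\ln(1 - p)}
```

Under this model k enters only through the term q^{1/k} inside a logarithm, while p scales the whole expression through the denominator, so n grows slowly with the number of codes and quickly as p becomes small.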
Fig 3. upon reaching theoretical saturation against for the values of k between 11 and 101.
The blue dots represent random chance, the green diamonds represent minimal information, and the red triangles represent maximal information.