M G M Kok, M W J de Ronde, P D Moerland, J M Ruijter, E E Creemers, S J Pinto-Sietsma.
Abstract
Since the discovery of microRNAs (miRNAs), circulating miRNAs have been proposed as biomarkers for disease. Consequently, many groups have tried to identify circulating miRNA biomarkers for various types of disease, including cardiovascular disease and cancer. However, the replicability of these experiments has been disappointingly low. To identify circulating miRNA candidate biomarkers, an unbiased high-throughput screen is generally performed first, in which a large number of miRNAs are detected and quantified in the circulation. Because these experiments are costly, many such studies have been performed with a small number of study subjects (small sample size). Owing to the lack of power in small sample size experiments, true effects are often missed and many of the detected effects are false positives. Therefore, it is important to have a good estimate of the appropriate sample size for a miRNA high-throughput screen. In this review, we discuss the effects of small sample sizes in high-throughput screens for circulating miRNAs. Using data from a miRNA high-throughput experiment on isolated monocytes, we illustrate that implementing power calculations in a high-throughput miRNA discovery experiment avoids unnecessarily large and expensive experiments while retaining enough power to detect clinically important differences.
Keywords: Array; Biomarkers; High-throughput screens; Methodology; MicroRNA; Small sample size error
Year: 2017 PMID: 29276692 PMCID: PMC5737945 DOI: 10.1016/j.bdq.2017.11.002
Source DB: PubMed Journal: Biomol Detect Quantif
The numbers of both false-negative and false-positive results increase as the sample size decreases.
| | 5 vs 5 | 10 vs 10 | 15 vs 15 | 20 vs 20 | 25 vs 25 |
|---|---|---|---|---|---|
| A. # of subsamples with >10 miRNAs differentially expressed | 145/10,000 | 127/10,000 | 93/10,000 | 36/10,000 | 9/10,000 |
| B. Highest # of differentially expressed miRNAs (from 461) identified in one subsample | 190 | 176 | 201 | 105 | 13 |
| C. Mean # of miRNAs differentially expressed between patient and control | 47/100 | 73/100 | 85/100 | 91/100 | 93/100 |
Results of the subsampling experiments. The column header indicates the number of patients and controls randomly sampled 10,000 times from the original dataset (A-B) and the perturbed dataset (C). A) Number of subsamples (out of 10,000) from the original dataset with at least 10 differentially expressed miRNAs between patients and controls. B) Maximal number of differentially expressed miRNAs in any of the 10,000 subsamples from the original dataset. C) Mean number of miRNAs (out of 100) that were differentially expressed in subsamples of the perturbed dataset. Differential expression corresponds to a Benjamini-Hochberg false discovery rate adjusted p-value <0.1. # = number.
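The differential-expression criterion named in the caption (a Benjamini-Hochberg adjusted p-value below 0.1) can be illustrated with a minimal sketch. This is not the authors' code: the function names `benjamini_hochberg` and `count_differentially_expressed` are hypothetical, and the input is assumed to be one raw p-value per miRNA from whatever per-miRNA test was applied to a subsample.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_differentially_expressed(pvalues, fdr=0.1):
    """Count miRNAs whose adjusted p-value is below the FDR threshold.

    The default of 0.1 mirrors the threshold stated in the caption.
    """
    return sum(q < fdr for q in benjamini_hochberg(pvalues))
```

Applied to each of the 10,000 subsamples, a count like this would give the per-subsample number of differentially expressed miRNAs that rows A to C summarise.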
Fig. 1. Inflation of the effect size in small sample size studies.
Fold changes of the most significant miRNA in each of 10,000 subsamples from the original dataset for five different sample sizes (n = 5, 10, 15, 20, 25). In subsamples of 5 versus 5 individuals, the heterogeneity in effect sizes is larger than in subsamples of 25 versus 25 individuals, and the observed fold changes are larger.
Fig. 2. Flowchart for setup of a miRNA biomarker experiment.
Fig. 3. Impact of sample size and the minimally detectable effect size on power.
The graph shows the statistical power for a given sample size (per group) in different scenarios of minimally detectable effect sizes ranging from 1.20 to 2. Commonly, a power of 80% is used in power calculations. Therefore, the sample size (X-axis) at which the dashed horizontal grey line crosses the line for the desired effect size is the sample size needed to achieve enough power to detect that effect size. Effect sizes are indicated as fold-change (FC). Smaller effect sizes require a larger sample size.
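The relationship in this caption can be sketched with a simple two-group power calculation. The version below is an illustration under stated assumptions, not the method used for the figure: it uses a normal approximation to a two-sided two-sample test on log2 expression, and the standard deviation `sd_log2`, the significance level `alpha`, and the function names are all hypothetical. A real high-throughput screen would also need a significance level adjusted for multiple testing.

```python
from math import log2, sqrt
from statistics import NormalDist


def approx_power(fold_change, n_per_group, sd_log2, alpha=0.05):
    """Approximate power to detect a fold change with n subjects per group.

    Assumes equal group sizes and a common standard deviation (sd_log2)
    of log2 expression in both groups; both are illustrative assumptions.
    """
    delta = abs(log2(fold_change))            # effect size on the log2 scale
    se = sd_log2 * sqrt(2.0 / n_per_group)    # standard error of the difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Normal approximation; the negligible opposite tail is ignored.
    return NormalDist().cdf(delta / se - z_crit)


def min_sample_size(fold_change, sd_log2, target_power=0.8, alpha=0.05,
                    max_n=10000):
    """Smallest per-group sample size reaching the target power, or None."""
    for n in range(2, max_n + 1):
        if approx_power(fold_change, n, sd_log2, alpha) >= target_power:
            return n
    return None
```

Calling `min_sample_size` for several fold changes at a fixed `sd_log2` reproduces the qualitative pattern in the figure: the smaller the minimally detectable fold change, the larger the per-group sample size needed to reach 80% power.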