Estibaliz Gómez-de-Mariscal, Vanesa Guerrero, Alexandra Sneider, Hasini Jayatilaka, Jude M Phillip, Denis Wirtz, Arrate Muñoz-Barrutia.
Abstract
Biomedical research has come to rely on p-values as a deterministic measure for data-driven decision-making. In the widely used null hypothesis significance testing (NHST) framework for identifying statistically significant differences among groups of observations, a single p-value is computed from the sample data. It is then routinely compared with a threshold, commonly set to 0.05, to assess the evidence against the null hypothesis that there are no differences among the groups. Because the estimated p-value tends to decrease as the sample size increases, applying this methodology to datasets with large sample sizes leads to rejection of the null hypothesis, which makes the test uninformative in this setting. We propose a new approach to detect differences based on the dependence of the p-value on the sample size. We introduce new descriptive parameters that counteract the effect of sample size on the interpretation of the p-value for datasets with large sample sizes, reducing the uncertainty in deciding whether biological differences exist between the compared experiments. The methodology enables the graphical and quantitative characterization of the differences between the compared experiments, guiding researchers through the decision process. An in-depth study of the methodology is carried out on simulated and experimental data. Code is available at https://github.com/BIIG-UC3M/pMoSS.
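The dependence of the p-value on sample size described in the abstract can be seen with a deliberately simple calculation. The sketch below uses a two-sample z-test with a known standard deviation (a swapped-in textbook test, not the Mann–Whitney procedure used in the paper) and assumes that the observed difference of means always equals a fixed effect of 0.1 standard deviations. The function name `z_test_p_value` is hypothetical.

```python
import math


def z_test_p_value(effect, n, sigma=1.0):
    """Two-sided p-value of a two-sample z-test with n observations per group.

    Assumes a known common standard deviation `sigma` and an observed
    difference of means equal to `effect`, so the p-value depends only on n.
    """
    standard_error = sigma * math.sqrt(2.0 / n)  # SE of the difference of two means
    z = effect / standard_error
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))


# The same small effect (0.1 standard deviations) evaluated at growing sample sizes.
for n in (10, 100, 1000, 10000):
    print(f"n = {n:>5}: p = {z_test_p_value(0.1, n):.3g}")
```

Under these assumptions the p-value falls from about 0.48 at n = 100 to about 0.025 at n = 1000, even though the effect itself never changes.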
Year: 2021 PMID: 34686696 PMCID: PMC8536742 DOI: 10.1038/s41598-021-00199-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Bootstrapping estimation of the two-sided confidence interval for the mean of normal distributions with standard deviation 1 and means of 0, 0.1, 0.5 or 1. For each sample size, the mean of a sample drawn from the corresponding normal distribution is computed 15,000 times. The final confidence interval is obtained by clipping the extreme values among these 15,000 sample means (filled area). The dashed lines show the maximum and minimum sample means obtained for each sample size. The results are shown on both linear and logarithmic scales.
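A minimal sketch of the simulation in Figure 1, assuming that "clipping" means discarding the same fraction of simulated sample means from each tail. The 95% level used here is an assumption, since the caption does not state it, and the function `sample_mean_interval` is illustrative rather than part of pMoSS.

```python
import random


def sample_mean_interval(mu, n, reps=15000, level=0.95, seed=0):
    """Percentile interval of the mean of n draws from N(mu, 1).

    Returns (lower, upper, minimum, maximum) over `reps` simulated sample means.
    """
    rng = random.Random(seed)
    means = sorted(sum(rng.gauss(mu, 1.0) for _ in range(n)) / n for _ in range(reps))
    k = int(round((1.0 - level) / 2.0 * reps))  # sample means clipped from each tail
    return means[k], means[reps - 1 - k], means[0], means[-1]


for n in (10, 100, 500):
    lo, hi, mn, mx = sample_mean_interval(0.1, n, reps=2000)
    print(f"n = {n:>3}: interval [{lo:.3f}, {hi:.3f}], range [{mn:.3f}, {mx:.3f}]")
```

The interval narrows roughly as 1/sqrt(n), which is why the filled area in the logarithmic plot shrinks steadily with the sample size.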
Figure 2. (a) The p-value is a random variable that depends on the sample size n and can be modeled as an exponential function p(n) (Eq. 2). For each pair of normal distributions being compared, two subsets of size n are obtained by sampling from data generated from the corresponding normal distributions. These subsets are then compared with the Mann–Whitney test, and the resulting p-value is stored. The procedure is repeated many times for each size n. The blue bars, with the standard error of the mean (SEM), show the distribution of all the p-values obtained at each n when two normal distributions with means 0 and 0.1 and standard deviation 1 are compared; the blue curve shows the corresponding exponential fit. The magenta and yellow curves show the resulting p(n) when a normal distribution with mean 0 and standard deviation 1 is compared with a normal distribution with the same standard deviation and a mean of 0.25 or 0.5, respectively. A normal distribution with mean 0 and standard deviation 1 is compared with normal distributions with means 0, 0.01, 0.1, 0.25, 0.5, 0.75, 1, 2 and 3, and multiple p-values are computed for sample sizes between 2 and 2500 (Supplementary Fig. S2 in the Supplementary Information). (a) and (b) Locally weighted scatterplot smoothing (LOWESS) fit to the mean p-values (red markers in Supplementary Fig. S2) computed for each sample size n; an exponential function is also fitted to all the simulated p-values. (b) and (c) Comparisons of the p(n) curves obtained for different pairs of distributions. (d) and (e) Ratio between each LOWESS curve and its derivative. A constant ratio, together with accurate exponential fits, shows empirically that the p-value decays exponentially with n.
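The simulation behind Figure 2 can be sketched as follows, under several assumptions. The Mann–Whitney p-value is computed with the large-sample normal approximation (no continuity or tie correction) instead of a statistics library; p(n) is assumed to take the form a·exp(−c·n), since Eq. 2 is not reproduced in this excerpt; and the exponential is fitted by least squares on the logarithm of the mean p-values rather than to all simulated p-values. The LOWESS step is omitted, and all function names are hypothetical.

```python
import math
import random


def _midranks(values):
    """1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney p-value from the normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = _midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic of the first sample
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def mean_p_curve(mean_diff, sizes, reps=200, seed=0):
    """Average p-value at each n when N(0, 1) is compared with N(mean_diff, 1)."""
    rng = random.Random(seed)
    curve = []
    for n in sizes:
        total = 0.0
        for _ in range(reps):
            x = [rng.gauss(0.0, 1.0) for _ in range(n)]
            y = [rng.gauss(mean_diff, 1.0) for _ in range(n)]
            total += mann_whitney_p(x, y)
        curve.append(total / reps)
    return curve


def fit_exponential(sizes, pvals):
    """Fit p(n) = a * exp(-c * n) by linear least squares on log p."""
    xs = list(sizes)
    ys = [math.log(p) for p in pvals]
    m = len(xs)
    x_bar, y_bar = sum(xs) / m, sum(ys) / m
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(y_bar - slope * x_bar), -slope  # (a, c)


sizes = list(range(4, 41, 4))
for d in (0.25, 0.5, 1.0):
    a, c = fit_exponential(sizes, mean_p_curve(d, sizes))
    print(f"mean {d}: a = {a:.3f}, c = {c:.3f}")
```

For a curve of this form, p(n)/p′(n) = −1/c for every n, which is why a constant ratio between a curve and its derivative, as in panels (d) and (e), points to exponential behavior.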
Figure 3. Comparison of a constant function at the significance level and an n-dependent p-value curve. Two sample sizes characterize the curve: the minimum sample size needed to detect statistically significant differences among the compared groups, and the convergence point of the p-value curve. When the p-value curve expresses practical differences, the area under the red curve is smaller than the area under the constant function, both evaluated between 0 and the convergence point.
Figure 4. Decision index for different values of the parameters a and c of the fitted function p(n), for a given significance threshold. (a) Each subplot is drawn for a specific parameter value; the dark area marks the cases for which we conclude that there are meaningful differences, and the white area marks the remaining cases. (b) Colors correspond to the parameter values for which meaningful differences are concluded. The black frontier corresponds to the setting outlined by the red box in (a). All values of a and c that indicate practical differences lie to the left of this frontier, and the rest lie to the right. The plots in (a) show the influence of the parameter over a wide range of values, whereas those in (b) are limited to the range of values found in the later experiments. The vertical dashed line marks the cases in which p(n) outputs a statistically significant value.
Figure 5. Estimating the p-value as a function of the sample size, p(n), enables correct discrimination between conditions. (a) The decay of p(n) (parameters a and c of the exponential fit) increases with the mean of the normal distribution compared with the standard normal distribution (mean 0, standard deviation 1). The larger the distance between the means of the two distributions, the faster the exponential function decays (Table 1). (b) Estimating p(n) empirically from small datasets is enough to detect the most extreme cases: those in which the null hypothesis cannot be rejected and those in which it clearly must be. (c) The minimum sample size needed to obtain statistical significance decreases as the mean of the compared normal distribution increases. (d) The faster the decay of p(n), the stronger the evidence against the tested null hypothesis. For the thresholds used, the decision index equals 1 whenever the mean of the normal distribution compared with the standard normal distribution is larger than 0.5 (Table 1).
Table 1. Results of comparing the standard normal distribution with other simulated normal distributions: parameters a and c of the exponential fit of p(n) for the comparison of a normal distribution with mean 0 and standard deviation 1 against normal distributions with means 0, 0.01, 0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 2.5 and 3.
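| Comparison | a | c | Convergence point | Minimum size | Decision index |
|---|---|---|---|---|---|
| N(0, 1) vs N(0, 1) | 0.256 | 0.000 | 39,599 | | 0 |
| N(0, 1) vs N(0.01, 1) | 0.255 | 0.000 | 44,237 | | 0 |
| N(0, 1) vs N(0.1, 1) | 0.257 | 0.000 | 1192 | 988 | 0 |
| N(0, 1) vs N(0.25, 1) | 0.263 | 0.010 | 185 | 165 | 1 |
| N(0, 1) vs N(0.5, 1) | 0.286 | 0.042 | 47 | 41 | 1 |
| N(0, 1) vs N(0.75, 1) | 0.304 | 0.091 | 20 | 19 | 1 |
| N(0, 1) vs N(1, 1) | 0.313 | 0.152 | 13 | 12 | 1 |
| N(0, 1) vs N(1.5, 1) | 0.411 | 0.344 | 7 | 6 | 1 |
| N(0, 1) vs N(2, 1) | 0.579 | 0.599 | 5 | 4 | 1 |
| N(0, 1) vs N(2.5, 1) | 0.738 | 0.794 | 4 | 3 | 1 |
| N(0, 1) vs N(3, 1) | 0.867 | 0.924 | 4 | 3 | 1 |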
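The minimum sample size can be recovered from the fitted parameters by solving p(n) = α for n. This sketch assumes that Eq. 2 has the form p(n) = a·exp(−c·n) and that α = 0.05; both are assumptions, since the equation and the thresholds are not reproduced in this excerpt. The helper `minimum_sample_size` is hypothetical.

```python
import math


def minimum_sample_size(a, c, alpha=0.05):
    """Sample size at which p(n) = a * exp(-c * n) reaches alpha.

    Solves a * exp(-c * n) = alpha, i.e. n = ln(a / alpha) / c.
    """
    if a <= alpha:
        return 0.0  # the fitted curve is already at or below the threshold
    if c <= 0.0:
        return math.inf  # a curve that does not decay never reaches the threshold
    return math.log(a / alpha) / c


# (mean of the second distribution, a, c) taken from three rows of the table
for mean, a, c in [(0.5, 0.286, 0.042), (1.0, 0.313, 0.152), (3.0, 0.867, 0.924)]:
    print(f"mean {mean}: n = {minimum_sample_size(a, c):.1f}")
```

With these rows the formula gives n ≈ 41.5, 12.1 and 3.1, close to the 41, 12 and 3 listed next to them, which suggests the assumed form is at least consistent with the tabulated a and c.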
Figure 6. Breast cancer cells (MDA-MB-231) were cultured in collagen and imaged under a microscope to determine whether cells change shape when a chemotherapy drug (Taxol) is administered. Three groups were compared: control (non-treated) cells and cells treated with 1 nM or 50 nM Taxol. (a) The cell roundness values of control cells and of cells treated with 1 nM Taxol are lower than those of cells treated with 50 nM Taxol. (b–d) The three groups were compared pairwise, the p-values were estimated, and p(n) was fitted for each pair of groups. When 50 nM Taxol is involved (blue and yellow dashed curves), the minimum sample size is lower and the decay of p(n) is faster (a and c parameters in Eq. 2); that is, p(n) decreases much faster than for the comparison of control cells with cells treated with 1 nM Taxol (orange curve), indicating meaningful differences between cells treated with 50 nM Taxol and the remaining groups.
Figure 7. Flow cytometry data recorded to determine the transcriptional changes induced by in vivo exposure of human eosinophils to glucocorticoids. (a) The entire dataset has a wider range of values (black box plots) and a narrower confidence interval around the mean (black error plots) than the distribution obtained when the median fluorescence intensity (MFI) is computed for each of the 6 subjects (red error plots). (b,c) The surface expression of CXCR4 increases when human eosinophils are exposed to 20 or 200 mcg/dL of methylprednisolone. Specifically, (b) the minimum size is low and the decision index indicates meaningful differences when either condition is compared with the vehicle condition. The minimum size when eosinophils are treated (blue circle) is not shown because it is infinite. (c) The decay parameters a and c are almost identical in these two cases, so the markers co-localize (Supplementary Information).
Figure 8. (a,b) The morphology of cells from a 2-year-old human donor is compared with that of cells from 3-, 9-, 16-, 29-, 35-, 45-, 55-, 65-, 85- and 96-year-old donors. For both (a) nuclear area and (b) nuclear short axis, the minimum size and the decay a change proportionally with the age of the donor. (c) Nuclear orientation does not characterize the age of the human donors in any of the comparisons; the parameter c is zero and, therefore, p(n) is constant. (d) The analysis of a small dataset is enough to determine that total diffusivity can characterize cellular aging in humans. The total diffusivity of cells from 2-, 3- and 9-year-old donors is equivalent, whereas it differs from that of cells from older donors.