Dominique Makowski, Mattan S Ben-Shachar, S H Annabel Chen, Daniel Lüdecke.
Abstract
Turmoil has engulfed psychological science. Causes and consequences of the reproducibility crisis are in dispute. With the hope of addressing some of its aspects, Bayesian methods are gaining increasing attention in psychological science. Some of their advantages, as opposed to the frequentist framework, are the ability to describe parameters in probabilistic terms and to explicitly incorporate prior knowledge about them into the model. These issues are crucial, in particular regarding the current debate about statistical significance. Bayesian methods are not necessarily the only remedy against incorrect interpretations or wrong conclusions, but there is increasing agreement that they are one of the keys to avoiding such fallacies. Nevertheless, their flexible nature is both their strength and their weakness, for there is no agreement about which indices of "significance" should be computed or reported. This lack of a consensual index or guideline, comparable to the frequentist p-value, further contributes to the unnecessary opacity that many unfamiliar readers perceive in Bayesian statistics. Thus, this study describes and compares several Bayesian indices and provides intuitive visual representations of their "behavior" in relation to common sources of variance such as sample size, magnitude of effects, and frequentist significance. The results contribute to the development of an intuitive understanding of the values that researchers report, allowing sensible recommendations to be drawn for the description of Bayesian statistics, which is critical for the standardization of scientific reporting.
Keywords: Bayes factors; Bayesian; NHST; p-value; significance
Year: 2019 PMID: 31920819 PMCID: PMC6914840 DOI: 10.3389/fpsyg.2019.02767
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
FIGURE 1. Bayesian indices of effect existence and significance. (A) The probability of Direction (pd) is defined as the proportion of the posterior distribution that is of the median’s sign (the size of the yellow area relative to the whole distribution). (B) The MAP-based p-value is defined as the density value at 0 (the height of the red lollipop) divided by the density at the Maximum A Posteriori (MAP; the height of the blue lollipop). (C) The percentage in ROPE corresponds to the red area relative to the distribution [with or without tails for ROPE (full) and ROPE (95%), respectively]. (D) The Bayes factor (vs. 0) corresponds to the point-null density of the prior (the blue lollipop on the dotted distribution) divided by that of the posterior (the red lollipop on the yellow distribution), and the Bayes factor (vs. ROPE) is calculated as the odds of the prior falling within vs. outside the ROPE (the blue area on the dotted distribution) divided by that of the posterior (the red area on the yellow distribution).
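The sample-based definitions in panels (A) and (C) can be illustrated with a minimal sketch. It assumes the posterior is available as a list of draws; the function names, the simulated draws, and the ROPE bounds of ±0.1 are hypothetical choices made for illustration, and the 95% variant uses an equal-tailed interval as a simplification rather than any particular interval type used in the study.

```python
import random
import statistics


def probability_of_direction(draws):
    """Proportion of posterior draws sharing the sign of the posterior median."""
    median = statistics.median(draws)
    if median >= 0:
        same_sign = sum(1 for d in draws if d > 0)
    else:
        same_sign = sum(1 for d in draws if d < 0)
    return same_sign / len(draws)


def rope_full(draws, low, high):
    """Proportion of the whole posterior lying inside the ROPE [low, high]."""
    return sum(1 for d in draws if low <= d <= high) / len(draws)


def rope_interval(draws, low, high, ci=0.95):
    """Proportion of the central `ci` interval lying inside the ROPE.

    Assumption: the tails are trimmed symmetrically (equal-tailed interval).
    """
    ordered = sorted(draws)
    tail_count = int(len(ordered) * (1 - ci) / 2)
    kept = ordered[tail_count:len(ordered) - tail_count]
    return rope_full(kept, low, high)


# Hypothetical posterior: 20,000 draws from a normal centered on 0.3.
rng = random.Random(1)
posterior_draws = [rng.gauss(0.3, 0.2) for _ in range(20_000)]

print("pd:", probability_of_direction(posterior_draws))
print("ROPE (full):", rope_full(posterior_draws, -0.1, 0.1))
print("ROPE (95%):", rope_interval(posterior_draws, -0.1, 0.1))
```

Trimming the tails changes the denominator as well as the numerator, which is why the two ROPE percentages can differ even when the ROPE lies well inside the distribution.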
FIGURE 2. Impact of sample size on the different indices, for linear and logistic models, and when the null hypothesis is true or false. Gray vertical lines for p-values and Bayes factors represent commonly used thresholds.
Sensitivity to sample size.
| | 0.166 | 0.008 | 0.157 | 0.020 |
| | 0.171 | 0.013 | 0.154 | 0.024 |
| | 0.239 | 0.002 | 0.238 | 0.032 |
| ROPE (95%) | 0.033 | 0.359 | 0.008 | 0.310 |
| ROPE (full) | 0.025 | 0.363 | 0.016 | 0.315 |
| Bayes factor (vs. 0) | 0.198 | 0.116 | 0.116 | 0.141 |
| Bayes factor (vs. ROPE) | 0.152 | 0.136 | 0.078 | 0.180 |
FIGURE 3. Impact of noise. The noise corresponds to the standard deviation of the Gaussian noise that was added to the generated data. It is related to the magnitude of the parameter (the more noise there is, the smaller the coefficient). Gray vertical lines for p-values and Bayes factors represent commonly used thresholds. The scale is capped for the Bayes factors as these extend to infinity.
Sensitivity to noise.
| | 0.35 | 0.40 |
| | 0.36 | 0.40 |
| | 0.55 | 0.60 |
| ROPE (95%) | 0.45 | 0.45 |
| ROPE (full) | 0.46 | 0.45 |
| Bayes factor (vs. 0) | 0.79 | 0.65 |
| Bayes factor (vs. ROPE) | 0.81 | 0.67 |
FIGURE 4. Relationship with the frequentist p-value. In each plot, the marginal markers at the top (absence of true effect) and bottom (presence of true effect) show the density of the p-values, whereas the markers on the left (presence of true effect) and right (absence of true effect) show the density of the index of interest. Different point shapes represent different sample sizes. They illustrate the impact of sample size on the percentages in ROPE, for which each curve is associated with one sample size (the bigger the sample size, the higher the percentage in ROPE).
FIGURE 5. The probability of reaching different p-value based significance thresholds (0.1, 0.05, 0.01, 0.001 for solid, long-dashed, short-dashed, and dotted lines, respectively) for different values of the corresponding Bayesian indices.
FIGURE 6. Relationship between three Bayesian indices: the probability of direction (pd), the percentage of the full posterior distribution in the ROPE, and the Bayes factor (vs. ROPE).
Summary of Bayesian indices of effect existence and significance.
| Index | Interpretation | Definition | Strengths | Limitations |
| --- | --- | --- | --- | --- |
| Probability of Direction (pd) | Probability that an effect is of the same sign as the median’s | Proportion of the posterior distribution of the same sign as the median’s | Straightforward computation and interpretation. Objective property of the posterior distribution. 1:1 correspondence with the frequentist p-value | Limited information favoring the null hypothesis |
| MAP-based p-value | Relative odds of the presence of an effect against 0 | Density value at 0 divided by the density value at the mode of the posterior distribution | Straightforward computation. Objective property of the posterior distribution | Limited information favoring the null hypothesis. Relies on density approximation. Indirect relationship between mathematical definition and interpretation |
| ROPE (95%) | Probability that the credible effect values are not negligible | Proportion of the 95% CI inside of a range of values defined as the ROPE | Provides information related to the practical relevance of the effects | A ROPE range needs to be arbitrarily defined. Sensitive to the scale (the unit) of the predictors. Not sensitive to highly significant effects |
| ROPE (full) | Probability that the possible effect values are not negligible | Proportion of the posterior distribution inside of a range of values defined as the ROPE | Provides information related to the practical relevance of the effects | A ROPE range needs to be arbitrarily defined. Sensitive to the scale (the unit) of the predictors |
| Bayes factor (vs. 0) | The degree by which the probability mass has shifted away from or toward the null value, after observing the data | Ratio of the density of the null value between the posterior and the prior distributions | An unbounded continuous measure of relative evidence. Allows statistically supporting the null hypothesis | Sensitive to selection of prior distribution shape, location and scale |
| Bayes factor (vs. ROPE) | The degree by which the probability mass has shifted into or outside of the null interval (ROPE), after observing the data | Ratio of the odds of the posterior vs. the prior distribution falling inside of the range of values defined as the ROPE | An unbounded continuous measure of relative evidence. Allows statistically supporting the null hypothesis. Compared to the BF (vs. 0), evidence is accumulated faster for the null when the null is true | Sensitive to selection of prior distribution shape, location and scale. Additionally, a ROPE range needs to be arbitrarily defined, which is sensitive to the scale (the unit) of the predictors |
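The density-based definitions in the summary can be sketched in closed form when both prior and posterior are normal. This is an illustration under stated assumptions, not the procedure used in the study: the conjugate known-variance update in `normal_posterior`, all function names, and the example numbers are hypothetical. Both Bayes factors are oriented as prior over posterior, so values above 1 indicate that mass has moved away from the null.

```python
import math
from statistics import NormalDist


def normal_posterior(prior, estimate, std_error):
    """Conjugate normal update (assumption: known sampling variance)."""
    prior_precision = 1 / prior.variance
    data_precision = 1 / std_error ** 2
    post_var = 1 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior.mean + data_precision * estimate)
    return NormalDist(post_mean, math.sqrt(post_var))


def map_based_p_value(posterior):
    """Density at 0 divided by density at the mode (the mean, for a normal)."""
    return posterior.pdf(0.0) / posterior.pdf(posterior.mean)


def bayes_factor_vs_point_null(prior, posterior, null=0.0):
    """Prior density at the null divided by posterior density at the null."""
    return prior.pdf(null) / posterior.pdf(null)


def odds_inside(dist, low, high):
    """Odds of the distribution falling inside [low, high] vs. outside it."""
    p_inside = dist.cdf(high) - dist.cdf(low)
    return p_inside / (1 - p_inside)


def bayes_factor_vs_rope(prior, posterior, low, high):
    """Prior odds of being inside the ROPE divided by the posterior odds."""
    return odds_inside(prior, low, high) / odds_inside(posterior, low, high)


# Hypothetical example: standard normal prior, observed estimate 0.3 (SE 0.1).
prior = NormalDist(0.0, 1.0)
posterior = normal_posterior(prior, estimate=0.3, std_error=0.1)

print("MAP-based p-value:", map_based_p_value(posterior))
print("BF (vs. 0):", bayes_factor_vs_point_null(prior, posterior))
print("BF (vs. ROPE):", bayes_factor_vs_rope(prior, posterior, -0.1, 0.1))
```

Widening the prior standard deviation in this sketch lowers the prior density at 0 and the prior odds of the ROPE, which shows concretely why both Bayes factors are listed as sensitive to the prior's scale.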