Commentary: Psychological Science's Aversion to the Null.

Jose D Perezgonzalez¹, Dolores Frías-Navarro², Juan Pascual-Llobell².

Keywords: data testing; effect size; falsificationism; hypothesis testing; null hypothesis significance testing; statistics

Year: 2017 PMID: 29033884 PMCID: PMC5626975 DOI: 10.3389/fpsyg.2017.01715

Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078

A commentary on Psychological Science's Aversion to the Null by Heene, M., and Ferguson, C. J. (2017). Psychological Science under Scrutiny: Recent Challenges and Proposed Solutions, eds S. O. Lilienfeld and I. D. Waldman (Chichester: John Wiley & Sons), 34–52.

Heene and Ferguson (2017) contributed important epistemological, ethical and didactical ideas to the debate on null hypothesis significance testing, chief among them ideas about falsificationism, statistical power, dubious statistical practices, and publication bias. Important as those contributions are, the authors do not fully resolve four confusions, which we would like to clarify.

One confusion is equating the null hypothesis (H0) with randomness when “chance” actually resides in the sample. We can, indeed, read three different instances of randomness in the text: associated with the sample on pages 36 (trial performance) and 37; associated with the alternative hypothesis (HA) on page 41 (“less likely to observe mean differences…far off the true…mean difference of 0.7”); and associated with H0 throughout the text, starting on page 36. In reality, H0 simply claims a population non-effect (H0: Δ = 0) while HA claims a constant effect (e.g., HA: Δ = 0.7), their corresponding distributions assuming random sampling variation in both cases. It is in the (random) sample where “chance” resides, as by chance we may pick a sample which shows a given effect (e.g., δ = 0.3) when the true effect in the population is either “0” (H0) or “0.7” (HA). Frequentist tests only assess the probability of getting the observed sample effect under H0, while Bayesian statistics also assesses the probability of such an effect under HA (e.g., Rouder et al., 2009). Therefore, the p-value does not inform about a hypothesis of chance but about the probability of the data under H0 (Fisher, 1954).
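
The distinction drawn above can be made concrete with a small numerical sketch. It assumes a two-group design with standardized scores (unit standard deviation, treated as known), a normal sampling distribution for the mean difference, and 20 participants per group. These choices, and the function name `sampling_distribution`, are illustrative assumptions and are not taken from the commentary or from Heene and Ferguson's chapter. The hypothesized effects (0 and 0.7) and the observed sample effect (0.3) are the ones used in the text.

```python
from math import sqrt
from statistics import NormalDist


def sampling_distribution(true_effect, n_per_group, sd=1.0):
    """Sampling distribution of a difference between two group means.

    Assumes a known common standard deviation and equal group sizes,
    so the standard error is sd * sqrt(2 / n_per_group).
    """
    standard_error = sd * sqrt(2.0 / n_per_group)
    return NormalDist(mu=true_effect, sigma=standard_error)


n_per_group = 20        # assumed sample size, chosen only for illustration
observed_effect = 0.3   # sample effect used as an example in the text

# The population effect is a constant under each hypothesis;
# only the sample effect varies from sample to sample.
under_h0 = sampling_distribution(0.0, n_per_group)
under_ha = sampling_distribution(0.7, n_per_group)

# One-sided p-value: probability of a sample effect at least as large
# as the observed one when the population effect is 0.
p_value = 1.0 - under_h0.cdf(observed_effect)

# Probability of a sample effect at most as large as the observed one
# when the population effect is 0.7.
prob_this_small_under_ha = under_ha.cdf(observed_effect)

print(f"p(D >= 0.3 | H0) = {p_value:.3f}")
print(f"p(D <= 0.3 | HA) = {prob_this_small_under_ha:.3f}")
```

Both printed quantities are probabilities of data given a fixed population effect; neither is the probability that a hypothesis, or "chance", is true.
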
A second issue confuses power with missing true effects, something explicitly expressed on page 42 but also suggested when discussing sample sizes throughout the text (p. 36 onwards). The underlying argument is that larger sample sizes allow for achieving statistical significance so that a true effect may not be missed, something which is, at the same time, portrayed as unethical (e.g., p. 36) and ludicrous (e.g., p. 44). In reality, “we cannot manipulate population effect sizes” (p. 41), as they are deemed constant in the population (e.g., HA: Δ = 0.7), and a significant result at 50% power will not be missed at 80% power. As Heene and Ferguson's Figures 3.1A,C show, power simply moves the goalposts on the real line, reducing the Type II error (β), while the larger sample size also reduces the standard error. By moving the goalposts, smaller (by chance) sample effects get associated with HA, which is a correct association as long as there is a true population effect. Thus, power is there not to prevent missing effects due to small sample sizes but to justify whether we could plausibly accept H0 when results are not significant (Neyman, 1955; Cohen, 1988).

A third issue is about falsificationism (pp. 35–37), which the authors argue cannot happen in psychology because we never accept H0, only reject it or fail to reject it. In reality, frequentist tests are logically based on modus tollens, the valid argument form for the falsification of statements (Perezgonzalez, 2017a). H0 is simply the contrapositive of our research hypothesis, and denying H0 allows us to affirm the latter. Therefore, frequentist tests are eminently falsificationist, attempting to disprove H0 via reductio arguments (p, α; Mayo, 2017). Indeed, H0 does not even need to be “zero” in the population: we could perfectly well substitute the actual value of our HA, so that we may prove the theory false with a significant result (the “strong” test purported by Meehl, 1997).
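
The "goalposts" argument of the second issue can be sketched numerically. The sketch assumes a one-sided test at α = 0.05, a two-group design with unit standard deviation treated as known, and group sizes of 11 and 26, chosen only because they give roughly 50% and 80% power for Δ = 0.7 under those assumptions. None of these settings, nor the helper names `critical_effect` and `power`, come from the commentary.

```python
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05        # assumed one-sided significance level
TRUE_EFFECT = 0.7   # population effect under HA, as in the text


def standard_error(n_per_group, sd=1.0):
    """Standard error of a difference between two group means (known sd)."""
    return sd * sqrt(2.0 / n_per_group)


def critical_effect(n_per_group, alpha=ALPHA):
    """Smallest sample effect that reaches significance (the 'goalpost')."""
    return NormalDist().inv_cdf(1.0 - alpha) * standard_error(n_per_group)


def power(n_per_group, true_effect=TRUE_EFFECT, alpha=ALPHA):
    """Probability that a sample effect exceeds the goalpost under HA."""
    under_ha = NormalDist(true_effect, standard_error(n_per_group))
    return 1.0 - under_ha.cdf(critical_effect(n_per_group, alpha))


for n in (11, 26):
    print(
        f"n per group = {n:2d}: "
        f"critical effect = {critical_effect(n):.3f}, "
        f"power = {power(n):.2f}"
    )
```

The population effect stays at 0.7 in both rows; only the critical value moves closer to zero as the sample grows. Any sample effect large enough to be significant with the smaller sample is therefore also beyond the critical value of the larger one.
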
A fourth issue is whether we always need to be in the position of accepting H0 (something argued on pages 36–37). This is not necessarily so. Testing H0 merely in order to reject it is suitable when we are only interested in learning about our research hypothesis (e.g., does the treatment have an effect?; Perezgonzalez, 2016). In such a context, H0 provides a precise statistical hypothesis for carrying out the test and, because the actual parameter (Δ) is unknown, it only provides informative value via its rejection (Fisher, 1954), H0 acting merely as a “straw man” (Cortina and Dunlap, 1997). Not only was this testing procedure developed in the context of small samples (Fisher, 1954), but the lack of a specific HA also precludes the control of Type II errors and of power. (A way forward would be to assess the effects warranted under H0 [Mayo and Spanos, 2006] or to control sample size via a sensitiveness analysis [Perezgonzalez, 2017b].)

If we wish to be able to accept H0, then we are stating that we are also interested in the potential demise of our intervention (i.e., if the treatment has no effect, we want to make sure it is akin to placebo; Perezgonzalez, 2016). This kind of testing seems similar to Fisher's, but it requires active control of the severity with which the alternative hypothesis is to be tested (ideally, ≥80% power; Neyman, 1955; Cohen, 1988). Such control necessarily requires more information, namely a precise alternative hypothesis (e.g., HA: μ1 – μ2 = 0.7, vs. H0: μ1 – μ2 = 0) and a specified Type II error for HA (e.g., β = 0.20), so that the power of the test can be managed (given α, β, and N). This approach not only allows for accepting H0 but also illustrates that power is only relevant for such a purpose, not for rejecting H0. This approach, and similar ones, have been available since the time of Fisher's tests of significance (e.g., Neyman and Pearson, 1928; Jeffreys, 1939).

As a final note, frequentist approaches only deal with the probability of data under H0 [p(D|H0)]. If we want to say anything about the (posterior) probability of the hypotheses, then a Bayesian approach is needed in order to confirm which hypothesis is most likely given both the likelihood of the data and the prior probabilities of the hypotheses themselves (Jeffreys, 1961; Gelman et al., 2013).
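
The final note can be illustrated with a deliberately simplified sketch of Bayes' rule for two point hypotheses, H0: Δ = 0 and HA: Δ = 0.7. The normal likelihood with known unit standard deviation, the 20 participants per group, the prior values, and the function name `posterior_h0` are all assumptions made for illustration. Bayesian analyses in practice usually place a prior distribution over effect sizes instead of comparing two point values, so this is not the method of any of the cited works.

```python
from math import sqrt
from statistics import NormalDist


def posterior_h0(observed_effect, n_per_group, effect_ha=0.7,
                 prior_h0=0.5, sd=1.0):
    """Posterior probability of H0 (effect = 0) against a point HA.

    Combines the likelihood of the observed sample effect under each
    hypothesis with the prior probability of each hypothesis.
    """
    standard_error = sd * sqrt(2.0 / n_per_group)
    likelihood_h0 = NormalDist(0.0, standard_error).pdf(observed_effect)
    likelihood_ha = NormalDist(effect_ha, standard_error).pdf(observed_effect)

    # Bayes' rule: prior times likelihood, normalized over both hypotheses.
    weighted_h0 = prior_h0 * likelihood_h0
    weighted_ha = (1.0 - prior_h0) * likelihood_ha
    return weighted_h0 / (weighted_h0 + weighted_ha)


# Same observed effect, two different (assumed) prior probabilities for H0.
equal_priors = posterior_h0(0.3, n_per_group=20, prior_h0=0.5)
sceptical_of_h0 = posterior_h0(0.3, n_per_group=20, prior_h0=0.2)

print(f"p(H0 | D), prior 0.5: {equal_priors:.3f}")
print(f"p(H0 | D), prior 0.2: {sceptical_of_h0:.3f}")
```

The same data yield different posterior probabilities under different priors, which is why a statement about p(H0|D) requires information that a p-value, being p(D|H0), does not contain.
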

Author contributions

JDP initiated and drafted the general commentary. DF and JP contributed theoretical background and feedback. All authors approved the final version of the manuscript for submission.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
