Abstract
Despite frequent calls for the overhaul of null hypothesis significance testing (NHST), this controversial procedure remains ubiquitous in behavioral, social and biomedical teaching and research. Little change seems possible once the procedure becomes well ingrained in the minds and current practice of researchers; thus, the optimal opportunity for such change is at the time the procedure is taught, be this at undergraduate or at postgraduate levels. This paper presents a tutorial for the teaching of data testing procedures, often referred to as hypothesis testing theories. The first procedure introduced is Fisher's approach to data testing (tests of significance); the second is Neyman-Pearson's approach (tests of acceptance); the final procedure is the incongruent combination of the previous two theories into the current approach (NHST). For those researchers sticking with the latter, two compromise solutions on how to improve NHST conclude the tutorial.
Keywords: Fisher; NHST; Neyman-Pearson; null hypothesis significance testing; statistical education; teaching statistics; test of significance; test of statistical hypotheses
Year: 2015 PMID: 25784889 PMCID: PMC4347431 DOI: 10.3389/fpsyg.2015.00223
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Location of a . The actual p-value conveys stronger evidence against H0 than sig ≈ 0.05 and can be considered highly significant.
Figure 2. A conventional large difference: Cohen's .
Figure 3. Sampling distributions (. MES (d = 0.8), assuming β = 0.20, is d = 0.32 (i.e., the expected difference in the population ranges between d = 0.32 and infinity).
Figure 4. Neyman-Pearson's approach tests data under H. HA contributes MES and β. Differences of research interest will be equal or larger than MES and will fall within this rejection region.
Figure 5. Neyman-Pearson's test in action: CV.
Equivalence of constructs in Fisher's and Neyman-Pearson's theories, and amalgamation of constructs under NHST.
| Construct | Fisher | Relation | Neyman-Pearson |
|---|---|---|---|
| Test object | Data: P(D\|H0) | = | Data: P(D\|HM) |
| NHST | ➥ | Data as if testing a falsifiable hypothesis: P(H0\|D) | |
| Approach | A posteriori | ≠ | A priori |
| NHST | ➥ | A posteriori, sometimes both | |
| Research goal | Statistical significance of research results | ≠ | Deciding between competing hypotheses |
| NHST | ➥ | Statistical significance, also used for deciding between hypotheses | |
| Hs under test | H0, to be nullified with evidence | ≈ | HM, to be favored against HA |
| NHST | ➥ | Both (H0 = HM) | |
| Alternative hypothesis | Not needed (implicitly, ‘No H0’) | ≠ | Needed. Provides ES and β |
| NHST | ➥ | HA posed as ‘No H0’ (ES and β sometimes considered) | |
| Prob. distr. of test | As appropriate for H0 | = | As appropriate for HM |
| NHST | ➥ | As appropriate for H0 | |
| Cut-off point | Sig identifies noteworthy results; can be gradated; can be corrected a posteriori | ≠ | Common to CVtest, α, β, and MES; cannot be gradated; cannot be corrected a posteriori |
| NHST | ➥ | Sig = α, can be gradated, can be corrected a posteriori | |
| Sample size calculator | None | ≠ | Based on test, ES, α, and power (1 − β) |
| NHST | ➥ | Either | |
| Statistic of interest | | ≠ | CVtest ( |
| NHST | ➥ | | |
| Error prob. | α possible, but irrelevant with single studies | ≠ | α = Type I error prob. β = Type II error prob. |
| NHST | (partly) ➥ | | |
| Result falls outside critical region | Ignore result as not significant | ≠ | Accept HM if good power; conclude nothing otherwise |
| NHST | ➥ | Either ignore result as not significant; or accept H0; or conclude nothing | |
| Result falls in critical region | Reject H0 | ≠ | Accept HA (= Reject HM in favor of HA) |
| NHST | ➥ | Either | |
| Interpretation of results in critical region | Either a rare event occurred or H0 does not explain the research data | ≠ | HA explains research data better than HM does (given α) |
| NHST | ➥ | HA has been proved / is true; or H0 has been disproved / is false; or both | |
| Next steps | Rejecting H0 does not automatically justify not H0. Replication needed, meta-analysis is useful. | ≠ | Impossible to know whether α error has been made. Repeated sampling of same population needed, Monte Carlo is useful. |
| NHST | ➥ | None (results taken as definitive, especially if significant); further studies may be sometimes recommended (especially if results are not significant) | |
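
The table notes that Neyman-Pearson's sample size calculation is based on the test, ES, α, and power (1 − β), and that Fisher reads the p-value as graded evidence while Neyman-Pearson makes a fixed decision against a critical value. The following is a minimal sketch of those three ideas, not the paper's own procedure. It assumes a two-sided, two-sample z-test with a normal approximation and equal group sizes; the function names `n_per_group`, `fisher_p_value`, and `in_critical_region` are hypothetical and chosen only for illustration.

```python
from math import ceil
from statistics import NormalDist

# Standard normal distribution (mean 0, sd 1); assumed test distribution.
STD_NORMAL = NormalDist()


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample z-test.

    d is the minimum effect size (Cohen's d), alpha the Type I error
    probability, and power equals 1 - beta. Normal approximation only.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


def fisher_p_value(z):
    """Two-sided p-value of an observed z statistic under H0 (Fisher's reading)."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def in_critical_region(z, alpha=0.05):
    """Neyman-Pearson reading: compare z with a critical value fixed a priori."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return abs(z) >= z_crit


if __name__ == "__main__":
    print(n_per_group(0.8))           # per-group n for a large effect
    print(fisher_p_value(2.5))        # graded evidence against H0
    print(in_critical_region(2.5))    # binary decision at alpha = 0.05
```

Under this approximation, `n_per_group(0.8)` returns 25 participants per group; a calculation based on the t distribution would typically give a slightly larger figure. The p-value function returns a continuous quantity that can be gradated, whereas the critical-region check returns only a yes/no outcome, which mirrors the "cut-off point" row of the table.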