Abstract
Science has striven to do better since its inception and has given us good philosophies, methodologies and statistical tools that, in their own way, do reasonably well for purpose. Unfortunately, progress has also been marred by historical clashes among perspectives, typically between frequentists and Bayesians, leading to troubles such as the current reproducibility crises. Here I wish to propose that science could do better with more resilient structures, more useful methodological tutorials, and clearer signaling regarding how much we can trust what it produces.
Keywords: methodology; philosophy of science; statistics
Year: 2018 PMID: 31031963 PMCID: PMC6468710 DOI: 10.12688/f1000research.16358.2
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Reasonable conclusions based on frequentist and Bayesian results.
| Case | Cohen’s d | Test | p | Decision | SEV | BF | Evidence |
|---|---|---|---|---|---|---|---|
| I (2t) | 0.20 | t(44) = 0.67 | 0.507 | H0 | 0.75 | BF01 = 2.85 | M0 = anecdotal |
| II (1t) | 0.80 | t(44) = 2.71 | 0.995 | H0 | 0.99 | BF01 = 10.96 | M0 = strong |
| III (2t) | 0.80 | t(44) = 2.71 | 0.010 | noH0 | 0.99 | BF10 = 5.04 | M1 = moderate |
| IV (1t) | -0.67 | t(31) = -1.88 | 0.965 | H0 | 0.99 | BF01 = 7.20 | M0 = moderate |
| V (2t) | -0.67 | t(31) = -1.88 | 0.071 | H0 | 0.51 | BF10 = 1.25 | M1 = anecdotal |
| VI (1t) | -0.93 | t(44) = -3.14 | 0.999 | H0 | 0.99 | BF01 = 12.08 | M0 = strong |
| VII (2t) | -0.93 | t(44) = -3.14 | 0.003 | noH0 | 0.99 | BF10 = 12.70 | M1 = strong |
Notes. Based on data from (Vincent, 2018; Perezgonzalez & Vincent, 2019). Case: tests are one-tailed (1t) or two-tailed (2t). Cohen’s d: exploratory tests assessing observed effect sizes against Cohen’s d = 0.5 (i.e., the sample size, n1 = 23 and n2 = 23, was sensitive to d ≥ 0.5, one-tailed; Perezgonzalez, 2017). Test: t-test statistics and degrees of freedom. p: p-values from independent t-tests (Fisher’s approach, e.g., 1954). Decision: frequentist decision (noH0 = reject H0; H0 = no decision) based on a significance level of 0.05 (e.g., Perezgonzalez, 2015). SEV: severity tests based on the observed effects (severity is strong if greater than 0.80; e.g., Mayo, 1996). BF: Bayes factors with the alternative model based on a Cauchy distribution (e.g., Rouder ). Evidence: Bayesian evidence in favor of the null model (M0) or the alternative model (M1; e.g., Wagenmakers ). The effect sizes of Cases II, IV, and VI had signs opposite to those expected (hence the high p-values); Cases III, V, and VII are two-tailed tests of Cases II, IV, and VI (hence the same d values). Only Case V may lead a Jeffreysian to an inference contrary to that of frequentists; most likely, they would refrain from inferring support based on anecdotal posterior probabilities (e.g., Jarosz & Wiley, 2014).
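
The p, Decision, and Evidence columns can be approximately re-derived from the reported t statistics and Bayes factors. The following is a minimal sketch, not the authors' analysis code. It assumes that a one-tailed p-value is the tail area in the predicted direction (the hypothetical `expected` argument encodes that direction, which the table does not state explicitly) and that the evidence labels follow the commonly used Jeffreys-style cut-offs of 3, 10, 30, and 100, which the notes do not spell out. The function names are illustrative only.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_upper_tail(t, df, steps=2000):
    """P(T >= t), integrating the density over [0, |t|] with Simpson's rule."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3  # P(T >= |t|)
    return upper if t >= 0 else 1.0 - upper


def p_value(t, df, tails=2, expected=+1):
    """p-value for a t statistic.

    tails=2 gives the two-tailed p. tails=1 gives the tail area in the
    predicted direction: expected=+1 for a predicted positive effect,
    expected=-1 for a predicted negative one (an assumption of this sketch).
    """
    if tails == 2:
        return 2 * t_upper_tail(abs(t), df)
    return t_upper_tail(t, df) if expected > 0 else 1.0 - t_upper_tail(t, df)


def decision(p, alpha=0.05):
    """Frequentist decision as coded in the table: reject H0 or no decision."""
    return "noH0" if p < alpha else "H0"


def evidence_label(bf):
    """Label a Bayes factor reported in the favoured direction (bf >= 1).

    Cut-offs are assumed, not taken from the source table.
    """
    if bf < 1:
        raise ValueError("pass the Bayes factor in its favoured direction")
    for upper, label in ((3, "anecdotal"), (10, "moderate"),
                         (30, "strong"), (100, "very strong")):
        if bf < upper:
            return label
    return "extreme"


# Cases II, III, and V from the table: (t, df, tails, expected sign, BF)
cases = {
    "II": (2.71, 44, 1, -1, 10.96),
    "III": (2.71, 44, 2, -1, 5.04),
    "V": (-1.88, 31, 2, +1, 1.25),
}
for name, (t, df, tails, expected, bf) in cases.items():
    p = p_value(t, df, tails, expected)
    print(name, round(p, 3), decision(p), evidence_label(bf))
```

Because the tabulated t statistics are rounded to two decimals, recomputed p-values may differ from the reported ones in the third decimal place; the decisions and evidence labels should not be affected for these cases.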