
Three more steps toward better science.

Jose D Perezgonzalez

Abstract

Science has striven to do better since its inception and has given us good philosophies, methodologies, and statistical tools that, in their own way, serve their purpose reasonably well. Unfortunately, progress has also been marred by historical clashes among perspectives, typically between frequentists and Bayesians, contributing to problems such as the current reproducibility crises. Here I propose that science could do better with more resilient structures, more useful methodological tutorials, and clearer signaling of how much we can trust what it produces.


Keywords:  methodology; philosophy of science; statistics

Year:  2018        PMID: 31031963      PMCID: PMC6468710          DOI: 10.12688/f1000research.16358.2

Source DB:  PubMed          Journal:  F1000Res        ISSN: 2046-1402


Science has striven to do better since its inception. For example, empiricism was sought as an alternative mode of learning as early as the 16th century (Ball, 2012); 19th-century researchers sought a less subjective approach to learning from data via frequentist statistics, which progressively displaced Bayesian inference (Gigerenzer); in the 20th century, seeking a better way of establishing causation, Fisher (e.g., 1954) popularized a consistent framework of experimental design and frequentist inference based on small samples; Neyman & Pearson (e.g., 1928) expanded on Fisher’s statistical innovations to bring about more control over research power; Jeffreys (e.g., 1961) countered with a more nuanced approach to evidential support for hypotheses via his Bayes factor; Cohen (1988) shifted the focus away from significance testing and toward practical importance with his seminal work on effect sizes and power analysis; Mayo (e.g., 2018) is nowadays popularizing a framework based on severity testing for better frequentist inference; and computational advances are giving full Bayesian inference a new opportunity to claw back the territory lost during the 20th century (McGrayne, 2012). This historical drive has given us good tools for purpose, including philosophies and methodologies, as well as statistical tools for exploratory data analysis, data testing, hypothesis testing, and replication research. The path has not been easy: a lot of effort has gone into warring among different philosophies, methodologies, and statistical approaches, contributing to problems such as the current reproducibility crises (e.g., Fanelli, 2018). Still, most approaches have been put forth and defended with the common goal of bettering science and, in their own way, all do so reasonably well. For example, Table 1 summarizes results obtained using different testing approaches, all leading to similar inferences.
Therefore, the real “enemy” is not what makes for better science but what makes for worse science: namely, problems with methodological control, the misunderstanding and misuse of statistics, and unsupported conclusions (i.e., ethical concerns and the use of scientific methods in a pseudoscientific manner; Perezgonzalez & Frías-Navarro, 2018).
Table 1.

Reasonable conclusions based on frequentist and Bayesian results.

Case | Cohen’s d | Test | p | Decision | SEV | BF | Evidence
I (2t) | 0.20 | t(44) = 0.67 | 0.507 | H0 | 0.75 | BF01 = 2.85 | M0 = anecdotal
II (1t) | 0.80 | t(44) = 2.71 | 0.995 | H0 | 0.99 | BF01 = 10.96 | M0 = strong
III (2t) | 0.80 | t(44) = 2.71 | 0.010 | noH0 | 0.99 | BF10 = 5.04 | M1 = moderate
IV (1t) | -0.67 | t(31) = -1.88 | 0.965 | H0 | 0.99 | BF01 = 7.20 | M0 = moderate
V (2t) | -0.67 | t(31) = -1.88 | 0.071 | H0 | 0.51 | BF10 = 1.25 | M1 = anecdotal
VI (1t) | -0.93 | t(44) = -3.14 | 0.999 | H0 | 0.99 | BF01 = 12.08 | M0 = strong
VII (2t) | -0.93 | t(44) = -3.14 | 0.003 | noH0 | 0.99 | BF10 = 12.70 | M1 = strong

Notes. Based on data from Vincent (2018) and Perezgonzalez & Vincent (2019). Case: tests are one-tailed (1t) or two-tailed (2t). Cohen’s d: exploratory tests assessing observed effect sizes against Cohen’s d = 0.5 (i.e., the sample size, n1 = 23 and n2 = 23, was sensitive to d ≥ 0.5, one-tailed; Perezgonzalez, 2017). Test: t-test statistics and degrees of freedom. p: p-values from independent t-tests (Fisher’s approach, e.g., 1954). Decision: frequentist decision (noH0 = reject H0; H0 = no decision) based on a level of significance of 0.05 (e.g., Perezgonzalez, 2015). SEV: severity tests based on the observed effects (severity is strong if greater than 0.80; e.g., Mayo, 1996). BF: Bayes factors with the alternative model based on a Cauchy distribution (e.g., Rouder). Evidence: Bayesian evidence in favor of the null model (M0) or the alternative model (M1; e.g., Wagenmakers). The effect sizes of Cases II, IV, and VI had signs opposite to those expected (hence the high p-values); Cases III, V, and VII are two-tailed tests of Cases II, IV, and VI (hence the same d’s). Only Case V may lead a Jeffreysian to an inference contrary to that of frequentists; most likely, they would refrain from inferring support based on anecdotal posterior probabilities (e.g., Jarosz & Wiley, 2014).
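The p, d, and BF columns of Table 1 are all functions of the t statistic and the group sizes. The following is a minimal Python sketch using only the standard library that shows these relations. It is not the author's analysis, and it rests on several assumptions:

- Design: two independent groups of 23 analysed with a pooled-variance t-test (df = 44), as stated in the notes.
- t distribution: the Student t CDF is obtained by Simpson integration rather than from a statistics library.
- One-tailed cases: the tail directions for Cases II and VI are inferred from their reported p-values.
- Bayes factor: it follows the integral form of Rouder et al. (2009) with a Cauchy prior scale of r = √2/2. This common software default is an assumption, since the table does not state the scale. Only two-sided Bayes factors are computed; the one-tailed cases would need a truncated prior.
- Omitted cases: Cases IV and V (df = 31) do not fit the 23 + 23 pooled design and are left out.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm) * (1 + x * x / df) ** (-(df + 1) / 2)


def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule for the integral of f over [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def t_cdf(x, df):
    """P(T <= x), integrating the density from 0 and using its symmetry."""
    area = simpson(lambda u: t_pdf(u, df), 0.0, abs(x))
    return 0.5 + area if x >= 0 else 0.5 - area


def p_value(t, df, tail="two"):
    """p-value for tail 'two', 'greater' (H1: effect > 0) or 'less' (H1: effect < 0)."""
    if tail == "two":
        return 2 * (1 - t_cdf(abs(t), df))
    if tail == "greater":
        return 1 - t_cdf(t, df)
    if tail == "less":
        return t_cdf(t, df)
    raise ValueError(f"unknown tail: {tail!r}")


def cohens_d(t, n1, n2):
    """Observed Cohen's d recovered from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)


def jzs_bf10(t, n1, n2, r=math.sqrt(2) / 2):
    """Two-sided JZS Bayes factor BF10 with a Cauchy(0, r) prior on effect size.

    Integral form of Rouder et al. (2009): delta | g ~ N(0, g) with
    g ~ Inverse-Gamma(1/2, r^2 / 2), integrated numerically over u = log(g).
    """
    nu = n1 + n2 - 2
    n_eff = n1 * n2 / (n1 + n2)  # effective sample size for two groups

    def integrand(u):
        g = math.exp(u)
        spread = 1 + n_eff * g
        likelihood = spread ** -0.5 * (1 + t * t / (spread * nu)) ** (-(nu + 1) / 2)
        prior = r / math.sqrt(2 * math.pi) * g ** -1.5 * math.exp(-r * r / (2 * g))
        return likelihood * prior * g  # the extra g is the Jacobian dg = g du

    marginal_h1 = simpson(integrand, -20.0, 20.0, steps=4000)
    marginal_h0 = (1 + t * t / nu) ** (-(nu + 1) / 2)
    return marginal_h1 / marginal_h0


if __name__ == "__main__":
    n1 = n2 = 23  # group sizes given in the table notes (df = 44)
    # Tails for Cases II and VI are inferred from their reported p-values.
    cases = [("I", 0.67, "two"), ("II", 2.71, "less"), ("III", 2.71, "two"),
             ("VI", -3.14, "greater"), ("VII", -3.14, "two")]
    for name, t, tail in cases:
        line = (f"{name:>3}: d = {cohens_d(t, n1, n2):.2f}, "
                f"p ({tail}) = {p_value(t, n1 + n2 - 2, tail):.3f}")
        if tail == "two":
            line += f", BF10 = {jzs_bf10(t, n1, n2):.2f}"
        print(line)
```

Under these assumptions, the recomputed p-values and d's should agree closely with the table for Cases I to III, VI, and VII. The Bayes factors depend on the assumed prior scale r.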

Such an enemy will be difficult to defeat. At first sight, science seems to suffer the fate of the ‘tragedy of the commons’, with the ‘free-rider dilemma’ being, perhaps, its most specific affliction (Fisher, 2008). A recent book by Taleb (2018) on asymmetry sheds some light on the gaming element of science, namely its misuse of analytical models, agency problems, asymmetric information sharing, and the rationality of the enterprise. Taleb also proposes three solutions that we could expand upon to provide a synergistic path toward bettering science (Perezgonzalez, 2018).
Firstly, there is a need to make ‘scientific structures’ more resilient, so that they deliver the outcomes they were set up for: widespread accessibility and quality control. For example, open access publishing is nowadays countering the paywall limitations of traditional scientific publishing and its bias toward novel research with significant results, thus addressing important academic and social backlashes (Kelly, 2018; Schiltz, 2018). Unfortunately, it has also motivated the rise of predatory journals catering to the same pool of conscientious researchers. To counter the explosion of these predatory journals, some idiosyncratic blacklists (e.g., the defunct Beall’s list) and organizational whitelists (e.g., the Directory of Open Access Journals) have been created, albeit with mixed success. Meanwhile, online repositories and preprint servers are challenging the entry costs of open access journals, thus making widespread communication more resilient but with the drawback of lacking good quality control, although overlay journals are taking care of the latter drawback. Quality control itself has received more attention lately, with some journals becoming more transparent about who peer-reviews, while platforms such as Publons.com provide peer-review services and credit, including access to peer reviews when allowed. Among quality-control structures, F1000Research is worth mentioning: a publication platform that sits at the boundary between a paid preprint server and a fully transparent, peer-reviewed open access journal. This seems a more resilient structure, worthy of emulation and improvement. Perhaps more importantly, a new need is becoming imperative: to find an effective solution to the indexing and curation of the ever-expanding universe of research outputs. We do have, for example, Altmetric.com, although it is geared too much toward scoring research outputs.
Instead, what we need is an integrated solution for indexing both an output and all related content relevant to it, including post-publication reviews, comments in blogs and preprint servers, retraction notices, and the like. We also need a good solution for curating the entire spectrum of research outputs, moving from a plethora of stand-alone manuscripts toward mega-content organized as, for example, research topics.

Secondly, ‘minority movements’ have an impact on science by creating the above new structures (e.g., open access, repositories…), but also by improving on legacy ones (e.g., post-publication review sites such as PubPeer.com). Paramount among such movements have been those calling for Open Science (e.g., Banks) and research ethics (e.g., Committee on Publication Ethics, RetractionWatch.com). Minority movements also have an impact on other aspects of science, from calls for a better use of frequentist statistics (Perezgonzalez, 2015) to the outright banning of p-values (Trafimow & Marks, 2015), to the alternative use of Bayesian statistics (Wagenmakers) or mixed approaches (Perezgonzalez & Frías-Navarro, 2018). Because of the intrinsic social dynamics of minority groups, the polarization of inter-group attitudes and the consequent external warring are not only unsurprising but also expected. Yet, as alternative scientific approaches mostly have a different research focus, science has been less productive than it could be, because more effort has been put into warring among factions than into clearly explaining what each provides to the advancement of science (Mayo, 2018). This has allowed specific methodological knowledge to become too textbook-based, and thereby more aligned with editorial concerns than with the advancement of science (Gigerenzer, 2004), or to be polarized by the intrinsic dynamics of minority groups.
Thus, what we presently need are good tutorials on the purpose of each approach and on how to use each effectively for that purpose; preferably, tutorials which are independently created by unfettered authors rather than centrally abridged by textbook editors, so as to provide a diversity of options able to address the same topic from different perspectives and to cater to different stakeholders (e.g., researchers, reviewers, and readers; novices and experts; the technically focused, the philosophically aware, and practitioners; see also the STRATOS Initiative, already working toward a similar goal, Stratos-initiative.org). Such diversity will also allow for progressively developing optimal tutorials that minimize steep learning curves, catch methodological errors, and avoid philosophical and interpretive misconceptions.

Finally, there is the need to signal how much ‘soul is in the game’ in each piece of published research. The pre-registration movement is achieving this via badges; most journals require authors to signal adherence to ethical principles via the corresponding disclaimers; some journals actively signal their peer review by naming peer reviewers (e.g., Frontiersin.com, F1000Research.com) or by allowing open access to peer reviews (e.g., via Publons.com). What we presently lack is good signaling to address methodological concerns and the avoidance of pseudoscience. That is, authors should be able to signal that they have followed, for example, Fisher’s approach to data testing, Neyman-Pearson’s approach, Mayo’s severity approach, Jeffreys’s approach, or a full Bayesian approach; in brief, they should signal when their research complies with the requisites of any of those approaches. The purpose of this signaling is to prevent what Farrington (1961, p. 311) already denounced, that “. . . there is no human knowledge which cannot lose its scientific character when [we] forget the conditions under which it originated, the questions which it answered, and the function it was created to serve”. This signaling could work much as when authors specify a Creative Commons license for an open-access document: for a particular manuscript, researchers could signal the specific methodological approach followed. This, of course, calls for negotiating the appropriate standards and for hosting them for quick reference by both prospective authors and their peers.

In brief, following the ideas of Taleb (2018), science could do better with more resilient structures, more useful methodological tutorials, and good signaling of how much we can trust what it produces. Hence my overall recommendation: let’s shift the focus away from warring and onto improving our structures, tutorials, and signals.
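The license-like methodological signal proposed in the text could take a short, machine-readable form, much as Creative Commons licenses carry identifiers such as CC BY 4.0. The following Python sketch is purely illustrative. The identifier strings, the `Approach` enum, the `MethodsDeclaration` class, and the `parse_declaration` helper are all hypothetical, because the standards themselves would first have to be negotiated and hosted. Declaring no standard is allowed, for reports whose approach was mixed or unclear.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Approach(Enum):
    """Hypothetical identifiers for the approaches named in the text."""
    FISHER = "fisher-data-testing"
    NEYMAN_PEARSON = "neyman-pearson"
    MAYO_SEVERITY = "mayo-severity"
    JEFFREYS = "jeffreys-bayes-factor"
    FULL_BAYESIAN = "full-bayesian"


@dataclass(frozen=True)
class MethodsDeclaration:
    """A license-like methodological signal attached to one manuscript."""
    manuscript: str
    approach: Optional[Approach]  # None: no standard claimed (e.g., approaches were mixed)

    def statement(self) -> str:
        if self.approach is None:
            return f"{self.manuscript}: no methodological standard declared"
        return f"{self.manuscript}: reported under standard {self.approach.value}"


def parse_declaration(manuscript: str, identifier: Optional[str]) -> MethodsDeclaration:
    """Build a declaration; Approach(identifier) raises ValueError if it is unregistered."""
    if identifier is None:
        return MethodsDeclaration(manuscript, None)
    return MethodsDeclaration(manuscript, Approach(identifier))


if __name__ == "__main__":
    print(parse_declaration("Case study A", "fisher-data-testing").statement())
    print(parse_declaration("Case study B", None).statement())
```

With such a declaration, a reviewer could check the manuscript against the requisites of the declared standard. A reader would also know which inferential logic the results are meant to follow.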

Data availability

No data are associated with this article.

Referee report
No further comments. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Referee report
No further comments. I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Referee report
I think this paper could benefit from a minor revision. Some of the statements in the paper are a little vague. Most of the "meat" of the paper comes in the penultimate paragraph. It might be worth separating out and clearly stating the specific recommendations.

Table 1 presented SEV values. However, SEV is always with respect to a specific inference and an observation. For example, for an observation of, e.g., mu = 17, the inference mu1 = 12 and the inference mu1 = 14 would have different SEV values, but Table 1 gives no indication of what inference is being made (I assume the inference is the same as the observed result, but this should at least be indicated somewhere).

It also appears that Table 1 presented the results of t-tests. I would be inclined to include a test statistic (or an N, or both: with an N the reader could calculate the test statistic, or with a test statistic the reader could calculate the N). I think it is almost essential, because the sample size is one of the key determinants of when the various inferential frameworks actually come apart. I think in its current form Table 1 paints a slightly misleading picture.

On Page 3, Para 1: It may be worth discussing "overlay journals". These exist in physics and computing, but recently an overlay journal in neuroscience has also been launched (Neuroscience, Behaviour, Data and Theory).

Page 3, Para 5: There are some examples of these tutorial-style papers: 1. Four reasons to prefer Bayesian analyses over significance testing, Dienes et al. [1] and 2. Statistical Inference and the Replication Crisis, Colling et al. [2].
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author response
Thank you very much for your review and useful pointers. I agree that the text is a little vague at times, mainly because there is no effective control over how the recommendations will eventually be implemented. For example, I can foresee the benefit of tutorials in general (recommendation 2), but I cannot fathom whether there is a perfect tutorial to satisfy everyone (multiple and diverse tutorials may be needed for a “market-type” selection to kick in, thus distinguishing helpful from less helpful ones). Actually, the third recommendation (in the penultimate paragraph) is the one I see as least affected by variability in our imagination, as it calls for specific standards similar to standards already existing elsewhere (which is also why I placed it last, to finish the commentary with a clearer, less vague recommendation). It is for those reasons that I find it difficult to make the remaining recommendations more specific, and I had to resort to the conventional keywords “firstly”, “secondly”, and “finally” to constrain them to particular sections of the manuscript.
Even so, I also aimed to state the recommendations more clearly toward the end of their sections: better indexing and curation as Recommendation 1 (this implies software-based indexing and curation, but I have little idea whether such software will work well); tutorials (but, again, I am not sure how well the idea will work; nonetheless, I added a correction linking to the STRATOS Initiative, at www.stratos-initiative.org, which may be one of the ways forward; another could be a methodology-based overlay journal, as you mentioned).

I have added a new column with the t-test statistics and corresponding degrees of freedom. I have also extended the note on severity to clarify it and to give a quick pointer for assessment, as “SEV: severity tests based on the observed effects (severity is strong if greater than 0.80; e.g., Mayo, 1996)”.

Thanks for the pointer. I wasn’t aware of overlay journals, but they certainly are a pretty good solution. I have added the following statement to paragraph 1, page 3: “although overlay journals are taking care of the latter drawback”.

I am aware of several tutorials; I even have one of my own. However, I thought it would not be a good idea to name them, as the recommendation focuses on the idea of generating them; pointing to some (which may or may not be useful, in hindsight) seems distracting. I have, however, added a new entry for an initiative that seems to be working on the same idea, the STRATOS Initiative.
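The referee makes the point that a severity value is only meaningful relative to a stated inference. A minimal Python sketch can illustrate this. It reads the referee's inferences as claims of the form mu > mu1 and uses a normal approximation in the spirit of Mayo's severity for a one-sided test of a mean. It does not reproduce the t-based computation behind Table 1. The observation mu = 17 and the values 12 and 14 come from the referee's example; the standard error of 2.0 is a hypothetical value chosen only for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function, via math.erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def severity_greater(observed, mu1, standard_error):
    """SEV for the claim mu > mu1 after a one-sided test rejects H0.

    Computed as P(Xbar <= observed; mu = mu1): the probability of a result
    no larger than the one observed, were mu only equal to mu1.
    """
    return normal_cdf((observed - mu1) / standard_error)


def severity_at_most(observed, mu1, standard_error):
    """SEV for the claim mu <= mu1 when H0 is not rejected: P(Xbar > observed; mu = mu1)."""
    return 1 - normal_cdf((observed - mu1) / standard_error)


if __name__ == "__main__":
    observed = 17.0   # the referee's example observation
    se = 2.0          # hypothetical standard error, for illustration only
    for mu1 in (12.0, 14.0, 17.0):
        print(f"SEV(mu > {mu1:g}) = {severity_greater(observed, mu1, se):.3f}")
```

With these numbers, the claim mu > 12 receives a severity of about 0.994 and mu > 14 about 0.933. The claim mu > 17, the observed value itself, receives exactly 0.5. The same observation thus warrants different claims to different degrees, as the referee notes.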
Referee report
This is an opinion piece, so I reviewed it as such. The paper discusses a few possible directions for improving the scientific process. I largely agree with the expressed opinions. That said, I think that the text remains quite general and vague at times. For example, with respect to the second issue, what do you mean by ‘tutorials which are independently created rather than centrally edited’? About the third issue: what is your specific suggestion? That researchers should better frame their work before starting it (as to what methodology and statistical approach they will use), and adhere to that when writing the report? This is unclear.

Further comments: In the abstract, the statement ‘warring among different perspective’ may need a bit more clarity. The paragraph where the second issue is explained could start with ‘Secondly’. (The first issue is introduced with ‘firstly’, the third issue with ‘finally’.)

Table 1: too concise in my opinion. On what N are the p-values based? Are the Cohen’s d values observed ones? A bit more information on severity tests might be useful. Perhaps columns that belong together (p and Decision; BF and Evidence; if I’m correct) may be put together more clearly.

The paragraph starting with ‘such enemy will be difficult to defeat’ is quite vague. E.g., what do you mean by terms like ‘tragedy of the commons’, ‘asymmetric information sharing’, ‘rationality of the enterprise’, and others?

Pre-print servers (line 2 of p3): do you refer to repositories like arXiv?

Regarding scientific structures and peer review: my issue is that peer review is very important (I am afraid that reviewers are sometimes the only people who check the contents of a study report), but it remains a largely uncredited, almost secret, endeavor. Don’t you agree that it still needs to be made more official, e.g., that universities should not only demand that their researchers publish papers, but also that they deliver decent peer review reports?
For example, when I started my tenure track, the university told me what they expected in terms of publication output, grants, and PhD guidance, but they said nothing about peer review, and they never evaluated me on these terms. Shouldn’t a decent peer review count almost as much as publishing a paper?

'Factions' (bottom of first column on p3): what is that?

Would the STRATOS initiative (www.stratos-initiative.org), coordinated by Willi Sauerbrei from Freiburg, be in line with the second issue? It brings together experts on different methodological topics in the context of observational studies, with the aim of providing guidance on how to address these topics. (I am a member of this initiative.)

I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Author response
Thank you very much for your review (and apologies for the long response time). I agree that the text is “general and vague at times”, mainly because I don’t know in which form the recommendations will eventually be implemented.

Independently created tutorials call for multiple and diverse works to be produced by independent authors or groups of authors, as opposed to their being ‘centrally edited’ by a publisher (e.g., of textbooks on methods or statistics).
The sentence in question follows from an earlier assertion, that “methodological knowledge [is] too much textbook-based, thereby more aligned with editorial concerns than with the advancement of science (Gigerenzer, 2004)”. At the time, I also found it difficult to write it better without repeating ‘independently / independent authors’ and ‘centrally edited / textbook editors’. I have attempted it with synonyms here (albeit it may read a bit ‘forced’). It now reads “…tutorials which are independently created by unfettered authors rather than centrally abridged by textbook editors…”.

The third issue is a bit simpler than framing our work a priori (although it may help with this as well). It is more like framing our work for publication. E.g., those who have followed a Fisherian approach would say so, but also write in a manner consistent with that approach, as its assumptions and inferential process differ from those of a Neyman-Pearson approach, a Jeffreysian approach, etc. If no approach is clear, or if approaches got mixed up in the process of doing the research, then no standard ought to be indicated (it would be misleading otherwise). The recommendation is more or less the following: in the same way that we may decide to release a document under a particular Creative Commons license (or none), a license which we need to specify and which binds the document to it, we could also release a research report under a particular research standard (or none). We would specify that standard so that peer reviewers can assess the manuscript for compliance with it, and readers can understand the results in reference to it. This also means the standards need to be negotiated, approved, and hosted so as to facilitate quick reference for the aforementioned peer reviewers and readers (and, of course, authors).
I have rewritten the abstract, replacing ‘warring’ with “historical clashes among different perspectives, typically between frequentists and Bayesians”.

The second issue actually starts with ‘secondly’, where I introduce the idea of ‘minority movements’.

I have added a new column to Table 1, now describing the test, degrees of freedom, and test statistic. I also noted explicitly that Cohen’s d describes observed effects. The columns in the table follow the suggested grouping: p-values and ‘frequentist decision’; severity statistics (which is also a frequentist approach); Bayes factors and ‘Bayesian inference’. I have also clarified severity testing a bit more.

All those constructs are found in the book by Taleb (2018). The paragraph is meant to brush over them as a quick introduction, as the really relevant constructs (also Taleb’s) follow: ‘scientific structures’, ‘minority movements’, and ‘soul in the game’.

Both, actually, as they often have such a dual role: either as a final repository or as a repository of manuscripts prior to their being sent for publication elsewhere (nowadays, many journals accept the latter, as long as they are in repositories / preprint servers). I have nonetheless added the concept ‘online repositories’ to the text.

I fully agree. In fact, I think that a decent peer review should count as much as publishing a paper, and academics could, ideally, gain tenure and the like on peer reviewing alone, not just on teaching or researching, as this allows for good, committed peer reviewers and increased quality standards (the problem is how to define ‘decent’!). That’s why I added an entry on ‘quality control’ under recommendation 1 for more resilient scientific structures. However, it will be very difficult to know who has peer-reviewed what and with what quality. This means that it all comes down to trust: the academic on the tenure track would claim to have done ‘x’ reviews for ‘y’ journals, but it will be difficult to prove. Thus far, I know of platforms such as Publons, PubPeer, FrontiersIn, and F1000Research, which somewhat “reward” reviewers and/or publish peer reviews. Of these, Publons actually rewards peer reviewing and allows for generating some statistics in the form of percentiles and graphs.
But reviews are only displayed depending on particular journal permissions. On the other hand, F1000Research does actually publish reviews and gives them a DOI, but there is no statistical summary of any form for reviewers. Thus, independently acknowledging or rewarding peer review will be difficult unless the act of peer reviewing has been somewhat independently ‘vetted’ (Publons has such control, but I am not too sure how well it works) or peer reviews are openly displayed (F1000Research). I don’t see universities caring about a career in peer reviewing any time soon, though.

Factions follow the “war” theme of Mayo’s latest book, consistent with similar concepts in the paragraph (enemies, polarization, warring, etc.). I have repeated Mayo’s reference in this paragraph.

Yes… I think that you have almost ‘pulled the rug from under my feet’ here. The STRATOS initiative is pretty much in line with that point. I have added the following reference to the manuscript: “see also the STRATOS Initiative, already working toward a similar goal, www.stratos-initiative.org”.
References

1.  Bayesian t tests for accepting and rejecting the null hypothesis.

Authors:  Jeffrey N Rouder; Paul L Speckman; Dongchu Sun; Richard D Morey; Geoffrey Iverson
Journal:  Psychon Bull Rev       Date:  2009-04

2.  Opinion: Is science really facing a reproducibility crisis, and do we need it to?

Authors:  Daniele Fanelli
Journal:  Proc Natl Acad Sci U S A       Date:  2018-03-13

3.  Fisher, Neyman-Pearson or NHST? A tutorial for teaching data testing.

Authors:  Jose D Perezgonzalez
Journal:  Front Psychol       Date:  2015-03-03

4.  Four reasons to prefer Bayesian analyses over significance testing.

Authors:  Zoltan Dienes; Neil Mclatchie
Journal:  Psychon Bull Rev       Date:  2018-02

5.  Bayesian inference for psychology. Part II: Example applications with JASP.

Authors:  Eric-Jan Wagenmakers; Jonathon Love; Maarten Marsman; Tahira Jamil; Alexander Ly; Josine Verhagen; Ravi Selker; Quentin F Gronau; Damian Dropmann; Bruno Boutin; Frans Meerhoff; Patrick Knight; Akash Raj; Erik-Jan van Kesteren; Johnny van Doorn; Martin Šmíra; Sacha Epskamp; Alexander Etz; Dora Matzke; Tim de Jong; Don van den Bergh; Alexandra Sarafoglou; Helen Steingroever; Koen Derks; Jeffrey N Rouder; Richard D Morey
Journal:  Psychon Bull Rev       Date:  2018-02

6.  Bayesian inference for psychology. Part I: Theoretical advantages and practical ramifications.

Authors:  Eric-Jan Wagenmakers; Maarten Marsman; Tahira Jamil; Alexander Ly; Josine Verhagen; Jonathon Love; Ravi Selker; Quentin F Gronau; Martin Šmíra; Sacha Epskamp; Dora Matzke; Jeffrey N Rouder; Richard D Morey
Journal:  Psychon Bull Rev       Date:  2018-02

7.  Retract p < 0.005 and propose using JASP, instead.

Authors:  Jose D Perezgonzalez; M Dolores Frías-Navarro
Journal:  F1000Res       Date:  2017-12-12
