Stephen Midway, Matthew Robertson, Shane Flinn, Michael Kaller.
Abstract
Multiple comparisons tests (MCTs) are the statistical tests used to compare groups (treatments), often following a significant effect reported in one of many types of linear models. Owing to a variety of data and statistical considerations, several dozen MCTs have been developed over the decades, ranging from nearly interchangeable to very different from one another. Many scientific disciplines use MCTs, including >40,000 reports of their use in ecological journals in the last 60 years. Despite the ubiquity and utility of MCTs, several issues remain in terms of their correct use and reporting. In this study, we evaluated 17 different MCTs. First, we reviewed the published literature for recommendations on their correct use. Second, we created a simulation that evaluated the performance of nine common MCTs. The tests examined in the simulation were those that often overlap in usage, meaning that fit to the data does not single out one test, so the simulations can inform the selection when a researcher has choices. Based on the literature review and recommendations: planned comparisons are overwhelmingly recommended over unplanned comparisons; for planned non-parametric comparisons, the Mann-Whitney-Wilcoxon U test is recommended; Scheffé's S test is recommended for any linear combination of (unplanned) means; Tukey's HSD and the Bonferroni or the Dunn-Šidák tests are recommended for pairwise comparisons of groups; and many other tests exist for particular types of data. All code and data used to generate this paper are available at: https://github.com/stevemidway/MultipleComparisons. ©2020 Midway et al.
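The Bonferroni and Dunn-Šidák recommendations for pairwise comparisons both work by tightening the significance threshold applied to each comparison. The following is a minimal Python sketch of that idea, assuming a familywise α of 0.05 and all pairwise comparisons among k groups. The function names are illustrative and are not taken from the paper's repository.

```python
from math import comb


def n_pairwise(k: int) -> int:
    """Number of distinct pairwise comparisons among k groups: k(k-1)/2."""
    return comb(k, 2)


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-comparison threshold under the Bonferroni adjustment."""
    return alpha / m


def sidak_alpha(alpha: float, m: int) -> float:
    """Per-comparison threshold under the Dunn-Sidak adjustment."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Assumed example: 4 groups, familywise alpha of 0.05.
m = n_pairwise(4)  # 6 pairwise comparisons
print(m, bonferroni_alpha(0.05, m), sidak_alpha(0.05, m))
```

For any number of comparisons greater than one, the Šidák threshold is slightly larger than the Bonferroni threshold, which is why the two adjustments usually lead to very similar decisions.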
Keywords: ANOVA; Bonferroni; Contrasts; Multiple comparisons; Scheffé; Tukey HSD
Year: 2020 PMID: 33335808 PMCID: PMC7720730 DOI: 10.7717/peerj.10387
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Four outcomes of hypothesis testing.
The two types of error are presented in boldface text.
| | Null hypothesis (H0) is true | Null hypothesis (H0) is false |
|---|---|---|
| Fail to reject | Correct (true negative) | **Type II error (false negative)** |
| Reject | **Type I error (false positive)** | Correct (true positive) |
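The four outcomes can be written as a small lookup, and the type I error cell is the one that accumulates when several tests are run together. The sketch below is illustrative only: the function names are hypothetical, and the familywise calculation assumes the tests are independent, which pairwise comparisons on shared groups generally are not.

```python
def classify_outcome(null_is_true: bool, rejected: bool) -> str:
    """Map the truth of the null hypothesis and the test decision to an outcome."""
    if null_is_true:
        return "type I error (false positive)" if rejected else "correct (true negative)"
    return "correct (true positive)" if rejected else "type II error (false negative)"


def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one type I error across m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


print(classify_outcome(null_is_true=True, rejected=True))
print(familywise_error_rate(0.05, 6))  # roughly 0.26 for six unadjusted tests
```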
Multiple comparisons tests (MCTs) searched in the literature and total number of reported uses from 1960–2019.
Tests are ordered by their general application and then by popularity (the number of times cited in the ecological literature). Note: Terms like test and procedure have been removed where not necessary. Based on the literature, the bottom three tests are often not recommended, which is guidance we have adopted (and discuss in the study). Finally, we are not able to differentiate Bonferroni from sequential Bonferroni, but we expect that the number of reported citations captures most of both uses.
| Test | Citations | General application |
|---|---|---|
| Bonferroni | 20,801 | Parametric situations |
| Tukey’s Honest Significant Difference (HSD) | 7,800 | Parametric situations |
| Tukey-Kramer | 1,930 | Parametric situations |
| Scheffé’s | 1,370 | Parametric situations |
| Dunn-Šidák (Šidák) | 905 | Parametric situations |
| Dunnett’s | 332 | Parametric situations |
| Ryan’s | 108 | Parametric situations |
| Waller-Duncan | 103 | Parametric situations |
| Dunn Procedure | 2,870 | Nonparametric situations |
| Mann-Whitney-Wilcoxon | 1,950 | Nonparametric situations |
| Games-Howell | 248 | Nonparametric situations |
| Nemenyi | 184 | Nonparametric situations |
| Steel-Dwass | 94 | Nonparametric situations |
| Fligner-Policello | 8 | Nonparametric situations |
| Student-Newman-Keuls (SNK) | 1,800 | Not recommended |
| Fisher’s Least Significant Difference (LSD) | 971 | Not recommended |
| Duncan’s Multiple Range Test (DMRT) | 1,707 | Not recommended |
Figure 1. Diagram of multiple comparison simulations.
This diagram shows the organization of study design, treatment, iterations, and descriptions of the data used for each treatment in balanced and unbalanced designs (separated by the dashed line). The data description circles relate the treatment abbreviations to the number of samples in each group (n) and the number of groups (n). The abbreviations for the four simulation treatments are LSFG, low sample size with few groups; LSMG, low sample size with many groups; HSFG, high sample size with few groups; and HSMG, high sample size with many groups.
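A simulation design of this shape can be sketched in Python as a small table of treatments plus a data generator. The group counts, sample sizes, and the rule used to unbalance the groups below are hypothetical placeholders, since the actual values appear only in the figure; the names `TREATMENTS` and `simulate_groups` are also illustrative.

```python
import random

# Hypothetical sizes for illustration only; the paper's values are shown in Fig. 1.
TREATMENTS = {
    "LSFG": {"n_per_group": 5, "n_groups": 3},    # low sample size, few groups
    "LSMG": {"n_per_group": 5, "n_groups": 7},    # low sample size, many groups
    "HSFG": {"n_per_group": 50, "n_groups": 3},   # high sample size, few groups
    "HSMG": {"n_per_group": 50, "n_groups": 7},   # high sample size, many groups
}


def simulate_groups(n_per_group, n_groups, balanced=True, rng=None):
    """Draw standard-normal data for each group under a balanced or unbalanced design."""
    if rng is None:
        rng = random.Random()
    groups = []
    for _ in range(n_groups):
        if balanced:
            n = n_per_group
        else:
            # Assumed unbalancing rule: vary each group's size around the nominal n.
            n = max(2, rng.randint(n_per_group // 2, n_per_group * 2))
        groups.append([rng.gauss(0.0, 1.0) for _ in range(n)])
    return groups


rng = random.Random(42)
for name, spec in TREATMENTS.items():
    sizes = [len(g) for g in simulate_groups(balanced=False, rng=rng, **spec)]
    print(name, sizes)
```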
Common multiple comparisons tests and their software implementations.
This table is meant to serve as a reference for functions and is not meant to advocate for particular packages and functions over others. Functions may give different results from one another, and we recommend reading any instructions or helpfiles for details on specific test implementations. Boldface functions indicate those used in the simulation component of this manuscript.
| Multiple comparisons test | R package::function | SAS Statements | SPSS Options |
|---|---|---|---|
| Tukey’s HSD | | MEANS / tukey; | Available by menu |
| | agricolae::HSD.test | LSMEANS / adjust = “tukey” | |
| | TukeyC::TukeyC | | |
| | DescTools::PostHocTest | | |
| Dunn-Šidák | | MEANS / sidak; | EMMEANS ADJ(SIDAK) |
| | mutoss::sidak | LSMEANS / adjust = “sidak” | |
| Bonferroni | | MEANS / BON; | Available by menu |
| | mutoss::bonferroni | LSMEANS / adjust = “Bon”; | EMMEANS ADJ(BONFERRONI) |
| Scheffé’s | | MEANS / scheffe; | Available by menu |
| | DescTools::ScheffeTest | LSMEANS / adjust = “scheffe”; | |
| | GAD::snk.test | | |
| | DescTools::PostHocTest | | |
| Student-Newman-Keuls (SNK) | | MEANS / snk; | Available by menu |
| | GAD::snk.test | LSMEANS / adjust = “snk”; | |
| | DescTools::PostHocTest | | |
| Fisher’s LSD | | MEANS / LSD | Available by menu |
| | PMCMRplus::lsdTest | | EMMEANS ADJ(LSD) |
| Fisher’s LSD with Šidák correction | | | |
| | MHTdiscrete::Sidak.p.adjust | | |
| | mutoss::sidak | | |
| Duncan’s MRT | | MEANS / duncan | Available by menu |
| | PMCMRplus::duncanTest | | |
Notes.
as Tukey-Kramer.
Figure 2. Reported uses of the three most common parametric multiple comparisons tests (MCTs) by 5-year intervals.
Other MCTs excluded here (but listed in Table 1) show relatively similar trends to SNK or were too infrequently reported to visualize on this figure.
Figure 3. Type I errors in simulations.
Proportion of type I per comparison error rates (PCERs) between the nine multiple comparison tests (MCTs) in each of the four simulation treatments for (A) balanced and (B) unbalanced study designs. Simulation treatment abbreviations can be found in the Fig. 1 caption.
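A per comparison error rate of this kind is the share of individual comparisons that are falsely declared significant when all groups share the same mean. The Python sketch below estimates it by simulation. It is a simplified stand-in, not the paper's procedure: it uses a two-sided z-test with a known common standard deviation rather than any of the nine MCTs, and the group count, sample size, iteration count, and seed are assumptions.

```python
import random
from itertools import combinations
from statistics import NormalDist, fmean


def pairwise_p_values(groups, sigma=1.0):
    """Two-sided z-test p-values for every pair of groups (known common SD assumed)."""
    z = NormalDist()
    p_values = []
    for a, b in combinations(groups, 2):
        se = sigma * (1.0 / len(a) + 1.0 / len(b)) ** 0.5
        stat = abs(fmean(a) - fmean(b)) / se
        p_values.append(2.0 * (1.0 - z.cdf(stat)))
    return p_values


def type_i_pcer(n_groups=3, n_per_group=10, iterations=2000, alpha=0.05, seed=1):
    """Estimate the type I PCER with no adjustment and with a Bonferroni threshold."""
    rng = random.Random(seed)
    raw = bonf = total = 0
    for _ in range(iterations):
        # All groups come from the same distribution, so every rejection is a type I error.
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
                  for _ in range(n_groups)]
        p_values = pairwise_p_values(groups)
        m = len(p_values)
        raw += sum(p < alpha for p in p_values)
        bonf += sum(p < alpha / m for p in p_values)
        total += m
    return raw / total, bonf / total


unadjusted, bonferroni = type_i_pcer()
print(f"unadjusted PCER: {unadjusted:.3f}, Bonferroni PCER: {bonferroni:.3f}")
```

The unadjusted rate should sit near the nominal α, while the Bonferroni rate falls well below it because each comparison is held to α divided by the number of comparisons.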
Figure 4. p-values for type I errors in simulations.
Distribution of p-values for type I error tests with balanced study designs (A–D). Distributions shown for the nine multiple comparison tests (MCTs) in each of the four simulation treatments. Dashed red line indicates a p-value of 0.05. Simulation treatment abbreviations can be found in the Fig. 1 caption.
Figure 5. Type II errors in simulations.
Proportion of type II per comparison error rates (PCERs) between the nine multiple comparison tests (MCTs) in each of the four simulation treatments for (A) balanced and (B) unbalanced study designs. Simulation treatment abbreviations can be found in the Fig. 1 caption.
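Type II error moves in the opposite direction from type I error: a stricter per-comparison threshold makes a real difference harder to detect. The Python sketch below illustrates this analytically for a single pairwise comparison. It assumes a two-sided z-test with a known common standard deviation, and the effect size, sample size, and number of comparisons are hypothetical values chosen for illustration, not the paper's simulation settings.

```python
from statistics import NormalDist


def type_ii_error(delta, n_per_group, alpha, sigma=1.0):
    """Type II error of a two-sided z-test comparing two group means.

    delta is the true difference in means; sigma is the assumed known common SD.
    """
    z = NormalDist()
    se = sigma * (2.0 / n_per_group) ** 0.5
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = delta / se
    # Power is the probability that the test statistic lands in either rejection tail.
    power = (1.0 - z.cdf(crit - shift)) + z.cdf(-crit - shift)
    return 1.0 - power


# Hypothetical setting: true difference of 1 SD, 10 observations per group, 6 comparisons.
print(type_ii_error(1.0, 10, alpha=0.05))      # unadjusted threshold
print(type_ii_error(1.0, 10, alpha=0.05 / 6))  # Bonferroni-adjusted threshold
```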
Figure 6. p-values for type II errors in simulations.
Distribution of p-values for type II error tests with balanced study designs (A–D). Distributions shown for the nine multiple comparison tests (MCTs) in each of the four simulation treatments. Dashed red line indicates a p-value of 0.05. Simulation treatment abbreviations can be found in the Fig. 1 caption.
Figure 7. Decision diagram for selecting a multiple comparisons test.
(A) Multiple comparisons tests based on parametric data and models, followed by additional diagnostics. (B) Non-parametric data and models. The gray box defines some of the terms used in the diagram.
Example of an MCT in a two-way interaction.
Variable X1 has 3 levels (j = 3) and variable X2 has 2 levels (j = 2). Levels are presented as numbers in this example, but may also be words or characters. The method of estimating each variable-level combination (e.g., X, X) depends on the MCT, as does the test statistic.
| Variable X1 | Variable X2 | Difference |
|---|---|---|
| 1 | 1 | Estimated |
| 1 | 2 | Estimated |
| 2 | 1 | Estimated |
| 2 | 2 | Estimated |
| 3 | 1 | Estimated |
| 3 | 2 | Estimated |
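The rows of this table are the full crossing of the levels of the two variables. A minimal Python sketch that enumerates them follows; the labels `X1` and `X2` are used here only for illustration.

```python
from itertools import product

# Levels of the three-level and two-level variables, labeled X1 and X2 for illustration.
levels = {"X1": (1, 2, 3), "X2": (1, 2)}

# Each cell is one variable-level combination for which an estimate is produced.
cells = list(product(*levels.values()))

for x1, x2 in cells:
    print(f"X1 = {x1}, X2 = {x2}")
```

The number of cells is the product of the level counts, so a 3 × 2 interaction yields six estimates before any comparisons are made between them.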
Example of an MCT in a three-way interaction.
Variable X1 has 2 levels (j = 2), variable X2 has 2 levels (j = 2), and variable X3 has 2 levels (j = 2). The notation variable 1 | variable 2 indicates the estimate is conditional on the second variable. For all situations, the test statistic and adjusted p-value depend on the choice of MCT.
| Variable X1 | Variable X2 | Variable X3 | Difference |
|---|---|---|---|
| 1 | 1 | 1 | Estimated |
| 1 | 1 | 2 | Estimated |
| 1 | 2 | 1 | Estimated |
| 1 | 2 | 2 | Estimated |
| 2 | 1 | 1 | Estimated |
| 2 | 1 | 2 | Estimated |
| 2 | 2 | 1 | Estimated |
| 2 | 2 | 2 | Estimated |
| 1 | 1 | 1 | Estimated |
| 1 | 1 | 2 | Estimated |
| 2 | 1 | 1 | Estimated |
| 2 | 1 | 2 | Estimated |
| 1 | 2 | 1 | Estimated |
| 1 | 2 | 2 | Estimated |
| 2 | 2 | 1 | Estimated |
| 2 | 2 | 2 | Estimated |
| 1 | 1 | 1 | Estimated |
| 1 | 2 | 1 | Estimated |
| 2 | 1 | 1 | Estimated |
| 2 | 2 | 1 | Estimated |
| 1 | 1 | 2 | Estimated |
| 1 | 2 | 2 | Estimated |
| 2 | 1 | 2 | Estimated |
| 2 | 2 | 2 | Estimated |
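The 24 rows above are the same eight cells of the 2 × 2 × 2 design listed three times, each time sorted so that a different variable changes fastest. The Python sketch below reproduces the three row orderings. Reading each block as the set of estimates conditional on the other two variables is an interpretation of the table, and the helper name `ordered` is hypothetical.

```python
from itertools import product

# All eight cells of the 2 x 2 x 2 design, as (X1, X2, X3) tuples.
cells = list(product((1, 2), (1, 2), (1, 2)))


def ordered(cells, key_order):
    """Sort cells by the variables in key_order (0 = X1, 1 = X2, 2 = X3)."""
    return sorted(cells, key=lambda cell: tuple(cell[i] for i in key_order))


block_1 = ordered(cells, (0, 1, 2))  # rows 1-8: X3 changes fastest
block_2 = ordered(cells, (1, 0, 2))  # rows 9-16: grouped by X2, then X1
block_3 = ordered(cells, (2, 0, 1))  # rows 17-24: grouped by X3, then X1

for block in (block_1, block_2, block_3):
    print(block)
```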