
Multiple Testing and Protection Against a Type 1 (False Positive) Error Using the Bonferroni and Hochberg Corrections.

Abstract

In a given study, if many related outcomes are tested for statistical significance, one or more outcomes may emerge significant at the P < 0.05 level not because they are truly significant in the population but because of chance. The larger the number of statistical tests performed, the greater the risk that some of the significant findings are significant because of chance. There are many ways to protect against such false positive or Type 1 errors. The simplest way is to set a more stringent threshold for statistical significance than P < 0.05. This can be done using either the Bonferroni or the Hochberg correction. Using the Bonferroni correction, 0.05 is divided by the number of statistical tests being performed and the result is set as the critical P value for statistical significance. Using the Hochberg correction, the P values obtained from the different statistical tests are arranged in descending order of magnitude, and each P value is assessed for significance against progressively more stringent levels for significance. The Bonferroni and Hochberg procedures are explained with the help of examples.


Keywords: Bonferroni correction; Hochberg correction; P value; false positive error; multiple testing; type 1 error

Year: 2019 PMID: 30783320 PMCID: PMC6337927 DOI: 10.4103/IJPSYM.IJPSYM_499_18

Source DB: PubMed Journal: Indian J Psychol Med ISSN: 0253-7176

Imagine that you conduct a 3-month trial in which patients randomized to receive risperidone or haloperidol are examined to determine which antipsychotic is associated with better outcomes for negative symptoms and cognitive functioning. In this trial, negative symptoms are assessed using the Positive and Negative Syndrome Scale-Negative Syndrome subscale (PANSS-N) and the Scale for Assessment of Negative Symptoms (SANS); the total score on each scale is the outcome of interest. Cognitive functioning is assessed using tests of attention and concentration, visual memory, verbal memory, working memory, and ideational fluency; each test yields a single score. Thus, there are two negative symptom outcomes and five cognitive outcomes, making a total of seven outcomes to be compared between groups.

You know that if you compare just one outcome between the two groups, and if the two groups actually (in the population) do not differ on this outcome, there is only a 5% probability that the result will be statistically significant because of chance; this is what P < 0.05 means.[1] You also know that the larger the number of outcomes compared between the two groups, the greater the likelihood that one or more outcomes will be significant by chance alone. In fact, if five related outcomes are tested, there is about a 23% probability that at least one of the outcomes will be significant by chance.[1] Such a chance finding is known as a false positive error or a Type 1 statistical error.[1]

So how would you protect against an inflated Type 1 error when you compare the risperidone and haloperidol groups? Although negative symptom burden and cognitive impairment are correlated, they represent different conceptual entities, so it would be reasonable to protect against a Type 1 error separately for the two negative symptom outcomes and for the five cognitive outcomes. Protection against a Type 1 error can be achieved in many ways. One method sets a more stringent value of P for statistical significance. This can be done using the Bonferroni correction or the Hochberg correction.[2]
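The 23% figure quoted above for five outcomes can be reproduced with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the tests are independent and that no true group difference exists for any outcome. With correlated outcomes, as in the trial described, the actual probability will differ.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across num_tests tests.

    Assumption (not from the article): the tests are independent and
    every null hypothesis is true.
    """
    # Chance that no test is falsely significant is (1 - alpha) ** num_tests;
    # the complement is the chance of at least one false positive.
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # 1 test, the 2 negative symptom outcomes, the 5 cognitive outcomes,
    # and all 7 outcomes together.
    for m in (1, 2, 5, 7):
        print(f"{m} tests: {familywise_error_rate(m):.3f}")
    # For 5 tests the value is about 0.226, i.e. roughly 23%.
```

Under the same assumptions, testing all seven outcomes without correction would push the risk to about 30%.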

THE BONFERRONI CORRECTION

With this method, the value of P for statistical significance (conventionally, 0.05) is divided by the number of statistical tests performed. So, for the negative symptom outcomes, because there are two tests (one for PANSS-N and one for SANS), P for statistical significance is set at 0.05/2 or 0.025. This means that the outcomes for PANSS-N and SANS will be considered significant only if the P values associated with these tests are <0.025 instead of <0.05, as conventional. With regard to the cognitive outcomes, because there are five tests, for any of the five outcomes to be considered statistically significant, it should result in a P value that is <0.05/5; that is, <0.01. The Bonferroni correction is considered conservative; that is, it makes it quite difficult to obtain statistically significant results. This is because when the number of tests performed is large, the P value required for statistical significance becomes quite small and is hard to achieve. In other words, the Bonferroni correction magnifies the risk of a false negative or Type 2 statistical error.[1] The Hochberg sequential procedure offers a better balance between the Type 1 and Type 2 error risks.
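The Bonferroni rule described above can be sketched in a few lines. The function name and the P values below are hypothetical and chosen only for illustration; they are not results from any actual trial.

```python
def bonferroni_significant(p_values: dict, alpha: float = 0.05) -> dict:
    """Flag each outcome as significant if its P value is below alpha / m,
    where m is the number of tests in the family."""
    threshold = alpha / len(p_values)
    return {name: p < threshold for name, p in p_values.items()}


# Hypothetical P values for the five cognitive outcomes (illustration only).
cognitive = {
    "attention": 0.004,
    "visual_memory": 0.012,
    "verbal_memory": 0.030,
    "working_memory": 0.041,
    "ideational_fluency": 0.200,
}

if __name__ == "__main__":
    # Threshold is 0.05 / 5 = 0.01, so only "attention" (P = 0.004) passes.
    print(bonferroni_significant(cognitive))
```

Note that four of these five hypothetical P values are below 0.05, yet only one survives the corrected threshold of 0.01; this is the conservatism referred to above.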

THE HOCHBERG SEQUENTIAL PROCEDURE

With this method, after the groups are compared on each of the five cognitive outcomes, the P values obtained are arranged in descending order of magnitude. If the outcome with the largest P value is significant at the 0.05 level (i.e., P < 0.05), then all the outcomes are considered significant. If the first P value is >0.05, then the second P value is examined; if the second P value is <0.05/2 (that is, 0.025), then this outcome and all the outcomes with smaller P values are considered significant. If the second P value is >0.025, then the third P value is examined; if the third P value is <0.05/3 (that is, 0.017), then this outcome and all the outcomes with smaller P values are considered significant; and so on. For the negative symptom outcomes, if the larger of the two P values is <0.05, then both outcomes are considered significant. If the larger value is >0.05, the second P value will be considered significant only if it is <0.05/2; that is, 0.025. Effectively, the Hochberg sequential procedure applies progressively more stringent criteria for statistical significance, and the last P value is examined at the Bonferroni correction level if the previous P values were not significant on Hochberg testing.
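The sequential procedure described above can be sketched as follows. The function name and the P values are hypothetical, chosen only for illustration. The sketch uses a strict "<" comparison to match the wording of this article; formal statements of the Hochberg procedure commonly use "≤".

```python
def hochberg_significant(p_values: dict, alpha: float = 0.05) -> dict:
    """Hochberg step-up procedure as described in the text.

    P values are examined from largest to smallest; the k-th largest is
    compared with alpha / k. The first one that passes is significant,
    along with every outcome that has a smaller P value.
    """
    # Outcome names ordered from largest to smallest P value.
    ordered = sorted(p_values, key=p_values.get, reverse=True)
    significant = {name: False for name in p_values}
    for rank, name in enumerate(ordered, start=1):
        if p_values[name] < alpha / rank:
            # This outcome and all outcomes with smaller P values pass.
            for passed in ordered[rank - 1:]:
                significant[passed] = True
            break
    return significant


# Hypothetical P values for the five cognitive outcomes (illustration only).
cognitive = {
    "attention": 0.004,
    "visual_memory": 0.012,
    "verbal_memory": 0.030,
    "working_memory": 0.041,
    "ideational_fluency": 0.200,
}

if __name__ == "__main__":
    # 0.200 vs 0.05, 0.041 vs 0.025 and 0.030 vs 0.0167 all fail;
    # 0.012 < 0.05 / 4 = 0.0125 passes, so "visual_memory" and
    # "attention" are flagged as significant.
    print(hochberg_significant(cognitive))
```

With these hypothetical values, two outcomes are significant under the Hochberg procedure, whereas a Bonferroni threshold of 0.05/5 = 0.01 would admit only the smallest P value (0.004).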

NOTES

Corrections for a Type 1 statistical error are necessary only when many tests of the same construct (e.g., cognition) are conducted. Correction is generally considered unnecessary if different tests examine different constructs (e.g., psychosis, memory, and extrapyramidal symptoms). However, in such a context, the issue of primary outcome vs secondary outcomes must be considered.[3] Avoidance of a Type 1 error is desirable in confirmatory studies but may be dispensed with in exploratory studies where authors do not wish to miss a potentially significant outcome. Sometimes, authors may set an arbitrarily conservative P value (e.g., P < 0.01) for all tests to modestly protect against a Type 1 error.[4]

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.
