Anyela Camargo (1), Francisco Azuaje, Haiying Wang, Huiru Zheng. (1) University of Ulster at Jordanstown, School of Computing and Mathematics, Shore Road, Newtownabbey, Co. Antrim, BT37 0QB, Northern Ireland, UK. hy.wang@ulster.ac.uk.
Abstract
BACKGROUND: Genomics and proteomics analyses regularly involve the simultaneous testing of hundreds of hypotheses, on either numerical or categorical data. To correct for the occurrence of false positives, validation based on multiple testing correction, such as the Bonferroni and the Benjamini and Hochberg procedures, and on re-sampling, such as permutation tests, is frequently used. Despite the known power of permutation-based tests, most available tools offer them only for the t-test or ANOVA. Less attention has been given to tests for categorical data, such as the Chi-square test. This project takes a first step toward filling that gap by developing Ptest, an open-source software tool that combines these and other statistical tests with options for correcting for multiple hypotheses.
RESULTS: This study developed a public-domain, user-friendly software tool with a twofold purpose: first, to estimate test statistics for categorical and numerical data; and second, to validate the significance of those test statistics via the Bonferroni correction, the Benjamini and Hochberg correction, and a permutation test, for both numerical and categorical data. The tool calculates the Chi-square test for categorical data, and the ANOVA test, Bartlett's test and the t-test for paired and unpaired data. Once a test statistic is calculated, the Bonferroni correction, the Benjamini and Hochberg correction, and a permutation test are applied, independently, to control for Type I errors. An evaluation of the software on several public data sets is reported, which illustrates the power of permutation tests for assessing multiple hypotheses and for controlling the rate of Type I errors.
CONCLUSION: The analytical options offered by the software can support a broad spectrum of hypothesis testing tasks in functional genomics, using both numerical and categorical data.
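The three validation strategies named in the abstract can be illustrated with a minimal Python sketch. This is not Ptest's code: the function names, the add-one p-value estimate, and the default of 1000 permutations are assumptions chosen for illustration, and only the Chi-square case is shown for the permutation test.

```python
import random
from collections import Counter


def bonferroni(pvalues):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini and Hochberg adjusted p-values (step-up false discovery rate procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def chi_square_statistic(labels, groups):
    """Pearson Chi-square statistic for the contingency table of labels by groups."""
    n = len(labels)
    row_totals = Counter(labels)
    col_totals = Counter(groups)
    cells = Counter(zip(labels, groups))
    stat = 0.0
    for r, nr in row_totals.items():
        for c, nc in col_totals.items():
            expected = nr * nc / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat


def permutation_pvalue(labels, groups, n_perm=1000, seed=0):
    """Estimate a p-value by shuffling group membership (hypothetical helper)."""
    rng = random.Random(seed)
    observed = chi_square_statistic(labels, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square_statistic(labels, shuffled) >= observed:
            hits += 1
    # Add-one estimate (an assumption of this sketch) keeps the p-value above zero.
    return (hits + 1) / (n_perm + 1)
```

In a multiple-testing setting, `permutation_pvalue` would be called once per hypothesis (for example, once per marker), and the resulting list of p-values could then be passed to `bonferroni` or `benjamini_hochberg`. Bonferroni controls the family-wise error rate and is the more conservative of the two; Benjamini and Hochberg controls the false discovery rate.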