Thomas Breslin1, Patrik Edén, Morten Krogh. 1. Complex Systems Division, Department of Theoretical Physics, Lund University, Lund, Sweden. thomas@thep.lu.se <thomas@thep.lu.se>
Abstract
BACKGROUND: Ranked gene lists from microarray experiments are usually analysed by assigning significance to predefined gene categories, e.g., based on functional annotations. Tools performing such analyses are often restricted to a category score based on a cutoff in the ranked list and a significance calculation based on random gene permutations as null hypothesis. RESULTS: We analysed three publicly available data sets, in each of which samples were divided in two classes and genes ranked according to their correlation to class labels. We developed a program, Catmap (available for download at http://bioinfo.thep.lu.se/Catmap), to compare different scores and null hypotheses in gene category analysis, using Gene Ontology annotations for category definition. When a cutoff-based score was used, results depended strongly on the choice of cutoff, introducing an arbitrariness in the analysis. Comparing results using random gene permutations and random sample permutations, respectively, we found that the assigned significance of a category depended strongly on the choice of null hypothesis. Compared to sample label permutations, gene permutations gave much smaller p-values for large categories with many coexpressed genes. CONCLUSIONS: In gene category analyses of ranked gene lists, a cutoff independent score is preferable. The choice of null hypothesis is very important; random gene permutations does not work well as an approximation to sample label permutations.
BACKGROUND: Ranked gene lists from microarray experiments are usually analysed by assigning significance to predefined gene categories, e.g., based on functional annotations. Tools performing such analyses are often restricted to a category score based on a cutoff in the ranked list and a significance calculation based on random gene permutations as null hypothesis. RESULTS: We analysed three publicly available data sets, in each of which samples were divided in two classes and genes ranked according to their correlation to class labels. We developed a program, Catmap (available for download at http://bioinfo.thep.lu.se/Catmap), to compare different scores and null hypotheses in gene category analysis, using Gene Ontology annotations for category definition. When a cutoff-based score was used, results depended strongly on the choice of cutoff, introducing an arbitrariness in the analysis. Comparing results using random gene permutations and random sample permutations, respectively, we found that the assigned significance of a category depended strongly on the choice of null hypothesis. Compared to sample label permutations, gene permutations gave much smaller p-values for large categories with many coexpressed genes. CONCLUSIONS: In gene category analyses of ranked gene lists, a cutoff independent score is preferable. The choice of null hypothesis is very important; random gene permutations does not work well as an approximation to sample label permutations.
Authors: U Alon; N Barkai; D A Notterman; K Gish; S Ybarra; D Mack; A J Levine Journal: Proc Natl Acad Sci U S A Date: 1999-06-08 Impact factor: 11.205
Authors: T R Golub; D K Slonim; P Tamayo; C Huard; M Gaasenbeek; J P Mesirov; H Coller; M L Loh; J R Downing; M A Caligiuri; C D Bloomfield; E S Lander Journal: Science Date: 1999-10-15 Impact factor: 47.728
Authors: Douglas A Hosack; Glynn Dennis; Brad T Sherman; H Clifford Lane; Richard A Lempicki Journal: Genome Biol Date: 2003-09-11 Impact factor: 13.583
Authors: P T Spellman; G Sherlock; M Q Zhang; V R Iyer; K Anders; M B Eisen; P O Brown; D Botstein; B Futcher Journal: Mol Biol Cell Date: 1998-12 Impact factor: 4.138
Authors: Vamsi K Mootha; Cecilia M Lindgren; Karl-Fredrik Eriksson; Aravind Subramanian; Smita Sihag; Joseph Lehar; Pere Puigserver; Emma Carlsson; Martin Ridderstråle; Esa Laurila; Nicholas Houstis; Mark J Daly; Nick Patterson; Jill P Mesirov; Todd R Golub; Pablo Tamayo; Bruce Spiegelman; Eric S Lander; Joel N Hirschhorn; David Altshuler; Leif C Groop Journal: Nat Genet Date: 2003-07 Impact factor: 38.330
Authors: A Gioti; S Wigby; B Wertheim; E Schuster; P Martinez; C J Pennington; L Partridge; T Chapman Journal: Proc Biol Sci Date: 2012-09-12 Impact factor: 5.349
Authors: Sonita Afschar; Janne M Toivonen; Julia Marianne Hoffmann; Luke Stephen Tain; Daniela Wieser; Andrew John Finlayson; Yasmine Driege; Nazif Alic; Sahar Emran; Julia Stinn; Jenny Froehlich; Matthew D Piper; Linda Partridge Journal: Proc Natl Acad Sci U S A Date: 2016-01-19 Impact factor: 11.205