Song Zhang (1), Jing Cao, Y Megan Kong, Richard H Scheuermann.
(1) Department of Clinical Sciences, U.T. Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9072, USA. song.zhang@utsouthwestern.edu
Abstract
MOTIVATION: A typical approach for the interpretation of high-throughput experiments, such as gene expression microarrays, is to produce groups of genes based on certain criteria (e.g. genes that are differentially expressed). To gain more mechanistic insights into the underlying biology, overrepresentation analysis (ORA) is often conducted to investigate whether gene sets associated with particular biological functions, for example, as represented by Gene Ontology (GO) annotations, are statistically overrepresented in the identified gene groups. However, the standard ORA, which is based on the hypergeometric test, analyzes each GO term in isolation and does not take into account the dependence structure of the GO-term hierarchy.

RESULTS: We have developed a Bayesian approach (GO-Bayes) to measure overrepresentation of GO terms that incorporates the GO dependence structure by taking into account evidence not only from individual GO terms, but also from their related terms (i.e. parents, children, siblings, etc.). The Bayesian framework borrows information across related GO terms to strengthen the detection of overrepresentation signals. As a result, this method tends to identify sets of closely related GO terms rather than individual isolated GO terms. The advantage of the GO-Bayes approach is demonstrated with a simulation study and an application example.
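For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of a hypergeometric overrepresentation test for a single GO term. It illustrates only the standard per-term ORA, not the GO-Bayes model, whose details are not given in the abstract. The function name `ora_p_value` and its parameter names are hypothetical and chosen for illustration.

```python
from math import comb


def ora_p_value(n_total, n_annotated, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value for one GO term (illustrative sketch).

    n_total:     number of genes in the background set
    n_annotated: background genes annotated to the GO term
    n_selected:  genes in the identified group (e.g. differentially expressed)
    n_overlap:   selected genes that are annotated to the GO term

    Returns P(X >= n_overlap), where X is the number of annotated genes
    in a random draw of n_selected genes without replacement.
    """
    upper = min(n_annotated, n_selected)
    # Count the draws that contain at least n_overlap annotated genes.
    favourable = sum(
        comb(n_annotated, i) * comb(n_total - n_annotated, n_selected - i)
        for i in range(n_overlap, upper + 1)
    )
    return favourable / comb(n_total, n_selected)


if __name__ == "__main__":
    # Made-up numbers for illustration only, not taken from the paper.
    print(ora_p_value(n_total=1000, n_annotated=40, n_selected=50, n_overlap=8))
```

Because each call uses only the counts for one term, the result is unaffected by how the term's parents, children, or siblings behave, which is the limitation the abstract describes.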