Shuangge Ma (1), Michael R Kosorok.
(1) Department of Epidemiology and Public Health, Yale University, New Haven, CT 06510, USA. shuangge.ma@yale.edu
Abstract
MOTIVATION: The development of high-throughput technology makes it possible to measure the expression of thousands of genes simultaneously. Genes have an inherent pathway structure, in which pathways are composed of multiple genes with coordinated biological functions. It is of great interest to identify differential gene pathways, that is, pathways associated with variation in phenotypes.
RESULTS: We propose the following approach for detecting differential gene pathways. First, we construct gene pathways using databases such as KEGG or GO. Second, for each pathway, we extract a small number of representative features, which are linear combinations of gene expressions and/or their transformations. Specifically, we propose using (i) principal components (PCs) of gene expression sets, (ii) PCs of expanded gene expression sets and (iii) expanded sets of PCs of gene expressions, as the representative features. Third, we identify differential gene pathways as those whose representative features are significantly associated with variation in phenotypes, particularly disease clinical outcomes, in regression models. The false discovery rate approach is used to adjust for multiple comparisons. Analysis of three gene expression datasets suggests that (i) the proposed approach can effectively identify differential gene pathways; (ii) PCs that explain only a small amount of the variation in gene expression may carry significant associations between gene pathways and phenotypes; (iii) including second-order terms of gene expressions may lead to the identification of new differential gene pathways; (iv) the proposed approach is relatively insensitive to additional noise; and (v) the proposed approach can identify gene pathways missed by alternative approaches.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
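Two steps of the approach described in the abstract can be illustrated with a minimal sketch: expanding a pathway's gene expressions with second-order terms, and adjusting per-pathway p-values for multiple comparisons. The abstract does not define the expansion or name a specific false discovery rate procedure, so the sketch assumes that "second-order terms" means squares and pairwise products and that the Benjamini-Hochberg step-up adjustment is used. The function names, pathway names and p-values are hypothetical and purely illustrative; they are not results from the paper.

```python
def expand_second_order(expressions):
    """Append second-order terms x_i * x_j (i <= j) to one sample's expressions.

    Assumption: the 'expanded gene expression set' consists of the original
    values plus all squares and pairwise products.
    """
    p = len(expressions)
    second_order = [
        expressions[i] * expressions[j]
        for i in range(p)
        for j in range(i, p)
    ]
    return list(expressions) + second_order


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: BH is the false discovery rate procedure; the abstract only
    says that a false discovery rate approach is used.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-pathway p-values from a regression on representative features.
pathway_pvalues = {
    "pathway_A": 0.01,
    "pathway_B": 0.04,
    "pathway_C": 0.03,
    "pathway_D": 0.20,
}
names = list(pathway_pvalues)
adjusted = benjamini_hochberg([pathway_pvalues[n] for n in names])
# Pathways declared differential at an illustrative FDR level of 0.10.
differential = [n for n, q in zip(names, adjusted) if q <= 0.10]
```

In a full analysis, the p-values would come from regressing the phenotype on each pathway's representative features (for example, PCs computed from the original or expanded expression matrix); that step is omitted here.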