Yinglei Lai (1), Fanni Zhang (1), Tapan K Nayak (1), Reza Modarres (1), Norman H Lee (2), Timothy A McCaffrey (3). (1) Department of Statistics, The George Washington University, Washington, DC 20052, USA. (2) Department of Pharmacology and Physiology. (3) Division of Genomic Medicine, Department of Medicine, The George Washington University Medical Center, Washington, DC 20037, USA.
Abstract
MOTIVATION: We previously proposed a mixture-model-based approach to the concordant integrative analysis of multiple large-scale two-sample expression datasets. Because the mixture model is based on z-scores, obtained by transforming the P-values of differential expression tests, it is generally applicable to expression data generated by either microarray or RNA-seq platforms. The model is simple: for each dataset, three normal distribution components represent down-regulation, up-regulation and no differential expression. However, as the number of datasets increases, the model parameter space grows exponentially because the components from different datasets are combined.
RESULTS: In this study, motivated by the well-known generalized estimating equations (GEEs) for longitudinal data analysis, we focus on the concordant components and assume that the proportions of non-concordant components follow a special structure. We discuss the exchangeable, multiset coefficient and autoregressive structures for model reduction, together with their related expectation-maximization (EM) algorithms. Under these structures, the parameter space grows linearly with the number of datasets. In our previous study, we applied the general mixture model to three microarray datasets from lung cancer studies. We show that more gene sets (or pathways) can be detected by the reduced mixture model with the exchangeable structure, and that more genes can also be detected by the reduced model. Data from The Cancer Genome Atlas (TCGA) are increasingly available. The advantage of incorporating the concordance feature is also clearly demonstrated on TCGA RNA sequencing data from two closely related types of cancer.
AVAILABILITY AND IMPLEMENTATION: Additional results are included in a supplemental file. R functions are freely available at http://home.gwu.edu/~ylai/research/Concordance.
CONTACT: ylai@gwu.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
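The exponential growth described in the abstract can be illustrated with a small counting sketch. It is not the authors' R implementation. It only enumerates the joint component assignments that arise when each dataset has three components, and it assumes that a "concordant" assignment means the same state in every dataset. The names `STATES`, `component_combinations`, `is_concordant` and `summarize` are hypothetical and chosen for illustration.

```python
from itertools import product

# Three components per dataset, as described in the abstract.
STATES = ("down", "null", "up")


def component_combinations(n_datasets):
    """Enumerate every joint component assignment across n_datasets."""
    return list(product(STATES, repeat=n_datasets))


def is_concordant(combo):
    """Assumed definition: the same state in every dataset."""
    return len(set(combo)) == 1


def summarize(n_datasets):
    """Count joint, concordant and non-concordant component combinations."""
    combos = component_combinations(n_datasets)
    n_concordant = sum(1 for c in combos if is_concordant(c))
    return {
        "datasets": n_datasets,
        "combinations": len(combos),  # equals 3 ** n_datasets
        "concordant": n_concordant,
        "non_concordant": len(combos) - n_concordant,
    }


if __name__ == "__main__":
    for k in range(1, 7):
        print(summarize(k))
```

Under this assumed definition, the number of concordant combinations stays fixed while the number of non-concordant combinations grows as 3^K minus a constant. This is why placing a structure on the non-concordant proportions is what reduces the parameter space.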