Yaomin Xu1, Xingyi Guo2, Jiayang Sun2, Zhongming Zhao1. 1, 2. Department of Biomedical Informatics, Department of Biostatistics and Center for Quantitative Sciences, Vanderbilt University, Nashville, TN 37232, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44106, Department of Psychiatry and Department of Cancer Biology, Vanderbilt University, Nashville, TN 37212, USA.
Abstract
MOTIVATION: Large-scale cancer genomic studies, such as The Cancer Genome Atlas (TCGA), have profiled multidimensional genomic data, including mutation and expression profiles, across a variety of cancer cell types to uncover the molecular mechanisms of carcinogenesis. More than a hundred driver mutations that confer a cell-growth advantage have been characterized. However, how driver mutations regulate the transcriptome to affect cellular functions remains largely unexplored. Differential analysis of gene expression relative to a driver mutation in patient samples could provide new insights into the dysregulation caused by driver mutations in the tumor genome and help develop personalized treatment strategies. RESULTS: Here, we introduce the Snowball approach, a highly sensitive statistical analysis method for identifying transcriptional signatures that are affected by a recurrent driver mutation. Snowball combines a resampling-based approach with a distance-based regression framework to assign each gene a robust ranking index based on its aggregated association with the presence of the mutation, and then selects the top significant genes for downstream data analyses or experiments. In applying the Snowball approach to both synthesized and TCGA data, we demonstrated that it outperforms the standard methods and provides more accurate inferences about the functional effects and transcriptional dysregulation of driver mutations. AVAILABILITY AND IMPLEMENTATION: The R package and source code are available from CRAN at http://cran.r-project.org/web/packages/DESnowball and at http://bioinfo.mc.vanderbilt.edu/DESnowball/.
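To make the idea of a resampling-based ranking index more concrete, the following is a minimal Python sketch of rank aggregation over repeated subsamples. It is not the DESnowball implementation and does not reproduce the distance-based regression described in the abstract. The per-gene score used here (an absolute standardized mean difference between mutant and wild-type samples) is a simple stand-in chosen for illustration, and the function names, parameters, and input layout are all hypothetical.

```python
import random
import statistics


def association_score(values, labels):
    """Stand-in association score (an assumption, not Snowball's statistic):
    absolute difference in group means divided by the overall spread."""
    mutant = [v for v, m in zip(values, labels) if m]
    wild = [v for v, m in zip(values, labels) if not m]
    spread = statistics.pstdev(values) or 1.0  # guard against zero spread
    return abs(statistics.fmean(mutant) - statistics.fmean(wild)) / spread


def resampled_gene_ranking(expression, labels, n_resamples=200,
                           fraction=0.8, seed=0):
    """Rank genes by their mean rank across stratified subsamples.

    expression: dict of gene name -> list of expression values (one per sample)
    labels:     list of 0/1 mutation indicators, one per sample
    Returns a list of (gene, mean_rank) pairs, best (lowest mean rank) first.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    mutant_idx = [i for i, m in enumerate(labels) if m]
    wild_idx = [i for i, m in enumerate(labels) if not m]
    if len(mutant_idx) < 2 or len(wild_idx) < 2:
        raise ValueError("need at least two samples in each group")

    rng = random.Random(seed)
    genes = list(expression)
    rank_sums = dict.fromkeys(genes, 0.0)
    k_mut = max(2, round(fraction * len(mutant_idx)))
    k_wild = max(2, round(fraction * len(wild_idx)))

    for _ in range(n_resamples):
        # Stratified subsample so both groups appear in every resample.
        keep = rng.sample(mutant_idx, k_mut) + rng.sample(wild_idx, k_wild)
        sub_labels = [labels[i] for i in keep]
        scores = {
            g: association_score([expression[g][i] for i in keep], sub_labels)
            for g in genes
        }
        # Rank 1 is the gene most strongly associated in this resample.
        ordered = sorted(genes, key=lambda g: scores[g], reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            rank_sums[gene] += rank

    mean_ranks = {g: rank_sums[g] / n_resamples for g in genes}
    return sorted(mean_ranks.items(), key=lambda item: item[1])


if __name__ == "__main__":
    labels = [1] * 6 + [0] * 6
    expression = {
        "shifted": [5.0 + 0.1 * i for i in range(6)] + [1.0 + 0.1 * i for i in range(6)],
        "flat": [1.0, 2.0] * 6,
    }
    for gene, mean_rank in resampled_gene_ranking(expression, labels, n_resamples=50):
        print(gene, mean_rank)
```

Averaging ranks over many subsamples, rather than scoring the full data once, is what makes the resulting ordering less sensitive to any individual sample. A faithful implementation would replace `association_score` with the method's own distance-based regression statistic.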