Genevieve L Stein-O'Brien (1,2), Jacob L Carey (3), Wai Shing Lee (3), Michael Considine (3), Alexander V Favorov (3,4,5), Emily Flam (6), Theresa Guo (6), Sijia Li (6), Luigi Marchionni (6), Thomas Sherman (7), Shawn Sivy (7), Daria A Gaykalova (6), Ronald D McKay (2), Michael F Ochs (7), Carlo Colantuoni (8,9), Elana J Fertig (3).
1. McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
2. Lieber Institute for Brain Development, Baltimore, MD, USA.
3. Department of Oncology and Division of Biostatistics and Bioinformatics, Johns Hopkins School of Medicine, Baltimore, MD, USA.
4. Vavilov Institute of General Genetics, Moscow, Russia.
5. Research Institute of Genetics and Selection of Industrial Microorganisms, Moscow, Russia.
6. Department of Otolaryngology-Head and Neck Surgery, Johns Hopkins School of Medicine, Baltimore, MD, USA.
7. Department of Mathematics and Statistics, The College of New Jersey, Ewing Township, NJ, USA.
8. Department of Neurology and Department of Neuroscience, Johns Hopkins School of Medicine, Baltimore, MD, USA.
9. Institute for Genome Sciences, University of Maryland School of Medicine.
Abstract
SUMMARY: Non-negative Matrix Factorization (NMF) algorithms associate gene expression with biological processes (e.g. time-course dynamics or disease subtypes). Unlike univariate associations, NMF solutions assign each gene relative weights across several processes, which can obscure biomarkers. We therefore developed a novel patternMarkers statistic that extracts genes for biological validation and improves visualization of NMF results. Finding novel and unbiased gene markers with patternMarkers requires whole-genome data, so we also developed Genome-Wide CoGAPS Analysis in Parallel Sets (GWCoGAPS), the first robust whole-genome Bayesian NMF, built on the sparse MCMC algorithm CoGAPS. A manual version of the GWCoGAPS algorithm additionally provides analytic and visualization tools, including patternMatcher, a Shiny web application. The decomposition step in the manual pipeline can be replaced with any NMF algorithm, which further generalizes the software. Using these tools, we find granular brain-region and cell-type specific signatures with corresponding biomarkers in GTEx data, illustrating how GWCoGAPS and patternMarkers ascertain data-driven biomarkers from whole-genome data.
AVAILABILITY AND IMPLEMENTATION: patternMarkers and GWCoGAPS are in the CoGAPS Bioconductor package (3.5) under the GPL license.
CONTACT: gsteinobrien@jhmi.edu or ccolantu@jhmi.edu or ejfertig@jhmi.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
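The idea of turning relative NMF weights into per-pattern marker genes can be illustrated with a small sketch. The abstract does not define the patternMarkers statistic, so everything below is an assumption made for illustration: the function name `pattern_markers`, the toy amplitude matrix, the gene names, and the rule itself (scale each gene's weights by their maximum, then assign the gene to the pattern whose unit vector is nearest in Euclidean distance). It is not the CoGAPS package's implementation.

```python
import math

def pattern_markers(amplitude, gene_names):
    """Assign each gene to one pattern and rank genes within each pattern.

    Hypothetical rule (an assumption, not the CoGAPS code): scale a gene's
    weights by their maximum, then pick the pattern whose unit vector is
    closest in Euclidean distance. A smaller distance means a more
    pattern-specific gene.
    """
    n_patterns = len(amplitude[0])
    hits = {p: [] for p in range(n_patterns)}
    for gene, row in zip(gene_names, amplitude):
        peak = max(row)
        if peak <= 0:
            continue  # gene carries no weight in any pattern
        scaled = [w / peak for w in row]
        # Distance from the scaled weights to each unit vector e_p.
        dists = [
            math.sqrt(sum((s - (1.0 if q == p else 0.0)) ** 2
                          for q, s in enumerate(scaled)))
            for p in range(n_patterns)
        ]
        best = min(range(n_patterns), key=dists.__getitem__)
        hits[best].append((dists[best], gene))
    # Most specific genes (smallest distance) come first.
    return {p: [g for _, g in sorted(found)] for p, found in hits.items()}

# Toy amplitude matrix: rows are genes, columns are NMF patterns.
amplitude = [
    [9.0, 0.5, 0.2],  # geneA: dominated by pattern 0
    [4.0, 3.8, 0.1],  # geneB: shared between patterns 0 and 1
    [0.1, 0.2, 6.0],  # geneC: dominated by pattern 2
    [0.3, 5.0, 0.4],  # geneD: dominated by pattern 1
]
genes = ["geneA", "geneB", "geneC", "geneD"]
markers = pattern_markers(amplitude, genes)
print(markers)
```

In this toy case geneB has nearly equal weight in two patterns, so it is assigned to pattern 0 but ranks behind the more specific geneA. That is the kind of ambiguity that raw relative weights leave unresolved.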