Zi Yang1, George Michailidis1. 1. Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA.
Abstract
MOTIVATION: Recent advances in high-throughput omics technologies have enabled biomedical researchers to collect large-scale genomic data. As a consequence, there has been growing interest in developing methods to integrate such data to obtain deeper insights regarding the underlying biological system. A key challenge for integrative studies is the heterogeneity present in the different omics data sources, which makes it difficult to discern the coordinated signal of interest from source-specific noise or extraneous effects. RESULTS: We introduce a novel method of multi-modal data analysis that is designed for heterogeneous data based on non-negative matrix factorization. We provide an algorithm for jointly decomposing the data matrices involved that also includes a sparsity option for high-dimensional settings. The performance of the proposed method is evaluated on synthetic data and on real DNA methylation, gene expression and miRNA expression data from ovarian cancer samples obtained from The Cancer Genome Atlas. The results show the presence of common modules across patient samples linked to cancer-related pathways, as well as previously established ovarian cancer subtypes. AVAILABILITY AND IMPLEMENTATION: The source code repository is publicly available at https://github.com/yangzi4/iNMF. CONTACT: gmichail@umich.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: Recent advances in high-throughput omics technologies have enabled biomedical researchers to collect large-scale genomic data. As a consequence, there has been growing interest in developing methods to integrate such data to obtain deeper insights regarding the underlying biological system. A key challenge for integrative studies is the heterogeneity present in the different omics data sources, which makes it difficult to discern the coordinated signal of interest from source-specific noise or extraneous effects. RESULTS: We introduce a novel method of multi-modal data analysis that is designed for heterogeneous data based on non-negative matrix factorization. We provide an algorithm for jointly decomposing the data matrices involved that also includes a sparsity option for high-dimensional settings. The performance of the proposed method is evaluated on synthetic data and on real DNA methylation, gene expression and miRNA expression data from ovarian cancer samples obtained from The Cancer Genome Atlas. The results show the presence of common modules across patient samples linked to cancer-related pathways, as well as previously established ovarian cancer subtypes. AVAILABILITY AND IMPLEMENTATION: The source code repository is publicly available at https://github.com/yangzi4/iNMF. CONTACT: gmichail@umich.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Authors: Melissa S Cline; Michael Smoot; Ethan Cerami; Allan Kuchinsky; Nerius Landys; Chris Workman; Rowan Christmas; Iliana Avila-Campilo; Michael Creech; Benjamin Gross; Kristina Hanspers; Ruth Isserlin; Ryan Kelley; Sarah Killcoyne; Samad Lotia; Steven Maere; John Morris; Keiichiro Ono; Vuk Pavlovic; Alexander R Pico; Aditya Vailaya; Peng-Liang Wang; Annette Adler; Bruce R Conklin; Leroy Hood; Martin Kuiper; Chris Sander; Ilya Schmulevich; Benno Schwikowski; Guy J Warner; Trey Ideker; Gary D Bader Journal: Nat Protoc Date: 2007 Impact factor: 13.491
Authors: Marcin Imielinski; Sangwon Cha; Tomas Rejtar; Elizabeth A Richardson; Barry L Karger; Dennis C Sgroi Journal: Mol Cell Proteomics Date: 2012-01-12 Impact factor: 5.911
Authors: Sushmita Roy; Stephen Lagree; Zhonggang Hou; James A Thomson; Ron Stewart; Audrey P Gasch Journal: PLoS Comput Biol Date: 2013-10-17 Impact factor: 4.475