Zi Wang1, Edward Curry1, Giovanni Montana2. 1. Department of Mathematics, Imperial College London, London SW7 2AZ, Division of Cancer, Imperial College London, Hammersmith Hospital, London W12 0NN and Department of Biomedical Engineering, King's College London, St Thomas' Hospital, London SE1 7EH, UK. 2. Department of Mathematics, Imperial College London, London SW7 2AZ, Division of Cancer, Imperial College London, Hammersmith Hospital, London W12 0NN and Department of Biomedical Engineering, King's College London, St Thomas' Hospital, London SE1 7EH, UK.
Abstract
MOTIVATION: High-throughput profiling in biological research has resulted in the availability of a wealth of data cataloguing the genetic, epigenetic and transcriptional states of cells. These data could yield discoveries that may lead to breakthroughs in the diagnosis and treatment of human disease, but require statistical methods designed to find the most relevant patterns from millions of potential interactions. Aberrant DNA methylation is often a feature of cancer, and has been proposed as a therapeutic target. However, the relationship between DNA methylation and gene expression remains poorly understood.
RESULTS: We propose Network-sparse Reduced-Rank Regression (NsRRR), a multivariate regression framework capable of using prior biological knowledge expressed as gene interaction networks to guide the search for associations between gene expression and DNA methylation signatures. We use simulations to show the advantage of our proposed model in terms of variable selection accuracy over alternative models that do not use prior network information. We discuss an application of NsRRR to The Cancer Genome Atlas datasets on primary ovarian tumours.
AVAILABILITY AND IMPLEMENTATION: R code implementing the NsRRR model is available at http://www2.imperial.ac.uk/~gmontana.
CONTACT: giovanni.montana@kcl.ac.uk
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Authors: Dong D Wang; Yan Zheng; Estefanía Toledo; Cristina Razquin; Miguel Ruiz-Canela; Marta Guasch-Ferré; Edward Yu; Dolores Corella; Enrique Gómez-Gracia; Miquel Fiol; Ramón Estruch; Emilio Ros; José Lapetra; Montserrat Fito; Fernando Aros; Lluis Serra-Majem; Clary B Clish; Jordi Salas-Salvadó; Liming Liang; Miguel A Martínez-González; Frank B Hu Journal: Int J Epidemiol Date: 2018-12-01 Impact factor: 7.196