Martin Schäfer (1), Hans-Ulrich Klein (2,3,4), Holger Schwender (1)
1. Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany.
2. Program in Translational Neuropsychiatric Genomics, Ann Romney Center for Neurologic Diseases, Department of Neurology, Brigham and Women's Hospital, Boston, MA 02115, USA.
3. Harvard Medical School, Boston, MA 02115, USA.
4. Program in Medical and Population Genetics, Broad Institute, Cambridge, MA 02141, USA.
Abstract
MOTIVATION: Genes that show congruent differences in several genomic variables between two biological conditions are key to unravelling the causal mechanisms behind phenotypes of interest. Detecting such genes is important in biomedical research, e.g. when identifying genes responsible for cancer development. The small sample sizes common in next-generation sequencing studies are a key challenge, and very few statistical methods exist for analyzing more than two genomic variables in an integrative, model-based way. Here, we present a novel bioinformatics approach to detect congruent differences between two biological conditions across a larger number of different measurements, such as various epigenetic marks or mRNA transcript levels.
RESULTS: We propose a coefficient that quantifies the degree to which genes show consistent alterations in multiple (more than two) genomic variables when samples with a condition of interest (e.g. cancer) are compared with a reference group. A hierarchical Bayesian model is used to assess uncertainty at the gene level, incorporating information on functional relationships between genes. We demonstrate the approach on several data sets containing RNA-seq gene transcription and up to four ChIP-seq histone modification measurements. Both the coefficient-based ranking and the model-based inference lead to a plausible prioritization of candidate genes when multiple genomic variables are analyzed.
AVAILABILITY AND IMPLEMENTATION: BUGS code is provided in the Supplement.
CONTACT: m.schaefer@uni-duesseldorf.de.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.