Hongyu Li (1), Biqing Zhu (2), Zhichao Xu (1), Taylor Adams (3), Naftali Kaminski (3), Hongyu Zhao (4, 5)
1. Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, 06511, USA.
2. Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA.
3. Section of Pulmonary, Critical Care and Sleep Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, CT, 06520, USA.
4. Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, 06511, USA. hongyu.zhao@yale.edu.
5. Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, 06511, USA. hongyu.zhao@yale.edu.
Abstract
BACKGROUND: Recent developments in single-cell sequencing technologies have made it possible to identify genes that are differentially expressed (DE) at the cell-type level between groups of samples. In this article, we propose borrowing information from known biological networks to increase the statistical power to identify differentially expressed genes (DEGs). RESULTS: We develop MRFscRNAseq, a method based on a Markov random field (MRF) model that accommodates gene network information as well as dependencies among cell types to identify cell-type-specific DEGs. We implement an Expectation-Maximization (EM) algorithm with a mean-field-like approximation to estimate model parameters and a Gibbs sampler to infer DE status. A simulation study shows that our method has greater power to detect cell-type-specific DEGs than conventional methods while appropriately controlling the type I error rate. We demonstrate the usefulness of our method by applying it to study the pathogenesis and biological processes of idiopathic pulmonary fibrosis (IPF) using a single-cell RNA-sequencing (scRNA-seq) data set that contains 18,150 protein-coding genes across 38 cell types in lung tissue from 32 IPF patients and 28 normal controls. CONCLUSIONS: The proposed MRF model is implemented in the R package MRFscRNAseq, available on GitHub. By utilizing gene-gene and cell-cell networks, our method increases statistical power to detect differentially expressed genes from scRNA-seq data.
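To illustrate the general idea of inferring DE status with a Gibbs sampler on a network-based MRF, the following is a minimal Python sketch under stated assumptions. It is not the MRFscRNAseq implementation, which is an R package and also models dependencies among cell types. The sketch assumes a single cell type, a binary DE label per gene, an Ising-style prior over a gene network, and a two-component Gaussian model for per-gene z-scores. The function names and the parameters `gamma`, `beta`, and `sigma1` are hypothetical and fixed by hand here, whereas the abstract describes estimating model parameters with an EM algorithm.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def log_lik_ratio(z, sigma1=3.0):
    """log f1(z) - log f0(z), assuming f0 = N(0, 1) for non-DE genes
    and f1 = N(0, sigma1^2) for DE genes (an illustrative choice)."""
    return -math.log(sigma1) + 0.5 * z * z * (1.0 - 1.0 / (sigma1 * sigma1))


def gibbs_de_status(z, neighbors, gamma=-2.0, beta=0.5,
                    n_iter=2000, burn_in=500, seed=0):
    """Estimate the posterior probability of DE for each gene.

    z         : list of per-gene z-scores (hypothetical summary statistics)
    neighbors : list of neighbor index lists, one per gene (gene network)
    gamma     : prior log-odds of DE for a gene with no neighbors (assumed)
    beta      : strength of agreement with neighboring genes (assumed)
    """
    rng = random.Random(seed)
    n = len(z)
    x = [0] * n          # current DE labels, 1 = DE, 0 = not DE
    counts = [0] * n     # number of post-burn-in sweeps in which x[i] == 1
    for it in range(n_iter):
        for i in range(n):
            # Neighbors labeled DE push the log-odds up, non-DE push it down.
            field = sum(2 * x[j] - 1 for j in neighbors[i])
            logit = gamma + beta * field + log_lik_ratio(z[i])
            x[i] = 1 if rng.random() < sigmoid(logit) else 0
        if it >= burn_in:
            for i in range(n):
                counts[i] += x[i]
    kept = n_iter - burn_in
    return [c / kept for c in counts]


# Toy example: genes 0-1-2 form a chain, gene 3 is isolated.
# Genes 1 and 3 have the same moderate z-score, but gene 1 has
# two strongly supported neighbors in the network.
z_scores = [4.0, 2.0, 4.0, 2.0]
network = [[1], [0, 2], [1], []]
post = gibbs_de_status(z_scores, network)
print([round(p, 2) for p in post])
```

In this toy setting, gene 1 borrows strength from its neighbors and so tends to receive a higher posterior probability of DE than the isolated gene 3, even though their z-scores are identical. This is the intuition behind using network information to increase power.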