Shen Yin (1,2), Xinlei Wang (1), Gaoxiang Jia (1), Yang Xie (2)
1. Department of Statistical Science, Southern Methodist University, Dallas, TX 75275-0332, USA.
2. Department of Population and Data Sciences, Quantitative Biomedical Research Center, The University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.
Abstract
MOTIVATION: Recent studies have shown that RNA-sequencing (RNA-seq) can be used to measure mRNA of sufficient quality extracted from formalin-fixed paraffin-embedded (FFPE) tissues to provide whole-genome transcriptome analysis. However, little attention has been given to the normalization of FFPE RNA-seq data, a key step that adjusts for unwanted biological and technical effects that can bias the signal of interest. Existing methods, developed based on fresh-frozen or similar-type samples, may cause suboptimal performance.
RESULTS: We proposed a new normalization method, labeled MIXnorm, for FFPE RNA-seq data. MIXnorm relies on a two-component mixture model, which models non-expressed genes by zero-inflated Poisson distributions and models expressed genes by truncated normal distributions. To obtain maximum likelihood estimates, we developed a nested EM algorithm, in which closed-form updates are available in each iteration. By eliminating the need for numerical optimization in the M-step, the algorithm is easy to implement and computationally efficient. We evaluated MIXnorm through simulations and cancer studies. MIXnorm makes a significant improvement over commonly used methods for RNA-seq expression data.
AVAILABILITY AND IMPLEMENTATION: R code available at https://github.com/S-YIN/MIXnorm.
CONTACT: swang@smu.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
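To illustrate what "closed-form updates" in an EM iteration can look like, here is a minimal sketch for a single zero-inflated Poisson (ZIP) distribution fitted to a list of counts. This is not the MIXnorm algorithm: it omits the truncated normal component, the mixture over expressed and non-expressed genes, and the nesting of EM loops described in the abstract. The function names `fit_zip_em` and `sample_zip`, the parameter names `pi` (structural-zero probability) and `lam` (Poisson mean), and the starting values are assumptions made for this example only.

```python
import math
import random


def sample_zip(n, pi, lam, rng):
    """Draw n counts from a zero-inflated Poisson (illustrative helper)."""
    limit = math.exp(-lam)
    draws = []
    for _ in range(n):
        if rng.random() < pi:
            draws.append(0)  # structural zero
            continue
        # Knuth's multiplication method for a Poisson(lam) draw
        k, p = 0, rng.random()
        while p > limit:
            k += 1
            p *= rng.random()
        draws.append(k)
    return draws


def fit_zip_em(counts, n_iter=200, tol=1e-10):
    """EM for a zero-inflated Poisson; returns the estimates (pi, lam)."""
    n = len(counts)
    n_zero = sum(1 for x in counts if x == 0)
    total = sum(counts)
    if total == 0:
        return 1.0, 0.0  # degenerate case: every observation is zero
    # Assumed starting values: half of the zeros structural, mean of positives
    pi = 0.5 * n_zero / n
    lam = total / (n - n_zero)
    for _ in range(n_iter):
        # E-step: posterior probability that an observed zero is structural
        z0 = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: both updates are closed-form, no numerical optimizer needed
        expected_structural = n_zero * z0
        new_pi = expected_structural / n
        new_lam = total / (n - expected_structural)
        converged = abs(new_pi - pi) + abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam


if __name__ == "__main__":
    rng = random.Random(0)
    data = sample_zip(5000, pi=0.3, lam=4.0, rng=rng)
    print(fit_zip_em(data))
```

Because only zero counts can be structural zeros, the E-step reduces to a single posterior probability shared by all zeros, and the M-step is a pair of weighted averages. That is the sense in which no numerical optimization is required in this simplified setting.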